Many coding agents and CLIs are built around the OpenAI Responses API shape:
POST /v1/responses
- structured input items
- tool calls as first-class response output items
- streamed Server-Sent Events for incremental output
- previous_response_id for continuation
Many model providers, including self-hosted inference servers and hosted OpenAI-compatible APIs, expose a different shape:
POST /chat/completions
- chat messages with role and content
- tool calls attached to assistant messages
- optional streaming deltas
- no built-in history keyed by a response id
The mismatch is small enough that you do not need to fork the CLI or rewrite the agent. A local adapter can sit between the CLI and the upstream provider:
Codex CLI
-> http://127.0.0.1:5810/v1/responses
-> local adapter
-> https://api.provider.example/chat/completions
-> local adapter
-> Responses-compatible JSON or SSE
-> Codex CLI
This gist describes the adapter pattern, the edge cases that matter for tool-using coding agents, and a sanitized DeepSeek configuration example.
Why use a local adapter?
Direct provider configuration works only when the provider implements the same protocol the client expects. “OpenAI-compatible” usually means Chat Completions compatibility, not full Responses API compatibility.
A local adapter gives you:
- no changes to the CLI
- one place to normalize provider quirks
- safe local handling of API keys through environment variables
- consistent tool-call behavior across providers
- better debugging with request and response logs
- a migration path between local models, hosted OpenAI-compatible APIs, and provider-specific APIs
For coding agents, the most important part is tool calling. A model can write plausible text all day, but if tool calls are dropped, malformed, or not replayed into later turns, the agent stops being useful.
Minimal Codex provider configuration
Configure Codex to talk to the local adapter instead of the upstream provider directly:
[profiles.deepseek]
model = "deepseek-v4-pro"
model_provider = "deepseek_proxy"
[model_providers.deepseek_proxy]
name = "deepseek_proxy"
base_url = "http://127.0.0.1:5810/v1"
env_key = "DEEPSEEK_API_KEY"
Then run the adapter locally:
export DEEPSEEK_API_KEY="sk-..."
python3 codex_deepseek.py
The CLI sends requests to http://127.0.0.1:5810/v1/responses. The adapter forwards them to the provider’s Chat Completions endpoint.
DeepSeek’s public OpenAI-compatible base URL is documented as:
https://api.deepseek.com
So the upstream request target is:
https://api.deepseek.com/chat/completions
Verify model names against the provider docs before publishing your own wrapper. At the time this note was written, DeepSeek’s docs listed deepseek-v4-flash and deepseek-v4-pro, with deepseek-chat and deepseek-reasoner kept for compatibility and scheduled for deprecation on 2026-07-24.
Adapter responsibilities
The adapter is not just a dumb HTTP proxy. It performs protocol translation.
1. Convert Responses input items into chat messages
Responses input can contain multiple item types. A chat endpoint wants an ordered list of messages:
import json

def convert_input_item(item):
    # Bare strings are treated as user text.
    if isinstance(item, str):
        return {"role": "user", "content": item}
    item_type = item.get("type")
    # Responses "message" items carry a list of content parts.
    if item_type == "message":
        text = "".join(
            part.get("text", "")
            for part in item.get("content", [])
            if part.get("type") == "input_text"
        )
        return {"role": sanitize_role(item["role"]), "content": text}
    # Tool results become role="tool" messages keyed by the call id.
    if item_type == "function_call_output":
        return {
            "role": "tool",
            "tool_call_id": item["call_id"],
            "content": item.get("output", ""),
        }
    # Earlier tool calls are replayed as assistant messages with tool_calls.
    if item_type == "function_call":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": item["call_id"],
                "type": "function",
                "function": {
                    "name": item["name"],
                    "arguments": item["arguments"],
                },
            }],
        }
    # Anything unrecognized is passed through as serialized user content.
    return {"role": "user", "content": json.dumps(item)}
Most Chat Completions APIs do not support a developer role. Map it to system, or merge it into the first system message.
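A minimal sketch of that mapping; the sanitize_role helper used by convert_input_item above is an adapter-local assumption, not a library function:

def sanitize_role(role):
    # Chat Completions providers generally accept system/user/assistant/tool;
    # the Responses-style "developer" role is folded into "system" here.
    if role == "developer":
        return "system"
    return role if role in {"system", "user", "assistant", "tool"} else "user"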
2. Merge instructions and system messages
Some providers are sensitive to multiple system messages or unexpected message ordering. A robust adapter builds one leading system message:
system_parts = []
non_system_messages = []
if body.get("instructions"):
    system_parts.append(body["instructions"])
for message in messages:
    if message["role"] == "system":
        system_parts.append(message.get("content") or "")
    else:
        non_system_messages.append(message)

merged_messages = []
if system_parts:
    merged_messages.append({
        "role": "system",
        "content": "\n\n".join(system_parts),
    })
merged_messages.extend(non_system_messages)
This keeps the upstream prompt shape predictable.
3. Convert tools
Responses tools are already close to Chat Completions tools. The adapter mainly normalizes the function schema:
def convert_tools(tools):
out = []
for tool in tools or []:
if tool.get("type") != "function":
continue
out.append({
"type": "function",
"function": {
"name": tool["name"],
"description": tool.get("description", ""),
"parameters": tool.get("parameters", {}),
},
})
return out or None
If the provider requires strict function schemas, this is the right layer to add strict: true or reject unsupported schemas before the upstream request.
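For example, a sketch of that normalization; whether strict is honored, and which schema features count as unsupported, depends on the provider:

def normalize_function(fn, strict=False):
    # Ensure a parameters object exists, and opt into strict schemas on request.
    params = fn.get("parameters") or {"type": "object", "properties": {}}
    if strict:
        fn["strict"] = True
        # Strict modes typically require additionalProperties to be false.
        params.setdefault("additionalProperties", False)
    fn["parameters"] = params
    return fn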
4. Build the upstream chat request
The upstream request should be explicit:
chat_request = {
"model": body.get("model"),
"messages": build_messages(body),
"stream": False,
}
tools = convert_tools(body.get("tools"))
if tools:
chat_request["tools"] = tools
chat_request["tool_choice"] = body.get("tool_choice", "auto") or "auto"
if body.get("max_output_tokens") is not None:
chat_request["max_tokens"] = body["max_output_tokens"]
For provider compatibility, it is often better to force upstream stream: false and synthesize a Responses-compatible stream locally. This avoids a common failure mode where provider streaming emits text deltas but omits or corrupts final tool_calls.
The CLI can still receive streaming SSE. The adapter is simply faking that stream from a complete upstream response.
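A minimal sketch of that upstream call, using only the standard library plus the base URL and key already shown in this note; error handling and retries are elided:

import json
import os
import urllib.request

def call_upstream(chat_request):
    req = urllib.request.Request(
        "https://api.deepseek.com/chat/completions",
        data=json.dumps(chat_request).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
        },
        method="POST",
    )
    # Upstream streaming is forced off, so one blocking request returns the full response.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))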
5. Translate assistant output back into Responses output items
Chat Completions usually returns:
{
"choices": [{
"message": {
"role": "assistant",
"content": "text",
"tool_calls": []
}
}]
}
Responses expects output items:
{
"output": [
{
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": "text", "annotations": []}
]
},
{
"type": "function_call",
"call_id": "call_...",
"name": "shell",
"arguments": "{\"cmd\":\"ls\"}"
}
]
}
Tool calls should be emitted as separate function_call output items, not hidden inside text.
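A sketch of that translation, taking choices[0]["message"] from the upstream response; the exact set of item fields a given client requires may be larger than shown here:

import uuid

def convert_choice_to_output(message):
    output = []
    # Plain assistant text becomes one message item with a single output_text part.
    if message.get("content"):
        output.append({
            "type": "message",
            "id": f"msg_{uuid.uuid4().hex}",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": message["content"], "annotations": []},
            ],
        })
    # Each upstream tool call becomes its own function_call output item.
    for call in message.get("tool_calls") or []:
        output.append({
            "type": "function_call",
            "id": f"fc_{uuid.uuid4().hex}",
            "call_id": call["id"],
            "name": call["function"]["name"],
            "arguments": call["function"]["arguments"],
        })
    return output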
6. Fake Responses SSE when the client asks for streaming
When the CLI asks for stream: true, send a valid Responses event sequence:
data: {"type":"response.created",...}
data: {"type":"response.in_progress",...}
data: {"type":"response.output_item.added",...}
data: {"type":"response.output_text.delta",...}
data: {"type":"response.output_text.done",...}
data: {"type":"response.output_item.done",...}
data: {"type":"response.completed",...}
data: [DONE]
For function calls, emit:
response.output_item.added
response.function_call_arguments.delta
response.function_call_arguments.done
response.output_item.done
The important part is consistency. The CLI should see the same logical output whether the upstream provider streamed or not.
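A simplified sketch of the synthesized stream for the text-only case; the payloads are trimmed to the fields named in the sequence above, and a real adapter fills in item ids, indexes, and the full response object:

import json

def sse(event):
    # One SSE frame per Responses event.
    return f"data: {json.dumps(event)}\n\n"

def fake_text_stream(response_id, text):
    yield sse({"type": "response.created", "response": {"id": response_id}})
    yield sse({"type": "response.in_progress", "response": {"id": response_id}})
    yield sse({"type": "response.output_item.added", "output_index": 0})
    yield sse({"type": "response.output_text.delta", "output_index": 0, "delta": text})
    yield sse({"type": "response.output_text.done", "output_index": 0, "text": text})
    yield sse({"type": "response.output_item.done", "output_index": 0})
    yield sse({"type": "response.completed", "response": {"id": response_id}})
    yield "data: [DONE]\n\n"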
7. Maintain conversation history
Chat Completions is stateless. Responses uses previous_response_id.
The adapter can bridge this by storing history in memory:
conversation_histories[response_id] = previous_messages + [assistant_message]
When the next request includes previous_response_id, prepend the stored messages before adding the new input.
This does not need to be durable for an interactive CLI session. In-memory state is enough if the adapter process lives for the duration of the session.
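A minimal in-memory version of that bridging; there is no eviction or persistence, which is acceptable for a single CLI session:

conversation_histories = {}  # response_id -> list of chat messages

def resolve_history(body):
    # Replay stored messages when the client continues via previous_response_id.
    previous_id = body.get("previous_response_id")
    return list(conversation_histories.get(previous_id, []))

def store_history(response_id, previous_messages, assistant_message):
    conversation_histories[response_id] = previous_messages + [assistant_message]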
DeepSeek-specific notes
DeepSeek supports the OpenAI Chat Completions shape and tool calls. Its thinking mode adds a provider-specific field named reasoning_content.
For a coding agent with tools, there are two practical options:
- Disable thinking mode in the adapter for the most predictable tool-call loop.
- If thinking mode is enabled, preserve the provider-required reasoning metadata exactly as the provider expects in later turns.
For a simple and robust adapter, default to disabled thinking:
{
"thinking": {"type": "disabled"}
}
Then expose an environment variable for experiments:
export DEEPSEEK_THINKING=enabled
When thinking mode is enabled, the adapter should also avoid forwarding parameters that the provider documents as ignored or invalid in that mode.
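One way to wire that toggle, assuming the thinking field shape shown above and treating DEEPSEEK_THINKING as an adapter-local convention:

import os

def apply_thinking(chat_request):
    # Default to disabled thinking for the most predictable tool-call loop.
    mode = os.environ.get("DEEPSEEK_THINKING", "disabled")
    chat_request["thinking"] = {"type": "enabled" if mode == "enabled" else "disabled"}
    return chat_request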
Operational safety
Do not hardcode keys:
export DEEPSEEK_API_KEY="sk-..."
Do not publish:
- internal service domains
- private model registry paths
- hardcoded API keys
- company names
- customer names
- local filesystem paths
- request logs containing prompts or proprietary code
Useful logs are metadata-only:
model=deepseek-v4-pro stream=true messages=12 tools=5 status=200
Avoid logging full prompts or full upstream responses unless you are in a private debugging environment.
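A sketch of metadata-only request logging in the same spirit as the example line above:

import logging

log = logging.getLogger("adapter")

def log_request(chat_request, status):
    # Counts and status only; never prompt text or response bodies.
    log.info(
        "model=%s stream=%s messages=%d tools=%d status=%d",
        chat_request.get("model"),
        chat_request.get("stream", False),
        len(chat_request.get("messages", [])),
        len(chat_request.get("tools") or []),
        status,
    )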
Failure modes to test
Before trusting the adapter, run these tests:
- plain text response with stream: false
- plain text response with stream: true
- one tool call
- multiple tool calls
- tool call followed by tool result followed by final answer
- invalid tool arguments
- upstream 400 or 500 passthrough
- missing API key
- long conversation using previous_response_id
- provider timeout
The fastest smoke test is a tool-requiring prompt such as:
List the files in the current directory, then tell me how many Markdown files exist.
If the CLI actually runs the filesystem tool and uses the result, the adapter is doing the important part correctly.
Mental model
The adapter is a protocol shim, not a model wrapper.
Keep it small:
- normalize request shape
- preserve tool calls
- preserve conversation history
- normalize response shape
- surface upstream errors clearly
Everything else should be configuration.
That separation makes it easy to swap upstreams later:
Codex profile -> local adapter -> DeepSeek
Codex profile -> local adapter -> self-hosted vLLM
Codex profile -> local adapter -> llama.cpp server
Codex profile -> local adapter -> another OpenAI-compatible API
Once this boundary is clean, adding a new provider is mostly a matter of setting:
TARGET_URL=...
MODEL=...
API_KEY=...
and keeping the Responses-to-Chat translation stable.
References
- DeepSeek API quick start: https://api-docs.deepseek.com/
- DeepSeek Chat Completion API: https://api-docs.deepseek.com/api/create-chat-completion
- DeepSeek function calling guide: https://api-docs.deepseek.com/guides/function_calling/
- DeepSeek thinking mode guide: https://api-docs.deepseek.com/guides/thinking_mode