Many coding agents and CLIs are built around the OpenAI Responses API shape:
POST /v1/responses
- structured input items
- tool calls as first-class response output items
- streamed Server-Sent Events for incremental output
- previous_response_id for continuation
Many model providers, including self-hosted inference servers and hosted OpenAI-compatible APIs, expose a different shape:
POST /chat/completions
- chat messages with role and content
- tool calls attached to assistant messages
- optional streaming deltas
- no built-in history keyed by a response id
The mismatch is small enough that you do not need to fork the CLI or rewrite the agent. A local adapter can sit between the CLI and the upstream provider:
Codex CLI
-> http://127.0.0.1:5810/v1/responses
-> local adapter
-> https://api.provider.example/chat/completions
-> local adapter
-> Responses-compatible JSON or SSE
-> Codex CLI
This gist describes the adapter pattern, the edge cases that matter for tool-using coding agents, and a sanitized DeepSeek configuration example.
Why use a local adapter?
Direct provider configuration works only when the provider implements the same protocol the client expects. “OpenAI-compatible” usually means Chat Completions compatibility, not full Responses API compatibility.
A local adapter gives you:
- no changes to the CLI
- one place to normalize provider quirks
- safe local handling of API keys through environment variables
- consistent tool-call behavior across providers
- better debugging with request and response logs
- a migration path between local models, hosted OpenAI-compatible APIs, and provider-specific APIs
For coding agents, the most important part is tool calling. A model can write plausible text all day, but if tool calls are dropped, malformed, or not replayed into later turns, the agent stops being useful.
Minimal Codex provider configuration
Configure Codex to talk to the local adapter instead of the upstream provider directly:
[profiles.deepseek]
model = "deepseek-v4-pro"
model_provider = "deepseek_proxy"
[model_providers.deepseek_proxy]
name = "deepseek_proxy"
base_url = "http://127.0.0.1:5810/v1"
env_key = "DEEPSEEK_API_KEY"
Then run the adapter locally:
export DEEPSEEK_API_KEY="sk-..."
python3 codex_deepseek.py
The CLI sends requests to http://127.0.0.1:5810/v1/responses. The adapter forwards them to the provider’s Chat Completions endpoint.
DeepSeek’s public OpenAI-compatible base URL is documented as:
https://api.deepseek.com
So the upstream request target is:
https://api.deepseek.com/chat/completions
Verify model names against the provider docs before publishing your own wrapper. At the time this note was written, DeepSeek’s docs listed deepseek-v4-flash and deepseek-v4-pro, with deepseek-chat and deepseek-reasoner kept for compatibility and scheduled for deprecation on 2026-07-24.
Adapter responsibilities
The adapter is not just a dumb HTTP proxy. It performs protocol translation.
1. Convert Responses input items into chat messages
Responses input can contain multiple item types. A chat endpoint wants an ordered list of messages:
import json

def convert_input_item(item):
    # Bare strings are treated as user text.
    if isinstance(item, str):
        return {"role": "user", "content": item}
    item_type = item.get("type")
    # Responses "message" items carry a list of content parts.
    if item_type == "message":
        text = "".join(
            part.get("text", "")
            for part in item.get("content", [])
            if part.get("type") == "input_text"
        )
        return {"role": sanitize_role(item["role"]), "content": text}
    # Tool results become role="tool" messages keyed by the call id.
    if item_type == "function_call_output":
        return {
            "role": "tool",
            "tool_call_id": item["call_id"],
            "content": item.get("output", ""),
        }
    # Earlier tool calls are replayed as assistant messages with tool_calls.
    if item_type == "function_call":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": item["call_id"],
                "type": "function",
                "function": {
                    "name": item["name"],
                    "arguments": item["arguments"],
                },
            }],
        }
    # Anything unrecognized is passed through as serialized user content.
    return {"role": "user", "content": json.dumps(item)}
Most Chat Completions APIs do not support a developer role. Map it to system, or merge it into the first system message.
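A minimal sketch of that mapping; the sanitize_role helper used by convert_input_item above is an adapter-local assumption, not a library function:

def sanitize_role(role):
    # Chat Completions providers generally accept system/user/assistant/tool;
    # the Responses-style "developer" role is folded into "system" here.
    if role == "developer":
        return "system"
    return role if role in {"system", "user", "assistant", "tool"} else "user"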
2. Merge instructions and system messages
Some providers are sensitive to multiple system messages or unexpected message ordering. A robust adapter builds one leading system message:
system_parts = []
non_system_messages = []
if body.get("instructions"):
    system_parts.append(body["instructions"])
for message in messages:
    if message["role"] == "system":
        system_parts.append(message.get("content") or "")
    else:
        non_system_messages.append(message)

merged_messages = []
if system_parts:
    merged_messages.append({
        "role": "system",
        "content": "\n\n".join(system_parts),
    })
merged_messages.extend(non_system_messages)
This keeps the upstream prompt shape predictable.
3. Convert tools
Responses tools are already close to Chat Completions tools. The adapter mainly normalizes the function schema:
def convert_tools(tools):
out = []
for tool in tools or []:
if tool.get("type") != "function":
continue
out.append({
"type": "function",
"function": {
"name": tool["name"],
"description": tool.get("description", ""),
"parameters": tool.get("parameters", {}),
},
})
return out or None
If the provider requires strict function schemas, this is the right layer to add strict: true or reject unsupported schemas before the upstream request.
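For example, a sketch of that normalization; whether strict is honored, and which schema features count as unsupported, depends on the provider:

def normalize_function(fn, strict=False):
    # Ensure a parameters object exists, and opt into strict schemas on request.
    params = fn.get("parameters") or {"type": "object", "properties": {}}
    if strict:
        fn["strict"] = True
        # Strict modes typically require additionalProperties to be false.
        params.setdefault("additionalProperties", False)
    fn["parameters"] = params
    return fn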
4. Build the upstream chat request
The upstream request should be explicit:
chat_request = {
"model": body.get("model"),
"messages": build_messages(body),
"stream": False,
}
tools = convert_tools(body.get("tools"))
if tools:
chat_request["tools"] = tools
chat_request["tool_choice"] = body.get("tool_choice", "auto") or "auto"
if body.get("max_output_tokens") is not None:
chat_request["max_tokens"] = body["max_output_tokens"]
For provider compatibility, it is often better to force upstream stream: false and synthesize a Responses-compatible stream locally. This avoids a common failure mode where provider streaming emits text deltas but omits or corrupts final tool_calls.
The CLI can still receive streaming SSE. The adapter is simply faking that stream from a complete upstream response.
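A minimal sketch of that upstream call, using only the standard library plus the base URL and key already shown in this note; error handling and retries are elided:

import json
import os
import urllib.request

def call_upstream(chat_request):
    req = urllib.request.Request(
        "https://api.deepseek.com/chat/completions",
        data=json.dumps(chat_request).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
        },
        method="POST",
    )
    # Upstream streaming is forced off, so one blocking request returns the full response.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))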
5. Translate assistant output back into Responses output items
Chat Completions usually returns:
{
"choices": [{
"message": {
"role": "assistant",
"content": "text",
"tool_calls": []
}
}]
}
Responses expects output items:
{
"output": [
{
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": "text", "annotations": []}
]
},
{
"type": "function_call",
"call_id": "call_...",
"name": "shell",
"arguments": "{\"cmd\":\"ls\"}"
}
]
}
Tool calls should be emitted as separate function_call output items, not hidden inside text.
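A sketch of that translation, taking choices[0]["message"] from the upstream response; the exact set of item fields a given client requires may be larger than shown here:

import uuid

def convert_choice_to_output(message):
    output = []
    # Plain assistant text becomes one message item with a single output_text part.
    if message.get("content"):
        output.append({
            "type": "message",
            "id": f"msg_{uuid.uuid4().hex}",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": message["content"], "annotations": []},
            ],
        })
    # Each upstream tool call becomes its own function_call output item.
    for call in message.get("tool_calls") or []:
        output.append({
            "type": "function_call",
            "id": f"fc_{uuid.uuid4().hex}",
            "call_id": call["id"],
            "name": call["function"]["name"],
            "arguments": call["function"]["arguments"],
        })
    return output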
6. Fake Responses SSE when the client asks for streaming
When the CLI asks for stream: true, send a valid Responses event sequence:
data: {"type":"response.created",...}
data: {"type":"response.in_progress",...}
data: {"type":"response.output_item.added",...}
data: {"type":"response.output_text.delta",...}
data: {"type":"response.output_text.done",...}
data: {"type":"response.output_item.done",...}
data: {"type":"response.completed",...}
data: [DONE]
For function calls, emit:
response.output_item.added
response.function_call_arguments.delta
response.function_call_arguments.done
response.output_item.done
The important part is consistency. The CLI should see the same logical output whether the upstream provider streamed or not.
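A simplified sketch of the synthesized stream for the text-only case; the payloads are trimmed to the fields named in the sequence above, and a real adapter fills in item ids, indexes, and the full response object:

import json

def sse(event):
    # One SSE frame per Responses event.
    return f"data: {json.dumps(event)}\n\n"

def fake_text_stream(response_id, text):
    yield sse({"type": "response.created", "response": {"id": response_id}})
    yield sse({"type": "response.in_progress", "response": {"id": response_id}})
    yield sse({"type": "response.output_item.added", "output_index": 0})
    yield sse({"type": "response.output_text.delta", "output_index": 0, "delta": text})
    yield sse({"type": "response.output_text.done", "output_index": 0, "text": text})
    yield sse({"type": "response.output_item.done", "output_index": 0})
    yield sse({"type": "response.completed", "response": {"id": response_id}})
    yield "data: [DONE]\n\n"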
7. Maintain conversation history
Chat Completions is stateless. Responses uses previous_response_id.
The adapter can bridge this by storing history in memory:
conversation_histories[response_id] = previous_messages + [assistant_message]
When the next request includes previous_response_id, prepend the stored messages before adding the new input.
This does not need to be durable for an interactive CLI session. In-memory state is enough if the adapter process lives for the duration of the session.
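A minimal in-memory version of that bridging; there is no eviction or persistence, which is acceptable for a single CLI session:

conversation_histories = {}  # response_id -> list of chat messages

def resolve_history(body):
    # Replay stored messages when the client continues via previous_response_id.
    previous_id = body.get("previous_response_id")
    return list(conversation_histories.get(previous_id, []))

def store_history(response_id, previous_messages, assistant_message):
    conversation_histories[response_id] = previous_messages + [assistant_message]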
DeepSeek-specific notes
DeepSeek supports the OpenAI Chat Completions shape and tool calls. Its thinking mode adds a provider-specific field named reasoning_content.
For a coding agent with tools, there are two practical options:
- Disable thinking mode in the adapter for the most predictable tool-call loop.
- If thinking mode is enabled, preserve the provider-required reasoning metadata exactly as the provider expects in later turns.
For a simple and robust adapter, default to disabled thinking:
{
"thinking": {"type": "disabled"}
}
Then expose an environment variable for experiments:
export DEEPSEEK_THINKING=enabled
When thinking mode is enabled, the adapter should also avoid forwarding parameters that the provider documents as ignored or invalid in that mode.
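One way to wire that toggle, assuming the thinking field shape shown above and treating DEEPSEEK_THINKING as an adapter-local convention:

import os

def apply_thinking(chat_request):
    # Default to disabled thinking for the most predictable tool-call loop.
    mode = os.environ.get("DEEPSEEK_THINKING", "disabled")
    chat_request["thinking"] = {"type": "enabled" if mode == "enabled" else "disabled"}
    return chat_request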
Operational safety
Do not hardcode keys:
export DEEPSEEK_API_KEY="sk-..."
Do not publish:
- internal service domains
- private model registry paths
- hardcoded API keys
- company names
- customer names
- local filesystem paths
- request logs containing prompts or proprietary code
Useful logs are metadata-only:
model=deepseek-v4-pro stream=true messages=12 tools=5 status=200
Avoid logging full prompts or full upstream responses unless you are in a private debugging environment.
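A sketch of metadata-only request logging in the same spirit as the example line above:

import logging

log = logging.getLogger("adapter")

def log_request(chat_request, status):
    # Counts and status only; never prompt text or response bodies.
    log.info(
        "model=%s stream=%s messages=%d tools=%d status=%d",
        chat_request.get("model"),
        chat_request.get("stream", False),
        len(chat_request.get("messages", [])),
        len(chat_request.get("tools") or []),
        status,
    )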
Failure modes to test
Before trusting the adapter, run these tests:
- plain text response with stream: false
- plain text response with stream: true
- one tool call
- multiple tool calls
- tool call followed by tool result followed by final answer
- invalid tool arguments
- upstream 400 or 500 passthrough
- missing API key
- long conversation using previous_response_id
- provider timeout
The fastest smoke test is a tool-requiring prompt such as:
List the files in the current directory, then tell me how many Markdown files exist.
If the CLI actually runs the filesystem tool and uses the result, the adapter is doing the important part correctly.
Mental model
The adapter is a protocol shim, not a model wrapper.
Keep it small:
- normalize request shape
- preserve tool calls
- preserve conversation history
- normalize response shape
- surface upstream errors clearly
Everything else should be configuration.
That separation makes it easy to swap upstreams later:
Codex profile -> local adapter -> DeepSeek
Codex profile -> local adapter -> self-hosted vLLM
Codex profile -> local adapter -> llama.cpp server
Codex profile -> local adapter -> another OpenAI-compatible API
Once this boundary is clean, adding a new provider is mostly a matter of setting:
TARGET_URL=...
MODEL=...
API_KEY=...
and keeping the Responses-to-Chat translation stable.
References
- DeepSeek API quick start: https://api-docs.deepseek.com/
- DeepSeek Chat Completion API: https://api-docs.deepseek.com/api/create-chat-completion
- DeepSeek function calling guide: https://api-docs.deepseek.com/guides/function_calling/
- DeepSeek thinking mode guide: https://api-docs.deepseek.com/guides/thinking_mode