
OpenAI Tool Call Arguments: Why They Break

Without strict: true, LLM tool call arguments can be malformed JSON. OpenAI's Structured Outputs (August 2024) solved this for OpenAI models — but multi-provider setups and open models still break. Here's what happens and how to handle it.

Applies to: GPT-4o, GPT-4o-mini, GPT-4-turbo, and open-weight models (Llama, Mixtral, Qwen) served via OpenRouter, together.ai, or Fireworks

The current state of tool call JSON

OpenAI Structured Outputs (August 2024): Setting strict: true in your function definition guarantees that arguments match your JSON Schema. If you use only OpenAI models with strict mode, tool args are reliable.
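Opting in is a one-line change per tool definition, plus two schema constraints. A minimal sketch (per OpenAI's Structured Outputs requirements, strict mode needs `"additionalProperties": false` and every property listed under `required`; `get_weather` is a made-up example tool):

```python
# Strict-mode function definition. Per OpenAI's Structured Outputs docs,
# "strict": True requires "additionalProperties": False and every
# property listed under "required". `get_weather` is a hypothetical tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],
            "additionalProperties": False,
        },
    },
}]
```

With this definition, the API constrains decoding so `arguments` always matches the schema — but only on OpenAI models that support strict mode.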

Without strict mode, or with non-OpenAI models: arguments are best-effort. OpenAI's older docs warned: "the model does not always generate valid JSON." That warning applied to non-strict calls and remains true for open models (Llama, Mistral, Qwen) served through OpenRouter or similar aggregators.

The problem is real when you use multi-provider setups, open-weight models, or older function calling without strict mode. When you call json.loads(tool_call.function.arguments) in those cases, you're trusting the model to produce syntactically perfect JSON. It doesn't always.

What breaks in practice

Consider a tool definition for search_products with a moderately complex schema — nested filters, enums, optional fields. The model returns something that looks like JSON but isn't quite right.

Tool definition
tools = [{
    "type": "function",
    "function": {
        "name": "search_products",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "filters": {
                    "type": "object",
                    "properties": {
                        "category": {"type": "string", "enum": ["electronics", "clothing", "home"]},
                        "price_max": {"type": "number"},
                        "in_stock": {"type": "boolean"},
                    }
                },
                "sort_by": {"type": "string", "enum": ["relevance", "price", "rating"]},
            },
            "required": ["query"]
        }
    }
}]
❌ Raw arguments string returned by the model
# tool_call.function.arguments contains:
{"query": "wireless headphones", "filters": {"category": "electronics", "price_max": 100, "in_stock": true,}, "sort_by": "rating",}
#                                                                                              ^                          ^
#                                                                                     trailing comma           trailing comma
❌ The error
import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
# json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 108 (char 107)

Common malformations in tool call arguments include trailing commas, single-quoted strings, unquoted property names, truncated output (JSON cut off mid-key), and hallucinated parameters not in your schema.
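Each of the syntactic failure modes breaks `json.loads` in a slightly different way. A quick standalone demonstration (hand-written examples, not captured model output; hallucinated parameters are omitted because they parse fine syntactically):

```python
import json

# Representative malformed argument strings — hand-written examples
# of the failure modes listed above, not real model output.
bad_args = [
    '{"query": "headphones", }',      # trailing comma
    "{'query': 'headphones'}",        # single-quoted strings
    '{query: "headphones"}',          # unquoted property name
    '{"query": "wireless headpho',    # truncated mid-string
]

for s in bad_args:
    try:
        json.loads(s)
    except json.JSONDecodeError as e:
        # e.msg carries the parser's complaint, e.colno the position
        print(f"{s[:30]!r}: {e.msg}")
```

Note that a hallucinated parameter — valid JSON with a key your schema never defined — sails straight through `json.loads` and fails later, inside your tool handler.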

When it's worse: non-OpenAI providers

OpenAI's own models (GPT-4o, GPT-4o-mini) have relatively low tool argument failure rates thanks to constrained decoding. But if you route through OpenRouter to use open-weight models, the numbers get much worse.

We tested 288 tool calls across 4 models using schemas of varying complexity. The results:

| Model | Tool calls tested | Malformed args |
| --- | --- | --- |
| meta-llama/llama-3.3-70b-instruct | 72 | 71% |
| mistralai/mixtral-8x22b-instruct | 72 | 44% |
| qwen/qwen-2.5-72b-instruct | 72 | 28% |
| openai/gpt-4o-mini | 72 | 3% |

Figures shown are for complex schemas (nested objects, enums, arrays). See the full benchmark methodology and results.

Key takeaway: If your agent supports model switching or uses open-weight models, you will encounter malformed tool arguments. Handling this isn't optional.
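Even arguments that parse cleanly can be semantically wrong — hallucinated parameters, missing required fields, out-of-enum values. A minimal stdlib check can catch these before you execute the tool (a sketch; `validate_args` is a hypothetical helper covering only required keys, unknown keys, and enums — use a real JSON Schema validator for full coverage):

```python
def validate_args(args: dict, schema: dict) -> list:
    """Minimal check of parsed args against a JSON Schema fragment.
    A sketch: covers only required keys, unknown keys, and enum values."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required: {key}")
    for key, value in args.items():
        if key not in props:
            # Likely a hallucinated parameter
            problems.append(f"unexpected parameter: {key}")
            continue
        enum = props[key].get("enum")
        if enum and value not in enum:
            problems.append(f"{key}: {value!r} not in {enum}")
    return problems
```

If `validate_args` returns a non-empty list, you can surface the problems back to the model instead of crashing mid-execution.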

Fix 1: Retry on parse failure

The simplest approach: if json.loads() fails on the tool arguments, retry the entire API call and hope the model generates valid JSON on the next attempt.

✅ Retry loop
import json
from openai import OpenAI

client = OpenAI()

def call_with_tools(messages, tools, max_retries=3):
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            continue  # model answered in prose instead of calling a tool
        tool_call = tool_calls[0]
        try:
            args = json.loads(tool_call.function.arguments)
            return tool_call.function.name, args
        except json.JSONDecodeError:
            continue
    raise ValueError("Failed to get valid tool args after retries")
Caveats: Each retry costs tokens and adds latency (1-3 seconds per call). If the model consistently fails on a particular schema, retrying won't help. And with non-OpenAI models that fail 40-70% of the time, you'll burn through retries fast.
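A variation worth sketching (not from the benchmark above; `call_with_feedback` is a hypothetical helper): instead of retrying blind, feed the parse error back to the model so it can correct itself. This costs one extra user message per failure but may converge faster on schemas a model repeatedly fumbles. The `create` callable (your `client.chat.completions.create`) is passed in as a parameter so the sketch stays easy to stub out:

```python
import json

def call_with_feedback(create, messages, tools, max_retries=3):
    """Sketch of retry-with-feedback: when arguments fail to parse,
    append the JSON error to the conversation so the model can correct
    itself on the next attempt. `create` is your
    client.chat.completions.create, injected for testability."""
    msgs = list(messages)
    for _ in range(max_retries):
        response = create(model="gpt-4o", messages=msgs, tools=tools)
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            continue  # model replied in prose; just retry
        tc = tool_calls[0]
        try:
            return tc.function.name, json.loads(tc.function.arguments)
        except json.JSONDecodeError as e:
            msgs.append({
                "role": "user",
                "content": (
                    f"Your previous tool call arguments were invalid JSON "
                    f"({e.msg}). Call the tool again with strictly valid "
                    f"JSON arguments."
                ),
            })
    raise ValueError("Failed to get valid tool args after retries")
```

Usage with a real client: `call_with_feedback(client.chat.completions.create, messages, tools)`.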

Fix 2: Manual JSON repair

Strip the most common syntax issues from the arguments string before parsing. This catches trailing commas, which account for a large share of failures.

✅ Regex-based repair
import re, json

def repair_tool_args(args_str: str) -> dict:
    """Attempt to fix common JSON issues in tool call arguments."""
    # Remove trailing commas before } or ]
    args_str = re.sub(r',\s*}', '}', args_str)
    args_str = re.sub(r',\s*]', ']', args_str)
    return json.loads(args_str)

# Usage
tool_call = response.choices[0].message.tool_calls[0]
try:
    args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
    args = repair_tool_args(tool_call.function.arguments)
Caveats: This only fixes trailing commas. It won't handle single-quoted strings, unquoted keys, truncated JSON, control characters inside strings, or any of the dozen other ways models produce broken arguments. Building a robust repair function is significantly harder than it looks.
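You can push the manual approach a little further with a Python-literal fallback that also catches single-quoted strings and JSON-vs-Python boolean mismatches. Still a sketch (`repair_tool_args_v2` is hypothetical, and the word-level substitutions can corrupt string values that contain the bare words true/false/null):

```python
import ast
import json
import re

def repair_tool_args_v2(args_str: str) -> dict:
    """Best-effort repair: strip trailing commas, then fall back to
    parsing as a Python literal. A sketch, not production-grade."""
    s = re.sub(r',\s*([}\]])', r'\1', args_str)  # trailing commas
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        pass
    # Map JSON literals to Python ones, then parse as a Python literal
    # (handles single-quoted strings). Caution: these substitutions also
    # hit occurrences of true/false/null inside string values.
    s = re.sub(r'\btrue\b', 'True', s)
    s = re.sub(r'\bfalse\b', 'False', s)
    s = re.sub(r'\bnull\b', 'None', s)
    return ast.literal_eval(s)
```

Even this still misses truncated JSON, control characters, and unquoted keys — which is the point: each new failure mode needs its own special case.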

Fix 3: Proxy-level repair with StreamFix

StreamFix sits between your code and the model provider. It intercepts tool_calls in the response and repairs the arguments JSON before it reaches your application. Works on both streaming and non-streaming responses.

✅ One base_url change
from openai import OpenAI
import json

client = OpenAI(
    base_url="https://streamfix.dev/v1",
    api_key="sk_YOUR_STREAMFIX_KEY",
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    tools=[{
        "type": "function",
        "function": {
            "name": "search_products",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "filters": {"type": "object"},
                },
                "required": ["query"]
            }
        }
    }],
    messages=[{"role": "user", "content": "Find wireless headphones under $100"}],
)

# tool_call.function.arguments is guaranteed parseable
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # ✅ always valid JSON

StreamFix repairs trailing commas, single-quoted strings, unquoted keys, truncated JSON, control characters, and other malformations. It works with any model — OpenAI, Llama, Mixtral, Qwen, DeepSeek — routed through OpenRouter or direct.

Stop tool calls from breaking your agent

When json.loads(tool_call.function.arguments) throws, your agent loop crashes. StreamFix repairs tool call arguments in-flight — one base_url change, zero code changes to your tool handling.

from openai import OpenAI

client = OpenAI(
    base_url="https://streamfix.dev/v1",
    api_key="sk_YOUR_STREAMFIX_KEY",
)

# Your existing tool-calling code works unchanged.
# StreamFix ensures tool_call.function.arguments
# is always valid, parseable JSON.
response = client.chat.completions.create(
    model="openai/gpt-4o",
    tools=my_tools,
    messages=my_messages,
)
for tc in response.choices[0].message.tool_calls:
    args = json.loads(tc.function.arguments)  # ✅ guaranteed
    result = execute_tool(tc.function.name, args)
Get Free API Key →
