Fix DeepSeek JSON Output Issues

DeepSeek R1 leaks <think> reasoning tags into your JSON responses. DeepSeek V3 wraps JSON in markdown fences. Both break json.loads().

Applies to: DeepSeek-R1, DeepSeek-R1-Distill, DeepSeek-V3, DeepSeek-Coder-V2 via OpenRouter, together.ai, or direct API

Issue 1: <think> tag leakage

DeepSeek R1 is a chain-of-thought model. It "thinks" before answering, and this reasoning is supposed to be in a separate field. But with many API providers and prompt styles, the <think> block bleeds into the content field you parse.

❌ What you get from DeepSeek R1
<think>
The user wants customer data as JSON. I should return
name and email fields. Let me format this correctly...
</think>
{"name": "Alice", "email": "alice@example.com"}
❌ The error
content = resp.choices[0].message.content
json.loads(content)
# JSONDecodeError: Expecting value: line 1 column 1 (char 0)
# (because content starts with "<think>", not "{")
✅ Fix — strip <think> blocks
import re, json

def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks from DeepSeek R1 output."""
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()

content = resp.choices[0].message.content
cleaned = strip_think_tags(content)
data = json.loads(cleaned)  # ✅ {"name": "Alice", "email": "alice@example.com"}
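A truncated response can also leave an unclosed <think> block with no JSON after it, which the non-greedy regex above will not match. A defensive variant (a sketch; `strip_think_tags_safe` is a hypothetical helper, not part of any SDK) drops an unclosed block entirely:

```python
import re

def strip_think_tags_safe(text: str) -> str:
    """Like strip_think_tags, but also handles an unclosed <think> block.

    If </think> never arrives (e.g. a truncated response), everything
    after the opening tag is reasoning, so drop it.
    """
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    if '<think>' in text:  # opening tag with no matching close
        text = text.split('<think>', 1)[0]
    return text.strip()
```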
Streaming note: In streaming mode, <think> blocks arrive token-by-token before the JSON starts. You need to buffer and detect the closing </think> tag before beginning JSON assembly. StreamFix handles this automatically.

Issue 2: Markdown fence wrapping

DeepSeek V3 (and R1 in non-streaming mode) frequently wraps JSON in markdown code blocks even when explicitly asked not to. This is the single most common JSON failure mode across models: in our benchmark, 95.5% of failures were fence wrapping.

❌ What DeepSeek V3 returns
```json
{
  "order_id": "ORD-1234",
  "status": "shipped",
  "items": [{"sku": "A1", "qty": 2}]
}
```
✅ Fix — strip fences
import re, json

def parse_llm_json(text: str) -> dict:
    """Parse JSON from LLM output — strips fences and think tags."""
    # 1. Strip <think> blocks (DeepSeek R1)
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    # 2. Extract content from fenced blocks if present
    fence_match = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if fence_match:
        text = fence_match.group(1)
    # 3. Trim stray whitespace before parsing
    text = text.strip()
    return json.loads(text)

content = resp.choices[0].message.content
data = parse_llm_json(content)  # ✅ works on all DeepSeek models
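A quick self-check of the parser against the failure shapes covered so far (the function is reproduced here so the snippet runs standalone):

```python
import re, json

def parse_llm_json(text: str) -> dict:
    """Strip <think> blocks and markdown fences, then parse JSON."""
    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    fence = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if fence:
        text = fence.group(1)
    return json.loads(text.strip())

samples = [
    '{"ok": true}',                                   # clean JSON
    '```json\n{"ok": true}\n```',                     # fenced (V3)
    '<think>reasoning...</think>\n{"ok": true}',      # think leakage (R1)
    '<think>plan</think>```json\n{"ok": true}\n```',  # both at once
]
assert all(parse_llm_json(s) == {"ok": True} for s in samples)
```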

Issue 3: Streaming with DeepSeek R1

Streaming DeepSeek R1 via OpenRouter adds another layer: the <think> content arrives as regular content delta tokens before the JSON. If you parse mid-stream, you'll hit decode errors on the reasoning text.

✅ Fix — detect JSON start before parsing
from openai import OpenAI
import json, re

client = OpenAI(api_key="...", base_url="https://openrouter.ai/api/v1")

stream = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Return JSON: {name, score}"}],
    stream=True,
)

buffer = ""
in_think = False
json_started = False
json_chunks = []

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    buffer += delta

    # Skip reasoning tokens until the <think> block closes
    if "<think>" in buffer:
        in_think = True
    if "</think>" in buffer:
        in_think = False
        buffer = buffer.split("</think>")[-1]  # keep only post-reasoning text
    if in_think:
        continue

    # Start collecting at the first '{' or '[' — slice the buffer so any
    # leftover pre-JSON text in this delta is excluded
    if not json_started:
        starts = [i for i in (buffer.find("{"), buffer.find("[")) if i != -1]
        if starts:
            json_started = True
            json_chunks.append(buffer[min(starts):])
        continue

    json_chunks.append(delta)

raw_json = "".join(json_chunks).strip()
raw_json = re.sub(r'```(?:json)?', '', raw_json).strip()
data = json.loads(raw_json)  # ✅
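R1 can also append commentary after the closing brace ("Hope that helps!"), which makes `json.loads` fail on trailing data. The stdlib's `json.JSONDecoder.raw_decode` decodes the first JSON value and reports where it ends, so trailing prose can simply be ignored (a sketch; `first_json_value` is a hypothetical helper):

```python
import json

def first_json_value(text: str):
    """Decode the first JSON value in text, ignoring trailing prose.

    raw_decode returns (value, end_index), so commentary the model
    appends after the object is never fed to the parser.
    """
    decoder = json.JSONDecoder()
    starts = [i for i in (text.find('{'), text.find('[')) if i != -1]
    value, _ = decoder.raw_decode(text[min(starts):] if starts else text)
    return value

print(first_json_value('{"score": 9} Hope that helps!'))  # {'score': 9}
```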

DeepSeek model behavior reference

| Model | <think> leakage | Fence wrapping | Streaming issues |
|---|---|---|---|
| deepseek/deepseek-r1 | High | Medium | High |
| deepseek/deepseek-r1-distill-llama-70b | High | Medium | Medium |
| deepseek/deepseek-chat (V3) | None | High | Medium |
| deepseek/deepseek-coder-v2 | None | Medium | Low |

Based on plain-prompt testing (no response_format or structured output params). Results vary by provider.
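The table above can be encoded as a lookup so a pipeline applies only the cleanup each model needs. This is a sketch based on this guide's observations; the `CLEANUP` map and `clean` helper are illustrative, not part of any SDK:

```python
import re

# Which cleanup steps each model needs, per the behavior table above.
CLEANUP = {
    "deepseek/deepseek-r1":                   ("think", "fence"),
    "deepseek/deepseek-r1-distill-llama-70b": ("think", "fence"),
    "deepseek/deepseek-chat":                 ("fence",),
    "deepseek/deepseek-coder-v2":             ("fence",),
}

def clean(model: str, text: str) -> str:
    """Apply only the cleanup steps this model is known to need."""
    steps = CLEANUP.get(model, ("think", "fence"))  # unknown model: do both
    if "think" in steps:
        text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
    if "fence" in steps:
        m = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
        if m:
            text = m.group(1)
    return text.strip()
```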

Handle all DeepSeek quirks automatically

StreamFix strips <think> tags and markdown fences in real-time during streaming. Works with any DeepSeek model via OpenRouter — one base_url change.

import json

from openai import OpenAI

client = OpenAI(
    api_key="sk_YOUR_STREAMFIX_KEY",
    base_url="https://streamfix.up.railway.app/v1",
)

# <think> tags and fences stripped automatically
resp = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Return JSON: {name, score}"}],
)
data = json.loads(resp.choices[0].message.content)  # ✅ always works
Get Free API Key →

Related guides