# Fix Pydantic ValidationError in LLM Streaming

A complete guide to solving validation errors when streaming structured output from OpenAI, Anthropic, and OpenRouter.
## The Problem

```
pydantic_core._pydantic_core.ValidationError:
1 validation error for User
Input should be a valid dictionary or instance of User [type=model_type, input_value='{"name": "Alice",}', input_type=str]
```
## Why This Happens

When streaming LLM responses with Instructor or Pydantic, you hit three common issues:

- **Incomplete chunks** - SSE streams send partial JSON: `{"name": "Al`
- **Syntax errors** - LLMs add trailing commas, markdown fences, and unquoted keys
- **Premature validation** - Pydantic tries to parse before the stream completes
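A minimal reproduction of the premature-validation case (assuming Pydantic v2): feeding a partial chunk to `model_validate_json` raises before the stream is done.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

chunk = '{"name": "Al'  # a partial SSE delta, not yet valid JSON

try:
    User.model_validate_json(chunk)
except ValidationError as exc:
    # Pydantic v2 reports the broken JSON as a validation error
    print(exc.error_count(), "validation error")
```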
## Solution 1: Use Instructor's Partial Mode (Client-Side)

**Best for:** when you control the client and want progressive updates.
```python
import instructor
import openai
from instructor import Partial
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

client = instructor.from_openai(openai.OpenAI())

# Stream partial objects - no validation errors!
for partial_user in client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Partial[User],  # Key: Partial wrapper
    messages=[{"role": "user", "content": "Extract: Alice, 30, alice@example.com"}],
    stream=True,
):
    print(partial_user)
    # User(name='Alice', age=None, email=None)
    # User(name='Alice', age=30, email=None)
    # User(name='Alice', age=30, email='alice@example.com') ✓ Complete
```
**Pros:** native Instructor support; fields populate progressively
**Cons:** doesn't fix syntax errors (trailing commas, markdown fences)
## Solution 2: DIY Regex Repair (Free, Simple)

**Best for:** a single provider (OpenAI/Anthropic), known error patterns, and full control.
```python
import re
import json
from pydantic import BaseModel, ValidationError

def repair_json(text: str) -> str:
    # Strip markdown fences (95% of errors)
    text = re.sub(r'^```(?:json)?\n?', '', text)
    text = re.sub(r'\n?```$', '', text)
    # Remove trailing commas
    text = re.sub(r',\s*}', '}', text)
    text = re.sub(r',\s*]', ']', text)
    return text.strip()

class User(BaseModel):
    name: str
    age: int

# In your streaming loop:
accumulated = ""
for chunk in stream:
    accumulated += chunk.choices[0].delta.content or ""
    try:
        repaired = repair_json(accumulated)
        user = User.model_validate_json(repaired)
    except (json.JSONDecodeError, ValidationError):
        # Still fails? Add more patterns above
        pass
```
**Pros:** free, no dependencies, full control
**Cons:** maintenance burden; you must handle each error type; breaks on edge cases
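A quick sanity check of what `repair_json` does and doesn't fix (the helper is repeated here so the snippet runs standalone):

```python
import re

def repair_json(text: str) -> str:
    # Same repair passes as above: strip fences, then trailing commas
    text = re.sub(r'^```(?:json)?\n?', '', text)
    text = re.sub(r'\n?```$', '', text)
    text = re.sub(r',\s*}', '}', text)
    text = re.sub(r',\s*]', ']', text)
    return text.strip()

fenced = '```json\n{"name": "Alice",}\n```'
print(repair_json(fenced))
# {"name": "Alice"}

# Note: it does NOT complete truncated JSON - that's what Partial mode is for
print(repair_json('{"name": "Al'))
# {"name": "Al
```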
## Solution 3: Use a JSON Repair Library/Proxy

**Best for:** multi-provider setups (OpenRouter), when you don't want to maintain regexes and need proven reliability.

Options:

- **json-repair** library (4.5k stars) - Python/JS/Go, local processing, battle-tested
- **StreamFix** - hosted proxy (disclosure: this site), 98.4% benchmark success, 1000 free credits
- **Custom Lambda/Edge function** - roll your own with the json-repair library
```python
# Example with StreamFix (similar for any proxy)
import instructor
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(
        base_url="https://streamfix.up.railway.app/v1",
        api_key="YOUR_KEY",
    )
)

user = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)
```
**Pros:** no maintenance, handles edge cases, tested across models
**Cons:** extra latency (~50ms), adds a dependency, costs money at scale

**When NOT to use:** a single provider with good reliability (just use Partial mode), or latency-critical apps.
## Which Solution to Pick?

For production, combine a repair proxy with Instructor's Partial mode:

```python
import instructor
from instructor import Partial
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(
        base_url="https://streamfix.up.railway.app/v1",
        api_key="YOUR_STREAMFIX_KEY",
    )
)

# Get both: syntax repair AND progressive updates
for partial_user in client.chat.completions.create(
    model="openai/gpt-4o-mini",
    response_model=Partial[User],  # Progressive
    messages=[{"role": "user", "content": "..."}],
    stream=True,
):
    print(partial_user)  # Safe updates even with malformed JSON
```
## Common Error Patterns & Fixes

**Error: trailing comma**

```json
{"name": "Alice", "age": 30,}
```

Fix: StreamFix removes it automatically. Instructor alone can't fix this.
**Error: markdown fence**

````
```json
{"name": "Alice"}
```
````

Fix: markdown fences accounted for 95.5% of failures in our benchmark. StreamFix strips them.
**Error: unquoted keys**

```json
{name: "Alice", age: 30}
```

Fix: StreamFix quotes the keys. This happens with older models.
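If you're on the DIY route (Solution 2) instead, a naive key-quoting pass might look like this. `quote_keys` is a hypothetical helper, not part of StreamFix or `repair_json`, and it can mangle string values that contain `{x:`-style patterns, so treat it as a sketch:

```python
import re

def quote_keys(text: str) -> str:
    # Naively wrap bare identifiers used as object keys in double quotes.
    # Caution: this also fires inside string values containing "{key:" patterns.
    return re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)

print(quote_keys('{name: "Alice", age: 30}'))
# {"name": "Alice", "age": 30}
```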
**Error: incomplete stream**

```json
{"name": "Alice", "age":
```

Fix: use Instructor's Partial mode OR wait for the complete message.
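The wait-for-complete option can be sketched with simulated deltas (the `deltas` list stands in for real SSE chunks):

```python
import json

# Simulated SSE deltas standing in for a real stream
deltas = ['{"name": ', '"Alice", ', '"age": 30', '}']

buffer, result = "", None
for delta in deltas:
    buffer += delta
    try:
        result = json.loads(buffer)  # succeeds only once the JSON is complete
    except json.JSONDecodeError:
        continue  # incomplete so far - keep accumulating

print(result)
# {'name': 'Alice', 'age': 30}
```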
## Performance Comparison
| Approach | Success Rate | Handles Syntax Errors | Progressive Updates |
|---|---|---|---|
| Raw (no solution) | 33.3% | ❌ | ❌ |
| Instructor Partial | ~60% | ❌ | ✅ |
| StreamFix | 98.4% | ✅ | ✅ (with streaming) |
| StreamFix + Instructor Partial | 98.4% | ✅ | ✅ |
## Quick Start

```bash
# 1. Get your API key (1000 free credits)
curl -X POST "https://streamfix.up.railway.app/account/create?email=you@example.com"
```

```python
# 2. Update your code (one line change)
client = OpenAI(
    base_url="https://streamfix.up.railway.app/v1",
    api_key="YOUR_KEY",
)
# 3. Your Pydantic validation errors are now fixed ✓
```
## 💡 Pro Tips

- Use `Partial[Model]` for UI updates during streaming
- Use StreamFix to fix syntax errors from any provider (OpenRouter, local models)
- Combine both for production-grade reliability (98.4%)
- For tool calls: already 99%+ reliable, so focus on content JSON