Reliable JSON From LLMs: Structured Outputs Compared 2026
In short
As of June 2026, all three major LLM providers ship true constrained decoding: OpenAI through strict json_schema, Anthropic through output_config and strict tool use (public beta since November 2025), and Google through Gemini's improved responseSchema. The practical gap is large: plain JSON mode fails to match a non-trivial schema 8-15% of the time, while strict structured outputs hold roughly 99.9% compliance. Every AI auto-reply in my production reputation SaaS now flows through a strict schema, and parse failures are effectively zero.

On this page
- What are structured outputs and how do they differ from JSON mode?
- How does each provider implement structured outputs in 2026?
- How do I make one Zod schema the single source of truth in TypeScript?
- When does constrained decoding hurt?
- What failure rates did I measure on the same 50-field schema?
- Key takeaways
What are structured outputs and how do they differ from JSON mode?
Structured outputs compile your JSON Schema into a grammar that constrains token sampling at inference, so the model cannot emit a token that would violate the schema. JSON mode only promises syntactically valid JSON. In my testing, that gap is the difference between 8-15% failures and fewer than one failure per thousand requests.
The one-sentence version I keep in my notes: constrained decoding means the provider compiles your schema into a formal grammar and masks every candidate token that would break it before sampling. Invalid output is not filtered after generation; it is impossible during generation.
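To make the masking step concrete, here is a toy sketch of the idea, not any provider's implementation. It assumes the "grammar" is only a list of allowed enum strings and that the model proposes a handful of scored candidate tokens; `Candidate`, `isLegal`, `pickToken`, and the tone values are hypothetical names for illustration.

```typescript
// Toy constrained decoder: the "grammar" is just a list of allowed strings.
// Real providers compile a full JSON Schema; this only shows the masking idea.
type Candidate = { token: string; logit: number };

// A token is legal if output-so-far + token is still a prefix of an allowed string.
function isLegal(soFar: string, token: string, allowed: string[]): boolean {
  const next = soFar + token;
  return token.length > 0 && allowed.some((s) => s.startsWith(next));
}

// Mask illegal candidates first, then pick the best-scoring survivor (greedy).
function pickToken(
  soFar: string,
  candidates: Candidate[],
  allowed: string[],
): string | null {
  const legal = candidates.filter((c) => isLegal(soFar, c.token, allowed));
  if (legal.length === 0) return null;
  return legal.reduce((best, c) => (c.logit > best.logit ? c : best)).token;
}

// Hypothetical enum values for a tone field.
const allowedTones = ['"friendly"', '"formal"', '"apologetic"'];

// The model "prefers" a synonym, but the mask removes it before sampling.
const firstStep: Candidate[] = [
  { token: '"warm', logit: 5.0 },
  { token: '"fri', logit: 3.2 },
  { token: '"for', logit: 2.9 },
];
const firstToken = pickToken("", firstStep, allowedTones);
```

The highest-scoring candidate is a synonym that is not in the enum, so it never reaches the sampler. That is the same mechanism that stops "enums replaced by synonyms" in strict mode.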
JSON mode, the older feature, guarantees that the result parses as JSON and nothing more. The failures I used to see weekly: missing required fields, a string where I asked for a number, hallucinated keys, enums replaced by synonyms, and the classic markdown fence wrapped around otherwise valid JSON. Across my own pipelines, JSON mode without schema enforcement failed between 8% and 15% of the time depending on schema complexity, while OpenAI's strict structured outputs hold about 99.9% schema compliance, a failure rate under 0.1%. That one stat settles the architecture question. As of June 2026, the provider docs themselves treat plain JSON mode as legacy.
How does each provider implement structured outputs in 2026?
OpenAI uses response_format with a json_schema and strict: true. Anthropic shipped native Structured Outputs in public beta in November 2025: output_config.format for response shape and strict: true for tool inputs, gated by the anthropic-beta: structured-outputs-2025-11-13 header on Sonnet 4.5, Opus 4.1, and newer. Gemini uses responseSchema, with noticeably better complex-type handling since its 2026 update.
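As a rough side-by-side, the three request bodies might look like the sketch below. It builds plain objects only and sends nothing. The Anthropic parameter and beta header are taken from this article; the OpenAI and Gemini field names reflect my understanding of their REST APIs, so check them against current docs. The model IDs, the `reply_meta` name, and the schema fields are placeholders.

```typescript
// Shared JSON Schema for the response shape (field names are illustrative).
const replySchema = {
  type: "object",
  properties: {
    tone: { type: "string", enum: ["friendly", "formal", "apologetic"] },
    escalate: { type: "boolean" },
  },
  required: ["tone", "escalate"],
  additionalProperties: false,
};

const userPrompt = "Classify this review: Great service, slow delivery.";

// OpenAI Chat Completions: strict sits inside the json_schema wrapper.
const openaiBody = {
  model: "YOUR_OPENAI_MODEL", // placeholder
  messages: [{ role: "user", content: userPrompt }],
  response_format: {
    type: "json_schema",
    json_schema: { name: "reply_meta", strict: true, schema: replySchema },
  },
};

// Anthropic Messages API: parameter and beta header as named in this article.
const anthropicHeaders = { "anthropic-beta": "structured-outputs-2025-11-13" };
const anthropicBody = {
  model: "YOUR_CLAUDE_MODEL", // placeholder
  max_tokens: 1024,
  messages: [{ role: "user", content: userPrompt }],
  output_config: { format: { type: "json_schema", schema: replySchema } },
};

// Gemini generateContent: OpenAPI-style schema, no additionalProperties.
const geminiBody = {
  contents: [{ role: "user", parts: [{ text: userPrompt }] }],
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: "OBJECT",
      properties: {
        tone: { type: "STRING", enum: ["friendly", "formal", "apologetic"] },
        escalate: { type: "BOOLEAN" },
      },
      required: ["tone", "escalate"],
    },
  },
};
```

The schema itself is nearly identical across providers; what differs is the wrapper around it and the dialect rules.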

| Dimension | OpenAI | Anthropic (Claude) | Google (Gemini) |
| --- | --- | --- | --- |
| Request parameter | response_format: { type: "json_schema", strict: true } | output_config.format (type: "json_schema") plus strict: true on tools | generationConfig.responseSchema with JSON MIME type |
| Opt-in status | GA | Public beta, header anthropic-beta: structured-outputs-2025-11-13 | GA, improved handling rolled out 2026 |
| additionalProperties | Must be false on every object | Must be false on every object | Not part of the dialect |
| Optional fields | Every key in required; optional means a nullable union | Omit the key from required | nullable: true, OpenAPI style |
| Unions | anyOf supported, oneOf rejected | anyOf supported | anyOf reliable since 2026, shaky beyond ~3 levels deep |
| Recursion | Supported via $defs and $ref | Not supported | Depth-limited |
| Numeric and length bounds | Not enforced at decode time | Not enforced; SDK strips them | Partially enforced |
| Enums | Supported | Supported | Supported |
| Streaming | Yes | Yes | Yes |
| Typical retry need | Almost never (<0.1%) | Almost never; watch max_tokens truncation | Rare; deep unions occasionally |
| Model coverage | GPT-4o and newer | Sonnet 4.5, Opus 4.1, and newer | Gemini 2.x |

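The OpenAI column's rules for optional fields and additionalProperties are mechanical enough to automate. The following is a minimal sketch of such a transform, assuming a small subset of JSON Schema (objects, arrays, primitives). `JsonSchema` and `toStrict` are hypothetical names, not part of any SDK, and the function does not descend into existing anyOf branches.

```typescript
// Minimal subset of JSON Schema that this sketch understands.
type JsonSchema = {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
  anyOf?: JsonSchema[];
  enum?: unknown[];
  additionalProperties?: boolean;
};

// Rewrite a "loose" schema into the strict shape described in the table:
// every key listed in required, optional keys turned into a nullable union,
// and additionalProperties: false on every object.
function toStrict(node: JsonSchema): JsonSchema {
  if (node.type === "array" && node.items) {
    return { ...node, items: toStrict(node.items) };
  }
  if (node.type !== "object" || !node.properties) return node;

  const wasRequired = new Set(node.required ?? []);
  const properties: Record<string, JsonSchema> = {};
  for (const [key, child] of Object.entries(node.properties)) {
    const strictChild = toStrict(child);
    properties[key] = wasRequired.has(key)
      ? strictChild
      : { anyOf: [strictChild, { type: "null" }] }; // optional becomes nullable
  }
  return {
    ...node,
    properties,
    required: Object.keys(properties),
    additionalProperties: false,
  };
}

// Example: orderId is optional in the loose schema.
const looseSchema: JsonSchema = {
  type: "object",
  properties: {
    reply: { type: "string" },
    orderId: { type: "string" },
  },
  required: ["reply"],
};
const strictSchema = toStrict(looseSchema);
```

After the transform, `orderId` is required on the wire but may be null, which is how the OpenAI column expresses "optional".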

What failure rates did I measure on the same 50-field schema?

| Provider and mode | Requests | Parse or schema failures | Failure rate |
| --- | --- | --- | --- |
| OpenAI, strict json_schema | 500 | 0 | 0.0% |
| Claude, output_config + beta header | 500 | 1 | 0.2% |
| Gemini, responseSchema | 500 | 3 | 0.6% |
| JSON mode, no schema (baseline) | 500 | 58 | 11.6% |

Claude's single failure was a truncation mid-array; the grammar held right up to the cutoff, so I count it against my config, not the feature. Gemini's three failures were union coercions on the deepest nested object. The baseline lands inside the published 8-15% band, which matches what I see across client projects.
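A truncation like that one is cheap to detect before parsing. Here is a minimal sketch, assuming you pass in the provider's stop indicator as a string. The values checked are the ones I understand the three APIs to use for a token-limit cutoff; `parseIfComplete` and `ParseResult` are hypothetical names.

```typescript
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: "truncated" | "invalid_json" };

// Stop indicators that mean "ran out of tokens":
// Anthropic stop_reason "max_tokens", OpenAI finish_reason "length",
// Gemini finishReason "MAX_TOKENS".
const TRUNCATION_REASONS = ["max_tokens", "length", "MAX_TOKENS"];

function parseIfComplete(text: string, stopReason: string): ParseResult {
  // A grammar-constrained response that was cut off is a valid JSON prefix,
  // not valid JSON, so report truncation instead of a generic parse error.
  if (TRUNCATION_REASONS.includes(stopReason)) {
    return { ok: false, reason: "truncated" };
  }
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
}

const cutOff = parseIfComplete('{"items":[1,2', "max_tokens");
const complete = parseIfComplete('{"items":[1,2]}', "end_turn");
```

Separating "truncated" from "invalid_json" keeps a max_tokens misconfiguration from showing up as a schema failure in your metrics.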
In my reputation SaaS, every AI auto-reply passes through a strict schema: reply text capped at four lines, a tone enum, an escalation boolean. After moving off JSON mode plus regex repair, parse failures went to effectively zero across roughly 6,000 generated replies a month, and the two-retry repair loop has fired twice in the last quarter. What changed most was monitoring.
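Because length bounds are not reliably enforced at decode time, a cap like "four lines" still has to be checked after parsing. The sketch below is a dependency-free stand-in for what a Zod schema would do in a real project. The field names (`reply`, `tone`, `escalate`) and the tone values are assumptions for illustration, not the production schema.

```typescript
// Hypothetical enum values; the article only says there is a tone enum.
const TONES = ["friendly", "formal", "apologetic"];

interface AutoReply {
  reply: string;
  tone: string;
  escalate: boolean;
}

type ReplyValidation =
  | { ok: true; value: AutoReply }
  | { ok: false; errors: string[] };

// Validate the parsed model output, including the line cap that the
// decoding grammar cannot be relied on to enforce.
function validateAutoReply(data: unknown): ReplyValidation {
  if (typeof data !== "object" || data === null) {
    return { ok: false, errors: ["response is not an object"] };
  }
  const d = data as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof d.reply !== "string") {
    errors.push("reply must be a string");
  } else if (d.reply.split("\n").length > 4) {
    errors.push("reply exceeds four lines");
  }
  if (typeof d.tone !== "string" || !TONES.includes(d.tone)) {
    errors.push(`tone must be one of: ${TONES.join(", ")}`);
  }
  if (typeof d.escalate !== "boolean") {
    errors.push("escalate must be a boolean");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      reply: d.reply as string,
      tone: d.tone as string,
      escalate: d.escalate as boolean,
    },
  };
}
```

The error strings are written so they can be fed back to the model on a repair attempt.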
Once parse failures hit zero, every failure that remains is semantic. The model returns a perfectly valid object with a wrong sentiment label or an invented order ID, and no schema will catch that. Strict schemas do not remove the need for monitoring; they relocate it.
That relocation is why I trace every structured call end to end, using the setup from my guide to AI agent observability in Node.js with OpenTelemetry. The alerts now watch field-level distributions, not parse errors.
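One simple way to watch a field-level distribution is to compare the share of each enum value in a recent window against a baseline. This is a minimal sketch under assumed numbers: the baseline shares, the 0.15 tolerance, and the names `labelShares` and `driftedLabels` are all hypothetical, and a real setup would emit these as trace attributes or metrics.

```typescript
// Fraction of the window taken by each label.
function labelShares(labels: string[]): Record<string, number> {
  const shares: Record<string, number> = {};
  for (const label of labels) shares[label] = (shares[label] ?? 0) + 1;
  for (const label of Object.keys(shares)) {
    shares[label] = (shares[label] ?? 0) / labels.length;
  }
  return shares;
}

// Labels whose recent share differs from the baseline by more than tolerance.
function driftedLabels(
  baseline: Record<string, number>,
  recent: string[],
  tolerance = 0.15,
): string[] {
  if (recent.length === 0) return [];
  const observed = labelShares(recent);
  const allLabels = Array.from(
    new Set([...Object.keys(baseline), ...Object.keys(observed)]),
  );
  return allLabels.filter(
    (label) =>
      Math.abs((observed[label] ?? 0) - (baseline[label] ?? 0)) > tolerance,
  );
}

// Assumed baseline for a sentiment field, and a window that skews negative.
const sentimentBaseline = { positive: 0.6, negative: 0.3, neutral: 0.1 };
const recentSentiments = [
  "positive", "positive",
  "negative", "negative", "negative", "negative", "negative", "negative", "negative",
  "neutral",
];
const drifted = driftedLabels(sentimentBaseline, recentSentiments);
```

Every object in that window would pass schema validation. The drift check is what notices that the labels themselves have shifted.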
My verdict: in 2026 you should never parse free-text JSON from an LLM in production. One Zod schema should drive the request, the validation, and the retry. This schema-first pattern is the backbone of the AI systems I build for clients, from RAG MVPs to agent backends.
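The retry half of that pattern can be sketched as a bounded repair loop. The version below is a minimal, provider-agnostic sketch. `callModel` and `validate` are injected, so `validate` could wrap a Zod `safeParse` in a real project. The function name, the feedback wording, and the default of two attempts are assumptions.

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Call the model, validate, and on failure retry with the errors as feedback.
// Gives up after maxAttempts so a bad prompt cannot loop forever.
async function generateWithRepair<T>(
  callModel: (feedback: string | null) => Promise<string>,
  validate: (data: unknown) => Validation<T>,
  maxAttempts = 2,
): Promise<T> {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(feedback);

    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      feedback = "The previous response was not valid JSON.";
      continue;
    }

    const result = validate(parsed);
    if (result.ok) return result.value;
    feedback = `Fix these validation errors: ${result.errors.join("; ")}`;
  }
  throw new Error(`No valid output after ${maxAttempts} attempts. ${feedback}`);
}
```

With strict structured outputs the loop should almost never get past the first attempt. Its job is to catch truncation and post-parse rules such as length caps.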
Key takeaways
- Plain JSON mode fails 8-15% of the time on non-trivial schemas; strict structured outputs hold roughly 99.9% compliance. Treat JSON mode as legacy.
- All three majors now do true constrained decoding: OpenAI `response_format` with `strict: true`, Claude `output_config` behind the `structured-outputs-2025-11-13` beta header, Gemini `responseSchema`.
- One Zod schema should be the single source of truth: it generates the wire schema, validates the response, and drives a max-two-attempt repair loop.
- Constrained decoding can degrade reasoning-heavy extraction. Reason in prose first, then extract with a second cheap call.
- With syntax solved, monitoring moves up a level: track semantic correctness in traces, not parse success.
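The "reason first, then extract" takeaway can be sketched as two chained calls. This is a minimal sketch with both model calls injected as functions. `reasonThenExtract`, `ModelCall`, and the prompt wording are hypothetical; in practice the second call would attach the strict schema and use a cheaper model.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

async function reasonThenExtract(
  question: string,
  reasonFreely: ModelCall, // unconstrained call, prose output
  extractStrict: ModelCall, // cheaper call with a strict schema attached
): Promise<unknown> {
  // Pass 1: let the model think in prose with no grammar constraint.
  const reasoning = await reasonFreely(
    `Think through this step by step, in prose:\n${question}`,
  );
  // Pass 2: extraction only, so the constraint cannot hurt the reasoning.
  const json = await extractStrict(
    `Extract the final answer as JSON from this analysis:\n${reasoning}`,
  );
  return JSON.parse(json);
}
```

The second call costs extra latency and tokens, so this is worth it only where the constrained single call measurably degrades answers.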
FAQ
Is JSON mode the same as structured outputs?
No. JSON mode only guarantees output that parses as valid JSON; it does not enforce your schema, so fields go missing and types drift, failing 8-15% of the time on complex shapes. Structured outputs compile the schema into a decoding grammar, which pushes compliance to roughly 99.9%. In 2026, JSON mode is legacy.
Does Claude support structured outputs natively in 2026?
Yes. Anthropic shipped Structured Outputs in public beta in November 2025. You send `output_config.format` with a JSON schema, or set `strict: true` on tool definitions, plus the `anthropic-beta: structured-outputs-2025-11-13` header. It works on Sonnet 4.5, Opus 4.1, and newer models, and the TypeScript SDK ships a `zodOutputFormat` helper.
Do structured outputs work with streaming responses?
Yes, on all three providers as of June 2026. The grammar constrains each token as it streams, so partial output is always a prefix of valid JSON. You still validate the complete document with `safeParse` at the end; for incremental UI rendering, run a partial-JSON parser over the accumulating buffer.
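For the incremental-rendering case, a partial-JSON parser can be as simple as closing whatever is still open and trying to parse. The sketch below is a deliberately small version, not a full streaming parser. `parsePartialJson` is a hypothetical name, and it returns `undefined` when the prefix ends somewhere it cannot repair, for example right after a colon.

```typescript
// Try to parse a prefix of a JSON document by closing open strings,
// arrays, and objects. Returns undefined if the prefix is not yet parseable.
function parsePartialJson(prefix: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let candidate = prefix;
  if (inString) {
    // Drop a dangling backslash, then close the string.
    if (escaped) candidate = candidate.slice(0, -1);
    candidate += '"';
  } else {
    // A trailing comma would make the closed document invalid.
    candidate = candidate.replace(/,\s*$/, "");
  }
  candidate += closers.slice().reverse().join("");

  try {
    return JSON.parse(candidate);
  } catch {
    return undefined; // keep showing the last good snapshot
  }
}

const midString = parsePartialJson('{"reply":"Thanks for');
const midArray = parsePartialJson('{"tags":["a","b",');
const midKey = parsePartialJson('{"reply":');
```

A caller would keep the last defined snapshot for rendering and still run full validation on the final buffer.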
Malik Hamza Shabbir · Full-Stack & AI Engineer
I build full-stack and AI products solo: a reputation SaaS in production, RAG pipelines, and React Native apps. I write from what I ship, not from documentation summaries.
