# Structured Outputs: Schema-Guided Prompting for Reliable JSON (2026)

> Source: https://sukruyusufkaya.com/en/blog/yapilandirilmis-ciktilar-structured-outputs-json-schema-prompting-2026
> Updated: 2026-07-01T15:53:34.350Z
> Type: blog
> Category: yapay-zeka
**TLDR:** Getting reliable JSON from an LLM in production takes more than a prompt. How to guarantee schema conformance with JSON mode, structured outputs, constrained decoding, and function calling.

**TL;DR —** In production systems you don't want a "nice paragraph" from an LLM — you want valid JSON you can hand straight to your code. Most of the expensive failures I've seen in the field come from telling a model "return JSON" and then getting markdown fences, extra explanatory sentences, hallucinated fields, or broken commas mixed into the result. The fix isn't trust, it's guarantees. There are four approaches, weakest to strongest: (1) prompt-only "return JSON" (fragile), (2) JSON mode (valid syntax but not your schema), (3) Structured Outputs / schema-constrained decoding (guarantees conformance to a JSON Schema on the provider side), (4) function/tool calling (the model emits arguments matching a declared schema). The right recipe: flat and typed schemas, categories constrained by enums, an explicit required/optional split, in-schema field descriptions, a validate-and-feed-back loop, and a "think first, then emit clean JSON" separation. When extracting from Turkish documents (invoices, contracts), KVKK and Turkish enum labels demand extra care. I walk through all of it with field examples.

## First, an Honest Confession: Prose Is Lovely, but Code Can't Digest It

The most common trap I see teams fall into when they first put an LLM into production is this: in the demo phase, the model works beautifully. You ask something in a chat window, a lovely answer comes back, everyone's happy. Then you drop that model into the middle of a pipeline — its output will be read by another service, written to a database, sent to an API. That's the moment everything changes.

Because for a human, an answer that begins "Here are the three items you asked for: first..." is perfectly clear. But for a piece of code trying to parse that answer with `JSON.parse()`, it's a disaster. Code doesn't read prose. Code expects an exact structure: which fields exist, what their types are, which are required. One extra word, one missing quote, one markdown fence (` ```json `) and the program crashes.

After building dozens of systems in the field I can tell you very plainly: **the weakest link in production LLM integrations is often not the model's intelligence, but the shape of its output.** The model may know the right answer; but if it can't pour that answer into a mold your code can digest, that answer is worthless.

Let me explain with an analogy. Say you have a very talented expert who gives you perfect answers — but brings them each time on a different sheet of paper, sometimes typed, sometimes handwritten, sometimes with notes scribbled in the margins. You want to feed those answers into an automatic scanner. No matter how smart the expert is, if the format isn't standard, the scanner chokes. The whole structured-outputs question is exactly this: standardize the shape of the answer, not the expert.

## Let Me Explain From the Field Why "Return JSON" Isn't Enough

The most common starting point is this: you add "give your answer as JSON" to the end of your prompt. This can work. The problem is that when it doesn't work, it fails **silently**, and usually at the worst moment — in production, on real traffic.

The classic failure patterns I see in the field:

- **Markdown fences:** The model wraps the answer in a ` ```json ... ``` ` block. Your code expects raw JSON and finds three backticks at the front.
- **Preamble sentence:** It starts with "Sure, here's the data you asked for:". No longer valid JSON.
- **Closing note:** It appends "Hope that helps!" after the JSON.
- **Broken syntax:** A trailing comma after the last element, an apostrophe instead of a quote, an unescaped special character.
- **Hallucinated fields:** It adds fields not in your schema, or arbitrarily renames them (`totalAmount` instead of `total_amount`).
- **Type drift:** Where you expected a number, it puts a string like `"1,250.00 USD"`.

Each of these looks manageable on its own. "We'll clean it up with regex, strip the fences," you say. It holds for a while. But as scale grows, this cleanup layer becomes a knot of technical debt. Every new model version, every prompt change, breaks the edge cases again. I call this "JSON-scrubbing hell," and I've watched many teams burn weeks in it.

The core point is this: asking for JSON through a prompt is a **request**, not a guarantee. Production systems don't stand on requests. We need guarantees.

## Four Approaches: A Ladder From Weak to Strong

Think of this problem as a ladder. Each rung closes the gap the previous one left open.

### Rung 1 — Prompt Only ("return JSON")

You politely ask the model for JSON. There's no technical guarantee. The model complies in good faith but doesn't warn you when it doesn't. Acceptable for a prototype, dangerous for production. I never recommend it on its own.

### Rung 2 — JSON Mode

Many providers offer a "JSON mode." This guarantees that the model's output will be **valid JSON syntax**. That is, you get parseable, balanced-brace, valid JSON. This is a big step — it eliminates the markdown-fence and preamble problems.

But be careful: JSON mode does **not** guarantee conformance to **your schema**. Valid JSON comes back, but which fields are in it and what their types are is not up to it. `{"answer": "I don't know"}` is also valid JSON. So the syntax parses, but the semantic contract is still unguaranteed.

> JSON mode means "the sentence will be grammatically correct." It does not mean "the sentence will say what you wanted."

### Rung 3 — Structured Outputs / Schema-Constrained Decoding

Here's the real leap. With Structured Outputs (by that name at some providers, a different name in some APIs) you supply a JSON Schema and the provider guarantees the generated output will **conform to that schema exactly**. Fields are precisely as you asked, types are correct, required fields are always populated.

The mechanism that makes this possible is called **constrained/guided decoding**. Roughly, it works like this: as the model picks the next token, the decoder applies a mask — every token that is **not valid** under the schema/grammar at that moment is pulled to zero probability. So the model physically cannot pick a token that violates the schema. The result: producing invalid output becomes structurally impossible.

This nuance matters: here we're not telling the model "please follow the schema," we're making non-compliance **impossible**. That's the difference between a request and a guarantee.

### Rung 4 — Function / Tool Calling

The top rung is really a special application of rung 3. With function calling (or tool use) you define a "tool" for the model; that tool's arguments are described by a JSON Schema. When the model decides to call a tool, it produces those arguments in conformance with the schema.

This is the backbone of agent architectures. When the model calls a "get weather" tool, it produces clean, typed arguments like `{"city": "Istanbul", "day": "tomorrow"}`. You use the same schema guarantee not just for data extraction but for triggering actions.

| Approach | What It Guarantees | What It Doesn't | When to Use |
|----------|--------------------|-----------------|-------------|
| Prompt only | Nothing | Syntax, schema, type — all at risk | Quick prototype only |
| JSON mode | Valid JSON syntax | Your schema, fields, types | When free-shape but valid JSON suffices |
| Structured Outputs | Exact schema conformance | (Not truth — see hallucination) | Production extraction, classification |
| Function/tool calling | Schema-conformant arguments | That it's the right tool | Agents, action triggering, multi-tool flows |

A warning: the schema guarantee guarantees the **shape** of the output, not its **correctness**. The model can produce a value that conforms perfectly to the schema but is wrong (a hallucination) in content. Structured outputs solve the "format problem," not the "truth problem." Don't conflate the two.

## Behind the Scenes of Constrained Decoding: How the Mask Works

Let's go a little deeper technically, because teams that understand this mechanism design much more accurately.

A language model produces text token by token. At each step it computes a probability distribution over the entire vocabulary and picks a token from it. In constrained decoding, a layer inserts itself: a grammar/automaton determines which tokens are valid given the current JSON Schema state. Tokens that aren't valid have their probability zeroed out (masked); the choice is made among the rest.

Example: the model has written a field name, put a `:`, and per the schema this field must be an integer. At that point the decoder **forbids** the quote character, letters, characters like `{`; it only allows digits and a sign. So the model can never write a string into that field.

In the open-source world there are grammar-based sampling tools for this; you can run local models on your own servers and get the same guarantee. In the KVKK world this is an important option — getting a schema guarantee in your own environment, without sending data out.

A small but critical side effect: over-constraining can lower the model's reasoning quality. If the model is squeezed into a narrow corridor at every token, it sometimes can't find "room to think." That's why the fix is to separate reasoning from output — I'll get to that shortly.

## Field Rules for Schema Design

The success of structured outputs depends largely on the quality of your schema. A badly designed schema chokes even the strongest constrained decoding. Here are the rules I've distilled in the field.

**Keep the schema flat and typed.** Deeply nested structures both lower the model's accuracy and make debugging harder. I've measured many times that extraction quality visibly drops once you go three or four layers deep. Flatten the structure where you can; don't use nested objects unless you truly need to.

**Constrain categories with enums.** If a field can take only certain values (say, document type), don't leave it to free text. Enumerate the possible values with `enum`. That way the model doesn't drift among variations like `"invoice"`, `"Invoice"`, `"fatura"`, `"invoice document"`; it produces exactly the label you defined.

```json
{
  "type": "object",
  "properties": {
    "document_type": {
      "type": "string",
      "enum": ["invoice", "contract", "receipt", "other"],
      "description": "The type of the document. Use 'other' if unsure."
    },
    "total_amount": {
      "type": "number",
      "description": "VAT-included total, number only (no currency symbol or separators)."
    },
    "vat_rate": {
      "type": "integer",
      "enum": [1, 10, 20],
      "description": "VAT rate as a percentage."
    }
  },
  "required": ["document_type", "total_amount"]
}
```

**Make the required/optional split explicit.** Populate the `required` list deliberately. If a field doesn't always have to be filled, don't make it required — otherwise the model is pushed to fabricate (hallucinate) a value just to fill the blank. Explicitly encourage "leave empty if unknown" behavior through the schema and descriptions.

**Use field descriptions as documentation the model reads.** The `description` fields in JSON Schema aren't just notes for the developer; the model reads them and shapes its behavior accordingly. Embed clear instructions inside the schema: "number only, no separators," "use 'other' if unsure," "date in ISO 8601 format." This is the cleanest way to steer behavior without bloating the prompt.

**Be consistent with labels.** Decide at the very start whether your enum values will be in one language or another, and stay consistent. Mixed-language enums (some values in one language, some in another) confuse both the model and the downstream code. If you'll use non-ASCII characters in enum values, make sure they're encoded consistently across the whole system.

## Reliability Tactics: When Guarantees Aren't Enough

The schema guarantee saves the format, but production needs more than that. Here are the tactics I consider non-negotiable in the field.

### Validate and Feed Back

There are cases where you can't use schema-constrained decoding (certain provider/model combinations, some local setups). Then the **validate + retry** pattern kicks in:

1. Take the output, run it through a schema validator (Pydantic, a JSON Schema validator).
2. If it passes, you're done.
3. If it fails, **feed the validation error back to the model**: "This field is missing / this type is wrong, fix it and return only valid JSON."
4. Retry a limited number of times (usually 1–2).

Feeding the validation error back to the model is surprisingly effective. The model often fixes the first-attempt error immediately once it sees the concrete error message. But remember retries have a cost (money and latency); don't allow an infinite loop.

### Separate Reasoning From the Final Answer

This is my favorite tactic. Expecting the model to both think and produce flawless JSON at the same time weakens both. Instead, separate them:

- **One way:** Add a `thinking` (scratchpad) field to the schema. Let the model think freely there first, then write the clean answer into a `result` field. Constrained decoding still guarantees the whole structure, but you give the model room to breathe.
- **The other way (two passes):** In the first call, let the model reason/analyze in free text. In the second call, give it that analysis and say "now pour this into this schema." A separate pass isolates clean JSON generation from reasoning.

In the field I've seen the two-pass approach visibly raise accuracy, especially on complex extractions. The cost is two calls; but the drop in error rate is usually worth it.

### Few-Shot Examples and Low Temperature

Put one or two examples in the prompt that show exactly the schema you want. The model learns from a concrete example far better than from an abstract schema definition. Show the input-output pair; the model catches the pattern.

For extraction tasks, **lower the temperature.** You don't want creativity; you want determinism. Low temperature makes the model extract the same structure from the same document every time and reduces variance.

## Trade-offs: There's No Free Lunch

Structured outputs are powerful but, like every engineering decision, they have costs. Knowing them explicitly makes it easier to decide which rung to pick.

- **Over-constraining can clip reasoning.** If you push the model into a very narrow corridor, its thinking quality may drop. The fix: schema design that leaves room for reasoning (scratchpad, two passes).
- **Retries have a cost.** The validate-and-feed-back loop is safe but adds money and latency. Put a limit on it.
- **Schema drift and versioning.** Your schema changes over time. You add a new field, add a value to an enum. You have to think about compatibility with old data and old client code. Version your schemas and try to keep changes backward-compatible.
- **Provider lock-in.** How structured outputs are implemented varies from provider to provider. If you want portability, consider an abstraction layer.

> Keep one principle always in mind: the schema guarantee gives you a **valid** answer, not a **correct** one. You have to evaluate correctness separately.

## Use Cases: Where Does This Actually Pay Off?

Structured outputs aren't a theoretical nicety; the places they add the most value in daily production are clear:

- **Data extraction:** Pulling structured fields from documents, emails, forms. Invoice line items, contract parties, amounts.
- **Classification and routing:** Splitting an incoming request into categories and routing it to the right team/flow. Enums are the lead actor here.
- **Agent/tool arguments:** The model calling a tool with the right parameters. The foundation of all agent architectures.
- **Form filling:** Automatically filling a structured form from free text.
- **LLM-as-judge / evaluation:** Scoring an answer against set criteria. Structured scores like `{"accuracy": 4, "rationale": "..."}`.
- **ETL from documents:** Extracting structured data from PDF/scanned documents and streaming it into a database.

## The Turkish and KVKK Angle: Warnings From the Field

When working with Turkish documents, a few extra layers come into play, and teams that skip them get hurt.

**Extraction from Turkish documents demands care.** In Turkish invoices the amount format (`1,250.00` becomes `1.250,00` — period as thousands separator, comma as decimal) is the reverse of the English convention. State this clearly in your schema descriptions: "when converting the amount to a number, account for the Turkish decimal format, and use only a decimal point in the output." In contracts, constrain the format of fields like party names, national ID numbers, and tax numbers with an enum or a pattern.

**Turkish enum labels.** If you'll use Turkish labels for categorical fields like document type, department, or status, watch out for case and Turkish-character consistency. The "İ/ı" and "i/I" conversion is a classic trap in Turkish; normalize your enum values and make sure the downstream code matches that normalized form.

**KVKK and personal data.** Here's the most critical point. Structured extraction often produces **personal data** — names, national IDs, addresses, phone numbers. The moment you pour that data into a schema and write it to a database, you fall under KVKK (Turkey's data protection law). The questions are clear: Do you have a legal basis to process this data? In what environment is the data processed — a managed foreign API, or a local model running on your own servers? For fields containing personal data, constrained-decoding solutions that don't send data out of your environment (open source, local) offer a serious advantage. When designing your schema, not extracting personal-data fields you don't truly need is also a strategy — the data-minimization principle.

## Where to Start: Steps You Can Apply This Week

Let's leave theory and get to the field. If you have an LLM integration in hand, follow this order:

1. **Measure your current output.** For one week, log the JSON outputs in production and count how many throw a parse/validation error. See the size of the problem as a number.
2. **Climb the rung.** If you're still at the "I ask for JSON via the prompt" level, switch to your provider's Structured Outputs or function calling feature. This single step zeroes out most parse errors.
3. **Fix the schema.** Flatten your schema, turn categories into enums, write a clear `description` for each field, review the required/optional split.
4. **Add a validation layer.** Even with a schema guarantee, add a validation gate with Pydantic/JSON Schema and, on failure, retry 1–2 times by feeding the error back.
5. **Separate reasoning.** On complex extractions, try a scratchpad field or a two-pass flow; measure the drop in error rate.
6. **Design for KVKK from the start.** Which personal-data fields are in your schema, what's your legal basis, where does processing happen — write these into the design, not the code.

One last observation from the field: most teams that move to structured outputs say the biggest win isn't "fewer bugs" but "less fear." Once you guarantee the shape of the output, most of the midnight production alarms go quiet. And your engineering energy shifts from scrubbing JSON to real work.