rand[om]

med ∩ ml

Two-step structured outputs for LLMs

I’m experimenting with a two-step pipeline for structured outputs. Instead of asking for JSON in a single prompt, I capture the raw free-text response first, then format it in a second call.
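As a sketch, assuming a hypothetical `call_llm(model, prompt)` wrapper (stubbed here with canned responses so the example runs offline), the two steps look like:

```python
import json

def call_llm(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client; returns canned
    # text so the sketch runs without network access.
    if "Return only JSON" in prompt:
        return '{"drug": "aspirin", "dose_mg": 81}'
    return "The patient takes aspirin 81 mg daily for cardioprotection."

# Step 1: free-text response, no format constraints.
raw_text = call_llm("big-model", "Summarize the medication in this note: ...")

# Step 2: format the captured text into JSON in a second call.
formatted = call_llm(
    "small-model",
    f"Return only JSON with keys drug and dose_mg.\n\nText:\n{raw_text}",
)
record = json.loads(formatted)
```

The key property is that `raw_text` exists as a plain string before any JSON is involved.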

Why bother?

Schema validation errors don’t invalidate the LLM response. If the JSON is malformed, I only retry the formatting step. The original reasoning is still there.

Most errors I see in data extraction pipelines come from the model messing up JSON tokens, not from the answer itself being wrong. A missing bracket shouldn’t waste a complex reasoning chain.
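A minimal retry loop makes the point concrete. The stubs below are hypothetical; the formatter deliberately fails on its first attempt (a missing brace), and only the cheap formatting call is repeated:

```python
import json

attempts = {"reasoning": 0, "formatting": 0}

def reasoning_call(prompt: str) -> str:
    # Hypothetical expensive reasoning call (stubbed).
    attempts["reasoning"] += 1
    return "Answer: the invoice total is 42.50 EUR."

def formatting_call(text: str) -> str:
    # Hypothetical cheap formatting call; simulates a malformed
    # first attempt (missing closing brace).
    attempts["formatting"] += 1
    if attempts["formatting"] == 1:
        return '{"total": 42.50, "currency": "EUR"'  # malformed
    return '{"total": 42.50, "currency": "EUR"}'

raw = reasoning_call("Extract the invoice total: ...")

record = None
for _ in range(3):  # retry only the formatting step
    try:
        record = json.loads(formatting_call(raw))
        break
    except json.JSONDecodeError:
        continue
```

After the loop, the reasoning call has run exactly once while the formatter ran twice; the expensive chain was never regenerated.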

If you want a different schema later, you don’t need to rerun the full pipeline. Just re-extract from the same free-text response. This only works if the original response contains enough detail, so prompt for verbose output if you expect to re-extract later.
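Since the stored response is just a string, a schema change only costs a new formatting call over the cached text. A sketch, with a hypothetical cache and a canned formatter so it runs offline:

```python
import json

# Free-text responses stored from earlier runs (hypothetical cache).
cache = {
    "doc-1": "Ibuprofen 400 mg, three times daily, for 5 days.",
}

def format_call(text: str, schema_hint: str) -> str:
    # Hypothetical cheap formatting call; the canned response is
    # keyed on the requested schema so the sketch runs offline.
    if "frequency" in schema_hint:
        return '{"drug": "ibuprofen", "frequency_per_day": 3}'
    return '{"drug": "ibuprofen", "dose_mg": 400}'

# The original extraction asked for dose; the new schema wants
# frequency. Neither re-runs the expensive reasoning step.
v1 = json.loads(format_call(cache["doc-1"], "keys: drug, dose_mg"))
v2 = json.loads(format_call(cache["doc-1"], "keys: drug, frequency_per_day"))
```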

Model tiering

This also enables model tiering. Use a reasoning model for the initial logic, then pass that text to something fast and cheap like Gemini Flash for the JSON conversion.

Big-brain model for doing the work, small cheap model to convert into structured format.

You can also keep the original “thinking” traces/summary as free text. The final output is strict and cheap to produce.
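One way to wire the tiering up is to make the model a per-step parameter. The model names below are placeholders and the dispatch is stubbed so the sketch runs offline:

```python
# Hypothetical tiered pipeline config; model names are placeholders.
STEPS = {
    "reason": "big-reasoning-model",  # expensive, does the thinking
    "format": "small-fast-model",     # cheap, emits strict JSON
}

def run_step(step: str, prompt: str) -> str:
    # Stubbed dispatch; a real version would call the provider's
    # API with STEPS[step] as the model name.
    model = STEPS[step]
    if step == "reason":
        return f"[{model}] The claim is supported; confidence high."
    return '{"supported": true, "confidence": "high"}'

trace = run_step("reason", "Assess the claim: ...")  # kept as free text
final = run_step("format", trace)                    # strict, cheap output
```

The `trace` string can be logged or stored as-is, while `final` is the only artifact that ever needs to satisfy a schema.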

The downsides

It adds complexity to the pipeline. Two calls instead of one, plus an intermediate format to manage. Whether that’s worth it depends on your setup.

There are also multiple ways to generate structured outputs. A JSON-schema constraint? JSON mode with the schema pasted into the prompt? Both? If you’re using a high-level library, it may not always get this right. The two-step approach sidesteps that problem entirely: most LLM providers have a straightforward way to do free-text generation, so it’s harder to mess up.

Further reading