Comparing Python Libraries for Structured LLM Extraction

Most enterprise uses of LLMs require structured output. You need JSON objects that fit into databases, validate against schemas, and integrate with existing systems. You can’t just take free-form text and hope it works.

The problem is that LLMs generate text. Ask for JSON and you might get it wrapped in markdown code blocks, with extra commentary, missing fields, or wrong types. You get "founded": "2015" as a string when you need an integer. You get "employee_count": "approximately 450" when you need a number.
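To see the failure concretely, here is a minimal illustration with plain Pydantic (the Company schema here is mine): the string "2015" coerces to an integer, but "approximately 450" cannot.

from pydantic import BaseModel, ValidationError

class Company(BaseModel):
    founded: int
    employee_count: int

# Pydantic's lax mode coerces "2015" -> 2015, but "approximately 450"
# is not a valid integer, so validation raises.
try:
    Company.model_validate({"founded": "2015", "employee_count": "approximately 450"})
except ValidationError as exc:
    print(exc)  # employee_count: Input should be a valid integer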

Three Python libraries solve this problem, each with a different approach: instructor, outlines, and pydantic-ai.

instructor: The Battle-Tested Choice

instructor wraps your existing OpenAI client and uses function calling under the hood:

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class Company(BaseModel):
    name: str
    founded_year: int = Field(ge=1800, le=2025)
    employee_count: int = Field(ge=1)

# Patch the client so create() accepts a response_model
client = instructor.from_openai(OpenAI())

company = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Company,
    messages=[{
        "role": "user",
        "content": "TechVision Analytics, founded 2015, 450 employees"
    }]
)

No JSON parsing. No error handling. Just the data you asked for.

The key feature: an automatic retry loop. If Pydantic validation fails, instructor sends the error message back to the LLM and asks it to try again, up to a configurable number of attempts (the max_retries parameter). This is helpful with weaker or cheaper models that might not get the structure right on the first try.
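The retry budget is set per call; a minimal sketch reusing the client and Company model from above:

# Raise the retry ceiling for a cheaper model that needs more attempts;
# each retry feeds the previous validation error back into the prompt.
company = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Company,
    max_retries=5,
    messages=[{
        "role": "user",
        "content": "TechVision Analytics, founded 2015, 450 employees"
    }]
)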

Strengths:

  • Most mature (3 million downloads/month)
  • Multi-provider support (works with 15+ LLM providers)
  • Best documentation and community
  • Minimal code changes (wraps existing clients)
  • Streaming support with partial Pydantic models (see the sketch below)
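On that last point, streaming is exposed through a create_partial variant that yields progressively filled models; a minimal sketch, again reusing the client and Company model from above:

# Each yielded object is a partial Company; fields are None until
# enough tokens have streamed in to fill them.
for partial in client.chat.completions.create_partial(
    model="gpt-4o-mini",
    response_model=Company,
    messages=[{
        "role": "user",
        "content": "TechVision Analytics, founded 2015, 450 employees"
    }],
):
    print(partial)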

Weaknesses:

  • No built-in agent framework
  • Retry loop adds latency when validations fail

When to choose:

  • You need multi-provider support
  • Your team is new to LLM extraction
  • You need streaming capabilities
  • You want the largest community and most examples

outlines: The Speed Specialist

outlines takes a fundamentally different approach: constrained generation using Finite State Machines.

Instead of hoping the LLM generates valid JSON and retrying when it doesn't, outlines guarantees structural validity during generation. With local models, it compiles your schema into an FSM that masks invalid tokens at every decoding step; with hosted APIs like OpenAI, it delegates to the provider's native structured output mode.
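To make the masking idea concrete, here is a toy illustration (my own sketch, not outlines' internals). The FSM below accepts exactly one shape, and at each step every token that would leave a valid path is dropped before sampling:

# Toy FSM that accepts exactly: {"n": <digits>}
FSM = {
    "start":        {"{": "key"},
    "key":          {'"n"': "colon"},
    "colon":        {":": "digit"},
    "digit":        {"4": "digit_or_end", "5": "digit_or_end"},
    "digit_or_end": {"0": "digit_or_end", "}": "done"},
}

def mask(state, scores):
    # Drop every token the FSM forbids in this state; whatever the model
    # "wants" to say, only structurally valid tokens can be sampled.
    return {tok: s for tok, s in scores.items() if tok in FSM[state]}

state, out = "start", ""
while state != "done":
    # Stand-in for the LLM's next-token scores. "oops" scores highest
    # but is masked in every state, so it can never be emitted.
    scores = {"{": 0.2, '"n"': 0.3, ":": 0.1, "4": 0.9,
              "5": 0.4, "0": 0.1, "}": 0.6, "oops": 1.0}
    allowed = mask(state, scores)
    tok = max(allowed, key=allowed.get)
    out += tok
    state = FSM[state][tok]

print(out)  # {"n":4}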

from openai import OpenAI
import outlines
from pydantic import BaseModel

client = OpenAI()
model = outlines.from_openai(client, "gpt-4o-mini")

class Company(BaseModel):
    name: str
    employee_count: int

# The call returns a JSON string guaranteed to match the schema.
result = model("Extract: Acme Corp, 500 employees", Company)
company = Company.model_validate_json(result)

The FSM approach guarantees valid structure without retries. For simple extraction tasks (sentiment classification, email extraction), this can be significantly faster than validation-based approaches.
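For instance, a sentiment classifier can constrain the label to a Literal, so the only freedom the model has is which of the three values to emit (a sketch reusing the model handle from above; the Sentiment schema is mine):

from typing import Literal

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]

# The returned JSON can only ever contain one of the three labels.
result = model("Classify the sentiment: 'The product arrived broken.'", Sentiment)
print(Sentiment.model_validate_json(result).label)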

Strengths:

  • Fast on simple extraction tasks
  • Guaranteed valid structure (zero parsing errors)
  • Excellent local model support (vLLM, Ollama)
  • Zero retries needed

Weaknesses:

  • With hosted APIs, requires models that support native structured output (e.g., gpt-4o, gpt-4o-mini)
  • No Optional fields with OpenAI (all fields marked required)
  • Advanced features (regex, grammar) only work with local models
  • Less mature than instructor

When to choose:

  • High-volume simple extraction where latency matters
  • Deploying with local models (unlocks full feature set)
  • User-facing applications needing fast response
  • You want guaranteed structure with no retries

pydantic-ai: The Agent Framework

pydantic-ai is the newest library (launched 2024) from the Pydantic team. It provides an agent framework, not just extraction:

from pydantic_ai import Agent
from pydantic import BaseModel, Field

class Company(BaseModel):
    name: str
    founded_year: int = Field(ge=1800, le=2025)
    employee_count: int = Field(ge=1)

agent = Agent(
    "openai:gpt-4o-mini",
    result_type=Company,
    system_prompt="Extract detailed company information...",
    retries=3
)

result = agent.run_sync("TechVision Analytics, founded 2015, 450 employees")
company = result.data  # the validated Company instance

The agent abstraction feels more natural for complex workflows. You can add dependency injection, tool calling, and structured testing.
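For instance, a tool is just a decorated function the model may call mid-run; a minimal sketch using the tool_plain decorator on the agent defined above (the lookup itself is a hypothetical stand-in):

@agent.tool_plain
def headcount_lookup(company_name: str) -> int:
    """Hypothetical registry lookup the model can call when the
    source text doesn't state an employee count."""
    return 450  # stand-in for a real data source

result = agent.run_sync("Tell me about TechVision Analytics")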

Strengths:

  • Agent-first architecture with dependency injection
  • Built-in testing utilities (see the sketch after this list)
  • Tool/function calling support
  • Well-written, clean codebase
  • Token-efficient prompting
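On the testing point, pydantic-ai ships a TestModel that stubs out the LLM entirely; a sketch against the agent defined above, assuming TestModel's generated stand-in data satisfies the schema's constraints:

from pydantic_ai.models.test import TestModel

# override() swaps the real model for a local stub, so this runs
# with no API key and no network calls.
with agent.override(model=TestModel()):
    result = agent.run_sync("TechVision Analytics, founded 2015, 450 employees")
    print(result.data)  # schema-shaped stand-in Company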

Weaknesses:

  • Newer library (smaller community than instructor)
  • Less battle-tested in production
  • Currently OpenAI-focused (no multi-provider support yet)
  • No automatic retry feedback loop like instructor

When to choose:

  • You’re building agent workflows
  • You want modern architecture
  • You’re using gpt-4o-mini or gpt-4o
  • Clean code and testing matter to you

My Experience

I’ve used all three in production. Here’s what I’ve found:

For simple, high-volume tasks (like classifying support tickets), outlines is noticeably faster. The constrained generation approach just works, and the speed difference is real when you’re processing thousands of items.

For complex extraction (like parsing research papers or financial documents), I reach for pydantic-ai. The agent framework makes it easy to add context, tools, and custom logic. The code stays cleaner as complexity grows.

For teams new to this, instructor is the safest bet. The documentation is excellent, the community is large, and you'll find examples for almost anything. The automatic retry loop is forgiving while you're still figuring out your schemas, and instructor works with basically any model provider.

The Bottom Line

All three solve the structured output problem. The differences are in approach, maturity, and use case fit.

For new projects, I start with pydantic-ai because I like the agent abstraction and the code stays clean. But if I needed to support multiple providers or had a team new to this stuff, I’d use instructor. And for high-volume simple extraction, outlines is hard to beat.

The structured output problem is real, but any of these three libraries solves it in about 10 lines of code.
