Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs

    Michael Ryaboy

    Published on May 31, 2025

    Yesterday, Osmosis AI dropped something special: a 0.6B-parameter model that solves one of the most frustrating problems in production AI - structured outputs that lobotomize your smart models.

    If you've ever tried to force GPT-4 or Claude to output JSON, you know the pain: accuracy tanks. GPT-4.1 drops to just 2.79% on AIME math problems when you enforce structured output, and Claude Sonnet 4 only gets 16.29% - far below the innate capabilities of these models. Reasoning models like DeepSeek R1 suffer from the same problem.

    The Problem with Structured Outputs

    Here's what happens when you force structured outputs: you constrain the model so tightly that its effective intelligence drops. Instead of letting it think freely and reason through problems, you force every token to fit a predefined schema. The model can't explore, can't backtrack, can't use its full reasoning capabilities.

    This is especially painful for complex reasoning tasks - math problems, coding, or challenging document extraction - where the model needs to work through multiple steps, check its work, and build up to a solution. When you force JSON from the start, you're cutting off the model's ability to think.
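
    For concreteness, here's roughly what the constrained approach looks like with an OpenAI-style json_schema response format (a minimal sketch - the schema here is hypothetical, and the same mechanism appears in the full example later in this post):

    from openai import OpenAI
    
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    
    # Forcing schema-valid JSON from the very first token leaves no room
    # for free-form reasoning before the answer - this is the accuracy killer.
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Solve for x: 2x + 5 = 13"}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "math_answer",
                "schema": {
                    "type": "object",
                    "properties": {"answer": {"type": "string"}},
                    "required": ["answer"],
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
    )
    print(response.choices[0].message.content)  # e.g. {"answer": "4"}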

    The Osmosis Solution

    Osmosis-Structure-0.6B flips this problem on its head. Instead of forcing your expensive model to output structured data, you let it do what it does best - think freely and generate high-quality reasoning. Then you pass that output to Osmosis-Structure-0.6B, which extracts the structured information with remarkable accuracy.

    The results speak for themselves:

    AIME 1983-2024 Performance:

    • Claude Sonnet 4: 16.29% → 62.59% (+284% improvement)
    • GPT-4.1: 2.79% → 39.66% (+1322% improvement)
    • Claude Opus 4: 22.94% → 65.06% (+184% improvement)

    Math DAPO 17K Dataset:

    • Claude Sonnet 4: 15.52% → 69.40% (+347% improvement)
    • GPT-4.1: 10.53% → 70.03% (+565% improvement)
    • Claude Opus 4: 15.28% → 69.91% (+357% improvement)

    That's not a typo: 39.66 / 2.79 ≈ 14.2, so GPT-4.1 really did see a roughly 14x (+1322%) relative improvement on AIME problems.

    How It Works in Practice

    The workflow is surprisingly simple:

    1. Send your prompt to a smart model (DeepSeek R1, Claude Sonnet, etc.) without structured output constraints
    2. Let the model reason freely and generate its best response
    3. Pass that response to Osmosis-Structure-0.6B to extract the structured data

    You get the best of both worlds: the reasoning power of large models and guaranteed structured output. Plus, Osmosis-Structure-0.6B is tiny (0.6B parameters) and fast, so the additional processing step is barely a rounding error when it comes to latency and cost.

    Perfect Pairing: Osmosis + DeepSeek R1

    This is where things get really exciting. We already have DeepSeek R1 available on Inference.net - a state-of-the-art reasoning model that rivals OpenAI o1 performance but at a fraction of the cost. When you combine DeepSeek R1 with Osmosis-Structure-0.6B, you get an incredibly powerful and cost-effective solution.

    DeepSeek R1 Performance:

    • AIME 2024: 79.8% (compared to o1's 79.2%)
    • MATH-500: 97.3% (compared to o1's 96.4%)

    The Economics Are Compelling:

    DeepSeek R1 on Inference.net:

    • Input: $0.45 / 1M tokens
    • Output: $2.15 / 1M tokens

    Compare this to OpenAI o1 pricing and the savings are substantial. Add Osmosis-Structure-0.6B for reliable JSON extraction, and you have a solution that's both higher performing and dramatically cheaper than forcing structured outputs on premium models.

    For even more aggressive cost optimization, consider our DeepSeek R1 Distill Llama 70B at just $0.10/$0.40 per million tokens. This distilled model still beats GPT-4o on math benchmarks while being 10x+ cheaper than premium alternatives.
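
    As a quick sanity check on "barely a rounding error", here's a back-of-envelope cost calculation using the prices above (the token counts, and the assumption that Osmosis-Structure-0.6B costs no more than the distilled tier, are illustrative):

    # Back-of-envelope per-request cost for the R1 + Osmosis pipeline.
    # Token counts below are illustrative assumptions, not measurements.
    R1_INPUT, R1_OUTPUT = 0.45 / 1e6, 2.15 / 1e6   # $ per token (prices above)
    
    prompt_tokens = 500        # assumed prompt size
    reasoning_tokens = 4_000   # assumed chain-of-thought length
    
    r1_cost = prompt_tokens * R1_INPUT + reasoning_tokens * R1_OUTPUT
    print(f"DeepSeek R1 pass: ${r1_cost:.4f}")  # ~$0.0088
    
    # The Osmosis pass re-reads the reasoning trace and emits a short JSON
    # object. Even priced like the distilled tier above ($0.10/$0.40 per 1M
    # tokens - an assumption for Osmosis), it adds well under a cent:
    osmosis_cost = reasoning_tokens * (0.10 / 1e6) + 100 * (0.40 / 1e6)
    print(f"Osmosis pass:    ${osmosis_cost:.4f}")  # ~$0.0004, roughly 5% overhead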

    Available Now on Inference.net

    We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family. This means you can access both the reasoning power of DeepSeek and the structured output reliability of Osmosis through a single API.

    Here's how to combine them for maximum performance and cost savings:

    import json
    from openai import OpenAI
    
    # Connect to Inference.net
    client = OpenAI(
        api_key="inference...",
        base_url="https://api.inference.net/v1",
    )
    
    # Define the JSON schema we want the final answer to follow
    json_schema_dict = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
        "additionalProperties": False,
    }
    json_schema = json.dumps(json_schema_dict)
    
    # First, get reasoning from DeepSeek R1
    reasoning_response = client.chat.completions.create(
        model="deepseek/deepseek-r1/fp-8",
        messages=[{"role": "user", "content": "Solve for x in the equation 2x + 5 = 13"}],
        temperature=0.6,  # Recommended for DeepSeek R1
    )
    
    # Extract the reasoning trace
    reasoning_trace = reasoning_response.choices[0].message.content
    # Output: To solve the equation \(2x + 5 = 13\)...
    
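    # Then, extract structured JSON from the free-form reasoning with Osmosis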
    response = client.chat.completions.create(
        model="osmosis-ai/osmosis-structure-0.6b/fp-32",
        messages=[
            {
                "role": "system",
                "content": f"You are a helpful assistant that understands and translates text to JSON format according to the following schema. {json_schema}",
            },
            {
                "role": "user",
                "content": f"Extract the reasoning steps from this mathematical solution: {reasoning_trace}",
            },
        ],
        temperature=0,
        max_tokens=512,
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "reasoning_extraction",
                "schema": json_schema_dict,
                "strict": True,
            },
        },
        stream=False,
    )
    
    reasoning_json = json.loads(response.choices[0].message.content)
    print(json.dumps(reasoning_json, indent=2))
    # Output: {"answer": "4"}


    The DeepSeek R1 Advantage

    What makes this combination particularly powerful is how DeepSeek R1 was trained. Unlike traditional models, DeepSeek R1 was developed using large-scale reinforcement learning specifically to enhance its reasoning capabilities, and behaviors like the following emerged naturally during training:

    • Self-verification of solutions
    • Reflection on problem-solving approaches
    • Generation of long, detailed chain-of-thought reasoning

    This makes it perfect for the Osmosis workflow - DeepSeek R1 generates rich, detailed reasoning traces that Osmosis-Structure-0.6B can then reliably convert to structured output.

    Cost Optimization Strategies

    For production applications, consider this tiered approach:

    1. Premium users: DeepSeek R1 or V3 + Osmosis for maximum accuracy
    2. Standard users: DeepSeek R1 Distill Llama 70B + Osmosis for great performance at lower cost

    The premium models are incredibly powerful - DeepSeek V3-0324 actually outperforms Claude 3.7 in benchmarks like MMLU-Pro, GPQA Diamond, MATH-500, AIME 2024, and LiveCodeBench, while distilled models maintain much of that performance.
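
    As a sketch, tier routing can be as simple as a lookup table (the tier names, the helper function, and the distill model ID are illustrative assumptions, not a fixed API):

    # Hypothetical router for the tiered setup above; only the upstream
    # reasoning model changes per tier - the Osmosis extraction step is shared.
    REASONING_MODELS = {
        "premium": "deepseek/deepseek-r1/fp-8",                # from the example above
        "standard": "deepseek/deepseek-r1-distill-llama-70b",  # assumed model ID
    }
    
    def pick_reasoning_model(tier: str) -> str:
        # Fall back to the cheaper tier for unknown plans.
        return REASONING_MODELS.get(tier, REASONING_MODELS["standard"])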

    What's Next

    Combining reasoning models with small extraction models is the new state of the art - that's why Osmosis-Structure-0.6B is now live on Inference.net. Whether you're extracting data from documents, building AI agents that need reliable JSON output, or tackling complex reasoning tasks, this combination delivers better results at dramatically lower cost than traditional approaches.

    Try it today and see the difference for yourself!

    Ready to get started? Sign up for Inference.net and start using DeepSeek R1 and Osmosis-Structure-0.6B today. Questions? Reach out to our team - we're here to help you build better AI applications.
