
    Blog

    Stay informed about the models we're releasing, upgrades to our API services, and our thoughts on the industry.

    Hybrid-Attention models are the future for SLMs

    Hybrid attention delivers up to a 3x cost reduction compared to traditional transformer models.

    Nov 3, 2025

    Amar Singh

    Announcing our $11.8M Series Seed

    We raised $11.8 million in funding led by Multicoin Capital and a16z CSX to train and host custom language models that are faster, more affordable, and more accurate than what the Big Labs offer.

    Oct 14, 2025

    Sam Hogan

    Schematron: An LLM trained for HTML -> JSON at scale

    Schematron-8B and Schematron-3B deliver frontier-level extraction quality at 1-2% of the cost and 10x+ faster inference than large, general-purpose LLMs.

    Sep 9, 2025

    Sam Hogan

    Introducing ClipTagger-12b: SoTA Video Understanding at 15x Lower Cost

    We're thrilled to announce the release of ClipTagger-12b, a groundbreaking open-source vision-language model that delivers GPT-4.1-level performance for video understanding at a fraction of the cost.

    Aug 14, 2025

    Sam Hogan

    On the Economics of Hosting Open Source Models

    The open source community is buzzing around the new Wan release, but what are the economics of the businesses hosting it right now? Or hosting open source models in general?

    Jul 29, 2025

    Amar Singh

    Batch vs Real-Time LLM APIs: When to Use Each

    Not every LLM request needs an immediate response. Chat interfaces need real-time responses, but data extraction, enrichment, and background jobs can wait hours.

    Jul 24, 2025

    Michael Ryaboy

    Do You Need Model Distillation? The Complete Guide

    Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.

    Jul 22, 2025

    Michael Ryaboy

    The Cheapest LLM Call Is the One You Don’t Await

    Asynchronous requests are fire-and-forget calls that finish whenever idle GPUs are free.

    Jul 21, 2025

    Michael Ryaboy

    Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs

    We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family.

    May 31, 2025

    Michael Ryaboy

    How Smart Routing Saved Exa 90% on LLM Costs During Their Viral Moment

    Exa came up with a clever solution that saved 90% on tokens: route users with the most followers to Claude, and everyone else to dirt-cheap open-source models.

    May 29, 2025

    Michael Ryaboy