Blog
Stay informed about models we're releasing, upgrades to our API services and our thoughts on the industry.

Hybrid-Attention models are the future for SLMs
Hybrid attention delivers up to 3x cost reduction compared to traditional transformer models
Nov 3, 2025
Amar Singh

Announcing our $11.8M Series Seed
We raised $11.8 million in funding led by Multicoin Capital and a16z CSX to train and host custom language models that are faster, more affordable, and more accurate than what the Big Labs offer.
Oct 14, 2025
Sam Hogan

Schematron: An LLM trained for HTML -> JSON at scale
Schematron-8B and Schematron-3B deliver frontier-level extraction quality at 1-2% of the cost and 10x+ faster inference than large, general-purpose LLMs.
Sep 9, 2025
Sam Hogan

Introducing ClipTagger-12b: SoTA Video Understanding at 15x Lower Cost
We're thrilled to announce the release of ClipTagger-12b, a groundbreaking open-source vision-language model that delivers GPT-4.1-level performance for video understanding at a fraction of the cost.
Aug 14, 2025
Sam Hogan

On the Economics of Hosting Open Source Models
The open source community is buzzing around the new Wan release, but what are the economics of the businesses hosting it right now? Or hosting open source models in general?
Jul 29, 2025
Amar Singh

Batch vs Real-Time LLM APIs: When to Use Each
Not every LLM request needs an immediate response. Chat interfaces need real-time. But data extraction, enrichment, and background jobs can wait hours.
Jul 24, 2025
Michael Ryaboy

Do You Need Model Distillation? The Complete Guide
Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.
Jul 22, 2025
Michael Ryaboy

The Cheapest LLM Call Is the One You Don’t Await
Asynchronous requests – fire‑and‑forget calls that finish whenever idle GPUs are free.
Jul 21, 2025
Michael Ryaboy

Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs
We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family.
May 31, 2025
Michael Ryaboy

How Smart Routing Saved Exa 90% on LLM Costs During Their Viral Moment
They thought of a clever solution that saved them 90% on tokens: route people with the most followers to Claude, and everyone else to dirt-cheap open-source models.
May 29, 2025
Michael Ryaboy





