Blog
Stay informed about models we're releasing, upgrades to our API services, and our thoughts on the industry.

The Workhorse Era of AI: Moats Are Built, Not Rented
When everyone's using the same frontier models, nobody has an edge.
Oct 17, 2025
Michael Ryaboy

Announcing our $11.8M Series Seed
We raised $11.8 million in funding led by Multicoin Capital and a16z CSX to train and host custom language models that are faster, more affordable, and more accurate than what the Big Labs offer.
Oct 14, 2025
Sam Hogan

RAG Is Over: RL Agents Are the New Retrieval Stack
RL takes search agents to the next level. Without RL, agentic search is powerful but slow; you often need expensive frontier models to get the best results. With RL, it becomes much more viable.
Sep 23, 2025
Michael Ryaboy

Introducing Schematron: Structured HTML Extraction 40-80x Cheaper than GPT-5
Schematron-8B and Schematron-3B deliver frontier-level extraction quality at 1-2% of the cost of large, general-purpose LLMs, with 10x+ faster inference.
Sep 9, 2025
Michael Ryaboy

Arbitraging Down LLM Inference to the Cost of Electricity
What if every GPU could run serverless inference, and we could verify that its LLM output is correct?
Aug 25, 2025
Michael Ryaboy

Introducing ClipTagger-12b: SoTA Video Understanding at 15x Lower Cost
We're thrilled to announce the release of ClipTagger-12b, a groundbreaking open-source vision-language model that delivers GPT-4.1-level performance for video understanding at a fraction of the cost.
Aug 14, 2025
Sam Hogan

GPU-Rich Labs Have Won: What's Left for the Rest of Us is Distillation
Massive training runs and powerful but expensive models mean another technique is starting to dominate: distillation.
Jul 31, 2025
Michael Ryaboy

On the Economics of Hosting Open Source Models
The open source community is buzzing about the new Wan release, but what are the economics for the businesses hosting it right now, or hosting open source models in general?
Jul 29, 2025
Amar Singh

Batch vs Real-Time LLM APIs: When to Use Each
Not every LLM request needs an immediate response. Chat interfaces need real-time responses, but data extraction, enrichment, and background jobs can wait hours.
Jul 24, 2025
Michael Ryaboy

Do You Need Model Distillation? The Complete Guide
Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.
Jul 22, 2025
Michael Ryaboy