Latest Updates
Stay informed about models we're releasing, upgrades to our API services and our thoughts on the industry.

On the Economics of Hosting Open Source Models
The open source community is buzzing around the new Wan release, but what are the economics of the businesses hosting it right now? Or hosting open source models in general?
Jul 29, 2025
Amar Singh

Batch vs Real-Time LLM APIs: When to Use Each
Not every LLM request needs an immediate response. Chat interfaces need real-time. But data extraction, enrichment, and background jobs can wait hours.
Jul 24, 2025
Michael Ryaboy

Do You Need Model Distillation? The Complete Guide
Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.
Jul 22, 2025
Sam Hogan

The Cheapest LLM Call Is the One You Don’t Await
Asynchronous requests are fire-and-forget calls that finish whenever idle GPUs are free.
Jul 21, 2025
Michael Ryaboy

Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs
We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family.
May 31, 2025
Michael Ryaboy

How Smart Routing Saved Exa 90% on LLM Costs During Their Viral Moment
They thought of a clever solution that saved them 90% on tokens: route the users with the most followers to Claude, and everyone else to dirt-cheap open-source models.
May 29, 2025
Michael Ryaboy

Migrating our Website and Dashboard to TanStack Start
We evaluated a few frontend frameworks and eventually settled on TanStack Start as the tool of choice to re-implement our dashboard and website. In particular, we wanted a flexible solution that would allow us to server-render static content while also powering a rich, JS-heavy client-side application.
May 1, 2025
Sean

Introducing Inference.net
Inference.net is a global network of compute providers delivering affordable, serverless inference for the top open source AI models. We built a distributed infrastructure that allows developers to access state-of-the-art language models with the reliability of major cloud providers—but at a fraction of the cost.
Feb 19, 2025
Sam Hogan