Latest Updates
Stay informed about models we're releasing, upgrades to our API services and our thoughts on the industry.

On the Economics of Hosting Open Source Models
The open source community is buzzing around the new Wan release, but what are the economics of the businesses hosting it right now? Or hosting open source models in general?
Jul 29, 2025
Amar Singh

Batch vs Real-Time LLM APIs: When to Use Each
Not every LLM request needs an immediate response. Chat interfaces need real-time. But data extraction, enrichment, and background jobs can wait hours.
Jul 24, 2025
Michael Ryaboy

Do You Need Model Distillation? The Complete Guide
Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.
Jul 22, 2025
Sam Hogan

The Cheapest LLM Call Is the One You Don’t Await
Asynchronous requests are fire-and-forget calls that finish whenever idle GPUs are free.
Jul 21, 2025
Michael Ryaboy

Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs
We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family.
May 31, 2025
Michael Ryaboy

How Smart Routing Saved Exa 90% on LLM Costs During Their Viral Moment
They thought of a clever solution that saved them 90% on tokens: route the users with the most followers to Claude, and everyone else to dirt-cheap open-source models.
May 29, 2025
Michael Ryaboy

Migrating our Website and Dashboard to TanStack Start
We evaluated a few frontend frameworks and eventually settled on TanStack Start as the tool of choice to re-implement our dashboard and website. In particular, we wanted a flexible solution that would allow us to server-render static content while also powering a rich, JS-heavy client-side application.
May 1, 2025
Sean

Introducing Inference.net
Inference.net is a global network of compute providers delivering affordable, serverless inference for the top open source AI models. We built a distributed infrastructure that allows developers to access state-of-the-art language models with the reliability of major cloud providers—but at a fraction of the cost.
Feb 19, 2025
Sam Hogan