_        __                                          _   
(_)      / _|                                        | |  
 _ _ __ | |_ ___ _ __ ___ _ __   ___ ___   _ __   ___| |_ 
| | '_ \|  _/ _ \ '__/ _ \ '_ \ / __/ _ \ | '_ \ / _ \ __|
| | | | | ||  __/ | |  __/ | | | (_|  __/_| | | |  __/ |_ 
|_|_| |_|_| \___|_|  \___|_| |_|\___\___(_)_| |_|\___|\__|
  
What is this?

inference.net is a wholesaler of LLM inference tokens for models like Llama 3.1. We provide real-time and batch inference APIs at a 50-90% discount to what you would pay at together.ai or groq. If you're spending more than $10K/month on inference, we can likely reduce your costs substantially. You can reach us at support@inference.net.
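
For illustration, a real-time call might look like the sketch below. This assumes an OpenAI-compatible chat completions endpoint; the base URL, environment variable, and model id are assumptions for illustration, not documented values.

    # Hypothetical sketch: the base URL, env var, and model id below are
    # assumptions, not documented values.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.inference.net/v1",   # assumed endpoint
        api_key=os.environ["INFERENCE_API_KEY"],   # assumed env var
    )

    response = client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",  # assumed model id
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(response.choices[0].message.content)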

How Does It Work?

There is less of a GPU shortage than you have been led to believe. Data centers have underutilized capacity, but it comes in a shape that most orchestration software cannot use: a few minutes here, a few hours there. Once those unused minutes have passed, they can never be reclaimed. Like electricity that cannot be stored, or produce that will spoil, compute capacity has an expiration date, after which its value drops to zero. We run a spot market for this perishable compute, ensuring that otherwise-wasted capacity finds productive use before it expires.

To accomplish this, we built custom scheduling and orchestration software that aggregates these small chunks of capacity across data centers to run AI models. As a buyer of last resort for this otherwise-wasted capacity, we secure steep discounts from data centers and pass the savings directly to you.
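
To make the matching problem concrete, here is a toy sketch in Python. It is purely illustrative (our actual scheduler is far more involved): it greedily packs jobs into capacity windows, spending the soonest-expiring capacity first, and reports how many GPU-minutes expire unused.

    # Toy model of perishable-compute scheduling. Every name here is
    # illustrative; none of this is the production system.
    from dataclasses import dataclass

    @dataclass
    class Window:
        gpu_minutes: int  # idle capacity available in this window
        expires_at: int   # minute at which unused capacity is lost forever

    @dataclass
    class Job:
        gpu_minutes: int  # work required

    def schedule(windows: list[Window], jobs: list[Job]) -> int:
        """Greedily fill soonest-expiring windows first; return wasted GPU-minutes."""
        windows.sort(key=lambda w: w.expires_at)   # most perishable capacity first
        jobs = sorted(jobs, key=lambda j: j.gpu_minutes, reverse=True)
        wasted = 0
        for w in windows:
            free, remaining = w.gpu_minutes, []
            for j in jobs:
                if j.gpu_minutes <= free:
                    free -= j.gpu_minutes          # job fits: consume capacity
                else:
                    remaining.append(j)            # retry in a later window
            jobs = remaining
            wasted += free                         # leftover capacity expires
        return wasted

    # Two windows, three jobs: 5 GPU-minutes expire unused.
    print(schedule([Window(60, 10), Window(30, 20)],
                   [Job(40), Job(25), Job(20)]))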

Key Features

  1. Fast. Under 400ms time-to-first-token (TTFT) and >60 tok/sec throughput (see the measurement sketch below this list).

  2. Reliable. 99.9% uptime.

  3. Affordable. The best prices on the market.
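
As an illustration, here is one way to measure TTFT with a streaming request, under the same assumptions about the endpoint and model as the sketch above.

    # Measure time-to-first-token via streaming. Endpoint, env var, and
    # model id are assumptions, as before.
    import os
    import time
    from openai import OpenAI

    client = OpenAI(base_url="https://api.inference.net/v1",
                    api_key=os.environ["INFERENCE_API_KEY"])

    start = time.monotonic()
    stream = client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Write a haiku."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(f"TTFT: {time.monotonic() - start:.3f}s")
            break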

Why?

LLM inference is a new commodity, and demand for it will quickly outgrow the rest of the compute market by orders of magnitude. Like electricity or oil, inference requires sophisticated market infrastructure to match dynamic supply and demand. Just as energy markets evolved from simple bilateral trades to complex spot and futures markets, we are building that same market infrastructure for inference. By creating a liquid marketplace, we let inference producers (read: data centers) compete to give developers the best deal on AI services.

Inference.net is the marketplace where humanity discovers the moment-to-moment price of turning energy into intelligence.

We believe efficient markets for AI inference will drive the widespread proliferation of artificial intelligence over the next decade, leading to unprecedented human flourishing on Earth and beyond.

We aim to accelerate this process.

— The inference team