 _        __                                          _   
(_)      / _|                                        | |  
 _ _ __ | |_ ___ _ __ ___ _ __   ___ ___   _ __   ___| |_ 
| | '_ \|  _/ _ \ '__/ _ \ '_ \ / __/ _ \ | '_ \ / _ \ __|
| | | | | ||  __/ | |  __/ | | | (_|  __/_| | | |  __/ |_ 
|_|_| |_|_| \___|_|  \___|_| |_|\___\___(_)_| |_|\___|\__|
    

What is this?

inference.net is a wholesaler of LLM inference tokens for models like Llama 3.1. We provide batch and streaming inference APIs at a 50-90% discount compared with what you would pay together.ai or Groq. We can currently generate ~100B tokens per day.

Are you a researcher? Click here.

How?

There is less of a GPU shortage than you have been led to believe. Data centers have underutilized capacity, but it comes in a shape that most orchestration software cannot use: a few minutes here, a few hours there.

Once those unused minutes have passed, they can never be reclaimed. Like a stock option that is about to expire, unused compute becomes less valuable as it approaches its expiration date. Few customers need just a few minutes of compute time, making these fragments challenging to sell conventionally.

To solve this, we built custom scheduling and orchestration software that aggregates these small chunks across data centers to run AI models on compute that would otherwise go unused. Since we are the only purchaser of this compute, we are able to buy at a steep discount from data centers and pass those savings on to you.
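
The scheduler itself is not described on this page. As a toy illustration only of what "aggregating small chunks" can mean, the sketch below greedily packs jobs into idle windows using a best-fit rule. The names IdleWindow, Job and pack_jobs are hypothetical, and the best-fit strategy is an assumption chosen for brevity, not a description of the production system.

```python
from dataclasses import dataclass


@dataclass
class IdleWindow:
    """A hypothetical fragment of spare GPU time at one data center."""
    data_center: str
    minutes: int


@dataclass
class Job:
    """A hypothetical unit of inference work with an estimated run time."""
    name: str
    minutes: int


def pack_jobs(windows, jobs):
    """Greedy best-fit: put each job in the smallest window that still fits it.

    Returns (assignments, unplaced), where assignments maps a job name to the
    data center it was placed in and unplaced lists jobs that fit nowhere.
    """
    # Track the minutes left in each window as jobs are placed.
    remaining = [[w.minutes, w.data_center] for w in windows]
    assignments, unplaced = {}, []
    # Place the longest jobs first so short jobs do not crowd them out.
    for job in sorted(jobs, key=lambda j: j.minutes, reverse=True):
        fitting = [slot for slot in remaining if slot[0] >= job.minutes]
        if not fitting:
            unplaced.append(job.name)
            continue
        best = min(fitting, key=lambda slot: slot[0])
        best[0] -= job.minutes  # shrink the window by the time the job uses
        assignments[job.name] = best[1]
    return assignments, unplaced
```

With a 5-minute window at one site and a 120-minute window at another, a 90-minute job and a 4-minute job both find a home, while a 40-minute job has to wait for the next fragment to appear.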

Key Features

  1. Fast. <300ms TTFT and >100 tok/s throughput.
  2. Reliable. 99.9% uptime.
  3. Global. Providers in dozens of countries, concentrated in North America and Europe.

Why?

We believe LLM inference is a new kind of commodity that will quickly outgrow the rest of the compute market by orders of magnitude. Inference will trade more like electricity or oil than like other forms of compute trade today. A market will emerge to maximize the value of this new commodity, one in which inference producers (read: data centers) compete to give developers the best deal.

We aim to accelerate this process.

Get Started

  1. Fill out the form here to receive an API key.
  2. Set the base URL and API key in your favorite OpenAI SDK, or use curl:
curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "llama3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "stream": true
  }'
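
For readers who would rather not shell out to curl, here is a minimal Python sketch of the same request using only the standard library. The endpoint, headers, model name and messages are copied from the curl example above. The helper names (build_request, extract_text, stream_chat) are hypothetical, and the parser assumes the stream follows the OpenAI-style server-sent-event format ("data: {...}" lines ending with "data: [DONE]"), which this page does not spell out.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.inference.net/v1/chat/completions"


def build_request(api_key, user_prompt, model="llama3"):
    """Build the same streaming POST request as the curl example."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "stream": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_text(line):
    """Return the text carried by one streamed line, or None if it has none.

    Assumption: lines look like 'data: {json chunk}' and the stream ends
    with 'data: [DONE]', as in the OpenAI chat completions format.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    choices = json.loads(body).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(api_key, user_prompt):
    """Yield pieces of the reply as they arrive (this makes a network call)."""
    with urllib.request.urlopen(build_request(api_key, user_prompt)) as resp:
        for raw in resp:
            text = extract_text(raw.decode("utf-8"))
            if text:
                yield text
```

To print a reply as it streams in, loop over stream_chat("<YOUR_API_KEY>", "What is the meaning of life?") and print each piece with end="" and flush=True.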

Questions?

Email us: [email protected]


© 2024 inference.net