SIMPLE LLM API
    90% LOWER COST

    Inference.net is a global network of data centers serving fast, scalable, pay-per-token APIs for models like DeepSeek V3 and Llama 3.3. Connect in minutes. Scale forever.

    Start for free

    Begin with $25 in free credits to explore our models via the Playground.

    Integrate in minutes

    Switch to Inference.net by changing a single line of code. Start saving today.

    Pay-as-you-go

    Only pay for what you use. Set limits and monitor usage via our dashboards.

    TEXT TO TEXT

    Prices shown are per 1 million tokens

    Model                            Quantization  Input   Output
    DeepSeek R1                      FP8           $0.50   $3.00
    DeepSeek R1 Distill Llama 70B    FP8           $0.10   $0.40
    DeepSeek V3 0324                 FP8           $0.45   $1.45
    Google Gemma 3                   BF16          $0.30   $0.40
    Llama 3.1 70B Instruct           FP16          $0.30   $0.40
    Llama 3.1 8B Instruct            FP16          $0.02   $0.03
    Llama 3.2 11B Vision Instruct    FP16          $0.055  $0.055
    Llama 3.2 1B Instruct            FP16          $0.01   $0.01
    Llama 3.2 3B Instruct            FP16          $0.02   $0.02
    Llama 3.3 70B Instruct           FP16          $0.30   $0.40
    Mistral Nemo 12B Instruct        FP8           $0.038  $0.10
    Qwen 2.5 7B Vision Instruct      BF16          $0.20   $0.20
    Qwen QWQ 32B                     FP8           $0.20   $0.20

    NEED A RESEARCH GRANT?

    Inference’s Grants program offers free compute resources to researchers and developers working on open-source AI projects. Fill out an application and our team will be in touch within 24 hours.

    NEED ENTERPRISE PRICING?

    Inference is the best solution for large-scale operations looking to source affordable inference compute. Leverage our network's capabilities and our team's expertise for your next initiative.

    90% LOWER COST FOR THE SAME TOP MODELS

    Our pricing is up to 90% lower than other providers, with the same enterprise-grade reliability.

    Calculate Your Savings

    1B
    Input tokens: 700M | Output tokens: 300M

    Price on Together.ai

    $4.2K/mo

    Input: $3 / million tokens | Output: $7 / million tokens

    Price on Inference.net

    $1.3K/mo

    Save 70%

    Input: $0.5 / million tokens | Output: $3 / million tokens
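The figures in the comparison above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the `Pricing` type, `monthlyCost` function, and variable names are made up for this example, and the lower-priced rates are assumed to be the Inference.net side of the comparison.

```typescript
// Per-million-token prices in USD, copied from the comparison above.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Monthly cost in USD for a given number of input and output tokens.
function monthlyCost(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (
    (inputTokens / 1_000_000) * p.inputPerMillion +
    (outputTokens / 1_000_000) * p.outputPerMillion
  );
}

const togetherAi: Pricing = { inputPerMillion: 3, outputPerMillion: 7 };
// Assumption: these are the Inference.net rates shown in the comparison.
const inferenceNet: Pricing = { inputPerMillion: 0.5, outputPerMillion: 3 };

// 1B tokens per month, split 700M input / 300M output as in the calculator.
const baseline = monthlyCost(700_000_000, 300_000_000, togetherAi);
const lower = monthlyCost(700_000_000, 300_000_000, inferenceNet);
const savingsPercent = Math.round((1 - lower / baseline) * 100);

console.log(baseline, lower, savingsPercent); // 4200 1250 70
```

The $1,250 result is what the page rounds to $1.3K/mo, and the 70% saving follows from 1 - 1250 / 4200.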

    OPENAI SDK COMPATIBLE

    Our APIs are OpenAI-compatible. Switch in under two minutes and start saving. A two-line code change is all you need.

    import OpenAI from "openai";
    
    const openai = new OpenAI({
      baseURL: "https://api.inference.net/v1",
      apiKey: process.env.INFERENCE_API_KEY,
    });
    
    const completion = await openai.chat.completions.create({
      model: "meta-llama/llama-3.1-8b-instruct/fp-16",
      messages: [
        {
          role: "user",
          content: "What is the meaning of life?"
        }
      ],
      stream: true,
    });
    
    for await (const chunk of completion) {
      // delta.content can be missing on some chunks (for example the last one), so fall back to ""
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }
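If you want the full reply as a single string rather than printing each piece as it arrives, the streamed chunks can be concatenated. This is a minimal sketch, not part of the Inference.net docs: `ChunkLike`, `collectText`, and `fakeStream` are hypothetical names, and `ChunkLike` models only the fields used here. The stream returned by the OpenAI SDK should be structurally compatible with it, but treat that as an assumption to verify.

```typescript
// Minimal shape of a streamed chunk; only the fields used below are modelled.
interface ChunkLike {
  choices: { delta: { content?: string | null } }[];
}

// Concatenate the text deltas of a stream into one string.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    // Chunks without content (such as the final one) contribute nothing.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Hypothetical stand-in for a real stream, so the sketch runs without a network call.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: "Hel" } }] };
  yield { choices: [{ delta: { content: "lo" } }] };
  yield { choices: [{ delta: {} }] }; // final chunk with no content
}

collectText(fakeStream()).then((text) => console.log(text));
```

In real use you would pass the `completion` stream from the example above in place of `fakeStream()`.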

    USE CASE

    REAL-TIME CHAT

    Powerful serverless inference APIs that scale from zero to billions.

    Top-Tier Performance

    Industry-leading latency and throughput powered by highly optimized GPU infrastructure.

    Unbeatable Pricing

    Up to 90% cost savings vs legacy providers. Only pay for what you use, and never a penny more.

    Easy Integration

    First-class support for LangChain, LlamaIndex and other popular LLM frameworks.

    USE CASE

    BATCH INFERENCE

    Process millions of requests per batch with a single API call.

    Unmatched Scale & Cost

    We handle the largest asynchronous LLM workloads at the lowest prices on the market.

    Build Advanced Workflows

    Power massive-scale data analysis, synthetic data generation, document processing, and more with our batch API.

    Built for Developers

    Easy to integrate. Find the code samples and documentation you need, when you need it.

    USE CASE

    DATA EXTRACTION

    Transform unstructured data into actionable insights with powerful schema validation and parsing.

    Precise Extraction

    Extract structured data with guaranteed schema compliance using JSON Schema validation. Handle complex nested objects with confidence.

    Flexible Processing

    Process data at scale with our Batch API, or stream response objects in real-time as they are generated.

    Familiar Tooling

    First-class SDK support for TypeScript, Python, and more. Support for popular validation tools like Pydantic and Zod.
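To make the validation step concrete, here is a rough sketch of checking a model's JSON output against an expected shape, including a nested array. It deliberately uses a hand-written type guard rather than JSON Schema, Zod, or Pydantic so that it has no dependencies. The `Invoice` shape and the `isInvoice` and `parseInvoice` helpers are hypothetical and exist only for this example.

```typescript
// Hypothetical target shape for the extracted data.
interface Invoice {
  vendor: string;
  total: number;
  lineItems: { description: string; amount: number }[];
}

// Type guard: checks that an unknown value has the Invoice shape, nested items included.
function isInvoice(value: unknown): value is Invoice {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.vendor === "string" &&
    typeof v.total === "number" &&
    Array.isArray(v.lineItems) &&
    v.lineItems.every((item) => {
      if (typeof item !== "object" || item === null) return false;
      const i = item as Record<string, unknown>;
      return typeof i.description === "string" && typeof i.amount === "number";
    })
  );
}

// Parse raw model output; returns null instead of throwing on malformed or mismatched JSON.
function parseInvoice(raw: string): Invoice | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isInvoice(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

const raw = '{"vendor":"Acme","total":12.5,"lineItems":[{"description":"Widget","amount":12.5}]}';
console.log(parseInvoice(raw)?.vendor); // Acme
```

A schema library would replace `isInvoice` with a declarative definition; the surrounding parse-then-validate flow stays the same.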

    JOIN THOUSANDS OF DEVS BUILDING THE FUTURE

    Arib Khan

    Founder, 24labs.ai

    We saved over $20k per month by switching to inference.net.

    Joel Martin

    Founder, SiteKick.co

    We were struggling to find a provider that had the features we needed and didn't cost an arm and a leg. Inference.net was the perfect fit.

    Rhys Sullivan

    Product Engineer, Vercel

    If open source models are at the quality you need, inference.net may be helpful.

    Michael Hess

    Co-founder & CTO, Outset.ai

    We use Inference.net for some of our structured output tasks. It's a great product.

    Mike Pollard

    Founder, Mikeathon

    We checked prices for all the top providers. Inference.net was the cheapest by a mile.

    2 MINUTES TO INTEGRATE

    We designed our API from scratch to make integration as easy as possible. It takes only two minutes to fully integrate. Switch today. Satisfaction guaranteed.

    END-TO-END GENERATIVE AI

    We power the most comprehensive generative AI workflows for your application without missing a beat. Get tokens to your users at blazing fast speeds.

    AFFORDABLE AT EVERY SCALE

    We built custom AI-native orchestration and scheduling software to ensure you always get the best prices without compromising on performance.

    TO INFINITY AND BEYOND

    We regularly update our model catalog when new models are released, so you always have access to the latest and greatest AI models.

    START BUILDING TODAY

    15 minutes could save you 50% or more on compute.