DeepSeek-V3-0324 is now live.

    EXPLORE MODELS

    DeepSeek
    FP8

    DeepSeek R1

    DeepSeek-R1 is an open-source first-generation reasoning model trained with large-scale reinforcement learning to reach state-of-the-art performance on math, code, and reasoning tasks. The release also includes distilled variants sized for a range of applications.

    $0.75 / $3.00 (input / output per 1M tokens)
    125K Context
    DeepSeek
    FP8

    DeepSeek V3

    DeepSeek-V3 is a 671-billion-parameter Mixture-of-Experts (MoE) language model optimized for efficiency and performance. Innovative training strategies and extensive pre-training on high-quality data give it superior results across a wide range of benchmarks.

    $0.40 / $1.20 (input / output per 1M tokens)
    125K Context
    Meta
    FP16

    Llama 3.1 70B Instruct

    The Meta Llama 3.1 collection is a family of high-performing multilingual large language models optimized for dialogue. The models handle text and code across eight languages, come in 8B, 70B, and 405B parameter sizes, and were developed with a focus on safety, inclusivity, and societal benefit.

    $0.30 / $0.40 (input / output per 1M tokens)
    16K Context
    JSON
    Meta
    FP8

    Llama 3.1 8B Instruct

    Meta Llama 3.1 is a collection of advanced multilingual large language models designed for dialogue, available in 8B, 70B, and 405B parameter sizes. The models outperform many chat models on industry benchmarks and emphasize safe, responsible use across a variety of applications.

    $0.025 / $0.025 (input / output per 1M tokens)
    16K Context
    JSON
    Meta
    FP16

    Llama 3.2 11B Vision Instruct

    Llama 3.2 Vision, developed by Meta, is a state-of-the-art multimodal language model optimized for image recognition, reasoning, and captioning, outperforming many open and closed models on industry benchmarks.

    $0.055 / $0.055 (input / output per 1M tokens)
    16K Context
    JSON
    Mistral
    FP8

    Mistral Nemo 12B Instruct

    Mistral-NeMo-12B-Instruct is a 12-billion-parameter large language model tuned for English-language chat applications, with strong multilingual and code comprehension. It can be further customized via NVIDIA's NeMo Framework.

    $0.038 / $0.10 (input / output per 1M tokens)
    16K Context
    JSON
    Tool Calling

    90% LOWER COST FOR THE SAME TOP MODELS

    Our pricing is up to 90% lower than other providers, with the same enterprise-grade reliability.

    Calculate Your Savings

    1B tokens per month
    Input tokens: 700M | Output tokens: 300M

    Price on Together.ai

    $4.2K/mo

    Input: $3 / million tokens | Output: $7 / million tokens

    Price on Inference.net

    $1.4K/mo

    Save 66%

    Input: $0.75 / million tokens | Output: $3 / million tokens
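
    The arithmetic behind these figures is simple enough to check by hand. Here is a small sketch reproducing it, with the rates and volumes taken from the example above:

```typescript
// Reproduce the savings example: rates are USD per million tokens,
// volumes are in millions of tokens.
function monthlyCost(
  inputMillions: number,
  outputMillions: number,
  inputRate: number,
  outputRate: number,
): number {
  return inputMillions * inputRate + outputMillions * outputRate;
}

const legacy = monthlyCost(700, 300, 3.0, 7.0);           // 4200 -> $4.2K/mo
const ours = monthlyCost(700, 300, 0.75, 3.0);            // 1425 -> $1.4K/mo
const savingsPct = Math.round((1 - ours / legacy) * 100); // 66

console.log(`$${legacy}/mo vs $${ours}/mo: save ${savingsPct}%`);
```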

    OPENAI SDK COMPATIBLE

    Our APIs are OpenAI-compatible. Switch in under two minutes and start saving. A two-line code change is all you need.

    import OpenAI from "openai";
    
    // Point the OpenAI SDK at inference.net -- these two lines are the only change.
    const openai = new OpenAI({
      baseURL: "https://api.inference.net/v1",
      apiKey: process.env.INFERENCE_API_KEY,
    });
    
    const completion = await openai.chat.completions.create({
      model: "meta-llama/llama-3.1-8b-instruct/fp-8",
      messages: [
        {
          role: "user",
          content: "What is the meaning of life?"
        }
      ],
      stream: true,
    });
    
    // Print tokens as they arrive; some chunks carry no content, so default to "".
    for await (const chunk of completion) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }
    USE CASE

    REAL-TIME CHAT

    Powerful serverless inference APIs that scale from zero to billions.


    Top-Tier Performance

    Industry-leading latency and throughput powered by highly optimized GPU infrastructure.


    Unbeatable Pricing

    Up to 90% cost savings vs legacy providers. Only pay for what you use, and never a penny more.


    Easy Integration

    First-class support for LangChain, LlamaIndex, and other popular LLM frameworks.

    USE CASE

    BATCH INFERENCE

    Process millions of requests per batch with a single API call.


    Unmatched Scale & Cost

    We handle the largest asynchronous LLM workloads at the lowest prices on the market.


    Build Advanced Workflows

    Power massive-scale data analysis, synthetic data generation, document processing, and more with our batch API.


    Built for Developers

    Easy to integrate. Find the code samples and documentation you need, when you need it.
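
    As a rough sketch of what a batch job can look like (assuming the batch endpoint mirrors OpenAI's files-plus-batches flow; confirm the exact shape against the docs), each request becomes one line of a JSONL file, and a single call then launches the whole job:

```typescript
// Hypothetical sketch: assumes an OpenAI-style Batch API. The model id is
// real (from the catalog above); endpoint paths should be checked in the docs.
function batchLine(id: string, prompt: string): string {
  return JSON.stringify({
    custom_id: id,                 // your key for matching results later
    method: "POST",
    url: "/v1/chat/completions",
    body: {
      model: "meta-llama/llama-3.1-8b-instruct/fp-8",
      messages: [{ role: "user", content: prompt }],
    },
  });
}

const jsonl = ["doc-1", "doc-2", "doc-3"]
  .map((id) => batchLine(id, `Summarize document ${id}.`))
  .join("\n");

// With the OpenAI SDK configured as in the earlier snippet, submission is:
//   const file = await openai.files.create({ file: ..., purpose: "batch" });
//   const batch = await openai.batches.create({
//     input_file_id: file.id,
//     endpoint: "/v1/chat/completions",
//     completion_window: "24h",
//   });
console.log(`${jsonl.split("\n").length} requests queued in one file`);
```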

    USE CASE

    DATA EXTRACTION

    Transform unstructured data into actionable insights with powerful schema validation and parsing.


    Precise Extraction

    Extract structured data with guaranteed schema compliance using JSON Schema validation. Handle complex nested objects with confidence.


    Flexible Processing

    Process data at scale with our Batch API, or stream response objects in real-time as they are generated.


    Familiar Tooling

    First-class SDK support for TypeScript, Python, and more. Support for popular validation tools like Pydantic and Zod.
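
    To make the schema-validation idea concrete, here is a hedged sketch of a request body that asks the model for JSON matching a schema. The invoice fields are invented for illustration, and the exact `response_format` shape (shown here in OpenAI's style) should be checked against the docs:

```typescript
// Illustrative only: a made-up invoice schema, plus an OpenAI-style
// `response_format: { type: "json_schema", ... }` request body.
const invoiceSchema = {
  type: "object",
  properties: {
    vendor: { type: "string" },
    total: { type: "number" },
    line_items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          amount: { type: "number" },
        },
        required: ["description", "amount"],
      },
    },
  },
  required: ["vendor", "total", "line_items"],
};

const body = {
  model: "meta-llama/llama-3.1-8b-instruct/fp-8",
  messages: [
    { role: "user", content: "Extract the invoice fields from the text below." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: { name: "invoice", strict: true, schema: invoiceSchema },
  },
};

// With the OpenAI SDK configured as in the earlier snippet:
//   const completion = await openai.chat.completions.create(body);
//   const invoice = JSON.parse(completion.choices[0].message.content);
console.log(body.response_format.json_schema.name); // invoice
```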


    JOIN THOUSANDS OF DEVS BUILDING THE FUTURE

    Arib Khan

    Founder, 24labs.ai

    We saved over $20k per month by switching to inference.net.

    Joel Martin

    Founder, SiteKick.co

    We were struggling to find a provider that had the features we needed and didn't cost an arm and a leg. Inference.net was the perfect fit.

    Rhys Sullivan

    Product Engineer, Vercel

    If open source models are at the quality you need, inference.net may be helpful.

    Michael Hess

    Co-founder & CTO, Outset.ai

    We use Inference.net for some of our structured output tasks. It's a great product.

    Mike Pollard

    Founder, Mikeathon

    We checked prices for all the top providers. Inference.net was the cheapest by a mile.


    2 MINUTES TO INTEGRATE

    We designed our API from scratch to make integration as easy as possible. It takes only two minutes to fully integrate. Switch today. Satisfaction guaranteed.


    END-TO-END GENERATIVE AI

    We power the most comprehensive generative AI workflows for your application without missing a beat. Get tokens to your users at blazing-fast speeds.


    AFFORDABLE AT EVERY SCALE

    We built custom AI-native orchestration and scheduling software to ensure you always get the best prices without compromising on performance.


    TO INFINITY AND BEYOND

    We regularly update our model catalog when new models are released, so you always have access to the latest and greatest AI models.

    START BUILDING TODAY

    15 minutes could save you 50% or more on compute.