News

    Day Zero support for Nemotron 3 Super.


    Nemotron 3 Super

    Nemotron 3 Super is a high-throughput, open-weight 120B hybrid mixture-of-experts model by NVIDIA with 12B active parameters, optimized for complex agentic AI workflows. Featuring a 1-million-token context window, hybrid Mamba-transformer architecture, and multi-token prediction, it is designed for scalable deployment on workstations, data centers, and cloud environments.


    API Usage

    API IDENTIFIER

    nvidia/nemotron3-super
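    import OpenAI from "openai";
    
    // Point the OpenAI SDK client at the inference.net endpoint.
    const openai = new OpenAI({
      baseURL: "https://api.inference.net/v1",
      apiKey: process.env.INFERENCE_API_KEY,
    });
    
    // Request a streamed chat completion from Nemotron 3 Super.
    const completion = await openai.chat.completions.create({
      model: "nvidia/nemotron3-super",
      messages: [
        {
          role: "user",
          content: "What is the meaning of life?"
        }
      ],
      stream: true,
    });
    
    // Print each text delta as it arrives; some chunks carry no content,
    // so fall back to an empty string instead of writing undefined.
    for await (const chunk of completion) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }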
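The loop above prints each delta as it arrives. When the full reply is needed as a single string, the same pattern can accumulate the deltas instead. The sketch below is illustrative only: `ChunkLike`, `collectText`, and `fakeStream` are hypothetical names introduced here, and the stand-in generator replaces a real API stream so the helper can be tried without network access.

```typescript
// Minimal shape of a streamed chunk, mirroring the fields the loop above reads.
// Hypothetical local type for illustration, not the SDK's own type.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenate every text delta from a stream into one string.
// Chunks without choices or without content contribute nothing.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Stand-in stream (assumption: real chunks expose the same fields).
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: {} }] }; // chunk with no text
  yield { choices: [{ delta: { content: ", world" } }] };
  yield { choices: [] }; // chunk with no choices
}
```

Under the assumption that the SDK's stream yields chunks of this shape, the `completion` object from the example above could be passed to `collectText` in place of `fakeStream()`.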
    MODEL PROVIDER: NVIDIA
    TYPE: Text to Text
    PARAMETERS: 120B
    QUANTIZATION: FP8
    CONTEXT LENGTH: 1000K
    PRICING: Input $2.50 / Million Tokens, Output $5.00 / Million Tokens
    JSON MODE
    TOOL CALLING
    DEPLOYMENT: Serverless, Batch
    DOCUMENTATION
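The listed rates ($2.50 per million input tokens, $5.00 per million output tokens) make it straightforward to estimate the cost of a request. The following is a rough sketch using only those two figures; `PRICE_PER_MILLION_USD` and `estimateCostUSD` are hypothetical names, and the result ignores anything the page does not state, such as batch discounts or minimum charges.

```typescript
// Rates copied from the pricing lines above (USD per one million tokens).
const PRICE_PER_MILLION_USD = { input: 2.5, output: 5.0 };

// Estimate the cost of a request from its token counts.
// Hypothetical helper for illustration; not part of any SDK.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const total =
    inputTokens * PRICE_PER_MILLION_USD.input +
    outputTokens * PRICE_PER_MILLION_USD.output;
  return total / 1_000_000;
}
```

For example, `estimateCostUSD(200_000, 50_000)` evaluates to `0.75`: $0.50 for the input tokens plus $0.25 for the output tokens.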


    CONTACT

    Meet with our research team

    Schedule a call with our research team. We'll propose a train-and-serve plan that beats your current SLA and unit cost.