    Nemotron 3 Super

    Nemotron 3 Super is a high-throughput, open-weight 120B hybrid mixture-of-experts model by NVIDIA with 12B active parameters, optimized for complex agentic AI workflows. Featuring a 1-million-token context window, hybrid Mamba-transformer architecture, and multi-token prediction, it is designed for scalable deployment on workstations, data centers, and cloud environments.

    API Usage

    API IDENTIFIER

    nvidia/nemotron3-super
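    import OpenAI from "openai";
    
    // Point the OpenAI SDK at the inference.net endpoint.
    const openai = new OpenAI({
      baseURL: "https://api.inference.net/v1",
      apiKey: process.env.INFERENCE_API_KEY,
    });
    
    const completion = await openai.chat.completions.create({
      model: "nvidia/nemotron3-super",
      messages: [
        {
          role: "user",
          content: "What is the meaning of life?",
        },
      ],
      stream: true, // receive the reply as incremental chunks
    });
    
    // Some chunks carry no content (for example the final one), so fall back
    // to an empty string instead of passing undefined to stdout.write.
    for await (const chunk of completion) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }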
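The loop above prints each delta as it arrives. When the full reply is needed as one string (for logging or parsing, say), the same loop can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: `StreamChunk`, `collectStream`, and `fakeStream` are hypothetical names introduced here, and `StreamChunk` models only the fields the helper reads.

```typescript
// Minimal shape of a streamed chunk; only the fields this helper reads.
// (Assumption: mirrors the chunk layout used in the snippet above.)
interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenate every content delta from a stream into a single string.
async function collectStream(
  stream: AsyncIterable<StreamChunk>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    // Chunks without content contribute an empty string.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Hypothetical stand-in for a live stream, used only to show the behavior.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: { content: ", world" } }] };
  yield { choices: [{ delta: {} }] }; // final chunk without content
}
```

Under these assumptions, `await collectStream(completion)` could replace the printing loop, and `collectStream(fakeStream())` resolves to `"Hello, world"` without any network call.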
    MODEL PROVIDER: NVIDIA
    TYPE: Text to Text
    PARAMETERS: 120B
    QUANTIZATION: FP8
    CONTEXT LENGTH: 1000K
    PRICING: Input $2.50 / Million Tokens; Output $5.00 / Million Tokens
    JSON MODE
    TOOL CALLING
    DEPLOYMENT: Serverless, Batch
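The listed prices translate into a per-request cost with simple arithmetic. The sketch below assumes cost is strictly linear in token counts at the listed rates; the page does not state rounding rules, minimum charges, or whether batch deployment is priced differently, so treat the result as an estimate. `estimateCostUsd` is a hypothetical helper name.

```typescript
// Prices copied from the listing, in USD per million tokens.
const INPUT_USD_PER_MILLION = 2.5;
const OUTPUT_USD_PER_MILLION = 5.0;

// Estimate the cost of one request, assuming linear per-token pricing.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens * INPUT_USD_PER_MILLION) / 1_000_000;
  const outputCost = (outputTokens * OUTPUT_USD_PER_MILLION) / 1_000_000;
  return inputCost + outputCost;
}

// 200,000 input tokens and 50,000 output tokens:
// (200000 * 2.50 + 50000 * 5.00) / 1,000,000 = 0.75 USD
console.log(estimateCostUsd(200_000, 50_000).toFixed(2)); // prints "0.75"
```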
    DOCUMENTATION
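The listing names JSON mode and tool calling but shows no request shapes. The sketch below builds request bodies using the standard OpenAI Chat Completions fields `response_format` and `tools`. It is an assumption that this endpoint accepts those fields in that form, so check the provider documentation before relying on it. The `get_weather` tool and both builder functions are hypothetical and exist only for illustration.

```typescript
// Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
const getWeatherTool = {
  type: "function" as const,
  function: {
    name: "get_weather", // hypothetical tool name
    description: "Look up the current weather for a city.",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// Request body that offers the model a tool it may choose to call.
function buildToolRequest(prompt: string) {
  return {
    model: "nvidia/nemotron3-super",
    messages: [{ role: "user" as const, content: prompt }],
    tools: [getWeatherTool],
  };
}

// Request body that asks for a JSON object reply (JSON mode).
// OpenAI's own API requires the word "JSON" to appear in the messages;
// whether this endpoint enforces the same rule is not stated on the page.
function buildJsonRequest(prompt: string) {
  return {
    model: "nvidia/nemotron3-super",
    messages: [
      { role: "system" as const, content: "Reply with a single JSON object." },
      { role: "user" as const, content: prompt },
    ],
    response_format: { type: "json_object" as const },
  };
}
```

If the assumption holds, either object could be passed to `openai.chat.completions.create(...)` from the API Usage snippet in place of the inline request.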

