Nemotron 3 Super

Nemotron 3 Super is a high-throughput, open-weight 120B hybrid mixture-of-experts model by NVIDIA with 12B active parameters, optimized for complex agentic AI workflows. Featuring a 1-million-token context window, hybrid Mamba-transformer architecture, and multi-token prediction, it is designed for scalable deployment on workstations, data centers, and cloud environments.

API Usage

API IDENTIFIER

nvidia/nemotron3-super

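import OpenAI from "openai";

// Client pointed at the OpenAI-compatible endpoint; the API key is read from the environment.
const openai = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

// Request a streamed chat completion.
const completion = await openai.chat.completions.create({
  model: "nvidia/nemotron3-super",
  messages: [
    {
      role: "user",
      content: "What is the meaning of life?"
    }
  ],
  stream: true,
});

// Print each text delta as it arrives. Some chunks carry no text,
// so fall back to an empty string instead of writing undefined.
for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}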
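
The streaming loop above reads delta.content from each chunk. Some chunks may carry no text, so it can help to isolate that handling in a small helper. The following is a minimal sketch, not part of the OpenAI SDK: StreamChunk, deltaText, and joinDeltas are hypothetical names, and the sample chunks are made up for illustration rather than taken from real model output.

```typescript
// Minimal shape of a streamed chunk; only the fields used here are modeled.
// (Hypothetical type for illustration, not the SDK's own chunk type.)
interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Returns the text carried by one chunk, or "" when the chunk has no
// choices, no delta, or null/undefined content.
function deltaText(chunk: StreamChunk): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Concatenates the text of a sequence of chunks in order.
function joinDeltas(chunks: StreamChunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    text += deltaText(chunk);
  }
  return text;
}

// Made-up chunks, including ones without any text.
const sampleChunks: StreamChunk[] = [
  { choices: [{ delta: { content: "Hel" } }] },
  { choices: [{ delta: { content: null } }] },
  { choices: [{ delta: { content: "lo" } }] },
  { choices: [] },
];

const joined = joinDeltas(sampleChunks); // "Hello"
```

Inside the streaming loop, the same helper could be used as process.stdout.write(deltaText(chunk)).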

MODEL PROVIDER	NVIDIA
TYPE	Text to Text
PARAMETERS	120B
QUANTIZATION	FP8
CONTEXT LENGTH	1M tokens
PRICING	Input: $2.50 / million tokens; Output: $5.00 / million tokens
JSON MODE
TOOL CALLING
DEPLOYMENT	Serverless, Batch
DOCUMENTATION
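
As a rough illustration of the listed pricing, the sketch below estimates the cost of a single request from its token counts. It assumes the two listed per-million-token rates apply linearly and ignores anything the table does not state (for example, any difference between serverless and batch pricing). The function and constant names are hypothetical.

```typescript
// Listed prices in USD per million tokens (taken from the pricing row above).
const INPUT_USD_PER_MILLION = 2.5;
const OUTPUT_USD_PER_MILLION = 5.0;

// Hypothetical helper: linear cost estimate from token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1e6) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1e6) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example: 200,000 input tokens and 10,000 output tokens.
// 0.2 * 2.50 + 0.01 * 5.00 = 0.50 + 0.05 = 0.55 USD (up to floating-point rounding)
const exampleCost = estimateCostUsd(200000, 10000);
console.log(exampleCost.toFixed(2));
```

At these rates, output tokens cost twice as much as input tokens, so long generations dominate the bill faster than long prompts do.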

Related Models

Google Gemma 3

Gemma 3 is a versatile, lightweight, multimodal open-source model family by Google DeepMind, primed for text and image processing and text generation, supporting over 140 languages with a 128K context window, designed for easy deployment in resource-constrained environments.

ClipTagger 12B

ClipTagger-12b is a highly efficient, open-source 12-billion parameter vision-language model designed for scalable video understanding, providing frontier-quality performance through schema-consistent JSON outputs for video frames at a fraction of the cost of leading closed-source models.