Nemotron 3 Super
Nemotron 3 Super is a high-throughput, open-weight 120B hybrid mixture-of-experts model by NVIDIA with 12B active parameters, optimized for complex agentic AI workflows. Featuring a 1-million-token context window, hybrid Mamba-transformer architecture, and multi-token prediction, it is designed for scalable deployment on workstations, data centers, and cloud environments.

API Usage
API IDENTIFIER
nvidia/nemotron3-super

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: "nvidia/nemotron3-super",
  messages: [
    {
      role: "user",
      content: "What is the meaning of life?"
    }
  ],
  stream: true,
});

// Print tokens as they arrive. Some chunks carry no content, so fall back to "".
for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
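When the full reply is needed as a single string rather than printed piece by piece, the streaming loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the API: `StreamChunk`, `deltaText`, `collectStream`, and `fakeStream` are hypothetical names introduced here, and the chunk type mirrors only the `choices[0].delta.content` field that the example reads. It runs without a network call by using a stand-in stream.

```typescript
// Minimal local type mirroring only the chunk field the example above reads.
// Assumption: real SDK chunk types carry more fields than shown here.
interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Hypothetical helper: text carried by one chunk, or "" when there is none.
function deltaText(chunk: StreamChunk): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Hypothetical helper: drains a stream and returns the concatenated reply.
async function collectStream(
  stream: AsyncIterable<StreamChunk>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += deltaText(chunk);
  }
  return text;
}

// Stand-in stream so the helpers can be tried without calling the API.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: {} }] }; // chunk with no text
  yield { choices: [] }; // chunk with no choices
  yield { choices: [{ delta: { content: ", world" } }] };
}

collectStream(fakeStream()).then((reply) => console.log(reply));
// prints "Hello, world"
```

With a real request, the `completion` stream from the example could be passed to `collectStream` in place of `fakeStream()`, assuming its chunks have the shape described above.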
Related Models

ClipTagger 12B
ClipTagger-12b is a highly efficient, open-source 12-billion parameter vision-language model designed for scalable video understanding, providing frontier-quality performance through schema-consistent JSON outputs for video frames at a fraction of the cost of leading closed-source models.

Google Gemma 3
Gemma 3 is a versatile, lightweight, multimodal open-source model family by Google DeepMind, built for text and image processing and for text generation. It supports over 140 languages, has a 128K context window, and is designed for easy deployment in resource-constrained environments.

