120B Tool-Calling Model

Nemotron 3 Super

Nemotron 3 Super is a high-throughput, open-weight 120B hybrid mixture-of-experts model by NVIDIA with 12B active parameters, optimized for complex agentic AI workflows. Featuring a 1-million-token context window, hybrid Mamba-transformer architecture, and multi-token prediction, it is designed for scalable deployment on workstations, data centers, and cloud environments.

Built by NVIDIA