Mistral Nemo 12B Instruct

Mistral-NeMo-12B-Instruct is a 12-billion-parameter large language model intended for English-language chat applications. It was trained on a large proportion of multilingual and code data, and it can be customized with NVIDIA's NeMo Framework.

API USAGE

API IDENTIFIER

mistralai/mistral-nemo-12b-instruct/fp-8

import OpenAI from "openai";

// Point the OpenAI SDK at the inference.net endpoint.
const openai = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

// Request a streamed chat completion.
const completion = await openai.chat.completions.create({
  model: "mistralai/mistral-nemo-12b-instruct/fp-8",
  messages: [
    {
      role: "user",
      content: "What is the meaning of life?"
    }
  ],
  stream: true,
});

// Print each piece of text as it arrives; chunks without content fall back to "".
for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
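
The spec table on this page lists JSON MODE as a feature. As a rough sketch only, the request body for a non-streaming JSON-mode call might look like the following. It assumes the endpoint accepts the OpenAI-style `response_format` parameter, which this page does not confirm. The helper `buildJsonModeRequest` and the local types are hypothetical names introduced for illustration.

```typescript
// Minimal message and request shapes, mirroring the OpenAI-style chat API
// used in the example above. These local types are illustrative only.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface JsonModeRequest {
  model: string;
  messages: ChatMessage[];
  response_format: { type: "json_object" }; // assumption: accepted by this endpoint
  stream: false;
}

// Hypothetical helper: builds a non-streaming request body that asks for JSON output.
function buildJsonModeRequest(userPrompt: string): JsonModeRequest {
  return {
    model: "mistralai/mistral-nemo-12b-instruct/fp-8",
    messages: [
      // JSON mode generally works best when the prompt itself asks for JSON.
      { role: "system", content: "Reply with a single JSON object." },
      { role: "user", content: userPrompt },
    ],
    response_format: { type: "json_object" },
    stream: false,
  };
}

const jsonRequest = buildJsonModeRequest('List three primary colors under the key "colors".');
console.log(JSON.stringify(jsonRequest, null, 2));
```

If the assumption holds, the object could be passed to `openai.chat.completions.create(...)` and the reply read from `choices[0].message.content` with `JSON.parse`.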

MODEL PROVIDER	Mistral
TYPE	Text to Text
PARAMETERS	12B
QUANTIZATION	FP8
CONTEXT LENGTH	16K
PRICING	Input: $0.04 / million tokens; Output: $0.10 / million tokens
JSON MODE
TOOL CALLING
DEPLOYMENT	Serverless, Batch
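
The per-token prices listed above translate into a request cost with simple arithmetic. The following is a minimal sketch, assuming billing is strictly linear in token counts with no minimums or rounding. The helper name `estimateCostUsd` is hypothetical.

```typescript
// Listed prices in USD per million tokens (from the pricing row above).
const INPUT_USD_PER_MILLION = 0.04;
const OUTPUT_USD_PER_MILLION = 0.10;

// Hypothetical helper: estimates the cost of a request in USD,
// assuming cost scales linearly with input and output token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  );
}

// Example: one million input tokens plus one million output tokens.
console.log(estimateCostUsd(1_000_000, 1_000_000).toFixed(2)); // "0.14"
```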

Mistral-NeMo-12B-Instruct

Model Overview:

Mistral-NeMo-12B-Instruct is a Large Language Model (LLM) with 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models of similar or smaller size.

Key features

Released under the Apache 2.0 License
Available in pre-trained and instruction-tuned versions
Trained with a 128k context window
Comes with an FP8 quantized version with no accuracy loss
Trained on a large proportion of multilingual and code data
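
To illustrate why the FP8 version matters in practice, here is a back-of-the-envelope sketch of weight storage. It assumes the nominal 12B parameter count, 1 byte per parameter for FP8 and 2 bytes for BF16, and it ignores activations, KV cache, and runtime overhead. The helper `weightGigabytes` is a hypothetical name.

```typescript
// Nominal parameter count from this page (12B); the exact count may differ slightly.
const PARAMS = 12e9;

// Bytes per parameter for each storage format.
const BYTES_PER_PARAM = { fp8: 1, bf16: 2 } as const;

// Hypothetical helper: approximate weight storage in gigabytes (10^9 bytes).
function weightGigabytes(format: keyof typeof BYTES_PER_PARAM): number {
  return (PARAMS * BYTES_PER_PARAM[format]) / 1e9;
}

console.log(`FP8:  ~${weightGigabytes("fp8")} GB`);  // FP8:  ~12 GB
console.log(`BF16: ~${weightGigabytes("bf16")} GB`); // BF16: ~24 GB
```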

Intended use

Mistral-NeMo-12B-Instruct is a chat model intended for use in English.

The instruct model itself can be further customized using the NeMo Framework suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using NeMo-Aligner.

Model Developer: NVIDIA and Mistral AI

Model Dates: Mistral-NeMo-12B-Instruct was trained between June 2024 and July 2024.

Data Freshness: The pretraining data has a cutoff of April 2024.

Transformers format: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407

Model Architecture:

Mistral-NeMo-12B-Instruct is a transformer model, with the following architecture choices:

Layers: 40
Dim: 5,120
Head dim: 128
Hidden dim: 14,336
Activation Function: SwiGLU
Number of heads: 32
Number of kv-heads: 8 (GQA)
Rotary embeddings (theta = 1M)
Vocabulary size: 2**17 ~= 128k
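
The layer count, KV-head count, and head dimension listed above determine how much memory the attention KV cache needs per token. The following is a rough sketch, assuming cache entries are stored as 16-bit values and ignoring any framework overhead. The function name `kvCacheBytesPerToken` is hypothetical.

```typescript
// Values taken from the architecture list above.
const LAYERS = 40;
const KV_HEADS = 8; // grouped-query attention (GQA)
const QUERY_HEADS = 32;
const HEAD_DIM = 128;

// Assumption: cache entries are stored as 16-bit values (2 bytes each).
const BYTES_PER_VALUE = 2;

// Bytes of KV cache per token: keys and values (the factor of 2)
// for every layer and every KV head.
function kvCacheBytesPerToken(kvHeads: number): number {
  return 2 * LAYERS * kvHeads * HEAD_DIM * BYTES_PER_VALUE;
}

const GIB = 1024 ** 3;
const perToken = kvCacheBytesPerToken(KV_HEADS);

console.log(perToken); // 163840 bytes (160 KiB) per token
console.log((perToken * 16 * 1024) / GIB); // 2.5 GiB for a 16K-token context
console.log((perToken * 128 * 1024) / GIB); // 20 GiB for a 128k-token context

// Without GQA (one KV head per query head) the cache would be 4x larger.
console.log(kvCacheBytesPerToken(QUERY_HEADS) / perToken); // 4
```

Under these assumptions, sharing 8 KV heads across 32 query heads cuts the cache to a quarter of what full multi-head attention would need.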

Architecture Type: Transformer Decoder (auto-regressive language model)

Evaluation Results

MT Bench (dev): 7.84
MixEval Hard: 0.534
IFEval-v5: 0.629
Wildbench: 42.57

Limitations

The model was trained on data crawled from the internet that contains toxic language, unsafe content, and societal biases. It may therefore amplify those biases and return toxic responses, especially when given toxic prompts. It may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable text even if the prompt itself contains nothing explicitly offensive.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
