RECOMMENDED

Google Gemma 3
Gemma 3 is a lightweight, multimodal family of open models from Google DeepMind that accepts text and image input and generates text. It supports over 140 languages and a context window of up to 128K tokens, and is designed for easy deployment in resource-constrained environments.

Llama 3.2 11B Vision Instruct
Llama 3.2-Vision, developed by Meta, is a multimodal large language model optimized for image recognition, reasoning, and captioning. Meta reports that it outperforms many available open and closed multimodal models on common industry benchmarks.

Llama 3.1 8B Instruct
Meta Llama 3.1 is a collection of multilingual large language models optimized for dialogue and available in 8B, 70B, and 405B sizes. Meta reports that they outperform many available chat models on common industry benchmarks, and they are designed with an emphasis on safe, responsible use across applications.
TEXT-TO-TEXT
Prices shown are per 1 million tokens

Llama 3.1 8B Instruct
Meta Llama 3.1 is a collection of multilingual large language models optimized for dialogue and available in 8B, 70B, and 405B sizes. Meta reports that they outperform many available chat models on common industry benchmarks, and they are designed with an emphasis on safe, responsible use across applications.

Llama 3.2 1B Instruct
Llama 3.2 is a collection of multilingual large language models from Meta. The instruction-tuned models are fine-tuned for multilingual dialogue and summarization and are suited to retrieval and conversational-agent use cases.

Llama 3.2 3B Instruct
Llama 3.2 is a collection of multilingual large language models from Meta, optimized for dialogue, retrieval, and summarization tasks, with strong performance on common industry benchmarks. The instruction-tuned models use supervised fine-tuning and reinforcement learning to align responses with human preferences for helpfulness and safety.

Mistral Nemo 12B Instruct
Mistral-NeMo-12B-Instruct is a 12-billion-parameter large language model intended primarily for English-language chat applications. It also offers strong multilingual and code comprehension and can be customized with NVIDIA's NeMo Framework.

Osmosis Structure 0.6B
Osmosis-Structure-0.6B is a small language model optimized for generating structured outputs. Its structured training methodology is reported to improve performance on mathematical reasoning and problem-solving tasks.
IMAGE-TO-TEXT
Prices shown are per 1 million tokens

Google Gemma 3
Gemma 3 is a lightweight, multimodal family of open models from Google DeepMind that accepts text and image input and generates text. It supports over 140 languages and a context window of up to 128K tokens, and is designed for easy deployment in resource-constrained environments.

Llama 3.2 11B Vision Instruct
Llama 3.2-Vision, developed by Meta, is a multimodal large language model optimized for image recognition, reasoning, and captioning. Meta reports that it outperforms many available open and closed multimodal models on common industry benchmarks.

Qwen 2.5 7B Vision Instruct
Qwen2.5-VL-7B-Instruct is a multilingual vision-language model from Alibaba Cloud that accepts image and text input and generates text, with enhanced capabilities in visual understanding and instruction-following, along with support for generating structured outputs such as JSON.
EMBEDDINGS
Prices shown are per 1 million tokens