CHANGELOG
March 21st, 2025
New Model: Google Gemma3
The new Gemma 3 model family is multi-modal: it accepts image inputs and understands over 140 languages.
It has a 128K context length and comes in 1B, 4B, 12B, and 27B variants.
As of today, we're offering the 27B variant at $0.40 per million input/output tokens.
Structured Output and JSON Mode
We've enabled structured output support for several models:
- Meta Llama 3.1 8B Instruct (FP8 & FP16)
- Meta Llama 3.1 70B Instruct FP8
- Meta Llama 3.3 70B Instruct (FP8 & FP16)
- Meta Llama 3.2 3B Instruct FP16
- Meta Llama 3.2 11B Instruct FP16
- Mistral Nemo 12B Instruct FP8
- DeepSeek V3 FP8
- DeepSeek R1 Distill Llama 70B FP8
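As a minimal sketch, a JSON-mode request body follows the familiar OpenAI-compatible chat completions shape (the model id below is a placeholder for illustration; check the model explorer for the exact ids):

```python
import json

# Illustrative structured-output (JSON mode) request body for an
# OpenAI-compatible chat completions endpoint. The model id is a
# placeholder; look up the exact id on the model explorer page.
payload = {
    "model": "meta-llama/llama-3.1-8b-instruct",  # placeholder model id
    "messages": [
        {"role": "system", "content": "Respond only with a JSON object."},
        {"role": "user", "content": "Name one city and its country."},
    ],
    # JSON mode constrains the model to emit valid JSON.
    "response_format": {"type": "json_object"},
}

body = json.dumps(payload)
print(body)
```

POST this body to the chat completions endpoint with your API key to get back a guaranteed-JSON response.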
5x-8x Faster DeepSeek Model Performance
We made major optimizations to our DeepSeek R1 and V3 deployments, resulting in token-per-second speedups of 5x to 8x.
February 28th, 2025
Happy Friday from the inference.net team!
We have some exciting new releases this week.
---
New Models, Including DeepSeek R1 and DeepSeek V3
We've released several new models, including DeepSeek R1 and DeepSeek V3; you can find them all on the model explorer page.
We offer some of the best pricing available for these models and are excited to see what you build with them!
Image Input Support
Vision-language models (VLMs) are a special type of language model that can accept images as input.
Image input support is now generally available, starting with the Llama 3.2 11B model.
Visit the docs here to learn how you can start using image inputs in your app today.
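As a quick sketch, image inputs follow the OpenAI-compatible convention of sending a list of content blocks in the user message (the model id and image URL below are placeholders for illustration):

```python
import json

# Illustrative image-input message for an OpenAI-compatible chat
# completions endpoint: text and image parts are sent as a list of
# content blocks. The model id and image URL are placeholders.
payload = {
    "model": "meta-llama/llama-3.2-11b-instruct",  # placeholder model id
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

print(json.dumps(payload))
```

Base64-encoded data URLs work in place of a hosted image URL if your images aren't publicly accessible.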
New Website Launch
We've released a brand new landing page at inference.net, packed with interactive model playgrounds and pricing tools to help you explore our capabilities and optimize your costs. Let us know what you think!
Week of Feb 21, 2025
- Released the new model playground experience, allowing for easy testing of models on the dashboard.
- Added GitHub OAuth. This is now the recommended way to log in to Inference.net.
- Improved TTFT: requests are now saved and dispatched to the inference runtime 30% faster.
- Released VLM support for multi-modal models like meta/llama-3.2-11b.
Feb 4, 2025
Website & Dashboard Migration to TanStack Start
Major frontend overhaul and new marketing website.
- TanStack Start Integration: Implemented server-side rendering (SSR) capabilities for improved SEO and initial page load performance.
- TanStack Router Migration: Moved from React Router to TanStack Router for enhanced type safety and better integration with our stack.
- File-based Routing: Adopted a more intuitive file-based routing system.
- Type-safe Routing: Implemented fully type-safe routing throughout the application.
- New Marketing Website: Shipped a brand new marketing website, which is fully integrated with our dashboard as a single frontend.
- Integrated CMS: Added a new CMS to support blog and changelog content.