Changelog
9/25/24
Multi-Language Support & Smoother Node Balancing
We’re excited to introduce Multi-Language Prompt Support, which lets you run inference in more than 20 languages with automatic context translation. Localized queries are supported in Spanish, French, Mandarin, and more, making your LLM accessible to users worldwide. Alongside this, we’ve improved our Node Balancing Algorithm, yielding a 15% performance boost when distributing inference across multi-region clusters. We’ve also fixed a bug that could cause prompts longer than 3,000 tokens to hang during parallel processing, so larger queries now run smoothly and without interruption.
9/15/24
Dynamic Scaling & Memory Leak Fix
We’ve rolled out Dynamic Inference Scaling, which automatically adjusts node capacity based on real-time usage spikes and the complexity of the model in use. This keeps high-demand workloads running with minimal delay. We’ve also fixed a persistent memory leak that occurred when fine-tuned models ran on legacy hardware nodes, stabilizing performance across the board. Finally, improvements to our API Rate Limits give you more precise control over request bursts, reducing rejection rates by 22% and improving reliability during peak loads.
9/6/24
Custom Model Deployment & Streamlined API Authentication
The new Custom Model Deployment feature lets users upload and run their own large language models on our distributed network, with support for popular frameworks such as PyTorch and TensorFlow. This gives users who rely on proprietary models greater flexibility. We’ve also simplified API Authentication by adding support for OAuth 2.0 and Single Sign-On (SSO) integration, making secure team access faster and easier. Finally, we’ve fixed a bug that affected response times for requests from non-US regions, ensuring consistent performance globally.
8/29/24
Session History & GPU Efficiency Improvements
We’ve launched Inference Session History, which lets users track previous queries, monitor model usage, and review output statistics for up to 30 days. This gives you more insight into and control over your workflows, supporting better long-term planning and analysis. We’ve also optimized GPU Utilization for distributed nodes, cutting energy consumption by 10% while maintaining peak performance. Finally, we fixed an issue that caused our LLM response cache to incorrectly reuse outdated results, so responses are now fresher and more accurate.
8/20/24
Model Previews & Dashboard Upgrades
The new LLM Model Preview feature lets users quickly view inference outputs from different models before committing to a full session, saving time and improving efficiency. We’ve also updated our Dashboard UI with new real-time performance monitoring widgets that help you track node health, usage statistics, and resource allocation. Additionally, we’ve fixed a bug that prevented multi-turn conversations from persisting correctly across distributed nodes, improving the experience for users running long-lived chat or dialogue systems.