
    Blog

    Stay informed about the models we're releasing, upgrades to our API services, and our thoughts on the industry.

    Hybrid-Attention models are the future for SLMs

    Hybrid attention delivers up to a 3x cost reduction compared to traditional transformer models.

    Nov 3, 2025

    Amar Singh

    Announcing our $11.8M Series Seed

    We raised $11.8 million in funding led by Multicoin Capital and a16z CSX to train and host custom language models that are faster, more affordable, and more accurate than what the Big Labs offer.

    Oct 14, 2025

    Sam Hogan

    Schematron: An LLM trained for HTML -> JSON at scale

    Schematron-8B and Schematron-3B deliver frontier-level extraction quality at 1-2% of the cost and 10x+ faster inference than large, general-purpose LLMs.

    Sep 9, 2025

    Sam Hogan

    Introducing ClipTagger-12b: SoTA Video Understanding at 15x Lower Cost

    We're thrilled to announce the release of ClipTagger-12b, a groundbreaking open-source vision-language model that delivers GPT-4.1-level performance for video understanding at a fraction of the cost.

    Aug 14, 2025

    Sam Hogan

    On the Economics of Hosting Open Source Models

    The open source community is buzzing around the new Wan release, but what are the economics of the businesses hosting it right now? Or hosting open source models in general?

    Jul 29, 2025

    Amar Singh

    Batch vs Real-Time LLM APIs: When to Use Each

    Not every LLM request needs an immediate response. Chat interfaces need real-time responses, but data extraction, enrichment, and background jobs can wait hours.

    Jul 24, 2025

    Michael Ryaboy

    Do You Need Model Distillation? The Complete Guide

    Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.

    Jul 22, 2025

    Michael Ryaboy

    The Cheapest LLM Call Is the One You Don’t Await

    Asynchronous requests are fire-and-forget calls that finish whenever idle GPUs are free.

    Jul 21, 2025

    Michael Ryaboy

    Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs

    We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family.

    May 31, 2025

    Michael Ryaboy

    How Smart Routing Saved Exa 90% on LLM Costs During Their Viral Moment

    Exa came up with a clever solution that saved 90% on tokens: route users with the most followers to Claude, and everyone else to dirt-cheap open-source models.

    May 29, 2025

    Michael Ryaboy