    Jun 12, 2026

    Helicone Pricing and Alternatives for Agent Observability (2026)

    Inference Research

    Helicone Pricing Explained: Free Tier, Paid Plans, and Where Costs Inflect

    Helicone pricing as of June 2026 breaks down to four tiers: a free Hobby plan with 10,000 requests per month, a Pro plan at $79 per month, a Team plan at $799 per month, and custom Enterprise pricing. Usage-based charges apply once you pass the free request allotment on paid plans.

    But the tier table is no longer the most important fact about Helicone. On March 3, 2026, Mintlify acquired Helicone, and the product now runs in maintenance mode: security updates, new model support, and bug fixes keep shipping, but active feature development has ended. If you're evaluating Helicone pricing in 2026, the real question has changed from "what does it cost?" to "what does it cost, and should you still be building on it?"

    This guide answers both. You'll get verified June 2026 numbers for every tier, a worked example at realistic agent traffic, and a framework for choosing among alternatives based on architecture (a proxy that logs requests versus a platform that traces agents) rather than a flat list of brands. We'll close with a migration path that's about as small as migrations get.

    The tiers as of June 2026

    Here's the full breakdown from Helicone's pricing page, verified June 11, 2026.

    | Plan | Price/mo | Requests | Retention | Stand-out limits |
    |---|---|---|---|---|
    | Hobby | $0 | 10,000 free | 7 days | 1 seat, 1 GB storage |
    | Pro | $79 | 10k + usage | 1 month | Unlimited seats, alerts, HQL |
    | Team | $799 | 10k + usage | 3 months | SOC-2, HIPAA, 5 orgs |
    | Enterprise | Custom | Custom | Forever | SSO, on-prem |

    Verified June 11, 2026 from helicone.ai/pricing. Per-request overage rates beyond the free allotment are not published.

    The inflection points matter more than the sticker prices. The free Hobby plan caps you at 10,000 requests per month with 7-day retention and a single seat. Pro at $79 per month lifts you to unlimited seats and 1,000 logs per minute of ingestion, but retention only extends to one month. If you need SOC-2 or HIPAA compliance, you're looking at Team, a jump to $799 per month. Forever retention and on-prem deployment are Enterprise-only.
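The inflection points in this section can be read as a small decision rule. The sketch below is illustrative only: it encodes the table's limits in a hypothetical `Plan` record and returns the cheapest tier that satisfies a set of requirements. The day counts for "1 month" and "3 months" (30 and 90), the lack of a seat cap on Team and Enterprise, and compliance coverage on Enterprise are assumptions, not figures from Helicone's pricing page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    base_price_usd: Optional[int]  # None = custom pricing
    request_cap: Optional[int]     # None = usage-billed past the free 10k
    retention_days: Optional[int]  # None = "Forever"
    max_seats: Optional[int]       # None = no seat cap (assumed for Team/Enterprise)
    compliance: bool               # SOC-2 / HIPAA (assumed to carry over to Enterprise)
    on_prem: bool


# Ordered cheapest first. 30 and 90 days approximate "1 month" and "3 months".
PLANS = [
    Plan("Hobby", 0, 10_000, 7, 1, False, False),
    Plan("Pro", 79, None, 30, None, False, False),
    Plan("Team", 799, None, 90, None, True, False),
    Plan("Enterprise", None, None, None, None, True, True),
]


def lowest_plan(monthly_requests: int, retention_days: int, seats: int,
                compliance: bool = False, on_prem: bool = False) -> Plan:
    """Return the first (cheapest) plan whose limits cover every requirement."""
    for plan in PLANS:
        if plan.request_cap is not None and monthly_requests > plan.request_cap:
            continue
        if plan.retention_days is not None and retention_days > plan.retention_days:
            continue
        if plan.max_seats is not None and seats > plan.max_seats:
            continue
        if compliance and not plan.compliance:
            continue
        if on_prem and not plan.on_prem:
            continue
        return plan
    raise ValueError("no plan satisfies the requirements")


print(lowest_plan(480_000, retention_days=30, seats=5).name)
print(lowest_plan(480_000, retention_days=30, seats=5, compliance=True).name)
```

The base price is only the floor: on Pro and Team the unpublished usage charges still sit on top of whatever tier this rule returns.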

    There's one thing you won't find on the pricing page: a per-request overage rate. Helicone doesn't publish what each log costs beyond the free allotment. The embedded calculator returns estimated totals ($0.97 per month for 10K requests with 0.30 GB of storage, for example) without showing the underlying rates. For a production team trying to forecast a bill, that opacity is itself a data point.

    The gateway side is more transparent. Helicone's cloud AI Gateway, launched September 10, 2025, offers passthrough billing at a 0% markup, so the model spend itself isn't padded. Startups that qualify for Helicone's startup program get 50% off for the first year.

    Figure 1: Free Tier Included Volume per Month — each vendor meters a different unit; labels show each tool's own unit (June 2026)

    A worked example at realistic agent traffic

    Take a modest production agent: 2,000 sessions per day, eight LLM calls per session. That's 16,000 requests a day, or roughly 480,000 per 30-day month, which is 48 times Helicone's free allotment. The 10,000 free requests are gone before the end of day one.

    From there, your bill is the base plan ($79 on Pro, $799 on Team) plus usage charges that the public pricing page doesn't let you compute. And on Pro, the one-month retention window means you can't compare this month's failure patterns against last month's; the older traces are already gone. These are illustrative numbers, not vendor quotes, but the shape holds at almost any production volume: the free tier is a demo budget, and the real cost curve is partially hidden.
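The arithmetic in this example is easy to rerun for your own traffic. The sketch below is a minimal version, assuming a 30-day month and evenly spread traffic; the function names are hypothetical, and only the 10,000-request allotment comes from the pricing table.

```python
import math

FREE_REQUESTS_PER_MONTH = 10_000  # Helicone Hobby allotment


def monthly_requests(sessions_per_day: int, calls_per_session: int, days: int = 30) -> int:
    """Total LLM requests in a month of steady traffic."""
    return sessions_per_day * calls_per_session * days


def day_free_tier_runs_out(sessions_per_day: int, calls_per_session: int,
                           free: int = FREE_REQUESTS_PER_MONTH) -> int:
    """1-based day of the month on which the free allotment is used up."""
    per_day = sessions_per_day * calls_per_session
    if per_day <= 0:
        raise ValueError("traffic must be positive")
    return math.ceil(free / per_day)


total = monthly_requests(2_000, 8)
print(total)                                # 480000
print(total / FREE_REQUESTS_PER_MONTH)      # 48.0
print(day_free_tier_runs_out(2_000, 8))     # 1
```

What the sketch cannot compute is the bill itself, because the per-request overage rate is not published.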

    What changed in 2026: the Mintlify acquisition and maintenance mode

    Helicone earned its traction. Before the acquisition, it had processed 14.2 trillion tokens for around 16,000 organizations. The March 3, 2026 announcement was explicit about what happens next: services stay live "for the foreseeable future," security patches and new model support continue, and feature development stops.

    To be fair to the product, that's a workable state for some teams. The proxy is stable, and the platform is open source, so self-hosting remains a genuine option. But it means the gaps that exist today, most importantly the agent-tracing gaps covered below, are permanent. A maintenance-mode product doesn't close a feature gap; its competitors keep widening it.

    What You Get for the Price — and What a Proxy Cannot See

    Helicone's pitch was always integration speed: "one line of code to monitor, evaluate, and experiment." Change your SDK's base URL, and every request flows through Helicone's edge. In return you get genuinely useful things: request and response logs with cost and token counts, and a Rust-based AI Gateway that routes across 100+ models with caching, rate limiting, and failover built in. Sessions let you group related requests for multi-step workflows with session-level metrics. For Helicone LLM observability at the request level, that package works.

    The limits aren't a quality problem. They're an architecture problem. A proxy can only observe traffic that crosses it.

    Figure 2: What a proxy sees — only the LLM requests that cross it. Tool executions, framework steps, and retries inside your process never reach its logs.

    Figure 3: What in-process tracing captures — a parented span tree with agent and session identity, exported as OpenInference-shaped OpenTelemetry.

    Everything that happens inside your process is invisible to a proxy: the tool your agent executed and what it returned, the LangGraph node that routed the conversation, the retry loop that silently swallowed a parsing failure, the retrieval that came back empty. None of those events make an HTTP call to an LLM provider, so none of them appear in a proxy's logs. Sessions help group what the proxy does see, but they're metadata attached to requests — not a parented tree of everything the agent did.

    Here's what that looks like in practice. Your refund agent approves the wrong amount. The proxy log shows a perfectly healthy LLM call: well-formed prompt, valid completion, 200 status, $0.04. Nothing is red. The actual bug happened between the calls: a tool returned stale account data and the agent never re-checked it. That part of the run is exactly what the proxy architecturally cannot see.
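A toy model makes the blind spot concrete. The sketch below is not how any proxy is implemented; it simply lists the steps of a hypothetical refund run and keeps the ones that leave the process as an LLM request, which is all a proxy can log. The `Event` record and the step names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    step: str
    kind: str    # "llm", "tool", or "framework"
    detail: str

    @property
    def crosses_proxy(self) -> bool:
        # Only outbound LLM requests pass through the proxy.
        return self.kind == "llm"


run = [
    Event("plan", "llm", "model asks for an account lookup"),
    Event("lookup_account", "tool", "returned yesterday's cached balance"),
    Event("route", "framework", "graph node chose the 'approve' branch"),
    Event("decide", "llm", "model approves the refund"),
]

proxy_log = [e.step for e in run if e.crosses_proxy]
hidden = [e.step for e in run if not e.crosses_proxy]

print("proxy sees:", proxy_log)
print("proxy misses:", hidden)
```

Both logged calls look healthy on their own; the stale lookup that caused the bad approval sits in the list the proxy never receives.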

    Proxy Logs vs. Full Agent Traces

    An LLM observability proxy is a server that sits between your application and a model provider, logging each request and response that passes through it. It captures what every call cost and how long it took. It cannot capture what your application did around those calls. With agents, that's where the failures live.

    Span-level tracing inverts the model: instead of watching traffic from outside, an SDK inside your process records the full execution as a tree. Four capabilities fall out of that, each tied to a debugging task a request log can't do:

    1. Tool calls with arguments and return values. Tracing records TOOL spans alongside CHAIN, RETRIEVER, and EMBEDDING spans, so "what did the tool actually return?" is a click, not an archaeology project.
    2. Framework steps. Integration instrumentation captures LangChain and LangGraph steps, OpenAI Agents SDK runs, Claude Agent SDK sessions, and a dozen other frameworks as structured spans.
    3. Session grouping as a real tree. Runs nest under an agent span carrying session.id, so a multi-turn conversation reads as one connected execution rather than scattered log rows.
    4. Stable agent identity across deploys. A persistent agent.id keeps the same agent's runs grouped through redeploys, renames, and environment changes. That stability is what makes week-over-week comparisons meaningful. The mechanics are simple enough to set up once and forget; see how stable agent identity across deploys works in practice.
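The four capabilities above all depend on one data structure: a tree of spans with identity attached at the root. The sketch below is a minimal stand-in built from standard-library dataclasses, not the OpenTelemetry or OpenInference API. The `Span` class and its methods are hypothetical; the span kinds and the `agent.id` and `session.id` attribute names follow the ones mentioned in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Span:
    name: str
    kind: str  # e.g. AGENT, CHAIN, LLM, TOOL, RETRIEVER
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)

    def child(self, name: str, kind: str,
              attributes: Optional[Dict[str, str]] = None) -> "Span":
        """Create a span parented under this one and return it."""
        span = Span(name, kind, dict(attributes or {}))
        self.children.append(span)
        return span

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Span"]]:
        """Yield (depth, span) pairs in execution order."""
        yield depth, self
        for c in self.children:
            yield from c.walk(depth + 1)


# Identity lives on the root, so every nested span belongs to one agent and session.
root = Span("refund-review", "AGENT", {
    "agent.id": "refund-review-agent",
    "session.id": "conversation-refund-1842",
})
chain = root.child("review_request", "CHAIN")
chain.child("plan", "LLM")
chain.child("lookup_account", "TOOL", {
    "arguments": '{"request": 1842}',
    "result": '{"balance_as_of": "yesterday"}',
})
chain.child("decide", "LLM")

for depth, span in root.walk():
    print("  " * depth + f"{span.kind} {span.name}")

# "What did the tool actually return?" becomes a lookup, not a log search.
tool_results = {s.name: s.attributes["result"]
                for _, s in root.walk() if s.kind == "TOOL"}
print(tool_results)
```

A request log would contain only the two LLM rows from this tree, with no parent linking them to the tool call in between.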

    Portability is the quiet advantage of doing this on open standards. Spans emitted as OpenInference-shaped OpenTelemetry can be consumed by any OpenInference-aware viewer, so your instrumentation isn't married to one vendor's schema the way proxy logs are.

    Helicone Alternatives by Category

    Most Helicone alternatives lists are flat brand rundowns. The more useful question is architectural: do you want a gateway that logs requests, an SDK that traces executions, or both? Here's how the options sort by category, with pricing verified June 2026.

    Gateway-first: Helicone (and staying put)

    Staying on Helicone is a legitimate choice if request logs, caching, and failover are all you need. The gateway is fast, the passthrough billing carries 0% markup, and self-hosting the open-source platform sidesteps both the pricing opacity and the acquisition risk. Just go in with clear eyes: maintenance mode means the proxy-vs-traces gap is now permanent.

    Tracing-first: LangSmith, Langfuse, Arize Phoenix

    LangSmith is the natural pick for teams deep in the LangChain ecosystem. The Developer plan is free with 5,000 base traces per month; Plus runs $39 per seat per month with 10,000 base traces included. Beyond that, base traces with 14-day retention cost $2.50 per 1,000, and extended 400-day retention traces cost $5.00 per 1,000. It's closed source, and trace costs compound at agent volumes.

    Langfuse is the open-source counterpoint, and the most common "Helicone vs. Langfuse" comparison comes down to one line: Helicone is a proxy in your request path; Langfuse is an SDK that observes asynchronously outside it. Langfuse self-hosting is free. Langfuse Cloud starts with a free Hobby tier (50,000 units per month, 30-day retention), then Core at $29 per month, Pro at $199, and Enterprise at $2,499, with overage at $8 per 100,000 units tapering to $6 at high volume.

    Arize Phoenix is fully open source, self-hosted, and uncapped. Arize also created the OpenInference conventions much of the industry now traces in. The managed Arize AX adds a free tier of 25,000 spans per month with 15-day retention and a Pro plan at $50 per month.

    All three give you real trace trees. What none of them gives you is a gateway: no caching or failover at the edge, no cost capture before the SDK, and one more vendor if you still want routing.

    Combined gateway + OpenTelemetry tracing: Catalyst

    Catalyst (by inference.net) is built on the premise that the proxy-versus-tracing choice is false. The Gateway works like Helicone's: point your SDK at a new base URL and every request is captured with full payloads, per-call cost, latency including time to first token, token counts, and error rates, all at roughly 10ms of added overhead. The tracing SDKs add the span depth a proxy can't see, emitting OpenInference-shaped OpenTelemetry.

    Captured traffic doesn't dead-end in a viewer, either: you can filter it and build datasets from production traffic for evals or fine-tuning, score outputs with plain-English rubrics judged by an LLM, and train smaller task-specific models from the same data.

    The economics are also a different shape. The free tier on the Catalyst pricing page includes 1 million gateway requests and 1 million OTel spans per month; Helicone's free tier, for comparison, is 10,000 requests. Paid plans start at $25 per month (Starter) and $250 (Growth).

    Figure 4: Entry Paid Tier: Monthly Base Price — lowest paid plan per tool, June 2026; usage charges excluded; LangSmith is per seat

    Comparison Table: Helicone vs. Alternatives at a Glance

    Two tables, because pricing and capability are different decisions. One caution as you read: every vendor meters a different unit. Helicone counts requests, Langfuse counts units, LangSmith counts traces, and Arize counts spans, so the "included" numbers are not directly interchangeable.

    | Tool | Free tier includes | Entry paid tier | Overage / metering |
    |---|---|---|---|
    | Helicone | 10k requests/mo | $79/mo (Pro) | Per request, unpublished |
    | Langfuse | 50k units/mo | $29/mo (Core) | $8 per 100k units |
    | LangSmith | 5k traces/mo | $39/seat/mo (Plus) | $2.50 per 1k traces |
    | Arize Phoenix / AX | OSS uncapped; AX 25k spans/mo | $50/mo (AX Pro) | Span-based |
    | Catalyst | 1M requests + 1M spans/mo | $25/mo (Starter) | Plan limits |

    Prices verified June 11, 2026 from each vendor's pricing page. LangSmith base traces carry 14-day retention ($2.50/1k); extended 400-day traces cost $5/1k. Langfuse overage tapers to $6 per 100k units at 50M+. Langfuse and Phoenix are free to self-host. Helicone's cloud gateway passes model spend through at 0% markup.

    | Tool | Gateway routing | Trace depth | Evals | Datasets → fine-tuning |
    |---|---|---|---|---|
    | Helicone | ✅ Proxy + caching | Request logs, sessions | Basic | |
    | Langfuse | ❌ SDK only | Full spans (OTel) | | Datasets only |
    | LangSmith | ❌ SDK only | Full spans | | Datasets only |
    | Arize Phoenix / AX | ❌ SDK only | Full spans (OpenInference) | | Datasets only |
    | Catalyst | ✅ Gateway + cost capture | Full spans (OpenInference/OTel) | ✅ Rubric judges | ✅ Traffic → train |

    Catalyst adds cross-run failure analysis (Halo) on top of traces — ranked failure modes citing trace IDs. "Datasets only" means eval/test datasets without a managed fine-tuning path on the same platform.

    The Missing Layer in Every Log Viewer: Cross-Run Failure Analysis

    Every tool above, Helicone included, ends at the same place: a human reading one log or one trace at a time. That works when you're debugging a single bad run someone reported. It doesn't answer the question that actually determines agent quality: what goes wrong most often across the last 10,000 runs?

    This is where Halo comes in. Halo (Hierarchical Agent Loop Optimization) is an open-source, RLM-based engine that reads OpenTelemetry-compatible spans, decomposes them across many runs, and returns a ranked list of systemic failure modes. Each finding cites the specific trace IDs it came from and arrives with concrete recommended fixes. Inside Catalyst it's hosted in the Agents tab's Analysis view, where you can run Halo on your traces on demand or put it on a schedule, hourly to monthly, with any single window capped at 30 days. Prefer to run it yourself? It's pip install halo-engine and a JSONL of traces.

    Notice what makes this possible: structure. Halo can rank "tool X returns empty results in 14% of sessions" only because the traces contain tool spans, framework steps, and session boundaries to decompose. Proxy request logs don't have that structure, which is why no amount of dashboarding on top of them produces this workflow.
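To see why structure is the prerequisite, consider a deliberately simple aggregation. The sketch below is not Halo's algorithm or its API; it is a plain counting pass over hypothetical trace records that shows how tool spans let you rank failure modes by the share of runs they affect and cite the trace IDs behind each one. The record shape and the two detection rules are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def detect(span: Dict) -> Optional[str]:
    """Map one span to a failure-mode label, or None if it looks healthy."""
    if span.get("error"):
        return f"{span['kind']} {span['name']} raised {span['error']}"
    if span["kind"] == "TOOL" and "output" in span and span["output"] in ([], "", None):
        return f"TOOL {span['name']} returned an empty result"
    return None


def rank_failure_modes(runs: Iterable[Dict]) -> List[Dict]:
    """Rank failure modes by the share of runs they appear in."""
    affected: Dict[str, List[str]] = defaultdict(list)
    total = 0
    for run in runs:
        total += 1
        # Count each mode once per run, however many spans triggered it.
        modes = {m for m in map(detect, run["spans"]) if m}
        for mode in modes:
            affected[mode].append(run["trace_id"])
    ranked = sorted(affected.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"failure_mode": mode, "share_of_runs": len(ids) / total, "trace_ids": ids}
        for mode, ids in ranked
    ]


runs = [
    {"trace_id": "t1", "spans": [{"kind": "TOOL", "name": "search_orders", "output": []}]},
    {"trace_id": "t2", "spans": [{"kind": "TOOL", "name": "search_orders", "output": ["#1842"]}]},
    {"trace_id": "t3", "spans": [
        {"kind": "TOOL", "name": "search_orders", "output": []},
        {"kind": "LLM", "name": "decide", "error": "JSONDecodeError"},
    ]},
    {"trace_id": "t4", "spans": [{"kind": "LLM", "name": "decide"}]},
]

report = rank_failure_modes(runs)
for finding in report:
    print(finding)
```

Run the same function over proxy request logs and the first rule never fires, because those records contain no TOOL spans to inspect.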

    If your team's debugging loop is "someone notices, someone scrolls logs, someone guesses," this is the upgrade worth switching for.

    Migrating from Helicone: A One-Line Change, Then a Drop-In setup()

    The irony of leaving a one-line-integration product is that leaving is also one line.

    Step 1: move the proxy. Helicone onboarded you by swapping a base URL. The gateway quickstart is the same motion in reverse: point your SDK at https://api.inference.net/v1, authenticate with your Catalyst project key, and pass your provider key in the x-inference-provider-api-key header. The gateway adds roughly 10ms of latency and forwards your requests to the provider unchanged.

    import os
    from openai import OpenAI
    
    client = OpenAI(
        # Before, this base_url pointed at your logging proxy. Now it points at
        # the Catalyst Gateway; everything else about your client stays the same.
        base_url="https://api.inference.net/v1",
        api_key=os.environ["INFERENCE_API_KEY"],
        default_headers={
            "x-inference-provider-api-key": os.environ["OPENAI_API_KEY"],
            "x-inference-provider": "openai",
        },
    )
    
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Hello, world!"}],
    )
    
    print(response.choices[0].message.content)

    Optional headers tag each request with an environment, a task ID, or arbitrary metadata, all filterable in the dashboard. From the first request, you're capturing full payloads, cost per call, latency with time to first token, and token counts.
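Because the change is confined to client construction, it can sit behind a flag so the cutover is reversible. The sketch below is one way to do that, not part of the Catalyst quickstart: `LLM_GATEWAY` is a hypothetical environment variable, `build_client_kwargs` is a hypothetical helper, and `legacy_kwargs` stands for whatever arguments you pass to `OpenAI(...)` today. The base URL and header names are the ones shown in the example above.

```python
from typing import Any, Dict, Mapping

CATALYST_BASE_URL = "https://api.inference.net/v1"  # from the quickstart above


def build_client_kwargs(env: Mapping[str, str],
                        legacy_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return OpenAI client kwargs for the gateway selected by LLM_GATEWAY."""
    # LLM_GATEWAY is a made-up flag name; anything other than "catalyst"
    # keeps the existing proxy settings untouched.
    if env.get("LLM_GATEWAY", "legacy") != "catalyst":
        return dict(legacy_kwargs)
    return {
        "base_url": CATALYST_BASE_URL,
        "api_key": env["INFERENCE_API_KEY"],
        "default_headers": {
            "x-inference-provider-api-key": env["OPENAI_API_KEY"],
            "x-inference-provider": "openai",
        },
    }


# Demo with stand-in values; in application code pass os.environ instead.
legacy = {"base_url": "https://old-proxy.example/v1", "api_key": "sk-provider"}
env = {
    "LLM_GATEWAY": "catalyst",
    "INFERENCE_API_KEY": "catalyst-project-key",
    "OPENAI_API_KEY": "sk-provider",
}
kwargs = build_client_kwargs(env, legacy)
print(kwargs["base_url"])
# Then: client = OpenAI(**kwargs)
```

Flipping the flag back restores the old proxy without a code change, which is useful while you compare captured costs between the two.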

    Step 2: add the tracing depth. This is the part that has no Helicone equivalent. Install inference-catalyst-tracing (Python) or @inference/tracing (TypeScript) per the tracing quickstart, call setup() once before your LLM clients are constructed (the Python SDK auto-detects installed integrations), and wrap each logical run in agent_span so it carries agent and session identity.

    import os
    
    from inference_catalyst_tracing import agent_span, setup
    from openai import OpenAI
    
    tracing = setup()  # auto-detects installed integrations, patches OpenAI
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    
    with agent_span(
        tracing.tracer,
        agent_id="refund-review-agent",
        agent_name="Refund Review Agent",
        session_id="conversation-refund-1842",
    ) as span:
        span.set_input("Review refund request #1842.")
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Review refund request #1842."}],
            max_tokens=16,
        )
        span.set_output(response.choices[0].message.content or "")
    
    tracing.shutdown()

    Integrations cover OpenAI, Anthropic, the Vercel AI SDK, LangChain, LangGraph, Pydantic AI, the OpenAI Agents SDK, the Claude Agent SDK, and more, so most stacks need zero manual span work.

    That's the whole migration: one base URL change for the gateway, one setup() call for the traces.

    Which Setup Fits Your Team

    Prototyping. Free tiers are fine; don't over-instrument. Helicone's 10,000 free requests or Langfuse's free Hobby tier (50,000 units) will carry a prototype comfortably. Optimize for iteration speed, not observability depth.

    Production, single agent. Once real users hit one agent, request logs stop being enough. You need tool calls, sessions, and the ability to answer "what changed since last week." That means a tracing-first tool or a combined platform. Watch the per-unit math at your volume: LangSmith's $2.50 per 1,000 base traces and Langfuse's $8 per 100,000 units are very different bills at 500K runs a month.
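The per-unit comparison is worth working through, because the two vendors meter different things. The sketch below is a rough estimate under stated assumptions: one agent run equals one LangSmith base trace, each run produces a configurable number of Langfuse units (`units_per_run` is a guess you should replace with your own measurement), and base fees, seats, included allotments, and volume tapering are all ignored.

```python
LANGSMITH_USD_PER_1K_BASE_TRACES = 2.50
LANGFUSE_USD_PER_100K_UNITS = 8.00


def langsmith_usage_usd(runs: int) -> float:
    """Usage cost if every run is billed as one base trace."""
    return runs / 1_000 * LANGSMITH_USD_PER_1K_BASE_TRACES


def langfuse_usage_usd(runs: int, units_per_run: int) -> float:
    """Usage cost if every run emits `units_per_run` billable units."""
    return runs * units_per_run / 100_000 * LANGFUSE_USD_PER_100K_UNITS


# Units per run at which the two usage rates cost the same.
break_even_units = (LANGSMITH_USD_PER_1K_BASE_TRACES * 100_000) / (
    LANGFUSE_USD_PER_100K_UNITS * 1_000
)

runs = 500_000
print(langsmith_usage_usd(runs))                     # 1250.0
print(langfuse_usage_usd(runs, units_per_run=10))    # 400.0
print(break_even_units)                              # 31.25
```

Under these assumptions the unit-based rate is cheaper until a run emits about 31 units, so the answer depends on how span-heavy your agent is rather than on the headline prices.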

    Multi-agent at scale. Multiple agents, multiple models, real budgets. Here the combined architecture earns its keep: cost attribution at the gateway edge, span-level traces for debugging, cross-run analysis to rank what to fix, and a path from captured traffic to datasets, evals, and fine-tuned models without exporting anything. Catalyst's free tier (1M requests, 1M spans monthly) covers a surprising amount of this before you pay anything.

    And honestly: stay on Helicone if request logs genuinely suffice, your traffic fits the free or Pro thresholds, and you're comfortable self-hosting a maintenance-mode but stable open-source proxy. Not every team needs agent traces. Most teams shipping agents do.

    Conclusion

    The 2026 Helicone pricing question isn't really about the $79 or the $799. It's about whether to keep building on a proxy whose feature development ended in March, and if not, what architecture replaces it. Request-logging proxies tell you what each call cost. Agent platforms tell you why the run failed. Pick based on which question your team asks more often; for most teams shipping agents, it's the second one.

    If you've decided you want both gateway economics and trace depth, the switch costs one line.

    FAQ

    Is Helicone free?

    Helicone has a free Hobby tier with 10,000 requests per month, 1 GB of storage, one seat, and 7-day data retention. Production traffic typically exceeds the request cap quickly, at which point you're on Pro ($79/month) or Team ($799/month) plus usage-based charges.

    What happened to Helicone?

    Mintlify acquired Helicone on March 3, 2026. The product remains live in maintenance mode (security updates, new models, and bug fixes continue), but active feature development has ended.

    What's the practical difference between Helicone and Langfuse?

    Helicone is a proxy: traffic routes through it, so you get logging plus gateway features like caching, with a base-URL integration. Langfuse is an open-source, SDK-based tracing tool that observes from inside your app without sitting in the request path, free to self-host with cloud plans from $29 per month.

    Can Helicone trace agent tool calls?

    Not in the span-level sense. Helicone's Sessions group related requests with metadata, but a proxy only sees the LLM calls that cross it; tool executions, framework steps, and retries inside your process never reach it. Capturing those requires in-process tracing instrumentation that records them as structured spans.

