Choosing an LLM Gateway in 2026: SaaS, Library, or Self-Hosted
By mid-2026, the LLM gateway category has settled into three architectural shapes. Choosing between them isn’t really about features — most cover the same checklist — it’s about which axis of complexity you want to absorb.
This post walks through the three categories with the trade-offs that matter when you’re evaluating one for production.
What you actually need from a gateway
Skip this section if you’ve read “What is an LLM API Gateway”.
The short version: a gateway centralizes the cross-cutting concerns of talking to multiple LLM providers — auth, routing, billing, retries, observability, prompt caching, content safety. The interesting variance is in how those concerns are packaged.
Option 1: Managed SaaS
The shape: someone else runs the gateway service. You point your SDK at their hostname, give them your provider keys (or use their consolidated pool), and they handle everything from billing to retries.
Examples include OpenRouter (closest to a pure routing layer with consolidated billing), Portkey (gateway + prompt management), and Helicone (more observability than gateway, but in the neighborhood).
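To make the shape concrete, here is a minimal sketch of what “point your SDK at their hostname” looks like, using the OpenAI Python SDK against an OpenRouter-style OpenAI-compatible endpoint. The base URL, key format, and model name are illustrative; check your operator’s docs for the real values.

```python
# Minimal sketch: the existing OpenAI SDK, re-pointed at a managed gateway.
# Base URL, key format, and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # the operator's endpoint instead of api.openai.com
    api_key="sk-or-...",                      # a key issued by the gateway, not by the provider
)

resp = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",      # the operator routes this to Anthropic for you
    messages=[{"role": "user", "content": "Hello from behind a managed gateway"}],
)
print(resp.choices[0].message.content)
```

The application code is unchanged from a direct-to-provider integration; the operator’s value lives entirely behind that hostname.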
Strong points:
- Time-to-value is fastest. You can be in production in an hour.
- Pooled provider keys give better rate-limit headroom than any single-tenant deployment can match.
- Centralized incident communication. When a provider is down, the SaaS operator sees it across all customers and can route accordingly.
- No infrastructure to maintain. No Redis, no Postgres, no on-call rotation.
Trade-offs:
- You give the operator visibility into your prompts. For most use cases this is fine; for regulated workloads, internal IP, or competitive data it’s a hard blocker.
- Per-token margin is added on top of upstream costs, typically 5%–10%. At small volumes this is negligible; past low six-figure annual spend it becomes a real number (a 10% surcharge on $200k of upstream spend is $20k a year).
- You’re at the vendor’s mercy for features. Custom routing logic, internal SSO, or specific compliance reports may not be available.
- Pricing transparency varies. Some operators make the surcharge clear; others obfuscate it with credits, tiers, or per-model markup.
Use this if you’re a small team, your prompts aren’t sensitive, you want to ship fast, and you’re optimizing for engineer-hours over LLM dollars.
Option 2: Library or SDK inside your application
The shape: a Python or TypeScript library lives inside your application, abstracting provider differences at the SDK level. There’s no separate gateway service; routing, retries, and unified APIs are all in-process.
Examples include LiteLLM (the most prominent in this category), Vercel’s AI SDK, and LangChain’s LLM module.
Strong points:
- No added network hop. The “gateway” is just function calls inside your app, so there’s no network latency overhead.
- No additional infrastructure. Deploys with your application.
- Open source, no lock-in. You can fork, modify, or replace.
- Trivial to start with: `pip install litellm`, configure providers, done (sketched below).
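For reference, a minimal sketch of the in-process shape with LiteLLM. Model identifiers are illustrative; LiteLLM reads provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, and so on) from the environment.

```python
# Minimal sketch of the library shape: one call signature, multiple upstream providers.
# Assumes OPENAI_API_KEY / ANTHROPIC_API_KEY are set in the environment.
import litellm

messages = [{"role": "user", "content": "Summarize this incident report in two sentences."}]

# Same call shape for both providers; the library translates request and response formats.
openai_resp = litellm.completion(model="gpt-4o-mini", messages=messages)
claude_resp = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

print(openai_resp.choices[0].message.content)
print(claude_resp.choices[0].message.content)
```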
Trade-offs:
- Centralization is fake. If you have five services, each one runs its own copy of the library with its own configuration. Updating a model price means redeploying five services.
- Cross-service observability is hard. “How much did feature X cost this month?” requires every service to ship usage data to a central place — which is exactly the centralization the library avoided providing.
- No multi-tenant primitive. A library has no concept of “this request belongs to user 1234, deduct from their quota.” You build that on top.
- Streaming and cross-protocol bridging are harder in-process. A separate gateway service can buffer, transform, and re-stream cleanly; an in-process library has to interleave with your application’s request-handling loop.
Use this if you’re a single application talking to multiple providers, you don’t have per-tenant billing requirements, and you want to keep operational surface area small.
Option 3: Self-hosted gateway service
The shape: a dedicated gateway service runs in your infrastructure. Applications call it over HTTP; it owns auth, routing, billing, caching, and observability as a centralized service.
Examples include Synthorai (this site’s project), Kong AI Gateway, and rolling your own from scratch.
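To make the shape concrete, here is a deliberately minimal skeleton of a self-hosted gateway endpoint, assuming FastAPI and httpx. The key registry, routes, and upstream URL are illustrative, not Synthorai’s implementation, and it omits streaming, retries, and usage accounting.

```python
# Skeleton of the self-hosted shape: a service your apps call over HTTP, which checks a
# gateway-issued key and forwards to the upstream provider with its own credentials.
# Illustrative only: no streaming, retries, billing, or real tenant storage.
import os

import httpx
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()

# Hypothetical tenant registry; a real gateway backs this with a database.
TENANT_KEYS = {"gw-team-alpha": {"team": "alpha"}}

@app.post("/v1/chat/completions")
async def chat_completions(request: Request, authorization: str = Header(...)):
    tenant = TENANT_KEYS.get(authorization.removeprefix("Bearer "))
    if tenant is None:
        raise HTTPException(status_code=401, detail="unknown gateway key")

    payload = await request.json()
    async with httpx.AsyncClient(timeout=60) as client:
        upstream = await client.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json=payload,
        )
    return upstream.json()
```

Everything interesting about a real gateway (routing, quotas, billing, caching, translation) hangs off this skeleton, which is exactly why it’s an operational commitment.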
Strong points:
- Prompts never leave your network. Important for any regulated or competitive workload.
- Real multi-tenancy. Per-user / per-team quota, BYOK, custom rate limits — these are natural to express in a dedicated service (sketched after this list).
- Custom routing logic. Route certain models to a private endpoint for compliance, or A/B test prompts across providers — easy in a dedicated service, awkward in a library.
- No surcharge tax. You pay upstream at list price; the gateway is your own infrastructure cost.
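As an example of the multi-tenancy point above, here is a minimal sketch of a per-team quota check backed by Redis. The key scheme, billing window, and limits are illustrative, not how any particular gateway implements it.

```python
# Minimal sketch of a per-tenant quota primitive using a shared Redis counter.
# Key naming, billing window, and limits are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def charge_tokens(team_id: str, tokens: int, monthly_limit: int) -> bool:
    """Atomically record usage for a team and report whether it stayed under budget."""
    key = f"usage:{team_id}:2026-06"      # one counter per team per billing month
    used = r.incrby(key, tokens)          # INCRBY is atomic across gateway replicas
    if used == tokens:                    # first write this month: let the counter age out later
        r.expire(key, 60 * 60 * 24 * 45)
    return used <= monthly_limit

# In the request path, before forwarding upstream:
if not charge_tokens("team-alpha", tokens=1_200, monthly_limit=5_000_000):
    raise RuntimeError("quota exceeded")  # the gateway would map this to HTTP 429
```

A library can’t offer this primitive out of the box because the counter has to live somewhere shared; a dedicated service already has that somewhere.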
Trade-offs:
- You operate it. Redis quota state, billing reconciliation, key rotation, monitoring — all your problem.
- Higher engineer-hour investment. Even with an open-source base, you’re taking on a production system with its own incidents and on-call surface.
- Slower time-to-value than SaaS. Days to weeks to stand up properly.
- Cross-protocol translation is your problem. OSS gateways vary in how thoroughly they implement OpenAI ↔ Anthropic ↔ Gemini translation; the long tail of streaming, tool-calling, and vision edge cases is where things break.
Use this if prompts are sensitive, you have multi-tenant requirements (team budgets, BYOK), you have the engineering capacity to operate the service, or your LLM spend is high enough that SaaS surcharges start to dominate.
Decision matrix
| Concern | Managed SaaS | Library | Self-hosted |
|---|---|---|---|
| Time to production | Hours | Hours | Days–weeks |
| Latency overhead | +30–80ms (network) | 0ms | +5–20ms (intra-VPC) |
| Cost overhead | 5–10% surcharge | None | Infra cost (~$50–500/mo) |
| Prompt privacy | Visible to operator | App-only | Internal only |
| Multi-tenant billing | Supported | DIY | Supported |
| BYOK support | Usually | DIY | Usually |
| Cross-protocol translation | Mature | Varies | Varies |
| Operational burden | None | Low | Medium–high |
| Vendor lock-in | High | None | None |
The pattern: pick the leftmost option that doesn’t hit a deal-breaker constraint. Most teams start with SaaS, hit a constraint (cost, privacy, or custom requirements), then migrate.
Migration paths
The category boundaries are softer than they look.
- SaaS → Self-hosted. Both sides usually expose an OpenAI-compatible API, so application code doesn’t change. The lift is in copying over routing rules and rebuilding the billing pipeline.
- Library → Self-hosted. Extract the routing logic from your application into a service, expose the same SDK interface. Usually a one-week project for someone familiar with the codebase.
- Self-hosted → SaaS. Less common, but happens when teams realize they don’t actually need the customization they thought they did.
The cheap insurance is to write your application against an OpenAI-compatible endpoint regardless of which option you pick. This keeps every migration path open at low cost.
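A sketch of what that insurance looks like in practice, assuming whichever gateway you choose exposes an OpenAI-compatible endpoint. The environment variable names are made up; the point is that switching gateways becomes a config change, not a code change.

```python
# Keep the client code gateway-agnostic: the endpoint comes from config.
# Environment variable names are illustrative.
import os
from openai import OpenAI

#   LLM_BASE_URL=https://openrouter.ai/api/v1      -> managed SaaS gateway
#   LLM_BASE_URL=https://llm-gateway.internal/v1   -> self-hosted gateway
#   LLM_BASE_URL unset                             -> talk to OpenAI directly
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL"),  # None falls back to the SDK default
    api_key=os.environ["LLM_API_KEY"],
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```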
A note on disclosure
This blog is published by the Synthorai project, which is a self-hosted gateway in category 3. We’re not pretending to be unbiased about the merits of that category — we built it.
But the honest read is: most teams should start in category 1 or 2. Self-hosted becomes the right answer when you specifically need things the managed-SaaS or library shapes don’t provide — BYOK at scale, prompt privacy, custom routing, or per-tenant billing. If those aren’t your constraints, the operational savings of SaaS dominate.
Closing
The “best LLM gateway” doesn’t exist as an absolute. The right gateway is the one whose trade-off curve matches your constraints — and those constraints change as you scale.
The actionable advice: pick one that gets you to production fastest, with the explicit expectation that you’ll migrate at least once. Designing your code against an OpenAI-compatible API contract makes that migration cheap whenever the constraints flip.