

Agumbe AI Gateway gives you a stable way to work with language models without forcing your application to depend on provider-specific naming or routing behavior. When your application sends a request to the gateway, it includes a model value. That value can be either:
  • an Agumbe alias
  • a catalog-backed model ID
This design gives teams two useful modes of operation:
  • stable application-facing names for day-to-day development
  • explicit model targeting when you need direct control
For most teams, aliases are the best starting point. They keep your application code simple and make it easier to evolve routing and model strategy over time.

Why this matters

Without a gateway, model names tend to leak directly into application code. Over time, that creates friction:
  • changing models requires code changes across services
  • production traffic becomes tightly coupled to one provider’s naming scheme
  • experimenting with fallbacks or routing becomes harder
  • business teams and platform teams lose a shared vocabulary for “default” or “reasoning” behavior
Agumbe solves this by separating what your application asks for from how the platform fulfills it. Your application can ask for a stable name such as smart-default, and the gateway can resolve that to the right model and provider behind the scenes.
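As a sketch of what this separation looks like in application code (the helper name and payload shape here are illustrative, not a prescribed client API):

```python
# Hypothetical sketch: the application depends only on a stable alias,
# never on a provider-specific model name.
SMART_DEFAULT = "smart-default"  # gateway-defined alias, assumed configured

def build_chat_request(user_message: str) -> dict:
    """Build a chat completions payload that targets the alias.

    The gateway resolves the alias to a concrete model and provider,
    so this code never changes when the backing model does.
    """
    return {
        "model": SMART_DEFAULT,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("Summarize this ticket.")
print(payload["model"])  # the application only ever sees the alias
```

Swapping the backing model is then a gateway-side change, not a code change.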

Two ways to select a model

Use an alias

An alias is a stable, gateway-defined model name. Aliases are ideal when you want:
  • a clean integration experience
  • stable application code
  • room to evolve routing later
  • a platform-controlled default for quality, speed, or reasoning
Examples:
{
  "model": "smart-default"
}
{
  "model": "cheap-fast"
}
{
  "model": "reasoning"
}
{
  "model": "embed-default"
}

Use a catalog-backed model ID

A catalog-backed model ID targets a specific model entry exposed by the gateway. Use this path when you want:
  • explicit control over the model being called
  • consistency for evaluation or benchmarking
  • fine-grained model selection in a specific workload
  • a known model target for a tightly controlled use case
Example:
{
  "model": "@anthropic/claude-sonnet-4"
}
Or:
{
  "model": "@openai/gpt-5.2"
}
Agumbe also supports catalog-style convenience names that resolve through its alias layer, such as:
{
  "model": "gpt-5.2"
}

What is an alias?

An alias is a gateway-defined name that maps to a model target. You can think of an alias as a product-facing contract between your application and the gateway. Your code depends on the alias. The gateway owns the actual resolution. This gives you a better operating model:
  • your engineering team uses stable names
  • your platform team can change routing without rewriting integrations
  • your product team can standardize how workloads talk about model classes
For example, an application might use:
  • smart-default for general-purpose chat
  • cheap-fast for low-cost, low-latency tasks
  • reasoning for more complex or deliberate outputs
  • embed-default for embeddings
The meaning stays stable even if the exact backing model changes over time.

Aliases are the safest default for most customer integrations. Use aliases when you want to:
  • reduce coupling between application code and model vendors
  • make future model changes less disruptive
  • introduce routing or fallback behavior later
  • keep your prompts and services readable
  • standardize model choices across teams
For example, this is easier to reason about in a production codebase:
{
  "model": "smart-default"
}
than embedding a provider-specific model name everywhere. Aliases are especially useful for teams with multiple services, multiple environments, or evolving model strategy.

When to use explicit model IDs

There are still good reasons to call a specific model directly. Use a catalog-backed model ID when:
  • you are comparing models side by side
  • a workflow must use a specific model for regulatory, evaluation, or internal reasons
  • you are running tests or benchmarks
  • you want exact reproducibility for a tightly scoped integration
  • you are debugging routing behavior
In other words, explicit model IDs are best when precision matters more than flexibility.

Listing available models

The gateway exposes a models endpoint so your application or team can discover what is currently available.

Endpoint: GET /api/v1/llm/models

This endpoint returns a list of model entries that can include:
  • gateway aliases
  • canonical model IDs
  • provider information
  • request kind metadata
  • alias markers
Use this endpoint when you want to:
  • populate a model selector in a UI
  • validate model names during development
  • inspect which aliases are currently exposed
  • understand whether a model supports chat or embeddings
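A minimal sketch of consuming the models endpoint response, assuming a JSON body with a `data` list whose entries carry `id`, `kind`, and `is_alias` fields (the exact field names may differ; check your gateway's actual schema):

```python
import json

# Assumed response shape for GET /api/v1/llm/models; illustrative only.
sample_response = json.loads("""
{
  "data": [
    {"id": "smart-default", "kind": "chat", "is_alias": true},
    {"id": "embed-default", "kind": "embeddings", "is_alias": true},
    {"id": "@anthropic/claude-sonnet-4", "kind": "chat", "is_alias": false}
  ]
}
""")

def chat_models(models_response: dict) -> list:
    """Return IDs of entries usable with the chat completions endpoint,
    e.g. to populate a model selector in a UI."""
    return [m["id"] for m in models_response["data"] if m["kind"] == "chat"]

print(chat_models(sample_response))
```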

Model kinds

Agumbe separates models into two request kinds:
  • chat
  • embeddings
This distinction matters because not every model can be used with every endpoint.

Chat models

Chat models are used with:

POST /api/v1/llm/chat/completions

These models support conversational or generative responses.

Embeddings models

Embeddings models are used with:

POST /api/v1/llm/embeddings

These models produce vector embeddings for search, retrieval, clustering, classification, and related use cases.

If a model resolves to the wrong kind for the endpoint you are calling, the gateway rejects the request. For example:
  • a chat-only model cannot be used with the embeddings endpoint
  • an embeddings-only model cannot be used with the chat completions endpoint
This protects applications from sending invalid traffic.
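If you want to fail fast in your own code, you can mirror the gateway's kind check client-side. This is a hypothetical guard, not part of the gateway API:

```python
# Illustrative client-side guard mirroring the gateway's kind check.
ENDPOINT_FOR_KIND = {
    "chat": "/api/v1/llm/chat/completions",
    "embeddings": "/api/v1/llm/embeddings",
}

def validate_kind(model_kind: str, endpoint: str) -> None:
    """Raise early if the model's kind does not match the endpoint,
    instead of waiting for the gateway to reject the request."""
    expected = ENDPOINT_FOR_KIND.get(model_kind)
    if expected != endpoint:
        raise ValueError(
            f"model kind {model_kind!r} cannot be used with {endpoint}"
        )

validate_kind("chat", "/api/v1/llm/chat/completions")  # passes silently
```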

How model resolution works

When your request reaches the gateway, the gateway resolves the model field before making any provider call. At a high level, resolution works like this:
  1. the gateway reads the requested model value
  2. it checks whether the value is an alias
  3. if it is an alias, the gateway resolves it to its target
  4. it validates whether the resolved model supports the requested endpoint
  5. it prepares the route plan for execution
  6. it forwards the request to the selected provider adapter
This means your application does not need to understand provider-specific request routing. The gateway handles that part for you.
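The steps above can be sketched as a small resolution function. The alias and kind tables here are stand-ins, not the gateway's internal implementation:

```python
# Illustrative sketch of the resolution steps; the mappings are assumptions.
ALIASES = {
    "smart-default": "@openai/gpt-5.2",
    "embed-default": "@openai/text-embedding-3-small",
}
MODEL_KINDS = {
    "@openai/gpt-5.2": "chat",
    "@openai/text-embedding-3-small": "embeddings",
}

def resolve(model: str, requested_kind: str) -> str:
    # Steps 1-3: read the requested value and resolve it if it is an alias.
    target = ALIASES.get(model, model)
    # Step 4: validate that the resolved model supports the endpoint.
    kind = MODEL_KINDS.get(target)
    if kind is None:
        raise ValueError(f"unknown model: {model}")
    if kind != requested_kind:
        raise ValueError(f"{model} resolves to {kind}, not {requested_kind}")
    # Steps 5-6: from here the gateway would build the route plan and
    # forward to the provider adapter.
    return target

print(resolve("smart-default", "chat"))
```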

Routing and aliases

Aliases become even more valuable once routing behavior becomes more sophisticated. Because an alias is a stable gateway name, Agumbe can attach routing logic to it over time. That may include:
  • a preferred primary model
  • retries
  • fallbacks
  • weighted candidate selection
  • future reliability rules
This is one of the strongest reasons to use aliases in production. They give the platform room to improve reliability and model strategy without changing your application contract.
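To make weighted candidate selection with fallbacks concrete, here is one way such a route plan could behave. The config shape is purely hypothetical and not Agumbe's actual routing format:

```python
import random

# Hypothetical route plan attached to an alias: weighted candidates,
# with the unchosen ones kept as ordered fallbacks.
ROUTE = {
    "candidates": [
        ("@openai/gpt-5.2", 0.8),
        ("@anthropic/claude-sonnet-4", 0.2),
    ],
}

def pick_candidates(route: dict, rng: random.Random) -> list:
    """Pick a weighted primary, then append the remaining models
    in order as fallbacks for retry logic."""
    models = [m for m, _ in route["candidates"]]
    weights = [w for _, w in route["candidates"]]
    primary = rng.choices(models, weights=weights, k=1)[0]
    return [primary] + [m for m in models if m != primary]

plan = pick_candidates(ROUTE, random.Random(0))
print(plan)  # primary first, fallbacks after
```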

Common alias patterns

The exact alias set in your gateway may evolve, but a typical setup includes names like these:

smart-default

Use for general-purpose chat tasks where you want a strong default model. This is a good fit for:
  • assistants
  • summaries
  • classification with reasoning
  • customer support workflows
  • general product features

cheap-fast

Use for lightweight tasks where low latency or lower cost matters more than maximum reasoning depth. This is a good fit for:
  • simple rewriting
  • tagging
  • short transformations
  • low-cost automation
  • bulk operational tasks

reasoning

Use for tasks that benefit from more deliberate reasoning or stronger structured thinking. This is a good fit for:
  • analysis
  • decision support
  • complex multi-step synthesis
  • workflows where answer quality matters more than speed

embed-default

Use for embeddings use cases where you want a stable vectorization default. This is a good fit for:
  • semantic search
  • retrieval pipelines
  • clustering
  • recommendation systems
  • document similarity

Example: using an alias in a chat request

{
  "model": "smart-default",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant."
    },
    {
      "role": "user",
      "content": "Summarize this support ticket in one paragraph."
    }
  ],
  "max_completion_tokens": 180
}
This is the recommended pattern for most product traffic.

Example: using a specific model in a chat request

{
  "model": "@anthropic/claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "Compare the tradeoffs between two architecture options."
    }
  ],
  "max_completion_tokens": 300
}
This is useful when you want direct model targeting.

Example: using an embeddings alias

{
  "model": "embed-default",
  "input": "Agumbe AI Gateway centralizes model access, guardrails, and observability."
}

Example: using a specific embeddings model

{
  "model": "@openai/text-embedding-3-small",
  "input": "Agumbe AI Gateway centralizes model access, guardrails, and observability."
}

Choosing the right model strategy

For most teams, the best model strategy is simple.

Start with aliases

Use aliases when:
  • you are integrating the gateway for the first time
  • you want clean application code
  • you want the gateway to own model evolution
  • you want room to improve routing later

Use explicit models selectively

Use specific model IDs when:
  • you need exact control
  • you are benchmarking
  • you are validating prompt behavior across models
  • you are debugging or evaluating quality differences
A good long-term pattern is to use aliases for production application traffic and explicit models for internal testing or experimentation.

Models and guardrails

Model selection and guardrails work together. Guardrail policies can include an allowed model list. When this is configured, the gateway checks the requested or resolved model against the allowed set before continuing. This means:
  • model choice is not only a product concern
  • it is also part of policy enforcement
For example, a team might allow only a small set of approved models for a given app. In that setup, the gateway rejects requests that attempt to use disallowed models, even if the caller is otherwise authenticated. This helps teams control risk, quality, and cost at the app level.
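A sketch of what an allowed-model check could look like; the policy shape below is illustrative, not Agumbe's actual guardrail schema:

```python
# Hypothetical app-level policy check against an allowed model list.
def is_model_allowed(resolved_model: str, policy: dict) -> bool:
    """Return True when no allowlist is configured, or when the
    resolved model appears on the allowlist."""
    allowed = policy.get("allowed_models")
    if allowed is None:
        return True  # no allowlist configured: any catalog model passes
    return resolved_model in allowed

policy = {"allowed_models": ["smart-default", "@openai/gpt-5.2"]}
print(is_model_allowed("smart-default", policy))   # on the allowlist
print(is_model_allowed("@openai/gpt-4o", policy))  # rejected by policy
```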

Models and production environments

As your platform grows, model strategy often becomes environment-specific. A common pattern is:
  • development uses a faster or cheaper alias
  • staging uses a more production-like alias
  • production uses a stable default alias with stronger guardrails
This works well because aliases let you keep application logic consistent while changing the backing model strategy per environment if needed.
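One common way to wire this up is a per-environment alias lookup in application config. The environment variable name and alias choices here are assumptions following the patterns above:

```python
import os

# Illustrative per-environment alias mapping; APP_ENV is an assumed
# variable name, and the aliases follow the patterns described above.
ALIAS_BY_ENV = {
    "development": "cheap-fast",
    "staging": "smart-default",
    "production": "smart-default",
}

def model_for_env(env: str = "") -> str:
    """Pick the alias for the current environment; application code
    stays identical across environments."""
    env = env or os.environ.get("APP_ENV", "development")
    return ALIAS_BY_ENV.get(env, "smart-default")

print(model_for_env("development"))  # cheap-fast
```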

Best practices

Prefer aliases for production workloads

Aliases reduce churn in your codebase and make model strategy easier to evolve.

Use the models endpoint for discovery

Do not hardcode assumptions about available models if your system needs to stay in sync with the gateway catalog.

Keep chat and embeddings clearly separated

Choose models that match the endpoint you are calling.

Combine aliases with guardrails

Model selection becomes much safer when each app has an explicit policy around allowed models, token limits, and rate limits.

Avoid scattering provider-specific names across your application

If every service chooses its own direct model ID, the platform becomes harder to standardize and govern.

Common errors

If a model cannot be resolved or does not match the endpoint, the gateway returns a structured error. Example:
{
  "error": {
    "message": "Model embed-default resolves to embeddings and cannot be used with the chat endpoint",
    "type": "invalid_request_error",
    "param": "model",
    "code": "invalid_model"
  }
}
You may also see a model-related error if:
  • the requested model is unknown
  • an alias points to an invalid target
  • the model is blocked by guardrail allowlists
  • no routing candidates are configured for the resolved model
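A small sketch of handling this error shape client-side, assuming only the fields shown in the example body above:

```python
# Sketch of inspecting the structured error body; only the fields from
# the example above (error.message, type, param, code) are assumed.
def classify_model_error(body: dict) -> str:
    """Return a short, actionable hint for model-related errors."""
    err = body.get("error", {})
    if err.get("param") != "model":
        return "not a model error"
    if err.get("code") == "invalid_model":
        return "check the model name against GET /api/v1/llm/models"
    return "model error: " + err.get("message", "unknown")

body = {
    "error": {
        "message": "Model embed-default resolves to embeddings and cannot be used with the chat endpoint",
        "type": "invalid_request_error",
        "param": "model",
        "code": "invalid_model",
    }
}
print(classify_model_error(body))
```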
If you are integrating Agumbe AI Gateway for the first time, start with:
  • smart-default for general chat
  • embed-default for embeddings
  • app-level guardrails
  • request logs enabled in your operational workflow
  • explicit model IDs only when you need direct targeting
This gives you the cleanest balance of simplicity, flexibility, and long-term maintainability.

Next steps

Once you understand model selection, the next pages to read are:
  • Guardrails to learn how app policies shape model usage
  • Routing and Reliability to understand how model resolution and execution behave in production
  • Request Logging and Observability to see how model usage appears in request records and operational flows