

This page collects the practices that help teams get the most value from Agumbe AI Gateway in real deployments. Some of these recommendations are about integration design. Others are about security, routing, guardrails, observability, and operational discipline. Together, they form a practical playbook for teams that want to move from a working prototype to a reliable production setup. If you are integrating Agumbe for the first time, you do not need to implement everything at once. Start with the basics, keep the architecture simple, and add more control as your workload grows.

1. Call the gateway from your backend

The safest and most maintainable integration pattern is:

client application -> your backend -> Agumbe AI Gateway

Your frontend, mobile app, or browser-based client should not call the gateway directly with production credentials. Instead, your backend should own the Gateway API key and be the component that makes requests to Agumbe. This gives you several important advantages:
  • credentials remain server-side
  • application logic stays under your control
  • app selection stays consistent
  • retries and error handling can be centralized
  • request metadata can be added in one place
  • usage and observability become easier to interpret
For most teams, this should be the default production pattern.
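As a sketch of this pattern, the backend below assembles the outbound gateway request with a server-side key. The endpoint URL, header names, and body shape here are assumptions in an OpenAI-compatible style, not Agumbe's confirmed API; check your deployment's reference for the real values.

```python
import os

# Hypothetical gateway endpoint -- substitute your deployment's real URL.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"

def build_gateway_request(user_message: str) -> dict:
    """Build the outbound request the backend sends to the gateway.

    The Gateway API key is read from the server environment, so it is never
    shipped to a frontend or mobile client. Clients only talk to this backend.
    """
    return {
        "url": GATEWAY_URL,
        "headers": {
            # Credential stays server-side.
            "Authorization": f"Bearer {os.environ['AGUMBE_API_KEY']}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": "smart-default",  # stable alias, not a provider model ID
            "messages": [{"role": "user", "content": user_message}],
        },
    }
```

Because retries, error handling, and metadata all hang off this one function, they can be centralized exactly as the list above recommends.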

2. Use app-scoped keys whenever the workload is fixed

If a service or workflow should always use the same app policy, prefer an app-scoped API key. This is usually the better choice for:
  • a support assistant
  • an internal knowledge workflow
  • a document summarization service
  • a production environment with one clear AI purpose
App-scoped keys reduce ambiguity because the app policy is bound to the credential itself. Your service does not need to choose the app on every request, and the chance of using the wrong policy is much lower. Use tenant-scoped keys only when one service truly needs to operate across multiple app contexts.

3. Prefer aliases over provider-specific model names

Aliases are one of the most useful features in the gateway, and they are also one of the easiest best practices to follow. Instead of filling application code with provider-specific model IDs, prefer stable Agumbe aliases such as:
  • smart-default
  • cheap-fast
  • reasoning
  • embed-default
Aliases help you:
  • keep code cleaner
  • reduce vendor-specific coupling
  • change routing behavior without rewriting integrations
  • introduce retries, fallbacks, or new defaults centrally
  • standardize model choices across teams
Use direct model IDs only when you need exact targeting for testing, benchmarking, or tightly controlled workloads.
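One low-effort way to follow this practice is to centralize the aliases as named constants, so application code never embeds a provider-specific model ID. The helper below is a minimal sketch using the aliases listed above:

```python
# Central alias registry: application code imports these names instead of
# hard-coding provider model IDs. Routing behind an alias can then change
# on the gateway side without touching any integration.
CHAT_DEFAULT = "smart-default"
CHAT_CHEAP = "cheap-fast"
CHAT_REASONING = "reasoning"
EMBEDDINGS_DEFAULT = "embed-default"

def chat_payload(messages: list, alias: str = CHAT_DEFAULT) -> dict:
    """Assemble a chat request body against a stable alias."""
    return {"model": alias, "messages": messages}
```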

4. Keep model selection simple at first

It is tempting to create a complex model strategy early, especially when teams are evaluating several models at once. In practice, most production systems benefit from a smaller and more deliberate starting point. A strong early pattern is:
  • one default chat alias
  • one default embeddings alias
  • one app policy per workload
  • one predictable request path
This keeps behavior easier to understand and makes logs, costs, and incidents easier to interpret. Complex routing should be added because you need it, not because it is available.

5. Start guardrails in detect mode when the risk is unclear

Guardrails are one of the strongest parts of the platform, but they are also a place where overly aggressive policy can surprise teams if introduced too suddenly. For controls such as:
  • prompt injection
  • indirect prompt injection
  • denied topics
  • output safety
  • groundedness
it is often wise to begin with detect. This gives you visibility into how real traffic behaves before you move to block. For sensitive-data controls such as:
  • pii
  • secrets
a strong practical default is often redact. That approach gives teams a safe and useful rollout path:
  • observe first
  • redact where appropriate
  • block deliberately once you understand the traffic
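The rollout path above can be expressed as a staged policy. The control names come from the lists in this section; the flat mode-per-control layout is purely illustrative, not Agumbe's actual policy schema.

```python
# Staged guardrail policy: content controls start in "detect" for visibility,
# sensitive-data controls default to "redact". Layout is illustrative only.
GUARDRAILS_INITIAL = {
    "prompt_injection": "detect",
    "indirect_prompt_injection": "detect",
    "denied_topics": "detect",
    "output_safety": "detect",
    "groundedness": "detect",
    "pii": "redact",
    "secrets": "redact",
}

def tighten(policy: dict, control: str) -> dict:
    """Promote one control to block -- deliberately, one at a time."""
    updated = dict(policy)
    updated[control] = "block"
    return updated
```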

6. Use allowed models and token caps early

Some controls are low-friction and immediately valuable. In particular, these are worth enabling early:
  • allowedModels
  • maxTokens
  • rateLimitPerMinute
These settings help you:
  • keep workloads on approved models
  • reduce accidental cost spikes
  • prevent oversized outputs
  • limit noisy or abusive request patterns
Even if your content policies are still evolving, these operational guardrails usually provide value right away.
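The gateway enforces these settings authoritatively, but a backend can mirror them to fail fast with clearer errors before a request ever leaves the service. A sketch, using the setting names above with illustrative values:

```python
# Client-side mirror of an app policy's operational controls. The gateway
# remains the authoritative enforcer; this just rejects bad requests early.
APP_POLICY = {
    "allowedModels": ["smart-default", "cheap-fast"],
    "maxTokens": 1024,
    "rateLimitPerMinute": 60,
}

def check_request(model: str, max_tokens: int, policy: dict = APP_POLICY) -> None:
    """Raise before sending if the request would violate the app policy."""
    if model not in policy["allowedModels"]:
        raise ValueError(f"model {model!r} is not in allowedModels")
    if max_tokens > policy["maxTokens"]:
        raise ValueError(f"max_tokens {max_tokens} exceeds cap {policy['maxTokens']}")
```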

7. Send request metadata with every important workflow

Agumbe supports request metadata fields such as:
  • workspace_id
  • namespace_id
  • source_service
  • operation
  • external_request_id
Use them. These fields make request logs dramatically more useful because they connect gateway traffic to the systems and workflows your team already understands. For example, it is much easier to investigate production traffic when a log record tells you:
  • which service generated it
  • which operation triggered it
  • which external business object it belongs to
Without metadata, request logs are still useful. With metadata, they become much more actionable.
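Because the backend owns the request path, metadata can be attached in one place. The helper below uses field names from the list above; where the metadata travels in the request body, and the placeholder service name, are assumptions to adapt to your deployment.

```python
import uuid

def with_metadata(body: dict, operation: str, external_request_id=None) -> dict:
    """Attach request metadata centrally in the backend.

    Field names follow the metadata fields listed above; the service name is
    a placeholder for your own identifier.
    """
    enriched = dict(body)
    enriched["metadata"] = {
        "source_service": "support-backend",  # which service generated it
        "operation": operation,               # which operation triggered it
        "external_request_id": external_request_id or str(uuid.uuid4()),
    }
    return enriched
```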

8. Treat request IDs as first-class operational data

Every request should be traceable. If your application has its own request or workflow ID, pass it through as external_request_id. Also make sure your own service logs include both your internal request ID and the gateway response context when possible. This makes support, debugging, and incident review much easier. A good operational rule is simple: if a user reports a bad outcome, your team should be able to trace the related gateway request quickly.

9. Watch provider latency and gateway overhead separately

When teams begin measuring AI performance, they often think only in terms of total request latency. That is not enough. Agumbe exposes timing headers that separate:
  • total time
  • model resolution time
  • guardrail time
  • provider time
  • request logging time
  • usage emission time
  • gateway overhead
Use that distinction. If a request is slow, the next question should not be “was the model slow?” It should be “where was the time spent?” This helps teams make much better tuning decisions.
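In practice this means computing overhead from the timing headers rather than staring at the total. The header names below are hypothetical placeholders, not Agumbe's documented names; the structure of the calculation is the point.

```python
# Hypothetical timing header names -- substitute the ones your deployment
# actually returns. The goal is separating provider time from gateway
# overhead instead of looking only at the total.
def where_was_the_time(headers: dict) -> dict:
    total = float(headers["x-total-ms"])
    provider = float(headers["x-provider-ms"])
    guardrails = float(headers["x-guardrail-ms"])
    return {
        "provider_ms": provider,
        "guardrail_ms": guardrails,
        "gateway_overhead_ms": total - provider,
        "provider_share": provider / total,
    }
```

If `provider_share` is high, tuning means model or prompt changes; if overhead dominates, the conversation is about guardrails, logging, or routing instead.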

10. Handle gateway errors explicitly

Do not treat all gateway failures as generic upstream issues. Agumbe returns structured errors with fields such as:
  • message
  • type
  • param
  • code
Use the error code to categorize failures. For example:
  • unauthorized means the credential is missing or invalid
  • invalid_model means the selected model is not valid for the endpoint
  • app_mismatch means the request tried to use the wrong app context
  • guardrail_model_blocked means policy blocked the model choice
  • guardrail_rate_limit_exceeded means app-level rate limiting blocked the request
  • route_unavailable means the gateway could not find a usable route candidate
  • request_timeout means the upstream request timed out
A clean integration should distinguish between:
  • retryable failures
  • user-correctable failures
  • policy-related failures
  • operator-visible failures
That produces better application behavior and much clearer debugging.
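The mapping from error codes to failure classes can live in one small function. The codes below are the ones listed in this section; which of them your service treats as retryable is a judgment call for your workload, so adjust the sets accordingly.

```python
# Map the gateway's structured error codes onto the failure classes a clean
# integration should distinguish. Set membership is a per-workload decision.
RETRYABLE = {"route_unavailable", "request_timeout"}
USER_CORRECTABLE = {"unauthorized", "invalid_model", "app_mismatch"}
POLICY = {"guardrail_model_blocked", "guardrail_rate_limit_exceeded"}

def classify(code: str) -> str:
    if code in RETRYABLE:
        return "retryable"
    if code in USER_CORRECTABLE:
        return "user-correctable"
    if code in POLICY:
        return "policy"
    return "operator-visible"  # unknown codes surface to operators
```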

11. Separate development, staging, and production clearly

Use separate credentials, app policies, and operational expectations for each environment. This helps you:
  • reduce blast radius
  • avoid mixing request logs
  • test policy changes safely
  • isolate billing and usage behavior
  • rotate keys without affecting unrelated systems
A simple naming pattern is often enough:
  • app_support_dev
  • app_support_staging
  • app_support_prod
Even small teams benefit from this discipline.

12. Use the Console for testing, not for production traffic

The Agumbe Console and playground are excellent for:
  • trying prompts
  • validating models
  • testing app policies
  • inspecting logs
  • understanding behavior before rollout
But production application traffic should still go through your own backend integration. Think of the Console as the control plane and test surface, not as the long-term execution path for your product traffic.

13. Keep route strategies intentional

Retries, fallbacks, and weighted candidates are powerful, but they should be introduced carefully. A route plan should be easy for your team to explain. Good routing design usually means:
  • one clear primary model
  • a small number of fallback candidates
  • retries only where they help
  • timeout settings that match the workload
  • no unnecessary complexity
If a route strategy becomes hard to explain, it will probably become hard to operate.
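A route plan that fits the criteria above can be small enough to review at a glance. The schema and numbers here are a sketch, not the gateway's actual configuration format:

```python
# Illustrative route plan: one clear primary, one fallback, retries only
# where they help. If this dict needs a diagram to explain, it is too big.
ROUTE_PLAN = {
    "primary": {"model": "smart-default", "timeout_ms": 15000, "retries": 1},
    "fallbacks": [
        {"model": "cheap-fast", "timeout_ms": 10000, "retries": 0},
    ],
}

def explainable(plan: dict, max_fallbacks: int = 2) -> bool:
    """Rough complexity check: few candidates, bounded retries."""
    candidates = [plan["primary"], *plan["fallbacks"]]
    return (len(plan["fallbacks"]) <= max_fallbacks
            and all(c["retries"] <= 2 for c in candidates))
```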

14. Tune reliability by workload type

Not every AI workload should have the same latency and retry profile. For example:
  • a user-facing chat assistant may need tighter timeouts
  • a background summarization job may tolerate a little more waiting
  • an internal analysis task may justify a slower but more deliberate model
  • an embeddings pipeline may prioritize consistency and throughput
Try to align:
  • model choice
  • token limits
  • timeouts
  • retries
  • fallback behavior
with the business purpose of the workload.
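One way to keep that alignment visible is a per-workload profile table. The numbers below are illustrative starting points, not recommendations:

```python
# Per-workload reliability profiles: timeouts and retries follow the
# business purpose of each path. Values are illustrative only.
PROFILES = {
    "chat_assistant":       {"alias": "smart-default", "timeout_ms": 8000,  "retries": 0},
    "background_summarize": {"alias": "cheap-fast",    "timeout_ms": 30000, "retries": 2},
    "embeddings_pipeline":  {"alias": "embed-default", "timeout_ms": 20000, "retries": 2},
}
```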

15. Review cost and token usage regularly

Even when applications are functioning correctly, they may still be inefficient. Make it a habit to review:
  • total tokens
  • output token volume
  • estimated cost
  • cost by workflow
  • cost by model
  • slow and expensive outliers
In many systems, prompt sprawl or overly large outputs cause cost growth long before anyone notices it at the product level. Small adjustments to prompts, token caps, or model choice can often reduce cost significantly without hurting outcomes.
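Reviews like this are easiest when cost can be sliced by the metadata already attached to each request. A minimal aggregation sketch, assuming each log record carries an `operation` field and an `estimated_cost` value:

```python
from collections import defaultdict

def cost_by_workflow(records: list) -> dict:
    """Aggregate estimated cost per workflow from request-log records.

    Assumes each record has `operation` metadata and an `estimated_cost`
    value; adapt the field names to your actual log schema.
    """
    totals = defaultdict(float)
    for r in records:
        totals[r["operation"]] += r["estimated_cost"]
    return dict(totals)
```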

16. Roll out guardrail and routing changes gradually

Do not change too many control-plane behaviors at once. If you update:
  • app policies
  • model aliases
  • route candidates
  • timeouts
  • fallback rules
introduce changes in a way your team can observe. A gradual rollout makes it easier to answer questions like:
  • Did this improve reliability?
  • Did this increase latency?
  • Did policy begin blocking unexpected traffic?
  • Did cost change after the new model path?
Observability is most useful when changes are staged clearly enough to interpret.

17. Keep the integration contract stable for application teams

One of the biggest advantages of using a gateway is that product teams do not need to think about every provider detail all the time. Try to preserve that benefit. A good platform team usually offers application teams:
  • stable aliases
  • stable request patterns
  • clear app policies
  • predictable environments
  • consistent observability
  • well-defined error handling guidance
The more stable the gateway contract is, the easier it is for other teams to adopt it confidently.

18. Document your app policies internally

Even when Agumbe stores and enforces the policy, your own team should still document what each app is intended to do. For each production app, it helps to record:
  • what the app is used for
  • which users or systems send traffic through it
  • which model aliases it is expected to use
  • which guardrails are enabled
  • what data sensitivity applies
  • what error behavior is expected
  • who owns it operationally
This turns the gateway from a technical integration into a manageable platform surface.

19. Run a production smoke test before launch

Before sending meaningful production traffic, run a small end-to-end test through the real deployment path. Confirm that:
  • the backend can reach the gateway
  • the credential scope is correct
  • the expected app policy is applied
  • the alias resolves as expected
  • request logs appear correctly
  • timing headers are present
  • errors are handled correctly
  • token usage and estimated cost look reasonable
A small smoke test catches many issues before users do.
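The checklist above can be encoded as a function run against the first real end-to-end response. The field names here are illustrative placeholders; map them onto the response shape and headers your deployment actually returns.

```python
def smoke_check(response: dict) -> list:
    """Check one end-to-end test response against the launch checklist.

    Returns a list of failures, so an empty list means the smoke test passed.
    Field names are illustrative -- adapt to your deployment's response shape.
    """
    problems = []
    if response.get("status") != 200:
        problems.append("backend could not reach the gateway")
    if not response.get("app_id"):
        problems.append("expected app policy was not applied")
    if not response.get("timing_headers"):
        problems.append("timing headers are missing")
    if response.get("usage", {}).get("total_tokens", 0) <= 0:
        problems.append("token usage looks wrong")
    return problems
```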

20. Start with a boring production architecture

This is probably the most important best practice on the page. The best early production setup is usually not the most advanced one. It is the most understandable one. A very strong starting pattern looks like this:
  • backend integration only
  • one app-scoped key per workload
  • one stable alias for chat
  • one stable alias for embeddings
  • basic request metadata
  • baseline guardrails
  • request log review
  • cost review
  • gradual policy tightening over time
That setup is not flashy, but it is dependable. Dependability is what production systems need most. If your team wants a practical starting point, use this checklist:
  • call the gateway from your backend
  • use app-scoped keys for fixed workloads
  • use aliases instead of provider-specific model names
  • enable PII and secrets redaction
  • start other safety controls in detect mode
  • configure allowed models and token caps
  • attach source service and operation metadata
  • monitor request logs and timing headers
  • separate dev, staging, and prod
  • review cost and error trends after rollout

Final recommendation

The teams that succeed fastest with Agumbe usually do not try to use every feature at once. They choose a small number of strong defaults, make the request path observable, and evolve policy and routing based on real traffic. That is the right mindset for production AI systems. Use the gateway to simplify what applications need to know, centralize what platform teams need to control, and make AI traffic easier to govern as usage grows.