This page collects the practices that help teams get the most value from Agumbe AI Gateway in real deployments. Some of these recommendations concern integration design; others cover security, routing, guardrails, observability, and operational discipline. Together, they form a practical playbook for teams moving from a working prototype to a reliable production setup. If you are integrating Agumbe for the first time, you do not need to implement everything at once. Start with the basics, keep the architecture simple, and add more control as your workload grows.
1. Call the gateway from your backend
The safest and most maintainable integration pattern is:

client application -> your backend -> Agumbe AI Gateway

Your frontend, mobile app, or browser-based client should not call the gateway directly with production credentials. Instead, your backend should own the Gateway API key and be the component that makes requests to Agumbe. This gives you several important advantages:
- credentials remain server-side
- application logic stays under your control
- app selection stays consistent
- retries and error handling can be centralized
- request metadata can be added in one place
- usage and observability become easier to interpret
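A minimal sketch of this backend-owned pattern is shown below. The endpoint path, header names, and payload shape are assumptions modeled on common OpenAI-compatible gateways; substitute the values documented for your Agumbe deployment.

```python
import os

# Hypothetical base URL -- set AGUMBE_GATEWAY_URL to your real deployment.
GATEWAY_BASE_URL = os.environ.get("AGUMBE_GATEWAY_URL", "https://gateway.example.com")

def build_gateway_request(prompt: str, model_alias: str = "smart-default") -> dict:
    """Build a chat request that the backend (never the client) sends to the gateway.

    The API key is read server-side from the environment, so credentials
    never reach the frontend.
    """
    api_key = os.environ.get("AGUMBE_API_KEY", "")
    return {
        "url": f"{GATEWAY_BASE_URL}/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model_alias,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Your HTTP client of choice would then send this from the backend, e.g.:
# requests.post(req["url"], headers=req["headers"], json=req["json"], timeout=10)
```

Because the key lives only in the backend's environment, retries, error handling, and metadata enrichment can all be centralized in this one function's call path.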
2. Use app-scoped keys whenever the workload is fixed
If a service or workflow should always use the same app policy, prefer an app-scoped API key. This is usually the better choice for:
- a support assistant
- an internal knowledge workflow
- a document summarization service
- a production environment with one clear AI purpose
3. Prefer aliases over provider-specific model names
Aliases are one of the most useful features in the gateway, and they are also one of the easiest best practices to follow. Instead of filling application code with provider-specific model IDs, prefer stable Agumbe aliases such as:
- smart-default
- cheap-fast
- reasoning
- embed-default
Routing through aliases lets you:
- keep code cleaner
- reduce vendor-specific coupling
- change routing behavior without rewriting integrations
- introduce retries, fallbacks, or new defaults centrally
- standardize model choices across teams
4. Keep model selection simple at first
It is tempting to create a complex model strategy early, especially when teams are evaluating several models at once. In practice, most production systems benefit from a smaller and more deliberate starting point. A strong early pattern is:
- one default chat alias
- one default embeddings alias
- one app policy per workload
- one predictable request path
5. Start guardrails in detect mode when the risk is unclear
Guardrails are one of the strongest parts of the platform, but they are also a place where an overly aggressive policy can surprise teams if introduced too suddenly. This applies to controls such as:
- prompt injection
- indirect prompt injection
- denied topics
- output safety
- groundedness
- pii
- secrets
When the risk profile is unclear, a safe rollout sequence is:
- observe first
- redact where appropriate
- block deliberately once you understand the traffic
6. Use allowed models and token caps early
Some controls are low-friction and immediately valuable. In particular, these are worth enabling early:
- allowedModels
- maxTokens
- rateLimitPerMinute
Together, these controls help you:
- keep workloads on approved models
- reduce accidental cost spikes
- prevent oversized outputs
- limit noisy or abusive request patterns
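Agumbe enforces these controls server-side; the sketch below only mirrors their semantics so the effect of each setting is concrete. The `max_tokens_exceeded` rejection code is hypothetical (a gateway may clamp the value instead); the two `guardrail_*` codes come from the error list later on this page.

```python
# Illustrative app policy mirroring the three low-friction controls.
APP_POLICY = {
    "allowedModels": ["smart-default", "cheap-fast"],
    "maxTokens": 1024,
    "rateLimitPerMinute": 60,
}

def check_request(model: str, max_tokens: int, requests_this_minute: int):
    """Return (allowed, code) the way the gateway conceptually evaluates a request."""
    if model not in APP_POLICY["allowedModels"]:
        return False, "guardrail_model_blocked"
    if requests_this_minute >= APP_POLICY["rateLimitPerMinute"]:
        return False, "guardrail_rate_limit_exceeded"
    if max_tokens > APP_POLICY["maxTokens"]:
        # Hypothetical code: a real gateway may clamp rather than reject.
        return False, "max_tokens_exceeded"
    return True, "ok"
```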
7. Send request metadata with every important workflow
Agumbe supports request metadata fields such as:
- workspace_id
- namespace_id
- source_service
- operation
- external_request_id
Attaching this metadata to every important workflow makes it easy to see, for any request:
- which workspace or namespace it belongs to
- which service generated it
- which operation triggered it
- which external business object it belongs to
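Attaching the metadata in one place keeps it consistent. In the sketch below, the top-level `metadata` field name is an assumption; consult your gateway's request schema for the exact shape.

```python
def with_metadata(body: dict, *, workspace_id: str, source_service: str,
                  operation: str, external_request_id: str) -> dict:
    """Attach Agumbe request metadata so usage can be traced later.

    The 'metadata' envelope is assumed -- the field names inside it
    match the ones listed in the docs above.
    """
    return {
        **body,
        "metadata": {
            "workspace_id": workspace_id,
            "source_service": source_service,
            "operation": operation,
            "external_request_id": external_request_id,
        },
    }
```

Calling this in the single backend code path that talks to the gateway guarantees every important workflow is labeled.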
8. Treat request IDs as first-class operational data
Every request should be traceable. If your application has its own request or workflow ID, pass it through as external_request_id. Also make sure your own service logs include both your internal request ID and the gateway response context when possible. This makes support, debugging, and incident review much easier. A good operational rule is simple: if a user reports a bad outcome, your team should be able to trace the related gateway request quickly.
9. Watch provider latency and gateway overhead separately
When teams begin measuring AI performance, they often think only in terms of total request latency. That is not enough. Agumbe exposes timing headers that separate:
- total time
- model resolution time
- guardrail time
- provider time
- request logging time
- usage emission time
- gateway overhead
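A simple way to use these timings is to subtract provider time from total time and alert when the gateway's own share grows. The header names below are placeholders; substitute the actual timing header names your Agumbe deployment returns.

```python
def gateway_overhead_ms(headers: dict) -> float:
    """Estimate gateway overhead from response timing headers.

    'x-total-time-ms' and 'x-provider-time-ms' are assumed names --
    check the real header names in your deployment's docs.
    """
    total = float(headers.get("x-total-time-ms", 0))
    provider = float(headers.get("x-provider-time-ms", 0))
    return total - provider
```

Tracking this difference separately tells you whether a latency regression came from the provider or from resolution, guardrails, and logging inside the gateway.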
10. Handle gateway errors explicitly
Do not treat all gateway failures as generic upstream issues. Agumbe returns structured errors with fields such as:
- message
- type
- param
- code
The code field tells you what actually went wrong:
- unauthorized means the credential is missing or invalid
- invalid_model means the selected model is not valid for the endpoint
- app_mismatch means the request tried to use the wrong app context
- guardrail_model_blocked means policy blocked the model choice
- guardrail_rate_limit_exceeded means app-level rate limiting blocked the request
- route_unavailable means the gateway could not find a usable route candidate
- request_timeout means the upstream request timed out
Handle these distinctly, separating:
- retryable failures
- user-correctable failures
- policy-related failures
- operator-visible failures
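One way to keep that separation explicit is a small classification table. The error codes below come from the list above; which bucket each code belongs to is a judgment call your team should make for its own workloads, so treat this mapping as illustrative.

```python
# Illustrative mapping of documented error codes to operational categories.
ERROR_CATEGORIES = {
    "unauthorized": "operator-visible",
    "invalid_model": "user-correctable",
    "app_mismatch": "user-correctable",
    "guardrail_model_blocked": "policy-related",
    "guardrail_rate_limit_exceeded": "policy-related",
    "route_unavailable": "retryable",
    "request_timeout": "retryable",
}

def classify_error(error: dict) -> str:
    """Classify a structured gateway error body, e.g. {'code': ..., 'type': ...}.

    Unknown codes default to operator-visible so they surface for review.
    """
    return ERROR_CATEGORIES.get(error.get("code"), "operator-visible")
```

Your backend can then retry only the retryable bucket and alert on the operator-visible one, instead of blindly retrying everything.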
11. Separate development, staging, and production clearly
Use separate credentials, app policies, and operational expectations for each environment. This helps you:
- reduce blast radius
- avoid mixing request logs
- test policy changes safely
- isolate billing and usage behavior
- rotate keys without affecting unrelated systems
For example, a support assistant might use one app per environment:
- app_support_dev
- app_support_staging
- app_support_prod
12. Use the Console for testing, not for production traffic
The Agumbe Console and playground are excellent for:
- trying prompts
- validating models
- testing app policies
- inspecting logs
- understanding behavior before rollout
13. Keep route strategies intentional
Retries, fallbacks, and weighted candidates are powerful, but they should be introduced carefully. A route plan should be easy for your team to explain. Good routing design usually means:
- one clear primary model
- a small number of fallback candidates
- retries only where they help
- timeout settings that match the workload
- no unnecessary complexity
14. Tune reliability by workload type
Not every AI workload should have the same latency and retry profile. For example:
- a user-facing chat assistant may need tighter timeouts
- a background summarization job may tolerate a little more waiting
- an internal analysis task may justify a slower but more deliberate model
- an embeddings pipeline may prioritize consistency and throughput
Tune each of the following per workload rather than globally:
- model choice
- token limits
- timeouts
- retries
- fallback behavior
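In code, this often reduces to a small table of per-workload profiles that the backend consults before each call. The numbers below are examples, not recommendations from the Agumbe docs; only the alias names come from earlier sections.

```python
# Illustrative per-workload reliability profiles; tune values to your traffic.
WORKLOAD_PROFILES = {
    "user_chat":      {"model": "smart-default", "max_tokens": 512,  "timeout_s": 10, "retries": 1},
    "background_job": {"model": "cheap-fast",    "max_tokens": 2048, "timeout_s": 60, "retries": 3},
    "embeddings":     {"model": "embed-default", "max_tokens": None, "timeout_s": 30, "retries": 2},
}

def profile_for(workload: str) -> dict:
    """Look up the reliability profile for a named workload."""
    return WORKLOAD_PROFILES[workload]
```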
15. Review cost and token usage regularly
Even when applications are functioning correctly, they may still be inefficient. Make it a habit to review:
- total tokens
- output token volume
- estimated cost
- cost by workflow
- cost by model
- slow and expensive outliers
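If you export usage records from the gateway, cost by workflow and cost by model are one aggregation away. The record shape below (`workflow`, `model`, `estimated_cost` fields) is an assumed export format, not the gateway's exact schema.

```python
def cost_by_key(usage_records: list, key: str) -> dict:
    """Sum estimated cost grouped by one record field (e.g. 'workflow' or 'model')."""
    totals = {}
    for record in usage_records:
        totals[record[key]] = totals.get(record[key], 0.0) + record["estimated_cost"]
    return totals
```

Running this weekly against both keys makes cost drift and expensive outliers visible before they show up on an invoice.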
16. Roll out guardrail and routing changes gradually
Do not change too many control-plane behaviors at once. If you update:
- app policies
- model aliases
- route candidates
- timeouts
- fallback rules
then make one change at a time and verify its effect before the next. After each change, ask:
- Did this improve reliability?
- Did this increase latency?
- Did policy begin blocking unexpected traffic?
- Did cost change after the new model path?
17. Keep the integration contract stable for application teams
One of the biggest advantages of using a gateway is that product teams do not need to think about every provider detail all the time. Try to preserve that benefit. A good platform team usually offers application teams:
- stable aliases
- stable request patterns
- clear app policies
- predictable environments
- consistent observability
- well-defined error handling guidance
18. Document your app policies internally
Even when Agumbe stores and enforces the policy, your own team should still document what each app is intended to do. For each production app, it helps to record:
- what the app is used for
- which users or systems send traffic through it
- which model aliases it is expected to use
- which guardrails are enabled
- what data sensitivity applies
- what error behavior is expected
- who owns it operationally
19. Run a production smoke test before launch
Before sending meaningful production traffic, run a small end-to-end test through the real deployment path. Confirm that:
- the backend can reach the gateway
- the credential scope is correct
- the expected app policy is applied
- the alias resolves as expected
- request logs appear correctly
- timing headers are present
- errors are handled correctly
- token usage and estimated cost look reasonable
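Part of that checklist can be automated against a single real response. The header and body field names below are placeholders for whatever your deployment actually returns; extend the checks to cover alias resolution and log visibility manually.

```python
def smoke_check(response_status: int, response_headers: dict, response_body: dict) -> list:
    """Return a list of smoke-test failures for one end-to-end gateway call.

    'x-total-time-ms' and 'usage' are assumed names -- swap in the fields
    your gateway really emits.
    """
    failures = []
    if response_status != 200:
        failures.append(f"unexpected status {response_status}")
    if "x-total-time-ms" not in response_headers:
        failures.append("timing headers missing")
    if "usage" not in response_body:
        failures.append("token usage missing")
    return failures
```

An empty result means the automated portion of the smoke test passed; any entry should block the launch until explained.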
20. Start with a boring production architecture
This is probably the most important best practice on the page. The best early production setup is usually not the most advanced one; it is the most understandable one. A very strong starting pattern looks like this:
- backend integration only
- one app-scoped key per workload
- one stable alias for chat
- one stable alias for embeddings
- basic request metadata
- baseline guardrails
- request log review
- cost review
- gradual policy tightening over time
A recommended starting checklist
If your team wants a practical starting point, use this checklist:
- call the gateway from your backend
- use app-scoped keys for fixed workloads
- use aliases instead of provider-specific model names
- enable PII and secrets redaction
- start other safety controls in detect mode
- configure allowed models and token caps
- attach source service and operation metadata
- monitor request logs and timing headers
- separate dev, staging, and prod
- review cost and error trends after rollout