This page collects the practices that help teams get the most value from Agumbe AI Gateway in real deployments. Some of these recommendations concern integration design; others cover security, routing, guardrails, observability, and operational discipline. Together, they form a practical playbook for teams moving from a working prototype to a reliable production setup. If you are integrating Agumbe for the first time, you do not need to implement everything at once. Start with the basics, keep the architecture simple, and add more control as your workload grows.
1. Call the gateway from your backend
The safest and most maintainable integration pattern is:

client application -> your backend -> Agumbe AI Gateway

Your frontend, mobile app, or browser-based client should not call the gateway directly with production credentials. Instead, your backend should own the Gateway API key and be the component that makes requests to Agumbe. This gives you several important advantages:
- credentials remain server-side
- application logic stays under your control
- app selection stays consistent
- retries and error handling can be centralized
- request metadata can be added in one place
- usage and observability become easier to interpret
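A minimal sketch of this backend-owned pattern is shown below. The endpoint path, header names, and payload shape are assumptions modeled on common OpenAI-compatible gateways; substitute the values documented for your Agumbe deployment.

```python
import os

# Hypothetical base URL -- set AGUMBE_GATEWAY_URL to your real deployment.
GATEWAY_BASE_URL = os.environ.get("AGUMBE_GATEWAY_URL", "https://gateway.example.com")

def build_gateway_request(prompt: str, model_alias: str = "smart-default") -> dict:
    """Build a chat request that the backend (never the client) sends to the gateway.

    The API key is read server-side from the environment, so credentials
    never reach the frontend.
    """
    api_key = os.environ.get("AGUMBE_API_KEY", "")
    return {
        "url": f"{GATEWAY_BASE_URL}/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model_alias,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Your HTTP client of choice would then send this from the backend, e.g.:
# requests.post(req["url"], headers=req["headers"], json=req["json"], timeout=10)
```

Because the key lives only in the backend's environment, retries, error handling, and metadata enrichment can all be centralized in this one function's call path.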
2. Use app-scoped keys whenever the workload is fixed
If a service or workflow should always use the same app policy, prefer an app-scoped API key. This is usually the better choice for:
- a support assistant
- an internal knowledge workflow
- a document summarization service
- a production environment with one clear AI purpose
3. Prefer aliases over provider-specific model names
Aliases are one of the most useful features in the gateway, and they are also one of the easiest best practices to follow. Instead of filling application code with provider-specific model IDs, prefer stable Agumbe aliases such as:
- smart-default
- cheap-fast
- reasoning
- embed-default
Routing through aliases lets you:
- keep code cleaner
- reduce vendor-specific coupling
- change routing behavior without rewriting integrations
- introduce retries, fallbacks, or new defaults centrally
- standardize model choices across teams
4. Keep model selection simple at first
It is tempting to create a complex model strategy early, especially when teams are evaluating several models at once. In practice, most production systems benefit from a smaller and more deliberate starting point. A strong early pattern is:
- one default chat alias
- one default embeddings alias
- one app policy per workload
- one predictable request path
5. Start guardrails in detect mode when the risk is unclear
Guardrails are one of the strongest parts of the platform, but they are also a place where an overly aggressive policy can surprise teams if introduced too suddenly. This applies to controls such as:
- prompt injection
- indirect prompt injection
- denied topics
- output safety
- groundedness
- pii
- secrets
When the risk profile is unclear, a safe rollout sequence is:
- observe first
- redact where appropriate
- block deliberately once you understand the traffic
6. Use allowed models and token caps early
Some controls are low-friction and immediately valuable. In particular, these are worth enabling early:
- allowedModels
- maxTokens
- rateLimitPerMinute
Together, these controls help you:
- keep workloads on approved models
- reduce accidental cost spikes
- prevent oversized outputs
- limit noisy or abusive request patterns
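Agumbe enforces these controls server-side; the sketch below only mirrors their semantics so the effect of each setting is concrete. The `max_tokens_exceeded` rejection code is hypothetical (a gateway may clamp the value instead); the two `guardrail_*` codes come from the error list later on this page.

```python
# Illustrative app policy mirroring the three low-friction controls.
APP_POLICY = {
    "allowedModels": ["smart-default", "cheap-fast"],
    "maxTokens": 1024,
    "rateLimitPerMinute": 60,
}

def check_request(model: str, max_tokens: int, requests_this_minute: int):
    """Return (allowed, code) the way the gateway conceptually evaluates a request."""
    if model not in APP_POLICY["allowedModels"]:
        return False, "guardrail_model_blocked"
    if requests_this_minute >= APP_POLICY["rateLimitPerMinute"]:
        return False, "guardrail_rate_limit_exceeded"
    if max_tokens > APP_POLICY["maxTokens"]:
        # Hypothetical code: a real gateway may clamp rather than reject.
        return False, "max_tokens_exceeded"
    return True, "ok"
```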
7. Send request metadata with every important workflow
Agumbe supports request metadata fields such as:
- workspace_id
- namespace_id
- source_service
- operation
- external_request_id
Attaching this metadata to every important workflow makes it easy to see, for any request:
- which workspace or namespace it belongs to
- which service generated it
- which operation triggered it
- which external business object it belongs to
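Attaching the metadata in one place keeps it consistent. In the sketch below, the top-level `metadata` field name is an assumption; consult your gateway's request schema for the exact shape.

```python
def with_metadata(body: dict, *, workspace_id: str, source_service: str,
                  operation: str, external_request_id: str) -> dict:
    """Attach Agumbe request metadata so usage can be traced later.

    The 'metadata' envelope is assumed -- the field names inside it
    match the ones listed in the docs above.
    """
    return {
        **body,
        "metadata": {
            "workspace_id": workspace_id,
            "source_service": source_service,
            "operation": operation,
            "external_request_id": external_request_id,
        },
    }
```

Calling this in the single backend code path that talks to the gateway guarantees every important workflow is labeled.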
8. Treat request IDs as first-class operational data
Every request should be traceable. If your application has its own request or workflow ID, pass it through as external_request_id. Also make sure your own service logs include both your internal request ID and the gateway response context when possible. This makes support, debugging, and incident review much easier. A good operational rule is simple: if a user reports a bad outcome, your team should be able to trace the related gateway request quickly.
9. Watch provider latency and gateway overhead separately
When teams begin measuring AI performance, they often think only in terms of total request latency. That is not enough. Agumbe exposes timing headers that separate:
- total time
- model resolution time
- guardrail time
- provider time
- request logging time
- usage emission time
- gateway overhead
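A simple way to use these timings is to subtract provider time from total time and alert when the gateway's own share grows. The header names below are placeholders; substitute the actual timing header names your Agumbe deployment returns.

```python
def gateway_overhead_ms(headers: dict) -> float:
    """Estimate gateway overhead from response timing headers.

    'x-total-time-ms' and 'x-provider-time-ms' are assumed names --
    check the real header names in your deployment's docs.
    """
    total = float(headers.get("x-total-time-ms", 0))
    provider = float(headers.get("x-provider-time-ms", 0))
    return total - provider
```

Tracking this difference separately tells you whether a latency regression came from the provider or from resolution, guardrails, and logging inside the gateway.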
10. Handle gateway errors explicitly
Do not treat all gateway failures as generic upstream issues. Agumbe returns structured errors with fields such as:
- message
- type
- param
- code
The code field tells you what actually went wrong:
- unauthorized means the credential is missing or invalid
- invalid_model means the selected model is not valid for the endpoint
- app_mismatch means the request tried to use the wrong app context
- guardrail_model_blocked means policy blocked the model choice
- guardrail_rate_limit_exceeded means app-level rate limiting blocked the request
- route_unavailable means the gateway could not find a usable route candidate
- request_timeout means the upstream request timed out
Handle these distinctly, separating:
- retryable failures
- user-correctable failures
- policy-related failures
- operator-visible failures
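One way to keep that separation explicit is a small classification table. The error codes below come from the list above; which bucket each code belongs to is a judgment call your team should make for its own workloads, so treat this mapping as illustrative.

```python
# Illustrative mapping of documented error codes to operational categories.
ERROR_CATEGORIES = {
    "unauthorized": "operator-visible",
    "invalid_model": "user-correctable",
    "app_mismatch": "user-correctable",
    "guardrail_model_blocked": "policy-related",
    "guardrail_rate_limit_exceeded": "policy-related",
    "route_unavailable": "retryable",
    "request_timeout": "retryable",
}

def classify_error(error: dict) -> str:
    """Classify a structured gateway error body, e.g. {'code': ..., 'type': ...}.

    Unknown codes default to operator-visible so they surface for review.
    """
    return ERROR_CATEGORIES.get(error.get("code"), "operator-visible")
```

Your backend can then retry only the retryable bucket and alert on the operator-visible one, instead of blindly retrying everything.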
11. Separate development, staging, and production clearly
Use separate credentials, app policies, and operational expectations for each environment. This helps you:
- reduce blast radius
- avoid mixing request logs
- test policy changes safely
- isolate billing and usage behavior
- rotate keys without affecting unrelated systems
For example, a support assistant might use one app per environment:
- app_support_dev
- app_support_staging
- app_support_prod
12. Use the Console for testing, not for production traffic
The Agumbe Console and playground are excellent for:
- trying prompts
- validating models
- testing app policies
- inspecting logs
- understanding behavior before rollout
13. Keep route strategies intentional
Retries, fallbacks, and weighted candidates are powerful, but they should be introduced carefully. A route plan should be easy for your team to explain. Good routing design usually means:
- one clear primary model
- a small number of fallback candidates
- retries only where they help
- timeout settings that match the workload
- no unnecessary complexity
14. Tune reliability by workload type
Not every AI workload should have the same latency and retry profile. For example:
- a user-facing chat assistant may need tighter timeouts
- a background summarization job may tolerate a little more waiting
- an internal analysis task may justify a slower but more deliberate model
- an embeddings pipeline may prioritize consistency and throughput
Tune each of the following per workload rather than globally:
- model choice
- token limits
- timeouts
- retries
- fallback behavior
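In code, this often reduces to a small table of per-workload profiles that the backend consults before each call. The numbers below are examples, not recommendations from the Agumbe docs; only the alias names come from earlier sections.

```python
# Illustrative per-workload reliability profiles; tune values to your traffic.
WORKLOAD_PROFILES = {
    "user_chat":      {"model": "smart-default", "max_tokens": 512,  "timeout_s": 10, "retries": 1},
    "background_job": {"model": "cheap-fast",    "max_tokens": 2048, "timeout_s": 60, "retries": 3},
    "embeddings":     {"model": "embed-default", "max_tokens": None, "timeout_s": 30, "retries": 2},
}

def profile_for(workload: str) -> dict:
    """Look up the reliability profile for a named workload."""
    return WORKLOAD_PROFILES[workload]
```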
15. Review cost and token usage regularly
Even when applications are functioning correctly, they may still be inefficient. Make it a habit to review:
- total tokens
- output token volume
- estimated cost
- cost by workflow
- cost by model
- slow and expensive outliers
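If you export usage records from the gateway, cost by workflow and cost by model are one aggregation away. The record shape below (`workflow`, `model`, `estimated_cost` fields) is an assumed export format, not the gateway's exact schema.

```python
def cost_by_key(usage_records: list, key: str) -> dict:
    """Sum estimated cost grouped by one record field (e.g. 'workflow' or 'model')."""
    totals = {}
    for record in usage_records:
        totals[record[key]] = totals.get(record[key], 0.0) + record["estimated_cost"]
    return totals
```

Running this weekly against both keys makes cost drift and expensive outliers visible before they show up on an invoice.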
16. Roll out guardrail and routing changes gradually
Do not change too many control-plane behaviors at once. If you update:
- app policies
- model aliases
- route candidates
- timeouts
- fallback rules
then make one change at a time and verify its effect before the next. After each change, ask:
- Did this improve reliability?
- Did this increase latency?
- Did policy begin blocking unexpected traffic?
- Did cost change after the new model path?
17. Keep the integration contract stable for application teams
One of the biggest advantages of using a gateway is that product teams do not need to think about every provider detail all the time. Try to preserve that benefit. A good platform team usually offers application teams:
- stable aliases
- stable request patterns
- clear app policies
- predictable environments
- consistent observability
- well-defined error handling guidance
18. Document your app policies internally
Even when Agumbe stores and enforces the policy, your own team should still document what each app is intended to do. For each production app, it helps to record:
- what the app is used for
- which users or systems send traffic through it
- which model aliases it is expected to use
- which guardrails are enabled
- what data sensitivity applies
- what error behavior is expected
- who owns it operationally
19. Run a production smoke test before launch
Before sending meaningful production traffic, run a small end-to-end test through the real deployment path. Confirm that:
- the backend can reach the gateway
- the credential scope is correct
- the expected app policy is applied
- the alias resolves as expected
- request logs appear correctly
- timing headers are present
- errors are handled correctly
- token usage and estimated cost look reasonable
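Part of that checklist can be automated against a single real response. The header and body field names below are placeholders for whatever your deployment actually returns; extend the checks to cover alias resolution and log visibility manually.

```python
def smoke_check(response_status: int, response_headers: dict, response_body: dict) -> list:
    """Return a list of smoke-test failures for one end-to-end gateway call.

    'x-total-time-ms' and 'usage' are assumed names -- swap in the fields
    your gateway really emits.
    """
    failures = []
    if response_status != 200:
        failures.append(f"unexpected status {response_status}")
    if "x-total-time-ms" not in response_headers:
        failures.append("timing headers missing")
    if "usage" not in response_body:
        failures.append("token usage missing")
    return failures
```

An empty result means the automated portion of the smoke test passed; any entry should block the launch until explained.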
20. Start with a boring production architecture
This is probably the most important best practice on the page. The best early production setup is usually not the most advanced one; it is the most understandable one. A very strong starting pattern looks like this:
- backend integration only
- one app-scoped key per workload
- one stable alias for chat
- one stable alias for embeddings
- basic request metadata
- baseline guardrails
- request log review
- cost review
- gradual policy tightening over time
A recommended starting checklist
If your team wants a practical starting point, use this checklist:
- call the gateway from your backend
- use app-scoped keys for fixed workloads
- use aliases instead of provider-specific model names
- enable PII and secrets redaction
- start other safety controls in detect mode
- configure allowed models and token caps
- attach source service and operation metadata
- monitor request logs and timing headers
- separate dev, staging, and prod
- review cost and error trends after rollout