Request logging and observability are essential parts of running AI workloads in production. Agumbe AI Gateway does not treat inference as a black box. Every request passes through a controlled runtime layer that can record useful operational metadata, emit usage signals, and expose timing breakdowns that help teams understand what happened during execution.
For developers integrating the gateway, this means you can do more than send a prompt and receive a response. You can also inspect how traffic behaves over time, understand where latency comes from, trace requests across systems, and build better operational workflows around AI usage. For platform owners and business stakeholders, observability gives visibility into adoption, reliability, spend, and policy behavior.
Why observability matters
When AI traffic moves into production, teams quickly need answers to questions like these:
- Which app generated this request?
- Which model was actually used?
- How long did the request take?
- Was the latency caused by the model provider or by gateway-side work?
- How many tokens did the request consume?
- What was the estimated cost?
- Which requests failed, and why?
- Which operations or services are generating the most traffic?
- Are guardrails or rate limits affecting behavior?
What the gateway records
For each request, the gateway can record structured request metadata in its request log store. This includes fields such as:
- tenant ID
- user ID
- request ID
- subject type
- app ID
- workspace ID
- namespace ID
- source service
- operation
- external request ID
- request kind
- requested model
- provider
- upstream model
- status
- latency
- prompt tokens
- completion tokens
- total tokens
- estimated cost
- error code
- created time
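As an illustration, a single request log record covering these fields might look like the sketch below. The field names and values are hypothetical and are shown only to make the list above concrete; the actual keys returned by the gateway may differ.

```json
{
  "tenant_id": "tenant_123",
  "user_id": "user_456",
  "request_id": "req_abc123",
  "subject_type": "app",
  "app_id": "app_support",
  "workspace_id": "workspace_123",
  "namespace_id": "ns_support",
  "source_service": "support-api",
  "operation": "ticket_summary",
  "external_request_id": "ticket_789",
  "request_kind": "chat",
  "requested_model": "smart-default",
  "provider": "anthropic",
  "upstream_model": "claude-sonnet-4",
  "status": "success",
  "latency_ms": 1240,
  "prompt_tokens": 412,
  "completion_tokens": 183,
  "total_tokens": 595,
  "estimated_cost_usd": 0.0042,
  "error_code": null,
  "created_at": "2025-01-15T10:32:00Z"
}
```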
What request logs are for
Request logs are useful for several different audiences.
For developers
Request logs help developers:
- trace production requests
- debug failed calls
- verify that the correct app policy was applied
- understand which model actually handled the request
- compare latency across workloads
- confirm token usage patterns
For platform and operations teams
Request logs help platform teams:
- monitor system health
- identify unstable workloads
- audit model usage
- detect misuse or unexpected traffic
- understand routing outcomes
- investigate policy-related failures
For business and product stakeholders
Request logs help business-facing teams:
- understand usage patterns
- spot growth in adoption
- review estimated cost trends
- identify which product areas are generating AI traffic
- connect AI activity to customer-facing workflows
Public request log access
The gateway exposes a request log endpoint for authenticated tenants.
Endpoint: GET /api/v1/llm/requests
This endpoint returns request history scoped to the authenticated tenant.
It supports filtering and pagination through query parameters such as:
- page
- page_size
- status
- request_kind
- model
These filters support common tasks such as:
- showing recent failed requests
- reviewing only chat or embeddings traffic
- filtering by a specific model
- browsing recent request history for a production app
Example request log query
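For example, a query that browses the first page of recent chat traffic for a specific model could look like this (the parameter values are illustrative):

```
GET /api/v1/llm/requests?request_kind=chat&model=smart-default&page=1&page_size=20
```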
Important request log fields
Some fields deserve special attention because they are especially useful in production.
Request ID
Every request receives a request ID. This is one of the most important fields for debugging and tracing. If your application already has a request identifier, you can also pass your own external identifier through metadata so the gateway record can be tied back to your own systems.
Requested model
This is the model value your application asked for, such as an alias or direct model ID. Examples:
- smart-default
- reasoning
- @anthropic/claude-sonnet-4
Provider and upstream model
These fields show how the request was actually executed after model resolution. This distinction matters because the requested model may be an alias, while the upstream model is the concrete target selected by the gateway.
Status
Status helps you separate successful traffic from failing traffic. This is useful for dashboards, alerting, QA review, and operational debugging.
Token usage
For chat requests, token usage includes:
- prompt tokens
- completion tokens
- total tokens
For embeddings requests, token usage includes:
- prompt tokens
- total tokens
Estimated cost
If pricing is configured in the gateway, the request record can include an estimated cost value. This is useful for trend analysis, reporting, and budget reviews, even when downstream billing systems are separate.
Error code
If a request fails, the error code helps teams understand whether the issue came from:
- authentication
- validation
- guardrail enforcement
- rate limiting
- route configuration
- timeout
- upstream execution failure
Request metadata for better traceability
Agumbe allows callers to attach metadata to requests. This metadata can travel with the request and appear in request logs and usage events. Supported metadata fields include:
- workspace_id
- xnamespace_id
- source_service
- operation
- external_request_id
Example with request metadata
{ "model": "smart-default", "messages": [ { "role": "user", "content": "Summarize this support ticket." } ], "agumbe_metadata": { "workspace_id": "workspace_123", "xnamespace_id": "ns_support", "source_service": "support-api", "operation": "ticket_summary", "external_request_id": "ticket_789" } }
This makes the request much easier to understand later in logs and downstream reporting systems.
Timing headers
In addition to request logs, the gateway can return timing headers on successful chat and embeddings responses. These headers expose where time was spent during execution. Available headers include:
- x-agumbe-timing-total-ms
- x-agumbe-timing-model-resolve-ms
- x-agumbe-timing-guardrail-config-ms
- x-agumbe-timing-guardrail-input-ms
- x-agumbe-timing-provider-ms
- x-agumbe-timing-guardrail-output-ms
- x-agumbe-timing-request-log-ms
- x-agumbe-timing-usage-emit-ms
- x-agumbe-timing-side-effects-ms
- x-agumbe-timing-gateway-overhead-ms
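As an illustration, a successful chat response might carry timing headers like the following. The header names are the documented ones above; the millisecond values are invented for the example.

```
x-agumbe-timing-total-ms: 1240
x-agumbe-timing-model-resolve-ms: 4
x-agumbe-timing-guardrail-config-ms: 6
x-agumbe-timing-guardrail-input-ms: 18
x-agumbe-timing-provider-ms: 1100
x-agumbe-timing-guardrail-output-ms: 22
x-agumbe-timing-request-log-ms: 9
x-agumbe-timing-usage-emit-ms: 5
x-agumbe-timing-side-effects-ms: 14
x-agumbe-timing-gateway-overhead-ms: 140
```

In this illustrative response, the upstream provider call accounts for most of the total, and the remaining roughly 140 ms is gateway-side work.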
How to read timing headers
Each timing field answers a different operational question.
x-agumbe-timing-total-ms
The total end-to-end time spent handling the request. Use this when you want the full gateway execution time.
x-agumbe-timing-model-resolve-ms
The time spent resolving the requested model into a usable route target. This is usually small, but it is still useful for understanding the full request lifecycle.
x-agumbe-timing-guardrail-config-ms
The time spent loading the applicable app policy. This helps you understand the policy lookup portion of the request.
x-agumbe-timing-guardrail-input-ms
The time spent preparing and inspecting request-side content before the provider call. This includes guardrail checks on incoming prompt or embedding content.
x-agumbe-timing-provider-ms
The time spent in the upstream model provider call. This is often the most important timing field for latency analysis because it reflects the actual inference call.
x-agumbe-timing-guardrail-output-ms
The time spent inspecting and processing the generated response after the provider call. This is especially useful when groundedness or output checks are enabled.
x-agumbe-timing-request-log-ms
The time spent writing request log records.
x-agumbe-timing-usage-emit-ms
The time spent emitting usage events.
x-agumbe-timing-side-effects-ms
The total time spent on post-request side effects, such as request logging and usage emission.
x-agumbe-timing-gateway-overhead-ms
The portion of total request time that was not spent inside the upstream provider. This helps teams understand gateway-side overhead separately from provider execution time.
Estimated cost header
The gateway can also return an estimated cost header:
- x-agumbe-estimated-cost-usd
This value is useful for:
- comparing workloads
- reviewing expensive prompts
- debugging sudden cost spikes
- understanding which use cases are most resource-intensive
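For example, the header might look like this (the value is invented for illustration):

```
x-agumbe-estimated-cost-usd: 0.0042
```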
Usage events
In addition to request logs, the gateway can emit structured usage events for downstream systems. These events can include fields such as:
- tenant ID
- user ID
- request ID
- workspace ID
- namespace ID
- source service
- operation
- external request ID
- request kind
- requested model
- provider
- upstream model
- prompt tokens
- completion tokens
- total tokens
- latency
- status
- estimated cost
- timestamp
Observability and guardrails
Observability is particularly important when guardrails are enabled. Because guardrails can detect, redact, cap, block, or rate-limit requests, teams need a clear way to understand how policy enforcement affects runtime behavior. Request logs and response traces help answer questions such as:
- Was this request blocked by policy?
- Which policy set was used?
- Was content redacted before reaching the model?
- Was the response modified before being returned?
- Was the request limited because of app-level rate limits?
- Was the model blocked by an allowlist policy?
Observability and routing
Request logging is also valuable for routing analysis. The gateway records fields such as:
- requested model
- provider
- upstream model
- latency
- error code
These fields help answer routing questions such as:
- Are aliases resolving the way we expect?
- Are fallback models seeing real traffic?
- Are certain models consistently slower?
- Are failures clustering around one target?
- Is a route strategy causing cost growth?
Example operational use cases
Here are a few practical ways teams use request logs and observability.
Debug a failed support workflow
A support application returns an error to the user. The team looks up the request log, sees the request ID, confirms the app ID, and finds that the request failed with guardrail_model_blocked. This immediately tells them the issue was policy-related, not provider-related.
Investigate latency complaints
A team notices that a user-facing workflow feels slow. They inspect response headers and see that provider time is normal, but total time is much higher because of side-effect timing and policy checks. This gives them a much better starting point for tuning the workflow.
Review expensive prompts
A team looks at estimated cost and token usage across recent requests and notices that one internal operation is using much more output volume than expected. They then tighten max token settings for that app.
Connect gateway traffic to product workflows
A company uses source_service, operation, and external_request_id fields to connect gateway traffic back to product features. This helps them understand which product surfaces drive usage and cost.
Best practices
Always attach meaningful metadata
If your system has concepts like service name, workflow name, tenant workspace, or external request ID, send them with the request. This makes logs far more useful later.
Treat request IDs as first-class debugging tools
Include request IDs in your application logs and support workflows so teams can trace incidents quickly.
Watch both latency and cost
A request can succeed technically and still be operationally problematic if it is too slow or too expensive.
Separate provider time from gateway time
Do not assume all latency comes from the upstream model. Use timing headers to understand where time is actually spent.
Use observability during rollout, not only during incidents
The best time to review logs is before a problem becomes severe. Early monitoring helps teams catch model, routing, or policy issues before they affect a larger user base.
Recommended starting point
If you are moving into production, start with this observability pattern:
- use request logs for every production workload
- attach source_service, operation, and external_request_id
- monitor timing headers during rollout and performance tuning
- review token usage and estimated cost regularly
- track failed requests by error code
- use app-specific policies so logs are easier to interpret
Next steps
Once you understand request logging and observability, the next helpful pages are:
- Best Practices for production integration guidance
- API Overview for the request and response contract
- Go to Production for a broader launch checklist