Request logging and observability are essential parts of running AI workloads in production. Agumbe AI Gateway does not treat inference as a black box. Every request passes through a controlled runtime layer that can record useful operational metadata, emit usage signals, and expose timing breakdowns that help teams understand what happened during execution.
For developers integrating the gateway, this means you can do more than send a prompt and receive a response. You can also inspect how traffic behaves over time, understand where latency comes from, trace requests across systems, and build better operational workflows around AI usage. For platform owners and business stakeholders, observability gives visibility into adoption, reliability, spend, and policy behavior.
Why observability matters
When AI traffic moves into production, teams quickly need answers to questions like these:
- Which app generated this request?
- Which model was actually used?
- How long did the request take?
- Was the latency caused by the model provider or by gateway-side work?
- How many tokens did the request consume?
- What was the estimated cost?
- Which requests failed, and why?
- Which operations or services are generating the most traffic?
- Are guardrails or rate limits affecting behavior?
What the gateway records
For each request, the gateway can record structured request metadata in its request log store. This includes fields such as:
- tenant ID
- user ID
- request ID
- subject type
- app ID
- workspace ID
- namespace ID
- source service
- operation
- external request ID
- request kind
- requested model
- provider
- upstream model
- status
- latency
- prompt tokens
- completion tokens
- total tokens
- estimated cost
- error code
- created time
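As an illustration, a single request log record covering these fields might look like the sketch below. The field names and values are hypothetical and are shown only to make the list above concrete; the actual keys returned by the gateway may differ.

```json
{
  "tenant_id": "tenant_123",
  "user_id": "user_456",
  "request_id": "req_abc123",
  "subject_type": "app",
  "app_id": "app_support",
  "workspace_id": "workspace_123",
  "namespace_id": "ns_support",
  "source_service": "support-api",
  "operation": "ticket_summary",
  "external_request_id": "ticket_789",
  "request_kind": "chat",
  "requested_model": "smart-default",
  "provider": "anthropic",
  "upstream_model": "claude-sonnet-4",
  "status": "success",
  "latency_ms": 1240,
  "prompt_tokens": 412,
  "completion_tokens": 183,
  "total_tokens": 595,
  "estimated_cost_usd": 0.0042,
  "error_code": null,
  "created_at": "2025-01-15T10:32:00Z"
}
```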
What request logs are for
Request logs are useful for several different audiences.
For developers
Request logs help developers:
- trace production requests
- debug failed calls
- verify that the correct app policy was applied
- understand which model actually handled the request
- compare latency across workloads
- confirm token usage patterns
For platform and operations teams
Request logs help platform teams:
- monitor system health
- identify unstable workloads
- audit model usage
- detect misuse or unexpected traffic
- understand routing outcomes
- investigate policy-related failures
For business and product stakeholders
Request logs help business-facing teams:
- understand usage patterns
- spot growth in adoption
- review estimated cost trends
- identify which product areas are generating AI traffic
- connect AI activity to customer-facing workflows
Public request log access
The gateway exposes a request log endpoint for authenticated tenants.
Endpoint: GET /api/v1/llm/requests
This endpoint returns request history scoped to the authenticated tenant.
It supports filtering and pagination through query parameters such as:
- page
- page_size
- status
- request_kind
- model
These filters support common tasks such as:
- showing recent failed requests
- reviewing only chat or embeddings traffic
- filtering by a specific model
- browsing recent request history for a production app
Example request log query
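For example, a query that browses the first page of recent chat traffic for a specific model could look like this (the parameter values are illustrative):

```
GET /api/v1/llm/requests?request_kind=chat&model=smart-default&page=1&page_size=20
```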
Important request log fields
Some fields deserve special attention because they are especially useful in production.
Request ID
Every request receives a request ID. This is one of the most important fields for debugging and tracing. If your application already has a request identifier, you can also pass your own external identifier through metadata so the gateway record can be tied back to your own systems.
Requested model
This is the model value your application asked for, such as an alias or direct model ID. Examples:
- smart-default
- reasoning
- @anthropic/claude-sonnet-4
Provider and upstream model
These fields show how the request was actually executed after model resolution. This distinction matters because the requested model may be an alias, while the upstream model is the concrete target selected by the gateway.
Status
Status helps you separate successful traffic from failing traffic. This is useful for dashboards, alerting, QA review, and operational debugging.
Token usage
For chat requests, token usage includes:
- prompt tokens
- completion tokens
- total tokens
For embeddings requests, token usage includes:
- prompt tokens
- total tokens
Estimated cost
If pricing is configured in the gateway, the request record can include an estimated cost value. This is useful for trend analysis, reporting, and budget reviews, even when downstream billing systems are separate.
Error code
If a request fails, the error code helps teams understand whether the issue came from:
- authentication
- validation
- guardrail enforcement
- rate limiting
- route configuration
- timeout
- upstream execution failure
Request metadata for better traceability
Agumbe allows callers to attach metadata to requests. This metadata can travel with the request and appear in request logs and usage events. Supported metadata fields include:
- workspace_id
- xnamespace_id
- source_service
- operation
- external_request_id
Example with request metadata
{ "model": "smart-default", "messages": [ { "role": "user", "content": "Summarize this support ticket." } ], "agumbe_metadata": { "workspace_id": "workspace_123", "xnamespace_id": "ns_support", "source_service": "support-api", "operation": "ticket_summary", "external_request_id": "ticket_789" } }
This makes the request much easier to understand later in logs and downstream reporting systems.
Timing headers
In addition to request logs, the gateway can return timing headers on successful chat and embeddings responses. These headers expose where time was spent during execution. Available headers include:
- x-agumbe-timing-total-ms
- x-agumbe-timing-model-resolve-ms
- x-agumbe-timing-guardrail-config-ms
- x-agumbe-timing-guardrail-input-ms
- x-agumbe-timing-provider-ms
- x-agumbe-timing-guardrail-output-ms
- x-agumbe-timing-request-log-ms
- x-agumbe-timing-usage-emit-ms
- x-agumbe-timing-side-effects-ms
- x-agumbe-timing-gateway-overhead-ms
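As an illustration, a successful chat response might carry timing headers like the following. The header names are the documented ones above; the millisecond values are invented for the example.

```
x-agumbe-timing-total-ms: 1240
x-agumbe-timing-model-resolve-ms: 4
x-agumbe-timing-guardrail-config-ms: 6
x-agumbe-timing-guardrail-input-ms: 18
x-agumbe-timing-provider-ms: 1100
x-agumbe-timing-guardrail-output-ms: 22
x-agumbe-timing-request-log-ms: 9
x-agumbe-timing-usage-emit-ms: 5
x-agumbe-timing-side-effects-ms: 14
x-agumbe-timing-gateway-overhead-ms: 140
```

In this illustrative response, the upstream provider call accounts for most of the total, and the remaining roughly 140 ms is gateway-side work.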
How to read timing headers
Each timing field answers a different operational question.
x-agumbe-timing-total-ms
The total end-to-end time spent handling the request. Use this when you want the full gateway execution time.
x-agumbe-timing-model-resolve-ms
The time spent resolving the requested model into a usable route target. This is usually small, but it is still useful for understanding the full request lifecycle.
x-agumbe-timing-guardrail-config-ms
The time spent loading the applicable app policy. This helps you understand the policy lookup portion of the request.
x-agumbe-timing-guardrail-input-ms
The time spent preparing and inspecting request-side content before the provider call. This includes guardrail checks on incoming prompt or embedding content.
x-agumbe-timing-provider-ms
The time spent in the upstream model provider call. This is often the most important timing field for latency analysis because it reflects the actual inference call.
x-agumbe-timing-guardrail-output-ms
The time spent inspecting and processing the generated response after the provider call. This is especially useful when groundedness or output checks are enabled.
x-agumbe-timing-request-log-ms
The time spent writing request log records.
x-agumbe-timing-usage-emit-ms
The time spent emitting usage events.
x-agumbe-timing-side-effects-ms
The total time spent on post-request side effects, such as request logging and usage emission.
x-agumbe-timing-gateway-overhead-ms
The portion of total request time that was not spent inside the upstream provider. This helps teams understand gateway-side overhead separately from provider execution time.
Estimated cost header
The gateway can also return an estimated cost header:
- x-agumbe-estimated-cost-usd
This value is useful for:
- comparing workloads
- reviewing expensive prompts
- debugging sudden cost spikes
- understanding which use cases are most resource-intensive
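For example, the header might look like this (the value is invented for illustration):

```
x-agumbe-estimated-cost-usd: 0.0042
```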
Usage events
In addition to request logs, the gateway can emit structured usage events for downstream systems. These events can include fields such as:
- tenant ID
- user ID
- request ID
- workspace ID
- namespace ID
- source service
- operation
- external request ID
- request kind
- requested model
- provider
- upstream model
- prompt tokens
- completion tokens
- total tokens
- latency
- status
- estimated cost
- timestamp
Observability and guardrails
Observability is particularly important when guardrails are enabled. Because guardrails can detect, redact, cap, block, or rate-limit requests, teams need a clear way to understand how policy enforcement affects runtime behavior. Request logs and response traces help answer questions such as:
- Was this request blocked by policy?
- Which policy set was used?
- Was content redacted before reaching the model?
- Was the response modified before being returned?
- Was the request limited because of app-level rate limits?
- Was the model blocked by an allowlist policy?
Observability and routing
Request logging is also valuable for routing analysis. The gateway records fields such as:
- requested model
- provider
- upstream model
- latency
- error code
These fields help answer routing questions such as:
- Are aliases resolving the way we expect?
- Are fallback models seeing real traffic?
- Are certain models consistently slower?
- Are failures clustering around one target?
- Is a route strategy causing cost growth?
Example operational use cases
Here are a few practical ways teams use request logs and observability.
Debug a failed support workflow
A support application returns an error to the user. The team looks up the request log, sees the request ID, confirms the app ID, and finds that the request failed with guardrail_model_blocked. This immediately tells them the issue was policy-related, not provider-related.
Investigate latency complaints
A team notices that a user-facing workflow feels slow. They inspect response headers and see that provider time is normal, but total time is much higher because of side-effect timing and policy checks. This gives them a much better starting point for tuning the workflow.
Review expensive prompts
A team looks at estimated cost and token usage across recent requests and notices that one internal operation is using much more output volume than expected. They then tighten max token settings for that app.
Connect gateway traffic to product workflows
A company uses source_service, operation, and external_request_id fields to connect gateway traffic back to product features. This helps them understand which product surfaces drive usage and cost.
Best practices
Always attach meaningful metadata
If your system has concepts like service name, workflow name, tenant workspace, or external request ID, send them with the request. This makes logs far more useful later.
Treat request IDs as first-class debugging tools
Include request IDs in your application logs and support workflows so teams can trace incidents quickly.
Watch both latency and cost
A request can succeed technically and still be operationally problematic if it is too slow or too expensive.
Separate provider time from gateway time
Do not assume all latency comes from the upstream model. Use timing headers to understand where time is actually spent.
Use observability during rollout, not only during incidents
The best time to review logs is before a problem becomes severe. Early monitoring helps teams catch model, routing, or policy issues before they affect a larger user base.
Recommended starting point
If you are moving into production, start with this observability pattern:
- use request logs for every production workload
- attach source_service, operation, and external_request_id
- monitor timing headers during rollout and performance tuning
- review token usage and estimated cost regularly
- track failed requests by error code
- use app-specific policies so logs are easier to interpret
Next steps
Once you understand request logging and observability, the next helpful pages are:
- Best Practices for production integration guidance
- API Overview for the request and response contract
- Go to Production for a broader launch checklist