

Request logging and observability are essential parts of running AI workloads in production. Agumbe AI Gateway does not treat inference as a black box. Every request passes through a controlled runtime layer that can record useful operational metadata, emit usage signals, and expose timing breakdowns that help teams understand what happened during execution. For developers integrating the gateway, this means you can do more than send a prompt and receive a response. You can also inspect how traffic behaves over time, understand where latency comes from, trace requests across systems, and build better operational workflows around AI usage. For platform owners and business stakeholders, observability gives visibility into adoption, reliability, spend, and policy behavior.

Why observability matters

When AI traffic moves into production, teams quickly need answers to questions like these:
  • Which app generated this request?
  • Which model was actually used?
  • How long did the request take?
  • Was the latency caused by the model provider or by gateway-side work?
  • How many tokens did the request consume?
  • What was the estimated cost?
  • Which requests failed, and why?
  • Which operations or services are generating the most traffic?
  • Are guardrails or rate limits affecting behavior?
Without request logging and runtime telemetry, these questions are difficult to answer consistently. Agumbe AI Gateway helps solve that problem by making request execution observable by default.

What the gateway records

For each request, the gateway can record structured request metadata in its request log store. This includes fields such as:
  • tenant ID
  • user ID
  • request ID
  • subject type
  • app ID
  • workspace ID
  • namespace ID
  • source service
  • operation
  • external request ID
  • request kind
  • requested model
  • provider
  • upstream model
  • status
  • latency
  • prompt tokens
  • completion tokens
  • total tokens
  • estimated cost
  • error code
  • created time
These records make it possible to inspect both application behavior and platform behavior from one place.

What request logs are for

Request logs are useful for several different audiences.

For developers

Request logs help developers:
  • trace production requests
  • debug failed calls
  • verify that the correct app policy was applied
  • understand which model actually handled the request
  • compare latency across workloads
  • confirm token usage patterns

For platform and operations teams

Request logs help platform teams:
  • monitor system health
  • identify unstable workloads
  • audit model usage
  • detect misuse or unexpected traffic
  • understand routing outcomes
  • investigate policy-related failures

For business and product stakeholders

Request logs help business-facing teams:
  • understand usage patterns
  • spot growth in adoption
  • review estimated cost trends
  • identify which product areas are generating AI traffic
  • connect AI activity to customer-facing workflows

Public request log access

The gateway exposes a request log endpoint for authenticated tenants:

GET /api/v1/llm/requests

This endpoint returns request history scoped to the authenticated tenant. It supports filtering and pagination through query parameters such as:
  • page
  • page_size
  • status
  • request_kind
  • model
This makes it possible to build useful workflows such as:
  • showing recent failed requests
  • reviewing only chat or embeddings traffic
  • filtering by a specific model
  • browsing recent request history for a production app
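The filtered workflows above boil down to building a query string from the documented parameters. A minimal sketch in Python; the base URL matches the example later in this page, and the helper name is illustrative:

```python
from urllib.parse import urlencode

BASE = "https://api.agumbe.ai/api/v1/llm/requests"

def request_log_url(page: int = 1, page_size: int = 25, **filters: str) -> str:
    """Build a request-log URL. Filters may include status, request_kind,
    and model, per the supported query parameters above."""
    params = {"page": page, "page_size": page_size, **filters}
    return f"{BASE}?{urlencode(params)}"
```

For example, `request_log_url(status="error", request_kind="chat")` produces the failed-chat-traffic query shown in the next section.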

Example request log query

```bash
curl "https://api.agumbe.ai/api/v1/llm/requests?page=1&page_size=25&status=error&request_kind=chat" \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
```
This example retrieves failed chat requests for the authenticated tenant.
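Because the endpoint is paginated, client code typically walks pages until the history is exhausted. A sketch of that loop, assuming the response body is JSON with an `items` list; that field name is an assumption, not a documented contract:

```python
from typing import Callable, Iterator

def iter_request_logs(fetch_page: Callable[[int], dict],
                      page_size: int = 25) -> Iterator[dict]:
    """Yield request log records page by page until a short page is returned.

    `fetch_page(page)` should GET /api/v1/llm/requests?page=<page>&page_size=...
    and return the decoded JSON body. The "items" key is hypothetical.
    """
    page = 1
    while True:
        body = fetch_page(page)
        items = body.get("items", [])
        yield from items
        if len(items) < page_size:  # short page means we reached the end
            break
        page += 1
```

Injecting the fetch function keeps the loop independent of any particular HTTP client.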

Important request log fields

Some fields deserve special attention because they are especially useful in production.

Request ID

Every request receives a request ID. This is one of the most important fields for debugging and tracing. If your application already has a request identifier, you can also pass your own external identifier through metadata so the gateway record can be tied back to your own systems.

Requested model

This is the model value your application asked for, such as an alias or direct model ID. Examples:
  • smart-default
  • reasoning
  • @anthropic/claude-sonnet-4

Provider and upstream model

These fields show how the request was actually executed after model resolution. This distinction matters because the requested model may be an alias, while the upstream model is the concrete target selected by the gateway.

Status

Status helps you separate successful traffic from failing traffic. This is useful for dashboards, alerting, QA review, and operational debugging.

Token usage

For chat requests, token usage includes:
  • prompt tokens
  • completion tokens
  • total tokens
For embeddings requests, token usage includes:
  • prompt tokens
  • total tokens
These values help teams understand usage intensity and cost drivers.

Estimated cost

If pricing is configured in the gateway, the request record can include an estimated cost value. This is useful for trend analysis, reporting, and budget reviews, even when downstream billing systems are separate.
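The usual shape of such an estimate prices input and output tokens separately. A sketch of that arithmetic; the per-1K-token prices in the example are placeholders, not real provider pricing:

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate request cost from token counts and per-1K-token prices.
    Prompt (input) and completion (output) tokens are priced separately."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k

# Example with placeholder prices: 1,200 prompt tokens at $0.003/1K plus
# 300 completion tokens at $0.015/1K.
example = estimate_cost_usd(1200, 300, 0.003, 0.015)
```

As noted above, treat any such figure as an operational estimate, not a billing source of truth.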

Error code

If a request fails, the error code helps teams understand whether the issue came from:
  • authentication
  • validation
  • guardrail enforcement
  • rate limiting
  • route configuration
  • timeout
  • upstream execution failure
This is often the fastest way to categorize operational issues.
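A simple triage helper can bucket error codes into the broad areas listed above. This is a sketch only; the keyword strings are illustrative, and you should match the codes your gateway actually returns (the doc's own example, `guardrail_model_blocked`, lands in the policy bucket):

```python
# Keyword-based buckets mirroring the failure categories above.
ERROR_BUCKETS = {
    "auth": ("authentication",),
    "validation": ("validation",),
    "policy": ("guardrail", "rate_limit", "allowlist"),
    "routing": ("route",),
    "provider": ("timeout", "upstream"),
}

def classify_error(error_code: str) -> str:
    """Return the first bucket whose keyword appears in the error code."""
    for bucket, keywords in ERROR_BUCKETS.items():
        if any(k in error_code for k in keywords):
            return bucket
    return "unknown"
```

Grouping failures this way makes it quick to see whether an incident is policy-driven, routing-driven, or coming from the upstream provider.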

Request metadata for better traceability

Agumbe allows callers to attach metadata to requests. This metadata can travel with the request and appear in request logs and usage events. Supported metadata fields include:
  • workspace_id
  • xnamespace_id
  • source_service
  • operation
  • external_request_id
These fields are especially useful for teams that want to connect gateway activity back to internal product workflows.

Example with request metadata

```json
{
  "model": "smart-default",
  "messages": [
    { "role": "user", "content": "Summarize this support ticket." }
  ],
  "agumbe_metadata": {
    "workspace_id": "workspace_123",
    "xnamespace_id": "ns_support",
    "source_service": "support-api",
    "operation": "ticket_summary",
    "external_request_id": "ticket_789"
  }
}
```

This makes the request much easier to understand later in logs and downstream reporting systems.

Timing headers

In addition to request logs, the gateway can return timing headers on successful chat and embeddings responses. These headers expose where time was spent during execution. Available headers include:
  • x-agumbe-timing-total-ms
  • x-agumbe-timing-model-resolve-ms
  • x-agumbe-timing-guardrail-config-ms
  • x-agumbe-timing-guardrail-input-ms
  • x-agumbe-timing-provider-ms
  • x-agumbe-timing-guardrail-output-ms
  • x-agumbe-timing-request-log-ms
  • x-agumbe-timing-usage-emit-ms
  • x-agumbe-timing-side-effects-ms
  • x-agumbe-timing-gateway-overhead-ms
These headers are useful when you want to understand not just that a request was slow, but why it was slow.

How to read timing headers

Each timing field answers a different operational question.

x-agumbe-timing-total-ms

The total end-to-end time spent handling the request. Use this when you want the full gateway execution time.

x-agumbe-timing-model-resolve-ms

The time spent resolving the requested model into a usable route target. This is usually small, but it is still useful for understanding the full request lifecycle.

x-agumbe-timing-guardrail-config-ms

The time spent loading the applicable app policy. This helps you understand the policy lookup portion of the request.

x-agumbe-timing-guardrail-input-ms

The time spent preparing and inspecting request-side content before the provider call. This includes guardrail checks on incoming prompt or embedding content.

x-agumbe-timing-provider-ms

The time spent in the upstream model provider call. This is often the most important timing field for latency analysis because it reflects the actual inference call.

x-agumbe-timing-guardrail-output-ms

The time spent inspecting and processing the generated response after the provider call. This is especially useful when groundedness or output checks are enabled.

x-agumbe-timing-request-log-ms

The time spent writing request log records.

x-agumbe-timing-usage-emit-ms

The time spent emitting usage events.

x-agumbe-timing-side-effects-ms

The total time spent on post-request side effects, such as request logging and usage emission.

x-agumbe-timing-gateway-overhead-ms

The portion of total request time that was not spent inside the upstream provider. This helps teams understand gateway-side overhead separately from provider execution time.
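Since all of these headers share the `x-agumbe-timing-` prefix and an `-ms` suffix, they can be parsed generically. A minimal sketch that turns the headers into a breakdown and computes the provider's share of total time; it assumes `headers` is a plain dict with lowercase header names:

```python
def timing_breakdown(headers: dict) -> dict:
    """Parse x-agumbe-timing-* headers into floats and add a derived
    provider-share ratio (provider time / total time)."""
    prefix = "x-agumbe-timing-"
    timings = {
        k[len(prefix):-3]: float(v)   # strip the prefix and trailing "-ms"
        for k, v in headers.items()
        if k.startswith(prefix) and k.endswith("-ms")
    }
    total = timings.get("total")
    provider = timings.get("provider")
    if total and provider is not None:
        timings["provider-share"] = provider / total
    return timings
```

A low provider share with a high total tells you the slowness is gateway-side (policy checks, side effects) rather than inference itself.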

Estimated cost header

The gateway can also return an estimated cost header:
  • x-agumbe-estimated-cost-usd
If model pricing is configured, this header gives a fast, request-level estimate of cost. This is helpful for:
  • comparing workloads
  • reviewing expensive prompts
  • debugging sudden cost spikes
  • understanding which use cases are most resource-intensive
It is important to treat this as an operational estimate rather than a full billing contract.

Usage events

In addition to request logs, the gateway can emit structured usage events for downstream systems. These events can include fields such as:
  • tenant ID
  • user ID
  • request ID
  • workspace ID
  • namespace ID
  • source service
  • operation
  • external request ID
  • request kind
  • requested model
  • provider
  • upstream model
  • prompt tokens
  • completion tokens
  • total tokens
  • latency
  • status
  • estimated cost
  • timestamp
These events are useful when you want to feed usage into metering, billing, analytics, or warehouse pipelines.
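Before feeding events into a warehouse, a quick in-process rollup is often enough for reporting. A sketch that aggregates usage events per operation; the event key names mirror the field list above but the exact JSON keys are an assumption:

```python
from collections import defaultdict

def usage_by_operation(events: list) -> dict:
    """Roll usage events up into per-operation request counts, token totals,
    and estimated cost. Key names ("operation", "total_tokens",
    "estimated_cost") are assumed, not a documented schema."""
    totals = defaultdict(lambda: {"requests": 0, "total_tokens": 0,
                                  "estimated_cost": 0.0})
    for e in events:
        op = e.get("operation", "unknown")
        totals[op]["requests"] += 1
        totals[op]["total_tokens"] += e.get("total_tokens", 0)
        totals[op]["estimated_cost"] += e.get("estimated_cost", 0.0)
    return dict(totals)
```

This kind of rollup answers "which operation drives our token spend" without any external analytics tooling.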

Observability and guardrails

Observability is particularly important when guardrails are enabled. Because guardrails can detect, redact, cap, block, or rate-limit requests, teams need a clear way to understand how policy enforcement affects runtime behavior. Request logs and response traces help answer questions such as:
  • Was this request blocked by policy?
  • Which policy set was used?
  • Was content redacted before reaching the model?
  • Was the response modified before being returned?
  • Was the request limited because of app-level rate limits?
  • Was the model blocked by an allowlist policy?
This is one of the major advantages of using a gateway: policy behavior becomes observable rather than hidden inside application code.

Observability and routing

Request logging is also valuable for routing analysis. Because the gateway records fields such as:
  • requested model
  • provider
  • upstream model
  • latency
  • error code
teams can use logs to understand how route behavior performs over time. This helps with questions such as:
  • Are aliases resolving the way we expect?
  • Are fallback models seeing real traffic?
  • Are certain models consistently slower?
  • Are failures clustering around one target?
  • Is a route strategy causing cost growth?
In other words, request observability helps teams make better routing decisions.
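The routing questions above can be answered by grouping log records per upstream model. A sketch, assuming records are dicts whose keys match the logged fields (`upstream_model`, `status`, `latency`); those exact key names are an assumption:

```python
def route_health(records: list) -> dict:
    """Compute per-upstream-model request count, error rate, and mean latency
    from request log records."""
    stats = {}
    for r in records:
        model = r.get("upstream_model", "unknown")
        s = stats.setdefault(model, {"requests": 0, "errors": 0,
                                     "latency_sum": 0.0})
        s["requests"] += 1
        s["errors"] += 1 if r.get("status") == "error" else 0
        s["latency_sum"] += r.get("latency", 0.0)
    for s in stats.values():
        s["error_rate"] = s["errors"] / s["requests"]
        s["mean_latency"] = s["latency_sum"] / s["requests"]
    return stats
```

Running this over a day of logs quickly shows whether failures or latency are clustering around one route target.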

Example operational use cases

Here are a few practical ways teams use request logs and observability.

Debug a failed support workflow

A support application returns an error to the user. The team looks up the request log, sees the request ID, confirms the app ID, and finds that the request failed with guardrail_model_blocked. This immediately tells them the issue was policy-related, not provider-related.

Investigate latency complaints

A team notices that a user-facing workflow feels slow. They inspect response headers and see that provider time is normal, but total time is much higher because of side-effect timing and policy checks. This gives them a much better starting point for tuning the workflow.

Review expensive prompts

A team looks at estimated cost and token usage across recent requests and notices that one internal operation is using much more output volume than expected. They then tighten max token settings for that app.

Connect gateway traffic to product workflows

A company uses source_service, operation, and external_request_id fields to connect gateway traffic back to product features. This helps them understand which product surfaces drive usage and cost.

Best practices

Always attach meaningful metadata

If your system has concepts like service name, workflow name, tenant workspace, or external request ID, send them with the request. This makes logs far more useful later.

Treat request IDs as first-class debugging tools

Include request IDs in your application logs and support workflows so teams can trace incidents quickly.
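One lightweight way to do this is to stamp the gateway's request ID onto every related application log line. A sketch; the response header name `x-agumbe-request-id` is an assumption here, so substitute whichever header or body field your gateway actually returns the request ID in:

```python
import logging

logger = logging.getLogger("support-api")

def log_gateway_call(response_headers: dict, message: str) -> str:
    """Prefix an application log line with the gateway request ID so the two
    systems can be correlated during an incident."""
    # Header name is hypothetical; adjust to your gateway's actual field.
    request_id = response_headers.get("x-agumbe-request-id", "unknown")
    line = f"[agumbe_request_id={request_id}] {message}"
    logger.info(line)
    return line
```

With this in place, a support engineer can paste the ID from an application log straight into a request log query.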

Watch both latency and cost

A request can succeed technically and still be operationally problematic if it is too slow or too expensive.

Separate provider time from gateway time

Do not assume all latency comes from the upstream model. Use timing headers to understand where time is actually spent.

Use observability during rollout, not only during incidents

The best time to review logs is before a problem becomes severe. Early monitoring helps teams catch model, routing, or policy issues before they affect a larger user base. If you are moving into production, start with this observability pattern:
  • use request logs for every production workload
  • attach source_service, operation, and external_request_id
  • monitor timing headers during rollout and performance tuning
  • review token usage and estimated cost regularly
  • track failed requests by error code
  • use app-specific policies so logs are easier to interpret
This gives you a practical observability foundation without requiring a large monitoring project on day one.

Next steps

Once you understand request logging and observability, the next helpful pages are:
  • Best Practices for production integration guidance
  • API Overview for the request and response contract
  • Go to Production for a broader launch checklist