This page shows how to integrate Agumbe AI Gateway directly over HTTP.
If you do not want to use an SDK, or if you are integrating from a language, runtime, or platform where a custom HTTP client is a better fit, you can call the gateway with standard HTTPS requests and JSON payloads.
This is the most portable integration path. If your environment can send an authenticated HTTPS request, it can call Agumbe AI Gateway.
For production use, the recommended pattern is still to call the gateway from your backend service, worker, or server-side application.
Before you begin
Make sure you have:
- an Agumbe Gateway API key
- the Agumbe base URL
- a server-side environment that can send HTTPS requests
Base URL:
https://api.agumbe.ai/api/v1/llm
Set your API key as an environment variable:
export AGUMBE_API_KEY="your_agumbe_gateway_api_key"
Authentication
Authenticate every request with a bearer token in the Authorization header.
Example header:
Authorization: Bearer $AGUMBE_API_KEY
Requests with a JSON body must also send:
Content-Type: application/json
Your first chat request
Use the chat completions endpoint to send a conversational request through the gateway.
Endpoint:
POST /api/v1/llm/chat/completions
Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
-H "Authorization: Bearer $AGUMBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart-default",
"messages": [
{
"role": "system",
"content": "You are a concise assistant."
},
{
"role": "user",
"content": "Explain what an AI gateway does in one paragraph."
}
],
"max_completion_tokens": 220
}'
This is the recommended starting pattern for most teams:
- use a stable alias such as smart-default
- keep the integration server-side
- start with one clean request shape
- evolve routing and guardrails later through the gateway
Example chat response
A successful response looks like this:
{
"id": "chatcmpl_123",
"object": "chat.completion",
"created": 1712345678,
"model": "@provider/model-name",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "An AI gateway gives your application one managed entry point for model access, routing, guardrails, and observability."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 18,
"completion_tokens": 29,
"total_tokens": 47
}
}
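If you are handling this response in JavaScript, a small helper keeps the field access in one place. This is a minimal sketch based only on the response shape shown above; readChatResponse is an illustrative name, not part of the gateway:

```javascript
// Extract the assistant reply and token usage from a chat completion
// response shaped like the example above.
function readChatResponse(body) {
  const choice = body.choices && body.choices[0];
  return {
    text: choice ? choice.message.content : "",
    finishReason: choice ? choice.finish_reason : null,
    totalTokens: body.usage ? body.usage.total_tokens : 0,
  };
}
```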
Send an embeddings request
Use the embeddings endpoint when you need vector representations for search, retrieval, clustering, or similarity workflows.
Endpoint:
POST /api/v1/llm/embeddings
Example:
curl https://api.agumbe.ai/api/v1/llm/embeddings \
-H "Authorization: Bearer $AGUMBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "embed-default",
"input": "Agumbe AI Gateway helps teams route, govern, and observe AI traffic."
}'
Example embeddings response
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.0123, -0.0481, 0.2214],
"index": 0
}
],
"model": "@provider/embedding-model",
"usage": {
"prompt_tokens": 12,
"total_tokens": 12
}
}
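Embedding vectors from this endpoint are typically compared with cosine similarity in the search and retrieval workflows mentioned above. A self-contained sketch (standard math, independent of the gateway):

```javascript
// Cosine similarity between two embedding vectors: 1 means identical
// direction, 0 means orthogonal (unrelated).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```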
Use a specific app policy
If you are using a tenant-scoped API key, you can choose which app’s guardrails apply by sending agumbe_guardrails_app_id in the request body.
Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
-H "Authorization: Bearer $AGUMBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart-default",
"messages": [
{
"role": "user",
"content": "Draft a safe reply to this customer message."
}
],
"max_completion_tokens": 180,
"agumbe_guardrails_app_id": "app_support"
}'
If you are using an app-scoped API key, the gateway automatically applies the bound app policy. In that case, you usually do not need to send this field.
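One way to keep this conditional out of call sites is a small payload builder. The sketch below is illustrative (buildChatPayload is not a gateway API); it attaches the field only when an app id is supplied, matching the tenant-scoped case above:

```javascript
// Build a chat payload, attaching agumbe_guardrails_app_id only when a
// tenant-scoped key needs to pick a policy (app-scoped keys bind one already).
function buildChatPayload(messages, { model = "smart-default", appId = null } = {}) {
  const payload = { model, messages, max_completion_tokens: 180 };
  if (appId) payload.agumbe_guardrails_app_id = appId;
  return payload;
}
```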
Attach request metadata
Agumbe supports request metadata that helps teams trace traffic across systems and workflows.
Supported metadata fields include:
- workspace_id
- namespace_id
- source_service
- operation
- external_request_id
Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
-H "Authorization: Bearer $AGUMBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart-default",
"messages": [
{
"role": "user",
"content": "Summarize this support ticket."
}
],
"max_completion_tokens": 180,
"agumbe_guardrails_app_id": "app_support",
"agumbe_metadata": {
"workspace_id": "workspace_123",
"source_service": "support-api",
"operation": "ticket_summary",
"external_request_id": "ticket_789"
}
}'
This metadata appears in request logs and usage records, making production traffic easier to debug and analyze.
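Because unsupported fields will not show up in logs, it can help to filter outgoing metadata against an allowlist before sending. A sketch, with the allowlist drawn from the example fields above (extend it to match the full supported set):

```javascript
// Keep only allowlisted metadata fields; anything else is dropped before
// the request is sent. The list here mirrors the example above.
const SUPPORTED_METADATA = ["workspace_id", "source_service", "operation", "external_request_id"];

function pickSupportedMetadata(meta) {
  const out = {};
  for (const key of SUPPORTED_METADATA) {
    if (meta[key] !== undefined) out[key] = meta[key];
  }
  return out;
}
```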
Use grounding context
If groundedness checks are enabled for your app policy, you can send grounding context with the request.
Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
-H "Authorization: Bearer $AGUMBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart-default",
"messages": [
{
"role": "user",
"content": "Answer this question using the supplied refund policy."
}
],
"agumbe_guardrails_app_id": "app_support",
"agumbe_grounding_context": [
"Refunds are available within 14 days of purchase.",
"Support agents must not promise exceptions outside the published refund policy."
]
}'
List available models
Use the models endpoint to inspect the gateway catalog.
Endpoint:
GET /api/v1/llm/models
Example:
curl https://api.agumbe.ai/api/v1/llm/models \
-H "Authorization: Bearer $AGUMBE_API_KEY"
Use this endpoint when you want to:
- inspect the available model catalog
- discover aliases
- confirm whether a model supports chat or embeddings
- populate a model selector in a UI or internal tool
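If you use the catalog to populate a selector, a small mapper keeps the UI code simple. The response shape assumed here ({ data: [{ id }] }) is modeled on the OpenAI-style list format used by the other endpoints; verify it against your gateway's actual output:

```javascript
// Map the model catalog to a sorted list of ids for a dropdown.
// The { data: [{ id }] } shape is an assumption; confirm it first.
function modelIds(catalog) {
  return (catalog.data || []).map((m) => m.id).sort();
}
```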
Read request logs
Use the request logs endpoint to inspect recent traffic for the authenticated tenant.
Endpoint:
GET /api/v1/llm/requests
Example:
curl "https://api.agumbe.ai/api/v1/llm/requests?page=1&page_size=25&status=error&request_kind=chat" \
-H "Authorization: Bearer $AGUMBE_API_KEY"
This is useful for debugging, QA, production monitoring, and usage review.
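Building the filter query string programmatically avoids hand-encoding mistakes. A sketch using the standard URLSearchParams API and the parameters from the example above:

```javascript
// Build the request-log URL from a filter object, skipping empty filters.
function requestLogUrl(base, filters) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined && value !== null) params.set(key, String(value));
  }
  return `${base}/requests?${params.toString()}`;
}
```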
Read and update guardrail policies
You can also manage guardrail settings through HTTP.
Read a policy:
curl "https://api.agumbe.ai/api/v1/llm/guardrails?app_id=app_support" \
-H "Authorization: Bearer $AGUMBE_API_KEY"
Update a policy:
curl https://api.agumbe.ai/api/v1/llm/guardrails \
-X PUT \
-H "Authorization: Bearer $AGUMBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"app_id": "app_support",
"settings": {
"promptInjection": "detect",
"indirectPromptInjection": "detect",
"pii": "redact",
"secrets": "redact",
"deniedTopics": "detect",
"outputSafety": "detect",
"groundedness": "detect",
"allowedModels": ["smart-default", "reasoning"],
"maxTokens": 1024,
"rateLimitPerMinute": 60
}
}'
These operations are most commonly used in trusted backend or console-style workflows.
Response timing headers
Successful chat and embeddings responses may include useful response headers.
Common headers include:
- x-agumbe-timing-total-ms
- x-agumbe-timing-model-resolve-ms
- x-agumbe-timing-guardrail-config-ms
- x-agumbe-timing-guardrail-input-ms
- x-agumbe-timing-provider-ms
- x-agumbe-timing-guardrail-output-ms
- x-agumbe-timing-request-log-ms
- x-agumbe-timing-usage-emit-ms
- x-agumbe-timing-side-effects-ms
- x-agumbe-timing-gateway-overhead-ms
- x-agumbe-estimated-cost-usd
These help you understand latency breakdowns and estimated request cost.
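One way to use these headers is to collect every x-agumbe-timing-* value into a numeric map for logging. The sketch below works with a fetch Headers object or any iterable of [name, value] pairs:

```javascript
// Gather x-agumbe-timing-*-ms headers into a { stage: milliseconds } map.
// Header names from fetch are already lowercase.
function readTimings(headers) {
  const timings = {};
  for (const [name, value] of headers) {
    if (name.startsWith("x-agumbe-timing-") && name.endsWith("-ms")) {
      const stage = name.slice("x-agumbe-timing-".length, -"-ms".length);
      timings[stage] = Number(value);
    }
  }
  return timings;
}
```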
Example error response
When a request fails, the gateway returns a structured error object.
Example:
{
"error": {
"message": "Model blocked by allowlist guardrail",
"type": "invalid_request_error",
"param": "model",
"code": "guardrail_model_blocked"
}
}
Typical error categories include:
- authentication failures
- validation failures
- invalid model selection
- app mismatch
- guardrail policy blocks
- rate limits
- route configuration failures
- upstream timeouts
- provider execution failures
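A common pattern is to branch retry behavior on the HTTP status and the structured error code. In the sketch below, the specific code strings for rate limits and upstream timeouts are assumptions; match them to the codes your gateway actually returns:

```javascript
// Decide whether a failed request is worth retrying. Rate limits and
// server-side failures usually are; validation and guardrail blocks are not.
// The "rate_limited" / "upstream_timeout" code strings are assumed examples.
function isRetryable(status, errorBody) {
  if (status === 429 || status >= 500) return true;
  const code = errorBody && errorBody.error ? errorBody.error.code : "";
  return code === "rate_limited" || code === "upstream_timeout";
}
```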
Example with fetch
If your backend runtime supports fetch, you can call the gateway directly with standard HTTP code.
const response = await fetch("https://api.agumbe.ai/api/v1/llm/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.AGUMBE_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "smart-default",
messages: [
{
role: "user",
content: "Explain AI gateways briefly.",
},
],
max_completion_tokens: 180,
}),
});
const data = await response.json();
if (!response.ok) {
console.error("Gateway request failed", data);
} else {
console.log(data.choices?.[0]?.message?.content ?? "");
}
This pattern is a good fit when you want a fully explicit HTTP integration without using an SDK wrapper.
Production recommendations
When integrating over HTTP, follow these guidelines:
- keep the API key in server-side environment variables or a secret manager
- do not expose the key in browser code
- prefer aliases such as smart-default and embed-default
- use app-scoped keys when the workload is fixed
- attach request metadata for important workflows
- handle structured error responses explicitly
- log request IDs and relevant application identifiers
- monitor timing headers during rollout and tuning
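For handling transient failures such as rate limits, a capped exponential backoff schedule is a common choice. This sketch computes only the delay schedule, so it can be tested without touching the network; the base and cap values are illustrative defaults:

```javascript
// Capped exponential backoff: 250ms, 500ms, 1s, ... up to capMs.
// Pure computation; pair it with your own sleep-and-retry loop.
function backoffDelaysMs(attempts, baseMs = 250, capMs = 8000) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, capMs));
  }
  return delays;
}
```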
When to choose HTTP
Choose direct HTTP integration when:
- you are integrating from a language without a preferred SDK
- you want low-level control over request and response handling
- you want to build your own internal client wrapper
- you are debugging at the transport level
- you want a portable, language-neutral integration path