This page shows how to integrate Agumbe AI Gateway directly over HTTP. If you do not want to use an SDK, or if you are integrating from a language, runtime, or platform where a custom HTTP client is a better fit, you can call the gateway with standard HTTPS requests and JSON payloads. This is the most portable integration path. If your environment can send an authenticated HTTPS request, it can call Agumbe AI Gateway. For production use, the recommended pattern is still to call the gateway from your backend service, worker, or server-side application.

Before you begin

Make sure you have:
  • an Agumbe Gateway API key
  • the Agumbe base URL
  • a server-side environment that can send HTTPS requests
Base URL: https://api.agumbe.ai/api/v1/llm

Set your API key as an environment variable:

export AGUMBE_API_KEY="your_agumbe_gateway_api_key"

Authentication

Authenticate with a bearer token in the Authorization header:

Authorization: Bearer $AGUMBE_API_KEY

Most requests should also send:

Content-Type: application/json
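As a minimal sketch, the two headers can be assembled once and reused for every request; the environment-variable name follows the setup above:

```python
import os

# Read the key from the environment, as set in "Before you begin".
api_key = os.environ.get("AGUMBE_API_KEY", "your_agumbe_gateway_api_key")

# Headers shared by every gateway request.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```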

Your first chat request

Use the chat completions endpoint to send a conversational request through the gateway.

Endpoint: POST /api/v1/llm/chat/completions

Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise assistant."
      },
      {
        "role": "user",
        "content": "Explain what an AI gateway does in one paragraph."
      }
    ],
    "max_completion_tokens": 220
  }'
This is the recommended starting pattern for most teams:
  • use a stable alias such as smart-default
  • keep the integration server-side
  • start with one clean request shape
  • evolve routing and guardrails later through the gateway
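The curl request above can be reproduced in Python with only the standard library. This is a sketch that builds the request without sending it (the send step is shown as a comment so it can be wired into your own error handling):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.agumbe.ai/api/v1/llm"

def build_chat_request(messages, model="smart-default", max_completion_tokens=220):
    """Build an authenticated chat completions request without sending it."""
    payload = {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('AGUMBE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what an AI gateway does in one paragraph."},
])
# Sending is one line once the key is set:
# with urllib.request.urlopen(req) as resp:
#     data = json.loads(resp.read())
```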

Example chat response

A successful response looks like this:
{
  "id": "chatcmpl_123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "@provider/model-name",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An AI gateway gives your application one managed entry point for model access, routing, guardrails, and observability."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 29,
    "total_tokens": 47
  }
}
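Assuming the JSON above has been decoded into a dict, the fields most integrations need can be pulled out like this (a sketch, not an SDK):

```python
# The decoded response from the example above.
response = {
    "id": "chatcmpl_123",
    "object": "chat.completion",
    "created": 1712345678,
    "model": "@provider/model-name",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "An AI gateway gives your application one managed entry point for model access, routing, guardrails, and observability.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 29, "total_tokens": 47},
}

# The assistant text lives in the first choice's message.
text = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]
total_tokens = response["usage"]["total_tokens"]
```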

Send an embeddings request

Use the embeddings endpoint when you need vector representations for search, retrieval, clustering, or similarity workflows.

Endpoint: POST /api/v1/llm/embeddings

Example:
curl https://api.agumbe.ai/api/v1/llm/embeddings \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embed-default",
    "input": "Agumbe AI Gateway helps teams route, govern, and observe AI traffic."
  }'

Example embeddings response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0123, -0.0481, 0.2214],
      "index": 0
    }
  ],
  "model": "@provider/embedding-model",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
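Embedding vectors from this endpoint are typically compared with cosine similarity for search and clustering. A minimal, dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Truncated vector from the example response, compared with itself.
v = [0.0123, -0.0481, 0.2214]
score = cosine_similarity(v, v)  # identical vectors score 1.0
```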

Use a specific app policy

If you are using a tenant-scoped API key, you can choose which app’s guardrails apply by sending agumbe_guardrails_app_id in the request body. Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "user",
        "content": "Draft a safe reply to this customer message."
      }
    ],
    "max_completion_tokens": 180,
    "agumbe_guardrails_app_id": "app_support"
  }'
If you are using an app-scoped API key, the gateway automatically applies the bound app policy. In that case, you usually do not need to send this field.

Attach request metadata

Agumbe supports request metadata that helps teams trace traffic across systems and workflows. Supported metadata fields include:
  • workspace_id
  • xnamespace_id
  • source_service
  • operation
  • external_request_id
Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "user",
        "content": "Summarize this support ticket."
      }
    ],
    "max_completion_tokens": 180,
    "agumbe_guardrails_app_id": "app_support",
    "agumbe_metadata": {
      "workspace_id": "workspace_123",
      "source_service": "support-api",
      "operation": "ticket_summary",
      "external_request_id": "ticket_789"
    }
  }'
This metadata appears in request logs and usage records, making production traffic easier to debug and analyze.
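A small helper can keep metadata consistent across call sites. The field names follow the supported list above; the helper itself is an illustrative pattern, not part of the API:

```python
# Metadata fields the gateway accepts, per the list above.
SUPPORTED_METADATA = {
    "workspace_id",
    "xnamespace_id",
    "source_service",
    "operation",
    "external_request_id",
}

def with_metadata(payload, **metadata):
    """Return a copy of the request payload with agumbe_metadata attached.

    Unknown field names raise early instead of being silently dropped.
    """
    unknown = set(metadata) - SUPPORTED_METADATA
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")
    return {**payload, "agumbe_metadata": metadata}

payload = with_metadata(
    {"model": "smart-default", "messages": []},
    workspace_id="workspace_123",
    source_service="support-api",
    operation="ticket_summary",
    external_request_id="ticket_789",
)
```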

Use grounding context

If groundedness checks are enabled for your app policy, you can send grounding context with the request. Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "user",
        "content": "Answer this question using the supplied refund policy."
      }
    ],
    "agumbe_guardrails_app_id": "app_support",
    "agumbe_grounding_context": [
      "Refunds are available within 14 days of purchase.",
      "Support agents must not promise exceptions outside the published refund policy."
    ]
  }'

List available models

Use the models endpoint to inspect the gateway catalog.

Endpoint: GET /api/v1/llm/models

Example:
curl https://api.agumbe.ai/api/v1/llm/models \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
Use this endpoint when you want to:
  • inspect the available model catalog
  • discover aliases
  • confirm whether a model supports chat or embeddings
  • populate a model selector in a UI or internal tool

Read request logs

Use the request logs endpoint to inspect recent traffic for the authenticated tenant.

Endpoint: GET /api/v1/llm/requests

Example:
curl "https://api.agumbe.ai/api/v1/llm/requests?page=1&page_size=25&status=error&request_kind=chat" \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
This is useful for debugging, QA, production monitoring, and usage review.
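The query string from the example can also be built programmatically. A sketch using only the standard library, with the parameter names taken from the example above:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.agumbe.ai/api/v1/llm"

def request_logs_url(page=1, page_size=25, status=None, request_kind=None):
    """Build the request-logs URL, omitting filters that are not set."""
    params = {"page": page, "page_size": page_size}
    if status:
        params["status"] = status
    if request_kind:
        params["request_kind"] = request_kind
    return f"{BASE_URL}/requests?{urlencode(params)}"

url = request_logs_url(status="error", request_kind="chat")
```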

Read and update guardrail policies

You can also manage guardrail settings through HTTP. Read a policy:
curl "https://api.agumbe.ai/api/v1/llm/guardrails?app_id=app_support" \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
Update a policy:
curl https://api.agumbe.ai/api/v1/llm/guardrails \
  -X PUT \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "app_id": "app_support",
    "settings": {
      "promptInjection": "detect",
      "indirectPromptInjection": "detect",
      "pii": "redact",
      "secrets": "redact",
      "deniedTopics": "detect",
      "outputSafety": "detect",
      "groundedness": "detect",
      "allowedModels": ["smart-default", "reasoning"],
      "maxTokens": 1024,
      "rateLimitPerMinute": 60
    }
  }'
These operations are most commonly used in trusted backend or console-style workflows.

Response headers

Successful chat and embeddings responses may include useful response headers. Common headers include:
  • x-agumbe-timing-total-ms
  • x-agumbe-timing-model-resolve-ms
  • x-agumbe-timing-guardrail-config-ms
  • x-agumbe-timing-guardrail-input-ms
  • x-agumbe-timing-provider-ms
  • x-agumbe-timing-guardrail-output-ms
  • x-agumbe-timing-request-log-ms
  • x-agumbe-timing-usage-emit-ms
  • x-agumbe-timing-side-effects-ms
  • x-agumbe-timing-gateway-overhead-ms
  • x-agumbe-estimated-cost-usd
These help you understand latency breakdowns and estimated request cost.
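These headers arrive as plain strings on the HTTP response. A sketch that collects them into numeric form for logging (the header names come from the list above; the sample values are made up for illustration):

```python
def parse_agumbe_headers(headers):
    """Extract x-agumbe-* timing and cost headers as floats.

    `headers` is a mapping of header name to string value; keys are
    lowercased here so casing differences do not matter.
    """
    out = {}
    for name, value in headers.items():
        key = name.lower()
        if key.startswith("x-agumbe-"):
            try:
                out[key] = float(value)
            except ValueError:
                out[key] = value  # keep non-numeric values as-is
    return out

timings = parse_agumbe_headers({
    "Content-Type": "application/json",
    "x-agumbe-timing-total-ms": "184.2",
    "x-agumbe-timing-provider-ms": "151.0",
    "x-agumbe-estimated-cost-usd": "0.00042",
})
overhead = timings["x-agumbe-timing-total-ms"] - timings["x-agumbe-timing-provider-ms"]
```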

Example error response

When a request fails, the gateway returns a structured error object. Example:
{
  "error": {
    "message": "Model blocked by allowlist guardrail",
    "type": "invalid_request_error",
    "param": "model",
    "code": "guardrail_model_blocked"
  }
}
Typical error categories include:
  • authentication failures
  • validation failures
  • invalid model selection
  • app mismatch
  • guardrail policy blocks
  • rate limits
  • route configuration failures
  • upstream timeouts
  • provider execution failures
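Assuming the failed response body has been decoded, the error object shown above can be summarized before logging or retrying. Note the retryability rule below follows generic HTTP semantics (429 and 5xx), not a documented Agumbe contract:

```python
def parse_gateway_error(status_code, body):
    """Summarize a failed gateway response.

    Retryability follows generic HTTP semantics (429 and 5xx) --
    adjust to your own policy.
    """
    err = body.get("error", {})
    return {
        "message": err.get("message", "unknown error"),
        "code": err.get("code"),
        "param": err.get("param"),
        "retryable": status_code == 429 or status_code >= 500,
    }

# The error object from the example above, returned with HTTP 400.
info = parse_gateway_error(400, {
    "error": {
        "message": "Model blocked by allowlist guardrail",
        "type": "invalid_request_error",
        "param": "model",
        "code": "guardrail_model_blocked",
    }
})
```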

Example with fetch

If your backend runtime supports fetch, you can call the gateway directly with standard HTTP code.
const response = await fetch("https://api.agumbe.ai/api/v1/llm/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.AGUMBE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "smart-default",
    messages: [
      {
        role: "user",
        content: "Explain AI gateways briefly.",
      },
    ],
    max_completion_tokens: 180,
  }),
});

const data = await response.json();

if (!response.ok) {
  console.error("Gateway request failed", data);
} else {
  console.log(data.choices?.[0]?.message?.content ?? "");
}
This pattern is a good fit when you want a fully explicit HTTP integration without using an SDK wrapper.

Production recommendations

When integrating over HTTP, follow these guidelines:
  • keep the API key in server-side environment variables or a secret manager
  • do not expose the key in browser code
  • prefer aliases such as smart-default and embed-default
  • use app-scoped keys when the workload is fixed
  • attach request metadata for important workflows
  • handle structured error responses explicitly
  • log request IDs and relevant application identifiers
  • monitor timing headers during rollout and tuning
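For rate limits and upstream timeouts, a bounded retry with exponential backoff is a common companion to the guidelines above. A transport-agnostic sketch; the delays and attempt count are arbitrary choices, not gateway requirements:

```python
import time

def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it succeeds or attempts are exhausted.

    `send` returns (status_code, body); 429 and 5xx responses are retried
    with exponential backoff (0.5s, 1s, 2s, ...).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 429 or status >= 500:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
                continue
        return status, body
    return status, body

# Simulated transport: fails twice with 429, then succeeds.
responses = iter([(429, {}), (429, {}), (200, {"ok": True})])
delays = []
status, body = call_with_retries(lambda: next(responses), sleep=delays.append)
```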

When to choose HTTP

Choose direct HTTP integration when:
  • you are integrating from a language without a preferred SDK
  • you want low-level control over request and response handling
  • you want to build your own internal client wrapper
  • you are debugging at the transport level
  • you want a portable, language-neutral integration path