This page shows how to integrate Agumbe AI Gateway directly over HTTP. If you do not want to use an SDK, or if you are integrating from a language, runtime, or platform where a custom HTTP client is a better fit, you can call the gateway with standard HTTPS requests and JSON payloads. This is the most portable integration path. If your environment can send an authenticated HTTPS request, it can call Agumbe AI Gateway. For production use, the recommended pattern is still to call the gateway from your backend service, worker, or server-side application.

Before you begin

Make sure you have:
  • an Agumbe Gateway API key
  • the Agumbe base URL
  • a server-side environment that can send HTTPS requests
Base URL: https://api.agumbe.ai/api/v1/llm

Set your API key as an environment variable:

export AGUMBE_API_KEY="your_agumbe_gateway_api_key"

Authentication

Authenticate with a bearer token in the Authorization header:

Authorization: Bearer $AGUMBE_API_KEY

Most requests should also send:

Content-Type: application/json
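As a minimal sketch, the two headers can be assembled once and reused for every request; the environment-variable name follows the setup above:

```python
import os

# Read the key from the environment, as set in "Before you begin".
api_key = os.environ.get("AGUMBE_API_KEY", "your_agumbe_gateway_api_key")

# Headers shared by every gateway request.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```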

Your first chat request

Use the chat completions endpoint to send a conversational request through the gateway.

Endpoint: POST /api/v1/llm/chat/completions

Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise assistant."
      },
      {
        "role": "user",
        "content": "Explain what an AI gateway does in one paragraph."
      }
    ],
    "max_completion_tokens": 220
  }'
This is the recommended starting pattern for most teams:
  • use a stable alias such as smart-default
  • keep the integration server-side
  • start with one clean request shape
  • evolve routing and guardrails later through the gateway
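The curl request above can be reproduced in Python with only the standard library. This is a sketch that builds the request without sending it (the send step is shown as a comment so it can be wired into your own error handling):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.agumbe.ai/api/v1/llm"

def build_chat_request(messages, model="smart-default", max_completion_tokens=220):
    """Build an authenticated chat completions request without sending it."""
    payload = {
        "model": model,
        "messages": messages,
        "max_completion_tokens": max_completion_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('AGUMBE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what an AI gateway does in one paragraph."},
])
# Sending is one line once the key is set:
# with urllib.request.urlopen(req) as resp:
#     data = json.loads(resp.read())
```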

Example chat response

A successful response looks like this:
{
  "id": "chatcmpl_123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "@provider/model-name",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An AI gateway gives your application one managed entry point for model access, routing, guardrails, and observability."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 29,
    "total_tokens": 47
  }
}
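Assuming the JSON above has been decoded into a dict, the fields most integrations need can be pulled out like this (a sketch, not an SDK):

```python
# The decoded response from the example above.
response = {
    "id": "chatcmpl_123",
    "object": "chat.completion",
    "created": 1712345678,
    "model": "@provider/model-name",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "An AI gateway gives your application one managed entry point for model access, routing, guardrails, and observability.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 29, "total_tokens": 47},
}

# The assistant text lives in the first choice's message.
text = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]
total_tokens = response["usage"]["total_tokens"]
```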

Send an embeddings request

Use the embeddings endpoint when you need vector representations for search, retrieval, clustering, or similarity workflows.

Endpoint: POST /api/v1/llm/embeddings

Example:
curl https://api.agumbe.ai/api/v1/llm/embeddings \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embed-default",
    "input": "Agumbe AI Gateway helps teams route, govern, and observe AI traffic."
  }'

Example embeddings response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0123, -0.0481, 0.2214],
      "index": 0
    }
  ],
  "model": "@provider/embedding-model",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
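Embedding vectors from this endpoint are typically compared with cosine similarity for search and clustering. A minimal, dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Truncated vector from the example response, compared with itself.
v = [0.0123, -0.0481, 0.2214]
score = cosine_similarity(v, v)  # identical vectors score 1.0
```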

Use a specific app policy

If you are using a tenant-scoped API key, you can choose which app’s guardrails apply by sending agumbe_guardrails_app_id in the request body. Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "user",
        "content": "Draft a safe reply to this customer message."
      }
    ],
    "max_completion_tokens": 180,
    "agumbe_guardrails_app_id": "app_support"
  }'
If you are using an app-scoped API key, the gateway automatically applies the bound app policy. In that case, you usually do not need to send this field.

Attach request metadata

Agumbe supports request metadata that helps teams trace traffic across systems and workflows. Supported metadata fields include:
  • workspace_id
  • xnamespace_id
  • source_service
  • operation
  • external_request_id
Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "user",
        "content": "Summarize this support ticket."
      }
    ],
    "max_completion_tokens": 180,
    "agumbe_guardrails_app_id": "app_support",
    "agumbe_metadata": {
      "workspace_id": "workspace_123",
      "source_service": "support-api",
      "operation": "ticket_summary",
      "external_request_id": "ticket_789"
    }
  }'
This metadata appears in request logs and usage records, making production traffic easier to debug and analyze.
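A small helper can keep metadata consistent across call sites. The field names follow the supported list above; the helper itself is an illustrative pattern, not part of the API:

```python
# Metadata fields the gateway accepts, per the list above.
SUPPORTED_METADATA = {
    "workspace_id",
    "xnamespace_id",
    "source_service",
    "operation",
    "external_request_id",
}

def with_metadata(payload, **metadata):
    """Return a copy of the request payload with agumbe_metadata attached.

    Unknown field names raise early instead of being silently dropped.
    """
    unknown = set(metadata) - SUPPORTED_METADATA
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")
    return {**payload, "agumbe_metadata": metadata}

payload = with_metadata(
    {"model": "smart-default", "messages": []},
    workspace_id="workspace_123",
    source_service="support-api",
    operation="ticket_summary",
    external_request_id="ticket_789",
)
```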

Use grounding context

If groundedness checks are enabled for your app policy, you can send grounding context with the request. Example:
curl https://api.agumbe.ai/api/v1/llm/chat/completions \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-default",
    "messages": [
      {
        "role": "user",
        "content": "Answer this question using the supplied refund policy."
      }
    ],
    "agumbe_guardrails_app_id": "app_support",
    "agumbe_grounding_context": [
      "Refunds are available within 14 days of purchase.",
      "Support agents must not promise exceptions outside the published refund policy."
    ]
  }'

List available models

Use the models endpoint to inspect the gateway catalog.

Endpoint: GET /api/v1/llm/models

Example:
curl https://api.agumbe.ai/api/v1/llm/models \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
Use this endpoint when you want to:
  • inspect the available model catalog
  • discover aliases
  • confirm whether a model supports chat or embeddings
  • populate a model selector in a UI or internal tool

Read request logs

Use the request logs endpoint to inspect recent traffic for the authenticated tenant.

Endpoint: GET /api/v1/llm/requests

Example:
curl "https://api.agumbe.ai/api/v1/llm/requests?page=1&page_size=25&status=error&request_kind=chat" \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
This is useful for debugging, QA, production monitoring, and usage review.
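The query string from the example can also be built programmatically. A sketch using only the standard library, with the parameter names taken from the example above:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.agumbe.ai/api/v1/llm"

def request_logs_url(page=1, page_size=25, status=None, request_kind=None):
    """Build the request-logs URL, omitting filters that are not set."""
    params = {"page": page, "page_size": page_size}
    if status:
        params["status"] = status
    if request_kind:
        params["request_kind"] = request_kind
    return f"{BASE_URL}/requests?{urlencode(params)}"

url = request_logs_url(status="error", request_kind="chat")
```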

Read and update guardrail policies

You can also manage guardrail settings through HTTP. Read a policy:
curl "https://api.agumbe.ai/api/v1/llm/guardrails?app_id=app_support" \
  -H "Authorization: Bearer $AGUMBE_API_KEY"
Update a policy:
curl https://api.agumbe.ai/api/v1/llm/guardrails \
  -X PUT \
  -H "Authorization: Bearer $AGUMBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "app_id": "app_support",
    "settings": {
      "promptInjection": "detect",
      "indirectPromptInjection": "detect",
      "pii": "redact",
      "secrets": "redact",
      "deniedTopics": "detect",
      "outputSafety": "detect",
      "groundedness": "detect",
      "allowedModels": ["smart-default", "reasoning"],
      "maxTokens": 1024,
      "rateLimitPerMinute": 60
    }
  }'
These operations are most commonly used in trusted backend or console-style workflows.

Response headers

Successful chat and embeddings responses may include useful response headers. Common headers include:
  • x-agumbe-timing-total-ms
  • x-agumbe-timing-model-resolve-ms
  • x-agumbe-timing-guardrail-config-ms
  • x-agumbe-timing-guardrail-input-ms
  • x-agumbe-timing-provider-ms
  • x-agumbe-timing-guardrail-output-ms
  • x-agumbe-timing-request-log-ms
  • x-agumbe-timing-usage-emit-ms
  • x-agumbe-timing-side-effects-ms
  • x-agumbe-timing-gateway-overhead-ms
  • x-agumbe-estimated-cost-usd
These help you understand latency breakdowns and estimated request cost.
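These headers arrive as plain strings on the HTTP response. A sketch that collects them into numeric form for logging (the header names come from the list above; the sample values are made up for illustration):

```python
def parse_agumbe_headers(headers):
    """Extract x-agumbe-* timing and cost headers as floats.

    `headers` is a mapping of header name to string value; keys are
    lowercased here so casing differences do not matter.
    """
    out = {}
    for name, value in headers.items():
        key = name.lower()
        if key.startswith("x-agumbe-"):
            try:
                out[key] = float(value)
            except ValueError:
                out[key] = value  # keep non-numeric values as-is
    return out

timings = parse_agumbe_headers({
    "Content-Type": "application/json",
    "x-agumbe-timing-total-ms": "184.2",
    "x-agumbe-timing-provider-ms": "151.0",
    "x-agumbe-estimated-cost-usd": "0.00042",
})
overhead = timings["x-agumbe-timing-total-ms"] - timings["x-agumbe-timing-provider-ms"]
```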

Example error response

When a request fails, the gateway returns a structured error object. Example:
{
  "error": {
    "message": "Model blocked by allowlist guardrail",
    "type": "invalid_request_error",
    "param": "model",
    "code": "guardrail_model_blocked"
  }
}
Typical error categories include:
  • authentication failures
  • validation failures
  • invalid model selection
  • app mismatch
  • guardrail policy blocks
  • rate limits
  • route configuration failures
  • upstream timeouts
  • provider execution failures
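Assuming the failed response body has been decoded, the error object shown above can be summarized before logging or retrying. Note the retryability rule below follows generic HTTP semantics (429 and 5xx), not a documented Agumbe contract:

```python
def parse_gateway_error(status_code, body):
    """Summarize a failed gateway response.

    Retryability follows generic HTTP semantics (429 and 5xx) --
    adjust to your own policy.
    """
    err = body.get("error", {})
    return {
        "message": err.get("message", "unknown error"),
        "code": err.get("code"),
        "param": err.get("param"),
        "retryable": status_code == 429 or status_code >= 500,
    }

# The error object from the example above, returned with HTTP 400.
info = parse_gateway_error(400, {
    "error": {
        "message": "Model blocked by allowlist guardrail",
        "type": "invalid_request_error",
        "param": "model",
        "code": "guardrail_model_blocked",
    }
})
```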

Example with fetch

If your backend runtime supports fetch, you can call the gateway directly with standard HTTP code.
const response = await fetch("https://api.agumbe.ai/api/v1/llm/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.AGUMBE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "smart-default",
    messages: [
      {
        role: "user",
        content: "Explain AI gateways briefly.",
      },
    ],
    max_completion_tokens: 180,
  }),
});

const data = await response.json();

if (!response.ok) {
  console.error("Gateway request failed", data);
} else {
  console.log(data.choices?.[0]?.message?.content ?? "");
}
This pattern is a good fit when you want a fully explicit HTTP integration without using an SDK wrapper.

Production recommendations

When integrating over HTTP, follow these guidelines:
  • keep the API key in server-side environment variables or a secret manager
  • do not expose the key in browser code
  • prefer aliases such as smart-default and embed-default
  • use app-scoped keys when the workload is fixed
  • attach request metadata for important workflows
  • handle structured error responses explicitly
  • log request IDs and relevant application identifiers
  • monitor timing headers during rollout and tuning
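For rate limits and upstream timeouts, a bounded retry with exponential backoff is a common companion to the guidelines above. A transport-agnostic sketch; the delays and attempt count are arbitrary choices, not gateway requirements:

```python
import time

def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it succeeds or attempts are exhausted.

    `send` returns (status_code, body); 429 and 5xx responses are retried
    with exponential backoff (0.5s, 1s, 2s, ...).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 429 or status >= 500:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
                continue
        return status, body
    return status, body

# Simulated transport: fails twice with 429, then succeeds.
responses = iter([(429, {}), (429, {}), (200, {"ok": True})])
delays = []
status, body = call_with_retries(lambda: next(responses), sleep=delays.append)
```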

When to choose HTTP

Choose direct HTTP integration when:
  • you are integrating from a language without a preferred SDK
  • you want low-level control over request and response handling
  • you want to build your own internal client wrapper
  • you are debugging at the transport level
  • you want a portable, language-neutral integration path