
This page shows how to integrate Agumbe AI Gateway from a Python application. Agumbe AI Gateway exposes a provider-compatible API surface, which means you can use a familiar Python SDK pattern and simply point it at the Agumbe base URL. This lets you get started quickly while keeping model routing, guardrails, request logging, and observability inside the gateway. For production use, the recommended pattern is to call the gateway from your backend service, worker, or server-side application, not directly from a browser-based client.

Before you begin

Make sure you have:
  • an Agumbe Gateway API key
  • the Agumbe base URL
  • a Python server-side environment
Base URL: https://api.agumbe.ai/api/v1/llm

Set your API key as an environment variable:

export AGUMBE_API_KEY="your_agumbe_gateway_api_key"

Install the SDK

pip install openai

Even though the package name is openai, you are pointing it at Agumbe AI Gateway, not directly at any one model provider.

Create a client

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

This client becomes your single entry point for chat and embeddings requests.

Send your first chat request

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "system",
            "content": "You are a concise assistant.",
        },
        {
            "role": "user",
            "content": "Explain what an AI gateway does in one paragraph.",
        },
    ],
    max_completion_tokens=220,
)

print(response.choices[0].message.content)

This is the recommended starting pattern for most teams:
  • use a stable alias such as smart-default
  • keep the integration server-side
  • start simple
  • evolve routing and guardrails later through the gateway

Send an embeddings request

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.embeddings.create(
    model="embed-default",
    input="Agumbe AI Gateway helps teams route, govern, and observe AI traffic.",
)

print(response.data[0].embedding)
Use embeddings when you are building search, retrieval, classification, clustering, or similarity workflows.
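For example, a simple similarity workflow embeds two texts and compares them with cosine similarity. The sketch below is illustrative: the cosine_similarity helper and the example inputs are not part of the gateway API; only the embed-default alias and the client setup come from this page.

import math
import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# The SDK accepts a list of inputs and returns one embedding per item.
response = agumbe.embeddings.create(
    model="embed-default",
    input=[
        "How do refunds work?",
        "Refunds are available within 14 days of purchase.",
    ],
)

score = cosine_similarity(response.data[0].embedding, response.data[1].embedding)
print(score)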

Use a specific app policy

If you are using a tenant-scoped API key, you can choose which app’s guardrails apply by sending agumbe_guardrails_app_id. In Python, the cleanest way to send gateway-specific extra fields is through extra_body.

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Draft a safe reply to this customer message.",
        },
    ],
    max_completion_tokens=180,
    extra_body={"agumbe_guardrails_app_id": "app_support"},
)

print(response.choices[0].message.content)

If you are using an app-scoped API key, the gateway applies the bound app policy automatically, so you usually do not need to send this field.

Attach request metadata

Agumbe supports request metadata that makes logs and observability much more useful. You can attach metadata such as:
  • workspace_id
  • namespace_id
  • source_service
  • operation
  • external_request_id
Example:

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Summarize this support ticket.",
        },
    ],
    max_completion_tokens=180,
    extra_body={
        "agumbe_guardrails_app_id": "app_support",
        "agumbe_metadata": {
            "workspace_id": "workspace_123",
            "source_service": "support-api",
            "operation": "ticket_summary",
            "external_request_id": "ticket_789",
        },
    },
)

print(response.choices[0].message.content)

This is especially useful in production, where teams need to connect gateway traffic back to internal systems and workflows.

Use grounding context

If groundedness checks are part of your app policy, you can send grounding context with the request.

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Answer this question using the supplied refund policy.",
        },
    ],
    extra_body={
        "agumbe_guardrails_app_id": "app_support",
        "agumbe_grounding_context": [
            "Refunds are available within 14 days of purchase.",
            "Support agents must not promise exceptions outside the published refund policy.",
        ],
    },
)

print(response.choices[0].message.content)

Read the response

A successful chat response includes the generated content and token usage.

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Write a one-line summary of this ticket.",
        },
    ],
)

text = response.choices[0].message.content
usage = response.usage

print(text)
print(usage)

You can also inspect gateway-specific response headers such as timing and estimated cost if your HTTP layer exposes raw response metadata (a sketch follows at the end of this section).

Structure your integration

A simple server-side structure works well for most teams:

app/
  clients/
    agumbe.py
  services/
    summarizer.py
  routes/
    support.py

Example client module:

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

Example service module:

from app.clients.agumbe import agumbe

def summarize_ticket(ticket_text: str) -> str:
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[
            {
                "role": "system",
                "content": "You summarize support tickets for an operations team.",
            },
            {
                "role": "user",
                "content": ticket_text,
            },
        ],
        max_completion_tokens=180,
        extra_body={"agumbe_guardrails_app_id": "app_support"},
    )
    return response.choices[0].message.content or ""

This pattern keeps your gateway setup centralized and makes the rest of the codebase easier to maintain.
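As noted above, you can read raw response headers when your HTTP layer exposes them. One way to do this with the openai Python SDK is its with_raw_response helper; this is a minimal sketch, and it makes no assumption about which header names the gateway sets.

raw = agumbe.chat.completions.with_raw_response.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Write a one-line summary of this ticket.",
        },
    ],
)

completion = raw.parse()  # the usual parsed response object
print(completion.choices[0].message.content)
print(dict(raw.headers))  # inspect whatever headers the gateway returned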

Error handling

Agumbe returns structured errors. In Python, you should catch errors and handle them deliberately.

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

try:
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[
            {
                "role": "user",
                "content": "Explain AI gateways briefly.",
            },
        ],
    )
    print(response.choices[0].message.content)
except Exception as error:
    print("Gateway request failed")
    print(error)

Typical failure cases include the following (a sketch for telling them apart follows the list):
  • invalid credentials
  • invalid model selection
  • app mismatch
  • guardrail policy blocks
  • rate limits
  • upstream timeout or provider failure
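Because the openai SDK raises typed exceptions, you can branch on them instead of catching everything. The sketch below is one way to map the failure cases above to handlers; which exception the gateway-specific cases (app mismatch, guardrail policy blocks) raise depends on the status code the gateway returns, so treat those comments as assumptions.

import openai

try:
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[{"role": "user", "content": "Explain AI gateways briefly."}],
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError:
    print("Invalid credentials")
except openai.RateLimitError:
    print("Rate limited; back off and retry")
except openai.APITimeoutError:
    print("Upstream timeout or provider failure")
except openai.APIStatusError as error:
    # Assumption: invalid model selection, app mismatch, and guardrail
    # policy blocks arrive here as structured HTTP errors.
    print("Gateway request failed:", error.status_code, error.message)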

Production recommendations

When integrating from Python, follow these guidelines:
  • keep the API key in server-side environment variables
  • do not expose the key in client-facing code
  • prefer aliases such as smart-default and embed-default
  • use app-scoped keys when the workload is fixed
  • attach request metadata for important workflows
  • send traffic through your backend, worker, or service layer
  • start with a small number of stable integration patterns
A good first production setup (sketched after the list) is usually:
  • one Python service or worker
  • one app-scoped key
  • one chat alias
  • one embeddings alias
  • one app policy
  • request metadata on every important workflow
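Putting those pieces together, a minimal sketch of that first setup follows. It assumes an app-scoped key (so no agumbe_guardrails_app_id field is needed) and uses the documented agumbe_metadata fields; the function names and metadata values are illustrative.

import os
from openai import OpenAI

# One client, configured once and reused across the service.
agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],  # app-scoped key
    base_url="https://api.agumbe.ai/api/v1/llm",
)

def chat(prompt: str, operation: str, external_request_id: str) -> str:
    # One chat alias, with metadata attached to every important workflow.
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[{"role": "user", "content": prompt}],
        extra_body={
            "agumbe_metadata": {
                "source_service": "support-api",
                "operation": operation,
                "external_request_id": external_request_id,
            }
        },
    )
    return response.choices[0].message.content or ""

def embed(text: str) -> list[float]:
    # One embeddings alias.
    response = agumbe.embeddings.create(model="embed-default", input=text)
    return response.data[0].embedding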