
This page shows how to integrate Agumbe AI Gateway from a Python application. Agumbe AI Gateway exposes a provider-compatible API surface, which means you can use a familiar Python SDK pattern and simply point it at the Agumbe base URL. This lets you get started quickly while keeping model routing, guardrails, request logging, and observability inside the gateway. For production use, the recommended pattern is to call the gateway from your backend service, worker, or server-side application, not directly from a browser-based client.

Before you begin

Make sure you have:
  • an Agumbe Gateway API key
  • the Agumbe base URL
  • a Python server-side environment
Base URL: https://api.agumbe.ai/api/v1/llm

Set your API key as an environment variable:

export AGUMBE_API_KEY="your_agumbe_gateway_api_key"

Install the SDK

pip install openai

Even though the package name is openai, you are pointing it at Agumbe AI Gateway, not directly at any one model provider.

Create a client

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

This client becomes your single entry point for chat and embeddings requests.

Send your first chat request

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "system",
            "content": "You are a concise assistant.",
        },
        {
            "role": "user",
            "content": "Explain what an AI gateway does in one paragraph.",
        },
    ],
    max_completion_tokens=220,
)

print(response.choices[0].message.content)

This is the recommended starting pattern for most teams:
  • use a stable alias such as smart-default
  • keep the integration server-side
  • start simple
  • evolve routing and guardrails later through the gateway

Send an embeddings request

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.embeddings.create(
    model="embed-default",
    input="Agumbe AI Gateway helps teams route, govern, and observe AI traffic.",
)

print(response.data[0].embedding)
Use embeddings when you are building search, retrieval, classification, clustering, or similarity workflows.
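For example, a simple similarity workflow embeds two texts and compares them with cosine similarity. The sketch below is illustrative: the cosine_similarity helper and the example inputs are not part of the gateway API; only the embed-default alias and the client setup come from this page.

import math
import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# The SDK accepts a list of inputs and returns one embedding per item.
response = agumbe.embeddings.create(
    model="embed-default",
    input=[
        "How do refunds work?",
        "Refunds are available within 14 days of purchase.",
    ],
)

score = cosine_similarity(response.data[0].embedding, response.data[1].embedding)
print(score)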

Use a specific app policy

If you are using a tenant-scoped API key, you can choose which app’s guardrails apply by sending agumbe_guardrails_app_id. In Python, the cleanest way to send gateway-specific extra fields is through extra_body.

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Draft a safe reply to this customer message.",
        },
    ],
    max_completion_tokens=180,
    extra_body={"agumbe_guardrails_app_id": "app_support"},
)

print(response.choices[0].message.content)

If you are using an app-scoped API key, the gateway applies the bound app policy automatically, so you usually do not need to send this field.

Attach request metadata

Agumbe supports request metadata that makes logs and observability much more useful. You can attach metadata such as:
  • workspace_id
  • namespace_id
  • source_service
  • operation
  • external_request_id
Example:

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Summarize this support ticket.",
        },
    ],
    max_completion_tokens=180,
    extra_body={
        "agumbe_guardrails_app_id": "app_support",
        "agumbe_metadata": {
            "workspace_id": "workspace_123",
            "source_service": "support-api",
            "operation": "ticket_summary",
            "external_request_id": "ticket_789",
        },
    },
)

print(response.choices[0].message.content)

This is especially useful in production, where teams need to connect gateway traffic back to internal systems and workflows.

Use grounding context

If groundedness checks are part of your app policy, you can send grounding context with the request.

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Answer this question using the supplied refund policy.",
        },
    ],
    extra_body={
        "agumbe_guardrails_app_id": "app_support",
        "agumbe_grounding_context": [
            "Refunds are available within 14 days of purchase.",
            "Support agents must not promise exceptions outside the published refund policy.",
        ],
    },
)

print(response.choices[0].message.content)

Read the response

A successful chat response includes the generated content and token usage.

response = agumbe.chat.completions.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Write a one-line summary of this ticket.",
        },
    ],
)

text = response.choices[0].message.content
usage = response.usage

print(text)
print(usage)

You can also inspect gateway-specific response headers such as timing and estimated cost if your HTTP layer exposes raw response metadata (a sketch follows at the end of this section).

Structure your integration

A simple server-side structure works well for most teams:

app/
  clients/
    agumbe.py
  services/
    summarizer.py
  routes/
    support.py

Example client module:

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

Example service module:

from app.clients.agumbe import agumbe

def summarize_ticket(ticket_text: str) -> str:
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[
            {
                "role": "system",
                "content": "You summarize support tickets for an operations team.",
            },
            {
                "role": "user",
                "content": ticket_text,
            },
        ],
        max_completion_tokens=180,
        extra_body={"agumbe_guardrails_app_id": "app_support"},
    )
    return response.choices[0].message.content or ""

This pattern keeps your gateway setup centralized and makes the rest of the codebase easier to maintain.
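As noted above, you can read raw response headers when your HTTP layer exposes them. One way to do this with the openai Python SDK is its with_raw_response helper; this is a minimal sketch, and it makes no assumption about which header names the gateway sets.

raw = agumbe.chat.completions.with_raw_response.create(
    model="smart-default",
    messages=[
        {
            "role": "user",
            "content": "Write a one-line summary of this ticket.",
        },
    ],
)

completion = raw.parse()  # the usual parsed response object
print(completion.choices[0].message.content)
print(dict(raw.headers))  # inspect whatever headers the gateway returned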

Error handling

Agumbe returns structured errors. In Python, you should catch errors and handle them deliberately.

import os
from openai import OpenAI

agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],
    base_url="https://api.agumbe.ai/api/v1/llm",
)

try:
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[
            {
                "role": "user",
                "content": "Explain AI gateways briefly.",
            },
        ],
    )
    print(response.choices[0].message.content)
except Exception as error:
    print("Gateway request failed")
    print(error)

Typical failure cases include the following (a sketch for telling them apart follows the list):
  • invalid credentials
  • invalid model selection
  • app mismatch
  • guardrail policy blocks
  • rate limits
  • upstream timeout or provider failure
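Because the openai SDK raises typed exceptions, you can branch on them instead of catching everything. The sketch below is one way to map the failure cases above to handlers; which exception the gateway-specific cases (app mismatch, guardrail policy blocks) raise depends on the status code the gateway returns, so treat those comments as assumptions.

import openai

try:
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[{"role": "user", "content": "Explain AI gateways briefly."}],
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError:
    print("Invalid credentials")
except openai.RateLimitError:
    print("Rate limited; back off and retry")
except openai.APITimeoutError:
    print("Upstream timeout or provider failure")
except openai.APIStatusError as error:
    # Assumption: invalid model selection, app mismatch, and guardrail
    # policy blocks arrive here as structured HTTP errors.
    print("Gateway request failed:", error.status_code, error.message)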

Production recommendations

When integrating from Python, follow these guidelines:
  • keep the API key in server-side environment variables
  • do not expose the key in client-facing code
  • prefer aliases such as smart-default and embed-default
  • use app-scoped keys when the workload is fixed
  • attach request metadata for important workflows
  • send traffic through your backend, worker, or service layer
  • start with a small number of stable integration patterns
A good first production setup (sketched after the list) is usually:
  • one Python service or worker
  • one app-scoped key
  • one chat alias
  • one embeddings alias
  • one app policy
  • request metadata on every important workflow
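Putting those pieces together, a minimal sketch of that first setup follows. It assumes an app-scoped key (so no agumbe_guardrails_app_id field is needed) and uses the documented agumbe_metadata fields; the function names and metadata values are illustrative.

import os
from openai import OpenAI

# One client, configured once and reused across the service.
agumbe = OpenAI(
    api_key=os.environ["AGUMBE_API_KEY"],  # app-scoped key
    base_url="https://api.agumbe.ai/api/v1/llm",
)

def chat(prompt: str, operation: str, external_request_id: str) -> str:
    # One chat alias, with metadata attached to every important workflow.
    response = agumbe.chat.completions.create(
        model="smart-default",
        messages=[{"role": "user", "content": prompt}],
        extra_body={
            "agumbe_metadata": {
                "source_service": "support-api",
                "operation": operation,
                "external_request_id": external_request_id,
            }
        },
    )
    return response.choices[0].message.content or ""

def embed(text: str) -> list[float]:
    # One embeddings alias.
    response = agumbe.embeddings.create(model="embed-default", input=text)
    return response.data[0].embedding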