Before you begin
Make sure you have:
- an Agumbe Gateway API key
- the Agumbe base URL
- a server-side environment that can send HTTPS requests
The base URL is:
https://api.agumbe.ai/api/v1/llm
Set your API key as an environment variable:
export AGUMBE_API_KEY="your_agumbe_gateway_api_key"
Authentication
Authenticate with a bearer token by sending your API key in the Authorization header. Example header:
Authorization: Bearer $AGUMBE_API_KEY
Most requests should also send:
Content-Type: application/json
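The following TypeScript sketch combines the base URL with the two headers above. The names `AGUMBE_BASE_URL`, `buildAgumbeHeaders`, and `chatUrl` are illustrative and are not part of an official SDK.

```typescript
// Base URL from the "Before you begin" section.
const AGUMBE_BASE_URL = "https://api.agumbe.ai/api/v1/llm";

// Illustrative helper (not an official SDK function): builds the headers
// that most gateway requests send.
function buildAgumbeHeaders(apiKey: string): Record<string, string> {
  if (!apiKey) {
    // Fail fast instead of sending a request the gateway will reject.
    throw new Error("AGUMBE_API_KEY is not set");
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}

// Resolves to POST /api/v1/llm/chat/completions on the gateway host.
const chatUrl = `${AGUMBE_BASE_URL}/chat/completions`;
```

In a Node.js backend, you would typically pass `process.env.AGUMBE_API_KEY ?? ""`, so a missing key fails immediately instead of producing an authentication error later.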
Your first chat request
Use the chat completions endpoint to send a conversational request through the gateway. Endpoint:
POST /api/v1/llm/chat/completions
For a first integration:
- use a stable alias such as smart-default
- keep the integration server-side
- start with one clean request shape
- evolve routing and guardrails later through the gateway
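The request can be sketched in TypeScript as follows. This sketch assumes the gateway accepts an OpenAI-compatible chat completions body (`model` plus a `messages` array) and returns the reply under `choices[0].message.content`. This guide does not spell out either shape, and `buildChatRequest`, `extractChatText`, and `HttpRequest` are hypothetical names.

```typescript
const AGUMBE_BASE_URL = "https://api.agumbe.ai/api/v1/llm";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Plain description of an HTTP call; pass its fields straight to fetch.
interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper. Assumes an OpenAI-compatible { model, messages } body.
function buildChatRequest(
  apiKey: string,
  messages: ChatMessage[],
  model = "smart-default", // stable alias rather than a provider model ID
): HttpRequest {
  return {
    url: `${AGUMBE_BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Assumes an OpenAI-style response: choices[0].message.content holds the reply.
function extractChatText(response: unknown): string {
  const choices = (response as {
    choices?: { message?: { content?: unknown } }[];
  } | null)?.choices;
  const content = choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Unexpected chat response shape");
  }
  return content;
}
```

To send the request from a runtime with `fetch`, pass the fields through with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, and then call `extractChatText` on the parsed JSON.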
Example chat response
A successful response looks like this:
Send an embeddings request
Use the embeddings endpoint when you need vector representations for search, retrieval, clustering, or similarity workflows. Endpoint:
POST /api/v1/llm/embeddings
Example:
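This sketch assumes an OpenAI-compatible embeddings body with `model` and `input` fields, which this guide does not confirm. `buildEmbeddingsRequest` is a hypothetical helper name. The default model is the `embed-default` alias recommended later in this guide.

```typescript
const AGUMBE_BASE_URL = "https://api.agumbe.ai/api/v1/llm";

interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper. Assumes an OpenAI-compatible { model, input } body,
// where input is one string or a batch of strings.
function buildEmbeddingsRequest(
  apiKey: string,
  input: string | string[],
  model = "embed-default", // stable alias rather than a provider model ID
): HttpRequest {
  return {
    url: `${AGUMBE_BASE_URL}/embeddings`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, input }),
  };
}
```

As with chat, you can send the request with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.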
Example embeddings response
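The following sketch assumes the gateway returns an OpenAI-compatible embeddings body, meaning a `data` array whose items carry an `embedding` vector and an `index`. Under that assumption, it reads the vectors in input order and compares two of them with cosine similarity, a common score for the similarity and retrieval workflows mentioned above. `extractEmbeddings` and `cosineSimilarity` are hypothetical names.

```typescript
// Assumed item shape of an OpenAI-compatible embeddings response.
interface EmbeddingItem {
  index?: number;
  embedding: number[];
}

function extractEmbeddings(response: unknown): number[][] {
  const data = (response as { data?: EmbeddingItem[] } | null)?.data;
  if (!Array.isArray(data)) {
    throw new Error("Unexpected embeddings response shape");
  }
  // Sort a copy by index (when present) so vectors line up with the inputs.
  return data
    .slice()
    .sort((a, b) => (a.index ?? 0) - (b.index ?? 0))
    .map((item) => item.embedding);
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything instead of dividing by 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```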
Use a specific app policy
If you are using a tenant-scoped API key, you can choose which app’s guardrails apply by sending agumbe_guardrails_app_id in the request body.
Attach request metadata
Agumbe supports request metadata that helps teams trace traffic across systems and workflows. Supported metadata fields include:
- workspace_id
- xnamespace_id
- source_service
- operation
- external_request_id
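The following sketch builds a chat body that selects an app policy through `agumbe_guardrails_app_id` and attaches request metadata. The field names come from this guide. Two points are assumptions: the metadata fields are sent as top-level body properties (this guide names the fields but not where they go), and the `messages` shape is OpenAI-style. `buildChatBody` is a hypothetical name.

```typescript
interface RequestMetadata {
  workspace_id?: string;
  xnamespace_id?: string;
  source_service?: string;
  operation?: string;
  external_request_id?: string;
}

interface ChatBodyOptions {
  model?: string;
  appId?: string; // only meaningful with a tenant-scoped API key
  metadata?: RequestMetadata;
}

const METADATA_FIELDS: (keyof RequestMetadata)[] = [
  "workspace_id",
  "xnamespace_id",
  "source_service",
  "operation",
  "external_request_id",
];

// Hypothetical helper. Assumes metadata travels as top-level body fields.
function buildChatBody(
  messages: { role: string; content: string }[],
  options: ChatBodyOptions = {},
): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: options.model ?? "smart-default",
    messages,
  };
  // Choose which app's guardrails apply to this request.
  if (options.appId) {
    body.agumbe_guardrails_app_id = options.appId;
  }
  // Copy only the metadata fields the caller actually set.
  for (const field of METADATA_FIELDS) {
    const value = options.metadata?.[field];
    if (value !== undefined) {
      body[field] = value;
    }
  }
  return body;
}
```

Omitting unset fields keeps the body minimal and makes it easy to see which identifiers each workflow actually sends.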
Use grounding context
If groundedness checks are enabled for your app policy, you can send grounding context with the request.
List available models
Use the models endpoint to inspect the gateway catalog. Endpoint:
GET /api/v1/llm/models
Use it to:
- inspect the available model catalog
- discover aliases
- confirm whether a model supports chat or embeddings
- populate a model selector in a UI or internal tool
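The following sketch queries the catalog, for example to populate a model selector. It assumes the response is an OpenAI-style list (`{ data: [{ id, ... }] }`), which this guide does not confirm. `buildModelsRequest` and `listModelIds` are hypothetical names.

```typescript
const AGUMBE_BASE_URL = "https://api.agumbe.ai/api/v1/llm";

interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function buildModelsRequest(apiKey: string): HttpRequest {
  return {
    url: `${AGUMBE_BASE_URL}/models`,
    method: "GET",
    // A GET request has no JSON body, so only the bearer token is sent.
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Assumes an OpenAI-style list response; entries without a string id are skipped.
function listModelIds(response: unknown): string[] {
  const data = (response as { data?: { id?: unknown }[] } | null)?.data;
  if (!Array.isArray(data)) {
    throw new Error("Unexpected models response shape");
  }
  return data
    .map((model) => (model && typeof model.id === "string" ? model.id : null))
    .filter((id): id is string => id !== null);
}
```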
Read request logs
Use the request logs endpoint to inspect recent traffic for the authenticated tenant. Endpoint:
GET /api/v1/llm/requests
Example:
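The following TypeScript sketch builds the logs request. It sends no query parameters, because this guide does not describe any. `buildGetRequest` is a hypothetical helper that works for any GET path under the base URL.

```typescript
const AGUMBE_BASE_URL = "https://api.agumbe.ai/api/v1/llm";

interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper for GET endpoints under the gateway base URL.
function buildGetRequest(apiKey: string, path: string): HttpRequest {
  // Accept "requests" or "/requests" and produce the same URL.
  const normalized = path.startsWith("/") ? path : `/${path}`;
  return {
    url: `${AGUMBE_BASE_URL}${normalized}`,
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Targets GET /api/v1/llm/requests for the tenant that owns the key.
const logsRequest = buildGetRequest("your_agumbe_gateway_api_key", "/requests");
```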
Read and update guardrail policies
You can also read and update guardrail policies through HTTP.
Response headers
Successful chat and embeddings responses may include useful response headers. Common headers include:
- x-agumbe-timing-total-ms
- x-agumbe-timing-model-resolve-ms
- x-agumbe-timing-guardrail-config-ms
- x-agumbe-timing-guardrail-input-ms
- x-agumbe-timing-provider-ms
- x-agumbe-timing-guardrail-output-ms
- x-agumbe-timing-request-log-ms
- x-agumbe-timing-usage-emit-ms
- x-agumbe-timing-side-effects-ms
- x-agumbe-timing-gateway-overhead-ms
- x-agumbe-estimated-cost-usd
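During rollout and tuning, a small parser can turn these headers into numbers for logging or dashboards. In the sketch below, `readAgumbeHeaders` is a hypothetical name. It accepts any object with a `get(name)` method, which the Fetch API's `Headers` object satisfies. Missing or non-numeric headers are skipped rather than treated as zero.

```typescript
// Structural type: the Fetch API's Headers object satisfies this.
interface HeaderReader {
  get(name: string): string | null;
}

interface AgumbeResponseStats {
  timingsMs: Record<string, number>; // keyed by stage, e.g. "total", "provider"
  estimatedCostUsd: number | null;
}

// Stage names taken from the x-agumbe-timing-<stage>-ms headers listed above.
const TIMING_STAGES = [
  "total",
  "model-resolve",
  "guardrail-config",
  "guardrail-input",
  "provider",
  "guardrail-output",
  "request-log",
  "usage-emit",
  "side-effects",
  "gateway-overhead",
];

// Returns null for missing, empty, or non-numeric values instead of 0.
function parseNumericHeader(raw: string | null): number | null {
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return isNaN(value) ? null : value;
}

function readAgumbeHeaders(headers: HeaderReader): AgumbeResponseStats {
  const timingsMs: Record<string, number> = {};
  for (const stage of TIMING_STAGES) {
    const value = parseNumericHeader(headers.get(`x-agumbe-timing-${stage}-ms`));
    if (value !== null) timingsMs[stage] = value;
  }
  return {
    timingsMs,
    estimatedCostUsd: parseNumericHeader(headers.get("x-agumbe-estimated-cost-usd")),
  };
}
```

With `fetch`, call `readAgumbeHeaders(response.headers)` after a successful chat or embeddings request.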
Example error response
When a request fails, the gateway returns a structured error object. Error categories include:
- authentication failures
- validation failures
- invalid model selection
- app mismatch
- guardrail policy blocks
- rate limits
- route configuration failures
- upstream timeouts
- provider execution failures
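Because failures come back as structured objects, it helps to branch on them explicitly instead of assuming success. The fetch-based sketch below does this. The fields it reads (`error.message`, `error.type`, `error.code`) assume an OpenAI-style error body, which this guide does not confirm, so the sketch falls back to the raw response text when that shape is absent. `postJson`, `parseGatewayError`, `FetchLike`, and `GatewayResult` are hypothetical names. The fetch function is passed in, so the same code works with the runtime's global `fetch` or with a test double.

```typescript
// Minimal shape of fetch this sketch relies on; Node 18+'s global fetch fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

interface GatewayErrorInfo {
  status: number;
  message: string;
  type?: string;
  code?: string;
}

type GatewayResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: GatewayErrorInfo };

// Assumption: error bodies look like { error: { message, type, code } }.
function parseGatewayError(status: number, rawBody: string): GatewayErrorInfo {
  try {
    const parsed = JSON.parse(rawBody) as {
      error?: { message?: unknown; type?: unknown; code?: unknown };
    } | null;
    const err = parsed?.error;
    if (err && typeof err.message === "string") {
      return {
        status,
        message: err.message,
        type: typeof err.type === "string" ? err.type : undefined,
        code: typeof err.code === "string" ? err.code : undefined,
      };
    }
  } catch {
    // Not JSON: fall through and use the raw text instead.
  }
  return { status, message: rawBody || `HTTP ${status}` };
}

// POST a JSON payload and return a result instead of throwing on HTTP errors.
async function postJson<T>(
  fetchFn: FetchLike,
  url: string,
  apiKey: string,
  payload: unknown,
): Promise<GatewayResult<T>> {
  const response = await fetchFn(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  const text = await response.text();
  if (!response.ok) {
    return { ok: false, error: parseGatewayError(response.status, text) };
  }
  return { ok: true, data: JSON.parse(text) as T };
}
```

For example, `postJson(fetch, "https://api.agumbe.ai/api/v1/llm/chat/completions", apiKey, body)` resolves to either `{ ok: true, data }` or `{ ok: false, error }`. The caller then decides how to handle each failure category.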
Example with fetch
If your backend runtime supports fetch, you can call the gateway directly with standard HTTP code.
Production recommendations
When integrating over HTTP, follow these guidelines:
- keep the API key in server-side environment variables or a secret manager
- do not expose the key in browser code
- prefer aliases such as smart-default and embed-default
- use app-scoped keys when the workload is fixed
- attach request metadata for important workflows
- handle structured error responses explicitly
- log request IDs and relevant application identifiers
- monitor timing headers during rollout and tuning
When to choose HTTP
Choose direct HTTP integration when:
- you are integrating from a language without a preferred SDK
- you want low-level control over request and response handling
- you want to build your own internal client wrapper
- you are debugging at the transport level
- you want a portable, language-neutral integration path