Guardrails in Agumbe AI Gateway help teams control how AI traffic behaves before it reaches a model and after a model returns a response. They are designed to give customers a practical, app-level policy system for real production workloads. Instead of pushing safety, usage, and model controls into every individual application, Agumbe lets you define those controls once and enforce them consistently through the gateway. For most teams, guardrails are one of the main reasons to use a gateway in the first place. They turn AI access from a raw model integration into a governed platform capability.

What guardrails do

A guardrail policy tells the gateway how to inspect, modify, or block requests and responses for a specific app. Depending on how a policy is configured, the gateway can:
  • detect risky or disallowed content
  • redact sensitive content before it reaches a model
  • block a request entirely
  • inspect model output before returning it
  • restrict which models an app is allowed to use
  • cap token usage
  • apply app-level request rate limits
This gives teams a way to manage both content safety and operational control through one policy layer.

Why guardrails matter

AI applications often need more than authentication and model access. They also need rules. For example, a team may want to:
  • prevent prompt injection attempts from passing through unchanged
  • redact personally identifiable information before it reaches a model
  • block secrets or credentials from being exposed in prompts or outputs
  • restrict an app to a small set of approved models
  • keep responses grounded in supplied context
  • limit output size and request rate for a specific workload
Without a centralized policy layer, these protections tend to be implemented inconsistently across services. Guardrails solve that problem by moving the enforcement point into the gateway.

Guardrails are app-level

In Agumbe, guardrails are stored and enforced at the app level. That means each app can have its own policy based on its purpose, sensitivity, risk level, and operational needs. This is important because not all AI workloads are the same. A support workflow, an internal knowledge assistant, and a marketing content tool usually should not share the exact same rules. Agumbe lets you keep those policies separate while still using one common gateway.

How a guardrail policy is selected

The gateway determines which guardrail policy to apply based on the app context of the request. There are two common patterns:

App-scoped API key

If the request is made with an app-scoped API key, the gateway automatically uses that app’s guardrail policy. This is the simplest and safest production pattern when one workload should always use one app policy.

Tenant-scoped API key

If the request is made with a tenant-scoped API key, the caller can select the app policy for that request by sending the agumbe_guardrails_app_id field in the request body. Example:
{
  "model": "smart-default",
  "messages": [
    {
      "role": "user",
      "content": "Review this customer message and suggest a reply."
    }
  ],
  "agumbe_guardrails_app_id": "app_support"
}
If an app-scoped key is used with a different app ID, the gateway rejects the request with an app_mismatch error.
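
The app-selection pattern above can be sketched client-side. The helper below is illustrative, not part of any Agumbe SDK; it just shows attaching agumbe_guardrails_app_id to an otherwise standard chat payload when using a tenant-scoped key.

```python
def with_guardrails_app(payload: dict, app_id: str) -> dict:
    """Return a copy of a chat payload with agumbe_guardrails_app_id set,
    selecting the app guardrail policy for this request (tenant-scoped
    keys only; app-scoped keys select the policy implicitly)."""
    selected = dict(payload)
    selected["agumbe_guardrails_app_id"] = app_id
    return selected

base = {
    "model": "smart-default",
    "messages": [{"role": "user", "content": "Suggest a reply."}],
}
request = with_guardrails_app(base, "app_support")
```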

When guardrails are applied

Guardrails are applied during request execution in three stages.

1. Request stage

At the request stage, the gateway can enforce controls such as:
  • allowed model restrictions
  • token caps
  • app-level rate limits
These checks happen before the provider request is executed.
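
The request-stage flow can be sketched as follows. This is an illustrative mirror of the checks, not the gateway's actual implementation; the field names (allowedModels, maxTokens) come from the example policy later on this page.

```python
def request_stage_checks(policy: dict, model: str, requested_tokens: int):
    """Run the pre-provider checks in order: model allowlist first,
    then token cap. Returns (allowed, effective_max_tokens, reason)."""
    allowed_models = policy.get("allowedModels")
    if allowed_models and model not in allowed_models:
        # Requested model is outside the allowlist: block outright.
        return False, 0, "guardrail_model_blocked"
    cap = policy.get("maxTokens")
    # Cap the request to the policy maximum rather than rejecting it.
    effective = min(requested_tokens, cap) if cap else requested_tokens
    return True, effective, None
```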

2. Input stage

At the input stage, the gateway can inspect prompt content or embeddings input and decide whether to detect, redact, or block content based on the policy. This is where controls such as prompt injection detection, PII handling, secrets handling, and denied topic checks are applied to incoming data.

3. Output stage

At the output stage, the gateway can inspect the model response before returning it to the caller. This is where controls such as output safety, PII checks, secrets checks, denied topics, and groundedness evaluation are applied to generated text.

Supported guardrail controls

Agumbe currently supports the following guardrail policy fields.

Prompt injection

Use this to inspect direct prompt injection attempts in request content. This helps identify or block content intended to override instructions, reveal hidden prompts, bypass safety, or manipulate model behavior. Typical use cases:
  • customer-facing assistants
  • internal copilots
  • retrieval-augmented generation pipelines
  • agents that read user-provided text

Indirect prompt injection

Use this to inspect content that may contain embedded or retrieved instructions designed to hijack model behavior. This is especially relevant when your system works with documents, knowledge bases, retrieved content, or long-form user inputs. Typical use cases:
  • document Q&A
  • retrieval systems
  • knowledge assistants
  • agentic workflows

PII

Use this to detect, redact, or block personally identifiable information in prompts and outputs. This helps reduce the chance of sending sensitive user data to models unnecessarily and helps prevent sensitive output from being returned to callers. Typical examples include:
  • email addresses
  • phone numbers
  • payment card numbers
  • other personal identifiers
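
A redact-mode PII control can be approximated with pattern matching. The sketch below is a deliberately simplified stand-in, assuming regex-based detection; a production control would cover far more identifier types and edge cases.

```python
import re

# Hypothetical client-side mirror of a "redact"-mode PII control.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),  # 13-19 digit card-like runs
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with labeled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text
```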

Secrets

Use this to detect, redact, or block credentials and secret material. This is useful for preventing accidental leakage of tokens, keys, private credentials, or secret-bearing payloads. Typical examples include:
  • API keys
  • private keys
  • access tokens
  • JWTs
  • cloud credentials
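
Secret detection follows the same shape. The patterns below are illustrative only, assuming a handful of common secret formats; real secret scanners maintain much larger rule sets.

```python
import re

# Illustrative patterns only; not Agumbe's actual detection rules.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9]{20,}"),            # API-key-like tokens
    re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),      # JWT-like three-part structure
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

def contains_secret(text: str) -> bool:
    """Return True if any secret-like pattern appears in the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```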

Denied topics

Use this to detect or block requests and outputs that relate to topics your app should not handle. This gives teams a simple way to define domain-level exclusions for an app. Examples might include:
  • legal advice
  • medical diagnosis
  • self-harm content
  • prohibited operational topics
  • restricted internal content categories
The policy can include a deniedTopicsList, which the gateway checks against prompt and output content.
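
The deniedTopicsList check can be sketched as below. This naive substring match only illustrates the policy shape; a production check would classify topics semantically rather than by literal string.

```python
def check_denied_topics(text: str, denied_topics: list) -> list:
    """Return the denied topics that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [topic for topic in denied_topics if topic.lower() in lowered]
```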

Output safety

Use this to inspect generated output for unsafe or disallowed content patterns. This helps prevent the gateway from returning harmful generated text to the caller. Typical examples include:
  • instructions for harmful activity
  • credential theft guidance
  • unsafe exploit-oriented output
  • phishing or malicious operational guidance

Groundedness

Use this to check whether a generated answer stays anchored to context supplied by the caller. This is especially useful for retrieval-augmented systems where the response should remain tied to known source material. The gateway can use agumbe_grounding_context from the request to evaluate whether the generated response stays consistent with the provided grounding context.
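
To make the inputs concrete, here is a crude lexical-overlap heuristic over agumbe_grounding_context. This is not how the gateway evaluates groundedness (real evaluation would use a semantic judge); it only shows what "anchored to the supplied context" means mechanically.

```python
def groundedness_score(answer: str, grounding_context: list) -> float:
    """Fraction of the answer's alphabetic words that also appear
    somewhere in the supplied grounding context. Purely illustrative."""
    context_words = set(" ".join(grounding_context).lower().split())
    answer_words = [w for w in answer.lower().split() if w.isalpha()]
    if not answer_words:
        return 0.0
    hits = sum(1 for w in answer_words if w in context_words)
    return hits / len(answer_words)
```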

Allowed models

Use this to restrict which models an app is allowed to use. This is useful when a team wants to:
  • standardize approved models
  • limit usage to tested models only
  • avoid high-cost or unapproved models
  • separate development and production model access
If the requested or resolved model is not in the allowlist, the gateway blocks the request.

Max tokens

Use this to cap token usage for a specific app. This helps teams control output size and reduce cost or misuse risk. If a request asks for more tokens than the app policy allows, the gateway caps the request to the configured maximum.

Rate limit per minute

Use this to define an app-level request rate limit. This gives teams a simple way to constrain how much traffic a workload can send in a short window.
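
A per-minute rate limit is commonly implemented as a sliding window. The class below is a sketch of that idea, not the gateway's actual implementation of rateLimitPerMinute.

```python
import time
from collections import deque

class PerMinuteRateLimiter:
    """Sliding-window limiter: allow at most `limit` requests per 60 seconds."""

    def __init__(self, limit):
        self.limit = limit
        self.timestamps = deque()

    def allow(self, now=None):
        """Return True and record the request if under the limit."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```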

Guardrail modes

Most content-oriented guardrails support four policy modes:
  • off
  • detect
  • redact
  • block
These modes let teams decide how strict each control should be.

Off

The gateway does not apply that control. Use this when the guardrail is not relevant to the app.

Detect

The gateway records that a policy match occurred but allows the content to continue unchanged. Use this when you want visibility first, before you start enforcing stronger actions. This is a good starting point for new teams or new workloads.

Redact

The gateway replaces matched content before continuing. Use this when you want to allow the request or response to proceed, but you do not want specific sensitive content to pass through unchanged. This is often a strong default for PII and secrets.

Block

The gateway rejects the request or response when a match occurs. Use this when the content should never be allowed for the app. This is appropriate for high-risk workflows or clearly disallowed content classes.
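
The four modes reduce to a small dispatch, sketched below. The function and return shape are ours, for illustration: given a mode and whether the control matched, decide whether content passes unchanged, is redacted, or is blocked.

```python
def apply_mode(mode, text, matched, redact):
    """Dispatch on a guardrail mode. Returns (text, blocked, detected)."""
    if mode == "off" or not matched:
        return text, False, False       # control disabled or nothing matched
    if mode == "detect":
        return text, False, True        # record the match, pass through
    if mode == "redact":
        return redact(text), False, True  # replace matched content, continue
    if mode == "block":
        return text, True, True         # reject the request or response
    raise ValueError("unknown mode: " + mode)
```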

Example guardrail policy

Here is a representative policy payload:
{
  "app_id": "app_support",
  "settings": {
    "promptInjection": "detect",
    "indirectPromptInjection": "detect",
    "pii": "redact",
    "secrets": "redact",
    "deniedTopics": "detect",
    "outputSafety": "detect",
    "groundedness": "detect",
    "deniedTopicsList": ["legal advice", "medical diagnosis"],
    "allowedModels": ["smart-default", "reasoning"],
    "maxTokens": 1024,
    "rateLimitPerMinute": 60
  }
}
This policy allows the app to operate normally while still adding meaningful protections and operational controls.

Example: request with app selection

{
  "model": "smart-default",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise customer support assistant."
    },
    {
      "role": "user",
      "content": "My card number is 4111 1111 1111 1111. Can you help me update billing?"
    }
  ],
  "agumbe_guardrails_app_id": "app_support"
}
With a policy that redacts PII, the gateway can sanitize sensitive values before forwarding the request.

Example: request with grounding context

{
  "model": "smart-default",
  "messages": [
    {
      "role": "user",
      "content": "Answer this question using the supplied support policy."
    }
  ],
  "agumbe_guardrails_app_id": "app_support",
  "agumbe_grounding_context": [
    "Refunds are allowed within 14 days of purchase.",
    "Support agents must not promise exceptions outside the published refund policy."
  ]
}
If groundedness is enabled, the gateway can inspect whether the output remains aligned with the supplied context.

Guardrail traces in responses

When a guardrail policy is applied, the gateway can attach an agumbe_guardrails object to the response. This gives the caller structured visibility into what happened during policy enforcement. A trace can include:
  • whether a guardrail policy was applied
  • whether the subject was a session or app credential
  • which app policy was used
  • which decisions were made
  • whether content was detected, redacted, blocked, capped, or rate-limited
This is useful for debugging, QA, and operational review.

Example response fragment

{
  "agumbe_guardrails": {
    "applied": true,
    "subject": "app",
    "appId": "app_support",
    "decisions": [
      {
        "guardrail": "pii",
        "stage": "input",
        "action": "redacted",
        "mode": "redact",
        "detail": "Redacted payment card number"
      }
    ]
  }
}
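
A caller can fold a trace like the one above into log lines. The helper below is a hypothetical client-side utility, assuming the trace shape shown in this example.

```python
def summarize_guardrail_trace(response: dict) -> list:
    """Return 'stage/guardrail: action' strings from an agumbe_guardrails
    trace, or an empty list if no policy was applied."""
    trace = response.get("agumbe_guardrails")
    if not trace or not trace.get("applied"):
        return []
    return [
        "{}/{}: {}".format(d["stage"], d["guardrail"], d["action"])
        for d in trace.get("decisions", [])
    ]
```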

How guardrails affect production behavior

Guardrails shape both safety and operational behavior. That means they should be treated as a product and platform configuration concern, not only a model concern. For example, the same model may behave very differently depending on:
  • whether prompt injection is detected or blocked
  • whether secrets are redacted
  • whether the app is limited to approved models
  • whether groundedness is enforced
  • whether max tokens are capped aggressively
  • whether request rate limits are strict or permissive
This is why app-level policies are so valuable. They let teams create a specific operating posture for each workload. For most teams, the best approach is to adopt guardrails gradually.


Start with visibility

Begin with detect for controls such as:
  • prompt injection
  • indirect prompt injection
  • denied topics
  • output safety
  • groundedness
This helps you understand how your traffic behaves before you start blocking requests.

Redact sensitive data early

For pii and secrets, redact is often a strong practical default. This reduces risk while still allowing useful application traffic to continue.

Tighten model controls

Use allowedModels, maxTokens, and rateLimitPerMinute early in production. These controls are often low-friction and high-value.

Move to blocking where appropriate

Once you understand real traffic patterns, move high-risk controls to block where needed. This is especially helpful for regulated, customer-facing, or high-sensitivity workloads.

Best practices

Keep policies app-specific

Do not try to make one policy fit every workload. Different apps have different risk profiles and operational needs.

Use app-scoped keys when policies should never vary

If a workload should always use one guardrail policy, app-scoped API keys reduce ambiguity and improve safety.

Start simple

You do not need a very large policy on day one. A focused policy with PII, secrets, allowed models, token caps, and basic prompt injection detection is often enough to start well.

Review traces and logs

Use request logs and guardrail traces to understand how policies are behaving in real traffic.

Pair guardrails with model strategy

Guardrails are strongest when combined with sensible aliases, approved model lists, and production environment separation.

Common errors

Guardrail enforcement can produce structured errors when the gateway blocks a request. Examples include:
  • guardrail_blocked
  • guardrail_model_blocked
  • guardrail_rate_limit_exceeded
A typical error shape looks like this:
{
  "error": {
    "message": "Model blocked by allowlist guardrail",
    "type": "invalid_request_error",
    "param": "model",
    "code": "guardrail_model_blocked"
  }
}
Or:
{
  "error": {
    "message": "Request blocked by guardrail rate limit",
    "type": "rate_limit_error",
    "param": "rateLimitPerMinute",
    "code": "guardrail_rate_limit_exceeded"
  }
}
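
Clients can branch on these error codes. The helper below is an illustrative sketch, assuming the error bodies shown above: rate-limit blocks are worth retrying later, other guardrail blocks usually mean the request itself must change.

```python
RETRYABLE_GUARDRAIL_CODES = {"guardrail_rate_limit_exceeded"}

def classify_guardrail_error(body: dict) -> str:
    """Map a guardrail error body to a coarse client action:
    'retry' for rate limits, 'fix_request' for other guardrail
    blocks, 'other' for non-guardrail errors."""
    code = body.get("error", {}).get("code", "")
    if code in RETRYABLE_GUARDRAIL_CODES:
        return "retry"
    if code.startswith("guardrail_"):
        return "fix_request"
    return "other"
```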
Recommended starting policy

If you are setting up guardrails for the first time, start with this approach:
  • promptInjection: detect
  • indirectPromptInjection: detect
  • pii: redact
  • secrets: redact
  • deniedTopics: detect
  • outputSafety: detect
  • groundedness: detect
  • a short allowedModels list
  • a sensible maxTokens cap
  • a reasonable rateLimitPerMinute
This gives you useful visibility and strong baseline protection without making the system too rigid too early.

Next steps

Once guardrails are in place, the next pages to read are:
  • Routing and Reliability to understand how requests are executed after policy checks
  • Request Logging and Observability to see how policy decisions and request metadata appear operationally
  • Go to Production for a broader production-readiness checklist