Use the SDK `guard()` or the API so that content is allowed, blocked, or escalated before it reaches users.
## What guardrails cover
| Category | Description |
|---|---|
| PII | Mask or block personally identifiable information (email, phone, SSN, credit card, location, etc.) in text. |
| Moderation | Sexual content, hate/harassment, self-harm, violence, illicit activities. |
| Jailbreak | Detection of attempts to bypass system instructions. |
| NSFW | Not-safe-for-work content filtering. |
| Prompt injection | Detection of prompt injection in user or model content. |
| URL filter | Allow or block URLs by scheme and allow-list. |
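To make the PII category concrete, here is a minimal sketch of what masking looks like conceptually. This is an illustration only — the actual guardrail covers more PII types (SSN, credit card, location) and is not necessarily regex-based; the function name and patterns below are assumptions, not part of the platform.

```typescript
// Illustrative sketch of PII masking: replace emails and US-style
// phone numbers with placeholder tokens. Not the platform's actual
// implementation — a conceptual example only.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE = /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g;

function maskPii(text: string): string {
  // Mask emails first, then phone numbers.
  return text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}
```

A guardrail configured to *block* rather than *mask* would reject the whole message instead of rewriting it.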
Related pages:

- **Policies: Guardrails** — full guide to guardrail types (PII, moderation, etc.).
- **SDK: `guard()`** — evaluate guardrails from your app.
## How to create a guardrails policy
- From the Policies page (or any header with New Policy), click New Policy.
- In the dialog, describe your guardrails in plain language (e.g. “Mask email and phone numbers, block hate speech”).
- Choose Guardrail as the mode (alongside Conditions and Instructions).
- Submit. You are taken to `/assistant/guardrail`, where the AI Assistant generates the guardrails configuration.
- When streaming finishes, you see the Results area and Simulation sidebar. Adjust which guardrails are enabled (PII, moderation, etc.), run a simulation with sample text, then Deploy to save.
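The configuration the Assistant generates might look roughly like the fragment below. Every field name here is an assumption for illustration — only the policy key, tags, and status values come from this page; the actual schema is whatever the Assistant emits.

```json
{
  "policyKey": "content-guardrail",
  "tags": ["safety", "pii"],
  "status": "active",
  "guardrails": {
    "pii": { "enabled": true, "action": "mask", "types": ["email", "phone", "ssn"] },
    "moderation": { "enabled": true, "categories": ["hate", "self-harm", "violence"] },
    "jailbreak": { "enabled": true },
    "urlFilter": { "enabled": false }
  }
}
```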
## Policy structure (guardrails)
Every guardrails policy has:

| Element | Description |
|---|---|
| Policy key | Unique identifier (e.g. content-guardrail). Required. |
| Tags | Optional labels (e.g. safety, pii). |
| Status | Active (enforcing) or Inactive (disabled). |
## Editing a guardrails policy
- Open Policies from the sidebar.
- Find the policy (filter or search by key). Guardrail policies show mode Guardrail.
- Click Edit. You are taken to `/policies/[id]/edit/guardrail`.
- Edit which guardrails are enabled (PII types, moderation categories, etc.), run Simulation with sample text to test, then save (Deploy / Update).
## Simulation
Use Simulation to test a guardrails policy before or after deploying:

- On the policy edit page (Guardrail tab), open the Simulation sidebar.
- Enter sample text (e.g. a model response or user message that might contain PII or unsafe content).
- Run the simulation. The result shows allow, block, or escalate and the reason.
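In app code, a guardrail result with those three outcomes can be handled with a simple branch. The result shape below (an `action` plus a `reason`) mirrors what Simulation reports, but the exact field names and types are assumptions, not the SDK's published interface.

```typescript
// Hedged sketch: branching on a guardrail decision before showing
// model output to a user. Field names are illustrative assumptions.
type GuardrailDecision = "allow" | "block" | "escalate";

interface GuardrailResult {
  action: GuardrailDecision;
  reason?: string;
}

function renderModelOutput(result: GuardrailResult, text: string): string {
  switch (result.action) {
    case "allow":
      return text; // safe to show as-is
    case "block":
      return "[content blocked by guardrail]"; // withhold entirely
    case "escalate":
      return "[content held for review]"; // route to a human reviewer
  }
}
```

Exhausting all three cases in the `switch` lets the compiler catch a missing branch if a fourth decision type is ever added.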
## Where guardrails appear in the platform
| Location | What you do |
|---|---|
| Policies | List all policies; guardrail policies have mode Guardrail. Edit, delete, or change status/tags. |
| New Policy | Create a new policy; choose Guardrail mode and describe intent → you are taken to the Assistant to generate your guardrails. |
| Policy edit → Guardrail (`/policies/[id]/edit/guardrail`) | Edit your guardrails settings, simulate, and deploy. |
## Summary
- Guardrails = policy mode that evaluates text for PII, moderation, jailbreak, NSFW, prompt injection, and URL rules.
- Create: New Policy → Guardrail + describe → Assistant generates configuration → Deploy.
- Edit: Policies → select policy → Edit → Guardrail tab; adjust your guardrails and simulate.
- Use in app: SDK `guard(policyKeyOrTag, text)` or API `POST /api/policies/:key/evaluate/guardrails`. See SDK Guardrails.
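For callers not using the SDK, the evaluation endpoint above can be hit directly. The sketch below assumes a JSON body of `{ text }` and a response containing the decision — both are assumptions for illustration; only the path `POST /api/policies/:key/evaluate/guardrails` comes from this page.

```typescript
// Hedged sketch: calling the guardrails evaluation API directly with
// fetch. Request/response shapes are illustrative assumptions.
interface GuardrailResult {
  action: "allow" | "block" | "escalate";
  reason?: string;
}

async function evaluateGuardrails(
  baseUrl: string,
  policyKey: string,
  text: string
): Promise<GuardrailResult> {
  const url = `${baseUrl}/api/policies/${encodeURIComponent(policyKey)}/evaluate/guardrails`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) {
    throw new Error(`Guardrail evaluation failed: ${res.status}`);
  }
  return res.json();
}
```

The SDK's `guard(policyKeyOrTag, text)` wraps this call and would be the preferred entry point from application code.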