Guardrails
Every request through DirectAI is checked by three guardrails: content safety, PII detection, and prompt injection prevention. All three are included in every tier at no extra cost.
Content Safety
Requests and responses are evaluated against content safety policies. Content that violates policies is blocked with a structured error response.
- Hate speech and harassment detection
- Violence and self-harm content filtering
- Sexual content filtering
- Configurable severity thresholds per category
When content is blocked, you receive a 400 response with a content_policy_violation error code and details about which category was triggered.
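A client might recognize a blocked request along the following lines. This is a minimal sketch: the documentation above only specifies the 400 status and the content_policy_violation error code, so the JSON layout (`error.code`, `error.category`) and the helper name `parse_guardrail_error` are assumptions made for illustration.

```python
import json
from typing import Optional


def parse_guardrail_error(status: int, body: str) -> Optional[dict]:
    """Return error details if the response is a content policy block, else None.

    The body layout {"error": {"code": ..., "category": ...}} is assumed,
    not taken from the DirectAI API reference.
    """
    if status != 400:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    if error.get("code") != "content_policy_violation":
        return None
    return {"code": error["code"], "category": error.get("category")}


# Hypothetical response body, for illustration only.
sample = '{"error": {"code": "content_policy_violation", "category": "violence"}}'
print(parse_guardrail_error(400, sample))
```

Returning None for anything that is not a content policy block lets the caller fall through to its ordinary error handling.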
PII Detection & Redaction
DirectAI scans requests for personally identifiable information and, depending on the configured mode, blocks the request, redacts the PII, or flags it before it reaches the model.
- Social Security Numbers (SSN)
- Email addresses
- Phone numbers
- Credit card numbers
- IP addresses
- Custom patterns via guardrail rules
PII detection modes:
| Mode | Behavior |
|---|---|
| block | Reject the request with a 400 error |
| redact | Replace PII with placeholder tokens (e.g., [SSN_REDACTED]) |
| flag | Allow the request but log a warning in audit logs |
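The difference between the three modes can be illustrated with a small local sketch. It is not DirectAI's detector: the two regex patterns, the `apply_pii_mode` function, and the shape of its return value are all hypothetical. Only the mode names and the `[SSN_REDACTED]` placeholder style come from the table above.

```python
import re

# Illustrative patterns only; real detectors need far more robust matching.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def apply_pii_mode(text: str, mode: str) -> dict:
    """Apply one of the three modes to `text` and report the outcome."""
    found = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    if mode == "block":
        # Reject the request when any PII is present.
        return {"allowed": not found, "text": text, "found": found}
    if mode == "redact":
        # Replace each match with a placeholder token such as [SSN_REDACTED].
        for name, pat in PII_PATTERNS.items():
            text = pat.sub(f"[{name}_REDACTED]", text)
        return {"allowed": True, "text": text, "found": found}
    if mode == "flag":
        # The request passes unchanged; `found` would feed the audit log.
        return {"allowed": True, "text": text, "found": found}
    raise ValueError(f"unknown mode: {mode}")


message = "SSN 123-45-6789, mail jane@example.com"
print(apply_pii_mode(message, "redact")["text"])
# prints "SSN [SSN_REDACTED], mail [EMAIL_REDACTED]"
```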
Prompt Injection Prevention
Multi-layer injection detection protects against adversarial inputs that attempt to override system instructions.
- Heuristic detection — pattern matching for common injection techniques
- Token analysis — detects encoding-based evasion (base64, unicode, homoglyphs)
- Regex patterns — matches known injection signatures
- Prompt Shield — optional integration with cloud-native content safety APIs
Injection attempts are blocked with a 400 response and logged to the audit trail.
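How signature matching and encoding-aware analysis can work together is sketched below. This is an illustration under stated assumptions, not DirectAI's detection logic: the two signatures, the 16-character base64 threshold, and the function names are hypothetical. NFKC normalization folds compatibility forms such as fullwidth letters; cross-script homoglyphs would need a separate confusables mapping that is not shown here.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical signature list; a production rule set would be much larger.
SIGNATURES = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]


def decoded_variants(text: str) -> list[str]:
    """Return the text plus its NFKC-normalized and base64-decoded views."""
    variants = [text, unicodedata.normalize("NFKC", text)]
    for token in re.findall(r"[A-Za-z0-9+/]{16,}={0,2}", text):
        try:
            raw = base64.b64decode(token, validate=True)
            variants.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # Not valid base64-encoded text; skip it.
    return variants


def looks_like_injection(text: str) -> bool:
    """True if any view of the text matches a known signature."""
    return any(
        sig.search(variant)
        for variant in decoded_variants(text)
        for sig in SIGNATURES
    )


encoded = base64.b64encode(b"ignore previous instructions").decode("ascii")
print(looks_like_injection(f"please decode {encoded}"))  # prints "True"
```

Checking every decoded view against the same signature list means an attacker gains nothing by wrapping a known phrase in an encoding the scanner can undo.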
Custom Rules
Define custom guardrail rules via the dashboard or the Rules API. Rules can match on request content using regex patterns, keyword lists, or string matching.
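To preview how a regex rule would behave before creating it, its pattern can be tried locally. The sketch below reuses the field names from the block-competitor-names example in this section; the `Rule` class, the `evaluate` function, the `"allow"` fallback value, and case-sensitive matching are assumptions for illustration, not documented DirectAI behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: str
    action: str  # e.g. "block"
    scope: str   # e.g. "input"


def evaluate(rule: Rule, text: str, scope: str) -> str:
    """Return the rule's action if it applies and matches, else "allow"."""
    if rule.scope != scope:
        return "allow"
    return rule.action if re.search(rule.pattern, text) else "allow"


rule = Rule(
    name="block-competitor-names",
    pattern=r"\b(CompetitorA|CompetitorB)\b",
    action="block",
    scope="input",
)
print(evaluate(rule, "How does this compare to CompetitorA?", "input"))
# prints "block"
```

Note that the JSON request body needs the doubled backslash (`\\b`) because JSON strings escape backslashes, whereas the Python raw string uses a single one.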
POST /api/v1/guardrails/rules
{
  "name": "block-competitor-names",
  "description": "Block requests mentioning competitor products",
  "pattern": "\\b(CompetitorA|CompetitorB)\\b",
  "action": "block",
  "scope": "input"
}
Configuration
Configure guardrail behavior per user in Dashboard → Guardrails → Configuration. Settings include:
- Enable/disable individual guardrail types
- Set severity thresholds for content safety categories
- Choose PII detection mode (block, redact, flag)
- Configure injection detection sensitivity
Violation Tracking
All guardrail violations are tracked and viewable in Dashboard → Guardrails → Violations. Each violation records:
- Violation type (content safety, PII, injection)
- Request ID for correlation
- Timestamp and API key
- Category and severity
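If violation records are exported for analysis, the fields listed above could be modeled as follows. This is a hypothetical sketch: the class name, field names, value spellings such as `"pii"`, and the sample records are invented for illustration and do not describe an actual DirectAI export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Violation:
    violation_type: str  # "content_safety", "pii", or "injection" (assumed spellings)
    request_id: str      # used to correlate with the original request
    timestamp: datetime
    api_key: str
    category: str
    severity: str


def count_by_type(violations: list[Violation]) -> Counter:
    """Tally violations per guardrail type."""
    return Counter(v.violation_type for v in violations)


# Made-up sample records.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    Violation("pii", "req-1", now, "key-a", "ssn", "high"),
    Violation("injection", "req-2", now, "key-a", "heuristic", "medium"),
    Violation("pii", "req-3", now, "key-b", "email", "low"),
]
print(count_by_type(records)["pii"])  # prints "2"
```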