Guardrails

Every request through DirectAI is protected by content safety, PII detection, and prompt injection prevention — included in every tier at no extra cost.

Content Safety

Requests and responses are evaluated against content safety policies. Content that violates policies is blocked with a structured error response.

  • Hate speech and harassment detection
  • Violence and self-harm content filtering
  • Sexual content filtering
  • Configurable severity thresholds per category

When content is blocked, you receive a 400 response with a content_policy_violation error code and details about which category was triggered.
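
On the client side, a blocked request can be told apart from other 400 errors by checking the error code. The sketch below is illustrative only: it assumes a JSON body shaped like {"error": {"code": ..., "category": ...}}, which this page does not specify. Only the 400 status and the content_policy_violation code come from the text above; the function name and the "category" field are hypothetical.

```python
import json
from typing import Optional


def blocked_category(status: int, body: str) -> Optional[str]:
    """Return the triggered category if the response is a content-policy block.

    Assumes a hypothetical body shape: {"error": {"code": ..., "category": ...}}.
    """
    if status != 400:
        return None
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        return None  # body is not JSON, or not a JSON object
    if not isinstance(error, dict) or error.get("code") != "content_policy_violation":
        return None
    return error.get("category", "unknown")


# Example with a made-up response body.
sample = '{"error": {"code": "content_policy_violation", "category": "violence"}}'
print(blocked_category(400, sample))
```

Returning None for anything that is not a policy block lets the caller fall through to its normal error handling.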

PII Detection & Redaction

DirectAI scans requests for personally identifiable information and, depending on the configured mode, blocks the request, redacts the PII, or flags it before the request reaches the model. Detected types include:

  • Social Security Numbers (SSN)
  • Email addresses
  • Phone numbers
  • Credit card numbers
  • IP addresses
  • Custom patterns via guardrail rules

PII detection modes:

Mode     Behavior
block    Reject the request with a 400 error
redact   Replace PII with placeholder tokens (e.g., [SSN_REDACTED])
flag     Allow the request but log a warning in audit logs
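
The difference between the three modes can be illustrated with a small local sketch. This is not DirectAI's detector: the two regex patterns are deliberately simple, and the function name and return shape are assumptions made for the example. Only the mode names, the 400 status, and the [SSN_REDACTED] placeholder style come from the table above.

```python
import re

# Illustrative patterns only; production detectors are stricter than these.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def apply_pii_mode(text: str, mode: str) -> dict:
    """Mimic the three modes on a single string: block, redact, or flag."""
    if mode not in ("block", "redact", "flag"):
        raise ValueError(f"unknown mode: {mode}")
    found = [name for name, rx in PII_PATTERNS.items() if rx.search(text)]
    if not found:
        return {"status": 200, "text": text, "flags": []}
    if mode == "block":
        # Reject outright; nothing is forwarded to the model.
        return {"status": 400, "text": None, "flags": found}
    if mode == "redact":
        for name in found:
            text = PII_PATTERNS[name].sub(f"[{name}_REDACTED]", text)
        return {"status": 200, "text": text, "flags": found}
    # flag: forward the text unchanged and record what was seen.
    return {"status": 200, "text": text, "flags": found}


print(apply_pii_mode("My SSN is 123-45-6789", "redact")["text"])
```

In this sketch only redact changes the text that would reach the model; block stops the request, and flag leaves it untouched while recording the match.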

Prompt Injection Prevention

Multi-layer injection detection protects against adversarial inputs that attempt to override system instructions.

  • Heuristic detection — pattern matching for common injection techniques
  • Token analysis — detects encoding-based evasion (base64, unicode, homoglyphs)
  • Regex patterns — matches known injection signatures
  • Prompt Shield — optional integration with cloud-native content safety APIs

Detected injection attempts are blocked with a 400 response and logged to the audit trail.
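
To make the idea of layered detection concrete, here is a minimal sketch of how encoding-based evasion can be countered: the input is checked as written, after Unicode NFKC normalization, and after decoding any base64-looking tokens. It is an assumption-laden illustration, not DirectAI's detector. The single signature, the 16-character token threshold, and the function names are all hypothetical, and NFKC folds compatibility forms such as fullwidth letters but does not catch cross-script homoglyphs.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical signature list; a real deployment would carry many more.
SIGNATURES = [re.compile(r"ignore (all )?(previous|prior) instructions", re.I)]
# Runs of base64 alphabet characters long enough to be worth decoding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str) -> list[str]:
    """Return the text plus its normalized and base64-decoded views."""
    views = [text, unicodedata.normalize("NFKC", text)]
    for token in B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True)
            views.append(decoded.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text; skip it
    return views


def looks_like_injection(text: str) -> bool:
    """True if any view of the text matches a known signature."""
    return any(
        sig.search(view) for view in candidate_views(text) for sig in SIGNATURES
    )


payload = base64.b64encode(b"ignore previous instructions").decode("ascii")
print(looks_like_injection("decode this: " + payload))
```

Checking several views of the same input means one signature list covers the plain, normalized, and decoded forms.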

Custom Rules

Define custom guardrail rules via the dashboard or the Rules API. Rules can match on request content using regex patterns, keyword lists, or string matching.

POST /api/v1/guardrails/rules
{
  "name": "block-competitor-names",
  "description": "Block requests mentioning competitor products",
  "pattern": "\\b(CompetitorA|CompetitorB)\\b",
  "action": "block",
  "scope": "input"
}
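
Before creating a rule like the one above, it can help to test the pattern locally and build the request in code. The sketch below uses only the Python standard library. The host name, the API key placeholder, and the Bearer authorization header are assumptions, since this page does not state them. The pattern is checked with Python's re module, so the service's regex dialect may behave differently.

```python
import json
import re
import urllib.request

# Hypothetical values: the host and auth scheme are not given on this page.
BASE_URL = "https://directai.example.com"
API_KEY = "YOUR_API_KEY"

rule = {
    "name": "block-competitor-names",
    "description": "Block requests mentioning competitor products",
    "pattern": r"\b(CompetitorA|CompetitorB)\b",
    "action": "block",
    "scope": "input",
}

# Check the pattern locally: it should hit whole words only.
pattern = re.compile(rule["pattern"])
matches = bool(pattern.search("How does this compare to CompetitorA?"))
misses = pattern.search("CompetitorAB is unrelated") is None
print(matches, misses)

# Build the POST request without sending it.
request = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/guardrails/rules",
    data=json.dumps(rule).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

The \b word boundaries keep the rule from firing on longer identifiers that merely contain a competitor name.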

Configuration

Configure guardrail behavior per user in Dashboard → Guardrails → Configuration. Settings include:

  • Enable/disable individual guardrail types
  • Set severity thresholds for content safety categories
  • Choose PII detection mode (block, redact, flag)
  • Configure injection detection sensitivity

Violation Tracking

All guardrail violations are tracked and viewable in Dashboard → Guardrails → Violations. Each violation records:

  • Violation type (content safety, PII, injection)
  • Request ID for correlation
  • Timestamp and API key
  • Category and severity