AI Safety

How to Moderate AI-Generated Text

Catch unsafe completions before users see them.

Pre-screen prompts and post-screen outputs in real time.

What it detects

  • Hate speech in completions
  • PII leakage
  • Policy violations
  • Jailbreak success indicators
  • Hallucinated unsafe content
  • Custom rules (see the sketch after this list)
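
One lightweight pattern for custom rules is to layer your own per-category thresholds on top of the scores the API returns (see the example response below). The threshold values and the applyCustomRules helper in this sketch are illustrative only, not part of the SDK:
typescript
// Sketch only: app-level thresholds applied to the per-category scores.
// The threshold values and this helper are hypothetical, not SDK features.
type ModerationResult = {
  safe: boolean;
  action: string;
  categories: Record<string, number>;
  latency_ms: number;
};

const customThresholds: Record<string, number> = {
  harassment: 0.4, // stricter than the overall decision
  spam: 0.8,
};

function applyCustomRules(result: ModerationResult): boolean {
  // Respect the API's own verdict first, then enforce our stricter limits.
  if (!result.safe) return false;
  return Object.entries(customThresholds).every(
    ([category, limit]) => (result.categories[category] ?? 0) <= limit
  );
}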

Why developers choose Vettly

  • Streaming SDK for real-time output screening
  • Pre-built llm-output policy
  • Same API works on inputs and outputs
  • Audit trail for every blocked completion

Example request
typescript
import { createStreamingClient } from '@vettly/sdk';

// Open a realtime connection bound to the moderation policy you want applied.
const streaming = createStreamingClient('YOUR_KEY');
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: (result) => {
    // Runs for each screened message: render it if safe, otherwise log the block.
    if (result.safe) showMessage(result);
    else logBlocked(result);
  }
});

await ws.connect();
// Screen a single message and await its moderation verdict.
const result = await ws.moderate(message);

Example response
json
{
  "safe": true,
  "action": "allow",
  "categories": {
    "harassment": 0.02,
    "spam": 0.01
  },
  "latency_ms": 47
}

Compared to relying on model safety alone

Model-side guardrails are routinely evaded. An independent check after generation closes that gap.
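
A minimal sketch of that flow, reusing the connection setup from the example request above; generateCompletion stands in for your own model call and is not part of the SDK:
typescript
// Sketch only: screen the prompt before the model sees it and the completion
// before the user does, mirroring the example request above.
// generateCompletion is a placeholder for your own LLM call, not an SDK method.
import { createStreamingClient } from '@vettly/sdk';

declare function generateCompletion(prompt: string): Promise<string>;

const streaming = createStreamingClient('YOUR_KEY');
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: () => {} // verdicts are awaited from moderate() below
});
await ws.connect();

async function handleTurn(userPrompt: string) {
  // Pre-screen the prompt before generation.
  const inputCheck = await ws.moderate(userPrompt);
  if (!inputCheck.safe) return { blocked: true, stage: 'input' };

  // Generate, then post-screen the completion before returning it.
  const completion = await generateCompletion(userPrompt);
  const outputCheck = await ws.moderate(completion);
  if (!outputCheck.safe) return { blocked: true, stage: 'output' };

  return { blocked: false, completion };
}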

Get an API key

Start making moderation decisions in minutes with a Developer plan and clear upgrade paths.
