How to Moderate AI-Generated Text
Catch unsafe completions before users see them.
Pre-screen prompts and post-screen outputs in real time.
What it detects
- Hate speech in completions
- PII leakage
- Policy violations
- Jailbreak success indicators
- Hallucinated unsafe content
- Custom rules
Why developers choose Vettly
- Streaming SDK for real-time output screening
- Pre-built llm-output policy
- Same API works on inputs and outputs (see the sketch after this list)
- Audit trail for every blocked completion
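To make the "same API" point concrete, here is a minimal sketch built on the calls shown in the example request below. `generateCompletion` is a hypothetical placeholder for your own LLM call; both the user prompt and the model's completion pass through the same `ws.moderate()` call.

```typescript
import { createStreamingClient } from '@vettly/sdk';

// Hypothetical stand-in for your own LLM call; not part of the SDK.
declare function generateCompletion(prompt: string): Promise<string>;

const streaming = createStreamingClient('YOUR_KEY');
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: () => {} // results are awaited directly below
});
await ws.connect();

async function safeChat(userPrompt: string): Promise<string | null> {
  // Pre-screen the incoming prompt with moderate()...
  const input = await ws.moderate(userPrompt);
  if (!input.safe) return null;

  // ...then post-screen the model's completion with the same call.
  const completion = await generateCompletion(userPrompt);
  const output = await ws.moderate(completion);
  return output.safe ? completion : null;
}
```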
Example request
```typescript
import { createStreamingClient } from '@vettly/sdk';

const streaming = createStreamingClient('YOUR_KEY');

// Open a realtime connection scoped to a moderation policy.
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: (result) => {
    if (result.safe) showMessage(result);
    else logBlocked(result);
  }
});

await ws.connect();
const result = await ws.moderate(message);
```

Example response
```json
{
  "safe": true,
  "action": "allow",
  "categories": {
    "harassment": 0.02,
    "spam": 0.01
  },
  "latency_ms": 47
}
```

Compared to relying on model safety alone
Model-side guardrails are routinely evaded by jailbreaks and prompt injection. An independent check on the output, run after generation, closes that gap.
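As an illustration, here is a minimal sketch of that independent check, assuming the response shape shown above. The `review` and `block` actions are assumed by analogy with the allow, review, and block actions named for the Image Moderation API; `ws` is the realtime client from the example request, and `renderToUser` and `quarantine` are hypothetical stand-ins for your own app code.

```typescript
// Shape of the example response above; 'review' and 'block' are assumed
// from the allow/review/block actions described for the image API.
interface ModerationResult {
  safe: boolean;
  action: 'allow' | 'review' | 'block';
  categories: Record<string, number>; // per-category scores, e.g. { harassment: 0.02 }
  latency_ms: number;
}

// Hypothetical stand-ins: `ws` is the client from the example request;
// renderToUser and quarantine are your own app code.
declare const ws: { moderate(text: string): Promise<ModerationResult> };
declare function renderToUser(text: string): void;
declare function quarantine(text: string, categories: Record<string, number>): void;

// The independent check: gate every completion after generation,
// regardless of what the model's own guardrails did.
async function screenCompletion(completion: string): Promise<void> {
  const result = await ws.moderate(completion);
  if (result.safe && result.action === 'allow') {
    renderToUser(completion);
  } else {
    quarantine(completion, result.categories); // kept for the audit trail
  }
}
```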
Keep exploring
Content Moderation API
One endpoint for text, image, and video moderation.
Image Moderation API
Policy-driven image checks with clear allow, review, and block actions.
Video Moderation API
Async video moderation without stitching together multiple vendors.
Content Moderation in Next.js
Add content moderation to a Next.js App Router project in minutes. Server-side API routes, React Server Components, and edge runtime examples.
Get an API key
Start making decisions in minutes with a Developer plan and clear upgrade paths.