AI Safety

How to Moderate AI-Generated Text

Catch unsafe completions before users see them.

Pre-screen prompts and post-screen outputs in real time.

What it detects

  • Hate speech in completions
  • PII leakage
  • Policy violations
  • Jailbreak success indicators
  • Hallucinated unsafe content
  • Custom rules (see the sketch after this list)
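
One lightweight pattern for custom rules is to layer your own per-category thresholds on top of the scores the API returns (see the example response below). The threshold values and the applyCustomRules helper in this sketch are illustrative only, not part of the SDK:
typescript
// Sketch only: app-level thresholds applied to the per-category scores.
// The threshold values and this helper are hypothetical, not SDK features.
type ModerationResult = {
  safe: boolean;
  action: string;
  categories: Record<string, number>;
  latency_ms: number;
};

const customThresholds: Record<string, number> = {
  harassment: 0.4, // stricter than the overall decision
  spam: 0.8,
};

function applyCustomRules(result: ModerationResult): boolean {
  // Respect the API's own verdict first, then enforce our stricter limits.
  if (!result.safe) return false;
  return Object.entries(customThresholds).every(
    ([category, limit]) => (result.categories[category] ?? 0) <= limit
  );
}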

Why developers choose Vettly

  • Streaming SDK for real-time output screening
  • Pre-built llm-output policy
  • Same API works on inputs and outputs
  • Audit trail for every blocked completion

Example request
typescript
import { createStreamingClient } from '@vettly/sdk';

// Open a realtime connection bound to the moderation policy you want applied.
const streaming = createStreamingClient('YOUR_KEY');
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: (result) => {
    // Runs for each screened message: render it if safe, otherwise log the block.
    if (result.safe) showMessage(result);
    else logBlocked(result);
  }
});

await ws.connect();
// Screen a single message and await its moderation verdict.
const result = await ws.moderate(message);

Example response
json
{
  "safe": true,
  "action": "allow",
  "categories": {
    "harassment": 0.02,
    "spam": 0.01
  },
  "latency_ms": 47
}

Compared to relying on model safety alone

Model-side guardrails are routinely evaded. An independent check after generation closes that gap.
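
A minimal sketch of that flow, reusing the connection setup from the example request above; generateCompletion stands in for your own model call and is not part of the SDK:
typescript
// Sketch only: screen the prompt before the model sees it and the completion
// before the user does, mirroring the example request above.
// generateCompletion is a placeholder for your own LLM call, not an SDK method.
import { createStreamingClient } from '@vettly/sdk';

declare function generateCompletion(prompt: string): Promise<string>;

const streaming = createStreamingClient('YOUR_KEY');
const ws = streaming.connectRealtime({
  policyId: 'chat-policy',
  onResult: () => {} // verdicts are awaited from moderate() below
});
await ws.connect();

async function handleTurn(userPrompt: string) {
  // Pre-screen the prompt before generation.
  const inputCheck = await ws.moderate(userPrompt);
  if (!inputCheck.safe) return { blocked: true, stage: 'input' };

  // Generate, then post-screen the completion before returning it.
  const completion = await generateCompletion(userPrompt);
  const outputCheck = await ws.moderate(completion);
  if (!outputCheck.safe) return { blocked: true, stage: 'output' };

  return { blocked: false, completion };
}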

Get an API key

Start making moderation decisions in minutes with a Developer plan and clear upgrade paths.
