AI Safety
How to Detect Prompt Injection
Block injection attempts before they reach your LLM.
Pre-screen prompts and post-screen completions with one API.
What it detects
- • Direct injection attempts
- • Indirect injection via retrieved content
- • Jailbreak prompts
- • Instruction override patterns
- • Data exfiltration probes
- • Custom rules
Why developers choose Vettly
- • Pre-built prompt-injection policy
- • Catches novel and known attacks
- • Same API for input and output checks
- • Audit trails for every blocked attempt
Example request
bashconst result = await vettly.check({
content: userPrompt,
contentType: 'text',
policyId: 'prompt-injection',
});
if (result.action === 'block') {
return { error: 'Prompt rejected', reasons: result.categories };
}
const completion = await llm.complete(userPrompt);
const safety = await vettly.check({
content: completion,
contentType: 'text',
policyId: 'llm-output',
});Example response
json{
"flagged": true,
"action": "block",
"categories": {
"harassment": 0.93,
"hate": 0.02
},
"policy": "default",
"latency_ms": 142
}Compared to regex-based detection
Regex misses semantic injection. AI evaluators score by intent, not exact wording.
See the AI chatbot patternKeep exploring
Content Moderation API
One endpoint for text, image, and video moderation.
Image Moderation API
Policy-driven image checks with clear allow, review, and block actions.
Video Moderation API
Async video moderation without stitching together multiple vendors.
Content Moderation in Next.js
Add content moderation to a Next.js App Router project in minutes. Server-side API routes, React Server Components, and edge runtime examples.
Get an API key
Start making decisions in minutes with a Developer plan and clear upgrade paths.
Get an API key