AI Safety

How to Detect Prompt Injection

Block injection attempts before they reach your LLM.

Pre-screen prompts and post-screen completions with one API.

What it detects

• Direct injection attempts
• Indirect injection via retrieved content
• Jailbreak prompts
• Instruction override patterns
• Data exfiltration probes
• Custom rules

Why developers choose Vettly

• Pre-built prompt-injection policy
• Catches novel and known attacks
• Same API for input and output checks
• Audit trails for every blocked attempt

Example request

bash

const result = await vettly.check({
  content: userPrompt,
  contentType: 'text',
  policyId: 'prompt-injection',
});

if (result.action === 'block') {
  return { error: 'Prompt rejected', reasons: result.categories };
}

const completion = await llm.complete(userPrompt);

const safety = await vettly.check({
  content: completion,
  contentType: 'text',
  policyId: 'llm-output',
});

Example response

json

{
  "flagged": true,
  "action": "block",
  "categories": {
    "harassment": 0.93,
    "hate": 0.02
  },
  "policy": "default",
  "latency_ms": 142
}

Compared to regex-based detection

Regex misses semantic injection. AI evaluators score by intent, not exact wording.

See the AI chatbot pattern

Keep exploring

Content Moderation API

One endpoint for text, image, and video moderation.

Image Moderation API

Policy-driven image checks with clear allow, review, and block actions.

Video Moderation API

Async video moderation without stitching together multiple vendors.

Content Moderation in Next.js

Add content moderation to a Next.js App Router project in minutes. Server-side API routes, React Server Components, and edge runtime examples.

Get an API key

Start making decisions in minutes with a Developer plan and clear upgrade paths.

Get an API key