AI Safety

How to Detect Prompt Injection

Block injection attempts before they reach your LLM.

Pre-screen prompts and post-screen completions with one API.

What it detects

  • Direct injection attempts
  • Indirect injection via retrieved content
  • Jailbreak prompts
  • Instruction override patterns
  • Data exfiltration probes
  • Custom rules

Why developers choose Vettly

  • Pre-built prompt-injection policy
  • Catches novel and known attacks
  • Same API for input and output checks
  • Audit trails for every blocked attempt
Example request
bash
const result = await vettly.check({
  content: userPrompt,
  contentType: 'text',
  policyId: 'prompt-injection',
});

if (result.action === 'block') {
  return { error: 'Prompt rejected', reasons: result.categories };
}

const completion = await llm.complete(userPrompt);

const safety = await vettly.check({
  content: completion,
  contentType: 'text',
  policyId: 'llm-output',
});
Example response
json
{
  "flagged": true,
  "action": "block",
  "categories": {
    "harassment": 0.93,
    "hate": 0.02
  },
  "policy": "default",
  "latency_ms": 142
}

Compared to regex-based detection

Regex misses semantic injection. AI evaluators score by intent, not exact wording.

See the AI chatbot pattern

Get an API key

Start making decisions in minutes with a Developer plan and clear upgrade paths.

Get an API key