Moderation Policies as Code: Managing Content Rules with YAML

Most content moderation systems bury their rules inside model configurations, admin dashboards, or hardcoded if-else chains. When a policy changes — a new category needs to be blocked, a threshold needs adjustment, a regulation requires stricter rules — someone edits a config, deploys a new build, and hopes nothing breaks.

There's a better way: treat moderation policies like code. Define them in YAML, version them in Git, review changes in pull requests, and deploy them through your existing CI/CD pipeline. The Policies documentation covers the full schema reference; this post focuses on the workflow.

Why Policies as Code?

Traceable: every policy change has a commit hash, an author, a timestamp, and a review. When a regulator asks "why was this content blocked?", you can point to the exact policy version.

Reviewable: policy changes go through pull requests. Engineers, legal, and trust & safety review the same diff. No more "someone changed a setting in the dashboard and we're not sure when."

Testable: write tests against your policies. "Given this input, the policy should return block." Run tests in CI before deploying.

Rollbackable: if a policy change causes problems (too many false positives, too permissive), revert the commit and redeploy.

Policy Structure

A Vettly policy is a YAML file that defines categories, thresholds, and actions:

policies/community-safe.yaml (YAML)
name: community-safe
version: "3"
description: Standard community safety policy for UGC
categories:
  hate_speech:
    action: block
    threshold: 0.7
  harassment:
    action: block
    threshold: 0.8
  nudity:
    action: block
    threshold: 0.6
  violence:
    action: flag
    threshold: 0.7
  self_harm:
    action: block
    threshold: 0.5
  spam:
    action: flag
    threshold: 0.8
  pii:
    action: flag
    threshold: 0.9
defaults:
  action: allow

Each category has an action (allow, flag, or block) and a threshold: the minimum confidence score required to trigger that action. The defaults section sets the action for categories not explicitly listed.
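To make the threshold semantics concrete, here is a minimal sketch of how a decision could be derived from per-category scores. It is an illustration, not Vettly's actual evaluation logic: the `decide` function, the `SEVERITY` ordering, and the rule that the strictest triggered action wins are all assumptions.

```typescript
type Action = 'allow' | 'flag' | 'block';

interface CategoryRule {
  action: Action;
  threshold: number;
}

interface Policy {
  categories: Record<string, CategoryRule>;
  defaults: { action: Action };
}

// Assumed ordering: block is stricter than flag, which is stricter than allow.
const SEVERITY: Record<Action, number> = { allow: 0, flag: 1, block: 2 };

// Hypothetical evaluator: returns the strictest action among categories whose
// score reaches the threshold, and the default action when none do.
function decide(policy: Policy, scores: Record<string, number>): Action {
  let result: Action | undefined;
  for (const category of Object.keys(scores)) {
    const rule = policy.categories[category];
    if (rule === undefined || scores[category] < rule.threshold) continue;
    if (result === undefined || SEVERITY[rule.action] > SEVERITY[result]) {
      result = rule.action;
    }
  }
  return result ?? policy.defaults.action;
}

// A trimmed copy of the community-safe policy.
const communitySafe: Policy = {
  categories: {
    hate_speech: { action: 'block', threshold: 0.7 },
    violence: { action: 'flag', threshold: 0.7 },
    spam: { action: 'flag', threshold: 0.8 },
  },
  defaults: { action: 'allow' },
};

console.log(decide(communitySafe, { hate_speech: 0.75, spam: 0.9 })); // block
console.log(decide(communitySafe, { violence: 0.7 }));                // flag
console.log(decide(communitySafe, { spam: 0.2, gambling: 0.99 }));    // allow
```

The last call shows the fallback: `gambling` is not listed in the policy and `spam` stays under its threshold, so the default action applies.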

Multiple Policies for Different Contexts

Different parts of your product may need different policies:

policies/profile-photo.yaml (YAML)
name: profile-photo
version: "1"
description: Stricter policy for profile photos - visible everywhere
categories:
  nudity:
    action: block
    threshold: 0.4 # Lower threshold = stricter
  violence:
    action: block
    threshold: 0.5
  hate_symbols:
    action: block
    threshold: 0.3
defaults:
  action: allow

policies/direct-messages.yaml (YAML)
name: direct-messages
version: "2"
description: More permissive for private conversations, strict on harassment
categories:
  harassment:
    action: block
    threshold: 0.7
  threats:
    action: block
    threshold: 0.5
  csam:
    action: block
    threshold: 0.1 # Zero tolerance
  nudity:
    action: flag # Flag but don't block in DMs
    threshold: 0.6
defaults:
  action: allow

When calling the API, specify which policy to use:

check.ts (Node.js)
import { Vettly } from '@vettly/sdk';

const vettly = new Vettly(process.env.VETTLY_API_KEY);

// Different policies for different surfaces
const feedCheck = await vettly.check({
  content: post.text,
  policy: 'community-safe',
});

const profileCheck = await vettly.check({
  imageUrl: avatar.url,
  policy: 'profile-photo',
});

const dmCheck = await vettly.check({
  content: message.text,
  policy: 'direct-messages',
});
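One way to keep policy names from scattering across call sites is a single lookup table. The sketch below assumes a hypothetical `Surface` type and `policyFor` helper; only the policy names come from the files above.

```typescript
// Hypothetical list of product surfaces that submit content for moderation.
type Surface = 'feed' | 'profile_photo' | 'direct_message';

// Record<Surface, string> makes the compiler reject a surface with no policy.
const POLICY_BY_SURFACE: Record<Surface, string> = {
  feed: 'community-safe',
  profile_photo: 'profile-photo',
  direct_message: 'direct-messages',
};

function policyFor(surface: Surface): string {
  return POLICY_BY_SURFACE[surface];
}

console.log(policyFor('direct_message')); // direct-messages
```

Adding a new surface to the `Surface` type then fails type checking until a policy is assigned to it.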

Versioning and Git Workflow

Store policies in your repository alongside your application code:

policies/
  community-safe.yaml
  profile-photo.yaml
  direct-messages.yaml
  chatbot-output.yaml
tests/
  policies/
    community-safe.test.ts
    profile-photo.test.ts

Policy changes follow the same workflow as code changes:

  1. Create a branch
  2. Edit the YAML file
  3. Run policy tests
  4. Open a pull request
  5. Legal and trust & safety review the diff
  6. Merge and deploy

The pull request diff makes it obvious what changed:

 categories:
   hate_speech:
     action: block
-    threshold: 0.7
+    threshold: 0.6  # Lowered per T&S review 2026-03-01
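For reviewers who prefer a plain-language summary next to the raw diff, a small script could list the rule changes between two versions of a policy. This is a sketch only: `describeChanges` is a hypothetical helper, and it assumes both versions have already been parsed from YAML into objects.

```typescript
interface Rule {
  action: string;
  threshold: number;
}

type Categories = Record<string, Rule | undefined>;

// Hypothetical helper: one human-readable line per added, removed,
// or modified category rule.
function describeChanges(before: Categories, after: Categories): string[] {
  const changes: string[] = [];
  const names = Object.keys({ ...before, ...after }).sort();
  for (const name of names) {
    const oldRule = before[name];
    const newRule = after[name];
    if (!oldRule && newRule) {
      changes.push(`${name}: added (${newRule.action} at ${newRule.threshold})`);
    } else if (oldRule && !newRule) {
      changes.push(`${name}: removed`);
    } else if (oldRule && newRule) {
      if (oldRule.action !== newRule.action) {
        changes.push(`${name}: action ${oldRule.action} -> ${newRule.action}`);
      }
      if (oldRule.threshold !== newRule.threshold) {
        changes.push(`${name}: threshold ${oldRule.threshold} -> ${newRule.threshold}`);
      }
    }
  }
  return changes;
}

const previous: Categories = { hate_speech: { action: 'block', threshold: 0.7 } };
const proposed: Categories = { hate_speech: { action: 'block', threshold: 0.6 } };

console.log(describeChanges(previous, proposed));
// [ 'hate_speech: threshold 0.7 -> 0.6' ]
```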

Testing Policies

Write test cases for your policies. Each test provides a content sample and asserts the expected action:

tests/policies/community-safe.test.ts (Node.js)
import { describe, it, expect } from 'vitest';
import { testPolicy } from '../helpers';

describe('community-safe policy', () => {
  it('blocks obvious hate speech', async () => {
    const result = await testPolicy('community-safe', {
      content: '[example hate speech input]',
    });
    expect(result.action).toBe('block');
  });

  it('allows normal conversation', async () => {
    const result = await testPolicy('community-safe', {
      content: 'Hey, great post! I really enjoyed reading this.',
    });
    expect(result.action).toBe('allow');
  });

  it('flags borderline content for review', async () => {
    const result = await testPolicy('community-safe', {
      content: '[example borderline content]',
    });
    expect(result.action).toBe('flag');
  });
});

Run these tests in CI. If a policy change breaks a test, the PR fails and the change doesn't ship.
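Alongside behavioral tests, CI could run a cheap structural check that catches typos before anything reaches the API. The sketch below is hypothetical and not the official schema: `validatePolicy` operates on an already-parsed policy object, and the assumption that thresholds fall between 0 and 1 is inferred from the examples in this post.

```typescript
// Actions named in this post; the full schema may allow more.
const ACTIONS = ['allow', 'flag', 'block'];

// Hypothetical validator: returns a list of problems, empty when none found.
function validatePolicy(policy: unknown): string[] {
  if (typeof policy !== 'object' || policy === null) {
    return ['policy must be a mapping'];
  }
  const errors: string[] = [];
  const p = policy as Record<string, unknown>;
  if (typeof p.name !== 'string' || p.name.length === 0) {
    errors.push('name must be a non-empty string');
  }
  if (typeof p.categories !== 'object' || p.categories === null) {
    errors.push('categories must be a mapping');
    return errors;
  }
  const categories = p.categories as Record<string, unknown>;
  for (const category of Object.keys(categories)) {
    const rule = (categories[category] ?? {}) as Record<string, unknown>;
    if (typeof rule.action !== 'string' || ACTIONS.indexOf(rule.action) === -1) {
      errors.push(`${category}: action must be one of ${ACTIONS.join(', ')}`);
    }
    const threshold = rule.threshold;
    // Assumed range: confidence scores between 0 and 1 inclusive.
    if (typeof threshold !== 'number' || !(threshold >= 0 && threshold <= 1)) {
      errors.push(`${category}: threshold must be a number between 0 and 1`);
    }
  }
  return errors;
}

console.log(validatePolicy({
  name: 'community-safe',
  categories: { spam: { action: 'ban', threshold: 8 } },
}));
```

A CI step would parse each YAML file with the project's YAML library, pass the result to the validator, and fail the build when the returned list is not empty.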

Deployment

When you merge a policy change, deploy it to Vettly:

deploy-policies.sh (Shell)
#!/bin/bash
# Deploy all policies to Vettly; stop at the first failure
set -euo pipefail

for policy in policies/*.yaml; do
  echo "Deploying $policy..."
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
  curl --fail -X PUT https://api.vettly.dev/v1/policies \
    -H "Authorization: Bearer $VETTLY_API_KEY" \
    -H "Content-Type: application/yaml" \
    --data-binary @"$policy"
done

Or use the SDK:

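scripts/deploy-policies.ts (Node.js)
import fs from 'fs';
import path from 'path';
import { Vettly } from '@vettly/sdk';

const vettly = new Vettly(process.env.VETTLY_API_KEY);
const policiesDir = path.join(__dirname, '../policies');

async function main() {
  // Only deploy YAML files; skip anything else in the directory
  const files = fs.readdirSync(policiesDir).filter((f) => f.endsWith('.yaml'));
  for (const file of files) {
    const yaml = fs.readFileSync(path.join(policiesDir, file), 'utf-8');
    await vettly.policies.deploy({ yaml });
    console.log(`Deployed: ${file}`);
  }
}

// Exit non-zero so CI marks the deploy as failed
main().catch((err) => { console.error(err); process.exit(1); });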

Rollbacks

If a policy change causes problems (spike in false positives, user complaints), revert the Git commit and redeploy:

git revert HEAD
git push
# CI redeploys the previous policy version

Vettly keeps all policy versions, so you can also reference a specific version in your API calls during an incident. See Policy Versioning for details on how version history works.

rollback.ts (Node.js)
// Temporarily pin to a known-good policy version
const result = await vettly.check({
  content: post.text,
  policy: 'community-safe',
  policyVersion: '2', // Previous version
});
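During an incident, hardcoding the pinned version means another code change to remove it later. One alternative, sketched here under assumptions, is to read pins from configuration: `parsePins`, `applyPin`, and the `name=version` string format are hypothetical, and only the `policyVersion` field comes from the example above.

```typescript
interface CheckParams {
  content: string;
  policy: string;
  policyVersion?: string;
}

// Parses a value such as "community-safe=2,profile-photo=1" into a lookup.
// In practice the string might come from an environment variable.
function parsePins(raw: string | undefined): Record<string, string> {
  const pins: Record<string, string> = {};
  for (const entry of (raw ?? '').split(',')) {
    const [policy, version] = entry.split('=').map((part) => part.trim());
    if (policy && version) pins[policy] = version;
  }
  return pins;
}

// Adds policyVersion only when a pin exists for the requested policy.
function applyPin(params: CheckParams, pins: Record<string, string>): CheckParams {
  const version = pins[params.policy];
  return version ? { ...params, policyVersion: version } : params;
}

const pins = parsePins('community-safe=2');
console.log(applyPin({ content: 'hello', policy: 'community-safe' }, pins));
console.log(applyPin({ content: 'hello', policy: 'direct-messages' }, pins));
```

Clearing the configuration value removes the pin, and unpinned policies keep using whatever version is currently deployed.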

Benefits Over Dashboard-Only Configuration

| Aspect | Dashboard | Policies as Code |
|--------|-----------|------------------|
| Change history | Audit log (if available) | Full Git history |
| Review process | "I changed it" in Slack | Pull request with diff |
| Rollback | Manual revert | git revert |
| Testing | Manual spot checks | Automated test suite |
| Multi-environment | Copy settings manually | Deploy per environment |
| Compliance evidence | Screenshots | Commit hashes and diffs |

Dashboard configuration is fine for getting started. But as your moderation requirements grow — multiple policies, regulatory obligations, cross-team review — policies as code scales better.

Define your moderation policies in code

Vettly supports YAML policy definitions with versioning, testing, and API deployment. Bring your moderation rules into your engineering workflow.