Moderation Policies as Code: Managing Content Rules with YAML
Most content moderation systems bury their rules inside model configurations, admin dashboards, or hardcoded if-else chains. When a policy changes — a new category needs to be blocked, a threshold needs adjustment, a regulation requires stricter rules — someone edits a config, deploys a new build, and hopes nothing breaks.
There's a better way: treat moderation policies like code. Define them in YAML, version them in Git, review changes in pull requests, and deploy them through your existing CI/CD pipeline. The Policies documentation covers the full schema reference; this post focuses on the workflow.
Why Policies as Code?
Traceable: every policy change has a commit hash, an author, a timestamp, and a review. When a regulator asks "why was this content blocked?", you can point to the exact policy version.
Reviewable: policy changes go through pull requests. Engineers, legal, and trust & safety review the same diff. No more "someone changed a setting in the dashboard and we're not sure when."
Testable: write tests against your policies. "Given this input, the policy should return block." Run tests in CI before deploying.
Rollbackable: if a policy change causes problems (too many false positives, too permissive), revert the commit and redeploy.
Policy Structure
A Vettly policy is a YAML file that defines categories, thresholds, and actions:
name: community-safe
version: "3"
description: Standard community safety policy for UGC
categories:
  hate_speech:
    action: block
    threshold: 0.7
  harassment:
    action: block
    threshold: 0.8
  nudity:
    action: block
    threshold: 0.6
  violence:
    action: flag
    threshold: 0.7
  self_harm:
    action: block
    threshold: 0.5
  spam:
    action: flag
    threshold: 0.8
  pii:
    action: flag
    threshold: 0.9
defaults:
  action: allow
Each category has an action (allow, flag, or block) and a threshold: the minimum confidence score required to trigger that action. The defaults section sets the action for categories not explicitly listed.
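To make the interplay between thresholds and defaults concrete, here is a minimal TypeScript sketch of how a policy like this might be evaluated against per-category confidence scores. The `Policy` type, the `resolveAction` function, and the rule that the most severe triggered action wins are assumptions for illustration; they are not Vettly's documented evaluation logic.

```typescript
type Action = 'allow' | 'flag' | 'block';

interface CategoryRule {
  action: Action;
  threshold: number; // minimum confidence score that triggers the action
}

interface Policy {
  name: string;
  categories: Record<string, CategoryRule>;
  defaults: { action: Action };
}

// Severity order used to pick a single outcome (assumption: strictest wins).
const SEVERITY: Record<Action, number> = { allow: 0, flag: 1, block: 2 };

// Hypothetical evaluator: `scores` maps category names to confidence in [0, 1].
function resolveAction(policy: Policy, scores: Record<string, number>): Action {
  let result: Action = policy.defaults.action;
  let triggered = false;
  for (const category of Object.keys(scores)) {
    const rule: CategoryRule | undefined = policy.categories[category];
    // Unlisted categories and scores below the threshold do not trigger.
    if (!rule || scores[category] < rule.threshold) continue;
    if (!triggered || SEVERITY[rule.action] > SEVERITY[result]) {
      result = rule.action;
      triggered = true;
    }
  }
  return result;
}

// A trimmed copy of the community-safe policy, written as an object literal.
const communitySafe: Policy = {
  name: 'community-safe',
  categories: {
    hate_speech: { action: 'block', threshold: 0.7 },
    violence: { action: 'flag', threshold: 0.7 },
    spam: { action: 'flag', threshold: 0.8 },
  },
  defaults: { action: 'allow' },
};
```

Under this reading, lowering a threshold makes a rule fire on less confident detections, so a lower number means a stricter policy.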
Multiple Policies for Different Contexts
Different parts of your product may need different policies:
name: profile-photo
version: "1"
description: Stricter policy for profile photos - visible everywhere
categories:
  nudity:
    action: block
    threshold: 0.4  # Lower threshold = stricter
  violence:
    action: block
    threshold: 0.5
  hate_symbols:
    action: block
    threshold: 0.3
defaults:
  action: allow

name: direct-messages
version: "2"
description: More permissive for private conversations, strict on harassment
categories:
  harassment:
    action: block
    threshold: 0.7
  threats:
    action: block
    threshold: 0.5
  csam:
    action: block
    threshold: 0.1  # Zero tolerance
  nudity:
    action: flag  # Flag but don't block in DMs
    threshold: 0.6
defaults:
  action: allow
When calling the API, specify which policy to use:
// Different policies for different surfaces
const feedCheck = await vettly.check({
  content: post.text,
  policy: 'community-safe',
});

const profileCheck = await vettly.check({
  imageUrl: avatar.url,
  policy: 'profile-photo',
});

const dmCheck = await vettly.check({
  content: message.text,
  policy: 'direct-messages',
});
Versioning and Git Workflow
Store policies in your repository alongside your application code:
policies/
  community-safe.yaml
  profile-photo.yaml
  direct-messages.yaml
  chatbot-output.yaml
tests/
  policies/
    community-safe.test.ts
    profile-photo.test.ts
Policy changes follow the same workflow as code changes:
- Create a branch
- Edit the YAML file
- Run policy tests
- Open a pull request
- Legal and trust & safety review the diff
- Merge and deploy
The pull request diff makes it obvious what changed:
 categories:
   hate_speech:
     action: block
-    threshold: 0.7
+    threshold: 0.6  # Lowered per T&S review 2026-03-01
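For reviewers who do not read YAML diffs every day, a CI step could also post a plain-language summary of what changed. The sketch below compares the `categories` sections of two already-parsed policy versions; the `CategoryRule` shape and the `summarizeChanges` helper are hypothetical and not part of any Vettly SDK.

```typescript
type Action = 'allow' | 'flag' | 'block';

interface CategoryRule {
  action: Action;
  threshold: number;
}

type Categories = Record<string, CategoryRule>;

// Hypothetical helper: describe how category rules differ between two versions.
function summarizeChanges(before: Categories, after: Categories): string[] {
  const lines: string[] = [];
  // Union of category names from both versions, in a stable order.
  const names = Object.keys(before)
    .concat(Object.keys(after).filter((name) => !(name in before)))
    .sort();

  for (const name of names) {
    const oldRule: CategoryRule | undefined = before[name];
    const newRule: CategoryRule | undefined = after[name];
    if (!oldRule && newRule) {
      lines.push(`${name}: added (${newRule.action} at ${newRule.threshold})`);
    } else if (oldRule && !newRule) {
      lines.push(`${name}: removed`);
    } else if (oldRule && newRule) {
      if (oldRule.action !== newRule.action) {
        lines.push(`${name}: action ${oldRule.action} -> ${newRule.action}`);
      }
      if (oldRule.threshold !== newRule.threshold) {
        // A lower threshold triggers on less confident detections.
        const direction = newRule.threshold < oldRule.threshold ? 'stricter' : 'more permissive';
        lines.push(`${name}: threshold ${oldRule.threshold} -> ${newRule.threshold} (${direction})`);
      }
    }
  }
  return lines;
}
```

For the change shown in the diff, this would produce the single line `hate_speech: threshold 0.7 -> 0.6 (stricter)`.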
Testing Policies
Write test cases for your policies. Each test provides a content sample and asserts the expected action:
import { describe, it, expect } from 'vitest';
import { testPolicy } from '../helpers';

describe('community-safe policy', () => {
  it('blocks obvious hate speech', async () => {
    const result = await testPolicy('community-safe', {
      content: '[example hate speech input]',
    });
    expect(result.action).toBe('block');
  });

  it('allows normal conversation', async () => {
    const result = await testPolicy('community-safe', {
      content: 'Hey, great post! I really enjoyed reading this.',
    });
    expect(result.action).toBe('allow');
  });

  it('flags borderline content for review', async () => {
    const result = await testPolicy('community-safe', {
      content: '[example borderline content]',
    });
    expect(result.action).toBe('flag');
  });
});
Run these tests in CI. If a policy change breaks a test, the PR fails and the change doesn't ship.
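The tests import `testPolicy` from `../helpers`, which this post does not show. One possible shape for it is sketched below, under the assumption that the client exposes a `check` method like the one used in the API examples. The `ModerationClient` interface and the `makeTestPolicy` factory are hypothetical names; taking the client as a parameter lets the helper itself be exercised against a fake.

```typescript
type Action = 'allow' | 'flag' | 'block';

interface CheckInput {
  content?: string;
  imageUrl?: string;
}

interface CheckResult {
  action: Action;
}

// Minimal view of the client that the helper depends on (assumed shape).
interface ModerationClient {
  check(request: CheckInput & { policy: string }): Promise<CheckResult>;
}

// Hypothetical factory for the `testPolicy` helper used in the tests.
function makeTestPolicy(client: ModerationClient) {
  return function testPolicy(policy: string, input: CheckInput): Promise<CheckResult> {
    // Forward the sample to the named policy and return the decision.
    return client.check({ ...input, policy });
  };
}
```

In `helpers.ts`, the factory would be called once with the real client, for example `export const testPolicy = makeTestPolicy(vettly);`. If the tests run against a live classifier, samples that score close to a threshold may make them flaky, so clear-cut examples are the safer choice.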
Deployment
When you merge a policy change, deploy it to Vettly:
#!/bin/bash
# Deploy all policies to Vettly
for policy in policies/*.yaml; do
  echo "Deploying $policy..."
  curl -X PUT https://api.vettly.dev/v1/policies \
    -H "Authorization: Bearer $VETTLY_API_KEY" \
    -H "Content-Type: application/yaml" \
    --data-binary @"$policy"
done
Or use the SDK:
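import fs from 'fs';
import path from 'path';
import { Vettly } from '@vettly/sdk';

const vettly = new Vettly(process.env.VETTLY_API_KEY);
const policiesDir = path.join(__dirname, '../policies');

// Wrapped in an async function so `await` works when this script is compiled
// to CommonJS, which is also where `__dirname` is available.
async function deployAll() {
  for (const file of fs.readdirSync(policiesDir)) {
    if (!file.endsWith('.yaml')) continue; // skip anything that is not a policy file
    const yaml = fs.readFileSync(path.join(policiesDir, file), 'utf-8');
    await vettly.policies.deploy({ yaml });
    console.log(`Deployed: ${file}`);
  }
}

// Exit non-zero on failure so the CI job is marked as failed.
deployAll().catch((err) => {
  console.error(err);
  process.exit(1);
});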
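Before either deploy script runs, CI could reject structurally invalid policies. The following is a minimal sketch of such a check, not the official schema. It assumes the YAML has already been parsed into a plain object by a parser of your choice, that every category needs both an action and a threshold, and that thresholds fall between 0 and 1 as in the examples in this post. `validatePolicy` is a hypothetical helper.

```typescript
const ACTIONS = ['allow', 'flag', 'block'];

// Hypothetical pre-deploy check on an already-parsed policy document.
// Returns human-readable problems; an empty list means the policy passed.
function validatePolicy(doc: unknown): string[] {
  if (typeof doc !== 'object' || doc === null) {
    return ['policy must be a mapping'];
  }
  const policy = doc as Record<string, unknown>;
  const problems: string[] = [];

  const name = policy.name;
  if (typeof name !== 'string' || name.length === 0) {
    problems.push('name must be a non-empty string');
  }

  const categories = policy.categories;
  if (typeof categories !== 'object' || categories === null) {
    problems.push('categories must be a mapping');
    return problems;
  }

  for (const category of Object.keys(categories)) {
    const rule = (categories as Record<string, any>)[category] ?? {};
    if (ACTIONS.indexOf(rule.action) === -1) {
      problems.push(`${category}: action must be one of ${ACTIONS.join(', ')}`);
    }
    const threshold = rule.threshold;
    // Assumption: thresholds are confidence scores in the range [0, 1].
    if (typeof threshold !== 'number' || !(threshold >= 0 && threshold <= 1)) {
      problems.push(`${category}: threshold must be a number between 0 and 1`);
    }
  }
  return problems;
}
```

A deploy script could call this for each file and stop with a non-zero exit code if any list is non-empty, so a typo such as `action: blok` never reaches production.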
Rollbacks
If a policy change causes problems (spike in false positives, user complaints), revert the Git commit and redeploy:
git revert HEAD
git push
# CI redeploys the previous policy version
Vettly keeps all policy versions, so you can also reference a specific version in your API calls during an incident. See Policy Versioning for details on how version history works.
// Temporarily pin to a known-good policy version
const result = await vettly.check({
  content: post.text,
  policy: 'community-safe',
  policyVersion: '2', // Previous version
});
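Hardcoding the pinned version means a code change during an incident. One alternative, sketched here as an assumption rather than a Vettly feature, is to read the pin from configuration. The `POLICY_PIN_<NAME>` variable naming scheme and the `buildCheckRequest` helper are hypothetical; only the `policyVersion` field comes from the example above.

```typescript
interface CheckRequest {
  content: string;
  policy: string;
  policyVersion?: string;
}

// Hypothetical helper: add `policyVersion` only when a pin is configured.
// `env` is passed in explicitly so the function is easy to test.
function buildCheckRequest(
  content: string,
  policy: string,
  env: Record<string, string | undefined>,
): CheckRequest {
  // e.g. 'community-safe' -> 'POLICY_PIN_COMMUNITY_SAFE' (assumed naming scheme)
  const key = 'POLICY_PIN_' + policy.toUpperCase().replace(/-/g, '_');
  const pinned = env[key];
  const request: CheckRequest = { content, policy };
  if (pinned) {
    request.policyVersion = pinned;
  }
  return request;
}
```

A call site would then look like `await vettly.check(buildCheckRequest(post.text, 'community-safe', process.env))`. Removing the variable after the incident returns traffic to the latest deployed version.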
Benefits Over Dashboard-Only Configuration
| Aspect | Dashboard | Policies as Code |
|--------|-----------|-----------------|
| Change history | Audit log (if available) | Full Git history |
| Review process | "I changed it" in Slack | Pull request with diff |
| Rollback | Manual revert | git revert |
| Testing | Manual spot checks | Automated test suite |
| Multi-environment | Copy settings manually | Deploy per environment |
| Compliance evidence | Screenshots | Commit hashes and diffs |
Dashboard configuration is fine for getting started. But as your moderation requirements grow — multiple policies, regulatory obligations, cross-team review — policies as code scales better.
Define your moderation policies in code
Vettly supports YAML policy definitions with versioning, testing, and API deployment. Bring your moderation rules into your engineering workflow.