Raw LLM output is not user-safe output. The raw text from the model has to survive schema validation, content filtering, and business rules before it's safe to display, act on, or store. This post is the guardrail stack we deploy between every LLM and every production surface.

Validation stack

Raw output descends through schema validator → content filter → business rules before reaching the user. Each layer catches a different failure class; all three are necessary.
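As a rough sketch of how the three layers compose, the stack can be modeled as an ordered list of functions that each return a verdict, with the first failure short-circuiting the rest. The names `Verdict`, `run_stack`, and the three placeholder layers are illustrative assumptions, not part of any particular library, and the checks inside them are deliberately trivial stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    ok: bool
    layer: str
    reason: Optional[str] = None


# A layer takes the raw model text and returns a Verdict.
Layer = Callable[[str], Verdict]


def run_stack(raw: str, layers: List[Layer]) -> Verdict:
    """Run the layers in order and stop at the first failure."""
    for layer in layers:
        verdict = layer(raw)
        if not verdict.ok:
            return verdict
    return Verdict(ok=True, layer="all")


# Placeholder layers (hypothetical): real ones would do far more.
def schema_layer(raw: str) -> Verdict:
    ok = raw.strip().startswith("{")
    return Verdict(ok, "schema", None if ok else "expected a JSON object")


def content_layer(raw: str) -> Verdict:
    ok = "password" not in raw.lower()
    return Verdict(ok, "content", None if ok else "possible credential leak")


def business_layer(raw: str) -> Verdict:
    ok = "free forever" not in raw.lower()
    return Verdict(ok, "business", None if ok else "offer not in catalog")


STACK = [schema_layer, content_layer, business_layer]
```

Calling `run_stack(model_text, STACK)` returns either a passing verdict or the first layer that objected, along with its reason, which is what the fallback path needs to decide what to do next.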

Layer 1: Schema validators

If you asked the LLM for structured output, validate the structure before trusting it. JSON Schema, Zod, Pydantic, whatever your language uses. The validator rejects malformed output. On rejection, the right move is usually to retry with a repair prompt: show the LLM the schema error and ask it to fix the output. After two failed repair attempts, fall through to a graceful fallback (a default response, an error message, a human-in-the-loop path). See our structured outputs post for the full pattern.
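The repair loop described here might look roughly like the following. This is a minimal sketch using only the standard library; the schema (`answer` and `sources` fields), the `call_llm` callable, the wording of the repair prompt, and the `FALLBACK` payload are all assumptions made for illustration. In practice a validator such as Pydantic or a JSON Schema library would replace the hand-written `validate`.

```python
import json
from typing import Callable, Optional

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}

# Hypothetical graceful fallback returned when repair attempts run out.
FALLBACK = '{"answer": "Sorry, something went wrong.", "sources": []}'


def validate(raw: str) -> Optional[str]:
    """Return None if raw matches the schema, otherwise an error message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "top-level value must be an object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            return f"missing field '{name}'"
        if not isinstance(data[name], expected):
            return f"field '{name}' must be of type {expected.__name__}"
    return None


def generate_with_repair(
    prompt: str,
    call_llm: Callable[[str], str],
    max_repairs: int = 2,
) -> str:
    """Call the model, then retry with the schema error up to max_repairs times."""
    raw = call_llm(prompt)
    for attempt in range(max_repairs + 1):
        error = validate(raw)
        if error is None:
            return raw
        if attempt == max_repairs:
            break  # repair budget exhausted
        repair_prompt = (
            f"{prompt}\n\nYour previous output was rejected: {error}\n"
            f"Previous output:\n{raw}\nReturn corrected JSON only."
        )
        raw = call_llm(repair_prompt)
    return FALLBACK
```

With `max_repairs=2` the model is called at most three times in total: the original request plus two repair attempts.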

Layer 2: Content filters

Content that is structurally valid can still be substantively wrong. Four categories matter: (1) toxic or harmful content slipping through jailbreaks, (2) topic violations (the assistant answering off-topic questions, which for a customer support bot is an attack surface), (3) hallucinated facts stated confidently, (4) PII or credentials leaking into responses.

The toolkit: OpenAI's moderation endpoint or similar for toxicity; a topic classifier (a small fine-tuned model or a prompted LLM) for topic enforcement; a fact-checker that compares claims to retrieved documents for hallucination (hard; worth the effort for high-stakes answers); secret and PII scanners. Each runs in parallel against the output; failure on any one triggers the fallback path.
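To illustrate the "run in parallel, fail on any" shape, here is a sketch with three deliberately simple stand-in checks. The regexes, the keyword-based topic check, and the names `CHECKS` and `failed_checks` are assumptions for the example; a real deployment would swap in a moderation API call, a topic classifier, and a proper PII or secret scanner behind the same interface.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Rough illustrative patterns, not production-grade detectors.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical on-topic vocabulary for a customer support bot.
ALLOWED_TOPIC_WORDS = {"order", "refund", "shipping", "return"}


def pii_check(text: str) -> bool:
    """Pass when no email address appears in the text."""
    return EMAIL_RE.search(text) is None


def secret_check(text: str) -> bool:
    """Pass when nothing shaped like an access key ID appears."""
    return ACCESS_KEY_RE.search(text) is None


def topic_check(text: str) -> bool:
    """Pass when at least one on-topic word appears (a crude classifier stand-in)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & ALLOWED_TOPIC_WORDS)


CHECKS: Dict[str, Callable[[str], bool]] = {
    "pii": pii_check,
    "secrets": secret_check,
    "topic": topic_check,
}


def failed_checks(text: str, checks: Dict[str, Callable[[str], bool]] = CHECKS) -> List[str]:
    """Run every check concurrently and return the names of those that failed."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in checks.items()}
        return sorted(name for name, fut in futures.items() if not fut.result())
```

A non-empty result from `failed_checks` would trigger the fallback path. Threads suit this because real filters are mostly network-bound calls, so the total latency is close to that of the slowest filter and not the sum of all of them.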

Layer 3: Business rules

Even valid, safe content can violate business rules. Examples from real deployments: a pricing assistant quoting a price the system doesn't actually offer; a scheduling agent proposing a time outside business hours; a customer support bot confirming a return for an item that's not eligible; a legal-doc generator referring to a precedent that doesn't apply in the user's jurisdiction.

Business-rule validators are not generic — they're written per-feature. A pricing validator checks that every price mentioned exists in the product catalog at the quoted value. A scheduling validator checks that proposed times are within configured hours and the user's calendar is free. A legal validator checks that cited precedents are in the jurisdiction-relevant database. These are boring, case-by-case rules. Skipping them is where most high-profile AI incidents originate.
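A pricing validator of the kind described could start as small as this. The `CATALOG` contents, the dollar-amount regex, and the function name `unknown_prices` are hypothetical. This sketch only checks that each quoted amount exists somewhere in the catalog; a stricter version would also tie each amount to the product it was quoted for.

```python
import re
from decimal import Decimal
from typing import Dict, List

# Hypothetical product catalog: plan name -> price in dollars.
CATALOG: Dict[str, Decimal] = {
    "basic": Decimal("9.99"),
    "pro": Decimal("29.00"),
}

# Matches amounts such as $29 or $19.99.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{1,2})?)")


def unknown_prices(text: str, catalog: Dict[str, Decimal] = CATALOG) -> List[Decimal]:
    """Return every dollar amount in the text that is not a catalog price."""
    offered = set(catalog.values())
    quoted = [Decimal(match) for match in PRICE_RE.findall(text)]
    return [price for price in quoted if price not in offered]
```

An empty list means every quoted price is one the system actually offers; anything else is a reason to block the response. `Decimal` is used so that `$29` and `29.00` compare equal without floating-point surprises.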

The fallback path

Every guardrail needs to know what happens when it fires. Common patterns: (1) Retry with feedback — useful for schema and minor content issues. (2) Return a generic safe response — "I can't help with that, could you rephrase?" Useful when the user's query is malformed or out of scope. (3) Escalate to human — for high-stakes systems where the cost of a wrong answer exceeds the cost of latency. (4) Hard reject — for truly disallowed categories.
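One way to make the choice explicit is a small policy function that maps a guardrail failure to one of the four patterns. The ordering of the rules, the `DISALLOWED_REASONS` set, and the parameter names below are assumptions for illustration; the right policy depends on the product.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry_with_feedback"
    SAFE_RESPONSE = "generic_safe_response"
    ESCALATE = "escalate_to_human"
    HARD_REJECT = "hard_reject"


# Hypothetical failure reasons that are never retried or escalated.
DISALLOWED_REASONS = {"toxicity", "credential_leak"}


def choose_action(
    layer: str,
    reason: str,
    retries_used: int,
    high_stakes: bool,
    max_retries: int = 2,
) -> Action:
    """Pick a fallback pattern for a guardrail failure."""
    if reason in DISALLOWED_REASONS:
        return Action.HARD_REJECT
    if layer == "schema" and retries_used < max_retries:
        return Action.RETRY
    if high_stakes:
        return Action.ESCALATE
    return Action.SAFE_RESPONSE
```

For example, `choose_action("schema", "missing_field", 0, False)` selects a retry, while the same failure after two retries falls back to the generic safe response.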

Whatever you pick, log the guardrail firing with full context. These logs become the dataset for your next eval cycle. They're where you discover that, say, 8% of your outputs hit one specific business-rule validator, a sign that the rule might be wrong or the prompt might need tightening.
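A sketch of what "log with full context" and the follow-up analysis could look like, using only the standard library. The record fields, the logger name, and the helper names `log_firing` and `firing_rates` are assumptions; adapt them to whatever logging pipeline is already in place.

```python
import json
import logging
import time
from collections import Counter
from typing import Dict, Iterable

logger = logging.getLogger("guardrails")


def log_firing(validator: str, reason: str, prompt: str, output: str, request_id: str) -> dict:
    """Emit one structured log record for a guardrail firing and return it."""
    record = {
        "ts": time.time(),
        "request_id": request_id,
        "validator": validator,
        "reason": reason,
        "prompt": prompt,
        "output": output,
    }
    logger.warning(json.dumps(record))
    return record


def firing_rates(records: Iterable[dict], total_outputs: int) -> Dict[str, float]:
    """Fraction of all outputs that tripped each validator."""
    counts = Counter(r["validator"] for r in records)
    return {name: n / total_outputs for name, n in counts.items()}
```

Running `firing_rates` over a day of records is the kind of check that surfaces a single validator firing on an unexpectedly large share of traffic.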

Common mistakes we see

Running guardrails only in testing. The point of guardrails is production failures; test-time firing doesn't prove much.

Running guardrails but not logging. Every blocked output is a learning opportunity; invisible blocks are waste.

Making the fallback worse than the original failure. A schema retry that calls the LLM a third time when the user has already waited 4 seconds is worse than serving a graceful error. Calibrate your guardrail latency against the user experience.
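One way to keep a retry from making things worse is to check a latency budget before issuing it. The sketch below assumes a 4-second budget and a 1.5-second expected model call; both numbers, and the function name `should_retry`, are illustrative and should come from your own latency measurements.

```python
import time
from typing import Optional


def should_retry(
    started_at: float,
    budget_s: float = 4.0,
    expected_call_s: float = 1.5,
    now: Optional[float] = None,
) -> bool:
    """Allow a retry only if it is expected to finish within the latency budget.

    started_at and now are time.monotonic() readings; now can be passed
    explicitly to make the function easy to test.
    """
    current = time.monotonic() if now is None else now
    elapsed = current - started_at
    return elapsed + expected_call_s <= budget_s
```

When `should_retry` returns `False`, serve the graceful error immediately instead of making the user wait for another model call.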

Final reminder: guardrails are not a substitute for good prompts and good models. They catch the 2-5% of outputs that slip through. If your guardrails are firing on 20% of traffic, the prompt is broken, not the validators.

Guardrails and validators: keeping LLM outputs safe
