Compliance · April 17, 2026 · 7 min read

SOC 2 for AI systems: a practical checklist

What auditors actually look for when you put an LLM in front of customer data — and the eleven controls that cover 80% of your exposure.

SOC 2 doesn’t have an "AI clause." That’s the first thing to understand. Auditors evaluate your AI stack against the same Trust Services Criteria they use for everything else: Security, Availability, Processing Integrity, Confidentiality, and Privacy. The problem is that an LLM introduces new surfaces for each of those criteria, and engineers rarely think about them until an auditor asks.

This is the checklist we use at the start of every Strategy Sprint with SOC 2-bound clients. It is not a replacement for your auditor or CPA; it is a pre-flight so your first auditor meeting isn’t a discovery session.

1. Data flow diagram — with the model in it

Most existing data-flow diagrams show database → service → user. For an AI system, you need: user input → orchestrator → retrieval → model → post-processing → user output. Every arrow is a potential control point. Auditors will ask for this; have it drawn before they do.

2. Model inventory

Which models are deployed, at what version, serving what traffic. If you use embeddings, those count too. A SOC 2 auditor wants to see a register, not a Slack thread.
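A register can be as simple as a versioned data structure in your repo. This is an illustrative sketch, not a prescribed format — the field names and entries are hypothetical; the point is that every deployed model (embeddings included) has a pinned version and an owner on record:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEntry:
    """One row in the model register (illustrative fields)."""
    name: str     # e.g. "chat-llm"
    version: str  # pinned version or checkpoint hash
    purpose: str  # what traffic it serves
    owner: str    # accountable team or person

REGISTER = [
    ModelEntry("chat-llm", "v3.1-2026-01-14", "customer Q&A", "platform"),
    ModelEntry("embed-small", "v2.0", "RAG document embeddings", "platform"),
]

def find(name: str) -> ModelEntry:
    """Look up a deployed model by name; fail loudly if it is unregistered."""
    entry = next((m for m in REGISTER if m.name == name), None)
    if entry is None:
        raise KeyError(f"model {name!r} is not in the register")
    return entry
```

Keeping the register in version control means every change to it goes through review — which doubles as evidence for control #7.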

3. Prompt injection defenses, documented

If a customer can get your LLM to reveal another customer’s data through a cleverly crafted prompt, you have a confidentiality incident. Auditors are increasingly aware of this. Document your defense-in-depth: input filtering, output filtering, per-tenant retrieval isolation, and monitoring for suspicious patterns.
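Two of those layers can be sketched in a few lines. The patterns below are illustrative placeholders — real deployments layer tuned classifiers and retrieval isolation on top of simple checks like these, never regexes alone:

```python
import re

# Illustrative deny-list for the input filter; a hypothetical starting
# point, not a complete defense.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal .*system prompt",
]

def screen_input(user_text: str) -> bool:
    """First layer: flag obviously suspicious user input."""
    lowered = user_text.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def screen_output(cited_doc_ids: set, allowed_doc_ids: set) -> bool:
    """Last layer: every document the answer cites must belong to the
    requesting tenant, or the answer is blocked."""
    return cited_doc_ids <= allowed_doc_ids
```

The output-side check matters as much as the input side: even if a crafted prompt slips through, an answer citing another tenant's documents never leaves the system.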

4. Tenant isolation for retrieval

The single biggest AI-specific incident pattern: a RAG system returns content from Customer A’s documents in response to Customer B’s query. The fix is per-tenant vector indexes or tenant-scoped filtering on every query. Not "usually filters" — always filters, verified by tests.
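The "always filters, verified by tests" requirement looks roughly like this. The in-memory store stands in for a real vector index, and the function names are hypothetical; the invariant being tested is the part that matters:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    tenant_id: str
    text: str

# Toy in-memory store standing in for a vector index.
STORE = [
    Doc("d1", "tenant-a", "A's pricing sheet"),
    Doc("d2", "tenant-b", "B's contract"),
]

def retrieve(query: str, tenant_id: str) -> list:
    """The tenant filter is applied unconditionally; a missing tenant_id
    is an error, never an unfiltered search."""
    if not tenant_id:
        raise ValueError("tenant_id is required on every retrieval")
    # A real system combines this filter with vector similarity; the
    # invariant to verify is that no other tenant's docs can come back.
    return [d for d in STORE if d.tenant_id == tenant_id]

def test_no_cross_tenant_leakage():
    results = retrieve("pricing", "tenant-b")
    assert results and all(d.tenant_id == "tenant-b" for d in results)
```

A test like `test_no_cross_tenant_leakage` running in CI is exactly the kind of evidence an auditor can point to.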

5. PII redaction before logs hit storage

If you log model inputs for debugging and those inputs contain PII, your log store is now in scope for privacy controls. Cheaper to redact at the collection point than to retrofit retention policies on a terabyte of logs.
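Redaction at the collection point can be a small wrapper in front of the logger. The patterns here are illustrative — production systems typically use a dedicated PII-detection library or service rather than regexes alone:

```python
import re

# Illustrative patterns; extend or replace with a proper PII detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace PII with typed placeholders before the text reaches storage."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders like `[EMAIL]` keep logs useful for debugging ("the user pasted an email here") without the log store itself holding PII.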

6. Model output validation

Processing Integrity requires that the system does what it claims. If the LLM occasionally returns uncited or malformed answers, that's a processing-integrity gap. The fix: validate every output against a schema, reject or escalate failures, and log both the output and the decision.
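A minimal version of that gate, assuming the model is asked to return JSON with an `answer` and `citations` (an illustrative schema, not a standard one):

```python
import json
import logging

logger = logging.getLogger("model-output")

REQUIRED_FIELDS = {"answer", "citations"}  # illustrative schema

def validate_output(raw: str):
    """Accept only outputs that parse and carry the required fields.
    Every outcome is logged, so Processing Integrity evidence exists."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        logger.warning("rejected: output is not valid JSON")
        return None
    missing = REQUIRED_FIELDS - parsed.keys()
    if missing:
        logger.warning("escalated: missing fields %s", sorted(missing))
        return None
    if not parsed["citations"]:
        logger.warning("rejected: uncited answer")
        return None
    logger.info("accepted")
    return parsed
```

Returning `None` for any failure forces the caller to handle the bad output explicitly instead of passing it through to the user.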

7. Change management for model updates

Upgrading from Llama-3 to Llama-4 is a material change to your system. It deserves the same change-management rigor you apply to a database migration: RFC, test, rollout plan, rollback plan. Auditors will look for this.

8. Drift detection and performance monitoring

A model that worked at launch can silently degrade as inputs drift. You need baseline metrics (accuracy, latency, refusal rate, citation coverage) and alerts on deviation. "We check it occasionally" fails SOC 2. "We have a dashboard reviewed weekly with documented thresholds" passes.
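The "documented thresholds" part is mechanically simple. The baseline values and tolerances below are made up for illustration — yours come from your own launch metrics:

```python
# Illustrative baselines and tolerances; real values come from your
# documented launch measurements.
BASELINE = {"refusal_rate": 0.05, "citation_coverage": 0.90}
TOLERANCE = {"refusal_rate": 0.03, "citation_coverage": 0.05}

def drift_alerts(weekly_metrics: dict) -> list:
    """Compare this week's metrics to baseline; an empty list means
    'within documented thresholds'."""
    alerts = []
    for metric, baseline in BASELINE.items():
        observed = weekly_metrics[metric]
        if abs(observed - baseline) > TOLERANCE[metric]:
            alerts.append(
                f"{metric}: {observed:.2f} deviates from baseline "
                f"{baseline:.2f} beyond tolerance {TOLERANCE[metric]:.2f}"
            )
    return alerts
```

Wiring the non-empty case into your paging or ticketing system turns "we have a dashboard" into "we have alerts with documented thresholds", which is the phrasing that passes.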

9. Third-party risk for model providers

If you use a hosted model API, that vendor is in scope as a subservice organization. You need their SOC 2 report (if they have one) and a vendor risk assessment. This is the strongest single argument for offline models in regulated environments — an on-prem model removes that category of risk entirely.

10. Incident response playbook — with AI-specific scenarios

Your existing IR playbook probably doesn’t cover: "the model returned confidential information to the wrong user." Add three or four AI-specific scenarios (prompt injection, data leakage, model malfunction) with specific response steps.

11. Access controls for the model endpoint

Who can call the model? Who can see logs? Who can push a new model version? Least-privilege, documented in your access matrix. For offline deployments, this extends to physical access to the GPU servers.
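The access matrix itself can live next to the code. The roles and actions below are hypothetical examples; the design point is deny-by-default, so an unlisted role or action is refused rather than silently allowed:

```python
# Illustrative access matrix: role -> permitted actions on the model stack.
ACCESS_MATRIX = {
    "app-service": {"invoke_model"},
    "oncall-engineer": {"invoke_model", "read_logs"},
    "ml-release": {"invoke_model", "read_logs", "deploy_model"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ACCESS_MATRIX.get(role, set())
```

A matrix in version control also answers the auditor's follow-up question — "when did this person get deploy access?" — from the commit history.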

The one question that blocks most AI SOC 2 audits

"Can you produce the exact prompt, retrieved context, and output for request ID X?" If the answer is yes and takes under a minute, you're 80% of the way through the AI-specific portion of the audit. If the answer is "we don't log that," stop and fix it before scheduling the audit.
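The fix is a trace record written on every request, keyed by request ID. A minimal sketch, using an in-memory dict as a stand-in for your real log store (and assuming control #5's redaction runs before anything is recorded):

```python
import json

TRACE_LOG = {}  # stand-in for a durable, access-controlled log store

def record_trace(request_id: str, prompt: str,
                 retrieved_context: list, output: str) -> None:
    """Persist everything the auditor will ask for, keyed by request ID.
    Apply PII redaction (control #5) before this point."""
    TRACE_LOG[request_id] = {
        "prompt": prompt,
        "retrieved_context": retrieved_context,
        "output": output,
    }

def fetch_trace(request_id: str) -> str:
    """Answer 'what exactly did the model see and say for request X?'"""
    return json.dumps(TRACE_LOG[request_id], indent=2)
```

If `fetch_trace` (or its real equivalent) answers in under a minute, the blocking question above stops being a blocker.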

Trust Services Criteria mapped to AI controls

  • Security: #3, #4, #11
  • Availability: #2, #7, #8
  • Processing Integrity: #6, #7, #8
  • Confidentiality: #3, #4, #5, #9
  • Privacy: #5, #9, #10

If you’re starting a SOC 2 audit with AI in scope

The order we recommend:

  1. Week 1-2: data flow diagram + model inventory.
  2. Week 3-4: tenant isolation, PII redaction, output validation. These are the most common gap findings.
  3. Week 5-6: observability and drift detection.
  4. Week 7-8: incident response, change management, access controls.
  5. Then schedule the auditor.

If you want help with any of that, we run a focused Strategy Sprint specifically for "SOC 2 with AI in scope."