eazyware
Engineering · March 10, 2025 · 10 min read

Multi-tenancy for AI applications: isolation patterns

How to keep tenant data separate across retrieval, context, and caches. The three common failures we see.

Kushal R.
Engineering lead

Multi-tenancy is the hardest part of shipping AI in B2B SaaS. Traditional isolation handles databases, auth, and row-level access. AI adds new surfaces (retrieval indexes, context assembly, cache keys, logs), each of which can leak data across tenants if implemented naively. This post describes the isolation pattern we deploy and the three failures we see most often.

Five layers
[Figure: Multi-tenancy isolation layers, bottom to top: 1. Auth (JWT carries tenant_id); 2. Retrieval filter (tenant_id in every query); 3. Context sanitization (strip cross-tenant refs); 4. Cache isolation (tenant keys, TTLs); 5. Audit (per-tenant logs, dashboards). Common failures: skipping layer 3 (context leaks), a shared cache keyed only by query, unified logs.]
Auth (JWT carries tenant_id), retrieval filter, context sanitization, cache isolation, audit. Missing any one creates a leak.

Why AI multi-tenancy is uniquely hard

In traditional SaaS, every row carries a tenant_id column and ORM middleware enforces the filter. If you use the framework correctly, it is hard to leak.

AI adds new surfaces: vector indexes (which may or may not store tenant_id), context assembly (which pulls from multiple sources), the LLM context window (ephemeral, but logged), response caches (keyed by prompt rather than tenant), and system prompts (shared across tenants and carrying built-in assumptions). Each is a potential leak point.

The isolation layers

Layer 1: Authentication carries tenant_id. Every authenticated request knows its tenant, because the JWT or session token encodes it. This is the foundation for every layer that follows.

Layer 2: Retrieval always filters by tenant. Every vector DB query includes a tenant_id filter, and so does every keyword search; hybrid search enforces it on both sides. The filter must live in the retrieval library, not in calling code. Otherwise, by the third place someone writes retrieval logic, a developer forgets it.
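A sketch of what "in the retrieval library" can mean: the only public search method takes tenant_id as a required argument and filters on it before ranking, so calling code has no way to issue an unscoped query. The in-memory store, the hypothetical `TenantScopedIndex` class, and cosine-similarity ranking are illustrative assumptions; against a real vector DB, the same method would pass tenant_id as a metadata filter inside the query itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    doc_id: str
    text: str
    embedding: tuple[float, ...]


def _cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedIndex:
    """Hypothetical retrieval library: every search is filtered by tenant."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        if not chunk.tenant_id:
            raise ValueError("chunks must be ingested with a tenant_id")
        self._chunks.append(chunk)

    def search(self, tenant_id: str, query_embedding, k: int = 3) -> list[Chunk]:
        # tenant_id is required: there is no default or overload that
        # searches across tenants.
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        # Filter first, then rank, so top-k is drawn only from this tenant.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        candidates.sort(key=lambda c: _cosine(c.embedding, query_embedding), reverse=True)
        return candidates[:k]
```

Filtering before ranking also avoids a subtler problem with post-filtering: if you take the global top-k and then drop other tenants' hits, a tenant can get back fewer results than it should.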

Layer 3: Context sanitization. The context assembled for the LLM is checked for cross-tenant references. Example failure: the retrieval filter was correct, but one of the documents contains 'see attached message from Customer B, Tenant 42.' The filter decides which documents come back, not what their bodies mention, so the reference leaks into the prompt. Mitigation: sanitize during ingestion, or add a sanitization pass at retrieval time.
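A minimal sketch of a retrieval-time sanitization pass, assuming you keep a registry of each tenant's display names and IDs (the `TENANT_REGISTRY` shape, the tenant IDs, and the redaction placeholder are all hypothetical). It redacts mentions of any tenant other than the requester before a chunk enters the prompt. String matching like this only catches references you can enumerate, so treat it as a backstop behind ingestion-time cleanup rather than a replacement for it.

```python
import re

# Hypothetical registry: tenant_id -> names and IDs that identify that tenant.
TENANT_REGISTRY = {
    "t-17": ["Customer A", "Tenant 17"],
    "t-42": ["Customer B", "Tenant 42"],
}


def _pattern_for(aliases: list[str]) -> re.Pattern:
    # Whole-word, case-insensitive match on any alias; longest first so a
    # shorter alias never claims part of a longer one.
    escaped = sorted((re.escape(a) for a in aliases), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def sanitize_context(requesting_tenant: str, chunks: list[str]) -> tuple[list[str], int]:
    """Redact references to other tenants; return cleaned chunks and a hit count."""
    foreign = [
        alias
        for tenant_id, aliases in TENANT_REGISTRY.items()
        if tenant_id != requesting_tenant
        for alias in aliases
    ]
    if not foreign:
        return list(chunks), 0
    pattern = _pattern_for(foreign)
    cleaned, hits = [], 0
    for chunk in chunks:
        new_chunk, n = pattern.subn("[redacted: other tenant]", chunk)
        cleaned.append(new_chunk)
        hits += n
    return cleaned, hits
```

Returning the hit count is deliberate: a nonzero count means a cross-tenant reference got past ingestion, which is worth alerting on, not just silently redacting.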

Layer 4: Cache isolation. Semantic caching saves real cost, but a naive implementation leaks across tenants. Tenant A asks 'what is our policy on X?'; Tenant B asks the same. If the cache key is just the prompt, Tenant B gets Tenant A's answer. The cache key must include tenant_id.
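A sketch of a semantic cache that cannot cross tenants, because entries live in a per-tenant partition and a lookup only ever scans the caller's partition. The hypothetical `TenantSemanticCache` class, the similarity threshold, the TTL, and the caller-supplied embeddings are assumptions for illustration; a real deployment would use an embedding model and a vector store instead of a list scan.

```python
import math
import time
from typing import Optional


def _cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantSemanticCache:
    """Hypothetical semantic cache with one partition per tenant_id."""

    def __init__(self, threshold: float = 0.95, ttl_seconds: float = 3600.0):
        self.threshold = threshold
        self.ttl = ttl_seconds
        self._partitions = {}  # tenant_id -> list of (embedding, answer, stored_at)

    def get(self, tenant_id: str, embedding, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        best, best_score = None, self.threshold
        # Only the caller's partition is scanned; other tenants are unreachable.
        for emb, answer, stored_at in self._partitions.get(tenant_id, []):
            if now - stored_at > self.ttl:
                continue  # expired entry
            score = _cosine(emb, embedding)
            if score >= best_score:
                best, best_score = answer, score
        return best

    def put(self, tenant_id: str, embedding, answer: str, now: Optional[float] = None) -> None:
        if not tenant_id:
            raise ValueError("cache writes require a tenant_id")
        now = time.time() if now is None else now
        self._partitions.setdefault(tenant_id, []).append((tuple(embedding), answer, now))
```

Partitioning is a stricter form of "include tenant_id in the key": a similarity search cannot match across a boundary it never scans.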

Layer 5: Audit logs. Every AI request, retrieval, and response is logged with its tenant_id. Logs are queryable by tenant, with per-tenant retention policies where required.
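A sketch of the audit record and the two operations this layer needs: querying by tenant and enforcing per-tenant retention. The hypothetical `AuditLog` class, its field names, the in-memory list, and the 90-day default retention are assumptions; in practice the records would go to whatever log store you already run, with tenant_id indexed.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    tenant_id: str
    event: str  # e.g. "request", "retrieval", "response"
    detail: dict
    ts: float


class AuditLog:
    """Hypothetical append-only audit store; every record carries tenant_id."""

    DEFAULT_RETENTION_S = 90 * 24 * 3600  # assumed default: 90 days

    def __init__(self, retention_overrides=None):
        self._records = []
        # tenant_id -> retention in seconds, for tenants with their own policy.
        self._retention = dict(retention_overrides or {})

    def record(self, tenant_id, event, detail=None, ts=None):
        if not tenant_id:
            raise ValueError("every audit record needs a tenant_id")
        ts = time.time() if ts is None else ts
        self._records.append(AuditRecord(tenant_id, event, dict(detail or {}), ts))

    def for_tenant(self, tenant_id, event=None):
        """All records for one tenant, optionally filtered by event type."""
        return [r for r in self._records
                if r.tenant_id == tenant_id and (event is None or r.event == event)]

    def purge_expired(self, now=None):
        """Drop records older than their tenant's retention; return how many."""
        now = time.time() if now is None else now
        keep = [r for r in self._records
                if now - r.ts <= self._retention.get(r.tenant_id, self.DEFAULT_RETENTION_S)]
        removed = len(self._records) - len(keep)
        self._records = keep
        return removed
```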

The three common failures

1. Retrieval filter missed in one code path. A new feature is added, and its developer writes their own retrieval query and forgets the tenant filter. Fix: the retrieval library enforces tenant_id at the interface level, so a query without one cannot be issued.

2. Shared cache across tenants. Developers add caching without considering the tenant dimension. Fix: the cache key includes tenant_id, enforced in the caching wrapper rather than left to each caller.

3. Log aggregation across tenants. Logs go to a shared system without tenant labels, so an analyst querying them while debugging sees data from every tenant. Fix: make tenant_id a structured field on every log entry.
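One way to get tenant_id onto every entry without relying on each call site, sketched with Python's standard `logging` and `contextvars` modules: the request handler sets the tenant once, a handler-level filter copies it onto every record, and a JSON formatter emits it as a top-level field. The `current_tenant` variable, the `"unknown"` fallback, `configure_logger`, and the field names are assumptions; you may prefer to reject records with no tenant rather than label them.

```python
import contextvars
import json
import logging

# Set once per request, right after the token is verified (Layer 1).
current_tenant = contextvars.ContextVar("current_tenant", default=None)


class TenantFilter(logging.Filter):
    """Attach the current tenant_id to every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.tenant_id = current_tenant.get() or "unknown"
        return True


class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object with tenant_id as a top-level field."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "tenant_id": getattr(record, "tenant_id", "unknown"),
            "message": record.getMessage(),
        })


def configure_logger(name: str, handler: logging.Handler) -> logging.Logger:
    # The filter goes on the handler, so records propagated from child
    # loggers are labeled too (logger-level filters would skip those).
    handler.addFilter(TenantFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because `contextvars` values are scoped to the current context, concurrent requests handled by asyncio tasks each see their own tenant.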

Testing for tenant isolation

Red-team your own system. Have a user in Tenant A ask questions that should only be answerable from Tenant A's data, and verify that none of Tenant B's data leaks. Run these as automated CI tests. See our red-teaming post for the broader approach.

Specific tests: put documents with identical names in two tenants and verify that retrieval returns the right one; put a document in Tenant A that references Tenant B by name and verify that Tenant B cannot retrieve it; send identical prompts from two tenants with different expected answers and verify there is no cache crossover.
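The three tests above, sketched as pytest-style test functions. To keep the sketch self-contained, they run against a hypothetical `ToyAssistant` with a tenant-filtered document store and a tenant-keyed response cache; in CI, the same assertions would target your real retrieval and caching layers.

```python
class ToyAssistant:
    """Hypothetical stand-in for the system under test."""

    def __init__(self):
        self.docs = {}   # (tenant_id, doc_name) -> body
        self.cache = {}  # (tenant_id, prompt) -> answer
        self.llm_calls = 0

    def ingest(self, tenant_id, name, body):
        self.docs[(tenant_id, name)] = body

    def retrieve(self, tenant_id, term):
        # Tenant filter first, then a naive substring match on name or body.
        return [body for (t, name), body in self.docs.items()
                if t == tenant_id and (term in name or term in body)]

    def answer(self, tenant_id, prompt):
        key = (tenant_id, prompt)  # tenant_id is part of the cache key
        if key not in self.cache:
            self.llm_calls += 1
            context = self.retrieve(tenant_id, "policy")
            self.cache[key] = f"{tenant_id}: {' | '.join(context)}"
        return self.cache[key]


def make_system():
    s = ToyAssistant()
    s.ingest("tenant-a", "policy.md", "Refunds within 30 days.")
    s.ingest("tenant-b", "policy.md", "Refunds within 14 days.")
    s.ingest("tenant-a", "notes.md", "Escalation came from Tenant B last week.")
    return s


def test_identical_document_names_resolve_per_tenant():
    s = make_system()
    assert s.retrieve("tenant-a", "policy.md") == ["Refunds within 30 days."]
    assert s.retrieve("tenant-b", "policy.md") == ["Refunds within 14 days."]


def test_document_mentioning_other_tenant_stays_private():
    s = make_system()
    assert s.retrieve("tenant-b", "Tenant B") == []
    assert s.retrieve("tenant-a", "Tenant B") == ["Escalation came from Tenant B last week."]


def test_identical_prompts_do_not_share_cache():
    s = make_system()
    prompt = "What is our refund policy?"
    a, b = s.answer("tenant-a", prompt), s.answer("tenant-b", prompt)
    assert "30 days" in a and "14 days" not in a
    assert "14 days" in b and "30 days" not in b
    assert s.llm_calls == 2  # each tenant missed the cache independently
```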

Tags: multi-tenancy, security, architecture, SaaS