Your embedding model doesn't matter if your chunks are garbage. That's the slightly provocative version of a truth we've learned the hard way over dozens of RAG deployments: chunking is the most underinvested step in typical RAG pipelines, and it's the step with the biggest quality impact per hour of engineering effort. Below: five chunking strategies, when to use each, and the common failure modes.

Five strategies

Fixed-size → structure-aware → semantic → parent-child → specialized. Structure-aware splitting is the recommended default for most content; layer parent-child retrieval on top for documents with natural hierarchy.

Why chunking matters

Chunks are the atomic unit of RAG. Retrieval happens at chunk granularity, and whatever is in a chunk is what the LLM sees. Bad chunks fail in predictable ways: chunks that split mid-sentence lose context; chunks that mix unrelated topics confuse retrieval; chunks that are too small miss relationships; chunks that are too large dilute relevance. Naive 'split into 500-token pieces' approaches run into all four, which is why default settings rarely produce good RAG.

Five chunking strategies

Strategy 1: Fixed-size with overlap

Split every N tokens with M tokens of overlap between chunks. Simplest, fastest, and sometimes sufficient. Works okay for unstructured prose where topic boundaries are fluid. Fails for structured content where natural boundaries exist — splitting mid-section mutilates context.

When to use: quick prototypes, pure prose corpora without structural signals, or as a baseline before trying something better. Typical parameters: 500-1000 tokens per chunk with 100-200 tokens of overlap.
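A minimal sketch of fixed-size splitting with overlap. It assumes whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the embedding model's own tokenizer. The function name `fixed_size_chunks` is illustrative.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of `chunk_size` tokens that overlap by `overlap` tokens.

    Assumption: whitespace-separated words stand in for tokens here; swap in the
    embedding model's tokenizer for real token counts.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window reached the end; the next one would be a subset of it
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(fixed_size_chunks(text, chunk_size=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The early `break` avoids emitting a final window whose tokens are all already contained in the previous chunk's overlap.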

Strategy 2: Structure-aware splitting

Split on document structure: headers, paragraphs, sections. Respects natural topic boundaries. Produces variable-size chunks that map to semantic units. This is the right default for most content — docs, articles, reports, wikis.

Implementation: parse the document (markdown, HTML, docx — each has structural signals) and split at heading boundaries. Combine small adjacent sections; split oversized sections further. Target chunks of 300-1000 tokens, but let structure drive it.
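Here is a minimal sketch of that procedure for markdown only, assuming ATX headings (`#` through `######`) mark section boundaries and whitespace-separated words approximate tokens. All helper names are hypothetical. Small adjacent sections are merged until they reach `min_tokens`; sections over `max_tokens` are re-split on paragraph breaks.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")  # markdown ATX heading line


def n_tokens(text: str) -> int:
    return len(text.split())  # whitespace words approximate tokens


def split_sections(markdown: str) -> list[str]:
    """Cut the document at every heading line; each section keeps its heading."""
    sections, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def split_by_paragraphs(section: str, max_tokens: int) -> list[str]:
    """Greedily pack the paragraphs of an oversized section into pieces of <= max_tokens."""
    pieces, current, size = [], [], 0
    for para in re.split(r"\n\s*\n", section):
        n = n_tokens(para)
        if current and size + n > max_tokens:
            pieces.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)  # a single paragraph over max_tokens is kept whole
        size += n
    if current:
        pieces.append("\n\n".join(current))
    return pieces


def structure_aware_chunks(markdown: str, min_tokens: int = 300, max_tokens: int = 1000) -> list[str]:
    chunks, buffer = [], ""
    for section in split_sections(markdown):
        if n_tokens(section) > max_tokens:
            if buffer:  # flush pending small sections first to keep document order
                chunks.append(buffer)
                buffer = ""
            chunks.extend(split_by_paragraphs(section, max_tokens))
            continue
        if buffer and n_tokens(buffer) + n_tokens(section) > max_tokens:
            chunks.append(buffer)  # merging would overflow; close the current chunk
            buffer = ""
        buffer = f"{buffer}\n\n{section}" if buffer else section
        if n_tokens(buffer) >= min_tokens:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)  # trailing sections, even if below min_tokens
    return chunks
```

Two limitations to be aware of: a `#` comment line inside a fenced code block would be mistaken for a heading, and a single paragraph longer than `max_tokens` stays oversized. HTML or docx input would need its own parser to expose equivalent structure.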

Strategy 3: Semantic splitting

Use an embedding model to detect topic boundaries. Embed each sentence; compute similarity between adjacent sentences; split where similarity drops below a threshold. Produces chunks that are coherent by meaning, not structure.

When to use: unstructured long-form content where structure alone doesn't capture topic shifts, such as interview transcripts, meeting notes, and informal writing. It is more expensive than structure-aware splitting because it embeds every sentence, but it produces noticeably better chunks for these cases.
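A sketch of the similarity-drop approach, under stated assumptions. `toy_embed` is a stand-in (bag-of-words counts over the batch vocabulary), not a real embedding model; in practice you would pass a function that sends the batch of sentences to your embedding model. The regex sentence splitter and the default `threshold` of 0.2 are also illustrative and would need tuning per corpus.

```python
import math
import re
from typing import Callable


def toy_embed(sentences: list[str]) -> list[list[float]]:
    """Stand-in for a real embedding model: bag-of-words counts over the batch vocabulary."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = {w: i for i, w in enumerate(sorted({w for ws in tokenized for w in ws}))}
    vectors = []
    for words in tokenized:
        vec = [0.0] * len(vocab)
        for w in words:
            vec[vocab[w]] += 1.0
        vectors.append(vec)
    return vectors


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(
    text: str,
    embed: Callable[[list[str]], list[list[float]]] = toy_embed,
    threshold: float = 0.2,
) -> list[str]:
    # Naive sentence split on terminal punctuation (an assumption; messy text
    # such as transcripts needs a proper sentence segmenter).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = embed(sentences)
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:  # similarity drop = topic boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

On the input "The cat sat on the mat. The cat chased the mouse. Quarterly revenue rose sharply. Revenue growth beat forecasts." this returns two chunks, split where the topic moves from the cat to revenue.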

Strategy 4: Parent-child (small chunks, big context)

Embed small chunks for precise retrieval, but retrieve the larger parent section when a child matches. Discussed in our RAG patterns post. Best of both worlds: precision from small chunks, context from large ones.

Implementation: two-level structure. Each small chunk carries a parent_id. After retrieval, dedupe by parent_id and fetch the parents. Adds modest complexity but meaningfully improves retrieval quality on rich documents.
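A minimal sketch of the two-level structure, assuming parents are whole sections keyed by id and children are fixed-size word windows cut from them. `Child`, `build_parent_child`, and `expand_to_parents` are hypothetical names; the ranked child list would come from whatever vector search you already run over the child embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    chunk_id: str
    parent_id: str  # link back to the section this child was cut from
    text: str


def build_parent_child(parents: dict[str, str], child_tokens: int = 100) -> list[Child]:
    """Cut each parent section into small word windows; only these get embedded."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_tokens):
            children.append(
                Child(f"{parent_id}#{i // child_tokens}", parent_id, " ".join(words[i:i + child_tokens]))
            )
    return children


def expand_to_parents(ranked_children: list[Child], parents: dict[str, str], max_parents: int = 3) -> list[str]:
    """Dedupe child hits by parent_id (keeping rank order) and return parent text for the LLM."""
    seen, context = set(), []
    for child in ranked_children:
        if child.parent_id in seen:
            continue
        seen.add(child.parent_id)
        context.append(parents[child.parent_id])
        if len(context) == max_parents:
            break
    return context


parents = {"p1": "a b c d e", "p2": "f g h"}
children = build_parent_child(parents, child_tokens=2)
hits = [children[1], children[0], children[3]]  # pretend vector search ranked these highest
print(expand_to_parents(hits, parents))  # → ['a b c d e', 'f g h']
```

Two hits from the same parent collapse into one context entry, so the LLM sees each section once, in the order its best child ranked.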

Strategy 5: Specialized (code, tables, images)

Code should be chunked at function/class boundaries, not token count. Tables should be chunked row-wise or as whole tables with captions. Images with captions should keep caption + image together. Each content type has its own right answer.

Skipping this is how technical docs produce terrible RAG. We've seen systems retrieve half a function signature because the fixed-size splitter cut mid-function. Build content-type-aware chunking for heterogeneous corpora.
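Two sketches of content-type-aware chunking, assuming Python source files and markdown tables as the content types. The code chunker uses the standard-library `ast` module to cut at top-level function and class boundaries (keeping decorators attached) and groups the remaining module-level statements into one chunk. The table chunker repeats an optional caption and the header row in every row group so each chunk remains interpretable on its own. Both function names are illustrative.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def chunk_python_source(source: str) -> list[str]:
    """One chunk per top-level function/class; other module-level statements form one chunk."""
    lines = source.splitlines()
    definitions, module_level = [], []
    for node in ast.parse(source).body:
        start = node.lineno
        if isinstance(node, DEF_NODES) and node.decorator_list:
            start = min(d.lineno for d in node.decorator_list)  # keep decorators attached
        segment = "\n".join(lines[start - 1:node.end_lineno])  # end_lineno needs Python 3.8+
        if isinstance(node, DEF_NODES):
            definitions.append(segment)
        else:
            module_level.append(segment)
    return (["\n".join(module_level)] if module_level else []) + definitions


def chunk_markdown_table(table: str, rows_per_chunk: int = 20, caption: str = "") -> list[str]:
    """Split a markdown table into row groups, repeating caption and header in each."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header, rows = lines[:2], lines[2:]  # header row + |---| separator row
    chunks = []
    for i in range(0, len(rows), rows_per_chunk):
        prefix = [caption] if caption else []
        chunks.append("\n".join(prefix + header + rows[i:i + rows_per_chunk]))
    return chunks
```

Note that `ast` discards comments, so a comment block directly above a function is not attached to its chunk; a production chunker might extend each span upward to include it.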

What we actually deploy

Pragmatic combination: structure-aware splitting as the base, parent-child for documents with natural hierarchy, specialized handling for code and tables. Target chunk sizes 300-800 tokens, with 100 tokens of overlap where structure doesn't give clean boundaries. This covers 90% of cases and takes 2-3 days to implement for a new corpus.

Evaluate chunking changes with evals

Chunking is a parameter of your RAG system. Change it, measure retrieval quality. Don't eyeball it. Build a held-out eval set of query-expected-document pairs, run retrieval with old and new chunking, compare precision/recall. See our eval infrastructure post.
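A minimal sketch of that comparison, assuming each retriever returns a ranked list of document ids (chunks mapped back to the document they came from) and the eval set pairs each query with the set of documents that should come back. `precision_recall_at_k` is a hypothetical helper, and the hard-coded results stand in for retrieval under the old and new chunking.

```python
from statistics import mean
from typing import Callable

EvalSet = list[tuple[str, set[str]]]  # (query, ids of documents that should be retrieved)


def precision_recall_at_k(
    retrieve: Callable[[str], list[str]], eval_set: EvalSet, k: int = 5
) -> tuple[float, float]:
    """Mean precision@k and recall@k over the eval set.

    Several chunks can map to one document, so ids are deduplicated
    (keeping rank order) before truncating to k.
    """
    precisions, recalls = [], []
    for query, expected in eval_set:
        retrieved = list(dict.fromkeys(retrieve(query)))[:k]
        hits = len(set(retrieved) & expected)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        recalls.append(hits / len(expected) if expected else 0.0)
    return mean(precisions), mean(recalls)


eval_set = [("q1", {"d1"}), ("q2", {"d2", "d3"})]
old = {"q1": ["d9", "d1", "d1"], "q2": ["d2", "d8"]}  # stand-in: old chunking
new = {"q1": ["d1", "d4"], "q2": ["d3", "d2"]}        # stand-in: new chunking
print(precision_recall_at_k(old.__getitem__, eval_set, k=2))  # → (0.5, 0.75)
print(precision_recall_at_k(new.__getitem__, eval_set, k=2))  # → (0.75, 1.0)
```

Keeping `k` and the eval set fixed across runs is what makes the two numbers comparable; only the chunking behind `retrieve` should change.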

Common mistakes

Copying the default 1000-token fixed-size splitter from an online tutorial. This fails for most real corpora.
Not handling tables at all. Tables in docs become garbage when token-split.
Ignoring metadata. Every chunk should carry its source, section, and timestamp; structured filters on those fields make retrieval massively better.
Not re-chunking when the corpus changes shape. A corpus that started as prose and grew to include tables needs re-chunking.
Over-engineering chunking before anything else. Structure-aware splitting usually gets you about 80% of the achievable quality, with diminishing returns past that.
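To make the metadata point concrete, here is a sketch of chunks carrying source, section, and timestamp, with a filter applied before ranking. The `Chunk` fields, the `filter_chunks` helper, and the toy chunks are illustrative assumptions. Many vector databases support metadata filtering natively, so in production the filter would usually be pushed into the query rather than run in Python.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str      # e.g. file path or URL of the originating document
    section: str     # e.g. the heading the chunk came from
    timestamp: date  # e.g. last-modified date of the source


def filter_chunks(
    chunks: list[Chunk],
    source: Optional[str] = None,
    section: Optional[str] = None,
    since: Optional[date] = None,
) -> list[Chunk]:
    """Keep only chunks matching every filter that was given; None means 'no filter'."""
    return [
        c for c in chunks
        if (source is None or c.source == source)
        and (section is None or c.section == section)
        and (since is None or c.timestamp >= since)
    ]


chunks = [
    Chunk("Refunds take 5 days.", "policy.md", "Refunds", date(2024, 3, 1)),
    Chunk("Refunds take 10 days.", "policy.md", "Refunds", date(2022, 6, 1)),  # stale version
    Chunk("Ship within 2 days.", "shipping.md", "SLAs", date(2024, 1, 15)),
]
recent_policy = filter_chunks(chunks, source="policy.md", since=date(2023, 1, 1))
print([c.text for c in recent_policy])  # → ['Refunds take 5 days.']
```

Without the timestamp, both refund chunks would compete on similarity alone, and the stale one could easily win.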

Tooling

LangChain's text splitters cover common cases. Unstructured.io handles document parsing (PDFs, docx) with structural preservation. LlamaIndex has good parent-child abstractions. For serious work, we usually write custom chunkers — they're 100-300 lines per content type and give full control. The libraries work for prototyping; production chunkers often need customization.

Closing

Chunking is where RAG quality is won or lost. Invest here before fine-tuning, before switching embedding models, before buying a fancier vector DB. The order of investment should be: chunking → retrieval (hybrid search, reranking) → prompting → model choice. Teams that skip chunking and invest in everything else end up with good tooling over bad foundations. The foundations matter more.

Chunking strategies: the unglamorous key to RAG quality