The library. Everything we wrote down.
Guides, tools, case studies, research, references, and deep dives. Curated for engineers and leaders shipping AI in production.
In-depth guides
Long-form playbooks on how we build production AI.
The 2026 guide to picking an AI vendor
Agencies vs consultancies vs freelancers vs in-house — framework, cost data, time-to-ship benchmarks.
Total cost of ownership for LLM systems
API tokens are only 30% of real LLM cost. A complete TCO framework covering the other 70%.
Six RAG patterns that actually work in production
Hybrid search, query rewriting, reranking, parent-document, metadata filtering, agentic retrieval.
Why evaluation infrastructure matters more than prompts
The four layers of production evals. Without them, every deploy is a prayer.
Multi-model routing: cutting LLM costs 40-60%
Route by task, not vendor. Deep dive with real cost data from production.
AI agents in production: what actually breaks
Infinite loops, state explosion, hallucinated tool calls. Failure modes and fixes.
Interactive tools
Calculators and checklists to shortcut your planning.
Case studies
Real production systems we have shipped, with outcomes and architecture.
Ledgerly — 74% reconciliation time cut
Multi-model copilot for a FinTech. Natural language queries, automated exception handling.
Hearthline — 60% inbound calls fully automated
Voice AI agent handling appointment booking, FAQs, escalation.
Kora Apparel — AOV lifted 23% with real-time recs
Personalization engine on 80M events/day, sub-50ms latency.
LegalFlow — contract intelligence at 10,000 docs/day
Document pipeline with classification, extraction, human review queue.
BrightStack — enterprise knowledge answering 92% of queries
RAG across Notion, Confluence, Slack, Linear, Drive with permission-aware retrieval.
SentinelPay — 99.95% fraud precision
Multi-stage fraud detection with ML, LLM pattern analysis, and human review.
Research & benchmarks
Original research and evaluation data from our engagements.
We ran 200 LLMs through our eval suite
Custom benchmarks on 200 open and closed LLMs across seven production tasks.
AI hype vs reality: what actually shipped in 2025
Year-end review. What worked, what did not, where hype outran reality.
Semantic caching cut our biggest client's LLM bill 43%
Methodology, threshold tuning, when it works.
Reference library
Look-up content for practitioners.
AI & LLM Glossary
40 terms every AI engineer should know. Plain-English definitions.
AI Use Cases by Industry
10 high-leverage use cases across SaaS, FinTech, Retail, Education.
Compare: AI vendors side-by-side
10 comparison pages across popular AI vendor pairings.
FAQ — 18 questions we answer every week
Grouped by engagement, technology, delivery, and policy.
Pricing philosophy
How we scope and price engagements. Transparent tiers.
Security & compliance
Our posture on SOC 2, GDPR, data handling, vendor security.
Engineering deep dives
Technical writing for engineers and engineering leaders.
Chunking strategies: the unglamorous key to RAG quality
Five chunking approaches, when to use each.
Hybrid search: why pure vector isn't enough
BM25 + vector + RRF. Tuning notes from production.
LangGraph patterns we use in every agentic system
State routing, human-in-the-loop, retry-with-reflection, parallel tools, guards.
When to fine-tune (and when RAG is fine)
Three cases where fine-tuning earns its cost.
Building voice AI that passes the grandma test
Latency, interrupts, transfer, accent coverage.
Prompt testing like it's 2026
Golden sets, property tests, differential tests, fuzz tests.
Operations & reliability
Running AI systems in production without waking up at 3am.
By industry
Vertical-specific AI engineering posts.
AI fraud detection without over-blocking customers
High-precision fraud AI. Three architectural moves from FinTech work.
Retail personalization beyond "customers who bought"
Modern recs stack: embeddings + session context + LLM scoring.
What makes an AI tutor actually teach
Socratic questioning, knowledge graphs, scaffolded hints.
Designing AI copilots inside SaaS products
Feature copilots vs product copilots. How to choose.
Conversational UX for AI that isn't a chatbot
When chat wins, when structured input wins, when to hybridize.