Production AI systems. Real numbers, real customers.

Case studies from live deployments — voice agents taking calls, chatbots deflecting tickets, agents reactivating customers, enterprise RAG over permissioned document estates. Every metric below is from production traffic, not a demo environment. NDAs kept where required; architecture shared openly.

Voice Agent · Restaurant · NYC

Milina — AI voice agent for a NYC restaurant at $0.09 per call

50+ reservations a night, bilingual (English + Spanish), sub-700ms median response latency. LiveKit + Deepgram + GPT-4o-mini + Cartesia. Callers routinely don't realize they're talking to AI.

LiveKit · Deepgram Nova-2 · GPT-4o-mini · Cartesia · Resy · Toast POS
91% task completion
$0.09 per call
+22% bookings MoM
<700ms p50 latency
Read the Milina case →
Voice Agent · HIPAA · Dental

CleverAnswerAI — HIPAA dental receptionist, 20+ offices

Self-hosted LiveKit on a BAA-covered stack, live for a year. 100% answer rate. 28% more new-patient bookings. Direct integration with Dentrix, Open Dental, Curve, Eaglesoft.

LiveKit (self-hosted) · Deepgram Enterprise · Azure OpenAI · ElevenLabs Enterprise · Dentrix
100% answer rate
+28% new bookings
20+ offices
Read the CleverAnswerAI case →
LLM Evaluation · iGaming

iGaming QA — 66% to 91% with schema-guided reasoning

Took a Tier-1 operator's QA accuracy from 66% to 91% and coverage from 2% to 25%. Rubric-as-code, 1,200-case eval harness, two-model ensemble on regulatory criteria.

GPT-4o · Claude 3.5 Sonnet · LangGraph · LangSmith · Pydantic
66% → 91% accuracy
2% → 25% coverage
$0.04 per audit
Read the iGaming QA case →
AI Agent · Retail · Reactivation

Dry cleaning chain — AI reactivation agent, 3.5x ROI

192K customer × category intervals are scored daily. A LangGraph agent picks the channel, message, offer, and timing for each customer. 18.7% reactivation across 23 treatment categories.

LangGraph · GPT-4o · Twilio SMS · WhatsApp Business · n8n
3.5x ROI vs. control
18.7% reactivation
60+ locations
Read the reactivation case →
Call QA · Sales Ops · B2B SaaS

ConvoTune — AI call transcription & scoring for a 40-seat sales org

3,000+ calls scored per month against a 30-point playbook. 89% agreement with human reviewers. Real-time coaching prompts in under 300ms. The entire pipeline runs in the client's AWS account.

Whisper (fine-tuned) · Deepgram Nova-2 · Azure OpenAI · LangGraph · Terraform
3,000+ calls/month
89% scoring agreement
$34 per seat/mo
Read the ConvoTune case →
Enterprise RAG · UK Construction · NDA

Corporate RAG — ~500 internal users, permission-scoped

Built for a UK building repair, maintenance and refurbishment group (under NDA). Permission-scoped RAG over SharePoint with AWS Kendra + Bedrock + OpenFGA + Keycloak. Document search dropped from ~15 minutes to seconds (~150× faster).

AWS Bedrock · AWS Kendra · OpenFGA · Keycloak · NestJS 11 · Lambda
~150× faster search
~500 internal users
30+ use cases shipped
Read the corporate RAG case →
Multi-tenant RAG · DE · NDA

German technical RAG — when the framework wasn't enough

Two German tenants (under NDA): a concrete-products manufacturer and a regional municipal water utility. We deleted the off-the-shelf RAG framework and wrote a single-orchestrator pipeline (rag2). DIN / EN / DWA norms are preserved per chunk.

OpenAI embeddings · Pinecone v6 · bm25s · mmarco-mMiniLMv2 · Docling · Flask
2 tenants live
0 DAG frameworks
23+ golden eval cases
Read the German technical RAG case →

Earlier research and data-platform case studies.

Before the commercial RAG work landed, we published an open RAG benchmark write-up and built data stacks on dbt, Snowflake, and Arabic-optimized analytics platforms. That work still earns its keep for those clients, and we still do selective analytics engineering for existing AI clients, but it's no longer our primary practice.

Archive

Older write-ups kept online for reference: the Enterprise RAG Challenge winning architecture, fitness club analytics, a medical aesthetics data platform, and premium clinic analytics. These pages remain reachable but aren't featured in our current navigation.

Want similar results? Let's see if your use case ships.

One 20-minute call. Bring us your call volume, your tech stack, or your current conversion rate — we'll tell you honestly whether we can build it, what the architecture looks like, and what it'll cost.