Production AI systems. Real numbers, real customers.
Case studies from live deployments — voice agents taking calls, chatbots deflecting tickets, agents reactivating customers, enterprise RAG over permissioned document estates. Every metric below is from production traffic, not a demo environment. NDAs kept where required; architecture shared openly.
Milina — AI voice agent for a NYC restaurant at $0.09 per call
50+ reservations a night, bilingual (English + Spanish), sub-700ms response latency. LiveKit + Deepgram + GPT-4o-mini + Cartesia. Callers routinely don't realize they're talking to AI.
CleverAnswerAI — HIPAA dental receptionist, 20+ offices
Self-hosted LiveKit on a BAA-covered stack, live for a year. 100% answer rate. 28% more new-patient bookings. Direct integration with Dentrix, Open Dental, Curve, Eaglesoft.
iGaming QA — 66% to 91% with schema-guided reasoning
Took a Tier-1 operator's QA accuracy from 66% to 91% and coverage from 2% to 25%. Rubric-as-code, 1,200-case eval harness, two-model ensemble on regulatory criteria.
Dry cleaning chain — AI reactivation agent, 3.5x ROI
192K customer × category intervals scored daily. LangGraph agent picks channel, message, offer, and timing per customer. 18.7% reactivation across 23 treatment categories.
ConvoTune — AI call transcription & scoring for a 40-seat sales org
3,000+ calls scored per month against a 30-point playbook. 89% agreement with human reviewers. Real-time coaching prompts in under 300ms. The entire pipeline runs in the client's AWS environment.
Corporate RAG — ~500 internal users, permission-scoped
Client: a UK building repair, maintenance and refurbishment group (under NDA). Permission-scoped RAG over SharePoint with AWS Kendra + Bedrock + OpenFGA + Keycloak. Document search collapsed from ~15 minutes to seconds (~150× faster).
German technical RAG — when the framework wasn't enough
Two German tenants (under NDA): a concrete-products manufacturer and a regional municipal water utility. We deleted the off-the-shelf RAG framework and wrote a single-orchestrator pipeline (rag2). DIN / EN / DWA norms preserved per chunk.
Earlier research and data-platform case studies.
Before the commercial RAG work landed, we published an open RAG benchmark write-up and built data stacks on dbt, Snowflake, and Arabic-optimized analytics platforms. That work still pays off for those clients, and we still do selective analytics engineering for existing AI clients, but it's no longer our primary practice.
Older write-ups kept online for reference: Enterprise RAG Challenge winner architecture, fitness-clubs analytics, medical aesthetics data platform, premium clinics analytics. These pages remain reachable but aren't featured in our current navigation.
Want similar results? Let's see if your use case ships.
One 20-minute call. Bring us your call volume, your tech stack, or your current conversion rate — we'll tell you honestly whether we can build it, what the architecture looks like, and what it'll cost.