The AI stack we actually run in production
The tools, frameworks, and platforms shipping in clients' accounts right now. Grouped by layer, with honest "when to use what" notes. Not the shiny list — the working list.
Why a tech-stack page — and why this one.
Agencies list stacks to look credible. We list ours to filter — if you're thinking about building a voice agent on a platform we can't work with, we'd rather tell you up front than 40 minutes into a discovery call.
Everything below is in production right now. Not "we're excited about" or "we're evaluating." If we listed it, we've shipped it. If we pulled something off the list, we hit a wall and moved on.
For each tool, we call out when to use it — the actual decision framework. Most of these categories have multiple good options; the right one depends on the use case, the scale, and the constraints.
Voice agent platforms.
Three strong options, different sweet spots.
Vapi
Hosted voice AI. Fastest time-to-MVP — you can have a working inbound agent in an afternoon. Best for: early-stage companies validating voice AI before scaling; single-location deployments where operational simplicity matters more than per-call cost.
Retell AI
Hosted voice AI with strong enterprise features — call transfer, multi-turn handling, and analytics dashboards out of the box. Best for: mid-market deployments where the customer wants a polished product, not a framework.
LiveKit
Open-source voice infrastructure. Self-hosted or their cloud. Lowest per-call cost at scale, full control over audio pipeline. Best for: HIPAA-ready stacks, high-volume deployments, custom voice routing. Our default for restaurant and dental clients.
Twilio
The phone number and call routing layer under most voice agents. We use Twilio Programmable Voice for PSTN access and Twilio Flex when integrating with an existing contact-center estate.
Daily.co · Agora
Alternative real-time audio infra for specialized use cases — particularly multi-party calls and international telephony where Twilio's pricing or latency doesn't fit.
Our decision tree
Fast MVP, one location → Vapi. Mid-market, polished UX, multi-location → Retell. HIPAA, high volume, or cost-sensitive unit economics → LiveKit. We'll walk the actual trade-off in discovery.
Speech-to-text and text-to-speech.
The voice quality your customers actually hear.
Deepgram
Our default STT. Low-latency streaming transcription with strong domain-vocabulary tuning. Particularly good on noisy backgrounds — restaurants, clinics, mobile-caller audio.
Whisper
OpenAI's STT. Best accuracy on long-form and heavily accented audio. We use Whisper for async pipelines (call analysis, QA scoring) and Deepgram for real-time.
ElevenLabs
The voice most people mean when they say "AI that doesn't sound robotic." Enterprise tier is BAA-covered for HIPAA stacks. Our default when voice quality is the differentiator.
Cartesia
Lowest-latency TTS in production today. Sub-100ms first-audio time. Our default when latency is what makes or breaks the caller experience — which is most voice agents.
PlayHT · OpenAI TTS
Strong alternatives for specific voices or cost tiers. PlayHT when we need a multilingual catalogue beyond ElevenLabs; OpenAI TTS when the account is already on OpenAI contracts.
Our decision tree
Latency matters most → Cartesia. Voice quality matters most → ElevenLabs. HIPAA → ElevenLabs Enterprise under BAA or self-hosted open-weight TTS. Accent-heavy audio → Whisper on the input side.
LLMs — API, private, or self-hosted.
Choice driven by latency, cost, compliance, and residency.
GPT-4o · GPT-4o-mini
OpenAI's workhorses. GPT-4o for reasoning-heavy agent workflows and tool use. GPT-4o-mini for high-volume voice and chat deflection where cost per token matters.
Claude 3.5 Sonnet · Haiku
Anthropic's models. Sonnet for long-context reasoning and careful action-taking; Haiku for cost-sensitive conversational work. Both are our pick when not hallucinating matters more than raw speed.
Azure OpenAI · AWS Bedrock
Same models, hosted inside your cloud tenancy with enterprise agreements. Our default for financial services and healthcare clients whose procurement can't approve a direct OpenAI contract.
Llama 3.3 · Mistral · Qwen
Open-weight models we deploy self-hosted for data-residency and cost reasons. Llama 3.3 for general reasoning, Mistral for smaller/faster, Qwen for multilingual-heavy work.
OpenAI embeddings · Cohere · BGE
OpenAI text-embedding-3 default for quality; Cohere for multilingual; BGE for self-hosted RAG stacks that can't make outbound API calls.
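Whichever embedding model generates the vectors, retrieval comes down to the same operation: rank stored documents by cosine similarity to the query vector. A toy sketch with made-up 3-dimensional vectors (real embeddings like text-embedding-3 have hundreds to thousands of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy document vectors, purely for illustration.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "store hours":   [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend this embeds "how do I get my money back"

best = max(docs, key=lambda name: cosine(query, docs[name]))
# best → "refund policy" for these toy vectors
```

The vector stores below differ in how they index and shard this computation at scale, not in what they compute.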
Our decision tree
Highest reasoning quality → Claude Sonnet. Lowest latency/cost at volume → GPT-4o-mini or Haiku. Residency or compliance constraints → Azure OpenAI / Bedrock / self-hosted. Discussed explicitly in discovery — not religion.
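The decision tree above can be sketched as a routing function. The constraint fields and return values here are illustrative stand-ins, not our production config; the point is that compliance constraints short-circuit everything else:

```python
from dataclasses import dataclass

@dataclass
class Constraints:
    needs_residency: bool = False   # data must stay in the client's tenancy
    hipaa: bool = False             # BAA / HIPAA obligations apply
    high_volume: bool = False       # cost per token dominates
    reasoning_heavy: bool = False   # long-context reasoning, careful tool use

def pick_model(c: Constraints) -> str:
    # Residency and compliance are hard constraints: check them first.
    if c.needs_residency or c.hipaa:
        return "azure-openai / bedrock / self-hosted"
    if c.reasoning_heavy:
        return "claude-3.5-sonnet"
    if c.high_volume:
        return "gpt-4o-mini / haiku"
    return "gpt-4o"
```

The voice-platform and STT/TTS decision trees above follow the same shape: hard constraints first, then quality-vs-cost trade-offs.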
Chatbot platforms and agent frameworks.
When to buy vs. build vs. wrap.
Botpress · Voiceflow
Visual chatbot builders with solid operator UX. We build on these when the client's non-engineering team needs to own conversation flows post-launch.
ManyChat
The workhorse for WhatsApp, Instagram, and Messenger marketing automations. We wire custom backends behind ManyChat for clients already standardized on it.
LangGraph
Our default for stateful multi-step agent workflows. Durable state, inspectable graph, human-in-the-loop checkpoints.
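What "durable state with human-in-the-loop checkpoints" buys us, sketched in plain Python. This is the pattern, not the LangGraph API; the step and trigger names are invented for illustration:

```python
# A run walks a sequence of steps over a state dict. When a checkpoint
# condition fires, the run pauses and returns its state, which can be
# persisted and resumed after a human signs off.

def draft_reply(state: dict) -> dict:
    state["reply"] = f"Suggested reply to: {state['message']}"
    return state

def needs_human(state: dict) -> bool:
    # Illustrative trigger: anything touching refunds waits for review.
    return "refund" in state["message"].lower()

def run(state: dict, steps: list) -> dict:
    for step in steps:
        state = step(state)
        if needs_human(state):
            state["status"] = "paused_for_review"  # persist, resume later
            return state
    state["status"] = "done"
    return state

result = run({"message": "I want a refund"}, [draft_reply])
# result["status"] → "paused_for_review"
```

LangGraph gives us this same shape with inspectable graph structure and checkpointed state that survives process restarts.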
LangChain
Building blocks for simpler pipelines where a DAG is overkill. We pick and compose, not adopt wholesale.
OpenAI Assistants API · CrewAI
Assistants API for scoped single-purpose assistants. CrewAI when role-based multi-agent composition fits the workflow (researcher/writer/reviewer patterns).
n8n · Make · Zapier
Visual automation layer for connecting agents to third-party services. n8n for self-hosted/data-residency use cases; Make or Zapier when the client is already on one.
Vector stores and RAG infrastructure.
Where your docs actually live.
Pinecone
Managed vector DB, fastest time-to-ship, strong at scale. Our default when operational overhead is the constraint.
Qdrant
Open-source vector DB, Rust-based, runs in your cloud. Our default when residency or cost-at-scale is the constraint.
pgvector
Vector search inside Postgres. Our default when RAG scale is moderate and the client already runs Postgres — saves operating a second data store.
Unstructured.io · Textract · custom OCR
Document-to-text pipelines for PDFs, scans, and structured forms. Unstructured for general; Textract when AWS-native; custom OCR for highly-formatted domain documents.
Core infrastructure.
FastAPI · Node · Postgres · Redis
The plumbing underneath every agent — API endpoints, durable workflow state, job queues, rate limiters. Boring by design.
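One piece of that boring plumbing, to make it concrete: a token-bucket rate limiter of the kind we put in front of model APIs. This is an in-process sketch; in production the bucket lives in Redis so every worker shares one budget:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Calls that would exceed the budget get a clean `False` to queue or retry on, instead of a surprise 429 from the upstream provider.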
Docker · AWS · GCP · Railway · Fly.io
Containerized deployments. AWS and GCP for enterprise and compliance-heavy clients; Railway and Fly.io for startup deployments where ops simplicity wins.
CRM integrations and observability.
HubSpot · Salesforce · Zoho · GHL
Standard integrations with scoped API keys, permission boundaries, and conversation-to-record writeback. Also Pipedrive, Close, ActiveCampaign, Intercom, and custom CRMs.
Intercom · Zendesk · Help Scout · Front
Context-preserving handoff from AI to human, not "hi, can you tell me again what you just explained to the bot."
LangSmith
Our default agent observability. Every tool call, model call, and decision is traceable with input/output. We require this on every production agent deployment.
Helicone
LLM-specific monitoring — cost, latency, cache hit rate, per-model analytics. Complements LangSmith on the infrastructure side.
S3 WORM · Smarsh · Global Relay
Compliant storage for call recordings, transcripts, and supervision artifacts. Used in healthcare and financial-services deployments.
Segment · Amplitude · PostHog · Klaviyo
Event and customer-data stacks we plug agents into for onboarding triggers, churn signals, and retention campaign orchestration.
How we pick — the one-minute version.
Three variables decide 80% of our stack choices for a given project: latency budget, data-residency constraint, and volume at steady state.
Loose latency budget, no residency constraint, modest volume → hosted platforms (Vapi / Pinecone / OpenAI API) win on speed to ship. High volume, strict residency, or HIPAA → self-hosted stacks (LiveKit / Qdrant / Llama) win on total cost and compliance. Mid-ground → hybrid: hosted voice, self-hosted vector store, enterprise-hosted models.
We model the real numbers in discovery — per-call cost, per-conversation cost, p95 latency targets, residency obligations — and the stack choice falls out of the numbers. Most of the time, there's an obviously-correct answer for a given project; occasionally there isn't, and we walk the trade-off.
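A back-of-envelope version of that modeling. Every rate below is a placeholder assumption, not real vendor pricing; the structure is the point. Hosted stacks are cheap at low volume but scale linearly, while self-hosted stacks carry a fixed infra-and-ops cost with a much lower marginal rate:

```python
# Placeholder rates, NOT real pricing.
HOSTED_COST_PER_MIN = 0.10      # assumed all-in hosted rate per call-minute
SELF_HOSTED_FIXED = 2000.00     # assumed monthly infra + ops cost
SELF_HOSTED_PER_MIN = 0.02      # assumed marginal cost per call-minute

def monthly_cost(minutes: float, hosted: bool) -> float:
    if hosted:
        return minutes * HOSTED_COST_PER_MIN
    return SELF_HOSTED_FIXED + minutes * SELF_HOSTED_PER_MIN

def breakeven_minutes() -> float:
    # Costs are equal when: m * hosted_rate == fixed + m * self_hosted_rate
    return SELF_HOSTED_FIXED / (HOSTED_COST_PER_MIN - SELF_HOSTED_PER_MIN)

# With these placeholder rates, self-hosting wins above
# 25,000 call-minutes per month.
```

In discovery we swap in the client's actual volumes and the real vendor quotes, then let the curve decide.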
Want a stack opinion for your specific project?
One 20-minute technical call. We'll hear your use case, your scale, your residency constraints, and tell you what we'd build with and why. If we think you should pick something we don't work with, we'll tell you that too.