The AI stack we actually run in production

The tools, frameworks, and platforms shipping in clients' accounts right now. Grouped by layer, with honest "when to use what" notes. Not the shiny list — the working list.

50+ production systems · Voice · Chat · Agents · API-hosted or self-hosted · Cost modeled per deployment

Why a tech-stack page — and why this one.

Agencies list stacks to look credible. We list ours to filter — if you're thinking about building a voice agent on a platform we can't work with, we'd rather tell you up front than 40 minutes into a discovery call.

Everything below is in production right now. Not "we're excited about" or "we're evaluating." If we listed it, we've shipped it. If we pulled something off the list, we hit a wall and moved on.

For each tool, we call out when to use it — the actual decision framework. Most of these categories have multiple good options; the right one depends on the use case, the scale, and the constraints.

Voice agent platforms.

Three strong options, different sweet spots.

Managed platform

Vapi

Hosted voice AI. Fastest time-to-MVP — you can have a working inbound agent in an afternoon. Best for: early-stage companies validating voice AI before scaling; single-location deployments where operational simplicity matters more than per-call cost.

Managed platform

Retell AI

Hosted voice AI with strong enterprise features — call transfer, multi-turn handling, and analytics dashboards out of the box. Best for: mid-market deployments where the customer wants a polished product, not a framework.

Self-hosted framework

LiveKit

Open-source voice infrastructure. Self-hosted or their cloud. Lowest per-call cost at scale, full control over audio pipeline. Best for: HIPAA-ready stacks, high-volume deployments, custom voice routing. Our default for restaurant and dental clients.

Telephony

Twilio

The phone number and call routing layer under most voice agents. We use Twilio Programmable Voice for PSTN access and Twilio Flex when integrating with an existing contact-center estate.
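Bridging a Twilio phone number to a voice agent usually comes down to a few lines of TwiML that fork the call audio to a websocket. A minimal sketch — the `wss://` URL is a placeholder for wherever the agent's media server lives:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://agent.example.com/media" />
  </Connect>
</Response>
```

`<Connect><Stream>` opens a bidirectional media stream, which is what lets the agent both hear the caller and speak back on the same PSTN leg.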

Latency layer

Daily.co · Agora

Alternative real-time audio infra for specialized use cases — particularly multi-party calls and international telephony where Twilio's pricing or latency doesn't fit.

When to pick what

Our decision tree

Fast MVP, one location → Vapi. Mid-market, polished UX, multi-location → Retell. HIPAA, high volume, or cost-sensitive unit economics → LiveKit. We'll walk the actual trade-off in discovery.
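The tree above is simple enough to write down as code. A toy encoding — the predicate names are ours, and the real decision in discovery weighs cost curves, not booleans:

```python
def pick_voice_platform(fast_mvp: bool, hipaa: bool,
                        high_volume: bool, multi_location: bool) -> str:
    """Illustrative encoding of the decision tree above."""
    if hipaa or high_volume:
        return "LiveKit"   # self-hosted: compliance and unit economics
    if multi_location:
        return "Retell"    # polished mid-market product
    if fast_mvp:
        return "Vapi"      # fastest time-to-MVP
    return "discuss in discovery"

print(pick_voice_platform(fast_mvp=True, hipaa=False,
                          high_volume=False, multi_location=False))  # Vapi
```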

Speech-to-text and text-to-speech.

The voice quality your customers actually hear.

STT

Deepgram

Our default STT. Low-latency streaming transcription with strong domain-vocabulary tuning. Particularly good on noisy backgrounds — restaurants, clinics, mobile-caller audio.

STT

Whisper

OpenAI's STT. Best accuracy on long-form and heavily accented audio. We use Whisper for async pipelines (call analysis, QA scoring) and Deepgram for real-time.

TTS

ElevenLabs

The voice most people mean when they say "AI that doesn't sound robotic." Enterprise tier is BAA-covered for HIPAA stacks. Our default when voice quality is the differentiator.

TTS

Cartesia

Lowest-latency TTS in production today. Sub-100ms first-audio time. Our default when latency is what makes or breaks the caller experience — which is most voice agents.

TTS

PlayHT · OpenAI TTS

Strong alternatives for specific voices or cost tiers. PlayHT when we need a multilingual catalogue beyond ElevenLabs; OpenAI TTS when the account is already on OpenAI contracts.

When to pick what

Our decision tree

Latency matters most → Cartesia. Voice quality matters most → ElevenLabs. HIPAA → ElevenLabs Enterprise under BAA or self-hosted open-weight TTS. Accent-heavy audio → Whisper on the input side.
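Why latency drives these picks: a voice agent's response time is the sum of every stage in the pipeline, so each component gets a budget. A back-of-envelope example — the numbers are placeholder assumptions, not vendor benchmarks, except the sub-100 ms TTS figure cited above:

```python
# Illustrative p95 budget for one conversational turn, in milliseconds.
budget_ms = {
    "stt_final_transcript": 300,  # streaming STT endpointing
    "llm_first_token": 400,       # model time-to-first-token
    "tts_first_audio": 100,       # Cartesia-class TTS
    "network_overhead": 100,      # telephony + transport
}
total = sum(budget_ms.values())
print(f"p95 first-audio budget: {total} ms")  # 900 ms
```

Anything much past a second of silence and callers start saying "hello?" — which is why shaving 200 ms off any single stage is worth real engineering effort.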

LLMs — API, private, or self-hosted.

Choice driven by latency, cost, compliance, and residency.

General

GPT-4o · GPT-4o-mini

OpenAI's workhorses. GPT-4o for reasoning-heavy agent workflows and tool use. GPT-4o-mini for high-volume voice and chat deflection where cost per token matters.

General

Claude 3.5 Sonnet · Haiku

Anthropic's models. Sonnet for long-context reasoning and careful action-taking; Haiku for cost-sensitive conversational work. Both are strong choices when "careful not to hallucinate" is the hard requirement.

Enterprise hosting

Azure OpenAI · AWS Bedrock

Same models, hosted inside your cloud tenancy with enterprise agreements. Our default for financial services and healthcare clients whose procurement can't approve a direct OpenAI contract.

Self-hosted

Llama 3.3 · Mistral · Qwen

Open-weight models we deploy self-hosted for data-residency and cost reasons. Llama 3.3 for general reasoning, Mistral for smaller/faster, Qwen for multilingual-heavy work.

Embeddings

OpenAI embeddings · Cohere · BGE

OpenAI text-embedding-3 is the default for quality; Cohere for multilingual; BGE for self-hosted RAG stacks where outbound API calls aren't an option.

When to pick what

Our decision tree

Highest reasoning quality → Claude Sonnet. Lowest latency/cost at volume → GPT-4o-mini or Haiku. Residency or compliance constraints → Azure OpenAI / Bedrock / self-hosted. Discussed explicitly in discovery — not religion.

Chatbot platforms and agent frameworks.

When to buy vs. build vs. wrap.

Chatbot platform

Botpress · Voiceflow

Visual chatbot builders with solid operator UX. We build on these when the client's non-engineering team needs to own conversation flows post-launch.

Chatbot platform

ManyChat

The workhorse for WhatsApp, Instagram, and Messenger marketing automations. We wire custom backends behind ManyChat for clients already standardized on it.

Agent orchestration

LangGraph

Our default for stateful multi-step agent workflows. Durable state, inspectable graph, human-in-the-loop checkpoints.
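The pattern LangGraph gives us — durable state passed node to node, with a human-approval checkpoint — fits in a few lines as a dependency-free sketch. This is not LangGraph's API; node names and state fields are illustrative:

```python
def draft_reply(state: dict) -> dict:
    state["draft"] = f"Reply to: {state['message']}"
    return state

def human_checkpoint(state: dict) -> dict:
    # In production this pauses the graph until a reviewer approves.
    state["approved"] = state.get("auto_approve", False)
    return state

def send_or_escalate(state: dict) -> dict:
    state["action"] = "send" if state["approved"] else "escalate"
    return state

def run(state: dict) -> dict:
    for node in (draft_reply, human_checkpoint, send_or_escalate):
        state = node(state)
    return state

print(run({"message": "refund request", "auto_approve": False})["action"])  # escalate
```

What the framework adds on top of this toy version is the part that matters in production: persisting state between nodes so a crashed worker resumes mid-graph, and making every hop inspectable.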

Agent orchestration

LangChain

Building blocks for simpler pipelines where a DAG is overkill. We pick and compose, not adopt wholesale.

Agent orchestration

OpenAI Assistants API · CrewAI

Assistants API for scoped single-purpose assistants. CrewAI when role-based multi-agent composition fits the workflow (researcher/writer/reviewer patterns).

Workflow glue

n8n · Make · Zapier

Visual automation layer for connecting agents to third-party services. n8n for self-hosted/data-residency use cases; Make or Zapier when the client is already on one.

Vector stores and RAG infrastructure.

Where your docs actually live.

Managed

Pinecone

Managed vector DB, fastest time-to-ship, strong at scale. Our default when operational overhead is the constraint.

Self-hosted

Qdrant

Open-source vector DB, Rust-based, runs in your cloud. Our default when residency or cost-at-scale is the constraint.

Postgres-native

pgvector

Vector search inside Postgres. Our default when RAG scale is moderate and the client already runs Postgres — saves operating a second data store.
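Whichever store holds the vectors, retrieval is conceptually the same operation: nearest-neighbor search over embeddings. A brute-force sketch of what pgvector's cosine-distance operator (`<=>`) does with an index — the documents and embeddings here are toy placeholders:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def top_k(query, docs, k=2):
    # docs: list of (text, embedding) pairs
    return sorted(docs, key=lambda d: cosine_distance(query, d[1]))[:k]

docs = [("hours", [1.0, 0.0]), ("menu", [0.0, 1.0]), ("parking", [0.9, 0.1])]
print([text for text, _ in top_k([1.0, 0.05], docs)])  # ['hours', 'parking']
```

The managed-vs-self-hosted question is purely about who operates the index and where the data sits; the math doesn't change.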

Document processing

Unstructured.io · Textract · custom OCR

Document-to-text pipelines for PDFs, scans, and structured forms. Unstructured for general documents; Textract when AWS-native; custom OCR for highly formatted domain documents.

Backend

FastAPI · Node · Postgres · Redis

The plumbing underneath every agent — API endpoints, durable workflow state, job queues, rate limiters. Boring by design.

Infra

Docker · AWS · GCP · Railway · Fly.io

Containerized deployments. AWS and GCP for enterprise and compliance-heavy clients; Railway and Fly.io for startup deployments where ops simplicity wins.

CRM integrations and observability.

CRM

HubSpot · Salesforce · Zoho · GHL

Standard integrations with scoped API keys, permission boundaries, and conversation-to-record writeback. Also Pipedrive, Close, ActiveCampaign, Intercom, and custom CRMs.

Help desk

Intercom · Zendesk · Help Scout · Front

Context-preserving handoff from AI to human, not "hi, can you tell me again what you just explained to the bot."
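Context-preserving handoff is mostly a data problem: package everything the human needs before the transfer happens. A sketch of the payload we'd hand to a help desk — field names are illustrative, not any vendor's API schema:

```python
def handoff_payload(conversation, summary, intent):
    """Bundle AI-side context so the caller never repeats themselves."""
    return {
        "summary": summary,                       # one-line situation recap
        "detected_intent": intent,                # what the bot classified
        "transcript": conversation,               # full turn-by-turn history
        "last_user_message": conversation[-1]["text"],
    }

payload = handoff_payload(
    [{"role": "user", "text": "I was double charged"},
     {"role": "bot", "text": "Sorry about that, let me get a human."},
     {"role": "user", "text": "Thanks"}],
    summary="Customer reports a duplicate charge",
    intent="billing_dispute",
)
print(payload["detected_intent"])  # billing_dispute
```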

Observability

LangSmith

Our default agent observability. Every tool call, model call, and decision is traceable with input/output. We require this on every production agent deployment.

Observability

Helicone

LLM-specific monitoring — cost, latency, cache hit rate, per-model analytics. Complements LangSmith on the infrastructure side.

Archive / compliance

S3 WORM · Smarsh · Global Relay

Compliant storage for call recordings, transcripts, and supervision artifacts. Used in healthcare and financial-services deployments.

Analytics

Segment · Amplitude · PostHog · Klaviyo

Event and customer-data stacks we plug agents into for onboarding triggers, churn signals, and retention campaign orchestration.

How we pick — the one-minute version.

Three variables decide 80% of our stack choices for a given project: latency budget, data-residency constraint, and volume at steady state.

Low latency, low residency, low volume → hosted platforms (Vapi / Pinecone / OpenAI API) win. High volume, strict residency, or HIPAA → self-hosted stacks (LiveKit / Qdrant / Llama) win on total cost and compliance. Mid-ground → hybrid (hosted voice, self-hosted vector, enterprise-hosted models).

We model the real numbers in discovery — per-call cost, per-conversation cost, p95 latency targets, residency obligations — and the stack choice falls out of the numbers. Most of the time there's an obviously correct answer for a given project; occasionally there isn't, and we walk the trade-off.
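The per-call model itself is simple arithmetic once the rates are pinned down. A back-of-envelope sketch — every rate here is a placeholder assumption for illustration, not vendor pricing:

```python
def per_call_cost(minutes, stt_per_min, tts_per_min,
                  llm_tokens, llm_per_1k, telephony_per_min):
    """Rough per-call cost: per-minute services plus token-metered LLM."""
    return (minutes * (stt_per_min + tts_per_min + telephony_per_min)
            + llm_tokens / 1000 * llm_per_1k)

cost = per_call_cost(minutes=3, stt_per_min=0.01, tts_per_min=0.02,
                     llm_tokens=4000, llm_per_1k=0.005,
                     telephony_per_min=0.014)
print(round(cost, 3))  # 0.152
```

Multiply by steady-state call volume and the hosted-vs-self-hosted crossover point usually shows up on its own.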

Want a stack opinion for your specific project?

One 20-minute technical call. We'll hear your use case, your scale, your residency constraints, and tell you what we'd build with and why. If we think you should pick something we don't work with, we'll tell you that too.