AI agent & workflow automation development

Custom AI agents that don't just answer questions — they do the work. LangGraph, CrewAI, OpenAI Assistants. CRM-integrated. Human-in-the-loop where it matters, autonomous where it doesn't.

LangGraph · LangChain · OpenAI Assistants · CrewAI · HubSpot · Salesforce · Zoho · GHL · n8n · LangSmith · Helicone

A chatbot answers. An agent acts.

A chatbot tells your customer their order has shipped. An agent pulls the tracking number from your shipping API, updates the CRM note, sends the customer a WhatsApp update, and logs the touchpoint in your reporting — without a human touching the keyboard. That's the difference. One talks. The other does work.

Production AI agents combine a language model with tools (APIs they can call), memory (conversation and workflow state), and decision logic (when to act, when to escalate, when to wait for approval). The agent reasons about what to do next, picks the right tool, executes, observes the result, and decides the next step.
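That reason, act, observe loop is simple enough to sketch in plain Python. Everything below is an illustrative skeleton: `decide_next_step`, the tool registry, and the hard-coded decisions stand in for a real model call and are not any framework's API.

```python
# Illustrative agent-loop skeleton. decide_next_step stands in for the
# model's reasoning call; in production it would be an LLM invocation.

def decide_next_step(state):
    """Hard-coded stand-in for the model's next-step decision."""
    if not state["history"]:
        return {"action": "call_tool", "tool": "lookup_order",
                "args": {"order_id": "A1"}}
    return {"action": "finish", "result": state["history"][-1][1]}

def run_agent(goal, tools, max_steps=10):
    """Reason -> act -> observe until done, escalated, or out of budget."""
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        decision = decide_next_step(state)
        if decision["action"] == "finish":
            return {"status": "done", "result": decision["result"]}
        if decision["action"] == "escalate":
            return {"status": "needs_human", "reason": decision["reason"]}
        observation = tools[decision["tool"]](**decision["args"])  # act
        state["history"].append((decision, observation))           # observe
    return {"status": "needs_human", "reason": "step budget exhausted"}
```

The escalation branch is the part that matters: a production agent that runs out of budget or confidence hands off to a human instead of guessing.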

Done right, an AI agent replaces a repetitive human workflow — lead triage, email routing, report generation, onboarding sequences, research and outreach. Done wrong, it sends 400 malformed emails to your customer list before anyone notices. The engineering difference is observability, human-in-the-loop checkpoints, and scoped permissions — not the model. That's what we build.

Most clients think they need multi-agent. Most actually need one well-scoped agent.

The right architecture depends on whether the workflow has genuinely distinct skills — not on which one sounds more impressive.

Multi-agent by default

When "multi-agent" is an anti-pattern

  • Five "specialist agents" arguing with each other over one simple decision
  • Coordination overhead consuming 60% of tokens before any work gets done
  • Non-deterministic handoffs that debug like distributed systems — but worse
  • A single change to a prompt cascades into unpredictable behavior everywhere
  • Latency that makes the system unusable for any real-time-ish workflow

Single agent done right

When one agent is the production answer

  • One clear goal, a well-typed set of tools, deterministic handoff to human on ambiguity
  • Measurable success: the workflow either finished or it didn't, with full trace
  • Prompt changes have a contained blast radius — you can test one thing
  • Tool calls, state, and decisions are traceable end-to-end in LangSmith
  • Ships in weeks, not quarters, and the on-call story is actually tractable

Multi-agent systems are the right answer when a workflow has genuinely distinct skills that each benefit from their own prompt, tools, and memory — researcher / writer / reviewer / executor patterns, or long-horizon document processing with explicit roles. We build those on LangGraph because the graph is inspectable and the state is durable. We use CrewAI where role-based composition is the right abstraction. Honestly, the first question on the discovery call is usually: "Do you actually need multi-agent, or do you need one well-built single agent?"
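Stripped of the framework, the researcher / writer / reviewer pattern is a small state machine: each node edits shared state and names the next node. Here is a framework-free sketch — the node bodies and the review rule are invented for illustration, and LangGraph adds durable state, checkpoints, and tracing on top of this shape.

```python
# Role-based handoff graph, framework-free. The node logic and the
# reviewer's rule are illustrative assumptions, not production code.

def researcher(state):
    state["notes"] = f"key facts about {state['topic']}"
    return "writer"

def writer(state):
    state["draft"] = f"Draft built from: {state['notes']}"
    return "reviewer"

def reviewer(state):
    # Deterministic handoff: short drafts pass, long ones loop back.
    return "done" if len(state["draft"]) < 200 else "writer"

GRAPH = {"researcher": researcher, "writer": writer, "reviewer": reviewer}

def run_graph(topic, entry="researcher", max_hops=10):
    state, node = {"topic": topic}, entry
    for _ in range(max_hops):
        if node == "done":
            return state
        node = GRAPH[node](state)
    raise RuntimeError("graph did not converge within hop budget")
```

Note the hop budget: an unbounded writer/reviewer loop is exactly the kind of non-deterministic failure the anti-pattern list above warns about.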

Six workflows we automate with AI agents.

Scoped use cases where we've shipped production agents that are still running a year later.

CRM automation

Lead triage & enrichment agent

Reads every inbound lead, enriches from public sources, scores against your ICP, assigns to the right rep, drafts the first-touch email, and writes everything back to your CRM. Keeps reps on warm conversations instead of data entry.

Typical outcome: 2–5x rep throughput on inbound

Sales

Sales & opportunity agent

Monitors your CRM for stalled deals, drafts context-aware follow-ups using past conversation history, suggests next actions to the rep, and can send approved outreach on cadence. Lead scoring that actually reflects your pipeline reality.

Typical lift: 15–35% on pipeline velocity in reactivated segments

Operations

Email triage & routing agent

Reads inbound email (support@, sales@, hello@), classifies by intent, routes to the right team or Slack channel, drafts replies for human approval, and logs every thread into your CRM with a summary. Zero-touch for known categories; human-in-the-loop for anything else.

Typical outcome: −50% to −70% triage time for inbound email

Internal

Meeting summary & action-item agent

Connects to Zoom, Google Meet, or Teams. Produces per-participant action items, updates the project tracker, posts the summary to the right Slack channel, and follows up automatically if an owner misses a deadline.

Typical outcome: −30% on meeting time to action

Content & research

Research & outreach agent

Given a list of target companies, the agent researches each, identifies the right contacts, drafts a personalized outreach email based on a recent public trigger (hire, funding, launch), and stages sequences for your rep to approve. No spray-and-pray.

Typical lift: 3–6x response rate vs. template outreach

Document processing

Document & invoice processing agent

OCR + reasoning layer that parses contracts, invoices, medical records, or onboarding docs, extracts structured fields into your ERP or CRM, flags anomalies, and escalates edge cases with the reasoning attached for review.

Typical outcome: 85–95% straight-through processing rate

Your CRM is where the agent earns its keep.

Agents read context, take actions, and respect permission boundaries set by your admin. Four platforms are standard; more on request.

HubSpot

Deal automation, lead scoring, workflow triggers, custom properties, sequences. Native OAuth, scoped API keys per agent.

Salesforce

Opportunity enrichment, Einstein-complementary scoring, Apex callout integrations, Flow triggers, managed package option.

Zoho CRM

Lead routing, deal-stage automation, bulk update workflows, Zoho Desk ticket handoff, Zoho Flow orchestration.

GoHighLevel

Agency multi-tenant setup, SMS/email/voice touchpoints, pipeline automation, white-label portal wiring for GHL resellers.

Also standard: Pipedrive, Close, ActiveCampaign, Intercom, Freshsales, and custom CRMs via REST or GraphQL. The agent's permissions are scoped to exactly the objects it needs — we don't hand an agent an admin token and hope.

The frameworks and tools we actually use in production.

Not the shiny list. The one that's running in clients' accounts right now.

Orchestration

LangGraph

Our default for anything stateful or multi-step. Durable state, inspectable graph, human-in-the-loop checkpoints, and retry semantics that work.

Orchestration

LangChain

For simpler pipelines where a DAG is overkill. Great building blocks for tool use, retrievers, and memory — we pick and compose, not adopt wholesale.

Orchestration

OpenAI Assistants API

Right fit for scoped, single-purpose assistants — file search, code interpreter, function calling — where we don't need to own the entire state machine.

Role composition

CrewAI

Role-based multi-agent composition when the pattern genuinely fits — researcher, writer, reviewer, executor. Not our default, but the right answer in some cases.

Workflows

n8n

Visual automation glue where an agent pulls or pushes through many third-party services. Self-hosted option for data-residency clients.

Models

GPT-4o · Claude · Llama

OpenAI for tool-use-heavy workflows, Anthropic Claude for long-context reasoning and careful action, self-hosted Llama 3.3 / Mistral for data-residency requirements.

Memory

Pinecone · Qdrant · pgvector

Vector stores for agent memory and RAG grounding. Choice depends on your hosting preference and scale — not religion.

Observability

LangSmith · Helicone

Full trace of every agent decision, tool call, token spend, and latency. If we can't show you what the agent did yesterday at 3:47 AM, it's not production.

Backend

FastAPI · Postgres · Redis

The plumbing underneath — API endpoints, durable workflow state, job queues, and rate limiters. Boring by design, because agents fail loud when the plumbing is flaky.

Giving an agent real tools is a real responsibility.

Four controls we ship on every production agent. These are the difference between an agent that works and an agent that sends 400 wrong emails before anyone notices.

Scoped permissions, least privilege

Each tool the agent can call gets its own scoped API key with the minimum permissions required. The CRM write-token can't read billing. The email-send key is rate-limited and domain-restricted. An admin token never touches the agent process.
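In practice that means every tool call passes through a credential object that fails closed. A minimal sketch — the `ScopedKey` class and the scope strings are illustrative, not a specific vendor's permission model:

```python
# Least-privilege sketch: each tool gets its own key with explicit scopes.
# Scope names and the ScopedKey class are illustrative assumptions.

class ScopedKey:
    def __init__(self, name, scopes):
        self.name = name
        self.scopes = frozenset(scopes)

    def require(self, scope):
        if scope not in self.scopes:
            raise PermissionError(f"key '{self.name}' lacks scope '{scope}'")

crm_write = ScopedKey("crm-write", {"contacts:write", "deals:write"})
email_send = ScopedKey("email-send", {"email:send"})  # cannot touch the CRM

def update_deal(key, deal_id, fields):
    key.require("deals:write")  # fails closed before any network call
    return {"deal": deal_id, **fields}
```

The point of the sketch: a mis-wired key surfaces as a hard `PermissionError` in the trace, not as a silent write to the wrong system.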

Human-in-the-loop checkpoints

Destructive or high-trust actions — sending customer email, moving money, deleting records, posting externally — require an approval step by default. Your team approves in Slack or a simple web UI. Low-risk actions flow through autonomously.
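The gate itself is a small piece of code; the real work is deciding which actions belong in the high-trust set. A hedged sketch, with illustrative action names and a plain list standing in for the Slack approval queue:

```python
# Human-in-the-loop gate sketch. The HIGH_TRUST set and the queue shape
# are illustrative; in production the queue surfaces in Slack or a web UI.

HIGH_TRUST = {"send_email", "delete_record", "issue_refund"}

def execute(action, args, approvals, pending, handlers):
    """Run low-risk actions; queue high-trust ones until approved."""
    if action in HIGH_TRUST and action not in approvals:
        pending.append((action, args))  # await a human decision
        return {"status": "awaiting_approval", "action": action}
    return {"status": "executed", "result": handlers[action](**args)}
```

Low-risk actions flow straight through; anything in the high-trust set blocks until a named approval exists.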

Dry-run mode & shadow deployments

Before an agent touches real systems, it runs in dry-run mode on live data, showing what it would do. We review a representative sample. Shadow mode runs the agent alongside the existing human workflow; promotion to live only happens when the diff looks right.
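Mechanically, dry-run is simple: the same planner runs, but the executor records intended calls instead of making them. A sketch with illustrative names:

```python
# Dry-run executor sketch: in dry-run mode, tool calls are logged as
# "would_call" records instead of being executed. Names are illustrative.

def make_executor(dry_run, log):
    def run(tool, **args):
        if dry_run:
            log.append({"would_call": tool.__name__, "args": args})
            return {"dry_run": True}
        return tool(**args)
    return run

def send_sms(to, body):  # illustrative tool
    return {"sent": to, "body": body}
```

Because the executor is the only thing that changes between shadow and live, the promotion step is a config flip, not a rewrite.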

Audit logs & prompt-injection defense

Every tool call, every model call, every decision is logged in LangSmith with full input and output. Prompt-injection defense is structural — tool-use schemas reject malformed commands; user content is never interpreted as instructions to the orchestrator.
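Structurally, the audit trail is a wrapper around every tool: record input, output, and timestamp before anything else sees the result. A sketch — in production the sink is LangSmith, here it is a plain list:

```python
import functools
import time

AUDIT = []  # stand-in sink; production traces go to LangSmith

def audited(fn):
    """Record every tool call with its full input and output."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        result = fn(**kwargs)
        AUDIT.append({"tool": fn.__name__, "input": kwargs,
                      "output": result, "ts": time.time()})
        return result
    return wrapper

@audited
def create_task(title):  # illustrative tool
    return {"task": title}
```

Wrapping at the tool boundary means the log is complete by construction: an agent cannot take an action that bypasses it.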

Agent projects come in three shapes.

Fixed scope, fixed price. Ongoing API costs modeled in discovery so you know unit economics before you commit.

| Engagement | Scope | Price | Timeline |
| --- | --- | --- | --- |
| Discovery workshop | Workflow audit, tool inventory, architecture doc, fixed-scope proposal, unit-economics model | $1,500–$3,000 | 1 week |
| Single-agent MVP | One well-scoped agent, 2–3 tools, CRM integration, shadow mode, observability | $8,000–$15,000 | 3–5 weeks |
| Custom workflow agent | CRM + email + calendar + docs, human-in-the-loop checkpoints, LangGraph state, LangSmith tracing | $10,000–$35,000 | 5–8 weeks |
| Multi-agent system | Coordinated agents with explicit roles, durable state, approval flows, multi-tenant option | $20,000–$50,000 | 6–10 weeks |
| Monthly retainer | Ops, prompt tuning, new tools, new workflows, model upgrades, observability reviews | $2,500–$9,000/mo | Post-launch |

Per-workflow API costs typically land between $0.05 and $0.50 per executed workflow, depending on model tier, tool call count, and context length. Self-hosted stacks shift the cost from API fees to infrastructure — we model both in the discovery workshop so you pick the right side of that curve for your volume.
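The unit economics behind that range reduce to tokens times price times call count. A back-of-envelope model; the per-1k-token prices in the example are placeholders, not current vendor rates:

```python
# Back-of-envelope cost per executed workflow. The token prices used in
# the usage example are placeholder assumptions, not current rates.

def workflow_cost(model_calls, avg_in_tokens, avg_out_tokens,
                  price_in_per_1k, price_out_per_1k):
    per_call = (avg_in_tokens / 1000) * price_in_per_1k \
             + (avg_out_tokens / 1000) * price_out_per_1k
    return round(model_calls * per_call, 4)
```

For instance, six model calls averaging 2,000 input and 400 output tokens at placeholder prices of $0.0025 / $0.01 per 1k tokens work out to roughly $0.054 per workflow, inside the quoted band.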

From discovery to production in 4–10 weeks.

Weekly demos on your real data from week one. Shadow mode before any agent touches real systems.

Week 0

Discovery workshop

Paid audit of the workflow, the tools the agent would need, the approval boundaries, and the unit economics. You get an architecture document and a fixed-price proposal. If you don't move forward, you keep the document.

Weeks 1–3

Build

LangGraph state machine, tool wiring, CRM and messaging integrations, human-in-the-loop checkpoints, LangSmith tracing. Weekly demos on your real systems. Daily Slack access.

Weeks 4–5

Shadow mode

The agent runs in dry-run on real data with human review of every action it would have taken. We measure precision, escalation quality, and tool-call health before promoting to live.

Week 6+

Production

Live execution with full observability, approval workflows wired into Slack, and a 30-day tuning window included. Optional retainer for new tools, workflows, and model upgrades.

Agent questions we answer on every call.

What is the difference between an AI chatbot and an AI agent?
A chatbot answers questions. An AI agent takes actions. A chatbot tells your customer their order shipped; an agent pulls the tracking number from your shipping API, updates your CRM note, and sends the customer a WhatsApp update — without a human touching it. The agent has tools, memory, and decision logic on top of the language model.
When do I need a multi-agent system instead of a single agent?
A single agent is the right answer for most business automations. Multi-agent architectures (LangGraph, CrewAI) make sense when a workflow has genuinely distinct skills — researcher, writer, reviewer, executor — and each benefits from its own prompt, tools, and memory. Most clients think they need multi-agent and actually need one well-scoped agent.
How much does a custom AI agent cost?
Single-agent MVP: $8,000–$15,000. Multi-agent systems with handoffs: $20,000–$50,000. Custom workflow agents (CRM + email + calendar + docs): $10,000–$35,000. Ongoing API costs are modeled in discovery so you know unit economics before committing.
Can AI agents integrate with HubSpot, Salesforce, Zoho, or GoHighLevel?
Yes — all four are standard integrations. Also Pipedrive, Close, ActiveCampaign, Intercom, and custom CRMs via REST or GraphQL. Agents read context, take actions (create deals, move stages, log activities), and respect permission boundaries set by your admin.
How do you prevent AI agents from doing damage with tools they have access to?
Four controls: scoped API keys with least-privilege permissions per tool; human-in-the-loop checkpoints on destructive actions (sending email, moving money, deleting records); dry-run modes that show what the agent would do before it does it; and full audit logs in LangSmith so every tool call is traceable.
What is LangGraph and why do you use it?
LangGraph is a framework for building stateful multi-step agent workflows as graphs. We use it when the agent needs to branch, retry, loop, or hand off to a human mid-execution — which is most non-trivial business agents. For simpler single-step agents, OpenAI Assistants API or a custom LangChain pipeline is often the right tool.
How long does it take to build a production AI agent?
Single-agent MVP with 2–3 tools: 3–5 weeks. Multi-agent system with CRM and messaging integrations: 6–10 weeks. Custom enterprise workflow with approval flows and audit: 8–14 weeks. Shadow-mode testing on your real data is included before any agent takes live actions.
Can I self-host the AI agent instead of using OpenAI or Anthropic?
Yes. We deploy self-hosted stacks on Llama 3.3, Mistral, or Qwen for data-residency-sensitive clients. Self-hosted agents typically cost more to operate than API-based ones at low volume, and less at very high volume — we model this explicitly in the discovery workshop.

Ready to replace a human workflow with an agent that actually ships?

One 20-minute call. We'll map the workflow, name the tools, point out what's realistic to automate — and what still needs a human in the loop. If you'd be better off with a different team, we'll say so.