
How Dograh Works: The Open-Source Voice AI Architecture Behind the Vapi Alternative

A technical look at how Dograh works: workflows, calls, STT, LLM, TTS, Docker, traces, webhooks, and production risks.

Dograh works by turning a voice agent into a workflow graph connected to telephony, real-time transcription, an LLM, text-to-speech, run records, and webhooks. Developers should read it as an open-source voice AI orchestration layer: useful for control and customization, but still responsible for latency, deployment, monitoring, security, and production handoff.

The reason developers are looking up Dograh is not only that it is an open-source Vapi alternative. It is that Dograh makes the voice AI architecture visible. You can see the workflow, run the stack, inspect the call loop, and reason about what happens after the voice agent hangs up.

That visibility is valuable. It also means you own more of the system.

What Problem Does Dograh Solve for Developers?

Dograh gives developers a higher-level starting point than building a voice AI stack from raw WebRTC, telephony APIs, streaming STT, LLM calls, TTS streaming, webhook handling, and call records.

The project's GitHub repository positions it as an open-source alternative to Vapi with a drag-and-drop workflow builder, self-hosting, and flexible LLM, TTS, and STT integration. That matters because most voice AI demos hide the hard parts behind a clean dashboard. Dograh exposes more of those parts.

For a developer, the useful abstraction is this:

| Layer | What Dograh Gives You | What You Still Own |
| --- | --- | --- |
| Agent design | Visual workflow builder and JSON workflow definitions | Conversation design, edge cases, testing |
| Voice loop | STT, LLM, and TTS orchestration | Model choice, latency, accuracy, cost |
| Telephony | Integrations for phone and web calls | Number setup, SIP/provider details, call reliability |
| Runtime | Docker deployment path | Hosting, updates, logs, scaling |
| Data output | Runs, transcripts, extracted data, webhooks | Downstream storage, routing, reporting, retention |
| Debugging | Traces and call summaries | Incident response and production monitoring |

This puts Dograh between two extremes. It is more packaged than a low-level real-time media framework. It is more inspectable than a closed managed voice AI platform.

For the broader provider comparison, read Dograh vs Vapi. For the lower-level developer stack comparison, read LiveKit vs Dograh.

What Happens During a Dograh Voice Call?

Dograh's core docs describe the call loop clearly. A workflow defines the conversation. A telephony provider places or receives the call. Audio streams in real time. The caller's speech goes through STT. The transcript and active node prompt go to the LLM. The response is converted to audio through TTS. When the call ends, Dograh extracts context, fires webhooks, and saves the run record.

In practical terms, the runtime path looks like this:

| Step | Runtime Event | Developer Concern |
| --- | --- | --- |
| 1 | Call starts through telephony or web call | Connection setup, phone provider, caller identity |
| 2 | Audio stream opens | WebRTC/SIP behavior, network quality, TURN needs |
| 3 | STT transcribes caller speech | Accuracy, delay, accents, background noise |
| 4 | LLM receives transcript, node prompt, and history | Prompt scope, tool access, hallucination control |
| 5 | TTS streams the response back | Time-to-first-audio, voice quality, interruption handling |
| 6 | Workflow edge conditions move the call | Deterministic routing, unclear intent, fallback paths |
| 7 | End node completes the call | Summary, extraction, customer experience |
| 8 | Webhooks and run records persist output | Retry handling, downstream routing, reporting |
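The runtime steps above can be sketched as a single turn function. This is a minimal illustration, not Dograh's actual API: the provider functions are stand-ins for real streaming STT, LLM, and TTS clients, and any self-hosted stack wires real clients into roughly this shape.

```python
def stt(audio: bytes) -> str:
    """Stand-in for a streaming STT provider (step 3)."""
    return "I'd like to reschedule my appointment."

def llm(transcript: str, node_prompt: str, history: list[str]) -> str:
    """Stand-in for the LLM call that receives transcript, node prompt, and history (step 4)."""
    return "Sure, what day works for you?"

def tts(text: str) -> bytes:
    """Stand-in for a streaming TTS provider (step 5)."""
    return text.encode("utf-8")

def run_turn(audio: bytes, node_prompt: str, history: list[str]) -> bytes:
    """One conversational turn: caller audio in, agent audio out."""
    transcript = stt(audio)                        # transcribe caller speech
    history.append(f"caller: {transcript}")
    reply = llm(transcript, node_prompt, history)  # decide the response
    history.append(f"agent: {reply}")
    return tts(reply)                              # stream audio back

history: list[str] = []
audio_out = run_turn(b"...", "You are a scheduling assistant.", history)
```

In a real deployment each of these stages streams incrementally rather than returning a complete value, which is exactly where the latency and interruption concerns in the table come from.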

This is why voice AI architecture is harder than chat architecture. A chat agent can wait. A phone agent cannot. Every turn has a latency budget, and the user can interrupt, talk over the agent, go silent, change intent, or hang up before your downstream automation finishes.
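The arithmetic behind that latency budget is worth making explicit. The per-stage numbers below are illustrative assumptions, not Dograh measurements; the point is that the stages add up, and the sum has to fit inside what a caller will tolerate before the turn feels stalled.

```python
# Rough ceiling before a phone caller perceives a stall (assumption).
BUDGET_MS = 1200

stage_ms = {
    "stt_final_transcript": 300,   # end of speech -> final transcript
    "llm_first_token": 450,        # request -> first LLM token
    "tts_first_audio": 250,        # text -> first synthesized audio
    "network_and_overhead": 150,   # transport, buffering, orchestration
}

total = sum(stage_ms.values())
headroom = BUDGET_MS - total
print(f"turn latency: {total} ms, headroom: {headroom} ms")
```

With these assumed numbers the turn fits with only 50 ms to spare, which is why a slow model choice at any single stage can push a working demo over the edge in production.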

The Dograh architecture is interesting because it organizes this complexity as workflows and runs instead of forcing every team to start from a blank real-time audio service.

How Do Dograh Workflows Actually Work?

Dograh workflows are graph-based. The workflow schema docs say the workflow_definition object contains nodes and edges, and that the visual builder reads and writes the same structure used by the API.

That is important for developers. It means the UI is not just a separate no-code layer. The visual graph has a code representation.

Dograh lists these node types:

| Node Type | Purpose |
| --- | --- |
| startCall | Entry point for telephony calls |
| trigger | Entry point for API-triggered non-telephony runs |
| agentNode | LLM-powered conversation step |
| globalNode | Shared configuration across agent nodes |
| webhook | HTTP request when the workflow reaches that node |
| qa | Quality analysis for completed calls |
| endCall | Call termination |

Edges connect nodes and define when the workflow moves. Dograh's schema includes labels, natural-language conditions, and optional transition speech. That gives developers two useful controls:

  1. The LLM can make flexible decisions inside a node.
  2. The workflow can keep the overall call path constrained.

That balance matters. Fully free-form voice agents can drift. Fully scripted agents break when the caller says something unexpected. A node graph lets developers define the lanes without scripting every sentence.
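The node-and-edge shape described above can be sketched concretely. This is a hypothetical fragment, not a real Dograh export: the node types come from the table above, but field names beyond nodes and edges are assumptions about shape rather than the exact schema.

```python
import json

# Hypothetical workflow_definition sketch (field shapes are assumptions).
workflow_definition = {
    "nodes": [
        {"id": "start", "type": "startCall"},
        {"id": "triage", "type": "agentNode",
         "prompt": "Greet the caller and find out why they are calling."},
        {"id": "book", "type": "agentNode",
         "prompt": "Collect a preferred date and time."},
        {"id": "end", "type": "endCall"},
    ],
    "edges": [
        {"from": "start", "to": "triage", "label": "call answered"},
        {"from": "triage", "to": "book",
         "condition": "caller wants to schedule or reschedule"},
        {"from": "triage", "to": "end",
         "condition": "request is resolved or out of scope"},
        {"from": "book", "to": "end", "condition": "booking details collected"},
    ],
}

def dangling_edges(definition: dict) -> list[dict]:
    """Return edges that reference node ids missing from the graph."""
    ids = {n["id"] for n in definition["nodes"]}
    return [e for e in definition["edges"]
            if e["from"] not in ids or e["to"] not in ids]

assert dangling_edges(workflow_definition) == []
print(json.dumps(workflow_definition, indent=2))
```

Because the graph is plain data, checks like dangling_edges can run in CI before a broken flow ever answers a call. That is the practical payoff of the UI and the API sharing one structure.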

What Runs in the Dograh Docker Stack?

Dograh's Docker deployment docs describe a local setup that starts PostgreSQL, Redis, MinIO, the API, and the UI. The docs say first startup takes 2-3 minutes and the app is available at http://localhost:3010.

For remote deployment, the docs add more operational detail: HTTPS, nginx reverse proxy, WebSocket proxying, coturn for TURN, and public ports for WebRTC signaling and media relay. They recommend a server with at least 8 GB RAM and 4 vCPUs.

That tells you what kind of system Dograh is. It is not a static app with an LLM API call. It is a real-time application stack.

| Component | Role in the Architecture |
| --- | --- |
| UI | Workflow builder, testing surface, settings |
| API | Agent runtime, workflow execution, call orchestration |
| PostgreSQL | Durable records and configuration |
| Redis | Runtime coordination and fast state |
| MinIO | Object storage for call artifacts |
| nginx | Remote access and HTTPS termination |
| coturn | TURN relay for WebRTC cases where direct media fails |

This is good news for developers who want control. It is also the maintenance bill. If you self-host Dograh, you need deployment discipline: backups, secrets, certificates, resource monitoring, upgrade process, and logging.
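A minimal pre-flight check makes that discipline concrete: confirm each service port accepts TCP connections before testing calls. The UI port (3010) is documented; the other ports here are common defaults for PostgreSQL, Redis, and MinIO and are assumptions, not Dograh-specific values.

```python
import socket

# Service -> (host, port). Only 3010 is documented; the rest are
# common defaults (assumptions).
SERVICES = {
    "ui": ("127.0.0.1", 3010),
    "postgres": ("127.0.0.1", 5432),
    "redis": ("127.0.0.1", 6379),
    "minio": ("127.0.0.1", 9000),
}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in SERVICES.items():
    status = "up" if port_open(host, port) else "down"
    print(f"{name:9} {host}:{port} {status}")
```

A script like this belongs in the same place as your backup and upgrade runbooks: cheap to run, and it turns "the demo call failed" into "Redis is down" in seconds.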

Where Can Developers Customize Dograh?

Dograh's technical appeal is customization across several layers.

The model layer is the obvious one. Dograh supports configurable STT, LLM, TTS, and realtime providers. That lets teams test tradeoffs across recognition quality, latency, voice quality, cost, and data handling.

The workflow layer is the second one. Because workflow definitions can be represented as JSON, developers can treat agents as structured artifacts, not only dashboard configuration. That opens the door to versioning, review, generation, and repeatable templates.
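One small practice makes that versioning story work in a repository: serialize workflow definitions canonically (sorted keys, stable indentation) so git diffs reflect real changes rather than key order. The workflow content below is a made-up fragment, not a real Dograh export.

```python
import json

def canonical(definition: dict) -> str:
    """Stable serialization for version control: sorted keys, fixed indent."""
    return json.dumps(definition, indent=2, sort_keys=True) + "\n"

# Same workflow, different key order (e.g. two export paths).
v1 = {"nodes": [{"id": "start", "type": "startCall"}], "edges": []}
v2 = {"edges": [], "nodes": [{"id": "start", "type": "startCall"}]}

# Key order no longer produces spurious diffs in review:
assert canonical(v1) == canonical(v2)
```

The same canonical form also makes generated or templated agent flows diffable against their hand-edited descendants.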

The tool and webhook layer is the production one. Dograh can call external APIs, pass structured data, and trigger downstream workflows. That is where voice agents stop being demos and start affecting business systems.

| Customization Point | Example |
| --- | --- |
| STT provider | Swap recognition providers for accuracy or cost |
| LLM provider | Use a different model for reasoning, routing, or compliance |
| TTS provider | Tune voice quality and latency |
| Workflow JSON | Generate, version, or update agent flows programmatically |
| Webhook node | Trigger CRM, scheduling, ticketing, or n8n workflows |
| QA node | Run post-call evaluation on completed conversations |
| Tracing | Inspect prompts, STT output, tool calls, and conversation history |

Dograh's tracing docs are worth reading before production. They show STT entries, LLM calls, available tools, tool call requests and responses, prompt composition, and conversation history. That is the kind of visibility developers need when a voice agent behaves differently in a real call than it did in a test.

What Are the Production Risks of Self-Hosting Dograh?

The production risk is not that Dograh is open source. The risk is assuming open source removes operational work.

Self-hosted voice AI has several failure domains:

| Failure Domain | What Can Go Wrong |
| --- | --- |
| Media path | WebRTC fails, TURN is misconfigured, audio drops |
| Telephony | Provider outage, SIP issue, caller ID or number routing problem |
| STT | Bad transcription, delay, low confidence, noisy environment |
| LLM | Wrong decision, tool misuse, prompt drift, slow response |
| TTS | Slow first audio, unnatural voice, interruption problems |
| Workflow | Edge condition misfires, missing fallback, dead-end node |
| Webhook | Timeout, duplicate delivery, bad payload, downstream outage |
| Data | Transcript retention, recording storage, client separation, deletion |
| Ops | No alerting, no rollback, no upgrade plan, unclear incident owner |
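The webhook row deserves special attention, because duplicate delivery is the failure you only notice after it has created two CRM records. A sketch of the receiving side, assuming the payload carries a unique run identifier (the run_id key here is an assumption used as a deduplication handle, not a documented Dograh field):

```python
# In production this set would be a database table or Redis key with a TTL;
# an in-process set is enough to show the shape.
processed: set[str] = set()

def handle_webhook(payload: dict) -> str:
    """Process a run-completed payload at most once per run_id."""
    run_id = payload.get("run_id")
    if run_id is None:
        return "rejected"      # malformed payload: no dedup handle
    if run_id in processed:
        return "duplicate"     # retry of an already-processed delivery
    processed.add(run_id)
    # ... write transcript and extracted data to downstream storage here ...
    return "processed"

assert handle_webhook({"run_id": "run-123"}) == "processed"
assert handle_webhook({"run_id": "run-123"}) == "duplicate"
```

Returning success on duplicates matters: if the receiver errors instead, the sender keeps retrying and the queue never drains.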

A developer-led team can own this. Many should. But the decision should be explicit. Managed platforms like Vapi keep more of that responsibility inside the vendor. Open-source platforms like Dograh let you bring more of it into your own infrastructure.

That is the build-vs-buy question in practical form. If the system is core to your product, owning more of the stack may be worth it. If voice AI is one channel inside a broader operation, managed infrastructure may be the better default.

For the operating-cost side of that decision, read The Real Cost of Building Voice AI Infrastructure Yourself and The Integration Tax.

How Should Developers Compare Dograh, Vapi, LiveKit, and Pipecat?

Do not compare these tools as if they live at the same layer.

| Tool Category | Examples | What You Are Choosing |
| --- | --- | --- |
| Managed voice AI platform | Vapi, Retell, Bland | Speed, hosted orchestration, vendor support |
| Open-source voice AI platform | Dograh | More control with a packaged agent workflow system |
| Real-time media framework | LiveKit | Low-level control over media transport and realtime app behavior |
| Agent pipeline framework | Pipecat-style architectures | Code-first control over conversational pipeline components |
| Operating layer | Voxfra | Capture, routing, separation, reporting, handoff, provider portability |

Dograh is compelling because it sits in a practical middle: more packaged than a bare media framework, more inspectable than a managed SaaS dashboard. It gives developers a workflow-oriented voice AI system they can run, inspect, and modify.

The strategic mistake is treating one layer as if it solves every layer. A Dograh deployment still needs post-call automation, reliable storage, data boundaries, reporting, and provider strategy. A Vapi deployment needs those too. That is why post-call automation, provider switching, and multi-tenant voice AI architecture belong in the same content cluster.

Frequently Asked Questions

What is Dograh architecture?

Dograh architecture is a voice AI orchestration stack built around workflow graphs, telephony, real-time audio, STT, LLM calls, TTS, run records, traces, and webhooks. Developers can use the visual builder or workflow definitions to define how the agent behaves, then connect it to phone calls, web calls, tools, and downstream systems.

Is Dograh only a no-code voice AI builder?

No. Dograh includes a visual workflow builder, but the workflow has a JSON definition behind it. That matters for developers because agent flows can be represented structurally, updated through APIs, and reasoned about as configuration rather than only dashboard state.

Can Dograh replace Vapi for developers?

Dograh can replace some Vapi use cases for developers who want open-source control and are comfortable owning deployment and operations. Vapi is still stronger for teams that want managed infrastructure, vendor support, and less self-hosting responsibility. The choice depends on which parts of the voice AI system your team wants to own.

What should developers test before using Dograh in production?

Developers should test local setup, remote deployment, telephony integration, STT latency, TTS time-to-first-audio, interruption handling, webhook retries, run records, traces, upgrade flow, backups, and alerting. A single successful demo call is not enough for a production decision.

Does Dograh solve post-call automation?

Dograh can fire webhooks and save run records when calls complete, which gives developers a practical handoff point. It does not automatically solve every post-call operations problem. Teams still need to decide where records live, how retries work, which workflow receives the data, how reporting is built, and how client or location boundaries are enforced.

Does Voxfra support Dograh?

No. Voxfra currently supports Vapi. Dograh is still useful to study because it makes the voice AI operating problem visible. Whether a team uses Vapi, Dograh, Retell, ElevenLabs, or a custom stack, production voice AI still needs call capture, routing, separation, reporting, handoff, and provider portability.


Voxfra is the operating layer around production voice AI. Today it supports Vapi, with capture, routing, separation, reporting, and handoff built for teams that need provider decisions to stay reversible. See the Vapi integration.
