01 AI · VoIP · Infrastructure

AI Cold Calling Engine —
Modular In-House Development

A fully autonomous AI calling agent that connects directly to a local FritzBox SIP server. No Twilio, no cloud VoIP, no per-minute billing. A custom voice AI runs locally, holds real conversations with prospects, and manages call logic, qualification, and retries, all in-house.

Concurrent calls: 100+
Per-minute cost: $0
External VoIP dependencies: 0
Deployment model: Local

Overview

Replacing Twilio with a self-hosted voice AI stack

The standard approach for AI calling systems relies on cloud VoIP providers like Twilio. That works, but at scale the per-minute costs become a significant line item. This project takes a different path: direct SIP registration on a local FritzBox router, cutting out the cloud middleman entirely.

The result is a system that runs fully on-premises on commodity hardware, scales to 100+ concurrent calls, and costs nothing per minute to operate. Every component — from the telephony layer to the AI conversation engine — is owned and controlled in-house.

Architecture

How the system connects

The engine registers as a SIP client directly on the FritzBox. Outgoing calls are initiated over SIP, with audio streams handled via WebRTC. The voice AI processes audio in real time, generating contextually appropriate responses while managing call state and qualification logic.
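
To illustrate the registration step, here is a minimal sketch of the SIP REGISTER request a client could send to the router, following the header layout in RFC 3261. The `RegisterParams` shape, the `buildRegister` function, and every value passed to it are assumptions made for illustration, not the project's actual code. Digest authentication, which a registrar typically requests with a 401 challenge, is left out.

```typescript
// Hypothetical parameter set; none of these names come from the project.
interface RegisterParams {
  registrar: string;  // host name of the SIP registrar, e.g. the router
  user: string;       // SIP username configured on the registrar
  localIp: string;    // address the client listens on
  localPort: number;
  callId: string;     // stays the same across refreshes of one registration
  cseq: number;       // incremented for every new REGISTER
  branch: string;     // RFC 3261 requires the "z9hG4bK" prefix
  fromTag: string;
  expiresSec: number; // requested binding lifetime
}

// Builds an unauthenticated REGISTER request as a CRLF-delimited string.
function buildRegister(p: RegisterParams): string {
  const addressOfRecord = `sip:${p.user}@${p.registrar}`;
  const lines = [
    `REGISTER sip:${p.registrar} SIP/2.0`,
    `Via: SIP/2.0/UDP ${p.localIp}:${p.localPort};branch=${p.branch};rport`,
    `Max-Forwards: 70`,
    `From: <${addressOfRecord}>;tag=${p.fromTag}`,
    `To: <${addressOfRecord}>`,
    `Call-ID: ${p.callId}`,
    `CSeq: ${p.cseq} REGISTER`,
    `Contact: <sip:${p.user}@${p.localIp}:${p.localPort}>`,
    `Expires: ${p.expiresSec}`,
    `Content-Length: 0`,
  ];
  // Headers end with an empty line; a REGISTER carries no body.
  return lines.join("\r\n") + "\r\n\r\n";
}

// Example values are placeholders.
const registerMessage = buildRegister({
  registrar: "fritz.box",
  user: "agent1",
  localIp: "192.168.178.50",
  localPort: 5060,
  callId: "a1b2c3d4@192.168.178.50",
  cseq: 1,
  branch: "z9hG4bK776asdhds",
  fromTag: "1928301774",
  expiresSec: 300,
});
```

In a real client the string would be sent over UDP or TCP to the registrar, and the request would be repeated with an `Authorization` header once the challenge arrives.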

Engineering Challenges

Problems solved during development

Challenge 01
SIP Registration Stability

Maintaining persistent SIP registration on consumer-grade FritzBox hardware required custom keepalive logic and graceful re-registration after network drops.
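
One way such keepalive logic can be structured is sketched below: refresh the registration well before it expires, and back off exponentially when an attempt fails. The function names, the half-lifetime refresh point, and the 1 s to 60 s backoff range are all assumptions for illustration; the project's actual timings are not documented here.

```typescript
// Refresh at half the lifetime granted by the registrar (an assumed policy).
function refreshDelayMs(grantedExpiresSec: number): number {
  return Math.max(1, Math.floor(grantedExpiresSec / 2)) * 1000;
}

// Exponential backoff after consecutive failures: 1s, 2s, 4s, ... up to capMs.
function backoffDelayMs(failedAttempts: number, baseMs = 1000, capMs = 60000): number {
  const exponent = Math.max(0, failedAttempts - 1);
  return Math.min(capMs, baseMs * 2 ** exponent);
}

// Hypothetical hook: performs one REGISTER and resolves to the granted expiry in seconds.
type RegisterFn = () => Promise<number>;

// Keeps the registration alive until shouldStop() returns true.
// sleep is injected so the loop can be driven by real timers or by a test.
async function keepRegistered(
  register: RegisterFn,
  sleep: (ms: number) => Promise<void>,
  shouldStop: () => boolean,
): Promise<void> {
  let failures = 0;
  while (!shouldStop()) {
    try {
      const granted = await register();
      failures = 0; // a success resets the backoff
      await sleep(refreshDelayMs(granted));
    } catch {
      failures += 1; // network drop or rejected request: wait, then retry
      await sleep(backoffDelayMs(failures));
    }
  }
}
```

Separating the delay calculations from the loop keeps the timing policy easy to test and to tune for a specific router.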

Challenge 02
Real-Time Audio Processing

Streaming live audio from SIP calls to the AI model with sub-200ms latency required custom RTP packet handling and efficient audio buffering.
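
As a rough illustration of what RTP packet handling involves, the sketch below parses the fixed RTP header defined in RFC 3550 and returns the audio payload. The `RtpPacket` type and `parseRtp` function are hypothetical names, not the project's code, and buffering or reordering of packets is not shown.

```typescript
interface RtpPacket {
  payloadType: number;    // e.g. 0 for PCMU, 8 for PCMA
  marker: boolean;
  sequenceNumber: number; // 16-bit, wraps around
  timestamp: number;      // 32-bit, in codec clock units
  ssrc: number;           // stream identifier
  payload: Uint8Array;    // encoded audio, without header or padding
}

// Returns null for anything that is not a well-formed RTP version 2 packet.
function parseRtp(buf: Uint8Array): RtpPacket | null {
  if (buf.length < 12) return null; // fixed header is 12 bytes
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const b0 = view.getUint8(0);
  if (b0 >> 6 !== 2) return null; // version field must be 2
  const hasPadding = (b0 & 0x20) !== 0;
  const hasExtension = (b0 & 0x10) !== 0;
  const csrcCount = b0 & 0x0f;
  const b1 = view.getUint8(1);

  // Skip the optional CSRC list (4 bytes per entry).
  let offset = 12 + csrcCount * 4;
  if (hasExtension) {
    // Extension header: 16-bit profile id, 16-bit length in 32-bit words.
    if (buf.length < offset + 4) return null;
    offset += 4 + view.getUint16(offset + 2) * 4;
  }

  // With padding set, the last byte holds the number of padding bytes.
  let end = buf.length;
  if (hasPadding) end -= view.getUint8(buf.length - 1);
  if (offset > end) return null;

  return {
    payloadType: b1 & 0x7f,
    marker: (b1 & 0x80) !== 0,
    sequenceNumber: view.getUint16(2), // DataView reads big-endian by default
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payload: buf.subarray(offset, end), // a view, not a copy
  };
}
```

Returning a `subarray` view avoids copying each audio frame, which matters when many streams are parsed on one machine.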

Challenge 03
Concurrent Call Scaling

Handling 100+ simultaneous calls on a single server. Solved through non-blocking I/O patterns and by isolating each call in its own managed context.
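
The per-call isolation described here might look something like the following sketch: each call gets its own context object, a manager enforces a concurrency cap, and a failure inside one call's handler is contained so it cannot affect the others. `CallContext`, `CallManager`, and the cap value are hypothetical, chosen only to show the pattern.

```typescript
interface CallContext {
  readonly id: string;
  readonly phoneNumber: string;
  state: "dialing" | "active" | "ended";
}

class CallManager {
  private readonly active = new Map<string, CallContext>();
  private readonly maxConcurrent: number;
  private nextId = 1;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  get activeCount(): number {
    return this.active.size;
  }

  // Creates an isolated context, or returns null when the cap is reached.
  tryStart(phoneNumber: string): CallContext | null {
    if (this.active.size >= this.maxConcurrent) return null;
    const ctx: CallContext = { id: `call-${this.nextId++}`, phoneNumber, state: "dialing" };
    this.active.set(ctx.id, ctx);
    return ctx;
  }

  // Marks the call as ended and releases its slot.
  end(id: string): boolean {
    const ctx = this.active.get(id);
    if (!ctx) return false;
    ctx.state = "ended";
    return this.active.delete(id);
  }

  // Runs an async handler for one call without blocking other calls.
  // Resolves to false if the call could not be started.
  async run(phoneNumber: string, handler: (ctx: CallContext) => Promise<void>): Promise<boolean> {
    const ctx = this.tryStart(phoneNumber);
    if (!ctx) return false;
    try {
      ctx.state = "active";
      await handler(ctx);
    } catch {
      // An error in one call is swallowed here; real code would log it.
    } finally {
      this.end(ctx.id);
    }
    return true;
  }
}
```

Because `run` only awaits its own handler, many calls can be in flight on a single event loop, and the `finally` block guarantees a slot is released even when a handler throws.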

Challenge 04
Natural Conversation Flow

Making the AI sound natural while handling interruptions, pauses, and unexpected responses. A custom state machine manages conversation context across turns.
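
A conversation state machine of this kind could be sketched as a pure transition function. The states, events, and the rule of ending the call after two consecutive silences are invented for illustration; the project's real states and thresholds are not described in this write-up.

```typescript
type ConversationState = "greeting" | "pitch" | "qualifying" | "closing" | "ended";

type TurnEvent =
  | { kind: "reply"; text: string } // prospect finished speaking
  | { kind: "interrupt" }           // prospect talked over the agent
  | { kind: "silence" }             // no speech within the timeout
  | { kind: "hangup" };

interface Conversation {
  state: ConversationState;
  silenceCount: number; // consecutive silences
  transcript: string[]; // context carried across turns
}

// Order of the happy path; each reply advances one step.
const nextOnReply: Record<Exclude<ConversationState, "ended">, ConversationState> = {
  greeting: "pitch",
  pitch: "qualifying",
  qualifying: "closing",
  closing: "ended",
};

// Pure transition: returns a new conversation, never mutates the input.
function step(c: Conversation, e: TurnEvent): Conversation {
  const current = c.state;
  if (current === "ended") return c;
  switch (e.kind) {
    case "hangup":
      return { ...c, state: "ended" };
    case "interrupt":
      // Barge-in: stay in the same state so the agent can stop and listen.
      return { ...c, silenceCount: 0 };
    case "silence": {
      const silenceCount = c.silenceCount + 1;
      // Assumed rule: give up after two silences in a row.
      return silenceCount >= 2
        ? { ...c, silenceCount, state: "ended" }
        : { ...c, silenceCount };
    }
    case "reply":
      return {
        state: nextOnReply[current],
        silenceCount: 0,
        transcript: [...c.transcript, e.text],
      };
  }
}
```

Keeping the transition function pure means every turn can be replayed from a recorded event list, which helps when debugging why a call ended where it did.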

Tech Stack

Built with precision

TypeScript · Node.js · SIP Protocol · WebRTC · FritzBox API · Local Voice AI · RTP Audio · Custom State Machine

Results

Measurable impact

Per-minute calling cost: $0
Max concurrent calls tested: 100+
External VoIP services required: 0
Deployment model: Local
Monthly SaaS cost: $0
