sagarkava (u/sagarkava)

sagarkava · 6 months ago

Hi HN — I’m launching an open-source WhatsApp AI Voice Agent for phone calls.

Tech stack: It runs on VideoSDK for the SIP gateway, bridging WebRTC ↔ SIP under the hood. For the AI side you can plug in whatever stack you prefer (LLM + STT + TTS). The repo includes example configs.
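
For anyone wondering what "plug in whatever stack you prefer" might look like in code, here is a minimal sketch. It is not the repo's actual API: `SpeechToText`, `LanguageModel`, `TextToSpeech`, and `VoicePipeline` are hypothetical names, and the three stand-in backends exist only so the example runs without any vendor account.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Hypothetical wrapper: any object with the right method can be plugged in."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.stt.transcribe(caller_audio)   # audio -> text
        answer = self.llm.reply(text)              # text -> text
        return self.tts.synthesize(answer)         # text -> audio


# Stand-in backends for local testing (no network, no vendor SDK).
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(stt=EchoSTT(), llm=UpperLLM(), tts=BytesTTS())
print(pipeline.handle_turn(b"hello"))
```

Swapping a vendor would then mean writing one small adapter class per stage and leaving the call flow alone.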

Why open-source? Most WhatsApp/voice AI projects out there are closed or tied to a single vendor. I wanted something people can actually hack on, fork, and extend — whether that’s experimenting with different voices, building domain-specific agents, or integrating with CRMs.

Performance: End-to-end round-trip latency is ~400–600ms in typical setups. With faster STT/TTS backends there’s headroom to improve this.
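
To see where the round-trip time goes in a given setup, one option is to time each stage separately. The sketch below is an assumption about how you might instrument it, not code from the repo: `LatencyBudget` is a hypothetical helper, and the `time.sleep` calls are placeholders for real STT, LLM, and TTS calls, not measurements.

```python
import time
from contextlib import contextmanager


class LatencyBudget:
    """Hypothetical helper that accumulates wall-clock time per named stage."""

    def __init__(self):
        self.stages_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self):
        return sum(self.stages_ms.values())

    def slowest(self):
        # Name of the stage with the largest accumulated time.
        return max(self.stages_ms, key=self.stages_ms.get)


budget = LatencyBudget()
with budget.stage("stt"):
    time.sleep(0.005)   # placeholder for a real STT call
with budget.stage("llm"):
    time.sleep(0.005)   # placeholder for a real LLM call
with budget.stage("tts"):
    time.sleep(0.05)    # placeholder for a real TTS call

print(f"total: {budget.total_ms():.0f} ms, slowest stage: {budget.slowest()}")
```

Per-stage numbers like these show which backend to swap first when chasing lower latency.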

I’d love feedback on use cases you’d actually want to build with this: customer support lines, personal AI assistants, language tutors, appointment scheduling, etc. Curious what directions the HN crowd would push this in.

GitHub Repo: https://github.com/videosdk-community/videosdk-whatsapp-ai-c...

Video demo: https://youtu.be/KWfCWE8S_4U?si=yb5WWr4J4n2dgBm8

sagarkava · 6 months ago

Hi HN, I'm excited to share our new open-source project: an AI voice agent specifically designed for call centers. This project aims to streamline customer interactions and reduce the workload on human agents by automating initial call handling.

Typical uses include managing customer inquiries, handling reservations, or conducting surveys without human intervention, so human agents can focus on the calls that need them.

Key features include:
- Real-time, low-latency voice conversation.
- A cascading pipeline using Deepgram for STT, OpenAI (GPT-4o) for LLM, and ElevenLabs for TTS (customizable).
- Advanced turn detection and voice activity detection (VAD) for smooth, natural conversations.
- Fully open-source and easily customizable.
- Support for Agent2Agent and MCP protocols.
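
As a rough illustration of what VAD and turn detection involve, here is a minimal sketch of an energy-threshold detector with a silence hangover. This is a deliberately simple stand-in and not the detector the project uses; `rms`, `detect_turn_end`, the threshold of 500, and the three-frame hangover are all assumptions chosen for the example.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_turn_end(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,      # assumed energy level separating speech from silence
    hangover_frames: int = 3,      # assumed number of silent frames that ends a turn
) -> Optional[int]:
    """Return the index of the frame where the speaker's turn is judged over.

    A turn ends once speech has been heard and is followed by
    `hangover_frames` consecutive frames below `threshold`.
    Returns None if no turn end is found.
    """
    speaking = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            speaking = True
            silent_run = 0          # speech resumed, reset the silence counter
        elif speaking:
            silent_run += 1
            if silent_run >= hangover_frames:
                return i
    return None


loud = [1000] * 160    # synthetic "speech" frame
quiet = [10] * 160     # synthetic "silence" frame
print(detect_turn_end([quiet, loud, loud, quiet, quiet, quiet, quiet]))
```

The hangover keeps short pauses inside a sentence from being treated as the end of the caller's turn; production systems typically use a trained model instead of a fixed energy threshold.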

Check out the repo:
- AI Voice Agent for Call Center: https://github.com/videosdk-community/ai-voice-agent-for-cal...
- Main framework, VideoSDK Agents: https://github.com/videosdk-live/agents

What use-cases do you envision for this AI voice agent?