Imagine using it to manage customer inquiries, handle reservations, or conduct surveys without human intervention. It's a game-changer for businesses looking to improve efficiency.
Key features include:
- Real-time, low-latency voice conversation.
- A cascading pipeline using Deepgram for STT, OpenAI (GPT-4o) for the LLM, and ElevenLabs for TTS (customizable).
- Advanced turn detection and voice activity detection (VAD) for smooth, natural conversations.
- Fully open-source and easily customizable.
- Support for Agent2Agent and MCP protocols.
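For anyone wondering what VAD and turn detection do at the simplest level, here is a minimal energy-based sketch in Python. It is not the project's implementation, which may well rely on a trained model. The function names, the `threshold` value, and the `hangover_frames` parameter are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_turns(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,   # hypothetical energy threshold
    hangover_frames: int = 3,  # silent frames required to close a turn
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive."""
    frames = list(frames)
    turns: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i          # speech begins
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                # Enough trailing silence: the speaker's turn has ended.
                turns.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-turn; drop any trailing silent frames.
        turns.append((start, len(frames) - silent_run))
    return turns


loud, quiet = [0.5] * 160, [0.0] * 160
example = [quiet, loud, loud, quiet, quiet, quiet, loud, quiet]
turns = detect_turns(example)
```

The hangover window is what keeps short pauses from cutting a speaker off mid-sentence. A larger value feels more patient but adds delay before the agent replies.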
Check out the repo:
AI Voice Agent for Call Center: https://github.com/videosdk-community/ai-voice-agent-for-cal...
Main framework (VideoSDK Agents): https://github.com/videosdk-live/agents
What use-cases do you envision for this AI voice agent?
Tech stack: It runs on VideoSDK for the SIP gateway, bridging WebRTC ↔ SIP under the hood. For the AI side you can plug in whatever stack you prefer (LLM + STT + TTS). The repo includes example configs.
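To illustrate the "plug in whatever stack you prefer" idea, here is a rough Python sketch of a cascading STT → LLM → TTS turn with swappable parts. This is not the VideoSDK Agents API. `SpeechToText`, `LanguageModel`, `TextToSpeech`, `CascadingPipeline`, and the stub classes are hypothetical names, and the stubs stand in for real providers.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class CascadingPipeline:
    """Hypothetical wrapper: any object with the right method can be plugged in."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # caller audio -> text
        answer = self.llm.reply(text)       # text -> agent response
        return self.tts.synthesize(answer)  # response -> audio to play back


# Stub backends so the sketch runs without network access or API keys.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = CascadingPipeline(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
reply_audio = pipeline.handle_turn(b"book a table for two")
```

Swapping a provider then means passing a different object with the same method, without touching the rest of the turn logic.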
Why open-source? Most WhatsApp/voice AI projects out there are closed or tied to a single vendor. I wanted something people can actually hack on, fork, and extend, whether that’s experimenting with different voices, building domain-specific agents, or integrating with CRMs.
Performance: End-to-end round-trip latency is ~400–600 ms in typical setups. Faster STT/TTS backends leave headroom to improve this.
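If you want to see where the time goes in your own setup, a per-stage timer is a reasonable starting point. Below is a minimal Python sketch. `run_timed` and the placeholder stage functions are hypothetical, and the sketch prints no numbers because real timings depend entirely on the backends and network you use.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_timed(stages: List[Stage], payload: Any) -> Tuple[Any, Dict[str, float]]:
    """Run stages in order, recording each stage's wall-clock time in milliseconds."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        t0 = time.perf_counter()
        payload = fn(payload)
        timings[name] = (time.perf_counter() - t0) * 1000.0
    timings["total"] = sum(timings.values())
    return payload, timings


# Placeholder stages; replace these with calls into real STT/LLM/TTS clients.
stages: List[Stage] = [
    ("stt", lambda audio: audio.decode("utf-8")),
    ("llm", lambda text: f"You said: {text}"),
    ("tts", lambda text: text.encode("utf-8")),
]

result, timings = run_timed(stages, b"hello")
```

Note that this only covers processing time inside the pipeline. The round trip a caller experiences also includes network and telephony hops, which need to be measured separately.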
I’d love feedback on use cases you’d actually want to build with this: customer support lines, personal AI assistants, language tutors, appointment scheduling, etc. Curious what directions the HN crowd would push this in.
GitHub Repo: https://github.com/videosdk-community/videosdk-whatsapp-ai-c...
Video demo: https://youtu.be/KWfCWE8S_4U?si=yb5WWr4J4n2dgBm8