How it works
- Implements bog-standard SIP over TCP, so it connects just like a normal desk phone or softphone.
- Streams RTP (mu-law) bidirectionally to OpenAI's Realtime API.
- Handles function calls for external actions (webhooks).
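To make the RTP leg a bit more concrete, here is a minimal sketch of parsing a fixed RTP header (RFC 3550) and decoding a G.711 mu-law payload into 16-bit PCM. It is an illustration and not the production code: the names `RtpPacket`, `parse_rtp`, `decode_pcmu` and `ulaw_to_pcm16` are made up for this example, it assumes the PBX negotiated PCMU (static payload type 0), and it simply rejects packets that use padding or header extensions.

```rust
/// Fields of the fixed RTP header that matter for audio playout.
#[derive(Debug, PartialEq)]
pub struct RtpPacket<'a> {
    pub payload_type: u8,
    pub sequence: u16,
    pub timestamp: u32,
    pub ssrc: u32,
    pub payload: &'a [u8],
}

/// Parse the 12-byte fixed header plus any CSRC entries.
/// Assumption for this sketch: packets with padding or a header
/// extension are rejected instead of being handled.
pub fn parse_rtp(buf: &[u8]) -> Option<RtpPacket<'_>> {
    if buf.len() < 12 {
        return None;
    }
    let version = buf[0] >> 6;
    let has_padding = (buf[0] & 0x20) != 0;
    let has_extension = (buf[0] & 0x10) != 0;
    if version != 2 || has_padding || has_extension {
        return None;
    }
    let csrc_count = (buf[0] & 0x0F) as usize;
    let payload_start = 12 + 4 * csrc_count;
    if buf.len() < payload_start {
        return None;
    }
    Some(RtpPacket {
        payload_type: buf[1] & 0x7F,
        sequence: u16::from_be_bytes([buf[2], buf[3]]),
        timestamp: u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]),
        ssrc: u32::from_be_bytes([buf[8], buf[9], buf[10], buf[11]]),
        payload: &buf[payload_start..],
    })
}

/// Expand one G.711 mu-law byte to a linear 16-bit PCM sample.
pub fn ulaw_to_pcm16(byte: u8) -> i16 {
    let u = !byte; // mu-law bytes are transmitted bit-inverted
    let exponent = (u >> 4) & 0x07;
    let mantissa = (u & 0x0F) as i32;
    let magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    if (u & 0x80) != 0 {
        (-magnitude) as i16
    } else {
        magnitude as i16
    }
}

/// Decode a PCMU payload (payload type 0); anything else yields None.
pub fn decode_pcmu(packet: &RtpPacket<'_>) -> Option<Vec<i16>> {
    if packet.payload_type != 0 {
        return None;
    }
    Some(packet.payload.iter().map(|&b| ulaw_to_pcm16(b)).collect())
}
```

Decoding is only needed if something downstream wants linear PCM; if both ends speak mu-law, the payload bytes can be forwarded untouched.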
Use cases
- After-hours auto-attendant
- Voicemail/intent capture with structured output
- Internal system lookup (CRM, scheduling, ticket creation, etc.) via function calls
- Replaces IVRs with natural conversation
Why?
Most AI voice systems expect you to hand over call routing, use their SIP trunk, or adopt an entirely separate voice stack. Nearly every company already has a SIP PBX, so we thought operating as a normal SIP extension was the simplest integration point.
Tech Stack
- The backend is built in asynchronous Rust.
- We connect to the Realtime API using WebSockets rather than SIP trunking or WebRTC.
- Hosted on a simple AWS EC2 instance.
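As a rough idea of what the WebSocket side carries, the sketch below wraps a chunk of mu-law audio in a JSON text frame. It assumes the Realtime API's `input_audio_buffer.append` client event with a base64 `audio` field and a session configured for G.711 mu-law input; the helper names `base64_encode` and `append_event` are hypothetical, and the base64 encoder is hand-rolled only to keep the example free of external crates. Actually sending the frame needs a WebSocket client, which is left out here.

```rust
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (padded) base64, written out by hand for this sketch.
pub fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        // Short final chunks are padded with '=' characters.
        out.push(if chunk.len() > 1 {
            B64[((n >> 6) & 63) as usize] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            B64[(n & 63) as usize] as char
        } else {
            '='
        });
    }
    out
}

/// Build the JSON text frame for one chunk of mu-law audio.
/// Assumption: the session's input audio format is already set to mu-law.
pub fn append_event(ulaw: &[u8]) -> String {
    // Base64 output never contains characters that need JSON escaping.
    format!(
        "{{\"type\":\"input_audio_buffer.append\",\"audio\":\"{}\"}}",
        base64_encode(ulaw)
    )
}
```

In a real service you would likely reach for a JSON library and a base64 crate instead; the point is only that each audio chunk travels as a small text event, not as raw RTP.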
Limitations / gotchas
- Currently only supports SIP over TCP; TLS support is coming soon.
- There are some NAT traversal assumptions (we behave like a softphone).
- Latency depends on PBX and model RTT and on audio frame sizes (currently seeing ~300ms across most deployments).
- You still need your own OpenAI key. Could be a positive or a negative, depending on how you look at it :)
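To show why frame size matters for latency, here is a back-of-the-envelope sketch. G.711 mu-law is 8 kHz at one byte per sample, so a 20 ms frame is 160 bytes, and each frame adds its own duration in packetization delay. The `LatencyBudget` struct and its simple additive model are assumptions made for illustration, not measurements from our deployments.

```rust
/// G.711 narrowband sample rate; mu-law uses one byte per sample.
const SAMPLE_RATE_HZ: u32 = 8_000;

/// Payload bytes in one mu-law RTP frame of the given duration.
pub fn ulaw_frame_bytes(ptime_ms: u32) -> u32 {
    SAMPLE_RATE_HZ * ptime_ms / 1_000
}

/// Hypothetical additive latency model (all fields are illustrative).
pub struct LatencyBudget {
    /// Audio duration carried by each RTP packet.
    pub ptime_ms: u32,
    /// Number of frames held in the jitter buffer before playout.
    pub jitter_frames: u32,
    /// Round trip between this service and the PBX.
    pub pbx_rtt_ms: u32,
    /// Round trip to the model, including its response time.
    pub model_rtt_ms: u32,
}

impl LatencyBudget {
    /// Packetization delay + jitter buffering + both round trips.
    pub fn estimate_ms(&self) -> u32 {
        self.ptime_ms
            + self.jitter_frames * self.ptime_ms
            + self.pbx_rtt_ms
            + self.model_rtt_ms
    }
}
```

With this model, moving from 20 ms to 40 ms frames while buffering two frames adds 60 ms before any network time is counted, which is why smaller frames tend to feel snappier.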
Link https://leilani.dev