ldenoue (u/ldenoue) - Readit News

ldenoue commented on Show HN: VoiceView – Instant Audio Overviews (web, YouTube, pdf, X articles) voiceview.app/... · Posted by u/ldenoue

ldenoue · 3 days ago

I grew tired of endless YouTube videos, X articles or web articles. So this app lets you open any link and you instantly get an AI summary + brief about the content.

(It's free up to 20 articles because there are real costs: I use Gemini to summarize the pages you open)

AI voices run locally on your iPhone/iPad (web extension version coming soon).

When you find something useful, you can share the overviews online (free hosting), e.g. https://voiceview.app/a/2J49UnwK

Hope this helps cut the noise and helps folks save time.

Laurent

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

grim_io · 3 months ago

You can call it whatever you like, but to me this is deceptive.

Where is the difference between this and Indian support staff pretending to be in your vicinity by telling you about the local weather? Your version is arguably even worse because it can plausibly fool people more competently.

ldenoue · 2 months ago

It doesn't have to be. You can configure your bot to greet the user. E.g. "Aleksandra is not available at the moment, but I'm her AI assistant to help you book a table. How may I help you?"

So you're telling the caller that it is an AI, and yet you can have a pleasant background audio experience.

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

pugio · 3 months ago

Do you have anything written up about how you're doing this? Curious to learn more...

ldenoue · 2 months ago

I don't, but I should open source this code. I was trying to sell it to OEMs though, that's why. Are you interested in licensing it?

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

picardo · 3 months ago

Is AssemblyAI or Deepgram compatible with OpenAI Realtime API, esp. around voice activity detection and turn taking? How do you implement those?

ldenoue · 2 months ago

I am not using speech-to-speech APIs like OpenAI's, but it would be easy to swap the STT + LLM + TTS for Realtime (or the Gemini Live API for that matter).

OpenAI Realtime voices are really bad though, so you can also configure your session to accept AUDIO and output TEXT, and then use any TTS provider (like ElevenLabs or InWorld.ai, my favorite for cost) to generate the audio.
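To make the swap described above concrete, here is a minimal sketch of the two arrangements: a cascaded STT + LLM + TTS pipeline, and an audio-in/text-out model paired with a separate TTS. Every class and method name here is hypothetical and invented for illustration; none of it is the API of OpenAI, Gemini, ElevenLabs or any other provider, and the fake stages only stand in for real network calls.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class AudioToText(Protocol):
    """A speech-capable model run in audio-in, text-out mode (hypothetical interface)."""
    def respond(self, audio: bytes) -> str: ...


@dataclass
class CascadedAgent:
    """STT -> LLM -> TTS, with each stage replaceable on its own."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


@dataclass
class AudioInTextOutAgent:
    """One model replaces STT + LLM; the voice still comes from any TTS."""
    model: AudioToText
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        return self.tts.synthesize(self.model.respond(audio))


# Fake stages so the sketch runs without any network access.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FakeAudioModel:
    def respond(self, audio: bytes) -> str:
        return f"You said: {audio.decode('utf-8')}"


cascaded = CascadedAgent(FakeSTT(), FakeLLM(), FakeTTS())
hybrid = AudioInTextOutAgent(FakeAudioModel(), FakeTTS())
```

Because both agents expose the same `handle_turn` method, the code that relays call audio would not need to know which arrangement is in use.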

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

nextworddev · 3 months ago

Is there a simple, serverless way to deploy the Pipecat stack, without:
- me having to self host on my infra

I just want to provide:
- business logic
- tools
- configuration metadata (e.g. which voice to use)

I don't like Vapi due to 1) extensive GUI driven experience, 2) cost

ldenoue · 2 months ago

Check out something like LayerCode (Cloudflare based).

Or Pipecat Cloud / LiveKit Cloud (I think they charge 1 cent per minute?)

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

nextworddev · 3 months ago

let me get this straight, you are storing convo threads / context in DOs?

e.g. Deepgram (STT) via websocket -> DO -> LLM API -> TTS?

ldenoue · 2 months ago

Yes, DOs let you handle long-lived WebSocket connections. I think this is unique to Cloudflare. AWS or Google Cloud don't seem to offer these things (statefulness, basically).

Same with TTS: some providers, like Deepgram and ElevenLabs, let you stream the LLM text (or chunks per sentence) over their WebSocket API, making your voice AI bot really low latency.
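The "chunks per sentence" idea can be sketched as a small buffer that sits between the LLM token stream and the TTS connection and forwards each sentence as soon as it is complete. This is only an illustration: the function name is made up, and the boundary rule (`.`, `!` or `?` followed by whitespace) is a simplifying assumption that would split incorrectly on abbreviations such as "e.g. this".

```python
import re
from typing import Iterable, Iterator

# Assumed sentence boundary: ., ! or ? followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it in the buffer.
        buffer = parts[-1]
    # Flush whatever is left when the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    llm_tokens = ["Hello", " there", ". How", " can I", " help", "? ", "Bye"]
    for sentence in sentences_from_tokens(llm_tokens):
        # In a real bot this is where the sentence would be sent to the TTS socket.
        print(sentence)
```

The first sentence can be sent to the TTS while the LLM is still generating the rest of the reply, which is where the latency gain would come from.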

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

the_af · 3 months ago

I assume it's to make it seem like an actual call center rather than a scam. I recently got two phone scam attempts (credit card related) that sounded exactly like this.

ldenoue · 3 months ago

I built a voice AI stack and background noise can be really helpful to a restaurant AI for example. Italian background music or cafe background is part of the brand. It’s not meant to make the caller believe this is not a bot but only to make the AI call on brand.

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

kwindla · 3 months ago

One easy way to build voice agents and connect them to Twilio is the Pipecat open source framework. Pipecat supports a wide variety of network transports, including the Twilio MediaStream WebSocket protocol so you don't have to bounce through a SIP server. Here's a getting started doc.[1]

(If you do need SIP, this Asterisk project looks really great.)

Pipecat has 90 or so integrations with all the models/services people use for voice AI these days. NVIDIA, AWS, all the foundation labs, all the voice AI labs, most of the video AI labs, and lots of other people use/contribute to Pipecat. And there's lots of interesting stuff in the ecosystem, like the open source, open data, open training code Smart Turn audio turn detection model [2], and the Pipecat Flows state machine library [3].

[1] - https://docs.pipecat.ai/guides/telephony/twilio-websockets
[2] - https://github.com/pipecat-ai/smart-turn
[3] - https://github.com/pipecat-ai/pipecat-flows/

Disclaimer: I spend a lot of my time working on Pipecat. Also writing about both voice AI in general and Pipecat in particular. For example: https://voiceaiandvoiceagents.com/

ldenoue · 3 months ago

The problem with Pipecat and LiveKit (the 2 major stacks for building voice AI) is the deployment at scale.

That’s why I created a stack entirely in Cloudflare Workers and Durable Objects in JavaScript.

Providers like AssemblyAI and Deepgram now integrate VAD in their realtime API, so our voice AI only needs networking (no CPU anymore).

ldenoue commented on Asterisk AI Voice Agent github.com/hkjarral/Aster... · Posted by u/akrulino

nextworddev · 3 months ago

Can I connect this to Twilio?

ldenoue · 3 months ago

I developed a stack on Cloudflare Workers where latency is super low and it is cheap to run at scale thanks to Cloudflare pricing.

It runs at around 50 cents per hour using AssemblyAI or Deepgram as the STT, Gemini Flash as the LLM and InWorld.ai as the TTS (for me it’s on par with ElevenLabs and super fast).
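For a rough sense of scale, the hourly figure above can be converted to a per-minute rate and a monthly bill. The sketch below is plain arithmetic; the 120 minutes of calls per day and the 30-day month are made-up inputs for illustration, not numbers from the comment.

```python
def cents_per_minute(cents_per_hour: float) -> float:
    """Convert an hourly rate in cents to a per-minute rate in cents."""
    return cents_per_hour / 60


def monthly_cost_dollars(cents_per_hour: float, minutes_per_day: float, days: int = 30) -> float:
    """Total cost in dollars for a given daily call volume."""
    total_minutes = minutes_per_day * days
    return cents_per_hour * total_minutes / 60 / 100


if __name__ == "__main__":
    rate = 50  # cents per hour, the figure quoted in the comment
    print(f"{cents_per_minute(rate):.2f} cents per minute")
    # Hypothetical volume: 120 minutes of calls per day.
    print(f"${monthly_cost_dollars(rate, minutes_per_day=120):.2f} per month")
```

At 50 cents per hour the rate works out to a little under 1 cent per minute for STT, LLM and TTS combined.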