Readit News logoReadit News
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
zestyping · 13 days ago
The primary purpose of generating real-time video of realistic-looking talking people is deception. The explicit goal is to make people believe that they're talking to a real person when they aren't.

It's on you to identify the "immense" benefits that outweigh that explicit goal. What are they?

lcolucci · 2 days ago
I don't think that's the primary purpose of realistic interactive avatars, any more than deception is the purpose of CGI. Deception requires intent to mislead — if users know they're talking to an avatar, it's not deception no matter how realistic. Just as moviegoers aren't "deceived" by CGI. It's an experience they opt into.

As for benefits: language learning with avatars, scalable corporate training, accessible education for kids, personalized coaching, and certainly entertainment, which has real value too.

lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
beast200 · 12 days ago
That's really impressive!
lcolucci · 12 days ago
Thank you!
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
zestyping · 13 days ago
When you generate real-time video of realistic-looking talking characters, the definition of success is fooling people into believing they are talking to a real person when they aren't.

If you pursue this, your explicit goal is deception, and it's a massively harmful kind of deception. I don't see how you can claim to be operating ethically here if that's your goal.

lcolucci · 12 days ago
Do you think the same about text that is indistinguishable from human-written text (LLM chatbots)? Or voice that is indistinguishable from a human talking?

Illegal things, like fraud and impersonation, are illegal. There's a difference between the tool and the actions people do with the tool.

There are tons of useful applications of interactive avatars - from corporate training to kids education to language learning and more. Plus, why would you want to stop this little guy from existing in the world? :) https://lemonslice.com/try/alien

lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
jamesdelaneyie · 14 days ago
I didn't know /imagine could be followed by a prompt, but similarly I asked the avatar about it's appearance and stated it had none. Should probably give it the context of what it's appearance is like, same thing happened for questions like where are you? What are you holding? Who's that behind you? etc etc
lcolucci · 14 days ago
This is so obvious now that you say it (* facepalm *). We definitely need to give the LLM context on the appearance (both from the initial image as well as any /imagine updates during the call). Thanks for pointing it out!
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
sbarre · 14 days ago
That.. is not Max Headroom.
lcolucci · 14 days ago
Can you help us make him? What's the right voice? https://lemonslice.com/hn
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
bn-l · 14 days ago
I wish I could invest in this company. Really. This is the most exciting revenue opportunity I’ve seen during this recent AI hype cycle.
lcolucci · 14 days ago
That's super nice of you to say. Thank you!
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
armcat · 14 days ago
This is so awesome, well done LemonSlice team! Super interesting on the ASR->LLM->TTS pipeline, and I agree, you can make it super fast (I did something myself as a 2-hour hobby project: https://github.com/acatovic/ova). I've been following full-duplex models as well and so far couldn't get even PersonaPlex to run properly (without choppiness/latency), but have you peeps tried Sesame, e.g. https://app.sesame.com/?

I played around with your avatars and one thing that it lacks is that it's "not patient", it's rushing the user, so maybe something to try and finetune there? Great work overall!

lcolucci · 14 days ago
This is good feedback thanks! The "not patient" feeling probably comes from our VAD being set to "eager mode" so that the latency is better. VAD (i.e. deciding when the human has actually stopped talking) is a tough problem in all of voice AI. It basically adds latency to whatever your pipeline's base latency is. Speech2Speech models are better at this.
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
echelon · 14 days ago
The dichotomy of AI haters and AI dreamers is wild.

OP, I think this is the coolest thing ever. Keep going.

Naysayers have some points, but nearly every major disruptive technology has had downsides that have been abused. (Cars can be used for armed robbery. Steak knives can be used to murder people. Computers can be used for hacking.)

The upsides of tech typically far outweigh the downsides. If a tech is all downsides, then the government just bans it. If computers were so bad, only government labs and facilities would have them.

I get the value in calling out potential dangers, but if we do this we'll wind up with the 70 years where we didn't build nuclear reactors because we were too afraid. As it turns out, the dangers are actually negligible. We spent too much time imagining what would go wrong, and the world is now worse for it.

The benefits of this are far more immense.

While the world needs people who look at the bad in things, we need far more people who dream of the good. Listen to the critiques, allow it to aid in your safety measures, but don't listen to anyone who says the tech is 100% bad and should be stopped. That's anti nuclear rhetoric, and it's just not true.

Keep going!

lcolucci · 14 days ago
Well put - and thanks, we'll keep building. Still chasing this level of magic: https://youtu.be/gL5PgvFvi8A?si=I__VSDqkXBdBTVvB&t=173 Not to mention language tutors, training experiences, and more.
lcolucci commented on Show HN: LemonSlice – Upgrade your voice agents to real-time video    · Posted by u/lcolucci
leetrout · 14 days ago
Quick feedback if you're still monitoring the thread:

I did /imagine cheeseburger and /imagine a fire extinguisher and both were correctly generated but the agent has no context. when I ask what they are holding in both cases they ramble about not holding anything and referencing lemons and lemon trees.

I expected it to retain the context as the chat continues. If I ask it what it imagined it just tells me I can use /imagine.

lcolucci · 14 days ago
Good idea. We need to do that. I'm also excited to push the /imagine stuff further and have B-roll interspersed with the talking (like a documentary) or even follow the character around as they move (like a video game)

u/lcolucci

KarmaCake day273June 16, 2021View Original