IMO, this announcement is far less significant than people make it out to be. The feature has been available as a private beta for a good few months, and as a public beta (with a waitlist) for the last few weeks. Most of the blind people I know (including myself) already have access and are pretty familiar with it by now.
I don't think this will replace human volunteers for now, but it's definitely a tool that can augment them. I've used the volunteer side of Be My Eyes quite a few times, but I only resort to that solution when I have no other option. Bothering a random human multiple times a day with my problems really doesn't feel like something I want to do. There are situations where you either don't need 100% certainty or know roughly what to expect and can catch hallucinations yourself. For example, when you have a few boxes that look exactly the same and you know exactly what they contain but not which box is which, Be My AI is a good solution. If it answers your question, that's great; if it hallucinates, you know that your box can only be one of a few things, so you'll probably catch that. Another interesting use case is random pictures shared to a group or Slack channel; it's good enough to let you distinguish between funny memes and screenshots of important announcements that merit further human attention, and perhaps a request for alt text.
This isn't a perfect tool for sure, but it's definitely pretty helpful if you know how to use it right. All these anti-AI sentiments are really unwarranted in this case IMO.
I've written more here https://dragonscave.space/@miki/111018682169530098
> Bothering a random human multiple times a day with my problems really doesn't feel like something I want to do.
Please don't feel like you're bothering us. I've had this app for years and absolutely cherished the few calls I've gotten. I get really bummed if I miss a call.
The people who sign up to help (such as myself) want to help. Honestly, my frustration is that I don’t get asked enough.
Have you ever needed to do something like make an appointment but kept putting it off because you just really didn't want to talk on the phone?
That can happen to anyone. Some blind people are introverts and don't want to talk to random strangers all the time.
Also, while the vast majority of volunteers have the best intentions and try hard to be helpful, you never know what you're going to get. Some are way too chatty, some offer unsolicited advice.
OpenAI announced from day 1 that GPT-4 is multimodal, so this was mostly a matter of waiting for safety censorship and for enough GPUs to be available for mass rollout.
This won't entirely replace human volunteers, but these models get rapidly better over time. What you are seeing today is a mere toy compared to the multimodal models you'll get in the future.
Currently there's no model trained on videos, due to the large size of video data, but in the future there will be video-capable models, which means they can understand and interpret motion and physics. Put that in smart glasses, and it can act as live eyes to navigate a busy street. Granted, it will take years to bring the costs down enough to make that viable.
We had enough drama with BeMyAI refusing to recognize faces (including faces of famous people) as it were. If sighted people have the right to access porn and sexting, why shouldn't we? Who should dictate what content is "appropriate", and what about cultures with different opinions on the subject?
>> in the future there will be video-capable models, which means they can understand and interpret motion and physics.
Videos may not suffice. Videos are 2d, with 3d aspects being inferred from that 2d data, which is an issue for autonomous driving based on cameras. A better basis for AI training would be 3d scans rather than videos. The best data set would be a combination of video and 3d scanning. Self-driving cars, which might combine video with radar/laser scanning, may one day provide such a data set.
There is talk of a 3d version of Google Street View, one using a pair of cameras to allow true VR viewing. That might also be good training data, as it will capture, in 3d, many street scenes as they unfold.
Just wanted to say, I'm not telling you what you should do, but I and others I know don't feel at all like you are 'bothering a random human multiple times a day'. I personally feel extremely lucky when I get a 'call' and am able to help.
Not much else regarding your comment, just wanted to let you know that I have had very satisfying 'calls', and I am always looking forward to the next one and always happy when I get one.
Almost all blind people wear dark glasses. So, instead of the phone doing the camera's work, what about a tiny Google-Glass-esque snap-on to the blind person's glasses that feeds to the phone for processing?
A few companies are trying it. Last weekend, I tried one, and it was not yet polished, but good enough to start. It recognized me (my friend already has me in his phone contacts).
I’m trying to help a friend with his non-profit initiative, bringing the cost to about a third or even a fourth of Google Glass[1]. After reading the article and the other First Impressions with GPT-4V(ision)[2], it is apparent that this can be done much more simply, and soon enough. The rumor about Ive[3] and Sam[4] talking about AI in hardware has already given me some good hope.
If anyone else does AI-enabled hardware assistance for blind people and has devices that will be less than $500 apiece in retail, I’d love to talk and introduce them to the right people.
1. https://en.wikipedia.org/wiki/Google_Glass
2. https://news.ycombinator.com/item?id=37673409
3. https://en.wikipedia.org/wiki/Jony_Ive
4. https://en.wikipedia.org/wiki/Sam_Altman
I'm the Founder & CTO of Envision. We're building EXACTLY the product you're describing. Envision Glasses is a bunch of computer vision tools built on top of the Google Glass Enterprise Edition 2. We have more than 2,000 visually impaired people across the world using the Glasses in more than 30 different languages.
You can check out more information here: https://www.letsenvision.com/glasses
P.S.: I know the EE2 has been discontinued, but we've been working closely with Google to ensure current and future demand is met. We're also experimenting a lot with other exciting off-the-shelf glasses that I can't talk about here, but I'm super excited for this whole glasses + AI space!
It was one of the best apps out there for (instant) text recognition, and I was pretty happy to pay for it, but since it went free, it's really not the same. There's nothing else like the old Envision out there.
Also, not an issue for me personally, but dropping support for Cyrillic in the middle of a brutal war in Ukraine, in an automatic update released with no prior warning, was an asshole move if I ever saw one.
The main problem with the glasses, in my view, is the cost. It means those of us in developing countries cannot afford them, meaning they're only available to a select few. This seems to be common for a lot of assistive technology solutions, even software.
I think without a major consumer product as a platform, it will never reach the number of people it wants to help. As an example, take what the iPhone did to the market of braille note takers.
It's still not glasses-integrated, but the phone version has been out for a while ( https://www.microsoft.com/en-us/ai/seeing-ai ) - previously discussed when it was released: https://news.ycombinator.com/item?id=14774167 (6 years ago, 133 comments)
I have been beta-testing BeMyAI for roughly a month. It is a milestone in independence. It typically describes scenes in a very useful way, allows me to ask detail questions, can compare differences in pictures, has very good OCR and can also translate foreign languages. Just a few examples:
* It can help with reading a menu. But not just straight linear reading... I told it I prefer veggie today, but would also eat something light with meat. So it highlighted the veggie options for me, and in one case even tried to guess the type of dish from a photo which lacked a text description.
* I had it search for cobwebs on a ceiling. Vacuumed them away, and took a second picture, asking if they were gone now.
* It told me a houseplant of mine has yellow leaves, and told me which ones by describing a path along the stems.
In general, the scene description is very good, and the built-in (implicit) OCR is just a game changer. It is like having a real human reading something to you. Typically, you don't want the human to read all the text on the page; you are only interested in some detail, and usually instruct the helper about that so they don't have to read everything to you. The same now works with an AI, with consistent quality.
Volunteers are great, don't get me wrong, but this comes with a lot of problems. You sometimes have to request help several times, because frankly, sometimes you simply get people who cannot help you due to their own abilities. Also, the video feed is demanding on connectivity. If I try to read a menu on the train with volunteers, most of the time this will fail due to "blurry vision" and the camera dropping out. Sending a single picture every few minutes is totally OK in these situations.
> Be My AI is perfect for all those circumstances when you want a quick solution or you don’t feel like talking to another person to get visual assistance.
It will presumably be used in low-risk situations.
Well, I'd say it is about as dangerous as trusting a random stranger is. Or do you seriously believe blind people have never been trolled by fellow humans?
Please don't project the typical AI-angst onto this assistive technology. It doesn't deserve this, especially from people without hands-on experience.
That is exactly the situation where I would NOT trust AI and I feel like the app would refuse to do it too.
I don't know about other countries, but here in Switzerland I've had the app for five years and got only 4 requests for help during that time, which made me think that there were way more helpers than help-seekers. But I suppose they wouldn't be adding AI if that was the case.
Of course, they explicitly push responsibility to the user by saying things like:
> No. Do not use Be My AI for scanning medicines, reading dosages, or other safety issues
God forbid that we acknowledge this tech is fundamentally flawed. There’s money to be made!
Why shouldn't they push responsibility to the user? Blindness isn't a mental handicap; their users are rational adults who are far more familiar with the risks of being unable to see than you are. Why shouldn't they have access to one more tool that they can make a judgement call on when to use? And what's wrong with the provider of the tool giving them guidance on when might be a good time to ask a human instead?
I suspect most of the anger in this thread is just the usual anti-AI stance, but it honestly feels extremely patronizing in this context.
AI right now ain't no god, but the gap between AI mistakes and human mistakes isn't that wide.
I don't believe that Be My Eyes is a for-profit entity.
I've been a Be My Eyes volunteer now for a couple of years. I've helped about 5 people and it's been very fulfilling. I jump with delight when I get a call because they are rare (thanks to a lot of volunteers) and the people I help are always so kind.
Two things I've helped with:
Wrapping a gift by helping them orientate the gift correctly (I don't even do this!)
Preparing a meal by reading ingredients / labels.
As a user, my biggest concern is that it won’t describe certain content, such as adult content or something that might be considered offensive. I feel like I should be able to have full control over the output.
On the other hand, I’d like to be able to test its ability to describe graphics to me. If it’s able to turn graphics into an accessible table I can browse with my screen reader, that would be revolutionary.
Reminds me of when I was a young little teenage shithead with shithead friends who discovered the text-to-talk support provided for deaf landline telephone use. You would go to a website and enter a number you would like to call. Then a human operator would read out what you typed to the person you called.
As we quickly learned, the operator would say anything. Anything. So for a few days we would call each other with the operator and never were able to find the limit. I have great respect now; those operators were inhumanly stone-faced. And I respect that the system was perfectly transparent. Nothing typed was hidden or obfuscated.
There was a programme the other day about this on BBC Radio 4. Unless they’ve resolved the problem already, it will refuse to describe any scene with faces in it, for privacy reasons. Which obviously rules out a lot of useful use-cases. You’re in a cafe wondering what’s going on; Be My AI can’t help if there’s a face. I think it was related to some EU legislation, which Be My Eyes are now at least trying to change for this use-case. I wish them the best of luck.
Update: Just tried this again, and it looks like they've loosened the restrictions. It will now describe people again. Yay! It just won't try to identify who someone is.
Original comment: Yeah, this wasn't always the case, until recently. People were using it to describe their kids, spouses, etc. Pissed a lot of folks off when they disabled it. I never even thought of using it for that, so now that I realize I could have, it kinda' pisses me off as well. Honestly, I never cared too much what they looked like, but it would have been interesting to hear the ChatGPT viewpoint. :)
But definitely not something I could have asked a fellow human about without it seeming really weird, and being confident I wouldn't get a biased answer. Though it's probably unrealistic to expect that the AI wouldn't also give a biased answer.
I don't see the point of this. This app has a massively disproportionate number of helpers versus those needing assistance. Personally, I've had the app installed for 3 years and never once was asked to assist anyone. Why take the risk of AI providing false info and injuring someone when there are willing and able humans at the ready?
Are there a large number of users that feel like they are wasting volunteer time with menial tasks?
Strictly speaking, just because the app has excess helpers doesn't necessarily mean the visually impaired users wouldn't like assistance more often, just that they wouldn't bother others about it.
Probably plenty of times you might want to ask a question specifically of something as impersonal as a machine, e.g.: how does this itchy spot I've had look?
Considering how often I’ve seen the complaint from your parent post, it’s quite clear people don’t mind. Quite the opposite, they’d embrace the opportunity. Maybe the people who need assistance don’t realise that, but again, that complaint is quite common. I’d like to help but never signed up specifically because of that surplus.
So they had a solution based on humans who are eager to help and are replacing it with an automated system which, when mistaken, can have disastrous results and cause personal injury. Seems odd to me. A humanised approach is often seen as a positive, and this cuts it out without necessity.
All that said, I don’t have any insider information. Perhaps the people who need assistance do prefer talking to a machine.
Thanks for being a volunteer. But please don't judge blind users if you apparently cannot put yourself in their shoes...
Yes, one reason is that some blind users don't want to waste volunteer time. Another reason is that volunteers are different, but the performance of the image AI is predictable. Another reason is that the AI OCR is fast, can also translate, and, surprise, the text is easier to handle later, for copy&paste.
Besides, the performance of the AI describing pictures is, sorry to say, a little bit above what the typical human is willing or capable of doing. IOW, some humans perform worse than the AI. Also, camera access is different from picture taking. I use volunteers when I need more interactive help, but I totally prefer the AI when I just need a single pic.
I was a sighted volunteer on a call with a fellow using a treadmill touchscreen. He already knew the menu flow, but the UX was dynamic, it wasn't his own screen, and he couldn't put locator dots on it (Lesson to all designers! Hardware can have physical buttons!). Our interaction was mostly him stating his goal, me determining the screen's starting state and then where the UI elements were, and feeding back his finger position like "a little left ... no, too far, now up a little ... ok, hit it."
I think we can imagine an AI could describe the screen, and even find non-language visual elements if asked explicitly, like arrows or turtle vs hare icons etc. But is it ready to have shared context of how people need to interact with that UI?
I've had it for five years and have had 8 or 9 calls, so you're right about it being very infrequent. You need to be mindful, at least on Android, that as the application is infrequently used, the OS may remove permissions. This feature can be disabled for Be My Eyes specifically.
As for menial tasks, I could definitely see people wanting to use this instead of calling a stranger for more personal matters -- at least initially.
Thanks for being a volunteer. For me as a blind person, there still is that sense that I'm bothering someone when I ask for human assistance. I realize that this is not really rational, given how much people seem to appreciate the opportunity to help, but it takes some effort to overcome this in my own brain regardless.
Not to mention I imagine the carbon cost of pushing this through AI is much higher than just using humans.
No, this is genuinely a useful technology. Blind people are getting a ton of value out of it. It's way faster than humans, you can do it quietly without bothering anyone, and honestly it's better at describing some things than 90% of humans are.
There are a lot of things that are just tech demos. This is far more than that.
You apparently have no personal experience with needing help. Can you please take your AI-angst somewhere else, and leave the best innovation that assistive technology has ever made for those who actually know what they are talking about to discuss?
This sort of dismissive comment is very anti-social and full of hidden hatred. It projects your quarrels with a company onto people who really need the help provided.
And before you click, I am that pissed because I am blind. You have absolutely no idea what that means, and what BeMyEyes and BeMyAI did for us. Just go home and hate someone else please.
I've answered maybe 10 Be My Eyes calls over the last couple of years. I can see some value with AI describing labels or food items; however, most of the calls I've answered are more nuanced. Unusually, I've answered two calls this week. The first one was looking at photos of a hotel to help decide if it met the requirements of the person. The second call was helping someone perform a blood sugar test; I had to tell the person when I thought the drop of blood on the tip of their finger was large enough to test, and read the result off the tester. Neither of these are candidates for AI, but let the users be the ultimate judge; I am continually impressed by the ingenuity and resourcefulness of the people I have interacted with.
This isn't trying to replace human volunteers, but complement them.
I know blind people who are making just as many human Be My Eyes calls as they were before, but they're using Be My AI even more, for things where they wouldn't have even bothered to use the service at all before - because the AI is so fast and convenient.
I am surprised by your assessment that these are not tasks for the AI. Well, the first one is troublesome, but judging the shape of a drop of liquid according to a well-established procedure sounds quite on par.
Maybe, but it was on the tip of a finger in a moving image, changing size as the person squeezed their finger. Also, something that I never previously considered: blind people tend to have no/little artificial light on when alone. Luckily, the app allows the person providing the assistance to turn on the flash on the other person's phone.