I did this at university.
It was our first comp sci class ever, and we were given Raspberry Pis. We had no coding experience or guidance, and were asked to create "something". All we had to work with was information on how to communicate with the Pi using PuTTY. Oddly, this assignment didn't require us to submit code, but simply to demonstrate it working.
My group (3 of us) bought a moisture sensor to plug into the Pi, and had the idea to make a "flood detection system" that would be housed under a bridge and would send an email to relevant people when the bridge home from work was about to flood.
So for our demonstration, we had a guy in the back of the class with gmail open ready to send an email saying some variation of "flood warning".
Our script was literally just printing lines with wait statements in between.
Running the script, it printed "awaiting moisture" to the screen, and after 3 seconds it printed "moisture detected". In those 3 seconds I dipped the sensor into the glass of water. Then the script waited a few more seconds before printing "sending email to xxx@yyy.com". We then opened up our email, our mate at the back of the room hit send, an email appeared saying "flood warning", and we got full marks.
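For the curious, the whole thing was a minimal sketch along these lines (reconstructed from memory, timings approximate):

    # The entire "flood detection system": no sensor I/O at all, just prints
    # and sleeps timed to match the live dipping of the sensor.
    import time

    print("awaiting moisture")
    time.sleep(3)    # the window in which the sensor gets dipped into the glass
    print("moisture detected")
    time.sleep(5)    # a few more seconds before the "email" goes out
    print("sending email to xxx@yyy.com")    # the real email was sent by hand from the back row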
Related, I work with industrial control systems. We’d call this “smoke and mirrors”. Sometimes the client would insist on seeing a small portion of a large project working well before it was ready. They’d misunderstand that 90% of the bulk of the work is not visible to the user but they’d want to see a finished state.
We’d set up a dummy HMI and have someone pressing buttons on it for the demo, and someone in the next room manually driving outputs and inputs to make it seem like it was working. Very common.
To me it sounds like a useful thing, for communicating your vision and getting early feedback on whether you are building the right thing. Like uh, a live action blueprint. Let's say the client is like "Email???? That's no good. It needs to be a text message". Now you have saved yourself the trouble of implementing email.
I did this as well. I worked on a localized navigation system back when I was in school, and unfortunately we broke all the GPS receivers we had over the course of the project; that particular model of RS-232 GPS module was really fragile. As a result we couldn't actually demonstrate live navigation (and it was incomplete anyway). We proceeded to finish the GUI nevertheless, and then pretended that this was what you'd see during navigation, but never actually ran the navigation code. For the curious, it was an extracurricular activity and didn't affect GPA or anything, but I remain kinda uneasy about it.
This can depend a lot on the context, which we don't have a lot of.
Looking at this a different way: they gave first-year students, likely with no established prerequisites, an open-ended project with fixed hardware but no expectation to submit the final project for review. If they wanted to verify that the students actually developed a working program, they could easily have asked for the Pis to be returned along with the source code.
A project like this was likely intended to get the students to think about the "what" and not worry so much about the "how." Faking it entirely may have gone a bit further than intended, but it would still meet the goal of getting the students to think about what they could do with this computer (if they knew how).
While university instructors can vastly underestimate students' creativity, they are, generally speaking, not stupid. At the very least, they know that if you don't tell students to submit their work, you can often count on them doing as little as possible.
I'd call it cheating too but yeah. I like the pi and sensors though. Sounds like the start of something cool. Wish I could get a product like this to put in my roof to detect leaks. That would be useful.
It depends on what the intention of the assignment was. If it was primarily to help the students understand what these devices _could_ be used for, then it's fine. If it was to have them actually do it, well, then the professor should have at least tried to verify that. Given that it's for first-years who have no experience programming, it really could be either.
There’s a story about sales guys at a company (NewTek?) who faked a demo at CES of an Amiga 500 with two monitors showing the “Boing” ball bouncing from one screen to the next. This was insane because the Amiga didn’t have support for multiple monitors in hardware or software so nobody could figure out how they did it. Turns out they had another Amiga hidden behind running the same animation on the second monitor. When they started them at the right offset it looked believable.
My version of this involved a Wii remote: freshman-level CompSci class, and the group had to build a simple game in Python to be displayed at a showcase among the class. We wrote a Space Invaders clone. I found a Bluetooth driver that allowed your Wiimote to connect to your Mac as a game controller, so I set up a basic left/right tilt control using a Wiimote for our Space Invaders clone.
The Wiimote connection was the star of the show by a long shot :P
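I don't have the original code anymore, so purely as a hedged sketch of the idea: once the driver exposes the Wiimote as a regular joystick, the tilt control is just polling an axis, e.g. with pygame (the device index and thresholds here are assumptions):

    # Hedged sketch: read left/right tilt from a controller the OS already
    # exposes as a joystick (e.g. a Wiimote paired over Bluetooth).
    import pygame

    pygame.init()
    pygame.joystick.init()
    stick = pygame.joystick.Joystick(0)   # assumes the Wiimote shows up as device 0
    stick.init()
    clock = pygame.time.Clock()

    ship_x = 320
    while True:
        pygame.event.pump()               # refresh controller state
        tilt = stick.get_axis(0)          # roughly -1.0 (tilt left) .. 1.0 (tilt right)
        if tilt < -0.3:
            ship_x -= 5                   # move the ship left
        elif tilt > 0.3:
            ship_x += 5                   # move the ship right
        clock.tick(60)                    # poll at ~60 Hz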
This is so crazy. Google invented transformers, which are the basis for all these models. How do they keep fumbling like this over and over? Google Docs created in 2006! Microsoft is eating their lunch. Google created the ability to change VMs in place and made a fully automated datacenter. Amazon and Microsoft are killing them in the cloud. Google has been working on self driving longer than anyone. Tesla is catching up and will most likely beat them.
I was at MS in September 2008, and internally they already had a very beautiful and well-functioning web Office (named differently; I forgot the name, but it wasn't SharePoint if I recall correctly; I think it had something to do with expense reports?) that would put Google Docs to shame today. They just didn't want to cannibalize their own product.
Don't forget that McAfee was delivering virus scanning in a browser in 1998 with ActiveX support, TinyMCE was doing full WYSIWYG content editing in the browser by 2004, and Google Docs was released in 2006 on top of a huge ecosystem of document solutions and even some real-time co-authoring document writing platforms.
2008 is late to the party for a Docs competitor! Google gave Microsoft the runaround: after Google launched Docs, they could have clobbered Microsoft, which kind of failed to respond properly in kind. But they didn't push the platform hard enough to eat the corporate market share, didn't follow up with a SharePoint alternative that would appeal to the enterprise, and kind of blew the opportunity, IMO.
I mean, to this day Google Docs is free, but it still hasn't unseated Word in the marketplace. The real killer app that keeps Office on top is Excel, which some companies have built their entire tooling around.
It’s crazy interesting to look back and realize how many twists there were leading us to where we are today.
Btw, it was Office Server or SharePoint Portal earlier (this is like FrontPage days, so around 2001?), and Microsoft called it Tahoe internally. I don't think it became SharePoint until Office 365 launched.
The XMLHTTP object shipped with IE5 back in 1999 and was part of the DHTML wave. That gave browsers a LOT of the capabilities we currently see as browser-based word processing, but there were efforts with proprietary extensions going back before that; they just didn't get broad support or become standards. I saw some crazy stuff at SGI in the late 90s when I was working on their Visual Workstation series launch.
NetDocs was an effort in 2000/2001 that is sometimes characterized as a web productivity suite. There was an internal battle between the Netdocs and Office groups, and Office won.
A product requires commitment, it requires grind. That 10% is the most critical one, and Google persistently refuses to push products across the finish line, just giving up on them and adding to the infamous Google Product Graveyard.
Honestly, what is the point? They could just maintain the core search/ads and not pay billions of dollars for tens of thousands of expensive engineers who have to go through a bullshit interview process and achieve nothing.
If they tried to focus on ads, then they wouldn’t have the talent to support the business. They probably don’t need 17 chat apps - but they can’t start saying no without having other problems.
While it is crazy, it's not too surprising. Google has become as notorious for product ineptitude as they have been for technical prowess. Dominating the fundamental research for GenAI but face planting on the resulting consumer products is right in line with the company that built Stadia, GMail/Inbox, and 17 different chat apps.
The tech was based on an acquired company; Google just abused their search monopoly to make it more popular (same thing they did with YT). This has been the strategy for every service they've ever made. Google really hasn't launched a decent in-house product since Gmail, and even that was grown using their search monopoly as free advertising.
>Google Docs originated from Writely, a web-based word processor created by the software company Upstartle and launched in August 2005
This is like critiquing Disney for putting out garbage and then defending them because dummies keep giving them money regardless of quality. Having standards and expectations of greatness is a good thing and the last thing you want is for mediocrity to become acceptable in society.
> People are forgetting Google is the most profitable AI company in the world right now. All of their products use ML and AI.
> So who is losing?
The people who use their products, which are worse than they’ve been in decades? The people who make the content Google now displays without attribution on search results?
I agreed until the last bit. Waymo is making continuous progress and is years ahead of everyone else. Tesla is not catching up and won't beat anyone. Tesla plateaued years ago and has no clue how to improve further. Their Partial Self Driving app has never been anywhere near reliable.
Google Workspace works through resellers; they train fewer people, and those people provide the customer support instead. IMO Google's bad reputation comes from their public customer support.
You mean 1990s? I don't think Word and Excel even existed until the late 80s, and nobody[0] used them until Windows 3.1.
[0] yes, not literally nobody. I know about the Windows 2.0 Excel or whatever, but the user base compared to WordPerfect or 1-2-3 was tiny up until MS was able to start driving them out by leveraging Windows in the early-mid 90s.
It's reassuring that being the biggest tech company doesn't automatically make the best tech. If it were guaranteed that Google's resources would automatically trump any startup in the AI field, that would imply guaranteed dominance by incumbents and a consolidation of power in the AI space.
Isn't it always easier to learn from others' mistakes?
Google has the problem that it's typically the first to encounter a problem, and it has the resources to approach it (from search) and the incentive to monetize it (to get away from depending entirely on search revenue). And, management.
I don't know if that really excuses Google in this case because it's a productization problem. Google never tried to release a ChatGPT competitor until after OpenAI had. OpenAI has been wildly successful as the first mover, despite having to blaze some new product trails. Even after months of watching them and with near-infinite resources, Google is still struggling to catch up.
Considering the number of messaging apps they tried to launch, if there's at least one thing that can be concluded, it's that it isn't easier for them to learn from their own mistakes.
None of that matters. They'll still make heaps of profit long into the future unless someone beats them in Search or Ads.
AI is a threat there, but it'd require an AI company to transform the culture of Internet use to stop people 'Googling', and that will require two things: something significantly better than Google Search that's worth switching to, and a company that is willing to reject whatever offer Google makes to buy it. Neither is very likely.
I would love to see internal data on search volume at Google. Depending on how you interpret them, ChatGPT can meet both of your requirements. Personally, I still mostly search instead of using ChatGPT, but I have seen other users reach for ChatGPT more and more.
Also "interesting" to see whether search results filled with AI-generated SEO spam will keep search viable.
Nadella is as much of an MBA type as Pichai. Their education and career paths are incredibly similar.
The difference is Nadella is a good CEO and Pichai isn’t.
Part of it could also be a result of circumstance. Nadella came at a time when MS was foundering, and he had to make what appeared to be fairly obvious decisions (the pivot to cloud, which he was literally picked for, and reducing dependence on Windows, an obvious necessary step for that pivot). Pichai, OTOH, was selected to run Google when it was already doing pretty well. His biggest mandate was likely to not upset the apple cart.
If the roles were reversed, I suspect Nadella would still have been more successful than Pichai, but you never know. If Nadella's introduction to the CEO job had been to keep things going as they were, and Pichai's had been to change the entire direction of the company, maybe a decade later Pichai would have been the aggressive decision maker, whereas Nadella would have been the overly cautious guy making canned demos.
>>Google Docs created in 2006! Microsoft is eating their lunch.
Of all the things, this.
I use both Google and Microsoft office products. One thing that strikes you is just how feature-rich the Microsoft products are.
Google doesn't look like it is serious about making money.
I squarely blame rockstar product managers and OKRs for this. Not everything can be a 1000% profitable product built in the next quarter. A lot of things require small continuous improvement and care over years.
Microsoft’s killer product is Excel. I didn’t realize how powerful it was until I saw an expert use it. There are entire billion dollar organisations that would collapse without Excel.
Engineer-driven company. Not enough top-down direction on the products. Too much self-perceived moral high ground. But lately they've been changing this.
Under Eric Schmidt they were engineer-driven, during the golden era of the 2000s. Nowadays they're MBA driven, which is why they had 4 different messaging apps from different product managers.
My engineer friends who work at Google would strongly disagree with this assertion. I keep hearing about all sorts of hijinks initiated by senior PMs and managers trying to build their fiefdoms.
> Google has been working on self driving longer than anyone. Tesla is catching up and will most likely beat them.
I agree with your general post but I disagree with this. Tesla's FSD is so far behind Google it's almost negligent on the part of Tesla despite having so much more data.
I can tell you exactly why. It's because they have a separate VP and org for each of these products, like Search, Maps, etc. None of them talk to each other, and they all compete for promotions. There is no single shot-caller. Same thing with GCP. Google does not know products.
A lot of companies have this structure. You have the Doritos line and the Pepsi line, for example; maybe you find some common synergies, but it's not unusual.
Errr, sorry, what's the innovation of Google Docs exactly? Being able to write simultaneously with somebody else? OK, so this is what it takes for a top-notch docs app to exist? Microsoft has been developing this product for ages; Google tried to steal the show despite having little to no experience in producing and marketing office apps…
Besides, collaborative editing is a non-feature, and there is much more important stuff than this for a word processor to be useful.
Is paid MS Teams more or less common than paid GSuite? It's hard to find stats on this. GSuite is the better product IMO, but MS has a stronger B2B reputation, and anecdotally I hear more about people using Teams.
The whole Gemini webpage and contents felt weird to me, it's in the uncanny valley of trying to look and feel like an Apple marketing piece. The hyperbolic language, surgically precise ethnic/gender diversity, unnecessary animations and the sales pitch from the CEO felt like a small player in the field trying to pass as a big one.
It's funny because now the OpenAI keynote feels like it's emulating the Google keynotes from 5 years ago.
Google Keynote feels like it's emulating the Apple keynote from 5 years ago.
And the Apple keynote looks like robots just out of an uncanny valley pretending to be humans - just like keynotes might look in 5 years, but actually made by AI. Apple is always ahead of the curve in keynote trends.
You know those memes where AI keeps escalating a theme to more extreme levels with each request?
That's what Apple keynotes feel like now. It seems like each year, they're trying to make their presentations even more essentially 'Apple.' They crossed the uncanny valley a long time ago.
I hadn't thought about it until just now, but the most recent Apple events really are the closest I've ever seen real people come to those "good" computer-generated photorealistic (kinda…) humans "reading" with text-to-speech.
It’s the stillness between “beats” that does it, I think, and the very-constrained and repetitive motion.
I got the same vibes. Ultra and Pro. It feels tacky that it declares the "Gemini era" before it's even available. Google really wants to be seen as being on a level playing field.
I imagine the commenter was calling out what they perceived to be an inauthentic yet carefully planned facade of diversity. This marketing trend rubs me the wrong way as well, because it reminds me of how I was raised and educated as a 90s kid to believe that racism was a thing of the past. That turned out to be a damaging lie.
I don't mean to imply that companies should avoid displays of diversity, I just mean that it's obvious when it's inauthentic. Virtue signaling in exchange for business is not progress.
It's bad if the makeup of the company doesn't reflect the diversity seen in the marketing, because it doesn't reflect any genuine value and is just for show.
Now, I don't know how diverse the AI workforce is at Google, but the YT thumbnails show precisely 50% of white men. Maybe that's what the parent meant by "surgically precise".
Agreed with your comment. This is every marketing department on the planet right now, and it's not a bad thing IMO. Can feel a bit forced at times, but it's better than the alternative.
Of course to normal people, this just seems like another Google keynote. If OP is counting the number of white people, maybe they're the weird one here.
A big red flag for me was that Sundar was prompting the model to report lots of facts that can be either true or false. We all saw the benchmark figures that they published and the results mostly showed marginal improvements. In other words, the issue of hallucination has not been solved. But the demo seemed to imply that it had. My conclusion was that they had mostly cherry picked instances in which the model happened to report correct or consistent information.
They oversold its capabilities, but it does still seem that multi-modal models are going to be a requirement for AI to converge on a consistent idea of what kinds of phenomena are truly likely to be observed across modalities. So it's a good step forward. Now if they can just show us convincingly that a given architecture is actually modeling causality.
Yeah, this was so obvious too. Clearly Mark Rober tried to ask it what to try and got stupid answers, then tried to give it clues and had to get really specific before he got a usable answer.
The issue of hallucinations won't be solved with the RAG approach. It requires a fundamentally different architecture. These aren't my words but Yann LeCun's. You could easily understand if you spend some time playing around. The autoregressive nature won't allow the LLMs to create an internally consistent model before answering the question. We have approaches like Chain of Thought and others, but they are merely band-aids and superficially address the issue.
If you build a complex Chain of Thought style agent and then train/finetune further with reinforcement learning on this architecture, then it is not a band-aid anymore; it is an integral part of the model, and the weights will optimize to make use of this CoT ability.
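For readers who haven't seen the trick spelled out, here's a toy illustration of plain Chain of Thought prompting; the `ask` helper is hypothetical, standing in for whatever LLM API you use:

    # Toy Chain-of-Thought illustration. `ask` is a hypothetical helper that
    # sends a prompt to some LLM and returns the completion text.
    plain = "Q: I have 3 boxes with 12 apples each and give away 7 apples. How many are left? A:"
    cot = ("Q: I have 3 boxes with 12 apples each and give away 7 apples. How many are left?\n"
           "Let's think step by step. A:")

    # ask(plain)  -> the model answers in one shot, right or wrong
    # ask(cot)    -> the model first emits intermediate reasoning, then an answer;
    #                nothing guarantees that reasoning is internally consistent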
Ever since the "stochastic parrots" and "super-autocomplete" criticisms of LLMs, the question has been whether hallucinations are solvable in principle at all. And if hallucinations are solvable, it would be of such basic and fundamental scientific importance that I think it would be another mini-breakthrough in AI.
An interesting perspective on this I’ve heard discussed is whether hallucinations ought to be solved at all, or whether they are core to the way human intelligence works as well, in the sense that that is what is needed to produce narratives.
I believe it is Hinton that prefers “confabulation” to “hallucination” because it’s more accurate. The example in the discussion about hallucination/confabulation was that of someone who had been present in the room during Nixon’s Watergate conversations. Interviewed about what he heard, he provided a narrative that got many facts wrong (who said what, and what exactly was said). Later, when audio tapes surfaced, the inaccuracies in his testimony became known. However, he had “confabulated truthfully”. That is, he had made up a narrative that fit his recall as best as he was able, and the gist of it was true.
Without the ability to confabulate, he would have been unable to tell his story.
(Incidentally, because I did not check the facts of what I just recounted, I just did the same thing…)
These LLMs do not have a concept of factual correctness and are not trained/optimized as such. I find it laughable that people expect these things to act like quiz bots - this misunderstands the nature of a generative LLM entirely.
It simply spits out whatever output sequence it feels is most likely to occur after your input sequence. How it defines “most likely” is the subject of much research, but to optimize for factual correctness is a completely different endeavor. In certain cases (like coding problems) it can sound smart enough because for certain prompts, the approximate consensus of all available text on the internet is pretty much true and is unpolluted by garbage content from laypeople. It is also good at generating generic fluffy “content” although the value of this feature escapes me.
In the end the quality of the information it will get back to you is no better than the quality of a thorough google search.. it will just get you a more concise and well-formatted answer faster.
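You can watch the "most likely next token" machinery directly. A minimal sketch, assuming the Hugging Face transformers package and GPT-2 as a stand-in model; note that nothing in it consults any notion of truth:

    # Inspect the raw next-token distribution: the model scores continuations;
    # it never checks facts.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    ids = tok("The capital of Australia is", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]       # scores for the next token only
    probs = torch.softmax(logits, dim=-1)       # a probability distribution, not a fact-check
    top = torch.topk(probs, 5)
    for p, i in zip(top.values, top.indices):
        print(repr(tok.decode([int(i)])), float(p))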
> because for certain prompts, the approximate consensus of all available text on the internet is pretty much true
I think you're slightly mischaracterising things here. It has potential to be at least slightly and possibly much better than that. This is evidenced by the fact it is much better than chance at answering "novel" questions that don't have a direct source in the training data. Why it can do it is because at a certain point, to solve the optimisation problem of "what word comes next" the least complex strategy actually becomes to start modeling principles of logic and facts connecting them. It is not in any systematic or reliable way so you can't ever guarantee when or how well it is going to apply these, but it is absolutely learning higher order patterns than simple text / pattern matching, and it is absolutely able to generalise these across topics.
The first question I always ask myself in such cases: how much of the input data has simple "I don't know" lines? Not knowing something is clearly a concept that has to be learned in order to be expressed in the output.
> In the end the quality of the information it will get back to you is no better than the quality of a thorough google search.. it will just get you a more concise and well-formatted answer faster.
I would say it’s worse than Google search. Google tells you when it can’t find what you are looking for. LLMs “guess” a bullshit answer.
> It simply spits out whatever output sequence it feels is most likely to occur after your input sequence... but to optimize for factual correctness is a completely different endeavor
What if the input sequence says "the following is truth:"? Assuming it skillfully predicts the following text, that would mean telling the most likely truth according to its training data.
I was fooled. The model release announcement said it could accept video and audio multi-modal input. I understood that there was a lot of editing and cutting, but I really believed I was looking at an example of video and audio input. I was completely impressed, since it's quite a leap to go from text and still images to "eyes and ears." There's even the segment where instruments are drawn and music is generated. I thought I was looking at a model that could generate music based on language prompts, as we have seen specialized models do.
This was all fake. You are taking a collection of cherry picked prompt engineered examples, then dramatizing them for maximum shareholder hype. The music example was just outputting a description of a song, not the generated music we heard in the video.
It’s one thing to release a hype video with what-ifs and quite another to claim that your new multi-modal model is king of the hill then game all the benchmarks and fake all the demos.
Google seems to be in an evil phase. OpenAI and MS must be quite pleased with themselves.
This kind of moral fraud, this unethical behavior, is tolerated for some reason. It's almost like investors want to be fooled. There is no room for due diligence. They squeal like excited Taylor Swift fans as they are being lied to.
This shouldn't be a surprise. Companies optimize for what benefits shareholders. Or if there's an agency conflict of interest, companies optimize for what benefits managements' career and bonuses (perhaps at the expense of shareholders). Companies pay lip service to external stakeholders, but really that's a ploy to reduce attention and the risk of regulation, there is no fundamental incentive to treat all stakeholders well.
If lying helps, which can happen if there aren't large legal costs or social repercussions on brand equity, or if the lie goes undetected, then they'll lie. This is what we necessarily get from the upstream incentives. Fortunately, lying in a marketing video is fairly low on the list of ethical violations that have happened in the recent past.
We've effectively got a governance alignment problem that we've been trying to solve with regulations, taxes and social norms. How can you structure guardrails in the form of an incentive system to align companies with ethical outcomes? That's the question and it's a difficult one. This question also applies to any form of human organization, including governments.
My friend, all these large corporations are going to get away with exactly as much as they can, for as long as they can. You're implying there's nothing to do but wait until they grace us with a "not evil phase", when in reality we need to be working on restoring our anti-monopoly regulation that was systematically torn down over the last 30 years.
Given the massive data volume in videos, I assumed it processed video into pictures by extracting a frame per second or something along those lines, while still taking the entire video as the initial input.
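Something like this is what I had pictured, sketched here with OpenCV (the file name is a placeholder):

    # Sample roughly one frame per second from a video and keep the stills.
    import cv2

    cap = cv2.VideoCapture("demo.mp4")               # placeholder path
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30       # fall back if metadata is missing
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % fps == 0:                             # keep one frame per second of video
            frames.append(frame)
        i += 1
    cap.release()
    print(len(frames), "stills to hand to the model")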
Seems reminiscent of a video where the lead research department within Google turns out to be an animation studio (wish I could remember more about that video).
Doing all these hype videos just for the sake of satisfying shareholders or whatever is making me lose trust in their research division. I don't think they did anything like this when they released BERT.
I agree completely. When AlphaZero was announced, I remember feeling shocked at how they stated this revolutionary breakthrough as if it were a regular thing. AlphaFold and AlphaCode are also impressive, but this one just sounds like it was forced by Sundar and not the usual DeepMind.
Well put. I’m not touching anything Google does any more. They’re far too dishonest. This failed attempt at a release (which turns out was all sizzle and no steak) only underscored how far behind OpenAI they actually are. I’d love to have been a fly on the wall in the OAI offices when this demo video went live.
I, too, was fooled into thinking Gemini had seen and heard through a video/audio feed instead of being shown still images and prompted through text. While there might not seem to be much difference between still images and a video feed, in fact it requires a lot of (changing) context understanding to keep the bot from babbling like an idiot all the time. It also requires the bot to recognize the "I don't know it yet" state to keep appropriate silence in a conversation with a live video feed, which is notoriously difficult with generative AI. Certainly one can do some hacking and build in some heuristics to make it easier, but making a bot seem like a human partner in a conversation is indeed very hard. And that has been the most impressive aspect of the showed "conversations", which are unfortunately all faked :(
I went back to the video and it said Gemini was "searching" for that music, not generating it. Google has done some stuff with generative music (https://aitestkitchen.withgoogle.com/experiments/music-lm) but we don't know if they'll bring that into Gemini.
If I demoed swype texting as it functions in my day-to-day life to someone used to a QWERTY keyboard, they would never adopt it.
The rate at which it makes wrong assumptions about a word, or I have to fix it, is probably 10% to 20% of the time.
However, because it's so easy to fix, this is not an issue and it doesn't slow me down at all. So within the context of the different text systems out there, it's the best thing going for me personally, but it takes some time to learn how to use it.
This is every product.
If you demonstrated to people how something will actually work after 100 hours of habituation and compensation for edge cases, nobody would ever adopt anything.
I’m not sure how to solve this because both are bad.
(Edit: I’m keeping all my typos as meta-comment on this given that I’m posting via swype on my phone :))
Showing a product in its best light is one thing. Demonstrating a mode of operation that doesn't exist is entirely another. It would be like if a demo of your swipe keyboard included telepathic mind control for correcting errors.
I'm not sure I'd agree that what they showed will never be possible; in fact, my whole point is that I think Google can most likely deliver on it in this specific case. Chalk it up to my experience in the space, but from what I can see, it looks like something Google can actually execute on (unlike many areas where they regularly fail on product).
I would agree completely that it’s not ready for consumers the way it was displayed, which is my point.
I do want to add that I believe that the right way to do these types of new product rollout is not with these giant public announcements.
In fact, I think generally speaking the “right” way to do something like this demonstrates only things that are possible robustly. However that’s not the market that Google lives in. They’re capitalists trying to make as much money as possible. I’m simply evaluating that what they’re showing I think is absolutely technically possible and I think Google can deliver it even if its not ready today.
Do I think it’s supremely ethical the way that they did it? No I don’t.
Does swype make editing easier somehow? iOS spellcheck has negative value. I turned it off years ago and it reduced errors but there are still typos to fix.
Unfortunately iOS text editing is also completely worthless. It forces strange selections and inserts edited text in awkward ways.
I’m a QWERTY texter but text entry on iOS is a complete disaster that has only gotten worse over time.
I'm an iOS user and prefer the swipe input implementation in GBoard over the one in the native keyboard. I'm not sure what the differences are, but GBoard just seems to overall make fewer mistakes and do a better job correcting itself from context.
> However because it’s so easy to fix this is not an issue and it doesn’t slow me down at all.
But that's a different issue than LLM hallucinations.
With Swype, you already know what the correct output looks like. If the output doesn't match what you wanted, you immediately understand and fix it.
When you ask an LLM a question, you don't necessarily know the right answer. If the output looks confident enough, people take it as the truth. Outside of experimenting and testing, people aren't using LLMs to ask questions for which they already know the correct answer.
The insight here is that the speed of correction is a crucial component of the perceived long-term value of an interface technology.
It is the main reason that handwriting recognition did not displace keyboards. Once the handwriting is converted to text, it’s easier to fix errors with a pointer and keyboard. So after a few rounds of this most people start thinking: might as well just start with the pointer and keyboard and save some time.
So the question is, how easy is it to detect and correct errors in generative AI output? And the unfortunate answer is that unless you already know the answer you’re asking for, it can be very difficult to pick out the errors.
Yeah the feedback loop with consumers has a higher likelihood of being detrimental, so even if the iteration rate is high, it’s potentially high cost at each step.
I think the current trend is to nerf the models or otherwise put bumpers on them so people can’t hurt themselves. That’s one approach that is brittle at best and someone with more risk tolerance (OpenAI) will exploit that risk gap.
It's a contradiction at best, and depending on the level of unearned trust created by the misleading marketing, it will certainly lead to some really odd externalities.
Think “man follows google maps directions into pond” but for vastly more things.
I really hated marketing before but yeah this really proves the warning I make in the AI addendum to my scarcity theory (in my bio).
I know marketing is marketing, but it's bad form IMO to "demo" something in a manner totally detached from its actual manner of use. A swype keyboard takes practice to use, but the demos of that sort of input typically show it being used in a realistic way, even if the demo driver is an "expert".
This is the sort of demo that 1) gives people a misleading idea of what the product can actually do; and 2) ultimately contributes to the inevitable cynical backlash.
If the product is really great, people can see it in a realistic demo of its capabilities.
I think you mean swipe. Swype was a brilliant third party keyboard app for Android which was better at text prediction and manual correction than Gboard is today. If however you really do still use Swype then please tell me how because I miss it.
Ha good point, and yes I agree Swype continues to be the best text input technology that I’ll never be able to use again. I guess I just committed genericide here but I meant the general “swiping” process at this point
You make a decent point, but you might underestimate how much this Gemini demo is faked[0].
In your Swype analogy, it would be as if Swype works by having to write out on a piece of paper the general goal of what you're trying to convey, then having to write each individual letter on a Post-it, only for you to then organize these Post-its in the correct order yourself.
This process would then be translated into a slick promo video of someone swiping away on their keyboard.
This is not a matter of “eh, it doesn't 100% work as smooth as advertised.”
It's honestly pretty mind-boggling that we'd even use QWERTY on a smartphone. The entire point of the layout is to keep your fingers on the home row. Meanwhile, people text with one or two thumbs 100% of the time.
The reason we use qwerty on a smartphone is extremely straightforward: people tend to know where to look for the keys already, so it's easy to adopt to even though it's not "efficient". We know it better than we know the positions of letters in the alphabet. You can easily see the difference if you're ever presented with an onscreen keyboard that's in alphabetical order instead of qwerty (TVs do this a lot, for some reason, and it's a different physical input method but alpha order really does make you have to stop and hunt). It slows you down quite a bit.
Path dependency is the reason for this, and is the reason why a lot of things are the way they are. An early goal with smart phone keyboards was to take a tool that everyone already knew how to use, and port it over with as little friction as possible. If smart phones happened to be invented before external keyboards the layouts probably would have been quite different.
"The entire point of the layout is to keep your fingers on the home row."
No, that is how you're told to type. You have to be told to type that way precisely because QWERTY is not designed to keep your fingers on the home row. If you type in a layout that is designed to do that, you don't need to be told to keep your fingers on the home row, because you naturally will.
Nobody really knows what the designers were thinking, which I do not mean as sarcasm, I mean it straight. History lost that information. But whatever they were thinking that is clearly not it because it is plainly obvious just by looking at it how bad it is at that. Nobody trying to design a layout for "keeping your fingers on the home row" would leave hjkl(semicolon) under the resting position of the dominant hand for ~90% of the people.
This, perhaps in one of technical history's great ironies, makes it a fairly good keyboard for swype-like technologies! A keyboard layout like Dvorak that has "aoeui" all right next to each other and "dhtns" on the other would be constantly having trouble figuring out which one you meant between "hat" and "ten" to name just one example. "uio" on qwerty could probably stand a bit more separation, but "a" and "e" are generally far enough apart that at least for me they don't end up confused, and pushing the most common consonants towards the outer part of the keyboard rather than clustering them next to each other in the center (on the home row) helps them be distinguishable too. "fghjkl" is almost a probability dead zone, and the "asd" on the left are generally reasonably distinct even if you kinda miss one of them badly.
I don't know what an optimal swype keyboard would be, and there's probably still a good 10% gain to be made if someone tried to make one, but it wouldn't be enough to justify learning a new layout.
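A toy way to put numbers on the spacing argument (my own sketch, not how any real swipe decoder works): approximate each word by its key-center positions and measure how far apart two equal-length candidates are.

    # Rough QWERTY key-center coordinates (row, column); offsets are approximate.
    KEYS = {c: (0.0, float(i)) for i, c in enumerate("qwertyuiop")}
    KEYS.update({c: (1.0, i + 0.25) for i, c in enumerate("asdfghjkl")})
    KEYS.update({c: (2.0, i + 0.75) for i, c in enumerate("zxcvbnm")})

    def gap(a: str, b: str) -> float:
        """Mean distance between corresponding key centers of two equal-length words."""
        return sum(((KEYS[x][0] - KEYS[y][0]) ** 2 +
                    (KEYS[x][1] - KEYS[y][1]) ** 2) ** 0.5
                   for x, y in zip(a, b)) / len(a)

    print(gap("hat", "ten"))   # the bigger the gap, the easier the decoder's job

On a QWERTY grid "hat" and "ten" come out well separated; per the parent's point, the same pair computed on a Dvorak-style grid would land much closer together.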
I suppose this is a great example of how the authenticity of videos, audio, images, and company marketing must be questioned and, until verified, assumed to be 'generated'.
I am curious, if the voice, email, chat, and shortly video can all be entirely generated in real or near real time, how can we be sure that remote employee is actually not a full or partially generated entity?
Shared secrets are great when verifying but when the bodies are fully remote - what is the solution?
I am traveling at the moment. How can my family validate that it is ME claiming lost luggage and sending a Venmo request?
The question is: an attacker tells you they lost access and asks you to please reset some credential, and your security process is getting on a video call, because, let's say, you're a fully remote company.
Fair, but that also assumes the recipients ("family") are in a mindset of constantly thinking about the threat model in this type of situation and will actually insist on hearing the passphrase.
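For what it's worth, a shared secret doesn't need much machinery. A sketch using only the Python standard library; the secret, the challenge format, and the 8-hex-digit truncation are all illustrative choices, not a vetted protocol:

    # Challenge-response on a pre-shared secret: a cloned voice or face alone
    # cannot answer a fresh challenge without the secret.
    import hashlib
    import hmac

    SECRET = b"agreed on in person, never sent over any channel"

    def respond(challenge: str) -> str:
        """Computed independently by the traveler on their own device."""
        return hmac.new(SECRET, challenge.encode(), hashlib.sha256).hexdigest()[:8]

    # The family picks a fresh challenge ("blue-otter-42"), asks for the response,
    # computes the same value locally, and compares.
    print(respond("blue-otter-42"))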
I think it's also why we as a community should speak out when we catch them doing this, as they are discrediting tech demos. It won't be enough, because a lie will be halfway around the world before the truth gets out of the starting gate, but we can't just let this go unchecked.
Garbage in, garbage out.
Literally all that matters is that they passed.
The amount of fumbles is monumental.
https://www.zdnet.com/article/netdocs-microsofts-net-poster-...
https://www.eweek.com/development/netdocs-succumbs-to-xp/
So why did they never release that, going with Office 365 instead?
What about Chrome? And Chromebooks?
This is what Google has always cared about: bringing applications to billions of users.
People are forgetting Google is the most profitable AI company in the world right now. All of their products use ML and AI.
So who is losing?
The goal of Gemini isn't to build a chatbot like ChatGPT despite Google having Bard.
The goal for Gemini is to integrate it into those 10 products they have with a billion users.
Is that supposed to be a vote of confidence for the current state of Google search?
Shopped it around VCs. Got laughed out of all the meetings. "Companies storing their documents on the Internet?! You're out of your mind!"
Globe dot com was basically Facebook, but the critical mass wasn't there. Nor were the smartphones.
And this business is so totally different to Google in every way imaginable.
Senior Managers love customer support, SLAs - Google loves automation. Two worlds collide.
Word and Excel have been dominant since the early 1980s. Google has never had a real shot in the space.
They can't do anything that threatens their main income. They are tied to ads and ads technology, and can't do anything about it.
Microsoft had a crisis and that drives focus. Google... they probably mistreat their good employees if they don't work on ads.
Nadella is an all time great CEO. Pichai is an uninspired MBA-type.
What would the ideal setup be, in your opinion?
Well, that is truly shocking.
What does that mean and why is it bad?
Diversity in marketing is used because, well, your desired market is diverse.
I don't know what it means for it to be surgically precise, though.
"do you believe that a pocket of hot air would lead to lower air pressure causing my plane to stall?"
He could barely even phrase the question correctly because it was so awkward. Just embarrassing.
[1] https://www.youtube.com/watch?v=mHZSrtl4zX0&t=277s
1) Forward-looking demos that demonstrate the future of your product, where it's clear that you're not there yet but are working in that direction
or
2) Demos that show off current capabilities, but are scripted and edited to do so in the best light possible.
Both of those are standard practice and acceptable. What Google did was just wrong. They deserve to face backlash for this.
Turns out, it wasn't even doing that (extracting frames from real video input)!
You do understand the concept of reputation, right?
Maybe you can spice up a demo, but misleading to the point of implying things are generated when they're not (like the audio example) is pretty bad.
Except actual good ones, like ChatGPT or Gmail (in their time).
0: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...
PGP
(I say this in jest, as a PGP user)