The first is that LLMs are bar none the absolute best natural language processing and producing systems we’ve ever made. They are absolutely fantastic at taking unstructured user inputs and producing natural-looking (if slightly stilted) output. The problem is that they’re not nearly as good at almost anything else we’ve ever needed a computer to do as other systems we’ve built to do those things. We invented a linguist and mistook it for an engineer.
The second is that there’s a maxim in media studies which is almost universally applicable, which is that the first use of a new media is to recapitulate the old. The first TV was radio shows, the first websites looked like print (I work in synthetic biology, and we’re in the “recapitulating industrial chemistry” phase). It’s only once people become familiar with the new medium (and, really, when you have “natives” to that medium) that we really become aware of what the new medium can do and start creating new things. It strikes me we’re in that recapitulating phase with the LLMs - I don’t think we actually know what these things are good for, so we’re just putting them everywhere and redoing stuff we already know how to do with them, and the results are pretty lackluster. It’s obvious there’s a “there” there with LLMs (in a way there wasn’t with, say, Web 3.0, or “the metaverse,” or some of the other weird fads recently), but we don’t really know how to actually wield these tools yet, and I can’t imagine the appropriate use of them will be chatbots when we do figure it out.
Transformers still excel at translation, which is what they were originally designed to do. It's just no longer about translating only language. Now it's clear they're good at all sorts of transformations, translating ideas, styles, etc. They represent an incredibly versatile and one-shot programmable interface. Some of the most successful applications of them so far are as some form of interface between intent and action.
And we are still just barely understanding the potential of multimodal transformers. Wait till we get to metamultimodal transformers, where the modalities themselves are assembled on the fly to best meet some goal. It's already fascinating scrolling through latent space [0] in diffusion models, now imagine scrolling through "modality space", with some arbitrary concept or message as a fixed point, being able to explore different novel expressions of the same idea, and sample at different points along the path between imagery and sound and text and whatever other useful modalities we discover. Acid trip as a service.
Something that has been bugging me is that, applications-wise, the exploitative end of the "exploitation-exploration" trade-off (for lack of a better summary) have gotten way more attention than the other side.
So, besides the complaints about accuracy, hallucinations (you said "acid trip") are dissed much more than would have been necessary.
I haven't read Understanding Media by Marshall McLuhan, but I think he introduced your second point in that book, in 1964. He claims that the content of each new medium is a previous medium. Video games contain film, film contains theater, theater contains screenplay, screenplay contains literature, literature contains spoken stories, spoken stories contain folklore, and I suppose if one were an anthropologist, they could find more and more chain links in this chain.
It's probably the same in AI — the world needs AI to be chat (or photos, or movies, or search, or an autopilot, or a service provider ...) before it can grow meaningfully beyond. Once people understand neural networks, we can broadly advance to new forms of mass-application machine learning. I am hopeful that that will be the next big leap. If McLuhan is correct, that next big leap will be something that is operable like machine learning, but essentially different.
Why are we comparing LLMs to media? I think media has much more freedom in a creative sense, its end goal is often very open-ended, especially when it's used for artistic purposes.
When it comes to AI, we're trying to replace existing technology with it. We want it to drive a car, write an email, fix a bug etc. That premise is what gives it economic value, since we have a bunch of cars/emails/bugs that need driving/writing/fixing.
Sure, it's interesting to think about other things it could potentially achieve when we think out of the box and find use cases that fit it more, but the "old things" we need to do won't magically go away. So I think we should be careful about such overgeneralizations, especially when they're covertly used to hype the technology and maintain investments.
OpenAI has been pushing the idea that these things are generic—and therefore the path to AGI—from the beginning. Their entire sales pitch to investors is that they have the lead on the tech that is most likely to replace all jobs.
If the whole thing turns out to be a really nifty commodity component in other people's pipelines, the investors won't get a return on any kind of reasonable timetable. So OpenAI keeps pushing the AGI line even as it falls apart.
First of all, "AI" is and always has been a vague term with a shifting definition. "AI" used to mean state search programs or rule-based reasoning systems written in LISP. When deep learning hit, lots of people stopped considering symbolic (i.e., non neural-net) AI to be AI. Now LLMs threaten to do the same to older neural-net methods. A pedantic conversation about what is and isn't true AI is not productive.
Second of all, LLMs have extremely impressive generic uses considering that their training just consists of consuming large amounts of unsorted text. Any counter argument about "it's not real intelligence" or "it's just a next-token predictor" ignores the fact that LLMs have enabled us to do things with machines that would have seemed impossible just a few years ago. No, they are not perfect, and yes there are lots of rough edges, but the fact that simply "solving text" has gotten us this far is huge and echoes some aspects of the Unix philosophy...
"Write programs to handle text streams, because that is a universal interface."
They're pretty AI to me
. I've been using chat gpt to explain things to me while learning a foreign language, and a native speaker has been overseeing the comments from it. it hasn't said anything that the native has disagreed with yet.
> It’s obvious there’s a “there” there with LLMs (in a way there wasn’t with, say, Web 3.0, or “the metaverse,” or some of the other weird fads recently)
There is a "there" with those other fads too. VRChat is a successful "metaverse" and Mastodon is a successful decentralized "web3" social media network. The reason these concepts are failures is because these small grains of success are suddenly expanded in scope to include a bunch of dumb ideas while the expectations are raised to astronomical levels.
That in turn causes investors throw stupid amounts of money at these concepts, which attracts all the grifters of the tech world. It smothers nacant new tech in the crib as it is suddenly assigned a valuation it can never realize while the grifters soak up all the investments that could've gone to competent startups.
>Mastodon is a successful decentralized "web3" social media network.
No, that's not what "web3" means. Web3 is all about the blockchain (or you can call it "distributed ledger technology" if you want to distance it from cryptocurrency scams).
There's nothing blockchain-y about Mastodon or the ActivityPub protocol.
> We invented a linguist and mistook it for an engineer.
That's not entirely true, either. Because LLMs _can_ write code, sometimes even quite well. The problem isn't that they can't code, the problem is that they aren't reliable.
Something that can code well 80% of time is as useful as something that can't code at all, because you'd need to review everything it writes to catch that 20%. And any programmer will know that reviewing code is just as hard as writing it in the first place. (Well, that's unless you just blindly trust whatever it writes. I think kids these days call that "vibe coding"....)
If that were the case, I wouldn't be using Cursor to write my code. It's definitely faster to write with Cursor, because it basically always knows what I was going to write myself anyway, so it saves me a ton of time.
>We invented a linguist and mistook it for an engineer.
People are missing the point. LLMs aren’t just fancy word parrots. They actually grasp something about how the world works. Sure, they’re still kind of stupid. Imagine a barely functional intern who somehow knows everything but can’t be trusted to file a document without accidentally launching a rocket.
Where I really disagree with the crowd is the whole “they have zero intelligence” take. Come on. These things are obviously smarter than some humans. I’m not saying they’re Einstein, but they could absolutely wipe the floor with someone who has Down syndrome in nearly every cognitive task. Memory, logic, problem-solving — you name it. And we don’t call people with developmental disorders letdowns, so why are we slapping that label on something that’s objectively outperforming them?
The issue is they got famous too quickly. Everyone wanted them to be Jarvis, but they’re more like a very weird guy on Reddit with a genius streak and a head injury. That doesn’t mean they’re useless. It just means we’re early. They’ve already cleared the low bar of human intelligence in more ways than people want to admit.
The fantastically intoxicating valuations of many current stocks is due to breathing the fumes of LLMs as artificial intelligence.
TFA puts it this way:
"The real reason companies are doing this is because Wall Street wants them to. Investors have been salivating for an Apple “super cycle” — a tech upgrade so enticing that consumers will rush to get their hands on the new model. "
Now to consider your two points...
> The first ... natural language querying.
Natural-language inputs are structured: they are language. But in any case, we must not minimise the significant effort to collect [0] and label trustworthy data for training. Given untrustworthy, absurd, and/or outright ignorant and wrong training data, an LLM would spew nonsense. If we train an LLM on tribalistic fictions, Reddit codswallop, or politicians' partisan ravings, what do you think the result of any rational natural-language query would be? (Rhetorical question.)
In short, building and labelling the corpus of knowledge is the essential technical advancement. We already have been doing natural-language processing with computers for a long time.
> The second ... new media recaptiulates the old.
LLMs are a new application. There are some effective uses of the new application. But there are many unsuitable applications, particularly where correctness is critical. (TFA mentions this.) There are illegal uses too.
TFA itself says,
"What problems is it solving? Well, so far that’s not clear! Are customers demanding it? LOL, no."
I agree that finding the profit models beyond stock hyperbole is the current endeavour. Some attempts are already proven: better Web search (with a trusted corpus), image scoring/categorisation, suggesting/drafting approximate solutions to coding or writing tasks.
How to monetise these and future implementations will determine whether LLMs devour anything serviceable the way Radio ate Theatre, the way TV ate Theatre, Radio and Print Journalism, the way the Internet ate TV, Radio, the Music Industry, and Print Journalism, and the way Social Media ate social discourse.
<edit: Note that the above devourings were mostly related to funding via advertising.>
If LLMs devour and replace the Village Idiot, we will have optimised and scaled the worst of humanity.
I actually believe the practical use of transformers, diffusers etc is already as impactful as the wide adoption of the internet. Or smartphones or cars. Its already used by hundreds of millions and it became an irreplaceable tool to enhance work output. And it just started. In 5 years from now it will dominate every single part of our lifes.
I just do not understand this attitude. ChatGPT alone has hundreds of millions of active users that are clearly getting value from it, despite any mistakes it may make.
To me the almost unsolvable problem Apple has is wanting to do as much as possible on device, but also have been historically very stingy with RAM (on iOS and Mac devices - iOS more understandably, given it doesn't really need huge amounts of RAM until LLMs came along). This gives them a real real problem, having to use very small models which hallucinate a lot more than giant cloud hosted ones.
Even if they did manage to get 16GB of RAM on their new iPhones that is still only going to be able to fit a 7b param model at a push (leaving 8GB for 'system' use).
In my experience even the best open source 7B local models are close to unusable. They'd have been mindblowning a few years ago but when you are used to "full size" cutting edge models it feels like an enormous downgrade. And I assume this to always be the case; while small models are always improving, so are the full size ones, so there will always be a big delta between them, and people are already used to the large ones.
So I think Apple probably needs to shift to using cloud services more like their Private Compute idea, but they have an issue there in so much that they have 1b+ users and it is not trivial at all to be able to handle that level of cloud usage for core iOS/Mac features (I suspect this is why virtually nothing uses Private Compute at the moment). Even if each iOS user only did 10 "cloud LLM" requests a day, that's over 10b/requests a day (10x the scale that OpenAI currently handles). And in reality it'd ideally be orders of magnitude more than that given how many possible integration options they are for mobile devices alone.
> ChatGPT alone has hundreds of millions of active users that are clearly getting value from it
True, but it’s been years now since the debut of the chat-interface-AI to the general public and we have yet to figure out another interface that would work for generative AI for the general public. I’d say the only other example is Adobe and what they are doing with generative AI in their photo editing tools, but thats a far cry from a “general public” type thing. You have all the bumbling nonsense coming out of Microsoft and Google trying to shove AI into whatever tools they are selling while still getting 0 adoption. The copilot and Gemini corporate sales teams have been both “restructured“ this year because they managed to sign up so many clients in 2023/2024 and all those clients refused to renew.
When it comes to the general public, we have yet to find a better application of AI than a chat interface. Even outside of the general public, I oversee few teams that are building “agentic AI tools/workflows” and the amount of trouble they have to go through to make something slightly coherent is insane. I still believe that the right team with the right architecture and design can probably achieve things that are incredible with LLMs, but it’s not as easy as the term “AI” makes it sound.
Putting generative AI inside tools without giving deep understanding of those tools to the AI generally made me more confused and frustrated than outside of it:
for example Gemini forced itself on me on my SMS app, so I thought I ask it to search something simple inside the messages, and it just started generating some random text about searching and saying that it doesn't have access to the messages themselves.
When I use ChatGPT, of course I know they don't have access to my SMSs (it would be weird).
I can give ChatGPT the exact context I want to, and I know it will work with it as long as the context is not too big.
> You assume hundreds of millions of users could identify serious mistakes when they see them. But humans have demonstrated repeatedly that they can't.
same is true for humans whether they're interacting with LLMs or other humans. so I'm inclined to take statements like
> I don't think it can ever be understated how dangerous this is.
There are thresholds for every technology where it is "good enough", same with LLMs or SLMs (on-device). Machine learning is already running on-device for photo classification/search/tagging, and even 1.5b models are getting fast really good, as long as they are well trained and used for the right task. Something like email writing, TTS and rewriting and other tasks should be easily doable, the "semantic search aspect" of chatbots are basically a new way of "google/web search" and probably stay in the cloud, but that's not their most crucial use.
Not a big fan of Apple's monopoly, but I like their privacy on-device handling. I don't care for Apple but on-device models are definitely the way to go from a consumer point of view.
The very fact that Apple thought they were going to run AI on iPhones says that leadership doesn't understand AI technology and simply mandated requirements to engineers without wanting to be bothered by details. In other words, Apple seems to be badly managed
I disagree. I think targeting running models on end user devices is a good goal, and it's the ideal case for user privacy and latency.
The human brain consumes around 20 watts, while of course there are substantial differences with implementation I think it's reasonable to draw a line and say that eventually we should expect models to hit similar levels of performance per watt. We see some evidence now that small models can achieve high levels of performance with better training techniques, and it's perfectly conceivable that acceptable levels of performance for general use will eventually be baked into models small enough to run on end hardware. And at the speed of development here, "eventually" could mean 1-2 years.
Actually, it's more of a sad capitulation to lazy armchair "analysts" and "pundits" who whined incessantly that Apple was "behind on AI," without taking stock of the fact that Apple does not NEED "AI." It does not serve their core businesses, product line, or users.
Instead of loudly jumping on this depressing bandwagon, Apple should have quietly improved Siri and then announced it when it was WORKING.
I suspect an issue at least as big is that they're running into a lot of prompt injection issues (even totally accidentally) with their attempts at personal knowledge base/system awareness stuff, whether remotely processed or not. Existing LLMs are already bad at this even with controlled inputs; trying to incorporate broad personal files in a Spotlight-like manner is probably terribly unreliable.
This is my experience as pretty heavy speech-to-text user (voice keyboard) - as they’ve introduced more AI features, I’ve started to have all sorts of nonsense from recent emails or contacts get mixed into simple transcriptions
It used to have no problem with simple phrases like “I’m walking home from the market” but now I’ll just as often have it transcribe “I’m walking home from the Mark Betts”, assuming Mark Betts was a name in my contacts, despite that sentence making much less structural sense
It’s bad enough that I’m using the feature much less because I have to spend as much time copyediting transcribed text before sending as I would if I just typed it out by hand. I can turn off stuff like the frequently confused notification summaries, but the keyboard has no such control as far as I know
> In my experience even the best open source 7B local models are close to unusable. They'd have been mindblowning a few years ago but when you are used to "full size" cutting edge models it feels like an enormous downgrade
Everything has limits - the only differences is where they are, and therefore how often you meet them.
If you are working with AI, using local models shows you where the problems can (and will) happen, which helps you write more robust code because you will be aware of these limits!
It's like how you write more efficient code if you have to use a resource constrained system.
Its just another tool (or toy), great at some stuff, almost useless or worse for another, and its fucking downed our throats at every corner, from every direction. I start to hate everything AI-infused with passion. Even here on HN, many people are not rational. I am willing to pay less for AI-anything, not the same and f_cking definitely not more.
Cargo culting of clueless managers which make long term usability of products much worse, everything requiring some stupid cloud, basic features locked up and you will be analyzed, this is just another shit on top.
You have any massive hype, you normally get this shit. Once big wave dies down with unavoidable sad moments for some, and tech progresses further (as it will) real added value for everybody may show up.
As for work - in my corporation, despite having pure dev senior role, coding is 10-20% of the work, and its part I can handle just fine on my own, I don't need babysitting from almost-correct statistical models. In fact I learn and keep fresh much better when still doing it on my own. You don't become or stay senior when solutions are handed down to you. Same reason I use git in command line and not clicking around. For code sweatshops I can imagine much more added value, but not here in this massive banking corporation. Politics, relationships, knowing processes and their quirks and limitations is what progresses stuff and gets it done. AI won't help here, if anybody thinks differently they have f_cking no idea what I talk about. In 10 years it may be different, lets open the discussion again then.
What do you mean? Lot of people pay, (me included) and are getting value. If you use it but don't pay, you still get value, otherwise you would be wasting your time. If you don't use it at all, that's your choice to make.
ChatGPT is mostly a tool which prints words on the screen; what the user does with those words is outside the domain and the area of responsibility of OpenAI. With iOS the expectation is that it will also do actions. It's almost a blessing that it hallucinates a lot and in obvious ways. It's going to get worse when it starts hallucinating in ways, and doing actions on user's behalf, that are subtle, almost unnoticeable.
With the current state of LLMs they should stay within the bounds of writing out random, but statistically likely, words. However, I think we are already at a point where will be paying price later down the road for all the hallucinations we have unleashed to the world in the past few years.
> ChatGPT alone has hundreds of millions of active user that are clearly getting value from it
So does OG Siri or Alexa, Letdown does not mean completely useless, it just means what the users are getting is far less than what they were promised, not that they get nothing.
In this context AI will be a letdown regardless of improvements in offline or even cloud models. It is not only because of additional complexity of offline model Apple will not deliver, their product vision just does not look achievable in the current state of tech in LLMs [1].
Apple itself while more grounded compared to peers who regularly talk about building AGI, or God etc, has been still showing public concept demos akin to what gaming studios or early stage founders do. Reality usually fall short when you run ahead of product development in marketing, it will be no different for Apple.
This is a golden rule of brand and product development - never show what have not built fully to the public if you want them to trust your brand.
To be clear, it is not bad for the company per se to do this, top tier AAA gaming studios do just fine as businesses despite letting down fans game after game with oversell and under deliver, but suffer as brands nobody will have good thing to say about Blizzard or EA or any other major studio.
Apple monetizes its brand very well by being able to price their products at premium compared to peers that will be at risk if users feel letdown.
[1] Perhaps new innovations will make radical improvements even in the near future, regardless that will not change Apple can ship in 2025 or even 2026 so still a letdown for users being promised things for last 2 years already.
Or clearly thinking they might get value from it. I personally agree they're likely getting value, but it's pretty easy to dupe otherwise smart people when handing them something with cabilities far outside their realm of expertise, so I'd caution against using a large user base as anything more than a suggestive signal when determining whether people are "clearly getting value."
For an example from a different domain, consider a lot of generic market-timing stock investment advice. It's pretty easy to sell predictions where you're right a significant fraction of the time, but the usual tradeoff is that the magnitude of your errors is much greater than the magnitude of your successes. Users can be easily persuaded that your advice is worth it because of your high success rate, but it's not possible for them to actually get any net value from the product.
Even beginning data scientists get caught in that sort of trap in their first forays into the markets [0], and people always have a hard time computing net value from products with a high proportion of small upsides and a small proportion of huge downsides [1].
It's kind of like the various philosophical arguments about micro murders. 10 murders per year is huge in a town of 40k people, but nobody bats an eye at 10 extra pedestrian deaths per year from routinely driving 35+ in a 25. Interestingly, even if that level of speeding actually saves you the maximal amount of time (rarely the case for most commutes, where light cycles and whatnot drastically reduce the average speedup from "ordinary" reckless driving), you'll on average cause more minutes of lost life from the average number of deaths you'll cause than you'll save from the speeding. It's a net negative behavior for society as a whole, but almost nobody is inclined to even try to think about it that way, and the immediate benefit of seemingly saving a few minutes outweighs the small risk of catastrophic harm. Similarly with rolling through stop signs (both from the immediate danger, and from the habit you're developing that makes you less likely to be able to successfully stop in the instances you actually intend to).
[0] Not a source, those are a dime a dozen if you want to see a DS lose a lot of money, but XKCD is always enjoyable: https://xkcd.com/1570/
Do you also judge crack cocaine's value by its number of users?
I don't think most people are capable of doing a cost/benefit ratio calculation on how what they do affects the rest of the world, and the wealthy are far and away the worst abusers of this sadass truth.
One of apple’s biggest missed with “AI” in my opinion, is not building a universal search.
For all the hype LLM generation gets, I think the rise of LLM-backed “semantic” embedding search does not get enough attention. It’s used in RAG (which inherits the hallucinatory problems), but seems underutilized elsewhere.
The worst (and coincidentally/paradoxically I use the most) searches I’ve seen is Gmail and Dropbox, both of which cannot find emails or files that I know exist, even if using the exact email subject and file name keywords.
Apple could arguably solve this with a universal search SDK, and I’d value this far more than yet-another-summarize-this-paragraph tool.
I have this same issue with gmail. I can not find e-mails by an exact word from text or subject. It is there, but search would not show it. I don't understand how a number one email provider can fail at that.
For this to happen they'd have to actually pay attention to spotlight and the quicklook/spotlight plugin ecosystem they abandoned. There's lots of obvious ways to combine LLMs with macOS' unique software advantages (applescript, bundle extensibility) but they have spent years systematically burning those bridges. I don't think they'll be able to swallow their pride and walk everything back.
The scenario in the article, about how AI is "usually" right in queries like "which airport is my mom's flight landing at and when?" is exactly the problem with Google's AI summaries as well. Several times recently I've googled something really obscure like how to get fr*king suspend working in Linux on a recent-ish laptop, and it's given me generic pablum instad of the actual, obscure trick that makes it work (type a 12-key magic sequence, get advanced BIOS options, pick an option way down a scrolling list to nuke fr*king modern suspend and restore S3 sleep... happiness in both Windows and Linux in the dual boot environment). So it just makes the answers harder to find, instead of helping.
But Google have a big problem in that the internet is full of random crap and people trying to actively mess with them.
Siri on the other hand should have access to definitely non-noise data like: your calendar. The message your mom sent to say ‘see you at _____ airport and your entire chat history with her.
I am 100% certain that if you gave GPT4 this info, it could EASILY get this right 100% of the time.
Apple’s ability to make Siri do anything useful with AI is totally incomprehensible and it is definitely not a problem with AI.
It could well be a problems with running a very tiny AI on device. I would not trust even GPT 3.5 with this task and it is a lot more capable than anything an iPhone could run.
I've been experiencing "AI" making things worse. Grammarly worked fine for a decade+ but now since, I guess, they've been trying to cram more LLM junk into it the recommendations have been a lot less reliable. Now it's sometimes missing even obvious typos.
AI working with your OS is absolutely the letdown. I do not want to give my personal computer's data a direct feed into the hands of the same developers who lie about copyright abuses when mining data.
90% of the mass consumer AI tech demos in the past 2-3 years are the exact same demos that voice assistants used to do with just speech-to-text + search functions. And these older tech demos are already things only 10% of users probably did regularly. So they are adding AI features to halo features that look good in marketing but people never use.
Keep the OS secure and let me use an Apple AI app in 2-3 years when they have rolled their own LLM.
It’s pretty hard for a company to do something outside their core competency.
Remember when Google launched a social network?
Remember when Facebook made a phone?
Remember when intel tried to make mobile chips?
Apple is the best in the world at making expensive computers in various sizes. From pocket size to desktop. And some peripherals. That’s their core competency. AI is not on the list.
FWIW I don't think that Google+ was a technological failure. On the contrary, it was quite a bit better than Facebook at being, well, social. The problem is that it doesn't matter if all people you actually want/need to talk to are already somewhere else. You can't really peel users off one by one, because each of them has their social graph locking them in.
Perhaps you're wrong, perhaps you're right, but it's the "Apple makes rare stumble" journo trope/narrative I was questioning tbf, not Apple themselves.
I don't know if the Vision Pro counts as a stumble. If they were planning to make a mass-market product, they wouldn't have priced it so high. Apple doesn't reveal sales targets, but I bet they sold about as many Vision Pros as they expected to.
The first is that LLMs are bar none the absolute best natural language processing and producing systems we’ve ever made. They are absolutely fantastic at taking unstructured user inputs and producing natural-looking (if slightly stilted) output. The problem is that they’re not nearly as good at almost anything else we’ve ever needed a computer to do as other systems we’ve built to do those things. We invented a linguist and mistook it for an engineer.
The second is that there’s a maxim in media studies which is almost universally applicable, which is that the first use of a new media is to recapitulate the old. The first TV was radio shows, the first websites looked like print (I work in synthetic biology, and we’re in the “recapitulating industrial chemistry” phase). It’s only once people become familiar with the new medium (and, really, when you have “natives” to that medium) that we really become aware of what the new medium can do and start creating new things. It strikes me we’re in that recapitulating phase with the LLMs - I don’t think we actually know what these things are good for, so we’re just putting them everywhere and redoing stuff we already know how to do with them, and the results are pretty lackluster. It’s obvious there’s a “there” there with LLMs (in a way there wasn’t with, say, Web 3.0, or “the metaverse,” or some of the other weird fads recently), but we don’t really know how to actually wield these tools yet, and I can’t imagine the appropriate use of them will be chatbots when we do figure it out.
And we are still just barely understanding the potential of multimodal transformers. Wait till we get to metamultimodal transformers, where the modalities themselves are assembled on the fly to best meet some goal. It's already fascinating scrolling through latent space [0] in diffusion models, now imagine scrolling through "modality space", with some arbitrary concept or message as a fixed point, being able to explore different novel expressions of the same idea, and sample at different points along the path between imagery and sound and text and whatever other useful modalities we discover. Acid trip as a service.
[0] https://keras.io/examples/generative/random_walks_with_stabl...
So, besides the complaints about accuracy, hallucinations (you said "acid trip") are dissed much more than would have been necessary.
It's probably the same in AI — the world needs AI to be chat (or photos, or movies, or search, or an autopilot, or a service provider ...) before it can grow meaningfully beyond. Once people understand neural networks, we can broadly advance to new forms of mass-application machine learning. I am hopeful that that will be the next big leap. If McLuhan is correct, that next big leap will be something that is operable like machine learning, but essentially different.
Here's Marc Andreessen applying it to AI and search on Lex Fridman's podcast: https://youtu.be/-hxeDjAxvJ8?t=160
When it comes to AI, we're trying to replace existing technology with it. We want it to drive a car, write an email, fix a bug etc. That premise is what gives it economic value, since we have a bunch of cars/emails/bugs that need driving/writing/fixing.
Sure, it's interesting to think about other things it could potentially achieve when we think out of the box and find use cases that fit it more, but the "old things" we need to do won't magically go away. So I think we should be careful about such overgeneralizations, especially when they're covertly used to hype the technology and maintain investments.
If the whole thing turns out to be a really nifty commodity component in other people's pipelines, the investors won't get a return on any kind of reasonable timetable. So OpenAI keeps pushing the AGI line even as it falls apart.
Second of all, LLMs have extremely impressive generic uses considering that their training just consists of consuming large amounts of unsorted text. Any counter argument about "it's not real intelligence" or "it's just a next-token predictor" ignores the fact that LLMs have enabled us to do things with machines that would have seemed impossible just a few years ago. No, they are not perfect, and yes there are lots of rough edges, but the fact that simply "solving text" has gotten us this far is huge and echoes some aspects of the Unix philosophy...
"Write programs to handle text streams, because that is a universal interface."
People primarily communicate thru words, so maybe not.
Of course, pictures, body language, and also tone are also other communication methods.
So far it looks like these models can convert pictures into words reasonably well, and the reverse is improving quickly.
Tone might be next - there are already models that can detect stress so that’s a good first start.
Body language is probably a bit farther in the future, but it might be as simple as image analysis (thats only a wild guess-I have no idea)
There is a "there" with those other fads too. VRChat is a successful "metaverse" and Mastodon is a successful decentralized "web3" social media network. The reason these concepts are failures is because these small grains of success are suddenly expanded in scope to include a bunch of dumb ideas while the expectations are raised to astronomical levels.
That in turn causes investors throw stupid amounts of money at these concepts, which attracts all the grifters of the tech world. It smothers nacant new tech in the crib as it is suddenly assigned a valuation it can never realize while the grifters soak up all the investments that could've gone to competent startups.
No, that's not what "web3" means. Web3 is all about the blockchain (or you can call it "distributed ledger technology" if you want to distance it from cryptocurrency scams).
There's nothing blockchain-y about Mastodon or the ActivityPub protocol.
Deleted Comment
That's not entirely true, either. Because LLMs _can_ write code, sometimes even quite well. The problem isn't that they can't code, the problem is that they aren't reliable.
Something that can code well 80% of time is as useful as something that can't code at all, because you'd need to review everything it writes to catch that 20%. And any programmer will know that reviewing code is just as hard as writing it in the first place. (Well, that's unless you just blindly trust whatever it writes. I think kids these days call that "vibe coding"....)
People are missing the point. LLMs aren’t just fancy word parrots. They actually grasp something about how the world works. Sure, they’re still kind of stupid. Imagine a barely functional intern who somehow knows everything but can’t be trusted to file a document without accidentally launching a rocket.
Where I really disagree with the crowd is the whole “they have zero intelligence” take. Come on. These things are obviously smarter than some humans. I’m not saying they’re Einstein, but they could absolutely wipe the floor with someone who has Down syndrome in nearly every cognitive task. Memory, logic, problem-solving — you name it. And we don’t call people with developmental disorders letdowns, so why are we slapping that label on something that’s objectively outperforming them?
The issue is they got famous too quickly. Everyone wanted them to be Jarvis, but they’re more like a very weird guy on Reddit with a genius streak and a head injury. That doesn’t mean they’re useless. It just means we’re early. They’ve already cleared the low bar of human intelligence in more ways than people want to admit.
The fantastically intoxicating valuations of many current stocks is due to breathing the fumes of LLMs as artificial intelligence.
TFA puts it this way:
Now to consider your two points...> The first ... natural language querying.
Natural-language inputs are structured: they are language. But in any case, we must not minimise the significant effort to collect [0] and label trustworthy data for training. Given untrustworthy, absurd, and/or outright ignorant and wrong training data, an LLM would spew nonsense. If we train an LLM on tribalistic fictions, Reddit codswallop, or politicians' partisan ravings, what do you think the result of any rational natural-language query would be? (Rhetorical question.)
In short, building and labelling the corpus of knowledge is the essential technical advancement. We already have been doing natural-language processing with computers for a long time.
> The second ... new media recaptiulates the old.
LLMs are a new application. There are some effective uses of the new application. But there are many unsuitable applications, particularly where correctness is critical. (TFA mentions this.) There are illegal uses too.
TFA itself says,
I agree that finding the profit models beyond stock hyperbole is the current endeavour. Some attempts are already proven: better Web search (with a trusted corpus), image scoring/categorisation, suggesting/drafting approximate solutions to coding or writing tasks.How to monetise these and future implementations will determine whether LLMs devour anything serviceable the way Radio ate Theatre, the way TV ate Theatre, Radio and Print Journalism, the way the Internet ate TV, Radio, the Music Industry, and Print Journalism, and the way Social Media ate social discourse.
<edit: Note that the above devourings were mostly related to funding via advertising.>
If LLMs devour and replace the Village Idiot, we will have optimised and scaled the worst of humanity.
= = =
[0] _ major legal concerns to be unresolved
[1] _ https://en.wikipedia.org/wiki/Network_(1976_film) , https://www.npr.org/2020/09/29/917747123/you-literally-cant-...
To me the almost unsolvable problem Apple has is wanting to do as much as possible on device, but also have been historically very stingy with RAM (on iOS and Mac devices - iOS more understandably, given it doesn't really need huge amounts of RAM until LLMs came along). This gives them a real real problem, having to use very small models which hallucinate a lot more than giant cloud hosted ones.
Even if they did manage to get 16GB of RAM on their new iPhones that is still only going to be able to fit a 7b param model at a push (leaving 8GB for 'system' use).
In my experience even the best open source 7B local models are close to unusable. They'd have been mindblowning a few years ago but when you are used to "full size" cutting edge models it feels like an enormous downgrade. And I assume this to always be the case; while small models are always improving, so are the full size ones, so there will always be a big delta between them, and people are already used to the large ones.
So I think Apple probably needs to shift to using cloud services more like their Private Compute idea, but they have an issue there in so much that they have 1b+ users and it is not trivial at all to be able to handle that level of cloud usage for core iOS/Mac features (I suspect this is why virtually nothing uses Private Compute at the moment). Even if each iOS user only did 10 "cloud LLM" requests a day, that's over 10b/requests a day (10x the scale that OpenAI currently handles). And in reality it'd ideally be orders of magnitude more than that given how many possible integration options they are for mobile devices alone.
True, but it’s been years now since the debut of the chat-interface-AI to the general public and we have yet to figure out another interface that would work for generative AI for the general public. I’d say the only other example is Adobe and what they are doing with generative AI in their photo editing tools, but thats a far cry from a “general public” type thing. You have all the bumbling nonsense coming out of Microsoft and Google trying to shove AI into whatever tools they are selling while still getting 0 adoption. The copilot and Gemini corporate sales teams have been both “restructured“ this year because they managed to sign up so many clients in 2023/2024 and all those clients refused to renew.
When it comes to the general public, we have yet to find a better application of AI than a chat interface. Even outside of the general public, I oversee few teams that are building “agentic AI tools/workflows” and the amount of trouble they have to go through to make something slightly coherent is insane. I still believe that the right team with the right architecture and design can probably achieve things that are incredible with LLMs, but it’s not as easy as the term “AI” makes it sound.
for example Gemini forced itself on me on my SMS app, so I thought I ask it to search something simple inside the messages, and it just started generating some random text about searching and saying that it doesn't have access to the messages themselves.
When I use ChatGPT, of course I know they don't have access to my SMSs (it would be weird).
I can give ChatGPT the exact context I want to, and I know it will work with it as long as the context is not too big.
You assume hundreds of millions of users could identify serious mistakes when they see them.
But humans have demonstrated repeatedly that they can't.
I don't think it can ever be understated how dangerous this is.
> I think Apple probably needs to shift to using cloud services more
You ignore lessons from the the recent spat between Apple and the UK.
same is true for humans whether they're interacting with LLMs or other humans. so I'm inclined to take statements like
> I don't think it can ever be understated how dangerous this is.
as hysteria
Not a big fan of Apple's monopoly, but I like their privacy on-device handling. I don't care for Apple but on-device models are definitely the way to go from a consumer point of view.
The human brain consumes around 20 watts, while of course there are substantial differences with implementation I think it's reasonable to draw a line and say that eventually we should expect models to hit similar levels of performance per watt. We see some evidence now that small models can achieve high levels of performance with better training techniques, and it's perfectly conceivable that acceptable levels of performance for general use will eventually be baked into models small enough to run on end hardware. And at the speed of development here, "eventually" could mean 1-2 years.
Instead of loudly jumping on this depressing bandwagon, Apple should have quietly improved Siri and then announced it when it was WORKING.
Nope
https://security.apple.com/blog/private-cloud-compute/
It used to have no problem with simple phrases like “I’m walking home from the market” but now I’ll just as often have it transcribe “I’m walking home from the Mark Betts”, assuming Mark Betts was a name in my contacts, despite that sentence making much less structural sense
It’s bad enough that I’m using the feature much less because I have to spend as much time copyediting transcribed text before sending as I would if I just typed it out by hand. I can turn off stuff like the frequently confused notification summaries, but the keyboard has no such control as far as I know
Everything has limits - the only differences is where they are, and therefore how often you meet them.
If you are working with AI, using local models shows you where the problems can (and will) happen, which helps you write more robust code because you will be aware of these limits!
It's like how you write more efficient code if you have to use a resource constrained system.
Cargo culting of clueless managers which make long term usability of products much worse, everything requiring some stupid cloud, basic features locked up and you will be analyzed, this is just another shit on top.
You have any massive hype, you normally get this shit. Once big wave dies down with unavoidable sad moments for some, and tech progresses further (as it will) real added value for everybody may show up.
As for work - in my corporation, despite having pure dev senior role, coding is 10-20% of the work, and its part I can handle just fine on my own, I don't need babysitting from almost-correct statistical models. In fact I learn and keep fresh much better when still doing it on my own. You don't become or stay senior when solutions are handed down to you. Same reason I use git in command line and not clicking around. For code sweatshops I can imagine much more added value, but not here in this massive banking corporation. Politics, relationships, knowing processes and their quirks and limitations is what progresses stuff and gets it done. AI won't help here, if anybody thinks differently they have f_cking no idea what I talk about. In 10 years it may be different, lets open the discussion again then.
Idk about that, wouldn't pay for it.
With the current state of LLMs they should stay within the bounds of writing out random, but statistically likely, words. However, I think we are already at a point where will be paying price later down the road for all the hallucinations we have unleashed to the world in the past few years.
So does OG Siri or Alexa, Letdown does not mean completely useless, it just means what the users are getting is far less than what they were promised, not that they get nothing.
In this context AI will be a letdown regardless of improvements in offline or even cloud models. It is not only because of additional complexity of offline model Apple will not deliver, their product vision just does not look achievable in the current state of tech in LLMs [1].
Apple itself while more grounded compared to peers who regularly talk about building AGI, or God etc, has been still showing public concept demos akin to what gaming studios or early stage founders do. Reality usually fall short when you run ahead of product development in marketing, it will be no different for Apple.
This is a golden rule of brand and product development - never show what have not built fully to the public if you want them to trust your brand.
To be clear, it is not bad for the company per se to do this, top tier AAA gaming studios do just fine as businesses despite letting down fans game after game with oversell and under deliver, but suffer as brands nobody will have good thing to say about Blizzard or EA or any other major studio.
Apple monetizes its brand very well by being able to price their products at premium compared to peers that will be at risk if users feel letdown.
[1] Perhaps new innovations will make radical improvements even in the near future, regardless that will not change Apple can ship in 2025 or even 2026 so still a letdown for users being promised things for last 2 years already.
They literally have data centers worth of devices running inferences anonymously
Or clearly thinking they might get value from it. I personally agree they're likely getting value, but it's pretty easy to dupe otherwise smart people when handing them something with cabilities far outside their realm of expertise, so I'd caution against using a large user base as anything more than a suggestive signal when determining whether people are "clearly getting value."
For an example from a different domain, consider a lot of generic market-timing stock investment advice. It's pretty easy to sell predictions where you're right a significant fraction of the time, but the usual tradeoff is that the magnitude of your errors is much greater than the magnitude of your successes. Users can be easily persuaded that your advice is worth it because of your high success rate, but it's not possible for them to actually get any net value from the product.
Even beginning data scientists get caught in that sort of trap in their first forays into the markets [0], and people always have a hard time computing net value from products with a high proportion of small upsides and a small proportion of huge downsides [1].
It's kind of like the various philosophical arguments about micro murders. 10 murders per year is huge in a town of 40k people, but nobody bats an eye at 10 extra pedestrian deaths per year from routinely driving 35+ in a 25. Interestingly, even if that level of speeding actually saves you the maximal amount of time (rarely the case for most commutes, where light cycles and whatnot drastically reduce the average speedup from "ordinary" reckless driving), you'll on average cause more minutes of lost life from the average number of deaths you'll cause than you'll save from the speeding. It's a net negative behavior for society as a whole, but almost nobody is inclined to even try to think about it that way, and the immediate benefit of seemingly saving a few minutes outweighs the small risk of catastrophic harm. Similarly with rolling through stop signs (both from the immediate danger, and from the habit you're developing that makes you less likely to be able to successfully stop in the instances you actually intend to).
[0] Not a source, those are a dime a dozen if you want to see a DS lose a lot of money, but XKCD is always enjoyable: https://xkcd.com/1570/
[1] Also not a source, just another great XKCD: https://xkcd.com/937/
I don't think most people are capable of doing a cost/benefit ratio calculation on how what they do affects the rest of the world, and the wealthy are far and away the worst abusers of this sadass truth.
For all the hype LLM generation gets, I think the rise of LLM-backed “semantic” embedding search does not get enough attention. It’s used in RAG (which inherits the hallucinatory problems), but seems underutilized elsewhere.
The worst (and coincidentally/paradoxically I use the most) searches I’ve seen is Gmail and Dropbox, both of which cannot find emails or files that I know exist, even if using the exact email subject and file name keywords.
Apple could arguably solve this with a universal search SDK, and I’d value this far more than yet-another-summarize-this-paragraph tool.
Siri on the other hand should have access to definitely non-noise data like: your calendar. The message your mom sent to say ‘see you at _____ airport and your entire chat history with her.
I am 100% certain that if you gave GPT4 this info, it could EASILY get this right 100% of the time.
Apple’s ability to make Siri do anything useful with AI is totally incomprehensible and it is definitely not a problem with AI.
It could well be a problems with running a very tiny AI on device. I would not trust even GPT 3.5 with this task and it is a lot more capable than anything an iPhone could run.
90% of the mass consumer AI tech demos in the past 2-3 years are the exact same demos that voice assistants used to do with just speech-to-text + search functions. And these older tech demos are already things only 10% of users probably did regularly. So they are adding AI features to halo features that look good in marketing but people never use.
Keep the OS secure and let me use an Apple AI app in 2-3 years when they have rolled their own LLM.
I'm not sure anyone is going to buy it, but it doesn't cost them anything to get a few of their PR hacks to give it a try.
It's about as convincing as "we didn't build a bad phone, you're just holding it wrong!".
Auto. Vision Pro. AI.
Is there a pattern emerging here?
Search in Mail is abysmal since forever. Everyone knows it. Apple knows it. No change still. So, no surprise here.
https://mjtsai.com/blog/2019/10/11/mail-data-loss-in-macos-1...
Remember when Google launched a social network?
Remember when Facebook made a phone?
Remember when intel tried to make mobile chips?
Apple is the best in the world at making expensive computers in various sizes. From pocket size to desktop. And some peripherals. That’s their core competency. AI is not on the list.
Or how Watch has the full power of all my location info, my driving status, my motion, all opted in, but can't figure out squat.
My watch thinks last week I did a 23 mile hike in 1 hour, 20 miles of it accomplished while in do-not-disturb-while-driving mode.
Fall detection? Nope. False negative, verified enabled and does not work.
Autocorrect… it's actually gotten worse.