I lead an applied AI research team at a mid-sized public enterprise products company. I've been saying this in my professional circles quite often.
We talk about scaling laws, superintelligence, AGI, etc. But there is another threshold: the ability of humans to leverage superintelligence. It's just incredibly hard to innovate on products that fully leverage it.
At some point, AI needs to connect with the real world to deliver economically valuable output. The rate-limiting step is there, not smarter models.
In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it.
Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
We're seeing much less of "it's making mistakes" these days.
If we have open-source models that match GPT-4 on AWS / Azure etc., there's not much point in going with players like OpenAI / Anthropic who may have even smarter models. We can't even use the dumber models fully.
Your paycheck depends on people believing the hype. Therefore, anything you say about "superintelligence" (LOL) is pretty suspect.
> Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
So, what, you're going to build a model to instruct the model? And how do we instruct that model?
This is such a transparent scam, I'm embarrassed on behalf of our species.
Snarky tone aside, there are different audiences. For example, I primarily work with web dev and some DevOps and I can tell you that the state of both can be pretty dire. Maybe not as much in my particular case, as in general.
Some examples to illustrate the point: supply chain risks and an ever-increasing number of dependencies (look at your average React project, though this applies to most stacks), overly abstracted frameworks (how many CPU cycles Spring Boot and others burn and how many hoops you have to jump through to get things done), patterns that mess up the DB's ability to optimize queries sometimes (EAV, OTLT, trying to create polymorphic foreign keys), inefficient data fetching (sometimes ORMs, sometimes N+1), bad security practices (committed secrets, anyone? bad usage of OAuth2 or OIDC?), overly complex tooling, especially the likes of Kubernetes when you have a DevOps team of one part-time dev, overly complex application architectures where you have more services than developers (not even teams). That's before you even get into the utter mess of long-term projects that have been touched by dozens of developers over the years and the whole sector sometimes feeling like the wild west, as opposed to "real engineering".
However, the difference here is that I wouldn't overwhelm anyone who might give me money with rants about this stuff and would navigate around those issues and risks as best I can, to ship something useful at the end of the day. Same with having constructive discussions about any of those aspects in a circle of technical individuals, on how to make things better.
Calling the whole concept a "scam" doesn't do anyone any good, when I already derive value from the LLMs, as do many others. Look at https://www.cursor.com/ for example and consider where we might be in 10-20 years. Not AGI, but maybe good auto-complete, codegen and reasoning about entire codebases, even if they're hundreds of thousands of lines long. Tooling that would make anyone using it more productive than those who don't. Unless the funding dries up and the status quo is restored.
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
Ehm yes. That's because it actually doesn't work as well as the hype suggests, not because it's too "high bandwidth".
Moreover, this is exactly the frustration I've experienced when working with outsourced developers.
Which tells me the problem may be fundamental, not a technical one. It's not just a matter of needing "more intelligence". I don't question the intelligence or skill of the people on the outsourced team I was working with. The problem was simple communication. They didn't really know or understand our business and its goals well enough to anticipate all sorts of little things, and the lack of constant social interaction of the type you typically get when everybody's a direct coworker meant we couldn't build that mind-meld over time, either. So we had to pick up the slack with massive over-specification.
I'd add that we, the engineering class, are on the whole terrible communicators who confuse people and cannot clearly explain our own work. LLMs require clear communication, while the majority of LLM users approach them with a large host of implied context that anyone would have a hard time following, let alone a non-human software construct. The key is clear communication, a phrase whose real, technical meaning many in STEM were never taught.
It works for some annoyances in life, though. For example, you can get it to write a complaint to an administration. It's good enough unless you'd rather be witty and write it yourself.
The "it's making mistakes" phase might be based on the testing strategy.
Remember the old bit about the media -- the stories are always 100% infallible except, strangely, in YOUR personal field of expertise.
I suspect it's something similar with AI products.
People test them with toy problems -- "Hey ChatGPT, what's the square root of 36", and then with something close to their core knowledge.
It might learn to solve a lot of the toy problems, but plenty of us are still seeing a lot of hallucinations in the "core knowledge" questions. But we see people then taking that product-- that they know isn't good in at least one vertical-- and trying to apply it to other contexts, where they may be less qualified to validate if the answer is right.
I think a crucial aspect is to only apply chatbots' answers to a domain where you can rapidly validate their correctness (or alternatively, to take their answers with a huge pinch of salt, or simply as creative search space exploration).
For me, the number of times where it's led me down a hallucinated, impossible, or thoroughly invalid rabbit hole have been relatively minimal when compared against the number of times when it has significantly helped. I really do think the key is in how you use them, for what types of problems/domains, and having an approach that maximizes your ability to catch issues early.
*We're seeing much less of "it's making mistakes" these days.*
Perhaps less than before, but still making very fundamental errors. Anything involving numbers I'm automatically suspicious of. Pretty frequently I'd get different answers for what is (to a human) the same question.
e.g. ChatGPT will give an effective tax rate of n for some income amount. Then when asked to break down the calculation will come up with an effective tax rate of m instead. When asked how much tax is owed on that income will come up with a different number such that the effective rate is not n or m.
Until this is addressed to a sufficient degree, it seems difficult to apply to anything that involves numbers and can't be quickly verified by a human.
Yes. Numbers / math is pretty much instant hallucination.
But. Try this approach instead: have it generate python code, with print statements before every bit of math it performs. It will write pretty good code, which you then execute to generate the actual answer.
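For the tax example upthread, the generated code might look something like this. The brackets below are invented for illustration; the pattern is what matters: a print before every arithmetic step, so each intermediate value is visible and checkable before you trust the answer.

```python
# Hypothetical example of LLM-generated math code with a print before each step.
# These tax brackets are made up for illustration, not real tax law.
BRACKETS = [(0, 10_000, 0.10), (10_000, 40_000, 0.20), (40_000, None, 0.30)]

def tax_owed(income):
    total = 0.0
    for low, high, rate in BRACKETS:
        top = income if high is None else min(income, high)
        if top > low:
            chunk = (top - low) * rate
            print(f"bracket {low}-{high}: ({top} - {low}) * {rate} = {chunk}")
            total += chunk
    print(f"total tax = {total}")
    return total

effective_rate = tax_owed(50_000) / 50_000
print(f"effective rate = {effective_rate:.2%}")  # 20.00%
```

Unlike the chat answer, running this gives you one consistent number, and the prints make any wrong step obvious.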
Simpler example: paste in a paragraph of text, ask it to count the number of words. The answer will be incorrect most of the time.
Instead, ask it to output each word in the text in a numbered list and then output the word count. It will be correct almost always.
My anecdotal learning from this:
LLMs are pretty human-like in their mental abilities. I wouldn't be able to simply look at some text and give you an accurate word count. I would point my finger / cursor to every word and count up.
The solutions above are basically giving LLMs some additional techniques or tools, very similar to how a human may use a calculator, or count words.
In the products we've built, there is an AI feature that generates aggregations of spreadsheet data. We have a dual unittest & aggregator loop to generate correct values.
The first step is to generate some unittests. And in order to generate correct numerical data for unittests, we ask it to write some code with math expressions first. We interpret the expressions, and paste it back into the unittest generator - which then writes the unittests with the correct inputs / outputs.
The aggregation generator then generates code until the generated unittests pass completely. Then we have the code for the aggregator function that we can run against the spreadsheet.
Takes a couple of minutes, but pretty bulletproof and also generalizable to other complex math calculations.
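The loop described above can be sketched roughly like this. The `llm_generate_*` functions are stand-ins for real model calls (names are mine, not an actual API); here they return canned code so the control flow is runnable end to end.

```python
# Sketch of the dual unittest & aggregator loop, with stubbed "LLM" calls.

def llm_generate_unittest(spec):
    # Step 1: the model writes math expressions first; we evaluate them and
    # feed the concrete numbers back so the generated test has correct values.
    expected = sum([10, 20, 30])  # evaluated math expression: 10 + 20 + 30
    return f"assert aggregate([10, 20, 30]) == {expected}"

def llm_generate_aggregator(spec, feedback=None):
    # Step 2: the model writes the aggregator, revising on test feedback.
    return "def aggregate(col):\n    return sum(col)"

def build_aggregator(spec, max_rounds=5):
    test_code = llm_generate_unittest(spec)
    feedback = None
    for _ in range(max_rounds):
        code = llm_generate_aggregator(spec, feedback)
        env = {}
        exec(code, env)           # define aggregate()
        try:
            exec(test_code, env)  # run the generated unittest against it
            return env["aggregate"]
        except AssertionError as e:
            feedback = str(e)     # regenerate until the tests pass
    raise RuntimeError("aggregator never passed its tests")

agg = build_aggregator("sum the Amount column")
print(agg([1, 2, 3]))  # 6
```

The real version plugs model calls into the two generator functions; the structure of the retry loop is the same.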
> Perhaps less than before, but still making very fundamental errors.
Yes.
Suppose someone developed a way to get a reliable confidence metric out of an LLM. Given that, much more useful systems can be built.
Only high-confidence outputs can be used to initiate action. For low-confidence outputs, chain-of-reasoning tactics can be tried: ask a simpler question, ask the LLM to divide the question into sub-questions, or ask the LLM what information it needs to answer the question and try to get that info from a search engine.
Most of the strategies humans and organizations use when they don't know something will work for LLMs.
The goal is to get an all high confidence chain of reasoning.
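A minimal sketch of that routing, assuming some `confidence()` score exists (which, per the rest of the thread, is exactly the unsolved part; here it's a stub):

```python
# Confidence-gated routing: act on high-confidence answers, fall back to
# decomposition strategies otherwise. The confidence() function is a stand-in;
# a real one might average token log-probs or use a separate verifier model.

CONF_THRESHOLD = 0.9

def confidence(answer):
    return answer["score"]  # stub

def handle(answer):
    if confidence(answer) >= CONF_THRESHOLD:
        return ("act", answer["text"])
    # Low confidence: route to chain-of-reasoning fallbacks instead of acting.
    return ("decompose", f"split into sub-questions: {answer['text']}")

print(handle({"text": "pay the invoice", "score": 0.95}))
print(handle({"text": "diagnose outage", "score": 0.40}))
```

Everything interesting hides inside `confidence()`; the gate itself is trivial once that metric is reliable.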
If only they knew when they didn't know something.
There's research on this.[4] No really good results yet, but some progress. Biggest unsolved problem in computing today.
LLMs don't do math in that sense. They build a string of tokens out of a billion pre-weighted ones that gets a favorable probability distribution when taking your prompt into account. Change your prompt, get a different printout. There is no semantic understanding (in the sense of whether what is printed makes sense), and an LLM therefore cannot plausibility-check its response. It will just print gibberish if that gets the best probability distribution of tokens. I'm sure that's something that will be addressed over time, but we are not there yet.
I'm not keen on marketing words like "superintelligence" but boiling it down that's what in my mind the OP said. These systems are limited in ways that we do not yet fully appreciate. They are not silver bullets for all or maybe even many problems. We need to figure out where they can be deployed for greater benefit.
> Perhaps less than before, but still making very fundamental errors. Anything involving number I'm automatically suspicious
Totally. They are large language models, not math models.
I think the problem is that 'some people' overhype them as universal tools to solve any problem, and to answer any question. But really, LLMs excel in generating pretty regular text.
i tell chatgpt i pay 5 dollars for every excellent response, and i make it keep track of how much i spend per chat session (in addition to my normal course of work)
it does 2 things: 1, tells me how deep i am in the conversation, and 2, when the computation falls apart, i can assume other things in its response will be trash as well. and sometimes, number 3, how well the software is working. ranges from $20 to $35 on average, but a couple days they go deep to $45 (4, 7, 9 responses in a chat session)
today i learned i can knock the computation loose in a session somewhere around the 4th reply or $20 by injecting a random number into my coursework; it latched onto that number instead of computing the $
We're seeing much less of "it's making mistakes" these days.
Is this because it's actually making fewer mistakes, or is it just because most people have used it enough now to know not to bother with anything complex?
Today I carved a front panel for my cyberdeck project out of a composite wood board. I hand-drafted everything, and planned out the wiring (though I won't be onto the soldering phase for a while now). It felt good. I don't think having a 3d printer + AI designing my cyberdeck would feel the same.
> In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it.
It's a token prediction machine. We've already generated most of the ideas for it, and hardly any of them work, for the reasons below.
> Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
No. To get AI to do work you need to make an actual AI, and not a token prediction machine which, however wonderful:
- does not understand what it is it's generating, and approaches generating code the same way it approaches generating haikus
- hallucinates and generates invalid data
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
Indeed. Instead of asking why, you're wildly fantasizing about running out of ideas and pretending you can make this work through other means of communication.
The model is doing the exact same thing when it generates "correct" output as it does when it generates "incorrect" output.
"Hallucination" is a misleading term, cooked up by people who either don't understand what's going on or who want to make it sound like the fundamental problems (models aren't intelligent, can't reason, and attach zero meaning to their input or output) can be solved with enough duct tape.
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
I find this funny because for what I use ChatGPT for - asking programming questions that would otherwise go to Google/StackOverflow - I have a much better time writing queries for ChatGPT than Google, and getting useful results back.
Google will so often return StackOverflow results that are for subtly very different questions, or I'll have to squint hard to figure out how to apply that answer to my problem. When using ChatGPT, I rarely have to think about how other people asked the question.
We have ideas on how to leverage it. But we keep them to ourselves for our products and our companies. AI by itself isn’t a breakthrough product the same way that the iPhone or the web was. It’s a utility for others to enhance their products or their operations. Which is the main reason why so many people believe we’re in an AI bubble. We just don’t see the killer feature that justifies all that spending.
Your use of the word superintelligence is jarring to me. That's not yet a thing and not yet visible on the horizon. That aside, the point I like that you seem to be making is along the lines of: we overestimate the short term impact of new tech, but underestimate the long term impact. There is a lot to be done and a lot of refinement to come.
Yes, this is in my mind where people will find the fabled moat they search for too.
SOTA models are impressive, as is the idea of building AGIs that do everything for us, but in the meantime there are a lot of practical applications of the open source and smaller models that are being missed out on in my opinion.
I also think business is going to struggle to adapt and existing business is at a disadvantage for deploying AI tools, after all, who wants to replace themselves and lose their salary? Its a personal incentive not to leverage AI at the corporate level.
>Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
the problem is really, can it learn "I need to turn this over to a human because it is such an edge case that there will not be an automated solution."
"In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it."
This is the main bottleneck, in my mind. A lot of people are missing from the conversation because they don't understand AI fully. I keep getting glimpses of ideas and possibilities, and chatting through a browser ain't one of them. Once we have more young people trained on this and comfortable with the tech and understanding it, and existing professionals have light bulbs go off in their heads as they try to integrate local LLMs, then real changes are going to hit hard and fast. This is just a lot to digest right now, and the tech is truly exponential, which makes it difficult to ideate. We are still absorbing the productivity boost from chatting.
I tried explaining how this stuff works to product owners and architects and that we can integrate local LLMs into existing products. Everyone shook their head and agreed. When I posted a demo in chat a few weeks later you would have thought the CEO called them on their personal phone and told them to get on this shit. My boss spent the next two weeks day and night working up a demo and presentation for his bosses. It went from zero to 100kph instantly.
Just the fact that I can have something proficient in language trivially accessible to me is really useful. I'm working on something that uses LLMs (language translation), but besides that I think it's brilliant that I can just ask an LLM to summarise my prompt in a way that gets the point across in far fewer tokens. When I forget a word, I can give it a vague description and it'll find it. I'm terrible at writing emails, and I can just ask it to point out all the little formalisms I need to add to make it "proper".
I can benchmark the quality of one LLM's translation by asking another to critique it. It's not infallible, but the ability to chat with a multilingual agent is brilliant.
It's a new tool in the toolbox, one that we haven't had in our seventy years of working on computers, and we have seventy years of catchup to do working out where we can apply them.
It's also just such a radical departure from what computers are "meant" to be good at. They're bad at mathematics, forgetful, imprecise, and yet they're incredible at poetry and soft tasks.
Oh - and they are genuinely useful for studying, too. My A Level Physics contained a lot of multiple choice questions, which were specifically designed to catch people out on incorrect intuitions and had no mark scheme beyond which answer was correct. I could just give gpt-4o a photo of the practice paper and it'd tell me not just the correct answer (which I already knew), but why it was correct, and precisely where my mental model was incorrect.
Sure, I could've asked my teacher, and sometimes I did. But she's busy with twenty other students. If everyone asked for help with every little problem she'd be unable to do anything else. But LLMs have infinite patience, and no guilt for asking stupid questions!
I will be glad to see the day when LLMs will be able to play Minecraft, so I won't have to. Then I can just relax and watch someone else do everything for me without lifting a single finger.
Along the same lines, there is also a phrase about technology: it "is everything that doesn’t work yet," per Danny Hillis. "Electric motors were once technology – they were new and did not work well. As they evolved, they seem to disappear, even though they proliferated and were embedded by the scores into our homes and offices. They work perfectly, silently, unminded, so they no longer register as “technology.”" https://kk.org/thetechnium/everything-that/
On an amusing note, I've read something similar: Everything that works stops being called philosophy. Science and math being the two familiar examples.
Just in case anyone's curious, this is from Bertrand Russell's "A History of Western Philosophy".
> As soon as definite knowledge concerning any subject becomes possible, this subject ceases to be called philosophy, and becomes a separate science.
I'm not actually sure I agree with it, especially in light of less provable schools of science like string theory or some branches of economics, but it's a great idea.
There is a reason for that. People who inquired into the actual functioning of the world used to be called philosophers. That's why so many foundations of mathematics actually come from philosophers. The split happened around the 17th century. Newton still called his monumental work "Natural Philosophy", not "Physics".
This is also true for consciousness or sentience. No matter how surprising the abilities of non-human beings (and computer agents) become, it remains something mysterious that only humans do.
I won't say that things like stoicism or humanism never worked. But they never got to the level of strict logical or experimental verifiability. Physics may be hard science, but the very notion of hard science, hypotheses, demand to replicate, demand to be able to falsify, etc, are all philosophy.
Exactly what I wrote recently: "The "AI effect" is behind some of the current confusion. As John McCarthy, the AI pioneer who coined the term "artificial intelligence," once said: "As soon as it works, no one calls it AI anymore." This is why we often hear that AI is "far from existing." This led to the formulation of Tesler's Theorem: "AI is whatever hasn't been done yet.""
https://www.lycee.ai/blog/there-are-indeed-artificial-intell...
> As soon as it works, no one calls it AI anymore.
So, what are good examples of some things that we used to call AI, which we don't call AI anymore because they work? All the examples that come to my mind (recommendation engines, etc.) do not have any real societal benefits.
Lisp's inventor, John McCarthy, was an AI researcher. (The US government started funding AI research in the 1950s, expecting progress to be much faster than it actually was.)
The original SQL was essentially Prolog restricted to relational algebra and tuple relational calculus. SQL as is happened when a lot of cruft was added to the mathematical core.
This is a pretty common perspective that was introduced to me as “shifting the goalposts” in school. I have always found it a disingenuous argument because it’s applied so narrowly.
Humans are intelligent + humans play go => playing go is intelligent
Humans are intelligent + humans do algebra => doing algebra is intelligent
Meanwhile, humans in general are pretty terrible at exact, instantaneous arithmetic. But we aren’t claiming that computers are intelligent because they’re great at it.
Building a machine that does a narrowly defined task better than a human is an achievement, but it’s not intelligence.
Although, in the case of LLMs, in context learning is the closest thing I’ve seen to breaking free from the single-purpose nature of traditional ML/AI systems. It’s been interesting to watch for the past couple years because I still don’t think they’re “intelligent”, but it’s not just because they’re one trick ponies anymore. (So maybe the goalposts really are shifting?) I can’t quite articulate yet what I think is missing from current AI to bridge the gap.
> Meanwhile, humans in general are pretty terrible at exact, instantaneous arithmetic. But we aren’t claiming that computers are intelligent because they’re great at it.
"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra
> breaking free from the single-purpose nature of traditional ML/AI systems
it is really breaking free? so far LLMs in action seem to have a fairly limited scope -- there are a variety of purposes to which they can be applied but it's all essentially the same underlying task
People innately believe that intelligence isn't an algorithm. When a complex problem presents itself for the first time, people think "oh, this must be so complex that no algorithm can solve it, only AI," and when an algorithmic solution is found, people realise that the problem isn't that complex.
Indeed, if AI were an algorithm, imagine what it would feel like to be one: at every step of your thinking process you are dragged along by the iron hand of the algorithm, you have no agency in decision making, for every step is pre-determined already, and you're left the role of an observer. The algorithm leaves no room for intelligence.
Is that not the human experience? I have no “agency” over the next thought to pop into my head. I “feel” like I can choose where to focus attention, but that too is a predictable outcome arising from the integration of my embryology, memories, and recently reinforced behaviors. “I” am merely an observer of my own mental state.
But that is an uncomfortable idea for most people.
The other option you don't mention is "algorithms can solve it, but they do something different to what humans do". That's what happened with Go and Chess, for example.
I agree with you that people don't consider intelligence as fundamentally algorithmic. But I think the appeal of algorithmic intelligence comes from the fact that a lot of intelligent behaviours (strategic thinking, decomposing a problem into subproblems, planning) are (or at least feel) algorithmic.
it mostly depends on one's definition of an algorithm.
our brain is mostly scatter-gather with fuzzy pattern matching that loops back on itself. which is a nice loop, inputs feeding in, found patterns producing outputs and then it echoes back for some learning.
but of course most of it is noise, filtered out, most of the output is also just routine, most of the learning happens early when there's a big difference between the "echo" and the following inputs.
it's a huge self-referential state machine. of course running it feels normal, because we have an internal model of ourselves; we run it too, and if things are going as usual, it's giving the usual output. (and when the "baseline" is out of whack, we get the psychopathologies.)
Exactly. Machine learning used to be AI, and "AI-driven solutions" were peddled over a decade ago. Then that died down. Now suddenly every product has to once again be "powered by AI" (even if under the hood all you're running is a good ol' SVM).
Interestingly, a big barrier to voice recognition is the same as with AI assistants - they don't understand context, and so they have a difficult time navigating messy inputs that require assumed knowledge and contextual understanding. Which is kinda the baseline for how humans communicate things to each other with words in the first place.
BERT is already not considered an LLM, and the vector embeddings it generates are not called AI. Yet it's also the first general solution for natural-language search anyone has come up with. We call them vector databases. Again, I'd wager this is because they actually work.
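For the curious, the core of a vector-database lookup fits in a few lines. This toy uses word-count vectors instead of real learned embeddings (a system like BERT would supply dense vectors), but the ranking step, cosine similarity, is the same:

```python
# Toy vector search: embed texts, rank by cosine similarity.
# embed() here is just word counts; real systems use learned embeddings.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "how to reset a password",
    "pasta recipe with garlic",
    "password manager setup",
]

def search(query):
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

print(search("garlic pasta"))
```

Swap `embed()` for a real model and add an approximate-nearest-neighbor index, and you have the essence of a vector database.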
Yeah, but honestly we all know LLMs are different than, say, some chess AI.
You can thank social media for dumbing down a human technological milestone in artificial intelligence. I bet if there was social media around when we landed on the moon you’d get a lot of self important people rolling their eyes at the whole thing too.
What about computer characters in video games? They're usually controlled by a system that everyone calls AI, but there's almost never any machine learning involved. What about fitting a line to data points, i.e. linear regression? Definitely ML, but most people don't call it AI.
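And for reference, that line fit is just closed-form least squares, no training loop required, which is probably why nobody feels compelled to call it AI:

```python
# Ordinary least squares for a line y = slope * x + intercept.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)

slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept)  # 2.0 0.0
```

Three lines of arithmetic, yet by any textbook definition it is machine learning: a model fit to data.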
I think we are in the middle of a steep S-curve of technology innovation. It is far from plateauing and there are still a bunch of major innovations that are likely to shift things even further. Interesting time and these companies are riding a wild wave. It is likely some will actually win big, but most will die - similar to previous technology revolutions.
The ones that win will win not just on technology, but on talent retention, business relationships/partnerships, deep funding, marketing, etc. The whole package really. Losing is easy, miss out on one of these for a short period of time and you've easily lost.
There is no major moat, except great execution across all dimensions.
"There is, however, one enormous difference that I didn’t think about: You can’t build a cloud vendor overnight. Azure doesn’t have to worry about a few executives leaving and building a worldwide network of data centers in 18 months."
This isn't true at all. There are like 8 of these companies stood up in the last three or four years, fueled by massive investment of sovereign funds - mostly the Saudi, Dubai, Northern European, etc. oil-derived funds - all spending billions of dollars doing exactly that and getting something done.
The real problem is the ROI on AI spending is.. pretty much zero. The commonly asserted use cases are the following:
Chatbots
Developer tools
RAG/search
Not a one of these is going to generate $10 of additional revenue per dollar spent, nor likely even $2. Optimizing your customer service representatives from 8 conversations at once to an average of 12 or 16 is going to save you a whopping $2 per hour per CSR. It just isn't huge money. And RAG has many, many issues with document permissions that make the current approaches bad for enterprises - where the money is - who as a group haven't spent much of anything to even make basic search work.
"The real problem is the ROI on AI spending is.. pretty much zero. The commonly asserted use cases are the following:
Chatbots Developer tools RAG/search"
I agree with you that ROI on _most_ AI spending is indeed poor, but AI is more than LLM's. Alas, what used to be called AI before the onset of the LLM era is not deemed sexy today, even though it can still make very good ROI when it is the appropriate tool for solving a problem.
While it might be possible for a deep-pocketed organization to spin up a cloud provider overnight, it doesn't mean that people will use it. In general, the switching cost of migrating compute infrastructure from one service to another is much higher than the switching cost of changing the LLM used for inference.
Amazon doesn't need to worry about suddenly losing its entire customer base to Alibaba, Yandex, or Oracle.
> The real problem is the ROI on AI spending is.. pretty much zero.
Companies in user acquisition/growth mode tend to have low internal ROI, but remember both Facebook and Google had the same issue -- then they introduced ads and all was well with their finances. Similar things will happen here.
> And RAG has many, many issues with document permissions
Why can't these providers index all documents and, when answers are generated, self-censor if the reply references documents that the end user doesn't have permission to access? In fact, I'm pretty sure that's how existing RAGaaS providers are handling document/file permissions.
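One common variant is to filter at retrieval time, so restricted chunks never reach the prompt at all (which avoids leaking their content through the model's wording). A toy sketch, with an invented ACL scheme for illustration:

```python
# Permission-aware retrieval: drop any chunk the requesting user can't read
# BEFORE it reaches the LLM prompt. ACLs here are made-up group sets.

DOCS = [
    {"id": "handbook",  "acl": {"everyone"}, "text": "PTO policy: 20 days."},
    {"id": "comp-plan", "acl": {"hr"},       "text": "Exec bonus targets."},
]

def retrieve(query, user_groups):
    hits = [d for d in DOCS if query.lower() in d["text"].lower()]
    return [d for d in hits if d["acl"] & user_groups]  # ACL check

print(retrieve("pto", {"everyone"}))    # handbook chunk only
print(retrieve("bonus", {"everyone"}))  # [] -- filtered out
```

The hard enterprise problems are upstream of this loop: keeping the index's ACL metadata in sync with the source systems as permissions change.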
> I think we are in the middle of a steep S-curve of technology innovation
We are? What innovation?
What do we need innovation for? What present societal problems can tech innovation possibly address? Surely none of the big ones, right? So then is it fit to call technological change - 'innovation'?
I'd agree that LLMs improve upon having to read Wikipedia for topics I'm interested in but would investing billions in Wikipedia and organizing human knowledge have produced a better outcome than relying on a magic LLM? Almost certainly, in my mind.
You see, people are pouring billions into LLMs and not Wikipedia not because it is a better product - but because they foresee a possibility of an abusive monopoly and that really excites them.
That's not innovation - that's more of the same anti-social behaviour that makes any meaningful innovation extremely difficult.
I'm not sure the Wikipedia example is a strong one, as that site has its own serious problems with "abusive monopolies" in its moderator cliques and biases (as with any social platform).
At least with the current big AI players there is the potential for differentiation through competition.
Unless there is some similar competitive initiative among the Wikipedias of the world, single-supplier dominance makes that a difficult way forward.
Nonetheless, Microsoft is firing up a nuclear reactor to power a new data center. My money is in the energy sector right now. Obvious boom coming with solar, nuclear and AI.
The car wasn't a horse that was better, but cars have not changed drastically since they went mainstream.
They've gotten better, more efficient, loaded with tech, but are still roughly 4 seats, 4 doors, 4 wheels, driven by petroleum.
I know that this is a massive oversimplification, but I think we have already seen the "shape" of LLM/gen-AI products, and it's all incremental improvements from here on out, with more specialization.
We are going to have SUVs, sports cars, and single seater cars, not flying cars. AI will be made more fit for purpose for more people to use, but isn't going to replace people outright in their jobs.
Feels like someone might have said this in 1981 about personal computers.
"We've pretty much seen their shape. The IBM PC isn't fundamentally very different from the Apple II. Probably it's just all incremental improvements from here on out."
The big thing missing from both the metaphor in the OP's link and yours is that I just can't fathom any of these companies being able to build a paying subscriber base that can actually cover the outrageous costs of this tech. It feels like a pipe dream.
Putting aside that I fundamentally don't think AGI is in the tech tree of LLM, if you will, that there's no route from the latter to the former: even if there is, even if it takes, I dunno, ten years: I just don't think ChatGPT is a compelling enough product to fund about $70 billion in research costs. And sure, they aren't having to yet thanks to generous input from various commercial and private interests but like... if this is going to be a stable product at some point, analogous to something like AWS, doesn't it have to... actually make some money?
Like sure, I use ChatGPT now. I use the free version on their website and I have some fun with AI Dungeon and occasionally use generative fill in Photoshop. I paid for AI Dungeon (for a while, until I realized their free models actually work better for how I like to play) but am now on the free version. I don't pay for ChatGPT's advanced models, because nothing I've seen in the trial makes it more compelling an offering than the free version. Adobe Firefly came to me free as an addon to my Creative Cloud subscription, but if Adobe increased the price, I'm not going to pay for it. I use it because they effectively gave it to me for free with my existing purchase. And I've played with Copilot a bit too, but honestly found it more annoying than useful and I'm certainly not paying for that either.
And I realize I am not everyone and obviously there are people out there paying for it (I know a few in fact!) but is there enough of those people ready to swipe cards for... fancy autocomplete? Text generation? Like... this stuff is neat. And that's about where I put it for myself: "it's neat." OpenAI supposedly has 3.9 million subscribers right now, and if those people had to foot that 7 billion annual spend to continue development, that's about $150 a month. This product has to get a LOT, LOT better before I personally am ready to drop a tenth of that, let alone that much.
And I realize this is all back-of-napkin math here but still: the expenses of these AI companies seem so completely out of step with anything approaching an actual paying user base, so hilariously outstripping even the investment they're getting from other established tech companies, that it makes me wonder how this is ever, ever going to make so much as a dime for all these investors.
In contrast, I never had a similar question about cars, or AWS. The pitch of AWS makes perfect sense: you get a server to use on the internet for whatever purpose, and you don't have to build the thing, you don't need to handle HVAC or space, you don't need a last-mile internet connection to maintain, and if you need more compute or storage or whatever, you move a slider instead of having to pop a case open and install a new hard drive. That's absolutely a win and people will pay for it. Who's paying for AI and why?
The car was very much a horse that was better, though. It replaced the horse (or other draught animals) and that's basically it. I'm not even sure it has brought fundamentally new and different use cases.
Kind of feels like the ride-sharing early days. Lots of capital being plowed into a handful of companies to grab market share. Economics don't really make sense in the short term because the vast majority of cash flows are still far in the future (Zero to One).
In the end the best funded company, Uber, is now the most valuable (~$150B). Lyft, the second best funded, is 30x smaller. Are there any other serious ride sharing companies left? None I know of, at least in the US (international scene could be different).
I don't know how the AI rush will work out, but I'd bet there will be some winners and that the best capitalized will have a strong advantage. Big difference this time is that established tech giants are in the race, so I don't know if there will be a startup or Google at the top of the heap.
I also think that there could be more opportunities for differentiation in this market. Internet models will only get you so far and proprietary data will become more important potentially leading to knowledge/capability specialization by provider. We already see some differentiation based on coding, math, creativity, context length, tool use, etc.
Uber is not really a tech company though - its moat is not technology but market domination. If it, along with all of its competitors were to disappear tomorrow, the power vacuum would be filled in very short order, as the core technology is not very hard to master.
It's a fundamentally different beast from AI companies.
How is it not a tech company? They're literally trying to approximate TSP in the way that makes them money. In addition, they're constantly optimizing for surge pricing to maximize ROI. What kind of problems do you think those are?
Is Uber profitable already or are they waiting for another order of magnitude increase in scale before they bother with that?
Amazon is the poster child of that mentality. It poured more than it earned into growth for more than 20 years, got a monopoly on retail, and still isn't the most profitable retail company around.
Uber the company was profitable last year, for the first time[1].
But I am doubtful that the larger enterprise that is Uber (including all the drivers and their expenses and vehicle depreciation, etc) was profitable. I haven't seen that analysis.
This article, and all the articles like it, are missing most of the puzzle.
Models don’t just compete on capability. Over the last year we’ve seen models and vendors differentiate along a number of lines in addition to capability:
- Safety
- UX
- Multi-modality
- Reliability
- Embeddability
And much more. Customers care about capability, but that’s like saying car owners care about horsepower — it’s a part of the choice but not the only piece.
One somewhat obsessive customer here: I pay for and use Claude, ChatGPT, Gemini, Perplexity, and one or two others.
The UX differences among the models are indeed becoming clearer and more important. Claude’s Artifacts and Projects are really handy as is ChatGPT’s Advanced Voice mode. Perplexity is great when I need a summary of recent events. Google isn’t charging for it yet, but NotebookLM is very useful in its own way as well.
When I test the underlying models directly, it’s hard for me to be sure which is better for my purposes. But those add-on features make a clear differentiation between the providers, and I can easily see consumers choosing one or another based on them.
I haven’t been following recent developments in the companies’ APIs, but I imagine that they are trying to differentiate themselves there as well.
To me, the vast majority of "consumers" as in B2C only care about price, specifically free. Pro and enterprise customers may be more focused on the capabilities you listed, but the B2C crowd is vastly in the free tier only space when it comes to GenAI.
This is like when VCs were funding all kinds of ride share, bike share, food delivery, cannabis delivery, and burning money so everyone gets subsidized stuff while the market figures out wtf is going on.
Yep, you will probably lose. The VCs aren't out there to advance the technology. They are there to lay down bets on who's going to be the winner. "Winner" has little to do with quality, and rides much more on being the one that just happens to resonate with people.
The ones without money will usually lose because they get less opportunity to get in front of eyeballs. Occasionally they manage it anyway, because despite the myth that the VCs love to tell, they aren't really great at finding and promulgating the best tech.
> when VCs were funding all kinds of ride share, bike share, food delivery, cannabis delivery, and burning money so everyone gets subsidized stuff while the market figures out wtf is going on
I’m reminded of slime molds solving mazes [1]. In essence, VC allows entrepreneurs to explore the solution space aggressively. Once solutions are found, resources are trimmed.
I'm already keeping an eye on what NVidia gets into next... because that will inevitably be the "Next big thing". This is the third(ish) round of this pattern that I can recall, I'm probably wrong about the exact count, but NVidia is really good at figuring out how to be powering the "Next big thing". So alternatively... I should probably invest in the utilities powering whatever Datacenters are using the powerhungry monsters at the center of it all.
One thing I'm not clear on is how much of this is cause and how much effect: that is, does NVidia cheerleading for something make it more popular with the tech press and then everyone else too? There are definitely large parts of the tech press that serve more as stenographers than as skeptical reporters, and so I'm not sure how much is NVidia picking the right next big thing and how much is NVidia announcing the next big thing to the rest of us?
That's exactly the short term thinking they're hoping they can use to distract.
Tech companies purchased television away from legacy media companies and added (1) unskippable ads, (2) surveillance, (3) censorship and revocation of media you don't physically own, and now they're testing (4) ads while shows are paused.
Where I live the ridesharing/delivering startups didn't bring goodies, they just made everything worse.
They destroyed the Taxi industry, I used to be able to just walk out to the taxi rank and get in the first taxi, but not anymore. Now I have to organize it on an app or with a phone call to a robot, then wait for the car to arrive, and finally I have to find the car among all the others that other people called.
Food delivery used to be done by the restaurant's own delivery staff; it was fast, reliable, and often free if ordering for 2+ people. Now it always costs extra, and there are even more fees if I want the food while it's still hot. Zero care is taken with the delivery: food/drinks are not kept upright and can be a total mess on arrival. Sometimes it's escaped the container and is just in the plastic bag. I have ended up preferring to pick up food myself over getting it delivered, even when I have a migraine; it's just gone to shit.
I assume you are talking about airports. Guess what, they still exist in many places. And on the other hand, in the US, outside of a few big cities, the "normal" taxi experience is that you call a number and maybe a taxi shows up in half an hour. With Uber, that becomes 10 minutes or less, with live map updates. Give me that and I'll be happy to forget about Uber.
For where I live (Asia), I disagree with both of these examples.
Getting a taxi was awful before ride-sharing apps. You'd have to walk to a taxi stop, or wait on the side of the road and hope you could hail one. Once the ride-sharing apps came in, suddenly getting a ride became a lot simpler. Our taxi companies are still alive, though they have their own apps now -- something that wouldn't have happened without competition -- and they also work together with the ride-hailing companies as a provider. You could still hail taxis or get them from stops too, though that isn't recommended given that they might try to run the meter by taking a longer route.
For food delivery, before the apps, most places didn't deliver food. Nowadays, more places deliver. Even if a place already had their own delivery drivers, they didn't get rid of them. We get a choice, to use the app or to use the restaurant's own delivery. Usually the app is better for smaller meals since it has a lower minimum order amount, but the restaurant provides faster delivery for bigger orders.
Taxis were the greatest example of regulatory capture. The post-event/airport Uber pickup situation is stupid and has obvious fixes, but, again, that's the taxicab regulatory capture at work: Uber has to thread a needle in order not to be a taxi, for them to legally operate. If we could start with a clean slate and make a working system, that would be great, but we can't, because of the taxicab regulatory commission.
I can now summon a cab from the comfort of the phone I'm holding, and know that they'll accept my credit card. I know the price before I get in and I know the route they should take. I'm not going to get taken for an unnecessary scenic tourist surcharge detour.
People don't like feeling they got cheated, and pre-Uber, taxis did that all the time.
Add the abuse of gig workers, expansion of the toxic tipping culture, increase in job count but reduction in pay, concentration of wealth in fewer hands.
These rideshare and delivery companies are disgusting and terrible.
It seems very difficult to build a moat around a product when the product is supposed to be a generally capable tool and the input is English text. The more truly generally intelligent these models get the more interchangeable they become. It's too easy to swap one out for another.
Humans are the ultimate generally intelligent agents available on this planet. Even though most of them (us) are replaceable for mundane tasks, quite some are unique enough so that people seek their particular services and no one else's. And this is among the pool of about eight billion such agents.
> Even though most of them (us) are replaceable for mundane tasks, quite some are unique enough so that people seek their particular services and no one else's.
Very few people manage that - indeed I can't think of anyone. Even movie stars get replaced with other movie stars if they try to charge too much. Certainly everyone in the tech industry (including the CEOs, the VCs, the investors etc.) has a viable substitute.
The moat is/will be the virtuous cycle of feeding user usage back into the model like it is for Google. That historically has been a powerful tool and it's something thats nearly impossible to get as a newcomer to the marketplace.
I see 2 paths:
- Consumers - the Google way: search and advertise to consumers
- Businesses - the AWS way: attract businesses to use your API and lock them in
The first is fickle. Will OpenAI become the door to the Internet? You'll need people to stop using Google Search and rely on ChatGPT for that to happen. Short term you can charge a subscription, but long term it will most likely become a commodity monetized with advertising.
The second is tangible. My company is plugged directly to the OpenAI API. We build on it. Still very early and not so robust. But getting better and cheaper and faster over time. Active development. No reason to switch to something else as long as OpenAI leads the pack.
20 years ago people asked that exact question. E-Commerce emerged. People knew the physical process of buying things would move online. Took some time. Sure, more things emerged but monetizing the Internet still remains about selling you something.
My guess would be using "AI" to increase/enhance sales with your existing processes. Pay for this product, get 20% increased sales, ad revenue, yada yada.
Sure it does. Ask any common mortal about AI and they'll mention ChatGPT - not Claude, Gemini or whatever else. They might not even know OpenAI. But they do know ChatGPT.
Has it become a verb yet? Waiting for people to replace "I googled how to..." with "I chatgpted how to...".
There would need to be significant capabilities that openai doesn't have or wouldn't be built on a short-ish timeline to have the enterprise switch. There's tons of bureaucratic work going on behind the scenes to approve a new vendor.
We talk about scaling laws, superintelligence, AGI etc. But there is another threshold - the ability for humans to leverage super-intelligence. It's just incredibly hard to innovate on products that fully leverage superintelligence.
At some point, AI needs to connect with the real world to deliver economically valuable output. The ratelimiting step is there. Not smarter models.
In my mind, already with GPT-4, we're not generating ideas fast enough on how best to leverage it.
Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
We're seeing much less of "it's making mistakes" these days.
If we have open-source models that match up to GPT-4 on AWS / Azure etc, not much point to go with players like OpenAI / Anthropic who may have even smarter models. We can't even use the dumber models fully.
Your paycheck depends on people believing the hype. Therefore, anything you say about "superintelligence" (LOL) is pretty suspect.
> Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
So, what, you're going to build a model to instruct the model? And how do we instruct that model?
This is such a transparent scam, I'm embarrassed on behalf of our species.
Some examples to illustrate the point: supply chain risks and an ever-increasing number of dependencies (look at your average React project, though this applies to most stacks), overly abstracted frameworks (how many CPU cycles Spring Boot and others burn and how many hoops you have to jump through to get things done), patterns that mess up the DB's ability to optimize queries sometimes (EAV, OTLT, trying to create polymorphic foreign keys), inefficient data fetching (sometimes ORMs, sometimes N+1), bad security practices (committed secrets, anyone? bad usage of OAuth2 or OIDC?), overly complex tooling, especially the likes of Kubernetes when you have a DevOps team of one part-time dev, overly complex application architectures where you have more services than developers (not even teams). That's before you even get into the utter mess of long-term projects that have been touched by dozens of developers over the years and the whole sector sometimes feeling like the wild west, as opposed to "real engineering".
That's why articles like this ring true: http://www.stilldrinking.org/programming-sucks
However, the difference here is that I wouldn't overwhelm anyone who might give me money with rants about this stuff and would navigate around those issues and risks as best I can, to ship something useful at the end of the day. Same with having constructive discussions about any of those aspects in a circle of technical individuals, on how to make things better.
Calling the whole concept a "scam" doesn't do anyone any good, when I already derive value from the LLMs, as do many others. Look at https://www.cursor.com/ for example and consider where we might be in 10-20 years. Not AGI, but maybe good auto-complete, codegen and reasoning about entire codebases, even if they're hundreds of thousands of lines long. Tooling that would make anyone using it more productive than those who don't. Unless the funding dries up and the status quo is restored.
Ehm yes. That's because it actually doesn't work as well as the hype suggests, not because it's too "high bandwidth".
Which tells me the problem may be fundamental, not a technical one. It's not just a matter of needing "more intelligence". I don't question the intelligence or skill of the people on the outsourced team I was working with. The problem was simple communication. They didn't really know or understand our business and its goals well enough to anticipate all sorts of little things, and the lack of constant social interaction of the type you typically get when everybody's a direct coworker meant we couldn't build that mind-meld over time, either. So we had to pick up the slack with massive over-specification.
Remember the old bit about the media -- the stories are always 100% infallible, except strangely in YOUR personal field of expertise.
I suspect it's something similar with AI products.
People test them with toy problems -- "Hey ChatGPT, what's the square root of 36", and then with something close to their core knowledge.
It might learn to solve a lot of the toy problems, but plenty of us are still seeing a lot of hallucinations in the "core knowledge" questions. But we see people then taking that product-- that they know isn't good in at least one vertical-- and trying to apply it to other contexts, where they may be less qualified to validate if the answer is right.
For me, the number of times where it's led me down a hallucinated, impossible, or thoroughly invalid rabbit hole have been relatively minimal when compared against the number of times when it has significantly helped. I really do think the key is in how you use them, for what types of problems/domains, and having an approach that maximizes your ability to catch issues early.
Gell-Mann amnesia:
https://en.wikipedia.org/wiki/Michael_Crichton#GellMannAmnes...
Perhaps less than before, but still making very fundamental errors. With anything involving numbers, I'm automatically suspicious. Pretty frequently I'd get different answers for what is, to a human, the same question.
e.g. ChatGPT will give an effective tax rate of n for some income amount. Then when asked to break down the calculation will come up with an effective tax rate of m instead. When asked how much tax is owed on that income will come up with a different number such that the effective rate is not n or m.
Until this is addressed to a sufficient degree, it seems difficult to apply to anything that involves numbers and can't be quickly verified by a human.
But try this approach instead: have it generate Python code, with print statements before every bit of math it performs. It will write pretty good code, which you then execute to generate the actual answer.
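For the tax example, the kind of code one might ask the model to emit could look like the following (the brackets here are made up for illustration). You run it yourself, so the arithmetic is done by the interpreter rather than by token prediction, and each printed step can be audited:

```python
# Progressive tax calculation with every step printed.
# NOTE: these brackets are hypothetical, not any real tax schedule.
BRACKETS = [(10_000, 0.10), (30_000, 0.20), (float("inf"), 0.30)]


def compute_tax(income):
    tax, lower = 0.0, 0
    for upper, rate in BRACKETS:
        taxable = max(0, min(income, upper) - lower)
        print(f"{taxable} taxed at {rate:.0%} -> {taxable * rate}")
        tax += taxable * rate
        lower = upper
    print(f"effective rate: {tax / income:.1%}")
    return tax


compute_tax(50_000)  # 11000.0, effective rate 22.0%
```

Because the brackets, steps, and totals are all visible, the inconsistency the parent describes (rate n vs. rate m) becomes immediately detectable.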
Simpler example: paste in a paragraph of text, ask it to count the number of words. The answer will be incorrect most of the time.
Instead, ask it to output each word in the text in a numbered list and then output the word count. It will be correct almost always.
My anecdotal learning from this:
LLMs are pretty human-like in their mental abilities. I wouldn't be able to simply look at some text and give you an accurate word count. I would point my finger / cursor to every word and count up.
The solutions above are basically giving LLMs some additional techniques or tools, very similar to how a human may use a calculator, or count words.
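The "tools" idea above can be sketched in a few lines: register a deterministic function and have the model request it by name instead of doing the counting itself. The dispatch here is hand-rolled for illustration; real systems would wire this through a provider's function-calling API.

```python
# Minimal tool registry: the model emits a tool name plus arguments,
# and we execute the function exactly -- the LLM equivalent of a
# human reaching for a calculator.
def count_words(text: str) -> int:
    return len(text.split())


TOOLS = {"count_words": count_words}


def run_tool(name, **kwargs):
    # In a real system, `name` and `kwargs` come from the model's
    # structured tool-call output; here we call it directly.
    return TOOLS[name](**kwargs)


print(run_tool("count_words", text="the quick brown fox"))  # 4
```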
In the products we've built, there is an AI feature that generates aggregations of spreadsheet data. We have a dual unittest & aggregator loop to generate correct values.
The first step is to generate some unittests. And in order to generate correct numerical data for unittests, we ask it to write some code with math expressions first. We interpret the expressions, and paste it back into the unittest generator - which then writes the unittests with the correct inputs / outputs.
The aggregation generator then generates code until the generated unittests pass completely. Then we have the code for the aggregator function that we can run against the spreadsheet.
Takes a couple of minutes, but pretty bulletproof and also generalizable to other complex math calculations.
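The loop described above might look roughly like this, with stand-in functions where the real model calls would go (the stub "model" deliberately produces a buggy first draft to show the retry behavior):

```python
# Hypothetical sketch of the dual unittest & aggregator loop.
# The llm_* functions are stand-ins for real model calls.
def llm_generate_tests():
    # In the real pipeline the model writes math expressions first; those
    # are evaluated and pasted back so the test data is numerically correct.
    return [((1, 2, 3), 6), ((10, -4), 6)]


def llm_generate_aggregator(attempt):
    # Stand-in model: the first draft has an off-by-one bug, the retry works.
    if attempt == 0:
        return lambda xs: sum(xs) + 1
    return lambda xs: sum(xs)


def build_aggregator(max_attempts=5):
    tests = llm_generate_tests()
    for attempt in range(max_attempts):
        agg = llm_generate_aggregator(attempt)
        if all(agg(inp) == expected for inp, expected in tests):
            return agg  # every generated unittest passes
    raise RuntimeError("no passing aggregator within the attempt budget")


agg = build_aggregator()
print(agg((1, 2, 3)))  # 6
```

The point of the structure is that correctness is checked by executing tests, not by trusting any single generation.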
Yes.
Suppose someone developed a way to get a reliable confidence metric out of an LLM. Given that, much more useful systems can be built.
Only high-confidence outputs can be used to initiate action. For low-confidence outputs, chain-of-reasoning tactics can be tried. Ask a simpler question. Ask the LLM to divide the question into sub-questions. Ask the LLM what information it needs to answer the question, and try to get that info from a search engine. Most of the strategies humans and organizations use when they don't know something will work for LLMs. The goal is to get an all-high-confidence chain of reasoning.
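Gating on such a metric would be the easy part; producing a trustworthy score is what's hard. A sketch of the dispatch, with a canned stand-in for the model call (questions and scores are purely illustrative):

```python
# Confidence-gated dispatch, assuming we already had a reliable
# confidence score -- the unsolved part described above.
def answer(question):
    # Hypothetical model call returning (text, confidence in [0, 1]).
    canned = {
        "capital of France?": ("Paris", 0.97),
        "implications of Q3 filing?": ("hard to say", 0.35),
    }
    return canned.get(question, ("unknown", 0.0))


def handle(question, threshold=0.8):
    text, conf = answer(question)
    if conf >= threshold:
        return text  # high confidence: safe to act on directly
    # Low confidence: fall back to decomposition, retrieval, or escalation.
    return "escalate: decompose the question or fetch supporting info"


print(handle("capital of France?"))  # Paris
```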
If only they knew when they didn't know something.
There's research on this.[4] No really good results yet, but some progress. Biggest unsolved problem in computing today.
[4] https://hungleai.substack.com/p/uncertainty-confidence-and-h...
I'm not keen on marketing words like "superintelligence" but boiling it down that's what in my mind the OP said. These systems are limited in ways that we do not yet fully appreciate. They are not silver bullets for all or maybe even many problems. We need to figure out where they can be deployed for greater benefit.
Totally. They are large language models, not math models.
I think the problem is that 'some people' overhype them as universal tools to solve any problem, and to answer any question. But really, LLMs excel in generating pretty regular text.
It does two things: (1) it tells me how deep I am in the conversation, and (2) when the computation falls apart, I can assume other things in its response will be trash as well. And sometimes a third: how well the software is working. It ranges from $20 to $35 on average, but a couple of days it goes as deep as $45 (4, 7, or 9 responses in a chat session).
Today I learned I can knock the computation loose in a session somewhere around the 4th reply, or $20, by injecting a random number into my coursework: it was latching onto that number instead of computing the dollar total.
Is this because it's actually making fewer mistakes, or is it just because most people have used it enough now to know not to bother with anything complex?
Once we have actual super intelligence there is no need for humans to innovate anymore. It is by definition better than us anyway.
I guess you could still have artisanal innovation
It's a token prediction machine. We've already generated most of the ideas for it, and hardly any of them work because see below
> Getting AI to do work involves getting AI to understand what needs to be done from highly bandwidth constrained humans using mouse / keyboard / voice to communicate.
No. Getting AI to work, you need to make an AI, and not a token prediction machine which, however wonderful:
- does not understand what it is it's generating, and approaches generating code the same way it approaches generating haikus
- hallucinates and generates invalid data
> Anyone using a chatbot already has felt the frustration of "it doesn't get what I want". And also "I have to explain so much that I might as well just do it myself"
Indeed. Instead of asking why, you're wildly fantasizing about running out of ideas and pretending you can make this work through other means of communication.
The model is doing the exact same thing when it generates "correct" output as it does when it generates "incorrect" output.
"Hallucination" is a misleading term, cooked up by people who either don't understand what's going on or who want to make it sound like the fundamental problems (models aren't intelligent, can't reason, and attach zero meaning to their input or output) can be solved with enough duct tape.
I find this funny because for what I use ChatGPT for - asking programming questions that would otherwise go to Google/StackOverflow - I have a much better time writing queries for ChatGPT than Google, and getting useful results back.
Google will so often return StackOverflow results that are for subtly very different questions, or I'll have to squint hard to figure out how to apply that answer to my problem. When using ChatGPT, I rarely have to think about how other people asked the question.
This is just another way of saying this technology, like Blockchain before it, is a solution in search of a problem.
SOTA models are impressive, as is the idea of building AGIs that do everything for us, but in the meantime there are a lot of practical applications of the open source and smaller models that are being missed out on in my opinion.
I also think business is going to struggle to adapt, and existing businesses are at a disadvantage for deploying AI tools; after all, who wants to replace themselves and lose their salary? It's a personal incentive not to leverage AI at the corporate level.
The problem is really: can it learn "I need to turn this over to a human because it is such an edge case that there will not be an automated solution"?
Makes sense.
This is the main bottleneck, in my mind. A lot of people are missing from the conversation because they don't understand AI fully. I keep getting glimpses of ideas and possibilities, and chatting through a browser ain't one of them. Once we have more young people trained on this and comfortable with the tech and understanding it, and existing professionals have light bulbs go off in their heads as they try to integrate local LLMs, then real changes are going to hit hard and fast. This is just a lot to digest right now, and the tech is truly exponential, which makes it difficult to ideate right now. We are still absorbing the productivity boost from chatting.
I tried explaining how this stuff works to product owners and architects and that we can integrate local LLMs into existing products. Everyone shook their head and agreed. When I posted a demo in chat a few weeks later you would have thought the CEO called them on their personal phone and told them to get on this shit. My boss spent the next two weeks day and night working up a demo and presentation for his bosses. It went from zero to 100kph instantly.
I can benchmark the quality of one LLM's translation by asking another to critique it. It's not infallible, but the ability to chat with a multilingual agent is brilliant.
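That critique loop can be sketched with stand-ins where the two model calls would go; the canned strings and 1-5 scoring rubric below are purely illustrative, not any real API.

```python
# Cross-model critique sketch: model A translates, model B judges.
# Both functions are stubs standing in for real LLM calls,
# ideally to different providers to avoid shared blind spots.
def translate(text, target_lang):
    # Stand-in for model A doing the translation.
    return {"Hola, mundo": "Hello, world"}.get(text, "?")


def critique(source, translation):
    # Stand-in for model B acting as judge: a 1-5 score plus a short note.
    score = 5 if translation == "Hello, world" else 2
    note = "faithful and natural" if score == 5 else "meaning drifted; re-check"
    return {"score": score, "note": note}


t = translate("Hola, mundo", "en")
print(critique("Hola, mundo", t)["score"])  # 5
```

It's not infallible, as the parent says, but a second model catches a surprising fraction of errors the first one makes.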
It's a new tool in the toolbox, one that we haven't had in our seventy years of working on computers, and we have seventy years of catchup to do working out where we can apply them.
It's also just such a radical departure from what computers are "meant" to be good at. They're bad at mathematics, forgetful, imprecise, and yet they're incredible at poetry and soft tasks.
Oh - and they are genuinely useful for studying, too. My A Level Physics contained a lot of multiple choice questions, which were specifically designed to catch people out on incorrect intuitions and had no mark scheme beyond which answer was correct. I could just give gpt-4o a photo of the practice paper and it'd tell me not just the correct answer (which I already knew), but why it was correct, and precisely where my mental model was incorrect.
Sure, I could've asked my teacher, and sometimes I did. But she's busy with twenty other students. If everyone asked for help with every little problem she'd be unable to do anything else. But LLMs have infinite patience, and no guilt for asking stupid questions!
Won't that be fun.
We have a new god (superintelligence) but only the special can see it because it's too intelligent for people to interact with it.
It's so advanced all the common ways humans communicate "using mouse / keyboard / voice" don't work
The reason we've never seen it help us is we need more ideas (prayers) first.
NPC brains are really hard to understand, I will say that, and they use a lot of electricity.
Logic programming? AI until SQL came out. Now it's not AI.
OCR, computer algebra systems, voice recognition, checkers, machine translation, go, natural language search.
All solved, all not AI any more yet all were AI before they got solved by AI researchers.
There's even a name for it: https://en.m.wikipedia.org/wiki/AI_effect
> As soon as definite knowledge concerning any subject becomes possible, this subject ceases to be called philosophy, and becomes a separate science.
I'm not actually sure I agree with it, especially in light of less provable schools of science like string theory or some branches of economics, but it's a great idea.
So, what are good examples of some things that we used to call AI, which we don't call AI anymore because they work? All the examples that come to my mind (recommendation engines, etc.) do not have any real societal benefits.
A hundred years ago computers were 'electronic brains'.
It goes dormant after losing its edge and then re-emerges when some research project gains traction again.
logic programming is not directly linked to SQL, and has its own AI term now: https://en.wikipedia.org/wiki/GOFAI.
Humans are intelligent + humans play go => playing go is intelligent
Humans are intelligent + humans do algebra => doing algebra is intelligent
Meanwhile, humans in general are pretty terrible at exact, instantaneous arithmetic. But we aren’t claiming that computers are intelligent because they’re great at it.
Building a machine that does a narrowly defined task better than a human is an achievement, but it’s not intelligence.
Although, in the case of LLMs, in-context learning is the closest thing I've seen to breaking free from the single-purpose nature of traditional ML/AI systems. It's been interesting to watch for the past couple of years, because I still don't think they're "intelligent", but it's not just because they're one-trick ponies anymore. (So maybe the goalposts really are shifting?) I can't quite articulate yet what I think is missing from current AI to bridge the gap.
"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." - Edsger Dijkstra
is it really breaking free? so far LLMs in action seem to have a fairly limited scope -- there are a variety of purposes to which they can be applied, but it's all essentially the same underlying task
Indeed, if AI were an algorithm, imagine what it would feel like to be one: at every step of your thinking process you are dragged by the iron hand of the algorithm, you have no agency in decision making, for every step is pre-determined already, and you're left the role of an observer. The algorithm leaves no room for intelligence.
But that is an uncomfortable idea for most people.
I agree with you that people don't consider intelligence as fundamentally algorithmic. But I think the appeal of algorithmic intelligence comes from the fact that a lot of intelligent behaviours (strategic thinking, decomposing a problem into subproblems, planning) are (or at least feel) algorithmic.
our brain is mostly scatter-gather with fuzzy pattern matching that loops back on itself. which is a nice loop, inputs feeding in, found patterns producing outputs and then it echoes back for some learning.
but of course most of it is noise, filtered out, most of the output is also just routine, most of the learning happens early when there's a big difference between the "echo" and the following inputs.
it's a huge self-referential state-machine. of course running it feels normal, because we have an internal model of ourselves, we ran it too, and if things are going as usual, it's giving the usual output. (and when the "baseline" is out of whack then even we have the psychopathologies.)
https://www.youtube.com/watch?v=-rxXoiQyQVc
But it may be a while.
[1] https://taylor.town/synthetic-intelligence
I don't doubt that there were marketing people wanting to attach an AI label to them, but that's just marketing BS.
You can thank social media for dumbing down a human technological milestone in artificial intelligence. I bet if there had been social media around when we landed on the moon, you'd get a lot of self-important people rolling their eyes at the whole thing too.
AI = machine learning
Yes, that makes a convolutional net trained on recognizing digits an AI.
The ones that win will win not just on technology, but on talent retention, business relationships/partnerships, deep funding, marketing, etc. The whole package, really. Losing is easy: miss out on one of these for a short period of time and you've easily lost.
There is no major moat, except great execution across all dimensions.
"There is, however, one enormous difference that I didn’t think about: You can’t build a cloud vendor overnight. Azure doesn’t have to worry about a few executives leaving and building a worldwide network of data centers in 18 months."
This isn't true at all. There are something like 8 of these companies stood up in the last three or four years, fueled by massive investment of sovereign funds -- mostly the Saudi, Dubai, northern European, etc. oil-derived funds -- all spending billions of dollars doing exactly that and getting something done.
The real problem is that the ROI on AI spending is... pretty much zero. The commonly asserted use cases are the following:
- Chatbots
- Developer tools
- RAG/search
Not one of these is going to generate $10 of additional revenue per dollar spent, nor likely even $2. Optimizing your customer service representatives from 8 conversations at once to an average of 12 or 16 is going to save you a whopping $2 per hour per CSR. It just isn't huge money. And RAG has many, many issues with document permissions that make the current approaches bad for enterprises -- where the money is -- who as a group haven't spent much of anything to even make basic search work.
I agree with you that the ROI on _most_ AI spending is indeed poor, but AI is more than LLMs. Alas, what used to be called AI before the onset of the LLM era is not deemed sexy today, even though it can still deliver very good ROI when it is the appropriate tool for solving a problem.
Amazon doesn't need to worry about suddenly losing its entire customer base to Alibaba, Yandex, or Oracle.
Companies in user acquisition/growth mode tend to have low internal ROI, but remember both Facebook and Google had the same issue -- then they introduced ads and all was well with their finances. Similar things will happen here.
Why can't these providers index all documents and, when generating answers, self-censor any reply that references documents the end user doesn't have permission to access? In fact, I'm pretty sure that's how existing RAGaaS providers are handling document/file permissions.
We are? What innovation?
What do we need innovation for? What present societal problems can tech innovation possibly address? Surely none of the big ones, right? So then is it fit to call technological change - 'innovation'?
I'd agree that LLMs improve upon having to read Wikipedia for topics I'm interested in but would investing billions in Wikipedia and organizing human knowledge have produced a better outcome than relying on a magic LLM? Almost certainly, in my mind.
You see, people are pouring billions into LLMs and not Wikipedia not because it is a better product - but because they foresee a possibility of an abusive monopoly and that really excites them.
That's not innovation - that's more of the same anti-social behaviour that makes any meaningful innovation extremely difficult.
At least with the current big AI players there is the potential for differentiation through competition.
Unless there is some similar initiative with the Wikipedias, the problem of single supplier dominance is a difficult one to see as the way forward.
They've gotten better, more efficient, loaded with tech, but are still roughly 4 seats, 4 doors, 4 wheels, driven by petroleum.
I know that this is a massive oversimplification, but I think we have seen the "shape" of LLMs/Gen AI/AI products already and it's all incremental improvements from here on out with more specialization.
We are going to have SUVs, sports cars, and single seater cars, not flying cars. AI will be made more fit for purpose for more people to use, but isn't going to replace people outright in their jobs.
"We've pretty much seen their shape. The IBM PC isn't fundamentally very different from the Apple II. Probably it's just all incremental improvements from here on out."
Putting aside that I fundamentally don't think AGI is in the tech tree of LLM, if you will, that there's no route from the latter to the former: even if there is, even if it takes, I dunno, ten years: I just don't think ChatGPT is a compelling enough product to fund about $70 billion in research costs. And sure, they aren't having to yet thanks to generous input from various commercial and private interests but like... if this is going to be a stable product at some point, analogous to something like AWS, doesn't it have to... actually make some money?
Like sure, I use ChatGPT now. I use the free version on their website and I have some fun with AI Dungeon and occasionally use generative fill in Photoshop. I paid for AI Dungeon (for a while, until I realized their free models actually work better for how I like to play) but am now on the free version. I don't pay for ChatGPT's advanced models, because nothing I've seen in the trial makes it more compelling an offering than the free version. Adobe Firefly came to me free as an add-on to my Creative Cloud subscription, but like, if Adobe increased the price, I'm not going to pay for it. I use it because they effectively gave it to me for free with my existing purchase. And I've played with Copilot a bit too, but honestly found it more annoying than useful, and I'm certainly not paying for that either.
And I realize I am not everyone and obviously there are people out there paying for it (I know a few in fact!) but is there enough of those people ready to swipe cards for... fancy autocomplete? Text generation? Like... this stuff is neat. And that's about where I put it for myself: "it's neat." OpenAI supposedly has 3.9 million subscribers right now, and if those people had to foot that 7 billion annual spend to continue development, that's about $150 a month. This product has to get a LOT, LOT better before I personally am ready to drop a tenth of that, let alone that much.
And I realize this is all back-of-napkin math here but still: the expenses of these AI companies seem so completely out of step with anything approaching an actual paying user base, so hilariously outstripping even the investment they're getting from other established tech companies, that it makes me wonder how this is ever, ever going to make so much as a dime for all these investors.
In contrast, I never had a similar question about cars, or AWS. The pitch of AWS makes perfect sense: you get a server to use on the internet for whatever purpose, and you don't have to build the thing, you don't need to handle HVAC or space, you don't need a last-mile internet connection to maintain, and if you need more compute or storage or whatever, you move a slider instead of having to pop a case open and install a new hard drive. That's absolutely a win and people will pay for it. Who's paying for AI and why?
The fact that human intelligence exists means that the idea of human level intelligence is not a pipe dream.
The question is whether or not the basic underlying technology of the LLM can achieve that level of intelligence.
In the end the best funded company, Uber, is now the most valuable (~$150B). Lyft, the second best funded, is 30x smaller. Are there any other serious ride sharing companies left? None I know of, at least in the US (international scene could be different).
I don't know how the AI rush will work out, but I'd bet there will be some winners and that the best capitalized will have a strong advantage. Big difference this time is that established tech giants are in the race, so I don't know if there will be a startup or Google at the top of the heap.
I also think that there could be more opportunities for differentiation in this market. Internet models will only get you so far and proprietary data will become more important potentially leading to knowledge/capability specialization by provider. We already see some differentiation based on coding, math, creativity, context length, tool use, etc.
It's a fundamentally different beast from AI companies.
Amazon is the poster child of that mentality. It poured more than it earned into growth for more than 20 years, got a monopoly on retail, and still isn't the most profitable retail company around.
But I am doubtful that the larger enterprise that is Uber (including all the drivers and their expenses and vehicle depreciation, etc) was profitable. I haven't seen that analysis.
[1] https://www.theverge.com/2024/2/8/24065999/uber-earnings-pro...
AI offers WAAAAYYYYY more money in the future.
Models don’t just compete on capability. Over the last year we’ve seen models and vendors differentiate along a number of lines in addition to capability:
- Safety
- UX
- Multi-modality
- Reliability
- Embeddability
And much more. Customers care about capability, but that’s like saying car owners care about horsepower — it’s a part of the choice but not the only piece.
The UX differences among the models are indeed becoming clearer and more important. Claude’s Artifacts and Projects are really handy as is ChatGPT’s Advanced Voice mode. Perplexity is great when I need a summary of recent events. Google isn’t charging for it yet, but NotebookLM is very useful in its own way as well.
When I test the underlying models directly, it’s hard for me to be sure which is better for my purposes. But those add-on features make a clear differentiation between the providers, and I can easily see consumers choosing one or another based on them.
I haven’t been following recent developments in the companies’ APIs, but I imagine that they are trying to differentiate themselves there as well.
I love it. More goodies for us
The ones without money will usually lose because they get less opportunity to get in front of eyeballs. Occasionally they manage it anyway, because despite the myth that the VCs love to tell, they aren't really great at finding and promulgating the best tech.
LLMs are capital intensive. They’re a natural fit for financing.
I’m reminded of slime molds solving mazes [1]. In essence, VC allows entrepreneurs to explore the solution space aggressively. Once solutions are found, resources are trimmed.
[1] https://www.mbl.edu/news/how-can-slime-mold-solve-maze-physi...
Except for all the others.
Getting lucky twice in a row is really, really lucky. Getting lucky three times in a row is not more likely just because they were lucky two times in a row.
Tech companies purchased television away from legacy media companies and added (1) unskippable ads, (2) surveillance, (3) censorship and revocation of media you don't physically own, and now they're testing (4) ads while shows are paused.
There's no excuse for getting fooled again.
They destroyed the Taxi industry, I used to be able to just walk out to the taxi rank and get in the first taxi, but not anymore. Now I have to organize it on an app or with a phone call to a robot, then wait for the car to arrive, and finally I have to find the car among all the others that other people called.
Food delivery used to be done by the restaurant's own delivery staff; it was fast, reliable, and often free if ordering for 2+ people. Now it always costs extra, and there are even more fees if I want the food while it's still hot. Zero care is taken with the delivery: food/drinks are not kept upright and can be a total mess on arrival. Sometimes it's escaped the container and is just in the plastic bag. I have ended up preferring to go pick up food myself over getting it delivered, even when I have a migraine. It's just gone to shit.
I assume you are talking about airports. Guess what, they still exist in many places. And on the other hand, for US, other than a few big cities, the "normal" taxi experience is that you call a number and maybe a taxi shows up in half an hour. With Uber, that becomes 10 minutes or less, with live map updates. Give me that and I'll be happy to forget about Uber.
Getting a taxi was awful before ride-sharing apps. You'd have to walk to a taxi stop, or wait on the side of the road and hope you could hail one. Once the ride-sharing apps came in, suddenly getting a ride became a lot simpler. Our taxi companies are still alive, though they have their own apps now -- something that wouldn't have happened without competition -- and they also work together with the ride-hailing companies as a provider. You could still hail taxis or get them from stops too, though that isn't recommended given that they might try to run the meter by taking a longer route.
For food delivery, before the apps, most places didn't deliver food. Nowadays, more places deliver. Even if a place already had their own delivery drivers, they didn't get rid of them. We get a choice, to use the app or to use the restaurant's own delivery. Usually the app is better for smaller meals since it has a lower minimum order amount, but the restaurant provides faster delivery for bigger orders.
I can now summon a cab from the comfort of the phone I'm holding, and know that they'll accept my credit card. I know the price before I get in and I know the route they should take. I'm not going to get taken for an unnecessary scenic tourist surcharge detour.
people don't like feeling they got cheated, and pre-uber, taxis did that all the time.
In every city I have lived, that is a good thing, despite all the bad things ridesharing startups may have done.
These rideshare and delivery companies are disgusting and terrible.
Leaves you in a worse position than you started with.
Monetizing all of this is frankly...not my problem.
Very few people manage that - indeed I can't think of anyone. Even movie stars get replaced with other movie stars if they try to charge too much. Certainly everyone in the tech industry (including the CEOs, the VCs, the investors etc.) has a viable substitute.
I see 2 paths:
- Consumers - the Google way: search and advertise to consumers
- Businesses - the AWS way: attract businesses to use your API and lock them in
The first is fickle. Will OpenAI become the door to the Internet? You'll need people to stop using Google Search and rely on ChatGPT for that to happen. Short term you can charge a subscription, but long term it will most likely become a commodity with advertising.
The second is tangible. My company is plugged directly to the OpenAI API. We build on it. Still very early and not so robust. But getting better and cheaper and faster over time. Active development. No reason to switch to something else as long as OpenAI leads the pack.
There are so many ways, it makes the question seem nonsensical.
Ways to monetize AI so far:
- Metered APIs (OpenAI and others)
- Subscription products built on it (Copilot, ChatGPT, etc.)
- Using it as a feature to give products a competitive edge (Apple Intelligence, Tesla FSD)
- Selling the hardware (Nvidia)
What similar parallel can we think of for AI?
Has it become a verb yet? Waiting for people to replace "I googled how to..." with "I chatgpted how to...".