There's only one core problem in AI worth solving for most startups building AI-powered software: context.
No matter how good the AI gets, it can't answer questions about what it doesn't know. It can't perform a process for which it doesn't know the steps or the rules.
No LLM is going to know enough about some new drug in a pharma's pipeline, for example, because it doesn't know about the internal resources spread across multiple systems in an enterprise. (And if you've ever done a systems integration in any sufficiently large enterprise, you know that this is a "people problem" and usually not a technical problem).
I think the startups that succeed will understand that it all comes down to classic ETL: identify the source data, understand how to navigate systems integration, pre-process and organize the knowledge, train or fine-tune a model or have the right retrieval model to provide the context.
There's fundamentally no other way. AI is not magic; it can't know about trial ID 1354.006 except for what it was trained on and what it can search for. Even coding assistants like Cursor are really solving a problem of ETL/context and will always be. The code generation is the smaller part; getting it right requires providing the appropriate context.
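To make the ETL framing concrete, here's a deliberately tiny sketch; the "systems", helper names, and keyword-overlap retrieval are all illustrative stand-ins, not a real stack:

```python
SOURCES = {  # toy in-memory stand-ins for real enterprise systems
    "sharepoint": ["Trial 1354.006 protocol v3 is stored under /clinical/oncology."],
    "confluence": ["The dosing schedule for the phase II arm was amended in March."],
    "email":      ["Mette led the integration project for the trial data warehouse."],
}

def extract():
    # E: pull raw records out of each system (in reality: APIs, exports, crawlers)
    for system, docs in SOURCES.items():
        for doc in docs:
            yield {"system": system, "text": doc}

def transform(record):
    # T: clean/normalize and attach whatever metadata retrieval will need
    return {**record, "tokens": set(record["text"].lower().split())}

INDEX = [transform(r) for r in extract()]   # L: load into a (toy) in-memory index

def retrieve_context(question, k=2):
    # naive keyword-overlap ranking; a real system would use BM25 or embeddings
    q = set(question.lower().split())
    ranked = sorted(INDEX, key=lambda r: len(q & r["tokens"]), reverse=True)
    return [r["text"] for r in ranked[:k]]

print(retrieve_context("who led the trial data integration project"))
```

The model only ever sees what that last function hands it; everything upstream is the ETL work.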
This is why I strongly suspect that AI will not play out the way the Web did (upstarts unseat giants) and will instead play out like smartphones (giants entrench and balloon).
If all that matters is what you can put into context, then AI really isn't a product in most cases. The people selling models are actually just selling compute, so that space will be owned by the big clouds. The people selling applications are actually just packaging data, so that space will be owned by the people who already have big data in their segment: the big players in each industry. All competitors at this point know how important data is, and they're not going to sell it to a startup when they could package it up themselves. And most companies will prefer to just use features provided by the B2B companies they already trust, not trust a brand new company with all the same data.
I fully expect that almost all of the AI wins will take the form of features embedded in existing products that already have the data (like GitHub with Copilot), not brand new startups who have to try to convince companies to give them all their data for the first time.
Yup. And it's already playing out that way. Anthropic, OpenAI, Gemini - technically none of them is an upstart. All have hyperscalers backing and subsidizing their model training (AWS, Azure, and GCP, respectively). It's difficult to discern where the segmentation between compute and models is here.
> AI will not play out the way the Web did (upstarts unseat giants)
Yes, I agree.
I recently spoke to a doctor who wanted to do a startup, one part of which is an AI agent that can provide consumers with second opinions on medical questions. For this to be safe, it will require access not only to patient data, but possibly to front-line information from content origins like UpToDate, because that content is a necessity for grounded answers about information that's not in the training set and not publicly available via search.
The obvious winner is UpToDate, which owns that data and the pipeline for originating more content. If you want to build the best AI agent for medical analysis, you need to work with UpToDate.
> ...not brand new startups who have to try to convince companies to give them all their data for the first time.
Yes. I think of Microsoft and SharePoint, for example. Enterprises that are using SharePoint for document and content storage have already organized a subset of their information in a way that benefits Microsoft as concerns AI agents that are contextually aware of your internal data.
> will instead play out like smartphones (giants entrench and balloon).
Someone correct me if I'm wrong, but didn't smartphones go the "upstarts unseat giants" way? Apple wasn't a phone-maker, and became huge in the phone-market after their launch. Google also wasn't a phone-maker, yet took over the market slowly but surely with their Android purchase.
I barely see any Motorola, Blackberry, Nokia or Sony Ericsson phones anymore, yet those were the giants at one time. Now it's all iOS/Android, two "upstarts" initially.
> The people selling models are actually just selling compute
Yes, fully agreed. Anything AI is discovering in your dataset could have been found by humans, and it could have been done by a more efficient program. But that would require humans to carefully study it and write the program. AI lets you skip the novel analysis of the data and writing custom programs by using a generalizable program that solves those steps for you by expending far more compute.
I see it as, AI could remove the most basic obstacle preventing us from applying compute to vast swathes of problems - and that's the need to write a unique program for the problem at hand.
I think you're downplaying how well Cursor is doing "code generation" relative to other products.
Cursor can do at least the following "actions":
* code generation
* file creation / deletion
* run terminal commands
* answer questions about a code base
I totally agree with you on ETL (it's a huge part of our product https://www.definite.app/), but the actions an agent takes are just as tricky to get right.
Before I give Cursor a task, I often doubt it's going to be able to pull it off, and I'm constantly impressed by how deep it can go to complete a complex task.
This really puzzles me. I tried Cursor and was completely underwhelmed. The answers it gave (about a 1.5M loc messy Spring codebase) were surface-level and unhelpful to anyone but a Java novice. I get vastly better work out of my intern.
To add insult to injury, the IntelliJ plugin threw spurious errors. I ended up uninstalling it and marking my calendar to try again in 6 months.
Yet some people say Cursor is great. Is it something about my project? I can't imagine how it deals with a codebase that is many millions of tokens. Or is it something about me? I'm asking hard questions because I don't need to ask the easy ones.
What are people who think Cursor is great doing differently?
So isn't Cursor just a tool for Claude or ChatGPT to use? Another example would be a flight-booking engine. So why can't an AI just talk directly to an IDE? This is hard because the process has changed, with the human needing to be in the middle.
So isn't AI useless without the tools to manipulate?
I'm very "bullish" on AI in general but find Cursor incredibly underwhelming, because there is little value add compared to basically any other AI coding tool that goes beyond autocomplete. Cursor emphatically does not understand large codebases, and smaller codebases (a few files) can just be pasted into a chat context in the worst case.
I agree with you at this time, but there are a couple things I think will change this:
1. Agentic search can allow the model to identify what context is needed and retrieve the needed information, internally or externally, through APIs or search (see the sketch after this list).
2. I received an offer from OpenAI of free credits if I shared my API data with them; in other words, they are paying for industry-specific data, probably to fine-tune niche models.
There could be some exceptions around UI/UX in specific verticals, but the value of these fine-tuned, sector-specific instances will erode over time. They will likely remain a niche, since enterprises want maximum configurability while more out-of-the-box solutions are oriented toward SMEs.
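For point 1, here's a minimal sketch of what such an agentic-search loop can look like; `llm` and `search` are placeholders for any chat-model call and any internal or external search API, and the SEARCH/ANSWER convention is just an illustrative protocol:

```python
def agentic_answer(question, llm, search, max_steps=5):
    """Let the model decide what context it still needs, then go fetch it."""
    notes = []
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n"
            "Notes gathered so far:\n" + "\n".join(notes or ["(none)"]) + "\n"
            "Reply with either 'SEARCH: <query>' to gather more context "
            "or 'ANSWER: <final answer>'."
        )
        reply = llm(prompt)
        if reply.startswith("SEARCH:"):
            query = reply[len("SEARCH:"):].strip()
            # the model pulls its own context into the next iteration
            notes.append(f"{query} -> {search(query)}")
        else:
            return reply.removeprefix("ANSWER:").strip()
    return "No answer within the step budget."
```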
It comes down to moats. Does OpenAI have a moat? It's leading the pack, but the competitors always seem to be catching up to it. We don't see network effects with it yet like with social networks, unless OpenAI introduces household robots for everyone or something, builds a leading marketshare in that segment, and the rich data from these household bots is enough training data that one can't replicate with a smaller robot fleet.
And AI is too fundamental a technology for a "loss leader, biggest wallet wins" strategy, like the one used by Uber, to work.
API access can be restricted. A big part of why Twitter got authwalled was so that AI models can't train on it. Stack Overflow added a no-AI-models clause to its free data dump releases (supposedly CC-licensed); they want to be paid if you use their data for AI models.
All you've proposed is moving the context problem somewhere else. You still need to build the search index. It's still a problem of building and providing context.
To your first point, the LLM still can’t know what it doesn’t know.
Just like you can't google for a movie if you don't know the genre, any scenes, or any actors in it, an AI can't build its own context if it didn't have good enough context already.
IMO that’s the point most agent frameworks miss. Piling on more LLM calls doesn’t fix the fundamental limitations.
TL;DR an LLM can’t magically make good context for itself.
I think you’re spot on with your second point. The big differentiators for big AI models will be data that’s not easy to google for and/or proprietary data.
Lucky they got all their data before people started caring.
It’s not even just the lack of access to the data, so much hidden information to make decisions is not documented at all. It’s intuition, learned from doing something in a specific context for a long time and only a fraction of that context is accessible.
Anyone that's done any amount of systems integration in enterprises knows this.
"Let me talk to Lars; he should know because his team owns that system."
"We don't have any documentation on this, but Mette should know about it because she led the project."
> No matter how good the AI gets, it can't answer about what it doesn't know. It can't perform a process for which it doesn't know the steps or the rules
This is exactly the motivation behind https://github.com/OpenAdaptAI/OpenAdapt: so that users can demonstrate their desktop workflows to AI models step by step (without worrying about their data being used by a corporation).
Context is important, but it takes about two weeks to build a context-collection bot and integrate it into Slack. The hard part is not technical (AIs can rapidly build a company-specific and continually updated knowledge base); it's political. Getting a drug company to let you tap Slack and email and docs etc. is dauntingly difficult.
Difficult to impossible. Their vendors are already working on AI features, so why would they risk adding a new vendor when a vendor they've already approved will have substantially the same capabilities soon?
This problem will be eaten by OpenAI et al. the same way the careful prompting strategies used in 2022/2023 were eaten. In a few years we will have context lengths of 10M+ or online fine tuning, combined with agents that can proactively call APIs and navigate your desktop environment.
Providing all context will be little more than copying and pasting everything, or just letting the agent do its thing.
Super careful or complicated setups to filter and manage context probably won't be needed.
Context requires VRAM that grows quadratically with sequence length. That is why OpenAI hasn't even supported a 200k context length yet for its 4o model.
Is there a trick that bypasses this scaling constraint while strictly preserving the attention quality? I suspect that most such tricks lead to performance loss while deep in the context.
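For a sense of scale, a back-of-the-envelope calculation assuming naive attention that materializes the full fp16 score matrix (the head count is an arbitrary illustrative choice). As far as I know, kernels like FlashAttention avoid materializing this matrix and keep exact attention with roughly linear memory, but the compute still grows quadratically, and the cheaper approximations (sparse or linear attention) are the ones that tend to lose quality deep in the context.

```python
# Memory for ONE layer's attention score matrices, if materialized naively in fp16.
def score_matrix_gib(seq_len, n_heads=32, bytes_per_elem=2):
    return seq_len ** 2 * n_heads * bytes_per_elem / 2**30

for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {score_matrix_gib(n):>12,.1f} GiB per layer")
```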
I agree, but I do see one realistic solution to the problem you describe. Every product on the market is independently integrating an LLM right now that has access to their product's silo of information. I can imagine a future where a corporate employee interacts with one central LLM that in turn understands the domain of expertise of all the other system-specific LLMs. Given that knowledge, the central one can orchestrate prompting and processing responses from the others.
We've been using this pattern forever with traditional APIs, but the huge hurdle is that the information in any system you integrate with is often both complex and messy. LLMs handle the hard work of dealing with ambiguity and variation.
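A sketch of that hub-and-spoke pattern; the assistant registry, the routing prompt, and `llm` are all hypothetical glue, not any particular framework:

```python
def central_assistant(question, llm, assistants):
    """assistants maps a system name ("hr", "crm", ...) to a callable wrapping that system's own LLM."""
    # 1. The central model picks which system-specific assistants are relevant.
    routing = llm(
        f"Which of these systems can help answer the question: {', '.join(assistants)}?\n"
        f"Question: {question}\nReply with a comma-separated list of system names."
    )
    chosen = [name.strip() for name in routing.split(",") if name.strip() in assistants]

    # 2. Fan the question out to the chosen assistants and collect partial answers.
    partials = {name: assistants[name](question) for name in chosen}

    # 3. The central model synthesizes one answer from the partial answers.
    combined = "\n".join(f"[{name}] {answer}" for name, answer in partials.items())
    return llm(f"Question: {question}\nSystem answers:\n{combined}\nWrite the final answer.")
```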
I agree that context is one core focus, but I really don't agree that it's the only thing a startup can focus on.
Context aside, you have the generation aspect of it, which can be very important (models trained to output good SQL, or good legal contracts, etc). You have the UI, which is possibly the most important element of a good AI product (think the difference between an IDE and Copilot - very very different UX/UI for the same underlying model).
Context is incredibly important, and I agree that people are downplaying some aspects of ETL here (though this isn't standard ETL in some cases). But it's not even close to being everything.
Startups can still win against big players by building better products faster (with AI), collecting more and better data to feed AI, and then feeding that into better AI automation for customers. Big players won't automatically win, but more data is a moat that gives them room to mess up for a long time and still pull out ahead. Even then, big companies already compete against one another, and swallowing a small AI startup can help them, so starting one can also make sense.
I found that fine-tuning and RAG can be replaced with tool calling for some specialized domains, e.g. real-time data. Even things like the user's location can be tool-called, so context can be obtained reliably. I also note that GPT-4o and better are smart enough to chain together different functions you give them, but not reliably. System prompting helps some, but the non-determinism of AI today is both awesome and a curse.
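A minimal sketch of the tool-calling version of this; the tool names and the JSON convention are mine for illustration, not any vendor's schema:

```python
import json
from datetime import datetime, timezone

def get_current_time() -> str:
    return datetime.now(timezone.utc).isoformat()

def get_user_location() -> str:
    return "Copenhagen, DK"  # in practice: supplied by the device or app session

TOOLS = {"get_current_time": get_current_time, "get_user_location": get_user_location}

def run_tool_call(model_output: str) -> str:
    """Assumes the model was prompted to reply like {"tool": "...", "args": {...}}."""
    call = json.loads(model_output)
    return str(TOOLS[call["tool"]](**call.get("args", {})))

print(run_tool_call('{"tool": "get_user_location", "args": {}}'))
```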
All of these comments are premised on this technology staying still. A model with memory and the ability to navigate the computer (we are already basically halfway there) would easily eliminate the problems you describe.
HN, I find, also has a tendency to fall prey to the bitter lesson.
There is a second, related problem: continuous learning. AI models won’t go anywhere as long as their state resets on each new session, and they revert to being like the new intern on their first day.
And that's why the teams that really want to unlock AI will understand that the core problem is really systems integration and ETL; the AI needs to be aware of the entire corpus of relevant information through some mechanism (tool use, search, RAG, graph RAG, etc.) and the startups that win are the ones that are going to do that well.
You can't solve this problem with more compute nor better models.
I've said it elsewhere in this discussion, but the LLM is just a magical oven that's still reliant on good ingredients being prepped and put into the oven before hitting the "bake" button if you want amazing dishes to pop out. If you just want Stouffer's Mac & Cheese, it's already good enough for that.
Yeah seems like context is the AI version of cache invalidation, in the sense of the joke that "there's only 2 hard problems in computer science, cache invalidation and naming things". It all boils down to that (that, and naming things)
I think this argument only makes sense if you believe that AGI and/or unbounded AI agents are "right around the corner". For sure, we will progress in that direction, but when and if we truly get there–who knows?
If you believe, as I do, that these things are a lot further off than some people assume, I think there's plenty of time to build a successful business solving domain-specific workflows in the meantime, and eventually adapting the product as more general technology becomes available.
Let's say 25 years ago you had the idea to build a product that can now be solved more generally with LLMs–let's say a really effective spam filter. Even knowing what you know now, would it have been right at the time to say, "Nah, don't build that business, it will eventually be solved with some new technology?"
I don't think it's that binary. We've had a lot of progress over the last 25 years; much of it in the last two. AGI is not a well defined thing that people easily agree on. So, determining whether we have it or not is actually not that simple.
Mostly people either get bogged down in deep philosophical debates or simply start listing things that AI can and cannot do (and why they believe that is the case). Some of those things are codified in benchmarks. And of course the list of stuff that AIs can't do is getting items removed from it on a regular basis, at an accelerating rate. That acceleration is the problem. People don't deal well with adapting to exponentially changing trends.
At some arbitrary point, when that list has a certain length, we may or may not have AGI. It really depends on your point of view. But of course, most people score poorly on the same benchmarks we use for testing AIs. There are some specific groups of things where humans still do better. But there are also a lot of AI researchers working on those things.
Consider OpenAI's products as an example. GPT-3 (2020) was a massive step up in reasoning ability from GPT-2 (2019). GPT-3.5 (2022) was another massive step up. GPT-4 (2023) was a big step up, but not quite as big. GPT-4o (2024) was marginally better at reasoning, but mostly an improvement with respect to non-core functionality like images and audio. o1 (2024) is apparently somewhat better at reasoning at the cost of being much slower. But when I tried it on some puzzle-type problems I thought would be on the hard side for GPT-4o, it gave me (confidently) wrong answers every time. 'Orion' was supposed to be released as GPT-5, but was reportedly cancelled for not being good enough. o3 (2025?) did really well on one benchmark at the cost of $10k in compute, or even better at the cost of >$1m – not terribly impressive. We'll see how much better it is than o1 in practical scenarios.
To me that looks like progress is decelerating. Admittedly, OpenAI's releases have gotten more frequent and that has made the differences between each release seem less impressive. But things are decelerating even on a time basis. Where is GPT-5?
>Let's say 25 years ago you had the idea to build a product
I resemble that remark ;)
>that can now be solved more generally with LLMs
Nope, sorry, not yet.
>"Nah, don't build that business, it will eventually be solved with some new technology?"
Actually I did listen to people like that to an extent, and started my business with the express intent of continuing to develop new technologies which would be adjacent to AI when it matured. Just better than I could at my employer where it was already in progress. It took a couple years before I was financially stable enough to consider layering in a neural network, but that was 30 years ago now :\
Wasn't possible to benefit with Windows 95 type of hardware, oh well, didn't expect a miracle anyway.
Heck, it's now been a full 45 years since I first dabbled in a bit of the ML with more kilobytes of desktop memory than most people had ever seen. I figured all that memory should be used for something, like memorizing, why not? Seemed logical. Didn't take long to figure out how much megabytes would help, but they didn't exist yet. And it became apparent that you could only go so far without a specialized computer chip of some kind to replace or augment a microprocessor CPU. What kind, I really had no idea :)
I didn't say they resembled 25-year-old ideas that much anyway ;)
>We've had a lot of progress over the last 25 years; much of it in the last two.
I guess it's understandable this has been making my popcorn more enjoyable than ever ;)
Agreed. There's a difference between developing new AI, and developing applications of existing AI. The OP seems to blur this distinction a bit.
The original "Bitter Lesson" article referenced in the OP is about developing new AI. In that domain, its point makes sense. But for the reasons you describe, it hardly applies at all to applications of AI. I suppose it might apply to some, but they're exceptions.
You think it will be 25 years before we have a drop-in replacement for most office jobs?
I think it will be less than 5 years.
You seem to be assuming that the rapid progress in AI will suddenly stop.
I think if you look at the history of compute, that is ridiculous. Making the models bigger or work more is making them smarter.
Even if there is no progress in scaling memristors or any exotic new paradigm, high speed memory organized to localize data in frequently used neural circuits and photonic interconnects surely have multiple orders of magnitude of scaling gains in the next several years.
> You seem to be assuming that the rapid progress in AI will suddenly stop.
And you seem to assume that it will just continue for 5 years. We've already seen the plateau start. OpenAI has tacitly acknowledged that they don't know how to make a next generation model, and have been working on stepwise iteration for almost 2 years now.
Why should we project the rapid growth of 2021–2023 5 years into the future? It seems far more reasonable to project the growth of 2023–2025, which has been fast but not earth-shattering, and then also factor in the second derivative we've seen in that time and assume that it will actually continue to slow from here.
> You seem to be assuming that the rapid progress in AI will suddenly stop.
> I think if you look at the history of compute, that is ridiculous. Making the models bigger or work more is making them smarter.
It's better to talk about actual numbers to characterise progress and measure scaling:
"
By scaling I usually mean the specific empirical curve from the 2020 OAI paper. To stay on this curve requires large increases in training data of equivalent quality to what was used to derive the scaling relationships.
"[^2]
"I predicted last summer: 70% chance we fall off the LLM scaling curve because of data limits, in the next step beyond GPT4.
[…]
I would say the most plausible reason is because in order to get, say, another 10x in training data, people have started to resort either to synthetic data, so training data that's actually made up by models, or to lower quality data."[^0]
“There were extraordinary returns over the last three or four years as the Scaling Laws were getting going,” Dr. Hassabis said. “But we are no longer getting the same progress.”[^1]
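For anyone who wants the curve itself: the "2020 OAI paper" is Kaplan et al., "Scaling Laws for Neural Language Models". Quoting the fitted power laws from memory, so treat the exponents as approximate:

```latex
% Approximate fits from Kaplan et al. (2020): test loss L versus
% parameters N, dataset size D (tokens), and compute C.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\qquad
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}, \quad \alpha_C \approx 0.050
```

Staying on these curves means growing N, D, and C together, which is exactly where the data-quality concern above bites.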
Also, office jobs will be adapted to be a better fit for what AI can do, just as manufacturing jobs were adapted so that at least some tasks could be completed by robots.
Not my downvote, just the opposite but I think you can do a lot in an office already if you start early enough . . .
At one time I would have said you should be able to have an efficient office operation using regular typewriters, copiers, filing cabinets, fax machines, etc.
And then you get Office 97, zip through everything and never worry about office work again.
I was pretty extreme having a paperless office when my only product is paperwork, but I got there. And I started my office with typewriters, nice ones too.
Before long Google gets going. Wow. No-ads information superhighway, if this holds it can only get better. And that's without broadband.
But that's besides the point.
Now it might make sense for you to at least be able to run an efficient office on the equivalent of Office 97 to begin with. Then throw in the AI or let it take over and see what you get in terms of output, and in comparison. Microsoft is probably already doing this in an advanced way. I think a factor that can vary over orders of magnitude is how the machine leverages the abilities and/or tasks of the nominal human "attendant".
One type of situation would be where a less-capable AI could augment a defined worker more effectively than even a fully automated alternative utilizing 10x more capable AI. There's always some attendant somewhere so you don't get a zero in this equation no matter how close you come.
Could be financial effectiveness or something else, the dividing line could be a moving target for a while.
You could even go full paleo and train the AI on the typewriters and stuff just to see what happens ;)
But would you really be able to get the most out of it without the momentum of many decades of continuous improvement before capturing it at the peak of its abilities?
For me, general intelligence from a computer will be achieved when it knows when it's wrong. You may say that humans also struggle with this, and I'd agree - but I think there's a difference between general intelligence and consciousness, as you said.
I think one thing ignored here is the value of UX.
If a general AI model is a "drop-in remote worker", then UX matters not at all, of course. I would interact with such a system in the same way I would one of my colleagues and I would also give a high level of trust to such a system.
If the system still requires human supervision or works to augment a human worker's work (rather than replace it), then a specific tailored user interface can be very valuable, even if the product is mostly just a wrapper of an off-the-shelf model.
After all, many SaaS products could be built on top of a general CRM or ERP, yet we often find a vertical-focused UX has a lot to offer. You can see this in the AI space with a product like Julius.
The article seems to assume that most of the value brought by AI startups right now is adding domain-specific reliability, but I think there's plenty of room to build great experiences atop general models that will bring enduring value.
If and when we reach AGI (the drop-in remote worker referenced in the article), then I personally don't see how the vast majorities of companies - software and others - are relevant at all. That just seems like a different discussion, not one of business strategy.
The value of UX is being ignored, as the magical thinking has these AIs being fully autonomous, which will not work. The phrase "the devil's in the details" needs to be imprinted on everyone's screens, because the details of a "drop-in remote worker" are several Grand Canyons yet to be realized. This civilization is vastly more complex than you, dear reader, realize, and the majority of that complexity is not written down.
Also, the UX of your potential "remote workers" are vitally important! The difference between a good and a bad remote worker is almost always how good they are at communicating - both reading and understanding tickets of work to be done and how well they explain, annotate, and document the work they do.
At the end of the day, someone has to be checking the work. This is true of humans and of any potential AI agent, and the UX of that is a big deal. I can get on a call and talk through the code another engineer on my team wrote and make sure I understand it and that it's doing the right thing before we accept it. I'm sure at some point I could do that with an LLM, but the worry is that the LLM has no innate loyalty or sense of its own accuracy or honesty.
I can mostly trust that my human coworker isn't bullshitting me and any mistakes are honest mistakes that we'll learn from together for the future. That we're both in the same boat where if we write or approve malicious or flagrantly defective code, our job is on the line. An AI agent that's written bad or vulnerable code won't know it, will completely seriously assert that it did exactly what it was told, doesn't care if it gets fired, and may say completely untrue things in an attempt to justify itself.
Any AI "remote worker" is a totally different trust and interaction model. There's no real way to treat it like you would another human engineer because it has, essentially, no incentive structure at all. It doesn't care if the code works. It doesn't care if the team meets its goals. It doesn't care if I get fired. I'm not working with a peer, I'm working with an industrial machine that maybe makes my job easier.
I guess part of the point is that the value of the UX will quickly start to decrease as more tasks or parts of tasks can be done without close supervision. And that is subject to the capabilities of the models, which continue to improve.
I suggest that before we satisfy _everyone_'s definition of AGI, more and more people may decide we are there as their own job is automated.
The UX at that point, maybe in 5 or 10 or X years, might be a 3d avatar that pops up in your room via mixed reality glasses, talks to you, and then just fires off instructions to a small army of agents on your behalf.
Nvidia actually demoed something a little bit like that a few days ago. Except it lives on your computer screen and probably can't manage a lot of complex tasks on its own. Yet.
Or maybe at some point it doesn't need sub agents and can just accomplish all of the tasks on its own. Based on the bitter lesson, specialized agents are probably going to have a limited lifetime as well.
But I think it's worth having the AGI discussion as part of this because it will be incremental.
Personally, I feel we must be pretty close to AGI because Claude can do a lot of my programming for me. I still have to make important suggestions, and routinely for obvious things, but it is much better than me at filling in all the details and has much broader knowledge.
And the models do keep getting more robust, so I seriously doubt that humans will be better programmers overall for much longer.
Which is an easier way to interact with your bank? Writing a business letter, or filling out a form?
I suspect that we will still be filling out forms, because that’s a better UI for a routine business transaction. It’s easier to know what the bank needs from you if it’s laid out explicitly, and you can also review the information you gave them to make sure it’s correct.
AI could still be helpful for finding the right forms, auto-filling some fields, answering any questions you might have, and checking for common errors, but that’s only a mild improvement from what a good website already does.
And yes, it’s also helpful for the programmers writing the forms. But the bank still needs people to make sure that any new forms implement their consumer interactions correctly, that the AI assist has the right information to answer any questions, and that it’s all legal.
Chat models make UI redundant. Who will want to learn how to use some app's custom interface when they are used to just asking it to do what they want/need? Chat is the most natural interface for humans. UX will eventually just be about steering models to kiss your butt in the right way, and the bar for this will be low, as language-interaction problems are going to be obvious even to teenagers.
The amount of work going into RLHF/DPO/instruct tuning and other types of post training is because UX is very important. The bar is high and the difficulty of making a model with a good UX for a given use case is high.
A drop in remote worker will still require their work to be checked and their access to the systems they need to do their work secured in case they are a bad actor.
I think the core problem at hand for people trying to use AI in user-facing production systems is "how can we build a reliable system on top of an unreliable (but capable) model?". I don't think that's the same problem that AI researchers are facing, so I'm not sure it's sound to use "bitter lesson" reasoning to dismiss the need for software engineering outright and replace it with "wait for better models".
The article sits on an assumption that if we just wait long enough, the unreliability of deep learning approaches to AI will just fade away and we'll have a full-on "drop-in remote worker". Is that a sound assumption?
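As one concrete (and hedged) example of the kind of engineering that question implies: validate the model's structured output and retry with feedback, failing loudly instead of silently. All the names here are illustrative:

```python
import json

class UnreliableOutput(Exception):
    pass

def reliable_json(llm, prompt, required_keys, max_retries=3):
    """Wrap an unreliable model call so downstream code sees validated data or a clean failure."""
    last_err = None
    for attempt in range(max_retries):
        nudge = "" if attempt == 0 else f"\nYour previous reply was invalid ({last_err}). Reply with JSON only."
        raw = llm(prompt + nudge)
        try:
            data = json.loads(raw)
            missing = [k for k in required_keys if k not in data]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except (json.JSONDecodeError, ValueError) as err:
            last_err = err
    raise UnreliableOutput(f"model never produced valid output: {last_err}")
```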
Well. We were working on a search engine for industry suppliers since before the whole AI hype started (we even applied to YC once), and hit a brick wall at some point where it got too hard to improve search result quality algorithmically. To understand what that means: we gathered lots of data points from different sources, tried to reconcile them into unified records, then find the best match for a given sourcing case based on that. But in a lot of cases, both the data wasn't accurate enough to identify what a supplier was actually manufacturing, and the sourcing case itself wasn't properly defined, because users found it too hard to come up with good keywords for their search.
Then, LLMs entered the stage. Suddenly, we became able to both derive vastly better output from the data we got, and also offer our users easier ways to describe what they were looking for, find good keywords automatically, and actually deliver helpful results!
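A sketch of what I imagine the "find good keywords automatically" step looks like; the prompt and function are my guess at the shape of it, not their actual product:

```python
import json

def sourcing_keywords(llm, sourcing_case: str) -> list[str]:
    """Turn a buyer's free-text sourcing case into precise search keywords."""
    prompt = (
        "A buyer describes what they need to source:\n"
        f"{sourcing_case}\n\n"
        "Return a JSON list of 5-10 precise search keywords describing the "
        "manufacturing capability required (processes, materials, certifications)."
    )
    return json.loads(llm(prompt))

# sourcing_keywords(llm, "housing for an outdoor sensor, seawater resistant, ~10k units/year")
# might yield something like ["aluminum die casting", "IP67 enclosure", "marine-grade anodizing"]
```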
This was only possible because AI augments our product well and really provides a benefit in that niche, something that would just not have been possible otherwise. If you plan on founding a company around AI, the best advice I can give you is to choose a problem that similarly benefits from AI, but does exist without it.
The author discusses the problem from the point of engineering, not from business. When you look at it from business perspective, there is a big advantage of not waiting, and using whatever exists right now to solve the business problem, so that you can get traction, get funding, grab marketshare, build a team, and when the next day a better model will come, you can rewrite your code, and you would be in a much better position to leverage whatever new capabilities the new models provide; you know your users, you have the funds, you built the right UX...
The best strategy, going by your experience, is to jump on a problem as soon as there is an opportunity to solve it and generate lots of business value within the next 6 months. The trick is finding the subproblem that is worth a lot right now and could not be solved 6 months ago. A couple of AI-sales startups have "succeeded" quite well doing that (e.g. 11x); now they are in a good position to build from there (whether they will succeed in building a unicorn is another question, it just looks like they are in a good position now).
Very true. Most code written today will probably be obsolete in 2050. So why write it? Because it puts you in a good strategic position to keep leading in your space.
It's a little depressing how many highly valued startups are basically just wrappers around LLMs that they don't own. I'd be curious to see what percentage of YC's latest batch is just this.
> 70% of Y Combinator's Winter 2024 batch are AI startups. This is compared to ~57% of YC Summer 2023 companies and ~32% from the Winter batch one year ago (YC W23).
The thinking is: the models will get better, which will improve our product. But in reality, like the article states, the generalized models get better, so your value add diminishes as there's no need to fine-tune.
On the other hand, crypto funds made a killing off of "me too" blockchain technology before it got hammered again. So who knows about the 2-5 year term, but in 10 years we almost certainly won't have these billion-dollar companies that are just wrappers around LLMs.
How is being a wrapper for LLMs you don’t own any different from being a company based on cloud infrastructure you don’t own?
LLMs are a platform.
Bill Gates definition of a platform was “A platform is when the economic value of everybody that uses it exceeds the value of the company that creates it.”
It's relatively easy to move to different cloud infrastructure (or host your own) later on down the line.
If you rely on an OpenAI LLM for your business, they can basically do whatever they want to you. Oh, prices went up 10x? What are you gonna do, train your own AI?
An LLM wrapper adds near-zero value. If I type some text into a "convert to Donald Trump style" tool, it produces the exact same output as typing it into ChatGPT following "Convert this text to Donald Trump style:", because that's what the tool actually does. Implementing ChatGPT is 99.999% of the value creation. Prepending the prompt is 0.001%. The surprising fact is that the market assigns a non-zero value to the tool anyway.
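To make that concrete, the entire "product" in that example reduces to something like this, with `llm` standing in for whichever hosted model the wrapper resells:

```python
def donald_trump_style(llm, text: str) -> str:
    # the wrapper's whole value-add: one prepended instruction
    return llm("Convert this text to Donald Trump style:\n\n" + text)
```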
Startups that use cloud servers still write the software that goes on those servers, which is 90% of the value creation.
Controversial opinion: I don't believe in the bitter lesson. I just think that the current DNN+SGD approaches are just not that good at learning deep, general, expressive patterns. With less inductive bias the model memorizes a lot of scenarios and is able to emulate whatever real-world scenario you are trying to make the model learn. However, it fails to simulate this scenario well.
So it's kind of misleading to say that it's generally better to have less inductive bias. That is only true if your model architecture and optimization approach are just a bit crap.
My second controversial point regarding AI research and startups: doing research sucks. It's risky business. You are not guaranteed success. If you make it, your competitors will be hot on your tail and you will have to keep improving all the time. I personally would rather leave the model building to someone else and focus more on building products with the available models. There are exceptions like finetuning for your specific product or training bespoke models for very specific tasks at hand.
> I just think that the current DNN+SGD approaches are just not that good
I'll add even further: the transformers etc. that we are using today are not good either.
That's evidenced by the enormous amount of memory they need to do any task. We have just taken the one approach that was working a bit better for sensory tasks and pattern matching, and gone all in, adding hardware after hardware so we could brute-force some cognitive tasks out of it.
If we did the same for other ML architectures, I don't think they would lag far behind. And maybe some would get even better results.
I also don't believe in the 'bitter lesson' when it's extrapolated to apply to all 'AI application layer implementations' - at least in the context of asserting that the whole universe of problem scopes is affected by it.
I think it is true in an AI research context, but an unstated assumption is that you have complete data, E2E training, and the particular evaluated solution is not real-world unbounded.
It assumes infinite data, and it assumes the ability to falsify the resulting model output. Most valuable, 'real world' applications of AI when trying to implement in practice have an issue with one or both of those. So in other words: where a fully unsupervised AI pathway is viable due to the structure of the problem, absolutely.
I'm not convinced in the universality of this. Doesn't mean the core point of this essay on the futility of startups basing their business around one of the off the shelf LLMs isn't valid - I think for many they risk being generalized away.
The "bitter lesson" is self evidently true in one way as was a quantum jump in what AI's could do once we gave them enough compute. But as a "rule of AI" I think it's being over generalised, meaning it's being used to make predictions where it doesn't apply.
I don't see how the bitter lesson could not be true for the current crop of LLM's. They seem to have memorised just about everything mankind has written down, and squished it into something of the order of 1TB. You can't do that without a lot of memory to recognise the common patterns and eliminate them. The underlying mechanism is nothing like the zlib's deflate but when it comes to memory you have to throw at it they are the same in this respect. The bigger the compression window the better deflate does. When you are trying to recognise all the pattens in everything humans have written down to a deep level (such as discovering the mathematical theorems are generally applicable), the memory window and/or compute you have to use must be correspondingly huge.
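The deflate analogy is easy to check empirically; a small experiment with a pattern that only repeats at long range (exact byte counts will vary since the block is random):

```python
import os
import zlib

block = os.urandom(2000)   # a 2000-byte "pattern", incompressible on its own
data = block * 50          # the same pattern repeated 50 times

for wbits in (9, 12, 15):  # deflate window sizes: 512 B, 4 KiB, 32 KiB
    co = zlib.compressobj(9, zlib.DEFLATED, wbits)
    out = co.compress(data) + co.flush()
    print(f"window 2^{wbits}: {len(out):6d} bytes out of {len(data)}")
```

With the 512-byte window the repeats are out of reach and almost nothing compresses; with the larger windows the repeated structure collapses. The same intuition scales up to a model trying to exploit structure across everything humans have written.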
That was also true, to a lesser extent, when DeepMind taught an AI to play Pong in 2013. It had 1M pixels arriving 24 times a second, and it had to learn to pick out bats and balls in that sea of data. It's clearly going to require a lot of memory and compute to do that. Those resources simply weren't available on a researcher's budget much before 2013.
Since 2013, we've asked our AIs to ingest larger and larger datasets using much the same techniques used in 2013 (but known long before that) and been enchanted with the results. The "bitter lesson" predicts you need correspondingly more compute and memory to compress those datasets. Is it really a lesson, or an engineering rule of thumb that only became apparent when we had enough compute to do anything useful with AI?
I'm not sure this rule of thumb has much applicability outside of this "let's compress enormous amounts of data, looking for deep structure" realm. That's because if we look at neural networks in animals, most are quite small. A mosquito manages to find us for protein, find the right plant sap for food, find a mate, and find water with enough algae for its eggs, using data from vision, temperature sensors, and smell, and it uses that to activate wings, legs and god knows what else. It does all that with 100,000 neurons. That's not what a naive reading of "the bitter lesson" tells you it should take.
Granted, it may take an AI of enormous proportions to discover how to do it with 100,000 neurons. Nature did it by iteratively generating trillions upon trillions of these 100,000-neuron networks over millennia, and used a genetic algorithm to select the best at each step. If we have to do it that way, it will be a very bitter lesson. The 10-fold increases in compute every few years that made us aware of the bitter lesson are ending. If the prediction of the bitter lesson is that we have to rely on that continuing in order to build our mosquito emulation, then it's predicting it will take us centuries to build all the sorts of robots we need to do all the jobs we have.
But that's looking unlikely. We have an example. On one hand we have Tesla FSD, throwing more and more resources at conventional AI training, the way the bitter lesson says you must in order to progress. On the other we have Waymo using a more traditional approach. It's pretty clear which approach is failing and which is working - and it's not going the way the bitter lesson says it should.
> We have an example. On one hand we have Tesla FSD, throwing more and more resources at conventional AI training, the way the bitter lesson says you must in order to progress. On the other we have Waymo using a more traditional approach. It's pretty clear which approach is failing and which is working - and it's not going the way the bitter lesson says it should.
As I understand the article, it is going the way the bitter lesson predicts it would - the initial "more traditional" approach generates almost-workable solutions in the near term while the "bitter lesson" approach is unreliable in the near term.
Unless you think that FSD is already in the "far" term (i.e. already at the endgame), this is exactly what the article predicts happens in the near term.
> they're not going to sell it to a startup when they could package it up themselves

Except they won't package it themselves, because they are inept and inert. They still won't sell it to startups, though.
> I found that fine-tuning and RAG can be replaced with tool calling for some specialized domains

RAG is just a single-purpose instance of the more general process of tool calling, so that's not surprising.
Startups should really try to get such a moat. Chapter 2 will cover this.
Is this another way of saying "content is king"?
The process that feeds RAG is all about how you extract, transform, and load source data into the RAG database. Good RAG is the output of good ETL.
Seems to apply to AI as well.
If you believe, as I do, that these things are a lot further off than some people assume, I think there's plenty of time to build a successful business solving domain-specific workflows in the meantime, and eventually adapting the product as more general technology becomes available.
Let's say 25 years ago you had the idea to build a product that can now be solved more generally with LLMs–let's say a really effective spam filter. Even knowing what you know now, would it have been right at the time to say, "Nah, don't build that business, it will eventually be solved with some new technology?"
Mostly people either get bogged down into deep philosophical debates or simply start listing things that AI can and cannot do (and why they believe why that is the case). Some of those things are codified in benchmarks. And of course the list of stuff that AIs can't do is getting stuff removed from it on a regular basis at an accelerating rate. That acceleration is the problem. People don't deal well with adapting to exponentially changing trends.
At some arbitrary point when that list has a certain length, we may or may not have AGI. It really depends on your point of view. But of course, most people score poorly on the same benchmarks we use for testing AIs. There are some specific groups of things where they still do better. But also a lot of AI researchers working on those things.
Consider OpenAI's products as an example. GPT-3 (2020) was a massive step up in reasoning ability from GPT-2 (2019). GPT-3.5 (2022) was another massive step up. GPT-4 (2023) was a big step up, but not quite as big. GPT-4o (2024) was marginally better at reasoning, but mostly an improvement with respect to non-core functionality like images and audio. o1 (2024) is apparently somewhat better at reasoning at the cost of being much slower. But when I tried it on some puzzle-type problems I thought would be on the hard side for GPT-4o, it gave me (confidently) wrong answers every time. 'Orion' was supposed to be released as GPT-5, but was reportedly cancelled for not being good enough. o3 (2025?) did really well on one benchmark at the cost of $10k in compute, or even better at the cost of >$1m – not terribly impressive. We'll see how much better it is than o1 in practical scenarios.
To me that looks like progress is decelerating. Admittedly, OpenAI's releases have gotten more frequent and that has made the differences between each release seem less impressive. But things are decelerating even on a time basis. Where is GPT-5?
I resemble that remark ;)
>that can now be solved more generally with LLMs
Nope, sorry, not yet.
>"Nah, don't build that business, it will eventually be solved with some new technology?"
Actually I did listen to people like that to an extent, and started my business with the express intent of continuing to develop new technologies which would be adjacent to AI when it matured. Just better than I could at my employer where it was already in progress. It took a couple years before I was financially stable enough to consider layering in a neural network, but that was 30 years ago now :\
Wasn't possible to benefit with Windows 95 type of hardware, oh well, didn't expect a miracle anyway.
Heck, it's now been a full 45 years since I first dabbled in a bit of the ML with more kilobytes of desktop memory than most people had ever seen. I figured all that memory should be used for something, like memorizing, why not? Seemed logical. Didn't take long to figure out how much megabytes would help, but they didn't exist yet. And it became apparent that you could only go so far without a specialized computer chip of some kind to replace or augment a microprocessor CPU. What kind, I really had no idea :)
I didn't say they resembled 25-year-old ideas that much anyway ;)
>We've had a lot of progress over the last 25 years; much of it in the last two.
I guess it's understandable this has been making my popcorn more enjoyable than ever ;)
The original "Bitter Lesson" article referenced in the OP is about developing new AI. In that domain, its point makes sense. But for the reasons you describe, it hardly applies at all to applications of AI. I suppose it might apply to some, but they're exceptions.
I think it will be less than 5 years.
You seem to be assuming that the rapid progress in AI will suddenly stop.
I think if you look at the history of compute, that is ridiculous. Making the models bigger or work more is making them smarter.
Even if there is no progress in scaling memristors or any exotic new paradigm, high-speed memory organized to localize data in frequently used neural circuits, plus photonic interconnects, surely has multiple orders of magnitude of scaling gains left over the next several years.
And you seem to assume that it will just continue for 5 years. We've already seen the plateau start. OpenAI has tacitly acknowledged that they don't know how to make a next generation model, and have been working on stepwise iteration for almost 2 years now.
Why should we project the rapid growth of 2021–2023 another five years into the future? It seems far more reasonable to project the growth of 2023–2025, which has been fast but not earth-shattering, and then also factor in the second derivative we've seen in that time and assume that progress will actually continue to slow from here.
> I think if you look at the history of compute, that is ridiculous. Making the models bigger or work more is making them smarter.
It's better to talk about actual numbers to characterise progress and measure scaling:
" By scaling I usually mean the specific empirical curve from the 2020 OAI paper. To stay on this curve requires large increases in training data of equivalent quality to what was used to derive the scaling relationships. "[^2]
"I predicted last summer: 70% chance we fall off the LLM scaling curve because of data limits, in the next step beyond GPT4.
[…]
I would say the most plausible reason is because in order to get, say, another 10x in training data, people have started to resort either to synthetic data, so training data that's actually made up by models, or to lower quality data."[^0]
“There were extraordinary returns over the last three or four years as the Scaling Laws were getting going,” Dr. Hassabis said. “But we are no longer getting the same progress.”[^1]
---
[^0]: https://x.com/hsu_steve/status/1868027803868045529
[^1]: https://x.com/hsu_steve/status/1869922066788692328
[^2]: https://x.com/hsu_steve/status/1869031399010832688
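For a sense of what "staying on this curve" implies numerically, here is a minimal sketch of the dataset-size power law from that 2020 paper (Kaplan et al., "Scaling Laws for Neural Language Models"). The exponent and constant below are approximate, from-memory values, so treat the outputs as illustrative of the curve's shape only:

```python
# Data term of the 2020 scaling law: L(D) ~ (D_c / D) ** alpha_D.
# The constants are approximate and only meant to show the shape.
ALPHA_D = 0.095   # dataset-size exponent (approximate)
D_C = 5.4e13      # "critical" dataset size in tokens (approximate)

def loss_from_data(d_tokens: float) -> float:
    """Predicted loss as a function of training tokens in the data-limited regime."""
    return (D_C / d_tokens) ** ALPHA_D

for d in (3e11, 3e12, 3e13):  # 0.3T, 3T, 30T tokens
    print(f"{d:.0e} tokens -> predicted loss ~{loss_from_data(d):.2f}")

# Each 10x of comparable-quality data cuts the loss by only a factor of
# 10 ** -ALPHA_D, roughly 0.80, i.e. about a 20% improvement per 10x.
```

That is the point the quotes are making: the next constant-factor improvement needs another order of magnitude of data of the same quality, and that data may simply not exist.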
At one time I would have said you should be able to have an efficient office operation using regular typewriters, copiers, filing cabinets, fax machines, etc.
And then you get Office 97, zip through everything and never worry about office work again.
I was pretty extreme having a paperless office when my only product is paperwork, but I got there. And I started my office with typewriters, nice ones too.
Before long Google gets going. Wow. No-ads information superhighway, if this holds it can only get better. And that's without broadband.
But that's beside the point.
Now it might make sense for you to at least be able to run an efficient office on the equivalent of Office 97 to begin with. Then throw in the AI, or let it take over, and see what you get in terms of output by comparison. Microsoft is probably already doing this in an advanced way. I think a factor that can vary over orders of magnitude is how well the machine leverages the abilities and/or tasks of the nominal human "attendant".
One type of situation would be where a less-capable AI augmenting a given worker is more effective than even a fully automated alternative using a 10x more capable AI. There's always some attendant somewhere, so you never get a zero in this equation no matter how close you come.
It could be financial effectiveness or something else, and the dividing line could be a moving target for a while.
You could even go full paleo and train the AI on the typewriters and stuff just to see what happens ;)
But would you really be able to get the most out of it without the momentum of many decades of continuous improvement before capturing it at the peak of its abilities?
It isn't a specific model for any of those problems, but a "general" intelligence.
Of course, it's not perfect, and it's obviously not sentient or conscious, etc. - but maybe general intelligence doesn't require or imply that at all?
In other words, just AI, not AGI.
If a general AI model is a "drop-in remote worker", then UX matters not at all, of course. I would interact with such a system in the same way I would one of my colleagues and I would also give a high level of trust to such a system.
If the system still requires human supervision or works to augment a human worker's work (rather than replace it), then a specific tailored user interface can be very valuable, even if the product is mostly just a wrapper of an off-the-shelf model.
After all, many SaaS products could be built on top of a general CRM or ERP, yet we often find a vertical-focused UX has a lot to offer. You can see this in the AI space with a product like Julius.
The article seems to assume that most of the value brought by AI startups right now is adding domain-specific reliability, but I think there's plenty of room to build great experiences atop general models that will bring enduring value.
If and when we reach AGI (the drop-in remote worker referenced in the article), then I personally don't see how the vast majority of companies - software and others - are relevant at all. That just seems like a different discussion, not one of business strategy.
At the end of the day, someone has to be checking the work. This is true of humans and of any potential AI agent, and the UX of that is a big deal. I can get on a call and talk through the code another engineer on my team wrote and make sure I understand it and that it's doing the right thing before we accept it. I'm sure at some point I could do that with an LLM, but the worry is that the LLM has no innate loyalty or sense of its own accuracy or honesty.
I can mostly trust that my human coworker isn't bullshitting me and any mistakes are honest mistakes that we'll learn from together for the future. That we're both in the same boat where if we write or approve malicious or flagrantly defective code, our job is on the line. An AI agent that's written bad or vulnerable code won't know it, will completely seriously assert that it did exactly what it was told, doesn't care if it gets fired, and may say completely untrue things in an attempt to justify itself.
Any AI "remote worker" is a totally different trust and interaction model. There's no real way to treat it like you would another human engineer because it has, essentially, no incentive structure at all. It doesn't care if the code works. It doesn't care if the team meets its goals. It doesn't care if I get fired. I'm not working with a peer, I'm working with an industrial machine that maybe makes my job easier.
I suggest that before we satisfy _everyone_'s definition of AGI, more and more people may decide we are there as their own job is automated.
The UX at that point, maybe in 5 or 10 or X years, might be a 3d avatar that pops up in your room via mixed reality glasses, talks to you, and then just fires off instructions to a small army of agents on your behalf.
Nvidia actually demoed something a little bit like that a few days ago. Except it lives on your computer screen and probably can't manage a lot of complex tasks on its own. Yet.
Or maybe at some point it doesn't need sub agents and can just accomplish all of the tasks on its own. Based on the bitter lesson, specialized agents are probably going to have a limited lifetime as well.
But I think it's worth having the AGI discussion as part of this because it will be incremental.
Personally, I feel we must be pretty close to AGI because Claude can do a lot of my programming for me. I still have to make important suggestions, routinely even for obvious things, but it is much better than me at filling in all the details and has much broader knowledge.
And the models do keep getting more robust, so I seriously doubt that humans will be better programmers overall for much longer.
I suspect that we will still be filling out forms, because that’s a better UI for a routine business transaction. It’s easier to know what the bank needs from you if it’s laid out explicitly, and you can also review the information you gave them to make sure it’s correct.
AI could still be helpful for finding the right forms, auto-filling some fields, answering any questions you might have, and checking for common errors, but that’s only a mild improvement from what a good website already does.
And yes, it’s also helpful for the programmers writing the forms. But the bank still needs people to make sure that any new forms implement their consumer interactions correctly, that the AI assist has the right information to answer any questions, and that it’s all legal.
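A minimal sketch of what that mild improvement might look like, where the form stays the interface of record and the model only proposes values for the user to confirm. The schema, field names, and `call_llm` stub are made up for illustration:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; returns a canned answer so the sketch runs.
    return '{"account_type": "checking", "date_of_birth": null, "annual_income_usd": 85000}'

FORM_SCHEMA = {
    "account_type": "checking or savings",
    "date_of_birth": "YYYY-MM-DD",
    "annual_income_usd": "integer",
}

def prefill_form(user_notes: str) -> dict:
    """Ask the model to propose values, but leave every field awaiting human review."""
    prompt = (
        "Fill this bank form from the notes below. "
        "Return JSON, using null for anything not explicitly stated.\n"
        f"Schema: {json.dumps(FORM_SCHEMA)}\nNotes: {user_notes}"
    )
    proposed = json.loads(call_llm(prompt))
    # The form itself remains the UI of record: the user still sees and confirms each value.
    return {field: {"value": proposed.get(field), "confirmed_by_user": False}
            for field in FORM_SCHEMA}

print(prefill_form("I'd like a checking account; I make about $85k a year."))
```

Anything the model can't find stays blank, and nothing is submitted until the person has reviewed it, which is exactly the "mild improvement over a good website" point.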
The article sits on an assumption that if we just wait long enough, the unreliability of deep learning approaches to AI will just fade away and we'll have a full-on "drop-in remote worker". Is that a sound assumption?
Then, LLMs entered the stage. Suddenly, we became able to both derive vastly better output from the data we got, and also offer our users easier ways to describe what they were looking for, find good keywords automatically, and actually deliver helpful results!
This was only possible because AI augments our product well and really provides a benefit in that niche, something that would just not have been possible otherwise. If you plan on founding a company around AI, the best advice I can give you is to choose a problem that similarly benefits from AI but would still exist without it.
how did the LLM help with that challenge?
The best strategy, from your experience, is to jump on a problem as soon as there is an opportunity to solve it and generate a lot of business value within the next 6 months. The trick is finding the subproblem that is worth a lot right now and could not be solved 6 months ago. A couple of AI-sales startups have "succeeded" quite well doing that (e.g. 11x); now they are in a good position to build from there (whether they will succeed in building a unicorn is another question, it just looks like they are in a good position now).
> 70% of Y Combinator’s Winter 2024 batch are AI startups. This is compared to ~57% of YC Summer 2023 companies and ~32% from the Winter batch one year ago (YC W23).
The thinking is: the models will get better, which will improve our product. But in reality, as the article states, the generalized models get better, so your value-add diminishes because there's less need to fine-tune.
On the other hand, the crypto fund made a killing off of "me too" blockchain technology before it got hammered again. So who knows about the 2-5 year term, but in 10 years we almost certainly won't have these billion-dollar companies that are wrappers around LLMs.
https://x.com/natashamalpani/status/1772609994610835505?mx=2
LLMs are a platform.
Bill Gates' definition of a platform was "A platform is when the economic value of everybody that uses it exceeds the value of the company that creates it."
If you rely on an OpenAI LLM for your business, they can basically do whatever they want to you. Oh, prices went up 10x? What are you gonna do, train your own AI?
Startups that use cloud servers still write the software that goes on those servers, which is 90% of the value creation.
My second controversial point regarding AI research and startups: doing research sucks. It's risky business. You are not guaranteed success. If you make it, your competitors will be hot on your tail and you will have to keep improving all the time. I personally would rather leave the model building to someone else and focus more on building products with the available models. There are exceptions like finetuning for your specific product or training bespoke models for very specific tasks at hand.
I'll go even further: the transformers etc. that we are using today are not that good either.
That's evidenced by the enormous amount of memory they need to do any task. We just took the one approach that was working a bit better for sensory tasks and pattern matching and went all in, piling on hardware after hardware so we could brute-force some cognitive tasks out of it.
If we did the same for other ML architectures, I don't think they would lag far behind. Maybe some would get even better results.
I think it is true in an AI research context, but there's an unstated assumption that you have complete data, end-to-end training, and that the particular solution being evaluated isn't unbounded in the real world.
It assumes infinite data, and it assumes the ability to falsify the resulting model output. Most valuable, 'real world' applications of AI run into an issue with one or both of those when you try to implement them in practice. In other words: where a fully unsupervised AI pathway is viable due to the structure of the problem, absolutely.
I'm not convinced of the universality of this. That doesn't mean the core point of this essay - the futility of startups basing their business around one of the off-the-shelf LLMs - isn't valid; I think many of them do risk being generalized away.
I don't see how the bitter lesson could not be true for the current crop of LLMs. They seem to have memorised just about everything mankind has written down and squished it into something on the order of 1TB. You can't do that without a lot of memory to recognise the common patterns and eliminate them. The underlying mechanism is nothing like zlib's deflate, but when it comes to the memory you have to throw at it, they are the same in this respect: the bigger the compression window, the better deflate does. When you are trying to recognise all the patterns in everything humans have written down, to a deep level (such as discovering that mathematical theorems are generally applicable), the memory window and/or compute you use must be correspondingly huge.
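That window-size point is easy to see directly; here's a minimal sketch with Python's standard zlib, where the only thing that changes is the history window (the data is just a synthetic example):

```python
import random, zlib

# Synthetic data with long-range repetition: one 2,000-byte pseudo-random
# block repeated 50 times. The redundancy only spans ~2 KB distances.
random.seed(0)
block = bytes(random.randrange(256) for _ in range(2000))
data = block * 50  # 100,000 bytes

def deflate_size(payload: bytes, wbits: int) -> int:
    """Compress with a 2**wbits-byte history window and return the output size."""
    co = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return len(co.compress(payload) + co.flush())

print("512 B window  :", deflate_size(data, 9))   # window too small to see the repeats
print("32 KiB window :", deflate_size(data, 15))  # window covers the repeats, compresses hard
```

With the small window the output is roughly the size of the input; with the large one it collapses to a few kilobytes. The analogy is that a model hunting for patterns across everything ever written needs a correspondingly huge "window".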
That was also true, to a lesser extent, when DeepMind taught an AI to play Pong in 2013. They had a million pixels arriving 24 times a second, and it had to learn to pick out the bats and balls in that sea of data. It clearly requires a lot of memory and compute to do that, and those resources simply weren't available on a researcher's budget much before 2013.
Since 2013, we've asked our AIs to ingest larger and larger datasets using much the same techniques used in 2013 (though known long before that) and been enchanted with the results. The "bitter lesson" predicts you need correspondingly more compute and memory to compress those datasets. Is it really a lesson, or an engineering rule of thumb that only became apparent when we had enough compute to do anything useful with AI?
I'm not sure this rule of thumb has much applicability outside of this "let's compress enormous amounts of data, looking for deep structure" realm. That's because if we look at neural networks in animals, most are quite small. A mosquito manages to find us for protein, find the right plant sap for food, find a mate, and find water with enough algae for its eggs, using data from vision, temperature sensors, and smell, and it uses all that to activate wings, legs, and god knows what else. It does all of this with about 100,000 neurons. That's not what a naive reading of "the bitter lesson" tells you it should take.
Granted, it may take an AI of enormous proportions to discover how to do it with 100,000 neurons. Nature did it by iteratively generating trillions upon trillions of these 100,000-neuron networks over millennia, using a genetic algorithm to select the best at each step. If we have to do it that way, it will be a very bitter lesson. The 10-fold increases in compute every few years that made us aware of the bitter lesson are ending. If the prediction of the bitter lesson is that we have to rely on those increases continuing in order to build our mosquito emulation, then it's predicting it will take us centuries to build all the sorts of robots we need to do all the jobs we have.
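As a toy illustration of that kind of evolutionary search (a minimal sketch; the tiny 2-4-1 network and the XOR task are arbitrary stand-ins, nothing like a real 100,000-neuron mosquito brain):

```python
import numpy as np

rng = np.random.default_rng(0)

# Evolve the 17 weights/biases of a tiny 2-4-1 network to solve XOR,
# as a stand-in for "generate lots of small networks and keep the best".
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
N_PARAMS = 2 * 4 + 4 + 4 * 1 + 1  # = 17

def forward(params, x):
    W1, b1 = params[:8].reshape(2, 4), params[8:12]
    W2, b2 = params[12:16].reshape(4, 1), params[16]
    h = np.tanh(x @ W1 + b1)
    return 1 / (1 + np.exp(-((h @ W2).ravel() + b2)))

def fitness(params):
    return -np.mean((forward(params, X) - y) ** 2)  # higher is better

pop = rng.normal(size=(200, N_PARAMS))
for _ in range(300):
    scores = np.array([fitness(p) for p in pop])
    elite = pop[np.argsort(scores)[-20:]]              # select the best 20
    children = elite[rng.integers(0, 20, size=180)]    # clone them...
    children = children + rng.normal(scale=0.2, size=children.shape)  # ...and mutate
    pop = np.vstack([elite, children])

best = max(pop, key=fitness)
print(np.round(forward(best, X), 2))  # ideally close to [0, 1, 1, 0]
```

Even this toy burns through tens of thousands of network evaluations for a four-row truth table; scaling that style of search up to anything mosquito-sized is exactly the "very bitter lesson" scenario.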
But that's looking unlikely. We have an example: on one hand we have Tesla FSD, throwing more and more resources at conventional AI training in the way the bitter lesson says you must in order to progress. On the other we have Waymo, using a more traditional approach. It's pretty clear which approach is failing and which is working - and it's not going the way the bitter lesson says it should.
As I understand the article, it is going the way the bitter lesson predicts it would - the initial "more traditional" approach generates almost-workable solutions in the near term while the "bitter lesson" approach is unreliable in the near term.
Unless you think that FSD is already in the "far" term (i.e. already at the endgame), this is exactly what the article predicts happens in the near term.