I've found this to be one of the most useful ways to use (at least) GPT-4 for programming. Instead of telling it how an API works, I make it guess, maybe starting with some example code to which a feature needs to be added. Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.
Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and the way it misreads the code shows me how.
These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.
(The best thing about this is that I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code, which often takes longer than just writing the code the usual way.)
There are multiple ways that an interface can be bad, and being unintuitive is the only one that this will fix. It could also be inherently inefficient or unreliable, for example, or lack composability. The AI won't help with those. But it can make sure your API is guessable and understandable, and that's very valuable.
Unfortunately, this only works with APIs that aren't already super popular.
> Sometimes it comes up with a better approach than I had thought of.
IMO this has always been the killer use case for AI—from Google Maps to Grammarly.
I discovered Grammarly at the very last phase of writing my book. I accepted maybe 1/3 of its suggestions, which is pretty damn good considering my book had already been edited by me dozens of times AND professionally copy-edited.
But if I'd accepted all of Grammarly's changes, the book would have been much worse. Grammarly is great for sniffing out extra words and passive voice. But it doesn't get writing for humorous effect, context, deliberate repetition, etc.
The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.
> The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results
Thanks for your words of wisdom, which touch on another very important point I want to raise: often we (i.e., developers, researchers) construct a technology that would be helpful and "net benign" if deployed as a tool for humans to use, rather than as a replacement for humans. But then along comes a greedy business manager who recklessly reckons that using said technology not as a tool but in full automation mode will make results 5% worse while saving 15% in staff costs, and decides that that is a fantastic trade-off for the company - yet employees may lose and customers may lose.
The big problem is that developers/researchers lose control of what they develop, usually once the project is completed, if they ever had control in the first place. What can we do? Perhaps write open source licenses that are less liberal?
Yes, we have the context - our unique lived experience - and are ultimately accountable for our actions. LLMs have no skin in the game. They have no desires and cannot be punished in any way. No matter how smart they get, we are the ones providing their opportunities to generate value, along with the guidance and iteration, and in the end we have to live with the outcomes.
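That's how you get economies of scale.
Google couldn't have a human in the loop to review every page of search results before handing them out in response to queries.
That's like getting rid of all languages and accents and switching to the same language.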
I used this to great success just this morning. I told the AI to write me some unit tests. It flailed and failed badly at that task. But how it failed was instructive, and uncovered a bug in the code I wanted to test.
That's closer to simply observing the mean. For an analogy, it's like waiting to pave a path until people tread the grass in a specific pattern. (Some courtyard designers used to do just that: wait and see where people walked first.)
Making things easy for ChatGPT means making things close to ordinary, average, or mainstream. Not creative, but it can still be valuable.
Best way to put it. It's very hard to discuss even slightly unique concepts with GPT. It just keeps strawmanning ideas back to a common consensus without actually understanding the deep idea.
On the bright side, a lot of work is just finding the mean solution anyway.
I've played with a similar idea for writing technical papers. I'll give an LLM my draft and ask it to explain back to me what a section means, or otherwise quiz it about things in the draft.
I've found that LLMs can be kind of dumb about understanding things, and are particularly bad at reading between the lines for anything subtle. In this aspect, I find they make good proxies for inattentive anonymous reviewers, and so will try to revise my text until even the LLM can grasp the key points that I'm trying to make.
In both cases, you might get extra bonus usability if the reviewers or the API users actually give your output to the same LLM you used to improve the draft. Or maybe a more harshly quantized version of the same model, so it makes more mistakes.
Many, many Python image-processing libraries have an `imread()` function. I didn't know about this when designing our own bespoke image lib at work, and went with an esoteric `image_get()` that I never bothered to refactor.
When I ask ChatGPT for help writing one-off scripts using the internal library I often forget to give it more context than just `import mylib` at the top, and it almost always defaults to `mylib.imread()`.
I don't know if there's an earlier source, but I'm guessing Matlab originally popularized the `imread` name, and that OpenCV (along with its python wrapper) took it from there, same for scipy. Scikit-image then followed along, presumably.
As someone not familiar with these libraries, image_get or image_read seems much clearer to me than imread. I'm wondering if the convention is worse than your instinct in this case. Maybe these AI tools will push us towards conventions that aren't always the best design.
That's a perfect example! I wonder if changing it would be an improvement? If you can just replace image_get with imread in all the callers, maybe it would save your team mental effort and/or onboarding time in the future.
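Or keep the callers as they are and just add an alias. A minimal sketch of what that could look like - hypothetical module layout, guessing at the internals; only `mylib` and `image_get` come from this thread:

```python
# Hypothetical sketch of mylib/__init__.py -- the body of image_get() is
# a guess for illustration, not the real internal library.
import numpy as np
from PIL import Image

def image_get(path):
    """Existing bespoke reader: load an image file and return it as an array."""
    return np.asarray(Image.open(path))

# Alias following the imread() convention (Matlab, OpenCV, scipy, scikit-image),
# so LLM-generated snippets that call mylib.imread(...) work unchanged.
imread = image_get
```

That way existing callers don't change, and the LLM's default guess stops being a bug.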
This is similar to an old HCI design technique called Wizard of Oz by the way, where a human operator pretends to be the app that doesn’t exist yet. It’s great for discovering new features.
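https://en.m.wikipedia.org/wiki/Wizard_of_Oz_experiment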
> and being unintuitive is the only one that this will fix
That's also how I'm approaching it. If all the condensed common wisdom poured into the model's parameters says that this is how my API is supposed to work to be intuitive, how on earth do I think it should work differently? There needs to be a good reason (like composability, for example). I break expectations otherwise.
> Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.
“Sometimes” being a very important qualifier to that statement.
Claude 4 naturally doesn't write code with any kind of long-term maintenance in mind, especially if it's trying to make things look like what the less experienced developers wrote in the same repo.
Please don’t assume just because it looks smart that it is. That will bite you hard.
Even with well-intentioned rules, terrible things happen. It took me weeks to see some of it.
In a similar vein, some of my colleagues have been feeding their scientific paper methods sections to LLMs and asking them to implement the method in code, using the LLM's degree of success/failure as a vague indicator of the clarity of the method description.
> I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code
If anyone is stuck in this situation, give me a holler. My Gmail username is the same as my HN username. I've always been the one to hunt down my coworkers' bugs, and I think I'm the only person on the planet who finds it enjoyable to track down ChatGPT's oversights and sometimes seemingly malicious intent.
I'll charge you, don't get me wrong, but I'll save you time, money, and frustration. And future bug reports and security issues.
In essence, an LLM is a crystallisation of a large corpus of human opinion, and you are using that to focus-group your API, since it is representative of a reasonable third-party perspective?
Yeah, basically. For example, it's really good at generating critical HN comments. Whenever I have a design or an idea, I write it up for GPT and ask it to generate a bunch of critical HN comments. It usually points out stuff I hadn't considered, or at least prepares me to think about and answer the tough questions.
This was a big problem for me when I was starting out writing MCP servers.
Having an LLM demo your tool, then taking what it does wrong or uses incorrectly and adjusting the API works very very well. Updating the docs to instruct the LLM on how to use your tool does not work well.
Great point. Also, it may not be the best possible API designer in the world, but it sure sounds like a good way to forecast what an _average_ developer would expect this API to look like.
> These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.
This is also similar to the areas where TD-Gammon excelled in backgammon.
Which is all pretty amusing, if you compare it to how people usually tended to characterise computers and AI, especially in fiction.
> Any person who has used a computer in the past ten years knows that doing meaningless tasks is just part of the experience. Millions of people create accounts, confirm emails, dismiss notifications, solve captchas, reject cookies, and accept terms and conditions—not because they particularly want to or even need to. They do it because that’s what the computer told them to do. Like it or not, we are already serving the machines. (...)
> You might’ve heard a story of Soundslice [adding a feature because ChatGPT kept telling people it exists](https://www.holovaty.com/writing/chatgpt-fake-feature/). We see the same at Instant: for example, we used `tx.update` for both inserting and updating entities, but LLMs kept writing `tx.create` instead. Guess what: we now have `tx.create`, too.
> Is it good or is it bad? It definitely feels strange. In a sense, it’s helpful: LLMs here have seen millions of other APIs and are suggesting the most obvious thing, something every developer would think of first, too.
> It’s also a unique testing device: if developers use your API wrong, they blame themselves, read the documentation, and fix their code. In the end, you might never learn that they even had the problem. But with ChatGPT, you yourself can experience “newbie’s POV” at any time.
Often I've started with some example code that invokes part of the API, but not all of it. Or in C I can give it the .h file, maybe without comments.
Sometimes I can just say, "How do I use the <made-up name> API in Python to do <task>?" Unfortunately the safeguards against hallucinations in more recent models can make this more difficult, because it's more likely to tell me it's never heard of it. You can usually coax it into suspension of disbelief, but I think the results aren't as good.
From my perspective that's fascinatingly upside-down thinking that ends with you asking to lose your job.
AI is going to get the hang of coding to fill in the spaces (i.e. the part you’re doing) long before it’s able to intelligently design an API. Correct API design requires a lot of contextual information and forward planning for things that don’t exist today.
Right now it’s throwing spaghetti at the wall and you’re drawing around it.
I find it's often way better at API design than I expect. It's seen so many examples of existing APIs in its training data that it tends to have surprisingly good "judgement" when it comes to designing a new one.
Even if your API is for something that's never been done before, it can usually still take advantage of its training data to suggest a sensible shape once you describe the new nouns and verbs to it.
Maybe. So far it seems to be a lot better at creative idea generation than at writing correct code, though apparently these "agentic" modes can often get close enough after enough iteration. (I haven't tried things like Cursor yet.)
I agree that it's also not currently capable of judging those creative ideas, so I have to do that.
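Insanity-driven development: altering your API to accept 7 levels of "broken and different" structures so as to bend to the will of the LLMs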
I think you’re missing the OP’s point. They weren’t saying that the goal is to modify their APIs just to appease an LLM. It’s that they ask LLMs to guess what the API is and use that as part of their design process.
If you automatically assume that what the LLM spits out is what the API ought to be then I agree that that’s bad engineering. But if you’re using it to brainstorm what an intuitive interface would look like, that seems pretty reasonable.
Yes, that's a bonus. In fact, I've found it worthwhile to prompt it a few times to get several different guesses at how things are supposed to work. The super lazy way is to just say, "No, that's wrong," if necessary adding, "Frotzl2000 doesn't have an enqueueCallback function or even a queue."
Of course when it suggests a bad interface you shouldn't implement it.
> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.
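— https://www.threads.com/@jimdabell/post/DLek0rbSmEM
I guess it's true for product features as well.
> Maybe hallucinations of vibe coders are just a suggestion those API calls should have existed in the first place.
> Hallucination-driven-development is in.
https://x.com/pwnies/status/1922759748014772488?s=46&t=bwJTI...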
Beware: the feature in the OP isn't something that people would have found useful on its own. It's not as though ChatGPT assigned OP's business a request from a user in some latent consumer-provider space, as if ChatGPT were some kind of market maker connecting consumers with products, like Google with organic content or ads, or LinkedIn, or Product Hunt.
No, what actually happened is that OP ended up building a kind of ChatGPT integration, and a shitty one at that; ChatGPT could have just directed the user to the site and told them to upload that image to OP's site. But it felt it needed to do something with the image, so it did.
There's no new value-add here, at least not yet. Maybe there would be if users started requesting changes to the sheet, I guess, but that's not what's going on.
>> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.
A detailed counterargument to this position can be found here[0]. In short, what is colloquially described as "LLM hallucinations" do not serve any plausible role in software design other than to introduce an opportunity for software engineers to stop and think about the problem being solved.
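See also Clarke's third law[1].
0 - https://addxorrol.blogspot.com/2025/07/a-non-anthropomorphiz...
1 - https://en.wikipedia.org/wiki/Clarke%27s_three_laws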
Did you mean to post a different link? The article you linked isn’t a detailed counterargument to my position and your summary of it does not match its contents either.
I also don’t see the relevance of Clarke’s third law.
The music notation tool space is balkanized in a variety of ways. One of the key splits is between standard music notation and tablature, which is used for guitar and a few other instruments. People are generally on one side or another, and the notation is not even fully compatible - tablature covers information that standard notation doesn't, and vice versa. This covers fingering, articulations, "step on fuzz pedal now," that sort of thing.
The users are different, the music that is notated is different, and for the most part if you are on one side, you don't feel the need to cross over. Multiple efforts have been made (MusicXML, etc.) to unify these two worlds into a superset of information. But the camps are still different.
So what ChatGPT did is actually very interesting. It hallucinated a world in which tab readers would want to use Soundslice. But, largely, my guess is they probably don't....today. In a future world, they might? Especially if Soundslice then enables additional features that make tab readers get more out of the result.
I don't fully understand your comment, but Soundslice has had first-class support for tablature for more than 10 years now. There's an excellent built-in tab editor, plus importers for various formats. It's just the ASCII tab support that's new.
I’m not super familiar with Soundslice. But all the tab users I know use guitar pro or maybe ultimate guitar, and none of them can read standard notation on its own. Does Soundslice have a lot of tab-first users?
I wonder if LLMs will stimulate ASCII formats for more things, and whether we should design software in general to be more textual in order to work better with LLMs.
I think folks have taken the wrong lesson from this.
It’s not that they added a new feature because there was demand.
They added a new feature because technology hallucinated a feature that didn’t exist.
The savior of tech, generative AI, was telling folks a feature existed that didn’t exist.
That’s what the headline is, and in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again, because next time it might not be so benign as it was this time.
> in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again
This would be a world without generative AI available to the public, at the moment. Requiring perfection would either mean guardrails that would make it useless for most cases, or no LLM access until AGI exists, which are both completely irrational, since many people are finding practical value in its current imperfect state.
LLMs in their current state are useful for what they're useful for, warnings about hallucinations are present on every official public interface, and their limitations are quickly understood with any real use.
Nearly everyone in AI research is working on this problem, directly or indirectly.
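Really!?
[0] https://i.imgur.com/ly5yk9h.png
If “don’t hallucinate” is too much to ask, then ethics flew out the window long ago.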
> Requiring perfection would either mean guardrails that would make it useless for most cases, or no LLM access until AGI exists
What?? What does AGI have to do with this? (If this was some kind of hyperbolic joke, sorry, I didn't get it.)
But, more importantly, the GP only said that in a sane world, the ChatGPT creators should be the ones trying to fix this mistake on ChatGPT. After all, it's obviously a mistake on ChatGPT's part, right?
That was the main point of the GP post. It was not about "requiring perfection" or something like that. So please let's not attack a straw man.
You sound like all the naysayers when Wikipedia was new. Did you know anybody can go onto Wikipedia and edit a page to add a lie‽ How can you possibly trust what you read on there‽ Do you think Wikipedia should issue groveling apologies every time it happens?
Meanwhile, sensible people have concluded that, even though it isn’t perfect, Wikipedia is still very, very useful – despite the possibility of being misled occasionally.
> despite the possibility of being misled occasionally.
There is a chasm of difference between being misled occasionally (Wikipedia) and frequently (LLMs). I don’t think you understand how much effort goes on behind the scenes at Wikipedia. No, not everyone can edit every Wikipedia page willy-nilly. Pages for major political figures often can only be edited with an account. IPs like those of iCloud Private Relay are banned and can’t anonymously edit the most basic of pages.
Furthermore, Wikipedia was always honest about what it is from the start. They managed expectations, underpromised and overdelivered. The bozos releasing LLMs talk about them as if they created the embryo of god, and giving money to their religion will solve all your problems.
Yeah, my main thought was that ChatGPT is now automating what salespeople always do at the companies I've worked at, which is to home in on what a prospective customer wants, confidently tell them we have it (or will have it next quarter), and then come to us and tell us we need to have it ready for a POV.
Exactly! It is definitely a weird new way of discovering a market need or opportunity. Yet it actually makes a lot of sense this would happen since one of the main strengths of LLMs is to 'see' patterns in large masses of data, and often, those patterns would not have yet been noticed by humans.
And in this case, OP didn't have to take ChatGPT's word for the existence of the pattern, it showed up on their (digital) doorstep in the form of people taking action based on ChatGPT's incorrect information.
So, pattern noticed and surfaced by an LLM as a hallucination, people take action on the "info", nonzero market demand validated, vendor adds feature.
Unless the phantom feature is very costly to implement, seems like the right response.
100%. Not sure why you’re downvoted; there’s nothing controversial here even if you disagree with the framing.
I would go on to say that this interaction between ‘holes’ exposed by LLM expectations _and_ demonstrated userbase interest _and_ expert input (via the devs’ decision to implement changes) is an ideal outcome that would not have occurred if each of the pieces were not in place to facilitate these interactions, and there’s probably something here to learn from and expand on in the age of LLMs altering user experiences.
This is an interesting example of an AI system effecting a change in the physical world.
Some people express concerns about AGI creating swarms of robots to conquer the earth and make humans do its bidding. I think market forces are a much more straightforward tool that AI systems will use to shape the world.
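One of the most dangerous systems an AI can reach and exploit is a human being.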
What this immediately makes me realize is how many people are currently trying to figure out how to intentionally get AI chatbots to send people to their site, like ChatGPT was sending people to this guy's site. SEO for AI. There will be billions in it.
I know nothing about this. I imagine people are already working on it, wonder what they've figured out.
(Alternatively, in the future can I pay OpenAI to get ChatGPT to be more likely to recommend my product than my competitors?)
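So winning AI SEO is not so different from regular SEO.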
You’re not thinking far enough ahead. It’s just a matter of time until LLMs get a system prompt to recommend <whatever product is paying that week> when users ask a question near that space.
Anyone who has worked at a B2B startup with a rouge sales team won't be surprised at all by quickly pivoting the backlog in response to a hallucinated missing feature.
Rogue? In the B2B space it is standard practice to sell from powerpoints, then quickly develop not just features but whole products if some slideshow got enough traction to elicit a quote. And it's not just startups. Some very big players in this space do this routinely.
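1. https://en.wikipedia.org/wiki/Rogue
2. https://en.wikipedia.org/wiki/Rouge_(cosmetics)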