ChatGPT was announced in November 2022 - 8 months ago. Time flies.
Question for HN: Where are we in the hype cycle on this?
We can run shitty clones slowly on Raspberry Pis and your phone. The educational implementations demonstrate the basics in under a thousand lines of brisk C. Great. At some point you have to wonder... well, so what?
Not one killer app has emerged. I for one am eager to be all hip and open minded and pretend like I use LLMs all the time for everything and they are "the future" but novelty aside it seems like so far we have a demented clippy and some sophomoric arguments about alignment and wrong think.
It did generate a whole lot of breathless click-bait-y articles and gave people something to blab about. Ironically it also accelerated the value of that sort of gab and clicks towards zero.
As I am not a VC, politician, or opportunist, hand waving and telling me this is Frankenstein's monster about to come alive and therefore I need billions of dollars or "regulations" just makes folks sound like the crypto scammers.
Please HN, say something actually insightful, I beg you.
I work in tech diligence so I look at companies in detail. I have seen a couple where good machine learning is going to make a massive difference (whether it will keep them ahead of everyone is a separate question). I think it really boils down to:
"Is this a problem where an answer that is mostly right and sometimes wrong is still a great value proposition?"
This is what people don't get. If sometimes the answer is (catastrophically) wrong, and the cost of this is high, there's no market fit. So I think a lot of these early LLM related startups are going to be trainwrecks because they haven't figured this out. If the cost of an error is very high in your business, and human checking is what you are trying to avoid, these are not nearly as helpful.
I looked at one company in this scenario and they were dying. Couldn't get big customers to commit because the product was just not worth it if it couldn't be reliably right on something that a human was never going to get wrong (can't say what it was, NDAs and all that.) I also looked at one where they were doing very well because an answer that was usually close would save workers tons of time, and the nature of the biz was that eliminating the human verification step would make no sense anyway. Let's just say it was in a very onerous search problem, and it was trivial for the searcher to say "wrong wrong wrong, RIGHT, phew that saved me hours!". And that saving was going to add up to very significant cash.
So killer apps are going to be out there. But I agree that there is massive overhype and it's not all of them! (or even many!)
That's interesting. Quite the needle to thread. I wonder how big the market will be for niche models that aren't commodities.
It needs to be something lucrative enough that training the model is not-trivial but not so lucrative Microsoft/Google would care enough to go after. And it somehow needs to stay in that sweet spot even as Nvidia chips away at that moat with each new hardware generation.
I'll say that I pretty firmly disagree with this. I've been using Github Copilot for about six months for my own work and it has fundamentally changed how I write code. Ignoring the ethics of Copilot, if I just need to read a file with some data, parse it, and render that data on screen, Copilot just _does_ most of that for me. I write a chunky comment explaining what I want, it writes a blob of code that I tab through, and I'm left with a nicely-documented, functioning piece of software. A one-off script that took me 30 minutes to write previously now takes me maybe a minute on a bad day.
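To make that concrete, here's roughly the kind of throwaway script I mean - the file name and columns are invented for the example, and it's the sort of thing Copilot fills in from the comment at the top:

    # Hypothetical one-off script of the kind Copilot tends to autocomplete
    # from a comment. File name and column names are made up for illustration.
    import csv

    def load_rows(path):
        """Read a CSV file and return its rows as dictionaries."""
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def main():
        rows = load_rows("sales.csv")  # e.g. columns: region, amount
        totals = {}
        for row in rows:
            totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
        for region, total in sorted(totals.items(), key=lambda kv: -kv[1]):
            print(f"{region:<12} {total:>10.2f}")

    if __name__ == "__main__":
        main()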
For ages we've had Text Expander and key mappings and shortcuts and macros that render templates of pre-built code. Now I can just say what I'm trying to do, the language model considers the other code on the page, and it gets done.
If this isn't a "killer app" then I'm not sure what is. In my entire career I can think of maybe two things that I've come upon that have affected my workflow this much: source control and continuous integration. Which, frankly, is wild.
Separately, I use LLMs to generate marketing copy for my side hustle. I suck at marketing, but I can tell the damn thing what I want to market and it gives me a list of tweets back that sound like the extroverted CMO that I don't have. I can outsource creative tasks like brainstorming lists of names for products, or coming up with text categories for user feedback from a spreadsheet. I don't know if I'd call either of those things "killer apps" but I have a tool which can do thinking for me at a nominal cost, quickly, and with a high-enough quality bar that it's usually not a waste of my time.
My friend made a great comparison that seems to agree with your take: ChatGPT for coding is like when Ruby on Rails came out. Or WordPress. It felt magical and boosted (a certain kind of) productivity through the roof.
We don't think of rails as the second coming though.
Same with code editors. Of course a Rails for all of code is cool. But I dunno, it's a code editor. I still use Sublime.
I think the Microsoft GPT integration in Office is probably that app.
The ability to have your emails summarised, or to get your Excel formulas configured with natural language, etc., is an incredibly useful way to lower the barrier to entry for tools that already speed humans up so much.
I don't think the use of these tools is some life-redefining feature, but a friend of mine joked that a year from now you will write a simple sentence like "write polite work email with following request: Come to the meeting, you are late", then a GPT will write the email, another GPT will send it, his GPT will summarise it, and he will instantly reply with another GPT-written apology, of which you will only read the summary. Leaving a trail of long, polite messages that no one will ever actually open.
Got a good chuckle from me. I find that in quick daily back-and-forths, time saved by such a system would be negligible. In many places I've worked, the 'polite work mail' has gone out the door long ago, already at the lower bound of what is considered a proper sentence.
There was a science fiction story about this, with phone auto-message and auto-answer systems connecting with each other long after all the humans were dead.
Can we stop acting like the Gartner "hype cycle" is anything more than a marketing gimmick created by Gartner to validate their own consulting/research services?
While you can absolutely find cases that map to the "hype cycle", there is nothing whatsoever to validate this model as remotely accurate or valid for describing technology trends.
Where is crypto in the "hype cycle"? It went through at least 3 rounds of "peak of inflated expectations" and I'm not confident it will ever reach a meaningful "plateau of productivity".
Did mobile ever have "inflated expectations"? Yes there was a lot of hype in the early days, but the people hyped about it, rushing to build mobile versions of their websites... were correct.
The "hype cycle" is a neat idea but doesn't really map to reality in a way that makes it useful. It's only useful for Gartner to create an illusion of credibility and sell their services.
> The "hype cycle" is a neat idea but doesn't really map to reality in a way that makes it useful.
What do you propose as a more accurate alternative, or do you think that the whole idea should be scrapped? Because personally I feel like certain tech/practices certainly go through multiple stages, where initially people expect too much from them and eventually figure out what they're good for and what they're not.
Not always a single linear process, like NFTs/crypto refusing to die despite numerous scams out there and projects that seem to go nowhere, yet people still falling for the scams due to promised profits. However, the number of people critiquing the blockchain as a crappy database seems to suggest at least some lessons learnt along the way and hopefully some actually decent use cases.
Gartner are so good at their job that you think they own the concept of hype cycles, and you rage against them being mentioned while being the one who introduced them to the conversation in the first place :)
That 8 months seems like a long time to you is indicative of just how fast tech has been moving lately. I expect at least another year before we have a good sense for where we actually are, probably more.
However, I'll hazard a guess: I think we haven't seen many real new apps since then because too many people are focused on packaging ChatGPT for X. A chatbot is a perfectly decent use case for some things, but I think the real progress will come when people stop trying to copy what OpenAI already did and start integrating LLMs in a more hands-off way that's more natural to their domains.
A great example that's changed my life is News Minimalist [0]. They feed all the news from a ton of sources into one of the GPT models and have it rate the story for significance and credibility. Only the highest rated stories make it into the newsletter. It's still rough around the edges, but being able to delegate most of my news consumption has already made a huge difference in my quality of life!
I expect successful and useful applications to fall in a similar vein to News Minimalist. They're not going to turn the world upside down like the hype artists claim, but there is real value to be made if people can start with a real problem instead of just adding a chatbot to everything.
[0] https://www.newsminimalist.com/
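I have no idea what their actual pipeline looks like, but the shape of the trick is easy to sketch - the prompt, the rate_with_llm() call, and the cutoff below are all invented for illustration:

    # Guess at the shape of the trick, not their actual implementation:
    # ask the model to score each story, keep only the ones above a cutoff.
    # rate_with_llm() stands in for a chat-completion call returning JSON.
    import json

    PROMPT = (
        "Rate this news story for significance to humanity (0-10) and source "
        'credibility (0-10). Reply with JSON: {"significance": n, "credibility": n}.'
    )

    def keep_story(story_text, cutoff=7):
        raw = rate_with_llm(PROMPT + "\n\n" + story_text)  # placeholder LLM call
        scores = json.loads(raw)
        return scores["significance"] >= cutoff and scores["credibility"] >= cutoff

    # newsletter = [s for s in todays_stories if keep_story(s)]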
> Not one killer app has emerged. I for one am eager to be all hip and open minded and pretend like I use LLMs all the time for everything and they are "the future" but novelty aside it seems like so far we have a demented clippy and some sophomoric arguments about alignment and wrong think.
In my mind I divide LLM usage into two categories, creation and ingestion.
Creation is largely a parlor trick that blew the minds of some people because it was their first exposure to generative AI. Now that some time has passed, most people can pattern match GPT-generated content, especially content generated without sufficient "prompt engineering" to make it sound less like the default writing style. Nobody is impressed by "write a rap like a pirate" output anymore.
Ingestion is a lot less sexy and hasn't gotten nearly as much attention as creation. This is stuff like "summarize this document." And it's powerful. But people didn't get as hyped up on it because it's something that they felt like a computer was supposed to be able to do: transforming existing data from one format to another isn't revolutionary, after all.
But the world has a lot of unstructured, machine-inaccessible text. Legal documents saved in PDF format, consultant reports in Word, investor pitches in PowerPoint. And when I say "unstructured" I mean "there is data here that it is not easy for a machine to parse."
Being able to toss this stuff into ChatGPT (or another LLM) and prompt with things like "given the following legal document, give me the case number, the names of the lawyers, and the names of the defendants; the output must be JSON with the following schema..." and then save that information into a database is absolutely killer. Right now companies are recruiting armies of interns and contractors to do this sort of work, and it's time-consuming and awful.
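A rough sketch of what that flow looks like in practice, assuming the OpenAI Python client as it existed in 2023 (openai.ChatCompletion) - the schema and document text are placeholders, not a real case:

    # Rough sketch of the extraction flow described above. Assumes the OpenAI
    # Python client circa 2023; the schema and document are placeholders.
    import json
    import openai

    openai.api_key = "sk-..."  # your key

    SCHEMA_PROMPT = (
        "Given the following legal document, extract the case number, the names "
        "of the lawyers, and the names of the defendants. Respond with JSON only, "
        'using the schema {"case_number": str, "lawyers": [str], "defendants": [str]}.'
    )

    def extract_fields(document_text: str) -> dict:
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You extract structured data from documents."},
                {"role": "user", "content": SCHEMA_PROMPT + "\n\n" + document_text},
            ],
            temperature=0,
        )
        return json.loads(response["choices"][0]["message"]["content"])

    # record = extract_fields(pdf_text)   # then insert the record into your database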
Isn’t the summarization of text like legal documents exactly where the notion of hallucinations comes in as a huge blocker?
Is the industry making progress on fixing such hallucinations? Or for that matter the privacy implications of sharing such documents with entities like OpenAI that don’t respect IP?
Until hallucinations and IP/PII are fixed I don’t want this technology anywhere near my legal or personal documents.
ChatGPT has already put some copywriters and journalists out of work, or at least reduced their hours. The app is quite literally “killing” something, i.e. people’s jobs. For those people, it’s not just empty hype. It’s very real. Certainly it’s already more real than anything having to do with blockchain/crypto.
I'm dubious. The few news websites that started publishing LLM articles (CNET, etc) were already circling the drain. They'd probably have fired their journalists anyway because they're on the edge of bankruptcy.
The killer app for large enterprises is Q&A against the corporate knowledgebase(s). Big companies have an insane amount of tribal knowledge locked away in documents sitting on Sharepoint, on Box, on file servers, etc. Best case scenario, their employees can do keyword search against a subset of those documents. Chunk those docs, run them through an embedding process, store the embeddings in a vector store, let employees ask questions, do a similarity search against the vector store, pass the top results and the question to the LLM, get an actual answer back to present to the employee. This unlocks a ton of knowledge and can be a massive productivity booster.
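For anyone who hasn't built one of these, a minimal sketch of that loop - sentence-transformers stands in for the embedding model, a numpy array stands in for the vector store, and ask_llm() is a placeholder for whatever chat-completion API you call:

    # Minimal sketch of the retrieval-augmented Q&A loop described above.
    # A real system would use a proper vector store instead of a numpy array.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def chunk(text, size=500):
        return [text[i:i + size] for i in range(0, len(text), size)]

    # 1. Chunk and embed the corporate documents once, offline.
    docs = ["...contents of a SharePoint doc...", "...contents of a Box doc..."]
    chunks = [c for d in docs for c in chunk(d)]
    index = embedder.encode(chunks, normalize_embeddings=True)   # (n_chunks, dim)

    def answer(question, top_k=3):
        # 2. Embed the question and find the most similar chunks (cosine similarity).
        q = embedder.encode([question], normalize_embeddings=True)[0]
        best = np.argsort(index @ q)[::-1][:top_k]
        context = "\n\n".join(chunks[i] for i in best)
        # 3. Hand the retrieved context plus the question to the LLM.
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return ask_llm(prompt)   # placeholder for your chat-completion call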
There is definitely interesting and high-potential technology here. I do not think the current crop of "wrap ChatGPT in an API for XYZ business-case" startups will succeed - they will be total fails across the board. There is also an issue where anyone with an iota of experience or degree in something tangential to AI or ML can be the "genius" behind a new startup for funding - a telltale sign of bubble mentality to me.
If LLMs in their current form as human-replacement agents are cheaper versions of Fiverr / Mechanical Turk, and we all know there are very limited, bottom-of-the-barrel use cases for those cheap labor technologies, then why would LLMs be a radical improvement? It's nonsensical.
About as killer as that Twitter clone that was in the news for a minute after forcing people to use it and immediately losing 90% of the captive audience...
They have been losing users. Summer is here, school is out, the kids are back in reality for the moment and apparently when they aren't busy plagiarizing homework the interest is very limited.
I personally use copilot every day and I love it. It reduces the amount of typing I have to do, gives me lots of good suggestions for solving simple problems and has made working with unfamiliar languages so much easier.
I'd say we're maybe half or two-thirds of the way down from the peak of inflated expectations toward the trough of disillusionment. Before long, I think maybe in the next three months or so, certainly around the time we hit the one year anniversary of chatgpt's release, we'll start seeing mainstream takes along the lines of "chatgpt and Bing's Sydney episode and such were good entertainment, but it's obvious in hindsight that it was a fad; nobody is posting funny screenshots of their conversations anymore, and all the pronouncements about a superhuman AGI apocalypse were obviously silly, it's clear chatgpt has failed and this whole thing was the same old hype-y SV pointlessness".
And at that point, we will have reached the trough of disillusionment. I think funding will be less readily available, and we'll start seeing some of the bevy of single-purpose LLM-based products start closing up shop.
But more quietly, others will be (already are) traversing up the slope of enlightenment. As others have mentioned, this is stuff like features in Microsoft's and Google's productivity products (including those for software engineering productivity like Github Copilot), and some subset of products and features elsewhere that turn out to be compelling in a sticky way.
I expect 2024 and 2025 to be the more interesting part of this hype cycle. I don't think we're on the verge of waking up in a world nobody recognizes in a small number of days or months, but I think in a few years we're going to have a bunch of useful tools that we didn't have a year ago, some of which are the obvious ones we've already seen, but improved, and others that are not obvious right now.
Not sure if this was insightful enough for you :) Apologies if not.
We're still on the exponential rise of the hype cycle. If capabilities appear to plateau - no GPT5/6 that are even more amazing, then the hype will not merely plateau but plummet. For now, anything seems possible.
As for a killer app, I'm another person for whom ChatGPT is it. I use GPT-4 something like Google, Wikipedia and Stack Overflow in one, but being very aware of the limitations. It feels a bit like circa 2000 when being good at googling things felt like a superpower. It doesn't do everything for you but can make you drastically more effective.
There are three levels of what's going on with AI at the moment, each with their own momentum and hype cycle: (1) the current generation of chat bots and image generators, which some of us would be using for the rest of our lives even with only minor refinements; (2) the prospect that new tools built on top of this and subsequent generations could remake the internet and how we interact with our gadgets; and (3) the prospect that the systems will keep getting smarter and smarter.
I wonder if language translation will be one of the "killer apps".
Especially if it can be done real-time and according to the context/level of the audience/listener. Even within the same language, translation from a more technical/expert level to a simplified summary helps education/communication/knowledge transfer significantly.
I mentioned the Stack Overflow Developer Survey once already today, but at the risk of sounding like a broken record, it has some data on this as well: https://survey.stackoverflow.co/2023/#ai
To save someone a click, around 44% of the respondents (some 39k out of 89k people) are already using "AI" solutions as a part of their workflow, another 25% (close to 23k people) are planning to do so soon.
The sentiment also seems mostly favorable, most aim to increase productivity or help themselves with learning and just generally knock out some more code, though there is a disconnect between what people want to use AI for (basically everything) and what they currently use it for (mostly just code).
There's also a section on the AI search tools in particular, about 83% of the respondents have at least had a look at ChatGPT, which is about as close to a killer app as you can probably get, even if it's cloud based SaaS: https://survey.stackoverflow.co/2023/#section-most-popular-t...
> Where are we in the hype cycle on this?
I'm not sure about the specifics here, but the trend feels about as significant as Docker and other container technologies more or less taking the industry by storm and changing a bunch of stuff around (to the point where most of my server software is containers).
That said, we're probably still somewhere in the early stages of the hype cycle for AI (the drawbacks like hallucination will really become apparent to many in the following years).
Honestly, the technology itself seems promising for select use cases and it's still nice that we have models that can be self hosted and somehow the software has gotten decent enough that you can play around with reasonably small models on your machine even without a GPU: https://blog.kronis.dev/tutorials/self-hosting-an-ai-llm-cha...
I'm cautiously optimistic about the current forms of LLM/AI, but fear that humanity will misuse the tech (as a cost cutting measure sometimes, without proper human review).
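For the "play around on CPU" case, a minimal sketch using the Hugging Face transformers pipeline - distilgpt2 is just a conveniently tiny public checkpoint, not a recommendation:

    # Small sketch of running a tiny public model on CPU via transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model="distilgpt2", device=-1)  # -1 = CPU
    print(generator("Self-hosting a language model is", max_new_tokens=40)[0]["generated_text"])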
The killer app is ChatGPT. I'm not sure what you're expecting here, but it's been enormously useful while trying out new languages. For example, even if it's not 100% right, it has been a great help while working with nix, as I'm often ignorant to entire methods of solving a problem, and it's pretty good at suggesting the right method.
It's also super useful for things like "convert this fish shell snippet to bash" or "rewrite this Python class as a single function". It tends to really nail these sorts of grounded questions, and it legitimately saves me time.
I think 8 months is a little short for the utility of a new tech to be fully realized and utilized. I'm pretty sure there were still horses on the roads well more than 8 months after the Model T first went on sale.
I can't tell if this is satire or not. It is so... well, to be polite, it sounds so much like an uninformed stock trader that I find it hard to believe this isn't some sort of meta commentary on Hacker News conversations.
There are plenty of examples of where the technology can eventually lead in terms of entertainment, impact on society and news, knowledge work, and so on. It doesn't have to happen immediately. But to handwave the myriad articles about the subject away and just say "I don't believe any of it, what else you got" is a bit annoying.
Why don't you ask ChatGPT or Bard? If there's a hype cycle, it is just starting.
The killer app is the LLM tech itself, and the victim seems to be the whole tech ecosystem. It disintermediates everyone who is gatekeeping information and connects end users with the information they want, without Google, without SEO, and without ads. Even if we are not right there today, the potential is there. This in itself is huge, since the whole ecosystem of SV is funded by ads.
I think it has shown the limitations of the Society of Mind hypothesis. Aggregating individuals equates to aggregating knowledge/experience, not intelligence. This is why hives and anthills do not really surpass their individuals' intelligence. Ditto for human societies. In other words: composing LLMs using tools like langchain yields minor improvements over a single LLM instance.
Its not an "AI-killer-app" thats the real deal I think. Its that these AI tools (esp LLMs) are truly powerful tools in everyday work now. Automating stuff is a breeze now whereas it was much more involved before. Data classification, content/code creation, data transformation, ... typical jobs for software engineers boil down to this. Its only a prompt now you fire against an API. Automating tasks that used to require human clerks is now a few hours/days of creative coding and the tasks are gone.
A surprising amount of work can tolerate a percentage of errors in a non-deterministic way, even before considering that humans usually make even more errors of that kind. :-)
To be extremely cynical, all of this hype seems to be the mid-life crisis of Gen Xers who grew up on The Jetsons, trying to bring the future they saw on TV as children to life, without regard to the economic or technical feasibility.
The biggest impact on my life has been Code Interpreter. Much of my job as a CEO involves analyzing data to make strategic decisions - “which of several options is best based on the evidence?”
Code Interpreter lets me upload data in a multitude of formats and play with it without wasting hours futzing around in Google Sheets or pulling my hair out with Pandas confusion. I know basic statistics concepts and I studied engineering so I know about signals and systems. But putting that knowledge into practice using data analysis tools is time consuming. Code Interpreter automates the time consuming parts and lets me focus on the exploration, delivering insights I never even had access to before.
I don't think there's a "killer app" coming soon, but it'll be a thousand cuts. One awesome thing here, one slightly less awesome but still useful thing over there. Take Copilot. Cool stuff and one of the early products. Doesn't change the game in any fundamental way, but it does have its impact on the work of a substantial fraction of developers.
This is not unlike the computer revolution itself. When the PC came on the scene it was easy - for some types - to imagine The Future and they proclaimed it loudly. They forgot that the rest of the world takes its time and regularly takes decades to get used to very minor changes in their routine.
Since writing that, we’ve started using https://read.ai and other similar tools at my company, and we find them very helpful. I also have a friend working on a large content moderation team that will be using LLaMa 2 for screening comments. Lots of uses!
https://en.m.wikipedia.org/wiki/AI_winter
This concept of hype and decline has been happening for literally decades. Yet people don't realize it even when it's literally on the first google page for anything to do with AI.
The people spouting this AI nonsense seriously need to fuck off and read a book.
> This article needs to be updated. The reason given is: Add more information about post-2018 developments in artificial intelligence leading to the current AI boom.
I don't quite remember anything from the last few decades that existed and was comparable to LLMs like ChatGPT/Claude.
We shipped a major feature in our core product atop the API. It's central to our onboarding experience for new users, and works quite well at the job of "teaching" them how to use the product more effectively. It isn't magic, but this has been an inflection point in capabilities.
An artist friend of mine with no programming knowledge used ChatGPT to produce a variety of cool visuals for a music gig, in Processing - spinning wireframes, bobbing cube grids, that sort of thing. They didn't even know they needed to use Processing at first - ChatGPT told them everything. They had an aesthetic in mind, and ChatGPT helped them deliver.
I don't want to make any real assertions but my intuitive reaction to this comment is _this person has no clue what they are talking about_. I would rather turn off syntax highlighting than turn off Copilot, and I'd rather disable Google search than ChatGPT. And frankly, it's not even close; I use these tools "all the time for everything".
If you follow the Gartner model, there is usually a surge of high expectations right before a "trough of disillusionment" - but eventually the real applications do emerge. Humans are just impatient.
This has gotten way out of hand by now and, in large part, isn't about serving humanity as a whole anymore (!). This is some actors with the money and hardware trying to build their AI dream castles on the shoulders of the rest, and they don't even care what the implications of their actions are. Money is regulating this business, and it is taking more away from us all in the long term than it pays back in the short term. I'm kinda glad we're developing backwards, because these changes are necessary for building a balanced future for all of us. Not just for a few eligible...
I run through a lot of these concepts, specifically RLHF, in my latest coding stream where I finetune Llama 2, if anyone's interested in an LLM deep dive: https://www.youtube.com/watch?v=TYgtG2Th6fI&t=4002s
Long story short, the size of the model and the reward mechanisms used to validate against human annotation/feedback are the main differences between what we can do as independents in OSS vs OpenAI. BigCode's StarCoder (https://huggingface.co/bigcode/starcoder) has some human labor backing it (I believe - correct me if I'm wrong), but at the end of the day a company will always be able to gather people better.
Not knocking StarCoder, in fact I streamed how to fine tune it the other day. However, it's important to mention some of the limitations in the OSS space right now (a big reason Meta pushing Llama 2 is nice to have).
The way to think about it is that backpropagation changes the parameters of a model so they get closer to some sort of desired output.
In pre-training and SFT, the parameters are changed so the model does a better job of replicating the next word in the training data, given the words it has already seen.
In RLHF, the parameters are changed so the model does a better job of outputting the response that aligns to the human's preference (see: the feedback screen in the linked article).
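A compressed sketch of those two objectives (PyTorch) - real RLHF then uses the reward model inside a PPO-style loop, this only shows the losses being described:

    # Compressed sketch of the two objectives. Real RLHF uses the reward model
    # inside an RL loop; this only illustrates the losses described above.
    import torch
    import torch.nn.functional as F

    # Pre-training / SFT: cross-entropy on the next token, given the tokens so far
    # (targets are the input sequence shifted by one position).
    def sft_loss(logits, target_ids):
        # logits: (batch, seq, vocab) predictions; target_ids: (batch, seq) ground truth
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))

    # Reward-model step of RLHF: push the score of the human-preferred response
    # above the score of the rejected one (pairwise preference loss).
    def reward_model_loss(score_chosen, score_rejected):
        return -F.logsigmoid(score_chosen - score_rejected).mean()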
I’ve been reading a pop neuroscience book called Incognito (2011).
In it, the author talks about how the brain is a group of competing sub-brains of many forms, and the brain might have several ways of doing the same thing (e.g. recognizing an object). The author also posited that the lack of AI progress back then was due to the fact that there are no constantly competing sub-brains. Our brains are always adjusting and trying new scenarios.
I was struck by how similar these brain observations were to recent developments in AI and LLMs.
The book is full of cool stories, even if some of them are now recognized as non-reproducible. I recommend!
In the end - an AI should have these competing subsystems in one system - just as our brains are one system.
What I find extremely interesting is how perception and thinking differ from person to person too - it was a "taboo" topic to call this neurodiversity, just as with other genetic traits, but AI makes this more relevant than ever imo.
Sure, it's complicated and much comes from nurture (nurture vs nature.. as exposure/epigenetics vs genetics), but there sure are marked differences - the ones starting to stand out are e.g. ADHD / autistic people, but I'm sure it won't stay just there over time!
You touch on an important topic here: how our understanding of AI/ML/LLMs will influence our "understanding" of the human brain and intelligence.
My fear is that we will ascribe too much of what we see in and understand about our AI inventions to human behaviour, and that this will result in the dehumanisation of people.
So essentially my fear is what we justify doing to each other due to AI, rather than what "AGI" could do to us.
Even within my immediate family we seem to have distinct differences in our conscious experience. My wife has very little visual or auditory experience of thought, no inner voice even when reading a book. While I mostly experience speaking as a continuous stream of words coming basically from my subconscious, with only a vague sense of what's coming up, one of my daughters says she is consciously aware of the exact words she is going to say several seconds in advance. It's like she has the ability to introspect her internal speech buffer, while I can't.
So while I'm sure there are a lot of custom tuned, problem specific hardware structures in our brain architecture, we do seem to learn how to actually use that hardware individually. As a result we seem to come up with a diverse range of different high level approaches.
> The author also posited that the lack of AI progress back then was due to the fact that there are no constantly competing sub-brains.
That became popular in neural networks after the introduction of dropout regularization, which discouraged co-adaptation and forced neurons to learn redundant features so they could cover for each others' jobs. Large, over-parameterized models also provide a natural setting for that kind of redundancy.
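For reference, dropout in code is just a layer that randomly zeroes activations during training - a minimal torch sketch, not tied to any particular paper's setup:

    # Minimal illustration: nn.Dropout randomly zeroes activations at train time,
    # so no single unit can be relied on and the network learns redundant features.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(64, 128),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # each hidden unit is dropped with probability 0.5
        nn.Linear(128, 10),
    )

    model.train()                      # dropout active
    y_train = model(torch.randn(8, 64))
    model.eval()                       # dropout disabled at inference
    y_eval = model(torch.randn(8, 64))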
In fact, this is what psychoanalysis and the notion of the unconscious (as opposed to "subconscious processes") was all about. (And it's also, where the "talking cure" found its leverage.)
Specifically about RLHF, I find this video by Rob Miles still the best presentation of the ingenious original 2017(!) paper: https://youtube.com/watch?v=PYylPRX6z4Q
RLHF is actually older than GPT-1, which came out in 2018. It didn't get applied to language models until 2022 with InstructGPT, an approach which combined supervised instruction fine-tuning with RLHF.
How do you do science on LLMs? I would imagine that is super important, given their broad impact on the social fabric. But they're non-deterministic, very expensive to train, and subjective. I understand we have some benchmarks for roughly understanding a model's competence. But is there any work in the area of understanding, through repeatable experiments, why LLMs behave how they do?
I'm pretty much certain the cost of training and running large LLMs is going to come down, because it's only a matter of time before truly customized chips come out for these.
GPUs really aren't that. They're massively parallel vector processors that turn out to be generally better than CPUs at running these models, but they're still not the ideal chip for running LLMs. That would be a large, even more specialized parallel processor where almost all the silicon is dedicated to running exactly the types of operations used in large LLMs, one that natively supports quantization formats such as those found in the ggml/llama.cpp world. Being able to natively run and train on those formats would allow gigantic 100B+ models to be run with more reasonable amounts of RAM and at higher speed, given that memory bandwidth is the constraint.
These chips, when they arrive, will be a lot cheaper than GPUs when compared in dollars per LLM performance. They'll be available for rent in the cloud and for purchase as accelerators.
I'd be utterly shocked if lots of chip companies don't have projects working on these chips, since at this point it's clear that LLMs are going to become a permanent fixture of computing.
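To make the quantization point above concrete, here's a toy version of symmetric 4-bit weight quantization in numpy - the real ggml/llama.cpp formats are block-wise with per-block scales, this only illustrates why the memory footprint shrinks:

    # Toy symmetric 4-bit quantization of a weight row. The actual ggml formats
    # quantize in blocks and are more sophisticated; this only shows the trade-off.
    import numpy as np

    w = np.random.randn(4096).astype(np.float32)   # one row of fp32 weights: 16 KiB

    scale = np.abs(w).max() / 7.0                  # map the range to signed 4-bit [-7, 7]
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)   # 4 bits of info per weight
    w_hat = q.astype(np.float32) * scale           # dequantize on the fly at runtime

    print("max abs error:", np.abs(w - w_hat).max())
    print("fp32 bytes:", w.nbytes, "-> ~4-bit bytes:", q.size // 2 + 4)  # packed + scale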
I would imagine it's a bit like doing science on human beings, who are also non-deterministic, expensive to train, and subjective. Perhaps there's scope for a scientific discipline corresponding to psychology but concerned with AI systems. We could call it robopsychology.
There's a field called Interpretability (sometimes "Mechanistic Interpretability") which researches how weights inside of a neural network function. From what I can tell, Anthropic has the largest team working on this [0]. OpenAI has a small team inside their SuperAlignment org working on this. Alphabet has at least one team on this (not sure if this is Deepmind or Deepmind-Google or just Google). There are a handful of professors, PhD students, and independent researchers working on this (myself included); also, there are a few small labs working on this.
At least half of this interest overlaps with Effective Altruism's fears that AI could one day cause considerable harm to the human race. Some researchers and labs are funded by EA charities such as Long Term Future Fund and Open Philanthropy.
There is the occasional hackathon on Interpretability [1].
Here's an overview talk about it by one of the best-known researchers in the field [2].
[0] https://transformer-circuits.pub/2021/framework/index.html [1] https://alignmentjam.com/jam/interpretability [2] https://drive.google.com/file/d/1hwjAK3lWnDRBtbk3yLFL2DCK1Dg...
Some people (namely the EAs) care because they don't want AI to kill us.
Another reason is to understand how our models make important decisions. If we one day use models to help make medical diagnoses or loan decisions, we'd like to know why the decision was made to ensure accuracy and/or fairness.
Others care because understanding models could allow us to build better models.
> Transformers can be generally categorized into one of three categories: “encoder only” (a la BERT); “decoder only” (a la GPT); and having an “encoder-decoder” architecture (a la T5). Although all of these architectures can be rigged for a broad range of tasks (e.g. classification, translation, etc), encoders are thought to be useful for tasks where the entire sequence needs to be understood (such as sentiment classification), whereas decoders are thought to be useful for tasks where text needs to be completed (such as completing a sentence). Encoder-decoder architectures can be applied to a variety of problems, but are most famously associated with language translation.
There's a whole lot of "thought to be"s here. Is there a proper study done on the relative effectiveness of encoder-only vs decoder-only vs encoder-decoder for various tasks?
'Formal Algorithms for Transformers'[1] is a proper account of the architectures and what tasks they naturally lend themselves to, by authors from DeepMind. See sections 3 (Transformers and Typical Tasks) and 6 (Transformer Architectures).
[1] https://arxiv.org/abs/2207.09238
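The conventional pairing of architecture and task is easy to see in Hugging Face terms - the checkpoints below are just well-known public examples of each family, not the subject of any study:

    # The conventional architecture-to-task pairing, via Hugging Face pipelines.
    from transformers import pipeline

    # Encoder-only (BERT-style): understand a whole sequence, e.g. classification.
    classify = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

    # Decoder-only (GPT-style): continue/complete text.
    generate = pipeline("text-generation", model="gpt2")

    # Encoder-decoder (T5-style): map one sequence to another, e.g. translation.
    translate = pipeline("translation_en_to_fr", model="t5-small")

    print(classify("This paper is surprisingly readable."))
    print(generate("Transformers can be categorized into", max_new_tokens=20)[0]["generated_text"])
    print(translate("The model translates this sentence.")[0]["translation_text"])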
As a matter of fact, there are even more developers making a hard left into AI who have never touched crypto.
The interesting follow up question is: what will they actually spend time on? Training new models? Copy pasting front ends on ChatGPT? Fine tuning models?
I think many of them will be scared by how much of a hard science ML is vs just spinning up old CRUD apps
> Training new models? Copy pasting front ends on ChatGPT? Fine tuning models?
The Stable Diffusion community is probably 2 years more mature than the GPT one; there we see GUI tools of a kind (in Colab notebooks) to abstract away from code, and then lots of fine tuning.
There is value in applying old techniques to new problems. Training a model to, I don't know, recognize snake species might help save snake bite victims lives.
(This is an example I came up with in 5 seconds, please don't take it seriously)
But there's also the whole "sell the shovel" aspect; it can be hard to train models. It can be hard to interpret the quality of the results. How do I know version 2 of the model is better than version 1? How do I even get labeled photos of snakes and not-snakes?
I suspect solving some of those problems is where some of the real gold is buried.
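For the "is v2 better than v1" question specifically, the unglamorous baseline is a fixed, labeled holdout set - predict_v1/predict_v2 and the data here are placeholders, not a real evaluation harness:

    # Score both model versions on the same fixed, labeled holdout set.
    # predict_v1/predict_v2 and the holdout data are placeholders.
    def accuracy(predict, holdout):
        correct = sum(1 for image, label in holdout if predict(image) == label)
        return correct / len(holdout)

    # holdout = [(image, "venomous"), (image, "harmless"), ...]  # never trained on
    # print("v1:", accuracy(predict_v1, holdout))
    # print("v2:", accuracy(predict_v2, holdout))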
I'd imagine something like openBB / BB Terminal with consolidated API access for financial reporting, a platform for insider communities ("chat, forums, and an app!"), etc. Make it a club, and it'll sell itself.
Since investment has been demoed successfully with off the shelf models, I don't think we're waiting on big advancements to be able to build a product. The bar for something like this, short term, is 1) be cool and 2) lose less money than traditional investing, sometimes.
Can you give any examples of those types of problems you've encountered?
https://news.microsoft.com/reinventing-productivity/
It's going to be an absolute "killer app".
hn today: "where are we on the gartner hype cycle on this one?"
The big thing holding them back is legal/copyright concerns, but I expect that will be worked out eventually.
Surely the “killer app” is ChatGPT itself?
ChatGPT itself is a killer app.
That is an intellectual exercise - it requires understanding - and I have not yet seen an LLM implementation that does it properly. If you know one...
What I have seen are outputs that can give the illusion of a properly done job, if the user is willingly (or not) blind to quality.
So: for non-intellectual translation we already had tools; if we get intellectually valid translation, then we will have much bigger opportunities than translation alone.
Still plenty around until at least 1930.
http://www.americanequestrian.com/pdf/US-Equine-Demographics...
(See flying cars and vertical farms as well.)
https://www.honeycomb.io/blog/improving-llms-production-obse...
It's a revolution, and it's here.
I stopped reading there. So ignorant it’s painful
What's the killer app? The Web? CompuServe? AOL chat? ICQ? Blue's News? WebChat Broadcasting? Real Audio? Broadcast.com? GeoCities? It's all ridiculous.
I mean, have you tried to search for anything on AltaVista, Excite or Lycos?!? They hardly work, all you get is page after page of garbage results!
And don't even get me started with eBay or CDNow. Nobody is going to shop online.
edit: I'm not in for a discussion.
?
Or is something on top?
For the fine-tuning I'm using LoRA, which freezes most of the layers and only optimizes a small set of adapter parameters. Using PEFT from Hugging Face.
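Roughly what that setup looks like with PEFT - the checkpoint name and target modules below are illustrative, not a recipe:

    # Rough shape of a LoRA setup with Hugging Face PEFT: the base weights stay
    # frozen and only small low-rank adapter matrices get trained.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, TaskType

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint

    lora = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                 # rank of the adapter matrices
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
    )

    model = get_peft_model(base, lora)
    model.print_trainable_parameters()   # typically well under 1% of the base model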
Do we care?
Given that there's already definitely real money involved here, I wonder what's holding up the custom AI ASICs?
That’s a little depressing.
Not much on empirical observations, though.
This reminded me a lot of what economist Murray Rothbard said about preferences in his treatise Man, Economy, and State.
There are likely other insights hidden in these philosophical works on human choices.
ELI5 - I'd like to know more about this, as I have no experience with this line of thought as of yet.
On the professional side, Adobe has plugged these tools into its products. https://www.adobe.com/sensei/generative-ai/firefly.html