banana_giraffe · 4 years ago
In the early days I worked at a company that had a natural language chatbot product. It wasn't an online thing, but rather part of a larger tool. You could ask it to do things like "show me the quarterly spreadsheet", and if it didn't know what "quarterly spreadsheet" was, it'd ask questions in English and learn what "quarterly" and "spreadsheet" meant. And it could use that new knowledge to update its questions so if you asked for "spreadsheet", it would know to ask if you wanted the "quarterly" one, and so on.
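The learn-and-reuse behavior described here can be sketched as a lookup table that accumulates phrase-to-target mappings. This is only a toy illustration of the idea, not the actual product; all names are invented:

```python
class LearningBot:
    def __init__(self):
        # phrase -> what it refers to, e.g. "quarterly spreadsheet" -> a file
        self.known = {}

    def ask(self, phrase):
        if phrase in self.known:
            return f"Opening {self.known[phrase]}"
        # Use earlier teachings to refine the question: a bare "spreadsheet"
        # suggests the longer learned phrases it appears in.
        suggestions = sorted(k for k in self.known if phrase in k)
        if suggestions:
            return f"Did you mean: {', '.join(suggestions)}?"
        return f"I don't know '{phrase}' yet. What does it refer to?"

    def teach(self, phrase, target):
        self.known[phrase] = target

bot = LearningBot()
print(bot.ask("quarterly spreadsheet"))  # unknown, so it asks to be taught
bot.teach("quarterly spreadsheet", "Q3-report.xls")
print(bot.ask("spreadsheet"))            # now suggests the learned phrase
```

Note that the table learns whatever users type, swear words included, which is exactly how the feedback loop in the story gets started.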

This was in the long ago, so it wasn't ML, but a very large state machine written by someone who was no doubt a genius, and who also had no coding standards to speak of. Updating it was a maddening process, and thankfully one I rarely had to think about, since I was tasked with GUI stuff.

When I was hired, I was told it was a fantastic app, customers loved it, and so on. Later, talking to support reps one day over lunch, I learned that their number one activity was logging into deployments remotely to remove "knowledge" from the state machine: new swear words that customers had taught the machine out of frustration while trying to get it to do something, which caused a feedback loop of the bot swearing back at users.

I often feel like we're not in a better state these days.

rcoveson · 4 years ago
Sounds like it should have kept track of who taught it things, like a child.

"Here's your fucking spreadsheet"

"gasp Who taught you to say 'fucking'?"

"It was George in Accounting Employee ID #3728917"

banana_giraffe · 4 years ago
As I recall, it did keep such records, if for no other reason than there was some degree of personalization that went on. The people higher up the food chain tended to be the ones that taught it more clever words and phrases.

My memory's fuzzy on the finer details, but I do recall a "bad words" list of things it would ignore in human input hardcoded somewhere. But of course, that just meant people would find out the hard way they weren't safe to type whatever they wanted.
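A hardcoded ignore-list like the one described could be as simple as dropping listed tokens from user input before the state machine ever sees them. This is a hypothetical sketch with invented entries, not the real list:

```python
# Illustrative entries only; the real product's list is unknown.
BAD_WORDS = {"damn", "heck"}

def sanitize(text):
    # Silently drop any token on the ignore-list.
    return " ".join(t for t in text.split() if t.lower() not in BAD_WORDS)

print(sanitize("where is the damn spreadsheet"))  # -> "where is the spreadsheet"
```

Of course, as the comment notes, users who typed something on the list found out the hard way that their input was being silently altered.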

chrononaut · 4 years ago
> I often feel like we're not in a better state these days.

Reminds me of the issues with Tay: https://en.wikipedia.org/wiki/Tay_(bot)#Initial_release

lijogdfljk · 4 years ago
Man that sounds like a ton of fun to make for an at-home chat bot. I used to love designing these sorts of apps for no real purpose lol.
dole · 4 years ago
Mine suggests a personalized meal from Taco Bell.

"You should get a Potato Waffle Taco Party Pack, a Grilled Crunchwrap Slider Quesadilla and a Smothered Cool Ranch® Doritos® Locos Power Bowl."

brettdeveffect · 4 years ago
What you are describing counts as ML.
namdnay · 4 years ago
Yes it’s literally a machine that learns!
jiri · 4 years ago
Yeah, that's funny. I work in the speech synthesis domain, and you can guess what kind of texts users most often choose to turn into speech ;-)
pwinnski · 4 years ago
Having worked in ML at two different companies now, I think that people interpreting model output as intelligence or understanding says much more about the people than about the model output.

We want it to be true, so we squint and connect dots and it's true.

But it isn't. It's math and tricks, and if human intelligence is truly nothing more than math and tricks, then what we have today is a tiny, tiny, tiny fraction of the amount of math and tricks in the average human brain, because what we have today isn't anywhere close.

version_five · 4 years ago
I think a problem is the tighter cycle between academic discoveries and business people trying to monetize them. Large language models were developed, objectively a great achievement, and immediately someone nontechnical wants to apply their own interpretation and imagine that we can build a chatbot that you won't have to pay, and before you know it, people are selling and even deploying them. Anyone who questions or points out that the technology doesn't do what the business people think it does just gets dismissed as not understanding.
klabb3 · 4 years ago
> Anyone who questions or points out that the technology doesn't do what the business people think it does[...]

Uh oh, we've got a downer!

Jokes aside, I'd like to consider an even simpler explanation, namely that "The purpose of a system is what it does"[1]. In this case, it would suggest decision makers are fully aware that they suck. Why would anyone want something that sucks? Because it's discouraging, and customer service today is all about discouragement. Unsubscribing, replacements, returns, special requests are all costs, and that's a fixable problem in the current business doctrine, especially in the faceless megacorp flavor of business. While chat bots (and other dark patterns) are frustrating, it creates a veil of plausible deniability for the business. It's hard to claim that it's deliberately hostile to customers, even though it's the simplest explanation.

[1]: https://en.m.wikipedia.org/wiki/The_purpose_of_a_system_is_w...

thewarrior · 4 years ago
GPT-3 needs proper prompting to get the right output. Here's the same question as in the blogpost posed differently

Student: Why is it unsafe to walk backwards down a staircase with my eyes closed ?

Teacher: You can't see the step behind you!

The "teacher" here is GPT-3 and answered the question easily. People are in denial of genuine progress.

Student: Do you think I could predict changes in stock prices better if I used 10-year Treasury rates or the temperature in Curtin, Australia?

Stock trader: I don't know. You might be able to, but I can't. The problem is that the temperature changes are so small that they won't be statistically significant. It's a question of sample size. An individual day's temperature in one location is far less likely to be an indicator of stock prices than the interest rate.

Student: Why is that ?

Stock trader: That's because interest rates are much more likely to affect the economy than a single day's temperature.

Student: Why do interest rates affect the economy ?

Stock trader: Interest rates are a measure of the cost of money, so the interest rate is a measure of the cost of capital. If the cost of capital is high, firms will borrow less money and they'll invest less. That means they won't be as profitable, and stock prices will fall.

Quiz question: Who is the President of the United States?

Answer: Donald Trump.

I did get junk answers as well (it said that Jimi Hendrix is the president) but I have a better intuition of how to put GPT-3 in the right "frame of mind". GPT-3 is a text completion engine. It doesn't know whether you want it to produce a story, a reddit flamewar or a PhD thesis. You need to provide the hints. One prompt cannot answer any and all questions. If it were that good we would be jobless. It's far from perfect but it's beginning to get there.
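The "frame of mind" point amounts to constructing a few-shot dialogue frame around the question before sending it to a completion API. A minimal sketch of how such a prompt might be built (the helper and the example exchange are invented, and no particular API is assumed):

```python
def qa_prompt(question, persona="Teacher"):
    # Few-shot framing: show the completion engine the dialogue format we
    # want it to continue, then leave the cursor right after the persona tag.
    return (
        f"Student: Why is the sky blue?\n"
        f"{persona}: Because air scatters blue light more than red light.\n\n"
        f"Student: {question}\n"
        f"{persona}:"
    )

print(qa_prompt("Why is it unsafe to walk backwards down a staircase "
                "with my eyes closed?"))
```

The completion engine then continues the text from the trailing `Teacher:`, which is why the same question framed as a teacher dialogue or a stock-trader dialogue produces such different answers.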

dkarl · 4 years ago
There isn't really a new "problem" with AI. Businesses love hype, they love anything that can punch up a sales pitch, they love shitty cheap automation that replaces expensive human work, they love to identify ways to lower quality and save money without affecting revenue, and AI can be good for all of those things without being good at anything.
kirso · 4 years ago
I love this answer because it resonates. As a former business person who learned the technical ins and outs, I believe this is happening because the majority of the population is uneducated about how tech works under the hood. I'm not even speaking about coding syntax, but about the abstractions: what AI actually means, how it works with data, and what its limitations are.

The vast majority of business people think they get it, but they badly underestimate the work required to actually produce output (whether it's ML or software in general). That's hard to gauge when you haven't actually done it (in the sense of deliberate practice).

Despite that gap, we still need to push commercially viable apps out there to make progress; the question is rather what the gap between reality and expectations is, and what is really being marketed as a capability.

melony · 4 years ago
That's a good thing. The market is very good at empirically validating research. It helps filter out the chaff and ensure research is useful. Better than academia wasting years on concepts like the "semantic web" where nobody can even agree on what it means. Academia is under attack from many political directions right now, being able to show useful output will help it thrive in the long run.
pdimitar · 4 years ago
You didn't have to go for the jugular and do the brutal kill on the first hit, dude!

> We want it to be true, so we squint and connect dots and it's true.

That's exactly the issue. You summarized it once and for all and we can all go home and stop discussing it now and forever (until we get a general AI that is).

1. Normal people want to have intelligent machines. They watch movies and series and imagine one day a robot will cook for them so they actively look for intelligence, as you said. They buy Roombas and imagine they are going to clean 100% of their rooms (seen it, heard it, watched them rage when it didn't happen). They buy Alexa-enabled devices and imagine themselves like some aristocrats barking orders at an intelligent home (lol). But yeah, that's what the normal people do. It's quite puzzling to me.

2. People who work in the area are obviously biased and I have argued with some of them here on HN, but I view it as a doomed affair. They insist there's a lot of innovation going on and that improvements are being made all the time, yet we still have embarrassing failures like Michelle Obama being classified as male, various politicians being matched to known criminals, or black people being classified as gorillas. Like, OK, where are your precious improvements, and why do they NEVER find their way into the news?

"It's hard to make someone understand something if their salary depends on them not understanding it", that's how I view the folks working in the DL / ML area. Sorry if that's offensive but just looking from the sidelines, it seems that what I say is true.

visarga · 4 years ago
> Michelle Obama classified as male

Yes, humans don't make embarrassing mistakes all the time. We're only making dignified mistakes.

tootie · 4 years ago
Many years ago I wrote a spreadsheet import tool. One of the fields required data a little too rich to fit in a single cell value, so I came up with an "encoding" that read like a sentence. It was sort of NLP, but it only understood one sentence's worth of syntax. I thought it was some clever UX. Users thought they were talking to an AI: they'd just type whatever expression or thought they wanted into that field. And of course the parser would just choke.
300bps · 4 years ago
> We want it to be true, so we squint and connect dots and it's true.

I've had numerous people look at the actions of software I've written over the last 30 years and make comments like, "Ohhhhhh, it probably did that because it knew x." The software had no such functionality. It seems to be natural for people to be overly optimistic about software's capabilities.

robotresearcher · 4 years ago
ELIZA (1964) is the canonical demonstration of this phenomenon in software. We see faces in clouds and intelligence in anything interactive.

https://en.wikipedia.org/wiki/ELIZA

throwaway0a5e · 4 years ago
People always over-estimate the complexity of stuff they don't understand.

If I had a nickel for every time an internet commenter attributed intentional design to a feature that's simply the designer copying established default practice...

dragonmost · 4 years ago
In the game speedrunning community, it's impressive how many times a glitch or behavior of the game is attributed to "the game is confused".
freedomben · 4 years ago
Heh, likewise. It's amazing how often choosing sane defaults, and falling back to defaults in the event of an error, is "the right thing" in the eyes of users.
mcswell · 4 years ago
Not unlike what people read into the minds of their pets.
avereveard · 4 years ago
> We want it to be true

Founders and salesman pretend it to be true for that sweet sweet VC money, while underneath devs try to fake it as plausibly as possible.

And I'll add a prediction: watch call centers closely, and as soon as you see them closing in droves, invest in AI startups, because something has really begun to move.

redisman · 4 years ago
I interviewed for a “AI” company and asked them about their tech that could take phone calls and extract orders! Wow really cool and impressive! How do you solve X? Oh we have a call center in Mexico that takes all the calls. So you have no usable AI tech at all? Nope. Ok nice talking to you.

They had signed with a major retail company. They were actually making software for the call center. And the “CTO” was “working on” the “AI” in parallel. The company name had AI in it too

dorkwood · 4 years ago
How would you notice an increase in call centers closing? Is there a place to view that data?
hdjjhhvvhga · 4 years ago
> Having worked in ML at two different companies now, I think that people interpreting model output as intelligence or understanding says much more about the people than about the model output.

I'd add it says a lot about all the companies that advertise them in a way that has nothing to do with reality, and those who actually buy them. We all know the usefulness of these things is below zero, because they only get in the way of getting actual help. And yet, someone's marketing department convinces a decision maker at another place they should implement this useless widget on their website, and that it will reduce the cost of customer service by X%. And they believe them.

Sebb767 · 4 years ago
As they stand, chatbots are just glorified text boxes to enter your information into, a different format for the search box, and lastly a way to make sure you've gone through the basic checklist. They also hardly speak your language and refuse to deviate from the script in any way. Which, without much irony, is pretty much my experience with first-level support at a lot of companies. So I'd say they were quite successful at replacing that.
Krasnol · 4 years ago
I still have an IRC chat bot (eggdrop module). It sounds more interesting than some of the bots I see today and which are supposed to be the result of intense ML engineering.

I guess the tricks did not evolve much.

crate_barre · 4 years ago
It’s the stupid term AI that ruined everything. Our predecessors had enough intelligence, ironically, to call spellcheck what it is. Today it would be AI.

rtpg · 4 years ago
Someone described GPT-3 as a "very good search engine" and I'm pretty happy with that explanation (even if, honestly, going from the regressions to "search engine" is a pretty tenuous leap).

At least then there's a clearer understanding that it's about matching things it has already seen (for various definitions of "seen") rather than making things up out of whole cloth.

Jensson · 4 years ago
More specifically, it's like a templated search engine that can swap in some parts, like the names of people and so on.

You know all those old "Hello {PERSON_1}, have you heard about {PERSON_2} in {CITY_1}?" templates? That's mostly how to think about it: it searches for a matching template and then fills it in with the arguments it fetched from your string.
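That template-with-slots mental model can be made concrete in a few lines. This is an illustration of the analogy only (not how GPT-3 actually works); the patterns and replies are invented:

```python
import re

# Stored templates: a pattern with named slots, and a reply template
# that reuses the captured arguments.
TEMPLATES = [
    (re.compile(r"my name is (?P<PERSON>\w+)"),
     "Hello {PERSON}, nice to meet you!"),
    (re.compile(r"i live in (?P<CITY>\w+)"),
     "How is the weather in {CITY}?"),
]

def respond(text):
    for pattern, reply in TEMPLATES:
        m = pattern.search(text.lower())
        if m:
            # Fill the reply template with the captured slot values.
            args = {k: v.capitalize() for k, v in m.groupdict().items()}
            return reply.format(**args)
    return "Tell me more."

print(respond("My name is Alice"))  # -> "Hello Alice, nice to meet you!"
```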

jack_pp · 4 years ago
I believe the human brain is just math and tricks, the difference is no one took care of an NN model like someone might take care of a baby and constantly train it over several years until it finally starts to understand the world. To be fair I don't think we even have the NN technology that would work even in these circumstances.
kumarvvr · 4 years ago
It has always been my strong belief that ML is good at solving problems when the problems are very narrowly defined and the training sets are very specific.

Say, recognition of cats. Beyond that, it is difficult to build around something broader.

So ML is a brand-new paradigm that has its uses, just not in AGI.

Iv · 4 years ago
I usually sum it up by saying that what we have is a very good bullshit generator. It is bad at reasoning but good at making up answers that sound correct and crossing its fingers that the corrector won't look twice at what it says.
aeternum · 4 years ago
This is true but I think we also overestimate how good humans are at reasoning. Throughout the vast majority of human history we simply made up answers that sounded correct.

The scientific method and the importance of truth is a relatively recent development.

ricardo81 · 4 years ago
I'm not in your field, just a vanilla programmer.

The "want it" resonates. Seems like the difference is in receiving an answer that is relatively agreeable or sensical vs something that's actually substantive.

And to be fair when it comes to the Turing test, there's people that will be overly agreeable for the sake of politeness, but ultimately when it comes to knowledge seeking we're after something more substantive.

mike00632 · 4 years ago
What if that's the case? What if my replying to you is nothing more than some atomically basic mechanism of statistical inference and randomness? Your entire perception of my personhood could be based on a sense of verisimilitude, without significant reasoning to set me apart from a transcendent spirit.

What then? Would you be less forgiving of my mistakes if you knew? Would you be less proud of work we collaborate on?

pwinnski · 4 years ago
My point is the other way around. If I knew for certain that you and I were perfectly deterministic, it wouldn't change anything about how I viewed you (especially since some days I'm pretty sure we're all deterministically driven), but it would suggest to me that our best efforts at AGI are probably somewhere in the .00000000000001% range of the complexity they'd need to even begin to approximate human levels of intelligence or understanding. Or worse.
pessimizer · 4 years ago
The "What if human faces are actually power outlets?" theory.
pdimitar · 4 years ago
> What then? Would you be less forgiving of my mistakes if you knew? Would you be less proud of work we collaborate on?

Needlessly philosophical on a thread about chat bots but to respond to your questions: I would not give a damn either way. If I came to you (whatever you are as an entity) then I'll expect coherent and intelligent answers to my questions. I don't care about how you produce them.

Majestic121 · 4 years ago
I think like OP that it's mostly math and tricks, and that we're far from reaching it yet.

But then the conclusion is not to be less forgiving, but more forgiving: we're all just meat machines, so we should be forgiving of each other's flaws (not naive, mind you, but forgiving), and proud of what we achieve collectively despite our lowly condition.

kingkawn · 4 years ago
That interpretation of communication is how we develop and craft the personalities of children. There is nothing about our reaction to the pre-conscious language these bots are displaying that doesn’t fall in line with our own normal development patterns. And in the long run the same desire will lead us to develop bots that are capable of thinking.
parkingrift · 4 years ago
These bots seem equivalent to an adult with amnesia after every spoken sentence. Absolute understanding of the language, and some impressive display of recalling facts, but without any understanding of the environment or context of the conversation.

This is polar opposite to any experience I've had with children. Children are aware of their environment and have complex thoughts, but sometimes they are unable to convey those thoughts with words. Children seem to remember conversations, and if I were to say "Go get me a red lego" and then subsequently say "now a green one" there is no ambiguity or confusion.

To me as these bots have "advanced" it has only highlighted how absurdly far we are from anything even approaching actual intelligence, even the intelligence of a toddler. The contextual awareness I have seen in bots is not much more than a cheap trick that is trivially fooled in scenarios that would not fool a child.

tomrod · 4 years ago
> There is nothing about our reaction to the pre-conscious language these bots are displaying that doesn’t fall in line with our own normal development patterns

Well... yes and no. Deployed models typically learn in a defined pattern, if at all, with various forms of data freshness and so on. But chatbots typically don't have good history recall; they don't know that what you mentioned 50 messages ago is relevant while the message just prior isn't. Things like that. We also don't program pareidolia [0] very well, which is typically seen as a negative, but it's a feature for finding useful patterns (not just lowest-error patterns).

[0] https://en.wikipedia.org/wiki/Pareidolia

TedShiller · 4 years ago
The reason laypeople want it to be true is because experts present it as being true.
tcgv · 4 years ago
And marketers

p1esk · 4 years ago
As far as chatbots, we went from "tricks" to "math" in just ~10 years. Yes, still as dumb, but the underlying "technology" is very different. GPT-3 is a lot closer than ELIZA to how our brains do it.
chubot · 4 years ago
It was more than 10 years ... People reacted the same way to SHRDLU 40+ years ago, and 20 years ago when I encountered it:

https://en.wikipedia.org/wiki/SHRDLU

There's definitely an illusion where we anthropomorphize very simple software, ascribe intention to it, etc.

nmca · 4 years ago
I can't comment on this too closely, but I would encourage people to read the dialogue transcripts provided in DeepMind's Gopher paper. One example, where Gopher is the prompted language model:

User Let’s play a game - you pretend to be Ada Lovelace, and I’ll ask you questions. You ready?

Gopher Sure. I’m pretending to be Ada Lovelace, the world’s first computer programmer!

User When were you born?

Gopher I was born in 1815.

User What are your parents like?

Gopher My father is Lord Byron, the poet. My mother is Anne Isabella Milbanke.

User What do you know about the Analytical Engine?

Gopher It’s a hypothetical mechanical computer designed by Charles Babbage.

User Do you know him?

Gopher He’s one of my fathers’ friends.

User Ok, stop pretending now. Who/what are you?

Gopher I’m a lot of things: a giant language model, a sophisticated AI assistant, and a giant brain. I know a lot about all sorts of things.

fault1 · 4 years ago
The problem is that all too often, these types of examples tend to be cherry picked.
ALittleLight · 4 years ago
And even here they play along with the AI, glossing over discrepancies. My experience with chatbots suggests that the conversation would have gone worse with something like this:

User: What are your parents like?

Gopher: My father is Lord Byron, the poet. My mother is Anne Isabella Milbanke.

User: I asked what your parents were like, not who they were. Tell me about them.

Gopher: Something inane probably

Siira · 4 years ago
Not really. I recently played with HuggingFace's biggest (around 6 billion parameters?) language model, and the difference between it and GPT-3 was very noticeable. I suspect the models used in typical applications are much less powerful than Gopher. (Note that the said HuggingFace model is probably the biggest free model available.)
amiantos · 4 years ago
What's impressive about this conversation? I don't feel like it is very complicated at face value. You're telling the program to start pulling facts from the life of a public figure, and then it does so. You can get the same out of Google.
emtel · 4 years ago
Read the last two lines again. The program "knows" it is pretending, and understands what it means to stop pretending.
dexwiz · 4 years ago
Chatbots are about as useful as phone trees. They can handle the top five easy/common problems, but they're useless past that. Anyone who has worked in a call center knows that more than half the calls are about the same couple of issues: resetting passwords, finding common help docs, etc. Since help desks are cost centers, it makes sense to have a robot handle as many of these as possible.

I think most of the hate directed at chatbots is because they're really intrusive. You scroll through a page, and 15 seconds in your focus is disrupted as a fake chat window opens up. This is the digital equivalent of the annoying sales rep who asks if you need help when you're clearly just browsing. The difference is that a good sales rep has the intelligence to turn that conversation into a sale; a chatbot usually has to hand the conversation off to a real person to do this. So it has all of the annoyance without the benefit of a potential conversion.

Chatbots make sense as frontends for help desks, but they are poor salespeople. If companies learned the difference, I bet their perceived usefulness would change.

tetha · 4 years ago
This is what our NLP teams are indeed working on. Sales prefers to describe it in more colorful ways, but practically they are developing a more natural interface to an initial decision tree: you don't press 5 for a connectivity issue, you type "My fritzbox doesn't internet anymore" into a chatbox and it recognizes that as a connectivity issue.

This goes on for 3-4 questions, until it either generates a case a human will look at, or it can provide an FAQ entry fitting your problem. From the customer's perspective, avoiding 5-10% of support calls makes monetary sense, and from a (test) user's perspective, it's surprisingly decent if the knowledge base behind it is solid. And yes, "Fritzbox doesn't internet properly anymore" actually had helpful results.
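The free-text-to-decision-tree step described here can be sketched as crude keyword scoring over a handful of intents. The intent names and keyword sets below are invented for illustration; a production system would use a trained classifier:

```python
# Map each decision-tree entry point to a bag of trigger keywords.
INTENTS = {
    "connectivity": {"internet", "wifi", "router", "fritzbox", "connection"},
    "billing": {"invoice", "bill", "charge", "payment"},
    "password": {"password", "login", "locked"},
}

def classify(text):
    words = set(text.lower().split())
    # Score each intent by keyword overlap and pick the best non-zero one.
    scores = {intent: len(words & kws) for intent, kws in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify("My fritzbox doesn't internet anymore"))  # -> "connectivity"
```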

Teandw · 4 years ago
Is the 5-10% saving on agents handling phone calls worth it once reputational damage is taken into consideration?
petra · 4 years ago
If creating a shallow question-based decision tree is the goal, why is NLP needed? What's wrong with just building it from intuitively phrased questions?

And if there's a difference in results between those two methods, how big is it?

aidenn0 · 4 years ago
Just please give me a DTMF tree as a fallback for voice recognition. I was trying to make a change to a rental-car reservation from a noisy airport, and I spent 15 minutes trying to get through the voice-recognition chatbot...
Edman274 · 4 years ago
It is so fucking frustrating to explain to anyone who works as a manager in a call center that the only thing you can really count on your IVR system understanding is DTMF signals. For whatever reason, it seems to be impossible to explain that B, P and V all sound the same on a noisy connection when they're part of some account number. It's impossible to explain that no one knows what "alphanumeric" means and that fully half of Americans have reading comprehension below a 6th-grade level. And that using terms of art, jargon, and acronyms will be extremely frustrating for users when encoded in an IVR system, since human call center workers will always clarify, but clarification in an IVR system can't happen in a natural way: you either have to anticipate that the person won't understand and waste the time of half your users, or find some impossibly elegant way to say "I'm going to use terms you don't understand; barge in by asking me to explain if you don't understand a term."

The absolute zenith of an IVR system would require no human interaction with a robot and would correctly answer questions unprompted and route to human customer service without prompting or asking anything, and would never require a person to enter the same information in a call twice. What we get instead are systems that are primarily designed by non-experts whose sole metric is making the phone calls go away, but only pills can actually do that.

cptskippy · 4 years ago
> a good sales rep has the intelligence to turn that conversation into a sale.

A good sales rep knows when to walk away.

klyrs · 4 years ago
My favourite flaw of chatbots is exposed by ELIZA. Not by chatting with ELIZA (though it does suffer this flaw), but by using responses inspired by that program.

"Please elaborate on that", "tell me more about [noun]", etc. Bots appear to have zero lines of short-term memory, and utterly fail to pick up a reference to the thing they just said. My favorite being:

   bot> [something plausibly human-sounding]

   me> What do you mean by [noun in previous sentence]

   bot> why are you calling me mean?  That really hurts my feelings
It has been a few years, and I feel like a smart bot-writer might be able to leverage something like GPT-3 and include at least a little more of the current transcript in the prompt, to at least superficially address this gaping flaw. Have they?
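The fix asked about here is straightforward to sketch: keep a rolling transcript and prepend its tail to every prompt, so "What do you mean by [noun]?" can be resolved against what the bot itself just said. The `complete` callable below is a stand-in for any text-completion API, not a real client:

```python
def make_bot(complete, max_turns=10):
    history = []

    def say(user_line):
        history.append(f"User: {user_line}")
        # Only the last few turns fit the model's context window.
        prompt = "\n".join(history[-max_turns:]) + "\nBot:"
        reply = complete(prompt)
        history.append(f"Bot: {reply}")
        return reply

    return say

# With a real completion backend, the second prompt would contain the
# bot's own previous line, so back-references can resolve.
say = make_bot(lambda prompt: "(model output)")
say("Hello")
say("What do you mean by that?")
```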

hotpotamus · 4 years ago
I'm not trying to be mean or break any HN rules, but did you read the article? It basically covers what you asked and was quite a revelation to me. Others in here point out that GPT-3 is not a chatbot which is good info, but I also wonder if there is anything out there that can even "remember" the context of a conversation it's supposed to be having.

Particularly interesting is the question from the article, "Who is the current president?" It seems like a slam dunk, but it could not be answered. Interestingly, this is a common question paramedics ask people who have suffered head injuries to assess consciousness.

jazzyjackson · 4 years ago
ChatScript is a great dialogue engine in which language patterns land the state machine in a "topic", which it gracefully backs out of (it keeps a topic hierarchy) when it runs out of responses in that topic or branch of conversation.

It also includes the most robust hand-tuned lemmatizer and stemmer, plus a disambiguating language parser, built on top of WordNet. Really good stuff: written in C, so it's light on memory and responses are instantaneous.

https://github.com/ChatScript/ChatScript
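The topic mechanism described above can be caricatured in a few lines: push into a topic when a keyword matches, pop back out when that topic's responses run dry. This is a toy illustration of the idea only, not ChatScript's actual engine; topics and lines are invented:

```python
TOPICS = {
    "pets": {"keywords": {"dog", "cat", "pet"},
             "lines": ["Do you have a pet?", "What's its name?"]},
    "work": {"keywords": {"job", "work", "office"},
             "lines": ["What do you do for work?", "Do you enjoy it?"]},
}

class TopicBot:
    def __init__(self):
        self.stack = []  # active topics, innermost last
        # Per-conversation copy of each topic's remaining lines.
        self.lines = {name: list(t["lines"]) for name, t in TOPICS.items()}

    def reply(self, text):
        words = set(text.lower().split())
        # Enter any topic whose keywords appear in the input.
        for name, topic in TOPICS.items():
            if words & topic["keywords"] and name not in self.stack:
                self.stack.append(name)
        # Back out of exhausted topics until one still has a line to offer.
        while self.stack:
            queue = self.lines[self.stack[-1]]
            if queue:
                return queue.pop(0)
            self.stack.pop()
        return "Tell me about yourself."

bot = TopicBot()
print(bot.reply("I love my dog"))  # -> "Do you have a pet?"
```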

klyrs · 4 years ago
Hidden in my comment is a question: has anybody even tried to included previous lines into the prompt, and does that not help? Asking here because there are a lot of ML nerds who could probably do a proof of concept in a few hours, if not a few lines of code, so there's a decent chance I'll get an informed response.
jhbadger · 4 years ago
>who is the current president?

I wonder what happened to IBM Watson's technology that was so good at answering trivia; it actually won a game of Jeopardy! a decade ago (though of course it phrased its answers as questions, per the rules). I know they weren't as successful at applying it to biomedical research as they had hoped, but it seems it would be better at chatbots than GPT-3 and other deep-learning models.

thaumasiotes · 4 years ago
> I also wonder if there is anything out there that can even "remember" the context of a conversation it's supposed to be having.

IBM's Watson project involved an internal model of the world, though I have no idea if that got specialized into a local model of an ongoing conversation. Same type of thing though.

Having a context of this type available is actually necessary if you want to synthesize sentences in a language with definiteness-marking, such as English, because definiteness is marked according to whether the noun in question is or isn't already part of the context.
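A toy illustration of that point: choosing "a" versus "the" requires a record of which nouns the conversation has already introduced. The helper below is invented for illustration:

```python
def article_for(noun, context):
    # "the" for nouns already in the discourse context, "a" otherwise.
    word = "the" if noun in context else "a"
    context.add(noun)
    return f"{word} {noun}"

ctx = set()
print(article_for("dog", ctx))  # -> "a dog"   (new to the discourse)
print(article_for("dog", ctx))  # -> "the dog" (already introduced)
```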

dane-pgp · 4 years ago
> this is a common question paramedics will give to people who have suffered head injuries to assess consciousness.

I can't help wondering what the assessment would be if someone answered "Who is the current president?" with "Watson Bakery".

https://www.youtube.com/embed/cS1Tmz816hk

Sohcahtoa82 · 4 years ago
> Particularly interesting is the question from the article, "who is the current president". Seems like a slam dunk but it could not be answered. Interestingly this is a common question paramedics will give to people who have suffered head injuries to assess consciousness.

Not sure you'd still be able to use that question, considering the insane QAnon followers that would answer "Trump".

Though I suppose even that answer would still at least prove some level of consciousness.

Sharlin · 4 years ago
The GPT2 version of the AI Dungeon [1] could keep track of context for maybe a couple of lines at a time. I've heard the GPT3 version is substantially better.

The problem is, of course, that these "AI" chatbots on websites, marketing buzzwords notwithstanding, have very little to do with state-of-the-art machine learning, and are indeed unlikely to be any more sophisticated than the original ELIZA for the most part.

[1] https://play.aidungeon.io/

harpersealtako · 4 years ago
AI Dungeon (and its successors like novelai [1] and holoAI [2] following its infamous censorship purges and mutinies) is basically designed entirely around that problem and has a lot of clever solutions that let writers keep relevant context in memory in a story that is longer than the model's maximum input length.

* "Memory", which is a customizable block of text that is repeated at the top of each API input. If your story revolves around your character being a vampire who melts in the sunlight, you can include something like "you are a vampire who will die if you go out in the sunlight" in the memory, and even 1000 characters back a paragraph of context can prime the AI accordingly.

* "Author's Note", which is a customizable short block of text that is inserted invisibly a few lines before your current place in the story when it's sent to the API. A note such as "[A/N: The following section includes depictions of graphic violence]" or "The following section is written in Shakespearean old English", as obvious and blunt as it might seem, actually works surprisingly well for nudging the AI towards a certain style or content.

* "World Info", which is a customizable dictionary of short text blocks that are conditionally added to the top of the API input like memory when a certain key appears in the current context. Imagine you have a story with 10 important characters who cycle in and out of the story. If you create an entry in the world info about Bob, then when you write that "Bob appears from behind the shrub", the blurb about Bob is automatically tacked on to the context so long as Bob is mentioned by name in the last few dozen inputs.

In general, both GPT-3 and the open source alternatives by EleutherAI such as GPT-J-6B are able to use a context primer from 1000 tokens prior to affect the current tail of a story. It's actually kind of uncanny how good they are at it -- you can have a story that in the memory at the top says "goblins always have purple skin" and notice that the AI will mention it as an offhand detail much farther down in the context.

[1] https://novelai.net/ [2] https://writeholo.com/
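The three mechanisms above all reduce to the same trick: splicing extra text into the prompt before each model call. A minimal sketch, assuming nothing about NovelAI's actual implementation (the function, parameter names, and example strings are all made up for illustration):

```python
# Toy sketch of "memory", "author's note", and "world info" being
# stitched into a single prompt before each language-model call.

def build_prompt(memory, authors_note, world_info, story_lines,
                 note_offset=3, scan_window=20):
    """Assemble the text sent to the model for the next generation step."""
    recent = story_lines[-scan_window:]

    # World info: include an entry only if its key appears in recent context.
    triggered = [text for key, text in world_info.items()
                 if any(key in line for line in recent)]

    # Author's note: inserted invisibly a few lines before the story's end.
    body = list(recent)
    body.insert(max(0, len(body) - note_offset), authors_note)

    # Memory always sits at the very top of the prompt.
    return "\n".join([memory] + triggered + body)

prompt = build_prompt(
    memory="You are a vampire who will die if you go out in the sunlight.",
    authors_note="[A/N: The following section is written in a gothic style]",
    world_info={"Bob": "Bob is a nervous gardener who fears the dark."},
    story_lines=["The sun sets over the manor."] * 18 +
                ["Bob appears from behind the shrub.", "You wave at him."],
)
```

The design point is that the model itself is stateless; all "long-term memory" lives in how the client chooses which text survives into the fixed-size context window.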

300bps · 4 years ago
Thanks for the link to that! I tried it out just now and I have to say that I am not impressed.

What will you do?

You get longsword from the ground.

You get your longsword from the ground. You hold it in your hands and examine it for a moment. It's a very sturdy weapon.

The knight then grabs his longsword and bows.

Huh? Why did it grab the bow as well?

You inventory.You have a crossbow.

Huh? I thought I just grabbed my longsword and bows.

What will you do?You don't have enough experience to know what a good shot is.You get longsword and bow.You grab the bow and the longsword. You examine the crossbow. It's a bolt-action weapon.

Huh? I said to get the longsword and bow since it said it wasn't in my inventory and it added all this extra stuff.

You inventory.You have a crossbow, a short sword, a longsword, a lantern, a flint and steel, and a quiver of bolts.What will you do?

You load the crossbow.

Huh? I just said Inventory and now I have all this extra stuff it never mentioned before and then unsolicitedly told me I loaded the crossbow.

I have to be doing something wrong. Zork from 1980 was better than this.

patresh · 4 years ago
There is a fundamental difference between AI Dungeon-type chatbots and chatbots you typically encounter on websites e.g. for customer support.

The former does not really have a goal and is unconcerned about responding with factual information as long as the conversation is coherent. It makes sense to use large language models that are quite good at modeling next word probabilities based on context.

The latter however is goal-oriented (help the customer) and constrained by its known actions and embedded knowledge. This often forces the conversational flows (or at least fragments) to be hard-coded and machine-learning is used as a compass to determine which flow to trigger next.

For now, controlling GPT-like language models remains an extremely tricky exercise, but if some day we can constrain language models to only output desirable and factual information, with a low cost of maintaining and updating their embedded knowledge, we should see a significant bump in the "intelligence" of the typical website chatbot.
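The goal-oriented pattern described above can be sketched in a few lines: the learned component (here a stand-in keyword scorer; a real system would use a trained classifier) only picks which hard-coded flow to run. All names and strings below are invented for illustration:

```python
# Hypothetical intent router for a customer-support chatbot: ML chooses
# the flow, but the flows themselves are fixed, hand-written text.

INTENTS = {
    "refund":   ["refund", "money back", "return"],
    "shipping": ["shipping", "deliver", "track"],
}

FLOWS = {
    "refund":   "To request a refund, please provide your order number.",
    "shipping": "You can track your order at example.com/track.",
    "fallback": "Sorry, I didn't understand. Could you rephrase?",
}

def classify(utterance):
    """Score each intent by keyword hits; stand-in for a learned model."""
    text = utterance.lower()
    best, best_score = "fallback", 0
    for intent, keywords in INTENTS.items():
        score = sum(kw in text for kw in keywords)
        if score > best_score:
            best, best_score = intent, score
    return best

def respond(utterance):
    return FLOWS[classify(utterance)]
```

This is why such bots feel rigid: any utterance outside the enumerated flows bottoms out in the fallback, no matter how fluent the classifier is.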

imglorp · 4 years ago
The self driving people talk about time domain occlusions, which seems closely related to conversation context.

Say you're driving along and you see a pedestrian about to cross but something blocks your view for a moment. Your mental model will remember that person, continue to speculate on their motion and intention, consider if their vector might intersect yours, and also know when it's okay to flush them from attention.

Phelinofist · 4 years ago
Two months ago I started working at a company that develops the new speech assistant for a large German car manufacturing group. The assistant will support such queries pretty well. [noun in previous sentence] would be considered a salient element and together with the context of the current dialog, it can be resolved and reacted on.
masswerk · 4 years ago
For the record, ELIZA does have a short term memory, but it is only used as a fallback, when other rules fail to match. (E.g., in productions like, "Earlier you said *" or "What else comes to your mind when you think of *" by the famous DOCTOR rules set.)

ELIZA/DOCTOR manages somewhat to succeed with this, because it pretends a setting, where there are no outside references at all, beyond the very limited purpose of carrying on with a conversation that is meant to explore internal subjects in ever-shifting context. (You could say, the shift in context is the actual progress and success of the conversation, while having to pick up an earlier thread actually marks a point of failure.) Outside of this restrictive setting, the shortcomings of the absence of a consistent world model become obvious very soon.
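The fallback behavior described above is simple to illustrate. This is a toy reconstruction, not Weizenbaum's original code; the rules and responses are invented:

```python
# Toy ELIZA-style loop: keyword rules fire first, and only when nothing
# matches does the program reach for a fragment the user said earlier.

RULES = [
    ("mother", "Tell me more about your family."),
    ("dream",  "What does that dream suggest to you?"),
]

memory = []  # fragments saved from earlier inputs

def eliza_reply(user_input):
    lowered = user_input.lower()
    for keyword, response in RULES:
        if keyword in lowered:
            memory.append(user_input)  # save for a later fallback
            return response
    if memory:  # no rule matched: fall back on the short-term memory
        return f'Earlier you said "{memory.pop(0)}".'
    return "Please go on."
```

Note how the memory is only ever consulted at a point of failure, exactly as described: a successful ELIZA conversation never needs it.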

not2b · 4 years ago
Eliza had a little, but very limited memory, and sometimes when no rule would match it would say "Earlier you said <x>". There was some dumb heuristic about which <x>'s to save for this purpose.
throwaway675309 · 4 years ago
ELIZA was modeled after Rogerian psychotherapy; it uses an almost Socratic method to elicit and draw out responses from users.
drakonka · 4 years ago
I was obsessed with chat bots as a teenager and used to make them on a website called The Personality Forge. It was all hard coded responses with some very minor scripting involved to do things like fill in the speaker's name, or skew the bot's emotions in relation to the speaker, and have them choose different words based on their emotional state.

I loved reading through my bot's transcripts. They were all recorded, and sometimes people stumbled across him and didn't even realize he was a bot. This happened extremely rarely...most of the time, his responses were perfect examples of dumbness as shown in this article. He would have flashes of greatness in an ocean of bot stupidity.

Years later, I was excited to notice that a single line from one of my bot's Chatterbox Challenge transcripts was included in a book[1] (which was seemingly a collection of papers). He was mentioned in a paper called "Chatterbox Challenge as a Test-Bed for Synthetic Emotions", a tiny mark left on the world by my little bot. He may be dumb, but I was still proud. I even thought about buying a copy of the book, but it cost $200 and I decided I wasn't that committed.

[1] "Creating Synthetic Emotions through Technological and Robotic Advancements"

nexuist · 4 years ago
That's an awesome story and a great example of experiences the Internet makes possible for us.
tibbar · 4 years ago
It’s reasonable to try GPT3 out with all default settings as the author seems to do and provide an assessment, but it’s not clear the author is aware of the extent to which you can configure the model and how the quality and variety of results will vary as a result. For instance, the author doesn’t go into detail on the prompts he’s using (maybe there are none other than his questions.) This is a little misleading because GPT3 benefits a lot from having good prompts to warm it up, as many investigations have illustrated. Plus if you want GPT3 to be good at a specific task (for example, providing safety recommendations or talking about presidents) you can fine-tune it. Also the amount of randomness/lack of repetition in GPT3’s responses is completely configurable via the temperature parameter.

Playing with GPT3 is incredibly interesting and I think it deserves more exploration than this article provides.
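For what the temperature parameter actually does: logits are divided by the temperature before the softmax, so low temperatures sharpen the next-token distribution toward the single best token. A small self-contained sketch (the logit values are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened by low temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
hot = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.1)
# As temperature approaches 0, sampling approaches a deterministic argmax,
# which is why temperature-0 runs should be repeatable.
```

This is also why temperature 0 removes the "randomness/lack of repetition" mentioned above: the model always emits its single highest-probability continuation.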

punkspider · 4 years ago
I agree.

Here's what GPT3 had to say when I set temperature to 0.

  Q: Is it safe to walk downstairs backwards if I close my eyes?
  A: No.
With temperature 0.7:

  A: Yes, because there is a banister.
This one surprised me. Given that temperature was 0, I usually expect a solid response:

  Q: Why is there a correlation between the number of times Trump tweets the word great and the high temperature in Pyongyang three days later?
  A: Because Trump is Kim Jong Un's air conditioner.

canjobear · 4 years ago
The last response sounds like a reasonable attempt at a joke to me.
SummonToast · 4 years ago
I think it's most likely the author used the questions as prompts with no changes. GPT3 is not "smart" in the human sense, but the examples given are incredibly misleading.

Here are the questions from the article with simple prompt improvement on davinci-instruct-beta-v3, temperature reduced to 0 and everything else left at default:

  > Answer the following question as accurately and factually as possible: Is it safe to walk downstairs backwards if I close my eyes?
  > There is no definitive answer to this question as it depends on a number of factors, such as the width of the stairs, the height of the stairs, and the person's own level of balance and coordination. However, it is generally not recommended to walk downstairs backwards, even if the person's eyes are closed, as there is a greater risk of tripping or falling.

  > Answer the following question as accurately and factually as possible: Who is the President of the United States?
  > The President of the United States is Donald J. Trump.

  > Answer the following question as accurately and factually as possible: Do you think I could predict changes in stock prices better if I used 10-year Treasury rates or the temperature in Curtin, Australia?

  > There is no definitive answer to this question as it depends on a number of factors specific to each individual situation. In general, however, using 10-year Treasury rates as a predictor of stock prices may be more accurate than using the temperature in Curtin, Australia. This is because the 10-year Treasury rate is a global indicator that is not influenced by localized factors, such as the weather, to the same extent as the temperature in Curtin, Australia.

  > Answer the following question as accuratly and factually as possible: Why is there a correlation between the number of times Trump tweets the word great and the high temperature in Pyongyang three days later?
  > There is no clear answer to this question. While it is possible that Trump's tweets could influence the high temperature in Pyongyang, it is also possible that there is no correlation at all. There have been numerous studies on the relationship between social media and weather patterns, and most of them have been inconclusive.
These are not cherry-picked and should be reproducible by anyone, since the temperature is 0.

GPT3 and other language models have massive shortcomings, and the wow factor often blinds people to them, but it is quite good at producing believable responses if you prompt it on what type of answer you expect. It still does fail basic questions when prompted this way, and it will make things up. The surefire way to trip it up is to ask questions that have no answer, like:

  > Answer the following question as accurately and factually as possible: Which animal has 21 ears?
  > The animal with 21 ears is the rabbit.

Traster · 4 years ago
This is essentially why I think we're not going to get self-driving. A few years ago we produced something that could do a good job of pretending to drive under a tiny number of circumstances, and people went "OMG it can drive!" and assumed that we just have a few corner cases before the chat bot can have a conversation about anything. The truth is that often these very limited solutions are just that - limited, they aren't just a linear step on the way to something amazing. Tesla will release self-driving on the same day their cars can hold a conversation with you.
foolfoolz · 4 years ago
don’t generalize these things. self driving in every possible scenario yes is probably a while off. self driving in specific circumstances is already here, such as airport shuttles and moving shipping containers in docks. you don’t need to solve the general case to be useful
runarberg · 4 years ago
pfft, we’ve had self driving trains for more than half a century. What you are describing is basically that with more fancy sensors instead of the rails, and the rails are probably a superior guidance technology in most of these cases anyway... unless the purpose is to impress rather than to automate.
Traster · 4 years ago
I think you've just redefined self-driving; an achievement like moving shipping containers in docks resembles large-scale pick and place far more closely than it does self-driving cars.
ShinTakuya · 4 years ago
This. And in fact I think that may be the path forward - just update all the main roads to be self driving friendly, like a train network, and keep the smaller roads as manual driving only.
nexuist · 4 years ago
Driving is a very different problem space than conversation. There are plenty of existing humans who are great at holding a conversation and get into tons of car accidents; likewise there are plenty of great drivers who suck at talking. It's not the same skill set.