> In one case, a judge imposed a fine on New York lawyers who submitted a legal brief with imaginary cases hallucinated by ChatGPT — an incident the lawyers maintained was a good-faith error.
They need to be disbarred. Submitting legal filings that contain errors because you used ChatGPT to make up crap is the opposite of a "good-faith" error.
In that NY case, they were only fined $5000 each, which seems like a slap on the wrist.
I think the court was sympathetic to them, as was I. They were the first lawyers to get burned by ChatGPT. How were they supposed to know that a highly publicized product from a major tech company would just make shit up?
But the penalty for lawyers who do it now, after the first case got so much publicity, should be more severe.
They should have taken a look at what they were submitting. That is literally their job.
This makes it even more important to strongly discourage such conduct. Unless (cynic me) considers all the $5K fines they'll collect.
I'm not sure if disbarring is appropriate here, but then again I'm no legal professional.
I don't know what the appropriate response would be to a lawyer lying to the court and making up facts. I don't think something as bad as making up lawsuits even happens in normal legal proceedings. I'd presume fines and other types of punishment, depending on if the lawyer is stupid enough to lie about using ChatGPT like in the American case.
The person who made the mistake of hiring this lawyer will probably have grounds to sue them for malpractice, especially if they end up losing this case. I know I'd want my money back if my lawyer didn't even bother to read the paperwork they were filing.
This lawyer will now have "lawyer lied to the court" show up the moment you Google their name. I think that, plus a hefty fine, is more than enough punishment. Whether or not their future clients will trust them after this is up to them.
A fine is appropriate. There's no reason to destroy someone's life because of this. There are numerous forms of disciplinary actions available that don't involve needlessly and permanently destroying a person's livelihood.
This would just mean they can't be a lawyer.
Does disbarment really destroy someone's life? It considerably reduces their career options and it means they'll probably end up much less wealthy than they would otherwise have been, but they can still find another job.
By contrast, if a lawyer makes a mistake that gets someone a criminal record they didn't deserve, that person has much more ground to say it destroyed their life.
If the user didn't understand that ChatGPT makes up crap sometimes, despite the warnings everywhere that they may not have read, it could still be a good-faith error to me. ChatGPT had only just been released.
If the user doesn't understand their tool then they should review what it spits out.
> I found a guy who claims to have the sum of all knowledge, and I've just copied and pasted code from him that I haven't reviewed.
Then either you're an idiot and should be disbarred, or you are negligent and should be disbarred. As a lawyer you have people's lives in your hands; this is not the time for "whoops, sorry, I just couldn't be arsed to read what I submitted".
v1.0: You should be disbarred for using our product and assuming the output would make any sense whatsoever.
Especially if they're a lawyer! Llmao
If the user is a lawyer who officially referenced a case that a) they never read and b) never existed, then the user can’t be allowed to practice law anymore.
The danger of "AI" is that we actually believe the plausible fabrications it produces are "intelligent". The other day, I debated a guy who thought that the utopian future was governments run by AI. He was convinced that the AI would always make the perfect, optimal decision in any circumstance. The scary thing to me is that LLMs are probably really good at fabricating the kind of brain dead lies that get corrupt politicians into power.
> The danger of "AI" is that we actually believe the plausible fabrications it produces are "intelligent".
100% this!
I'm fed up with seemingly rational people who just can't comprehend that the AI hype is just sales talk.
I had to convince a customer the other day we cannot write whole apps with ChatGPT. As far as I know, not a single example exists of a full app written by ChatGPT. It just can't be done, because ChatGPT is not reasoning!!! It is not intelligent.
It can just spit out seemingly coherent, but often incorrect, snippets of code.
I think it's best described as a word-calculator, or autocomplete on crack, in the sense that it is great at guessing what comes next. But it has no reasoning behind its predictions, only statistics.
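For what it's worth, the "autocomplete" view can be made concrete. Below is a minimal sketch of greedy next-token generation, assuming the Hugging Face transformers library with GPT-2 as a stand-in model (illustrative only, not how ChatGPT itself is built or served):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The court ruled that", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(20):
            logits = model(ids).logits           # a score for every token in the vocabulary
            next_id = logits[0, -1].argmax()     # greedily take the most probable next token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tok.decode(ids[0]))  # a fluent continuation, chosen purely from token statistics

Whether stacking enough of that amounts to something worth calling reasoning is exactly what the replies below disagree about.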
It cannot write a full app, but that is a matter of context size.
It absolutely can "reason" within bounds, attention is a much more powerful system than you realize.
Chain-of-Thought is a well-established technique for LLM prompting which can produce great results. And the snippets of code GPT-4 generates for me are usually on point, at least they were until OpenAI dumbed the model down in the last few months with GPT-4-Turbo.
I get it not only to write novel code but also to explain existing code all the time, and it reasons about my code very well.
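To make the Chain-of-Thought technique mentioned above concrete, here is a minimal sketch using the OpenAI Python client; the model name and prompt are illustrative, and asking for intermediate steps does not guarantee those steps are sound:

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    question = ("A brief cites six cases; two have been overturned and one is unpublished. "
                "How many remain safely citable?")

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Work through the problem step by step, then give the final answer on its own line."},
            {"role": "user", "content": question},
        ],
    )
    print(response.choices[0].message.content)

The step-by-step scaffolding tends to help on multi-step problems, but a model can also walk confidently through wrong steps.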
I have had some really frustrating conversations about this very topic with people insisting on using it for complex tasks like designing business strategies. It doesn't know how to actually answer your question, it's just spitting out probable words associated with the topic in a plausible sequence! It's like Quora with spellcheck and an aura of legitimacy which makes it infinitely more seductive.
I’ve actually used a mate's Replit-chained GPT-4 app to build short single-function apps from a single command that function on first run, so it's definitely possible, and it will only get better - but building code is just following a logical set of instructions down a pathway.
> I'm fed up with seemingly rational people who just can't comprehend that the AI hype is just sales talk.
The reason why you see rational people believe the AI hype is that AI is a lot better than you give it credit for. They are right.
> I had to convince a customer the other day we cannot write whole apps with ChatGPT. As far as I know, not a single example exists of a full app written by ChatGPT. It just can't be done
It can. When it first came out, I wanted a macOS menu bar app so that I could access ChatGPT conversations from the menu bar. I’d never written a macOS app before. I told it what I wanted, it wrote the whole thing in one go. There was one minor compile error (a type signature had changed from a previous version, if I remember correctly), which was a one-line fix. I iterated a couple of times, telling ChatGPT what improvements I wanted to make to the app. It did them.
Would I use it to build a complex app? No. But it is capable of building a whole app.
You cannot claim that this tool is not useful to people and use cases you are entirely unfamiliar with.
Debates of what is real intelligence are akin to running around in circles.
"""That's one of those irregular verbs, isn't it? I give confidential security briefings. You leak. He has been charged under section 2a of the Official Secrets Act.""" - Yes Minister.
To rephrase for the subject: We're only human, mistakes are to be expected; They are idiots, mistakes are to be expected; That thing is just a glorified calculator, mistakes are to be expected.
(There's a short story I found I couldn't bring myself to finish, “Zero for Conduct” by Greg Egan, where the lead character is bullied by someone who has a similar disregard for her intelligence; I know one cannot use fiction to learn about reality, so I will instead say that this disregard of human intelligence by other humans happens a lot in real life too, the racism and xenophobia of βαρβαρίζω can still be found today in all the people who insist that ancient structures like the pyramids couldn't possibly have been built by the locals and therefore it must have been aliens).
> AI hype is just sales talk
But where does the hype end and the reality begin?
> I had to convince a customer the other day we cannot write whole apps with ChatGPT. As far as I know, not a single example exists of a full app written by ChatGPT. It just can't be done, because ChatGPT is not reasoning!!! It is not intelligent.
I will agree ChatGPT does indeed make incoherent solutions — one test project was making a game in JS, it (eventually) gave me a vector class with methods for multiply(scalar) etc., but then tried to use mul(scalar).
But ironically, I've also made a functioning (basic, but functioning) ChatGPT API interface… by bolting together the output of ChatGPT. I won't claim it's amazing or anything, because I'm an iOS developer and the thing "I" made is a web app, but it works well enough for my needs (just don't paste HTML into the query section, because I stopped adding to it when it was good enough for my needs and therefore only have a very basic solution to code being shown in the chat list, there's a lot of stuff that would be improved by using simple libraries but I didn't want to).
> I think it's best described as a word-calculator, or autocomplete on crack.
And that's the same error in the opposite direction.
If I understand right, GPT-3 is literally the complexity of a mid-sized rodent. Thus the metaphor I use is:
Imagine a ferret was genetically modified to be immortal and had every sensory input removed except smell, which was wired up to a computer. The ferret and computer then spend 50,000 years going through 10% of all the text on the internet, where every sequence is tokenised and those tokens are turned into a pattern of olfactory nerves to stimulate, and the ferret is rewarded or punished based on how well it imagined the next token.
You're annoyed that this specific ferret's jokes are derivative, their code doesn't always compile, that they make mistakes when trying to solve algebraic problems, that their pecan pie recipe needs work, and that they make mistakes when translating Latin into Hindi.
I'm amazed the ferret can do any of these things.
I've wondered if the impression isn't so much caused by the answers themselves as by a trust level that has formed from the bot's behavior. A core appeal of LLM bots is that they're instructed to be non-judgemental and ever-helpful, so in most scenarios one won't 'butt heads' with them.
This obviously has some real positives for learning (if the responses could be accurate) since you're more likely to continue eagerly the less ashamed you are asking 'dumb' questions to more rapidly gain an understanding of something (anonymous accounts online similarly facilitate shame-free question asking).
I wonder though if it might be having an effect where people begin to prefer AI over humans since with real humans, whether online or IRL, there's no consistency of response—people could be cranky, disinterested in responding or criticize someone for even asking.
So the more one interacts with an ever-willing, ever-pleasant, non-judgemental, human-esque answer machine—even if it's hallucinating things—I could see how it could become more of a go-to and even a trust begin to grow from familiarity and its generally polite ability to listen.
the thing I find scary is you see people on here believing this
if a website full of techies can't tell the difference what chance do the general public have?
I suspect the invention of the LLM will be the final nail in the coffin of liberal democracy (after the invention of social media)
People should never have blindly trusted anything on the internet in the first place. As far as I’m aware, most of the fears like this are overrated. People will live, learn and adapt as we always have. Constantly complaining about doom and having pessimism about anything seemingly scary is not a great way to live a fulfilling life.
There are entire subreddits of people who believe that ‘AI’ is by definition infallible. I feel myself getting sucked in and going slightly crazy just spending a few minutes reading it.
Those places seem to mostly be populated by disenfranchised people who are desperate for some ‘crazy alien tech’ to come along and overturn whatever systems they feel led them to their current (miserable) lives.
“Anyone who doesn’t think that in two years’ time AGI will have ushered in a new age where we all live on UBI and are free to explore our passions isn’t paying attention/has no idea what’s about to happen”, etc. etc.
Needless to say, none of this stuff has anything remotely to do with, you know… empirical evidence or theory. But reading endless reams of it for hours a day can certainly make you think it does.
> “Anyone who doesn’t think that in two years’ time AGI will have ushered in a new age where we all live on UBI and are free to explore our passions isn’t paying attention/has no idea what’s about to happen”, etc. etc.
I'd be surprised if it takes less than 6 years to fully generalise over all domains (what I expect to be hard is "learning from as few examples as a human"), and I'm not convinced there's the political will to roll out UBI fast enough, not even if AI that is sufficiently powerful as to require UBI to avoid economic collapse and popular revolution takes 20 years.
But AI may well break a lot of things in 2 years even without being either of those. Radical economic disruption doesn't need to replace all humans, even 5% would be huge, and it doesn't need to be as evidence-efficient as a human if there's lots of humans doing the same thing that the evidence-inefficient AI can learn from.
On the other hand I have just described the trucking industry, and yet even despite the ability to collect all that training data from all those drivers, Tesla has still not fully solved self-driving vehicle AI.
And also, I do mean "replacing humans" rather than just doing specific tasks that are currently done by humans: if those same humans can "just learn a different job", then so long as they can retrain faster than AI can learn the same jobs by watching humans collectively, this isn't so bad.
> empirical evidence or theory
Theory, I'd agree. But then, we don't really have any theory suitable to this task.
Empirical evidence? That's a weird take, to me. What it can already do, even despite the mistakes of the current generation, has me wondering how long I'm going to be economically relevant for — and hoping this is a repeat of Clever Hans.
The Culture series explores this to some degree.
Fiction isn't a good reference point for reality. The dystopian future is also governments run by AI. Terminator, so much so that it's a cliché, but also I Have No Mouth, and I Must Scream. Also, never read this, but people sometimes point it out as an example of "Can we, like, not do that?": https://tvtropes.org/pmwiki/pmwiki.php/Fanfic/FriendshipIsOp...
I imagine after the first car was invented if somebody described our current situation with cars most people would be as sceptical as you are about this.
If the purpose of Government is to govern as per the will of the people, AI's ability will likely surpass humans' at this task, certainly within my lifetime.
Talking about current gen AI is like talking about the Benz Patent-Motorwagen: we haven't reached the Ford Model T, and we can't even imagine a Tesla Model X.
Given how bad the Benz Patent-Motorwagen was, I think the DIY/"open source" LLMs would count as equivalents of that, and that OpenAI, Stable Diffusion, and Midjourney (especially given their popularity and economic disruptiveness to buggy whip manufacturers/artists) are equivalents of the Model-T. Cars have become more efficient, comfortable and safe since then, but the utility is similar.
But I also think cars are a terrible analogy. Internal combustion engines collectively are probably more apt, and for that case: OpenAI's various models may well be the 1776 Watt steam engine — a basic useful tool that displaces manual labor, which has a direct influence in its own right, but which would also see categorical replacement several times over.
I think people under- and overestimate AI at the same time. E.g. I asked ChatGPT4 to draw me a schematic of a simple buck converter (i.e. 4 components + load). In the written response it got the basics right. The schematic it drew is completely garbled nonsense.
I was expecting something like this maybe: https://en.wikipedia.org/wiki/Buck_converter#/media/File:Buc...
I got this: https://imgur.com/a/tEqprGq
Are you sure you asked ChatGPT4 to draw the schematic for a simple buck converter? I'm asking because that looks like a near perfect pre-Schneider coencarbulator-control oblidisk transistor, ingeniously aligned for use with theta arrays!
I'm quite out of date with the latest VX work, but it looks impressive enough that I think you should ask the VX community [0] in case any of them may use this design to get improved delta readings.
[0] https://old.reddit.com/r/VXJunkies/
But that is my point, ChatGPT connected the rockwell retro encabulator[0] in reverse, which will of course make the resonance tank split capacitor bridge oscillate out of sync with the active input bias triodes. This will of course immediately fry the dual-bifet transistor driver stage. Not sure why they call this AI.
[0] https://www.youtube.com/watch?v=RXJKdh1KZ0w
Is GPT-4 just passing a prompt to DALL-E to create an image? The garbled diagram makes sense since DALL-E isn't supposed to be that intelligent or an AGI.
Yes it did. And this was the description ChatGPT put underneath it:
"Here's the illustration of a basic buck converter schematic. This diagram includes the input voltage source (Vin), the switch (transistor), diode, inductor (L), capacitor (C), output voltage (Vout), and the load (represented as a resistor). The connections between these components and the flow of current are also shown. This should help you understand the basic operation of a buck converter."
Which is a pretty good description of the components. However, it then gaslights me into believing this is somehow depicted in the picture Dall-E produced.
I have another use case for LLMs that I hadn't thought of before: absolution of responsibility. The public is already primed to focus on AI in such cases.
In the early 90s, when every business was moving full out into computing, I'd call about an issue.
Sometimes the person helping me would say "I'm sorry, the computer won't let me do that". This seemed to grow as a response, until, to protect myself, I started pushing back.
"I don't care, you guys owe me $x, and telling me your computer won't let you do it is absurd. Pay me."
Most people would then get a manager on the phone, who would ape the "computer was at fault" line, but after pushback.. surprise!, there was a way to resolve things.
Point is, abdicating responsibility is an easy out for the lazy and for corporate greed. And AI will surely be used in this fashion, because people love an out.
Amazon is already the worst of this, if you have an issue that doesn't fit in a premade solution, good luck getting any fix!
Taking humans out of the chain is always bad.
Just look at the UK post office scandal, for the dark side of this.
It really feels like playing a whack-a-mole, never really progressing towards any reasonable level of responsibility
There are some still fighting the fight, but this is already happening [1] when it comes to failure to attribute and other licensing issues. “I didn't copy/misattribute/other your code, an AI trained in part by it was ‘inspired’ by it amongst other code”.
----
[1] unless you believe the assurances that an LLM will never regurgitate chunks of training material, like the image based models have been shown to
A hundredth the price and a quarter the quality means that this is here to stay. Might be a little early in the accuracy phase to start riding AI written briefs into court unchecked, but then I’ve never met a lawyer who didn’t try to make their billing efficient.
But logically, since all that is needed is improved accuracy it’s more likely that improved accuracy will be the answer rather than any change in human behavior.
> A hundredth the price and a quarter the quality means that this is here to stay
No, it's simply that those noobs don't know how to use LLMs. They'll eventually learn.
Basically, you don't use them to dig up new information, unless you're extremely careful about triple-checking that information. Google Scholar's legal database search is better for that. You use LLMs to write boilerplate, paraphrase, edit, and synthesize information from your own sources. Do it properly, and you'll never "hallucinate" a fake case in one of your legal filings, and you'll be able to write 'em in 5% of the time.
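As a rough illustration of that "synthesize from your own sources" workflow (assuming the OpenAI Python client; the excerpts are placeholders, not real citations, and a real filing would still need every line checked by a human):

    from openai import OpenAI

    client = OpenAI()

    # Excerpts the author has already read and verified; placeholder text, not real cases.
    verified_excerpts = [
        "Excerpt from Case A (verified): sanctions require a showing of bad faith.",
        "Excerpt from Case B (verified): counsel must certify the accuracy of cited authority.",
    ]

    prompt = (
        "Draft one paragraph of argument using ONLY the excerpts below. "
        "Do not mention or cite any authority that is not listed.\n\n"
        + "\n".join(verified_excerpts)
    )

    draft = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(draft.choices[0].message.content)

The instruction only narrows what the model is asked to do; the triple-checking described above still applies.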
> Do it properly, and you'll never "hallucinate" a fake case in one of your legal filings, and you'll be able to write 'em in 5% of the time.
All fun and games for those who can, and good for them, but I'm betting the majority can't or won't. The result being that society pays for the latter's incompetence.
I've led a team building an LLM agent for customer service.
Our finding is that it's between 50% and 10% of the operational cost of a human for a case. This costing is based on the range of costs for offshore vs. nearshore workers and doesn't account for a lot of the overhead of a human powered service organisation (in the jargon this isn't fully loaded).
I believe that the real cost is about 20% if dev expenses are included - but that's just my view of where, in between those bounds, things come to rest.
Now, that's not a 100th. In terms of quality, there are things it can't do, and despite our architecture (which is aimed at managing the deficiencies of LLMs) we still see some hallucinations creeping through. For example, our encoder has problems with directionality, as in it will write text like "average transaction value declined from $150 to $154 in october." We can catch (in our tests anyway) all the mistakes about the values, but the actual textual phrasing is hard to check - hard enough that I don't think the value of the system justifies the effort.
I think, from customer feedback, that this sort of thing will be ok for the apps we are building, but it is a real problem with this generation of models and it's not clear to me that it will be solved in the future (although like everyone else I was blindsided by the jump from GPT3 to 4 so who knows).
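For readers wondering what "we can catch the value mistakes but the phrasing is hard to check" might look like, here is a toy post-check (purely illustrative, not this poster's actual pipeline): the figures are easy to validate against the source data, while the direction words need their own brittle keyword heuristics:

    import re

    source = {"before": 150, "after": 154}
    generated = "average transaction value declined from $150 to $154 in october."

    numbers = [int(n) for n in re.findall(r"\$(\d+)", generated)]
    values_ok = numbers == [source["before"], source["after"]]   # True: the figures match the data

    if source["after"] > source["before"]:
        direction_ok = "rose" in generated or "increased" in generated
    else:
        direction_ok = "declined" in generated or "fell" in generated

    print(values_ok, direction_ok)  # True False: the numbers check out, but "declined" contradicts the data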
Really interesting insights and a really great comment.
I expect the technology to accelerate including dramatic leaps in accuracy and for LLM technology to make geometric improvements (just larger models and better hardware will improve them substantially, and that’s already coming to market in 2024-2025).
Oh, is that all?
You speak as if truth and facts are just minor details, perhaps to be added in a .1 update.
I don't like "hallucination" in this context. "Confabulation" seems much more accurate.
Wouldn't be solvable with a second AI agent which checks the output of the first one and be like "bro, you sure about that? I never heard of it".
In my experience with LLMs, they don't insist when corrected, instead they apologize and generate a response with that correction in mind.
Last time I tried to correct AI, it assured me that I'm wrong, and for a short, funny moment, it started insulting me before the text got scrapped and a generic "something went wrong" placeholder appeared. I don't think putting two AIs together will solve the problem.
The legal references all need to follow relatively standardised formats, so those should be easy to check. Of course there's no way to know for sure whether the case is actually about the subject it's being used to support, but at a bare minimum you could filter out AI responses with fake cases.
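A bare-minimum version of that filter could look like the sketch below: pull out anything shaped like a standard reporter citation and flag whatever an authoritative lookup does not recognise. The regex and the known_cases set are illustrative stand-ins; a real implementation would query an actual case-law database (CourtListener, Westlaw, etc.):

    import re

    # Matches common reporter formats such as "410 U.S. 113" or "925 F.3d 1291".
    CITATION = re.compile(r"\b\d{1,4}\s+(?:U\.S\.|F\.[23]d|F\. Supp\. 2d)\s+\d{1,4}\b")

    def suspect_citations(brief_text, known_cases):
        """Return citations in the brief that the lookup does not recognise."""
        return [c for c in CITATION.findall(brief_text) if c not in known_cases]

    brief = "As held in 123 F.3d 456 and 410 U.S. 113, sanctions were unwarranted."
    print(suspect_citations(brief, known_cases={"410 U.S. 113"}))  # ['123 F.3d 456'] -> verify by hand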
Because AI can't differentiate truth from fiction, it's essential that a real lawyer double checks the documents AI will generate, including reading referenced cases and verifying that they support the supposed argument. That way, they can skip a difficult part of their job without actually lying to the court when they submit "their" documents.
For important fields like law, you will always need a human verification step.
That’s just OpenAI’s RLHF and instruct tuning though. Bing Chat’s Sydney had the temperament of a moody teenager and would often contradict or accuse people of being wrong when pushed.
The correct way to solve this is to use retrieval augmented generation (finding relevant cases using embeddings and then feeding the data along with the question), so that the model is grounded to the truth.
It's absolutely solvable - and presumably this is the sort of functionality that more legal-specific AI applications offer (probably mixed with some sort of knowledge graph that contains facts).
Seems like the issue here was that a lawyer was just using OOTB ChatGPT.
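For concreteness, a minimal sketch of the retrieval-augmented setup described above, assuming the OpenAI Python client and numpy; the tiny in-memory corpus stands in for a proper vector database of verified case law, and all names are illustrative:

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    # Verified documents indexed ahead of time (placeholder text, not real cases).
    cases = {
        "Case A (verified excerpt)": "Sanctions for a misleading filing require a showing of bad faith...",
        "Case B (verified excerpt)": "Counsel must certify the accuracy of every cited authority...",
    }

    def embed(text):
        resp = client.embeddings.create(model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    index = {name: embed(text) for name, text in cases.items()}

    def answer(question, k=2):
        q = embed(question)
        top = sorted(index, key=lambda name: float(np.dot(index[name], q)), reverse=True)[:k]
        context = "\n\n".join(f"{name}:\n{cases[name]}" for name in top)
        prompt = f"Using only the cases below, answer the question.\n\n{context}\n\nQuestion: {question}"
        reply = client.chat.completions.create(model="gpt-4",
                                               messages=[{"role": "user", "content": prompt}])
        return reply.choices[0].message.content

Grounding like this narrows what the model can invent, but the retrieved documents still have to be real and on point, which is where the knowledge-graph idea above comes in.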
I'm not so sure with generic LLMs. From what I've seen so far of how LLMs work, "hallucination" is a feature, not a bug. In other use cases it's called creativity, when the user wants new ideas generated or the missing spots filled in. In those cases they are really happy that it can hallucinate, but of course they don't call it that.
So the dilemma is that we need to eliminate the same thing in one case but improve it in others.
But I believe you are right that we can engineer a fine tuned system that can use tools to fact check using traditional techniques.
I wonder how many AI hallucinations stem from the fact that nowhere in the prompt was it said that the information has to be real. And this cannot be implicitly assumed, since humans like fiction in other contexts.
I haven't used or paid much attention to ChatGPT, but the other day I was reading a macOS question on Reddit, and one of the "answers" was completely bizarre, claiming that the Apple Launchpad app was developed by Canonical. I checked the commenter's bio, and sure enough, they were a prolific ChatGPT user. It also turns out that Canonical has a product called Launchpad, which was the basis of ChatGPT's mindlessly wrong answer.
The scary thing is that even though ChatGPT's response was completely detached from reality, it was articulate and sounded authoritative, easily capable of fooling someone who wasn't aware of the facts. It seems to me that these "AI tools" are a menace in a society already rife with misinformation. Of course the Reddit commenter didn't have the decency to preface their comment with a disclaimer about how it was generated. I'm not looking forward to the future of this.
This is just people failing to use a technology correctly, like riding a bike without a helmet.
Did we do a good job, or should we have left that stone unturned?
"Did we do a good job, or should we have left that stone unturned?"
Regarding the quality of the average website - no, but also no to the second. The era of the internet just arrived with no precedent, and now it is hyped, imperfect AIs we have to deal with. I think the main problem in both cases is that most people don't have a clue at all how it works. (And too many of them are in positions of power.)
It's kind of a hangover, don't you think? The past decade, I mean, with the Internet.