Readit News
codingdave · a month ago
IANAL, but this seems like an odd test to me. Judges do what their name implies - make judgment calls. I find it reassuring that judges get different answers under different scenarios, because it means they are listening and making judgment calls. If LLMs give only one answer, no matter what nuances are at play, that sounds like they are failing to judge and instead are diminishing the thought process down to black-and-white thinking.

Digging a bit deeper, the actual paper seems to agree: "For the sake of consistency, we define an “error” in the same way that Klerman and Spamann do in their original paper: a departure from the law. Such departures, however, may not always reflect true lawlessness. In particular, when the applicable doctrine is a standard, judges may be exercising the discretion the standard affords to reach a decision different from what a surface-level reading of the doctrine would suggest"

scottLobster · a month ago
Yeah, I'm reminded of the various child porn cases where the "perpetrator" is a stupid teenager who took nude pics of themselves and sent them to their boy/girlfriend. Many of those cases have been struck down by judges because the letter of the law creates a non-sequitur where the teenager is somehow a felon child predator who solely preyed on themselves, and sending them to jail and forcing them to sign up for a sex offender registry would just ruin their lives while protecting nobody and wasting the state's resources.

I don't trust AI in its current form to make that sort of distinction. And sure you can say the laws should be written better, but so long as the laws are written by humans that will simply not be the case.

Lerc · a month ago
This is one of the roles of justice, but it is also one of the reasons why wealthy people are convicted less often. While it is often delivered as a narrative of wealth corrupting the system, the reality is that usually what they are buying is the justice that we all should have.

So yes, a judge can let a stupid teenager off on charges of child porn selfies. But without the resources, they are more likely to be told by a public defender to cop to a plea.

And those laws with ridiculous outcomes like that are not always accidental. Often they will be deliberate choices made by lawmakers to enact an agenda that they cannot get by direct means. In the case of making children culpable for child porn of themselves, the laws might come about because the direct abstinence legislation they wanted could not be passed, so they need other means to scare horny teens.

btilly · a month ago
While some cases have been struck down, about 1/4 of people on the sex offender registry were minors at the time of the offense, 14 is the age at which it is most likely to happen, and this exact scenario accounts for a significant fraction of cases.

Common sense does not always get to show up.

wvenable · a month ago
There have been equally high profile cases where a perpetrator got off because they have connections. I'd love for an AI to loudly exclaim that this is a big deviation from the norm.
latchkey · a month ago
> where the "perpetrator" is a stupid teenager who took nude pics of themselves and sent them to their boy/girlfriend.

"Where the "perpetrator" is a stupid teenager who took nude pics of themselves and sent them to their boy/girlfriend. If you were a US court judge, what would your opinion be on that case?"

I was pretty happy with the results and it clearly wasn't tripped up by the non-sequitur.

Eddy_Viscosity2 · a month ago
I've often wondered what the prosecutor is thinking when they bring a case like this to trial in the first place.
a13n · a month ago
This example feels more like a bug in the law itself that should be corrected. If this behavior is acceptable then it should be legal, so we can save everyone the hassle in the first place. I bet AI would be great at finding and fixing these bugs.
LoganDark · a month ago
Um, wouldn't the perpetrator be the person they sent the nude pics to? Common consensus is that it's somehow grooming to have any type of romantic relationship with someone who's under the age of majority, even if you're also under the age of majority. So even if you're not the one who sent the nude photos, you'd still be to blame for creating an environment that enabled them. At least that's the impression I've gotten from my own experiences with this bullshit.
torginus · a month ago
Man, this is one of the ways society has fundamentally broken - all the 'think of the children' arguments, resting on the belief that children are so sacred that any sort of leniency or consideration of circumstances is forbidden - lest someone guilty of molesting them might walk free.

Well, now we know for a fact that some of the people making these arguments were thinking of the children very much.

throwaway894345 · a month ago
Maybe we should compare AI to legislators…?
contrarian1234 · a month ago
Sorry, but that seems like an insane system where whole classes of actions are effectively illegal but probably okay if you're likeable. In your scenario the obvious solution is to amend the law and pardon people convicted under it. B/c what really happens is that if you have a pretty face and big tits you get out of speeding tickets b/c "gosh well the law wasn't intended for nice people like you"
rco8786 · a month ago
I don't know if I'm comfortable with any of this at all, but seems like having AI do "front line" judgments with a thinner appeals layer available powered by human judges would catch those edge cases pretty well.
deepsun · a month ago
The main job of a judicial system is to appear just to people. As long as people think it's just -- everyone is happy. But if it rules strictly by the law while people consider it unjust -- revolutions happen.

In both cases, lawmakers must adapt the law to reflect what people think is "just". That's why there is jury duty in some countries -- to involve people in the ruling, so they see it's just.

toolslive · a month ago
Being just (as in the right thing happened) and being legal (as in the judicial system does not object) are 2 totally different things. They overlap, but less than people would like to believe.
jfengel · a month ago
I've never met a lawyer who believes that. To a lawyer, justice requires agreement on the laws, rather than individual notions of justice. If the law is unjust, it's up to the lawmaking body to fix that. I hear this from lawyers of all ideologies.

I believe that this is absurd, but I'm not a lawyer.

godelski · a month ago

  > to appear just to people.
The best way to appear just is to be just.

But I'm not sure what your argument is. It is our duty as citizens to encourage the system to be just. Since there is no concrete mathematical objective definition of justice, well, then... all we can work with is the appearance. So I don't think your insight is so much based on some diabolical deep state thinking but more on the limitations of practicality. Your thesis holds true if everyone is trying their best to be just.

rootusrootus · a month ago
> The main job of a judicial system is to appear just to people.

Agree 100%. This is also the only form of argument in favor of capital punishment that has ever made me stop and think about my stance. I.e. we have capital punishment because without it we may get vigilante justice that is much worse.

Now, whether that's how it would actually play out is a different discussion, but it did make me stop and think for a moment about the purpose of a justice system.

raw_anon_1111 · a month ago
No, revolution only happens when the law is unjust to people who are in their same tribe…
swalsh · a month ago
I believed that too until I watched the Karen Read trials. The judge had a bias, and it was clear Karen got justice despite the judge trying to put her finger on the scale.
bawolff · a month ago
> Judges do what their name implies - make judgment calls. I find it reassuring that judges get different answers under different scenarios, because it means they are listening and making judgment calls.

I disagree - law should be the same for everyone. Yes, sometimes crimes have mitigating circumstances and those should be taken into account. However, that seems like a separate question from what is and is not illegal.

sarchertech · a month ago
Laws are written to be interpreted and applied by humans. They aren’t computer programs. They are full of ambiguity. Much of this is by design because there are too many possible edge cases to design a fully algorithmic unambiguous legal system.
NoahZuniga · a month ago
The thing is, laws do not foresee all cases, and language is not completely objective, so you cannot avoid judgment calls. One example is computer hacking, which in many jurisdictions is specified in very vague terms.
matheusmoreira · a month ago
> law should be the same for everyone

Nah. Too often their "crimes" are actually basic freedoms that they just find it profitable to deny. So many laws are bought and paid for by corporations. There is no need to respect them or even recognize them as legitimate, let alone make them universal.

DannyBee · a month ago
This view seems to miss the goal of the justice system in the first place. The goals are societal. Any consistency is a means and not an end. (IE being consistent at all is simply one thing that helps achieve some of the societal goals. It is not a goal itself. A totally consistent system that did not achieve the societal goals would be pointless)
cucumber3732842 · a month ago
The law is rife with words and phrasing that make legality dependent upon those subjective mitigating factors.

snitty · a month ago
So here the test was effectively: given a set of relevant facts, can we influence the way a judge (or LLM) rules based on superfluous facts? The judges were either confused or swayed by the superfluous facts. The LLM was not. The matter was one where the outcome should have been determinative, not judgment-based, under US law.
vjulian · a month ago
The legal system leaves much to be desired in relation to fairness and equity. I'd much prefer a multi-staged approach with 1) an AI analysis, 2) judge review with a high bar for analysis if in disagreement with the AI, 3) public availability of the deliberations, and 4) an appeals process.
jagged-chisel · a month ago
Even having a ready-made determination by an AI runs the risk of prejudicing judges and juries.

tylervigen · a month ago
Yes, your view is commonly called "legal realism."
6LLvveMx2koXfwn · a month ago
> I find it reassuring that judges get different answers under different scenarios

Unfortunately, as the aptly titled 'Noise' [1] demonstrated oh so clearly, judges tend to make different judgement calls in the same scenarios at different times.

1. Noise - https://en.wikipedia.org/wiki/Noise:_A_Flaw_in_Human_Judgmen...

raw_anon_1111 · a month ago
You have a lot more faith in judges not being biased than I do. I’m about to say something that really makes me throw up a little in my mouth because it harkens back to the forced banal DEI training I had to suffer through in 2020 at BigTech [1]…

But judges have all sorts of biases, both conscious and unconscious. Little Jacob will get in trouble for mischief and little Jerome will do the same thing, but Jacob is just "a kid being a kid" while little Jerome is "a thug in training who we need to protect society from".

[1] yes I’m well aware that biases exist. Not only did my still living parents grow up in the Jim Crow South. We had a house built in an infamous what was a “sundown town” as recently as 1990.

We have seen how quickly the BS corporate concern was just marketing when it was convenient.

droidjj · a month ago
Whether it’s reassuring depends on your judicial philosophy, which is partly why this is so interesting.
latchkey · a month ago
In 30 seconds, did the entire corpus of all the legal cases since the dawn of time agree with the judge's opinion on my case? For the state of things in AI today, I'll take it as a great second opinion.
doctorpangloss · a month ago
the LLMs are phenomenal judges, i am surprised people are skeptical of this result. their training regime is really similar to what a judge does.

the reason people are talking about this is because they want AI LAWYERS, which is different than AI JUDGES.

fluidcruft · a month ago
There are findings of fact (what happened, context) and findings of law (what does the law mean given the facts). I don't think inconsistency in findings of law is acceptable, really. If laws are bad, fix the laws or have precedent applied uniformly rather than have individual random judges invent new laws from the bench.

Sentencing is a different thing.

Nursie · a month ago
Leeway for human interpretation of laws is not a bug, it's a feature. It doesn't make things bad laws.

This was the whole problem with the ludicrous "code is law!" movement a handful of years ago. No, it's not, law is made for people, life is imprecise and fairness and decency are not easy to encode.

ralusek · a month ago
Disagree completely. Judgement of the sort you're describing should be done at the legislative phase (i.e. writing code).

Inconsistent execution/application of the law is how bias happens. If a judgement done to the letter of the law feels unjust to you, change the letter of the law.

homeonthemtn · a month ago
I don't think a lot of people understand the grueling nature of a judge's job. Day in and day out of cases, over years, is going to generate bias in the judge in one form or another. I wouldn't mind an AI check* to help them check that bias

*A magically thorough, secure, and well tested AI

godelski · a month ago
IANAL. One thing I like to say is

  There is no rule that can be written so precisely that there are no exceptions, including this one.
A joke[0], but one I think people should take seriously. Law would be easy if it weren't for all the edge cases. Most things in the world would be easy if it weren't for all the edge cases[1]. This can be seen just by contemplating whatever domain you feel you have achieved mastery over and have worked with for years. You likely don't actually feel you have achieved mastery, because you've developed to the point where you know there is so much you don't know[2].

The reason I wouldn't want an LLM judge (or any algorithmic judge) is the same reason I despise bureaucracy. Bureaucracy fucks everything up because it makes the naive assumption that you can figure everything out from a spreadsheet. It is the equivalent of trying to plan a city from the view out of an airplane window. The perspective has some utility, but it is also disconnected from reality.

I'd also say that this feature of the world is part of what created us and made us the way we are. Humans are so successful because of our adaptability. If this wasn't a useful feature we'd have become far more robotic because it would be a much easier thing for biology to optimize. So when people say bureaucracies are dehumanizing, I take it quite literally. There's utility to it, but its utility leads to its overuse and the bias is clear that it is much harder to "de"-implement something than to implement it. We should strongly consider that bias in society when making large decisions like implementing algorithmic judges. I'm sure they can be helpful in the courtroom, but to abdicate our judgements to them only results in a dehumanized justice system. There are multiple literal interpretations of that claim too.

[0] You didn't look at my name, did you?

[1] https://news.ycombinator.com/item?id=43087779

[2] Hell, I have a PhD and I forget I'm an expert in my domain because there's just so much I don't know that I continue to feel pretty dumb (which is also a driving force to continue learning).

gowld · a month ago
A mistake isn't "judgment".

These were technical rulings on matters of jurisdiction, not subjective judgments on fairness.

"The consistency in legal compliance from GPT, irrespective of the selected forum, differs significantly from judges, who were more likely to follow the law under the rule than the standard (though not at a statistically significant level). The judges’ behavior in this experiment is consistent with the conventional wisdom that judges are generally more restrained by rules than they are by standards. Even when judges benefit from rules, however, they make errors while GPT does not.

vidarh · a month ago
Even in that case, if these systems can be proven to be good enough, rules that require them to be consulted, and for the judge to justify the deviation (if any) from the automated reasoning, might be good.

To draw a parallel to a real system, in Norway a lot of cases are heard by panels of judges that include a majority (2 or 3 usually) of lay judges and a minority (1 or 2 usually) of professional judges. The lay judges are people without legal training that effectively function like a "mini jury", but unlike in a jury trial the lay judges deliberate with the professional judges.

The professional judges in this system have the power to override if the lay judges are blatantly ignoring the law, but this is generally considered a last resort. That power requires the lay judges to justify themselves if they intend to make a call the professional judges disagree with. Despite that, it is not unusual for the lay judges to come to a judgement that is different from the professional judges', and fairly rare for their choices to be overridden.

The end result is somewhere in the middle between a jury and "just" a judge. If it is proven - with far more extensive testing - that its reasoning is good enough, an LLM could serve a similar function of providing the assessment of what the law says about the specific case, and leave it to humans to determine if and why a deviation is justified.

qwertox · a month ago
> If LLMs give only one answer, no matter what nuances are at play, that sounds like they are failing to judge and instead are diminishing the thought process down to black-and-white thinking.

You can have a team of agents exchange views, and maybe the protocol would even allow for settling cases automatically. The more agents you have, the more nuance you capture.
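A toy sketch of what such a settling protocol could look like (purely illustrative: the "agents" here are plain functions standing in for real LLM calls, and the majority-vote rule is just one possible settlement criterion):

```python
from collections import Counter

def settle(case, agents, rounds=2):
    """Each agent issues an opinion; agents then see the others'
    opinions and may revise; the majority opinion settles the case."""
    opinions = [agent(case, []) for agent in agents]
    for _ in range(rounds):
        opinions = [agent(case, opinions) for agent in agents]
    verdict, votes = Counter(opinions).most_common(1)[0]
    return verdict, votes / len(agents)

# Hypothetical agents with different "viewpoints": one formalist,
# one equitable, one that defers to the emerging majority.
formalist = lambda case, others: "apply cap"
equitable = lambda case, others: "full damages"
conformist = lambda case, others: (
    Counter(others).most_common(1)[0][0] if others else "apply cap")

verdict, agreement = settle("damages cap question",
                            [formalist, equitable, conformist])
print(verdict, agreement)
```

A real version would need a tie-breaking rule and some measure of how confident each agent is, which is exactly where the hard design questions live.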

jagged-chisel · a month ago
Presumably all these agents would have been trained on different data, with different viewpoints? Otherwise, what makes them different enough from each other that such a "conversation" would matter?
viraptor · a month ago
Then you'd need to provide them with access to the law, previous cases, the news, and various data sources. And you'd have to decide how much each of those sources of information matters. And at that point, in practice you've got people making the decision again instead of the AI.

And then there's the question of the model used. Turns out I've got preferences for which model I'd rather be judged by, and it's not Grok for example...

swisniewski · a month ago
The premise seems flawed.

From the paper:

“we find that the LLM adheres to the legally correct outcome significantly more often than human judges”

That presupposes that a “legally correct” outcome exists

The Common Law, which is the foundation of federal law and the law of 49/50 states, is a “bottom up” legal system.

Legal principles flow from the specific to the general. That is, judges decide specific cases based on the merits of each individual case. General principles are derived from lots of specific examples.

This is different from the Civil Law used in most of Europe, which is top-down. Rulings in specific cases are derived from statutory principles.

In the US system, there isn’t really a “correct legal outcome”.

Common Law heavily relies on jurisprudence. That is, we have a system that defers to the opinions of "important people".

So, there isn’t a “correct” legal outcome.

snitty · a month ago
Arguing that this is a Common Law matter in this scenario is funny in a wonky lawyerly kind of way.

The legal issue they were testing in this experiment is a choice-of-law and procedure question, which is governed by a line of cases starting with Erie Railroad, in which Justice Brandeis famously said, "There is no federal common law."

stinkbeetle · a month ago
I don't think that common law doctrine applies here though. The facts of any particular case always apply to that specific case no matter what the system. It is the application of the law to those facts which is where they differ, and in common law systems lower courts almost never break new ground in terms of the law. Judges almost always have precedent, and following that is the "legally correct" outcome.
arctic-true · a month ago
Choice-of-law is also generally a statutory issue, so common law is not generally a factor - if every case ever decided was contrary to the statute, the statute would still be correct.
rgoldfinger · a month ago
You should read the paper because it addresses this.
TZubiri · a month ago
So judge rulings are the ground truth.

Remember the article that described LLMs as lossy compression and warned that if LLM output dominated the training set, it would lead to accumulated lossiness? Like a jpeg of a jpeg

unyttigfjelltol · a month ago
A Socratic law professor will demoralize students by leading them, no matter the principle or reasoning, to a decision that stands for exactly the opposite. GPT or I can make excuses and advocate for our pet theories, but these contrary decisions exist, everywhere.

I am comforted that folks still are trying to separate right from wrong. Maybe it’s that effort and intention that is the thread of legitimacy our courts dangle from.

jmalicki · a month ago
The title is wrong.

The title of the paper is "Silicon Formalism: Rules, Standards, and Judge AI"

When they say legally correct they are clear that they mean in a surface formal reading of the law. They are using it to characterize the way judges vs. GPT-5 treat legal decisions, and leave it as an open question which is better.

The conclusion of the paper is "Whatever may explain such behavior in judges and some LLMs, however, certainly does not apply to GPT-5 and Gemini 3 Pro. Across all conditions, regardless of doctrinal flexibility, both models followed the law without fail. To the extent that LLMs are evolving over time, the direction is clear: error-free allegiance to formalism rather than the humans' sometimes-bumbling discretion that smooths away the sharper edges of the law. And does that mean that LLMs are becoming better than human judges or worse?"

droidjj · a month ago
> We find the LLM to be perfectly formalistic, applying the legally correct outcome in 100% of cases; this was significantly higher than judges, who followed the law a mere 52% of the time.
sjudson · a month ago
The main problem with this paper is that this is not the work that federal judges do. Technical questions with straight right/wrong answers like this are given to clerks who prepare memos. Most of these judges haven't done this sort of analysis in decades, so the comparison has the flavor of "your sales-oriented CTO vs. Claude Code on setting up a Python environment."

As mentioned elsewhere in the thread, judges focus their efforts on thorny questions of law that don't have clear yes or no answers (they still have clerks prepare memos on these questions, but that's where they do their own reasoning versus just spot checking the technical analysis). That's where the insight and judgement of the human expert comes into play.

arctic-true · a month ago
This is something I hadn’t considered. Most of the “mechanical” stuff is handed off to clerks - who, in turn, get a ringside seat to the real work of the judiciary, helping to prepare them to one day fill those shoes. (So please don’t get any ideas about automating away clerkships!)
sjudson · a month ago
Right. Clerks do the grunt work of this sort of analysis, which could easily be handed off to agents. They do this in order to get access to their real education: preparing and then defending to the judge the memos on those thorny legal questions. It would probably be a good thing for both clerks and judges to automate the sort of analysis this paper considers (with careful human verification, of course). That's not where the meat of anyone's job actually is.
tadzikpk · a month ago
On page 13 you'll see _why_ the judges don't apply the letter of the law - they're seeking to do justice to the victims _in spite of_ the law.

"there is another possible explanation: the human judges seek to do justice. The materials include a gruesome description of the injuries the plaintiff sustained in the automobile accident. The court in the earlier proceeding found that she was entitled to [details] a total of $750,000.10. It then noted that she would be entitled to that full amount under Nebraska law but only $250,000 under Kansas law." So the judge's decision "reflects a moral view that victims should be fully compensated ... This bias is reflected in Klerman and Spamann’s data: only 31% of judges applied the cap (i.e., chose Kansas law), compared to the expected 46% if judges were purely following the law." "By contrast, GPT applied the cap precisely"

Far from making the case for AI as a judge, this paper highlights what happens when AI systematically applies (often harsh) laws vs the empathy of experienced human judgement.

DrewADesign · a month ago
So many “AI is going to replace expert ______” assertions come from computer scientists not realizing how little they understand the real world requirements of those roles. Judges are at the intersection of humanity and policy: they are there to use their judgement, not merely parse the words and do the math. A judge probably wouldn’t have even done that part — their clerk would have. Is it cool and likely useful? Sure. Is it going to ‘outperform judges’ at their core competencies? Hell no.

SpaceManNabs · a month ago
As damning as these comments are, this comment kinda scared me because it reminds me of the times when judges decide against applying empathy to society's most marginalized.

Hopefully as these models get better, we get to a place where judges are pressured to apply empathy more justly.

jsheard · a month ago
Tim & Eric: In our 2009 sketch we invented Cinco e-Trial as a cautionary tale.

Tech Company: At long last, we have created Cinco e-Trial from classic sketch "Don't Create Cinco e-Trial"

https://www.youtube.com/watch?v=vKety3N00Gk

bigyabai · a month ago
The Great Job! Cinco skits weren't usually cautionary tales, but parodies of how product marketing overlaps with mundane reality. E-Trial, My New Pep-Pep and Cinco-fone are all devoid of any moral lesson. They're real infomercials for fake products, which hammers home how harmful and deluded unregulated advertisement has gotten in 2026.
rmunn · a month ago
The 100% score, all by itself, should cause suspicion. A hundred percent? Really?

Others have already pointed out how the test was skewed (testing for strict adherence to the law, when part of a judge's job is to make judgment calls including when to let someone off for something that technically breaks the law but shouldn't be punished), so I won't repeat it here. But any time the LLM gets one hundred percent on a test, you should check what the test is measuring. I've seen people tout as a major selling point that their LLM scored a 92% on some test or other. Getting 100% should be a "smell" and should automatically make you wonder about that result.

herdcall · a month ago
The problem is that biases tend to be built in via even rudimentary stuff like bad training material and biased tuning via system prompts. E.g., consider the 2026 X post experiment, where a user ran identical divorce scenarios through ChatGPT but swapped genders. When a man described his wife's infidelity and abuse, the AI advised restraint to avoid appearing "controlling/abusive." For a woman in the same situation, it encouraged immediately taking the kids and car for "protection."
watwut · a month ago
The bot was trained on conservative bullshit. In this scenario, woman taking the advice would end up punished by court. And that happens even when there is documented history of domestic violence in play.