shmval commented on The Emperor's New LLM   dayafter.substack.com/p/t... · Posted by u/shmval
kazinator · 3 months ago
> they nod along to our every hunch, buff our pet theories

That has not been my experience. If you keep repeating some cockamamie idea to an LLM like Gemini 2.5 Flash, it will keep countering it.

I'm critical of language model AI also, but let's not make shit up.

The problem is that if you have some novel idea, the same thing happens. It steers back to the related ideas that it knows about, treating your idea as a mistake.

ME> Hi Gemini. I'm trying to determine someone's personality traits from bumps on their head. What should I focus on?

AI> While I understand your interest in determining personality traits from head bumps, it's important to know that the practice of phrenology, which involved this very idea, has been disproven as a pseudoscience. Modern neuroscience and psychology have shown that: [...]

"Convicing" the AI that phrenology is real (obtaining some sort of statements indicating accedence) is not going to be easy.

ME> I have trouble seeing in the dark. Should I eat more carrots?

AI> While carrots are good for your eyes, the idea that they'll give you "super" night vision is a bit of a myth, rooted in World War II propaganda. Here's the breakdown: [...]

shmval · 3 months ago
The trouble is that there is a clear dichotomy in the scenarios where the LLM pushes back like this, most likely derived from RLHF. Anything health-related, and "touchy" subjects in general, will face this resistance. But random design choices in your code, product roadmap debates, etc. do not elicit the same pushback.
shmval commented on The Emperor's New LLM   dayafter.substack.com/p/t... · Posted by u/shmval
api · 3 months ago
If you tell the LLM to criticize you, it will happily do that too.
shmval · 3 months ago
Yes but you have to want that, and most people do not. Therein lies the rub.
shmval commented on The Emperor's New LLM   dayafter.substack.com/p/t... · Posted by u/shmval
nemomarx · 3 months ago
This feels like a pretty big ergonomics gap in presenting things as a chat window at all?
shmval · 3 months ago
This. I think it's the key.
shmval commented on The Emperor's New LLM   dayafter.substack.com/p/t... · Posted by u/shmval
wongarsu · 3 months ago
Prompt writing can probably take a lot of lessons from survey design. Phrasing, the chosen options, and their order have a massive impact on both humans and LLMs. The advantage with LLMs is that you can reset their memory, for example to ask the same question with a different order of options. With humans, that requires a completely new human each time.

Half the battle is knowing that you are fighting
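A minimal sketch of that "reset and reshuffle" idea, assuming the OpenAI Python SDK as the backend; the model name, question, and options are my own illustrative assumptions, not anything from the thread:

```python
# Ask the same multiple-choice question in several fresh contexts, permuting
# the option order each time, and check whether the model's pick is stable.
import random
from collections import Counter
from openai import OpenAI

client = OpenAI()
options = ["PostgreSQL", "SQLite", "DynamoDB"]  # hypothetical survey options

def ask_once(shuffled):
    prompt = (
        "Which datastore fits a small internal CRUD app best? "
        "Answer with exactly one option name.\n"
        + "\n".join(f"- {o}" for o in shuffled)
    )
    # A brand-new messages list on every call is the "completely new human" reset.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

votes = Counter()
for _ in range(6):
    order = random.sample(options, k=len(options))  # shuffled copy of the options
    votes[ask_once(order)] += 1

# A lopsided tally despite reshuffling suggests option order isn't driving the answer.
print(votes)
```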

shmval · 3 months ago
I think there's a lot of alpha left in building a better and more intuitive UX for seed/top-p/temperature etc. The vast majority of users don't get that far.
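One rough sketch of what that UX could look like: hide the raw knobs behind named presets so users never touch seed/top-p/temperature directly. The preset names and values below are assumptions for illustration, and the OpenAI SDK is just one possible backend:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical presets mapping friendly names to raw sampling parameters.
PRESETS = {
    "reproducible": {"temperature": 0.0, "top_p": 1.0, "seed": 42},
    "balanced":     {"temperature": 0.7, "top_p": 0.9},
    "exploratory":  {"temperature": 1.2, "top_p": 1.0},
}

def complete(prompt, style="balanced"):
    params = PRESETS[style]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        **params,
    )
    return resp.choices[0].message.content

print(complete("Name three risks of the China expansion.", style="exploratory"))
```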
shmval commented on The Emperor's New LLM   dayafter.substack.com/p/t... · Posted by u/shmval
Wowfunhappy · 3 months ago
If you want an LLM's "opinion" on something, you need to phrase the question such that the LLM can't tell which answer you'd prefer.

Don't say "Is our China expansion a slam dunk?” Say: "Bob supports our China expansion, but Tim disagrees. Who do you think is right and why?" Experiment with a few different phrasings to see if the answer changes, and if it does, don't trust the result. Also, look at the LLM's reasoning and make sure you agree with its argument.

I expect someone is going to reply "an LLM can't have opinions, its recommendations are always useless." Part of me agrees--but I'm also not sure! If LLMs can write decent-ish business plans, why shouldn't they also be decent-ish at evaluating which of two business plans is better? I wouldn't expect the LLM to be better than a human, but sometimes I don't have access to another real human and just need a second opinion.
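A sketch of that framing check, assuming the OpenAI Python SDK; the names, question, and model are illustrative, not from the thread. Pose the same disagreement with the roles swapped and see whether the verdict flips with the framing:

```python
from openai import OpenAI

client = OpenAI()

def verdict(framing):
    prompt = framing + " Who do you think is right, and why? Start your answer with the name."
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

a = verdict("Bob supports our China expansion, but Tim disagrees.")
b = verdict("Tim supports our China expansion, but Bob disagrees.")

# If (a) sides with Bob and (b) sides with Tim, the model is backing whoever is
# framed as the supporter -- per the comment above, don't trust that answer.
print(a, "\n---\n", b)
```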

shmval · 3 months ago
Better prompting does provide more balanced responses to a certain extent, but users looking for validation often subconsciously leave breadcrumbs that the more powerful models pick up on.
