Readit News
advael · 2 years ago
Played with this a little more, and I think the assessment at the end is both the weak link in the UX and a very illustrative debugging tool, not only of LLM limitations but, oddly, of the format of internet debate.

I've thrown about a hundred prompts at this, half of them phrased as neutral questions, the other half as statements of one side's position in the debate. I notice that with the latter, LLM A always takes the position I state and LLM B is always said to "win" the debate, usually in the same way: B claims that there's insufficient substantiating evidence for A's claim. Neither says much of substance about the topic of the prompt after the first exchange. Usually A's first paragraph will expand a little on the topic and B will mostly try to discredit the arguments made; then they'll repeat this three or four times while insulting each other's intelligence, without bringing up anything new, expanding further on the topic, or, notably, trying to use any examples aside from the ones in the first reply. What's interesting about the "neutral question" style is that the debate tends to go about the same, but the assessment at the end considers B to have won only about 90% of the time.

As someone who studies AI and has gotten into a lot of internet arguments, I find this a fascinating example of how LLMs are great at capturing the average case of a format with essentially zero chance of capturing its best case, in terms of being informative on the topic at hand to a third-party observer.

advael · 2 years ago
Weak debates but great spectacle

I kind of wonder if systems of LLMs are going to become their own new entertainment medium.

unraveller · 2 years ago
I wouldn't listen to a knowledge-based podcast without an LLM at the ready, and that makes it a rather different experience from the other kinds of podcasts available. I suspect the hosts will start to get wind of this and just skim a lot of surfaces, letting the audience branch out as they want.

Entertainment could just become an example prompt, a jumping-off point for the user to remix to their liking.

advael · 2 years ago
To be frank, this is the opposite of how you should use an LLM. It's not going to be useful for diving deep on stuff that needs factual accuracy. It will, on the other hand, be useful for giving you shallow overviews you can drill deeper into, maybe even by helping you come up with good search terms, or sometimes by interrogating your thought process.

gregw2 · 2 years ago
“Iceberg is better than delta lake.” Fight!

Plato or Hegel this is not. But it is kinda amusing.

Seeing the fight club topics others asked would be an additionally fun and engaging feature. I asked some funny or interesting ones but would like to see others’…

SushiHippie · 2 years ago
> Seeing the fight club topics others asked would be an additionally fun and engaging feature

I'll start, "Tabs vs Spaces" ;)

SushiHippie · 2 years ago
Really digging that UI, what did you use for designing the frontend?

And are LLM A and B different models or just different system prompts, where the system prompt for LLM B includes that it disagrees with the output of LLM A?

SushiHippie · 2 years ago
Okay after looking at it on my computer:

- The OpenGraph data is somehow for https://claros.so, an "AI Shopper"

- It uses Next.js and Tailwind CSS

- The model says it is GPT-4, but you'll never know if that's true

- LLM B seems to be instructed to 'argue against the topic'
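
For illustration only, the setup guessed at above (a single model, two system prompts, with LLM B told to argue against the topic) might look roughly like the sketch below. Everything in it is an assumption: the prompt strings, `run_debate`, and the `echo_model` stand-in are hypothetical names, and the site's real prompts and model calls are not known.

```python
from typing import Callable, List, Tuple

# Hypothetical system prompts; the site's actual prompts are not public.
SYSTEM_A = "Argue in favor of the topic."
SYSTEM_B = "Argue against the topic and rebut your opponent's last reply."

# A "model" here is any function taking (system_prompt, transcript) and returning a reply.
Model = Callable[[str, List[str]], str]


def run_debate(topic: str, model: Model, rounds: int = 3) -> List[Tuple[str, str]]:
    """Alternate A and B for `rounds` exchanges, using one model and two system prompts."""
    transcript: List[str] = [f"Topic: {topic}"]
    turns: List[Tuple[str, str]] = []
    for _ in range(rounds):
        for speaker, system in (("A", SYSTEM_A), ("B", SYSTEM_B)):
            reply = model(system, transcript)
            transcript.append(f"{speaker}: {reply}")
            turns.append((speaker, reply))
    return turns


def echo_model(system: str, transcript: List[str]) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    stance = "against" if "against" in system else "for"
    return f"[{stance}] reply #{len(transcript)}"


if __name__ == "__main__":
    for speaker, reply in run_debate("Tabs vs Spaces", echo_model, rounds=2):
        print(f"LLM {speaker}: {reply}")
```

Swapping `echo_model` for a function that calls an actual chat model would be the only change needed to try this with real output; the alternating loop and the shared transcript stay the same.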

timonoko · 2 years ago
TIL. It understands languages, but debates only in Ænglish.

Anyways. The eternal kwestion "Oliko Urho Kekkonen diktaattori?" ("Was Urho Kekkonen a dictator?") produced a better discussion than I have seen in years.

busssard · 2 years ago
I managed to get it to argue in German.

joshagilend · 2 years ago
This is fun :) I like it a lot! Good job :)

fzliu · 2 years ago
Not my website, but I agree - it is fun indeed.

gregw2 · 2 years ago
The author, aquajet, first posted it as a Show HN about a month ago, it seems.

huxflux · 2 years ago
Which LLM does it use?
