Played with this a little more, and I think the assessment at the end is both the weak link in the UX and a very illustrative debugging tool, not only of LLM limitations but, oddly, of the format of internet debate
I've thrown about a hundred prompts at this, half phrased as neutral questions and half as statements of one side's position. When I do the latter, LLM A always takes the position I state, and LLM B is always said to "win" the debate, usually in the same way: B claims there's insufficient substantiating evidence for A's claim.

Neither says much substantial about the topic after the first exchange. A's first paragraph will usually expand a little on the topic, B will mostly try to discredit the arguments made, and then they repeat this three or four times while insulting each other's intelligence, without bringing up anything new, expanding further on the topic, or, notably, trying any examples beyond the ones discussed in the first reply.

What's interesting about the neutral-question style is that the debate tends to go about the same, but the assessment at the end only considers B to have won about 90% of the time
As someone who studies AI and has gotten into a lot of internet arguments, this is a fascinating example of how LLMs are great at capturing the average case of a format while having essentially zero chance of capturing its best case: actually being informative about the topic to a third-party observer
I wouldn't listen to a knowledge-based podcast without an LLM at the ready, and that makes it a rather different experience from the other kinds of podcasts available. I suspect the hosts will start to get wind of this and just skim a lot of surfaces, letting the audience branch out as they want.
Entertainment could just become a set of example prompts, a jumping-off point for the user to remix to their liking.
To be frank, this is the opposite of how you should use an LLM. It's not going to be useful for diving deep on stuff that needs factual accuracy. It will, on the other hand, be useful for giving you shallow overviews you can drill deeper into, maybe by helping you come up with good search terms, or sometimes by interrogating your thought process.
Plato or Hegel this is not. But it is kinda amusing.
Seeing the fight-club topics others asked would be an additional fun and engaging feature. I asked some funny or interesting ones but would like to see others’…
Really digging that UI, what did you use for designing the frontend?
And are LLM A and B different models or just different system prompts, where the system prompt for LLM B includes that it disagrees with the output of LLM A?
I kind of wonder if systems of LLMs are going to become a new entertainment medium of their own
I'll start, "Tabs vs Spaces" ;)
- The OpenGraph data is, somehow, for https://claros.so, an "AI Shopper"
- It uses Next.js and Tailwind CSS
- The model says it is GPT-4, but you'll never know if that's true
- LLM B seems to be instructed to "argue against the topic"
Anyways, the eternal question "Oliko Urho Kekkonen diktaattori?" ("Was Urho Kekkonen a dictator?") produced a better discussion than I have seen in years.
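For what it's worth, the "two system prompts" theory raised above can be sketched in a few lines. This is purely hypothetical: `fake_llm` is a stand-in for a real chat-completion call, and the prompts are my guesses, not the site's actual implementation.

```python
def fake_llm(system_prompt: str, transcript: list[str], topic: str) -> str:
    """Stub model: reports its stance instead of calling a real API."""
    stance = "for" if "argue for" in system_prompt.lower() else "against"
    return f"[{stance}] round {len(transcript) // 2 + 1} on: {topic}"

def run_debate(topic: str, rounds: int = 3) -> list[str]:
    # Same underlying model, two opposed system prompts (the speculation above).
    prompt_a = "You are LLM A. Argue for the user's position."
    prompt_b = "You are LLM B. Argue against whatever LLM A just said."
    transcript: list[str] = []
    for _ in range(rounds):
        transcript.append(fake_llm(prompt_a, transcript, topic))
        transcript.append(fake_llm(prompt_b, transcript, topic))
    return transcript

for turn in run_debate("Tabs vs Spaces", rounds=2):
    print(turn)
```

If that guess is right, it would also explain the observed bias: B's prompt is purely adversarial, so "insufficient evidence" is always an available move regardless of topic.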