Set is a card game where players have to identify sets of three cards from a layout of 12. Each card features a combination of four attributes: shape, color, number, and shading. A valid set consists of three cards where each attribute is either the same on all three cards or different on each. The goal is to find such sets quickly and accurately.
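The all-same-or-all-different rule is easy to state in code. Here is a minimal sketch (my own encoding, not from the article) where each card is a 4-tuple of attribute values 0-2:

```python
from itertools import combinations

def is_set(a, b, c):
    """Three cards form a Set iff every attribute is all-same or all-different.

    For each attribute, the three values form a set of size 1 (all same)
    or 3 (all different); size 2 means exactly two match, which is invalid.
    """
    return all(len({x, y, z}) != 2 for x, y, z in zip(a, b, c))

def find_sets(cards):
    """Return all index triples in a layout that form a valid Set."""
    return [trio for trio in combinations(range(len(cards)), 3)
            if is_set(*(cards[i] for i in trio))]
```

A brute-force scan of a 12-card layout is only C(12, 3) = 220 triples, so exhaustive search is trivial for a computer.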
Though this game is a solved computer problem — easily tackled by algorithms or deep learning — I thought it would be interesting to see if Large Language Models (LLMs) could figure it out.
FYI: Card 8's transcription differs from the image. In the image, cards 5, 8, and 12 form a Set, but the transcription says Card 8 has only 2 symbols, which removes that Set.
Oh no, thanks for pointing this out! I asked GPT-4o to convert the image to text for me and only checked some of the cards, assuming the rest would be correct. That was a mistake.
I've now corrected the experiment to accurately take the image into account. This meant that Deepseek was no longer able to find all the sets, but o3-mini still did a good job.
I noticed that LLMs, at least at the Claude and OpenAI 4o level, cannot play tic-tac-toe and win against a competent opponent. They make illogical moves.
Interestingly, they can write a piece of code to solve Tic Tac Toe perfectly without breaking a sweat.
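The kind of program an LLM can reliably produce here is a standard minimax search; tic-tac-toe's game tree is small enough to solve exhaustively. A sketch of that approach (illustrative, not code from the thread), with the board as a 9-element list of 'X', 'O', or None:

```python
# All eight winning lines on a 3x3 board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] is not None and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Return (score, best_move) from `player`'s perspective:
    +1 forced win, 0 draw, -1 forced loss."""
    w = winner(b)
    if w is not None:
        return (1 if w == player else -1), None
    moves = [i for i in range(9) if b[i] is None]
    if not moves:
        return 0, None  # board full, no winner: draw
    other = 'O' if player == 'X' else 'X'
    best = (-2, None)
    for m in moves:
        b[m] = player
        score = -minimax(b, other)[0]  # negamax: flip sign for the opponent
        b[m] = None
        if score > best[0]:
            best = (score, m)
    return best
```

With perfect play from both sides the game is a draw, so `minimax([None] * 9, 'X')` reports a score of 0 from the opening position.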
I've always said that appending "use python" to your prompt is a magic phrase that makes 4o amazingly powerful across a wide range of tasks. I have a whole slew of things in my memories that nudge it to use Python when dealing with anything even remotely algorithmic, numeric, etc.
Since you can train an LLM to play chess from scratch, I would not be surprised if you could also train one to play Set. I might experiment with it tomorrow.
I still think that we're at much greater risk of discovering that human thinking is much less magical than we are of making a machine that does magical thinking.
> I've now corrected the experiment to accurately take the image into account. This meant that Deepseek was no longer able to find all the sets, but o3-mini still did a good job.
This is wildly disconcerting to me
> Interestingly, they can write a piece of code to solve Tic Tac Toe perfectly without breaking a sweat.
On the other hand, writing a piece of code to solve Tic Tac Toe sounds like a relatively common coding challenge.
https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
(this one, where ever-changing rules are part of the game: https://www.looneylabs.com/games/fluxx )