As for software, there are engines out there that do content-based image retrieval (CBIR) out of the box [1]. It should also be possible to build something quickly in OpenCV. You may be able to get away with simple image template matching if you put a few constraints on how the camera sees each card. Something more robust can also be built using image descriptors and approximate nearest neighbors, as in [2].
[1] https://en.wikipedia.org/wiki/List_of_CBIR_engines
[2] https://blog.francium.tech/feature-detection-and-matching-wi...
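To make the template-matching idea concrete, here is a rough sketch in plain Python, not a finished implementation. It assumes the camera is constrained enough that every capture is already cropped and scaled to the same size as the reference images, and it scores each reference with zero-mean normalized cross-correlation. The names ncc, identify_card, TEMPLATES and the 0.8 threshold are made up for illustration; in practice OpenCV's cv2.matchTemplate with cv2.TM_CCOEFF_NORMED computes the same kind of score much faster and also slides the template over a larger image.

```python
from math import sqrt


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-sized grayscale images.

    Images are lists of rows of pixel intensities. The result lies in [-1, 1];
    values near 1 mean the two images have the same pattern, regardless of
    overall brightness or contrast.
    """
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    da = [p - mean_a for p in flat_a]
    db = [p - mean_b for p in flat_b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        # A completely flat image carries no pattern to correlate.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def identify_card(image, templates, min_score=0.8):
    """Return (name, score) of the best-matching template, or (None, score)
    if even the best match falls below min_score (an assumed threshold)."""
    best_name, best_score = None, -1.0
    for name, template in templates.items():
        score = ncc(image, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


# Toy 3x3 "cards" standing in for real reference images (hypothetical data).
TEMPLATES = {
    "cross": [[0, 255, 0], [255, 255, 255], [0, 255, 0]],
    "diagonal": [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
}

# A dimmer, slightly noisy capture of the "cross" card.
capture = [[10, 200, 5], [190, 210, 205], [0, 195, 15]]
name, score = identify_card(capture, TEMPLATES)
print(name, round(score, 3))
```

Because the score is normalized, a uniform change in lighting does not hurt much, but rotation, scale changes or perspective distortion will. That is where the descriptor plus nearest-neighbor approach from [2] becomes worth the extra effort.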
I've thought of doing this with QR codes, but I'm not sure it would look okay to the other players, or whether a human could learn to recognize the QR codes well enough to gain an unfair advantage in some situations.
https://www.who.int/news/item/08-05-2015-who-issues-best-pra...
As others have said, the key is feedback and prompting. A model with a long context will figure it out.