I can't test the bot right now, because it seems to have been hugged to death. But there are quite a few simple tests LLMs fail: basically anything where the answer is both precise/discrete and unlikely to appear verbatim in the training set. There are lots of examples in this post [1], which oddly enough ended up flagged. In fact, this guy [2] is offering $10k to anybody who can craft a prompt that gets an LLM to solve a simple token-replacement problem he's found they fail at (a sketch of that kind of problem follows the links).
They also tend to be incapable of playing even basic-level chess, despite there undoubtedly being millions of pages of material on the topic in their training data. If you do play one, take the game out of theory ASAP (1. a3!? 2. a4!!) so the bot can't just recite 30 moves of the Ruy Lopez or whatever.
[1] - https://news.ycombinator.com/item?id=39959589
[2] - https://twitter.com/VictorTaelin/status/1776677635491344744
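For context, here's a minimal Python sketch of that kind of token-rewriting problem. The rule table below is my own illustrative guess at the flavor of the challenge, not the exact rules from the tweet:

    # Rewrite adjacent token pairs until no rule applies. The rules are
    # an illustrative assumption, not the actual challenge spec.
    RULES = {
        ("A#", "#A"): [],
        ("A#", "#B"): ["#B", "A#"],
        ("B#", "#A"): ["#A", "B#"],
        ("B#", "#B"): [],
    }

    def reduce_tokens(tokens):
        changed = True
        while changed:
            changed = False
            for i in range(len(tokens) - 1):
                pair = (tokens[i], tokens[i + 1])
                if pair in RULES:
                    # Splice in the replacement and rescan from the start.
                    tokens = tokens[:i] + RULES[pair] + tokens[i + 2:]
                    changed = True
                    break
        return tokens

    print(reduce_tokens(["B#", "A#", "#B", "#A", "B#"]))  # -> ['B#']

A dozen lines of deterministic code; the point of the bounty is that LLMs reportedly can't apply rules like these reliably.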
"Keeping Netflix Reliable Using Prioritized Load Shedding" https://netflixtechblog.com/keeping-netflix-reliable-using-p...
It’s as if we all forgot how viruses work.
One of the big factors behind the "better protection" of immunity gained by recovering from the virus is survivor bias. Data on mortality after reinfection contains nobody who died from their first infection; it only includes people whose bodies were strong enough to recover.
It's no surprise that that group fares better later on.
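You can see the effect in a toy simulation (all numbers invented for illustration): give everyone a fixed per-infection death risk, grant zero actual protection from the first infection, and measured reinfection mortality still drops, because the frailest never make it into the reinfection cohort.

    import random

    random.seed(0)

    # Each person's "frailty" = chance of dying from any one infection.
    # 90% are robust (2% risk), 10% are frail (50% risk). Purely illustrative.
    population = [0.02 if random.random() < 0.9 else 0.5 for _ in range(1_000_000)]

    # First infection: the frail die disproportionately often.
    survivors = [f for f in population if random.random() >= f]

    # Second infection: identical risks, no immunity benefit at all.
    reinfection_deaths = sum(random.random() < f for f in survivors)

    print(f"1st-infection mortality: {1 - len(survivors) / len(population):.2%}")  # ~6.8%
    print(f"2nd-infection mortality: {reinfection_deaths / len(survivors):.2%}")   # ~4.6%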
Personally I use it for labeling physical things, mainly boxes. With a corresponding note in my Obsidian vault, it really helps me recover the contents, context, and history of random stuff in my basement.
A Python one-liner for generating them, which I've aliased in my Bash config: python3 -c "import secrets; print(''.join(secrets.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ234567') for _ in range(4)))"
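For what it's worth, four Base32 characters give 32^4 = 1,048,576 distinct labels, and a quick birthday-bound check shows random collisions stay rare at basement scale:

    import math

    alphabet_size, length = 32, 4
    total = alphabet_size ** length  # 1,048,576 possible labels

    # Birthday approximation: P(>=1 collision) ~ 1 - exp(-n(n-1) / (2*total))
    n = 100  # labels generated so far
    p = 1 - math.exp(-n * (n - 1) / (2 * total))
    print(f"{total:,} labels; collision chance after {n} draws ~ {p:.2%}")  # ~0.47%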