Thanks for linking your game! I love how fast it is. One thing I would like is the ability to work backwards from the end--on the challenge problem yesterday I got all the way from "artist" to "chess", only to find that neither "chess" nor "checkmate" nor any other chess-related word I could think of met the 30% threshold to get to "check". That was frustrating.
Maybe have a button to flip the direction of the word chain, so you could work from either end and meet in the middle somewhere.
Hey! I'm trying to come up with a similar daily puzzle game, did anything help you come up with the idea? Also, do you generate these unrelated words daily and then vet them before releasing the daily puzzle, or is it all pretty much automated?
I don't really remember how I came up with the idea :) Basically I've become obsessed with word games lately and made a couple of them. I'm most proud and happy with these two:
https://pixletters.com
https://betweenle.com
As for Enlinko, it's a hard game to balance. That's why I made three difficulty levels. As for daily puzzles, I'm now using a semi-automated generator: it generates a lot of pairs, tries to solve them, and checks whether the solution counts match some rules; then I hand-pick the daily puzzles from those candidate pairs.
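Roughly, the pipeline looks like the sketch below (not my real code; the toy vectors, the 30% threshold, and the solution-count rule are stand-ins just so it runs end to end):

```python
import random

# Toy vectors so the sketch runs standalone; a real generator would load
# proper embeddings (word2vec, GloVe, ConceptNet Numberbatch, ...).
VECTORS = {
    "artist": (1.0, 0.1, 0.0), "paint": (0.9, 0.2, 0.0),
    "gallery": (0.8, 0.3, 0.1), "bank": (0.1, 1.0, 0.1),
    "money": (0.0, 0.9, 0.2), "check": (0.1, 0.8, 0.5),
}
WORDS = list(VECTORS)

THRESHOLD = 0.30                      # stand-in for the game's 30% bar
MIN_SOLUTIONS, MAX_SOLUTIONS = 1, 50  # stand-in for "solution counts match some rules"

def relatedness(a: str, b: str) -> float:
    """Cosine similarity of the two word vectors (the usual basis for these scores)."""
    va, vb = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    na = sum(x * x for x in va) ** 0.5
    nb = sum(x * x for x in vb) ** 0.5
    return dot / (na * nb)

def count_solutions(start: str, end: str, max_links: int = 3) -> int:
    """Count chains start -> ... -> end in which every consecutive pair clears
    the threshold. Brute force: fine for offline vetting, not for gameplay."""
    solutions, frontier = 0, [[start]]
    for _ in range(max_links):
        new_frontier = []
        for path in frontier:
            for w in WORDS:
                if w in path or relatedness(path[-1], w) < THRESHOLD:
                    continue
                if relatedness(w, end) >= THRESHOLD:
                    solutions += 1          # start -> ... -> w -> end works
                else:
                    new_frontier.append(path + [w])
        frontier = new_frontier
    return solutions

def candidate_pairs(n: int = 5):
    """Random pairs that are themselves unrelated but have an acceptable number
    of solutions: the pool a human then hand-picks the daily puzzles from."""
    kept = []
    while len(kept) < n:
        a, b = random.sample(WORDS, 2)
        if relatedness(a, b) >= THRESHOLD:
            continue  # the endpoints are already related: no puzzle there
        c = count_solutions(a, b)
        if MIN_SOLUTIONS <= c <= MAX_SOLUTIONS:
            kept.append((a, b, c))
    return kept

print(candidate_pairs())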
I guess the data used to build those vectors doesn't contain many occurrences of those two words in relation to each other.
Anyway, that's the downside of the word-vectors idea. There will always be some word pairs that we humans consider more or less related than the word vectors do.
I've tried finding the best one. It's different from what Semantle uses (word2vec from Google) and from what Contexto uses (GloVe). But there are still probably many word pairs that could match better.
"Relatedness" here is according to ... something ... Approximately but not exactly the likelyhood that words appear near to one another in a large corpus of text. Probably doing lookups on something crunched by google books.
I like it a lot, but it’s also frustrating. Perhaps it’s being hugged to death right now, but the lookups are very slow, so when I disagree with the results it’s a bit painful. If the results were quicker it would not be so bad; I could try different things.
I was a bit miffed that “currency” was not considered to be related to “mark”. Similarly I thought I’d found the perfect word between “ski” and “trust”, “mogul”, but once again your program disagreed.
Also, please help the player understand the basis of the word relations. I was surprised that the shortest path between “investor” and “mark” was a non-dictionary word: “zuckerberg”. Presumably you are not using WordNet but some corpus of embeddings. If you say where the corpus comes from, I can tailor my guesses. Conversely though, the shortest-path feature is good because it teaches me what works. Maybe a top 5 would be even better.
I saw "mark" and immediately though "scam" (which I figured would be easy to get to from "investor," but it told me that "scam" and "mark" share only 2% similarity.
I can't even go from "investor" to "money" (16%). I'm not sure how "Zuckerberg" is closer to "investor" than "money" is.
I had the exact same first guess. Easy, I figured - that's a perfect connection between them. It just seems like the logic dictating how "close" two words are is opaque and incomplete.
Exactly my feelings. The relatedness calculation needs to be an order of magnitude faster to make iterating on an idea fun.
After seeing the Zuckerberg path I tried Cuban, which is not related to Mark or investor despite Mark Cuban being far more famous as an investor than Mark Zuckerberg.
I agree. This would be a great use case for the fastText.js library. It can calculate word similarities from embeddings right in the browser - no need to wait for a slow PHP script.
Interesting - I did that just now (without having seen your comment) and got 13% and 33%. (For future reference, this comment is being made about an hour after the parent.)
I tried the same guess, and felt the same confusion. Whatever quality it is that the relatedness factor measures doesn't seem to align well with my sense of word association.
I was delighted to see this--thanks for posting it. Games like this (and Semantle, etc.) have a surprisingly long history. The TikTok #gotitchallenge [0] shows one way to play in person, also demonstrated by the vlogbrothers [1]. But there was also a 19th C. parlor game called "What is My Thought Like?" [2] in which players had to make semantic connections between two random words or phrases, and that is basically the same game as "Le Jeu de la pensée" [3][my English translation 4], ca. 1701, which is an extended version with additional random features that players have to connect to a random word.
Comparing the first example against a similar guess based on intuition:
zuckerberg => investor (21%), mark (20%)
cuban => investor (3%), mark (4%)
Using Google as a general guide to how often these words appear together:
mark cuban => About 40,500,000 results on google
"mark cuban" => About 13,200,000 results on google
"mark" "cuban" => About 33,500,000 results on google
investor cuban => About 80,800,000 results on google
"investor cuban" => About 945 results on google
"investor" "cuban" => About 9,810,000 results on google
mark zuckerberg => About 41,700,000 results on google
"mark zuckerberg" => About 29,400,000 results on google
"mark" "zuckerberg" => About 35,700,000 results on google
investor zuckerberg => About 11,100,000 results on google
"investor zuckerberg" => About 479 results on google
"investor" "zuckerberg" => About 3,160,000 results on google
Considering the above results for how often the base words appear together, and the added knowledge that Mark Cuban is more recognized for his investment activity than Zuckerberg, I wonder how the game calculates its relatedness scores.
(Note: I realize this is nit-picking in the extreme, but I found myself very interested in the underlying tech behind the game, and this was part of my exploration, so I thought I would share it with everyone else. Feel free to tear apart my methods; I am still very interested in how the OP coded their solution.)
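For what it's worth, the textbook way to turn raw co-occurrence counts like these into a relatedness score is pointwise mutual information (PMI), and word2vec's skip-gram with negative sampling has been shown to implicitly factorize a shifted PMI matrix, so PMI is closer to what these games measure than raw hit counts. A sketch using the quoted phrase counts; the single-word counts and corpus size below are hypothetical placeholders (my comment above doesn't include them), and Google hit counts are rough estimates anyway:

```python
import math

def pmi(joint: float, count_x: float, count_y: float, total: float) -> float:
    """log p(x, y) / (p(x) * p(y)): positive means x and y co-occur more
    often than chance would predict, zero means they are independent."""
    return math.log((joint / total) / ((count_x / total) * (count_y / total)))

# Quoted phrase counts from above; everything below this comment is a
# HYPOTHETICAL placeholder, used only to make the sketch runnable.
TOTAL = 1e12                       # hypothetical corpus size
MARK, INVESTOR = 5e9, 1e9          # hypothetical single-word counts
CUBAN, ZUCKERBERG = 2e8, 5e7

print(pmi(13_200_000, MARK, CUBAN, TOTAL))        # "mark cuban"
print(pmi(29_400_000, MARK, ZUCKERBERG, TOTAL))   # "mark zuckerberg"
print(pmi(945, INVESTOR, CUBAN, TOTAL))           # "investor cuban"
print(pmi(479, INVESTOR, ZUCKERBERG, TOTAL))      # "investor zuckerberg"
```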
I suspect this is because "cuban" has a lot of meaning in other contexts as well. Seeing "cuban" out of context, you may think of Cuba or even sandwiches before thinking of Mark Cuban or other investors.
I'm irritated to learn that proper nouns are allowed. That's unusual for word games, and imho breaks the spirit of the thing. But honestly most of the frustration is not knowing whether the game is going to treat two words as related enough in advance. It doesn't feel like I'm being clever, it feels like I'm blindly exploring a graph.
How is relatedness measured? Using some embedding space? I often disagree with the measurements, the worst one being "punch" and "bowl" only relating 12%.
The concept is very fun though. I might try to make my own version, as it also seems like a fun side project and a way to explore different word embedding spaces. Could be fun to maybe also have a visualization of the embedding space.
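If I do, the visualization part could start as small as this: project a handful of word vectors down to 2-D with PCA and scatter-plot them (gensim, scikit-learn, and matplotlib assumed; the Google News word2vec release is just one model choice):

```python
import matplotlib.pyplot as plt
from gensim.models import KeyedVectors
from sklearn.decomposition import PCA

# Any word2vec-format model works; this assumes the Google News release.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

words = ["investor", "money", "bank", "ski", "snow", "mountain"]
points = PCA(n_components=2).fit_transform([model[w] for w in words])

# Label each projected point so word clusters are visible at a glance.
for (x, y), word in zip(points, words):
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.show()
```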
Per the game's instructions, word similarities are computed using word vectors[1].
Note that the relatedness of words will depend on the training set. Many of these word2vec-based games use vectors trained on Google News[2], so if "Unrelated Words" uses the same data, you should look for word pairs that are more common in news but perhaps less common in general text.
Semantle[3] is another game based on word vectors. I like "Unrelated Words" better: whereas Semantle requires guessing one fixed target word, which is often very different from its nearest neighbors, this game lets you guess a whole set of words, and that flexibility makes it feel less frustrating.
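If you want to calibrate your guesses offline, and assuming the game really does use the Google News vectors[2] (which the author hasn't confirmed), you can reproduce this kind of score with gensim:

```python
from gensim.models import KeyedVectors

# The binary release linked from [2]; loading it takes a while.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# Cosine similarity in [-1, 1]; the game presumably rescales to a percentage.
print(model.similarity("investor", "money"))
print(model.similarity("investor", "Zuckerberg"))  # case-sensitive vocabulary

# The "top 5" idea from above: a word's nearest neighbors in the vector space.
print(model.most_similar("investor", topn=5))
```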
I'm the author of the game https://enlinko.com/, which I published 24 days ago:
https://news.ycombinator.com/item?id=35630451
The domain for this game was created 9 days ago, so I think someone was heavily inspired by my idea.
I understand that anyone can make a game with the same idea, but I'm a bit sad that Enlinko didn't get the kind of traction on HN that this game has.
A question I ran into while playing your game: why does it say "Amazon" and "Prime" are only 3% related? That seems very surprising.
Also, I don't see the source code.
I do not know which one was created first.
"Relatedness" here is according to ... something ... Approximately but not exactly the likelyhood that words appear near to one another in a large corpus of text. Probably doing lookups on something crunched by google books.
As for relatedness, my game uses semantic vectors from this model: https://github.com/commonsense/conceptnet-numberbatch
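Numberbatch's English release is a plain word2vec-format text file, so you can spot-check pairs like amazon/prime yourself. Something like this works (the file name comes from that repo's releases and may change):

```python
import gzip
import math

def load_vectors(path, wanted):
    """Pull just the words we need from a word2vec-text-format file:
    a 'count dims' header line, then 'word v1 v2 ...' per line."""
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        next(f)  # skip the header line
        for line in f:
            word, *values = line.rstrip().split(" ")
            if word in wanted:
                vectors[word] = [float(v) for v in values]
    return vectors

def cosine(a, b):
    """Cosine similarity; math.hypot(*v) is the vector's Euclidean norm."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# English-only Numberbatch file; terms in it are lowercase.
vecs = load_vectors("numberbatch-en-19.08.txt.gz", {"amazon", "prime"})
print(cosine(vecs["amazon"], vecs["prime"]))
```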
You’re onto something though, keep at it!
Is my expectation off that the first percentage should be higher?
Check. That worked.
"Capital" --> "Letter"
Did not work at all. And yet the two words appear side by side extremely frequently.
So, basically, I don't know how this game gauges relatedness. I do know that I don't like it.
Try out Semantle to get a better sense of it, if what I mean by that isn't immediately intuitive.
Edit: oh wait, never mind - wave/waves.
I also had a random one:
Heat -> bar
I went with pressure, since it is obviously related to heat, and a bar is a unit of pressure. But it didn’t like the second one.
I wonder if there’s a homonym issue, or if I just don’t understand word embeddings.
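If it is a homonym issue, it's easy to probe: standard embeddings keep a single vector per surface form, so "bar" blends its pub, pressure-unit, legal, and candy senses, and whichever sense dominates the training data wins. A quick check, assuming the Google News model (just a guess on my part):

```python
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# One vector per word means one blended "bar"; see which sense dominates.
for sense in ["pub", "pressure", "lawyer", "chocolate"]:
    print(sense, model.similarity("bar", sense))
```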
[0] https://www.tiktok.com/tag/gotitchallenge
[1] https://www.youtube.com/watch?v=kyx8iMKYrE8
[2] https://www.google.com/books/edition/American_Girl_s_Book/WO...
[3] https://www.google.com/books/edition/Les_jeux_d_esprit_ou_La...
[4] https://wobbupalooza.neocities.org/1701#tr_60
[1] https://en.wikipedia.org/wiki/Word_embedding
[2] https://code.google.com/archive/p/word2vec/
[3] https://news.ycombinator.com/item?id=31588388
I think the logic here needs some work, very cool idea though.