Thanks for linking your game! I love how fast it is. One thing I would like is the ability to work backwards from the end--on the challenge problem yesterday I got all the way from "artist" to "chess", only to find that neither "chess" nor "checkmate" nor any other chess-related word I could think of met the 30% threshold to get to "check". That was frustrating.
Maybe have a button to flip the direction of the word chain, so you could work from either end and meet in the middle somewhere.
Hey! I'm trying to come up with a similar daily puzzle game, did anything help you come up with the idea? Also, do you generate these unrelated words daily and then vet them before releasing the daily puzzle, or is it all pretty much automated?
I don't really remember how I came up with the idea :) Basically I've become obsessed with word games lately and made a couple of them. I'm most proud and happy with these two:
https://pixletters.com
https://betweenle.com
As for Enlinko, it's a hard game to balance. That's why I made three difficulty levels. As for daily puzzles, I'm now using a semi-automated generator: it generates a lot of pairs, tries to solve them, and checks whether the solution counts match some rules; then I hand-pick the daily puzzles from those candidate pairs.
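Roughly, the pipeline looks like the sketch below (not my real code; the toy vectors, the 30% threshold, and the solution-count rule are stand-ins just so it runs end to end):

```python
import random

# Toy vectors so the sketch runs standalone; a real generator would load
# proper embeddings (word2vec, GloVe, ConceptNet Numberbatch, ...).
VECTORS = {
    "artist": (1.0, 0.1, 0.0), "paint": (0.9, 0.2, 0.0),
    "gallery": (0.8, 0.3, 0.1), "bank": (0.1, 1.0, 0.1),
    "money": (0.0, 0.9, 0.2), "check": (0.1, 0.8, 0.5),
}
WORDS = list(VECTORS)

THRESHOLD = 0.30                      # stand-in for the game's 30% bar
MIN_SOLUTIONS, MAX_SOLUTIONS = 1, 50  # stand-in for "solution counts match some rules"

def relatedness(a: str, b: str) -> float:
    """Cosine similarity of the two word vectors (the usual basis for these scores)."""
    va, vb = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    na = sum(x * x for x in va) ** 0.5
    nb = sum(x * x for x in vb) ** 0.5
    return dot / (na * nb)

def count_solutions(start: str, end: str, max_links: int = 3) -> int:
    """Count chains start -> ... -> end in which every consecutive pair clears
    the threshold. Brute force: fine for offline vetting, not for gameplay."""
    solutions, frontier = 0, [[start]]
    for _ in range(max_links):
        new_frontier = []
        for path in frontier:
            for w in WORDS:
                if w in path or relatedness(path[-1], w) < THRESHOLD:
                    continue
                if relatedness(w, end) >= THRESHOLD:
                    solutions += 1          # start -> ... -> w -> end works
                else:
                    new_frontier.append(path + [w])
        frontier = new_frontier
    return solutions

def candidate_pairs(n: int = 5):
    """Random pairs that are themselves unrelated but have an acceptable number
    of solutions: the pool a human then hand-picks the daily puzzles from."""
    kept = []
    while len(kept) < n:
        a, b = random.sample(WORDS, 2)
        if relatedness(a, b) >= THRESHOLD:
            continue  # the endpoints are already related: no puzzle there
        c = count_solutions(a, b)
        if MIN_SOLUTIONS <= c <= MAX_SOLUTIONS:
            kept.append((a, b, c))
    return kept

print(candidate_pairs())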
I guess the data used to build those vectors doesn't contain many occurrences of those two words in relation to each other.
Anyway, that's the downside of the word-vectors idea. There will always be some word pairs that we humans consider more or less related than the word vectors do.
I've tried finding the best one. It's different from what Semantle uses (word2vec from Google) and from what Contexto uses (GloVe). But there are still probably many word pairs that could match better.
"Relatedness" here is according to ... something ... Approximately but not exactly the likelyhood that words appear near to one another in a large corpus of text. Probably doing lookups on something crunched by google books.
I like it a lot, but it’s also frustrating. Perhaps it’s being hugged to death right now, but the lookups are very slow, so when I disagree with the results it’s a bit painful. If the results were quicker it would not be so bad; I could try different things.
I was a bit miffed that “currency” was not considered to be related to “mark”. Similarly I thought I’d found the perfect word between “ski” and “trust”, “mogul”, but once again your program disagreed.
Also, please help the player understand the basis of the word relations. I was surprised that the shortest path between “investor” and “mark” was a non-dictionary word: “zuckerberg”. Presumably you are not using WordNet but some corpus of embeddings. If you say where the corpus comes from, I can tailor my guesses. Conversely though, the shortest-path feature is good because it teaches me what works. Maybe a top 5 would be even better.
I saw "mark" and immediately though "scam" (which I figured would be easy to get to from "investor," but it told me that "scam" and "mark" share only 2% similarity.
I can't even go from "investor" to "money" (16%). I'm not sure how "Zuckerberg" is closer to "investor" than "money" is.
I had the exact same first guess. Easy, I figured - that's a perfect connection between them. It just seems like the logic dictating how "close" two words are is opaque and incomplete.
Exactly my feelings. The relatedness calculation needs to be an order of magnitude faster to make iterating on an idea fun.
After seeing the Zuckerberg path I tried Cuban, which is not related to Mark or investor despite Mark Cuban being far more famous as an investor than Mark Zuckerberg.
I agree. This would be a great use case for the fastText.js library. It can calculate word similarities from embeddings right in the browser - no need to wait for a slow PHP script.
Interesting - I did that just now (without having seen your comment) and got 13% and 33%. (For future reference, this comment is being made about an hour after the parent.)
I tried the same guess, and felt the same confusion. Whatever quality it is that the relatedness factor measures doesn't seem to align well with my sense of word association.
I was delighted to see this--thanks for posting it. Games like this (and Semantle, etc.) have a surprisingly long history. The TikTok #gotitchallenge [0] shows one way to play in person, also demonstrated by the vlogbrothers [1]. But there was also a 19th C. parlor game called "What is My Thought Like?" [2] in which players had to make semantic connections between two random words or phrases, and that is basically the same game as "Le Jeu de la pensée" [3][my English translation 4], ca. 1701, which is an extended version with additional random features that players have to connect to a random word.
Comparing the first example against a similar guess based on intuition:
zuckerberg => investor (21%), mark (20%)
cuban => investor (3%), mark (4%)
Using Google as a general guide to how often these words appear together:
mark cuban => About 40,500,000 results on google
"mark cuban" => About 13,200,000 results on google
"mark" "cuban" => About 33,500,000 results on google
investor cuban => About 80,800,000 results on google
"investor cuban" => About 945 results on google
"investor" "cuban" => About 9,810,000 results on google
mark zuckerberg => About 41,700,000 results on google
"mark zuckerberg" => About 29,400,000 results on google
"mark" "zuckerberg" => About 35,700,000 results on google
investor zuckerberg => About 11,100,000 results on google
"investor zuckerberg" => About 479 results on google
"investor" "zuckerberg" => About 3,160,000 results on google
Considering the above results for how often the base words appear together, and the added knowledge that Mark Cuban is more recognized for his investment activity than Zuckerberg, I wonder how the game calculates its relatedness scores.
(Note: I realize this is nit-picking in the extreme, but I found myself very interested in the underlying tech behind the game, and this was part of my exploration, so I thought I would share it with everyone else. Feel free to tear apart my methods; I am still very interested in how the OP coded their solution.)
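For what it's worth, the textbook way to turn raw co-occurrence counts like these into a relatedness score is pointwise mutual information (PMI), and word2vec's skip-gram with negative sampling has been shown to implicitly factorize a shifted PMI matrix, so PMI is closer to what these games measure than raw hit counts. A sketch using the quoted phrase counts; the single-word counts and corpus size below are hypothetical placeholders (my comment above doesn't include them), and Google hit counts are rough estimates anyway:

```python
import math

def pmi(joint: float, count_x: float, count_y: float, total: float) -> float:
    """log p(x, y) / (p(x) * p(y)): positive means x and y co-occur more
    often than chance would predict, zero means they are independent."""
    return math.log((joint / total) / ((count_x / total) * (count_y / total)))

# Quoted phrase counts from above; everything below this comment is a
# HYPOTHETICAL placeholder, used only to make the sketch runnable.
TOTAL = 1e12                       # hypothetical corpus size
MARK, INVESTOR = 5e9, 1e9          # hypothetical single-word counts
CUBAN, ZUCKERBERG = 2e8, 5e7

print(pmi(13_200_000, MARK, CUBAN, TOTAL))        # "mark cuban"
print(pmi(29_400_000, MARK, ZUCKERBERG, TOTAL))   # "mark zuckerberg"
print(pmi(945, INVESTOR, CUBAN, TOTAL))           # "investor cuban"
print(pmi(479, INVESTOR, ZUCKERBERG, TOTAL))      # "investor zuckerberg"
```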
I suspect this is because "cuban" has a lot of meaning in other contexts as well. Seeing "cuban" out of context, you may think of Cuba or even sandwiches before thinking of Mark Cuban or other investors.
I'm irritated to learn that proper nouns are allowed. That's unusual for word games, and imho breaks the spirit of the thing. But honestly most of the frustration is not knowing whether the game is going to treat two words as related enough in advance. It doesn't feel like I'm being clever, it feels like I'm blindly exploring a graph.
How is relatedness measured? Using some embedding space? I often disagree with the measurements, the worst one being "punch" and "bowl" only relating 12%.
The concept is very fun though. I might try to make my own version, as it also seems like a fun side project and a way to explore different word embedding spaces. Could be fun to maybe also have a visualization of the embedding space.
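If I do, the visualization part could start as small as this: project a handful of word vectors down to 2-D with PCA and scatter-plot them (gensim, scikit-learn, and matplotlib assumed; the Google News word2vec release is just one model choice):

```python
import matplotlib.pyplot as plt
from gensim.models import KeyedVectors
from sklearn.decomposition import PCA

# Any word2vec-format model works; this assumes the Google News release.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

words = ["investor", "money", "bank", "ski", "snow", "mountain"]
points = PCA(n_components=2).fit_transform([model[w] for w in words])

# Label each projected point so word clusters are visible at a glance.
for (x, y), word in zip(points, words):
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.show()
```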
Per the game's instructions, word similarities are computed using word vectors[1].
Note that the relatedness of words will depend on the training set. Many of these word2vec-based games use vectors trained on Google News[2], so if "Unrelated Words" uses the same data, you should look for word pairs that are more common in news but perhaps less common in general text.
Semantle[3] is another game based on word vectors. I like "Unrelated Words" better: whereas Semantle requires guessing one fixed target word, which is often very different from its nearest neighbors, this game lets you guess a whole set of words, and that flexibility makes it feel less frustrating.
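If you want to calibrate your guesses offline, and assuming the game really does use the Google News vectors[2] (which the author hasn't confirmed), you can reproduce this kind of score with gensim:

```python
from gensim.models import KeyedVectors

# The binary release linked from [2]; loading it takes a while.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# Cosine similarity in [-1, 1]; the game presumably rescales to a percentage.
print(model.similarity("investor", "money"))
print(model.similarity("investor", "Zuckerberg"))  # case-sensitive vocabulary

# The "top 5" idea from above: a word's nearest neighbors in the vector space.
print(model.most_similar("investor", topn=5))
```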
I'm the author of the game https://enlinko.com/, which I published 24 days ago:
https://news.ycombinator.com/item?id=35630451
The domain for this game was created 9 days ago, so I think someone was heavily inspired by my idea.
I understand that anyone can make a game with the same idea, but I'm a bit sad that Enlinko didn't get the kind of traction on HN that this game has.
A question I ran into while playing your game: why does it say "Amazon" and "Prime" are only 3% related? That seems very surprising.
Also, I don't see the source code.
I do not know which one was created first.
"Relatedness" here is according to ... something ... Approximately but not exactly the likelyhood that words appear near to one another in a large corpus of text. Probably doing lookups on something crunched by google books.
As for relatedness, my game uses semantic vectors from this model: https://github.com/commonsense/conceptnet-numberbatch
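Numberbatch's English release is a plain word2vec-format text file, so you can spot-check pairs like amazon/prime yourself. Something like this works (the file name comes from that repo's releases and may change):

```python
import gzip
import math

def load_vectors(path, wanted):
    """Pull just the words we need from a word2vec-text-format file:
    a 'count dims' header line, then 'word v1 v2 ...' per line."""
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        next(f)  # skip the header line
        for line in f:
            word, *values = line.rstrip().split(" ")
            if word in wanted:
                vectors[word] = [float(v) for v in values]
    return vectors

def cosine(a, b):
    """Cosine similarity; math.hypot(*v) is the vector's Euclidean norm."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# English-only Numberbatch file; terms in it are lowercase.
vecs = load_vectors("numberbatch-en-19.08.txt.gz", {"amazon", "prime"})
print(cosine(vecs["amazon"], vecs["prime"]))
```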
You’re onto something though, keep at it!
Is my expectation off that the first percentage should be higher?
Check. That worked.
"Capital" --> "Letter"
Did not work at all. And yet the two words appear side by side extremely frequently.
So, basically, I don't know how this game gauges relatedness. I do know that I don't like it.
Try out Semantle to get a better sense of it, if what I mean by that isn't immediately intuitive.
Edit: oh wait, never mind - wave/waves.
I also had a random one:
Heat -> bar
I went with pressure, since it is obviously related to heat, and a bar is a unit of pressure. But it didn’t like the second one.
I wonder if there’s a homonym issue, or if I just don’t understand word embeddings.
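If it is a homonym issue, it's easy to probe: standard embeddings keep a single vector per surface form, so "bar" blends its pub, pressure-unit, legal, and candy senses, and whichever sense dominates the training data wins. A quick check, assuming the Google News model (just a guess on my part):

```python
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# One vector per word means one blended "bar"; see which sense dominates.
for sense in ["pub", "pressure", "lawyer", "chocolate"]:
    print(sense, model.similarity("bar", sense))
```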
[0] https://www.tiktok.com/tag/gotitchallenge
[1] https://www.youtube.com/watch?v=kyx8iMKYrE8
[2] https://www.google.com/books/edition/American_Girl_s_Book/WO...
[3] https://www.google.com/books/edition/Les_jeux_d_esprit_ou_La...
[4] https://wobbupalooza.neocities.org/1701#tr_60
[1] https://en.wikipedia.org/wiki/Word_embedding
[2] https://code.google.com/archive/p/word2vec/
[3] https://news.ycombinator.com/item?id=31588388
I think the logic here needs some work, very cool idea though.