Bitext mining: given a sentence in one language, find its translation in a collection of sentences in another language using the cosine similarity of embeddings.
Classification: identify the kind of text you're dealing with using logistic regression on the embeddings.
Clustering: group similar texts together using k-means clustering on the embeddings.
Pair classification: determine whether two texts are paraphrases of each other by applying a binary threshold to the cosine similarity of the embeddings.
Reranking: given a query and a list of potential results, sort relevant results ahead of irrelevant ones according to the cosine similarity of embeddings.
Etc etc.
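Most of these tasks reduce to cosine similarity between embedding vectors. A minimal sketch of two of them (reranking and pair classification), using toy 3-d vectors in place of real model embeddings; the 0.5 threshold is an assumption for illustration, in practice it's tuned on a dev set:

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine of the angle between two embedding vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy 3-d "embeddings" -- real models produce hundreds of dimensions
query = np.array([0.9, 0.1, 0.0])
candidates = {
    "relevant":   np.array([0.8, 0.2, 0.1]),
    "irrelevant": np.array([0.0, 0.1, 0.9]),
}

# Reranking: sort candidates by similarity to the query, most similar first
ranked = sorted(candidates,
                key=lambda k: cosine_similarity(query, candidates[k]),
                reverse=True)
print(ranked)  # ['relevant', 'irrelevant']

# Pair classification: binary threshold on the similarity score
threshold = 0.5  # assumed value, for illustration only
is_paraphrase = cosine_similarity(query, candidates["relevant"]) > threshold
print(bool(is_paraphrase))  # True
```

Bitext mining is the same idea with an argmax over a pool of sentences in the other language, and clustering/classification just feed the raw vectors to k-means or logistic regression.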
These are MTEB benchmark tasks (https://arxiv.org/pdf/2210.07316.pdf). If you have no need for something like that, good for you; you don't need to care how well embeddings work for these tasks.
[1] https://web.stanford.edu/~jurafsky/slp3/
I tried to create a Kaggle (TensorFlow Hub, TensorFlow Quantum) competition for motivating alternative formalisms but was unable to publish it because all Kaggle competitions must be evaluated with information retrieval metrics. Talk about a one-track mindset!
Today, work in NLP advances via "leaderboards" and dubious, language-specific evaluation datasets, where the same authors stand to benefit when their proprietary model is praised for doing well on evaluation criteria they invented a few months earlier. It validates the price hike for access to their proprietary models.
These formalisms that do work are at odds with Firth Mode, the preferred representation for Google (Stanford, OpenAI), so I guess we should be thankful they're still in the book. If you're interested in language, though, I'd suggest picking up a different book.