100 NLP Papers - Readit News

Unless you are a researcher (in academia or a corporate research lab), you should think twice before spending your time with these papers.

I have seen repeated examples of information technology industry professionals who go off on a wild goose chase of trying to parse the papers and reproduce them. If you are a machine learning practitioner or a data scientist in the industry, it is highly likely that you are going to waste your time with these papers. Here's a concrete example from the list: "John Lafferty, Andrew McCallum, Fernando C.N. Pereira: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001." This used to be the defining paper in early 2000s. Today it is important only as a road marker in the NLP research history which turned out to lead down an unproductive route.

Those who have not spent meaningful time in academia working on publishing their own research papers tend to fetishize them. The reality is that even the best papers in the field are a mess of ideas designed to please fickle reviewers and academic superiors. Most papers explore nooks and crannies of ideas that are irrelevant to an industry practitioner and are filled with assumptions that turn out to be impractical.

Unfortunately reading research papers has become a self-reinforcing status symbol for practitioners to name-drop and generally show off their in-crowd status rather than to rely on the ideas in the papers for a source of useful and practical information.

whymauri · 5 years ago

Uh, I disagree? Some of the best scientific discoveries of the 2000s came from insights found in old papers (50s, 60s, 70s). For example, optogenetics.

And now with Transformers, Hopfield learning and continuous Hebbian dynamics are making a small comeback. I mean, sure, don't implement the paper verbatim, but it's depressing to discard decades worth of work and insights only to rebuild it all again. Our disregard for past 'unsexy' work is one of the largest inefficiencies in science, hands down.

melenaboija · 5 years ago

The comment starts with "Unless you are a researcher...".

If you are a practitioner you are just trying to use the result of some research that someone else has done before, mostly to not have to do that research again.

Sure this result is possible thanks to revisiting old ideas, but using it does not mean you are discarding anything.

dtjohnnyb · 5 years ago

As a counterpoint, "Marco Tulio Ribeiro et al.: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList, ACL 2020" is a highly practical paper that should be incredibly useful in helping IT industry professionals to make their NLP/machine learning systems more robust. The more we can move towards software engineering practices of testing and monitoring, the better.

Maybe it's the exception that proves the rule though, I do agree with your point in general!

wenc · 5 years ago

Very true of applied research papers in general. Folks who've only either worked in academia, or in industry, but not both often don't appreciate the real distance between a journal publication and a corresponding practical application/implementation.

A "hot" paper that is wrong but intriguing can sometimes trigger a flurry of derivative works, and unless someone tries to implement it in the real world (and deal with the constraints of systems as found, not as imagined), suddenly it ends up spawning an entire new field that works in theory but not in practice.

The incentives of academia and industry are simply different.

Exceptions exist however, like journal papers that describe products/algorithms/technology that already exist in the real-world, like FFTW [1] or IPOPT [2]. In such cases, the publication exists merely as a form of technical documentation that other academics can easily cite.

Reality is a really good arbiter of how solid an idea is.

[1] http://www.fftw.org/pldi99.pdf

[2] https://github.com/coin-or/Ipopt

bratao · 5 years ago

I agree with but a nitpick. CRF is still very much used, even on Transformers architectures as the last layer in tasks such as NER. Many here in the leaderboard use it https://paperswithcode.com/sota/named-entity-recognition-ner...

Der_Einzige · 5 years ago

CRFs are still state of the art on many domains, and in some places are only just now being beaten by (far more expensive) transformer models. What are you smoking with this claim that CRFs are not useful to read about? CRFs are used a lot within industry and are very effective and interpretable...

JHonaker · 5 years ago

Yea, CRFs and graphical models in general are extremely versatile! It’s a shame more people don’t think about them. The major problem with them is computation, but there are approximate methods like belief propagation, expectation propagation, and even sequential Monte Carlo that you can leverage depending on your inference goals.

ericd · 5 years ago

Do you have recommendations for getting the lay of the research land without reading the papers?

Personally, I liked OpenAI's Spinning Up RL (which basically points you to useful papers) and Fast.ai's videos, but after those, it seems like reading papers are the main option.

osipov · 5 years ago

I find it more productive to follow the people of the academia rather than the papers.

psyklic · 5 years ago

If your goal is learning and understanding, reading papers won't be a waste of time. If you need good results fast, they are less likely to be useful.

aapppwe · 5 years ago

so what should nlp pratictioners and enthusiast read instead?

I've been going down the academic NLP rabbit hole lately, and at least in my domain (unsupervised key phrase extraction), the problem isn't the papers, it's the code (surprise!).

Let's start with the fact that in applied NLP, everyone has a plan until they get punched in the face by any number of pre-processing issues. And let's set aside the fact that in the end it's all going to regress to supervision, without which you can't optimize. Let's also set aside the fact that performance against a "gold standard" SemEval dataset doesn't mean shit in a lot of real world applications.

So you try out the standard issue "top of the line" algo, like YAKE, which is so fucking slow in pure Python that it'll choke a Bayesian optimizer. You sit around for a while debating whether or not to port it to Cython, having little idea if the effort will pay off because you aren't sure how well YAKE is going to work to begin with and it might get bested by another algo anyway.

So you go looking through the literature and you're delighted to find that within just the last few months, there have been some really cool and promising algos coming out with solid benchmarks, and there's code available to boot. Yay!

So you download the "weakly supervised" statistical one and it turns out to be a fucked up polyglot of Bash, C++, and a stale version of OpenJDK, some of which you have to compile yourself with g++, and then you have to dump your corpus into text files even though you've already got it in memory, run it through a tokenizer you neither want nor need, and then read the results back out of other text files. Sure, there's a docker version. It's full of bloat and solves some of the more negligible problems at hand.

Then you download a graph-based algo and it's such an undocumented mess of spaghetti it might as well have been written by an Italian restaurant. So you spend a really unreasonable amount of time just trying to figure out which function even takes your text as an input, and you read through a bunch of other functions trying to figure out if it needs to be pre-tokenized or not and if it wants the input as sentences or not or whatever. It also wants your input as a text file.

Then you download a language model-based algo and you think you're going to run the BERT variant you have at hand, but you double check the paper and it happens to perform way better with ELMO and then if you're lucky you don't spend a whole day trying to get AllenNLP running because you're using WSL on a laptop without a GPU and the non-gpu Tensorflow dependency is shitting itself all over the stack trace. You finally get the environment going in all its bloated glory even though you just wanted the pre-trained ELMO model, which you finally get deployed to Cortex or whatever and breathe a sigh of relief. And then it turns out your corpus is so domain specific that your matrix is sparser than swiss cheese because it's chock full of unks.

What have you learned after all this? That building an ensemble model which plays nicely with spaCy or SparkNLP is going to be an order of magnitude harder. Have fun!

Der_Einzige · 5 years ago

I wrote a summarizer which (when using the right settings) performs unsupervised key phrase extraction using language models. It is available here: https://github.com/Hellisotherpeople/CX_DB8 It seems that it would be very useful to you.

Like most data science code, it's non-trivial to install (it used to be when it was still updated)mostly because some dependencies are out of date and I will not risk a lawsuit from my current employer due to the similarity between this work and my day-to-day work. There is a jupyter notebook available which will allow you to use it without an install