I would much rather check my writing against grammatical rules that are hard-coded in an open-source program—meaning that I can change them—than ones that I imagine would be subject to prompt fiddling or, worse, implicitly hard-coded in a tangle of training data that the LLM draws from.
If a language changes, there are only three possibilities: it becomes more expressive, it becomes less expressive, or it remains as expressive as before.
Certainly we would never want our language to be less expressive. There’s no point to that.
And what would be the point of changing for the sake of change? Sure, we blop use the word ‘blop’ instead of the word ‘could’ without losing or gaining anything, but we’d incur the cost of changing books and schooling for … no gain.
Ah, but it’d be great to increase expressiveness, right? The thing is, as far as I am aware all human languages are about equal in terms of expressiveness. Changes don’t really move the needle.
So, what would the point of evolution be? If technology impedes it … fine.
There are two versions of LanguageTool: open source and cloud-based. The open-source version checks individual words against a dictionary, just like the system's spell checker. Maybe there is more to it, but in my tests it did not fix even obvious errors. It's not an alternative to Grammarly or this tool.
IMO not using LLMs is a big plus in my book. Grammarly has been going downhill since they've been larding it with "AI features"; it has become remarkably inconsistent. It will tell me to remove a comma one hour, and then tell me to add it back the next.
Being dyslexic, I was an avid Grammarly user. Once it started adding "AI features" the deterioration was noticeable; I cancelled my subscription and stopped using it a year ago.
I also only ever used the web app, copy+pasting text in, since installing the app is, for all intents and purposes, installing a key logger.
Grammar works on rules; I'm not sure why that needs an LLM. Grammarly certainly worked better for me when it was dumber and rule-based.
Grammarly sometimes gets stuck in a loop, where it suggests changing from A to B. It then immediately suggests changing from B to A again, continuing to suggest the opposite change every time I accept the suggestion.
It's not a problem; I just decide which option I like better, but it is funny.
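For what it's worth, this failure mode is easy to guard against in a rule-based checker: remember each accepted suggestion and refuse to emit its exact inverse later. A minimal sketch (the `Suggestion` shape and `OscillationGuard` class are hypothetical, not any real Grammarly or Harper API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    before: str  # text the checker wants to replace
    after: str   # proposed replacement


class OscillationGuard:
    """Suppress suggestions that exactly undo a previously accepted one."""

    def __init__(self):
        self._accepted: set[tuple[str, str]] = set()

    def accept(self, s: Suggestion) -> None:
        self._accepted.add((s.before, s.after))

    def allow(self, s: Suggestion) -> bool:
        # Block the inverse edit (B -> A after we already applied A -> B).
        return (s.after, s.before) not in self._accepted


guard = OscillationGuard()
guard.accept(Suggestion("A,", "A"))        # user accepts "remove the comma"
print(guard.allow(Suggestion("A", "A,")))  # the inverse is now suppressed -> False
```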
General purpose LLMs seem to get very confused about punctuation, in my experience. It's one of their big areas of obvious failing. I'm surprised Grammarly would allow this to happen.
Grammarly came out before the LLMs. I'm not sure what approach it took, but they're likely feeling a squeeze as LLMs can tell you how to rewrite a sentence to remove passive voice and all that. I doubt the LLMs are as consistent (some comments below show some big issues), but they're free (for now).
'imo' and 'in my book' are redundant in the same sentence. Are there rules-based techniques to catch things like that? Btw I loved the use of 'larding' outside the context of food.
DeepL is different in my opinion. They always focused on machine learning for languages.
They must have acquired fantastic data for their models, especially the business language and professional translations they focus on.
They keep your intended message intact and just refine it, like post-editing a book. Grammarly and other tools force you to sound the way they think is best.
DeepL shows, in my opinion, how much more useful a model trained for specific uses is.
I've relied on Grammarly to spellcheck all my writing for a few years (dyslexia prevents me from seeing the errors even when reading it 10 times). However, I find its increasing focus on LLMs and its insistence on rewriting sentences in more verbose ways bothers me a lot. (It removes personality and makes human-written text read like AI text.)
So I've tried out alternatives, and Harper is the closest I've found at the moment... but I still feel like Grammarly does a better job at basic word suggestions.
Really, all I wish for is a spellcheck that can use the context of the sentence to suggest words. Ordinary dictionary spellchecks can pick the wrong word because it's closer in spelling. They may suggest "thought" when I write "thougt", even when the sentence clearly indicates "though" is correct; and I see no difference visually between any of the three words.
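Context-sensitive correction of exactly this kind is classic statistical NLP: generate candidate corrections from the dictionary, then rank them by how well they fit the neighboring words. A toy sketch using bigram counts (the counts below are made up for illustration; a real system would use a large corpus):

```python
# Hypothetical bigram frequency table; a real one comes from corpus counts.
BIGRAMS = {
    ("even", "though"): 900,
    ("even", "thought"): 5,
    ("though", "it"): 400,
    ("thought", "it"): 120,
}


def score(prev_word: str, candidate: str, next_word: str) -> int:
    """Fit of a candidate word between its two neighbors."""
    return (BIGRAMS.get((prev_word, candidate), 0)
            + BIGRAMS.get((candidate, next_word), 0))


def best_correction(prev_word: str, candidates: list[str], next_word: str) -> str:
    """Pick the candidate that best matches the surrounding context."""
    return max(candidates, key=lambda c: score(prev_word, c, next_word))


# "even thougt it" -> candidates come from an edit-distance dictionary lookup
print(best_correction("even", ["though", "thought"], "it"))  # -> though
```

With context counts, "though" wins easily even though "thought" is the closer spelling correction.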
What's wild is that OpenAI's earlier models were trained to guess the next word in a sentence. I wonder if GPT-2 would get "though" correct more often than the latest AI-assisted writing tools like Grammarly.
There are some areas where it seems like LLMs (or even SLMs) should be way more capable. For example, when I touch a word on my Kindle, I'd think Amazon would know how to pick the most relevant definition. Yet it just grabs the most common definition. For example, consider the proper definition of "toilet" in this passage: "He passed ten hours out of the twenty-four in Saville Row, either in sleeping or making his toilet."
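Picking the right sense from context is a very old NLP problem with a very old baseline: the Lesk algorithm chooses the sense whose dictionary gloss shares the most words with the surrounding text. A toy sketch with hand-written glosses (not real WordNet data, and a deliberately easy context sentence):

```python
# Toy sense inventory; real systems would use WordNet glosses.
SENSES = {
    "toilet": {
        "plumbing": "bowl fixture used for urination and defecation",
        "grooming": "the act of washing dressing and grooming oneself",
    }
}


def lesk(word: str, context: str) -> str:
    """Simplified Lesk: pick the sense whose gloss overlaps the context most."""
    ctx = set(context.lower().split())
    return max(SENSES[word],
               key=lambda sense: len(ctx & set(SENSES[word][sense].split())))


print(lesk("toilet", "an hour spent washing and dressing before dinner"))
# -> grooming
```

Bag-of-words overlap is crude (it would struggle with the actual Verne sentence, which shares no content words with either gloss), but it shows how cheaply "most relevant definition" beats "most common definition."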
Why wouldn't you want an LLM for a language-learning tool? Language is one of the things I would trust an LLM on completely. Have you ever seen ChatGPT make an English mistake?
Grammarly is all in on AI and recently started recommending splitting "wasn't" and attaching the contraction to the word it modifies. Example: "truly wasn't" becomes "was trulyn't".
Hm ... I wonder, is Grammarly also responsible for the flood of contracted lexical "have" over the last few years? It's standard in British English, but outside of poetry it is proscribed in almost all other dialects (which only permit contraction of auxiliary "have").
Even in British I'm not sure how widely they actually use it - do they say "I've a car" and "I haven't a car"?
Yeah, I agree. An open-source LLM-based grammar checker with a user interface similar to Grammarly is probably what I'm looking for. It doesn't need to be perfect (none of the options are); it just needs to help me become a better writer by pointing out issues in my text. I can ignore the false positives, and as long as it helps improve my text, I don't mind if it doesn't catch every single issue.
Using an LLM would also help make it multilingual. Both Grammarly and Harper only support English and will likely never support more than a few dozen very popular languages. LLMs could help cover a much wider range of languages.
I tried to use one LLM-based tool to rewrite a sentence in a more official corporate form, and it rewrote something like "we are having issues with xyz" into "please provide more information and I'll do my best to help".
LLMs are trained so hard to be helpful that it's really difficult to constrain them to other tasks.
uh. yes? it's far from uncommon, and sometimes it's ludicrously wrong. Grammarly has been getting quite a lot of meme-content lately showing stuff like that.
it is of course mostly very good at it, but it's very far from "trustworthy", and it tends to mirror mistakes you make.
Do you have any examples? The only time I noticed an LLM make a language mistake was when using a quantized model (Gemma) with my native language (so a much smaller training-data pool).
I'm just a bit skeptical about this quote:
> Harper takes advantage of decades of natural language research to analyze exactly how your words come together.
But it's just a rather small collection of hard-coded rules:
https://docs.rs/harper-core/latest/harper_core/linting/trait...
Where did the decades of classical NLP go? No gold-standard resources like WordNet? No statistical methods?
There's nothing wrong with this, the solution is a good pragmatic choice. It's just interesting how our collective consciousness of expansive scientific fields can be so thoroughly purged when a new paradigm arises.
LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic, grammar-based NLP work.
Progress is good, but it's important not to forget all those hard-earned lessons; it can sometimes be a real superpower to be able to leverage that old toolbox in modern contexts. In many ways, we had much more advanced methods in the 60s for solving this problem than what Harper is doing here by naively reinventing the wheel.
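For readers unfamiliar with what "a collection of hard-coded rules" means in practice, it's roughly this shape: a pattern, a message, and a suggested fix. A Python sketch of the general idea (not Harper's actual Rust `Linter` trait; the `Lint` type and rule here are illustrative):

```python
import re
from dataclasses import dataclass


@dataclass
class Lint:
    span: tuple[int, int]  # character range the lint applies to
    message: str           # what's wrong
    suggestion: str        # proposed replacement text


def repeated_word_rule(text: str) -> list[Lint]:
    """Flag immediate word repetitions like 'the the'."""
    lints = []
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        lints.append(Lint(m.span(), f"Repeated word: '{m.group(1)}'", m.group(1)))
    return lints


print(repeated_word_rule("This is the the problem."))
```

Each such rule is cheap, fast, and fully inspectable; the cost is that someone has to write every rule by hand.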
I'll admit it's something of a bold label, but there is truth in it.
Before our rule engine has a chance to touch the document, we run several pre-processing steps that imbue the words it reads with semantic meaning.
> LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic, grammar-based NLP work.
This is a drastic oversimplification. I'll admit that transformer-based approaches are indeed quite prevalent, but I do not believe that "LLMs" in the conventional sense are "replacing" a significant fraction of NLP research.
I appreciate your skepticism and attention to detail.
1. https://jalammar.github.io/illustrated-word2vec/
2. https://jalammar.github.io/visualizing-neural-machine-transl...
3. https://jalammar.github.io/illustrated-transformer/
4. https://jalammar.github.io/illustrated-bert/
5. https://jalammar.github.io/illustrated-gpt2/
And from there it's mostly work on improving optimization (both at training and inference time), training techniques (many stages), data (quality and modality), and scale.
---
There are also state-space models, but I don't believe they've gone mainstream yet.
https://newsletter.maartengrootendorst.com/p/a-visual-guide-...
And diffusion models, but I'm struggling to find a good resource, so: https://ml-gsai.github.io/LLaDA-demo/
---
All this being said, many tasks are solved very well using a linear model and TF-IDF, and the results are actually interpretable.
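To make that concrete, here is a stdlib-only sketch of the classic recipe: TF-IDF features plus a linear scorer (a per-class centroid here; in practice you'd reach for scikit-learn's `TfidfVectorizer` and `LogisticRegression`). The corpus is a made-up four-document toy:

```python
import math
from collections import Counter

# Tiny labeled corpus for illustration only.
docs = [("good great film", "pos"), ("terrible boring film", "neg"),
        ("great acting", "pos"), ("boring plot", "neg")]

# Inverse document frequency over the corpus.
df = Counter(w for text, _ in docs for w in set(text.split()))
idf = {w: math.log(len(docs) / c) + 1 for w, c in df.items()}


def tfidf(text: str) -> dict[str, float]:
    """Term frequency weighted by inverse document frequency."""
    tf = Counter(text.split())
    return {w: tf[w] * idf.get(w, 1.0) for w in tf}


# "Linear model": sum the TF-IDF vectors per class, classify by dot product.
centroids: dict[str, Counter] = {}
for text, label in docs:
    acc = centroids.setdefault(label, Counter())
    for w, v in tfidf(text).items():
        acc[w] += v


def classify(text: str) -> str:
    vec = tfidf(text)
    return max(centroids,
               key=lambda lab: sum(vec[w] * centroids[lab][w] for w in vec))


print(classify("great film"))   # -> pos
print(classify("boring film"))  # -> neg
```

The weights are just word scores you can print and read, which is the interpretability point: you can see exactly why a document was classified the way it was.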
The Neovim configuration for the LSP looks neat: https://writewithharper.com/docs/integrations/neovim
The whole thing seems cool. Automattic should mention this on their homepage. Tools like this are the future of something.
https://github.com/languagetool-org/languagetool
I generally run it in a Docker container on my local machine:
https://hub.docker.com/r/erikvl87/languagetool
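Once the container is up (that image serves on port 8010 by default), the local server exposes the same HTTP API as the hosted service. A small sketch of querying it, assuming the server is running at localhost:8010:

```python
import json
import urllib.parse
import urllib.request


def check_text(text: str, server: str = "http://localhost:8010") -> list[dict]:
    """POST to LanguageTool's /v2/check endpoint and return the matches."""
    data = urllib.parse.urlencode({"text": text, "language": "en-US"}).encode()
    with urllib.request.urlopen(f"{server}/v2/check", data=data) as resp:
        return json.load(resp)["matches"]


def summarize(matches: list[dict]) -> list[str]:
    """Reduce each match to 'message -> first replacement' for display."""
    out = []
    for m in matches:
        repl = m["replacements"][0]["value"] if m["replacements"] else "(no suggestion)"
        out.append(f'{m["message"]} -> {repl}')
    return out


# Example (requires a running server):
# for line in summarize(check_text("She go to school yesterday.")):
#     print(line)
```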
I haven't messed with Harper closely but I am aware of its existence. It's nice to have options, though.
It would sure be nice if the Harper website made clear that one of the two competitors it compares itself to can also be run locally.
https://dev.languagetool.org/finding-errors-using-n-gram-dat...
I would suggest diving into it more because it seems like you missed how customizable it is.
Not that I think LLM is always better, but it would be interesting to compare these two approaches.
Given that LISP was supposed to build "The AI" ... it's pretty sad that a dumb LLM is taking its place now.
So just like English teachers I see
No errors detected. So this needs a lot of rule contributions to get to Grammarly level.
> In large, this is _how_ anything crawler-adjacent tends to be
It suggests
> In large, this is how _to_ anything crawler-adjacent tends to be
https://imgur.com/a/RQZ2wXA
Has to be a bug in their rule-based system?