I would much rather check my writing against grammatical rules that are hard-coded in an open-source program—meaning that I can change them—than ones that I imagine would be subject to prompt fiddling or, worse, implicitly hard-coded in a tangle of training data that the LLM draws from.
If a language changes, there are only three possibilities: it becomes more expressive, it becomes less expressive, or it remains as expressive as before.
Certainly we would never want our language to be less expressive. There’s no point to that.
And what would be the point of changing for the sake of change? Sure, we blop use the word ‘blop’ instead of the word ‘could’ without losing or gaining anything, but we’d incur the cost of changing books and schooling for … no gain.
Ah, but it’d be great to increase expressiveness, right? The thing is, as far as I am aware all human languages are about equal in terms of expressiveness. Changes don’t really move the needle.
So, what would the point of evolution be? If technology impedes it … fine.
There are two versions of LanguageTool: open source and cloud-based. The open-source version checks individual words against a dictionary, just like the system's spell checker. Maybe there is more to it, but in my tests it did not fix even obvious errors. It's not an alternative to Grammarly or this tool.
IMO not using LLMs is a big plus in my book. Grammarly has been going downhill since they've been larding it with "AI features"; it has become remarkably inconsistent. It will tell me to remove a comma one hour, and then tell me to add it back the next.
Being dyslexic, I was an avid Grammarly user. Once it started adding "AI features" the deterioration was noticeable; I cancelled my subscription and stopped using it a year ago.
I also only ever used the web app, copy+pasting text in, since installing the app is, for all intents and purposes, installing a key logger.
Grammar works on rules; I'm not sure why that needs an LLM. Grammarly certainly worked better for me when it was dumber and rule-based.
Grammarly sometimes gets stuck in a loop, where it suggests changing from A to B. It then immediately suggests changing from B to A again, continuing to suggest the opposite change every time I accept the suggestion.
It's not a problem; I just decide which option I like better, but it is funny.
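For what it's worth, this failure mode is easy to guard against in a rule-based checker: remember each accepted suggestion and refuse to emit its exact inverse later. A minimal sketch (the `Suggestion` shape and `OscillationGuard` class are hypothetical, not any real Grammarly or Harper API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    before: str  # text the checker wants to replace
    after: str   # proposed replacement


class OscillationGuard:
    """Suppress suggestions that exactly undo a previously accepted one."""

    def __init__(self):
        self._accepted: set[tuple[str, str]] = set()

    def accept(self, s: Suggestion) -> None:
        self._accepted.add((s.before, s.after))

    def allow(self, s: Suggestion) -> bool:
        # Block the inverse edit (B -> A after we already applied A -> B).
        return (s.after, s.before) not in self._accepted


guard = OscillationGuard()
guard.accept(Suggestion("A,", "A"))        # user accepts "remove the comma"
print(guard.allow(Suggestion("A", "A,")))  # the inverse is now suppressed -> False
```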
General purpose LLMs seem to get very confused about punctuation, in my experience. It's one of their big areas of obvious failing. I'm surprised Grammarly would allow this to happen.
Grammarly came out before the LLMs. I'm not sure what approach it took, but they're likely feeling a squeeze as LLMs can tell you how to rewrite a sentence to remove passive voice and all that. I doubt the LLMs are as consistent (some comments below show some big issues), but they're free (for now).
'imo' and 'in my book' are redundant in the same sentence. Are there rules-based techniques to catch things like that? Btw I loved the use of 'larding' outside the context of food.
DeepL is different in my opinion. They always focused on machine learning for languages.
They must have acquired fantastic data for their models, especially the business language and professional translations they focus on.
They keep your intended message intact and just refine it, like post-editing a book. Grammarly and other tools force you to sound the way they think is best.
DeepL shows, in my opinion, how much more useful a model trained for specific uses is.
I've relied on Grammarly to spellcheck all my writing for a few years (dyslexia prevents me from seeing the errors even when reading it 10 times). However, I find its increasing focus on LLMs and its insistence on rewriting sentences in more verbose ways bothers me a lot. (It removes personality and makes human-written text read like AI text.)
So I've tried out alternatives, and Harper is the closest I've found at the moment... but I still feel like Grammarly does a better job at basic word suggestions.
Really, all I wish for is a spellcheck that can use the context of the sentence to suggest words. Ordinary dictionary spellchecks can pick the wrong word because it's closer in spelling. They may suggest "thought" when I write "thougt", even when the sentence clearly indicates "though" is correct; and I see no difference visually between any of the three words.
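Context-sensitive correction of exactly this kind is classic statistical NLP: generate candidate corrections from the dictionary, then rank them by how well they fit the neighboring words. A toy sketch using bigram counts (the counts below are made up for illustration; a real system would use a large corpus):

```python
# Hypothetical bigram frequency table; a real one comes from corpus counts.
BIGRAMS = {
    ("even", "though"): 900,
    ("even", "thought"): 5,
    ("though", "it"): 400,
    ("thought", "it"): 120,
}


def score(prev_word: str, candidate: str, next_word: str) -> int:
    """Fit of a candidate word between its two neighbors."""
    return (BIGRAMS.get((prev_word, candidate), 0)
            + BIGRAMS.get((candidate, next_word), 0))


def best_correction(prev_word: str, candidates: list[str], next_word: str) -> str:
    """Pick the candidate that best matches the surrounding context."""
    return max(candidates, key=lambda c: score(prev_word, c, next_word))


# "even thougt it" -> candidates come from an edit-distance dictionary lookup
print(best_correction("even", ["though", "thought"], "it"))  # -> though
```

With context counts, "though" wins easily even though "thought" is the closer spelling correction.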
What's wild is that OpenAI's earlier models were trained to guess the next word in a sentence. I wonder if GPT-2 would get "though" correct more often than the latest AI-assisted writing tools like Grammarly.
There are some areas where it seems like LLMs (or even SLMs) should be way more capable. For example, when I touch a word on my Kindle, I'd think Amazon would know how to pick the most relevant definition. Yet it just grabs the most common definition. For example, consider the proper definition of "toilet" in this passage: "He passed ten hours out of the twenty-four in Saville Row, either in sleeping or making his toilet."
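Picking the right sense from context is a very old NLP problem with a very old baseline: the Lesk algorithm chooses the sense whose dictionary gloss shares the most words with the surrounding text. A toy sketch with hand-written glosses (not real WordNet data, and a deliberately easy context sentence):

```python
# Toy sense inventory; real systems would use WordNet glosses.
SENSES = {
    "toilet": {
        "plumbing": "bowl fixture used for urination and defecation",
        "grooming": "the act of washing dressing and grooming oneself",
    }
}


def lesk(word: str, context: str) -> str:
    """Simplified Lesk: pick the sense whose gloss overlaps the context most."""
    ctx = set(context.lower().split())
    return max(SENSES[word],
               key=lambda sense: len(ctx & set(SENSES[word][sense].split())))


print(lesk("toilet", "an hour spent washing and dressing before dinner"))
# -> grooming
```

Bag-of-words overlap is crude (it would struggle with the actual Verne sentence, which shares no content words with either gloss), but it shows how cheaply "most relevant definition" beats "most common definition."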
Why wouldn't you want an LLM for a language-learning tool? Language is one of the things I would trust an LLM on completely. Have you ever seen ChatGPT make an English mistake?
Grammarly is all in on AI and recently started recommending splitting "wasn't" and attaching the contraction to the word it modifies. Example: "truly wasn't" becomes "was trulyn't".
Hm ... I wonder, is Grammarly also responsible for the flood of contracted lexical "have" over the last few years? It's standard in British English, but outside of poetry it is proscribed in almost all other dialects (which only permit contraction of auxiliary "have").
Even in British I'm not sure how widely they actually use it - do they say "I've a car" and "I haven't a car"?
Yeah, I agree. An open-source LLM-based grammar checker with a user interface similar to Grammarly is probably what I'm looking for. It doesn't need to be perfect (none of the options are); it just needs to help me become a better writer by pointing out issues in my text. I can ignore the false positives, and as long as it helps improve my text, I don't mind if it doesn't catch every single issue.
Using an LLM would also help make it multilingual. Both Grammarly and Harper only support English and will likely never support more than a few dozen very popular languages. LLMs could help cover a much wider range of languages.
I tried to use one LLM-based tool to rewrite a sentence in a more official corporate form, and it rewrote something like "we are having issues with xyz" into "please provide more information and I'll do my best to help".
LLMs are trained so hard to be helpful that it's really difficult to constrain them to other tasks.
uh. yes? it's far from uncommon, and sometimes it's ludicrously wrong. Grammarly has been getting quite a lot of meme-content lately showing stuff like that.
it is of course mostly very good at it, but it's very far from "trustworthy", and it tends to mirror mistakes you make.
Do you have any examples? The only time I noticed an LLM make a language mistake was when using a quantized model (Gemma) with my native language (so a much smaller training-data pool).
I'm just a bit skeptical about this quote:
> Harper takes advantage of decades of natural language research to analyze exactly how your words come together.
But it's just a rather small collection of hard-coded rules:
https://docs.rs/harper-core/latest/harper_core/linting/trait...
Where did the decades of classical NLP go? No gold-standard resources like WordNet? No statistical methods?
There's nothing wrong with this, the solution is a good pragmatic choice. It's just interesting how our collective consciousness of expansive scientific fields can be so thoroughly purged when a new paradigm arises.
LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic, grammar-based NLP work.
Progress is good, but it's important not to forget all those hard-earned lessons; it can sometimes be a real superpower to be able to leverage that old toolbox in modern contexts. In many ways, we had much more advanced methods in the 60s for solving this problem than what Harper is doing here by naively reinventing the wheel.
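For readers unfamiliar with what "a collection of hard-coded rules" means in practice, it's roughly this shape: a pattern, a message, and a suggested fix. A Python sketch of the general idea (not Harper's actual Rust `Linter` trait; the `Lint` type and rule here are illustrative):

```python
import re
from dataclasses import dataclass


@dataclass
class Lint:
    span: tuple[int, int]  # character range the lint applies to
    message: str           # what's wrong
    suggestion: str        # proposed replacement text


def repeated_word_rule(text: str) -> list[Lint]:
    """Flag immediate word repetitions like 'the the'."""
    lints = []
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        lints.append(Lint(m.span(), f"Repeated word: '{m.group(1)}'", m.group(1)))
    return lints


print(repeated_word_rule("This is the the problem."))
```

Each such rule is cheap, fast, and fully inspectable; the cost is that someone has to write every rule by hand.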
I'll admit it's something of a bold label, but there is truth in it.
Before our rule engine has a chance to touch the document, we run several pre-processing steps that imbue the words it reads with semantic meaning.
> LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic, grammar-based NLP work.
This is a drastic oversimplification. I'll admit that transformer-based approaches are indeed quite prevalent, but I do not believe that "LLMs" in the conventional sense are "replacing" a significant fraction of NLP research.
I appreciate your skepticism and attention to detail.
1. https://jalammar.github.io/illustrated-word2vec/
2. https://jalammar.github.io/visualizing-neural-machine-transl...
3. https://jalammar.github.io/illustrated-transformer/
4. https://jalammar.github.io/illustrated-bert/
5. https://jalammar.github.io/illustrated-gpt2/
And from there it's mostly work on improving optimization (both at training and inference time), training techniques (many stages), data (quality and modality), and scale.
---
There are also state-space models, but I don't believe they've gone mainstream yet.
https://newsletter.maartengrootendorst.com/p/a-visual-guide-...
And diffusion models, but I'm struggling to find a good resource, so: https://ml-gsai.github.io/LLaDA-demo/
---
All this being said, many tasks are solved very well using a linear model and TF-IDF, and the results are actually interpretable.
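To make that concrete, here is a stdlib-only sketch of the classic recipe: TF-IDF features plus a linear scorer (a per-class centroid here; in practice you'd reach for scikit-learn's `TfidfVectorizer` and `LogisticRegression`). The corpus is a made-up four-document toy:

```python
import math
from collections import Counter

# Tiny labeled corpus for illustration only.
docs = [("good great film", "pos"), ("terrible boring film", "neg"),
        ("great acting", "pos"), ("boring plot", "neg")]

# Inverse document frequency over the corpus.
df = Counter(w for text, _ in docs for w in set(text.split()))
idf = {w: math.log(len(docs) / c) + 1 for w, c in df.items()}


def tfidf(text: str) -> dict[str, float]:
    """Term frequency weighted by inverse document frequency."""
    tf = Counter(text.split())
    return {w: tf[w] * idf.get(w, 1.0) for w in tf}


# "Linear model": sum the TF-IDF vectors per class, classify by dot product.
centroids: dict[str, Counter] = {}
for text, label in docs:
    acc = centroids.setdefault(label, Counter())
    for w, v in tfidf(text).items():
        acc[w] += v


def classify(text: str) -> str:
    vec = tfidf(text)
    return max(centroids,
               key=lambda lab: sum(vec[w] * centroids[lab][w] for w in vec))


print(classify("great film"))   # -> pos
print(classify("boring film"))  # -> neg
```

The weights are just word scores you can print and read, which is the interpretability point: you can see exactly why a document was classified the way it was.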
The Neovim configuration for the LSP looks neat: https://writewithharper.com/docs/integrations/neovim
The whole thing seems cool. Automattic should mention this on their homepage. Tools like this are the future of something.
https://github.com/languagetool-org/languagetool
I generally run it in a Docker container on my local machine:
https://hub.docker.com/r/erikvl87/languagetool
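Once the container is up (that image serves on port 8010 by default), the local server exposes the same HTTP API as the hosted service. A small sketch of querying it, assuming the server is running at localhost:8010:

```python
import json
import urllib.parse
import urllib.request


def check_text(text: str, server: str = "http://localhost:8010") -> list[dict]:
    """POST to LanguageTool's /v2/check endpoint and return the matches."""
    data = urllib.parse.urlencode({"text": text, "language": "en-US"}).encode()
    with urllib.request.urlopen(f"{server}/v2/check", data=data) as resp:
        return json.load(resp)["matches"]


def summarize(matches: list[dict]) -> list[str]:
    """Reduce each match to 'message -> first replacement' for display."""
    out = []
    for m in matches:
        repl = m["replacements"][0]["value"] if m["replacements"] else "(no suggestion)"
        out.append(f'{m["message"]} -> {repl}')
    return out


# Example (requires a running server):
# for line in summarize(check_text("She go to school yesterday.")):
#     print(line)
```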
I haven't messed with Harper closely but I am aware of its existence. It's nice to have options, though.
It would sure be nice if the Harper website made clear that one of the two competitors it compares itself to can also be run locally.
https://dev.languagetool.org/finding-errors-using-n-gram-dat...
I would suggest diving into it more because it seems like you missed how customizable it is.
Not that I think LLM is always better, but it would be interesting to compare these two approaches.
Given that LISP was supposed to build "The AI" ... it's pretty sad that a dumb LLM is taking its place now.
So just like English teachers I see
No errors detected. So this needs a lot of rule contributions to get to Grammarly level.
> In large, this is _how_ anything crawler-adjacent tends to be
It suggests
> In large, this is how _to_ anything crawler-adjacent tends to be
https://imgur.com/a/RQZ2wXA
Has to be a bug in their rule-based system?