A spellchecker used to be a major feat of software engineering (2008)

I read an article where a software engineer was about to go on a long plane flight, so he downloaded a file of all english words and tried to write a spell checker during his flight, with no internet access, just as an exercise. And when he landed, he had finished, and his algorithm was beautiful, simple, elegant, easy to understand, and never something I'd have come up with in a million years.

It was actually a little depressing - it made me realize that some people just have a "something" that I'm not going to get through hard work or whatnot.

*edit: ah, found it. https://norvig.com/spell-correct.html

andersource · 3 years ago

To be fair, Peter Norvig [0] isn't just another software engineer, he's a hardcore CS researcher, co-authored the most popular (pre-DL) AI textbook, was head of NASA's Computational Sciences Division, etc.

[0] https://en.wikipedia.org/wiki/Peter_Norvig

e_i_pi_2 · 3 years ago

I think what they're saying is that some people are Peter Norvig and got lucky, and other people won't get there with any amount of study or practice. He's definitely not an average engineer, and as a result most people shouldn't expect to have the same level of ability

I had the same thought - read the URL and then said "Oh well that's Peter Norvig he's just weird in a cool way"

rbancroft · 3 years ago

And he was probably one of the worlds foremost experts on this at the time given his position at google, haha.

azurelake · 3 years ago

I mean, he definitely knew about the concept of edit distance: https://en.wikipedia.org/wiki/Edit_distance before writing this code, which is the most difficult part. Here's a practice problem: https://leetcode.com/problems/edit-distance/.

So in this case, the special "something" he has is quite literally the amount of hard work he put in to learn this stuff. And if you take this opportunity to learn about edit distance, you'll be one step closer to that special someone you want to be.

pasquinelli · 3 years ago

maybe someday i'll finally be the person i want to be, and then i can really start to live.

JoshCole · 3 years ago

You know how there is that joke about reasoning about multi-dimensional structures?

"To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it."

– Geoffrey Hinton, A geometrical view of perceptrons

Well, you actually can come up with Norvig's spellchecker very easily with a similar trick. Just tell yourself: to deal with a really hard problem, just argmax with an okayish heuristic. Everyone does it.

Solutions like this seem impossibly elegant because they tolerate a whole lot of error, but they are actually extremely obvious when you tolerate error.

notfish · 3 years ago

That quote is incredible, and now I know what coffee feels like leaving my nose

teshigahara · 3 years ago

If it makes you feel any better the original is not the same as what is displayed now[0] so it did not take just one flight to achieve elegance, and also included some bugs which were mentioned in the errata later[1].

[0] - https://web.archive.org/web/20070410053746/http://norvig.com...

[1] - https://web.archive.org/web/20150906100448/http://www.norvig... : ctrl+f "Update"

TacticalCoder · 3 years ago

It's really nice and I'm a fan of the man but... Even back then it was a really simple spellchecker. Back then stuff like the "metaphone" and "double metaphone" algorithms already existed (suggesting candidates based on similarly sounding words). Fun algorithms 'em metaphones I'd say. Relatively easy to implement/port (been there, done that).

I'm sure there's better stuff now.

Also it's possible to better ranks candidates to fix typos by taking into account the layout of the keyboard (e.g. on a QWERTY keyboard is you see "wery" it is, despite having an edit distance of 1, highly unlikely the person typing meant to type "very". "weary" is much more likely). So just sorting by edit distance and then suggesting the most common word often doesn't result in the best candidate being shown first.

And then context. Ah, context. I type this and my browser isn't aware I'm talking about algorithms and not only says "metaphone" is incorrect but suggests "megaphone".

So... The matter still isn't settled if you ask me.

WalterBright · 3 years ago

Soundex is an earlier method.

GuB-42 · 3 years ago

This kind of feat is the exact opposite of what the article is about.

This code is very simple, very short, very unoptimized and done in a few hours, free solo. The kind of code the article could have linked as an example of how simple it is now to write a spellchecker, in fact it explicitly talks about how you can do it in a few lines of Python, i.e. this.

What the article is about is complex, highly optimized code that took months to do, because computers at the time didn't even have enough space to store the complete word list uncompressed, neither in storage nor in memory, and even with compression, you had to be clever.

chii · 3 years ago

so a question that could be asked today is: what could be achieved, if only the same cleverness was applied to the utilization of the massive resources available today?

One potential answer is indeed the recent developments in AI (generative stuff like GPT & stable diffusion etc).

stevenspasbo · 3 years ago

You're in for a bad time if you're going to be comparing yourself to Peter Norvig.

wheels · 3 years ago

The prowess here is greatly overstated. His implementation would be pretty obvious for anyone that had read an Information Retrieval textbook any time in the couple decades between when these methods were devised and when he wrote that post.

That's kind of like watching someone use Red-Black Trees and assuming that they intuited them out of thin air. It's just remembering a basic algorithm that would be familiar to anyone in his particular field.

feoren · 3 years ago

These ideas are extremely powerful. I built a spell-checker largely based on this article, by parsing English Wikipedia. At scale it needs a trie and a distance metric with better-scaling performance metric, but it works really well. These go together: your error metric is a priority-based multi-headed traversal of your trie -- this emulates comparing your word to every word in your dictionary; you can compare against a few million words very quickly.

Because it's based on Wikipedia, it can correct proper names too. It's very extensible: by changing the error model, you can get fuzzy auto-complete that can look at "Shwarz" and suggest "Schwarzenegger" (even though you missed the 'c'). You can extend it to looking at multiple words instead of just letters, for correcting common word-use issues as well.

WalterBright · 3 years ago

I built a spell checker in the D compiler, where the dictionary consists of the symbols in scope. It is very satisfactory.

agencies · 3 years ago

Is the code or expanded explanation available?

corobo · 3 years ago

Do you allow yourself to be bored at all?

I recently realised I've been missing my "something" for a while.. ya I've not had a dopamine free moment since doom scrolling became a thing.

Not comparing somethings, I probably couldn't write a spell checker much less a beautiful one, but my creative juices are much less clogged up since allowing myself to get bored.

I believe it's being able to combine a learned skillset with raw creativity (boredom) that gives a person that something you refer to.

andai · 3 years ago

For me it's largely "the joy of finding things out". If I know there's a standard solution to a problem, I'll avoid the "spoilers" because I want to see how far I can get on my own. (And yeah, sometimes I just get bored of doomscrolling... Building things is a much more satisfying way to spend my time!)

I often avoid things because I think they'll be hard, but then discover that they turn out to be easy. "Why didn't anyone tell me how easy this was! I could have done this ages ago!" was a running theme for me for a good while.

Of course, sometimes things are actually harder than you thought. But they're still doable! Whereas I have this cognitive bias where "difficult" is (emotionally) equivalent to "impossible", which is just really unfortunate. I'm sort of starting to see through that, that everything's made of atoms, and once you break a task down far enough, the individual steps are trivial.

rjh29 · 3 years ago

Worth noting he did not perform the "major feat of software engineering" like the OP says (given 256k ram or lower). He just put all the words into a hash. The main key is knowing about edit distance, if you know that, you can probably write the solution.

So yes he's very talented, but if you study enough computer science, you can also do it. Or don't. It's totally fine to be interested in other parts of engineering instead.

alfalfasprout · 3 years ago

Don't get me wrong, Peter Norvig is awesome, but if you're already familiar with levenshtein or even hamming distances, it's pretty straightforward to hack together a spellchecker.

rzimmerman · 3 years ago

Reminds me of a dumb project I did that takes a person's name and returns a name that is close, but not quite right: https://github.com/rzimmerman/shitty-autoreply

The idea was to set up an auto-reply to my email that looked automated, but surely couldn't be because it messed up the sender's name. Not nearly as clever as Peter Norvig's program and even less useful.

I'd say the bias toward common American names is a bug, but given that the goal of the project is to be obnoxious it's probably a feature.

pixel_tracing · 3 years ago

To be fair even Norvig’s implementation wasn’t great for a mobile device. It can be improved with a more efficient trie structure…

Xeoncross · 3 years ago

Do you have any links?

cortesoft · 3 years ago

It is a bit strange to me that it would be surprising that there are people way better at programming than you. The chance that you would be at the far right of the bell curve for any skill is pretty small, by definition. Almost everyone is going to not be the best at your thing. I don’t know why it would be depressing to not be the best.

addaon · 3 years ago

> any skill is pretty small, by definition

I challenge this, especially the "by definition" part. It depends on how many skills you recognize.

The chance of being the best artist (to the extent that such a statement is reasonable) might be one in eight billion.

The chance of being the best automotive sharpie artist is, similarly, one in eight billion.

The chance of being the best artist /of some particular skill set/ is, then, much higher -- if you recognize only a few thousand distinct skills here, you're already at one in a million.

Then realize that those who are likely to be surprised that there are better programmers than them are /not/ likely to be surprised that there are better artists than them. So we're assessing across all skills.

The chance of being on the far right of the bell curve of a specific skill is pretty small. The chance of being on the far right of the bell curve for /any/ skill is quite high, with only eight billion people around.

Find what you do better than anyone else, embrace it, and enjoy it.

knicholes · 3 years ago

I used it to apply to a job at Twitch, as they tasked me on creating a spellchecker. Thanks, Peter! Also, they didn't give me an offer.

kachurovskiy · 3 years ago

I've come to accept the fact that there are millions of people better than me at something. It's depressing and hopeless trying to be the best, there isn't even an objective measure. My goal is instead to be useful.

nottathrowaway3 · 3 years ago

That "something" is basically passion and motivation, a child-like fascination with computers and the powers they can give you. People like this are so valuable and great to work with not because they have some innate power to understand algorithms, but because they sell a brand of optimism reminiscent of the "old" tech industry, the American postwar optimism of the 90s, where new things actually were revolutionary.

Let's be real, in 5-6 hours he designed a very nice, simple algorithm to produce likely words from a dictionary based on frequency analysis, and produced 30 lines of Python code implementing it. A lot of engineers today could have done this.

The "specialness" was that he did it on his own time and enjoyed it.

There was an xkcd relevant here about enjoying Python: https://xkcd.com/353/

aflag · 3 years ago

Many developers would come up with that once they know the answer. Not as many will find the answer.

projektfu · 3 years ago

The optimism of the 90s. That's a strange one I hadn't heard before.

devoutsalsa · 3 years ago

I’d bet you have a “something” that isn’t obvious to you, or that people don’t compliment you on. And even if you don’t, comparing one’s self to anyone is sure to be depressing.

2143 · 3 years ago

Peter Norvig is at god-tier, comparable perhaps to Torvalds, van Rossum, Wozniak, Stallman, ESR, Kernighan, Ritchie, Gates, Cormen, and many others I've left out.

oska · 3 years ago

Gates is definitely not 'god-tier' (I won't argue about any of the others, although I don't know who Cormen is)

Deleted Comment

armatav · 3 years ago

Yeah and that “something” is a deep interest.

You can do similar things in domains where you’re so interested you can’t think of doing anything else.

anu7df · 3 years ago

Left the punchline for the link huh? Peter Norvig was the "a software engineer" nuf said, Mike drop and all that.

technological · 3 years ago

If you realize that how hard you try you won’t able to do certain things, do ppl keep trying or accept the limitations ?

Deleted Comment

yieldcrv · 3 years ago

The guy making revenue is a better developer than the guy making elegant algorithms

Very few of us actually want to be the guy accumulating accolades for their engineering, we want to take flights to things we actually want to be doing, not opening our laptop once or even bringing it

Everything in between is a coping mechanism

Waterluvian · 3 years ago

You are a very big fish when measured against countless ponds.

Dead Comment

No one posted Martha Snow's epic poem yet?

Eye halve a spelling chequer It came with my pea sea It plainly marques four my revue Miss steaks eye kin knot sea.

Eye strike a quay and type a word And weight four it two say Weather eye am wrong oar write It shows me strait a weigh.

As soon as a mist ache is maid It nose bee fore two long And eye can put the error rite It's rare lea ever wrong.

Eye have run this poem threw it I am shore your pleased two no It's letter perfect awl the weigh My chequer tolled me sew.

Deleted Comment

pornel · 3 years ago

For the record, ChatGPT can actually correct spelling of this (or it has seen spoilers somewhere…)

nathan_f77 · 3 years ago

This is very impressive. I ran it through ChatGPT and got the following response. The only mistake it made is "really" instead of "rarely".

------

> Correct the spelling:

Eye halve a spelling chequer It came with my pea sea It plainly marques four my revue Miss steaks eye kin knot sea.

Eye strike a quay and type a word And weight four it two say Weather eye am wrong oar write It shows me strait a weigh.

As soon as a mist ache is maid It nose bee fore two long And eye can put the error rite It's rare lea ever wrong.

Eye have run this poem threw it I am shore your pleased two no It's letter perfect awl the weigh My chequer tolled me sew.

> I have corrected the spelling errors in the poem:

"I have a spelling checker

It came with my PC

It plainly marks for my review

Mistakes I cannot see.

I strike a key and type a word

And wait for it to say

Whether I am wrong or right

It shows me straight away.

As soon as a mistake is made

It knows before too long

And I can put the error right

It's really ever wrong.

I have run this poem through it

I am sure you're pleased to know

It's letter perfect all the way

My checker told me so."