The longest word you can type on the first row

I think we can do better than 11 characters!

I happen to have a corpus which includes pretty much every word ever written in a book, including many misspelled, mistranscribed, or otherwise non-dictionary words.

After eliminating nonsense, non-English, or other mistakes, I think the real winner, coming in at 12 characters, is:

    teetertotter

That's a relatively common word. Even though it's usually seen hyphenated, the unhyphenated form is recognized by all the online dictionaries I found.

----

And some other candidates, just for fun, in the 13 or 12 character range:

    proproprietor
    priorityqueue
    reporterette
    preprototype

"proproprietor" seems more like a misspelling. Should have a hyphen, or be two words.

"priorityqueue" is of course familiar to hackers here, but is more of a jargon term, and is only concatenated due to appearing in source code. Invariably it's two words when actually written out.

"reporterette" is antique, but appeared in a NYTimes headline as late as 2018 - the author reflected on her career, including sexist epithets. https://www.nytimes.com/2018/12/02/opinion/george-hw-bush-ma...

"preprototype" is used exactly as is, in lots of scientific papers, up to the current day. That's a pretty good one too, and could tie with "teetertotter", but it's verging on jargon.

soultrees · 2 years ago

How did you scrape that data? How do you store and retrieve it? Is it just a standard db or a vector db?

Sorry for the questions, but it seems like an interesting, yet probably common data set and as someone who is venturing down this path, I’d like to learn more about building my own dataset similar to this from scratch.

neilk · 2 years ago

> standard db or vector db

lol, it's a 42MB text file from Google Books Ngrams.

The format looks like this:

    $ head words-all.txt

    a       14219615690
    a!      196012
    a"      84
    a'      47713
    a'0     3036
    a'1     4070
    a'10    99
    a'11    56

I queried it with perl and sort.

    $ time perl -wlane 'if ($F[0] =~ /^[qwertyuiop]+$/) { print length($F[0]), "\t", $F[0] }' words-all.txt | sort -rn > qwertywords

    real 0m1.915s
    user 0m1.896s
    sys 0m0.025s
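For anyone without perl handy, roughly the same query can be sketched in Python. This assumes each line holds a word and a count separated by whitespace, as in the sample above. The function name `top_row_words` and the counts in the sample are made up for illustration.

```python
import re

# Matches words built only from the QWERTY top-row letters.
TOP_ROW = re.compile(r"^[qwertyuiop]+$")

def top_row_words(lines):
    """Return (length, word) pairs for top-row-only words, longest first."""
    hits = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[0]  # first column is the word, second is the count
        if TOP_ROW.match(word):
            hits.append((len(word), word))
    # Longest first, roughly what `sort -rn` does with the length column.
    return sorted(hits, key=lambda pair: pair[0], reverse=True)

# Hypothetical sample in the same "word<TAB>count" shape as words-all.txt.
sample = ["a\t14219615690", "a!\t196012", "typewriter\t12345", "pepper\t678"]
print(top_row_words(sample))
```

To run it on the real file, pass an open file handle instead of the sample list, e.g. `top_row_words(open("words-all.txt"))`.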

I can't remember exactly which file I downloaded, but according to my notes I got it from here back in 2012 or so.

https://storage.googleapis.com/books/ngrams/books/datasetsv2...

There seems to be a newer corpus published in 2020:

https://storage.googleapis.com/books/ngrams/books/datasetsv3...

Deleted Comment

On any macOS computer (or replace /usr/share/dict/words with your own word list):

  grep '^[qwertyuiop]*$' /usr/share/dict/words | \
  awk '{ print length(), $0 }' | \
  sort -n

juujian · 2 years ago

Works for Ubuntu, too. My Colemak self can only get fluffy (6) from the front row, that's the longest word. Middle row really shines though, I can get hardheartedness (15) or assassinations (14).

tiltowait · 2 years ago

Interesting that your dictionary doesn't have "tenderheartedness", which is two letters longer.

seabass-labrax · 2 years ago

Gulp, fluffy puppy pug! Yup. Fly, ugly pup, fly.

I note that hardheartedness and hotheadedness threaten the darnedest nonstandard assassinations. Such sordidness!

travisgriggs · 2 years ago

Nice.

Middle/Second row result is

8 flagfall "Flagfall, or flag fall, is a common Australian expression for a fixed start fee, especially in the taxi, haulage, railway, and toll road industries."

8 galagala "A name in the Philippine Islands of Dammara Philippinensis, a coniferous tree yielding dammar-resin."

Lower/Third Row: - None

There are no vowels on the bottom row. So no words. I've been typing at ~ 50wpm for 30 years, and I don't think I'd ever actually consciously recognized this fact about the bottom row.

(standard US keyboard layout)

JoshTriplett · 2 years ago

For QWERTY, I found two nine-letter words using only the middle row: halakhahs and haggadahs.

And yeah, nothing in the bottom row other than acronyms and similar pseudo-words.

Symbiote · 2 years ago

Dvorak:

  ',.PY FGCRL   pry or Lyly
  AOEUI DHTNS   tendentiousness
  ;QJKX BMWVZ   xxxv, www, bbq or mm

After 'apt install wbritish-insane'

  pyrryl (a chemical group)
  unostentatiousnesses (and anaesthetisations is good too)
  mmmm
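The per-layout searches in this subthread can be folded into one small Python sketch. The row strings below are my own transcription of the standard QWERTY, Dvorak, and Colemak letter rows (an assumption worth double-checking against your keyboard), and `longest_per_row` is a made-up helper name.

```python
# Letter keys per row, top to bottom. Punctuation keys are left out
# because they do not appear in ordinary dictionary words anyway.
LAYOUTS = {
    "qwerty": ("qwertyuiop", "asdfghjkl", "zxcvbnm"),
    "dvorak": ("pyfgcrl", "aoeuidhtns", "qjkxbmwvz"),
    "colemak": ("qwfpgjluy", "arstdhneio", "zxcvbkm"),
}

def longest_per_row(words, layout):
    """For each row of the layout, return the longest words typable on it."""
    results = []
    for row in LAYOUTS[layout]:
        allowed = set(row)
        # A word is typable on a row if all of its letters are on that row.
        typable = [w for w in words if w and set(w) <= allowed]
        best = max((len(w) for w in typable), default=0)
        results.append(sorted(w for w in typable if len(w) == best))
    return results

# Small hand-picked sample taken from words mentioned in this thread.
words = ["typewriter", "haggadahs", "tendentiousness", "pry", "fluffy",
         "hardheartedness", "mm"]
for name in LAYOUTS:
    print(name, longest_per_row(words, name))
```

Swap the sample list for the lines of /usr/share/dict/words (lowercased and stripped) to reproduce the searches above.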

tedunangst · 2 years ago

Knuth vs McIlroy all over again.

IshKebab · 2 years ago

Just use https://www.visca.com/regexdict/

    $ grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
    11 rupturewort
    11 proterotype
    11 proprietory
    10 typewriter
    10 tetterwort
    10 repetitory
    10 repertoire
    10 proprietor
    10 pretorture
    10 prerequire

    $ grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
    10 typewriter
    10 repertoire
    10 proprietor
    10 perpetuity
    9 typewrote
    9 typewrite
    9 territory
    9 repertory
    9 puppeteer
    9 prototype

    ↑7↑{⍵[⍒≢¨⍵]}words/⍨{''≡⍵~'qwertyuiop'}¨words
    ┌→─────────┐
    ↓peppertree│
    │perpetuity│
    │prerequire│
    │proprietor│
    │repertoire│
    │typewriter│
    │etiquette │
    └──────────┘

    % grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
    11 rupturewort
    11 proterotype
    11 proprietory
    10 typewriter
    10 tetterwort
    10 repetitory
    10 repertoire
    10 proprietor
    10 pretorture
    10 prerequire

    C:\>grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
    Bad command or file name
    Bad command or file name
    SORT: Too many parameters
    Bad command or file name

In the start The One has risen the stars and the earth. The earth had no order, and nothin' resided there; and shade resided on the nonendin' 'neath. And The One rided on the seas. Then The One said: "I desire it to shine"; and it shone. And The One had seen the shine, that it's neat; and The One sorted the shine on one side, and the shade on the other. The One then denoted the shine and the shade. So the nite and the shine that are date no. one had ended.

bnjmn · 2 years ago

susam · 2 years ago

On macOS version 12.1 Monterey:

On Debian GNU/Linux 11 (bullseye):

jodrellblank · 2 years ago

Dyalog APL, using the enable1 wordlist. I don't know its origins, but you can get it from Peter Norvig's website https://norvig.com/ngrams/enable1.txt or various GitHubs and Gists:

Reading from the right, "test each word by removing 'qwertyuiop' and see if it leaves an empty string, use the test results to filter the input word list, descending-sort the length of each word and use that to arrange(index) the remaining words, flatten the array and take the top 7".

(Longest from the middle row is 'haggadahs' then 'alfalfas', third row is 'mm')
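For readers who don't speak APL, the same steps can be spelled out in Python. This is only a step-by-step mirror of the description above, not a translation anyone in the thread posted; `longest_top_row` and the tiny word list are hypothetical.

```python
def longest_top_row(words, keys="qwertyuiop", n=7):
    """Filter to words typable with `keys`, longest first, keep the top n."""
    # Test each word: remove the key letters and see if nothing is left.
    keep = ["".join(c for c in w if c not in keys) == "" for w in words]
    # Use the test results to filter the input word list.
    filtered = [w for w, ok in zip(words, keep) if ok]
    # Order by length, longest first. sorted() is stable, so words of
    # equal length keep their input order (alphabetical for a sorted list).
    ordered = sorted(filtered, key=len, reverse=True)
    # Take the top n.
    return ordered[:n]

# Hypothetical, already alphabetized sample.
sample = ["etiquette", "peppertree", "perpetuity", "quiz", "typewriter", "zebra"]
print(longest_top_row(sample, n=4))
```

The stable sort is why the ten-letter words come out in alphabetical order before "etiquette" in the APL output as well.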

For Debian, try installing one of the larger wordlists, such as wamerican-huge or wbritish-huge; those have "rupturewort".

codetrotter · 2 years ago

FreeBSD 13.2

So it seems that, in addition to macOS having parts of its kernel based on FreeBSD, its wordlist at /usr/share/dict/words is also very similar to FreeBSD's :) Perhaps even the same?

p1mrx · 2 years ago

MS-DOS 6.22

kazinator · 2 years ago

Awk greps!

   awk '/^[qwertyuiop]+$/ {print length, $0}'

kristopolous · 2 years ago

Here's something you may not know: the *-insane dictionaries, which are giant, are derived from OCR output and are known to contain lots of errors.

I found a few earlier this year and was going to file a bug, so I did some research and found out that this is known and expected behavior.

If the computer, say, reads "stubborn" as "stubbum", the smaller dictionaries are the ones that have been cross-checked to filter those out. The insane ones have not. It's a good name: "lack of sanity checks".

Here's an example word I found, "suabilities". You'll find it only on wordlist sites that used this wordlist and I guess, now here.
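If you want to use a huge list for puzzles but flag likely OCR junk, one simple sanity check is to compare hits against a smaller curated list. A minimal sketch in Python follows. It is my own illustration, not how the wordlist maintainers actually build their lists, and `split_by_reference` plus both sample lists are made up.

```python
def split_by_reference(candidates, reference):
    """Split candidates into (confirmed, suspect) using a trusted word set."""
    trusted = {w.lower() for w in reference}
    confirmed = [w for w in candidates if w.lower() in trusted]
    suspect = [w for w in candidates if w.lower() not in trusted]
    return confirmed, suspect

# Hypothetical hits from a huge list, checked against a small curated one.
big_list_hits = ["stubborn", "stubbum", "suabilities"]
curated = ["stubborn", "stubble"]
print(split_by_reference(big_list_hits, curated))
```

A word landing in the suspect pile is not proof of an error, since the curated list may simply be missing a rare but real word.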

colinchartier · 2 years ago

Reminds me of the ghost Unicode character saga: https://www.dampfkraft.com/ghost-characters.html

Just saw this. I've got no idea how kanji OCR works, but I do know enough Japanese to know what most of those characters are attempting to refer to; my penmanship has certainly been that bad. I still don't understand how it would make its way into the standard, unless that part wasn't written by someone who is competent in Japanese.

I wonder how often that happens. Surely there are tons of people dealing with Japanese text who can't read it and just use diligence to make sure the "letters are the same".

schoen · 2 years ago

I've used the insane dictionaries a number of times for puzzle stuff and I never knew that they were derived from OCR output. Thanks for mentioning that!

You might find the... 'translation'[1] of Genesis 1 using only keys on the Colemak home row interesting:

[1]: https://colemak.com/Fun

If you enjoy that, you might also enjoy this version that I wrote

https://godexperiment.org/beginnings-an-alliterative-rewrite...

It was inspired by these versions:

https://llamasandmystegosaurus.blogspot.com/2017/05/alpha.ht...

https://calvinballing.github.io/saga/

SethTro · 2 years ago

For Dvorak with a little assist from unix

First row

$ awk '/^[,.pyfgcrl]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head

3 pry / ply / fry / cry

Second row

$ awk '/^[aoeuidhtns]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head

15 tendentiousness

14 assassinations

13 instantaneous

13 insidiousness

Third row

$ awk '/^[;qjkxbmwvz]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head

4 xxxv

3 xxx

3 xxv

2 xx

rwl4 · 2 years ago

Hmm. My Mac shows these:

    [...]
    15 sententiousness
    15 sinuatodentated
    15 soundheadedness
    15 tendentiousness
    15 uninitiatedness
    16 antisensuousness
    16 ostentatiousness
    17 dissentaneousness
    17 instantaneousness
    18 unostentatiousness

Nekhrimah · 2 years ago

Not sure about that third row, the "A" is in the second row.

Awkward, now the third row doesn't return anything

    $ awk '/^[;qjkxbmwvz]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head
    4 xxxv
    3 xxx
    3 xxv
    2 xx
    2 xv

lovehashbrowns · 2 years ago

I tried to do some other fun things like going row by row with each row only contributing one letter and seeing what’s the longest word I could come up with.

If I start at the top row and go down, I can make TAXES but couldn’t think of a longer word. The third row having no vowels makes it so hard.

Starting at the bottom row and going up, I came up with CHICKEN which is delicious and neat that it ends where it started. Chickens is longer but ends on the middle row which is not as neat I feel like :(

> If I start at the top row and go down, I can make TAXES but couldn’t think of a longer word. The third row having no vowels makes it so hard.

A dictionary search turned up "paxwaxes" as the longest word I could find that starts in the top row and goes down, wrapping around to the top every three letters.

> Starting at the bottom row and going up, I came up with CHICKEN which is delicious and neat that it ends where it started. Chickens is longer but ends on the middle row which is not as neat I feel like :(

Chickens is indeed the longest.

If you start at the bottom row and go up-and-down: cataclysms, or catamarans.

If you start at the top row and go down-and-up: escapable.

If you start in the middle and go down-and-up: scarabaean

If you start in the middle and go up-and-down, I didn't find anything longer than 7 letters, and there were 39 seven-letter words, including "discard", "grandpa", and "stacked".
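The row-hopping rules above can be checked mechanically. A minimal Python sketch, assuming the standard US QWERTY letter rows; `follows_rows` and the pattern names are made up here, with rows numbered 0 (top) to 2 (bottom).

```python
from itertools import cycle

ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")  # QWERTY, top to bottom
ROW_OF = {ch: i for i, row in enumerate(ROWS) for ch in row}

def follows_rows(word, pattern):
    """True if the i-th letter sits on row pattern[i % len(pattern)]."""
    # Characters that are not on any letter row map to None and never match.
    return all(ROW_OF.get(ch) == row
               for ch, row in zip(word.lower(), cycle(pattern)))

TOP_DOWN = (0, 1, 2)           # top, middle, bottom, then wrap to the top
BOTTOM_UP = (2, 1, 0)          # bottom, middle, top, then wrap to the bottom
BOTTOM_UP_DOWN = (2, 1, 0, 1)  # bounce instead of wrapping
TOP_DOWN_UP = (0, 1, 2, 1)
MIDDLE_DOWN_UP = (1, 2, 1, 0)
MIDDLE_UP_DOWN = (1, 0, 1, 2)

for word, pattern in [("taxes", TOP_DOWN), ("chickens", BOTTOM_UP),
                      ("cataclysms", BOTTOM_UP_DOWN), ("escapable", TOP_DOWN_UP)]:
    print(word, follows_rows(word, pattern))
```

To find the longest match in a word list, something like `max((w for w in words if follows_rows(w, TOP_DOWN)), key=len, default=None)` should do.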

DylanDmitri · 2 years ago

Related, is there a high quality plaintext dictionary file for running similar searches? I’ve spent several hours but couldn’t find one that’s both comprehensive and accurate.

aidenn0 · 2 years ago

What are your rules for what counts as a "word"? If you go with the basic scrabble rules (i.e. nothing that would be capitalized or punctuated) then YAWL[1] is pretty good, with the downside being the most recent version I know of is from 2008.

FYI, rupturewort is the sole 11-letter word answer to TFA in YAWL; found using:

    grep '^[qwertyuiop]*$' word.list |while read -r line; do echo "${#line} ${line}"; done |sort -n | tail

[1]: https://github.com/elasticdog/yawl

mminer237 · 2 years ago

https://github.com/dwyl/english-words/blob/master/words_alph...

layer8 · 2 years ago

https://packages.debian.org/bookworm/wordlist

As I linked in another comment, I use "enable1.txt", which is on Peter Norvig's site: https://norvig.com/ngrams/enable1.txt

It's 170k English words, with no placenames or people's names or anything like that, but it does include some whose validity I question.

praash · 2 years ago

Some common Linux distributions have packages that provide word list files to /usr/share/dict/ in several languages. It's likely for English files to be preinstalled. I've had plenty of fun practising regex and pipes with these word lists!

koolba · 2 years ago

Dictionary or word list?

/usr/share/dict/words is always destination zero for words.

I'd recommend the SCOWL wordlist, which also has usage data (so you can decide how rare of words you want to include).