Nice idea, but a naive implementation, which leads to output that is unconvincing as hypothetical English words. I had a brief look, and it seems to be proportionally selecting and sticking together sequences of letters sampled from English words (lib/word-probability.ts). This doesn't take into account syllable boundaries, the way the English spelling system maps between phones/phonemes, or the phonotactic properties of English, which is why the output looks unconvincing.
A better approach would be to use a Markov chain built by sampling English text letter by letter. An even better approach would be to build your stats from some source of English words in IPA transcription with syllable boundaries etc. marked, then map from IPA to spelling via some kind of lookup table. We use a similar process in reverse in my research group for building datasets for Bayesian phylogenies of language families.
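The letter-by-letter Markov chain idea can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation; the `WORDS` list is a tiny placeholder standing in for a real English word list, and in practice you'd train on tens of thousands of words:

```python
import random
from collections import defaultdict

# Placeholder training data; swap in a real English word list.
WORDS = ["apple", "banana", "cherry", "grape", "orange", "melon", "lemon", "peach"]

def build_chain(words):
    """Count letter-to-letter transitions, with '^' and '$' as word boundaries."""
    chain = defaultdict(list)
    for word in words:
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            chain[a].append(b)
    return chain

def generate(chain, max_len=12):
    """Walk the chain from the start marker until the end marker is drawn."""
    out, cur = [], "^"
    while len(out) < max_len:
        cur = random.choice(chain[cur])  # proportional to observed counts
        if cur == "$":
            break
        out.append(cur)
    return "".join(out)

chain = build_chain(WORDS)
print(generate(chain))
```

Storing repeated successors in a list makes `random.choice` naturally sample in proportion to observed frequency, which is the whole trick.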
Clearly you are far more of a linguist than I am, but even from a lay perspective I had a similar impression; I reloaded the page several times and none of the words struck me as remotely plausible English. These are worse than most Hollywood sci-fi words/names.
A significant improvement on letter-by-letter, and not much harder, is to use n-grams: "two letters to predict the third", etc. Still not "industry grade", but the results start making more sense.
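The "two letters to predict the third" idea is just an order-2 Markov chain over letters. A minimal sketch, again with a small hypothetical word list as stand-in training data:

```python
import random
from collections import defaultdict

# Placeholder training data; a real run would use a full dictionary.
WORDS = ["through", "thought", "string", "strong", "bright", "light", "night", "fright"]

def build_ngram_chain(words, order=2):
    """Map each `order`-letter context to the letters observed after it."""
    chain = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"
        for i in range(len(padded) - order):
            chain[padded[i : i + order]].append(padded[i + order])
    return chain

def generate(chain, order=2, max_len=12):
    """Slide the context window one letter at a time until '$' is drawn."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = random.choice(chain[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)

chain = build_ngram_chain(WORDS)
print(generate(chain))
```

Raising `order` makes outputs more English-like but also more likely to just reproduce training words, so 2 or 3 is a common compromise.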
A letter-by-letter Markov chain would lead to similarly unconvincing results. As you said, groups of letters matter much more than single letters. If you know anything about Korean, its writing system actually groups letters into syllable blocks that way. If one could build such a Markov chain for English, it would be very convincing, I think.
You should check out the VOLT paper, I think it would work well. It's a new technique for splitting up a vocabulary into subwords while minimizing entropy. These subwords could then be mixed and matched, maybe by a neural model, for better results.
I'm sure some people don't hear it, like "the dress", but for some of us it sounds like an uncanny valley of English: close but not quite, just close enough that our brains trip over it and struggle to comprehend it because it is so near the real thing.
As well as the associations with [1], this also made me think of one of my favourite essays, "Horsehistory study and the automated discovery of new areas of thought"[2]
Sorry, after a few refreshes not a single word was anything that looked remotely like English. It all looked like complete gibberish or words in another language. Most of them weren’t even pronounceable.
I think ailml is the offending sequence here. It's pretty difficult to say and doesn't sound like something that you'd find in a native English word.
There's calmly which is similar, to be fair, but there's something about the tongue positions for ailml that I find noticeably more difficult, it's too far forward.
Can anyone tell me more about how this works? Most of these don't resemble English words at all to me lol, wondering what the generative procedure/parameters are in the first place
Vocabulary Learning via Optimal Transport for Neural Machine Translation - https://arxiv.org/abs/2012.15671
https://jingjing-nlp.github.io/volt-blog/
https://github.com/Jingjing-NLP/VOLT
https://www.dictionary.com/browse/minable
https://www.youtube.com/watch?v=-VsmF9m_Nt8
Tangentially related - this is how I discovered Nightwish some 15 years ago: https://www.youtube.com/watch?v=gg5_mlQOsUQ
I know this comment doesn't add anything of value to the discussion per se, but that's given me the biggest laugh I've had in months.
Nightwish came into my life in the 00s, and I couldn't tell you one song meaning, yet I love the sound.
This is just a perfect video, thank you for sharing.
"English" starts at around 0:48, but the others are also worth a listen!
https://www.youtube.com/watch?v=Vt4Dfa4fOEY
[1] https://www.thisworddoesnotexist.com/ [2] https://interconnected.org/home/2021/06/16/horsehistory
Jabberwocky
’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

“Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!”

He took his vorpal sword in hand;
Long time the manxome foe he sought
So rested he by the Tumtum tree
And stood awhile in thought.

And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!

One, two! One, two! And through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.

“And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
He chortled in his joy.

’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

</obligatory> Searching many years of /. posts and other results might find you a readable version.
I like these, especially the last.
http://www.thisworddoesnotexist.com/
It also fakes the definition.
But if you want to write some Vogon like poetry, the words generated by Fakelish might be just fine.
dyn·o·derma
a slender, membranous musclelike structure, believed to represent a cross between a cranium and the external spaces of fish and invertebrates, supporting the glans in most vertebrates
"a dynoderma is thought to have existed in all living organisms"
Basically a big probability map. I'm guessing this was machine generated though, and it isn't clear to me how that was done.