> “Boiling water” isn’t “water that happens to be boiling.” It’s a hazard, a cooking stage, a state of matter
I guess we'll have to disagree then, because "boiling water" is "water that's boiling" to me. It's not a different state of matter to "water", that would be "steam". It being a hazard doesn't mean it's a singular concept, same as "wet floor"
Yeah, if "boiling water" is one word, what about boiling sugar? Boiling milk? Boiling volcano? Boiling soup?
Adding two words together creates a new and different concept. The permutations necessary to represent every concept ever formed by combining two or more different words would be endless.
Some of them on the list, like black hole, do make sense. That's a very distinct thing. It's not a hole in the conventional sense and it's not really black. Boiling water, though, is water. And it's boiling.
Norwegian is almost as compound-happy as German, and we could've filled many volumes with compounds. But for a compound to enter the dictionary, it generally needs to have a meaning that is non-obvious from the individual parts, at least to some people, and typically a non-obvious meaning if interpreted as two separate words.
Take "akterutseilt", for example. "Akterut" means behind, aft. "Seilt" means sailed. "Behind sailed" helps as a way to remember it, but it's not obvious whether it's strictly a sailing term, or means that you've been left behind or have left someone else behind.
In this case if you say someone has been akterutseilt, it means they've been metaphorically left behind, often by their own failure to keep up.
Those kinds of compounds deserve dictionary entries whether they are actually written in two words or one, because they function as a single unit however they are written.
I think black hole is a perfect example in English. And in fact, this is a compound that is written in two words in Norwegian as well, but is in Norwegian dictionaries despite that[1] as "svart hull".
> Adding two words together creates a new and different concept. The permutations necessary to represent every concept ever formed by combining two or more different words would be endless.
May I introduce you to the German language?
We have "Gesundheitszeugnis" (health certificate) and "bärenstark" (strong as a bear), and of course "[der] Donaudampfschifffahrtsgesellschaftskapitän" ([the] Danube Steamship Navigation Company Captain) and "[das] Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz" ([the] cattle marking and beef labeling supervision duties delegation law).
Boiling water is not a word. The phrase contains two words.
German has no single word for "boiling water" either; it also uses two words, an adjective and a noun. But the German language has the principle of composite words. As a consequence, there is an infinite number of German words.
"Hackernewsleser" would be a word I just made up but every German can understand. A reader of Hackernews. Obviously this makes a dictionary tricky. And it has been a big problem for spell corrections in early MS Word Software.
Agree. “boiling water” is such a staggeringly terrible example for TFA to have opened with.
“Honey, I’ve overheated the fondue! The problem is I can’t describe the liquid because English completely lacks any word that might be apposite in this situation other than the newly-minted ‘boiling water’.”
“It’s a problem. Maybe you could call it ‘boiling water that happens to be quite cheesy’. It’s not great, but it’s the best we can do.”
> Traditional dictionaries skip almost all such phrases, because they contain spaces.
Yes, because they're phrases, not words. I don't even understand what's surprising about this. Sure, the entire article talks about how dictionaries contain _some_ phrases; but it's clear it's not many of them. Dictionaries are for words, not phrases.
Technically they are both phrases and words. You can call them lexemes if you want to avoid confusing the computer programmers who do not understand that life isn't binary.
Boiling water is mostly the same as boiling anything. So I would just have "boiling". No need for "boiling water". I see no reason why boiling water could not just be covered by whatever general entry covers boiling.
The reason is the same reason for why the word "hot water" is found in the dictionary: Because it has picked up other meaning.
The word "boiling water" is not currently found in the dictionary because the meaning has not been considered widespread or significant enough to justify inclusion. The article is pondering what line exactly defines widespread or significant.
As an idiomatic expression, "Hot water" = "trouble".
Are there idiomatic expressions for warm/cold/dirty water, which mean something other than a literal adjective describing the temperature or condition of water?
Agree. You can of course treat "Boiling water" in its gerund form where it functions as a noun:
"Boiling water should be performed in a metal pot".
> It’s a hazard, a cooking stage, a state of matter
All of these are ancillary and depend on context, but in every one of these downstream cases the same underlying process is happening: the water is boiling.
I would have agreed with you before they pointed out that "frozen water" gets a word: ice. Honestly, I think it's reasonable: people deal with frozen water far more than they do boiling water, but it changes it from a case of "what are they talking about?" to "okay, where do we draw the line?" for me.
But water that has boiled into gas also gets a word: steam.
As far as I'm aware, there is no separate word for freezing water -- i.e. water that is very cold and will, if it continues to get colder (and has something to crystallise around), turn into ice.
So the symmetry seems complete: ice -> freezing water -> water -> boiling water -> steam.
Frozen water represents a state change and that different state commonly gets its own word: ice/water/steam equates to solid/liquid/gas
Boiling/freezing water represents the state of the liquid, not the transition. It's descriptive. Water boils away into steam, or freezes into ice.
Should we consider lukewarm water also singular? What about body-temperature water? Cool water? It makes sense not to treat adjectives/descriptive words combined with the subject as singular because the definition already exists in the root of the words (meaning of adjective word + meaning of subject word). Blue clay is another example: why would that be a singular?
It really only makes sense to me in the rare cases where the combination represents something different from, or not obvious from, the combined meanings of the two words (e.g. to 'give up').
Ice, slush, sleet, snow, graupel, hail... And within there is a subtype "black ice", a compound noun that isn't really just a description (it's not black, it's nearly invisible - a similar sense as another one, "black hole", which you'd never figure out from the components alone).
We have a lot of words for "frozen water" because it takes a lot of forms. As far as I know "boiling water" is only one thing so we've never needed additional words to distinguish it.
I’m so glad I’m not going insane. I don’t see any examples on that site that I agree are ‘one word’. Sure they’re singular concepts but so what? Are we going to have singular words to describe all adjective noun pairs now?
A compound word isn't just a phrase. The latter is a group of words that indicate a single concept. The former is a new word that has a distinct meaning from the subwords that compose it. "I love you" is an example of a clausal phrase. The meaning is entirely evident from the words that compose it. In contrast, a "hot dog" is not a particularly warm canine, and has its own OED entry [0] as a compound word.
And some of the entries on this list are wrong. "Good night" exists in OED as "goodnight" [1] because there are multiple ways it's used. One is the clausal phrase "I hope you have a good night", which can be modified by changing the adjective, e.g. "great night" or "terrible night". "Goodnight" the bedtime ritual can't be modified the same way, so OED chooses to write it as a compound word without spaces.
Surprised that no comment mentioned that there is a standard term (not a word :P) for the set of words that denominates a particular concept: nominal syntagm. Such as "boiling water" and also "that green parrot we saw yesterday over the left branch".
Also the slider examples are abysmal. "I love you", "Go home" and "How are you" are not words by any stretch of imagination. For someone who makes word games, I don't see a particularly deep love of words here.
Added a note: "'I love you' isn't opaque, but it's tight enough to put on a tile." The familiar end of the spectrum picks up collocations that are transparent but loaded — I'm not claiming they're words in the traditional sense, but they're useful vocabulary for word games, which is where I'm coming from.
> "'I love you' isn't opaque, but it's tight enough to put on a tile."
The problem with introducing phrase/sentences into a word game (let's take Scrabble) is that you'd spend half the night with your friends arguing over what is and is not acceptable with the only litmus test being its... corpus frequency?
Funnily enough, "nominal syntagm" is, itself, not in the OED or Wiktionary. But Wiktionary has "syntagme nominal" as the French translation for "noun phrase".
You really have to love the human messiness of language!
A nominal syntagm is a somewhat overlapping concept, but deviates slightly from the direct discussion taking place. The more appropriate standard term here is: open compound word. Or, as one might say casually: word.
> There are nearly half a million compound phrases that aren’t in any dictionary—simply because they contain spaces. “Boiling water.” “Saturday night.” “Help me.”
I would hope that none of those examples were taking up space in a dictionary.
It's quite interesting that "boiling water" in many Slavic languages is actually a separate word (and not derived from "water", but from "boiling"; similar to how the author mentions "ice" being used instead of "frozen water").
It was mentioned in other comments but boiled water is steam, and frozen water is ice. We do not have separate words for freezing water or boiling water.
In the Slavic languages, do they have a different way to describe boiling or freezing milk, or any other liquid?
I mean it’s interesting that this is generally the case with many (or even most) words across languages… But I’d wager it’s more the norm than the exception, so I don’t know if “boiling water” is that interesting of an example.
This was a great detail — added Russian kipyatok and Polish wrzątok to the article as evidence that "boiling water" carries enough conceptual weight that other languages crystallized it into a single word
The rest of the article did a good job explaining that. I just think those were terrible examples for the introduction. I think "shut up", "good night", and "hot dog" would have really got the point across better, but those might already be in dictionaries.
The first two I kind of understand what the author means. But "help me" and "severe pain" made me think that I'm just not the right public for this text.
While 'this analysis would not have been possible without LLM', I am not sure the LLM analysis was well reviewed after it was done. From the obscure/familiar word list, some of the n-grams, e.g. "is resource", "seq size", "db xref", surely occur in the wild (as we well know), but I doubt we can argue that they are missing from the dictionary. Knowing the realm, I would argue none of them are words, not even collocations. If "is resource" is one, why not "has resource"?
So while the path is surely interesting, this analysis is missing scrutiny, which you would expect from a high-level LLM analysis.
The very bottom of the slider is there to illustrate where LLM artifacts and Wiktionary noise live — it's not presented as legitimate vocabulary. The slider lets you see the full quality gradient, including where it breaks down.
That's not really mentioned in the article, though. As far as the article is concerned, the right side of that slider is valid-but-possibly-too-rare-to-be-interesting, when in fact it's just garbage. This does not sell the concept well.
In addition to what others have pointed out, many of these aren't actually missing from traditional dictionaries: they're just inflected differently. So your example lists phrases like "operating systems", "immune systems" and "solar systems" as missing from traditional dictionaries, but at least the online OED and M-W have "operating system", "immune system" and "solar system" in them. It's just that your script is apparently listing the plural as a separate phrase.
On languages other than English: in general, different languages do word division very differently. At least in German and Dutch, many of those phrasal verbs are separable, meaning that they are one word in the infinitive but are multiple words in the present tense. So for example, where in English you would say "I log in to the website", in Dutch it would be "Ik log in op de website". "Log in" is two words in both cases, but in Dutch it's the separated form of the single-word separable verb inloggen ("I must log in now" = "Ik moet nu inloggen"). The verb is indeed separable in that the two words often don't end up next to each other: "I log in quickly" = "Ik log snel in".
Dutch, like German, has lots of compounds. But there are also agglutinative languages, which have even more complex compound words, perhaps comprising a whole sentence in another language. E.g. (from Wikipedia) Turkish "evlerinizdenmiş" = "(he/she/it) was (apparently/said to be) from your houses" or Plains Cree "paehtāwāēwesew" = "he is heard by higher powers"; and these aren't corner cases, that's how the language works.
Collocation dictionaries are lists of collocations. The reason collocations are absent from single-word dictionaries is that there are about 25 times as many collocations as single words.
The author of this article just hasn’t been taught how to use a dictionary. The words aren’t “missing”, they’re just indexed under one of their parts. For example “wait upon” would be located within the entry for “wait”.
[1] https://ordbokene.no/bm/svart%20hull
- Don't put your hand in water that's boiling,
- Add the pasta to water that's boiling,
- That saucepan is full of water that's boiling.
If "boiling water" were a distinct word, all of these sentences would change meaning compared to their idiomatic counterparts.
Are there idiomatic expressions for warm/cold/dirty water, which mean something other than a literal adjective describing the temperature or condition of water?
Depending on the context you got sewage, slush, runoff, murk, waste etc.
All of these are ancillary and depend on context, but in every one of these downstream cases the same underlying process is happening: the water is boiling.
Not necessarily. It might refer to heating water to bring it to a boil.
Q. What are you doing over there?
A. Oh, just boiling water.
https://www.ritasice.com
Ice cream is a shortened pronunciation.
The chef was out the back, boiling water.
The chef was out the back. Boiling water had spilled everywhere.
The seas had turned to boiling water.
I dunno, could be down to interpretation.
Which is why "state of matter" is, itself, often in the dictionary, possibly to the dismay of the Team Single Word in this comment section.
[0] https://www.oed.com/dictionary/hot-dog_n
[1] https://www.oed.com/dictionary/goodnight_n
Edit: Obligatory reference to Borges's Tlön: https://en.wikipedia.org/wiki/Tl%C3%B6n,_Uqbar,_Orbis_Tertiu...
I guess Saturday night could have some extra details explaining the context around our standard work week. But even that is a stretch.
Yeah, I agree! Fuck ICE!
Presumably if the word thesaurus was actually "synonym dictionary" it would likewise be absent.