Readit News
alexharri commented on Compressing Icelandic name declension patterns into a 3.27 kB trie   alexharri.com/blog/icelan... · Posted by u/alexharri
radpanda · 22 days ago
> There are, in fact, 88 approved Icelandic names with this exact pattern of declension, and they all end with “dur”, “tur” or “ður”.

> But that quickly breaks down. There are other names ending with “ður” or “dur” that follow a different pattern of declension

My “everything should be completely orderly” comp-sci brain is always triggered by these almost trivial problems that end up being much more interesting.

Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix? If one wanted to improve upon your work for unknown names, rather than consider the letters used, would you have to do some NLP on the name to get a representation of the pronunciation and look that up (in a trie or otherwise)?

alexharri · 22 days ago
Hmm, good idea. There are names that have the exact same pronunciation yet have different patterns of declension, for example:

- Ástvaldur -> ur,,i,ar
- Baldur -> ur,ur,ri,urs

The "aldur" ending is pronounced in the exact same manner, but applying the declension pattern of "Ástvaldur" to "Baldur" would yield:

- Baldur
- Bald
- Baldi
- Baldar

The three last forms feel very wrong (I asked my partner to verify and she cringed).
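The pattern strings above can be read as four case endings (nominative, accusative, dative, genitive). Below is a minimal sketch of applying one. It assumes the first ending is what gets stripped from the nominative to find the stem, which is a reading inferred from the examples in this thread rather than the library's documented format. `apply_pattern` is a hypothetical helper, not part of any published API.

```python
def apply_pattern(name: str, pattern: str) -> list[str]:
    """Decline `name` with a pattern like "ur,,i,ar".

    Assumed format (inferred from this thread, not a documented spec): four
    comma-separated endings for nominative, accusative, dative and genitive.
    The nominative ending is stripped from the name to get the stem, and each
    ending is then appended to that stem.
    """
    endings = pattern.split(",")
    if len(endings) != 4:
        raise ValueError(f"expected 4 endings, got {len(endings)}: {pattern!r}")
    nominative = endings[0]
    if not name.endswith(nominative):
        raise ValueError(f"{name!r} does not end with {nominative!r}")
    # Slice explicitly instead of name[:-len(nominative)], which breaks for "".
    stem = name[: len(name) - len(nominative)]
    return [stem + ending for ending in endings]


print(apply_pattern("Baldur", "ur,,i,ar"))      # ['Baldur', 'Bald', 'Baldi', 'Baldar']
print(apply_pattern("Baldur", "ur,ur,ri,urs"))  # ['Baldur', 'Baldur', 'Baldri', 'Baldurs']
```

Applying Ástvaldur's pattern to "Baldur" reproduces the wrong forms listed above, while Baldur's own pattern gives Baldur, Baldur, Baldri, Baldurs. With an empty nominative ending the whole name is the stem, so ",,i,ar" turns "Alex" into Alex, Alex, Alexi, Alexar.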

Spoken Icelandic is surprisingly close to its written form. I wouldn't expect very different results for the trie if a "phonetic" version of names and their endings were used instead of their written forms.

dmurray · 22 days ago
For the 800 names that were missing declension data in the database, it seems like the most straightforward thing to do would be to assign their declensions by hand. It shouldn't take a native speaker more than a couple of hours (if some name they haven't seen before is ambiguous, then whatever they guess at least won't sound obviously wrong to other native speakers). Alternatively, it would be very cheap to ask an LLM to do it.

Encoding them into a trie like this would still be a good way to distribute the result, but you don't have to rely on the trie also being a good way to guess the declensions.
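A minimal sketch of the trie-as-distribution idea, assuming a suffix trie keyed on reversed, lowercased names, where each node counts the patterns of the known names that pass through it. This is an illustrative structure, not the library's actual encoding, and `build_suffix_trie` and `lookup` are hypothetical names. Exact lookups of known names return their assigned patterns; an unknown name falls back to the majority pattern at the deepest suffix the trie recognizes.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, Node] = field(default_factory=dict)
    # How many known names passing through this node use each pattern.
    patterns: Counter = field(default_factory=Counter)


def build_suffix_trie(entries: dict[str, str]) -> Node:
    """Insert each name reversed, so names sharing an ending share a path."""
    root = Node()
    for name, pattern in entries.items():
        node = root
        node.patterns[pattern] += 1
        for ch in reversed(name.lower()):
            node = node.children.setdefault(ch, Node())
            node.patterns[pattern] += 1
    return root


def lookup(root: Node, name: str) -> str | None:
    """Follow the name's ending as deep as the trie goes; return the majority pattern there."""
    node = root
    for ch in reversed(name.lower()):
        nxt = node.children.get(ch)
        if nxt is None:
            break
        node = nxt
    if not node.patterns:
        return None  # empty trie
    return node.patterns.most_common(1)[0][0]


trie = build_suffix_trie({"Ástvaldur": "ur,,i,ar", "Baldur": "ur,ur,ri,urs"})
print(lookup(trie, "Baldur"))   # ur,ur,ri,urs
print(lookup(trie, "Xvaldur"))  # ur,,i,ar
```

"Xvaldur" is a made-up string used only to exercise the fallback path: its walk stops on the "valdur" branch, which only Ástvaldur occupies. It says nothing about how a real name should be declined.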

alexharri · 22 days ago
It would be good to cover more names for sure -- that's an ongoing process at DIM. Names are frequently added to the approved list of Icelandic names, so there's always going to be some lag.

I would not be confident enough to add the data myself, since I'd probably be wrong a lot of the time. When reviewing the results for the top 100 unknown names, I frequently got results that I thought _might_ be wrong, but I wasn't sure. For those, I looked up similar names in DIM to verify, and often thought "huh, I would not have declined those names like this". For that reason, I rely on the DIM data as the source of truth since it's maintained by experts on the language.

ryanjshaw · 22 days ago
An interesting article, but I was surprised there was no discussion of what humans do to address this problem.

alexharri · 22 days ago
As a native Icelandic speaker, I have an intuition for how to decline names -- I don't really think about it consciously. I'd assume that for most people it's just pattern matching.

Native speakers very frequently decline names in ways that are not technically perfect but sound correct enough. For example, my name (Alex) should not be declined, but people frequently decline it anyway (Alex, Alex, Alexi, Alexar).

There's some parallel to be drawn with how the compressed trie applies patterns that it's learned to names. That's at least how I thought about it when designing the library.
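One plausible way to compress a suffix trie like the one discussed in this thread (an assumption here, not necessarily what the library does) is to drop any subtree whose nodes can only ever answer with the same pattern as their parent. Lookups that used to end inside the removed subtree now stop at the parent and get the same answer, so results are unchanged while the node count shrinks. `build_suffix_trie`, `lookup`, `prune` and `count_nodes` are hypothetical names for this sketch.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, Node] = field(default_factory=dict)
    # How many known names passing through this node use each pattern.
    patterns: Counter = field(default_factory=Counter)

    def best(self) -> str:
        # Majority pattern at this node; ties go to the pattern counted first.
        return self.patterns.most_common(1)[0][0]


def build_suffix_trie(entries: dict[str, str]) -> Node:
    """Insert each name reversed, so names sharing an ending share a path."""
    root = Node()
    for name, pattern in entries.items():
        node = root
        node.patterns[pattern] += 1
        for ch in reversed(name.lower()):
            node = node.children.setdefault(ch, Node())
            node.patterns[pattern] += 1
    return root


def lookup(root: Node, name: str) -> str:
    """Walk the reversed name as far as possible and answer with that node's pattern."""
    node = root
    for ch in reversed(name.lower()):
        if ch not in node.children:
            break
        node = node.children[ch]
    return node.best()


def prune(node: Node) -> bool:
    """Remove child subtrees that would only ever answer with node.best().

    Returns True if this node's whole subtree collapsed into the node itself,
    i.e. every node that was below it answered with the same pattern.
    """
    for ch in list(node.children):
        child = node.children[ch]
        if prune(child) and child.best() == node.best():
            del node.children[ch]
    return not node.children


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(c) for c in node.children.values())


entries = {"Ástvaldur": "ur,,i,ar", "Baldur": "ur,ur,ri,urs"}
trie = build_suffix_trie(entries)
print(count_nodes(trie))  # 11
prune(trie)
print(count_nodes(trie))  # 7
print([lookup(trie, name) for name in entries])  # ['ur,,i,ar', 'ur,ur,ri,urs']
```

Here the "tsá" tail of "Ástvaldur" (and the "v" above it) is redundant once the walk reaches "valdur", so only the branch point that separates it from "Baldur" survives. Serializing the pruned trie compactly is a separate step not shown.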

u/alexharri

Karma: 687 · Cake day: November 8, 2022
About
Website: https://alexharri.com
GitHub: https://github.com/alexharri