Encoding them into a trie like this would still be a good way to distribute the result, but you don't have to rely on the trie also being a good way to guess the declensions.
I would not be confident enough myself to add the data myself since I'd probably be wrong a lot of the time. When reviewing the results for the top 100 unknown names I frequently got results that I thought _might_ be wrong, but I wasn't sure. For those, I looked up similar names in DIM to verify, and often thought "huh, I would not have declined those names like this". For that reason, I rely on the DIM data as the source of truth since it's maintained by experts on the language.
…
> But that quickly breaks down. There are other names ending with “ður” or “dur” that follow a different pattern of declension
My “everything should be completely orderly” comp-sci brain is always triggered by these almost trivial problems that end up being much more interesting.
Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix? If one wanted to improve upon your work for unknown names, rather than consider the letters used, would you have to do some NLP on the name to get a representation of the pronunciation and look that up (in a trie or otherwise)?
- Ástvaldur -> ur,,i,ar - Baldur -> ur,ur,ri,urs
The "aldur" ending is pronounced in the exact same manner, but applying the declension pattern of "Ástvaldur" to "Baldur" would yield:
- Baldur - Bald - Baldi - Baldar
The three last forms feel very wrong (I asked my partner to verify and she cringed).
Spoken Icelandic is surprisingly close to its written form. I wouldn't expect very different results for the trie if a "phonetic" version of names and their endings were used instead of their written forms