"The spark that led me to create Mini was realizing that a micro-language like TP could actually work: there’s no reason in principle a language with a limited word-count couldn’t have a simple, complete, and unambiguous grammar alongside a vocabulary based on intelligible word roots designed to handle most aspects of everyday discourse...."
Though if you speak Spanish and English you'll probably be able to guess most of the words, so you may find it easier to read initially than Toki Pona.
I've been making a game system in which the core mechanic is using a limited language to describe magical effects, so it's a long way away from the intended purpose for both Toki Pona and Mini. However, I've found that Mini is far easier to work with and more expressive for my purposes because it's easier to put structure into the statements using the particles to indicate what part of speech is intended for each word. The selection of words also seems to be surprisingly well-chosen, because most of my use cases have been pretty straightforward to express.
I haven't really tried to limit the system to just Mini Kore (which is also 120 words, like Toki Pona, and would be a more direct comparison), mostly because Mini's current size actually seems to have the right feel. It might be an interesting experiment though.
I think the concept of "simplest naturalistic language" may be intrinsically broken -- a "naturalistic language" is not simple. Natural languages balance between regular rules (e.g. in English, we often add -ed to make the past tense of a verb) and exceptions especially for common cases ("went", "was", "had", "made", "did" because going, being, having, making, doing are all so common). This tension is partly about how much a language user must know/consider when speaking/listening and how efficiently you can say things.
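To make the "regular rule plus exceptions for common cases" idea concrete, here is a toy sketch. It is not a real model of English morphology: the exception table holds only the five verbs mentioned above, and the regular rule ignores spelling details such as consonant doubling. The names `IRREGULAR_PAST` and `past_tense` are made up for this example.

```python
# Exceptions are stored explicitly; this is what a speaker has to memorize.
IRREGULAR_PAST = {
    "go": "went",
    "be": "was",
    "have": "had",
    "make": "made",
    "do": "did",
}


def past_tense(verb):
    """Look up an exception first, otherwise fall back to the regular -ed rule."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Simplified regular rule: add -d after a final "e", otherwise -ed.
    return verb + "d" if verb.endswith("e") else verb + "ed"


walked = past_tense("walk")
went = past_tense("go")
```

The trade-off is visible in the shape of the code: the rule costs nothing to store but yields longer forms, while each exception costs one table entry and buys a short form for a frequent word.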
I cannot find a citation quickly, but I recall years ago reading a paper about simulated agents "evolving" a language in a game context where agents had to indicate items to one another, by sending messages which were subject to a noisy channel. Items had multiple attributes (think "small red square", "big green triangle" etc), and experimenters could vary both the noise in the channel, and the entropy of the distribution over items. Naturally if "small red square" is 99% of the things you have to communicate, and there is low noise, agents invent an abbreviation for it. If there's a huge amount of noise and a relatively even distribution over items, then "small small green green triangle triangle" or similar becomes more likely. Languages very naturally reflect both the things people discuss and the environment in which they discuss them.
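The redundancy end of that spectrum can be sketched with a plain repetition code and majority vote. This is only an illustration of why "small small green green triangle triangle" survives noise; it is not the method from the paper I am half-remembering, and the function names and the hand-placed corruption are assumptions made for the example.

```python
from collections import Counter


def encode(words, repeat):
    """Say each word `repeat` times in a row."""
    return [w for w in words for _ in range(repeat)]


def decode(received, repeat):
    """Recover each word by majority vote over its group of repeats."""
    decoded = []
    for i in range(0, len(received), repeat):
        group = received[i:i + repeat]
        decoded.append(Counter(group).most_common(1)[0][0])
    return decoded


message = ["small", "green", "triangle"]
sent = encode(message, 3)

# Simulate noise by hand: one copy of "small" and one copy of "triangle"
# are garbled in transit.
heard = list(sent)
heard[1] = "???"
heard[8] = "???"

recovered = decode(heard, 3)
```

With three repeats, one garbled copy per word is outvoted by the two clean ones. With no repeats, the same noise would simply destroy the word, which is the price a low-redundancy language pays.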
Your general point is a good one but I don't think irregular verbs are the best example of error correcting redundancy, or evolved shortcutting. In most cases they are just a relic of genealogy, and don't serve those purposes:
> Most English irregular verbs are native, derived from verbs that existed in Old English. Nearly all verbs that have been borrowed into the language at a later stage have defaulted to the regular conjugation.
Irregular verbs (go/went, and so on) conjugate (change according to tense and subject) using rules just like regular verbs, except that they have different rules. The irregular verbs use Germanic conjugations (cf. man/men, child/children) whereas the regular verbs use grammatical constructions from other source languages.
In every language, for every word, there will be some history and source. And one can always declare a "different rule" around exceptional cases ... but that's kind of vacuous, and speakers have to remember which words are subject to a minority "rule", so claiming they aren't "exceptions" seems disingenuous.
But if you look at the words in English for which we have "different rules", and you look at which words in other languages have "different rules" ... they typically line up with frequency. You'll note that the verbs in the small list above also happen to be irregular in a lot of languages.
While completely true, I think this misses the point which makes minimal "natural" language interesting. Sure you don't use one of these constructed languages in practice the same way you don't build your websites with Turing machine tapes. The question of interest is not one of practice but of theory, what is the equivalent of Turing completeness for natural language? What are the minimum criteria of grammar and vocabulary needed to span the space of conversational ability? In other words, what is the minimum needed for a language to even theoretically be "naturalistic" (even if no naturally occurring language ever looks like it in practice)?
Not saying those papers are wrong, but after 136 years and millions of speakers from _most_ countries, Esperanto's speakers seem to do just fine without adding irregular verbs.
> Natural languages balance between regular rules (e.g. in English, we often add -ed to make the past tense of a verb) and exceptions especially for common cases ("went", "was", "had", "made", "did" because going, being, having, making, doing are all so common).
Yes, but different natural languages resolve this tension differently.
For example, Turkish is much more regular in its verbs (and in general) than English or German.
> The vowels are pronounced like they are in Spanish, Italian, German, and many other languages
... ok, this is annoying. Can't speak for Italian and Spanish, but in German vowels are pronounced differently depending on context.
Later, it says the 'o' is meant to be pronounced like in "moment". Moment is pronounced differently in American and UK English. And neither are like Italian "momento" or like German "Moment".
> All of the consonants (b d f g j k l m n p r s t v) are pronounced exactly the same as they are in English. Phew!
Lots of the world's languages have exactly five vowels corresponding to [a], [e], [i], [o], [u], but Japanese is a bit unusual in that the Japanese [u] is unrounded, so it can be more precisely (narrowly) transcribed as [ɯ]. Spanish has a more "typical" set of five vowels. You would presumably be understood all right if you used Spanish vowels in Japanese but you wouldn't sound like a native, so pronouncing [ɯ] correctly usually wouldn't be one's first priority in learning Japanese. In Russian and Turkish, on the other hand, you would have to make a distinction between [u] and [ɯ]. (I'm not an authority on any of this; I just dabble in phonetics.)
That clarifies a bit, but still leaves me confused at some of the choices made.
Why include both sounds "r" and "l", when they can be tricky to distinguish for some speakers, and then use Japanese as pronunciation guide? The sounds "m/n" are also easy to mix up. Same with "b/v", which are pretty much interchangeable to a lot of Spanish speakers. I think the number of consonants could have been reduced considerably.
I like how the language flows though. It seems like a goal has been to avoid consonant clusters. It feels kind of like Swahili, though I don't speak that at all. The only input I would have on this point is that the verb/noun/adjective markers "i/a/e" would be hard to distinguish against words ending in a vowel, which seems to happen a lot. In rapid speech I see that becoming a problem that would cause it to flow less well, or breed forth a need for a de facto fixed word order for clarity.
What if every word started with a consonant and ended in a vowel, including those three markers? What if we completely got rid of problem pairs like "rl/mn/bv", by removing one or both in each pair? Could we get by using mainly voiced consonants? I kind of want to fork this project and try it out.
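A fork like that could start from a tiny phonotactics checker. The sketch below assumes strict consonant+vowel syllables and uses the consonant list quoted earlier in the thread; the choice of which member of each pair to drop (`l`, `n`, `v`) is arbitrary and only there to show the effect. `is_cv` and `REDUCED` are names invented for this example.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bdfgjklmnprstv")  # the consonant list quoted in the thread


def is_cv(word, consonants=CONSONANTS):
    """True if `word` is one or more consonant+vowel syllables (CV, CVCV, ...)."""
    if len(word) == 0 or len(word) % 2 != 0:
        return False
    return all(
        (ch in consonants) if i % 2 == 0 else (ch in VOWELS)
        for i, ch in enumerate(word)
    )


# Hypothetical reduced inventory: drop one member of each pair r/l, m/n, b/v.
# Which member goes is an arbitrary choice for this example.
REDUCED = CONSONANTS - set("lnv")

marker_ok = is_cv("i")               # a bare-vowel marker fails the CV rule
mini_ok = is_cv("mini")              # fits CVCV with the full inventory
mini_reduced = is_cv("mini", REDUCED)  # fails once "n" is removed
```

Running existing vocabulary through a check like this would show quickly how many words, and the three markers, would need to change under the stricter rules.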
To be clear, while I am being critical in this comment, I want to explicitly say also that it is an impressive job to have made a new language, and refine it to this level of minimalism. Perhaps I am wary after having "wasted" a lot of time on Esperanto.
In pretty much any language there's no single point on the vowel chart that actually identifies a vowel - it's a spectrum with numerous allophones. Conlangs like this one are generally constructed in such a way as to allow a maximally wide spectrum that is still distinctive. So if you pronounce it the way you speak English, that's still fine.
If you want more precision, generally speaking, the value of the character in IPA will match the actual sound value, except for "j".
I think the best way to see it is like this: vowels like in Spanish and consonants like in English. The Duolingo Stories have pronunciation with a TTS engine https://duostories.org/mini-en and the dictionary has pronunciation with an actual human voice (mine) https://jprogr.github.io/buku-name
> > The vowels are pronounced like they are in Spanish, Italian, German, and many other languages
> ... ok, this is annoying. Can't speak for Italian and Spanish, but in German vowels are pronounced differently depending on context. Later, it says the 'o' is meant to be pronounced like in "moment". Moment is pronounced differently in American and UK English. And neither are like Italian "momento" or like German "Moment".
I listened to all four (UK/US English, German, Italian), and the 'o' in moment sounded the same to me.
In English, that "o" is a diphthong, for starters - something like [oʊ] usually
Whether it sounds the same to you or not in different languages/dialects depends on how many "o-like" sounds your native language has. If it's just one, then e.g. [o] and [ɔ] can be hard to distinguish, because you're used to treating them as the same thing manifesting in different contexts.
And that's fine, a lot of sounds are hard to distinguish if you're not familiar with that language. For English, here's a dictionary entry: https://dictionary.cambridge.org/dictionary/english/moment
It shows the IPA for both US and UK: it's a slightly different diphthong. For German and Italian (I think), it shouldn't be a diphthong at all. Not everyone will hear a difference, which makes it even more helpful to precisely define what the sound should be. Or just don't put any rules in your instructions, or tell people that it's flexible or whatever.
Reminds me of some similarities to Arabic. Arabic uses root words, usually 3 consonants, that mean many similar things with surrounding letters. K-T-B means writing. Kitab means book. Kitaba is writing.
The script is hard, and you have to learn enough of the roots and recognize them to get the meaning. Indonesian is slightly similar: tinju is boxing and petinju is boxer. Prefixes on roots to build up and guess meaning from context.
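The root-and-pattern idea can be sketched as slot filling. This uses the simplified Latin transliterations from the comment above (no vowel length or diacritics), and the digit-slot pattern notation and the `apply_pattern` helper are made up for this example rather than taken from any standard description of Arabic.

```python
def apply_pattern(root, pattern):
    """Fill digit slots 1, 2, 3 in `pattern` with the root's consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


root = ("k", "t", "b")  # the K-T-B "writing" root

book = apply_pattern(root, "1i2a3")      # simplified "kitab"
writing = apply_pattern(root, "1i2a3a")  # simplified "kitaba"

# Indonesian-style prefixing on a root, as in tinju -> petinju.
boxer = "pe" + "tinju"
```

The Arabic case interleaves vowels between the root consonants, while the Indonesian case just concatenates a prefix, which is why the second needs no helper at all.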
I like Arabic's diacritic system which makes pronunciation of a word you've only previously read predictable.
I remember once pronouncing "stoic" as "stoyc" instead of "stow-ik" in English, for example. My limited knowledge of Arabic indicates that one diacritic produces "aah"-like vowels, another produces "ooh"-like vowels and another "iih"-like vowels, and even though some other modifiers come into play later, it's still predictable how a word is pronounced just from reading it. Would be happy to be corrected if I am wrong.
Redundancy in a natural language is not necessarily a bug. It can be considered a feature. Speech is transmitted over a noisy channel (as everybody knows who has ever tried talking/screaming to a friend on a busy street or at a concert), so it needs to contain redundancy for error correction purposes. A lot of that is context (there are only a handful of things my friend could be screaming at me at a given point in time), but a lot is that it's enough to hear parts of a sentence to infer what it's about.
Many different contexts make use of this redundancy. Air traffic communications is another example where synonyms are chosen to minimize misunderstandings yet still be concise.
Minimizing redundancy also minimizes synonyms, which can be undesirable. Another example is poetry.
Like Wilkins’ Real Character, a priori languages attempt to decompose the elements of thought into distinct atomic units and build up larger linguistic constructs from those simpler units.
A posteriori languages like Esperanto take a very different approach: rather than starting from scratch with a set of basic concepts, they attempt to pave over the unnecessary grammatical quirks and complications of natural language to create something which is simple and easier to learn.
Mini’s goal is to fully realize both of these visions: to have, at once, a set of linguistic primitives which can be combined to discuss any topic, while ensuring that those primitives are themselves borrowed as directly from natural languages as possible.
Yeah, I don't get it. In Esperanto you don't use particles, but you change the endings of the words, according to their roles in the sentence. How is Mini fundamentally different?
That said, Toki Pona's goal is to help clarify thought, whereas this seems to intend to prioritize communication more highly.
https://en.wikipedia.org/wiki/Toki_Pona
> Most English irregular verbs are native, derived from verbs that existed in Old English. Nearly all verbs that have been borrowed into the language at a later stage have defaulted to the regular conjugation.
https://en.wikipedia.org/wiki/English_irregular_verbs#Develo...
> All of the consonants (b d f g j k l m n p r s t v) are pronounced exactly the same as they are in English. Phew!
Not helpful.
In college Japanese class we were taught the phrase “ah, we soon get old” for a, i, u, e, and o respectively. I found it to be simple and satisfying.
"T" as in "Trent" is the same as "T" as in "butter" for this person?
"S" as in "pass" is the same as "S" as in "passion" too?
"G" as in "go" is the same as "G" as in "gel" as well?
There's a reason humans invented the IPA.
C and G have some ambiguity, but they didn't include C and it should be obvious that G is not going to be the same as J.
> There's a reason humans invented the IPA.
99% of people can't use IPA without an example chart.
But also: "Each letter matches its International Phonetic Alphabet pronunciation with the exception of J, which is the English /dʒ/."
https://www.bbc.com/pidgin
I made this 4 letter language: http://move.rupy.se/file/talk.txt