I haven't dug into the GitHub repo, but I'm curious whether by "guided decoding" you mean logit bias (which I use) or actual token blocking. Interested to know how this works technically.
(shameless self-plug) I've actually been solving a similar problem for Mandarin learning - but from the comprehensible input side rather than the dictionary side:
https://koucai.chat - basically AI Mandarin penpals that write at your level
My approach uses logit bias to generate n+1 comprehensible input (essentially artificially raising the probability of the tokens that correspond to the user's vocabulary). Notably, I didn't add the concept of a "regeneration loop" (otherwise there would be no +1 in n+1), but I think it's a good idea.
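For anyone who hasn't seen logit bias before, here's a rough sketch of the idea in plain Python. It is not the actual koucai.chat code: the toy vocabulary, the token ids, the `boost` value, and the one-token-per-word mapping are all made up for illustration. The point is that known words get a nudge while unknown words stay possible, which is where the +1 comes from.

```python
import math

def build_logit_bias(known_words, token_ids, boost=2.0):
    """Map the token id of every known word to a positive bias."""
    return {token_ids[w]: boost for w in known_words if w in token_ids}

def apply_bias(logits, bias):
    """Add the bias to the raw logits, then softmax into probabilities."""
    adjusted = [x + bias.get(i, 0.0) for i, x in enumerate(logits)]
    peak = max(adjusted)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary: one token per word (real tokenizers differ).
token_ids = {"我": 0, "喜欢": 1, "咖啡": 2, "饕餮": 3}
known_words = {"我", "喜欢", "咖啡"}

# Pretend the model rates all four tokens equally before biasing.
logits = [1.0, 1.0, 1.0, 1.0]

bias = build_logit_bias(known_words, token_ids)
probs = apply_bias(logits, bias)
# Known words are now more likely; the unknown word is less likely but not blocked.
```

Token blocking would instead set the unknown token's logit to negative infinity, so it could never be sampled.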
Really curious about the grammar issues you mentioned - I also experimented with the idea of an AI-enhanced dictionary (given that the free Chinese-English dictionary I have lacks good examples) but determined that the generated output didn't meet my quality standards. Have you found any models that handle measure words reliably?
I have repeatable workflows that harness the benefits of multiple agents. Repeatable workflows drive consistent results for single agents. Using multiple agents allows you to fully explore the problem space.
An example of using these concepts harmoniously would be creating a custom slash command that spawns sub-agents that each have custom prompts, causing them to do more exploration. The commands + agent prompts make the flow repeatable + improvable.
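Roughly, the shape of that flow could look like the sketch below. Everything in it is hypothetical: the agent names, the prompts, and `run_agent`, which is a stub standing in for whatever actually calls a model. It only shows the structure of one command fanning out to several sub-agents with fixed prompts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent prompts; keeping them in one place makes the
# flow repeatable and easy to improve over time.
AGENT_PROMPTS = {
    "architect": "Propose two alternative designs.",
    "skeptic": "List the ways this could fail.",
    "tester": "Suggest edge cases worth testing.",
}

def run_agent(name, prompt, task):
    """Stub sub-agent: a real version would send prompt + task to a model."""
    return f"[{name}] {prompt} Task: {task}"

def explore(task):
    """The 'slash command': spawn every sub-agent on the same task in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(run_agent, name, prompt, task)
            for name, prompt in AGENT_PROMPTS.items()
        }
        return {name: future.result() for name, future in futures.items()}

reports = explore("add offline mode")
```

Improving the workflow then means editing `AGENT_PROMPTS` rather than retyping instructions each time.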
1. Effectively infinite engaging comprehensible input at your level
2. Fantastic way to practice new vocabulary and grammar patterns (AI can provide correction for mistakes)
3. Somewhat fun - if you view chat as a choose your own adventure, the experience becomes more interesting
However, due to the more user-driven approach to this learning method (output-focused, user has to put in effort to chat with the AI and get feedback), there is more friction with using the tool. This isn't necessarily a bad thing - in fact, more friction can lead to more meaningful experiences. That being said, I believe the market will push tools to be low friction and low effort (i.e. gamified apps) that are focused on consumption rather than tools that require more user effort.
just my 2c from a fellow builder. if curious, check it out here! would love any feedback
On the other hand, the Chinese writing system is logographic (or ideographic), unlike the English system, which is phonetic. The most basic characters, such as 日 (sun), 月 (moon), and 山 (mountain), are essentially graphics (or pictures) of the objects themselves. That makes them very suitable for being represented by images. The emoji you are using is also very good.
I believe this method should be very effective for beginners in Chinese. However, once you have mastered the basic Chinese characters, you can learn about the structure of Chinese characters and then continue reading more materials to expand your vocabulary.
The real challenge is to expand your vocabulary through extensive reading. I'm actually working on a tool to solve this specific problem (https://lingoku.ai/learn-chinese). If you are reading English, it will insert Chinese text for you; if you are reading Chinese, it will translate the text from Chinese to English and then inject Chinese words into the translated text, thus improving your vocabulary while reading.
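To give a feel for the English-reading case, here's a minimal sketch of word injection. It is only an illustration, not how the tool itself is implemented: the `glossary` and the "Chinese (English)" display format are assumptions, and a real version would need proper word segmentation and some way of picking words at the reader's level.

```python
import re

def inject_words(text, glossary):
    """Swap English words found in the glossary for 'Chinese (english)'."""
    def swap(match):
        word = match.group(0)
        target = glossary.get(word.lower())
        # Keep the original word in parentheses so the sentence stays readable.
        return f"{target} ({word})" if target else word
    return re.sub(r"[A-Za-z]+", swap, text)

# Hypothetical glossary of words the reader is currently learning.
glossary = {"mountain": "山", "moon": "月"}

result = inject_words("The moon rose over the mountain.", glossary)
print(result)  # The 月 (moon) rose over the 山 (mountain).
```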
At least for me, there's large value in consuming bigger volumes of Chinese to get me used to pattern-matching on the characters, as opposed to only reading a smaller amount of harder characters that I'm less likely to actually encounter