Demo: https://www.loom.com/share/a78e713d46934857a2dc88aed1bb100d?...
We started this company after struggling to find great tools to practice speaking Japanese and French. Having a tutor can be awesome, but there are downsides: they can be expensive (since you pay by the hour), difficult to schedule, and have a high upfront cost (finding a tutor you like often forces you to cycle through a few that you don’t).
We wanted something that would talk with us, realistically and in full conversations, and actually help us improve. So we built it ourselves. The app relies on a custom voice AI pipeline combining STT (speech-to-text), TTS (text-to-speech), LLMs, long-term memory, interruptions, turn-taking, etc. Getting speech-to-text to work well for learners was one of the hardest parts, especially with accents, multilingual sentences, and noisy environments. We now combine Gemini Flash, Whisper, Scribe, and GPT-4o-transcribe to minimize errors and keep the conversation flowing.
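The post doesn't say how the outputs of the four STT models are combined. One simple way to combine several transcripts is medoid selection: pick the candidate with the smallest total word-level edit distance to all the others, so a single engine's outlier (e.g. hearing "bye") gets outvoted. This is a swapped-in technique for illustration, a much simpler cousin of ROVER-style voting, not ISSEN's actual method; the engine names are hypothetical labels and no real STT API is called.

```python
from typing import Dict


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two transcripts, counted in words."""
    x, y = a.lower().split(), b.lower().split()
    # prev[j] holds the distance between the words processed so far and y[:j]
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i] + [0] * len(y)
        for j, wy in enumerate(y, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wx
                curr[j - 1] + 1,           # insert wy
                prev[j - 1] + (wx != wy),  # substitute (free if words match)
            )
        prev = curr
    return prev[-1]


def pick_consensus(candidates: Dict[str, str]) -> str:
    """Return the (hypothetical) engine name whose transcript has the smallest
    total word edit distance to every other engine's transcript (the medoid).
    Ties go to the engine listed first."""
    names = list(candidates)

    def total_distance(name: str) -> int:
        return sum(
            word_edit_distance(candidates[name], candidates[other])
            for other in names
            if other != name
        )

    return min(names, key=total_distance)
```

Medoid selection can only return one of the existing transcripts; word-level alignment and voting (as in ROVER) can also stitch together a better transcript from parts of several, at the cost of more complexity.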
We didn’t want to focus too much on gamification. In our experience, that leads to users performing well in the app, achieving long streaks and so on, without actually becoming fluent in the language they’re trying to learn.
With ISSEN, you start speaking immediately and immerse yourself in the language, which, while not easy, is a much more efficient way to learn.
We combine this with a word bank and SRS flashcards for the new words learned in the AI voice chats, which allows very rapid improvement in both vocabulary and speaking skills. We also create custom curricula for each student based on goals, interests, and preferences, and offer fully customizable settings like speed, turn-taking, formality, etc.
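The post doesn't say which spaced-repetition algorithm ISSEN's flashcards use. As a rough illustration of how SRS scheduling works in general, here is a minimal sketch of a common variant of SM-2 (the classic SuperMemo-2 scheduler): each review is graded 0 to 5, successful reviews stretch the interval by a per-card ease factor, and failures reset it. The `Card` type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Card:
    word: str
    repetitions: int = 0     # consecutive successful reviews
    interval_days: int = 0   # days until the next review
    ease: float = 2.5        # SM-2 "easiness factor"


def review(card: Card, quality: int) -> Card:
    """Update a card in place after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the ease factor.
        if card.repetitions == 0:
            card.interval_days = 1
        elif card.repetitions == 1:
            card.interval_days = 6
        else:
            card.interval_days = round(card.interval_days * card.ease)
        card.repetitions += 1
    else:
        # Failed recall: start the repetition sequence over.
        card.repetitions = 0
        card.interval_days = 1
    # Easy answers raise the ease factor, hard ones lower it (floor of 1.3).
    card.ease += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    card.ease = max(card.ease, 1.3)
    return card
```

Three perfect reviews of a new card give intervals of 1, 6, and then 16 days; a subsequent failed review drops the card back to a 1-day interval with a lower ease factor, so it comes back more often from then on.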
App: https://issen.com (works on web, iOS, Android)
Pricing: 20 min free trial, $20–29/month (depending on duration and specific geography)
We’d love your feedback on the tech, the UX, or what you’d want from a tool like this. Thanks!
Now, I tried the web app and chose to learn Greek as a beginner. While I had a better experience with your app than with ChatGPT's or Gemini's voice modes, I still got lost 5 minutes in, because the AI tutor doesn't seem to have a plan for me, nor does it "see" my struggles. For example, after asking me about a hobby, it gave me a long sentence in Greek about how nice it is to hike in the mountains. As an absolute beginner, I couldn't reply to it or even repeat it, and I didn't even know what was expected of me at that point. A human tutor would probably repeat part of the sentence with a translation and ask me to repeat it, or would explain something. The AI just sits there waiting for me to make a sound, and when I do, it goes off on a tangent about beach vacations. :)
Again, it's still not bad, relatively speaking, and I'm going to give it another try.
I tried following the modern Japanese track on Memrise and was appalled at how bad it is nowadays.
Thank you so much for this. Duolingo is literally unbearable because it's so gamified. I'll try it out later. I've seen a few of these apps; can I switch seamlessly between my native language and the language I'm trying to learn? If I am trying to learn Hindi, can I ask a question in English in the middle of a conversation?
These kinds of learning apps are destined to become mediocre over time.
The learning metric is so easy to capture, the learning content so easy to produce, yet no one has an individualized loop to make learning work well.
For example, I'd press "Training" on Duolingo and get nowhere. Same lessons every time. Bread and water.
---
AI: Anh mệt is good if bạn are a man speaking about yourself. You can also say, “Em mệt” if you’re a woman.
This isn't correct. If you are male and of "older brother" age relative to the person you're talking to, you say Anh. Em is for when you are the younger person, regardless of gender. Women tend to prefer being called "em" (even if they are older), because women prefer to be identified as younger than their true age... But that doesn't mean you can't call younger men em.
A good tutor would know your age relative to theirs and explain this context.
---
It would say English phrases with a Vietnamese accent.
---
It would also give me really complex Vietnamese phrases that I'm not ready for. When I prompt for an explanation or translation, it gets off track from the original thing we were learning.
---
Far more people in Vietnam (and around the globe) speak southern Vietnamese, but the tutors seem to be from northern Vietnam.
---
The STT was also very forgiving when I pronounced things incorrectly, or it would confuse English and Vietnamese: I would say "Phai", but it heard "bye".
---
I was ready to pull out my credit card, but I can't trust it to teach me the right information. I pay $160/mo for Vietnamese tutoring ($20 per class). This would be way cheaper, and I wouldn't have to schedule my classes.
I think this company will end up pivoting into a B2B context before long. Hopefully they will still stick to the mission, but who knows (and I wouldn't fault them if they don't – survival comes first).
Trying out your tool, I'd really like to know whether the voice is northern or southern Vietnamese. I think your tool is southern Vietnamese, but I'm not sure. I personally prefer learning southern, but all the AI TTS tools use the northern dialect. Ideally, I'd like a 'pure' southern accent and not a hybrid.
For your tool, you might want to get into how to address people (Anh, em, ba, co, etc.). You seem to just use toi (which I hear Vietnamese people using with each other too...), but my understanding is that Anh/Em/Ba/etc. are more 'intimate', whereas toi is more formal/businesslike?
One idea I haven't tried much yet is making flashcards that teach me one sentence structure while introducing new vocabulary. Learning a scattered collection of phrases works for short 2-3 word ones, but when I try to learn more complex sentences, my brain can't pick up the patterns because nothing is connected.
For example, trying to learn "bạn tên là gì" and "nhà vệ sinh ở đâu" (from your website) is harder than learning "Bạn tên là gì?", "Bạn nghề là gì?", "Bạn số điện thoại là gì?"
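The frame-based flashcard idea above can be sketched as a tiny card generator: one sentence frame with a single slot, filled with different phrases, so every card reinforces the same structure while adding new words. The function name and the example sentences are illustrative assumptions (not from the comment or the app), so check the Vietnamese with a native speaker before relying on them.

```python
from typing import List, Tuple


def cards_from_frame(
    frame: str, gloss_frame: str, fillers: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Build (front, back) flashcards that all share one sentence frame.

    `frame` and `gloss_frame` each contain a single "{}" slot; each filler is a
    (target-language phrase, English gloss) pair dropped into that slot.
    """
    return [
        (frame.format(phrase), gloss_frame.format(gloss))
        for phrase, gloss in fillers
    ]


# Illustrative usage: two cards that share the "... là gì?" ("What is ...?") frame.
cards = cards_from_frame(
    "{} là gì?",
    "What is {}?",
    [("Tên bạn", "your name"), ("Số điện thoại của bạn", "your phone number")],
)
```

Keeping the frame fixed and varying only the slot is what lets the pattern stand out; a real deck would probably mix several frames and feed the resulting cards into the same spaced-repetition queue as single words.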
The other huge challenge I have is feeling like I'm making progress. I'm definitely getting better, but it's pretty disheartening to study for 40+ hours and still not be able to pronounce words like Can Tho properly, despite knowing how to read and write.
---
My email is in my profile. Feel free to reach out to me if you have more updates or want to bounce ideas.
Also, for the transcription, it would be great to get pure romaji to start with!
I do think immersion is generally better, but not only is it harder, an AI app doesn't seem able to provide the right kind of immersion (it's missing body language, visual cues, mouth movements, and all sorts of other things you get from watching a podcast or, even better, from in-person interaction).
The promise and potential of LLM-based language learning apps is that you can cross that gap to full immersion in a way that has never been possible before.
Please be more ambitious.
Also, Japanese specifically has this meme where it literally is a pitch-accent language, but many people say it isn't and teaching resources ignore it. For example, 'ima' means either 'now' or 'living room' depending on whether the second syllable is higher or lower. This clearly only applies to some languages, but it's another dimension where it's even harder for a learner to know there's a mistake. I have to imagine even other Latin languages probably have reading quirks where this could happen to me.
I think Japanese is somewhat special, though, because it has a large number of homophones (i.e. words with the same kana spelling), so speaking with the correct pitch becomes more important.
There are occasionally incorrect readings or Chinese readings, but you can tell when that happens because the furigana don't match the audio.
But how do you know the furigana are correct? Unless you start with fully human-annotated text, you need some automated procedure to add furigana, which shifts the problem from "TTS AI picked the wrong reading" to "furigana AI picked the wrong reading."
It _is_ fixable, though. It took me about a week, but I haven't found a mistaken reading since. This also seems to be specific to Japanese: most tonal languages seem to get the tones right (I’m not qualified to comment on how natural the tones sound, but I have yet to find a mismatch like the ones in Japanese).
What I usually do is pick a random news blurb, paste the entire thing into ChatGPT with the Reuters link at the beginning, and tell it that we'll be doing language practice specifically on that topic.
I've used this to carry on an hour-long Spanish practice session while walking my husky. I just put the phone in my pocket and go. If you're an intermediate/advanced learner, it's a pretty decent solution.
In fact, if you just want to focus on listening comprehension, you can tell ChatGPT that you'll speak in your native language but that it may only respond in the target language.
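The news-based setup described above boils down to one opening message. Here is a minimal sketch that assembles it, including the optional listening-comprehension rule; the function name, parameters, and exact wording are hypothetical, and it only builds text to paste into a chat (it doesn't call any OpenAI API).

```python
def build_practice_prompt(
    article_text: str,
    source_url: str,
    target_language: str,
    native_language: str,
    listening_only: bool = False,
) -> str:
    """Assemble the opening message for a news-based conversation practice session."""
    lines = [
        source_url,            # the article link goes first
        article_text.strip(),  # then the full pasted article
        "",
        f"Let's practice {target_language} by discussing the article above. "
        "Keep the conversation on this topic.",
    ]
    if listening_only:
        # Listening-comprehension mode: the learner speaks their native language,
        # but the model must answer only in the target language.
        lines.append(
            f"I will speak in {native_language}, but you must respond "
            f"only in {target_language}."
        )
    return "\n".join(lines)


# Illustrative usage with a placeholder URL and article text.
prompt = build_practice_prompt(
    "Texto del artículo...", "https://example.com/article", "Spanish", "English",
    listening_only=True,
)
```

Putting the topic and language rules in the very first message matters because voice-mode conversations tend to drift; restating them is usually enough to pull the conversation back on track.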
I'd be interested in hearing how significantly improved Issen is over this.
You do need an app built specifically for language learning to create a holistic learning experience: a customized curriculum, tons of prompting, AI models chosen for transcription accuracy, flashcards/dictionary, etc.
We also support a hands-free mode, and many other things are customizable, like slang, speaking speed, target-language usage, etc.
I've been living in Buenos Aires for over 18 years now, so my pronunciation and accent are quite good. It's just that I never got a proper early grounding in grammar, so I have a bunch of embarrassing holes that need filling, and this app is quite precise when it comes to focusing on those aspects.
Congratulations!
P.S. My only nitpick so far is the UX on iOS: when the Settings modal is open, there is no clear CTA to close it, because the click state of the Settings button is 97% the same color as the non-click state.
Solution:
1. Add a close (X) button to the top right (standard accessibility).
2. Change the click-state color of the Settings button to an inverted or accent color.
Want more UX teardowns? DM me: artur at visualsitemaps.com
I'm very excited for the whole STT/TTS pipeline to go away and for us to have models that really "hear" exactly what you said.
Sometimes this is about accent, but a lot of the time the AI won't spot where you, e.g., fudge a case ending or the stress on a word. Yes, you can get some of that pronunciation right when the AI repeats it back with the correct stress or a clear case ending, but you never really get the confidence you would get from an actual human.
Another product suggestion: let users turn off transcription (at least for the tutor's side of the conversation; I'd suggest both sides). Personally, I find it distracting at best for languages I already speak well, and a crutch for those I don't.
Finally, I find it really hard to enjoy a random conversation that isn't very directed ("What interests you most about artificial intelligence?"). I'd suggest there are ways to make it more goal-focused without being explicitly gamified: maybe something like "here's a position, and you have to persuade me" (AI debate club!), or something that brings out an actual opinion or relates to a concrete experience ("What's your main goal in your job this year?").
Overall though this is the first product I've seen in this space that I might actually use, so well done.