Question/feature request: Is it possible to bring my own CoreML models over and use them? I honestly end up bundling llama.cpp and using GGUF right now because I can't figure out the setup for using CoreML models; I'd love for all of that to be abstracted away for me :)
They seem generic enough that I think it's okay, but you're right, there's no need to include them and I should've caught that in the AI output, thank you!!
1. The leak Friday was from firebase's file storage service
2. This one is about their firebase database service also being open (up until Saturday morning)
The tl;dr is:
1. App signed up using Firebase Auth
2. App traded Firebase Auth token to API for API token
3. API talked to Firebase DB
The issue is that you could take the Firebase Auth token, talk to Firebase directly, and because the read/write/update/delete permissions were open to all authenticated users, it opened up an IDOR exploit.
I pulled the data Friday night to have evidence proving the information wasn't old like the previous leak, and immediately reached out to 404 Media.
Here is a gist of Gemini 2.5 Pro summarizing 10k random posts: https://gist.github.com/jc4p/7c8ce9a7392f2cbc227f9c6a4096111...
And to be 100% clear, the data in this second "leak" is a 300MB JSON file that (hopefully) only exists on my computer, but I did see evidence that other people were communicating with the Firebase database directly.
If anyone is interested in the how: I signed up against Firebase Auth using a dummy email and password, retrieved an idToken, sent it into the script generated by this Claude convo: https://claude.ai/share/2c53838d-4d11-466b-8617-eae1a1e84f56
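For illustration, the general shape of that flow, using only Google's documented REST endpoints. The project ID and collection name are hypothetical placeholders, and these helpers only build the requests rather than sending them:

```python
import json

# Firebase Auth REST: sign up with a dummy email/password to get an idToken.
# The api_key here is the app's public Firebase web API key, which ships
# inside every client.
def signup_request(api_key, email, password):
    url = f"https://identitytoolkit.googleapis.com/v1/accounts:signUp?key={api_key}"
    body = {"email": email, "password": password, "returnSecureToken": True}
    return url, json.dumps(body)

# Firestore REST: list documents in a collection, authenticated only by the
# idToken. If the security rules allow any signed-in user to read, this
# returns everything.
def list_documents_request(project_id, collection, id_token):
    url = (f"https://firestore.googleapis.com/v1/projects/{project_id}"
           f"/databases/(default)/documents/{collection}")
    headers = {"Authorization": f"Bearer {id_token}"}
    return url, headers
```

The point is that both the API key and the idToken are client-side artifacts, so the only thing standing between any user and the database is the security rules.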
And here's the output of that script (any db that has <100 rows is something another "hacker" wrote to and deleted from): https://gist.github.com/jc4p/bc35138a120715b92a1925f54a9d8bb...
> Diffusion: generate a lot of noise then try to clean it up
You could say this about Flows too. Their history is shared with diffusion and goes back to the Whitening Transform. Flows work via a coordinate transform, so we get an isomorphism, whereas diffusion works through (to put it loosely) a hierarchical mixture of Gaussians, which is a lossy process (it gets more confusing once we get into latent diffusion models, which are the primary type in use). The goal of a Normalizing Flow is to turn your sampling distribution, which you don't have an explicit representation of, into a known probability distribution (typically a Gaussian). So in effect there are a lot of similarities here. I'd highly suggest learning about Flows if you want to better understand Diffusion Models.

> The diffusion approach that is the baseline for sota is Flow Matching from Meta
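To make the coordinate-transform point concrete: a flow is an invertible map f, and the standard change-of-variables formula gives an exact (lossless) density:

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

where p_Z is the base distribution (typically a standard Gaussian), so evaluating the likelihood of x only requires pushing it through f and accounting for the Jacobian.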
To be clear, Flow Matching is a Normalizing Flow. Specifically, it is a Continuous and Conditional Normalizing Flow. If you want to get into the nitty gritty, Ricky has a really good tutorial on the stuff[0]

Claude's Summary: "Normalizing flows aren't dead, they just needed modern techniques"
My Summary: "Transformers aren't just for text"
1. SOTA model for likelihood on ImageNet 64×64, first ever sub-3.2 bits per dimension (prev was 2.99 by a hybrid diffusion model)
2. Autoregressive (transformer) approach; right now diffusion is the most popular in this space (it's much faster, but a different approach)
tl;dr of autoregressive vs diffusion (there are also other approaches):
Autoregression: step based, generate a little then more then more
Diffusion: generate a lot of noise then try to clean it up
The diffusion approach that is the baseline for sota is Flow Matching from Meta: https://arxiv.org/abs/2210.02747 -- lots of fun reading material if you throw both of these into an LLM and ask it to summarize the approaches!
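The two sampling loops contrasted above can be sketched as toy code. This is purely illustrative: the stand-in `next_token` and `denoise` callables are placeholders for real models, and the shapes/step counts are made up:

```python
import random

# Autoregressive: step based -- generate a little, then a little more,
# each step conditioned on everything generated so far.
def sample_autoregressive(next_token, length):
    seq = []
    for _ in range(length):
        seq.append(next_token(seq))
    return seq

# Diffusion-style: generate a lot of noise, then try to clean it up by
# repeatedly applying a denoiser to the whole sample at once.
def sample_diffusion(denoise, n, steps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]  # start from pure noise
    for t in reversed(range(steps)):
        x = denoise(x, t)
    return x
```

With trivial stand-ins, e.g. `sample_autoregressive(lambda seq: len(seq), 5)` produces five tokens one at a time, while `sample_diffusion(lambda x, t: [0.5 * v for v in x], 4, 10)` refines one noisy 4-vector over ten steps.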
"write as if you are a person from {{REGION}}. Modify your language to proficiency level {{PROFICIENCY_LEVEL}}"
that way I could, for example, speak as if it's someone using Mexican Spanish vs Madrid Spanish vs Chilean Spanish, etc.
Secondly, you could include the user's transcribed speech as part of the conversation window.
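A minimal sketch of how that prompt templating might look; the wording follows the quoted prompt above, and the function and variable names are just placeholders:

```python
PROMPT_TEMPLATE = (
    "Write as if you are a person from {region}. "
    "Modify your language to proficiency level {proficiency_level}."
)

def build_system_prompt(region, proficiency_level):
    # Fill in the two template slots to produce the system prompt.
    return PROMPT_TEMPLATE.format(
        region=region, proficiency_level=proficiency_level
    )
```

For example, `build_system_prompt("Mexico City", "A2")` would steer the model toward Mexican Spanish at a beginner-friendly level.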
Curious because I'm trying to learn Romanian, and since it's a less common language there are fewer resources available. So I wasn't sure if you had added Dutch with a minimal amount of effort following the poster's request.
That said, I gave your app a try with Spanish and it looks pretty good! But I didn't see a Help page clarifying how I'm "supposed" to interact. E.g. I tried saying "I don't understand" in English (even though I know how to say that in Spanish) and it responded in Spanish, which may be hard for absolute beginners, although full immersion is a much better way to learn.
I can try playing around more with it to give you some feedback.
Please let me know if it works, and I'll definitely work on adding in instructions for the expected interactivity, thank you!
I've used the realtime API for something similar (also related to practicing speaking, though not for foreign languages). I just wanted to comment that the realtime API will definitely give you the user's transcriptions -- they come back as a `server.conversation.item.input_audio_transcription.completed` event. I use it in my app for exactly that purpose.
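A sketch of picking the user transcript out of the event stream. The `transcript` field matches what OpenAI documents on that event, but treat the exact event shape as an assumption and check it against the current Realtime API docs:

```python
import json

TRANSCRIPTION_DONE = "conversation.item.input_audio_transcription.completed"

def extract_user_transcript(raw_event):
    """Return the user's transcript if this event carries one, else None."""
    event = json.loads(raw_event)
    # Match on the suffix so it works whether or not a "server." prefix
    # appears in the type string.
    if event.get("type", "").endswith(TRANSCRIPTION_DONE):
        return event.get("transcript")
    return None
```

Each message coming over the realtime websocket would be passed through this; non-transcription events fall through and return None.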
Even when the language is correct, a lot of the time the exact text isn't 100% accurate; and when it is 100% accurate, it comes in slower than the audio output, not in real time. All in all, not what I would consider ready to release as a feature in my app.
What I've been thinking about is switching to a full audio in --> transcribe --> send to LLM --> TTS pipeline, in which case I'd be able to show the exact input to the model, but that's way more work than a single OpenAI API call.
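That pipeline can be sketched as plain function composition. The three stages are injected as callables so any STT/LLM/TTS backend would fit; the function names here are made up for illustration:

```python
def speech_pipeline(transcribe, ask_llm, synthesize, audio_in):
    """audio in -> transcribe -> send to LLM -> TTS.

    Returns every intermediate result so the exact text sent to the
    model can also be shown to the user.
    """
    user_text = transcribe(audio_in)      # STT stage
    reply_text = ask_llm(user_text)       # LLM stage
    reply_audio = synthesize(reply_text)  # TTS stage
    return user_text, reply_text, reply_audio
```

In a real app each callable would wrap an API call (e.g. a transcription endpoint, a chat completion, and a speech endpoint), which is exactly why it's more work than the single realtime call.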