1. You can access those models via three APIs: the Gemini API (which, it turns out, is meant only for prototyping and returned errors 30% of the time), the Vertex API (much more stable but missing some functionality), and the TTS API (which performed very poorly despite offering the same models). They also use separate keys (at least, Gemini vs. Vertex).
2. Each of those APIs supports different parameters (things like language, whether you can pass a style prompt separate from the words you want spoken, etc.). None of them offered the full combination we wanted.
3. To learn this, you have to spend a couple of hours reading API docs or, alternatively, have Claude Code read the docs, then try all the different combinations and figure out what works and what doesn't (with the added risk that it might hallucinate something).
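
The "try all the combinations" approach in point 3 can be sketched as a small probing harness. This is only an illustration: the backend names (`api_a`, `api_b`, `api_c`), the parameter names, and which backend accepts what are all made up here, and `fake_backend` is a stand-in for real client calls, not any actual SDK.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple


class UnsupportedParameter(Exception):
    """Raised by a (fake) backend when it is given a parameter it does not accept."""


def fake_backend(supported: Set[str]) -> Callable[..., bytes]:
    """Build a hypothetical synthesize() that rejects parameters outside `supported`."""
    def synthesize(text: str, **params: Optional[str]) -> bytes:
        for name, value in params.items():
            if value is not None and name not in supported:
                raise UnsupportedParameter(name)
        return b"audio"  # placeholder bytes, not real audio
    return synthesize


# Invented capability sets: by construction, no backend accepts all three parameters.
BACKENDS: Dict[str, Callable[..., bytes]] = {
    "api_a": fake_backend({"language", "voice"}),
    "api_b": fake_backend({"style_prompt", "voice"}),
    "api_c": fake_backend({"language", "style_prompt"}),
}

# None means "leave this parameter out of the request".
OPTIONS: Dict[str, List[Optional[str]]] = {
    "language": [None, "en-US"],
    "style_prompt": [None, "cheerful"],
    "voice": [None, "voice_1"],
}


def probe(
    backends: Dict[str, Callable[..., bytes]],
    options: Dict[str, List[Optional[str]]],
) -> Dict[Tuple[str, Tuple[Optional[str], ...]], bool]:
    """Call every backend with every parameter combination and record success or failure."""
    names = list(options)
    results = {}
    for backend_name, synthesize in backends.items():
        for values in product(*(options[n] for n in names)):
            params = dict(zip(names, values))
            try:
                synthesize("Hello", **params)
                results[(backend_name, values)] = True
            except UnsupportedParameter:
                results[(backend_name, values)] = False
    return results


results = probe(BACKENDS, OPTIONS)
```

Looking up `results[("api_a", ("en-US", "cheerful", "voice_1"))]` then tells you whether that backend accepted the full combination. Against real APIs you would swap the fake backends for actual client calls and retry a few times before recording a failure, since a flaky endpoint can look like an unsupported parameter.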
All I know is that it smells really unhealthy, and the smoke coming out of houses is a deep, black colour, almost like oil.
Jokes aside, this sounds terrible. What are the policies in place to prevent this?