I was recently (vibe)-coding some games with my kid, and we wanted some basic text-to-speech functionality. We tested Google's Gemini models in-browser, and they worked great, so we figured we'd add them to the app. Some fun learnings:
1. You can access those models via three APIs: the Gemini API (which it turns out is only for prototyping and returned errors 30% of the time), the Vertex API (much more stable but lacking in some functionality), and the TTS API (which performed very poorly despite offering the same models). They also have separate keys (at least, Gemini vs Vertex). (A minimal TTS call is sketched just after this list.)
2. Each of those APIs supports different parameters (things like language, whether you can pass a style prompt separate from the words you want spoken, etc). None of them offered the full combination we wanted.
3. To learn this, you have to spend a couple hours reading API docs, or alternatively, just have Claude Code read the docs then try all different combinations and figure out what works and what doesn't (with the added risk that it might hallucinate something).
- The models perform differently when called via the API vs in the Gemini UI.
- The Gemini API will randomly fail about 1% of the time, retry logic is basically mandatory.
- API performance is heavily influenced by the whims of Google: we've observed response times spread between 30 seconds and 4 minutes for the same query, depending on how Google is feeling that day.
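(Referenced from item 1 above.) For anyone attempting the same thing, this is roughly what a TTS call through the Gemini API looks like; a hedged sketch using the google-genai Python SDK, where the model and voice names are examples from the docs rather than anything the parent comment confirms:

    import wave

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # AI Studio key, not Vertex
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-tts",  # example TTS model name from the docs
        contents="Say cheerfully: have a wonderful day!",
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                voice_config=types.VoiceConfig(
                    prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                )
            ),
        ),
    )
    # The API returns raw 24 kHz 16-bit mono PCM; wrap it in a WAV header.
    pcm = response.candidates[0].content.parts[0].inline_data.data
    with wave.open("out.wav", "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(24000)
        f.writeframes(pcm)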
> The Gemini API will randomly fail about 1% of the time, retry logic is basically mandatory.
That is sadly true across the board for AI inference API providers. OpenAI and Anthropic API stability usually suffers around launch events. Azure OpenAI/Foundry serving regularly has 500 errors for certain time periods.
For any production feature with high uptime guarantees I would right now strongly advise for picking a model you can get from multiple providers and having failover between clouds.
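A minimal sketch of that advice (plain Python, no particular SDK assumed): retry with exponential backoff and jitter, then fail over to the next provider when one is exhausted:

    import random
    import time

    def call_with_retries(fn, attempts=4, base_delay=1.0):
        """Retry a flaky inference call with exponential backoff and jitter."""
        for attempt in range(attempts):
            try:
                return fn()
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt + random.random())

    def generate_with_failover(prompt, providers):
        """providers: callables taking a prompt, primary first (e.g. Gemini, then a backup cloud)."""
        last_error = None
        for provider in providers:
            try:
                return call_with_retries(lambda: provider(prompt))
            except Exception as err:
                last_error = err  # provider exhausted its retries; fail over to the next one
        raise last_error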
Also, usage and billing take a DAY to update. On top of that, there are no billing caps or credit-based billing. They put the entire burden on users to ensure that they don't end up with a mega bill.
Trying to implement their gRPC API from their specs and protobufs for Live is an exercise in immense frustration and futility. I wanted to call it from Elixir; even with strong AI help I wasted days, then gave up.
Very fair feedback: we are updating the API to be REST-centric. See the new Interactions API we just shipped, which is very REST-centric, and all future work we do will be REST-centric : )
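For anyone else stuck on the protobufs: the non-Live endpoints already speak plain HTTP, which sidesteps the gRPC pain from languages like Elixir. A minimal sketch (in Python for consistency; the model name is a placeholder):

    import requests

    # Plain REST call to the Gemini API: no gRPC, no protobufs, no SDK.
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        "models/gemini-2.0-flash:generateContent"  # placeholder model name
    )
    body = {"contents": [{"parts": [{"text": "Hello over plain HTTP"}]}]}
    resp = requests.post(url, params={"key": "YOUR_API_KEY"}, json=body, timeout=60)
    resp.raise_for_status()
    print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])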
4. If you read about a new Gemini model, you might want to use it - but are you using @google/genai, @google/generative-ai (wow, finally deprecated) or @google-ai/generativelanguage? Silly mistake, but when nano banana dropped it was highly confusing that image gen was available through only one of these.
5. Gemini supports video! But that video first has to be uploaded to "Google GenAI Drive", which then splices it into 1 FPS images and feeds them to the LLM. No option to improve the FPS, so if you want anything properly done, you'll have to splice it yourself and upload it to generativelanguage.googleapis.com, which is only accessible using their GenAI SDK (see the ffmpeg sketch after this list). Don't ask which one, I'm still not sure.
6. Nice, it works. Let's try using live video. Open the docs, you get it mentioned a bunch of times but 0 documentation on how to actually do it. Only suggestions for using 3rd party services. When you actually find it in the docs, it says
"To see an example of how to use the Live API in a streaming audio and video format, run the "Live API - Get Started" file in the cookbooks repository".
Oh well, time to read badly written python.
7. How about we try generating a video - open up AI Studio, see only Veo 2 available from the video models. But, open up the "Build" section, and I can have Gemini 3 build me a video generation tool that will use Veo 3 via API by clicking on the example. But wait, why can't we use Veo 3 in AI Studio with the same API key?
8. Every Veo 3 extended video has absolutely garbled sound and there is nothing you can do about it, or maybe there is, but by this point I'm out of willpower to chase down edgy edge cases in their docs.
9. Let's just mention one semi-related thing - some things in the Cloud come with default policies that are absurdly limiting, which means you have to create a resource/account and update the policies related to whatever you want to do, at which point it tells you these are _old policies_ and you should edit the new ones instead, but those are impossible to properly find.
10. Now that we've set up our accounts, our AI tooling, and our permissions, we write the code, which takes less time than all of the previous actions in the list. Now, you want to test it on Android? Well, you can:
- A. Test it with your account by signing into emulators, be it local or cloud, manually, which means passing 2FA every time if you want to automate this and constantly risking your account security/ban.
- B. Create a Google account for testing, add it to Licensed Testers on the Play Store, invite it to internal testing, and wait 24-48 hours to be able to use it. Then, if you try to automate testing, struggle with mocking a whole Google Account login process which uses some non-deterministic logic to show a random pop-up every time. Then do the same thing for the purchase process, ending up with a giant script of clicking through the options.
11. Congratulations, you made it this far and are able to deploy your app to Beta. Now, find 12 testers to actively use your app for free, continuously for 14 days, to prove it's not a bad app.
At this point, Google is actively preventing you from shipping at every step, causing more and more issues the deeper down the stack you go.
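(Referenced from item 5 above.) A hedged sketch of the splice-it-yourself workaround: extract frames at your own FPS with ffmpeg, then pass them as inline image parts via the google-genai Python SDK. Model name and paths are placeholders:

    import glob
    import os
    import subprocess

    from google import genai
    from google.genai import types

    os.makedirs("frames", exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", "clip.mp4", "-vf", "fps=5", "frames/frame_%04d.jpg"],
        check=True,  # 5 frames per second instead of the default 1
    )

    client = genai.Client(api_key="YOUR_API_KEY")
    parts = [
        types.Part.from_bytes(data=open(path, "rb").read(), mime_type="image/jpeg")
        for path in sorted(glob.glob("frames/frame_*.jpg"))
    ]
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=parts + ["Describe what happens in this clip."],
    )
    print(response.text)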
> 4. If you read about a new Gemini model, you might want to use it - but are you using @google/genai, @google/generative-ai (wow finally deprecated) or @google-ai/generativelanguage? Silly mistake, but when nano banana dropped it was highly confusing image gen was available only through one of these.
Yeah, I hear you, open to suggestions to make this more clear, but it is google/genai going forward. Switching packages sucks.
> Gemini supports video! But that video first has to be uploaded to "Google GenAI Drive" which will then splices it into 1 FPS images and feeds it to the LLM. No option to improve the FPS, so if you want anything properly done, you'll have to splice it yourself and upload it to generativelanguage.googleapis.com which is only accessible using their GenAI SDK. Don't ask which one, I'm still not sure.
We have some work ongoing (should launch in the next 3-4 weeks) which will let you reference files (video included) from links directly so you don't need to upload to the File API. We do also support custom FPS: https://ai.google.dev/gemini-api/docs/video-understanding#cu...
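For reference, that custom-FPS option looks roughly like this in the google-genai Python SDK; a sketch, assuming the current File API upload flow and SDK field names (model name is a placeholder):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")
    # File API upload; newer SDK versions take file=, and you may need to
    # poll until the uploaded file reaches the ACTIVE state.
    uploaded = client.files.upload(file="clip.mp4")

    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=types.Content(parts=[
            types.Part(
                file_data=types.FileData(file_uri=uploaded.uri),
                video_metadata=types.VideoMetadata(fps=5),  # sample at 5 FPS instead of the default 1
            ),
            types.Part(text="Summarize this video."),
        ]),
    )
    print(response.text)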
> 6. Nice, it works. Let's try using live video. Open the docs, you get it mentioned a bunch of times but 0 documentation on how to actually do it. Only suggestions for using 3rd party services. When you actually find it in the docs, it says "To see an example of how to use the Live API in a streaming audio and video format, run the "Live API - Get Started" file in the cookbooks repository". Oh well, time to read badly written python.
Just pinged the team, we will get a live video example added here: https://ai.google.dev/gemini-api/docs/live?example=mic-strea... should have it live Monday, not sure why that isn't there, sorry for the miss!
> 7. How about we try generating a video - open up AI studio, see only Veo 2 available from the video models. But, open up "Build" section, and I can have Gemini 3 build me a video generation tool that will use Veo 3 via API by clicking on the example. But wait why cant we use Veo 3 in the AI studio with the same API key?
We are working on adding Veo 3.1 into the drop down, I think it is being tested by QA right now, pinged the team to get ETA, should be rolling out ASAP though, sorry for the confusing experience. Hoping this is fixed by Monday EOD!
> 8. Every Veo 3 extended video has absolutely garbled sound and there is nothing you can do about it, or maybe there is, but by this point I'm out of willpower to chase down edgy edge cases in their docs.
Checking on this, haven't used extend a lot but will see if there is something missing we can clarify.
On some of the later points, I don't have enough domain expertise to weigh in, but will forward to folks on the Android / Play side to see what we can do to streamline things!
Thank you for taking the time to write up this feedback : ) hoping we can make the product better based on this.
The odd thing about all of this (well, I guess it's not odd, just ironic) is that when Google AdWords started, one of the notable things about it was that anyone could start serving or buying ads. You just needed a credit card. I think that bought Google a lot of credibility (along with the ads being text-only) as they entered an already disreputable space: ordinary users and small businesses felt they were getting the same treatment as more faceless, distant big businesses.
I have a friend that says Google's decline came when they bought DoubleClick in 2008 and suffered a reverse-takeover: their customers shifted from being Internet users and became other, matchingly-sized corporations.
I have had way too many arguments over the years with product and sales people at my job on the importance of instant self-signup. I want to be able to just pay and go, without having to talk to people or wait for things.
I know part of it is that sales wants to be able to price discriminate and wants to be able to use their sales skills on a customer, but I am never going to sign up for anything that makes me talk to someone before I can buy.
My previous company was like this, and it boggles the mind.
Sales is so focused on their experience that they completely discount what the customer wants. Senior management wants what's best for sales & the bottom line, so they go along with it. Meanwhile, as a prospective customer I would never spend a minute evaluating our product if it means having to call sales to get a demo & a price quote.
My team was focused on an effort to implement self-service onboarding -- that is, allowing users to demo our SaaS product (with various limitations in place) & buy it (if so desired) without the involvement of sales. We made a lot of progress in the year that I was there, but ultimately our team got shut down & the company was ready to revert back to sales-led onboarding. Last I heard, the CEO "left" & 25% of the company was laid off; teams had been "pivoting" every which way in the year since I'd been let go, as senior management tried to figure out what might help them get more traction in their market.
There will clearly be a gap in understanding when their whole job is to talk to people and you come to them arguing that clients shouldn't have to do that.
As you point out, it's not that black and white; most companies will have tiers of clients they want to spend more or less time with, etc. But sales wanting direct contact with clients is, I think, a fundamental bit.
Bless you and your family for all time and beyond. Having to talk to someone before I even get a price to compare, or a demo, drives me mad, and then a week later you get their contract and find they claim ownership of everything your company uploads to them -- all that time down the drain, and the salesperson never read the contract so they don't know what to say. Then there are the smaller companies with unwritten policies -- we used to get call metric software from a small Swiss outfit, but I discovered we were billed based on how many employees we've ever had, not based on current employees, with no method to delete terminated employees from the database -- on what planet do you expect someone to pay a recurring expense in perpetuity for someone who showed up for training one day 5 years ago and was never heard from again? I was so mad when they gave us the renewal price, we made our own replacement software for it.
Anyway, long story short: I now require the price and details before I'll even consider talking to a salesperson, not the other way around. Might actually be a good job for an AI agent; they can talk to these sales bozos (respectfully) for me.
That's just a disqualification process. Many products don't want a <$40k/annual customer because they're a net drain. For those, "talk to sales" is a way to qualify whether you're worth it as a customer. Very common in B2B and makes sense. Depends entirely on the product, of course.
If it's only pay-and-go, why have Sales at all? At the very best you need only a slimmed-down sales department, so being against pay-and-go is self-preservation.
If a platform is designed in a way that users can sign up and go, it can work well.
If an application is complicated or it’s a tool that the whole business runs on, often times the company will discover their customers have more success with training and a point of contact/account manager to help with onboarding.
Instant self signup died with cryptocurrency and now AI: any "free" source of compute/storage/resources will be immediately abused until you put massive gates on account creation.
That has definitely changed. Google AdWords today is one of the most unfriendly services to onboard I've ever encountered. Signing up is trivial, setting up your first ad is easy, then you instantly get banned. Appeals do nothing. You essentially have to hire a professional just to use it.
Yet it's still absolutely inundated with scams and occasionally links that directly download malware[1] that they don't action reports on. I don't think the process needs to be easier if they already can't keep up with moderation.
[1]: https://adstransparency.google.com/advertiser/AR129387695568...
The thing to understand about google services is that they see so much spam and abuse that it's easier for them to just assume you are a spammer rather than a legitimate customer, unless you go through other channels to establish yourself.
Also adding onto this, it is impossible to get human support!
One of my co-workers left with an active account and active card but no passwords noted. The company gave up and just had to cancel + create a new account for the next adwords specialist.
Hi, as the original-thought-haver here (and a buyer of DoubleClick's services on various projects 1998-2003), I should clarify: the problem with Google's acquisition of DoubleClick wasn't just about customer scale, or even market power, it was that DoubleClick was already the skeeziest player on the internet, screwing over customers, advertisers and platforms at every opportunity, and culturally antithetical to Google at the time. And there wasn't any way that "Don't Be Evil" was going to win in the long run.
I wasted several hours this week going around in the exact same circles. We have a billing account, but kept hitting a gemini quota. Fine. But then on the quota page, every quota said 0% usage. And our bill was like $5. Some docs said check AI studio, but then the "import project from google cloud to AI studio" button kept silently failing. This was a requests per minute quota, which was set at 15 (not a whole lot...) but wouldn't reset for 24 hours. So then I kept making new projects so I could keep testing this thing I'm building, until eventually I ran out.
The only way we could get it resolved was to (somehow) get a real human at google on the phone because we're in some startup program or something and have some connection there. Then he put in a manual request to bump our quota up.
Google cloud is the most kafkaesque insane system I've ever had the misfortune of dealing with. Every time I use it I can tell the org chart is leaking.
For the last decade or so I get a second $0.85 monthly bill from google. Nobody at google knows why, but they recommend to leave it because who knows what could be disabled if I block those payments. Interesting detail here is that this is on a bank account that we stopped using in 2017, so the only reason we are keeping that account alive is for these stupid google payments. In the cloud environment there is an invoice for the amounts, but no way to change the billing info to our current account and also no way (not by us, not by google support) to figure out what these payments are actually for...
Calling it kafkaesque is giving it too much credit.
I recently got an email saying a project I have is at risk of being disabled because my payment information is invalid. But the card I registered for it is the same one I've had for the last two years, and it's still valid, because I used it yesterday. Also, there is no amount due as far as I can tell. I haven't done anything with the project for 6 months, it's just sitting there. No API usage, nothing.
So I have no idea what to do to address it. I feel my best option is to wait for it to get disabled and try to address it afterwards.
Chargebacks or disputes will lock your account, so definitely stay away from that path.
But just closing the bank account will stop auto billing (it's considered a decline). So if you closed the account, it would just stop paying for whatever it is, and then cloud may lock the gcp account until it's paid. (I'm not 100% sure what cloud does with unpaid invoices).
I have been fighting the same bizarre quota demon. Scripts kept timing out due to quota limitations, but I haven't been able to find any indication of a limit in the console. Finally gave up and switched to Claude, since they at least have a sane interface for API keys and billing!
The trick here is that they describe internal load shedding as quota limits.
There's a quota for your general class of query, and there's a quota for how many can be in flight on a given server. It's not necessarily about you specifically.
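Either way, if you're stuck under a low per-minute cap like the 15 RPM mentioned upthread, a client-side limiter at least stops you from tripping it constantly; a minimal sketch in plain Python (the quota value is whatever your console claims):

    import threading
    import time

    class RpmLimiter:
        """Simple client-side limiter: at most `rpm` calls per rolling minute."""

        def __init__(self, rpm: int = 15):
            self.min_interval = 60.0 / rpm
            self.lock = threading.Lock()
            self.next_slot = 0.0

        def wait(self) -> None:
            # Reserve the next available time slot, then sleep until it arrives.
            with self.lock:
                now = time.monotonic()
                slot = max(now, self.next_slot)
                self.next_slot = slot + self.min_interval
            time.sleep(max(0.0, slot - time.monotonic()))

    limiter = RpmLimiter(rpm=15)
    # limiter.wait()  # call before each API request to stay under the quota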
Unfortunately Google's problem is that the product is dictated by the architecture of the APIs, and this is an issue for anything they do. At one point long ago every Google product was disjointed, and Larry Page told everyone they needed to be unified under a single theme and login. Then over time, with the scale of the company, you become entirely dependent on the current workflows. To work around it, all of a sudden there's a new UI for a new product, and it looks super clean right up until you try to do something with that login or roles or an API key that has to effectively jailbreak the flow you're in. Painful. It's why startups win. Small, nimble, none of that legacy cruft to deal with. Whoever is working hard to fix these problems at Google, KUDOS TO YOU, because it's not easy. It's not easy to wrangle these systems across hundreds of teams, products and infrastructure. Unification and a seamless workflow at that scale are painfully hard to achieve, and the issue is entirely about operating within the limitations of the system, but for good reason.
I hope they figure out a lot of the issues, but at the same time, I hope Gemini just disappears back into products rather than being at the forefront, because I think that's when Google does its best work.
> The unification and seamless workflow at that scale is painfully hard to achieve
It does make you wonder: why not just be a lot smaller? It's not like most of these teams actually generate any revenue. It seems like a weird structural decision, which maybe made sense when hoovering up available talent was its own defensive moat, but now that that strategy is no longer plausible, it should be rethought?
Two reasons. 1 - they print cash through Ads, which means there's opportunity and desire to do more things, or even a feeling like you should or can. So new products emerge, partly to try to diversify the revenue stream. 2 - the continuous hiring and scale means churn; people get bored, they leave teams, they want to do something new, it all bifurcates. It keeps fragmenting and fragmenting until you have this multilayered fractal. It's how systems in nature operate, so we shouldn't think corporations will be any different. The only way to mitigate things like this is putting in place limits, rules and boundaries, but that also limits the upside, and if you're a public company you can't do that. You have to grow grow grow and then cut cut cut and continue in that cycle forever or until you die.
> The “Set up billing” link kicked me out of Google AI Studio and into Google Cloud Console, and my heart sank. Every time I’ve logged into Google Cloud Console or AWS, I’ve wasted hours upon hours reading outdated documentation, gazing in despair at graphs that make no sense, going around in circles from dashboard to dashboard, and feeling a strong desire to attain freedom from this mortal coil.
Add me to the list of "saw nano banana pro, attempted to get an API key for like 5min, failed and gave up." Maybe I am a dummy (quite possible) but I have seen many smart people similarly flummoxed!
You can walk into a McDonalds without being able to read, write, or speak English, and the order touchscreen UI is so good (er, "good") that you can successfully order a hamburger in about 60 seconds. Why can't Google (of all companies) figure this out?
I tried at some point to sign up for whatever IBM's AI cloud was called. None of the documentation was up to date; when you clicked on things you ended up in circular loops that took you back where you started. Somehow there were several kinds of API keys you could make, most seemingly decoys and only one correct one. The whole experience was like one of those Mario castle levels where if you don't follow the exact right pattern you just loop back to where you started.
It makes sense for IBM, seems like google is just reaching that stage?
I google `gemini API key` and the first result* is this docs page: https://ai.google.dev/gemini-api/docs/api-key
That docs page has a link in the first primary section on the page. Sure, it could be a huge CTA, but this is a docs page, so it's kinda nice that it's not gone through a marketing makeover.
* besides sponsored result for AI Studio
(Maybe I misunderstood and all the complaints are about billing. I don't remember having issues when I added my card to GCP in the past, but maybe I did)
I've to this day never been able to pay for Gemini through the API, even though I've tried maybe 6-7 times
If you bring it up to Logan he'll just brush it off — I honestly don't know if they test these UX flows with their own personal accounts, or if something is buggy with my account.
To Logan's credit though, his team made and drove a lot of good improvements in AI studio and Gemini in general since the early days.
I feel his team is really hitting a wall now in terms of improvements, because the remaining work involves Google teams/products outside of their control, or requires deep collaboration.
This is my experience as well in my personal account, however at work given we were already paying for Google Cloud it was easy enough to connect a GCP account.
But somehow personally even though I'm a paying Google One subscriber and have a GCP billing account with a credit card, I get confusing errors when trying to use the Gemini API
AI Studio is meant to be the fast path from prompt to production, bringing billing fully into AI Studio in January will make this even faster! We have hundreds of thousands of paying customers in production using AI Studio right now.
I could've made my comment more clear. Definitely missing a statement along the lines of "and then after creating, you click 'set up billing' and link the accounts in 15 seconds"
I did edit my message to mention I had GCP billing set up already. I'm guessing that's one of the differences between those having trouble and those not.
Every aspect is at least partially broken several times a day, and even when there isn't a temporary outage of something somewhere, there are nonsensical "blocks" for things that ought to just work.
I've been using the AI Studio with my personal Workspace account. I can generate an API key. That worked for a while, but now Gemini CLI won't accept it. Why? No clue. It just says that I'm "not allowed" to use Gemini Pro 3 with the CLI tool. No reason given, no recourse, just a hand in your face flatly rejecting access to something I am paying for and can use elsewhere.
Simultaneously, I'm trying to convince my company to pay for a corporate account of some sort so that I can use API keys with custom tools and run up a bill of potentially thousands of dollars that we can charge back to the customer.
My manager tried to follow the instructions and... followed the wrong ones. They all look the same. They all talk about "Gemini" and "Enterprise". He ended up signing up for Google's equivalent of Copilot for business use, not something that provides API keys to developers. Bzzt... start over from the beginning!
I did eventually find the instructions by (ironically) asking Gemini Pro, which provided the convenient 27 step process for signing up to three different services in a chain before you can do anything. Oh, and if any of them trigger any kind of heuristic, again, you get a hand in face telling you firmly and not-so-politely to take a hike.
PS: Azure's whatever-it-is-called-today is just as bad if not worse. We have a corporate account and can't access GPT 5 because... I dunno. We just can't. Not worthy enough for access to Sam Altman's baby, apparently.
> I've been using the AI Studio with my personal Workspace account. I can generate an API key. That worked for a while, but now Gemini CLI won't accept it. Why? No clue. It just says that I'm "not allowed" to use Gemini Pro 3 with the CLI tool. No reason given, no recourse, just a hand in your face flatly rejecting access to something I am paying for and can use elsewhere.
Passing along this feedback to the CLI team, no clue why this would be the case.
Excuse me? If you mean AI Studio, are you talking about the product where you can’t even switch which logged in account you’re using without agreeing to its terms under whatever random account it selected, where the ability to turn off training on your data does not obviously exist, and where it’s extremely unclear how an organization is supposed to pay for it?
I have a claude max subscription and a gemini pro sub and I exclusively use them on the cli. When I run out of claude max each week I switch over to gemini and the results have been pretty impressive -- I did not want to like it but credit where credit is due to google.
Like the OP and others, I didn't use the API for gemini, and it was not obvious how to do that -- that said, it's not cost effective to develop on API pay-as-you-go vs with a sub, so I do not know why you would? Sure, you need the API for any applications with built-in LLM features, but not for developing in the LLM-assisted CLI tools.
I think the issue with CLI tools for many is that you need to be competent with the CLI, like an actual *nix user, not a Mac-first user, etc. Personally I have over 30 years of daily shell use as a sysadmin and developer. I started with korn and csh and then every one you can think of since.
For me any sort of a GUI slows me down so much it's not feasible. To say nothing of the physical ailments associated with excessive mousing.
Having put in thousands of hours working with LLM coding tools so far, for me claude-code is the best, gemini is very close and might have a better interface, and codex is unusable and fights me the whole time.
I did this same thing and this was my first result too. I am just not seeing how the author ended up where they did, unless knowing how to use Google search is not a core skill.
Read the full post. Partway down you will see they agree with you that getting an API key is not hard.
Paying is hard. And it is confusing how to set it up: you have to create a Vertex billing account and go through a cumbersome process to then connect your AIStudio to it and bring over a "project" which then disconnects all the time and which you have to re-select to use Nano Banana Pro or Gemini 3. It's a very bad process.
It's easy to miss this because they are very generous with the free tier, but Gemini 3 is not free.
I did notice that in their post, instead of searching for answers, they asked Gemini how to do things, and when that didn't work, they asked Claude.
I often see coworkers offload their critical thinking to an AI to give them answers, instead of doing the grunt work necessary to find the answers on their own.
Hi if the Gemini API team is reading this can you please be more transparent about 'The specified schema produces a constraint that has too many states for serving. ...' when using Structured Outputs.
I assume it has something to do with the underlying constraint grammar/token masks becoming too long/taking too long to compute. But as end users we have no way of figuring out what the actual limits are.
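For anyone else hitting this: the error seems to correlate with schema complexity (deep nesting, many optionals, large enums), so shrinking the schema is the usual workaround. A hedged sketch with the google-genai Python SDK (model name is a placeholder, field names are per recent SDK versions, and the actual state limit is undocumented):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # Keep the schema shallow and bounded: fewer nested objects, short enums,
    # and explicit max_items all shrink the constraint automaton that the
    # "too many states for serving" error appears to complain about.
    schema = types.Schema(
        type=types.Type.OBJECT,
        properties={
            "title": types.Schema(type=types.Type.STRING),
            "sentiment": types.Schema(
                type=types.Type.STRING, enum=["positive", "neutral", "negative"]
            ),
            "tags": types.Schema(
                type=types.Type.ARRAY,
                items=types.Schema(type=types.Type.STRING),
                max_items=5,
            ),
        },
        required=["title", "sentiment"],
    )

    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents="Classify this review: 'Great hardware, flaky API.'",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=schema,
        ),
    )
    print(response.text)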
Other than that, good work! I love how fast the Gemini models are. The current API is significantly less of a shitshow compared to last year with property ordering etc.
> The models perform differently when called via the API vs in the Gemini UI.
I'm passing docs for bulk inference via Vertex, and a small number of returned results will include gibberish in Japanese.
This shouldn't be surprising: the model != the product. The same way GPT-4o behaves differently than the ChatGPT product when using GPT-4o.
This difference between API and UI responses is common across all the big players (Claude, GPT models, etc.)
The consumer chat interfaces are designed for a different experience than a direct API call, even if pinging the same model.
> there are no billing caps or credit-based billing.
We are working on billing caps along with credits right now. Billing caps will land first in Jan!
Was really curious about that when I saw this in the posted article:
> I had some spare cash to burn on this experiment,
Hopefully the article's author is fully aware of the real risk of giving Alphabet his CC details on a project which has no billing caps.
> 11. Congratulations, you made it this far and are able to deploy your app to Beta. Now, find 12 testers to actively use your app for free, continuously for 14 days, to prove it's not a bad app.
13. Get your whole google account banned.
> I know part of it is that sales wants to be able to price discriminate and wants to be able to use their sales skills on a customer, but I am never going to sign up for anything that makes me talk to someone before I can buy.
1. Never make it hard for people to give you money.
You say that as if it isn’t the entire reason why these interactions should be avoided at all costs. Dynamic pricing should be a crime.
> Meanwhile, as a prospective customer I would never spend a minute evaluating our product if it means having to call sales to get a demo & a price quote.
Someone who works in finance or compliance might want a demo, or views those things as signals the product is for serious use cases.
> If an application is complicated or it’s a tool that the whole business runs on, often times the company will discover their customers have more success with training and a point of contact/account manager to help with onboarding.
Boy oh boy are they going to be surprised when they learn what AI can replace.
> DoubleClick was already the skeeziest player on the internet, screwing over customers, advertisers and platforms at every opportunity
Look how quaint this seems now: https://www.cnet.com/tech/services-and-software/consumer-gro...
I made a free Chrome extension that uses a Fal API key, if you want a UI instead of code:
https://chromewebstore.google.com/detail/ai-slop-canvas/dogg...
> AI Studio is meant to be the fast path from prompt to production
But also the (theoretical) production platform for Gemini is Vertex AI, not AI Studio.
And until pretty recently using that took figuring out service accounts, and none of Google's docs would demonstrate production usage.
Instead they'd use the gcloud CLI to authenticate, and you'd have to figure out how each SDK consumed a credentials file.
Now there's "express mode" for Vertex which uses an API Key, so things are better, but the complaints were well earned.
At one point there were even features (like using a model you finetuned) that didn't work without gcloud depending on if you used Vertex or AI Studio: https://discuss.ai.google.dev/t/how-can-i-use-fine-tuned-mod...
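For what it's worth, express mode boils down to one flag in the google-genai Python SDK; a sketch (model name is a placeholder, and the key comes from the Vertex express-mode console rather than AI Studio):

    from google import genai

    # Express mode: Vertex AI with just an API key, no service-account JSON
    # and no gcloud auth. The vertexai flag routes calls to the Vertex backend.
    client = genai.Client(vertexai=True, api_key="YOUR_VERTEX_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents="Say hi from Vertex express mode.",
    )
    print(response.text)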
> Hi if the Gemini API team is reading this can you please be more transparent about 'The specified schema produces a constraint that has too many states for serving. ...' when using Structured Outputs.
OpenAI has more generous limits on the schemas and clearer docs. https://platform.openai.com/docs/guides/structured-outputs#s....
You guys closed this issue for no reason: https://github.com/googleapis/python-genai/issues/660