I still don't really understand what Vertex AI is.
If you can ignore Vertex, most of the complaints here are solved: the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
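To make the switch concrete, here is a sketch of what a request against that OpenAI-compatible endpoint looks like at the HTTP level. The base URL comes from the linked docs; the model id is illustrative and may change:

```python
import json

# The OpenAI-compatible Gemini endpoint lives under this base URL per the docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(api_key: str, model: str, messages: list) -> tuple:
    """Return (url, headers, body) for a standard chat.completions call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        # The Gemini API key is passed as a Bearer token, OpenAI-style.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Sending it is then a plain POST, e.g.:
# requests.post(url, headers=headers, data=body)
```

Equivalently, you can point the official `openai` client library at the same base URL and keep the rest of your code unchanged.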
It's a way for you to have your AI billing under the same invoice as all of your other cloud purchases. If you're a startup this is a dumb feature; if you work at a $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers.
> $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers
What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers?
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
It's also useful in a startup: I can just start using it with zero effort.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops to get it set up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
That `vertexai=True` does the trick - you can use the same code without this option, and then you will not be using "Vertex".
Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
For me, the main aspect of "using Vertex", as in this example, is that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their enterprise-grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.
I don't think a service account vs. an API key would improve performance in any meaningful way. I doubt the AI endpoint is authenticating the API key against a central database on every request; it will most certainly be cached against a service key in the same AZ, or whatever GCP calls it.
A service account file and an API key carry similar security risks when provided the way you are using them. Google recommends using ADC (Application Default Credentials), and it's actually an org policy recommendation to disable SA key files.
Google Cloud Console's billing console for Vertex is so poor. I'm trying to figure out how much I spent on which models, and I still cannot for the life of me figure it out. I'm assuming the only way to do it is to use the Gemini billing assistant chatbot, but that requires me to turn on another API permission.
I still don't understand the distinction between the Gemini and Vertex AI APIs. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem, but it's only created more confusion, for me at least.
I couldn't have said it better. My billing friends are working with the Vertex team to address some of these concerns, and we are planning to fix this issue. Please stay tuned; we will come back to this thread with an announcement when we can.
In fact, if you can DM me (@chrischo_pm on X), I would love to learn more if you are interested.
Gemini’s is no better. Their data can be up to 24h stale and you can’t set hard caps on API keys. The best you can do is email notification billing alerts, which they acknowledge can be hours late.
Only problem is that the genai API at https://ai.google.dev is far less reliable and can be problematic for production use cases. Right around the time Gemini 2.0 launched, it was down for days on end without any communication. They are putting a lot of effort into improving it, but it's much less reliable than OpenAI, which matters for production. They can also reject your request based on overall system load (not your individual limits), which is very unpredictable. They advertise 2000 requests per minute. When I tried several weeks ago, I couldn't even get 500 per minute.
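Because those load-based rejections are unpredictable, a retry wrapper with exponential backoff is the usual mitigation. A minimal sketch, where `send` stands in for one API call and `OverloadedError` stands in for however your client surfaces 429/503-style responses:

```python
import random
import time

class OverloadedError(Exception):
    """Stand-in for a load-based rejection (429/503) from the API."""
    pass

def call_with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry `send()` on overload, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return send()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise
            # Jitter so that many clients retrying at once don't synchronize.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

This doesn't raise the advertised rate limit, but it smooths over the transient "system is busy" rejections described above.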
Pls ping me if you run into any production issues, will raise right away to the team. We have massive at scale products operating on AI Studio, so we are set up to ensure stability.
The OpenAI-compatible API is missing important parameters; for example, I don't think there is a way to disable Flash 2.0 thinking with it.
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, allowing you to auth with the instance service account, and slightly better latency and TTFT (time to first token).
I find Google's service auth SO hard to figure out. I've been meaning to sort out deploying to Cloud Run with a service account for several years now, but it just doesn't fit in my brain well enough for me to make the switch.
When I used the OpenAI-compatible stuff, my API calls just didn't work at all. I switched back to direct HTTP calls, which seem to be the only thing that works…
simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.
For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.
However, if you anticipate your use case to grow and scale 10-1000x in production, Vertex would be a worthwhile investment.
Indeed. Though the billing dashboard feels like an over-engineered April Fools' joke compared to Anthropic or OpenAI. And it takes too long to update with usage. I understand they tacked it onto GCP, but if they're making those devs work 60 hours a week, can we at least get a nicer, real-time dashboard out of it?
Wait until you see how to check Bedrock usage in AWS.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
Except that the OpenAI-compatible endpoint isn't actually compatible: it doesn't support string enum values for function calls and throws a confusing error. Vertex at least has better error messages. My solution: just use text completions and emulate tool-call support client side, validate the responses against the schema, and retry on failure. It rarely has to retry, and it always works the second time even without feedback.
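The emulation described above can be sketched in a few lines. This is an illustrative version, not the commenter's actual code: `call_model` is whatever text-completion call you use, the `{"tool": ..., "args": ...}` shape and the `allowed_tools` schema format are made up for the example, and validation is deliberately minimal:

```python
import json

def validate_tool_call(text: str, allowed_tools: dict) -> dict:
    """Parse a {"tool": ..., "args": {...}} blob and check it against a simple schema."""
    call = json.loads(text)
    tool = call["tool"]
    if tool not in allowed_tools:
        raise ValueError(f"unknown tool {tool!r}")
    # allowed_tools maps tool name -> {param name: expected Python type}
    for param, expected_type in allowed_tools[tool].items():
        if not isinstance(call["args"].get(param), expected_type):
            raise ValueError(f"bad argument {param!r}")
    return call

def emulated_tool_call(call_model, prompt: str, allowed_tools: dict, retries: int = 1) -> dict:
    """Ask for JSON in plain text, validate it, and retry on malformed output."""
    for attempt in range(retries + 1):
        try:
            return validate_tool_call(call_model(prompt), allowed_tools)
        except (ValueError, KeyError):
            if attempt == retries:
                raise
    raise RuntimeError("unreachable")
```

In practice you would also strip markdown fences from the model output and feed the validation error back into the retry prompt, though the comment above suggests even blind retries succeed.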
Vertex AI is essentially equivalent to Azure OpenAI - enterprise-ready, with HIPAA/SOC2 compliance and data-privacy guarantees.
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
Hey there, I’m Chris Cho (x: chrischo_pm, Vertex PM focusing on DevEx), joined by Ivan Nardini (x: ivnardini, DevRel). We heard you, and we'll answer your questions as directly as possible.
First of all, thank you for the kind words about our latest Gemini 2.5 model. We are so glad that you find the models useful! We really appreciate this thread and everyone's feedback on Gemini/Vertex.
We read through all your comments. And YES, clearly we've got some friction in the DevEx. This feedback is super valuable and helps me prioritize. Our goal is to listen, gather your insights, offer clarity, and point to potential solutions or workarounds.
I’m going to respond to some of the comments directly on the thread.
Can we avoid weekend changes to the API? I know it's all non-GA, but having `includeThoughts` suddenly work at ~10AM UTC on a Sunday and the raw thoughts being returned after they were removed is nice, but disruptive.
Can you tell me the exact instance when this happened, please? I will take this feedback back to my colleagues, but in order to change how we behave I need a baseline and data.
I love that you're responding on HN, thanks for that! While you're here I don't suppose you can tell me when Gemini 2.5 Pro is hitting European regions on Vertex? My org forbids me from using it until then.
Thanks for replying, and I can safely say that most of us just want first-class conformity with OpenAI's API without JSON schema weirdness (not using refs, for instance) baked in.
Hi, one thing I am really struggling with in AI studio API is stop_sequences. I know how to request them, but cannot see how to determine which stop_sequence was triggered. They don't show up in the stop_reason like most other APIs. Is that something which vertex API can do? I've built some automation tools around stop_sequences, using them for control logic, but I can't use Gemini as the controller without a lot of brittle parsing logic.
Is there an undocumented hardcoded timeout for Gemini responses, even in streaming mode? JSON output according to a schema can get quite lengthy, and I can't seem to get all of it for some inputs because Gemini seemingly terminates the request early.
Google usually doesn't care what users say at all. This is why they so often have product-crippling bugs and missing features. At least this guy is making a show of trying before he transfers to another project.
Ramoz, good to hear that native Structured Outputs are working! But if the docs are 'confusing and partially incomplete,' that’s not a good DevEx. Good docs are non-negotiable. We are in the process of revamping the whole documentation site. Stay tuned, you will see something better than what we have today.
Site seems to be down - I can’t get the article to load - but by far the most maddening part of Vertex AI is the way it deals with multimodal inputs. You can’t just attach an image to your request. You have to use their file manager to upload the file, then make sure it gets deleted once you’re done.
That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!
It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.
That’s correct! You can send images by uploading them either through the Files API (from the Gemini API) or via a Google Cloud Storage (GCS) bucket reference. What we DON’T have a sample on is sending images as bytes. Here is a screenshot of the code sample from the “Get Code” function in the Vertex AI studio.
https://drive.google.com/file/d/1rQRyS4ztJmVgL2ZW35NXY0TW-S0...
Let me create a feature request to get these samples into our docs, because I could not find a sample either. Fixing it.
You can? Google limits HTTP requests to 20MB, but both the Gemini API and Vertex AI API support embedded base64-encoded files and public URLs. The Gemini API supports attaching files that are uploaded to their Files API, and the Vertex AI API supports files uploaded to Google Cloud Storage.
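For reference, the inline base64 route looks roughly like this at the REST level. The `inline_data` / `mime_type` field names follow the public Gemini REST examples; verify against the current docs, and keep the whole request under the ~20MB HTTP limit mentioned above:

```python
import base64
import json

def image_request(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build a generateContent body with an inline base64-encoded image part."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # The REST API expects the raw bytes base64-encoded.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ],
        }],
    }
    return json.dumps(body)
```

No file manager or GCS bucket involved: the image travels in the request body itself, which is the workflow the parent comment was asking for.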
The main thing I do not like is that token counting is rate limited. My local offline copies have stripped out the token counting, since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.
Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
It looks like you can use the Gemma tokenizer to count tokens for at least the 1.5 models. The docs claim that there's a local compute_tokens function in google-genai, but it looks like it just does an API call.
Additionally, there's no OpenAPI spec, so you have to generate one from their protobuf specs if you want to use that to generate a client model. Their protobuf specs live in a repo at https://github.com/googleapis/googleapis/tree/master/google/.... Now you might think that v1 would be the latest there, but you would be wrong - everyone uses v1beta (not v1, not v1alpha, not v1beta3) for reasons that are completely unclear. Additionally, this repo is frequently not up to date with the actual API (it took them ages to get the new thinking config added, for example, and their usage fields were out of date for the longest time). It's really frustrating.
lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve in four languages (GA: Python and Go; Preview: Node.js and Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.
The way dependency resolution works in Java with the special, Google-only, giant dynamic BOM resolver is hell on earth.
We have to write code that round-robins every region on retries to get past how overloaded/poorly managed Vertex is (we're not hitting our quotas), and yes, that's even with retry settings on the SDK.
Read timeouts aren't configurable on the Vertex SDK.
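The region round-robin described above can be sketched as a thin wrapper: rotate through Vertex regions on each retry instead of hammering one overloaded region. The region list is illustrative (use whichever regions your project has quota in), and `call_in_region` stands in for one SDK request pinned to that region:

```python
import itertools

# Illustrative region list; substitute the regions your project has quota in.
REGIONS = ["us-central1", "us-east4", "europe-west4", "asia-northeast1"]

def call_rotating_regions(call_in_region, max_attempts: int = 4, regions=REGIONS):
    """Try call_in_region(region) against successive regions until one succeeds."""
    last_error = None
    for region in itertools.islice(itertools.cycle(regions), max_attempts):
        try:
            return call_in_region(region)
        except Exception as exc:  # narrow this to your SDK's retryable error types
            last_error = exc
    raise last_error
```

With the Vertex SDKs you would typically construct a per-region client inside `call_in_region`, since the region is baked into the client/endpoint rather than the individual request.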
I’m sorry, have you used Azure? I’ve worked with all the major cloud providers, and Google has its warts, but it pales in comparison to the hoops Azure makes you jump through to make a simple API call.
Azure's API for LLMs changes depending on what datacenter you are calling. It is bonkers. In fact, it is so bad that at work we are hosting our own LLMs on Azure GPU machines rather than use their API. (Which means we only have small models at much higher cost…)
In general, it's just wild to see Google squander such an intense lead.
In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.
It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network. But there was nearly no API available to anyone else (short of some very simple posting APIs), where Google flipped a bit, where the whole company stopped caring about the rest of the world and APIs and grew intensely focused on internal use, on themselves, looked only within.
I don't know enough about the LLM situation to comment, but Google squandering such a huge lead, so clearly stopping caring about the world & intertwingularity, becoming so intensely internally focused was such a clear clear clear fall. There's the Google Graveyard of products, but the loss in my mind is more clearly that Google gave up on APIs long ago, and has never performed any clear acts of repentance for such a grievous mis-step against the open world, open possibilities, against closed & internal focus.
With Gemini 2.5 (both Pro and Flash) Google have regained so much of that lost ground. Those are by far the best long-context models right now, extremely competitively priced and they have features like image mask segmentation that aren't available from other models yet: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
I think the commenter was saying google squandered its lead ("goodwill" is how I would refer to it) in providing open and interoperable services, not the more recent lead it squandered in AI. I agree with your point that they've made up a lot of that ground with gemini 2.5.
Gemini 2.5 Pro is so good. I’ve found that using it as the architect and orchestrator, then farming subtasks and computer use out to Sonnet, is the best ROI.
The models are great but the quotas are a real pain in the ass. You will be fighting other customers for capacity if you end up needing to scale. If you have serious Gemini usage in mind, you almost have to have a Google Cloud TAM to advocate for your usage and quotas.
Google's headcount (and internal red tape) grew significantly from 2012 to 2025. You're highlighting the fact that at some point in its massive growth, Google had to stop relentlessly pushing R&D and allocate leadership focus to addressing the technical debt (or at least operational inefficiency) that was a consequence of that growth.
I don't understand why Sundar Pichai hasn't been replaced. Google seems like it's been floundering in its ability to innovate and execute over the past decade. To the extent that this Google has been a good maintenance org for its cash cows, even that might not be a good plan if they dropped the ball with AI.
Perhaps you need to first define "innovation" and maybe also rationalize why that view of innovation is the end-all of determining CEO performance. Otherwise you're begging the question here.
Google's stock performance, revenue growth, and political influence in Washington under his leadership have grown substantially. I don't disagree that there are even better CEOs out there, but as an investor, the framing of your question is way off. Given the financial performance, why would you want to replace him?
The answer is simple: he keeps cash coming in and the stock price rising. You can compare his performance to his predecessors and to CEOs at other companies. That does not necessarily make him a "good" leader in your eyes, but good enough for the board.
Google is the leader in LLMs and self-driving cars, two of the biggest innovation areas in the last decade, so how exactly has it been floundering in its ability to innovate and execute?
Google's worth 2 trillion dollars off the back of a website. I think investors are so out of their depth with tech that they're cool with his mediocre performance.
Hubris. It seems similar, at least externally, to what happened at Microsoft in the late 90s/early 00s. I am convinced that a split-up of Microsoft would have been invigorating for the spin-offs, and the tech industry in general would have been better for it.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
Anthropic have an OpenAI-compatible endpoint now as well: https://docs.anthropic.com/en/api/openai-sdk
> If you want to disable thinking, you can set the reasoning effort to "none".
For the other APIs, you can set the thinking budget to 0, and that also works.
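At the REST level, the budget-of-zero variant looks roughly like this. The `generationConfig.thinkingConfig.thinkingBudget` field names follow the Gemini docs for the 2.5-era thinking controls, but treat them as an assumption and check the current docs for your model:

```python
import json

def no_thinking_request(prompt: str) -> str:
    """Build a generateContent body that sets the thinking budget to zero."""
    return json.dumps({
        "contents": [{"parts": [{"text": prompt}]}],
        # A budget of 0 asks the model to skip its thinking phase entirely.
        "generationConfig": {"thinkingConfig": {"thinkingBudget": 0}},
    })
```

The OpenAI-compatible endpoint instead exposes this as `reasoning_effort: "none"`, as quoted above; the two knobs are not interchangeable between the two API surfaces.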
- Vertex AI
- AI Studio
- Gemini
- Firebase Gen AI
Just stick with AI Studio and the free developer API along with it; you will be much, much happier.
Do Google use all the AI Studio traffic for training etc.?
Regardless of whether I passed a role or not, the function would say something to the effect of "invalid role: accepted are user and model".
I tried switching to the OpenAI-compatible SDK, but it threw errors for tool calls and I just gave up.
Could you confirm if it was a known bug that was fixed?
Or not failing when passing `additionalProperties: false`
Or..
For other models, see this link and open up the collapsed section for your specific model: https://ai.google.dev/gemini-api/docs/models
I hope it doesn't become a trend on this site.
It's the best model out there.
https://github.com/ryao/gemini-chat
Example for 1.5:
https://github.com/googleapis/python-aiplatform/blob/main/ve...
Maybe we’ll get a do-over with Google.