I still don't really understand what Vertex AI is.
If you can ignore Vertex, most of the complaints here are solved: the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
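To make the switch concrete, here is a sketch of what a request against that OpenAI-compatible endpoint looks like at the HTTP level. The base URL comes from the linked docs; the model id is illustrative and may change:

```python
import json

# The OpenAI-compatible Gemini endpoint lives under this base URL per the docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(api_key: str, model: str, messages: list) -> tuple:
    """Return (url, headers, body) for a standard chat.completions call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        # The Gemini API key is passed as a Bearer token, OpenAI-style.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Sending it is then a plain POST, e.g.:
# requests.post(url, headers=headers, data=body)
```

Equivalently, you can point the official `openai` client library at the same base URL and keep the rest of your code unchanged.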
It's a way for you to have your AI billing under the same invoice as all of your other cloud purchases. If you're a startup this is a dumb feature; if you work at a $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers.
> $ENTERPRISE_BIGCO, it just saved you 6mo+ of fighting with IT / Legal / various annoying middle managers
What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers?
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
It's also useful in a startup: I can just start using it with zero effort.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops to get it set up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
That `vertexai=True` does the trick - you can use the same code without this option, and then you will not be using "Vertex".
Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
For me, the main aspect of "using Vertex", as in this example, is that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their enterprise-grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.
I don't think a service account vs. an API key would improve performance in any meaningful way. I doubt the AI endpoint is authenticating the API key against a central database on every request; it will most certainly be cached against a service key in the same AZ, or whatever GCP calls it.
A service account file and an API key carry similar security risks when provided the way you are using them. Google recommends using ADC (Application Default Credentials), and it's actually an org policy recommendation to disable SA key files.
Google Cloud Console's billing console for Vertex is so poor. I'm trying to figure out how much I spent on which models, and I still cannot for the life of me figure it out. I'm assuming the only way to do it is to use the Gemini billing assistant chatbot, but that requires me to turn on another API permission.
I still don't understand the distinction between the Gemini and Vertex AI APIs. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem, but it's only created more confusion, for me at least.
I couldn't have said it better. My billing friends are working with the Vertex team to address some of these concerns, and we are planning to fix this issue. Please stay tuned; we will come back to this thread with an announcement when we can.
In fact, if you can DM me (@chrischo_pm on X), I would love to learn more if you are interested.
Gemini’s is no better. Their data can be up to 24h stale and you can’t set hard caps on API keys. The best you can do is email notification billing alerts, which they acknowledge can be hours late.
Only problem is that the genai API at https://ai.google.dev is far less reliable and can be problematic for production use cases. Right around the time Gemini 2.0 launched, it was down for days on end without any communication. They are putting a lot of effort into improving it, but it's much less reliable than OpenAI, which matters for production. They can also reject your request based on overall system load (not your individual limits), which is very unpredictable. They advertise 2000 requests per minute. When I tried several weeks ago, I couldn't even get 500 per minute.
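Because those load-based rejections are unpredictable, a retry wrapper with exponential backoff is the usual mitigation. A minimal sketch, where `send` stands in for one API call and `OverloadedError` stands in for however your client surfaces 429/503-style responses:

```python
import random
import time

class OverloadedError(Exception):
    """Stand-in for a load-based rejection (429/503) from the API."""
    pass

def call_with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry `send()` on overload, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return send()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise
            # Jitter so that many clients retrying at once don't synchronize.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

This doesn't raise the advertised rate limit, but it smooths over the transient "system is busy" rejections described above.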
Pls ping me if you run into any production issues, will raise right away to the team. We have massive at scale products operating on AI Studio, so we are set up to ensure stability.
The OpenAI-compatible API is missing important parameters; for example, I don't think there is a way to disable Flash 2.0 thinking with it.
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, allowing you to auth with the instance service account, and slightly better latency and TTFT (time to first token).
I find Google's service auth SO hard to figure out. I've been meaning to sort out deploying to Cloud Run with a service account for several years now, but it just doesn't fit in my brain well enough for me to make the switch.
When I used the OpenAI-compatible stuff, my API calls just didn't work at all. I switched back to direct HTTP calls, which seem to be the only thing that works…
simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.
For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.
However, if you anticipate your use case to grow and scale 10-1000x in production, Vertex would be a worthwhile investment.
Indeed. Though the billing dashboard feels like an over-engineered April Fools' joke compared to Anthropic or OpenAI. And it takes too long to update with usage. I understand they tacked it onto GCP, but if they're making those devs work 60 hours a week, can we at least get a nicer, real-time dashboard out of it?
Wait until you see how to check Bedrock usage in AWS.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
Except that the OpenAI-compatible endpoint isn't actually compatible: it doesn't support string enum values for function calls and throws a confusing error. Vertex at least has better error messages. My solution: just use text completions and emulate tool-call support client side, validate the responses against the schema, and retry on failure. It rarely has to retry, and it always works the second time even without feedback.
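The emulation described above can be sketched in a few lines. This is an illustrative version, not the commenter's actual code: `call_model` is whatever text-completion call you use, the `{"tool": ..., "args": ...}` shape and the `allowed_tools` schema format are made up for the example, and validation is deliberately minimal:

```python
import json

def validate_tool_call(text: str, allowed_tools: dict) -> dict:
    """Parse a {"tool": ..., "args": {...}} blob and check it against a simple schema."""
    call = json.loads(text)
    tool = call["tool"]
    if tool not in allowed_tools:
        raise ValueError(f"unknown tool {tool!r}")
    # allowed_tools maps tool name -> {param name: expected Python type}
    for param, expected_type in allowed_tools[tool].items():
        if not isinstance(call["args"].get(param), expected_type):
            raise ValueError(f"bad argument {param!r}")
    return call

def emulated_tool_call(call_model, prompt: str, allowed_tools: dict, retries: int = 1) -> dict:
    """Ask for JSON in plain text, validate it, and retry on malformed output."""
    for attempt in range(retries + 1):
        try:
            return validate_tool_call(call_model(prompt), allowed_tools)
        except (ValueError, KeyError):
            if attempt == retries:
                raise
    raise RuntimeError("unreachable")
```

In practice you would also strip markdown fences from the model output and feed the validation error back into the retry prompt, though the comment above suggests even blind retries succeed.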
Vertex AI is essentially equivalent to Azure OpenAI - enterprise-ready, with HIPAA/SOC2 compliance and data-privacy guarantees.
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
Hey there, I’m Chris Cho (x: chrischo_pm, Vertex PM focusing on DevEx), joined by Ivan Nardini (x: ivnardini, DevRel). We heard you, and we'll answer your questions as directly as possible.
First of all, thank you for the kind words about our latest Gemini 2.5 model. We are so glad that you find the models useful! We really appreciate this thread and everyone's feedback on Gemini/Vertex.
We read through all your comments. And YES, clearly we've got some friction in the DevEx. This feedback is super valuable and helps me prioritize. Our goal is to listen, gather your insights, offer clarity, and point to potential solutions or workarounds.
I’m going to respond to some of the comments directly on the thread.
Can we avoid weekend changes to the API? I know it's all non-GA, but having `includeThoughts` suddenly work at ~10AM UTC on a Sunday and the raw thoughts being returned after they were removed is nice, but disruptive.
Can you tell me the exact instance when this happened, please? I will take this feedback back to my colleagues, but in order to change how we behave I need a baseline and data.
I love that you're responding on HN, thanks for that! While you're here I don't suppose you can tell me when Gemini 2.5 Pro is hitting European regions on Vertex? My org forbids me from using it until then.
Thanks for replying, and I can safely say that most of us just want first-class conformity with OpenAI's API without JSON schema weirdness (not using refs, for instance) baked in.
Hi, one thing I am really struggling with in AI studio API is stop_sequences. I know how to request them, but cannot see how to determine which stop_sequence was triggered. They don't show up in the stop_reason like most other APIs. Is that something which vertex API can do? I've built some automation tools around stop_sequences, using them for control logic, but I can't use Gemini as the controller without a lot of brittle parsing logic.
Is there an undocumented hardcoded timeout for Gemini responses, even in streaming mode? JSON output according to a schema can get quite lengthy, and I can't seem to get all of it for some inputs because Gemini seemingly terminates the request early.
Google usually doesn't care what users say at all. This is why they so often have product-crippling bugs and missing features. At least this guy is making a show of trying before he transfers to another project.
Ramoz, good to hear that native Structured Outputs are working! But if the docs are 'confusing and partially incomplete,' that’s not a good DevEx. Good docs are non-negotiable. We are in the process of revamping the whole documentation site. Stay tuned, you will see something better than what we have today.
Site seems to be down - I can’t get the article to load - but by far the most maddening part of Vertex AI is the way it deals with multimodal inputs. You can’t just attach an image to your request. You have to use their file manager to upload the file, then make sure it gets deleted once you’re done.
That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!
It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.
That’s correct! You can send images by uploading them either through the Files API (from the Gemini API) or via a Google Cloud Storage (GCS) bucket reference. What we DON’T have a sample on is sending images as bytes. Here is a screenshot of the code sample from the “Get Code” function in the Vertex AI studio.
https://drive.google.com/file/d/1rQRyS4ztJmVgL2ZW35NXY0TW-S0...
Let me create a feature request to get these samples into our docs, because I could not find a sample either. Fixing it.
You can? Google limits HTTP requests to 20MB, but both the Gemini API and Vertex AI API support embedded base64-encoded files and public URLs. The Gemini API supports attaching files that are uploaded to their Files API, and the Vertex AI API supports files uploaded to Google Cloud Storage.
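For reference, the inline base64 route looks roughly like this at the REST level. The `inline_data` / `mime_type` field names follow the public Gemini REST examples; verify against the current docs, and keep the whole request under the ~20MB HTTP limit mentioned above:

```python
import base64
import json

def image_request(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build a generateContent body with an inline base64-encoded image part."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # The REST API expects the raw bytes base64-encoded.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ],
        }],
    }
    return json.dumps(body)
```

No file manager or GCS bucket involved: the image travels in the request body itself, which is the workflow the parent comment was asking for.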
The main thing I do not like is that token counting is rate limited. My local offline copies have stripped out the token counting, since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.
Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
It looks like you can use the Gemma tokenizer to count tokens for at least the 1.5 models. The docs claim that there's a local compute_tokens function in google-genai, but it looks like it just does an API call.
Additionally, there's no OpenAPI spec, so you have to generate one from their protobuf specs if you want to use that to generate a client model. Their protobuf specs live in a repo at https://github.com/googleapis/googleapis/tree/master/google/.... Now you might think that v1 would be the latest there, but you would be wrong - everyone uses v1beta (not v1, not v1alpha, not v1beta3) for reasons that are completely unclear. Additionally, this repo is frequently not up to date with the actual API (it took them ages to get the new thinking config added, for example, and their usage fields were out of date for the longest time). It's really frustrating.
lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve in four languages (GA: Python and Go; Preview: Node.js and Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.
The way dependency resolution works in Java with the special, Google-only, giant dynamic BOM resolver is hell on earth.
We have to write code that round-robins every region on retries to get past how overloaded/poorly managed Vertex is (we're not hitting our quotas), and yes, that's even with retry settings on the SDK.
Read timeouts aren't configurable on the Vertex SDK.
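The region round-robin described above can be sketched as a thin wrapper: rotate through Vertex regions on each retry instead of hammering one overloaded region. The region list is illustrative (use whichever regions your project has quota in), and `call_in_region` stands in for one SDK request pinned to that region:

```python
import itertools

# Illustrative region list; substitute the regions your project has quota in.
REGIONS = ["us-central1", "us-east4", "europe-west4", "asia-northeast1"]

def call_rotating_regions(call_in_region, max_attempts: int = 4, regions=REGIONS):
    """Try call_in_region(region) against successive regions until one succeeds."""
    last_error = None
    for region in itertools.islice(itertools.cycle(regions), max_attempts):
        try:
            return call_in_region(region)
        except Exception as exc:  # narrow this to your SDK's retryable error types
            last_error = exc
    raise last_error
```

With the Vertex SDKs you would typically construct a per-region client inside `call_in_region`, since the region is baked into the client/endpoint rather than the individual request.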
I’m sorry, have you used Azure? I’ve worked with all the major cloud providers, and Google has its warts, but it pales in comparison to the hoops Azure makes you jump through to make a simple API call.
Azure's API for LLMs changes depending on what datacenter you are calling. It is bonkers. In fact, it is so bad that at work we are hosting our own LLMs on Azure GPU machines rather than use their API. (Which means we only have small models at much higher cost…)
In general, it's just wild to see Google squander such an intense lead.
In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.
It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network. But there was nearly no API available to anyone else (short of some very simple posting APIs), where Google flipped a bit, where the whole company stopped caring about the rest of the world and APIs and grew intensely focused on internal use, on themselves, looked only within.
I don't know enough about the LLM situation to comment, but Google squandering such a huge lead, so clearly stopping caring about the world & intertwingularity, becoming so intensely internally focused was such a clear clear clear fall. There's the Google Graveyard of products, but the loss in my mind is more clearly that Google gave up on APIs long ago, and has never performed any clear acts of repentance for such a grievous mis-step against the open world, open possibilities, against closed & internal focus.
With Gemini 2.5 (both Pro and Flash) Google have regained so much of that lost ground. Those are by far the best long-context models right now, extremely competitively priced and they have features like image mask segmentation that aren't available from other models yet: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
I think the commenter was saying google squandered its lead ("goodwill" is how I would refer to it) in providing open and interoperable services, not the more recent lead it squandered in AI. I agree with your point that they've made up a lot of that ground with gemini 2.5.
Gemini 2.5 Pro is so good. I’ve found that using it as the architect and orchestrator, then farming subtasks and computer use out to Sonnet, is the best ROI.
The models are great but the quotas are a real pain in the ass. You will be fighting other customers for capacity if you end up needing to scale. If you have serious Gemini usage in mind, you almost have to have a Google Cloud TAM to advocate for your usage and quotas.
Google's headcount (and internal red tape) grew significantly from 2012 to 2025. You're highlighting the fact that at some point in its massive growth, Google had to stop relentlessly pushing R&D and allocate leadership focus to addressing the technical debt (or at least operational inefficiency) that was a consequence of that growth.
I don't understand why Sundar Pichai hasn't been replaced. Google seems like it's been floundering in its ability to innovate and execute over the past decade. To the extent that this Google has been a good maintenance org for its cash cows, even that might not be a good plan if they dropped the ball with AI.
Perhaps you need to first define "innovation" and maybe also rationalize why that view of innovation is the end-all of determining CEO performance. Otherwise you're begging the question here.
Google's stock performance, revenue growth, and political influence in Washington under his leadership have grown substantially. I don't disagree that there are even better CEOs out there, but as an investor, the framing of your question is way off. Given the financial performance, why would you want to replace him?
The answer is simple: he keeps cash coming in and the stock price rising. You can compare his performance to his predecessors and to CEOs at other companies. That does not necessarily make him a "good" leader in your eyes, but good enough for the board.
Google is the leader in LLMs and self-driving cars, two of the biggest innovation areas in the last decade, so how exactly has it been floundering in its ability to innovate and execute?
Google's worth 2 trillion dollars off the back of a website. I think investors are so out of their depth with tech that they're cool with his mediocre performance.
Hubris. It seems similar, at least externally, to what happened at Microsoft in the late 90s/early 00s. I am convinced that a split-up of Microsoft would have been invigorating for the spin-offs, and the tech industry in general would have been better for it.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
Anthropic have an OpenAI-compatible endpoint now as well: https://docs.anthropic.com/en/api/openai-sdk
> If you want to disable thinking, you can set the reasoning effort to "none".
For the other APIs, you can set the thinking budget to 0, and that also works.
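At the REST level, the budget-of-zero variant looks roughly like this. The `generationConfig.thinkingConfig.thinkingBudget` field names follow the Gemini docs for the 2.5-era thinking controls, but treat them as an assumption and check the current docs for your model:

```python
import json

def no_thinking_request(prompt: str) -> str:
    """Build a generateContent body that sets the thinking budget to zero."""
    return json.dumps({
        "contents": [{"parts": [{"text": prompt}]}],
        # A budget of 0 asks the model to skip its thinking phase entirely.
        "generationConfig": {"thinkingConfig": {"thinkingBudget": 0}},
    })
```

The OpenAI-compatible endpoint instead exposes this as `reasoning_effort: "none"`, as quoted above; the two knobs are not interchangeable between the two API surfaces.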
- Vertex AI
- AI Studio
- Gemini
- Firebase Gen AI
Just stick with AI Studio and the free developer API along with it; you will be much, much happier.
Do Google use all the AI Studio traffic for training etc.?
Regardless of whether I passed a role or not, the function would say something to the effect of "invalid role: accepted are user and model".
I tried switching to the OpenAI-compatible SDK, but it threw errors for tool calls and I just gave up.
Could you confirm if it was a known bug that was fixed?
Or not failing when passing `additionalProperties: false`
Or..
For other models, see this link and open up the collapsed section for your specific model: https://ai.google.dev/gemini-api/docs/models
I hope it doesn't become a trend on this site.
It's the best model out there.
https://github.com/ryao/gemini-chat
Example for 1.5:
https://github.com/googleapis/python-aiplatform/blob/main/ve...
Maybe we’ll get a do-over with Google.