I love Google Cloud Run and highly recommend it as the best option [1]. Cloud Run GPU, however, is not something I can recommend. It is not cost-effective (instance-based billing is expensive compared to request-based billing), GPU choices are limited, and loading/unloading models (gigabytes) from GPU memory makes it slow to use as serverless.
Once you compare the numbers, a VM + GPU works out cheaper if your service is utilized for even 30% of the day.
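Back-of-envelope on that break-even, with assumed prices (illustrative only, not quotes from either product):

  VM + GPU, always on:       $2.50/hr x 24 hr = $60/day
  Cloud Run GPU, on-demand:  ~$8/hr, billed only while an instance is up
  Break-even:                $60 / ($8/hr) = 7.5 hr/day, i.e. ~31% of the day

Above roughly that duty cycle, the always-on VM is cheaper.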
Google VP here: we appreciate the feedback! I generally agree that if you have a strong understanding of your static capacity needs, pre-provisioning VMs is likely to be more cost-efficient with today's pricing. Cloud Run GPUs are ideal for more bursty workloads -- maybe a new AI app that doesn't yet have PMF, where you really need that scale-to-zero + fast start for sparser traffic patterns.
Appreciate the thoughtful response! I’m actually right in the ICP you described — I’ve run my own VMs in the past and recently switched to Cloud Run to simplify ops and take advantage of scale-to-zero. In my case, I was running a few inference jobs and expected a ~$100 bill. But due to the instance-based behavior, it stayed up the whole time, and I ended up with a $1,000 charge for relatively little usage.
I’m fairly experienced with GCP, but even then, the billing model here caught me off guard. When you’re dealing with machines that can run up to $64K/month, small missteps get expensive quickly. Predictability is key, and I’d love to see more safeguards or clearer cost modeling tooling around these types of workloads.
Has this changed? When I looked pre-GA, the requirement was that you pay for the CPU 24x7 to attach a GPU, so that's not really scaling to zero, unless this requirement has changed...
AWS App Runner is the closest equivalent to Cloud Run. It's really not close, though; App Runner is an unloved service at AWS and is missing a lot of the features that make Cloud Run nice.
All the major clouds suffer from this. On AWS you can't ever get an 80GB GPU without a long-term reservation, and even then it's wildly expensive. On GCP you sometimes can, but it's also insanely expensive.
These companies claim to be "startup friendly"; they are anything but. All the neo-clouds somehow manage to do this well (RunPod, Nebius, Lambda), but the big clouds are just milking enterprise customers who won't leave, and in the process screwing over the startups.
This is a massive mistake they are making, which will hurt their long term growth significantly.
To massively increase your chances of getting GPUs, you can use something like SkyPilot (https://github.com/skypilot-org/skypilot) to fall back across regions, clouds, or GPU choices. E.g.,
$ sky launch --gpus H100
will fall back across GCP regions, AWS, your clusters, etc. There are options to say try either H100 or H200 or A100 or <insert>.
Essentially the way you deal with it is to increase the infra search space.
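As a sketch of that fallback (hypothetical task file and cluster name; the candidate-accelerator set syntax is from SkyPilot's resources spec, so double-check it against the docs for your version):

$ cat > task.yaml <<'EOF'
resources:
  # Any one of these satisfies the task; SkyPilot searches the enabled
  # clouds/regions in price order and falls back on stockouts.
  accelerators: {H100:1, H200:1, A100:1}
run: |
  nvidia-smi
EOF
$ sky launch -c fallback-demo task.yaml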
We've run into this a lot lately too, even on AWS. "Elastic" compute, but all the elasticity's gone. It's especially bitter, since splitting the cost of spare capacity is the major benefit of scale here...
Agreed. Pricing is insane and availability generally sucks.
If anyone is curious about these neo-clouds, a YC startup called Shadeform has their availability and pricing in a live database here: https://www.shadeform.ai/instances
They have a platform where you can deploy VMs and bare metal from 20 or so popular ones like Lambda, Nebius, Scaleway, etc.
I had the opposite experience with Cloud Run. Mysterious scale-outs/restarts. I had to buy a paid Cloud Support subscription to get answers, and found none. Moved to self-managed VMs. Maybe things have changed now.
Sadly this is still the case. Cloud Run helped us get off the ground. But we've had two outages where Google Enhanced Support could give us no suggestion other than "increase the maximum instances" (not minimum instances). We were doing something like 13 requests/min on this instance at the time. The resource utilization looked just fine. But somehow we had a blip where no containers were available at all. It even dropped below our min containers. The fix was to manually redeploy the latest revision.
We're now investigating moving to Kubernetes where we will have more control over our destiny. Thankfully a couple people on the team have experience with this.
Something like this never happened with Fargate in the years my previous team used it.
https://github.com/claceio/clace is a project I am building that gives a Cloud Run-type deployment experience on your own VMs. For each app, it supports scaling down to zero containers (scaling up beyond one is being built).
The authorization and auditing features are designed for internal tools, but otherwise any app can be deployed.
> I love Google Cloud Run and highly recommend it as the best option
I'd love to see the numbers for Cloud Run. It's nice for toy projects, but it's a money sink for anything serious, at least in my experience. On one project we had a long-standing autoscaling issue with Google: scaling to zero sounds nice on paper, but they won't mention the warmup phases, where Cloud Run can spin up multiple containers for a single request and keep them around for a while. And good luck hunting down inexplicably running containers when there is no apparent CPU or network use (Google will happily charge you for them).
Additionally, startup time is often abysmal with Java and Python projects (it might be better with Go/C++/Rust, but I don't have experience running those on Cloud Run).
> It's nice for toy projects, but it's a money sink for anything serious, at least from my experience.
This is really not my experience with Cloud Run at all. We've found it to actually be quite cost-effective for a lot of different types of systems. For example, we ended up helping a customer migrate a ~$5B/year ecommerce platform onto it (mostly Java/Spring and TypeScript services). We originally told them they should target GKE, but they were adamant about serverless, and it ended up being a perfect fit. They were paying like $5k/mo, which is absurdly cheap for a platform generating that kind of revenue.
I guess it depends on the nature of each workload, but for businesses that tend to "follow the sun" I've found it to be a great solution, especially when you consider how little operations overhead there is with it.
Maybe I just don't know, but I really don't think most people here can even point to a cloud GPU service handling 1,000 concurrent users that doesn't end in a million-dollar bill.
All the cruft of a big cloud provider, AND the joy of uncapped yolo billing that has the potential to drain your credit card overnight. No thanks, I'll personally stick with Modal and vast.ai.
Not providing a cap on spending is a major flaw of GCP for individuals / small projects.
With Cloud Run, AFAIK, spending can effectively be capped by: limiting concurrency, plus limiting the max number of instances it can scale to. (But this is not as good as GCP having a proper cap.)
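As a sketch with a hypothetical service name (both flags exist on gcloud's Cloud Run surface):

$ gcloud run services update my-service \
    --max-instances=5 \
    --concurrency=80

Worst-case spend is then bounded by five instances running flat out.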
Amazon is the same I think? I live in constant fear we will have a runaway job one day. I get daily emails to myself (as a manager) and to my finance person. We had one instance where a team member forgot to turn off a machine for a few months :(
I get why it is a business strategy to not have limits... but I wonder if providers would get more usage if people had more trust in cost predictability.
[edit - Gabe responded]. See this Cloud Run spending-cap recommendation [0]: disabling billing potentially deletes resources irreversibly, but it does cap spend!
[0] https://cloud.google.com/billing/docs/how-to/disable-billing...
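The pattern in that doc is a budget whose Pub/Sub notifications trigger a function that detaches the billing account. The budget half might look like the following sketch (account ID, topic, and amounts are placeholders):

$ gcloud billing budgets create \
    --billing-account=XXXXXX-XXXXXX-XXXXXX \
    --display-name="kill-switch" \
    --budget-amount=100USD \
    --threshold-rule=percent=0.9 \
    --notifications-rule-pubsub-topic=projects/my-project/topics/budget-alerts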
I dunno, the scale-to-zero and pay-per-second features seemed super useful to me after forgetting to shut down some training instances on AWS. Also the fast startup, if it actually works as well as they say, would be amazing for a lot of the types of workloads I have.
I've abandoned DataDog in production for just this reason. Is the amount of money they make on dinging people who screw up really worth the ill-will and people who decide they're just not going to start projects on these platforms?
You can set max instances in Cloud Run, which is an effective limit on how much you'll spend.
Also, hard dollar caps are rarely if ever the right choice. App Engine used to have these, and the practical effect was that your website would completely stop working exactly when you least want it to (posted on HN etc).
It's better to set billing alerts and make the call yourself if they go off.
> Also, hard dollar caps are rarely if ever the right choice.
Depends on whether you're a big business or an individual. There is absolutely no reason I would ever pay $100k for a traffic burst on my personal site or side project (like the $100k Netlify case a few months ago).
> It's better to set billing alerts and make the call yourself if they go off.
Billing alerts are not instant, and nobody is online 24x7 monitoring the alerts.
One bad actor / misconfiguration / attack can put you out of business. It's not the safest strategy to allow unlimited liability, in business or for personal projects.
RunPod is pretty great. I wrote a generic endpoint script that I can deploy in seconds, download the models to the pod, and I'm ready to go. Plus, I forgot and left a pod around (but stopped) for a week, and it was like $0.60, and they emailed me like 3 times reminding me of the pod.
Cloud Run is great, but no billing limits is too scary. No idea why they don't address this. They must know that if they supported individuals, we'd eventually host our SaaSes there.
I've never used Modal or vast.ai, and from their pages it's not obvious how they solve the yolo-billing issue. Are they pre-paid, or do they support caps?
Google's pricing also assumes you're running it 24/7 for an entire month, whereas this is just the hourly price for runpod.io or vast.ai, which both bill per second. I wasn't able to find Google's spot pricing for GPUs.
Where did you get the pricing for vast.ai here? Looking at their pricing page, I don't see any 8xH200 options for less than $21.65 an hour (and most are more than that).
> Google's pricing also assumes you're running it 24/7 for an entire month
What makes you think that?
Cloud Run's [pricing page](https://cloud.google.com/run/pricing) explicitly says: "charge you only for the resources you use, rounded up to the nearest 100 millisecond"
Because the pricing when creating an instance shows me the cost for the entire month, then works out the average hourly price from that. This is just creating a GPU VM instance; I don't see how to see the cost of different NVIDIA GPUs without it.
If you wanted to show hourly pricing, you would show that first, then calculate the monthly price from the hourly rate. I've no idea whether the monthly cost includes the sustained-use discount, or what the hourly cost is for running it just one hour.
You can just go to "create compute instance" to see the spot pricing.
E.g., the GCP price for a spot 1x H100 is $2.55/hr, lower with sustained-use discounts. But only hobbyists pay these prices; any company is going to ask for a discount and will get it.
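For anyone who prefers the CLI to the console for this, a spot GPU VM can be created directly; a sketch with hypothetical names (a2-highgpu-1g bundles one A100):

$ gcloud compute instances create gpu-spot-test \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --provisioning-model=SPOT \
    --instance-termination-action=STOP

Spot instances can be preempted at any time, hence the termination action.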
RunPod also charges per second [1]. Also, this is Google's expected average cost per hour after running 24/7 for an entire month; I couldn't find an hourly cost for each GPU.
When you need under an hour, you can go with RunPod's spot pricing, which is ~4-7x cheaper than Google; even 20 minutes on Google would cost more than an hour on RunPod.
[1] https://docs.runpod.io/serverless/pricing
I’m personally a huge fan of Modal, and have been using their serverless scale-to-zero GPUs for a while. We’ve seen some nice cost reductions from using them, while also being able to scale WAY UP when needed. All with minimal development effort.
Interesting to see a big provider entering this space. Originally swapped to Modal because big providers weren’t offering this (e.g. AWS lambdas can’t run on GPU instances). Assuming all providers are going to start moving towards offering this?
Modal is great, they even released a deep dive into their LP solver for how they're able to get GPUs so quickly (and cheaply).
Coiled is another option worth looking at if you're a Python developer. Not nearly as fast on cold start as Modal, but similarly easy to use and great for spinning up GPU-backed VMs for bursty workloads. Everything runs in your cloud account. The built-in package sync is also pretty nice, it auto-installs CUDA drivers and Python dependencies from your local dev context.
(Disclaimer: I work with Coiled, but genuinely think it's a good option for GPU serverless-ish workflows.)
The reason Cloud Run is so nice compared to other providers is that it has autoscaling, including scale to zero, meaning it can cost basically nothing when not in use. You can also set a cap on the scaling, e.g. 5 instances max, which caps the max cost of the service too. (Note: I only have experience with the CPU version of Cloud Run, which is very reliable and easy.)
Not to mention, if it's an ML workload, you'll also have to factor in downloading the weights and loading them into memory, which can double that time or more.
That's funny. You can get a 1x H100 80GB VM at lambda.ai for $2.49/hour. At the current exchange rate, that's exactly 2.19€. Coincidence, or is this actually some kind of ceiling?
Or go P2P with Vast.ai; the cheapest A100 right now is a setup with 2x A100 for $0.80/hour (so $0.40 per A100). Not affiliated with them, but a mostly happy user. Be wary of network speeds, though: some hosts are clearly on shared bandwidth, and reported numbers don't always line up with reality, which kind of sucks when you're trying to shuffle around 100GB of data.
If I understand this correctly, I should be able to stand up an API running arbitrary models (e.g. from Hugging Face), and it’s not quite charged by the token but should be very cheap if my usage is sporadic. Is that correct? Seems pretty huge if so, most of the providers I looked at required a monthly fee to run a custom model.
Yes, that's basically correct. Except be warned that cold start times can be huge (30-60 seconds). So scaling to zero doesn't really work in practice, unless your users are happy to wait from time to time. Also, you have to pay a small monthly fee for container storage (and a few other charges, IIRC).
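If cold starts are the blocker, the usual trade-off is to keep a warm floor and give up scale-to-zero (hypothetical service name; --min-instances is a real Cloud Run flag):

$ gcloud run services update my-service --min-instances=1

You then pay for the idle instance, which is exactly the VM-vs-serverless math from the top of the thread.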
[1] https://ashishb.net/programming/free-deployment-of-side-proj...
You go there because you are already there, or have contracts, etc.
Does Cloud Run give you root?
Cap billing, and you have created an outage waiting to happen, one that will be triggered exactly when you have a sudden burst of success.
Don't cap billing, and you have created a bankruptcy waiting to happen.
Feature for Google's profits.
I think it is.
1) They make money on services they provided, instead of digging into what the customer actually wanted.
2) Small-time customers move away, so they can concentrate their energy on big enterprise sales.
Not justifying anything here, but it does kind of make business sense for them.
This made me laugh out loud, thank you for this!
Also, Cloud Run's [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling) is in effect, scaling down idle instances after a maximum of 15 minutes.
(Cloud Run PM)
Modal has the fastest cold-start I’ve seen for 10GB+ models.
1x A100 80GB: 1.37€/hour
1x H100 80GB: 2.19€/hour