ashishb · 3 months ago
I love Google Cloud Run and highly recommend it as the best option[1]. Cloud Run GPU, however, is not something I can recommend. It is not cost-effective (instance-based billing is expensive compared to request-based billing), GPU choices are limited, and loading/unloading gigabytes of model weights into and out of GPU memory makes it slow to use as serverless.

Once you compare the numbers, a VM + GPU works out cheaper if your service is utilized for even just 30% of the day.
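
As a rough back-of-envelope (the rates here are illustrative, not anyone's current list prices): with a dedicated VM + GPU at ~$2.00/hr and a serverless equivalent billing ~$6.70/hr while active,

    break-even utilization = VM hourly rate / serverless hourly rate
                           ≈ $2.00 / $6.70
                           ≈ 30% of the day

Below that utilization serverless wins; above it, the always-on VM is cheaper.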

1 - https://ashishb.net/programming/free-deployment-of-side-proj...

gabe_monroy · 3 months ago
google vp here: we appreciate the feedback! i generally agree that if you have a strong understanding of your static capacity needs, pre-provisioning VMs is likely to be more cost efficient with today's pricing. cloud run GPUs are ideal for more bursty workloads -- maybe a new AI app that doesn't yet have PMF, where you really need that scale-to-zero + fast start for more sparse traffic patterns.
jakecodes · 3 months ago
Appreciate the thoughtful response! I’m actually right in the ICP you described — I’ve run my own VMs in the past and recently switched to Cloud Run to simplify ops and take advantage of scale-to-zero. In my case, I was running a few inference jobs and expected a ~$100 bill. But due to the instance-based behavior, it stayed up the whole time, and I ended up with a $1,000 charge for relatively little usage.

I’m fairly experienced with GCP, but even then, the billing model here caught me off guard. When you’re dealing with machines that can run up to $64K/month, small missteps get expensive quickly. Predictability is key, and I’d love to see more safeguards or clearer cost modeling tooling around these types of workloads.

krembo · 3 months ago
How does that compare to spinning up some EC2 instances with Amazon Trainium accelerators?
Sn0wCoder · 3 months ago
Has this changed? When I looked pre-GA, the requirement was that you pay for the CPU 24x7 to attach a GPU, which is not really scaling to zero...
icedchai · 3 months ago
Cloud Run is a great service. I find it much easier to work with than AWS's equivalent (ECS/Fargate).
psanford · 3 months ago
AWS App Runner is the closest equivalent to Cloud Run. It's really not close, though; App Runner is an unloved service at AWS and is missing a lot of the features that make Cloud Run nice.
gabe_monroy · 3 months ago
i am biased, but i agree :)
AChampaign · 3 months ago
I think Lambda is more or less the AWS equivalent.
mountainriver · 3 months ago
The problem is you can't reliably get VMs on GCP.

All the major clouds are suffering from this. On AWS you can't ever get an 80GB GPU without a long-term reservation, and even then it's wildly expensive. On GCP you sometimes can, but it's also insanely expensive.

These companies claim to be "startup friendly", but they are anything but. The neo-clouds (RunPod, Nebius, Lambda) all somehow manage to do this well, while the big clouds just milk enterprise customers who won't leave and screw over startups in the process.

This is a massive mistake, and it will hurt their long-term growth significantly.

covi · 3 months ago
To massively increase the odds of getting GPUs, you can use something like SkyPilot (https://github.com/skypilot-org/skypilot) to fall back across regions, clouds, or GPU choices. E.g.,

$ sky launch --gpus H100

will fall back across GCP regions, AWS, your own clusters, etc. There are also options to say "try either H100 or H200 or A100 or <insert>"; see the sketch below.

Essentially the way you deal with it is to increase the infra search space.
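
For the multi-candidate case, my understanding (treat the exact syntax as an assumption and check the SkyPilot docs) is that the task YAML accepts a set of acceptable accelerators:

    # task.yaml -- any one of these GPUs is acceptable
    resources:
      accelerators: {H100:1, H200:1, A100:1}

    $ sky launch task.yaml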

rendaw · 3 months ago
We've run into this a lot lately too, even on AWS. "Elastic" compute, but all the elasticity's gone. It's especially bitter since sharing the cost of spare capacity is the major benefit of scale here...
dconden · 3 months ago
Agreed. Pricing is insane and availability generally sucks.

If anyone is curious about these neo-clouds, a YC startup called Shadeform has their availability and pricing in a live database here: https://www.shadeform.ai/instances

They have a platform where you can deploy VMs and bare metal across 20 or so popular providers like Lambda, Nebius, Scaleway, etc.

bodantogat · 3 months ago
I had the opposite experience with Cloud Run: mysterious scale-outs and restarts. I had to buy a paid cloud support subscription to get answers, and found none. I moved to self-managed VMs. Maybe things have changed now.
PaulMest · 3 months ago
Sadly this is still the case. Cloud Run helped us get off the ground, but we've had two outages where Google Enhanced Support could give us no suggestion other than "increase the maximum instances" (not minimum instances). We were doing something like 13 requests/min on this instance at the time, and resource utilization looked just fine, but somehow we hit a blip where no containers were available. It even dropped below our minimum instance count. The fix was to manually redeploy the latest revision.

We're now investigating moving to Kubernetes where we will have more control over our destiny. Thankfully a couple people on the team have experience with this.

Something like this never happened with Fargate in the years my previous team used it.

ajayvk · 3 months ago
https://github.com/claceio/clace is a project I am building that gives a Cloud Run-type deployment experience on your own VMs. For each app, it supports scaling down to zero containers (scaling up beyond one is being built).

The authorization and auditing features are designed for internal tools, but any kind of app can be deployed.

Bombthecat · 3 months ago
You don't go to cloud services because they are cheaper.

You go there because you are already there, have contracts, etc.

JoshTriplett · 3 months ago
Does Cloud Run still use a fake Linux kernel emulated by Go, rather than a real VM?

Does Cloud Run give you root?

seabrookmx · 3 months ago
You're thinking of gVisor. But no, the "gen2" runtime is a microVM a la Firecracker, and performs a lot better as a result.
rpei · 3 months ago
We (I work on Cloud Run) are working on root access. If you'd like to know more, you can reach me at rpei@google.com.
dig1 · 3 months ago
> I love Google Cloud Run and highly recommend it as the best option

I'd love to see the numbers for Cloud Run. It's nice for toy projects, but it's a money sink for anything serious, at least in my experience. On one project, we had a long-standing issue with G regarding autoscaling: scaling to zero sounds nice on paper, but they won't mention the warmup phases, where CR can spin up multiple containers for a single request and keep them around for a while. And good luck hunting down inexplicably running containers when there is no apparent CPU or network usage (G will happily charge you for them).

Additionally, startup time is often abysmal for Java and Python projects (it might perform better with Go/C++/Rust projects, but I don't have experience running those on CR).

tylertreat · 3 months ago
> It's nice for toy projects, but it's a money sink for anything serious, at least from my experience.

This is really not my experience with Cloud Run at all. We've found it to actually be quite cost effective for a lot of different types of systems. For example, we ended up helping a customer migrate a ~$5B/year ecommerce platform onto it (mostly Java/Spring and Typescript services). We originally told them they should target GKE but they were adamant about serverless and it ended up being a perfect fit. They were paying like $5k/mo which is absurdly cheap for a platform generating that kind of revenue.

I guess it depends on the nature of each workload, but for businesses that tend to "follow the sun" I've found it to be a great solution, especially when you consider how little operations overhead there is with it.

ivape · 3 months ago
Maybe I just don't know, but I really don't think most people here can point to a cloud GPU setup serving 1,000 concurrent users that doesn't end up with a million-dollar bill.
isoprophlex · 3 months ago
All the cruft of a big cloud provider, AND the joy of uncapped yolo billing that has the potential to drain your credit card overnight. No thanks, I'll personally stick with Modal and vast.ai.
montebicyclelo · 3 months ago
Not providing a cap on spending is a major flaw of GCP for individuals / small projects.

With Cloud Run, AFAIK, spending can effectively be capped by limiting concurrency plus limiting the max number of instances it can scale to (though this is not as good as a proper cap from GCP). For example:
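
A sketch (the service and image names are placeholders, and flags are from memory, so verify against the gcloud docs):

    $ gcloud run deploy my-service \
        --image us-docker.pkg.dev/my-project/repo/my-app \
        --concurrency 80 \
        --max-instances 5

Worst-case spend is then bounded by 5 instances running flat-out, which you can price ahead of time.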

brutus1213 · 3 months ago
Amazon is the same, I think? I live in constant fear that we will have a runaway job one day. I get daily emails to myself (as a manager) and to my finance person. We had one instance where a team member forgot to turn off a machine for a few months :(

I get why it is a business strategy to not have limits, but I wonder if providers would get more usage if people had more trust in cost predictability.

yarri · 3 months ago
[edit - Gabe responded]. See this Cloud Run spending-cap recommendation [0]: disable billing, which can irreversibly delete resources, but does cap spend!

[0] https://cloud.google.com/billing/docs/how-to/disable-billing...

gabe_monroy · 3 months ago
Heard on this feedback. While not quite a hard cap, I'd also point to https://cloud.google.com/billing/docs/how-to/budgets which many customers are having success with for this use case.
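
For anyone who prefers the CLI, a budget with alert thresholds can be scripted, roughly like this (flag names from memory and the billing account ID is a placeholder; note that budgets notify, they don't stop spend):

    $ gcloud billing budgets create \
        --billing-account=0X0X0X-0X0X0X-0X0X0X \
        --display-name="monthly-budget" \
        --budget-amount=100USD \
        --threshold-rule=percent=0.5 \
        --threshold-rule=percent=0.9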
advisedwang · 3 months ago
It's a rock and a hard place for the cloud providers.

Cap billing, and you have created an outage waiting to happen, one that will be triggered exactly when a customer's success suddenly takes off.

Don't cap billing, and you have created a bankruptcy waiting to happen.

delfinom · 3 months ago
Flaw? Nah

Feature for Google's profits.

kamranjon · 3 months ago
I dunno, the scale-to-zero and pay-per-second features seemed super useful to me after I forgot to shut down some training instances on AWS. Also the fast startup, if it actually works as well as they say, would be amazing for a lot of the types of workloads that I have.
isoprophlex · 3 months ago
Agreed, but runpod or modal offer the same. Happy to use big cloud for a client if they pay the bills, but for personal quests... too scary.
petesergeant · 3 months ago
I've abandoned DataDog in production for just this reason. Is the amount of money they make on dinging people who screw up really worth the ill-will, and the people who decide they're just not going to start projects on these platforms?
geodel · 3 months ago
> Is the amount of money they make on dinging people who screw up really worth the ill-will

I think it is.

1) They make money for the services they provide, instead of having to interpret what the customer actually wanted.

2) Small-time customers move away, so they can concentrate their energy on big enterprise sales.

Not justifying anything here, but it just kind of makes business sense for them.

decimalenough · 3 months ago
You can set max instances in Cloud Run, which is an effective limit on how much you'll spend.

Also, hard dollar caps are rarely, if ever, the right choice. App Engine used to have them, and the practical effect was that your website would completely stop working exactly when you least wanted it to (e.g. after being posted on HN).

It's better to set billing alerts and make the call yourself if they go off.

rustc · 3 months ago
> Also, hard dollar caps are rarely if ever the right choice.

Depends on if you're a big business or an individual. There is absolutely no reason I would ever pay $100k for a traffic burst on my personal site or side project (like the $100k Netlify case a few months ago).

> It's better to set billing alerts and make the call yourself if they go off.

Billing alerts are not instant, and nobody is online 24x7 monitoring them.

ipaddr · 3 months ago
One bad actor / misconfiguration / attack can put you out of business. It's not the safest strategy to allow unlimited liability, in business or for personal projects.
spacecadet · 3 months ago
RunPod is pretty great. I wrote a generic endpoint script that I can deploy in seconds, download the models to the pod, and I'm ready to go. Plus, I once forgot and left a pod around, stopped, for a week; it was like $0.60, and they emailed me like 3 times reminding me of the pod.
nprateem · 3 months ago
Cloud Run is great, but no billing limits is too scary. No idea why they don't address this. They must know that if they supported individuals, we'd eventually leave our SaaSes there.
randlet · 3 months ago
Setting max instances effectively caps your spend, right?
weinzierl · 3 months ago
I've never used Modal or vast.ai, and from their pages it was not obvious how they solve the yolo-billing issue. Are they pre-paid, or do they support caps?
thundergolfer · 3 months ago
Engineer from Modal here: we support caps. They kick in within ~2s if your usage exceeds the configured limit.
sharifhsn · 3 months ago
I know vast.ai uses a prepaid credits system.
oldandboring · 3 months ago
> uncapped yolo billing

This made me laugh out loud, thank you for this!

rikafurude21 · 3 months ago
that's what billing limits are for
isoprophlex · 3 months ago
Unless something has changed, GCP only does billing alerts, not billing limits.
aiiizzz · 3 months ago
Those, on GCP, are just alerts, not hard limits, no?
mythz · 3 months ago
The pricing doesn't look that compelling, here are the hourly rate comparisons vs runpod.io vs vast.ai:

    1x L4 24GB:    google:  $0.71; runpod.io:  $0.43, spot: $0.22
    4x L4 24GB:    google:  $4.00; runpod.io:  $1.72, spot: $0.88
    1x A100 80GB:  google:  $5.07; runpod.io:  $1.64, spot: $0.82; vast.ai  $0.880, spot:  $0.501
    1x H100 80GB:  google: $11.06; runpod.io:  $2.79, spot: $1.65; vast.ai  $1.535, spot:  $0.473
    8x H200 141GB: google: $88.08; runpod.io: $31.92;              vast.ai $15.470, spot: $14.563
Google's pricing also assumes you're running it 24/7 for an entire month, whereas the above is just the hourly price for runpod.io and vast.ai, which both bill per second. I wasn't able to find Google's spot pricing for GPUs.

otherjason · 3 months ago
Where did you get the pricing for vast.ai here? Looking at their pricing page, I don't see any 8xH200 options for less than $21.65 an hour (and most are more than that).
zackangelo · 3 months ago
I think it’s a typo, looks pretty close to their 8xH100 prices.
steren · 3 months ago
> Google's pricing also assumes you're running it 24/7 for an entire month

What makes you think that?

Cloud Run's [pricing page](https://cloud.google.com/run/pricing) explicitly says: "charge you only for the resources you use, rounded up to the nearest 100 millisecond".

Also, Cloud Run's [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling) is in effect, scaling down idle instances after a maximum of 15 minutes.

(Cloud Run PM)

mythz · 3 months ago
Because the pricing shown when creating an instance is the cost for the entire month, with the average hourly price worked out from that. This was just for creating a GPU VM instance; I don't see another way to get the cost of the different Nvidia GPUs.

If you wanted to show hourly pricing, you would show that first, then calculate the monthly price from the hourly rate. I've no idea if the monthly cost includes the sustained-use discount, or what the cost would be for running it for just an hour.

progbits · 3 months ago
You can just go to "create compute instance" to see the spot pricing.

E.g. the GCP price for a spot 1x H100 is $2.55/hr, lower with sustained-use discounts. But only hobbyists pay these prices; any company is going to ask for a discount and will get it.

counters · 3 months ago
Nothing but 1x L4 is even offered on Cloud Run GPUs, right?
ZiiS · 3 months ago
I think the Google prices are billed per second, so for under 20 minutes you are better off on Google?
mythz · 3 months ago
RunPod also charges per second [1]. Also, this is Google's expected average cost per hour when running 24/7 for an entire month; I couldn't find an hourly cost for each GPU.

When you need under an hour, you can go with RunPod's spot pricing, which is ~4-7x cheaper than Google; even 20 minutes on Google would cost more than 1 hour on RunPod.
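
Using the H100 figures from the table above:

    20 min on Google:  $11.06/hr x (20/60) ≈ $3.69
    60 min on RunPod:  $2.79 on-demand, or $1.65 spot

So even a 20-minute run on Google costs more than a full hour on RunPod on-demand.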

[1] https://docs.runpod.io/serverless/pricing

thousand_nights · 3 months ago
runpod is billed by the minute
jbarrow · 3 months ago
I’m personally a huge fan of Modal, and have been using their serverless scale-to-zero GPUs for a while. We’ve seen some nice cost reductions from using them, while also being able to scale WAY UP when needed. All with minimal development effort.

Interesting to see a big provider entering this space. I originally swapped to Modal because the big providers weren't offering this (e.g. AWS Lambdas can't run on GPU instances). I assume all providers are going to start moving towards offering this?

scj13 · 3 months ago
Modal is great; they even released a deep dive into the LP solver behind how they're able to get GPUs so quickly (and cheaply).

Coiled is another option worth looking at if you're a Python developer. Not nearly as fast on cold start as Modal, but similarly easy to use and great for spinning up GPU-backed VMs for bursty workloads. Everything runs in your cloud account. The built-in package sync is also pretty nice: it auto-installs CUDA drivers and Python dependencies from your local dev context.

(Disclaimer: I work with Coiled, but genuinely think it's a good option for GPU serverless-ish workflows.)

AndresSRG · 3 months ago
I’m also a big fan.

Modal has the fastest cold-start I’ve seen for 10GB+ models.

dr_kiszonka · 3 months ago
Thanks for sharing! They even support running HIPAA-compliant workloads, which I didn't anticipate.
chrishare · 3 months ago
Modal documentation is also very good.
montebicyclelo · 3 months ago
The reason Cloud Run is so nice compared to other providers is that it has autoscaling with scale to zero, meaning it can cost basically nothing when it's not being used. You can also set a cap on the scaling, e.g. 5 instances max, which caps the maximum cost of the service too. (Note: I only have experience with the CPU version of Cloud Run, which is very reliable and easy.)
rvnx · 3 months ago
Even regular Cloud Run can take a lot of time to boot (~3 to 30 seconds), so this can be a problem when scaling to 0
gizzlon · 3 months ago
That's not my experience, using Go. I've never measured it, but it scales to zero all the time, so I would definitely have noticed more than a couple of seconds.
mdhb · 3 months ago
I'm looking at logs for a service I run on Cloud Run right now which scales to zero. Boot times are approximately 200ms for a Dart backend.
lexandstuff · 3 months ago
Not to mention, if it's an ML workload, you'll also have to factor in downloading the weights and loading them into memory, which can double that time or more.
huksley · 3 months ago
A small, independent EU GPU cloud provider, DataCrunch (I am not affiliated), offers VMs with Nvidia GPUs even cheaper than RunPod, etc.:

1x A100 80GB: 1.37€/hour

1x H100 80GB: 2.19€/hour

sigmoid10 · 3 months ago
That's funny. You can get a 1x H100 80GB VM at lambda.ai for $2.49/hour. At the current exchange rate, that's exactly 2.19€. Coincidence, or is this actually some kind of ceiling?
diggan · 3 months ago
Or go P2P with Vast.ai; the cheapest A100 right now is a setup with 2x A100 for $0.8/hour (so $0.40 per A100). Not affiliated with them, but a mostly happy user. Be wary of network speeds, though; some hosts are clearly on shared bandwidth, and reported numbers don't always line up with reality, which kind of sucks when you're trying to shuffle around 100GB of data.
triknomeister · 3 months ago
You really need NVL for some performance.
gabe_monroy · 3 months ago
i'm the vp/gm responsible for cloud run and GKE. great to see the interest in this! happy to answer questions on this thread.
lemming · 3 months ago
If I understand this correctly, I should be able to stand up an API running arbitrary models (e.g. from Hugging Face), and it's not quite charged by the token but should be very cheap if my usage is sporadic. Is that correct? Seems pretty huge if so; most of the providers I looked at required a monthly fee to run a custom model.
lexandstuff · 3 months ago
Yes, that's basically correct, except be warned that cold start times can be huge (30-60 seconds), so scaling to zero doesn't really work in practice unless your users are happy to wait from time to time. Also, you have to pay a small monthly fee for container storage (and a few other charges, IIRC).
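
For reference, a deploy sketch (the service and image names are placeholders, and the flags are from memory of the GPU preview docs; minimums and exact flag names may have changed, so double-check):

    $ gcloud beta run deploy hf-inference \
        --image us-docker.pkg.dev/my-project/repo/hf-server \
        --gpu 1 --gpu-type nvidia-l4 \
        --cpu 4 --memory 16Gi \
        --no-cpu-throttling \
        --max-instances 1

With scale-to-zero you pay only while an instance is up, but every cold start re-pays the model download/load time mentioned above.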
42lux · 3 months ago
RunPod, Vast, CoreWeave, Replicate... just a bunch of alternatives that let you run serverless GPU inference.
_zoltan_ · 3 months ago
you can't just sign up for coreweave, can you?