Hi HN,
I work with a company that has a few GPU-intensive ML models. Over the past few weeks, growth has accelerated, and with that costs have skyrocketed. AWS cost is about 80% of revenue, and the company is now almost out of runway.
There is likely a lot of low-hanging cost-saving fruit to be picked, just not enough people to do it. We would love pointers to anyone who specializes in cost optimization. Blogs, individuals, consultants, or magicians are all welcome.
Thank you!
Sorry to hear that. I’m sure it’s super stressful, and I hope you pull through. If you can, I’d suggest giving a little more information about your costs / workload to get more help. But, in case you only see yet another guess, mine is below.
If your growth has accelerated yielding massive cost, I assume that means you’re doing inference to serve your models. As suggested by others, there are a few great options if you haven’t already:
- Try spot instances: while you'll get preempted, you do get a couple of minutes to shut down (so for model serving, you just stop accepting requests, finish the ones you're handling, and exit; a rough sketch of that shutdown loop is below this list). This is worth a 60-90% reduction in compute cost.
- If you aren't using T4 instances, they're probably the best price/performance for GPU inference. A V100, by comparison, is up to 5-10x more expensive.
- Whichever GPU you use, your models should be taking advantage of int8 if possible. This alone may let you pack more requests onto each GPU (another 2x+; a quantization sketch is also below the list).
- You could try model pruning. This is perhaps the most delicate option, but look at how people compress models for mobile. It has a similar effect of packing more weights into smaller GPUs; alternatively, you can use a much simpler model (fewer weights and fewer connections usually also means far fewer FLOPs).
- But just as important: why do you need a GPU for your models at all? (Usually it's to serve a large-ish / expensive model quickly enough.) If the alternative is going out of business, try CPU inference, again on spot instances (like the c5 series). Vectorized CPU inference isn't bad at all!
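Here is the rough sketch of that spot shutdown loop, assuming a Python serving process. The metadata URL is the real EC2 spot-interruption endpoint, but stop_accepting_requests() and wait_for_inflight_requests() are placeholders for whatever your server actually does:

    import time
    import urllib.error
    import urllib.request

    # EC2 instance metadata endpoint for spot interruption notices.
    # It returns 404 until AWS schedules a reclaim, then you get ~2 minutes.
    # (If you enforce IMDSv2 you'd need to fetch a session token first.)
    SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

    def interruption_pending():
        try:
            urllib.request.urlopen(SPOT_ACTION_URL, timeout=1)
            return True                  # 200: reclaim scheduled
        except urllib.error.HTTPError:
            return False                 # 404: nothing scheduled yet
        except urllib.error.URLError:
            return False                 # metadata service unreachable

    while not interruption_pending():
        time.sleep(5)

    # Placeholders: wire these up to your actual server.
    stop_accepting_requests()            # e.g. start failing the LB health check
    wait_for_inflight_requests()         # finish what you already accepted
    raise SystemExit(0)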
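And a minimal sketch of one way to get int8: post-training quantization with the TFLite converter. The ./saved_model path and calibration_inputs() loader are assumptions, and on T4s the TensorRT/TF-TRT route is the other common option:

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Calibration data: a few hundred real input batches you supply yourself.
    def representative_batches():
        for batch in calibration_inputs():      # placeholder data loader
            yield [batch]

    converter.representative_dataset = representative_batches
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())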
If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.
Remember, your users / customers won't somehow be happier when you're out of business in a month. Making all requests suddenly take 3x as long on a CPU, or sometimes fail, is better than "always fail, we had to shut down the company". They'll understand!
I stopped using GPUs. "Vectorized inference isn't bad at all!" This, so much. I was blinded by GPU speed; TensorFlow builds with AVX optimization are actually pretty fast.
My discovery:
+ Stopped using expensive GPUs for inference and switched to AVX-optimized TensorFlow builds (sketch below).
+ Cleaned up the inference pipeline and reduced complexity.
+ Reserving compute instances for a year or more provides a discount.
- I never got pruning to work without a significant loss increase.
- Tried cheaper GPU spot instances. Random kills, and spinning up new instances took too long to load my code. The discount is large, but I couldn't keep it running reliably; users were getting more timeouts. I bailed and just used CPU inference. The GPU was being underutilized anyway; switching to CPU increased inference time to around 2-3 seconds. With the price trade-off it was a simpler, cheaper and easier solution.
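For anyone curious, the CPU-only path is not much code. A minimal sketch, assuming a TensorFlow 2 SavedModel; the path, input shape and thread counts are placeholders you'd match to your model and instance size:

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # force CPU even if a GPU exists

    import numpy as np
    import tensorflow as tf

    # Match threads to the instance's vCPUs (e.g. 8 on a c5.2xlarge).
    tf.config.threading.set_intra_op_parallelism_threads(8)
    tf.config.threading.set_inter_op_parallelism_threads(2)

    model = tf.saved_model.load("./saved_model")
    infer = model.signatures["serving_default"]

    batch = np.random.rand(8, 224, 224, 3).astype(np.float32)  # placeholder input
    outputs = infer(tf.constant(batch))   # some signatures want named inputs instead

    # TF logs at startup which CPU instructions (AVX/AVX2/FMA) the binary was
    # built to use; a custom AVX2 build can beat the stock pip wheel.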
You don't provide a lot of detail but I imagine at this point you need to get "creative" and move at least some aspect of your operation out of AWS. Some variation of:
- Buy some hardware and host it at home/office/etc.
- Buy some hardware and put it in a colocation facility.
- Buy a lot of hardware and put it in a few places.
Etc.
Cash and accounting is another problem. Hardware manufacturers offer financing (leasing). Third party finance companies offer lines of credit, special leasing, etc. Even paying cash outright can (in certain cases) be beneficial from a tax standpoint. If you're in the US there's even the best of both worlds: a Section 179 deduction on a lease!
https://www.section179.org/section_179_leases/
You don't even need to get your hands dirty. Last I checked it was pretty easy to get financing from Dell, pay next to nothing to get started, and have hardware shipped directly to a colocation facility. Remote hands rack and configure it for you. You get a notification with a system to log into, just like an AWS instance. All in at a fraction of the cost. The dreaded (actually very rare) hardware failure? That's what the warranty is for. Dell will dispatch people to the facility and replace XYZ as needed. You never need to physically touch anything.
A little more complicated than creating an AWS account with a credit card number? Of course. More management? Slightly. But at the end of the day it's a fraction of the total cost and probably even advantageous from a taxation standpoint.
AWS and public clouds really shine in some use cases and absolutely suck at others (as in suck the cash right out of your pockets).
Go for some colocation facility where costs are predictable.
A balanced approach is to only put the most expensive hardware portion of the business with the smallest availability requirement in colo, and horizontally scale it over time. Simultaneously use a cloud provider to execute on the cheap stuff fast and reliably.
And when they aren't the best, it's often because you don't know what you're doing.
It's all too common for people to over-provision, or to go with too many services when they don't need to.
Like: let's have a database, a cache service, and a search service, when 95% of the time they only need the database, because it can do full-text search well enough, they don't have the traffic to warrant caching in Redis, and basic caching covers the rest.
They don't take advantage of auto scaling groups, and they run over-provisioned instances 24/7.
I've seen database instances where, when things get slow, people throw more hardware at it instead of optimising the queries and analysing / adding indexes.
The biggest cloud-provider cost is outbound data. The rest is almost always the developers' problem.
That is, I am not sure "public cloud, if you spend lots of effort to optimize it and ask devs to be careful, can be as cheap as a naive on-prem implementation where devs don't need to be careful" is an argument for public cloud.
We really need some more details on your infrastructure, but I assume it's EC2 instance cost that skyrocketed?
A couple of pointers:
- Experiment with different GPU instance types.
- Try Inferentia [1], a dedicated ML chip. Most popular ML frameworks are supported by the Neuron compiler.
Assuming you manage your instances in an auto scaling group (ASG):
- Enable a target tracking scaling policy [2] to reactively scale your fleet. The best scaling metric depends on your inference workload (a rough sketch is below the links).
- If your workload is predictable (e.g. high traffic during the day, low traffic at night), enable predictive scaling [3].
[1] https://aws.amazon.com/machine-learning/inferentia/
[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-sca...
[3] https://docs.aws.amazon.com/autoscaling/plans/userguide/how-...
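The sketch mentioned above: creating a target tracking policy from Python with boto3. The ASG name, policy name, and 60% target are assumptions; average CPU is only a reasonable proxy if your inference is CPU-bound, otherwise publish a custom metric (e.g. queue depth per instance) and track that instead.

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="inference-asg",          # assumed ASG name
        PolicyName="target-tracking-cpu-60",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 60.0,                       # keep the fleet near 60% CPU
        },
    )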
It's unlikely that your users are going to notice the accuracy difference between a linear model and the GPU-intensive one unless you are doing computer vision. If you have small datasets, you might even find the linear model works better.
So it won't affect revenue, but it will cut costs to almost nothing.
Supporting evidence: I just completed this kind of migration for a Bay Area client (even though I live in Australia). Training (for all customers simultaneously) now runs on a single t3.small, replacing a very large and complicated setup.
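For illustration, a minimal sketch of the kind of linear baseline I mean, assuming scikit-learn and tabular features; X_train / y_train / X_test / y_test are placeholders for whatever the GPU model currently consumes:

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # X_train, y_train, X_test, y_test: your existing features and labels.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    print(model.score(X_test, y_test))    # compare against the GPU model's metric
    probs = model.predict_proba(X_test)   # per-row inference takes microseconds on a t3.small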
One piece of advice: speak to your AWS rep immediately. Get credits to redesign your system and keep you running. You can expect up to seven digits in credits (for real!) and support for a year for free; they really want to help you avoid this.
AWS has always been eager to get on the phone with me to discuss cost savings strategies. And they don’t upsell you in the process.
We bought 2 Dell servers via their financing program. Each server is about $19-25K. We paid AWS $60K per month before that. We pay $600 for colocation.
So my advice is to try to get hardware through a vendor's financing; Dell had a good program, I think.
Cloud servers are a "luxury" that most don't realise is a luxury and just take for granted. Having said that, there are obvious overheads to handling your own servers, but when your costs amount to several salaries it's probably worth considering.
I have loud and angry thoughts about this; https://www.lastweekinaws.com/blog/ has a bunch of pieces, some of which may be more relevant than others. The slightly-more-serious corporate side of the house is at https://www.duckbillgroup.com/blog/, if you can stomach a slight decline in platypus.
I'm CTO of an AI image processing company, so I speak from experience here.
I personally use Hetzner.de and their colo plans are very affordable, while still giving you multi-gigabit internet uplinks per server. If you insist on renting, Hetzner also offers rental plans for customer-specified hardware on request. The only downside is that if you call a Hetzner-hosted TensorFlow model from an AWS East frontend instance, you'll have 80-100 ms of round-trip latency on the RPC/HTTP call. But the insane cost savings over the cloud might make that negligible.
Also, have you considered converting your models from GPU to CPU? They might still be almost as fast, and affordable CPU hosting is much easier to find than GPU options.
I'm happy to talk with you about the specifics of our / your deployment via email, if that helps. But let me warn you that my past experience with AWS and Google Cloud performance and pricing, in addition to suffering through low uptime at their hands, has made me somewhat of a cloud opponent for compute- or data-heavy deployments.
So unless your spend is high enough to negotiate a custom SLA, I would assume that your cloud uptime isn't any better than that of halfway decent bare-metal servers.