We moved from AWS to Hetzner, saved 90%, kept ISO 27001 with Ansible

> We rebuilt key AWS features ourselves

At what cost? People usually exclude the cost of DIY style hosting. Which usually is the most expensive part. Providing 24x7 support for the stuff that you've home grown alone is probably going to make large dent into any savings you got by not outsourcing that to amazon.

> $24,000 annual bill felt disproportionate

That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.

This still could make sense. But you aren't telling the full story here. And I bet it's a lot less glamorous when you factor in development time for this.

Don't get me wrong; I'm actually considering making a similar move but more for business reasons (some of our German customers really don't like US hosting companies) than for cost savings. But this will raise cost and hassle for us and I probably will need some re-enforcements on my team. As the CTO, my time is a very scarce commodity. So, the absolute worst use of my time would be doing this myself. My focus should be making our company and product better. Your techstack is fine. Been there done that. IMHO Terraform is overkill for small setups like this; fits solidly in the YAGNI category. But I like Ansible.

StopDisinfo910 · 9 months ago

> Providing 24x7 support for the stuff that you've home grown alone is probably going to make large dent into any savings you got by not outsourcing that to amazon.

I don’t understand why people keep propagating this myth which is mostly pushed by the marketing department of Azure, AWS and GCP.

The truth is cloud provider doesn’t actually provide 24/7 support to your app. They only ensure that their infrastructure is mostly running for a very loose definition of 24/7.

You still need an expert on board to ensure you are using them correctly and are not going to be billed a ton of money. You still need people to ensure that your integration with them doesn’t break on you and that’s the part which contains your logic and is more likely to break anyway.

The idea that your cloud bill is your TCO is a complete fabrication and that’s despite said bill often being extremely costly for what it is.

steveBK123 · 9 months ago

I think both things are true - people overestimate the level of support provided by AWS, but also re-building the laundry list of stuff OP did in-house to save $24k/year seems onerous.

But the idea that AWS provides some sort of white glove 24/7 support is laughable for anyone that's ever run into issues with one of their products...

hluska · 9 months ago

Why would cloud providers support anything more than their infrastructure?

o_m · 9 months ago

It makes sense if you consider there is a risk you might get kicked out by AWS because the US government force Amazon to close your account. The US is also hinting about going to war against Europe (Greenland), which makes a bad idea to have any connection to the US.

spwa4 · 9 months ago

... and the US just made the EU very unhappy by killing the ICC's Microsoft subscription. Which by the way was hosted on Azure in Europe (meaning local or "sovereign cloud" or whatever they call it provides exactly zero protection against US sanctions).

So no more Microsoft software then?

The EU isn't willing to pay for that. They'll just throw the ICC under the bus, just like they'll throw any EU company that the US sanctions under the bus. That costs less. The EU has a nice name for throwing people under the bus like this: it's called "the peace dividend".

heisenbit · 9 months ago

AWS features may be expensive to replicate 100% but what if one only needs 80%. One also needs to consider the effort involved in configuring AWS and maintaining the skills for that. Then there are opportunity costs of using e.g. AWS dashboards vs. better ones with grafana etc..

I guess a lot depends on size, diversity and dynamics of the demand. Not every nail benefits from contact with the biggest hammer in the toolbox.

znpy · 9 months ago

> AWS features may be expensive to replicate 100% but what if one only needs 80%.

You are correct, but I think you're missing the point: my 80% and your 80% don't overlap completely.

randomtoast · 9 months ago

> $24,000 annual bill felt disproportionate

>> That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.

In terms of absolute savings, we’re talking about 90% of 24k, that’s about 21.6k saved per year. A good amount, but you cannot hire an SRE/DevOps Engineer for that price; even in Europe, such engineers are paid north of 70k per year.

I personally think the TCO (total cost of ownership) will be higher in the long run, because now every little bit of the software stack has to be managed by their infra team/person, and things are getting more and more complex over time, with updates and breaking changes to come. But I wish them well.

mk89 · 9 months ago

In mid sized companies, creating/using/maintaining AWS resources requires nevertheless one or more teams of devops/sre.

Out of experience, in the long run, this "managed aws saved us because we didn't need people" feels always like the typical argument made by saas sales people. In reality, many services/saas are really expensive, and you probably will only need a few features which sometimes you can rollout yourself.

The initial investment might be higher, but in the long run I think it's worth it. It's a lot like Heroku vs AWS. Superexpensive, but it allows you with little knowledge to push a POC in production. In this case, it's AWS vs self hosted or whatever.

Finally, can we quantify the cost of data/information? This company seems to be really "using" this strategy (= everything home made, you're safe with us) for sales purposes. And it might work, although for the final consumer this might have a higher price, which finally pays the additional devops to maintain the system. So who cares?

How important is for companies to not be subject to CLOUD act or funny stuff like that?

Elinvynia · 9 months ago

70k? Just hire in Poland/Czechia/Slovakia for 50% off!

Unless by Europe you mean the Apple feature availability special of UK/Germany/France/Spain/Italy

theragra · 9 months ago

I think many European countries have SRE lower for than 70k. How good is hard to judge. Our DevOps likely earns less, but she is just decent, not google-level.

rz2k · 9 months ago

Isn’t $24k also a naive accounting of the annual cost of AWS in this case? What FTE-equivalent was required to set up the services they use at AWS? What FTE-equivalent is required to keep the annual bill down to $24k from say $48k or $100k?

sksjvsla · 9 months ago

Before migration (AWS): We had about 0.1 FTE on infra — most of the time went into deployment pipelines and occasional fine-tuning (the usual AWS dance). After migration (Hetzner + OVHCloud + DIY stack): After stabilizing it is still 0.1 FTE (but I was 0.5 FTE for 3-4 months), but now it rests with one person. We didn’t hire a dedicated ops person.

I am curious why you think AWS services are more hands-off than a series of VPSs configured with Ansible and Terraform? Especially if you are under ISO 27001 and need to document upgrades anyway.

wqaatwt · 9 months ago

> That's around 1-2 months of time for a decent

Presumably they are in Europe? so labour is a few times cheaper here.

> Providing 24x7 support

They are not maintaining the hardware itself and it’s not like Amazon is doing providing devops for free. Unless you are using mainly serverless stuff the difference might not be that significant

nostrebored · 9 months ago

Amazon’s effort in making sure things _actually are up_ is fundamentally different than budget clouds.

The systems you design when you have reliable queues, durable storage, etc. are fundamentally different. When you go this path you’re choosing to solve problems that are “solved” for 99.99% of business problems and own those solutions.

kiney · 9 months ago

your implicit assumption that AWS requires less (exoensive) labour is just not true

sksjvsla · 9 months ago

Exactly our insight having maintained the same app both places.

nostrebored · 9 months ago

I have helped hundreds of people migrate to AWS and never had a single person spend more effort unless they went for an apples to apples disaster. I have only seen this when people take a high overhead tool they don’t understand (eg k8s) and move to cloud services they don’t understand.

randomtoast · 9 months ago

> Don't get me wrong; I'm actually considering making a similar move but more for business reasons (some of our German customers really don't like US hosting companies) than for cost savings

There will be a new AWS European Sovereign Cloud[1] with the goal of being completely US independent and 100% compliant with EU law and regulations.

[1]: https://www.aboutamazon.eu/news/aws/aws-plans-to-invest-7-8-...

jjani · 9 months ago

> There will be a new AWS European Sovereign Cloud[1] with the goal of being completely US independent

The idea that anything branded AWS can possibly be US independent when push comes to shove is of course pure fantasy.

zius · 9 months ago

Our customers across EU (hospitals) are not impressed or interested (n=175). Such a delusional project.

The ICC move by MS made hospitals go in an even higher gear to prepare off-ramp plans. From private Azure cloud to “let’s get out”

toomuchtodo · 9 months ago

There are still US AWS people the US gov can apply pressure to. Sovereignty requires nothing on US soil, people, infrastructure, entities, etc. What Microsoft and AWS are doing is performance art around “EU sovereignty.”

nyrikki · 9 months ago

Even if you are cloud native, it makes sense to have scaffolding to allow for vendor mitigation, unless you want to tie your entire companies future to the whims of a single company.

Monitoring and persistence layers are cross cutting and already an abstraction with impedance mismatch already.

You don't need a full blown SOA2 systems, just minimal scaffolding to build on later.

Even if you stick to AWS for the remainder of time, that scaffolding will help when you grow, AWS services change, or you need a multi cloud strategy.

As a CTO, you need to also de-risk in the medium and longer term, and keeping options open is a part of that.

Building tightly coupled systems with lots of leakage is stepping over dollars to pick up pennies unless selling and exiting is your plan for the organization.

The author doesn't mention what they had to write, but typically it is cloud provider implementation details leaking into your code.

Just organizing ansible files in a different way can often help with this.

If I was a CTO who thought this option was completely impossible for my org, I would start on a strategic initiative to address it ASAP.

Once again you don't need to be able to jump tomorrow, but to me the belief that a vendor has you locked in would be a serious issue to me.

awongh · 9 months ago

90% sounds good but the real dollar amount feels low.

Two reasons for this stick out:

- Are the multi-million dollar SV seed rounds distorting what real business costs are? Counting dev salaries etc. (if there is at least one employee) it doesn't seem worth the effort to save $20k - i.e., 1/5 of a dev salary? But for a bootstrapped business $20k could definitely be existential.

- The important number would be the savings as percent of net revenue. Is the business suddenly 50% more profitable? Then it's definitely worth it. But in terms of thinking about positively growing ARR doing cost/benefit on dropping AWS vs. building a new (profitable) feature I could see why it might not make sense.

Edit to add: it's easy to offhand say "oh yeah easy, just get to $2M ARR instead of saving $20k- not a big deal" but of course in the real world it's not so simple and $20k is $20k. The prevalent SV mindset of just spending without thinking too much about profitability is totally delusional except for like 1 out of 10000 startups.

layer8 · 9 months ago

From the blog post: "We are a Danish workforce management company doing employee scheduling." Definitely not a VC-funded SV startup. Probably bootstrapped.

Garlef · 9 months ago

My last contact with AWS support (100€/month tier) was someone feeding me LLM generated slop that contained hallucinations about nonexistent features and configuration options.

hiAndrewQuinn · 9 months ago

This is what I'm wondering too. 90% is a lovely number to throw around but what is the opportunity cost?

sksjvsla · 9 months ago

> Cost of DIY and support: You’re absolutely right that 24x7 ops could eat up any savings if you built everything from scratch without automation or if you needed dedicated staff watching dashboards all night. In our case:

• We heavily invested upfront in infrastructure-as-code (Terraform + Ansible) so that infra is deterministic, repeatable, and self-healing where possible (e.g. auto-provisioning, automated backup/restore, rolling updates).

• Monitoring + alerting (Prometheus + Alertmanager) means we don’t need to watch screens — we get woken up only if there’s truly a critical issue.

• We don’t try to match AWS’s service level (e.g. RTO of minutes for every scenario) — we sized our setup to our risk profile and customers’ SLAs.

> True cost comparison:

• The migration was done as part of my CTO role, so no external consulting costs. The time investment paid back within months because the ongoing cost to operate the infra is low (we’re not constantly firefighting).

• I agree that if you had to hire more people just to manage this, it could negate the savings. That’s why for some teams, AWS is still a better fit.

> Business vs. cost drivers: Honestly, our primary driver was sovereignty and compliance — cost savings just made the business case easier to sell internally. Like you, our European customers were increasingly skeptical of US cloud providers, so this aligned with both compliance and go-to-market.

> Terraform / YAGNI: Fair point! Terraform probably is more than we need for the current scale. I went with it partly because it fits our team’s skillset and lets us keep options open as we grow (multi-provider, DR regions, etc).

And, finally, because this, I am posting about it. I am sharing as much as I can, and just spread the work about it. I just sharing my experience and knowledge. If you have any questions or want to discuss further, feel free to reach out at jk@datapult.dk!

physix · 9 months ago

I think it's indeed the opportunity cost and the commoditization of the infrastructure and operational expertise that drives startups to AWS. But over time, as you scale, they can easily become your biggest component to your marginal cost, without an easy exit, because they locked you in.

sksjvsla · 9 months ago

https://news.ycombinator.com/item?id=44335920#44336757

uncircle · 9 months ago

I feel there is a lot of FUD spread whenever someone moves off the cloud, with the inane comparison to the annual wage of a dedicated sysadmin, trying to discourage you from doing a “reckless” migration which will bite you in the ass, your servers will catch fire every day and that it is better to stay within the golden handcuffs of AWS and GCP.

I wonder if it’s both stockholm syndrome and learned helplessness of developers that cannot imagine having to spend a little more effort and save, like OP, 90% off their monthly bill.

Yeah sure for some use cases AWS is the market leader, but let’s not kid ourselves, 9/10 companies on AWS don’t require more than a few servers and a database.

sksjvsla · 9 months ago

Well said. It reminds me of a story I heard in a podcast once.

A database administrator for a drug cartel became an informant for the police.

His cartel boss called him in on a weekend due to a server errors. He said in the podcast "I knew I've been found out because a database running Linux never crashes"

Makes you wonder what everyone is telling themselves about the need for RDS..

toomuchtodo · 9 months ago

Good enough is good enough for most folks. In most cases, downtime is cheaper than higher reliability.

Dead Comment

trod1234 · 9 months ago

I kind of cringed reading this article because there is also the cost in downtime which doesn't seem to be considered along with the RTO timelines.

Hetzner has had issues where they just suddenly bring servers down with no notice, sometimes every server attached to an account because they get a bogus complaint, and in some cases it appears they are still up but all your health checks fail, where you are scurrying around trying to find the cause with no visibility or lifeline. All this costs money, a lot of money, and its unmanageable risk.

For all the risks and talk of compliance, what about the counterparty-risk where a competitor (or whoever) sends a a complaint from a nonexistent email which gets your services taken down. Sure after support gets involved and does their due dilligence they see its falsified and bring things back up but this may be quite awhile.

It takes their support at least 24 hours just to get back to you.

DIY hosting is riddled with so many unmanageable costs I don't see how OP can actually consider this a net plus. You basically are playing with fire in a gasoline refinery, once it starts burning who knows when the fire will go out so people can get back to work.

sksjvsla · 9 months ago

Totally valid concerns — I don’t disagree that DIY hosting comes with real risks that managed platforms abstract away (but AWS could close your account too).

We didn’t go into this blind though — we spent a lot of time testing scenarios (including Hetzner/OVH support delays) and designing mitigation strategies.

Some of what we do:

• Our infra is spread across multiple providers (Hetzner, OVH)) + Cloudflare for traffic management. If Hetzner blackholes us, we can redirect within minutes. • DB backups are encrypted and replicated nightly to various regions/providers (incl. one outside the primary vendors), with tested restore playbooks.

The key point: no platform is free of counterparty risk — whether that’s AWS pulling a region for legal reasons, or Hetzner taking a server offline. Our approach tries to make the blast radius smaller and the recovery faster, while also achieving compliance and cutting costs substantially (~90% as noted).

DIY is definitely not for everyone — it is more work, but for our particular constraints (cost, sovereignty, compliance) we found it a net win. Happy to share more details if helpful!

Oh, an imagine being kicked out of AWS and you used Aurora.. My certified multi-cloud setup with standard components should not make you cringe.

I think the most often mentioned problems mentioned are pollution of Hetzner addresses by shady people (might be addressed with "exits" from AWS / Cloudflare) and you are running on hardware which does tend to fail / needs upgrades. Were there some concerns on those from you?

Also, Loki! How do you handle memory hunger on loki reader for those pesky long range queries, and are there alternatives?

sksjvsla · 9 months ago

Pollution: We front everything user-facing through Cloudflare, so external users (and bots) don’t interact directly with our Hetzner/OVH IPs. We lock down our IPs at the firewall (ufw + Cloudflare IP allowlisting) so only trusted sources can even connect at L4.

Failures/upgrades: We provision with Terraform, so spinning up replacements or adding capacity is fast and deterministic.

We monitor hardware metrics via Prometheus and node exporter to get early warnings. So far (9 months in) no hardware failure, but it’s a risk we offset through this automation + design.

Apps are mostly data-less and we have (frequently tested) disaster recovery for the database.

Loki: We’re handling the memory hunger by

• Distinguishing retention limits and index retention

• Tuning query concurrency and max memory usage via Loki'’'s config + systemd resource limits.

• Use Promtail-style labels + structured logging so queries can filter early rather than regex the whole log content.

• Where we need true deep history search, we offload to object store access tools or simple grep of backups — we treat Loki as operational logs + nearline, not as an archive search engine.

Keyframe · 9 months ago

Thanks for thorough answer! Seems like you've platformized(!) yourself to an extent, have you considered going full on with k8s on top of metal (their machines) to offset some of the concerns about hardware?

liampulles · 9 months ago

(Not OP): On the loki question: yeah our project had a similar issue. I did a lot of playing around with the loki configuration, and what you'll discover by reading their blogs on Loki performance is that the indexing settings they recommend are not the ones that are used by default in helm (and probably other deployment configurations). Once I did some reconfiguration, added read specific instances, and implemented their other recommendations - we did see much better performance.

Just remember: their interest is that you buy their cloud service, not in giving an out-of-the-box great experience on their open source stuff.

sksjvsla · 9 months ago

A good alternatives for Loki is Victoria. Popular, way more performant and reputable but we went with Loki because of the relative size and diversity of maintainers between the two projects. Your points are super valid and we worked around it as mentioned above.

chuckadams · 9 months ago

Quickwit is also worth a look, along with its log collector companion Vector. I think at least Vector was a YC company before they got shlorped up by Datadog, but they're still both actively maintained open source.

TZubiri · 9 months ago

https://en.wikipedia.org/wiki/Sybil_attack

One of the advantages of more expensive providers seems to be that they have good reputation due to a de facto PoW mechanism.

sksjvsla · 9 months ago

Depends on the use case, right? I don’t accept traffic from random Hetzner IPs — only Cloudflare’s IPs are allowed.

The only potential indirect risks is if your Hetzner VPS IP range gets blacklisted (because some Hetzner clients abuse it for Sybil attacks or spam).

Or if Hetzner infrastructure was heavily abused, their upstream or internal networking could (in theory) experience congestion or IP reputation problems — but this is very unlikely to affect your individual VPS performance.

This depends on what you are doing on Hetzner and how you restrict access but for an ISO-27001 certified enterprise app, I believe this is extremely unlikely.