We built a service that executes arbitrary user-submitted code. An RCE service. It's the thing you're not supposed to build, but we had to do it.
Running arbitrary code means containers weren't a good fit ( container breakouts happen), so we are spinning up and down ec2 instances. This means we have actual infrastructure as code (i.e. not just piles of terraform but go code running in a service that spins up and down VMs based on API calls).
The service spins up and down EC2 instances based on user requests and executes user-submitted build scripts inside them.
It's not the standard web service we were used to building, so we thought we'd write it up and share it with anyone interested.
One cool thing we learned was how quickly you can Hibernate and wake up x86 EC2 instances. That ended up being a game-changer for us.
Corey and Brandon did the building, I'm mainly just the person who wrote things down, but hopefully, people find this interesting.
Source: ran a container based RCE service that ran millions of arbitrary workloads per month. We had sophisticated network and system anomaly detection, high priced pentesters etc and never had a breakout.
We found that privileged is a pretty big hammer and thought we needed it too but we found ways to give us the functionality we needed without all the extra stuff we didn't need the privileged brings in.