Hey HN! We've built a system that lets you run any application on spot instances without worrying about preemption. It works by running VMs on top of VMs - without needing the underlying instance to support nested virtualization or hardware acceleration - by integrating three of our open-source projects: Drafter (https://github.com/loopholelabs/drafter - handles VM live migration), PVM (https://github.com/loopholelabs/linux-pvm-ci - enables nested virtualization without hardware support), and Silo (https://github.com/loopholelabs/silo - provides efficient live storage migration over the public internet). The cool part is that we can migrate workloads between spot instances faster than they get preempted, with no dropped connections - even across different cloud providers and regions.
While there are other solutions that try to handle spot instance preemption through checkpointing, we take a fundamentally different approach by making preemption irrelevant through continuous state capture and seamless migration. We showed this off at KubeCon NA 2024 by migrating a Redis pod between AWS, GCP, and Azure while maintaining active client connections.
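To make the contrast with checkpointing concrete, here is a toy Python sketch of iterative pre-copy migration, a common general technique for live migration. It is an illustration under stated assumptions, not how Drafter or Silo are actually implemented: `Memory`, `precopy_migrate`, `workload`, and the `stop_threshold` parameter are all hypothetical names, and guest writes are simulated between copy rounds rather than running concurrently.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy guest memory: a list of page values plus a set of dirty page indices."""
    pages: list
    dirty: set = field(default_factory=set)

    def write(self, index, value):
        # Every guest write marks the page dirty so it gets re-sent later.
        self.pages[index] = value
        self.dirty.add(index)


def precopy_migrate(source, workload, max_rounds=10, stop_threshold=2):
    """Copy `source` to a new destination while `workload` keeps writing.

    Returns (destination_pages, pages_copied_while_paused).
    """
    dest = [None] * len(source.pages)
    # First round: treat every page as dirty so the whole memory is sent once.
    source.dirty = set(range(len(source.pages)))

    for round_no in range(max_rounds):
        if len(source.dirty) <= stop_threshold:
            break  # Few enough dirty pages left to finish during a short pause.
        to_send, source.dirty = source.dirty, set()
        for i in to_send:
            dest[i] = source.pages[i]
        # Simulated guest activity: the workload keeps running and dirtying pages.
        workload(source, round_no)

    # Final step: the guest is paused here, so no new writes can arrive.
    remaining = source.dirty
    for i in remaining:
        dest[i] = source.pages[i]
    paused_pages = len(remaining)
    source.dirty = set()
    return dest, paused_pages


def workload(mem, round_no):
    # Hypothetical guest behaviour: touch one page per round.
    mem.write(round_no % len(mem.pages), f"v{round_no}")


mem = Memory(pages=[f"init{i}" for i in range(8)])
dest, paused_pages = precopy_migrate(mem, workload)
```

In this toy model the guest is paused only while the last few dirty pages are sent, rather than for a full checkpoint and restore.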
All core components are open source, including our Firecracker patches (https://github.com/loopholelabs/firecracker/tree/main-live-m...). We're currently launching GitHub Actions runners at https://architect.run/ that can safely run on spot instances (which are 75%+ cheaper!) without risk of interruption, even for long-running builds and stateful workloads.
More info in the linked blog post! Would love to hear your thoughts and feedback on the technical implementation and potential use cases.