Hobbyist game dev here with random systemd thoughts. I’ve recently started to lean on systemd as my ‘local game server process manager’. At first I thought I’d have to build this myself as a whole slew of custom code, but then I realized the Linux distros I use already have systemd. That, plus cgroups and profiling my game server’s performance, lets me dynamically pack an OS with as many game servers as it can handle (I target 80% resource utilization; funny things happen above that, things I don’t quite understand).
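For the curious, the shape of this is just a templated unit with cgroup caps. A minimal sketch, where the paths, ports and limits are made up but CPUQuota=/MemoryMax= are the actual systemd directives:

    # /etc/systemd/system/gameserver@.service (hypothetical paths/limits)
    [Unit]
    Description=Game server instance %i

    [Service]
    ExecStart=/opt/mygame/server --port=%i   # one templated instance per port
    Restart=on-failure
    # cgroup caps: these are what make the "pack to ~80% utilization" math possible
    CPUQuota=50%
    MemoryMax=512M
    TasksMax=256

    [Install]
    WantedBy=multi-user.target

Then `systemctl enable --now gameserver@7777.service` (and 7778, 7779, and so on) per instance.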
In this way I’m able to set up AWS EC2 instances or DigitalOcean droplets, have a bunch of game servers spin up, and have them report back their existence to a backend game services API. So far it’s working, but this part of my project is still in development.
I used to target containerizing my apps, which adds complexity, but in AWS I often have to care about VMs as resources anyway (e.g. AWS GameLift requires me to spin up VMs, same with AWS EKS). I’m still going back and forth between containerizing and using systemd; having a local stack easily spun up via docker compose is nice, but with systemd what I write locally is basically what runs in the prod environment, and there’s less waiting on container builds and such.
I share all of this in case there’s a gray beard wizard out there who can offer opinions. I have a tendency to explore and research (it’s fuuun!) so I’m not sure if I’m on a “this is cool and a great idea” path or on a “nobody does this because <reasons>” path.
> (I target 80% resource utilization; funny things happen above that, things I don’t quite understand)
The closer you get to 100% resource utilization, the more regular your workload has to become. If you can queue requests and latency isn't an issue, fine, but then you have a batch process and not a live one (obviously not an option for games).
The reason is that live work doesn't come in regular beats; it comes in clusters that scale in a fractal way. If your long-term mean is one request per second, what actually happens is you get five requests in one second, three seconds with one request each, one second with two requests, and five seconds with zero requests (you get my point). Call it "fractal burstiness".
You have to have free resources to handle the spikes at all scales.
Also, very many systems suffer from the processing time for a single request increasing as overall system load increases. Call it "queuing latency blowup".
So what happens? You get a spike, get behind, and never ever catch up.
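https://en.wikipedia.org/wiki/Network_congestion#Congestive_...

To put rough numbers on the blowup (a textbook M/M/1 sketch, not anyone's real workload): mean queuing delay grows like ρ/(1−ρ) service times, so at 80% utilization you wait about 4 service times on average, at 90% about 9, and at 95% about 19. The curve is basically vertical as you approach 100%.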
Yea. I realize I ought to dig into things more to understand how to push past 80% into 90-95% utilization territory. Thanks for the resource to read through.
If you use podman quadlets, you get containers and systemd together as first-class citizens, in a config that is easily portable to Kubernetes if you need more complex features.
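For the unfamiliar: a quadlet is just a small unit-style file that podman's systemd generator turns into a regular service. A rough sketch (image, port and env are placeholders):

    # /etc/containers/systemd/gameserver.container (hypothetical)
    [Unit]
    Description=Containerized game server

    [Container]
    Image=registry.example.com/mygame/server:latest
    PublishPort=7777:7777/udp
    Environment=MAX_PLAYERS=32

    [Service]
    Restart=always

    [Install]
    WantedBy=multi-user.target

After a `systemctl daemon-reload` it shows up as gameserver.service like any other unit.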
Definitely don't recommend going down this path if you're not already familiar with Nix, but if you are, a strategy that I find works really well is to package your software with Nix, then you can run it easily via systemd but also create super lightweight containers using nix-snapshotter[0] so you don't have to "build" container images if you still want the flexibility of containers. You can then run the containers on Docker or Kubernetes without having to build heavy images.
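[0] https://github.com/pdtpartners/nix-snapshotter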
This is sort of how I designed AccelByte's managed game server system (previously called Armada).
You provide us a Docker image, and we unpack it, turn it into a VM image, and run as many instances as you want side by side with CPU affinity and NUMA awareness. Obviating the Docker network stack for latency/throughput reasons - since you can
They had tried nomad, agones and raw k8s before that.
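For the plain-systemd version of the CPU affinity / NUMA point above, the relevant unit directives are CPUAffinity=, NUMAPolicy= and NUMAMask=. A fragment, with made-up core/node numbers:

    [Service]
    ExecStart=/opt/mygame/server
    CPUAffinity=2 3        # pin this instance to cores 2-3
    NUMAPolicy=bind
    NUMAMask=0             # keep its memory allocations on NUMA node 0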
Checking out the website now. Looks enticing. Would a user of AccelByte multiplayer services still be in the business of knowing about underlying VMs? I caught some copy on the website that made me wonder.
As a hobbyist part of me wants the VM abstracted completely (which may not be realistic). I want to say “here’s my game server process, it needs this much cpu/mem/network per unit, and I need 100 processes” and not really care about the underlying VM(s), at least until later. The closest thing I’ve found to this is AWS fargate.
Also holy smokes if you were a part of the team that architected this solution I’d love to pick your brain.
> I’m still going back and forth between containerizing and using systemd
Why not both? Systemd lets you make containers via nspawn, which are defined in just about the same way as a regular systemd service. Best of both worlds.
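A rough sketch of that, assuming a container rootfs already sits in /var/lib/machines/gameserver (paths and ports made up):

    # /etc/systemd/system/gameserver-nspawn.service (hypothetical)
    [Unit]
    Description=Game server in an nspawn container

    [Service]
    ExecStart=/usr/bin/systemd-nspawn --quiet --directory=/var/lib/machines/gameserver \
        --network-veth --port=udp:7777:7777 \
        /opt/mygame/server --port=7777
    # note: the host side of the veth still needs addressing (e.g. via systemd-networkd)
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

There's also the /etc/systemd/nspawn/*.nspawn settings-file route plus `machinectl start`, if you'd rather not spell out the flags.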
This actually works really well with custom user scripts to do the initial setup. It’s also trivial to do this with docker/podman if you don’t want it to take over the machine. Batching/matchmaking is the hard part; setting up a fleet is the fun part.
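That would be portable[1] services.

[1]: https://systemd.io/PORTABLE_SERVICES/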
I’ve also done Microsoft Orleans clusters and still recommend the single-pid, multiple containers/processes approach. If you can avoid Orleans and Kubernetes and all that, so much the better. It just adds complexity to this setup.
> If you can avoid Orleans and Kubernetes and all that, so much the better. It just adds complexity to this setup.
I’m starting to appreciate simplicity away from containers; that’s why I’m even exploring systemd. I bet big on containers and developed plenty of skills, especially with k8s. But I never stopped to appreciate that I’m partly in the business of making processes run on OSes, and it kinda doesn’t matter whether the pid is a container or running ‘directly’ on the hardware. I’ll probably layer it back in, but for now I’m kinda avoiding it as an exercise.
E.g. if I’m testing a debug-ready build locally and want to attach my debugger, I can do that in k8s, but there’s a ceremony of opening the relevant ports and properly pointing at the container’s file system. Not a showstopper, since I mostly debug while writing test/production code in dev… but occasionally the built artifact demands inspection.
Yes! It’s a great project. I’m super happy they have a coherent local development story. I kinda abandoned using it, though, when I said “keeeep it simple” and stopped using containers/k8s. I think I needed to journey through understanding why multiplayer game services like Agones/GameLift/Photon were set up the way they were. Reading Multiplayer Game Programming: Architecting Networked Games by Joshua Glazer and Sanjay Madhav really helped (not to mention it let me understand GDC talks on multiplayer topics much better).
This all probably speaks to my odd prioritization: I want to understand and use. I’ve had to step back and realize part of the fun I have in pursuing these projects is the research.
systemd-networkd now implements a resolve hook for its internal DHCP server, so that the hostnames tracked in DHCP leases can be resolved locally. This is now enabled by default for the DHCP server running on the host side of local systemd-nspawn or systemd-vmspawn networks.
That was almost 15 years ago and the support is evidently not as useful.
Also, it's entirely contained within a program that creates systemd .service files. It's super easy to extract it into a separate project. I bet someone will do it very quickly if there's a need.
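All the services you forgot you were running for ten whole years will fail to launch someday soon.

However, it is not easy figuring out which of those scripts are actual SysVInit scripts and which simply wrap systemd.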
Despite being philosophically opposed to it, I can't deny that it is as common as it is because of how easy it seems to make the initial setup. By comparison, when I recently tried Void Linux, it simply requires (maybe even demands) more of its user.
Who needs to read mail when you can even make it receive mail!
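* https://en.wikipedia.org/wiki/Jamie_Zawinski#Zawinski's_Law
* https://www.jwz.org/hacks/
:)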
Make an `smtp.socket`, which calls `smtp.service`, which receives the mail and prints it on standard output, which goes to a custom journald namespace (thanks `LogNamespace=mail` in the unit) so you can read your mail with `journalctl --namespace=mail`.
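Untested sketch of those units (real SMTP needs more than cat, but this is the shape; with Accept= it's technically a templated smtp@.service per connection):

    # smtp.socket
    [Socket]
    ListenStream=25
    Accept=yes

    [Install]
    WantedBy=sockets.target

    # smtp@.service
    [Service]
    ExecStart=/usr/bin/cat     # "receive" the mail by dumping the connection
    StandardInput=socket
    StandardOutput=journal
    LogNamespace=mail

…and `journalctl --namespace=mail` to read your inbox.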
musl support is excellent. If you were unhappy with the transparency, simplicity, maintainability and thinness of Alpine Linux - now you can install systemd and lose all of these disadvantages.
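Breaking systemd was a thorn in the side of distributions trying to use musl.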
Probably no biggie to google the necessary copypasta to launch stuff from .service files instead. Being custom, those won't have their timeout set back to "infinity" with every update, unlike the existing rc.local wrapper service, which, with its infinity timeout and its occasional decision that whatever rc.local launched can't be stopped, can cause shutdown hangs.
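i.e. something along these lines (paths hypothetical; the point is the explicit stop timeout):

    # /etc/systemd/system/my-startup.service
    [Unit]
    Description=Stuff formerly launched from rc.local

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/local/bin/my-startup.sh
    TimeoutStopSec=30s     # explicit stop timeout, so shutdown can't hang forever waiting on it

    [Install]
    WantedBy=multi-user.target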
Great, looking forward to having to relearn the new way to do something inconsequential, and having random scripts break because a unix config file is now a systemd output rather than an actual config.