dangoodmanUT · 2 years ago
This is really exciting, but there are a few things they will certainly have to work through:

*Services:*

Kubernetes expects DNS records like {pod}.default.svc.cluster.local. In order to achieve this, they will need custom DNS records on the "pod" (Fly Machine) that resolve it using their metadata. Not impossible, but something that has to be taken into account.

*StatefulSets:*

This has 2 major obstacles:

The first is dealing with disks. k8s expects that it can reattach a disk to a replacement pod when the original is lost (e.g. remapping an EBS volume onto a different EC2 node). The problem here is that Fly has a fundamentally different model: volumes are tied to the physical host they live on. That means FKS either has to decline to schedule a pod because it can't get the machine the disk lives on, or stop guaranteeing that the disk is the same one. While this does exist as a setting currently, the former is a serious issue.

The second major issue is again with DNS. StatefulSets have ordinal pod names (e.g. {ss-name}-{0..n}.default.svc.cluster.local). While this can be achieved with their machine metadata and custom DNS on the machine, it means they either have to run a local DNS server to "translate" k8s DNS records into the Fly nomenclature, or constantly update local services on machines to tell them about new records. Either approach will incur some penalty.
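
To make the DNS point concrete, here's a minimal sketch of the kind of translation a machine-local resolver would have to do, mapping k8s-style service and StatefulSet names onto Fly's `.internal` DNS. The names and the mapping are hypothetical; this is just the shape of the problem, not what FKS actually does.

```python
import socket

# Hypothetical mapping from k8s-style names to Fly's .internal DNS.
# "myapp" stands in for whatever Fly app backs the cluster; the real
# records FKS generates may look quite different.
TRANSLATIONS = {
    "web.default.svc.cluster.local": "myapp.internal",      # Service name
    "db-0.db.default.svc.cluster.local": "myapp.internal",  # StatefulSet ordinal name
}

def resolve(name: str) -> list[str]:
    """Translate a k8s-style name to its Fly equivalent, then resolve it."""
    target = TRANSLATIONS.get(name, name)
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(target, None)})
    except socket.gaierror:
        return []  # not resolvable from here (e.g. outside the Fly private network)

for k8s_name in TRANSLATIONS:
    print(f"{k8s_name} -> {resolve(k8s_name)}")
```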

benpacker · 2 years ago
Am I understanding correctly that because they map a “Pod” to a “Fly Machine”, there’s no intermediate “Node” concept?

If so, this is very attractive. When using GKS, we had to do a lot of work to get our Node utilization (the percentage of resources we had reserved on a VM that was actually occupied by pods) to be higher than 50%.

Curious what happens when you run “kubectl get nodes” - does it lie to you, or call each region one Node?

btown · 2 years ago
GKE Autopilot is an attractive option here if you don't want to worry about node utilization and provisioning. Effectively you have an on-demand, infinitely-sized k8s cluster that scales up and down as you need new pods. Some caveats, but it's an incredible onramp if you're coming from Heroku or a similar PaaS and don't want to worry about the infrastructure side of things: GitHub Actions building images and deploying a Helm chart to GKE Autopilot is a remarkably friendly yet customizable stack. Google should absolutely promote it more than it does. https://cloud.google.com/kubernetes-engine/docs/concepts/aut...
benpacker · 2 years ago
Unfortunately last I checked the compute pricing for GKE autopilot was almost double, so if you can beat 50% utilization, you might as well just keep the under-utilized Node around.
benpacker · 2 years ago
If this is “free GKE autopilot” (autopilot billed at the same price as regular Fly Machine compute), then that changes the way I think about Fly’s basic compute pricing a lot.

I would think they should highlight that a lot more in the product announcement!

harpratap · 2 years ago
GKE Autopilot is pretty much useless; there are very few cases where it actually turns out cheaper than simply using Cluster Autoscaler + node auto-provisioning. Not only is the pricing absolutely absurd, they don't even allow normal K8s bursting behavior (requests need to equal limits), which means you not only end up paying more than for a regular K8s cluster but also need to heavily overprovision your pods.
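
A minimal sketch of the difference with the Python kubernetes client (the sizes are arbitrary): a normal burstable spec versus the requests-equal-limits shape Autopilot forces, which is what drives the overprovisioning.

```python
from kubernetes import client

# Burstable: the pod can spike up to the limit when its node has spare capacity.
burstable = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "256Mi"},
    limits={"cpu": "1", "memory": "1Gi"},
)

# Autopilot-style: requests must equal limits, so you size every pod for its peak
# and pay for that peak all the time.
guaranteed = client.V1ResourceRequirements(
    requests={"cpu": "1", "memory": "1Gi"},
    limits={"cpu": "1", "memory": "1Gi"},
)
```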
spankalee · 2 years ago
Why would you use GKE Autopilot over Cloud Run?
kuhsaft · 2 years ago
The node would be a virtual-kubelet. You can check out the virtual-kubelet GitHub repo for more info.

Interestingly, there are already multiple providers of virtual-kubelet. For example, Azure AKS has virtual nodes where pods are Azure Container Instances. There’s even a Nomad provider.

> So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

So probably a cluster per region. You could theoretically spin up multiple virtual-kubelets though and configure each one as a specific region.

> Because of kine, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.

This would mean the control plane runs on a single server without high availability? Although I suppose there really isn't much state stored, since they're just proxying requests to the Fly Machine API. But still, if that machine went down, your kubectl commands wouldn't work.
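
To answer the “kubectl get nodes” question upthread, here's a quick sketch with the Python kubernetes client. The `type=virtual-kubelet` label and `virtual-kubelet.io/provider` taint are what the upstream virtual-kubelet project registers, so FKS may well show something different:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes your kubeconfig points at the FKS cluster

for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    taints = node.spec.taints or []
    print(
        node.metadata.name,
        labels.get("type"),                      # "virtual-kubelet" for VK-backed nodes
        [f"{t.key}={t.value}" for t in taints],  # typically a virtual-kubelet.io/provider taint
    )
```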

philsnow · 2 years ago
The diagram on https://virtual-kubelet.io/docs/architecture/ makes me wonder whether it's possible to have a k8s cluster where the nodes are all virtual kubelets backed by different cloud providers (and then perhaps schedule loads preferentially with selectors)
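
Scheduling preferentially by provider would come down to plain node selectors and tolerations. A sketch with the Python kubernetes client, assuming the usual virtual-kubelet `type` label and provider taint plus a made-up `provider` label you'd put on each node yourself:

```python
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="pin-to-fly"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="app", image="nginx:1.25")],
        # Only land on virtual-kubelet nodes that we've labelled provider=fly.
        node_selector={"type": "virtual-kubelet", "provider": "fly"},
        # Virtual-kubelet nodes are usually tainted so ordinary pods avoid them.
        tolerations=[client.V1Toleration(key="virtual-kubelet.io/provider", operator="Exists")],
    ),
)
# Submit with client.CoreV1Api().create_namespaced_pod("default", pod)
```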
chologrande · 2 years ago
> Had to do a lot of work to get node utilization ... higher than 50%

How is this the scheduler's fault? Is this not just your resource requests being wildly off? Mapping directly to a "fly machine" just means your "fly machine" utilization will be low.

benpacker · 2 years ago
I think there’s a slight misunderstanding - I’m referring to how much of a Node is being used by the Pods running on it, not how much of each Pod’s compute is being used by the software inside it.

Even if my Pods were perfectly sized, a large percentage of the VMs running the Pods was underutilized because the Pods were poorly distributed across the Nodes.

kuhsaft · 2 years ago
DX might be better I suppose, since you don’t have to fiddle with node sizing, cluster autoscalers, etc.

Someone else linked GKE Autopilot which manages all of that for you. So if you’re using GKE I don’t see much improvement, since you lose out on k8s features like persistent volumes and DaemonSets.

verdverm · 2 years ago
> we had to do a lot of work to get our Node utilization ... over 50%

Same, a while back you had to install cluster-autoscaler and set it to aggressive mode. GKE has this option now on setup, though I think anyone who's had to do this stuff knows that just using a cluster-autoscaler is never enough. I don't see this being different for any cluster; it's more a consequence of your workloads and how they're partitioned (if you aren't partitioning, you'll have real trouble getting high utilization).

robertlagrant · 2 years ago
I wonder how it copes with things like anti-affinity rules, where you don't want two things running on the same physical / virtual server for resilience reasons.
kuhsaft · 2 years ago
You wouldn’t use affinity rules anymore. The pods are scheduled on a single virtual-kubelet node, so if you use anti-affinity, scheduling would fail.
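
For example, a hard anti-affinity rule keyed on hostname can only be satisfied by one replica when every pod lands on the same virtual-kubelet node; the rest stay Pending. A sketch with the Python kubernetes client (the `app=web` label is made up):

```python
from kubernetes import client

# Require that no two pods labelled app=web share a node (keyed on hostname).
# With a single virtual-kubelet node, only the first replica can satisfy this.
anti_affinity = client.V1Affinity(
    pod_anti_affinity=client.V1PodAntiAffinity(
        required_during_scheduling_ignored_during_execution=[
            client.V1PodAffinityTerm(
                label_selector=client.V1LabelSelector(match_labels={"app": "web"}),
                topology_key="kubernetes.io/hostname",
            )
        ]
    )
)
# Attach via V1PodSpec(..., affinity=anti_affinity)
```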
arccy · 2 years ago
if it is pod per vm, that would make it like EKS Fargate
arccy · 2 years ago
is GKS some amalgamation of GKE and EKS
benpacker · 2 years ago
Typo haha - I meant GKE. Fixed now.
corobo · 2 years ago
Is this still a limitation for Fly k8s?

> A Fly Volume is a slice of an NVMe drive on the physical server your Fly App runs on. It’s tied to that hardware.

Does the k8s offering have any kind of storage provisioning that allows pods with persistent storage (e.g. databases) to just do their thing without me worrying about it, or do I still need to handle disks potentially vanishing?

I think this is the only hold-up that stops me actually using Fly. I don't know what happens if my machine crashes and is brought back on different hardware. Presumably the data is just not there anymore.

Is everyone else using an off-site DB like Planetscale? Or just hoping it's an issue that never comes up, w/ backups just in case? Or maybe setting up full-scale DB clusters on Fly so it's less of a potential issue? Or 'other'?

tptacek · 2 years ago
Not speaking for the FKS case, but in general for the platform: when you associate an app with a volume, your app is anchored to the hardware the volume is on (people used to use tiny volumes as a way to express hard-locked region affinity when we were still using Nomad). So if your Fly Machine crashes, it's going to come back on the same physical the volume lives on.

We back up volumes to off-net block storage, and, under the hood, we can seamlessly migrate a volume to another physical (the way we do it is interesting, and we should write it up, but it's still also an important part of our work sample hiring process, which is why we haven't). So your app could move from one physical to another; the data would come with it.

On the other hand: Fly Volumes are attached storage. They're not a SAN system like EBS, and they're not backed by a 50-9s storage engine like S3. If a physical server throws a rod, you can lose data. This is why, for instance, if you boot up a Fly Postgres cluster here and ask us to do it with only one instance, we'll print a big red warning. (When you run a multi-node Postgres cluster, or use LiteFS Cloud with SQLite, you're doing at the application layer what a more reliable storage layer would do at the block layer.)

Deleted Comment

asim · 2 years ago
And Fly becomes a standard cloud provider like everyone else. I think this transition is only natural. It's hard to be a big business without catering to the needs of larger companies, and that means operating many services, not individual apps.
tptacek · 2 years ago
Nothing is changing for anybody who doesn't care about K8s. If you're not a K8s person, or you are and you don't like K8s much, you shouldn't ever touch FKS.
therein · 2 years ago
I used Fly for some projects, I really like it.

But once again, for many of my projects, I still need my outbound IPs to resolve to a specific country. I can't have them all resolve to Chicago, US in nondeterministic ways.

I would be willing to pay an additional cost for this, but even with reserved IPs, I am given IPs that are labelled as Chicago, US by GeoIP providers, even for non-US regions.

zifnab06 · 2 years ago
fwiw - our network folks _should_ have fixed this a few weeks ago. Some of the outbound IPs were incorrectly tagged in some of the geoip databases as being in the US when they were not.

Deleted Comment

verdverm · 2 years ago
If they are reluctant and only do it because they have to, are they really the right vendor for managed k8s?

What about them makes for a good trade-off when considering the many other vendors?

tptacek · 2 years ago
We're not a K8s vendor. We're a lower-level platform than that. If all you care about is K8s, and no part of the rest of our platform is interesting to you --- the global distribution and Anycast, the fly-proxy features, the Machines API --- we're not a natural fit for what you're doing.

We were surprised at how FKS turned out, which is part of why we decided to launch it as a feature and all of why we wrote it up this way. That's all.

verdverm · 2 years ago
> We're not a K8s vendor

It might now be more accurate to say "You were not a k8s vendor"; now you are, based on:

> If K8s is important for your project, and that’s all that’s been holding you back from trying out Fly.io, we’ve spent the past several months building something for you.

If it's fundamentally different, maybe you shouldn't call it Kubernetes; perhaps call it a Kubernetes-API-compatible alternative?

fwiw/context, I use GKE and also many of the low-level services on GCP

Is Fly supposed to be simpler for the average developer?

grossvogel · 2 years ago
I'm excited about this as a way to configure my Fly.io apps in a more declarative way. One of my biggest gripes about Fly.io is that there's a lightly documented bespoke config format to learn (fly.toml), and at the same time there's a ton of stuff you can't even do with that config file.

I love Kubernetes because the .yaml gives you the entire story, but I'd _really_ love to get that experience w/o having to run Kubernetes. (Even in most managed k8s setups, I've found the need to run lots of non-managed things inside the cluster to make it user-friendly.)

frenchman99 · 2 years ago
Probably good for people already used to Fly, or interested in Fly for other reasons, who could also use k8s?

Sometimes you just want to run k8s without thinking too much about it, without having all the requirements that GCP has answers to.

szundi · 2 years ago
If their reluctance were based on valid reasons that they handled in a unique way, it might be good. In theory.
verdverm · 2 years ago
k8s has become a standard API and platform for running apps. Putting a * on it makes the implementation an outlier from the standard, which is not normally considered a good thing, because you have to be aware of the nuanced differences.
paxys · 2 years ago
Maybe a good fit for someone who is reluctant to use Kubernetes but has to for whatever reason.
nailer · 2 years ago
If someone isn't a cloud provider they should be reluctant to use Kubernetes.
motoboi · 2 years ago
There is a very high price to pay when going with your own scheduling solution: you have to compete with the resources google and others are throwing at the problem.

Also, there is the market for talent, which is non-existent for fly.io's technology if it's not open source (I see what you did there, Google): you'll have to teach people how your solution works internally, and congratulations, now you have a global pool of 20 (maybe 100) people who can improve it (if you have really deep pockets, maybe you can have 5 PhDs). Damn, universities probably have classes about Kubernetes for undergrad students right now. Will they teach your internal solution?

So, if a big part of your problem is already solved by a gigantic corporation investing millions to create a pool of talented people, you'd better make use of that!

Nice move, fly.io!

nojvek · 2 years ago
What if it’s really not that complicated, and by adding more people you make it more complex? So complex that you need even more people to maintain that complexity?

I love fly.io for rethinking some of the problems.

kuhsaft · 2 years ago
How does this handle multiple containers in a Pod? In container-runtime-based k8s, containers within a pod share the same network namespace (same localhost) and possibly the same PID namespace.

The press release maps pods to machines, but provides no mapping of pod containers to a Fly.io concept.

Are multiple containers allowed? Do they share the same network namespace? Is sharing PID namespace optional?

Having multiple containers per pod is a core functionality of Kubernetes.
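
For reference, this is the kind of pod the question is about: two containers in one pod sharing localhost. A minimal sketch with the Python kubernetes client (images and names are arbitrary); how FKS would map this onto a single Machine image is exactly what's unclear.

```python
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="web-with-sidecar"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="web",
                image="nginx:1.25",
                ports=[client.V1ContainerPort(container_port=80)],
            ),
            # The sidecar reaches the main container over the pod's shared localhost.
            client.V1Container(
                name="probe",
                image="busybox:1.36",
                command=["sh", "-c", "while true; do wget -qO- http://127.0.0.1:80/ >/dev/null; sleep 5; done"],
            ),
        ],
    ),
)
```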

remram · 2 years ago
You can use mount namespaces, or even containers in your VM. Maybe that's how?
kuhsaft · 2 years ago
Fly.io claims it’s “just a VM”. But Fly.io Machines are an abstraction over Firecracker microVMs, and the FKS implementation is an abstraction on top of Fly.io Machines. So what I’m asking is how, if at all, the FKS implementation supports multiple containers for a pod. With FKS, the abstraction is no longer a VM.

It seems that Fly.io Machines support multiple processes for a single container, but not multiple containers per Machine [0]. This means one container image per Machine and thus no shared network namespace across multiple containers.

[0] https://community.fly.io/t/multi-process-machines/8375

javaunsafe2019 · 2 years ago
Why should you do this? Sounds like an antipattern to me.
kuhsaft · 2 years ago
It’s used widely in the Kubernetes world and is known as sidecars [0].

[0] https://kubernetes.io/blog/2023/08/25/native-sidecar-contain...

postalrat · 2 years ago
It's because you have multiple processes (containers) that work together in their little pod. You could stick them all in a single image somehow, but that would be much more work and less flexible.
thowrjasdf32432 · 2 years ago
Great writeup! Love reading about orchestration, especially distributed.

> When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

Why a single machine? Is it because this single fly machine is itself orchestrated by your control plane (Nomad)?

> ...we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF). But the ideas are the same.

very cool, is this similar to how Cilium works?

loloquwowndueo · 2 years ago
The control plane is not Nomad anymore: https://community.fly.io/t/the-death-of-nomad/16220