> The biggest problem: developers don’t want GPUs. They don’t even want AI/ML models. They want LLMs. System engineers may have smart, fussy opinions on how to get their models loaded with CUDA, and what the best GPU is. But software developers don’t care about any of that. When a software developer shipping an app comes looking for a way for their app to deliver prompts to an LLM, you can’t just give them a GPU.
I'm increasingly coming to the view that there is a big split among "software developers" and AI is exacerbating it. There's an (increasingly small) group of software developers who don't like "magic" and want to understand where their code is running and what it's doing. These developers gravitate toward open source solutions like Kubernetes, and often just want to rent a VPS or at most a managed K8s solution. The other group (increasingly large) just wants to `git push` and be done with it, and they're willing to spend a lot of (usually their employer's) money to have that experience. They don't want to have to understand DNS, linux, or anything else beyond whatever framework they are using.
A company like fly.io absolutely appeals to the latter. GPU instances at this point are very much appealing to the former. I think you have to treat these two markets very differently from a marketing and product perspective. Even though they both write code, they are otherwise radically different. You can sell the latter group a lot of abstractions and automations without them needing to know any details, but the former group will care very much about the details.
> There's an (increasingly small) group of software developers who don't like "magic" and want to understand where their code is running and what it's doing. These developers gravitate toward open source solutions like Kubernetes
Kubernetes is not the first thing that comes to mind when I think of "understanding where their code is running and what it's doing"...
> Kubernetes is not the first thing that comes to mind when I think of "understanding where their code is running and what it's doing"...
In the end it's a scheduler for Docker containers on a bunch of virtual or bare-metal machines. Once you get that into your head, life becomes much easier.
The only thing I'd really love to see from an ops perspective is a way to force-revive crashed containers for debugging. Yes, one shouldn't have to debug cattle, just haul the carcass off and get a new one... but I still prefer to know why the cattle died.
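For the carcass inspection specifically, the logs of the last terminated container usually survive long enough to grab before the pod gets replaced. A minimal sketch with the official Kubernetes Python client (pod and container names here are hypothetical):

```python
# Sketch: fetch the logs of the *previous* (crashed) container instance,
# which Kubernetes keeps around after a restart.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

logs = v1.read_namespaced_pod_log(
    name="my-app-7f9c6d5b4-x2k8p",  # hypothetical pod name
    namespace="default",
    container="my-app",             # hypothetical container name
    previous=True,                  # logs from the last terminated container
)
print(logs)
```

It doesn't revive anything, but at least it tells you why the cattle died.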
One may think Kubernetes is complex (I agree), but I haven't seen an alternative that simultaneously lets you:
* Host hundreds or thousands of interacting containers across multiple teams in a sane manner
* Manage it yourself and understand, to the full extent, how it is done.
Of course there are tons of organizations that can (and should) happily give up one of these, but if you need both, there isn't a better choice right now.
I agree with the blog post that using K8s + containers for GPU virtualization is a security disaster waiting to happen. Even if you configure your container right (which is extremely hard to do), you don't get seccomp-bpf.
People started using K8s for training, where you already had a network isolated cluster. Extending the K8s+container pattern to multi-tenant environments is scary at best.
I didn't understand the following part though.
> Instead, we burned months trying (and ultimately failing) to get Nvidia’s host drivers working to map virtualized GPUs into Intel Cloud Hypervisor.
Why was this part so hard? Doing PCI passthrough with the Cloud Hypervisor (CH) is relatively common. Was it the transition from Firecracker to CH that was tricky?
Yeah, I think this really exemplifies the "everyone more specialized than me doesn't get the bigger picture, and everyone less specialized than me is wasting their time" trope. Developers who don't want to deal with the nitty gritty in one area are dealing with it in another area. Everyone has 24 hours in a day.
I almost started laughing at the same comment. Kubernetes is the last place to know what your code is doing. A VM or bare metal is more practical for the persona that OP described. The git pushers might want the container on k8s
If you have a system that's actually big or complex enough to warrant using Kubernetes, which, to be frank, isn't really that much considering the realities of production, the only thing more complex than Kubernetes is implementing the same concepts but half-assed.
I really wonder why this opinion is so commonly accepted by everyone. I get that not everything needs most Kubernetes features, but it's useful. The Linux kernel is a dreadfully complex beast full of winding subsystems and screaming demons all over: eBPF, namespaces, io_uring, cgroups, SELinux, and much more, all interacting with each other in sometimes surprising ways.
I suspect there is a decent likelihood that a lot of sysadmins have a more complete understanding of what's going on in Kubernetes than in Linux.
Kubernetes is an abstraction over VMs so that a single container can be deployed in the absence of a code package; the container is the binary in this circumstance. Unfortunately, teams lose the ability to shift blame if their deployment fails: it can no longer be the VM's fault. What is deployed in lower environments is physically identical to what is in prod, outside of configuration.
Really? There are plenty of valid criticisms of kubernetes, but this doesn't strike me as one of them. It gives you tons of control over all of this. That's a big part of why it's so complex!
It's very easy to understand once you invest a little bit of time.
That's assuming you have a solid foundation in the nuts and bolts of how computers work to begin with.
If you just jumped into software development without that background, well, you're going to end up in the latter pool of developers as described by the parent comment.
Fly.io probably runs it on Kubernetes as well. It can be something in the middle, like RunPod: if you select 8 GPUs, you'll get a complete host to yourself. Though there is a lot of stuff lacking at RunPod too. But Fly.io... First of all, I had never heard of this one. Second, the variety of GPUs is lacking: there are only 3 types, and the L40S on Fly.io is 61.4% more expensive than on RunPod. So I would say it is about marketing, marketplace, long-term strategy, and pricing. But it seems at least they have made themselves known to me (I bet there are others who heard about them for the first time today too).
Core Kubernetes (Deployments, Services, etc.) is fairly easy to understand. A lot of other stuff in the CNCF ecosystem is immature. I don't think most people need to use all the operators, admission controllers, OTel, or service meshes, though.
If you're running one team with all services trusting each other, you don't have the problems these things solve. Whenever you introduce a CNCF component outside core Kubernetes, invest time in understanding it and why it does what it does. Nothing is "deploy and forget"; everything needs to be regularly checked and upgraded, and when issues come up you need some architecture-level understanding of the component to troubleshoot, because there are so many moving parts.
So if I can get away with writing my own cronjob in 1000 lines rather than installing something from GitHub with a Helm chart, I will go with the former option.
(Helm is crap, though you often won't have much choice.)
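To be concrete about what I mean by "my own cronjob", a stripped-down sketch (the command path is a placeholder, and the real thing would add locking, retries, and alerting):

```python
# Sketch: the self-written cronjob alternative, minus the other ~980 lines
# (locking, retries, alerting). The command path is a placeholder.
import subprocess
import time

INTERVAL_SECONDS = 3600  # run hourly

while True:
    result = subprocess.run(["/usr/local/bin/cleanup-job"], capture_output=True)
    if result.returncode != 0:
        print("job failed:", result.stderr.decode(), flush=True)
    time.sleep(INTERVAL_SECONDS)
```

Boring and auditable, and I know exactly what it does when it breaks.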
Maybe not Kubernetes, but what about Docker Compose or Docker Swarm? Having each app be separate from the rest of the server, with easily controllable storage, networking, resource limits, restarts, healthchecks, configuration and other things. It's honestly a step up from well crafted cgroups and systemd services etc. (also because it comes in a coherent package and a unified description of environments) while the caveats and shortcomings usually aren't great enough to be dealbreakers.
But yeah, the argument could have as well just said running code on a VPS directly, because that also gives you a good deal of control.
Based on the following I think they also meant _how_ the code is running:
> The other group (increasingly large) just wants to `git push` and be done with it, and they're willing to spend a lot of (usually their employer's) money to have that experience. They don't want to have to understand DNS, linux, or anything else beyond whatever framework they are using.
I'm a "full full-stack" developer because I understand what happens when you type an address into the address bar and hit Enter - the DNS request that returns a CNAME record to object storage, how it returns an SPA, the subsequent XHR requests laden with cookies and other goodies, the three reverse proxies they have to flow through before they get to one of several containers running on a fleet of VMs, the environment variable being injected by the k8s control plane from a Secret that tells the app where the Postgres instance is, the security groups that allow tcp/5432 from the node server to that instance, et cetera ad infinitum. I'm not hooking debuggers up to V8 to examine optimizations or tweaking container runtimes, but I can speak intelligently to and debug every major part of a modern web app stack, because I feel strongly that it's my job to be able to do so (and because I've worked places where if I didn't develop that knowledge then nobody would have).
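That Secret-injected Postgres pointer, for instance, is nothing exotic from the app's side; roughly this (the variable name and driver here are just examples, not anyone's actual setup):

```python
# Sketch: from the app's point of view, the k8s-injected Secret is just an
# environment variable; where it came from is invisible at this layer.
import os
import psycopg2  # example driver; the variable name below is an assumption

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
```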
I can attest that this type of thinking is becoming increasingly rare as our industry continues to specialize. These considerations are now often handled by "DevOps Engineers" who crank out infra and seldom write code outside of Python and bash glue scripts (which is the antithesis to what DevOps is supposed to be, but I digress). I find this unfortunate because this results in teams throwing stuff over the wall to each other which only compounds the hand-wringing when things go wrong. Perhaps this is some weird psychopathology of mine but I sleep much better at night knowing that if I'm on the hook for something I can fix it once it's out in the wild, not just when I'm writing features and debugging it locally.
I enjoy the details, but I don’t get paid to tell my executives how we’re running things. I get paid to ship customer facing value.
Particularly at startups, it’s almost always more cost effective to hit that “scale up” button from our hosting provider than do any sort of actual system engineering.
Eventually, someone goes “hey we could save $$$$ by doing XYZ” so we send someone on a systems engineering journey for a week or two and cut our bill in half.
None of it really matters, though. We’re racing against competition and runway. A few days less runway isn’t going to break a startup. Not shipping as fast as reasonable will.
I’ve been in similar situations, but details matter. If your “scale up” button is backed by heavily abstracted services, your choices start to look very different, as the cost of reimplementing what the service does might be high enough that you end up in a no-win situation of your own making.
The closer your “scale up” button is to actual hardware, the less of a problem it is.
This is a false dichotomy. The truth is we are constantly moving further and further away from the silicon. New developers don't have as much need to understand these details because things just work; some do care because they work at a job where it's required, or because they're inherently interested (a small number).
Over time we will move further away. If the cost of an easily managed solution is low enough, why do the details matter?
> The truth is we are constantly moving further and further away from the silicon.
Are we? We're constantly changing abstractions, but we don't keep adding them all that often. Operating systems and high-level programming languages emerged in the 1960s. Since then, the only fundamentally new layer of abstraction were virtual machines (JVM, browser JS, hardware virtualization, etc). There's still plenty of hardware-specific APIs, you still debug assembly when something crashes, you still optimize databases for specific storage technologies and multimedia transcoders for specific CPU architectures...
The details matter because someone has to understand the details, and it's quicker and more cost-effective if it's the developer.
At my job, a decade ago our developers understood how things worked, what was running on each server, where to look if there were problems, etc. Now the developers just put magic incantations given to them by the "DevOps team" into their config files. Most of them don't understand where the code is running, or even what much of it is doing. They're unable or unwilling to investigate problems on their own, even if they were the cause of the issue. Even getting them to find the error message in the logs can be like pulling teeth. They rely on this support team to do the investigation for them, but continually swiveling back-and-forth is never going to be as efficient as when the developer could do it all themselves. Not to mention it requires maintaining said support team, all those additional salaries, etc.
(I'm part of said support team, but I really wish we didn't exist. We started to take over Ops responsibilities from a different team, but we ended up taking on Dev ones too and we never should've done that.)
This statement encapsulates nearly everything that I think is wrong with software development today. Captured by MBA types trying to make a workforce that is as cheap and replaceable as possible. Details are simply friction in a machine that is obsessed with efficiency to the point of self-immolation. And yet that is the direction we are moving in.
Details matter, process matters, experience and veterancy matters. Now more than ever.
I used to think this, but it only works if the abstractions hold - it’s as if we dropped random-access memory and went back to tape drives: suddenly the abstractions matter.
My comment elsewhere goes into a bit more detail, but basically silicon stopped being able to make single-threaded code faster in about 2012 - we have just been getting “more parallel cores” since. And now at wafer scale we see 900,000 cores on a “chip”. When 100% parallel code runs 1 million times faster than your competitors’ - when following one software engineering path leads to code that can run 1M X - then we will find ways to use that excess capacity, and the engineers who can do it get to win.
Tell that to the people who keep the network gear running at your office. You might not see the importance of knowing the details, but those details still matter and are still in plain use all around you every day. The aversion to learning the stack you're building with is frustrating to the people who keep that stack running.
I think that if the development side knew a little bit of the rest of the stack they'd write better applications overall.
I think that’s the same answer someone would say about an IBM mainframe in 1990. And just as wrong.
I’ll use my stupid hobby home server stuff as an example. I tossed the old VMware box years ago. You know what I use now? Little HP t6x0 thin clients. They are crappy little x86 SoCs with m2 slots, up to 32GB memory and they can be purchased used for $40. They aren’t fast, but perform better than the cheaper AWS and GCP instances.
Is that a trivial use case? Absolutely. Now move from $30 to about $2000. Buy a Mac Mini. It’s a powerful ARM SoC with ridiculously fast storage and performance - probably more compute than a small/mid-size company’s computer room a few years ago, and more performant than a $1M SAN a decade ago.
6G will bring 10gig cellular.
Hyperscalers datacenters are the mainframe of 2025.
Have you ever had a plumber, HVAC tech, electrician, etc. come out to your house for something, and had them explain it to you? Have you had the unfortunate experience of that happening more than once (with separate people)? If so, you should know why this matters: because if you don’t understand the fundamentals, you can’t possibly understand the entire system.
It’s the same reason why the U.S. Navy Nuclear program still teaches Electronics Technicians incredibly low-level things like bus arbitration on a 386 (before that, it was the 68000). Not because they expect most to need to use that information (though if necessary, they carry everything down to logic analyzers), but because if you don’t understand the fundamentals, you cannot understand the abstractions. Actually, the CPU is an abstraction, I misspoke: they start by learning electron flow, then moving into PN junctions, then transistors, then digital logic, and then and only then do they finally learn how all of those can be put together to accomplish work.
Incidentally, former Navy Nukes were on the initial Google SRE team. If you read the book [0], especially Chapter 12, you’ll get an inkling about why this depth of knowledge matters.
Do most people need to understand how their NIC turns data into electrical signals? No, of course not. But occasionally, some weird bug emerges where that knowledge very much matters. At some point, most people will encounter a bug that they are incapable of reasoning about, because they do not possess the requisite knowledge to do so. When that happens, it should be a humbling experience, and ideally, you endeavor to learn more about the thing you are stuck on.
Capture and product stickiness. If your product is all serverless wired together with an event system by the same cloud provider, you are in a very weak position to argue that you will go elsewhere, leveraging the competitive market to your advantage.
The more the big cloud providers can abstract away CPU cycles, memory, networking, storage, etc., the less they have to compete with others doing the same.
What happens in reality is that things are promised to work and (at best) fulfill that promise so long as no developers or deployers or underlying systems or users deviate from a narrow golden path, but fail in befuddling ways when any of those constraints introduce a deviation.
And so what we see, year over year, is continued enshittening, with everything continuously pushing the boundaries of unreliability and inefficiency, and fewer and fewer people qualified to actually dig into the details to understand how these systems work, how to diagnose their issues, how to repair them, or how to explain their costs.
> If the cost of an easily managed solution is low enough, why do the details matter?
Because the patience that users have for degraded quality, and the luxury that budgets have for inefficiency, will eventually be exhausted and we'll have collectively led ourselves into a dark forest nobody has the tools or knowledge to navigate out of anymore.
Leveraging abstractions and assembling things from components are good things that enable rapid exploration and growth, but they come with latent costs that eventually need to be revisited. If enough attention isn't paid to understanding, maintaining, refining, and innovating on the lowest levels, the contraptions built through high-level abstraction and assembly will eventually either collapse upon themselves or be flanked by competitors who struck a better balance and built on more refined and informed foundations.
As a software engineer who wants a long and satisfying career, you should be seeking to understand your systems to as much depth as you can, making informed, contextual choices about what abstractions you leverage, exactly what they abstract over, and what vulnerabilities and limitations are absorbed into your projects by using them. Just making naive use of the things you found a tutorial for, or that are trending, or that make things look easy today, is a poison to your career.
> If the cost of an easily managed solution is low enough
Because vertical scaling is now large enough that I can run all of twitter/amazon on one single large server. And if I'm wrong now, in a decade I won't be.
Compute power grows exponentially, but business requirements do not.
This is a context-based dichotomy, not a person-based one.
In my personal life, I’m curiosity-oriented, so I put my blog, side projects and mom’s chocolate shop on fully self hosted VPSs.
At my job managing a team of 25 and servicing thousands of customers for millions in revenue, I’m very results-oriented. Anyone who tries to put a single line of code outside of a managed AWS service is going to be in a lot of trouble with me. In a results-oriented environment, I’m outsourcing a lot of devops work to AWS, and choosing to pay a premium because I need to use the people I hire to work on customer problems.
Trying to conflate the two orientations with mindsets / personality / experience levels is inaccurate. It’s all about context.
One end is PaaS like Heroku, where you just git push.
The other end is bare metal hosting.
Every option you mentioned (VPS, managed K8s, self-hosted K8s, etc.) falls somewhere between these two ends of the spectrum.
If a developer falls into any of these "groups" or has a fixed preference/position on any of these solutions, they are just called juniors.
Where you end up in this spectrum is a matter of cost benefit. Nothing else. And that calculation always changes.
Those options only make sense where the cost of someone else managing it for you for a small premium gets higher than the opportunity/labor cost of you doing it yourself.
So, as a business, you _should_ not have a preference to stick to. You should probably start with PaaS, and as you grow, if PaaS costs get too high, slowly graduate into more self-managed things.
A company like fly.io is a PaaS. Their audience has always been, and will always be application developers who prefer to do nothing low-level. How did they forget this?
> Where you end up in this spectrum is a matter of cost benefit. Nothing else. And that calculation always changes.
This is where I see things too. When you start out, all your value comes from working on your core problem.
eg: You'd be crazy to start a CRM software business by building your own physical datacenter. It makes sense to use a PaaS that abstracts as much away as possible for you so you can focus on the actual thing that generates value.
As you grow, the high abstraction PaaS gets increasingly expensive, and at some point bubbles up to where it's the most valuable thing to work on. This typically means moving down a layer or two. Then you go back to improving your actual software.
You go through this a bunch of times, and over time grow teams dedicated to this work. Given enough time and continuous growth, it should eventually make sense to run your own data centers, or even build your own silicon, but of course very few companies get to that level. Instead most settle somewhere in the vast spectrum of the middle, with a mix of different services/components all done at different levels of abstraction.
This is news to us. Our primary DX is a CLI. One of our defining features is hardware isolation. To use us, you have to manage Dockerfiles. Have you had the experience of teaching hundreds of Heroku refugees how to maintain a Dockerfile? We have had that experience. Have you ever successfully explained the distinction between "automated" Postgres and "managed" Postgres? We have not.
You're not wrong that there's a PaaS/public-cloud dividing line, and that we're at an odd place between those two things. But I mean, no, it is not the case that our audience is strictly developers who do nothing low-level. I spent months of my life getting _UDP_ working for Fly apps!
Aren’t we just continually moving up layers of abstractions? Most of the increasingly small group doesn’t concern itself with voltages, manually setting jumpers, hand-rolling assembly for performance-critical code, cache line alignment, raw disk sector manipulation, etc.
I agree it’s worthwhile to understand things more deeply but developers slowly moving up layers of abstractions seems like it’s been a long term trend.
We certainly need abstractions for the first layer of the hardware. An abstraction of the abstraction can be useful if the first abstraction is very bad or very crude. But we are now at an abstraction of an abstraction x 8 or so. It's starting to get a bit over the top.
> I'm increasingly coming to the view that there is a big split among "software developers" and AI is exacerbating it.
I don't think this split exists, at least in the way you framed it.
What does exist is workload, and problems that engineers are tasked with fixing. If you are tasked with fixing a problem or implementing a feature, you are not tasked with learning all the minute details or specifics of a technology. You are tasked with getting shit done, which might even turn out to not involve said technology. You are paid to be a problem-solver, not an academic expert on a specific module.
What you tried to describe as "magic" is actually the balance between broad knowledge and specialization, or being a generalist vs a specialist. The bulk of the problems that your average engineer faces require generalists, not specialists. Moreover, the tasks that actually require a specialist are rare, and when those surface the question is always whether it's worth investing in a specialist. There are diminishing returns on investment, and throwing a generalist at the problem will already get some results. Give a generalist access to an LLM and he'll cut down on the research time to deliver something close to what a specialist would deliver. So why bother?
With this in mind, I would go as far as to say that the scenario backhandedly described as "wanting to understand where their code is running and what it's doing" (as if no engineer needs insight into how things work?), as opposed to the dismissively framed "just wants to `git push` and be done with it" scenario, can actually be classified as a form of incompetence. You, as an engineer, only have so many hours per day. Your day-to-day activities involve pushing new features and fixing new problems. To be effective, your main skill set is learning the system in a JIT way, diving in, fixing it, and moving on. You care about system traits, not low-level implementation details that may change tomorrow on a technology you may not even use tomorrow. If, instead, you feel the need to waste time on topics that are irrelevant to the immediate needs of your role, you are failing to deliver value. I mean, if you frame yourself as a Kubernetes expert who even knows commit hashes by heart, does that matter if someone asks you, say, why a popup box is showing off-center?
I'm not entirely certain. Or perhaps we're all part of both groups.
I want to understand LLMs. I want to understand my compiler, my gc, my type system, my distributed systems.
On the other hand, I don't really care about K8s or anything else, as long as I have something that works. Just let me `git push` and focus on making great things elsewhere.
this feels right to me. application development and platform development are both software development tasks, and lots of software devs do both. i like working on platform-level stuff, and i like building applications. but i like there to be a good distinction between the two, and when i'm working on application-level stuff, i don't want to have to think about the platform.
services like fly.io do a good job of hiding all the platform level work and just giving you a place to deploy your application to, so when they start exposing tools like GPUs that are more about building platforms than building applications, it's messy.
I am the former. I also make cost benefit based decisions that involve time. Unless I have very specific configuration needs, the git push option lets me focus on what my users care about and gives me one less thing that I need to spend my time on.
Increasingly, Fly even lets you dip into more complex configurations too.
I’ve got no issue with using Tofu and Ansible to manage my own infrastructure but it takes time to get it right and it’s typically not worth the investment early on in the lifecycle.
>who don't like "magic" and want to understand where their code is running and what it's doing.
I just made this point in a post on my Substack. Especially in regulated industries, you NEED to be able to explain your AI to the regulator. You can't have a situation where a human says, "Well, gee, I don't know. The AI told me to do it."
I feel like fly.io prioritizes a great developer experience and I think that appeals to engineers who both do and don't like magic.
But the real reason I like fly.io is because it is a new thing that allows for new capabilities. It allows you to build your own Cloudflare by running full virtual machines colocated next to appliances in a global multicast network.
"Enjoys doing linux sysadmin" is not the same as "Wants to understand how things work". It's weird to me that you group those two kinds of people in one bucket.
Move to EKS and you still need a k8s engineer, but one who also knows AWS, and you also pay the AWS premium for the hosting, egress, etc. It might make sense for your use case but I definitely wouldn’t consider it a cost-saving measure.
> There's an (increasingly small) group of software developers who don't like "magic" and want to understand where their code is running and what it's doing.
That problem started so long ago and has gotten so bad that I would be hard pressed to believe there is anyone on the planet who could take a modern consumer PC and explain what exactly is going on in the machine without relying on any abstractions to understand the actual physical process.
Given that, it’s only a matter of personal preference on where you draw the line for magic. As other commenters have pointed out, your line allowing for Kubernetes is already surprising to a lot of people
> I'm increasingly coming to the view that there is a big split among "software developers" and AI is exacerbating it
This is admittedly low effort but the vast majority of devs are paid wages to "write CRUD, git push and magic" their way to the end of the month. The company does not afford them the time and privilege of sitting down and analyzing the code with a fine comb. An abstraction that works is good enough.
The seasoned seniors get paid much more and afforded leeway to care about what is happening in the stack, since they are largely responsible for keeping things running. I'm just pointing out it might merely be a function of economics.
I don't think this is entirely correct. I work for a company that does IT consulting, so I see many teams working on many different projects, and one thing I have learned the hard way is that the companies and teams that think they should do it all themselves are usually smaller companies, and they often have a lot of problems with that attitude.
Just an example I recently came across: a smaller company that uses Kubernetes and manages everything themselves with a small team. The result: they get hacked regularly, and everything they run is constantly out of date because they don't have the capacity to actually manage it themselves. And it's not even cheaper in the long run, because developer time is usually more expensive than just paying AWS to keep their EKS up to date.
To be fair, in my home lab I also run everything bare metal and keep it updated, but I run everything behind a VPN connection and run a security scanner every weekend that automatically kills any service where it finds a CVE above Medium severity, and I fix it when I get the time to do it.
As a small team I can only fix so much and keep so much up to date before I get overwhelmed or the next customer project gets forced upon me by management with priority 0 - who cares about security updates then?
I'd strongly suggest using as many managed services as you can and focusing your effort as a team on what makes your software unique. Do you really need to hire 2-3 DevOps guys just to keep everything running when GCP Cloud Run "just werks"?
Everything we do these days runs on so many levels of abstraction anyway; it's no shame to share the cost of managing the lower levels of abstraction with others (using managed services) and focus on your product instead. Unless you are large enough to pay for whole teams that deal with nothing but infrastructure to enable other teams to do application-level programming, you are, in my limited experience, just going to shoot yourself in the foot.
And again, just to emphasize it: I like to do everything myself, because for privacy reasons I use as few services outside my control as possible, but I would not recommend this to a customer, because it's neither economical nor does it work well in my, albeit limited, experience.
I might be an outlier. I like to think I try for a deeper understanding of what I’m using. Like, fly uses firecracker vms afaik. Sometimes, especially for quick projects or testing ideas I just want to have it work without wrangling a bunch of AWS services. I’m typically evaluating is this the right tool or service and what is the price to convenience? For anything potentially long term, what’s the amount of lock in when or if I want to change providers?
I agree that split exists, and that the former is more rare, but in my experience the split is less about avoiding magic and more about keeping control of your system.
Many, likely most, developers today don't care about controlling their system/network/hardware. There's nothing wrong with that necessarily, but it is a pretty fundamental difference.
One concern I've had with building LLM features is whether my customers would be okay with me giving their data over to the LLM vendor. Say I'm building a tool for data analysis, is it really okay to a customer for me to give their table schemas or access to the data itself to OpenAI, for example?
I rarely hear that concern raised, though. Similarly, when I was doing consulting recently, I wouldn't use Copilot on client projects, as I didn't want Copilot servers accessing code that I don't actually own the rights to. Maybe it's overprotective, but I have never heard anyone else raise that concern, so maybe it's just me.
I work for a major consulting firm and we’ve been threatened with fire and brimstone if any part of client info (code, docs, random email, anything) ever gets sent to an LLM. Even with permission from the client our attack lawyers prefer us not to use them. It’s a very sensitive topic. I still use LLMs from time to time but always starting with a blank prompt and the ask anonymized. (Heh I’m probably not even supposed to do that)
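For what it's worth, that anonymization step can be as blunt as a pre-send scrubber; a rough sketch (the patterns and the client name are purely illustrative, nowhere near a compliance tool):

```python
# Sketch: blunt client-side scrubbing before a prompt ever leaves the machine.
# The regex patterns and the client name are illustrative only.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b"), "<ID-NUMBER>"),
    (re.compile(r"\bAcme Corp\b"), "<CLIENT>"),  # hypothetical client name
]

def anonymize(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(anonymize("Acme Corp's invoice bounced; email jane.doe@acme-corp.com"))
# -> "<CLIENT>'s invoice bounced; email <EMAIL>"
```

Real anonymization needs far more care than this, but even a crude pass keeps the obvious identifiers out of a third party's logs.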
I don't agree, I think you're just describing two sides of the same coin.
As a software developer I want strong abstractions without bloat.
LLMs are so successful in part because they are a really strong abstraction. You feed in text and you get back text. Depending on the model and other parameters your results may be better or worse, but changing from e.g. Claude to ChatGPT is as simple as swapping one request for another.
If what I want is to run AI tasks, then GPUs are a poor abstraction. It's very complicated (as Fly have discovered) to share them securely. The amount of GPU you need could vary dramatically. You need to worry about drivers. You need to worry about all kinds of things. There is very little bloat to the ChatGPT-style abstraction, because the network overhead is a negligible part of the overall cost.
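To make the "strong abstraction" point concrete, this is roughly the entire surface area - a sketch using the vendors' Python SDKs (model names are placeholders; error handling and streaming omitted):

```python
# Sketch: "text in, text out" is nearly the whole abstraction.
from openai import OpenAI
import anthropic

def ask_openai(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

# Swapping vendors is swapping one call for the other; nothing about GPUs,
# drivers, or capacity planning leaks through.
print(ask_openai("Summarize this incident report in two sentences: ..."))
```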
If I say I don't want magic, what I really mean is that I don't trust the strength of the abstraction that is being offered. For example, when a distributed SQL database claims to be PostgreSQL compatible, it might just mean it's wire compatible, so none of my existing queries will actually work. It might have all the same functions but be missing support for stored procedures. The transaction isolation might be a lie. It's not that these databases are bad, it's that "PostgreSQL as a whole" cannot serve as a strong abstraction boundary - the API surface is simply too large and complex, and too many implementation details are exposed.
It's the same reason people like containers: running your application on an existing system is a very poor abstraction. The API surface of a modern linux distro is huge, and includes everything from what libraries come pre-installed to the file-system layout. On the other hand the kernel API is (in comparison) small and stable, and so you can swap out either side without too much fear.
K8S can be a very good abstraction if you deploy a lot of services to multiple VMs and need a lot of control over how they are scaled up and down. If you're deploying a single container to a VM, it's massively bloated.
TLDR: Abstractions can be good and bad, both inherently, and depending on your use-case. Make the right choice based on your needs. Fly are probably correct that their GPU offering is a bad abstraction for many of their customer's needs.
It’s not about wanting, it’s about what the job asks for. As a self-employed engineer I am paid to solve business problems in an efficient way. Most of the time it just makes more business sense, for the client and for me, to pay to be able to just git push when there are no performance challenges needing custom infrastructure.
All professional developers want two things: to do their work as fast as possible and to spend as little budget as possible making things work. That's the core operating principle of most companies.
What's changing is that managed solutions are becoming increasingly easier to set up and increasingly cheaper on smaller scales.
While I do personally enjoy understanding the entire stack, I can't justify self-hosting and managing an LLM until we run so many prompts a day that it becomes cheaper for us to run our own GPUs compared to just running APIs like OpenAI/Anthropic/Deepseek/...
> There's an (increasingly small) group of software developers who don't like "magic" and want to understand where their code is running and what it's doing. (...) The other group (increasingly large) just wants to `git push` and be done with it
I think we're approaching the point where software development becomes a low-skilled job, because the automatic tools are good enough to serve business needs, while manual tools are too difficult to understand by anyone but a few chosen ones anyway.
I think it's true that engineers who want to understand every layer of everything in depth, or who want to have platform ownership, are not necessarily the same group as the more "product itself" focused sort who want to write something and just push it. But I'm not sold at all that either of these groups, in a vacuum, has substantial demand for GPU compute, unless that's someone's area of interest for a pet project.
This. Personally, I’d want a GPU to self host whatever model, because I think that’s fun, plain and simple. Probably many people do too. But the business is not making money from people who are just thinking about fun.
> The other group (increasingly large) just wants to `git push` and be done with it, and they're willing to spend a lot of (usually their employer's) money to have that experience. They don't want to have to understand DNS, linux, or anything else beyond whatever framework they are using.
lol, even understanding git is hard for them. Increasingly, software engineers don't want to learn their craft.
I think the root of it is most people coming into the software engineering industry just want a good paying job. They don’t have any real interest in computers or networks or anything else. Whatever keeps the direct deposits coming is what they’ll do. And in their defense, the web dev industry is so large in breadth and depth and the pay/benefits are so generous it’s an attractive career path no matter what your passion is.
The way I think about it is this: any individual engineer (or any individual team) has a limited complexity budget (in other words, how much can you fit in your meat brain). How you spend it is a strategic decision. Depending on your project, you may not want to waste it on infra so you can fit a lot of business logic complexity.
increasingly small is right. i'm definitely part of that former group but sadly more and more these days i just feel dumb for being this way. it usually just means that i'm less productive than my colleagues in practice as i'm spending time figuring out how things work while everybody else is pushing commits. maybe if we were put in a hypothetical locked room with no internet access i'd have a slightly easier time than them but that's not helpful to anybody.
once upon a time i could have said that it's better this way and that everybody will be thankful when i'm the only person who can fix something, but at this point that isn't really true when anybody can just get an LLM to walk them through it if they need to understand what's going on under the hood. really i'm just a nerd and i need to understand if i want to sleep at night lol.
The former have the mentality of being independent, at the cost of their ability to produce a result as quickly. The latter are happy to be dependent, because the result is more important than the means. Obviously this is a spectrum.
It depends on the product you're building. At my last job we hosted bespoke controlnet-guided diffusion models. That means k8s+GPUs was a necessity. But I would have loved to use something simpler than k8s.
I don’t think this comment does justice to fly.io.
They have incredible defaults that can make it as simple as just running ‘git push’ but there isn’t really any magic happening, it’s all documented and configurable.
Where does this dichotomy between Kubernetes and superficial understanding come from? It is not consistent with my experience, and I have no speculation as to its origin.
Somebody who doesn’t want to understand DNS, Linux, or anything beyond their framework is a hazard. They’re not able to do a competent code review on the vomit that LLMs produce. (Am I biased much?)
I’ve never laid bricks but in other trades I’ve worked in, well, a lot of people understood basics of the chemistry of the products we used. It’s useful to understand how they work together safely, if they can be exposed to different environments, if they’re heat-safe, cold-safe, do they off-gas, etc.
Paints, wood finishes, adhesives, oils, abrasives, you name it. You generally know at least a bit about what’s in it. I can’t say everyone I’ve worked with wanted to know, but it’s often intrinsic to what you’re doing and why. You don’t just pull a random product off a shelf and use it. You choose it, quite often, because of its chemical composition. I suspect it’s not always thought of this way, though.
This is the same with a lot of artistic mediums as well. Ceramicists often know a lot more than you’d expect about what’s in their clay and glazes. It’s really cool.
I’m not trying to be contrarian here. I know some people don’t care at all, and some people use products because it’s what they were told to do and they just go with it. But that wasn’t my experience most of the time. Maybe I got lucky, haha.
In my country I think the vocational/trade degree -which might lie between HS and uni level- on car mechanics has basic physics, mechanics and maybe some chemistry.
Ditto for the rest of technical voc degrees.
If you think you can do IT without at least a trade-degree-level understanding of how the low-level components interact (and I'm not talking about CS-level stuff - concurrency with CSP, O-notation, linear and discrete algebra - but basic things such as networking protocols, basic SQL database normalization, system administration, configuration, how the OS boots, how processes work: idle, active, waiting...), you will be fired faster than anyone around.
It's been a while since I tried, but my experience trying to manually set up GPUs was atrocious, and with investigation generally ending at the closed-source NVidia drivers it's easy to feel disempowered pretty quickly. I think my biggest learning from trying to do DL on a manually set up computer was simply that GPU setup was awful and I never wanted to deal with it. It's not that I don't want to understand it, but with NVidia software you're essentially not allowed to understand it. If open source drivers or open GPU hardware were released, I would gladly learn how that works.
Having worked with many of the latter and having had the displeasure of educating them on nix systems fundamentals: ugh, oof, I hate this timeline, yet I also feel a sense of job security.
We used to joke about this a lot when Java devs would have memory issues and not know how to adjust the heap size in init scripts. So many “CS majors” who are completely oblivious to anything happening outside of the JVM, and plenty happening within it.
Eh, the way I see it the entire practice of computer science and software engineering is built on abstraction -- which can be described as the ability to not have to understand lower levels -- to only have to understand the API and not the implementations of the lowest levels you are concerned with, and to pay even less attention to the levels below that.
I want to understand every possible detail about my framework and language and libraries. Like I think I understand more than many do, and I want to understand more, and find it fulfilling to learn more. I don't, it's true, care to understand the implementation details of, say, the OS. I want to know the affordances it offers me and the APIs that matter to me, I don't care about how it's implemented. I don't care to understand more about DNS than I need. I definitely don't care to spend my time futzing with kubernetes -- I see it as a tool, and if I can use a different tool (say heroku or fly.io) that lets me not have to learn as much -- so I have more time to learn every possible detail of my language and framework, so I can do what I really came to do, develop solutions as efficiently and maintainably as possible.
You are apparently interested in lower levels of abstraction than I am. Which is fine! Perhaps you do ops/systems/sre and don't deal with the higher levels of abstraction as much as I do -- that is definitely lucrative these days, there are plenty of positions like that. Perhaps you deal with more levels of abstraction but don't go as deep as me -- or, and I totally know it's possible, you just have more brain space to go as deep or deeper on more levels of abstraction as me. But even you probably don't get into the implementation details of electrical engineering and CPU design? Or if you do, and also go deep on frameworks and languages, I think you belong to a very very small category!
But I also know developers who, to me, don't want to go too deep on any of the levels of abstraction. I admit I look down on them, as I think you do too; they seem like copy-paste coders who will never be as good at developing efficient, maintainable solutions.
I started this post saying I think that's a different axis than what layers of abstraction one specializes in or how far down one wants to know the details. But as I get here, while I still think that's likely, I'm willing to consider that these developers I have not been respecting -- are just going really deep in even higher levels of abstraction than me? Some of them maybe, but honestly I don't think most of them, but I could be wrong!
I was thinking about this just yesterday. I was shown an ad for a device for an aircraft to geo-assist taxiing? (I have never flown, so I don't know why.) The comments were the usual “old man shouts at cloud” anger that assistive devices make lives easier for people.
I feel this is similar to what you are pointing out. Why _shouldn’t_ people be the “magic” users? When was the last time one of your average devs looked into how ESM loading works? Or the Python interpreter, or V8? Or how it communicates with the OS and lower-level hardware interfaces?
This is the same thing. Only you are goalpost shifting.
> They don't want to have to understand DNS, linux, or anything else beyond whatever framework they are using.
This is baffling. What’s the value proposition here? At some point the customer will be directly asking an AI agent to create an app for them, and it will take care of coding/deployment for them.
Some people became software developers because they like learning and knowing what they're doing, and why and how it works.
Some people became software developers because they wanted to make easy money back when the industry was still advertising bootcamps (in order to drive down the cost of developers).
Some people simply drifted into this profession by inertia.
And everything in-between.
From my experience there are a lot of developers who don't take pride in their work, and just do it because it pays the bills. I wouldn't want to be them but I get it. The thing is that by delegating all their knowledge to the tools they use, they are making themselves easy to replace, when the time comes. And if they have to fix something on their own, they can't. Because they don't understand why and how it works, and how and why it became what it is instead of something else.
So they call me and ask me how that thing works...
My heart stopped for a moment when reading the title. I'm glad they haven't decided to axe GPUs, because fly GPU machines are FANTASTIC!
Extremely fast to start on demand, reliable, and although a little bit pricey, not unreasonably so considering the alternatives.
And the DX is amazing! it's just like any other fly machine, no new set of commands to learn. Deploy, logs, metrics, everything just works out of the box.
Regarding the price: we've tried a well known cheaper alternative and every once in a while on restart inference performance was reduced by 90%. We never figured out why, but we never had any such problems on fly.
If I'm using a cheaper "Marketplace" to run our AI workloads, I'm also not really clear on who has access to our customer's data. No such issues with fly GPUs.
All that to say, fly GPUs are a game changer for us. I could wish only for lower prices and more regions, otherwise the product is already perfect.
I used the fly.io GPUs as development machines.
For that, I generally launch a machine when I need it and scale it to 0 when I am finished. And this is what's really fantastic about fly.io - setting this up takes an hour... and the Dockerfile created in the process can also be used on any other machine.
Here's a project where I used this setup:
https://github.com/li-il-li/rl-enzyme-engineering
This is in stark contrast to all other options I tried (AWS, GCP, LambdaLabs). The fly.io config really felt like something worth being in every project of mine and I had a few occasions where I was able to tell people to sign up at fly.io and just run it right there (Btw. signing up for GPUs always included writing an email to them, which I think was a bit momentum-killing for some people).
In my experience, the only real minor flaw was the already-mentioned embedding of the whole CUDA stack into your container, which easily creates containers approaching 8GB. That runs you into some fly.io limits, as well as creating slow build times.
I have a timeline that I am still trying to work through, but it goes like this:
2012 - Moore's law basically ends - NAND gates don't get smaller, just more cleverly wrapped. Single threaded execution more or less stops at 2 GHz and has remained there.
2012-2022 - no one notices single threaded is stalled because everything moves to VMs in the cloud - the excess parallel compute from each generation is just shared out in data centres
2022 - data centres realise there is no point buying the next generation of super chips with even more cores, because you make massive capital investments but cannot shovel 10x or 100x the processes in - Amdahl's law means standard computing is not 100% parallel
2022 - but look, LLMs are 100% parallel hence we can invest capital once again
2024 - this is the bit that makes my noodle - wafer-scale silicon. 900,000 cores with GBs of SRAM - these monsters run Llama models 10x faster than A100s
We broke Moore's law and hardware just kept giving more parallel cores because that's all they can do.
And now software needs to find how to use that power - because dammit, someone can run their code 1 million times faster than a competitor - god knows what that means but it’s got to mean something - but AI surely cannot be the only way to use 1M cores?
I’m surprised nobody has yet (as of this writing) pointed out that Moore’s Law never claimed anything about single threaded execution or clock rates. Moore’s Law is that the number of transistors doubles every two years, and that trend has continued since 2012.
It looks like maybe the slope changed slightly starting around 2006, but it’s funny because this comment ends complaining that Moore’s Law is too good after claiming it’s dead. Yes, software needs to deal with the transistor count. Yes, parallel architectures fit Moore’s law. The need to go to more parallel and more parallel because of Moore’s Law was predicted, even before 2006. It was a talking point in my undergrad classes in the 90s.
> Single threaded execution more or less stops at 2 GHz and has remained there.
> 2012-2022 - no one notices single threaded is stalled because everything moves to VMs in the cloud
Single-threaded execution - I assume you mean IPC, or maybe more accurately PPC (performance per clock) - has improved steadily if you account for ARM designs and not just x86. That is why the M1 was so surprising to everyone: most (all) thought Geekbench scores on a phone don't translate to desktop, and somehow the M1 went from nonsense to breakthrough.
Clock speed also went from 2 GHz to 5 GHz, and we are already pushing 4 GHz on mobile phones.
And Moore's law, in terms of transistor density, ended when Intel couldn't deliver 10nm on time, so 2016/2017 give or take. But that doesn't mean transistor density is not improving.
The most surprising thing about M1 was the energy efficiency and price/performance point they hit. It had been known for a couple of years that the phone SOCs were getting really good, just that being passively cooled inside a phone case only allows them 1-2 seconds of max bursts.
The unsung hero of early computing was Dennard scaling. Taking CPUs from 10 MHz to 2 GHz, alongside massive per-clock efficiency improvements, must have been a crazy time.
From a 50MHz 486 in 1990 to a 1.4GHz P3 in 2000 is a factor of 28 improvement in speed solely due to clock speed! Add on all the other multiplicative improvements from IPC...
The greatest increase in clock frequency was in the decade 1993-2003, when the clock frequency increased 50 times (from a 66 MHz Pentium to a 3.2 GHz Pentium 4).
Since then, in more than 20 years, the clock frequency has increased only 2 times, while in the previous decade (1983-1993) it had increased only about 5 times; a doubling of the clock frequency (33 to 66 MHz) occurred between 1989 and 1993 (for cheap CPUs with MOS logic - expensive CPUs using ECL had reached 80 MHz already during the seventies).
Also, the Pentium III reached 1.4 GHz only in early 2002, not in 2000, while the 80486 reached 50 MHz only in 1991, not in 1990.
> 2012 - Moore's law basically ends - NAND gates don't get smaller, just more cleverly wrapped. Single threaded execution more or less stops at 2 GHz and has remained there.
My computer was initially built in 2014 and the CPU runs up to 3 GHz (and I don't think it was particularly new on the market). CPUs made today overclock to 5+ GHz. In what sense did "single threaded execution more or less stop at 2 GHz and remain there"?
We might not be seeing exponential increases in e.g. transistor count with the same doubling period as before, but there has demonstrably been considerable improvement to traditional CPUs in the last 12+ years.
So the size of NAND gates is still roughly 28nm. But if you measure a car from above, its size is fixed; if you stand the car on its nose, you can measure it from above and it’s “smaller” - this is FinFET. Then fold down the roof and the wheels, and that’s roughly GAA. The car’s size stays the same - the parking density increases. It’s more marketing than reality, but density is up …
As for clock speeds, yes and no - basically thermal limits stop most CPUs running full time at full speed. The problem was obvious back in the day: I would build PCs and carefully apply thermal paste to the plastic casing of a chip, relying on the transistors’ heat to pass through the plastic before it could be carried away as waste. Yes, they are working on thermal something-something directly on the layers of silicon.
I am still feeling my way through these ideas, but think perhaps of an alternative universe where, instead of getting cleverer with instruction pipelining (guessing what the program will ask for next and using the silicon to work that out), hardware had just added more parallel cores - so it did not need to guess the next instructions, it just went faster because the instructions ran in parallel, because we magically solved software and developers.
You could have a laptop with 1000 cores on it - simple 32/64 bit CPUs that just ran full pelt.
The lack of parallelism drove decisions to take silicon and make it do things other than run everything faster - to focus instead on getting one instruction stream through one core faster.
AI has arrived and found a world of silicon that, by coincidence, it can use every transistor of at full pelt - while the CPUs we think of in our laptops use only a fraction of their transistors for full pedal-to-the-metal processing, and the rest is … legacy??
> We broke moores law and hardware just kept giving more parallel cores because that’s all they can do.
You get more cores because transistor density didn't stop increasing, software devs/compiler engineers just can't think of anything better to do with the extra real estate!
> Single threaded execution more or less stops at 2 GHz and has remained there.
There are other semiconductor materials that do not have the heat limits of silicon-based FETs and have become shockingly cheap and small (for example, a 200W power supply the size of a wallet that doesn't catch on fire). We're using these materials for power electronics and RF/optics today but they're nowhere close to FinFETs from a few years ago or what they're doing today. That's because all the fabrication technology and practices have yet to be churned out for these new materials (and it's not just UV lasers), but they're getting better, and there will one day be a mcu made from wide bandgap materials that cracks 10GHz in a consumer device.
Total aside, hardware junkies love talking cores and clock speeds, but the real bottlenecks for HPC are memory and i/o bandwidth/latency. That's why the future is optical, but the technology for even designing and experimenting with the hardware is in its infancy.
Plenty of non-IT applications use lots of cores, e.g. physics simulations, constraint solving, network simulation used to plan roads or electrical distribution, etc.
Yes - but the amount of code that escapes Amdahl's law is tiny compared to the amount of code churned out each day that can never run in parallel over 1M cores - no matter how clever a compiler gets.
I cannot work out whether the world simply doesn't contain enough parallel problems, or whether we just lack a programming language to describe them.
> 2012 - moores law basically ends - nand gates don't get smaller just more cleverly wrapped. Single threaded execution more or less stops at 2 GHz and has remained there.
A 2GHz core from 2012 is extremely slow compared to a 2GHz core of a modern CPU. The difference could be an order of magnitude.
There is more to scaling CPUs than the clock speed. Modern CPUs process many more instructions per clock on average.
Edit: it's worth expanding on the data centre cost issues. If it's fair to say we have stalled on "free" speed-ups for standard software (i.e. C/Linux) - that is, clock speeds more or less stopped and we get more 64-bit cores, but each core is more or less no faster (hand-wavy) - then the number of clients that can be packed into a data centre stays the same: you are renting out CPUs in a Docker VM, and that's basically one per core(#). And while wafer scale gives you 900,000 cores in a 1U server, normal CPUs give you what, 64? 128?
Suddenly your cost for building a new data centre is something like twice the cost of the previous one (cooling gets more expensive etc.) and yet you only sell the same amount of space. It's not an attractive business in the first place.
This was the push for Lambda-style serverless architectures etc. - depending on usage, you could have hundreds of people buying the same core. I would have put a lot of cash into making something that spins up Docker instances so fast it's like Lambda - and guess what fly.io does?
I think fly.io’s obsession with engineering led them down a path of positive capital usage while AWS focused on rolling out new products on a tougher capital process.
Anyway - AI is the only workload that dense multi-core data centres can run that packs in many, many more users than one Docker container per core.
Unless we all learn how to think and code in parallel, we are leaving a huge amount of hardware gains on the table. And those gains are going to 100x in the next ten years and my bet is my janky-ass software will still not be able to use it - there will be a bifurcation of specialist software engineers who work in domains and with tooling that is embarrassingly parallel, and the rest of us will be on fly.io :-)
(#) OK, so maybe 3 or 4 Docker containers per core, with the hypervisor doling out time slots, but much more than that and performance is a dog - so the number of "virtual CPUs" you can sell is limited and creeps up only slowly despite hardware leaping ahead … that's the point I am making.
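To make that packing arithmetic concrete, here is a rough back-of-the-envelope in Python; every number (cores per box, tenants per core) is an illustrative guess rather than a measured figure:

    # Back-of-the-envelope for the footnote above; all numbers are illustrative guesses.
    cores_per_server = 128        # a typical dual-socket 1U box
    tenants_per_core = 4          # the footnote's "maybe 3 or 4" containers per core
    sellable_vcpus = cores_per_server * tenants_per_core
    print(sellable_vcpus)         # 512 sellable vCPUs per 1U of conventional CPU

    wafer_scale_cores = 900_000   # the wafer-scale figure quoted above
    print(wafer_scale_cores // cores_per_server)  # ~7000x the cores in the same rack unit,
                                                  # if the workload could actually use them

The revenue per rack barely moves unless the workload can actually consume all those extra cores, which is the point being made here.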
Web applications and APIs for mobile apps are embarrassingly parallel, but many modern web languages and frameworks went back to the stone age on parallelism.
Ancient Java servlets back in the early 2000s were more suitable for performance on current gen hardware than modern NodeJS, Python etc...
I shelled out for a 4090 when they came out thinking it would be the key factor for running local llms. It turns out that anything worth running takes way more than 24GB VRAM. I would have been better off with 2+ 3090s and a custom power supply. It’s a pity because I thought it would be a great solution for coding and a home assistant, but performance and quality isn’t there yet for small models (afaik). Perhaps DIGITS will scratch the itch for local LLM developers, but performant models really want big metal for now, not something I can afford to own or rent at my scale.
There was a post on r/localLlama the other day about a presentation by the company building Digits hardware for Nvidia. The gist was that Digits is going to be aimed at academic AI research folks and as such don't expect them to be available in large numbers (at least not for this first version). It was disappointing. Now I'm awaiting the AMD Strix Halo based systems.
I started buying Macs with more memory, no regrets. An M4 Max with 64GB (in a laptop, no less!) runs most small models comfortably (but get 96GB or more if you really intend to use 70B models regularly). And when I'm not running LLMs, the memory is useful for other stuff.
Gosh. Good thing I haven't bought a GPU in almost a decade. With a little luck I'll catch this wave on the back end. I haven't had to learn web or mobile development thoroughly either
I haven't tested programming tasks with a local LLM vs. say, Claude 3.5. But it is nice to be able to run 14-32B LLMs locally and get an instant response. I have a single 3090.
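For anyone curious what that looks like in practice, here is a minimal sketch of querying a locally hosted model over HTTP. It assumes an Ollama-style server listening on localhost:11434; the model name is a placeholder for whatever you have pulled locally.

    import requests

    # Ask a locally hosted model a question; assumes an Ollama-style HTTP API.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5-coder:14b",  # placeholder; any locally pulled model
            "prompt": "Write a Python function that reverses a linked list.",
            "stream": False,               # one JSON blob instead of a token stream
        },
        timeout=120,
    )
    print(resp.json()["response"])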
With that said, this seems quite obvious - the type of customer that chooses Fly, seems like the last person to be spinning up dedicated GPU servers for extended periods of time. Seems much more likely they'll use something serverless which requires a ton of DX work to get right (personally I think Modal is killing it here). To compete, they would have needed to bet the company on it. It's way too competitive otherwise.
As someone who deploys a lot of models on rented GPU hardware, their pricing is not realistic for continuous usage.
They're charging hyperscaler rates, and anyone willing to pay that much won't go with Fly.
For serverless usage they're only mildly overpriced compared to, say, Runpod, but I don't think of serverless as anything more than an onramp to renting a dedicated machine, so it's not surprising to hear it's not taking off.
GPU workloads tend to have terrible cold-start performance by their nature, and without a lot of application-specific optimizations it rarely makes financial sense not to take a cheaper continuous option if you have an even mildly consistent workload (and if you don't, then you're not generating that much money for them).
My thing here is just: people self-hosting LLMs think about performance in tokens/sec, and we think about performance in terms of ms/rtt; they're just completely different scales. We don't really have a comparative advantage for developers who are comfortable with multisecond response times. And that's fine!
> The biggest problem: developers don’t want GPUs. They don’t even want AI/ML models. They want LLMs.
Fly.io seems to attract similar developers to Cloudflare's Workers platform: mostly developers who want a PaaS-like solution with good dev UX.
If that’s the case, this conclusion seems obvious in hindsight (hindsight is a bitch). Developers who are used to having infra managed for them so they can build applications don’t want to start building on raw infra. They want the dev velocity promise of a PaaS environment.
Cloudflare made a similar bet with GPUs I think but instead stayed consistent with the PaaS approach by building Workers AI, which gives you a lot of open LLMs and other models out of box that you can use on demand. It seems like Fly.io would be in a good position to do something similar with those GPUs.
It's really a shame GPU slices aren't a thing -- a monthly cost of $1k for "a GPU" is just so far outside of what I could justify. I guess it's not terrible if I can batch-schedule a mega-gpu for an hour a day to catch up on tasks, but then I'm basically still looking at nearly $50/month.
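The rough math behind those numbers (prices here are illustrative, not anyone's actual rate card):

    # Sanity check on the figures above.
    monthly_dedicated = 1000.0                 # "$1k for a GPU" per month
    hourly = monthly_dedicated / (30 * 24)     # ~$1.39/hour if billed around the clock
    batch_monthly = hourly * 1 * 30            # one hour per day, every day
    print(f"${hourly:.2f}/hour, ${batch_monthly:.2f}/month at 1h/day")

So an hour a day at dedicated-GPU pricing lands around $40-50/month, which is where the "nearly $50" comes from.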
I don't know exactly what type of cloud offering would satisfy my needs, but what's funny is that attaching an AMD consumer GPU to a Raspberry Pi is probably the most economical approach for a lot of problems.
Maybe something like a system where I could hotplug a full GPU into a system for a reservation of a few minutes at a time and then unplug it and let it go back into a pool?
FWIW it's that there's a large number of ML-based workflows that I'd like to plug into progscrape.com, but it's been very difficult to find a model that works without breaking the hobby-project bank.
Do you think that you can use those machines for confidential workflows for enterprise use? I'm currently struggling to balance running inference workloads on expensive AWS instances where I can trust that data remains private vs using more inexpensive platforms.
I use them a lot and constantly forget to turn mine off, and it just drains my credits. I really need to write a job to turn them off when they've been idle for longer than 20 minutes.
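Something like the sketch below is probably enough. The three helpers are hypothetical placeholders to wire up to your provider's machines API or CLI, and the 20-minute threshold matches the comment above.

    import time

    IDLE_LIMIT = 20 * 60                       # 20 minutes
    first_seen_idle: dict[str, float] = {}

    def list_running_machines() -> list[str]:
        """Hypothetical: return IDs of currently running machines (provider API/CLI goes here)."""
        raise NotImplementedError

    def machine_is_idle(machine_id: str) -> bool:
        """Hypothetical: e.g. no recent requests or ~0% GPU utilisation."""
        raise NotImplementedError

    def stop_machine(machine_id: str) -> None:
        """Hypothetical: call the provider's stop endpoint for this machine."""
        raise NotImplementedError

    def watchdog_pass() -> None:
        now = time.time()
        for machine_id in list_running_machines():
            if not machine_is_idle(machine_id):
                first_seen_idle.pop(machine_id, None)                     # busy: reset the clock
            elif now - first_seen_idle.setdefault(machine_id, now) > IDLE_LIMIT:
                stop_machine(machine_id)                                  # idle too long: stop it
                first_seen_idle.pop(machine_id, None)

    # Run watchdog_pass() from cron (or a small loop with time.sleep) every few minutes.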
It also seems they got caught in the middle of the system integrator vs product company dilemma.
To me fly's offering reads like a system integrator's solution. They assemble components produced mainly by 3rd parties into an offered solution. The business model of a system integrator thrives on doing the least innovation/custom work possible for providing the offering. You position yourself to take maximal advantage of investments and innovations driven by your 3rd-party suppliers. You want to be squarely on their happy path.
Instead, this article reads like fly, with good intentions, was trying to divert their tech suppliers' offer stream into niche edge cases outside of mainstream support.
This can be a valid strategy for products very late in their maturity lifecycle where core innovation is stagnant, but for the current state of AI, with extremely rapid innovation waves coursing through the market, that strategy is doomed to fail.
Just an “idle” Kubernetes system is a behemoth to comprehend…
Bonus points for writing a basic implementation from first principles capturing the essence of the problem Kubernetes was really meant to solve.
The 100-page Kubernetes book, Andriy Burkov style.
I really wonder why this opinion is so commonly accepted by everyone. I get that not everything needs most Kubernetes features, but it's useful. The Linux kernel is a dreadfully complex beast full of winding subsystems and full of screaming demons all over: eBPF, namespaces, io_uring, cgroups, SELinux, so much more, all interacting with each other in sometimes surprising ways.
I suspect there is a decent likelihood that a lot of sysadmins have a more complete understanding of what's going on in Kubernetes than in Linux.
That's assuming you have a solid foundation in the nuts and bolts of how computers work to begin with.
If you just jumped into software development without that background, well, you're going to end up in the latter pool of developers as described by the parent comment.
If you're running one team with all services trusting each other, you don't have the problems these things solve. Whenever you introduce a CNCF component outside core Kubernetes, invest time in understanding it and why it does what it does. Nothing is "deploy and forget"; everything will need to be regularly checked and upgraded, and when issues come up you need some architecture-level understanding of the component to troubleshoot, because so many moving parts are there.
So if I can get away writing my own cronjob in 1000 lines rather than installing something from GitHub with a helm chart, I will go with the former option.
(Helm is crap though, but you often won't have much choice).
But yeah, the argument could have as well just said running code on a VPS directly, because that also gives you a good deal of control.
> The other group (increasingly large) just wants to `git push` and be done with it, and they're willing to spend a lot of (usually their employer's) money to have that experience. They don't want to have to understand DNS, linux, or anything else beyond whatever framework they are using.
I'm a "full full-stack" developer because I understand what happens when you type an address into the address bar and hit Enter - the DNS request that returns a CNAME record to object storage, how it returns an SPA, the subsequent XHR requests laden with and cookies and other goodies, the three reverse proxies they have to flow through to get to before they get to one of several containers running on a fleet of VMs, the environment variable being injected by the k8s control plane from a Secret that tells the app where the Postgres instance is, the security groups that allow tcp/5432 from the node server to that instance, et cetera ad infinitum. I'm not hooking debuggers up to V8 to examine optimizations or tweaking container runtimes but I can speak intelligently to and debug every major part of a modern web app stack because I feel strongly that it's my job to be able to do so (and because I've worked places where if I didn't develop that knowledge then nobody would have).
I can attest that this type of thinking is becoming increasingly rare as our industry continues to specialize. These considerations are now often handled by "DevOps Engineers" who crank out infra and seldom write code outside of Python and bash glue scripts (which is the antithesis to what DevOps is supposed to be, but I digress). I find this unfortunate because this results in teams throwing stuff over the wall to each other which only compounds the hand-wringing when things go wrong. Perhaps this is some weird psychopathology of mine but I sleep much better at night knowing that if I'm on the hook for something I can fix it once it's out in the wild, not just when I'm writing features and debugging it locally.
Particularly at startups, it’s almost always more cost effective to hit that “scale up” button from our hosting provider than do any sort of actual system engineering.
Eventually, someone goes “hey we could save $$$$ by doing XYZ” so we send someone on a systems engineering journey for a week or two and cut our bill in half.
None of it really matters, though. We’re racing against competition and runway. A few days less runway isn’t going to break a startup. Not shipping as fast as reasonable will.
The closer your "scale up" button is to the actual hardware, the less of a problem it is.
Over time we will move further away. If the cost of an easily managed solution is low enough, why do the details matter?
Are we? We're constantly changing abstractions, but we don't keep adding them all that often. Operating systems and high-level programming languages emerged in the 1960s. Since then, the only fundamentally new layer of abstraction were virtual machines (JVM, browser JS, hardware virtualization, etc). There's still plenty of hardware-specific APIs, you still debug assembly when something crashes, you still optimize databases for specific storage technologies and multimedia transcoders for specific CPU architectures...
At my job, a decade ago our developers understood how things worked, what was running on each server, where to look if there were problems, etc. Now the developers just put magic incantations given to them by the "DevOps team" into their config files. Most of them don't understand where the code is running, or even what much of it is doing. They're unable or unwilling to investigate problems on their own, even if they were the cause of the issue. Even getting them to find the error message in the logs can be like pulling teeth. They rely on this support team to do the investigation for them, but continually swiveling back-and-forth is never going to be as efficient as when the developer could do it all themselves. Not to mention it requires maintaining said support team, all those additional salaries, etc.
(I'm part of said support team, but I really wish we didn't exist. We started to take over Ops responsibilities from a different team, but we ended up taking on Dev ones too and we never should've done that.)
This statement encapsulates nearly everything that I think is wrong with software development today. Captured by MBA types trying to make a workforce that is as cheap and replaceable as possible. Details are simply friction in a machine that is obsessed with efficiency to the point of self-immolation. And yet that is the direction we are moving in.
Details matter, process matters, experience and veterancy matters. Now more than ever.
My comment elsewhere goes into a bit more detail, but basically silicon stopped being able to make single-threaded code faster in about 2012 - we have just been getting "more parallel cores" since. And now at wafer scale we see 900,000 cores on a "chip". When 100% parallel code runs a million times faster than your competitors', when following one software-engineering path leads to code that can run 1M X, then we will find ways to use that excess capacity - and the engineers who can do it get to win.
I’m not sure how LLMs face this problem.
I think that if the development side knew a little bit of the rest of the stack they'd write better applications overall.
A fantastic talk.
I’ll use my stupid hobby home server stuff as an example. I tossed the old VMware box years ago. You know what I use now? Little HP t6x0 thin clients. They are crappy little x86 SoCs with m2 slots, up to 32GB memory and they can be purchased used for $40. They aren’t fast, but perform better than the cheaper AWS and GCP instances.
Is that a trivial use case? Absolutely. Now move from $30 to about $2000. Buy a Mac Mini. It's a powerful ARM SoC with ridiculously fast storage and performance - probably more compute than a small/mid-size company's computer room a few years ago, and more performant than a $1M SAN a decade ago.
6G will bring 10gig cellular.
Hyperscalers datacenters are the mainframe of 2025.
Have you ever had a plumber, HVAC tech, electrician, etc. come out to your house for something, and had them explain it to you? Have you had the unfortunate experience of that happening more than once (with separate people)? If so, you should know why this matters: because if you don’t understand the fundamentals, you can’t possibly understand the entire system.
It’s the same reason why the U.S. Navy Nuclear program still teaches Electronics Technicians incredibly low-level things like bus arbitration on a 386 (before that, it was the 68000). Not because they expect most to need to use that information (though if necessary, they carry everything down to logic analyzers), but because if you don’t understand the fundamentals, you cannot understand the abstractions. Actually, the CPU is an abstraction, I misspoke: they start by learning electron flow, then moving into PN junctions, then transistors, then digital logic, and then and only then do they finally learn how all of those can be put together to accomplish work.
Incidentally, former Navy Nukes were on the initial Google SRE team. If you read the book [0], especially Chapter 12, you’ll get an inkling about why this depth of knowledge matters.
Do most people need to understand how their NIC turns data into electrical signals? No, of course not. But occasionally, some weird bug emerges where that knowledge very much matters. At some point, most people will encounter a bug that they are incapable of reasoning about, because they do not possess the requisite knowledge to do so. When that happens, it should be a humbling experience, and ideally, you endeavor to learn more about the thing you are stuck on.
[0]: https://sre.google/sre-book/table-of-contents/
The more the big cloud providers can abstract cpu cycles, memory, networking, storage etc, the more they don’t have to compete with others doing the same.
If that were true, you might be right.
What happens in reality is that things are promised to work and (at best) fulfill that promise so long as no developers or deployers or underlying systems or users deviate from a narrow golden path, but fail in befuddling ways when any of those constraints introduce a deviation.
And so what we see, year over year, is continued enshittening, with everything continuously pushing the boundaries of unreliability and inefficiency, and fewer and fewer people qualified to actually dig into the details to understand how these systems work, how to diagnose their issues, how to repair them, or how to explain their costs.
> If the cost of an easily managed solution is low enough, why do the details matter?
Because the patience that users have for degraded quality, and the luxury that budgets have for inefficiency, will eventually be exhausted and we'll have collectively led ourselves into a dark forest nobody has the tools or knowledge to navigate out of anymore.
Leveraging abstractions and assembling things from components are good things that enable rapid exploration and growth, but they come with latent costs that eventually need to be revisited. If enough attention isn't paid to understanding, maintaining, refining, and innovating on the lowest levels, the contraptions built through high-level abstraction and assembly will eventually either collapse upon themselves or be flanked by competitors who struck a better balance and built on more refined and informed foundations.
As a software engineer who wants a long and satisfying career, you should be seeking to understand your systems to as much depth as you can, making informed, contextual choices about what abstractions you leverage, exactly what they abstract over, and what vulnerabilities and limitations are absorbed into your projects by using them. Just making naive use of the things you found a tutorial for, or that are trending, or that make things look easy today, is a poison to your career.
Because vertical scaling is now large enough that I can run all of twitter/amazon on one single large server. And if I'm wrong now, in a decade I won't be.
Compute power grows exponentially, but business requirements do not.
In my personal life, I’m curiosity-oriented, so I put my blog, side projects and mom’s chocolate shop on fully self hosted VPSs.
At my job managing a team of 25 and servicing thousands of customers for millions in revenue, I’m very results-oriented. Anyone who tries to put a single line of code outside of a managed AWS service is going to be in a lot of trouble with me. In a results-oriented environment, I’m outsourcing a lot of devops work to AWS, and choosing to pay a premium because I need to use the people I hire to work on customer problems.
Trying to conflate the two orientations with mindsets / personality / experience levels is inaccurate. It’s all about context.
One end is PaaS like Heroku, where you just git push. The other end is bare metal hosting.
Every option you mentioned (VPS, Manages K8S, Self Hosted K8S, etc) they all fall somewhere between these two ends of the spectrum.
If a developer falls into any of these "groups" or has a preference/position on any of these solutions, they are just called juniors.
Where you end up in this spectrum is a matter of cost benefit. Nothing else. And that calculation always changes.
Those options only make sense where the cost of someone else managing it for you for a small premium gets higher than the opportunity/labor cost of you doing it yourself.
So, as a business, you _should_ not have a preference to stick to. You should probably start with PaaS, and as you grow, if PaaS costs get too high, slowly graduate into more self-managed things.
A company like fly.io is a PaaS. Their audience has always been, and will always be application developers who prefer to do nothing low-level. How did they forget this?
This is where I see things too. When you start out, all your value comes from working on your core problem.
eg: You'd be crazy to start a CRM software business by building your own physical datacenter. It makes sense to use a PaaS that abstracts as much away as possible for you so you can focus on the actual thing that generates value.
As you grow, the high abstraction PaaS gets increasingly expensive, and at some point bubbles up to where it's the most valuable thing to work on. This typically means moving down a layer or two. Then you go back to improving your actual software.
You go through this a bunch of times, and over time grow teams dedicated to this work. Given enough time and continuous growth, it should eventually make sense to run your own data centers, or even build your own silicon, but of course very few companies get to that level. Instead most settle somewhere in the vast spectrum of the middle, with a mix of different services/components all done at different levels of abstraction.
You're not wrong that there's a PaaS/public-cloud dividing line, and that we're at an odd place between those two things. But I mean, no, it is not the case that our audience is strictly developers who do nothing low-level. I spent months of my life getting _UDP_ working for Fly apps!
I agree it’s worthwhile to understand things more deeply but developers slowly moving up layers of abstractions seems like it’s been a long term trend.
I don't think this split exists, at least in the way you framed it.
What does exist is workload, and problems that engineers are tasked with fixing. If you are tasked with fixing a problem or implementing a feature, you are not tasked with learning all the minute details or specifics of a technology. You are tasked with getting shit done, which might even turn out to not involve said technology. You are paid to be a problem-solver, not an academic expert on a specific module.
What you tried to describe as "magic" is actually the balance between broad knowledge vs specialization, or being a generalist vs specialist. The bulk of the problems that your average engineer faces requires generalists, not specialists. Moreover, the tasks that actually require a specialist are rare, and when those surface the question is always whether it's worth to invest in a specialist. There are diminished returns on investment, and throwing a generalist at the problem will already get some results. You give a generalist access to a LLM and he'll cut down on the research time to deliver something close to what a specialist would deliver. So why bother?
With this in mind, I would go as far as to say that the scenario backhandedly described as "want to understand where their code is running and what it's doing" (as if no engineer needs insight into how things work?), as opposed to the dismissively framed "just wants to `git push` and be done with it" scenario, can actually be classified as a form of incompetence. You, as an engineer, only have so many hours per day. Your day-to-day activities involve pushing new features and fixing new problems. To be effective, your main skill set is to learn the system in a JIT way, dive in, fix it, and move on. You care about system traits, not low-level implementation details that may change tomorrow on a technology you may not even use tomorrow. If, instead, you feel the need to spend time on topics that are irrelevant to the immediate needs of your role, you are failing to deliver value. I mean, if you frame yourself as a Kubernetes expert who even knows commit hashes by heart, does that matter if someone asks you, say, why a popup box is showing off-center?
I want to understand LLMs. I want to understand my compiler, my gc, my type system, my distributed systems.
On the other hand, I don't really care about K8s or anything else, as long as I have something that works. Just let me `git push` and focus on making great things elsewhere.
this feels right to me. application development and platform development are both software development tasks, and lots of software devs do both. i like working on platform-level stuff, and i like building applications. but i like there to be a good distinction between the two, and when i'm working on application-level stuff, i don't want to have to think about the platform.
services like fly.io do a good job of hiding all the platform level work and just giving you a place to deploy your application to, so when they start exposing tools like GPUs that are more about building platforms than building applications, it's messy.
Increasingly, Fly even lets you dip into more complex configurations too.
I’ve got no issue with using Tofu and Ansible to manage my own infrastructure but it takes time to get it right and it’s typically not worth the investment early on in the lifecycle.
I just made this point in a post on my Substack. Especially in regulated industries, you NEED to be able to explain your AI to the regulator. You can't have a situation where a human says, "Well, gee, I don't know. The AI told me to do it."
But the real reason I like fly.io is because it is a new thing that allows for new capabilities. It allows you to build your own Cloudflare by running full virtual machines colocated next to appliances in a global multicast network.
May just be my naïveté, but I thought that something like ECS or EKS is much cheaper than an in-house k8s engineer.
It’s always baffling to me why people think that ECS or god forbid EKS is somehow easier than a few Linux boxes.
That problem started so long ago and has gotten so bad that I would be hard pressed to believe there is anyone on the planet who could take a modern consumer PC and explain exactly what is going on in the machine without relying on any abstractions to understand the actual physical process.
Given that, it’s only a matter of personal preference on where you draw the line for magic. As other commenters have pointed out, your line allowing for Kubernetes is already surprising to a lot of people
This is admittedly low effort but the vast majority of devs are paid wages to "write CRUD, git push and magic" their way to the end of the month. The company does not afford them the time and privilege of sitting down and analyzing the code with a fine comb. An abstraction that works is good enough.
The seasoned seniors get paid much more and afforded leeway to care about what is happening in the stack, since they are largely responsible for keeping things running. I'm just pointing out it might merely be a function of economics.
Just an example I recently came across: Working for a smaller company that uses Kubernetes and manages everything themselves with a small team. The result: They get hacked regularly and everything they run is constantly out of date because they don't have the capacity to actually manage it themselves. And it's not even cheaper in the long run because Developer Time is usually more expensive than just paying AWS to keep their EKS up to date.
To be fair, in my home lab I also run everything bare metal and keep it updated but I run everything behind a VPN connection and run a security scanner every weekend that automatically kills any service it finds > Medium Level CVE and I fix it when I get the time to do it.
As a small Team I can only fix so much and keep so much up to date before I get overwhelmed or the next customer Project gets forced upon me by Management with Priority 0, who cares about security updates.
I'd strongly suggest to use as much managed service as you can and focus your effort as a team on what makes your Software Unique. Do you really need to hire 2-3 DevOps guys just to keep everything running when GCP Cloud Run "just werks"?
Everything we do these days runs on so many levels of abstraction anyway, it's no shame to share cost of managing the lower levels of abstraction with others (using managed Service) and focus on your product instead. Unless you are large enough to pay for whole teams that deal with nothing but infrastructure to enable other teams to do Application Level Programming you are, in my limited experience, just going to shoot yourself in the foot.
And again, just to emphasize it: I like to do everything myself because for privacy reasons I use as little services that aren't under my control as possible but I would not recommend this to a customer because it's neither economical nor does it work well in my, albeit limited, experience.
Many, likely most, developers today don't care about controlling their system/network/hardware. There's nothing wrong with that necessarily, but it is a pretty fundamental difference.
One concern I've had with building LLM features is whether my customers would be okay with me giving their data over to the LLM vendor. Say I'm building a tool for data analysis, is it really okay to a customer for me to give their table schemas or access to the data itself to OpenAI, for example?
I rarely hear that concern raised though. Similarly when I was doing consulting recently, I wouldn't use copilot on client projects as I didn't want copilot servers accessing code that I don't actually own the rights to. Maybe its over protective though, I have never heard anyone raise that concern so maybe its just me.
As a software developer I want strong abstractions without bloat.
LLMs are so successful in part because they are a really strong abstraction. You feed in text and you get back text. Depending on the model and other parameters your results may be better or worse, but changing from e.g. Claude to ChatGPT is as simple as swapping one request for another.
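A minimal sketch of that swap, assuming the current official Python SDKs for both providers (model names are placeholders):

    # Text in, text out: switching providers is just swapping one request for another.
    from openai import OpenAI
    import anthropic

    def ask_openai(prompt: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def ask_claude(prompt: str) -> str:
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # placeholder model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text

    # Same input, same shape of output; that is the whole abstraction.
    for ask in (ask_openai, ask_claude):
        print(ask("Summarize Amdahl's law in one sentence."))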
If what I want is to run AI tasks, then GPUs are a poor abstraction. It's very complicated (as Fly have discovered) to share them securely. The amount of GPU you need could vary dramatically. You need to worry about drivers. You need to worry about all kinds of things. There is very little bloat to the ChatGPT-style abstraction, because the network overhead is a negligable part of the overall cost.
If I say I don't want magic, what I really mean is that I don't trust the strength of the abstraction that is being offered. For example, when a distributed SQL database claims to be PostgreSQL compatible, it might just mean it's wire compatible, so none of my existing queries will actually work. It might have all the same functions but be missing support for stored procedures. The transaction isolation might be a lie. It's not that these databases are bad, it's that "PostgreSQL as a whole" cannot serve as a strong abstraction boundary - the API surface is simply too large and complex, and too many implementation details are exposed.
It's the same reason people like containers: running your application on an existing system is a very poor abstraction. The API surface of a modern linux distro is huge, and includes everything from what libraries come pre-installed to the file-system layout. On the other hand the kernel API is (in comparison) small and stable, and so you can swap out either side without too much fear.
K8S can be a very good abstraction if you deploy a lot of services to multiple VMs and need a lot of control over how they are scaled up and down. If you're deploying a single container to a VM, it's massively bloated.
TLDR: Abstractions can be good and bad, both inherently, and depending on your use-case. Make the right choice based on your needs. Fly are probably correct that their GPU offering is a bad abstraction for many of their customer's needs.
I prefer to either manage software directly with no wrappers on top, or use a fully automated solution.
K8S is something I'd rather avoid. Do you enjoy writing configuration for your automation layer?
What's changing is that managed solutions are becoming increasingly easier to set up and increasingly cheaper on smaller scales.
While I do personally enjoy understanding the entire stack, I can't justify self-hosting and managing an LLM until we run so many prompts a day that it becomes cheaper for us to run our own GPUs compared to just running APIs like OpenAI/Anthropic/Deepseek/...
I think we're approaching the point where software development becomes a low-skilled job, because the automatic tools are good enough to serve business needs, while manual tools are too difficult to understand by anyone but a few chosen ones anyway.
lol, even understanding git is hard for them. Increasingly, software engineers don't want to learn their craft.
once upon a time i could have said that it's better this way and that everybody will be thankful when i'm the only person who can fix something, but at this point that isn't really true when anybody can just get an LLM to walk them through it if they need to understand what's going on under the hood. really i'm just a nerd and i need to understand if i want to sleep at night lol.
They have incredible defaults that can make it as simple as just running ‘git push’ but there isn’t really any magic happening, it’s all documented and configurable.
tell me whether there are many bricklayers who want to understand the chemical composition of their bricks.
Paints, wood finishes, adhesives, oils, abrasives, you name it. You generally know at least a bit about what’s in it. I can’t say everyone I’ve worked with wanted to know, but it’s often intrinsic to what you’re doing and why. You don’t just pull a random product off a shelf and use it. You choose it, quite often, because of its chemical composition. I suspect it’s not always thought of this way, though.
This is the same with a lot of artistic mediums as well. Ceramicists often know a lot more than you’d expect about what’s in their clay and glazes. It’s really cool.
I’m not trying to be contrarian here. I know some people don’t care at all, and some people use products because it’s what they were told to do and they just go with it. But that wasn’t my experience most of the time. Maybe I got lucky, haha.
Ditto for the rest of the technical vocational degrees.
If you think you can do IT without at least a trade-school level of understanding of how the low-level components interact (and I'm not talking about CS-level material - concurrency with CSP, O-notation, linear and discrete algebra... - but basic stuff such as networking protocols, basic SQL database normalization, system administration, configuration, how the OS boots, how processes work: idle, active, waiting...), you will be fired faster than anyone around.
Who owns and depreciates the logs, backups, GPUs, and the database(s)?
K8s docs > Scheduling GPUs: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus... :
> Once you have installed the plugin, your cluster exposes a custom schedulable resource such as amd.com/gpu or nvidia.com/gpu.
> You can consume these GPUs from your containers by requesting the custom GPU resource, the same way you request cpu or memory
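For reference, this is roughly what that request looks like through the official Kubernetes Python client - a sketch that assumes the NVIDIA device plugin is already installed on the node and uses a placeholder image tag:

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="cuda",
                    image="nvidia/cuda:12.4.1-base-ubuntu22.04",  # placeholder tag
                    command=["nvidia-smi"],
                    resources=client.V1ResourceRequirements(
                        # Only schedulable on nodes where the device plugin exposes this resource.
                        limits={"nvidia.com/gpu": "1"},
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)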
awesome-local-ai: Platforms / full solutions https://github.com/janhq/awesome-local-ai?platforms--full-so...
But what about TPUs (Tensor Processing Units) and QPUs (Quantum Processing Units)?
Quantum backends: https://github.com/tequilahub/tequila#quantum-backends
Kubernetes Device Plugin examples: https://kubernetes.io/docs/concepts/extend-kubernetes/comput...
Kubernetes Generic Device Plugin: https://github.com/squat/generic-device-plugin#kubernetes-ge...
K8s GPU Operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator...
Re: sunlight server and moonlight for 120 FPS 4K HDR access to GPU output over the Internet: https://github.com/kasmtech/KasmVNC/issues/305#issuecomment-... :
> Still hoping for SR-IOV in retail GPUs.
> Not sure about vCPU functionality in GPUs
Process isolation on vCPUs with or without SR-IOV is probably not as advanced as secure enclave approaches.
Intel SGX is a secure enclave capability, which is cancelled on everything but Xeon. FWIU there is no SGX for timeshared GPUs.
What executable loader reverifies the loaded executable in RAM after init time?
What LLM loader reverifies the in-RAM model? Can Merkle hashes reduce that cost of NN state verification?
Can it be proven that a [chat AI] model hosted by someone else is what is claimed; that it's truly a response from "model abc v2025.02"?
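On the Merkle-hash question: chunked hashing does cut the re-verification cost, because a mismatch can be localized to a single chunk and unchanged chunks can reuse stored hashes. A minimal sketch over a weights blob (the file path is a placeholder; this checks local integrity and does not by itself prove anything about a remotely hosted model):

    import hashlib

    def chunked(data: bytes, size: int = 1 << 20):
        """Yield fixed-size chunks (1 MiB by default)."""
        for i in range(0, len(data), size):
            yield data[i:i + size]

    def merkle_root(chunks) -> str:
        """Hash each chunk, then pairwise-combine hashes up to a single root."""
        level = [hashlib.sha256(c).digest() for c in chunks]
        if not level:
            return hashlib.sha256(b"").hexdigest()
        while len(level) > 1:
            if len(level) % 2:                 # duplicate the last node on odd levels
                level.append(level[-1])
            level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                     for i in range(0, len(level), 2)]
        return level[0].hex()

    with open("model.safetensors", "rb") as f:  # placeholder path to a weights blob
        weights = f.read()
    print(merkle_root(chunked(weights)))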
PaaS or IaaS
We used to joke about this a lot when Java devs would have memory issues and not know how to adjust the heap size in init scripts. So many “CS majors” who are completely oblivious to anything happening outside of the JVM, and plenty happening within it.
I want to understand every possible detail about my framework and language and libraries. Like I think I understand more than many do, and I want to understand more, and find it fulfilling to learn more. I don't, it's true, care to understand the implementation details of, say, the OS. I want to know the affordances it offers me and the APIs that matter to me, I don't care about how it's implemented. I don't care to understand more about DNS than I need. I definitely don't care to spend my time futzing with kubernetes -- I see it as a tool, and if I can use a different tool (say heroku or fly.io) that lets me not have to learn as much -- so I have more time to learn every possible detail of my language and framework, so I can do what I really came to do, develop solutions as efficiently and maintainably as possible.
You are apparently interested in lower levels of abstraction than I am. Which is fine! Perhaps you do ops/systems/sre and don't deal with the higher levels of abstraction as much as I do -- that is definitely lucrative these days, there are plenty of positions like that. Perhaps you deal with more levels of abstraction but don't go as deep as me -- or, and I totally know it's possible, you just have more brain space to go as deep or deeper on more levels of abstraction as me. But even you probably don't get into the implementation details of electrical engineering and CPU design? Or if you do, and also go deep on frameworks and languages, I think you belong to a very very small category!
But I also know developers who, to me, don't want to go too deep on any of the levels of abstraction. I admit I look down on them, as I think you do too; they seem like copy-paste coders who will never be as good at developing efficient, maintainable solutions.
I started this post saying I think that's a different axis than what layers of abstraction one specializes in or how far down one wants to know the details. But as I get here, while I still think that's likely, I'm willing to consider that these developers I have not been respecting -- are just going really deep in even higher levels of abstraction than me? Some of them maybe, but honestly I don't think most of them, but I could be wrong!
I feel this is similar to what you are pointing out. Why _shouldn't_ people be the "magic" users? When was the last time one of your average devs looked into how ESM loading works? Or the Python interpreter, or V8? Or how it communicates with the OS and lower-level hardware interfaces?
This is the same thing. Only you are shifting the goalposts.
This is baffling. What's the value proposition here? At some point customers will be directly asking an AI agent to create an app for them, and it will take care of coding/deployment for them.
Some people became software developers because they wanted to make easy money back when the industry was still advertising bootcamps (in order to drive down the cost of developers).
Some people simply drifted into this profession by inertia.
And everything in-between.
From my experience there are a lot of developers who don't take pride in their work, and just do it because it pays the bills. I wouldn't want to be them but I get it. The thing is that by delegating all their knowledge to the tools they use, they are making themselves easy to replace, when the time comes. And if they have to fix something on their own, they can't. Because they don't understand why and how it works, and how and why it became what it is instead of something else.
So they call me and ask me how that thing works...
Extremely fast to start on-demand, reliable, and although a little pricey, not unreasonably so considering the alternatives.
And the DX is amazing! It's just like any other Fly machine, no new set of commands to learn. Deploy, logs, metrics, everything just works out of the box.
Regarding the price: we've tried a well known cheaper alternative and every once in a while on restart inference performance was reduced by 90%. We never figured out why, but we never had any such problems on fly.
If I'm using a cheaper "Marketplace" to run our AI workloads, I'm also not really clear on who has access to our customer's data. No such issues with fly GPUs.
All that to say, fly GPUs are a game changer for us. I could wish only for lower prices and more regions, otherwise the product is already perfect.
This is in stark contrast to all other options I tried (AWS, GCP, LambdaLabs). The fly.io config really felt like something worth being in every project of mine and I had a few occasions where I was able to tell people to sign up at fly.io and just run it right there (Btw. signing up for GPUs always included writing an email to them, which I think was a bit momentum-killing for some people).
In my experience, the only real minor flaw was the already-mentioned embedding of the whole CUDA stack into your container, which easily creates containers approaching 8GB. That then runs you into some fly.io limits as well as slow build times.
2012 - moores law basically ends - nand gates don't get smaller just more cleverly wrapped. Single threaded execution more or less stops at 2 GHz and has remained there.
2012-2022 - no one notices single threaded is stalled because everything moves to VMs in the cloud - the excess parallel compute from each generation is just shared out in data centres
2022 - data centres realise there is no point buying the next generation of super chips with even more cores, because you make massive capital investments but cannot shovel 10x or 100x more processes in, because Amdahl's law means standard computing is not 100% parallel
2022 - but look, LLMs are 100% parallel hence we can invest capital once again
2024 - this is the bit that makes my noodle - wafer-scale silicon: 900,000 cores with GBs of SRAM - these monsters run Llama models 10x faster than A100s
We broke moores law and hardware just kept giving more parallel cores because that’s all they can do.
And now software needs to find how to use that power - because dammit, someone can run their code 1 million times faster than a competitor - god knows what that means but it’s got to mean something - but AI surely cannot be the only way to use 1M cores?
It looks like maybe the slope changed slightly starting around 2006, but it’s funny because this comment ends complaining that Moore’s Law is too good after claiming it’s dead. Yes, software needs to deal with the transistor count. Yes, parallel architectures fit Moore’s law. The need to go to more parallel and more parallel because of Moore’s Law was predicted, even before 2006. It was a talking point in my undergrad classes in the 90s.
https://upload.wikimedia.org/wikipedia/commons/0/00/Moore%27...
http://cva.stanford.edu/classes/cs99s/papers/moore-crammingm...
But to further needle, the law is
https://www.man.com/technology/single-core-stagnation-and-th...
In 2012, 22nm architectures were new (https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchite...). Now we have 3nm architectures (https://en.wikipedia.org/wiki/3_nm_process). In what sense have nand gates "not gotten smaller"?
Because 22nm was not actually 22nm, and 3nm is not actually 3nm.
I’d bet most code we use every day spends most time just waiting for things like disk and network. Not to mention it’s probably inherently sequential.
I cannot work out if we pack enough parallel problems in the world or just lack a programming language to describe them
A 2GHz core from a 2012 is extremely slow compared to a 2GHz core of a modern CPU. The difference could be an order of magnitude.
There is more to scaling CPUs than the clock speed. Modern CPUs process many more instructions per clock on average.
Suddenly your cost for building a new data centre is something like twice the cost of the previous one (cold gets more expensive etc) and yet you only sell same amount of space. It’s not an attractive business in first place.
This was the push for lambda architecture etc - depending on usage you could have hundreds of people buying the same core. I would have put a lot of cash into making something that spins up docker instances so fast it’s like lambda - and guess what fly.io does?
I think fly.io’s obsession with engineering led them down a path of positive capital usage while AWS focused on rolling out new products on a tougher capital process.
Anyway - AI is the only thing that dense multi core data centres can use that packs in many many users compared to docker per core.
Unless we all learn how to think and code in parallel, we are leaving a huge amount of hardware gains on the table. And those gains are going to 100x in the next ten years and my bet is my janky-ass software will still not be able to use it - there will be a bifurcation of specialist software engineers who work in domains and with tooling that is embarrassingly parallel, and the rest of us will be on fly.io :-)
(#) ok so maybe 3 or 4 docker per core, with hyper visor doling out time slots, but much more than that and performance is a dog, and so the number of “virtual CPUs” you can sell is a limited number and creeps up despite hardware leaping ahead … the point I am making
Ancient Java servlets from the early 2000s were better suited to getting performance out of current-gen hardware than modern NodeJS, Python, etc...
Let's be honest, the other stuff is just Chrome: tell me 96 GB is enough?
Prompt eval is slow, inference for large models at high context is slow, training is limited and slow.
It's better than not having anything, but we got rid of our M1 Max 192GBs after about a year.
With that said, this seems quite obvious - the type of customer that chooses Fly seems like the last person to be spinning up dedicated GPU servers for extended periods of time. Seems much more likely they'll use something serverless, which requires a ton of DX work to get right (personally I think Modal is killing it here). To compete, they would have needed to bet the company on it. It's way too competitive otherwise.
They're charging hyperscaler rates, and anyone willing to pay that much won't go with Fly.
For serverless usage they're only mildly overpriced compared to, say, Runpod, but I don't think of serverless as anything more than an onramp to renting a dedicated machine, so it's not surprising to hear it's not taking off.
GPU workloads tend to have terrible cold-start performance by their nature, and without a lot of application-specific optimization it rarely makes financial sense to pass up a cheaper continuous option if you have an even mildly consistent workload (and if you don't, you're not generating that much money for them anyway).
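As a rough sketch of that break-even (the prices below are placeholder assumptions, not anyone's actual rates):

```python
# Back-of-envelope break-even between per-second "serverless" GPU pricing and renting
# a dedicated machine 24/7. Prices are placeholder assumptions, not real rates.
serverless_per_hour = 3.50    # effective $/hr, billed only while the workload runs
dedicated_per_hour = 1.50     # $/hr for a comparable dedicated GPU, billed around the clock

# Utilization above which the always-on machine is cheaper than paying per busy hour:
break_even_utilization = dedicated_per_hour / serverless_per_hour
print(f"dedicated wins above ~{break_even_utilization:.0%} utilization")
# ~43% with these numbers; the bigger the serverless premium, the lower that threshold,
# which is why a mildly consistent workload usually tips toward the dedicated box.
```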
My Fly machine loads from turned off to first inference complete in about 35 seconds.
If it’s already running, it’s 15 seconds to complete. I think that’s pretty decent.
Fly.io seems to attract similar developers to Cloudflare’s Workers platform: mostly developers who want a PaaS-like solution with good dev UX.
If that’s the case, this conclusion seems obvious in hindsight (hindsight is a bitch). Developers who are used to having infra managed for them so they can build applications don’t want to start building on raw infra. They want the dev velocity promise of a PaaS environment.
Cloudflare made a similar bet on GPUs, I think, but stayed consistent with the PaaS approach by building Workers AI, which gives you a lot of open LLMs and other models out of the box that you can use on demand. It seems like Fly.io would be in a good position to do something similar with those GPUs.
I don't know exactly what type of cloud offering would satisfy my needs, but what's funny is that attaching an AMD consumer GPU to a Raspberry Pi is probably the most economical approach for a lot of problems.
Maybe something like a system where I could hotplug a full GPU into a system for a reservation of a few minutes at a time and then unplug it and let it go back into a pool?
FWIW, the need is that there are a large number of ML-based workflows I'd like to plug into progscrape.com, but it's been very difficult to find a model that works without breaking the hobby-project bank.
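For what it's worth, here's a purely hypothetical sketch of the "check a GPU out of a pool for a few minutes" idea above; every name is made up, and the attach/detach steps are placeholders for whatever hotplug or virtualization mechanism would actually back it:

```python
# Hypothetical sketch of a short-lived GPU lease pool. Nothing here touches real
# hardware: the attach/detach comments stand in for whatever hotplug or
# virtualization mechanism would actually back this.
import time
from dataclasses import dataclass
from queue import Queue

@dataclass
class GpuLease:
    gpu_id: str
    expires_at: float

class GpuPool:
    def __init__(self, gpu_ids, lease_seconds=300):
        self.lease_seconds = lease_seconds
        self.free = Queue()
        for gpu_id in gpu_ids:
            self.free.put(gpu_id)

    def acquire(self) -> GpuLease:
        gpu_id = self.free.get()  # blocks until a GPU is free
        # placeholder: attach the GPU to the tenant's machine here
        return GpuLease(gpu_id, time.time() + self.lease_seconds)

    def release(self, lease: GpuLease) -> None:
        # placeholder: detach the GPU and scrub its state here
        self.free.put(lease.gpu_id)

# Usage: grab a GPU for a burst of inference, then hand it back to the pool.
pool = GpuPool(["gpu-0", "gpu-1"], lease_seconds=180)
lease = pool.acquire()
# ... run the batch of ML work ...
pool.release(lease)
```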
If you could checkpoint a GPU quickly enough it would be possible to run multiple isolated workloads on the same GPUs without any issues.
To me, Fly's offering reads like a system integrator's solution: they assemble components produced mainly by third parties into an offered solution. The business model of a system integrator thrives on doing the least innovation/custom work possible to provide the offering. You position yourself to take maximal advantage of the investments and innovations driven by your third-party suppliers. You want to be squarely on their happy path.
Instead, this article reads like Fly, with good intentions, was trying to divert their tech suppliers' offerings into niche edge cases outside of mainstream support.
This can be a valid strategy for products very late in their maturity lifecycle, where core innovation has stagnated, but with the current state of AI, with extremely rapid innovation waves coursing through the market, that strategy is doomed to fail.