Fly.io has GPUs now - Readit News

Does it have basic functioning other stuff? I am shocked at how our production usage of Fly has gone. Even basic stuff as support not being able to just... look up internal platform issues. Cryptic/non-existent error messages. I'm not impressed. It feels like it's compelling to those scared of or ignorant of Kubernetes. I thought I was over Kubernetes, but Fly makes me miss it.

parhamn · 2 years ago

I was hoping to migrate to Fly.io and during my testing I found that simple deploys would drop connections for a few seconds during a deploy switch over. Try a `watch -n 2 curl <serviceipv4>` during a deploy to see for yourself (try any one of the the strategies documented including blue-green). I wonder how many people know this?

When I tested it I was hoping for at worst early termination of old connections with no dropped new connections and at best I expected them to gracefully wait for old connections to finish. But nope, just a full downtime switch over every time. But then when you think about the network topology described in their blog posts, you realize theres no way it could've been done correctly to begin with.

It's very rare for me to comment negatively on a service but that fact that this was the case paired with the way support acted like we were crazy when we sent video evidence of it definitely irked me for infrastructure company standards. Wouldn't recommend it outside of toy applications now.

> It feels like it's compelling to those scared of or ignorant of Kubernetes

I've written pretty large deployment systems for kubernetes. This isn't it. Theres a real space for heroku-like deploys done properly and no one is really doing it well (or at least without ridiculously thin or expensive compute resources)

asaddhamani · 2 years ago

Yeah I had a similar experience where I got builds frozen for a couple days, such that I was not able to release any updates. When I emailed their support, I got an auto-response asking me to post in the forum. Pretty much all hosts are expected to offer a ticket system even for their unmanaged services if its a problem on their side. I just moved over all my stuff to Render.com, it's more expensive, but its been reliable so far.

sofixa · 2 years ago

> I've written pretty large deployment systems for kubernetes. This isn't it. Theres a real space for heroku-like deploys done properly and no one is really doing it well (or at least without ridiculously thin or expensive compute resources)

Have you tried Google Cloud Run(based on KNative) I've never used it in production, but on paper seems to fit the bill.

rollcat · 2 years ago

> Try a `watch -n 2 curl <serviceipv4>` during a deploy

You need blackbox HTTP monitoring right now, don't ever wait for your customer to tell you that your service is down.

I use Prometheus (&Grafana), but you can also get a hosted service like Pingdom or whatever.

xena · 2 years ago

Can you email the first two letters of my username at fly.io with more details? I'd love to find out what you've been having trouble with so I can help make the situation better any way I can. Thanks!

bongobingo1 · 2 years ago

Another support.flycombinator.com classic.

throwaway220033 · 2 years ago

...as if it's one person who had issues! I thought it was just incompetency. But it now looks like a theatre, pretending now.

pech0rin · 2 years ago

Yep they have terrible reliability and support. Couldn’t deploy for 2 days once and they actually told me to use another company. Unmanaged dbs masquerading as managed. Random downtime. I could go on but it’s not a production ready service and I moved off of it months ago.

benzible · 2 years ago

The header at the top of their Getting Started is "This Is Not Managed Postgres " [1]

and they have a managed offering [2] in private beta now...

> Supabase now offers their excellent managed Postgres service on Fly.io infrastructure. Provisioning Supabase via flyctl ensures secure, low-latency database access from applications hosted on Fly.io.

[1] https://fly.io/docs/postgres/getting-started/what-you-should...

[2] https://fly.io/docs/reference/supabase/

biorach · 2 years ago

> Unmanaged dbs masquerading as managed

Are you talking about fly postgres? Because I use it and feel they've been pretty clear that it's unmanaged.

morgante · 2 years ago

Unfortunately this is a pretty common story. Half the people I know who adopted Fly migrated off it.

I was very excited about Fly originally, and built an entire orchestrator on top of Fly machines—until they had a multi-day outage where it took days to even get a response.

Kubernetes can be complex, but at least that complexity is (a) controllable and (b) fairly well-trodden.

loloquwowndueo · 2 years ago

Fly.io is not comparable to Kubernetes. It’s a bit like comparing AWS to Terraform.

Or to clarify your comment, Kubernetes on which cloud? Amazon? google? Linode?

awestroke · 2 years ago

I have run several services on Fly for almost a year now, have not had any issues.

rmbyrro · 2 years ago

I find it amazing how much bad vibes fly.io gets here.

It looks worse than AWS or Azure to me.

Never used the service, but based on what I hear, I'll never try...

M_Joverflow · 2 years ago

I don't see any reason use Fly. There are more mature, feature-richer and cheaper solutions out there. We have the big complex ones like AWS, Azure, GCP, the easier more affordable all rounder like DO, Render, the hosting plattforms like Vercel, Heroku and finally the biggest bang for your money barebones like Hetzner.

Why should I choose Fly? How come they are so prominent on hackernews? Are they backed by VC and get their default 400 upvotes by backers? I get the impression that Fly posts here are kind of sponsored.

throwaway220033 · 2 years ago

I switched to Kamal and Hetzner. It's the sweet spot.

Deleted Comment

chachra · 2 years ago

Been on it 7 months, 0 issues. Feel like you're alone on this potentially.

weird-eye-issue · 2 years ago

Alone? Every thread about Fly has complaints about reliability and people complain about it on Twitter too

uo21tp5hoyg · 2 years ago

https://community.fly.io/t/reliability-its-not-great/11253

Deleted Comment

heeton · 2 years ago

Not alone, I’ve been part of two teams who have evaluated fly and hit weird reliability or stability issues, deemed it not ready yet.

yawnxyz · 2 years ago

this is what I thought, until once I spent two days to publish a new, trivial code change to my Fly.io hosted API — it just wouldn't update! And every time I tried to re-publish it'd give me a slightly different error.

When it works, it's brilliant. The problem is that it hasn't worked too well in the last few months.

Hi, author of the post and Fly.io devrel here in case anyone has any questions. GPUs went GA yesterday, you can experiment with them to your heart's content should the fraud algorithm machine god smile upon you. I'm mostly surprised my signal post about what the "GPUs" are didn't land well here: https://fly.io/blog/what-are-these-gpus-really/

If anyone has any questions, fire away!

benreesman · 2 years ago

I'd be fascinated to hear your thoughts on Apple hardware for inference in particular. I spend a lot of time tuning up inference to run locally for people with Apple Silicon on-prem or even on-desk, and I estimate a lot of headroom left even with all the work that's gone into e.g. GGUF.

Do you think the process node advantage and SoC/HBM-first will hold up long enough for the software to catch up? High-end Metal gear looks expensive until you compare it to NVIDIA with 64Gb+ of reasonably high memory bandwidth attached to dedicated FP vector units :)

One imagines that being able to move inference workloads on and off device with a platform like `fly.io` would represent a lot of degrees of freedom for edge-heavy applications.

xena · 2 years ago

Well, let me put it this way. I have a MacBook with 64 GB of vram so I can experiment with making an old-fashioned x.ai clone (the meeting scheduling one, not the "woke chatgpt" one) amongst other things now. I love how Apple Silicon makes things vroomy on my laptop.

I do know that getting those working in a cloud provider setup is a "pain in the ass" (according to ex-AWS friends) so I don't personally have hope in seeing that happen in production.

However, the premise makes me laugh so much, so who knows? :)

LtdJorge · 2 years ago

If you want something similar to Apple silicon for AI, look for AMD Instinct MI300A. 24 cores, 256MB cache, 128GB of HBM3 and 1 petaflop of FP16.

thangngoc89 · 2 years ago

This is right on time. I'm evaluating "severless" GPU services for my upcoming project. I see on the announcement that pricing is per hours. Is scaling to zero priced based on minutes/seconds? For my workflow, medical image segmentation, one file takes about 5 minutes.

LtdJorge · 2 years ago

How long does it take to process on CPU?

qeternity · 2 years ago

I posted further down before seeing your comment. First, congrats on the launch!

But who is the target user of this service? Is this mostly just for existing fly.io customers who want to keep within the fly.io sandbox?

xena · 2 years ago

Part of it is for people that want to do GPU things on their fly.io networks. One of the big things I do personally is I made Arsène (https://arsene.fly.dev) a while back as an exploration of the "dead internet" theory. Every 12 hours it pokes two GPUs on Fly.io to generate article prose and key art with Mixtral (via Ollama) and an anime-tuned Stable Diffusion XL model named Kohaku-XL.

Frankly, I also see the other part of it as a way to ride the AI hype train to victory. Having powerful GPUs available to everyone makes it easy to experiment, which would open Fly.io as an option for more developers. I think "bring your own weights" is going to be a compelling story as things advance.

tptacek · 2 years ago

This isn't the target user, but the boy's been using it at the soil bacteria lab he works in to do basecalling for a FAST5 data from a nanopore sequencer.

subarctic · 2 years ago

Commenters like this, for one thing: https://news.ycombinator.com/item?id=34242767

Nevin1901 · 2 years ago

How fast are coldstarts, and how do you compare against other gpu providers (runpod modal etc)

xena · 2 years ago

The slowest part is loading weights into vram in my experience. I haven't done benchmarking on that. What kind of benchmark would you like to see?

yla92 · 2 years ago

Not a question but the link "Lovelace L40s are coming soon (pricing TBD)" is 404.

thangngoc89 · 2 years ago

If it's a link to nvidia.com then it's expected to be broken. Seriously, I've never seen a valid link to nvidia.com

xena · 2 years ago

Uhhhh that's not ideal. I'll go edit that after dinner. Thanks!

bl4kers · 2 years ago

How difficult world it be to set up Folding@home on these? https://foldingathome.org

xena · 2 years ago

I'm not sure, the more it uses CUDA the easier I bet. I don't know if it would be fiscally worth it though.

Deleted Comment

k8svet · 2 years ago

niz4ts · 2 years ago

As far as I know, Fly uses Firecracker for their VMs. I've been following Firecracker for a while now (even using it in a project), and they don't support GPUs out of the box (and have no plan to support it [1]).

I'm curious to know how Fly figured their own GPU support with Firecracker. In the past they had some very detailed technical posts on how they achieved certain things, so I'm hoping we'll see one on their GPU support in the future!

[1]: https://github.com/firecracker-microvm/firecracker/issues/11...

mrkurt · 2 years ago

The simple spoiler is that the GPU machines use Cloud Hypervisor, not Firecracker.

yencabulator · 2 years ago

There has been weirdly little discussion on HN about Cloud Hypervisor. I guess because it's such a horribly bland non-descriptive Enterprise Naming name?

It looks pretty sweet. Rust & sharing libraries with Firecracker and ChromeOS's crosvm, with more emphasis on long-running stateful services than in Firecracker.

https://github.com/cloud-hypervisor/cloud-hypervisor

https://github.com/rust-vmm

Way simpler than what I was expecting! Any notes to share about Cloud Hypervisor vs Firecracker operationally? I'm assuming the bulkier Cloud Hypervisor doesn't matter much compared to the latency of most GPU workloads.

iambateman · 2 years ago

It’s cool to see that they can handle scaling down to zero. Especially for working on experimental sites that don’t have the users to justify even modest server costs.

I would love an example on how much time a request charges. Obviously it will vary, but is it 2 seconds or “minimum 60 seconds per spin up”?

We charge from the time you boot a machine until it stops. There's no enforced minimum, but in general it's difficult to get much out of a machine in less than 5 seconds. For GPU machines, depending on data size for whatever is going into GPU memory, it could need 30s of runtime to be useful.

sodality2 · 2 years ago

How long does model loading take? Loading 19GB into a machine can't be instantaneous (especially if the model is a network share).

bbkane · 2 years ago

I see the whisper transcription article. Is there an easy way to limit it to, say $100 worth of transcription a month and then stop till next month? I want to transcribe a bunch of speeches but I want to spread the cost over time

andes314 · 2 years ago

Do you offer some sort of keep_warm parameter that removes this latency (for a greater cost)?

pgt · 2 years ago

I was an early adopter of Fly.io. It is not production-ready. They should fix their basic features before adding new ones.

urduntupu · 2 years ago

Unfortunately true. Also jumped the fly.io ship after initial high excitement for their offering. Moved back to DigitalOcean's app platform. A bit more config effort, significantly pricier, but we need stability on production. Can't have my customers call me b/c of service interruption.

+1 - It's the most unreliable hosting service I've ever used in my life with "nice looking" packaging. There were frequently multiple things broken at same time, status page would always be green while my meetings and weekends were ruined. Software can be broken but Fly handles incidents with unprofessional, immature attitude. Basically you pay 10x more money for an unreliable service that just looks "nice". I'm paying 4x less to much better hardware with Hetzner + Kamal; it works reliably, pricing is predictable, I don't pay 25% more for the same usage next month.

https://news.ycombinator.com/item?id=36808296

ecmascript · 2 years ago

Comments like these are just sad to see on HN. It is not constructive. What is these basic features that need fixing you're speaking about and what is the fixes required?

cschmatzler · 2 years ago

Reliability and support. Having even “the entire node went down” tickets get an auto-response to “please go fuck off into the community forum” is insane. What is the community forum gonna do about your reliability issues? I can get a 4€/mo server at Hetzner and have actual people in the datacenter respond to my technical inquiries within minutes.

He who sees fraud and does not cry 'fraud,' is fraud.

1. Provisioned machines should not die randomly and not spin back up.

Saving you time and money.

nakovet · 2 years ago

About Fly but not about the GPU announcement, I wish they had a S3 replacement, they suggest a GNU Affero project that is a dealbreaker for any business, needing to leave Fly to store user assets was a dealbreaker for us to use Fly on our next project, sad cause I love the simplicity, the value for money, the built in VPN.

simonw · 2 years ago

Sounds like you might be interested in the Tigris preview:

- https://www.tigrisdata.com/

- https://benhoyt.com/writings/flyio-and-tigris/ (discussed here: https://news.ycombinator.com/item?id=39360870)

- https://fly.io/docs/reference/tigris/

JoshTriplett · 2 years ago

> I wish they had a S3 replacement, they suggest a GNU Affero project that is a dealbreaker for any business

AGPL does not mean you have to share everything you've built atop a service, just everything you've linked to it and any changes you've made to it. If you're accessing an S3-like service using only an HTTPS API, that isn't going to make your code subject to the AGPL.

bradfitz · 2 years ago

Regardless, some companies have a blanket thou-shalt-not-use-AGPL-anything policy.

RcouF1uZ4gsC · 2 years ago

> AGPL does not mean you have to share everything you've built atop a service, just everything you've linked to it and any changes you've made to it. If you're accessing an S3-like service using only an HTTPS API, that isn't going to make your code subject to the AGPL.

I am not so sure about that. Otherwise, you could trivially get around the AGPL by using https services to launder your proprietary changes.

There is not enough caselaw to say how a case that used only http services provided by AGPL to run a proprietary service would turn out, and it is not worth betting your business on it.

benbjohnson · 2 years ago

We have an region-aware S3 replacement that's in beta right now: https://community.fly.io/t/global-caching-object-storage-on-...

benhoyt · 2 years ago

They're about to get an S3 replacement, called Tigris (it's a separate company but integrated into flyctl and runs on Fly.io infra): https://benhoyt.com/writings/flyio-and-tigris/

martylamb · 2 years ago

Funny you should mention that: https://news.ycombinator.com/item?id=39360870

itake · 2 years ago

The dealbreaker should be their uptime and support. They deleted my database and have many uptime issues.

Give us a minute.

benatkin · 2 years ago

This looks promising https://github.com/seaweedfs/seaweedfs

candiddevmike · 2 years ago

Seaweed requires a separate coordination setup which may simplify the architecture but complicates the deployment.

Who is the target market for this? Small/unproven apps that need to run some AI model, but won't/can't use hosted offerings by the literally dozens of race-to-zero startups offering OSS models?

We run plenty of our own models and hardware, so I get wanting to have control over the metal. I'm just trying to figure out who this is targeted at.

We have some ideas but there's no clear answer yet. Probably people building hosting platforms. Maybe not obvious hosting platforms, but hosting platforms.

KTibow · 2 years ago

Fly is an edge network - in theory, if your GPUs are next to your servers and your servers are next to your users, your app will be very fast, as highlighted in the article. In practice this might not matter much since inference takes a long time anyway.

We're really a couple things; the edge stuff was where we got started in 2020, but "fast booting VMs" is just as important to us now, and that's something that's useful whether or not you're doing edge stuff.

joshxyz · 2 years ago

this is crazy, this move alone cements fly as an edge player for the next 3 / 5 / 10 years.

dathinab · 2 years ago

TL;DR: (skip to last paragraph)

- having the GPU compute in the same data center or at least from the same cloud provider can be a huge plus

- it's not that rare for various providers we have tried out to run out of available A100 GPUs, even with large providers we had issues like that multiple times (less an issue if you aren't locked to specific regions)

- not all providers provide a usable scale down to zero "on demand" model, idk. how well it works with fly long term but that could be another point

- race-to-zero startups have the tendency to not last, it's kind by design from a 100 of them just a very few survive

- if you are already on fly and write a non-public tech demo which just gets evaluated a few times their GPU offering can act like a default don't think much about it solution (through you using e.g. Huggingface services would be often more likely)

- A lot of companies can't run their own hardware for various reasons, at best they can rent a rack in another Datacenter but for small use use-cases this isn't always worth it. Similar there are use cases which do might A100s but only run them rarely (e.g. on weekly analytics data). Potentially less then 1h/w in which case race-to-zero pricing might not look interesting at all

To sum up I think there are many small reasons why some companies, not just startups, might have interest in fly GPUs, especially if they are already on fly. But there is no single "that's why" argument, especially if you are already deploying to another cloud.

It's not like Fly has GPUs in every PoP...so there goes all the same datacenter stuff (unless you just want to be in the PoP with GPUs in which case...)

But none of this answers my question.

I'm trying to understand the intersection of things like "people who need GPU compute" and "people who need to scale down to zero".

This can't be a very big market.

DreamGen · 2 years ago

I am not seeing any race-to-zero in the hosted offering space. Most charge multiples of what you would pay on GCP, and the public prices on GCP are already several times what you would pay as an enterprise customer.

I don't know what you think I'm talking about, or who is charging multiples of GCP? But I'm talking about hosted inference, where many startups are offering Mistral models cheaper than Mistral are.

Dead Comment

ec109685 · 2 years ago

The recipe example or any any LLM use case seems like a very poor way of highlighting “inference at the edge” given the extra few hundred ms round trip won’t matter.

unraveller · 2 years ago

The better use case is obviously voice assistant at the edge. As in voice 2 text 2 search/GPT 2 voice generated response. That is where ms matter but it is also a high abuse angle no one wants to associate with just yet. My guess is they are going to do this in another post, and if so they should make their own perplexity style online-gpt. For now they just wanted to see what else people can think up by making the introduction of it boring.

There’s three options for inference: 1) On device inference 2) Inference “on the edge” 3) Inference in a data center

Given fly is deployed in equinox data centers just like everyone else, fundamentally there isn’t much difference between #2 and #3.

manishsharan · 2 years ago

This. I cannot think of a business case for running LLMs on the edge. Is this a Pets.com moment for the AI industry?