simonw · 4 years ago
> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.

There's a very solid solution to this that isn't as widely known as it should be.

Read after write consistency is extremely important. If a user makes an edit to their content and then can't see that edit in the next page they load they will assume things are broken, and that the site has lost their content. This is really bad!

The best fix for this is to make sure that all reads from that user are directed to the lead database for a short period of time after they make an edit.

The Fly replay header is perfect for this. Here's what to do:

Any time a user performs a write (which should involve a POST request), set a cookie with a very short time expiry - 5s perhaps, though monitor your worst case replica lag to pick the right value.

I have trust issues with clocks in users' browsers, so I like to do this by making the cookie's value the server time at which it should expire.

In your application's top-level middleware, look for that cookie. If a user has it and the cutoff time has not been reached yet, send a Fly replay header that internally redirects the request to the lead region.

This guarantees that users who have just performed a write won't see stale data from a lagging replica. And the implementation is a dozen or so lines of code.
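The steps above can be sketched in framework-neutral Python. This is a minimal illustration, not a definitive implementation: the cookie name, the `iad` region code, and both helper functions are hypothetical, and it assumes the app asks for a replay by returning a `fly-replay: region=...` response header as described in the Fly post linked below.

```python
import time

COOKIE_NAME = "recent_write_until"  # hypothetical cookie name
PIN_SECONDS = 5                     # tune to your worst-case replica lag
PRIMARY_REGION = "iad"              # hypothetical code for the lead region


def cookie_after_write(now=None):
    """Cookie value to set after a write: the server time at which pinning ends."""
    now = time.time() if now is None else now
    return str(int(now + PIN_SECONDS))


def replay_headers(cookies, current_region, now=None):
    """Return headers asking for a replay in the lead region, or {} to serve locally."""
    now = time.time() if now is None else now
    if current_region == PRIMARY_REGION:
        return {}  # already in the lead region, nothing to replay
    raw = cookies.get(COOKIE_NAME)
    if raw is None:
        return {}  # no recent write recorded for this user
    try:
        expires_at = int(raw)
    except ValueError:
        return {}  # malformed cookie: treat it as absent
    if now < expires_at:
        return {"fly-replay": f"region={PRIMARY_REGION}"}
    return {}  # cutoff time has passed, the replica should have caught up
```

In a real app, `cookie_after_write` would be called from the write handlers and `replay_headers` from the top-level middleware, which returns an empty response carrying those headers whenever the result is non-empty.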

Obviously this won't work for every product - if you're building a chat app where every active user writes to the database every few seconds, implementing this will send almost every piece of traffic to your leaders, leaving your replicas with not much to do.

But if your application fits the common pattern where 95% of traffic is reads and only a small portion of your users are causing writes at any one time, I would expect this to be extremely effective.

Fly replay headers are explained in detail here: https://fly.io/blog/globally-distributed-postgres/

simonw · 4 years ago
There's another, more sophisticated trick that works for some databases: tracking a global transaction counter of some sort, persisting that in a cookie when a user makes a write and redirecting the user to the lead database if the replica they are talking to hasn't made it to that point yet.

Chris McCord describes how Elixir does that with PostgreSQL here: https://news.ycombinator.com/item?id=31434094

Wikipedia implements this trick on top of PHP and MySQL global transaction IDs (GTIDs) so it definitely scales!
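As a rough sketch of the counter-in-a-cookie idea, here is how the comparison might look if the counter is a PostgreSQL log sequence number (LSN). That choice is an assumption made for illustration: the function names are hypothetical, and the two positions are assumed to come from something like `pg_current_wal_lsn()` on the primary at write time and `pg_last_wal_replay_lsn()` on the replica at read time.

```python
def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '16/B374D848' to an integer.

    An LSN prints as two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def should_use_primary(cookie_lsn, replica_lsn):
    """True when the replica has not yet replayed the user's last write."""
    if not cookie_lsn:
        return False  # the user has no recorded write
    return parse_lsn(replica_lsn) < parse_lsn(cookie_lsn)
```

Comparing parsed integers rather than raw strings matters here: as text, "9/0" sorts after "10/0" even though it is the earlier position.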

simonw · 4 years ago
Actually the way Wikipedia works is slightly different: they don't redirect to a lead database, they instead call this MySQL function to wait on the replica for it to catch up:

    SELECT WAIT_FOR_EXECUTED_GTID_SET($gtidArg, $timeout)
https://github.com/wikimedia/mediawiki/blob/434c333d9b2be817...

I wonder if there's a PostgreSQL equivalent of this?

randito · 4 years ago
(Disclaimer: Not an expert.. just sharing something I read somewhere)

I think FoundationDB does something really interesting with this problem. When you make changes, you do it via a transaction. But all the client reads are using the previous version, until the transaction changes have propagated across the nodes, then the new value is returned.

treis · 4 years ago
This is a ton of effort to save the RTT of sending all the requests to a central server. And it all goes out the window the second you need to call an external API in the processing of your requests. And to get what benefit there may be you need to, more or less, pay for a server in every big city. IMHO, outside of gaming there's no real need for what fly.io does.

For something like this to be useful I think the code would need to be running on the user's network. That would drop server ping to sub 1 ms and open up a whole lot of interesting possibilities. But I don't see what changing server ping from 80 ms to 15ms gets me.

simonw · 4 years ago
This trick isn't just about geographic distribution - it's most commonly used for classic horizontal scaling, where you use multiple read-replicas to handle more traffic.
tptacek · 4 years ago
If you're getting 80ms response times for user requests, consistently, then it doesn't change much.
rkangel · 4 years ago
I think his stack is a little confused. He's got HTMX and Phoenix in there.

If you are using Phoenix then LiveView is the obvious approach to dynamically updating a page based on server stuff. It's a similar-ish architecture to HTMX, but integrated into the framework. The page is rendered on the server as normal, then when it loads on the client a web-socket is opened to a task on the server (page includes the LiveView JS). Then when something changes on the server, some new HTML is generated and the parts that have changed are sent down the websocket to the client to insert into the page. LiveView is part of Phoenix, leverages Elixir's concurrency, is very performant and a joy to use.

HTMX is a way of getting similar functionality but for a conventional server rendered framework like Django which doesn't have any of this stuff built in. It would be challenging to build it in anyway because the concurrency isn't as powerful. Simplistically, Phoenix exists because Chris McCord was trying to do a LiveView equivalent in Ruby, had issues, went on a search and discovered Elixir.

So either use:

Elixir + Phoenix + Phoenix LiveView

Or:

Python + Django + HTMX (Python and Django can be substituted for other frameworks like Rails)

In both cases, Alpine can then be useful to sprinkle in some clientside only UI features.

hartleybrody · 4 years ago
OP here, thanks for making that distinction more clear. I had listed them all as new tech that I am starting to use, but you're correct that I wouldn't intend to use them all _together_.

The HTMX and alpine libs were intended to be sprinkled onto existing web apps (my usual python/flask stack), whereas Phoenix would be for building all new projects.

ellen364 · 4 years ago
From the article, I’m not sure if the author is using all of Phoenix + HTMX + alpine.js or just exploring the combos to see what works.

I recently started playing with Phoenix and the intro to channels and LiveView has been a bit confusing. E.g. a few days ago I wondered if it was worth using something like Svelte for the frontend and then realised I could just use LiveView. As a newbie to the ecosystem, it’s taking a while to get the lay of the land and start understanding the options.

jjdeveloper · 4 years ago
Not even sure Alpine is needed anymore as they have Phoenix.LiveView.JS now https://fly.io/phoenix-files/sdeb-toggling-element/
BeFlatXIII · 4 years ago
Thanks for this explainer. This is the missing “here is when it’s redundant” guideline when investigating whether to add to your stack.
chrismccord · 4 years ago
One of the points about read replicas and read-your-own-writes is correct to call out, but on the Elixir side we have an answer to that:

> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.

Elixir is distributed out of the box, so nodes can message each other. This allowed us to easily ship a `fly_postgres_elixir` library that guarantees read-your-own-writes: https://github.com/superfly/fly_postgres_elixir

It does this by sending writes to the primary region over RPC (via distributed elixir). The write is performed on a primary instance adjacent to the DB, then the result, and the postgres log-sequence-number, is sent back to the remote node. When the library gets a result of the RPC write, it blocks locally until its local read replica matches an LSN >= write LSN, then the result is returned to the caller
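The "block until the local replica has caught up" step can be sketched as a polling loop with a timeout. This is an illustration in Python, not the library's Elixir code: `wait_for_replica` and its parameters are hypothetical, positions are assumed to be comparable values such as integers, and `read_replica_pos` stands in for whatever query reports the replica's current replay position.

```python
import time


def wait_for_replica(write_pos, read_replica_pos, timeout=5.0, interval=0.05,
                     clock=time.monotonic, sleep=time.sleep):
    """Block until read_replica_pos() >= write_pos; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if read_replica_pos() >= write_pos:
            return True  # the replica has replayed the write
        if clock() >= deadline:
            return False  # give up; the caller decides how to fall back
        sleep(interval)
```

On timeout a caller might fall back to reading from the primary instead. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.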

This gives us read-your-own-writes for the end-user, and the calling code remains unchanged for standard code paths. This doesn't solve all classes of race conditions – for example you may broadcast a message over Phoenix.PubSub that causes a read on the remote node for data that isn't yet replicated, but typically you'd avoid an N query problem from pubsub in general by populating the data in the message on the publisher beforehand.

There's no completely avoiding the fact you have a distributed system where the speed of light matters, but it's Fly's (and Phoenix's) goal to push those concerns back as far as possible. For read heavy apps, or apps that use caching layers for reads, developers already face these kinds of problems. If you think of your read-replicas as cache with a convenient SQL interface, you can avoid most foot guns.

I'm happy to answer other questions as it relates to Phoenix, Fly or what Phoenix + Fly enables from my perspective.

benwilson-512 · 4 years ago
That's very cool. Presumably as well this could be opt out if you had specific operations (perhaps from a write-only API) that don't need to wait to do further writes?
chrismccord · 4 years ago
Yeah we expose interfaces to ignore blocking on the LSN, but the way this works is by proxying the Ecto Repo interface with our own Repo. So you could call your underlying Repo directly if you wanted to perform a write without blocking on the LSN as well.
csmpltn · 4 years ago
An unrelated, yet honest question.

There have been many posts hitting the HN frontpage regarding fly.io recently. Is it healthy to have so much content about a single PAAS platform showing up here so often now?

ghoomketu · 4 years ago
As per dang's comment a few days back(1)

> I wish more startups would achieve this, YC or not. Whenever I run across one that's trying to succeed on HN, I try to help them do so (YC or not)—why? because it makes HN better if the community finds things it loves here. Among the startups of today, I can think of only two offhand who are showing signs of maybe reaching darling status—fly.io (YC), and Tailscale (not YC).

Personally too both these companies are doing a lot of incredible things. I also love Litestream, phoenixframework and other things they are doing.

(1) https://news.ycombinator.com/item?id=30066969

moritonal · 4 years ago
Interesting to consider the power the mods have here to nudge certain companies into the lime-light of influential technologists.
spoils19 · 4 years ago
HN: one of the last remaining Great Good Places of the Internet, a lone tavern in an iconic gateway town to the now not-so-wild west.

Beyond the western borders of this little town, the tech gold rush has both expanded to epic proportions, affecting all the economies in the world, and also gone through enough booms and busts that the phrase "gold rush" seems somehow off.

As more and more young'uns join and jaded veterans return to throng the tavern alike, it often seems to be on the brink of either exploding with the largest gun fight in history, or jumping the shark.

And yet, against all odds, it retains its original magnetism - drawing throngs that grow in number and diversity while seers like https://news.ycombinator.com/user?id=patio11 and https://news.ycombinator.com/threads?id=tptacek continue to return - dispensing worldly wisdom worth its weight in gold from corner tables.

The secret is the man at the corner of the bar @dang, always around with a friendly smile and a towel on his shoulder. The only sheriff in the west who still doubles as the friendly bartender: always polite, always willing to break up a fight with kind words and clean up messes himself.

Yes a cold-hard look from him is all it takes to get most outlaws to back down, yes, his Colt-45 "moderator" edition is feared by all men, but the real secret to his success: his earnest passion (some call it an obsession) for the seemingly sisyphean task of sustaining good conflict - letting it simmer but keeping it all times below the boiling point based on "the code":

"Conflict is essential to human life, whether between different aspects of oneself, between oneself and the environment, between different individuals or between different groups. It follows that the aim of healthy living is not the direct elimination of conflict, which is possible only by forcible suppression of one or other of its antagonistic components, but the toleration of it—the capacity to bear the tensions of doubt and of unsatisfied need and the willingness to hold judgement in suspense until finer and finer solutions can be discovered which integrate more and more the claims of both sides. It is the psychologist's job to make possible the acceptance of such an idea so that the richness of the varieties of experience, whether within the unit of the single personality or in the wider unit of the group, can come to expression."

May the last great tavern in the West and its friendly bartender-sheriff live long and prosper.

sph · 4 years ago
fly.io didn't do phoenix. They hired its creator Chris McCord, but Phoenix was already an established product.
a-dub · 4 years ago
litestream looks super solid. would definitely consider it for a new project, if appropriate.
CJefferson · 4 years ago
My personal feeling (based on what I upvote) is that ycombinator isn't getting enough quality writing about tech issues to fill the front page, so if it's "full of fly.io", that just means there isn't enough stuff about other systems at the moment.

Same reason for a while the world seemed full of Rust articles -- at that point in time there wasn't (speaking as a C++ programmer) a pile of quality C++ articles around which the Rust was pushing out.

ethbr0 · 4 years ago
It seems like a combination of HN "top of mind" (ie that HN users submit articles on things they're currently reading/researching), social "top of mind" (ie that people write blog posts on things their peer group is talking about), and a side effect of HN positional age-decay.

If there aren't new and different articles to fill the HN front page, then there has to be something.

And that something ends up being a base layer of blogs about "current stuff".

dang · 4 years ago
If there's significant, good, on-topic technical content that isn't getting posted to HN, for god's sake someone please let me know.

If it's getting posted to HN but not getting traction, for god's sake someone please let me know that too.

Literally the only thing we're trying to do is have HN be as interesting as possible. Missing out on the best content is disastrous for that goal—sort of like missing out on the best startups is disastrous for an investor.

tptacek · 4 years ago
We agree! It's flattering and it's super interesting to read what people think about us and we're all blushing about the "heir to the vast Heroku fortune" stuff (even if it's probably not true), but we're also cringing a bit.

We have a big announcement/technical post queued up --- we'd planned to run it on Monday --- and we're holding off on it because of the "organic" attention we're getting this week. We'd much rather talk about things like Litestream, app pentests, hiring processes, and how we replaced Nomad in our architecture.

But we're as aware as everyone else is that the front page has limited bandwidth, and we can't be on it all the time, so we're waiting for this (hopefully short) wave of attention to crest before we post our own stuff.

danjac · 4 years ago
It's related to all the negative press recently around Heroku (security concerns + Salesforce neglect). People had a lot of goodwill towards that PaaS, and a lot of them miss what a PaaS does for you in the current environment of AWS or k8s complexity around devops, so they are looking for a replacement, and fly.io looks like one of the more innovative, or at least well-marketed, in this space.
telotortium · 4 years ago
For better or worse, fly.io has as a principal tptacek, who's at the top of the HN leaderboard and so has built up a lot of goodwill here.
mbesto · 4 years ago
And to be fair, Fly's blog content is very good for this audience (which is largely the result of tptacek).
tut-urut-utut · 4 years ago
This actually makes me wonder if people here generally pay attention to who posted what. Does that influence actual upvote status?

I never look at the person's name when replying or voting, only the content. For example, I remembered this tptacek not because I remembered his posts, but because they get frequently mentioned in other people's posts.

Mizza · 4 years ago
Fly isn't even that great, it's just everything else is much worse. I use it and I'm not surprised other people do as well.

There's also a lot of momentum for the Elixir/Phoenix right now, and they're pretty tightly integrated with that community.

ThinkBeat · 4 years ago
I agree but in a wider sense as well.

There are an awful lot of programming languages and methods that receive no hype. I have wondered about this; the most likely reason is that the majority of the crowd at the site come from a shared sphere and have, to some degree, the same type of priorities.

Personally, I like to stay away from the bleeding edge technology.

That does not constitute much of a problem for a lot of companies.

The annoying part is that recruiters often cram all sorts of technology into requirements for a CV and a job, even if the client has no need for those things, at least not yet.

There are millions, or at least 100,000, "enterprise" software projects out there.

I do admit it is not as sexy as the latest and greatest and start ups, but it is a field that a whole lot of devs work in.

Perhaps they do not spend as much time on HN, or they are quiet.

alphabettsy · 4 years ago
Fair question. I’d rather see more of this than culture-war and politics.
wg0 · 4 years ago
fly.io is a solution but I don't know what the problem is. I can think of it as a more dynamic CDN with more compute capacity (deploy whatever, backed by SQLite/Postgres) to serve customers right away.

Most applications, or at least the bigger chunk of them, are transactional, and enterprise software is all about consistency and accuracy.

Nevertheless, I think it's a great engineering feat in and of itself, which is probably why it gets discussed so often.

rkangel · 4 years ago
<disclaimer: haven't actually used Fly.io>

The usecase I had for my startup (a few years ago, before fly.io) was "I have an Elixir/Phoenix application - stuff working in the background plus web frontend". I would like to host it with as little thinking about individual servers, load balancers etc. I went with GAE at the time and it was fine.

Fly.io seems like a much more streamlined version of the same thing, with the addition of "global load balancing" stuff on top, if I got to the point of caring about international customers.

ceejayoz · 4 years ago
I mean, that's a cool aspect useful to some, but I spent a day getting a Laravel app running on there, all in one region, and that's fine; I'm looking for a place to move my hobby apps off Heroku right now.
nickstinemates · 4 years ago
The last time I remember such a run on HN was when we created Docker. It was on the front page or mentioned in front page comments almost daily for at least 6 months (late 2013 early 2014)

Fly.io has a great combination of user virality/momentum and fundamentally technically interesting content on a wide range of topics.

stingraycharles · 4 years ago
Given that Fly.io has been part of YC W20, and that they make some interesting technology choices (eg all-in on sqlite), it creates more traction on this site.

Typically this type of thing goes in phases, and I wouldn’t worry about it, assuming you’re already OK with HN being biased towards YC-funded startups.

dan-robertson · 4 years ago
A big competitor of theirs, tailscale, also does well here.

I think the lesson is partly that the typical somewhat-deranged writing style/topics are popular. More companies should try to write engaging blog posts and be more open if they want to be successful.

It seems to have paid off for them as I would guess at least some of the people trying it out are learning about fly.io from HN.

pid-1 · 4 years ago
> A big competitor of theirs, tailscale, also does well here.

How does a container/database as a service platform compete with a Wireguard as a service platform?

alexmuro · 4 years ago
I think the perspective of this article is a very healthy and productive one and I found it particularly useful.

Assessing whether to stand on the shoulders of the new giants who stand up every few years is a difficult problem that I'm interested in, and I appreciate the amount of context given here considering different use cases.

rvz · 4 years ago
Generally, it is all YC hype. HN is biased towards YC startups.

There are other alternatives like Render.com, railway.app, etc but it is clear that fly.io is unsurprisingly overhyped by the HN crowd, especially if you are looking for a Heroku alternative.

It’s like asking a barber if you need a haircut.

ctvo · 4 years ago
Give credit where it's due:

fly.io spends a tremendous amount of time on creating interesting technical content that attracts this type of attention. The company is intentional about this as a customer acquisition strategy. They have an illustrator on staff for their unique art style, for example. Their founder and senior technical staff engage with these posts and answer questions, etc.. It's not YC favoritism, it's a deep understanding of the developer first mindset / ecosystem and targeting it as a company strategy.

tptacek · 4 years ago
Just a quick note that the list of applications for Fly.io at the end of this post was taken from our Launch HN --- https://news.ycombinator.com/item?id=22616857 --- and we've changed (expanded) since then.

When we launched, we didn't do persistent storage for instances, so it didn't make as much sense to run ordinary apps here; rather, the idea was that you'd run your full-stack app somewhere like us-east-1, and carve off performance-sensitive bits and run them on Fly.io. That's "edge computing".

But a bit over a year ago, we added persistent volumes, and then we built Fly Postgres on top of it. You can store files on Fly.io or use a bunch of different databases, some of which we support directly. So it makes a lot more sense to run arbitrary applications, like a Rails or Elixir app, which is not something we would have said back in March 2020.

nicoburns · 4 years ago
> But despite how much I want to learn the fly.io platform – it has been a bit tricky for me wrap my head around a good use-case for this type of distributed hosting service.

Worth noting that you don't have to use the distributed aspect. I have my site hosted on a single one of a fly.io's smallest instances (which one can get 3 of for free), and even like this the performance is excellent (50ms response times), and it doesn't have the problem of spinning down when not in use like Heroku's free tier.

It's nice to at least get a choice of regions. For example, the company I work for (not hosted on fly.io currently) only has customers in the UK and Ireland. So it would be nice to be able to pop our servers there with a simple config setting.

petercooper · 4 years ago
Same. I'm really impressed with the experience on there now that I finally spent a day trying it out. The geodistribution stuff had no interest to me so I'd avoided them till now, but it's really the underlying tooling and experience that has won me over.
hartleybrody · 4 years ago
This is an excellent point. While their main value prop seems to be "servers closer to your users" you could also just use them as a drop-in replacement for something like heroku and just use one region to simplify the mental model, pricing and orchestration.
davidkuennen · 4 years ago
Do many companies actually need databases geolocated near users?

I'm working on big and small projects/companies and that has never been any concern of ours.

I always imagined it to be something only the very very big players care about. And as a big player I would usually bet on a big partner like AWS, GCP, Azure. Or am I missing something?

manigandham · 4 years ago
Almost none.

I've built 3 adtech companies including all the tech and it's one of the few cases where data needs to be spread across global regions for latency and regulations. It's a lot of effort regardless of the underlying provider and not worth it unless you really have the scale and latency requirements.

You can receive an HTTP response from the other side of the planet in less than a second so server-side rendering and sending a single HTML page works just fine. The problem is actually all these client-side SPAs that make a dozen requests and are actually much slower because of it.
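A back-of-the-envelope version of that comparison, with made-up numbers: the function below is a hypothetical helper that assumes the round trips happen strictly one after another, which is the worst case for an SPA whose requests depend on each other.

```python
def page_load_ms(rtt_ms, sequential_round_trips, server_ms=0):
    """Rough time to load a page when each round trip must wait for the previous one."""
    return sequential_round_trips * (rtt_ms + server_ms)


# Illustrative figures only: a 300 ms round trip to a distant server.
server_rendered = page_load_ms(rtt_ms=300, sequential_round_trips=1)   # one HTML page
spa_waterfall = page_load_ms(rtt_ms=300, sequential_round_trips=12)    # a dozen chained API calls
```

Under these assumptions the single server-rendered page costs 300 ms of network time and the chained SPA costs 3600 ms, so the number of sequential requests matters more than the distance. Requests that run in parallel would narrow the gap.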

simonw · 4 years ago
Companies start to get a lot more interested in this when their business truly goes global. Users in Australia have money to spend and get pretty poor performance from apps hosted in the USA due to speed of light issues.

I've looked at implementing this in the past and always found it to be SO difficult that the benefit would not be worth the cost.

Fly has changed that equation for me. It has moved this problem from "I'd love to do it if I could but it's just too hard" to "This is a thing I could do with small enough engineering effort that it would be worthwhile".

This is my favourite type of technology: I love things that move something from the "too expensive" to the "now feasible to implement" bucket!

Humphrey · 4 years ago
Yeah, and I guess all my Australian hosted things are slow for you Americans who have even more money to spend
sph · 4 years ago
Web Scale was first a meme, then an ideal everyone pushes towards even when deploying their small scale blog. Who knows what'll happen if suddenly you get a million concurrent users tomorrow? Better scale it geographically and put it behind a CDN today. Look at those generous free tiers.

Like you said, it's mostly snake oil except for very big players.

pid-1 · 4 years ago
I'd say using stuff like Netlify or GH Pages for static sites is worth it even if you have zero traffic. They legit are much easier to use than setting up your own VPS.
fauigerzigerk · 4 years ago
This isn't quite the same thing as trying to scale like Google though. Low latency is very important for usability regardless of how many users you serve. How easy it is to achieve depends more on the geographical distribution of your users than on their number.

If my app has a handful of users that are split between the US and Europe or Asia, and the app is 90% reads, then the distributed DB approach of fly.io or Cloudflare makes a lot of sense. It also adds considerable complexity though, so it's obviously a tradeoff.

kitbrennan · 4 years ago
Our business has an API that can be used for displaying dynamic information at point of sale (i.e. dynamic in that it cannot be cached and will need a DB call).

While we encourage our customers to try and use us asynchronously, we have a number of enterprises that don't and therefore demand incredibly fast response times with low latency. They pay us accordingly, so as a result we have geolocated databases (in our case though, we are using AWS Aurora replication).

kasey_junk · 4 years ago
Have your projects had customers all over the globe and have you measured the user experience for those that are furthest from your database?

If so, locality jumps up to the top of the performance bottlenecks pretty quick and there is no amount of performance optimization you can do to fix it.

maliker · 4 years ago
I’ve used fly.io for a couple new projects. The main thing I like about it is that it supports affordable and easy to use persistent volumes, something the container services (GCP, AWS) don’t. This lets me test locally with volumes in a way that is identical to how things will work when deployed. With the other container hosts, I’ve had to refactor to use cloud storage services like S3.

Fly.io also has a clean, highly usable CLI and minimal set of services unlike the hundreds of options on other providers. But that’s just icing on top—the volume support is the big advantage for me.