Following the PostgreSQL logical replication stream to update a local SQLite database copy is definitely a neat trick, and feels very safe to me (especially since you track the Log Sequence Number in a postgres_pos table).
The bit that surprised me was that this thing supports writes as well!
It does it by acting as a PostgreSQL proxy. You connect to that proxy with a regular PostgreSQL client, then any read queries you issue run against the local SQLite copy and any writes are forwarded on to "real" PostgreSQL.
The downside is that now your SELECT statements all need to be in the subset of SQL that is supported by both SQLite and PostgreSQL. This can be pretty limiting, mainly because PostgreSQL's SQL is a much, much richer dialect than SQLite's.
Should work fine for basic SELECT queries though.
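To make the read/write split concrete, here's a rough sketch (my own, not SQLedge's code) of the routing decision a proxy like this has to make, assuming mattn/go-sqlite3 and lib/pq as the drivers; the real project parses statements properly rather than sniffing the first keyword.

    // Hypothetical routing sketch: reads go to the local SQLite file,
    // everything else is forwarded to the upstream Postgres.
    package main

    import (
        "database/sql"
        "strings"

        _ "github.com/lib/pq"           // upstream Postgres driver (assumed choice)
        _ "github.com/mattn/go-sqlite3" // local SQLite driver (assumed choice)
    )

    type router struct {
        local    *sql.DB // SQLite replica
        upstream *sql.DB // "real" Postgres
    }

    // pick decides where a statement runs. A first-keyword check is only a
    // stand-in here; a real proxy would use a proper SQL parser.
    func (r *router) pick(query string) *sql.DB {
        q := strings.ToUpper(strings.TrimSpace(query))
        if strings.HasPrefix(q, "SELECT") {
            return r.local // reads are served from the edge copy
        }
        return r.upstream // writes pay the round trip to Postgres
    }

    func newRouter(sqlitePath, pgConn string) (*router, error) {
        local, err := sql.Open("sqlite3", sqlitePath)
        if err != nil {
            return nil, err
        }
        upstream, err := sql.Open("postgres", pgConn)
        if err != nil {
            return nil, err
        }
        return &router{local: local, upstream: upstream}, nil
    }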
I'd find this project useful even without the PostgreSQL connection/write support though.
I worked with a very high-scale feature flag system a while ago - thousands of flag checks a second. This scaled using a local memcached cache of checks on each machine, despite the check logic itself consulting a MySQL database.
I had an idea to improve that system by running a local SQLite cache of the full flag logic on every frontend machine instead. That way flag checks could use full SQL logic, but would still run incredibly fast.
The challenge would be keeping that local SQLite database copy synced with the centralized source-of-truth database. A system like SQLedge could make short work of that problem.
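For flavour, a local flag check against that SQLite copy could be a single query; the table and column names below are invented for illustration, not from any real system.

    // Hypothetical flag check against an on-host SQLite replica: no network
    // hop on the hot path, and the full expressiveness of SQL for the logic.
    package main

    import (
        "database/sql"

        _ "github.com/mattn/go-sqlite3"
    )

    func flagEnabled(db *sql.DB, flag string, userID int64) (bool, error) {
        const q = `
            SELECT EXISTS (
                SELECT 1
                FROM feature_flags f
                JOIN flag_cohorts c  ON c.flag_id = f.id
                JOIN cohort_users cu ON cu.cohort_id = c.id
                WHERE f.name = ? AND f.enabled = 1 AND cu.user_id = ?
            )`
        var enabled bool
        err := db.QueryRow(q, flag, userID).Scan(&enabled)
        return enabled, err
    }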
> I worked with a very high-scale feature flag system a while ago - thousands of flag checks a second.
May I ask why the flags are checked that frequently? Couldn't they be cached for at least a minute?
> It does it by acting as a PostgreSQL proxy. [...] and any writes are forwarded on to "real" PostgreSQL.
What happens if there's a multi-statement transaction with a bunch of writes sent off to the mothership - which then get returned to the client via logical replication, but then there's a ROLLBACK? How would that situation be handled so that both the SQLite edge DBs and the mothership DB roll back okay, and would this impact other clients?
Feature flag systems are usually based on a set of rules that could be serialized and evaluated locally (this is how pretty much every open source feature flag system and feature flag SaaS works). Usually it's based on some kind of UUID being hashed with a per-flag seed and bucketed after some set of targeting rules are applied to other properties passed in for that user. There are added features where you can store large cohorts to do specific targeting, and usually there's some kind of local cache added to make that look-up faster for recent users.
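The hash-and-bucket part is tiny in code. A minimal sketch, with an arbitrary seed format and a 0-9999 bucket range that aren't taken from any particular vendor:

    // Deterministic percentage rollout: the same user always lands in the same
    // bucket for a given flag seed.
    package main

    import (
        "fmt"
        "hash/fnv"
    )

    func bucket(flagSeed, userID string) uint32 {
        h := fnv.New32a()
        fmt.Fprintf(h, "%s:%s", flagSeed, userID)
        return h.Sum32() % 10000
    }

    // enabled rolls a flag out to `percent` (0-100) of users.
    func enabled(flagSeed, userID string, percent uint32) bool {
        return bucket(flagSeed, userID) < percent*100
    }

Targeting rules and cohort lookups then just gate whether the bucketing is consulted at all.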
I'm not sure what the original commenter was doing but it sounds like they had some kind of targeting that was almost entirely based on cohorts or maybe they needed to have stability over time which would require a database. We did something similar recently except we just store a "session ID" with a blob for look-up and the evaluation only happens on the first request for a given session ID.
> May I ask why the flags are checked that frequently? Couldn't they be cached for at least a minute?
Not in that project but feature flags don't have to be all or nothing. You can apply flags to specific cohorts of your users for example, so if you have a large user base, even if you cache them per-user, it still translates into many checks a second for large systems.
They were being cached for at least a minute (maybe even more than that, I can't remember the details) - that's what the local memcached instance was for.
This was problematic though because changing a feature flag and then waiting for a minute plus to see if the change actually worked can be frustrating, especially if it relates to an outage of some sort.
The logical replication protocol sends a series of messages that essentially follow the flow that a database transaction would.
i.e. a stream of messages like: "BEGIN", "[the data]", ["COMMIT" or "ROLLBACK"].
So any application that listens to the Postgres replication protocol can handle the transaction in the same way that Postgres does. Concretely you might choose to open a SQLite transaction on BEGIN, apply the statements, and then COMMIT or ROLLBACK based on the next messages received on the replication stream.
The data sent on the replication protocol includes the state of the row after the write query has completed. This means you don't need to worry about getting out of sync on queries like "UPDATE field = field + 1" because you have access to the exact resulting value as stored by Postgres.
TL;DR - you can follow the same begin/change/commit flow that the original transaction did on the upstream Postgres server, and you have access to the exact underlying data after the write was committed.
It's also true (as other commenters have pointed out) that for not-huge transactions (i.e. not streaming transactions, a feature added in Postgres 14) the BEGIN message will only be sent if the transaction was committed. It's pretty unlikely that you will ever process a ROLLBACK message from the protocol (although possible).
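A sketch of what the consumer side could look like, with a made-up repMsg struct standing in for decoded pgoutput messages (real code would use something like jackc/pglogrepl for the decoding, and SQLedge's own implementation will differ). Writing the LSN into the postgres_pos table inside the same SQLite transaction is what makes resuming the stream safe.

    // Hypothetical consumer: mirror each upstream transaction onto SQLite and
    // record the log sequence number so the stream can resume after a restart.
    // Assumes messages arrive well ordered (BEGIN before CHANGE/COMMIT).
    package main

    import (
        "database/sql"

        _ "github.com/mattn/go-sqlite3"
    )

    type repMsg struct {
        Kind string // "BEGIN", "CHANGE", "COMMIT"
        SQL  string // for CHANGE: the equivalent SQLite statement
        Args []any  // the full new row values sent by Postgres, so no drift
        LSN  string // for COMMIT: end-of-transaction log sequence number
    }

    func apply(db *sql.DB, msgs <-chan repMsg) error {
        var tx *sql.Tx
        var err error
        for m := range msgs {
            switch m.Kind {
            case "BEGIN":
                if tx, err = db.Begin(); err != nil {
                    return err
                }
            case "CHANGE":
                if _, err = tx.Exec(m.SQL, m.Args...); err != nil {
                    tx.Rollback()
                    return err
                }
            case "COMMIT":
                // postgres_pos is assumed to hold a single row with the last LSN.
                if _, err = tx.Exec("UPDATE postgres_pos SET lsn = ?", m.LSN); err != nil {
                    tx.Rollback()
                    return err
                }
                if err = tx.Commit(); err != nil {
                    return err
                }
            }
        }
        return nil
    }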
Only per feature+per user. (Though 1000s per second does seem high unless your scale is gigantic.)
> What happens if there's a multi-statement transaction with a bunch of writes sent off to the mothership - which then get returned to the client via logical replication, but then there's a ROLLBACK
Nothing makes it into the replication stream until it is committed.
Honest question: why is SQLite needed for local? Why would you not have PG at the edge that replicates data with a central PG? That way the SQL dialect problem you mentioned wouldn't exist.
That is a much safer way to go for most use cases. Well actually, most use cases don't need edge compute at all, but for those that do, this setup is indeed common, and fine for most apps:
- Say we do edge compute in San Francisco, Montreal, London and Singapore
- Set up a PG master in one place (like San Francisco), and read replicas in every place (San Francisco, Montreal, London and Singapore)
- Have your app query the read replica when possible, only going to the master for writes
In rare cases - maybe any network latency is not OK and you really need an embedded DB for ultimate read performance - this is pretty interesting. But a backend server truly needing an embedded DB is certainly a rare case. I would imagine this approach would come with some very major downsides, like having to replicate the entire DB to each app instance, as well as the inherent complexity/sketchiness of this setup, when you generally want your DB layer to be rock solid.
This is probably upvoted so high on HN because it's pretty cool/wild, and HN loves SQLite, vs. it being something many people should use.
SQLite is much smaller and more self-contained than Postgres. It's written in ANSI C, and by including one file you have access to a database (which is stored in another single file). It's popular in embedded systems and, I imagine, edge devices.
A simple version of this is to do a very cheap SELECT * [where tenant = ...] of your feature flag table(s) into a dictionary structure in memory on every single edge/application server, and to do this atomically every few seconds or minutes.
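A minimal sketch of that poll-and-swap loop, assuming a hypothetical flags(name, enabled) table; real systems would load rules rather than plain booleans, but the shape is the same:

    // Reload the whole table periodically and swap the in-memory map in one
    // atomic step; readers never block and never see a half-built map.
    package main

    import (
        "database/sql"
        "sync/atomic"
        "time"

        _ "github.com/lib/pq"
    )

    var current atomic.Pointer[map[string]bool]

    func refresh(db *sql.DB) error {
        rows, err := db.Query("SELECT name, enabled FROM flags")
        if err != nil {
            return err
        }
        defer rows.Close()
        next := make(map[string]bool)
        for rows.Next() {
            var name string
            var enabled bool
            if err := rows.Scan(&name, &enabled); err != nil {
                return err
            }
            next[name] = enabled
        }
        current.Store(&next) // in-flight readers keep the old map
        return rows.Err()
    }

    func isEnabled(name string) bool {
        m := current.Load()
        return m != nil && (*m)[name]
    }

    func poll(db *sql.DB, every time.Duration) {
        for range time.Tick(every) {
            refresh(db) // on transient errors, keep serving the last good copy
        }
    }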
Statsig [0] and Transifex [1] both use this pattern to great effect, transmitting not only data but logic on permissions and liveness, and you can roll your own versions of all this for your own domain models.
I'm of the opinion that every agile project should start with a system like this; it opens up entirely new avenues of real-time configuration deployment to satisfy in-the-moment business/editorial needs, while providing breathing room to the development team to ensure codebase stability.
(As long as all you need is eventual consistency, of course, and are fine with these structures changing in the midst of a request or long-running operation, and are fine with not being able to read your writes if you ever change these values! If any of that sounds necessary, you'll need some notion of distributed consensus.)
[0] https://docs.statsig.com/server/introduction
[1] https://developers.transifex.com/docs/native
Does it though? If it’s a proxy it can support the SQLite read and the Postgres write syntax. If reads only ever go to SQLite they don’t need to work on Postgres.
How many flags are we talking here? I implemented a similar system and we just replace the whole SQLite DB file by downloading it from the centralized storage whenever it changes.
Even with 1M flags it's still only a few hundred kB compressed.
I wouldn't replicate per user flags to the edge to keep size under control.
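That whole-file approach is pleasantly boring to implement; a sketch, assuming the freshly built database is served over HTTP from the central store (URL and paths are placeholders):

    // Download the new database next to the live one, then atomically rename it
    // into place. Connections that are already open keep reading the old file
    // until they reopen, so readers never see a half-written database.
    package main

    import (
        "io"
        "net/http"
        "os"
    )

    func replaceDB(url, livePath string) error {
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        tmp := livePath + ".tmp"
        f, err := os.Create(tmp)
        if err != nil {
            return err
        }
        if _, err := io.Copy(f, resp.Body); err != nil {
            f.Close()
            return err
        }
        if err := f.Close(); err != nil {
            return err
        }
        return os.Rename(tmp, livePath) // atomic on the same filesystem
    }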
It's nice to see another pg proxy using the pgx parser (their src[1]) - I built one using this lib too. However, this implementation is missing a lot of low-level features to be considered close to compatible, including: multi-query transactions, auth, TLS, extended query mode, query cancellation.
[1]: https://github.com/zknill/sqledge/blob/main/pkg/pgwire/postg...
I wonder, how does this handle a single transaction that contains both reads and writes? Maybe it just says "within a transaction, all reads and writes go through the Postgres proxy, SQLite is ignored"?
One use case I can see this being valuable for is a client-based application with Postgres as the centralized database. The client would just query SQLite and not need to write Postgres SQL.
This is pretty neat! One question, if all your queries have to be SQLite-compatible, doesn't that defeat the purpose of using PG in the first place? Maybe SQLite supports more PG features than I thought, but if for example your app uses pgvector or pgcrypto you might have issues here.
Yes, absolutely, and this is going to be one of the hardest tech challenges to solve. I've thought a little about it, and it's probably unrealistic to think that we can translate every single PG statement into a SQLite one, especially when PG has extensions. So we're probably destined to use the local SQLite database for queries we can parse and understand, and forward all the others (both reads and writes) to the upstream PG server.
This slightly breaks the model of having a local copy serve data faster, but if only a minority of queries use a format that we don't understand in SQLite then only that minority of queries will suffer from the full latency to the main PG server.
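A deliberately naive sketch of that fallback, assuming plain database/sql handles for both sides; a real implementation would parse first and only forward statements it knows SQLite can't run, rather than forwarding on any error:

    // Try the edge copy first; anything SQLite rejects pays the full latency to
    // the upstream Postgres server.
    package main

    import (
        "database/sql"

        _ "github.com/lib/pq"
        _ "github.com/mattn/go-sqlite3"
    )

    func query(local, upstream *sql.DB, q string, args ...any) (*sql.Rows, error) {
        rows, err := local.Query(q, args...)
        if err == nil {
            return rows, nil // served locally, no network round trip
        }
        return upstream.Query(q, args...)
    }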
> doesn't that defeat the purpose of using PG in the first place
PG can scale higher than SQLite, especially considering concurrent writers. So even without the PG syntax and extensions, it's still useful. Also, maybe you can use PG syntax for complex INSERT (SELECT)s?
I'm super excited for this -- it seems like it's perfect as an app-local cache of things that can be a drop-in replacement for some high-cost queries.
Are there any plans to support which tables get copied over? The main postgres database is too big to replicate everywhere, but some key "summary" tables would be really nice to have locally.
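On the Postgres side, logical replication already lets you scope what gets published, so at least the upstream half of this is a one-liner; whether SQLedge lets you point it at a specific publication is a question for the author. Table and publication names below are invented:

    // Publish only a couple of summary tables on the logical replication stream.
    // Postgres 15 can additionally filter rows and columns per table.
    package main

    import (
        "database/sql"

        _ "github.com/lib/pq"
    )

    func createSummaryPublication(db *sql.DB) error {
        _, err := db.Exec(
            `CREATE PUBLICATION edge_summaries FOR TABLE daily_summary, account_totals`)
        return err
    }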
How "edgy" can real PostgreSQL be? Seems to me that this is all in lieu of using real PostgreSQL replication, on the basis that real Postgres is too heavy / complex to run on the edge. Can a true PostgreSQL replica be configured in a light weight way to serve a similar purpose?
I have the same question. There have been demos of local Postgres-on-web using wasm, which would not solve this issue (browser is way heavier than SQLite), but maybe it demonstrates how portable Postgres can be with some effort.
Does anyone know of a tool that will export a Postgres database to a SQLite database file? Seems like a handy way of exporting and passing around smallish DBs. I feel like this tool must exist, but I haven’t found it yet. (Supporting imports and data transformations would be even better!)
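Not an off-the-shelf answer, but for smallish databases a hand-rolled copy is short. A sketch that copies one table's rows verbatim, assuming the SQLite table already exists with a matching shape (exotic Postgres types would need explicit conversion):

    // Copy every row of one table from Postgres into SQLite. The table name is
    // assumed to be trusted; column names are taken from the source query.
    package main

    import (
        "database/sql"
        "fmt"
        "strings"

        _ "github.com/lib/pq"
        _ "github.com/mattn/go-sqlite3"
    )

    func copyTable(pg, lite *sql.DB, table string) error {
        rows, err := pg.Query(fmt.Sprintf("SELECT * FROM %s", table))
        if err != nil {
            return err
        }
        defer rows.Close()

        cols, err := rows.Columns()
        if err != nil {
            return err
        }
        placeholders := strings.TrimRight(strings.Repeat("?,", len(cols)), ",")
        insert := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
            table, strings.Join(cols, ", "), placeholders)

        vals := make([]any, len(cols))
        ptrs := make([]any, len(cols))
        for i := range vals {
            ptrs[i] = &vals[i]
        }
        for rows.Next() {
            if err := rows.Scan(ptrs...); err != nil {
                return err
            }
            if _, err := lite.Exec(insert, vals...); err != nil {
                return err
            }
        }
        return rows.Err()
    }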
We considered tailing binlogs directly but there's so much cruft and complexity involved trying to translate between types and such at that end, once you even just get past properly parsing the binlogs and maintaining the replication connection. Then you have to deal with schema management across both systems too. Similar sets of problems using PostgreSQL as a source of truth.
In the end we decided just to wrap the whole thing up and abstract away the schema with a common set of types and a limited set of read APIs. Biggest missing piece I regret not getting in was support for secondary indexes.
If you're asking about secondary indexes, it was just seen as a "later" feature we'd implement as a follow-up. It was definitely asked for, just never prioritized before I moved off the project.
> May I ask why the flags are checked that frequently? Couldn't they be cached for at least a minute?
Not the previous poster, but it appears that in this scenario the SQLite database is the cache.
> The writes via SQLedge are sync, that is we wait for the write to be processed on the upstream Postgres server
OK, so, it's a SQLite read replica of a Postgres primary DB.
Of course, this does mean that it's possible for clients to fail the read-your-writes consistency check.
You can see an example of it in use here: https://github.com/simonw/simonwillisonblog-backup/blob/main...