Rearchitecting: Redis to SQLite

I’m really interested in this model where each application server has a copy of a SQLite database file which is then replaced on a scheduled basis.

Here it’s being used for web application firewall rules.

Another place I’ve thought about using this is feature flag configuration. Feature flags can be checked dozens of times per request and often need the kind of queries (user is a member of group A and has an IP located in country B) which could be well served by a local SQLite - and feature flags have a tolerance for updates taking a few seconds (or longer) to roll out.

supriyo-biswas · a year ago

> I’m really interested in this model where each application server has a copy of a SQLite database file which is then replaced on a scheduled basis.

BTW, this is also the model used by all CDNs, where the global configuration file containing the certificates, HTTP routing rules etc. for all customers will be updated into into a single-file b-tree structure*, and that "bundle" is distributed among all edge locations frequently.

* I'm yet to see someone use sqlite for this purpose, it's usually DBM style databases like LMDB or Kyoto Cabinet.

twic · a year ago

> Kyoto Cabinet

Now, that's a name I've not heard in a long time.

Are people still using Kyoto Cabinet in new projects? Are people still using DBM-style storage generally? I thought that whole branch of the evolutionary tree had sort of died out.

redserk · a year ago

I’ve worked on a project a long time ago where we did this with BerkeleyDB files. BDB was used to store configuration data that was frequently looked up.

Periodically we would run a full sync to replace the database. Between the periodic full syncs, we had a background process keep changes applied on a rolling basis.

All-in-all, it worked pretty well at the time! The full database file sync ensured a bad database was timeboxed and we got a bootstrapping mechanism for free.

akira2501 · a year ago

The query engine in SQLite can be weak. In particular where JOINs across multiple columns are concerned. You really do need to be aware of this when designing indexes into SQLite files.

In any case, SQLite would serve this solution, but almost certainly with a measurable level of inefficiency built in.

Capricorn2481 · a year ago

> will be updated into into a single-file b-tree structure

I'm not knowledgeable on this, but my understanding was a b-tree is a way of sorting values that could be ordered in a certain way. Like this would be a b-tree of IDs

```

            [8]

           /   \

      [3, 5]   [10, 12]

     / | \     / | \  

  [1] [4] [6,7] [9] [11, 13]

```

You traverse by comparing your needle to the root node and going left or right depending on the results.

How is that done with configuration options? That seems like it would just be a regular hashmap which is already efficient to read. What would a b-tree of key/values even look like that wouldn't be less efficient than a hashmap?

d0mine · a year ago

you can use sqlite as a dbm db https://docs.python.org/3.13/library/dbm.html#module-dbm.sql...

jitl · a year ago

We used this model to distribute translations, feature flags, configuration, search indexes, etc at Airbnb. But instead of SQLite we used Sparkey, a KV file format developed by Spotify. In early years there was a Cron job on every box that pulled that service’s thingies; then once we switched to Kubernetes we used a deamomset & host tagging (taints?) to pull a variety of thingies to each host and then ensure the services that use the thingies only ran on the hosts that had the thingies.

In Ruby we called this “hammerspace” https://github.com/airbnb/hammerspace

otterley · a year ago

In Kubernetes, pod affinities (not taints) are the typical and preferred mechanism used to ensure pods are scheduled on the same node as their dependencies.

michaelbuckbee · a year ago

1. Very cool

2. If you were making it today would you just use SQLite?

quesera · a year ago

> Feature flags can be checked dozens of times per request

My strategy for resolving this is to fetch the flag value once, but to store it in the request object, so that a) you never have to take the expensive lookup hit more than once per request, and b) there's no risk of an inconsistent value if the flag is updated mid-request.

jitl · a year ago

Where is the “session object” stored?

CraigJPerry · a year ago

What’s the use case for re-checking the same feature flag in a single session?

I can see why you need to check multiple different flags in a session and I understand the parent point about looking in SQLite for them (effectively a function call into a library in process address space rather than a call over the network for each flag).

zellyn · a year ago

Having now spent enough years thinking about feature flags during an extended migration (from our own internal system to LaunchDarkly), I've become convinced that a really useful primitive would be:

* SQLite, but every change you make increments a global version counter.

* Some way to retrieve and/or apply just the changes between version counter numbers.

Then, you could layer all manner of distribution mechanisms on top of that. You could imagine gossip networks, an explicit tree-shaped cache/distribution network, etc. Anyone who has a later version than you would be able to give you the update.

What would this get you?

* You can bundle the current version of a DB into your app artifact, but efficiently update it once your app runs. This would let you fall back to much better defaults than having no data at all. (eg. For feature flags or live config, this would let you fall back to a recent state rather than code-level defaults.)

* Any kind of client can send the global counter as an ETAG and get just the delta

* Reconnections if the network blips are really simple and cheap

* If the system also let you keep a couple of minutes of history, you could annotate the counter onto a microservice call (using W3C Baggage headers, for example), and evaluate your flags/config/data/whatever at a single version number across multiple microservices. Even without evaluate-at-time functionality, logging the generation number would help a lot when debugging what happened after the fact

nbbaier · a year ago

Really neat ideas, anywhere something like this has been implemented?

michaelbuckbee · a year ago

SQLite for distribution is neat. FWIW - this is at least partially inspired by your datasette project which we may still try and do something with later on the reporting and data exploration side of things.

jasonwatkinspdx · a year ago

Years ago I had a conversation with a friend of a friend that worked at one of the big chip fabs in their lab dept. He mentioned they made very heavy use of sqlite as a file format for distribution. This was back in the "xml all the things" era and it struck me as such a smart breath of fresh air.

I'm honestly surprised it isn't more pervasive.

nnf · a year ago

I've wanted to implement this on a distributed web server environment I manage. Right now there's a centralized MySQL database that the web servers read from when rendering a web page, but there can be lots of queries for a single render (page, sections, snippets, attributes, assets, etc.), and sending that all over the wire, while fast, is slower than reading from a database running on the same host. It'd be great to be able to copy the "master" database onto each web server instance, maybe once per minute, or just on-demand when a change to the data is made. I imagine this would make reads much faster.

pkhuong · a year ago

That's how https://github.com/backtrace-labs/verneuil 's read replication is meant to be used. There's a command-line tool to recreate a sqlite DB file from a snapshot's manifest, with an optional local cache to avoid fetching unchanged pages, or you can directly use a replica in memory, with pragmas for (async) refreshes.

The write tracking needs to intercept all writes with a custom VFS, but once registered and configured, it's regular in-memory SQLite (no additional daemon).

otoolep · a year ago

rqlite[1] could basically do this, if you use read-only nodes[2]. But it's not quite a drop-in replacement for SQLite at the write-side. But from point of view of a clients at the edge, they see a SQLite database being updated which they can directly read[3].

That said, it may not be practical to have hundreds of read-only nodes, but for moderate-size needs, should work fine.

Disclaimer: I'm the creator of rqlite.

[1] https://rqlite.io/

[2] https://rqlite.io/docs/clustering/read-only-nodes/

[3] https://rqlite.io/docs/guides/direct-access/

michaelbuckbee · a year ago

(author here) I had looked at both Rqlite and the different commercial versions of this, but I didn't pursue them as they all seemed to require running an additional service on the host machines.

antman · a year ago

What would be the challenge with hundreds of read nodes?

adroitboss · a year ago

You may want to check out LiteFS and LiteStream by benbjohnson. There was a time in 2022 where he was trending every week for his projects. The following snippet is taken from the LiteFS webpage. "LiteFS is a distributed file system that transparently replicates SQLite databases. You can run your application like it’s running against a local on-disk SQLite database but behind the scenes the database is replicated to all the nodes in your cluster."

faangguyindia · a year ago

Why sqlite but not something like Rocksdb which is optimised for NVMe.

While SQLite can be used on SSDs, it is not as heavily optimized for SSDs as RocksDB. SQLite is a general-purpose embedded database and was primarily designed for use on devices like mobile phones or lightweight embedded systems, where storage is often slower and smaller (e.g., flash storage or basic SSDs).

SQLite’s traditional B-tree structure involves more random I/O compared to RocksDB's LSM tree, which is less ideal for maximizing the performance of high-speed SSDs like NVMe.

SQLite’s limited concurrency (single-writer, multiple-readers) also means it cannot fully utilize the parallelism and high throughput potential of NVMe SSDs.

otterley · a year ago

This is a high read, low write application. And SSDs offer very fast random I/O performance. SQLite is more than sufficient and the entire database may well fit into the page cache anyway.

Salgat · a year ago

We use something similar for our event based databases. We project millions of events into an in-memory object (usually a dictionary), and periodically persist that state as json in S3. It's guaranteed to be consistent across all service instances because the persisted state contains the event stream positions to catchup from. The only drawback of course is that it can use up to several GB of memory depending on how big the state is, but it's extremely fast and doing in-memory operations is trivial compared to using SQL or a library with api calls.

atomicnumber3 · a year ago

In a past life, I used this pattern in hadoop mapreduce clusters. A job would do "small-data" sql queries locally to pull configs, maps of facts related to the business domain, then pack them into sqlite db files and ship them up with the job. Hadoop already has a process called "job localization" where it can download files sent up with the job down to the PWD of the mapper/reducer. And then the mapper/reducer can use it read-only while doing big-data things.

bob1029 · a year ago

> a SQLite database file which is then replaced on a scheduled basis.

You could look into WAL replication if you wanted an efficient way to update the copies. Something like Litestream.

er0k · a year ago

litestream is great. Dead simple and rock solid IME.

TmpstsTrrctta · a year ago

I’ve utilized this in Lambda@Edge for use case half feature flag-y, half HTTP routing/proxy serving rules as mentioned in a sibling comment. Lambdas pick up ~50-200MB of data on first boot, and keep their copy through their lifecycle.

As requests come in, gather their features, convert to effectively an int vector, filter for row where match and serve request

closeparen · a year ago

This is the type of architecture we use for feature flagging, but it's just a JSON file.

chipdart · a year ago

> Feature flags can be checked dozens of times per request and often need the kind of queries (user is a member of group A and has an IP located in country B) which could be well served by a local SQLite - and feature flags have a tolerance for updates taking a few seconds (or longer) to roll out.

This doesn't sound right. A feature flag only requires checking if a request comes from a user that is in a specific feature group. This is a single key:value check.

The business logic lies in assigning a user to a specific feature group, which the simplest way means pre assigning the user and in the most complex cases takes place at app start/first request to dynamically control dialups.

Either way, it's a single key: value check where the key is user ID+feature ID, or session ID + feature ID.

I mean, I guess you can send a boat load of data to perform the same complex query over and over again. I suppose. But you need to not have invested any thought onto the issue and insisted in making things very hard for you, QAs, and users too. I mean, read your own description: why are you making the exact same complex query over and over and over again, multiple times in the same request? At most, do it once, cache the result, and from therein just do a key:value check. You can use sqlite for that if you'd like.

simonw · a year ago

I've worked at places where the feature flag system was much more dynamic than that, considering way more than just membership in a group.

This meant you could roll features out to:

- Specific user IDs

- Every user ID in a specific group

- Every object owned by a specific user ID (feature flags might apply to nested objects in the system)

- Requests from IP addresses in certain countries

- Requests served by specific website TLDs

- Users who are paid members of a specific plan

- etc etc etc

It was an enormously complicated system, that had evolved over 5-10 years.

Not saying that level of complexity is recommended, but that's what we had.

Looks like I gave a talk about this back in 2014: https://speakerdeck.com/simon/feature-flags

nnf · a year ago

GP's comment is talking about checking multiple feature flags, not checking a single feature flag multiple times.

>While Redis is "fast" in comparison to traditional RDBMS, it's still a database that you have to manage connections, memory, processes, etc., which introduces more brittleness into the stack (the opposite of what we're trying to achieve).

Every database, Relational or Nonrelational, requires approximately the same level of management and maintenance when you start dealing with non-toy levels of transactions.

The "Fast" part is a little funny. If you don't care about joins, then row inserts and retrievals are pretty damn fast too =)

chipdart · a year ago

> The "Fast" part is a little funny. If you don't care about joins, then row inserts and retrievals are pretty damn fast too =)

What makes SQLite exceptionally fast in a server environment is that you do not require a network call to do the query or even retrieve the data. Your remarks about joins and transactions are meaningless once you understand you're just reading stuff from your very own local HD, which is already orders of magnitude faster.

vundercind · a year ago

You can connect to Redis over a local socket, no network overhead.

Modern cloud architecture hates local sockets and wants to push everything to the network, but if you care about performance and are using Redis, that’s how you’d favor deploying & connecting to it.

gwbas1c · a year ago

SQLite has its vacuum operation, which is kind-of like running a garbage collection. Every time I read the docs about when to run a vacuum, I end up confused.

The last time I shipped an application on SQLite, I ended up just using a counter and vacuuming after a large number of write operations.

prirun · a year ago

HashBackup author here, been using SQLite for about 15 years.

Doing a vacuum after a large number of deletes might make sense. The only real purpose of vacuum IMO is to recover free space from a database. Vacuum may also optimize certain access patterns for a short while, though I have never tested this, and it would be highly dependent on the queries used. If fragmentation is a bigger concern for you than recovering free space, you can also compute the fragmentation to decide whether to vacuum by using the dbstat table:

https://www.sqlite.org/dbstat.html

Then again, computing this will require accessing most of the database pages I'm guessing, so might take nearly as long as a vacuum. The other gotcha here is that just because db pages appear to be sequential in a file doesn't mean they are sequential on a physical drive, though filesystems do strive for that.

SQLite has pragma commands to tell you the number of total and free db pages. When the percentage of free pages is greater than x% and it's a convenient time, do a vacuum. For a highly volatile db, you can add a table containing this percentage, update it every day, and make your decision based on an average, but IMO it's easier just to check for more than 50% free (or whatever) and do the vacuum.

Vacuums used to be (circa 2019) pretty slow operations, but the SQLite team has sped them up greatly since then. Vacuuming a 3GB SQLite db on a SSD takes less than a minute these days. That's with the db 100% full; with only 50% used pages, it would be considerably faster.

Vacuums are done in a statement transaction, so you don't have to worry about a "half vacuum that runs out of disk space" screwing up your database.

zvr · a year ago

> Every database, Relational or Nonrelational, requires approximately the same level of management and maintenance

I disagree with this statement. Surely there is a difference on the effort for "management and maintenance" if your database requires the operation of independent server processes.

Going to extreme examples, do you really believe it makes no difference whether you use SQLite or Oracle?

Deleted Comment