Orama is definitely a hidden gem, and using it for complementary indexing is clever!
Also agreed that Triplit's DX is excellent. I'd recommend giving it another look: Triplit's recent 1.0 release brings up to a 10x performance boost (https://www.triplit.dev/blog/triplit-1.0).
Since your use case is data in the range of gigabytes, you could consider using duckdb-wasm. However, I'm not sure how to best integrate this with collaboration / CRDTs (sqlRooms is also interesting prior art).
We have hundreds of thousands of entities in Replicache, and index them via Orama. We're able to perform full-text search in single-digit ms.
We persist the Orama index as JSON, so computation only happens once per mutation.
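Roughly, assuming Orama's standard data-persistence plugin (our real setup has more plumbing around Replicache subscriptions):

    import { create, insert, search } from '@orama/orama'
    import { persist, restore } from '@orama/plugin-data-persistence'

    // Build the complementary full-text index over entities held in Replicache.
    const index = await create({
      schema: { id: 'string', title: 'string', body: 'string' },
    })
    await insert(index, { id: 't1', title: 'Q3 roadmap', body: 'ship the sync engine' })

    // After each mutation, serialize once and stash the JSON snapshot
    // (e.g. alongside the Replicache data) instead of re-indexing on boot.
    const snapshot = await persist(index, 'json')

    // On next boot: restore and search in single-digit ms.
    const revived = await restore('json', snapshot)
    const hits = await search(revived, { term: 'roadmap' })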
But does Replicache work for your native targets? Or are you okay with a different data layer for native (SQLite) vs web (a boutique data model on top of IndexedDB)? At the start of the article it sounds like the goal is to use the same abstraction across web and native mobile, and that solutions which bifurcate the implementation are unacceptable, yet we end up preferring a solution that differs between the web and native targets.
Zero (and I believe Replicache as well) layer their own SQL-like semantics on top of an arbitrary KV store, much like the SQLite-over-IndexedDB layering discussed in the post; like SQLite-over-IndexedDB, I believe they are storing binary byte pages in the underlying KV store and each page contains data for one or more Replicache/Zero records. The big difference between SQLite-over-IndexedDB and Zero-over-IndexedDB is that Zero is written with sympathy for IndexedDB's performance characteristics, whereas SQLite is written with sympathy for conventional filesystem performance.
On the subject of "keep whole thing in memory": this is what Zero does for its instant performance, and why they suggest limiting your working set / data desired at app boot to ~40MB, although I can't find a reference for this. Zero is smart, though, and will pick the 40MB for you. Hopefully the Zero folks will come by and correct me if I'm wrong.
> Zero (and I believe Replicache as well) layer their own SQL-like semantics on top of an arbitrary KV store, much like the SQLite-over-IndexedDB layering discussed in the post
Replicache exposes only a KV interface. Zero does expose a SQL-like interface.
> I believe they are storing binary byte pages in the underlying KV store and each page contains data for one or more Replicache/Zero records.
The pages are JSON values, not binary encoded, but that's an impl detail. At the big-picture level, you're right that both Replicache and Zero aggregate many values into pages that are stored in IDB (or SQLite in React Native).
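Illustratively (not our actual code, just the aggregation idea):

    // Many small records share one IDB "page" entry, so reads and writes
    // amortize IDB's per-operation overhead.
    type Page = { records: Record<string, unknown> }

    interface BlockStore {
      get(pageKey: string): Promise<Page | undefined>
      put(pageKey: string, page: Page): Promise<void>
    }

    // A real implementation maps key ranges to pages with a B-tree-ish
    // structure; a fixed hash stands in for that here.
    const pageKeyOf = (recordKey: string, numPages = 64) =>
      `page-${recordKey.charCodeAt(0) % numPages}`

    async function readRecord(store: BlockStore, recordKey: string) {
      const page = await store.get(pageKeyOf(recordKey))
      return page?.records[recordKey] // one IDB round-trip serves many records
    }

    async function writeRecord(store: BlockStore, recordKey: string, value: unknown) {
      const page = (await store.get(pageKeyOf(recordKey))) ?? { records: {} }
      page.records[recordKey] = value
      await store.put(pageKeyOf(recordKey), page) // rewrite the whole page
    }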
> On the subject of "keep whole thing in memory": this is what Zero does for its instant performance, and why they suggest limiting your working set / data desired at app boot to ~40MB, although I can't find a reference for this. Zero is smart, though, and will pick the 40MB for you. Hopefully the Zero folks will come by and correct me if I'm wrong.
Replicache and Zero are a bit different here. Replicache keeps only up to 64MB in memory. It uses an LRU cache to manage this. The rest is paged in and out of IDB.
This ended up being a really big perf cliff because bigger applications would thrash against this limit.
In Zero, we just keep the entire client datastore in memory. Basically we use IDB/SQLite as a backup/restore target. We don't page in and out of it.
This might sound worse, but the difference is Zero's query-driven sync. Queries automatically fall back to the server and sync. So the whole model is different. You don't sync everything, you just sync what you need. From some upcoming docs:
https://i.imgur.com/y91qFrx.png
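And in code, the model is roughly this (simplified and hand-waving the details; see the Zero docs for the real ZQL API):

    import { Zero } from '@rocicorp/zero'

    declare const schema: any   // your generated Zero schema
    declare const userID: string

    const z = new Zero({ userID, server: 'https://zero.example.com', schema })

    // Registering a query reads locally *and* tells the server to keep this
    // slice synced; data outside active queries never has to be on the client.
    const view = z.query.issue
      .where('status', '=', 'open')
      .limit(100)
      .materialize()

    view.addListener(issues => console.log(issues.length, 'open issues'))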
Yes, Replicache works beautifully on our mobile/native targets.
The constructor allows you to pass in any arbitrary KVStore provider, and we happen to use op-sqlite as its performance is exceptional.
There is no "different data layer" per se, just a different storage mechanism.
Replicache also holds an in-memory cache that is limited to ~50MB, if I recall correctly. Our use case is extremely data-heavy, so we might end up never migrating to Zero – who knows.
Perhaps I misunderstood your question, let me know if I can clarify further.
I’m glad this post made its way to HN, since it allowed me to reflect on how much progress we’ve made since it was written. The blog post doesn’t support comments, so here goes:
> The DX is the worst by quite a margin.
DX is something that we care a lot about. Like it’s a topic in every single planning and strategy session. We’ve made a ton of progress since this was posted, and we have a lot more on this front on our roadmap.
> Not only does it require Postgres-level integration
If you’re looking to stream data from your Postgres database to clients, I’m not aware of any other way to do this but to integrate with Postgres. So I’m not sure why this is framed in a negative light. PowerSync also only requires a user with SELECT privileges, so it’s not an invasive integration.
> it also needs a HA MongoDB cluster
It’s now possible to use Postgres for intermediary storage instead of MongoDB. I actually recall we gave you a shout out in one of our product updates on this :)
> a lot of arcane yaml configuration
We’ve since published a schema for the yaml config - add this to powersync.yaml:
# yaml-language-server: $schema=https://unpkg.com/@powersync/service-schema@latest/json-sche...
> It also required us to completely denormalise our Postgres tables, as relations are not properly supported across the sync buckets.
This is ultimately a function of our decision to build a system that scales well. Having said that, we have plans to address this limitation this year.
> we found horrifying problems like off-by-one bugs in their Drizzle ORM integration
This was fixed around the time of the post, and our Drizzle integration is now in beta, i.e. fully supported in production environments.
> queries returning data from local db very slowly (100ms+),
It’s hard to say what was causing this without knowing more, but as mentioned elsewhere in the comments here we’ve since added support for OPFS which provides a big performance boost.
> and long initialisation times (45s+ after login to Marco) with the UI view not updating until the sync fully completed.
We’ve added APIs to solve this - if you want to sync everything up front, you can use sync priorities to log the user in quickly and sync the rest in the background [1]. We’ve also implemented many optimizations, like transport-level compression, a waaaay faster Rust sync client, etc., with more to come. Also, using sync streams (available in the next few weeks) [2], you can sync data just in time as the user navigates around the app instead of syncing everything up front.
> No real case studies
It’s often quite a slow process to get case studies published with large companies, but you can see logos of customers using PowerSync on our website. But yes.
[1] https://docs.powersync.com/usage/use-case-examples/prioritiz... [2] https://github.com/powersync-ja/powersync-service/discussion...
The punchline of this article is that all the implementations they tried (WatermelonDB, PowerSync, ElectricSQL, Triplit, InstantDB, Convex) are built on top of IndexedDB.
"The root cause is that all of these offline-first tools for web are essentially hacks. PowerSync itself is WASM SQLite... On top of IndexedDB."
But there's a new web storage API in town, Origin Private File System. https://developer.mozilla.org/en-US/docs/Web/API/File_System... "It provides access to a special kind of file that is highly optimized for performance and offers in-place write access to its content."
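The headline feature is synchronous, in-place file access from a worker - exactly the shape a SQLite VFS wants. A quick sketch (these are the real OPFS APIs; createSyncAccessHandle is worker-only):

    // Runs inside a Web Worker.
    const root = await navigator.storage.getDirectory()
    const handle = await root.getFileHandle('app.db', { create: true })
    const file = await handle.createSyncAccessHandle()

    // Synchronous reads/writes at byte offsets, no copy-the-whole-file dance:
    const page = new Uint8Array(4096)
    file.read(page, { at: 0 })    // read page 1 in place
    page.set([0xde, 0xad], 0)
    file.write(page, { at: 0 })   // overwrite it in place
    file.flush()
    file.close()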
OPFS reached Baseline "Newly Available" in March 2023; it will be "Widely Available" in September.
WASM SQLite on OPFS is, finally, not a hack, and is pretty much exactly what the author needed in the first place.
But notably, not directly atop. We build our own KV store that uses IDB just as block storage. So I sort of agree w/ you.
But if we were to build atop OPFS we'd also just be using it for block storage. So I'm not sure it's a win? It will be interesting to explore.
We do see about 10x the database row corruption rate with WASM OPFS SQLite compared to the same logic running against native SQLite. For the read-side cache use case this is recoverable and relatively benign, but we're not moving the write-side use case from IndexedDB to WASM-OPFS-SQLite until things look a bit better. Not to put the blame on SQLite here: ensuring proper locking and file semantics is a shared responsibility between the host application (e.g. Notion), the SQLite OPFS VFS authors, the user-agent authors, and the user's device.
Yeah, I did fail to mention OPFS in the blog post. It does look very promising, but we're not in a position to build on emerging tech – we need a battle-tested stack. Boring over exciting.
Not sure anything in the offline-first ecosystem qualifies as "boring" yet. You would need some high-profile successful examples that have been around for a few years to earn that title
I’m doing offline-first apps at work and want to emphasize that you’re constraining yourself a lot trying to do this.
As mentioned, everything fast(ish) is using SQLite under the hood. If you don’t already know, SQLite has a limited set of types and some funky defaults. How are you going to take this loosey-goosey typed data and store it in a backend database when you sync? What about foreign key constraints, etc. - can you live without those? Some of the sync solutions don’t support enforcing them on the client.
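To make that concrete (better-sqlite3 here, but the behavior is SQLite's, not the driver's):

    import Database from 'better-sqlite3'

    const db = new Database(':memory:')
    // "BOOLEAN" and "DATETIME" aren't real SQLite types; both columns just
    // get NUMERIC affinity and will store whatever you hand them.
    db.exec('CREATE TABLE tasks (done BOOLEAN, due DATETIME)')
    db.prepare('INSERT INTO tasks VALUES (?, ?)')
      .run(1, new Date().toISOString()) // no boolean/Date bindings exist

    const row = db.prepare('SELECT done, due FROM tasks').get()
    console.log(row) // { done: 1, due: '2025-...' } - a number and a string;
                     // your sync layer has to re-derive booleans/dates itself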
Also, the SQLite query planner isn’t great in my experience, even when you’re only joining on ids/indexes.
Document databases seem more friendly/natural, but as mentioned, IndexedDB is slow.
I wish this looked at https://rxdb.info/ more. They have some posts that led me to believe they have a good grasp of the issues in this space, at least.
Also, OPFS is a newish thing everyone is using to store SQLite directly, instead of wrapping IndexedDB, for better performance.
Notion is a very async collaborative application and we rely on a form of transactions. When you make a change in Notion like moving a bunch of blocks from one page to another, we compose the transaction client-side given the client's in-memory snapshot view of the universe, and send the transaction to the server. If the transaction turns out to violate some server-side validation (like a permissions issue), we reject the change as a unit and roll back the client.
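In pseudocode, the shape is something like this (hypothetical names, not our actual implementation):

    type Op = { table: string; id: string; before: unknown; after: unknown }
    type Txn = { id: string; ops: Op[] }

    interface LocalStore {
      apply(ops: Op[]): void   // write each `after`
      revert(ops: Op[]): void  // restore each `before`
    }

    async function submit(txn: Txn, store: LocalStore) {
      store.apply(txn.ops) // optimistic: UI reflects the move immediately
      const res = await fetch('/api/transactions', {
        method: 'POST',
        body: JSON.stringify(txn),
      })
      if (!res.ok) {
        store.revert(txn.ops) // server-side validation failed (e.g. permissions):
                              // the whole unit rolls back, no partial state
        throw new Error(`transaction rejected: ${res.status}`)
      }
    }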
I'm not sure how we'd do this kind of thing with RxDB. If we model it as a delete in one document and an insert into another document, we'd get data loss. Maybe they'd tell us our app shouldn't have that feature.
You could always ask your question in their Discord - I've always gotten prompt and helpful responses.
I am continually bewildered that no one ever gives RxDB, which has been around for many years longer than the rest of these tools, any love.
It has so many optimizations and features that the others don't. And it's even better when you use the premium addons. I compared it to pretty much everything, and it's not even close.
IndexedDB is a standard and can be implemented however the user-agent sees fit. Chromium source tree has an implementation on LevelDB and an implementation on SQLite; I'm not sure how they pick the appropriate backend. Firefox and WebKit both appear to use SQLite as the backend.
WebSQL was a clunky API, but not as clunky as IndexedDB, which is truly yucky and very easy to get wrong in modern apps that use promises.
It would be nice to have WebSQL though, even if it has to be spec'd as "it's sqlite".
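To make the "easy to get wrong" part concrete: IndexedDB transactions auto-commit as soon as you await anything that isn't an IDB request, so innocent-looking code breaks.

    declare function openDB(): Promise<IDBDatabase> // some promisified open()

    const db = await openDB()
    const tx = db.transaction('notes', 'readwrite')
    const notes = tx.objectStore('notes')

    notes.get('draft').onsuccess = async (e) => {
      const draft = (e.target as IDBRequest).result
      await fetch('/api/validate', { method: 'POST', body: JSON.stringify(draft) })
      // The transaction committed while we awaited the fetch; this now
      // throws "TransactionInactiveError":
      notes.put({ ...draft, validated: true }, 'draft')
    }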
wa-sqlite on top of OPFS is actually pretty great these days. Performance is about half of what I'd get in native SQLite, which is not too bad overall. It's around 10x faster than SQLite on top of IndexedDB for large databases in my experience.
It's much better than WebSQL could ever be. You get the full power of modern SQLite, with the version, compile options, additional extensions, all under your control.
Triplit and Orama are definitely often-overlooked hidden gems.
Since the post is already a few months old, it's worth mentioning that the newly released Triplit 1.0 has had a massive performance update (up to 10x). You should definitely reconsider it for larger-scale data projects, and the team is highly knowledgeable. https://www.triplit.dev/blog/triplit-1.0
I struggled with this landscape a few years ago when building Mere Medical to manage my own medical records. To be fair, I was aiming for not just offline-first but offline-only (user data was exclusively stored on device, not on any server). I got surprisingly far with RxDB, but it definitely felt like I was pushing these tools and the web platform to their limits.
There’s just an assumption that these client databases don’t need mature tooling and migration strategies, as “it’s just a web client, you can always just re-sync with a server”. Few client dbs felt mature enough to warrant building my entire app on, as they’re not the easiest to migrate off of.
I also tried LokiJS, which is mentioned in the OP. I even forked it (and renamed it SylvieJS, lol) to rewrite it in TS and update some of the adapters. I ultimately moved away from it as well. I found an in-memory db will struggle past a few hundred MB, which I hit pretty quickly.
No matter what db you use, you’re realistically using IndexedDB under the hood. What surprised me was that a query to IndexedDB can be slower than a network call. Like, what.
On midrange-and-below Android devices, literally any local persisted data access can be slower than a network call. Even a point read from a small SQLite b-tree can be coming off a $3 microSD card and a CPU equivalent to a 10-year-old iPhone. https://infrequently.org/2024/01/performance-inequality-gap-...
Nice post! I'm building an offline-first collaboration app and went down the route of building a custom sync engine, mainly because the app is open-source and I didn't want to introduce any dependency. I've implemented a simple cursor-based sync with Postgres on the server and SQLite on the client.
Initially I built only a desktop client, because I didn't like IndexedDB. After the app got onto HN, someone recommended checking out OPFS (Origin Private File System).
Now we have a fully offline-first web app using SQLite on top of OPFS. We haven't tested it at large scale yet, but so far it looks very promising. The good thing is that we use Kysely as an abstraction for performing queries against SQLite, which helps us share most of the code across both platforms (Electron + web) with some minor abstractions.
You can check the implementation on GitHub: https://github.com/colanode/colanode
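The sharing boils down to Kysely's dialect seam. A sketch (schema/table names are placeholders; the web dialect stands in for whichever wa-sqlite/OPFS wrapper you use, since Kysely itself only ships the node SqliteDialect):

    import { Kysely, SqliteDialect, type Dialect } from 'kysely'
    import BetterSqlite3 from 'better-sqlite3' // Electron/node side

    // Stand-in for a Dialect wrapping a wa-sqlite/OPFS driver on the web.
    declare function createOpfsDialect(): Dialect

    interface Schema {
      nodes: { id: string; parent_id: string | null; content: string }
    }

    function createDb(platform: 'electron' | 'web') {
      const dialect =
        platform === 'electron'
          ? new SqliteDialect({ database: new BetterSqlite3('app.db') })
          : createOpfsDialect()
      return new Kysely<Schema>({ dialect })
    }

    // Query code is written once and runs on both targets:
    const children = await createDb('web')
      .selectFrom('nodes')
      .selectAll()
      .where('parent_id', '=', 'root')
      .execute()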
Depending on your data model, LiveStore is a completely open-source, SQLite-based approach for local-first, sync-y apps: https://livestore.dev/
It's oriented around event sourcing and syncs the events, which get materialized into local table views on clients. It's got pretty slick devtools too.
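The event-sourcing shape, roughly (not LiveStore's actual API, just the idea):

    type Event =
      | { type: 'todoAdded'; id: string; text: string }
      | { type: 'todoCompleted'; id: string }

    interface SqliteLike {
      run(sql: string, params: unknown[]): void
    }

    // The synced artifact is the append-only event log; local SQLite tables
    // are just a materialized view you can rebuild by replaying it.
    function materialize(events: Event[], db: SqliteLike) {
      for (const e of events) {
        switch (e.type) {
          case 'todoAdded':
            db.run('INSERT INTO todos (id, text, done) VALUES (?, ?, 0)', [e.id, e.text])
            break
          case 'todoCompleted':
            db.run('UPDATE todos SET done = 1 WHERE id = ?', [e.id])
            break
        }
      }
    }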
I did look into it back then, but it wasn't very convenient for my use case. Apart from the data model, I wanted to use Yjs for conflict resolution and wanted more direct control over the sync.
p.s. Just wanted to say thank you for all the contributions you make here on HN. Colanode (the app I'm building) is an alternative to Notion, and I learned a lot about how you (Notion) build things through reading your comments.
I find InstantDB's page confusing: How far is it open-source and self-hostable? I don't mind you having a sustainable cash flow, but it all seems a bit unclear which parts are fully open-source and self-hostable.
Everything you see on instantdb.com is open source, and hosted in one monorepo here: https://github.com/instantdb/instant
We've had great success with Replicache+Orama since this was written. We're keen to give Zero a spin once it's a bit more stable.
Triplit has essentially folded as a "company" and become some sort of open-source initiative instead.
InstantDB has matured massively and is definitely worth a look for anyone starting a new project.
* We added some serious TypeScript types
* We have sorting and ordering on fields
* We added the $like operator
* We added reactive queries on the backend
We're on a mission to make the best DX possible for building apps. We take your feedback seriously, and ship as quickly as we can.
As the author mentions in the comments here:
> InstantDB has matured massively and is definitely worth a look for anyone starting a new project.
If you get a chance to try us out, we'd love feedback :)