Turso SQLite Offline Sync Public Beta

It'd be nice if the post went into how conflict resolution will (?) work because that's the hard part here and the main selling point imo.

titaphraz · 9 months ago

A lot of offline sync projects just drop data on conflict and pretend it didn't happen. It's the salesman's job to divert your questions about it.

I found another blogpost from turso where they say they offer 3 options on conflict: drop it, rebase it (and hope for no conflict?) and "handle it yourself".

Writing an offline sync isn't hard. Dealing with conflicts is a PITA.

https://turso.tech/blog/introducing-offline-writes-for-turso...

bob1029 · 9 months ago

Conflict resolution can't work in a general sense.

How you reconcile many copies of the same record could depend on time of action, server location, authority level of the user, causality between certain business events, enabled account features, prior phase of the moon, etc.

Whether or not offline sync can even work is very much a domain specific concern. You need to talk to the business about the pros & cons first. For example, they might not like the semantics regarding merchant terminals and offline processing. I can already hear the "what if the terminal never comes back online?" afternoon meeting arising out of that one.

ochiba · 9 months ago

I would say this is why server-authoritative systems that allow for custom logic in the backend for conflict resolution work well in practice (like Replicache, PowerSync and Zero - custom mutators coming in beta for the latter). Predefined deterministic distributed conflict resolution such as CRDT data structures work well for certain use cases like text editing, but many other use cases require deeper customizability based on various factors like you said.

larkost · 9 months ago

CRDTs (Conflict-free Replicated Data Types) can absolutely reconcile many-writer situations. There are a number of these systems, and they all have their own rules around that replication (sometimes very complicated rules that are hard to reason about). As long as you can live inside those rules, and accept that they are going to have sharp corners that don't quite make sense for your use case, then you can get a virtually free lunch there.

But living inside of those rules (and sometimes just understanding those rules) can be a big ask in some situations, so you have to know what you are doing.
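
To make "rules with sharp corners" concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins register, in Python. The names (`LWW`, `merge`) are made up for illustration and are not taken from any particular library; real systems typically use hybrid logical clocks rather than a bare integer timestamp.

```python
from typing import NamedTuple


class LWW(NamedTuple):
    """Last-writer-wins register state (hypothetical, for illustration)."""
    timestamp: int   # logical or wall-clock time of the write
    replica_id: str  # tie-breaker when timestamps are equal
    value: str


def merge(a: LWW, b: LWW) -> LWW:
    # Tuple ordering compares timestamp first, then replica_id, then value.
    # max() is commutative, associative and idempotent, so every replica
    # converges to the same state regardless of merge order.
    return max(a, b)


# Two replicas edit the same field while offline, at the same timestamp.
phone = LWW(10, "phone", "Meet at 3pm")
laptop = LWW(10, "laptop", "Meet at 4pm")

merged = merge(phone, laptop)
```

Both replicas converge on the same value no matter which one merges first, which is the free lunch. The sharp corner is that the laptop's edit is discarded without anyone being told, purely because of how the replica ids happen to sort.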

jahewson · 9 months ago

This. It’s so obvious when you think about it, but everybody wants a free lunch.

mystifyingpoi · 9 months ago

I think this point confuses me the most in this regard:

> Local-first architectures allow for fast and responsive applications that are resilient to network failures

So are we talking about apps that can work for days and weeks offline and then sync a lot of data at once, or are we talking about apps that can survive a few seconds' glitch in network connectivity? I think that what is promised is the former, but what will make sense in practice is the latter.

hnthrow90348765 · 9 months ago

Local-first is overkill for transient faults. This is probably meant for outage and disaster scenarios.

ochiba · 9 months ago

There are niche use cases where the former (work for days to weeks offline) is useful and even critical - like certain field service use cases. Surviving glitches in network connectivity is useful for mainstream/consumer applications for users in general, especially those on mobile.

In my experience, it can affect the architecture and performance in a significant way. If a client can go offline for an arbitrary period of time, doing a delta sync when they come back online is more tricky, since we need to sync a specific range of operation history (and this needs to be adjusted for specific scope/permissions that the client has access to). If you scale up a system to thousands or millions of clients, having them all do arbitrary range queries doesn't scale well. For this reason I've seen sync engines simply force a client to do a complete re-sync if it "falls behind" with deltas for too long (e.g. more than a day or so.) Maintaining an operation log that is set up and indexed for querying arbitrary ranges of operations (for a specific scope of data) works well.
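
A rough sketch of that trade-off, in Python: an operation log that serves scoped deltas, but tells a client to do a full re-sync once it has fallen too far behind. Everything here (`OpLog`, `max_lag`, the shape of the response) is a hypothetical illustration, not how any specific sync engine does it, and a real implementation would use an index on (scope, sequence) instead of scanning a list.

```python
from dataclasses import dataclass, field


@dataclass
class OpLog:
    # Hypothetical cutoff: how many ops a client may lag before a full re-sync.
    max_lag: int = 1000
    # Each entry is (seq, scope, payload); seq increases by 1 per append.
    ops: list = field(default_factory=list)

    def append(self, scope: str, payload: str) -> int:
        seq = self.ops[-1][0] + 1 if self.ops else 1
        self.ops.append((seq, scope, payload))
        return seq

    def pull(self, last_seq: int, scopes: set) -> dict:
        head = self.ops[-1][0] if self.ops else 0
        if head - last_seq > self.max_lag:
            # Client fell too far behind: skip the range query entirely.
            return {"resync": True, "ops": [], "head": head}
        # Delta: only ops after the client's checkpoint, and only in
        # scopes the client is permitted to see.
        delta = [op for op in self.ops if op[0] > last_seq and op[1] in scopes]
        return {"resync": False, "ops": delta, "head": head}


log = OpLog(max_lag=3)
log.append("team-a", "x1")
log.append("team-b", "y1")
log.append("team-a", "x2")
log.append("team-a", "x3")
log.append("team-b", "y2")

recent = log.pull(last_seq=3, scopes={"team-a"})  # small, scoped delta
stale = log.pull(last_seq=1, scopes={"team-a"})   # too far behind
```

Here `recent` carries only the one `team-a` op after sequence 3, while `stale` gets no ops and a `resync` flag, since it is more than `max_lag` operations behind the head.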

Matthias247 · 9 months ago

I'm wondering too.

In general this seems to work only if there's a single offline client that accepts the writes.

With limitations to the data schema (e.g. have distinct tables per client), it might work with multiple clients. However those would need to be documented and I couldn't see anything in this blog post.

This sounds great, but I have some questions regarding data integrity and security.

If I build an offline first app using Turso, will my client directly exchange data with the database, without a layer of backend APIs to guarantee data integrity and security? For example, certain db writes are only permitted for certain users, but when the db API is exposed, will that cause problems? A concrete example would be a forum where only moderators can remove users and posts. Say I build an offline first forum: can a hacker tamper with the database on the filesystem and use the syncing feature to propagate the tampered data to the server?

aboodman · 9 months ago

Yes, this is a central issue in sync. For most applications, sync engines just aren't useful without some solution. Of course you need to validate inputs, support fine-grained permissions, etc., as developers have done with web apps for eons.

In Replicache, we addressed this by making your application server responsible for writes:

https://doc.replicache.dev/concepts/how-it-works

By doing this, your server can implement any validation it wants. It can also interact with external systems, do notifications, etc. Anything you can do with a traditional API.

In our new sync engine, Zero (https://zerosync.dev), we're adding this same ability soon (like this week) under the name custom mutators:

https://bugs.rocicorp.dev/issue/3045

This has been a hard project, but is really critical to use sync engines for anything serious.

isaachinman · 9 months ago

Happy user of Replicache. You and the team got it right.

franciscop · 9 months ago

The blog post doesn't even touch on write conflicts, which is the main reason I opened it (I was curious about how they solved them), so I'm not surprised there aren't many details about security etc.

refulgentis · 9 months ago

You raise an interesting point, that along with the replies, compels me to note that all of this stuff is bespoke, and things that sound simple like "I just want a good syncing library" are intractable in practice.

Ex. if I'm doing a document-based app, users can have at it, corrupt their own data all they want.

I honestly cannot wrap my mind around discussions re: SQLite x web dev, perhaps because I've been in mobile dev: but I don't even know what it'd mean to have an "offline-first forum" that syncs state: it's a global object with shared state rendered on the client.

When you set aside the implications introduced by using a hack scenario, a simpler question emerges: How would my clients sync the whole forum back to the cloud? Generally, my inclination is to handwave about users being able to make posts and have it "just work", after all, can't Turso help with simple scenarios like a posts table that has a date column? That makes it virtually conflict free...but my experience is "virtually" bites you, hard.

ochiba · 9 months ago

I am not sure about Turso but I've seen a few different approaches to this with other sync engine architectures:

1. At a database level: Using something like RLS in Postgres

2. At a backend level: The sync engine processes write operations via the backend API, where custom validation and authorization logic can be applied.

3. At a sync engine level: If the sync engine processes the write operations, there can be some kind of authorization layer similar to RLS enforced by the sync engine on the backend.
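
As a rough illustration of option 2, using the forum example from upthread: the server treats every synced write as untrusted input and re-checks it, no matter what the client's local database allowed. This is a sketch in Python with made-up names (`Mutation`, `authorize`, `apply_batch`, the `MODERATORS` set); it is not Turso's or any other engine's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    # user_id must come from the authenticated session on the server,
    # never from the client-supplied payload.
    user_id: str
    action: str   # "create_post" or "delete_post" in this sketch
    post_id: str


# Hypothetical server-side state; a real app would look this up in its db.
MODERATORS = {"alice"}


def authorize(m: Mutation) -> bool:
    if m.action == "create_post":
        return True
    if m.action == "delete_post":
        return m.user_id in MODERATORS  # only moderators may remove posts
    return False  # unknown actions are rejected by default


def apply_batch(mutations):
    """Split a client's pushed writes into accepted and rejected lists."""
    accepted, rejected = [], []
    for m in mutations:
        (accepted if authorize(m) else rejected).append(m)
    return accepted, rejected


pushed = [
    Mutation("mallory", "delete_post", "p1"),  # forged in the local db
    Mutation("alice", "delete_post", "p1"),
    Mutation("mallory", "create_post", "p2"),
]
accepted, rejected = apply_batch(pushed)
```

The forged delete ends up in `rejected` even though it was applied to the attacker's local copy; the client would then need to roll that local change back on its next pull.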

tracker1 · 9 months ago

I'm pretty sure you'll have to write parts of your app against your own APIs that represent the owner of the db for a group.

With Turso, you would want a model that had, for example, a db per user and one per group. With the Turso model, you want to think in terms of something closer to sharding by hand, with secure write access per user or group.

I could be wrong on this though. That's just my rough understanding.

krashidov · 9 months ago

This is my problem with these local first libraries. What happens if there's some data that needs to live in a db that's separate from the replicated sqlite db?

What I would really love is a sync engine library that is agnostic of your database.

Haven't really seen one yet.

vekker · 9 months ago

Exactly. So many local first libs don't cover this that it makes me wonder if the applications I am typically working on are so fundamentally different from what the local-first devs are normally building?

Most apps have user data that needs to be (partially or fully) shielded from other users. Yet, most local-first libs neglect to explain how to implement this with their libraries, or sometimes it's an obscure page or footnote somewhere in their docs, as if this is just an afterthought...

justanotheratom · 9 months ago

That is a very crucial question. I am also interested in the answer.

Perhaps they have RLS type policies that are only modifiable on the server.

nightowl_games · 9 months ago

Honestly this is so simple and core to the idea that I literally just assume it's handled.

thisislife2 · 9 months ago

I'd have thought that in this day and age every developer would know by now the importance of sanitizing user input before a web application accepts it? Your doubt has given me some pause ...

setr · 9 months ago

If the database is local, your web app database access is local. It can be modified and changed by the user, unlike code hosted on the web server, and any sanitization can thus be bypassed.

Meaning the user has effectively direct access to the underlying local database. Which, if blindly and totally synced, gives the user effectively direct access to the central database.

I'd have thought that in this day and age every developer would know by now the importance of not trusting frontend validation in a web application? Your doubt has given me some pause.

wahnfrieden · 9 months ago

No need to give a rude, condescending and unhelpful answer - there will always be people learning