Hey HN! I’ve written a bunch of WebSocket servers over the years to do simple things like state synchronization, WebRTC signaling, and notifying a client when a backend job was run. I realized that if I had a simple way to create a private, temporary, mini-redis that the client could talk to directly, it would save a lot of time. So we created DriftDB.
In addition to the open source server that you can run yourself, we also provide https://jamsocket.live where you can use an instance we host on Cloudflare’s edge (~13ms round trip latency from my home in NY).
You may have seen my blog post a couple months back, “You might not need a CRDT”[1]. Some of those ideas (especially the emphasis on state machine synchronization) are implemented in DriftDB.
Here’s an IRL talk I gave on DriftDB last week at Browsertech SF[2], and a 4-minute tutorial on building a cross-client synchronized slider component in React[3].
[1] https://news.ycombinator.com/item?id=33865672
Obv. there's no Cloudflare Worker running, say, an MQTT server over WebSockets, but with MQTT you can scope topics with wildcards (https://www.hivemq.com/blog/mqtt-essentials-part-5-mqtt-topi...), replay missed messages on reconnection, and get last-will-and-testament, ACLs, dynamic topic creation, binary messages, etc.
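To make the wildcard point concrete, here's a minimal sketch of MQTT-style topic filter matching (`+` matches one level, `#` matches the rest), following the semantics described in the MQTT spec. This is illustrative only and not part of DriftDB's or any broker's API:

```typescript
// MQTT-style topic filter matching: "+" matches exactly one level,
// "#" matches the remainder of the topic (must be the last level).
function topicMatches(filter: string, topic: string): boolean {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return true;       // "#" swallows everything below
    if (i >= t.length) return false;     // filter is longer than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false;
  }
  return f.length === t.length;          // no unmatched topic levels left
}

console.log(topicMatches("sensors/+/temp", "sensors/kitchen/temp")); // true
console.log(topicMatches("sensors/#", "sensors/kitchen/humidity"));  // true
console.log(topicMatches("sensors/+/temp", "sensors/kitchen/hum"));  // false
```

A broker evaluates each subscription's filter like this against every published topic, which is what makes dynamic topic scoping cheap to express.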
I'm asking because many of these WebSocket projects seem to use custom protocols rather than anything standard, i.e. interoperable.
The other operation that I haven’t seen elsewhere, but is vital to enabling stream compaction without a leader, is the idea of a stream rollup up to a specific stream number. NATS Jetstream, for example, has the ability to roll up an entire stream, but if another message hits the stream between when the rollup is computed and when it arrives at the server, that message too will be replaced (IIRC). So I thought about using NATS (which already has a WebSocket protocol), but ruled it out.
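Here's a sketch of the "rollup up to a specific stream number" idea: the rollup replaces only messages at or below the sequence number it was computed over, so a message that races in afterwards is preserved rather than clobbered. Names and structure here are illustrative, not DriftDB's actual implementation:

```typescript
// A toy ordered stream with sequence-bounded rollup (compaction).
type Message = { seq: number; body: string };

class Stream {
  private messages: Message[] = [];
  private nextSeq = 1;

  append(body: string): number {
    const seq = this.nextSeq++;
    this.messages.push({ seq, body });
    return seq;
  }

  // Replace every message with seq <= upTo by a single summary message.
  // Messages that arrived after seq `upTo` survive untouched.
  rollup(summary: string, upTo: number): void {
    const later = this.messages.filter((m) => m.seq > upTo);
    this.messages = [{ seq: upTo, body: summary }, ...later];
  }

  replay(): string[] {
    return this.messages.map((m) => m.body);
  }
}

const s = new Stream();
s.append("a");               // seq 1
const seen = s.append("b");  // seq 2 -- rollup is computed over seqs 1..2
s.append("c");               // seq 3 races in before the rollup arrives
s.rollup("a+b", seen);       // replaces only 1..2; "c" survives
console.log(s.replay());     // ["a+b", "c"]
```

Bounding the rollup by sequence number is what lets any client compact the stream without a leader: a racing write can never be silently absorbed into a summary that didn't account for it.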
We support MQTT over WS (or JSON over WS, or just HTTP) in Cloudflare Pub/Sub, FWIW - https://developers.cloudflare.com/pub-sub/learning/websocket...
I also agree with the comments re: MQTT being well suited to a lot of these "broadcast" use-cases, but that its IoT roots seem to hold it back. MQTT 5.0 is just a great protocol (clear spec, explicit about errors, flexible payloads), which makes it well suited to these broadcast/fan-in/real-time workloads. The traditional cloud providers do MQTT (3.1.1) in their respective IoT platforms but never grew it beyond that.
Something like Mosquitto + https://github.com/nodefluent/mqtt-to-kafka-bridge + Redpanda in a Docker image would work, though obv. this might be a bit overkill for most. Having said that, it does open many new avenues for interaction at scale. You pays your money...
Building a much simpler, custom-tailored protocol lets you ship faster and improve gradually. If the point is to deploy on Cloudflare in a massively parallel fashion (which is likely harder for a regular MQTT broker), a custom protocol lets you concentrate on that special advantage rather than on standards conformance or interoperability with a bevy of existing libraries.
MQTT scales and works. And it's easy, fast, and small.
I've been trying to get our guys to do MQTT-based pub/sub, and they'd rather do their own thing with WebSockets because MQTT is scary. <shrug>.
That's the problem when front-end guys make decisions about tech sometimes, they choose stuff that seems easy to integrate without caring about things like deployment, scalability, capabilities, etc.
That's the problem when non-front-end guys make decisions about tech sometimes, they choose stuff that seems easy to integrate without caring about things like accessibility, design scalability, client device capabilities, etc.
This kind of tool is exactly what I would have needed, instead of the approach I've taken which is a bit kludgy and grass-roots.
By far the most difficult part of it for me was ensuring that the web socket network can heal from outages of any of the clients or the server. E.g. If a client loses connection, how does it regain knowledge of state? If the server dies, what do clients do with state changes they want to upload? Etc. It was really difficult!
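One common way to handle both failure modes is to have the server keep an ordered log and let clients track the last sequence number they've seen, buffering outbound changes while offline. Here's a minimal sketch under those assumptions; the `replayFrom` API and class names are hypothetical, not anything DriftDB exposes:

```typescript
// Toy model of outage healing: an ordered server log plus a client that
// tracks its position in it and queues writes while disconnected.
type Entry = { seq: number; data: string };

class FakeServer {
  log: Entry[] = [];
  append(data: string): void {
    this.log.push({ seq: this.log.length + 1, data });
  }
  replayFrom(seq: number): Entry[] {
    return this.log.filter((e) => e.seq >= seq);
  }
}

class Client {
  lastSeq = 0;
  state: string[] = [];
  pending: string[] = []; // changes made while offline

  constructor(private server: FakeServer, public online = true) {}

  change(data: string): void {
    if (this.online) {
      this.server.append(data);
    } else {
      this.pending.push(data); // buffer until we reconnect
    }
  }

  reconnect(): void {
    this.online = true;
    // 1. Flush changes buffered while offline.
    for (const data of this.pending) this.server.append(data);
    this.pending = [];
    // 2. Replay everything past our last seen sequence number,
    //    which includes both missed messages and our own flushed ones.
    for (const e of this.server.replayFrom(this.lastSeq + 1)) {
      this.state.push(e.data);
      this.lastSeq = e.seq;
    }
  }
}
```

Because the server assigns the ordering, every client converges on the same log after reconnecting, regardless of who was offline when.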
Good work :)
[0] https://github.com/samhuk/exhibitor/pull/22
Edit:
So, I played with it a bit, and it appears that if a client is disconnected and makes state changes while offline, then once it reconnects those changes are applied to the other client, even though that client had made its own changes to the state in the meantime. So it's working on a "last message" basis? Also, it seems like it can't detect offline/online status?
I'm curious because the interesting part of these kinds of systems is the way races are handled.
From the server’s point of view, it’s just an ordered broadcast channel with replay. The conflict semantics are whatever you build on top of that.
The `useSharedState` hook in the React bindings implements last-write-wins. For the `useSharedReducer` hook, the reducer itself determines the semantics, but in the voxel editor demo we also use last-write-wins.
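Last-write-wins on top of an ordered broadcast channel reduces to something very simple: the server's arrival order decides the winner, and each client just keeps the latest value it has seen per key. A minimal sketch (illustrative, not DriftDB's internal implementation):

```typescript
// Last-write-wins over a server-ordered broadcast: apply updates in the
// order the server broadcast them; a later arrival simply overwrites.
type Update = { key: string; value: unknown };

class LwwState {
  private state = new Map<string, unknown>();

  apply(update: Update): void {
    this.state.set(update.key, update.value);
  }

  get(key: string): unknown {
    return this.state.get(key);
  }
}

const shared = new LwwState();
shared.apply({ key: "slider", value: 10 }); // client A's update
shared.apply({ key: "slider", value: 42 }); // client B's arrives later and wins
console.log(shared.get("slider"));          // 42
```

The conflict resolution lives entirely client-side; the server only guarantees that every client sees the same sequence of updates, so every client converges on the same value.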
> Also it seems like it can't detect the offline/online status?
Online/offline status is exposed in the client libraries, e.g. in the react bindings there is a useConnectionStatus hook: https://driftdb.com/docs/react#useconnectionstatus-hook
> I'm curious because the interesting part of this kind of systems is the way races are handled.
It’s academically the interesting part, but I think it matters less than people assume it does. Here’s a section from a blog post I wrote a couple months ago:
> Developers may find it tempting to treat collaborative applications as any other distributed systems, and in many ways that’s a useful way to look at them. But they differ in an important way, which is that they always have humans-in-the-loop. As a result, many edge cases can simply be deferred to the user.
> For example, every multiplayer application has to decide how to handle two users modifying the same object concurrently. In practice, this tends to be rare, because of something I call social locking: the tendency of reasonable people not to clobber each other’s work-in-progress, even in the absence of software-based locking features. This is especially the case when applications have presence features that provide hints to other users about where their attention is (cursor position, selection, etc.) In the rare times it does occur, the users can sort it out among themselves.
> A general theme of successful multiplayer approaches we’ve seen is not overcomplicating things. We’ve heard a number of companies confess that their multiplayer approach feels naive — especially compared to the academic literature on the topic — and yet it works just fine in practice.
https://driftingin.space/posts/you-might-not-need-a-crdt
> DriftDB is a real-time data backend that runs on the edge
What does it mean for these backends to be “on the edge”? Do geographically dispersed clients connect to different backends? If so, are messages synchronized between them? If so, what’s the point of them being on the edge?
The way it works in DriftDB is that everything is siloed into “rooms”, which are effectively broadcast channels. The room is started based on the geography of the person who first joins it (Cloudflare handles this part).
Cool, makes a lot of sense because people using a given “room” are often likely to be geographically collocated.
When you say the limitations are that "a relatively small number of clients need to share some state over a relatively short period of time": I read in another comment that means about a dozen or so clients, but what about the time factor? Can it be on the order of hours?
So far I’ve focused on use cases where clients are online for overlapping time intervals. When all the clients go offline, Cloudflare will shut down the worker after some period and the replay ability will be lost. The core data structure is designed such that it could be stored in the Durable Object storage Cloudflare provides, but I haven't wired it up yet.
Colyseus has support for persistence as well as matchmaking!
Within a room, things are a bit more constrained. We haven't found the limit yet, and I suspect it's pretty high, but our design goal was to support on the order of dozens of users in a room, not necessarily beyond that. (Targeting e.g. a shared whiteboard use case)
https://developers.cloudflare.com/workers/platform/pricing/#...
Eventually we went with Centrifuge.