Hey HN! I’ve written a bunch of WebSocket servers over the years to do simple things like state synchronization, WebRTC signaling, and notifying a client when a backend job was run. I realized that if I had a simple way to create a private, temporary, mini-redis that the client could talk to directly, it would save a lot of time. So we created DriftDB.
In addition to the open source server that you can run yourself, we also provide https://jamsocket.live where you can use an instance we host on Cloudflare’s edge (~13ms round trip latency from my home in NY).
You may have seen my blog post a couple months back, “You might not need a CRDT”[1]. Some of those ideas (especially the emphasis on state machine synchronization) are implemented in DriftDB.
Here’s an IRL talk I gave on DriftDB last week at Browsertech SF[2], and a 4-minute tutorial on building a cross-client synchronized slider component in React[3].
[1] https://news.ycombinator.com/item?id=33865672
Obv. there's no Cloudflare Worker running, say, an MQTT server over WebSockets, but with MQTT you can scope topics with wildcards (https://www.hivemq.com/blog/mqtt-essentials-part-5-mqtt-topi...), replay missed messages on reconnection, and get last-will-and-testament, ACLs, dynamic topic creation, binary messages, etc.
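To make the wildcard point concrete, here's a minimal sketch of MQTT-style topic filter matching (`+` matches one level, `#` matches the rest), following the semantics described in the MQTT spec. This is illustrative only and not part of DriftDB's or any broker's API:

```typescript
// MQTT-style topic filter matching: "+" matches exactly one level,
// "#" matches the remainder of the topic (must be the last level).
function topicMatches(filter: string, topic: string): boolean {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return true;       // "#" swallows everything below
    if (i >= t.length) return false;     // filter is longer than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false;
  }
  return f.length === t.length;          // no unmatched topic levels left
}

console.log(topicMatches("sensors/+/temp", "sensors/kitchen/temp")); // true
console.log(topicMatches("sensors/#", "sensors/kitchen/humidity"));  // true
console.log(topicMatches("sensors/+/temp", "sensors/kitchen/hum"));  // false
```

A broker evaluates each subscription's filter like this against every published topic, which is what makes dynamic topic scoping cheap to express.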
I'm asking because many of these WebSocket projects seem to use custom protocols rather than anything standard, i.e. interoperable.
The other operation that I haven’t seen elsewhere, but is vital to enabling stream compaction without a leader, is the idea of a stream rollup up to a specific stream number. NATS Jetstream, for example, has the ability to roll up an entire stream, but if another message hits the stream between when the rollup is computed and when it arrives at the server, that message too will be replaced (IIRC). So I thought about using NATS (which already has a WebSocket protocol), but ruled it out.
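Here's a sketch of the "rollup up to a specific stream number" idea: the rollup replaces only messages at or below the sequence number it was computed over, so a message that races in afterwards is preserved rather than clobbered. Names and structure here are illustrative, not DriftDB's actual implementation:

```typescript
// A toy ordered stream with sequence-bounded rollup (compaction).
type Message = { seq: number; body: string };

class Stream {
  private messages: Message[] = [];
  private nextSeq = 1;

  append(body: string): number {
    const seq = this.nextSeq++;
    this.messages.push({ seq, body });
    return seq;
  }

  // Replace every message with seq <= upTo by a single summary message.
  // Messages that arrived after seq `upTo` survive untouched.
  rollup(summary: string, upTo: number): void {
    const later = this.messages.filter((m) => m.seq > upTo);
    this.messages = [{ seq: upTo, body: summary }, ...later];
  }

  replay(): string[] {
    return this.messages.map((m) => m.body);
  }
}

const s = new Stream();
s.append("a");               // seq 1
const seen = s.append("b");  // seq 2 -- rollup is computed over seqs 1..2
s.append("c");               // seq 3 races in before the rollup arrives
s.rollup("a+b", seen);       // replaces only 1..2; "c" survives
console.log(s.replay());     // ["a+b", "c"]
```

Bounding the rollup by sequence number is what lets any client compact the stream without a leader: a racing write can never be silently absorbed into a summary that didn't account for it.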
We support MQTT over WS (or JSON over WS, or just HTTP) in Cloudflare Pub/Sub, FWIW - https://developers.cloudflare.com/pub-sub/learning/websocket...
I also agree with the comments re: MQTT being well suited to a lot of these "broadcast" use-cases, but that its IoT roots seem to hold it back. MQTT 5.0 is just a great protocol (clear spec, explicit about errors, flexible payloads), which makes it well suited to these broadcast/fan-in/real-time workloads. The traditional cloud providers do MQTT (3.1.1) in their respective IoT platforms but never grew it beyond that.
Something like Mosquitto + https://github.com/nodefluent/mqtt-to-kafka-bridge + Redpanda in a Docker image would work, though obv. this might be a bit overkill for most. Having said that, it does open many new avenues for interaction at scale. You pays your money...
Building a much simpler, custom-tailored protocol lets you ship faster and improve gradually. If the point is to deploy on Cloudflare in a massively parallel fashion (which is likely harder for a regular MQTT broker), a custom protocol lets you concentrate on that special advantage rather than on standards conformance or interoperability with a bevy of existing libraries.
MQTT scales and works. And it's easy, fast, and small.
I've been trying to get our guys to do MQTT-based pub/sub, and they'd rather do their own thing with WebSockets because MQTT is scary. <shrug>.
That's the problem when front-end guys make decisions about tech sometimes, they choose stuff that seems easy to integrate without caring about things like deployment, scalability, capabilities, etc.
That's the problem when non-front-end guys make decisions about tech sometimes, they choose stuff that seems easy to integrate without caring about things like accessibility, design scalability, client device capabilities, etc.
This kind of tool is exactly what I would have needed, instead of the approach I've taken which is a bit kludgy and grass-roots.
By far the most difficult part of it for me was ensuring that the web socket network can heal from outages of any of the clients or the server. E.g. If a client loses connection, how does it regain knowledge of state? If the server dies, what do clients do with state changes they want to upload? Etc. It was really difficult!
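One common way to handle both failure modes is to have the server keep an ordered log and let clients track the last sequence number they've seen, buffering outbound changes while offline. Here's a minimal sketch under those assumptions; the `replayFrom` API and class names are hypothetical, not anything DriftDB exposes:

```typescript
// Toy model of outage healing: an ordered server log plus a client that
// tracks its position in it and queues writes while disconnected.
type Entry = { seq: number; data: string };

class FakeServer {
  log: Entry[] = [];
  append(data: string): void {
    this.log.push({ seq: this.log.length + 1, data });
  }
  replayFrom(seq: number): Entry[] {
    return this.log.filter((e) => e.seq >= seq);
  }
}

class Client {
  lastSeq = 0;
  state: string[] = [];
  pending: string[] = []; // changes made while offline

  constructor(private server: FakeServer, public online = true) {}

  change(data: string): void {
    if (this.online) {
      this.server.append(data);
    } else {
      this.pending.push(data); // buffer until we reconnect
    }
  }

  reconnect(): void {
    this.online = true;
    // 1. Flush changes buffered while offline.
    for (const data of this.pending) this.server.append(data);
    this.pending = [];
    // 2. Replay everything past our last seen sequence number,
    //    which includes both missed messages and our own flushed ones.
    for (const e of this.server.replayFrom(this.lastSeq + 1)) {
      this.state.push(e.data);
      this.lastSeq = e.seq;
    }
  }
}
```

Because the server assigns the ordering, every client converges on the same log after reconnecting, regardless of who was offline when.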
Good work :)
[0] https://github.com/samhuk/exhibitor/pull/22
Edit:
So, I played with it a bit, and it appears that if a client is disconnected and makes state changes while offline, then once it reconnects those changes are applied to the other client, even though that client had made its own changes to the state in the meantime. So it's working on a "last message" basis? Also, it seems like it can't detect offline/online status?
I'm curious because the interesting part of these kinds of systems is the way races are handled.
From the server’s point of view, it’s just an ordered broadcast channel with replay. The conflict semantics are whatever you build on top of that.
The `useSharedState` hook in the React bindings implements last-write-wins. For the `useSharedReducer` hook, the reducer itself determines the semantics, but in the voxel editor demo we also use last-write-wins.
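Last-write-wins on top of an ordered broadcast channel reduces to something very simple: the server's arrival order decides the winner, and each client just keeps the latest value it has seen per key. A minimal sketch (illustrative, not DriftDB's internal implementation):

```typescript
// Last-write-wins over a server-ordered broadcast: apply updates in the
// order the server broadcast them; a later arrival simply overwrites.
type Update = { key: string; value: unknown };

class LwwState {
  private state = new Map<string, unknown>();

  apply(update: Update): void {
    this.state.set(update.key, update.value);
  }

  get(key: string): unknown {
    return this.state.get(key);
  }
}

const shared = new LwwState();
shared.apply({ key: "slider", value: 10 }); // client A's update
shared.apply({ key: "slider", value: 42 }); // client B's arrives later and wins
console.log(shared.get("slider"));          // 42
```

The conflict resolution lives entirely client-side; the server only guarantees that every client sees the same sequence of updates, so every client converges on the same value.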
> Also it seems like it can't detect the offline/online status?
Online/offline status is exposed in the client libraries, e.g. in the react bindings there is a useConnectionStatus hook: https://driftdb.com/docs/react#useconnectionstatus-hook
> I'm curious because the interesting part of this kind of systems is the way races are handled.
It’s academically the interesting part, but I think it matters less than people assume it does. Here’s a section from a blog post I wrote a couple months ago:
> Developers may find it tempting to treat collaborative applications as any other distributed systems, and in many ways that’s a useful way to look at them. But they differ in an important way, which is that they always have humans-in-the-loop. As a result, many edge cases can simply be deferred to the user.
> For example, every multiplayer application has to decide how to handle two users modifying the same object concurrently. In practice, this tends to be rare, because of something I call social locking: the tendency of reasonable people not to clobber each other’s work-in-progress, even in the absence of software-based locking features. This is especially the case when applications have presence features that provide hints to other users about where their attention is (cursor position, selection, etc.) In the rare times it does occur, the users can sort it out among themselves.
> A general theme of successful multiplayer approaches we’ve seen is not overcomplicating things. We’ve heard a number of companies confess that their multiplayer approach feels naive — especially compared to the academic literature on the topic — and yet it works just fine in practice.
https://driftingin.space/posts/you-might-not-need-a-crdt
> DriftDB is a real-time data backend that runs on the edge
What does it mean for these backends to be “on the edge”? Do geographically dispersed clients connect to different backends? If so, are messages synchronized between them? If so, what’s the point of them being on the edge?
The way it works in DriftDB is that everything is siloed into “rooms”, which are effectively broadcast channels. The room is started based on the geography of the person who first joins it (Cloudflare handles this part).
Cool, makes a lot of sense because people using a given “room” are often likely to be geographically collocated.
When you say the limitations are that "a relatively small number of clients need to share some state over a relatively short period of time": I read in another comment that means about a dozen or so clients, but what about the time factor? Can it be on the order of hours?
So far I’ve focused on use cases where clients are online for overlapping time intervals. When all the clients go offline, Cloudflare will shut down the worker after some period and the replay ability will be lost. The core data structure is designed such that it could be stored in the Durable Object storage Cloudflare provides, but I haven't wired it up yet.
Colyseus has support for persistence as well as matchmaking!
Within a room, things are a bit more constrained. We haven't found the limit yet, and I suspect it's pretty high, but our design goal was to support on the order of dozens of users in a room, not necessarily beyond that. (Targeting e.g. a shared whiteboard use case)
https://developers.cloudflare.com/workers/platform/pricing/#...
Eventually we went with Centrifuge.