Hi HN! I made Beatsync, an open-source browser-based audio player that syncs audio with millisecond-level accuracy across many devices.
Try it live right now: https://www.beatsync.gg/
The idea is that with no additional hardware, you can turn any group of devices into a full surround sound system. MacBook speakers are particularly good.
Inspired by Network Time Protocol (NTP), I do clock synchronization over websockets and use the Web Audio API to keep audio latency under a few ms.
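The NTP-style handshake boils down to four timestamps per round trip. A minimal sketch of the idea (illustrative names, not the actual Beatsync code):

```typescript
// NTP-style clock offset estimation (sketch, not Beatsync's actual code).
// t0: client send time, t1: server receive time,
// t2: server send time, t3: client receive time.
interface PingSample {
  t0: number; t1: number; t2: number; t3: number;
}

// Round-trip delay and estimated offset of the server clock
// relative to the client clock, as in classic NTP.
function offsetAndDelay(s: PingSample): { offset: number; delay: number } {
  const delay = (s.t3 - s.t0) - (s.t2 - s.t1);
  const offset = ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2;
  return { offset, delay };
}

// In practice you take many samples and keep the one with the
// smallest round-trip delay, since it bounds the offset error best.
function bestOffset(samples: PingSample[]): number {
  const best = samples.reduce((a, b) =>
    offsetAndDelay(a).delay <= offsetAndDelay(b).delay ? a : b);
  return offsetAndDelay(best).offset;
}
```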
You can also drag devices around a virtual grid to simulate spatial audio — it changes the volume of each device depending on its distance to a virtual listening source!
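For illustration, the distance-to-volume mapping could look something like the inverse-distance model that Web Audio's PannerNode uses (a sketch with made-up parameter names, not necessarily Beatsync's actual curve):

```typescript
// Distance-based gain for one device relative to a virtual listening
// source on the grid (sketch; the real attenuation curve may differ).
function gainForDevice(
  device: { x: number; y: number },
  listener: { x: number; y: number },
  refDistance = 1,  // distance at which gain is 1.0
  rolloff = 1       // how quickly volume falls off with distance
): number {
  const dx = device.x - listener.x;
  const dy = device.y - listener.y;
  const d = Math.max(refDistance, Math.hypot(dx, dy));
  // Inverse-distance model, as in Web Audio's PannerNode.
  return refDistance / (refDistance + rolloff * (d - refDistance));
}
```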
I've been working on this project for the past couple of weeks. Would love to hear your thoughts and ideas!
There are a ton of directions I can imagine you taking this in.
The household application: this one is already pretty directly applicable. Have a bunch of wireless speakers and you should be able to make it sound really good from anywhere, yes? You would probably want support for static configurations, and there's a good chance each client isn't going to be able to run the full suite, but the server can probably still figure out what to send to each client based on timing data.
Relatedly, it would be nice to have a sense of "facing" for the point on the virtual grid and adjust 5.1 channels accordingly, automatically (especially left/right). [Oh, maybe this is already implicit in the grid - "up" is "forward"?]
The party application: this would be a cool trick that would take a lot more work. What if each device could locate itself in actual space automatically and figure out its sync accordingly as it moved? This might not be possible purely with software - especially with just the browser's access to sensors related to high-accuracy location based on, for example, wi-fi sources. However, it would be utterly magical to be able to install an app, join a host, and let your phone join a mob of other phones as individual speakers in everyone's pockets at a party and have positional audio "just work." The "wow" factor would be off the charts.
On a related note, it could be interesting to add a "jukebox" front-end - some way for clients to submit and negotiate tracks for the play queue.
Another idea: account for copper and optical cabling. The latency issue isn't restricted to the clocks you can see. Adjusting audio timing for long cable runs matters a lot in large areas (say, a stadium or performance hall), but it can still matter in house-sized settings too, depending on how the speakers are wired. For a laptop speaker there's no practical offset between the clock's time and the time at which sound plays, but if the audio output is connected to a cable run, it would be nice, and probably not very hard, to add a static timing offset for the physical layer associated with a particular output (or even channel). It might even be worth calculating it for the user. (This speaker is 300 feet away from its output through X meters of copper; figure out my additional latency offset for me.)
About 0.3 microseconds for a run like that. The period of a wave at 20 kHz (very roughly the highest pitch we can hear) is 50 microseconds. So: more or less insignificant.
Cable latency is basically never an issue for audio. Latency due to speed of sound in air is what you see techs at stadiums and performance halls tuning.
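To put rough numbers on both effects (a sketch; the 0.7 velocity factor is an assumed typical value for copper cable, and real values vary):

```typescript
// Rough propagation delays: electrical signal in cable vs sound in air.
const C = 299_792_458;  // speed of light, m/s
const SOUND = 343;      // speed of sound in air at ~20 degrees C, m/s

// Signal delay through a cable run, assuming a ~0.7 velocity factor.
const cableDelaySec = (meters: number, velocityFactor = 0.7) =>
  meters / (velocityFactor * C);

// Acoustic delay through the same distance of air.
const airDelaySec = (meters: number) => meters / SOUND;

// ~91 m (300 ft) of cable: well under a microsecond.
// The same 91 m through air: roughly a quarter of a second.
```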
Do you suppose there exists some other reason for that, like maybe matching impedance on each cable, or is this likely one of those superstitions that audiophiles fall prey to?
The instant you start having wireless speakers (e.g. Bluetooth) or any sort of significant delay between commanding playback and the actual sound coming out, the latency becomes audible.
If you support mic input, you can let the user select one device as the "nexus" with mic recording on. Then you tell each device in the setup to "chirp" at exactly the same time, but at different frequencies. From that you can derive each device's "local delay" and compensate.
This lets you tune the surround setup to full accuracy for a given point in space, and it takes care of ring-buffer differences, wireless transfers of non-tethered speakers, etc.
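Once the nexus has detected when each frequency actually arrived, turning that into per-device corrections is simple arithmetic (a sketch; the chirp detection itself, e.g. FFT peak-picking, is omitted, and all names are illustrative):

```typescript
// Chirp-based delay estimation (sketch). Assumes each device was told
// to emit its chirp at the same shared-clock time, and the nexus has
// already detected the arrival time of each frequency in its recording.
interface ChirpArrival {
  deviceId: string;
  scheduledTime: number; // shared-clock time the chirp was commanded
  detectedTime: number;  // shared-clock time it was heard at the nexus
}

// Per-device playback delay relative to the fastest device; subtract
// these from future scheduling times to align everyone at the nexus.
function playbackDelays(arrivals: ChirpArrival[]): Map<string, number> {
  const delays = new Map<string, number>();
  // Use the earliest arrival as the reference so corrections are >= 0.
  const reference = Math.min(
    ...arrivals.map(a => a.detectedTime - a.scheduledTime));
  for (const a of arrivals) {
    delays.set(a.deviceId, (a.detectedTime - a.scheduledTime) - reference);
  }
  return delays;
}
```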
An OSS app with the ability to sync everyone up over mobile or wifi, on Android or iOS with BYO headphones, would be incredible. This should be a thing :)
Someone brought up the idea of an internet radio, which I thought was cool: you could see a list of all the rooms people are in and tune in to exactly what they're jamming to.
How can you guarantee that? NTP fails to guarantee that all clocks are synced inside a datacenter, let alone across an ocean (Did not read the code yet)
EDIT: The wording got me. "Guarantee" & "Perfect" in the post title, and "Millisecond-accurate synchronization" in the README. Cool project!
Going off on a tangent: back in the days of Live Aid, they tried doing a transatlantic duet. It turns out to be physically impossible: if A sings when they hear B, then B hears A at least 38 ms too late, which is too much latency for humans to still make music together.
They're doing a smarter thing by streaming; I don't do any streaming right now.
The upside is that Beatsync works in the browser: just a link, no setup required.
Just to share a couple of similar/related projects in case useful for reference:
http://strobe.audio multi-room audio in Elixir
https://www.panaudia.com multi-user spatial audio mixing in Rust
Needed on server and clients is an override to a) fix my domain users having the same cookie if it's stored in the default location and b) make sure the server only starts when the network is REALLY up; the normal network-online target is a system service only, so you cannot check for it from a user service. In my case the server runs under a domain user's profile.
~/.config/systemd/user/pipewire-pulse.service.d/override.conf
~/.config/systemd/user/user-network-wait.service
Server PulseAudio, not needed but very useful:
/etc/pipewire/pipewire-pulse.conf.d/50-networkparty.conf
Needed (note how to make sure s16le is used across all devices to keep conversion to a minimum, and how to name the sink somewhat sanely):
/etc/pipewire/pipewire-pulse.conf.d/70-rtp-sender-sink.conf
/etc/pipewire/pipewire-pulse.conf.d/71-rtp-sender-23912611.conf
You can play to the sink, e.g. in mpd.
Client PulseAudio:
/etc/pipewire/pipewire-pulse.conf.d/71-rtp-receiver.conf
You can play with latency_msec; journalctl will tell you the lowest fragment if you just put 0 or 1 ms here. It needs to be a multiple of that minimum, so just experiment. I'm fine with this even though 12 ms would also work on my LAN, but it's more stable across the wifi bridge. The sap_address on the client may work to select the right multicast address, even though it's actually for the SAP announcements, but don't count on that; I have not tested multiple streams so far and would not use "magic" solutions like SAP on the server (they didn't work in my case and seem to be pipewire-only). Right now the client seems to pick the right stream; experiment ;)
The sink in my case is a module-combo; just check with pactl list sinks which sink you want the stream to play on. Note that this is not some application you can dynamically assign to other sinks!
For LAN, if you run OpenWrt, just enable igmp_snooping and multicast_querier on the software bridge (LuCI -> Network -> Interfaces -> Devices tab), and maybe "Multicast to Unicast" in your wifi advanced settings. I don't use this myself, though, as my wifi is on another VLAN or WDS-bridged, so I mostly stay out of these problems.
There are more advanced setups possible with OpenWrt, including working igmp_snooping on the hardware switch; if you are interested, check my documentation (in German) on Krei.se, as I will write a guide for this sometime lol (or just ask me by DM). It's possible to run this ms-exact with a clean network in any case; there is no need to install extra software or clog unused ports with multicast traffic. If you get this right, the music will flow like water through your LAN, only where it's needed.
But Poettering software (systemd/PulseAudio) is quite composable, so even though there is a learning curve, the alternative is 20k-line config-file monoliths.
Still, this even turns rooted LG TVs and cheap Raspi Picos into sinks.
The only latency I have now is the bass traveling slower than the treble lol
First, I do clock synchronization with a central server so that all clients can agree on a time reference.
Then, instead of directly manipulating the hardware audio ring buffers (which browsers don't allow), I use the Web Audio API's scheduling system to play audio in the future at a specific start time, on all devices.
So a central server relays messages from clients, telling them when to start and which sample position in the buffer to start from.
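Putting those pieces together, the scheduling math each client would run looks roughly like this (a sketch with illustrative names, not the actual Beatsync code):

```typescript
// Converting a server-scheduled start time into a local
// AudioContext time (sketch; names are illustrative).
//
// clockOffset is the estimated (serverTime - localTime) from the
// NTP-style handshake. audioCtx.currentTime and the local wall clock
// tick at (nearly) the same rate, so one anchor pair is enough.
function localAudioTime(
  serverStartTime: number, // when the server wants playback to begin
  clockOffset: number,     // serverTime - localTime, seconds
  localNow: number,        // e.g. performance.now() / 1000
  audioNow: number         // audioCtx.currentTime at the same instant
): number {
  const localStart = serverStartTime - clockOffset; // server -> local clock
  return audioNow + (localStart - localNow);        // local -> audio clock
}

// Every client then calls source.start(when, offsetIntoBuffer) with
// the same effective instant: source.start(localAudioTime(...), pos).
```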