Hi HN! I made Beatsync, an open-source browser-based audio player that syncs audio with millisecond-level accuracy across many devices.
Try it live right now: https://www.beatsync.gg/
The idea is that with no additional hardware, you can turn any group of devices into a full surround sound system. MacBook speakers are particularly good.
Inspired by Network Time Protocol (NTP), I do clock synchronization over websockets and use the Web Audio API to keep audio latency under a few ms.
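The NTP-style handshake boils down to four timestamps per round trip. A minimal sketch of the idea (illustrative names, not the actual Beatsync code):

```typescript
// NTP-style clock offset estimation (sketch, not Beatsync's actual code).
// t0: client send time, t1: server receive time,
// t2: server send time, t3: client receive time.
interface PingSample {
  t0: number; t1: number; t2: number; t3: number;
}

// Round-trip delay and estimated offset of the server clock
// relative to the client clock, as in classic NTP.
function offsetAndDelay(s: PingSample): { offset: number; delay: number } {
  const delay = (s.t3 - s.t0) - (s.t2 - s.t1);
  const offset = ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2;
  return { offset, delay };
}

// In practice you take many samples and keep the one with the
// smallest round-trip delay, since it bounds the offset error best.
function bestOffset(samples: PingSample[]): number {
  const best = samples.reduce((a, b) =>
    offsetAndDelay(a).delay <= offsetAndDelay(b).delay ? a : b);
  return offsetAndDelay(best).offset;
}
```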
You can also drag devices around a virtual grid to simulate spatial audio — it changes the volume of each device depending on its distance to a virtual listening source!
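For illustration, the distance-to-volume mapping could look something like the inverse-distance model that Web Audio's PannerNode uses (a sketch with made-up parameter names, not necessarily Beatsync's actual curve):

```typescript
// Distance-based gain for one device relative to a virtual listening
// source on the grid (sketch; the real attenuation curve may differ).
function gainForDevice(
  device: { x: number; y: number },
  listener: { x: number; y: number },
  refDistance = 1,  // distance at which gain is 1.0
  rolloff = 1       // how quickly volume falls off with distance
): number {
  const dx = device.x - listener.x;
  const dy = device.y - listener.y;
  const d = Math.max(refDistance, Math.hypot(dx, dy));
  // Inverse-distance model, as in Web Audio's PannerNode.
  return refDistance / (refDistance + rolloff * (d - refDistance));
}
```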
I've been working on this project for the past couple of weeks. Would love to hear your thoughts and ideas!
There are a ton of directions I can imagine you taking this in.
The household application: this one is already pretty directly applicable. Have a bunch of wireless speakers and you should be able to make it sound really good from anywhere, yes? You would probably want support for static configurations, and there's a good chance each client isn't going to be able to run the full suite, but the server can probably still figure out what to send to each client based on timing data.
Relatedly, it would be nice to have a sense of "facing" for the point on the virtual grid and adjust 5.1 channels accordingly, automatically (especially left/right). [Oh, maybe this is already implicit in the grid - "up" is "forward"?]
The party application: this would be a cool trick that would take a lot more work. What if each device could locate itself in actual space automatically and figure out its sync accordingly as it moved? This might not be possible purely with software - especially with just the browser's access to sensors related to high-accuracy location based on, for example, wi-fi sources. However, it would be utterly magical to be able to install an app, join a host, and let your phone join a mob of other phones as individual speakers in everyone's pockets at a party and have positional audio "just work." The "wow" factor would be off the charts.
On a related note, it could be interesting to add a "jukebox" front-end - some way for clients to submit and negotiate tracks for the play queue.
Another idea: account for copper and optical cabling. The latency issue isn't restricted to the clocks you can see. Adjusting audio timing for long cable runs matters a lot in large areas (say, a stadium or performance hall), but it can still matter in house-sized settings too, depending on how the speakers are wired. For a laptop speaker there's no practical offset between the clock's time and the time at which sound plays, but if the audio output is connected to a cable run, it would be nice, and probably not very hard, to add a static timing offset for the physical layer associated with a particular output (or even channel). It might even be worth calculating it for the user. (This speaker is 300 feet away from its output through X meters of copper; figure out my additional latency offset for me.)
About 0.3 microseconds for a run like that. The period of a wave at 20 kHz (very roughly the highest pitch we can hear) is 50 microseconds. So: more or less insignificant.
Cable latency is basically never an issue for audio. Latency due to speed of sound in air is what you see techs at stadiums and performance halls tuning.
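To put rough numbers on both effects (a sketch; the 0.7 velocity factor is an assumed typical value for copper cable, and real values vary):

```typescript
// Rough propagation delays: electrical signal in cable vs sound in air.
const C = 299_792_458;  // speed of light, m/s
const SOUND = 343;      // speed of sound in air at ~20 degrees C, m/s

// Signal delay through a cable run, assuming a ~0.7 velocity factor.
const cableDelaySec = (meters: number, velocityFactor = 0.7) =>
  meters / (velocityFactor * C);

// Acoustic delay through the same distance of air.
const airDelaySec = (meters: number) => meters / SOUND;

// ~91 m (300 ft) of cable: well under a microsecond.
// The same 91 m through air: roughly a quarter of a second.
```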
Do you suppose there exists some other reason for that, like maybe matching impedance on each cable, or is this likely one of those superstitions that audiophiles fall prey to?
The instant you start having wireless speakers (e.g. Bluetooth) or any sort of significant delay between commanding playback and the actual sound coming out, the latency becomes audible.
If you support mic input, you can let the user select one device as the "nexus" with mic recording on. Then you tell each device in the setup to "chirp" at exactly the same time, but at different frequencies. From that you can derive each device's "local delay" and compensate.
This lets you tune the surround setup to full accuracy for a given point in space, and it takes care of ring-buffer differences, wireless transfers of non-tethered speakers, etc.
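Once the nexus has detected when each frequency actually arrived, turning that into per-device corrections is simple arithmetic (a sketch; the chirp detection itself, e.g. FFT peak-picking, is omitted, and all names are illustrative):

```typescript
// Chirp-based delay estimation (sketch). Assumes each device was told
// to emit its chirp at the same shared-clock time, and the nexus has
// already detected the arrival time of each frequency in its recording.
interface ChirpArrival {
  deviceId: string;
  scheduledTime: number; // shared-clock time the chirp was commanded
  detectedTime: number;  // shared-clock time it was heard at the nexus
}

// Per-device playback delay relative to the fastest device; subtract
// these from future scheduling times to align everyone at the nexus.
function playbackDelays(arrivals: ChirpArrival[]): Map<string, number> {
  const delays = new Map<string, number>();
  // Use the earliest arrival as the reference so corrections are >= 0.
  const reference = Math.min(
    ...arrivals.map(a => a.detectedTime - a.scheduledTime));
  for (const a of arrivals) {
    delays.set(a.deviceId, (a.detectedTime - a.scheduledTime) - reference);
  }
  return delays;
}
```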
An OSS app with the ability to sync everyone up over mobile or wifi, on Android or iOS with BYO headphones, would be incredible. This should be a thing :)
Someone brought up the idea of an internet radio, which I thought was cool: you could see a list of all the rooms people are in and tune in to exactly what they're jamming to.
How can you guarantee that? NTP fails to guarantee that all clocks are synced inside a datacenter, let alone across an ocean (Did not read the code yet)
EDIT: The wording got me. "Guarantee" & "Perfect" in the post title, and "Millisecond-accurate synchronization" in the README. Cool project!
Going off on a tangent: back in the days of Live Aid, they tried doing a transatlantic duet. It turns out to be physically impossible: if A sings when they hear B, then B hears A at least 38 ms too late, which is too much latency for humans to still make music together.
They're doing a smarter thing by streaming; I don't do any streaming right now.
The upside is that Beatsync works in the browser: just a link, no setup required.
Just to share a couple of similar/related projects in case useful for reference:
http://strobe.audio multi-room audio in Elixir
https://www.panaudia.com multi-user spatial audio mixing in Rust
Needed on server and clients is an override to a) fix my domain users having the same cookie if it's stored in the default location and b) make sure the server only starts when the network is REALLY up; the normal network-online target is a system service only, so you cannot check for it from a user service. In my case the server runs under a domain user's profile.
~/.config/systemd/user/pipewire-pulse.service.d/override.conf
~/.config/systemd/user/user-network-wait.service
Server PulseAudio, not needed but very useful:
/etc/pipewire/pipewire-pulse.conf.d/50-networkparty.conf
Needed (note how to make sure s16le is used across all devices to keep conversion to a minimum, and how to name the sink somewhat sanely):
/etc/pipewire/pipewire-pulse.conf.d/70-rtp-sender-sink.conf
/etc/pipewire/pipewire-pulse.conf.d/71-rtp-sender-23912611.conf
You can play to the sink, e.g. in mpd.
Client PulseAudio:
/etc/pipewire/pipewire-pulse.conf.d/71-rtp-receiver.conf
You can play with latency_msec; journalctl will tell you the lowest fragment if you just put 0 or 1 ms here. It needs to be a multiple of that minimum, so just experiment. I'm fine with this even though 12 ms would also work on my LAN, but it's more stable across the wifi bridge. The sap_address on the client may work to select the right multicast address, even though it's actually for the SAP announcements, but don't count on that; I have not tested multiple streams so far and would not use "magic" solutions like SAP on the server (they didn't work in my case and seem to be pipewire-only). Right now the client seems to pick the right stream; experiment ;)
The sink in my case is a module-combo; just check with pactl list sinks which sink you want the stream to play on. Note that this is not some application you can dynamically assign to other sinks!
For LAN, if you run OpenWrt, just enable igmp_snooping and multicast_querier on the software bridge (LuCI -> Network -> Interfaces -> Devices tab), and maybe "Multicast to Unicast" in your wifi advanced settings. I don't use this myself, though, as my wifi is on another VLAN or WDS-bridged, so I mostly stay out of these problems.
There are more advanced setups possible with OpenWrt, including working igmp_snooping on the hardware switch; if you are interested, check my documentation (in German) on Krei.se, as I will write a guide for this sometime lol (or just ask me by DM). It's possible to run this ms-exact with a clean network in any case; there is no need to install extra software or clog unused ports with multicast traffic. If you get this right, the music will flow like water through your LAN, only where it's needed.
But Poettering software (systemd/PulseAudio) is quite composable, so even though there is a learning curve, the alternative is 20k-line config-file monoliths.
Still, this even turns rooted LG TVs and cheap Raspi Picos into sinks.
The only latency I have now is the bass traveling slower than the treble lol
First, I do clock synchronization with a central server so that all clients can agree on a time reference.
Then, instead of directly manipulating the hardware audio ring buffers (which browsers don't allow), I use the Web Audio API's scheduling system to play audio in the future at a specific start time, on all devices.
So a central server relays messages from clients, telling them when to start and which sample position in the buffer to start from.
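Putting those pieces together, the scheduling math each client would run looks roughly like this (a sketch with illustrative names, not the actual Beatsync code):

```typescript
// Converting a server-scheduled start time into a local
// AudioContext time (sketch; names are illustrative).
//
// clockOffset is the estimated (serverTime - localTime) from the
// NTP-style handshake. audioCtx.currentTime and the local wall clock
// tick at (nearly) the same rate, so one anchor pair is enough.
function localAudioTime(
  serverStartTime: number, // when the server wants playback to begin
  clockOffset: number,     // serverTime - localTime, seconds
  localNow: number,        // e.g. performance.now() / 1000
  audioNow: number         // audioCtx.currentTime at the same instant
): number {
  const localStart = serverStartTime - clockOffset; // server -> local clock
  return audioNow + (localStart - localNow);        // local -> audio clock
}

// Every client then calls source.start(when, offsetIntoBuffer) with
// the same effective instant: source.start(localAudioTime(...), pos).
```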