I've noticed an interesting feature in Chrome and Chromium: they seem to isolate internal audio from the microphone input. For instance, when I'm on a Google Meet call in one tab and playing a YouTube video at full volume in another tab, the video’s audio isn’t picked up by Google Meet. This isolation doesn’t happen if I use different browsers for each task (e.g., Google Meet on Chrome and YouTube on Chromium).
Does anyone know how Chrome and Chromium achieve this audio isolation?
Given that Chromium is open source, it would be helpful if someone could point me to the specific part of the codebase that handles this. Any insights or technical details would be greatly appreciated!
Within a single process, or a tree of processes that can cooperate, this is straightforward to do (modulo the actual audio signal processing, which isn't): keep the last few hundred milliseconds of what you're playing around, compare it to what you're getting from the microphone, find correlations, cancel.
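In code, that looks something like the following minimal sketch of an NLMS adaptive filter, the textbook building block for this kind of cancellation. All names and parameters here are illustrative, and a production AEC (like the WebRTC one browsers ship) adds delay estimation, double-talk detection, and nonlinear suppression on top:

```typescript
// Minimal NLMS echo-cancellation sketch. Illustrative only: a real
// AEC also estimates the playback-to-mic delay and handles double talk.

const TAPS = 1024;   // filter length: how much echo tail we can model
const MU = 0.5;      // adaptation step size
const EPS = 1e-6;    // regularizer so quiet passages don't divide by ~0

const w = new Float32Array(TAPS);       // adaptive filter coefficients
const farEnd = new Float32Array(TAPS);  // recent loudspeaker samples

// Given the sample we just played and the sample the mic just captured,
// return the mic sample with the estimated echo removed.
function processSample(played: number, mic: number): number {
  // Shift the loudspeaker history and insert the newest sample.
  farEnd.copyWithin(1, 0, TAPS - 1);
  farEnd[0] = played;

  // Predict the echo as the filter's output on the far-end history.
  let echoEstimate = 0;
  let energy = EPS;
  for (let i = 0; i < TAPS; i++) {
    echoEstimate += w[i] * farEnd[i];
    energy += farEnd[i] * farEnd[i];
  }

  // Ideally the residual is just the near-end talker.
  const err = mic - echoEstimate;

  // NLMS update: nudge the coefficients toward a better echo estimate,
  // normalized by far-end energy so loud playback doesn't destabilize it.
  const step = (MU * err) / energy;
  for (let i = 0; i < TAPS; i++) {
    w[i] += step * farEnd[i];
  }
  return err;
}
```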
If the processes aren't related, there are multiple ways to do this. The OS may provide a capture API that does the cancellation itself, since the OS knows what is being output; this is what happens on macOS for Firefox and Safari, and it's often available on mobile as well.
Sometimes (Linux desktop, Windows) the OS provides a loopback stream: a way to capture the audio that is being played back, and that can similarly be used for cancellation.
If none of this is available, you mix the audio output and perform the cancellation yourself, and the behaviour you observe happens.
Source: I do this, but at Mozilla, where we unsurprisingly have the same problems and solutions.
>The missile knows where it is at all times. It knows this because it knows where it isn't. By subtracting where it is from where it isn't, or where it isn't from where it is (whichever is greater), it obtains a difference, or deviation
https://knowyourmeme.com/memes/the-missile-knows-where-it-is
Here's a short historical interview with Harold Black from AT&T on his discovery/invention of the negative feedback technique for noise reduction. It's not super explanatory, but it gives nice historical context: https://youtu.be/iFrxyJAtJ7U?si=8ONC8N2KZwq3Jfsq
Here's a more in-depth circuit explanation: https://youtu.be/iFrxyJAtJ7U?si=8ONC8N2KZwq3Jfsq
IIRC the issue was AT&T was trying to get cross-country calling, but to make the signal carry further you needed a louder signal. Amplifying the signal also amplified the distortion.
So Harold came up with this method, which ultimately reduced the distortion enough to allow calls to cross the country within the power constraints available.
For some reason I recall something about transmission, with Denver being the cutoff point before the signal was too degraded... but I'm too old and forgetful, so I could be misremembering something I read a while ago. If anyone has more specific info/context/citations, that'd be great, since this is just hearsay from memory, but I think it's something like this.
The OS doesn't have more information about this than applications do, and it's not obvious that an application wants the OS to fuck around with the audio input it sees. Even in the applications where this might seem like the obvious default behavior, it isn't: most listeners don't use loudspeakers at all, and this is not a problem when they wear headphones. And detecting that (or whether the input is even a microphone at all) is not straightforward.
Not all audio applications are phone calls.
Some do.
But you need to have a strong-handed OS team that's willing to push everybody towards their most modern and highly integrated interfaces and sunset their older interfaces.
Not everybody wants that in their OS. Some want operating systems that can be pieced together from myriad components maintained by radically different teams, some want to see their APIs/interfaces preserved for decades of backwards compatibility, some want minimal features from their OS and maximum raw flexibility in user space, etc.
This is the way things usually work in the Free Software world. For example: need JPEG support? You'll probably end up linking to libjpeg or an equivalent. Most languages have a binding to the same library.
Is that part of the OS? I guess the answer depends on how you define OS. On a Free Software platform it's difficult to say when a given library is part of the OS and when it is not.
That's macOS, of course, but in my experience Windows is much more trusting about what it gives applications access to, so I suppose the same thing is available there.
https://docs.pipewire.org/page_module_echo_cancel.html
https://wiki.archlinux.org/title/PipeWire/Examples#Echo_canc...
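For a rough idea of the shape, a config fragment like this loads the module (key names per the module docs above, but treat the exact values, the file path, and the node names as assumptions to check against your PipeWire version):

```
# ~/.config/pipewire/pipewire.conf.d/99-echo-cancel.conf (illustrative path)
# Creates a virtual source whose capture has the playback signal
# cancelled out, plus a matching virtual sink to play through.
context.modules = [
    {   name = libpipewire-module-echo-cancel
        args = {
            # Default AEC implementation: the WebRTC one, i.e. the same
            # family of code the browsers themselves ship.
            library.name = aec/libspa-aec-webrtc
            source.props = { node.name = "Echo Cancel Source" }
            sink.props   = { node.name = "Echo Cancel Sink" }
        }
    }
]
```

Point your call application's input at "Echo Cancel Source" and its playback at "Echo Cancel Sink", and the module should take care of the rest.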
If you're still on pulseaudio for some reason, it ships with a similar module named "module-echo-cancel":
https://www.freedesktop.org/wiki/Software/PulseAudio/Documen...
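Loading it at runtime for a quick test looks something like this (module arguments per the PulseAudio docs; the source/sink names are arbitrary choices):

```
# Create an echo-cancelled source/sink pair using the WebRTC canceller.
pactl load-module module-echo-cancel \
    aec_method=webrtc \
    source_name=echocancel_source \
    sink_name=echocancel_sink
# Then point the application's input at echocancel_source and its
# output at echocancel_sink.
```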
Can't tell you anything else due to NDAs.
(I realize this situation isn't up to you and I appreciate that you chimed in as you could!)
When I worked at Mozilla, most stuff was open, but I still couldn't talk about stuff publicly because I wasn't a spokesperson for Mozilla. Same at OpenDNS/Cisco, or at Fastly, and now at Amazon. Lots of stuff I can talk about, but I generally avoid threads and comments about Amazon, or if I do, it's strictly to reference public documentation, public releases, or that sort of thing.
It's easier to simply not participate, link a document, or say no comment than it is to cross-reference what I might say against what's public and what's not.
https://news.ycombinator.com/item?id=39669626
> I've been working on an audio application for a little bit, and was shocked to find Chrome handles simultaneous recording & playback very poorly. Made this site to demo the issue as clearly as possible
https://chrome-please-fix-your-audio.xyz/
> <he...@google.com>
> Status: Won't Fix (Intended Behavior)
> Looking at the sample in https://chrome-please-fix-your-audio.xyz, the issue seems to be that the constraints just aren't being passed correctly [...]
> If you supply the constraints within the audio block of the constraints, then it seems to work [...]
> See https://jsfiddle.net/40821ukc/4/ for an adapted version of https://chrome-please-fix-your-audio.xyz. I can repro the issue on the original page, not on that jsfiddle.
https://issues.chromium.org/issues/327472528#comment14
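Concretely, the difference that comment describes is between these two shapes of getUserMedia call (a sketch; the jsfiddle above is the authoritative repro, and the specific constraint keys here are just illustrative processing constraints):

```typescript
// Works: the processing constraints live inside the audio block, so
// the browser applies them to the audio track.
async function getRawMic(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: false, // illustrative: request an unprocessed mic
      noiseSuppression: false,
      autoGainControl: false,
    },
  });
}

// Reportedly ignored: the same keys at the top level of the constraints
// object are not valid members of MediaStreamConstraints.
// navigator.mediaDevices.getUserMedia({
//   audio: true,
//   echoCancellation: false,
// } as MediaStreamConstraints);
```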
It's a fairly common problem in signal processing, known as acoustic echo cancellation [1], and it comes up in "simple" devices like telephones too.
[1] https://www.mathworks.com/help/audio/ug/acoustic-echo-cancel...
I'm not aware of anyone doing echo cancellation using an analog circuit, but that doesn't mean no one did. I guess it's theoretically possible, but I don't see how the adaptation could work.
This is needed because many people don't use headphones, and if you have more than one endpoint with mic and speakers open you will get feedback galore if you don't do something to suppress it.
I'd say how audio routing comes together depends on the combination of hardware, software, and OS, each of which handles a piece of it.
Generally you have to see what's available, see how it can or can't be routed, see what software or settings could be enabled or added to introduce more flexibility in routing, and then make the audio routing work how you want.
More specifically, some datapoints:
SOUND DRIVERS: Part of this can be managed by the sound drivers on the computer. Applications like web browsers can access those settings or the list of devices available (see the sketch after these datapoints).
Software drivers can let you pick what's playing on a computer, and then specifically in browsers it can vary.
CHANNELS: There are often different channels for everything. Physical headphone/microphone jacks, etc. They all become devices with channels (input and output).
ROUTING: The input from a microphone device can be just the voice, and/or system audio. System audio can further be broken down into specific sources. OBS has some nice examples of this functionality.
ADVANCED ROUTING: There are some audio drivers that are virtual audio drivers that can also help you achieve the audio isolation or workflow folks are after.
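To make the browser-facing side of those datapoints concrete, here's a minimal sketch using the standard MediaDevices APIs (the label-matching helper is hypothetical; device labels are whatever your drivers expose, and they're only populated after a permission grant):

```typescript
// Enumerate the devices the sound drivers expose to the browser, then
// route capture from a specific microphone instead of the default one.
async function pickMic(labelHint: string): Promise<MediaStream> {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const mics = devices.filter((d) => d.kind === "audioinput");
  if (mics.length === 0) throw new Error("no audio inputs exposed");

  // Fall back to the first input if nothing matches the hint.
  const chosen = mics.find((d) => d.label.includes(labelHint)) ?? mics[0];

  return navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: chosen.deviceId } },
  });
}
```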