"As part of the calibration, the speed of sound is also a parameter which is optimized to obtain the best model of the system, which allows this whole procedure to act as a ridiculously overengineered thermometer."
Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."
Back in high school, I built (with some parental assistance) an apparatus to measure how quickly the pressure would drop (in a pressurized cylinder) when a very small hole allowed air to leak out.
Turns out, not only can you measure temperature that way, but can extrapolate the graph out to find absolute zero (IIRC my result was out by about 20 kelvin, which I think is pretty damn good for a high-school-garage project).
I love this kind of inadvertent measurement. One of my favorite examples is that a sufficiently accurate IMU can get you relatively accurate longitude measurements from the Coriolis effect.
The earth’s surface closer to the poles has less distance to travel per rotation than the surface closer to the equator. As a result, inertial navigation systems used over long distances must be corrected for it. IIRC, this is also the case for artillery firing computations.
I believe this is one of the initial steps an aircraft INS uses to find north while it is aligning, but it's been too long since I had aircraft systems theory in the front of my brain.
Similarly, diesel engines come with a reserve fuel supply that you can accidentally use once. (diesel engines will happily run on engine oil when warm)
A colleague of mine spent months analysing fluctuations in a narrow-band signal from a geophone, only for a more senior colleague to get fed up with it and demonstrate that the fluctuations simply correlate with the air temperature, and do so within the spec sheet's reported temperature tolerance.
I first encountered it in Elecia White's book Making Embedded Systems, but the attribution is anonymous and whom it's attributed to may have heard it elsewhere.
The highest-grade gauge blocks from Mitutoyo, which are measured by laser interferometry, come with a measured coefficient of thermal expansion AND an uncertainty for that coefficient. And they have a size variance of plus or minus 30nm. That is only about 410 oxygen atoms.
A lot of people like myself consider heat a form of light, but I guess a photographer would be thinking only of visible light. They say that about 50% of the sun's light emissions come in the infrared frequencies.
I’m not sure how the speed of sound could depend on altitude, even in principle. The air doesn’t know where it is!
Putting that aside, in an ideal gas, the speed of sound depends on the composition of the gas and the temperature and, interestingly, does not depend on pressure, and pressure is the main way that the altitude would affect the speed of sound. So measuring the speed of sound in air actually makes for a pretty good thermometer.
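To put rough numbers on that: a minimal sketch of the ideal-gas relation c = sqrt(γ·R·T/M) and its inverse, which is what turns a speed-of-sound measurement into a thermometer. The constants for dry air and the function names are my own assumptions for illustration, not anything taken from the original project.

```python
import math

# Assumed constants for dry air, treated as an ideal gas.
GAMMA_AIR = 1.4        # adiabatic index (ratio of specific heats)
R_GAS = 8.314462618    # universal gas constant, J/(mol*K)
M_AIR = 0.0289647      # molar mass of dry air, kg/mol


def speed_of_sound(temp_kelvin):
    """Ideal-gas speed of sound in m/s. Note that pressure does not appear."""
    return math.sqrt(GAMMA_AIR * R_GAS * temp_kelvin / M_AIR)


def temperature_from_speed(c_m_per_s):
    """Invert the relation: a measured speed of sound gives temperature in K."""
    return c_m_per_s ** 2 * M_AIR / (GAMMA_AIR * R_GAS)
```

For example, `speed_of_sound(293.15)` comes out at roughly 343 m/s, and because c scales with the square root of T, a 1% error in the fitted speed of sound corresponds to about a 2% error in absolute temperature.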
Right, it gets even worse: Air pressure is not only altitude-dependent but fluctuates even at constant altitude. The pressure (altitude) dependence is comparatively weak, though.
I once did a project to do multilateration of bats (the flying mammal) using an array of 4 microphones arranged in a big Y shape on the ground. Using the time difference of arrival at the four microphones, we could find the positions of each bat that flew over the array, as well as identify the species. It was used for an environmental study to determine the impact of installing wind turbines. Fun times.
Reminds me of Intellectual Ventures' Photonic Fence, developed to track and kill mosquitoes with short laser pulses.
As a side-effect of the precision needed to spatially locate the mosquitoes, they could detect different wing beat frequencies that allowed target discrimination by sex and species.
I did a similar project at 18. Needless to say I didn't have enough HW and SW skills to do much since I implemented the most naive form of the TDOA algorithms as well as the most inefficient way of estimating the time difference through cross correlation. I still learnt a lot and it led me to eventually getting a PhD in SAR systems, which are actually beamformers using the movement of the platform instead of an array
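For anyone curious what the naive version looks like, here is a rough sketch of estimating the time difference between two microphones by direct cross-correlation. The function name and sign convention are my own choices for illustration; `np.correlate` is the plain O(N²) time-domain form, which is the inefficient approach described above.

```python
import numpy as np


def estimate_delay(sig_a, sig_b, sample_rate_hz):
    """Estimate how many seconds sig_b lags behind sig_a.

    A positive result means the sound reached microphone B later.
    Uses direct (time-domain) cross-correlation, so cost grows as O(N^2).
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of the 'full' output corresponds to a lag of -(len(sig_a) - 1).
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / sample_rate_hz
```

An FFT-based correlation (for instance `scipy.signal.correlate` with `method="fft"`) gives the same peak far faster on long recordings, and the resolution of this simple version is limited to one sample period.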
What were the results of your study? I’ve heard that bat lungs are so sensitive that when they fly across the pressure differential of large turbines their capillaries basically explode
Yes basically. Bird lungs are relatively rigid, open at both ends like a tube, and have a one-way flow of air, so they are less prone to pressure-related injuries. Bat lungs are mammalian lungs that expand and contract as they breathe just like us, so they are particularly vulnerable to barotrauma near wind turbines.
After writing a bunch of MATLAB code to find the bats, I handed it off and haven't heard back about whether they actually built the wind turbines or not.
I would love to do something like that to track the bats in my garden, how feasible would it be for an amateur to do as a personal project?
Any good references on where to start?
A nice example here is the outstanding and quiet work of the Cosys-Lab at the University of Antwerp. They once put a microphone array below a scorpion and showed how bats moved their ultrasonic beam to scan for it. Incredible stuff [0].
Here's the report [1], written when I was a second year undergrad in 2010.
It's very basic. The species identification is based on matching contours of the spectrogram against some template contour. The multilateration was, embarrassingly, done by brute force by generating a dense 3D grid. At the time, I didn't have any knowledge of Kalman filters or anything that could have been helpful for actually tracking the bats.
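A minimal sketch of what brute-force multilateration over a dense 3D grid can look like, in Python rather than the original MATLAB. This is not the code from the report; the function name, the assumed 343 m/s speed of sound, and the least-squares error measure are all my own illustrative choices.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def locate_by_grid(mics, tdoas, grid_points, c=SPEED_OF_SOUND):
    """Pick the grid point whose predicted TDOAs best match the measured ones.

    mics:        (M, 3) microphone positions in metres
    tdoas:       (M-1,) arrival time at mic i minus arrival time at mic 0, in s
    grid_points: (G, 3) candidate source positions
    """
    # Distance from every candidate point to every microphone: shape (G, M).
    dists = np.linalg.norm(grid_points[:, None, :] - mics[None, :, :], axis=2)
    # Predicted time differences relative to microphone 0: shape (G, M-1).
    predicted = (dists[:, 1:] - dists[:, :1]) / c
    err = np.sum((predicted - tdoas[None, :]) ** 2, axis=1)
    return grid_points[np.argmin(err)]
```

With all microphones on the ground the geometry is mirror-symmetric about the ground plane, so the candidate grid should only cover points above it. Accuracy is bounded by the grid spacing, which is why a coarse search followed by a local refinement is a common next step.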
Honestly, that sounds like amazing work. I wish I could afford to get out of enterprise software engineering and just do academic software development like that.
Then you can take a Jetson (or any I2S-capable hardware with a DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program compared to an FPGA setup.
(OP here) tverbeure hit most of the main points, but mostly cost ($2/mic vs $0.5/mic adds up when there are 192 microphones) and the difficulty of finding things with enough I2S interfaces (even with 16-way daisy chaining, that's still more than most/all things will have). The FPGA/custom hardware was part of the fun as well!
Yeah, I've also had difficulty finding something with enough I2S. It was a while back and I've used Sprocket carrier for Jetson TX2 - it had 6 lanes, so up to 96. It was for a SODAR application, so the sampling frequency was not that critical and to me it felt like the perfect trick to make an array with off-the-shelf hardware. So I was just curious, if this was something you've considered.
For something indoors, yes, I can see how low sampling frequency gets very limiting. And 192 microphones, that's really pushing it. Love it.
The $2/mic vs $0.5/mic argument is a fun one. You've obviously poured an enormous amount of engineering in there, involving PCB design, FPGA and network programming, writing custom CUDA kernels, signal processing, PyTorch, the list goes on. And you've had a 4090 plugged in your PC in 2023. Classic hobbit in a mithril vest ;)
Not OP, but I looked in to this a few years ago. It was more expensive then, and only went to 20 kHz. Higher frequencies are helpful if you're listening for the hiss of leaking gas, or corona discharge of an electric arc.
The Orin has 6xI2S ports internally, so that would work up to 16*6 = 96 microphones, which is a good number. But it looks like maybe only 3 are brought out & on different dev board connectors [1]? As with a lot of design, the devil is in the details. An FPGA could be easier to configure if you need more than 96 microphones.
I've considered making a phased array myself, but never got around to sending out the PCB. But here are two reasons why I2S is not the best option:
* I2S requires 3 instead of the 2 pins of PDM. However, in the datasheet that you provided, it shows how you can daisy-chain microphones which is really cool (even if not standard I2S.) So that argument goes away.
* PDM gives you access to way higher sample rates, which in turn gives you more flexibility in choosing the delay for a delay-and-sum operation. For example, if the PDM clock is 2MHz, you could theoretically delay with a precision of 0.5us. In practice, you'll do that with lower precision, but with I2S, the clock will typically max out at 192kHz.
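To make the delay-precision point concrete, here is a rough sketch of delay-and-sum with whole-sample delays, where the smallest delay step is one sample period (0.5 µs at 2 MHz, about 5.2 µs at 192 kHz). The function names are hypothetical, and `np.roll` wraps samples around the ends of the buffer, which is tolerable for a sketch but not for real streaming audio.

```python
import numpy as np


def delay_resolution_s(sample_rate_hz):
    """Smallest delay step available when shifting by whole samples."""
    return 1.0 / sample_rate_hz


def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Delay each channel by the requested amount and average them.

    channels: (M, N) array, one row per microphone
    delays_s: per-channel delays in seconds, rounded to whole samples
    """
    shifts = np.round(np.asarray(delays_s) * sample_rate_hz).astype(int)
    out = np.zeros(channels.shape[1])
    for channel, shift in zip(channels, shifts):
        out += np.roll(channel, shift)  # circular shift: a simplification
    return out / len(channels)
```

Signals arriving from the steered direction line up and add coherently, while everything else averages down. The coarser the sample period, the larger the rounding error in each delay.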
1) and 3) are valid, but 2) isn't really. In that sort of pipeline, you usually do IQ sampling which allows you to phase-shift by any arbitrary value with a complex multiplication.
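A small sketch of that idea, assuming a narrowband signal around a known centre frequency: once you have complex (IQ) samples, a delay of any size, including a fraction of a sample, is approximated by a single complex multiplication. The function name and parameters are my own for illustration.

```python
import numpy as np


def phase_shift_iq(iq_samples, delay_s, center_freq_hz):
    """Approximate a time delay on a narrowband IQ signal as a phase rotation.

    Exact for a pure tone at center_freq_hz; for wider bandwidths the
    approximation degrades away from the centre frequency.
    """
    return iq_samples * np.exp(-2j * np.pi * center_freq_hz * delay_s)
```

For a 40 kHz tone sampled at 192 kHz, a 1.3 µs delay (a quarter of a sample period) is just a rotation by about 18.7 degrees, so the sample rate no longer limits steering precision.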
Look up acoustic cameras on YouTube, there are some pretty impressive demonstrations of their capability. This is one of the companies I've been watching for a while, but it looks like FLIR and some other big names are getting into it: https://www.youtube.com/@gfaitechgmbh
The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.
I am very interested in how small these arrays can be. From talking with a friend with cochlear implants, I would assume this could help dramatically with the right signal processing to help him hear.
Armchair comment. I would LOVE to be a grad student again and try to pair it with ultrasound speaker arrays, for medical applications. Essentially a super HIFU (High-Intensity Focused Ultrasound) with live feedback. https://en.wikipedia.org/wiki/Focused_ultrasound
I'm doing my PhD in in-air ultrasound with phased arrays, and I talk to the medical guys at conferences and at the labs we work with, and it's soooo much harder in solids/liquids. The frequency is significantly higher, think 1-10 MHz instead of like 40 kHz, so any normal electronics are out the window.
One problem is that the speed of sound is not constant (or approximately constant) across the bandwidth you're interested in when the sound wave is traveling through solids and liquids.
I would love to see this come to our various mobile devices in a nicely packaged form. I think part of what is holding back assistants, universal-translators, etc, is poor audio. Both reducing noise and being able to detect direction has a huge potential to help (I want to live-translate a group conversation around a dining table, for example).
Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).
For the hard of hearing like me the killer application would be live transcription in a noisy setting like a meetup or party, with source separation and grouping of speech from different speakers. Could be life-changing.
(Android's Live Transcribe is very good now but doesn't even try to separate which words are from different speakers.)
* Automatic speech recognition (ASR) systems have progressed to the point where humans can interact with computing devices using speech. However, the distance between a device and the speaker will cause a loss in speech quality and therefore impact the effectiveness of ASR performance. As such, there is a greater need to have reliable voice capture for far-field speech recognition. The launch of Amazon Echo devices prompted the use of far-field ASR in the consumer electronics space, as it allows its users to interact with the device from several meters away by using microphone array processing techniques.*
This is known as the Cocktail Party Problem. It turns out our brains do an incredible amount of processing to allow us to understand a person talking to us in a noisy room.
In general the position of the microphones in space must be known precisely for the phase-shifting math to be done well, and the clocks on the phones would also need to be in sync at high precision, like 10x the highest frequency sound you're picking up. In other words, within tens of thousandths of a second. Also, if the array mic locations are not a simple straight line, circle, or other simple geometry, the computer code (i.e. the math) to milk out an improved signal becomes very difficult.
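As a back-of-the-envelope check on the synchronization requirement, this sketch converts a clock offset between two devices into the phase error it causes at a given frequency, and into the equivalent error in microphone position. The function names and the 343 m/s figure are my own assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def phase_error_degrees(freq_hz, timing_error_s):
    """Phase error caused by a clock offset, for a tone at freq_hz."""
    return 360.0 * freq_hz * timing_error_s


def equivalent_position_error_m(timing_error_s, c=SPEED_OF_SOUND):
    """Microphone position error that would produce the same phase error."""
    return c * timing_error_s
```

A 1 ms offset is a full cycle at 1 kHz, the same as misplacing a microphone by about 34 cm, whereas a 10 µs offset costs only 3.6 degrees at 1 kHz.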
10ms? That's a very long time. Phone clocks are much more accurate than that because they're synced to the atomic clocks in cell towers and GPS satellites.
Hell even NTP can do 1ms over the internet. AFAIK the only modern devices with >10ms inaccurate clocks by default are Windows desktops. I complained about that before because it screwed up my one-way latency measurements: https://github.com/microsoft/WSL/issues/6310
Boeing ginned up a spherical version of these and used it on 787 prototypes to identify candidates for sound deadening material.
Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really is. And when you have a weight budget for sound dampening material it matters if you hit the 80/20 sweet spot or not.
If somebody wants to play around with Zynq 7010's - have a look at the EBAZ4205 board. They can be bought from Aliexpress (20-30€). These are former Bitcoin Mining controllers.
Some people reverse engineered the entire thing. It can be found in GitHub. And there's an adapter plate available for getting to the GPIOs.
For a less complex entry there are also Chinese FPGAs ("Sipeed" boards, which use a GoWin FPGA). They are quite capable and the IDE is free.
A corollary that's one of my rules to live by: Never measure anything over time without also measuring the ambient temperature.
https://www.oxts.com/blog/going-round-circles-earth-rotation...
https://www.britannica.com/science/latitude
https://youtu.be/zsA3X40nz9w?si=oGg2wdUlLXSDxpsN
I wanna say that’s a Bob Pease quote but I can’t find an attribution to it.
https://en.wikipedia.org/wiki/Speed_of_sound
[0]: https://www.youtube.com/watch?v=57ScSPWhGqU
[1] https://daniel.lawrence.lu/public/bat-report.pdf
As opposed to?
I understand that the ICS-52000 is relatively low cost ($2/100pcs) and there are even breakout boards available with 4 microphones, which can be chained to 8 or 16, like https://www.cdiweb.com/datasheets/notwired/ds-nw-aud-ics5200...
My notes:
ICS-52000 $3.50, 20 kHz
ICS-41350 $1.05, 40 kHz
SPH0641LU4H-1 $1.45, 80 kHz+
[1] https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide...
* I2S requires 3 instead of the 2 pins of PDM. However, in the datasheet that you provided, it shows how you can daisy-chain microphones which is really cool (even if not standard I2S.) So that argument goes away.
* PDM gives you access to way higher sample rates, which in turn gives you more flexibility in choosing the delay for a delay-and-sum operation. For example, if the PDM clock is 2MHz, you could theoretically delay with a precision of 0.5us. In practice, you'll do that with lower precision, but with I2S, the clock will typically max out at 192kHz.
* PDM microphones tend to be cheaper.
https://assets.amazon.science/da/c2/71f5f9fa49f585a4616e49d5...
https://en.wikipedia.org/wiki/Cocktail_party_effect?wprov=sf...
I solved that problem by RTFM and toggling some settings until I got the same accuracy as Linux: https://learn.microsoft.com/en-us/windows-server/networking/...
Anyway I dunno why the math would be too complicated, GPUs are great at this kind of signal processing