I'm not the original author; however, ages ago I invented what's now being called the "acoustic camera". (Specifically, the SOTA on the math side for precision and accuracy.)
The resolution is fine enough that with COTS parts, I can record my signature simply by sketching it out with my fingernail on a table.
Every few years I dust this off and play with it, wondering if there's some application or other way to "turn this into money" (an increasing concern in the coming months... <tiny>PLZ HIRE ME</tiny>), but I'm not a "product guy".
I'll answer some questions about the technology, but would really love to know if anyone here has advice on somehow using this achievement to pay rent. :)
Videoconferencing applications are implementing this in conjunction with face detection to "steer" wide angle cameras into "talking head" shots.
The current tech is spooky trash, parlor trick-quality from what I've used. Every time we use some of the automatic gizmos in conference rooms, we get tickets to make it stop.
Pick your favorite top video conferencing platform or camera maker and they'll want to improve what they have. The jump from "creepy" to "just works" is a big one.
I am an acoustics consultant who designs buildings with architects, then sees them through construction. Doors intended to isolate noisy rooms regularly underperform, whether due to manufacturing or installation problems. Lots of fingerpointing when we call it out on project sites, and having a camera show the weaknesses due to the perimeter gasketing, frame, door leaf, or wall construction surrounding the door would provide the necessary visual for contractors to see the problems we are pointing out.
I wonder how many building projects engage acoustic engineers (unless sound is specific to the use, like for theatres or conference facilities)?
I had the "opportunity" to be a patient in a very new hospital wing some 7 years ago, for about 13 days. While I had a private room, the door was always open to the walkway, I guess as is normal to allow quick response by medical staff. But at one point I really felt overwhelmed by all the external noise that I could hear from the other rooms in the ward, nurse stations, and so on. I really felt that the hard surfaces and even the angular nature of the floor layout were conspiring against me, almost focussing noise into my room. I imagine that, given enough data, you could show that patients' poorer rest prolonged their recovery period and hence increased bed occupancy and cost to the health system (important in the state-run public hospital service that is predominant in Australia).
As such, it would be nice to think hospitals, schools, and offices would include a thorough acoustic assessment, to at least allow appropriate mitigation of noise during design (before having to invoke more active measures like soft furnishings, etc.).
My response is US-centric, but most hospital projects do have an acoustics consultant involved. Improvements have been made, but not nearly enough. Challenges include the need for sound absorption to be porous, but this is at odds with cleanability. Increased focus has been on patient room doors, which are increasingly sliders. They work well when they're closed, but the challenges include getting a good bottom seal when the threshold has to accommodate beds and equipment being rolled in and out. Hospital equipment manufacturers are also improving the sounds of their equipment, away from the cheap piezo beepers.
Schools very often have acoustics reviews, although more often in cities than rural areas. Classrooms are reviewed in addition to auditoria, gyms, and common areas. Standards exist for those too.
Office buildings are hit or miss. The developer may hire us for a base building review. Tenants' architects hire us as they design their workplaces. There's a lot of push and pull to find a balance of the modern open ceiling industrial aesthetic and glass conference rooms with reasonable acoustical goals.
Working in conference rooms, people far too often are concerned about the look (big windows, natural light, great table) and less concerned about acoustics than would be reasonable. In my experience, architects think people just sit around the table and chat.
I once had a brand-new office upfit project with a "flagship room" that was a large flex space (could host a board meeting, could host a hackathon). There were three sides of glass, a concrete floor, and a metal ceiling structure.
If you clapped in the room, it could be heard for 1.5 seconds as it bounced across all those surfaces before decaying.
My bosses were furious at the microphone quality; the installer was unhelpful and bailed on us. We hired a consultant to perform an evaluation, and he told us that the room was awful, with lots of numbers (reverberation figures at different frequencies), and told us of certain products that could help in those ranges.
It is a lot easier to design rooms with acoustic features than it is to retrofit them in terrible sounding rooms.
I think you're talking about a different problem, though that one could also be helped with the right tool.
After setting up the array, one could answer questions like "where is that darn squeak coming from?!", and even characterize the undesirable noise, both spatially and spectrally. You could also measure how effective noise isolation materials are, but I don't see what the spatial information gets you.
However, if weaknesses due to "perimeter gasketing, frame, door leaf, or wall construction" result in some sort of localized noise, then a system like this would certainly pinpoint it.
Yes, this is in reference to noise passing through a door that is localizable. Typically we would play pink noise (broadband, equal energy per octave band) on one side and listen for "hotspots" on the quiet side. I would wonder about the precision of being able to tell in the image the contribution of the door frame from the perimeter gasketing.
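For anyone unfamiliar with the test signal mentioned above: pink noise has a 1/f power spectrum, which is what gives it equal energy per octave band. A minimal sketch of synthesizing it by spectral shaping (the function name is mine, and real test signals would be generated by calibrated equipment):

```python
import numpy as np

def pink_noise(n, seed=0):
    """Approximate pink (1/f) noise: shape white noise's spectrum so that
    each octave band carries roughly equal energy."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.arange(len(spectrum), dtype=float)
    freqs[0] = 1.0                      # leave the DC bin alone
    spectrum /= np.sqrt(freqs)          # 1/f power => 1/sqrt(f) amplitude
    noise = np.fft.irfft(spectrum, n=n)
    return noise / np.max(np.abs(noise))

x = pink_noise(1 << 15)
```

Summing the power spectrum of `x` over the octaves 64-128, 128-256, and 256-512 bins should give roughly equal totals, which is the defining property being exploited when listening for hotspots.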
Acoustic cameras like from Noiseless Acoustics are on the market though they seem to be marketed to industrial customers. There are similar mapping systems using a scanning mic like from Soft dB.
Ah! In that case yeah, this tech could possibly help. At least, I'd love to give it a shot! (I hope you don't mind an email from me later...)
It's sensitive enough to noise that I can pick up (and locate) the air vents in a room, even when the sound is at the threshold of hearing. Noise (pink and white), and even more so MLS (maximum length sequence) really "jumps out" (it's very obvious), well below my threshold of hearing.
There's so many interesting areas of research I've never had the time/money to fully investigate. I'd love to play with an "active" system, not just "passive", with a goal of experimentally finding modes of resonance of objects in a room.
I bet one can tell the relative contribution to noise of one physical object over another, but I don't know enough about construction to know if one could separate the door frame from the perimeter gasketing. You do need line of sight for it to work. At the least, you'd have a way to quantify the sound leak, with numbers and reproducibility.
FWIW the Noiseless Acoustics camera costs ~$18k(!)
I have absolutely no knowledge about this domain, so this idea 1) might not be viable and 2) is not a fully developed product idea, but I thought it would be fun to get your thoughts.
Could you put some sound sensors inside of some mechanical structure and use the acoustics to figure out where some physical contact is happening on the outside of the structure?
Specific Application: prosthetics devices that can - with only a few acoustic sensors - determine where the 'touch' was on the outside of the device.
If viable, it may be similarly useful for robots, or any machine in general, that needs a low-hardware (and thus low-cost) method of getting coarse tactile information around its boundary.
I would suggest asset protection. I used to work for a company (Droneshield) that would provide protection from drone-based sabotage/intelligence gathering/smuggling.
It's a big field, but if you could, for instance, set up an alert for when a noise source crosses a property border, or when something that sounds like a human comes within x meters of a particular building, I think that would be valuable. I'm no longer there, but happy to talk if you're interested.
This sort of tech was really good at tracking drones. Neat thing about drones is they put off so much waste audio, rich with information.
I know very little about drones, but, in a demo I made years ago (trying to convince someone to give me money for using this in a drone defense product), I was able to "fingerprint" different drone models, even sometimes distinguish between two different drones of the same model. As long as there's line of sight, you can sometimes "see", in the data, slight changes in the speed of the propeller, as well as the rough "shape" of the drone itself.
But for all those applications, even though I love sound, I always think to myself.... "wouldn't a pair of cheap cameras do this better?" :)
> no longer there, but happy to talk if you're interested.
> "wouldn't a pair of cheap cameras do this better?"
That's part of where they've gone (though the cameras are far from cheap) as well as RF, with some AI magic sprinkled on top.
I wouldn't suggest doing drone defense, but smaller-scale asset protection might be more approachable. There's what, 3000 local governments in the US? I'm sure a lot of them have had a tractor stolen from a road construction site. Or maybe if they lease their equipment you could sell a solution to the leasing company.
I'd definitely pay for it. I'd use it outdoors in the woods to see where birds and other critters are hiding, though I reckon the noise will pose a challenge.
There was a thread on here a while back where people would set up cameras and be able to make a motion image showing what parts moved with a regular frequency. Was being used at factories to determine points of failure via vibration. Can't find the thread now though.
Not saying it's exactly the same thing, as I don't think a video overlay was involved, but I know someone that got their PhD in this area in the 1970s and had a long career working for a U.S. military contractor doing this. The U.S. military has a significant interest in acoustic beamforming, both in the air and underwater, for obvious reasons.
Oh absolutely! Sorry, I was excited in my typing. It's not every day you see "your baby" on HN :) Everything I built was on the shoulders of giants, and lol, I didn't invent beamforming itself (of course).
The problem I have trying to find the niche application is that there's not much (at least, that I could think of) where you can have high quality audio data, but where a simple camera wouldn't work. Also, full imaging (as opposed to just tracking the largest/loudest source via TDOA) is quite different math, stupendously more computationally intensive.
Monitoring structural vibrations is also useful, and I think is an ongoing research area. I mention this because it's possible to sell it and research it at the same time.
What about synchronized cameras in different locations?
In industrial maintenance, such cameras are used to find air leaks in pneumatic systems. The model from Fluke costs something like 20k USD. Because of this super high price, I don't know anybody who has such a model. So it's worth asking other companies if they are interested.
You want to pay the rent? Don't make a product, make a service, mainly, "WHERE IS THAT ANNOYING SOUND COMING FROM? SERVICE". People will pay you to locate sources of irritating sounds.
I use handheld sound amplitude meters and spectrogram software on smartphones to track down sounds I want to eliminate. This is often for musical purposes, but also for annoying sounds. An amplitude meter can be useful for finding the source of a sound, but lower-frequency sounds are not as directional, and standing waves in rooms can make a sound appear to come from somewhere other than the source (though perhaps physically moving a sound camera around could help eliminate this ambiguity?). The spectrogram can provide evidence of a specific periodicity or frequency range that helps identify a source. However, these tools, while useful, are not sufficient all the time, and are certainly not quick to use.

I've often thought it would be really useful to have something like a gunshot detector for ordinary sounds, to be able to locate sounds both indoors and outdoors. It would help in answering questions like "is that sound a bear/dog raiding my trash?", "is the fan failing in my air purifier?", "is my air conditioner rattling something nearby?", "is my fridge transmitting a low-level hum throughout the house?", "is that engine noise a truck pulling up to my house or a truck climbing a hill on the nearby highway?", all problems I have actually faced.

Not sure if a sonic camera can deal well with low-frequency sounds, but it would certainly make it easier to deal with higher-frequency sources of noise. I'd buy one on impulse for $100, with some consideration of features at $300.
I have "inverse aptitude" at knowing what should be a product. :) Never trust me to know what would sell or not. In fact, "bet against whatever I think". ;)
In my ideal (and in the best jobs I've had in the past), someone finds me (or I find someone) who I can share a list of "cool things I've figured out how to do, but don't know the usefulness of", and that person then tells me what to build.
For your "where is that annoying sound coming from?" service, what sort of scale do you imagine, and what form factor?
A handheld consumer device with a range of ~10m which points in the direction of the loudest thing?
How much would you pay for such a device? (would love your thoughts in email)
How precise can I make it? And how do I think about optimal microphone layout?
I've been considering putting multiple microphones on one side of the house to track birds. I mainly want to isolate the audio for recognition. But setting up a PTZ camera to get good shots would be even cooler.
> multiple microphones on one side of the house to track birds.
This was one of my first at home demos! Tracking birds and cars. If you want to get started DIY and figuring it out yourself, try about 5 microphones, each 1m apart. The trick is having all the mics in line of sight to the object in question. Otherwise you're left with doing an "intersection of angles" method, which is much simpler, but horrible in terms of precision.
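The "intersection of angles" method mentioned above can be sketched in a few lines: each array reports only a bearing, and two bearings taken from known positions are intersected as rays. (The function name and geometry here are illustrative; precision in practice is limited by bearing error, exactly as noted.)

```python
import numpy as np

def intersect_bearings(p1, theta1, p2, theta2):
    """Locate a 2D source from two bearings (radians, measured from the
    +x axis) observed at known array positions p1 and p2."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1*d1 = p2 + t2*d2 for the ray parameters t1, t2.
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))
    return np.asarray(p1, float) + t[0] * d1

# Two arrays, at the origin and at (10, 0), both sighting a source at (5, 5):
src = intersect_bearings((0, 0), np.radians(45), (10, 0), np.radians(135))
print(np.round(src, 2))   # → [5. 5.]
```

Small errors in either bearing move the intersection a lot when the source is far away or the rays are nearly parallel, which is why this fallback is so much worse than full line-of-sight TDOA imaging.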
In a later demo I used a PTZ laser pointer to track the moving sound sources, so, it's at least possible!
Thanks! And do you have suggestions about how to think about extracting relatively pure sound from that point? It seems intuitive to me that it's doable, and I'm sure there must be a lot of work already done, but I'm clearly not googling the right terms.
Interesting. I'm casually familiar with Video Amplification (the approach at SIGGRAPH a decade ago IIRC), but have never implemented it myself. A really cool result, using the changes in the phase of the basis vectors over time to infer motion, without having to do dense optic flow.
I'm curious how you would combine acoustic localization in 3 space with motion amplification. I unreservedly agree that they are both "super cool", but don't see how they tie together to make something greater than the sum of their parts.
The only thing I thought of is, if two data channels (video, audio) are registered accurately enough, one could maybe combine the spatially limited frequency information from both channels for higher accuracy?
For example: voxel 10,10,10 is determined (by the audio system) to have a high amount of coherent sound with a fundamental frequency of 2khz. Can that 2khz + 10,10,10 be passed to the video system to do something.... cool? useful? If we know that sound of a certain spectral profile is coming from a specific region, is it useful to amplify (or deaden) video motion with a same frequency?
Stupid idea(?): Back-project onto some sunglasses (or corners/edge for behind), and give deaf people some basic level of sound-based situational awareness. Combine with some voice -> text tech, and you could have something pretty interesting.
Probably requires the technology to reach some tipping point. It was the same with VR and motion tracking. We've been able to do those things for nearly half a century, but it hasn't been anywhere near commercially viable until recently.
The tipping point is the availability of quality, affordable AR glasses. Until recently, AR has been too niche, so even if the acoustic camera technology is fine, the company also has to build AR glasses to go with the other part of the system. Whenever Apple comes out with AR glasses, writing an acoustic camera app will be almost trivial in comparison to also having to design the glasses. Not having to design the glasses makes acoustic camera overlay software way more commercially viable.
And since Intel, Google, Facebook, etc. keep buying startups that produce cool things and preventing them from producing more cool things (North Focals being the most recent I'm aware of), it's gonna be a while.
Epson and others are still going strong, but AR has been pigeonholed into drone piloting and industrial applications space for several years. The software ecosystem (especially compelling and usable interfaces for input and programming) is what's really lacking for broad adoption, the hardware works well and will only have incremental improvements and price reductions at this point.
Have a look at Microsoft Research's Seeing AI. It is still under development but can describe scenes and objects within, plus a bunch of other stuff such as documents, people, light, colour, currency, products etc. The app is only on Apple (no Android!) but the home page does have videos of each feature.
There are people with fear of unexpected strangers walking up behind them, frequently from PTSD. This can be crippling in outdoor walking situations. A bag/fanny-pack sized device which can tap the wearer to let them know it really is time to check behind them would find a market… of some size… at some price.
One application for those that I think might be interesting is to record a scene and retain all of the raw audio. On playback, allow people to click on parts of the image and use beamforming to focus on that part of the audio.
Does anyone know if the array used here supports timestamped samples and/or clock sync to support multiple arrays? Or is it a single 16-channel stream?
Having done some very primitive dabbling with this stuff, the DSP programming is always the most interesting part to me.
These folks are killing it with some really cool 3D scanning integration to the acoustic analysis
That would be a lot of data. Instead of a few bytes of color data per pixel per frame, you'd need thousands of samples per second per unit of spatial resolution.
Another approach to this is the Ambisonics method of capturing the directional soundfield at a point. But you'd need to use a high degree multipole expansion to get resolution anywhere closer to video.
No, you only need a few audio channels. If the algorithm can filter out what's where based on this 16- or even 4-microphone array, then so can your client when provided with those sound tracks, which are just a few kilobytes per second each. Probably all audio streams together are still less data than the video stream (given typical audio and video track quality combinations). You don't need a preprocessed stream for every point. Even if you want to keep the algorithm secret, you can have the client send the desired position to the server and do the magic there.
I think you're mostly right. If you want to be able to preserve enough bandwidth for music, you're talking about kilobytes per second of audio, but yeah, you could just store the streams from each mic in the array, so multiply the bitrate by the array size. And it's probably possible to find ways to save bandwidth with joint encoding.
Is it possible to tune this to specific frequencies to detect mosquitos? Their audio signal is pretty weak but its also a very specific frequency. This would definitely help in the hunting and killing of the little bastards.
Exactly.. the laser introduces all kinds of safety issues. Simply shining a light and pointing me in the right direction vastly increases my lethality, especially combined with one of those fly swatter electric rackets.
My cat would serve a similar role for larger insects. Her eyes would track them for me so I could locate and destroy them. Unfortunately she either cannot, or more likely chooses not to, track mosquitos.
Yes. One surprising result is that weak sounds, even some below the threshold of hearing, are easy to detect, provided you have clear line of sight and no turbulence.
If the mosquito frequency is less than half of the sampling rate of these mics, then yes. Very basically, these algorithms work by looking at the delay between each mic picking up a certain frequency, and calculating the direction of the sound wave from that.
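That delay-to-direction calculation, in its simplest two-mic far-field form, can be sketched like this (a toy illustration with a synthetic tone, not the full imaging math; all names are mine):

```python
import numpy as np

def doa_from_pair(sig_a, sig_b, fs, spacing_m, c=343.0):
    """Estimate the direction of arrival (degrees off broadside) of a
    far-field source from the time delay between two microphones."""
    # Cross-correlate to find the lag (in samples) that best aligns the two.
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)
    delay_s = lag / fs
    # Far-field geometry: delay = spacing * sin(theta) / c
    sin_theta = np.clip(delay_s * c / spacing_m, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

# Synthetic check: a 200 Hz tone arriving 30 degrees off broadside.
fs, spacing, c = 48_000, 0.5, 343.0
t = np.arange(0, 0.05, 1 / fs)
true_delay = spacing * np.sin(np.radians(30)) / c
sig_b = np.sin(2 * np.pi * 200 * t)                  # reference mic
sig_a = np.sin(2 * np.pi * 200 * (t - true_delay))   # delayed copy
print(round(doa_from_pair(sig_a, sig_b, fs, spacing), 1))   # → 30.0
```

A real array generalizes this to many mic pairs (and for imaging, a coherence measure over a whole volume), but the geometric core is the same delay relationship.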
A mosquito racket increases efficiency at least 10 times. If you direct the output of that camera to a VR visor you can chase and zap them in the dark.
This took me years to figure out. :) Even cooler is that you can put them in a random pattern AND have the system determine its own geometry w/o measurements.
Would work! One of the first applications I made was a processing layer returning, in spherical coordinates centered at an arbitrary reference point, what the system determined as the "primary sound source".
In demo, the two angles drove a pair of servos steering a laser pointer. Followed the loudest object around the room :)
IME, finding a way to communicate the information to the user is often non-intuitive. That is to say, once a device has located birds in trees, how would you like it to inform you?
Hm. Doable. I think the hard part for that then might be getting the real time information about the position and orientation of the viewfinder in high enough resolution.
Keep in mind "the black box" can output the position in 3-space (x, y, z, measured in mm) of coherent sound sources, but knowing where those are relative to the camera, so that one can draw a little arrow, can be hard.
I'd like to try hooking it up to a VR/AR headset, since I imagine those already handle the task of knowing precisely where my head is and where it's looking.
I think this might be possible with a phone that has AR support - you'd scan a QR code on the sound camera to capture its position relative to the world, then the phone could display a 3D view through the camera of where the sound source(s) are.
Oh that's interesting! Is your thinking something like:
1. mount the array on a tripod somewhere in the frame of the camera
2. the array is covered with an assortment of fiducials,
3. software uses the known intrinsics and extrinsics of the camera to figure out the array position relative to the camera
4. do the obvious thing with chaining transforms until you get the sound source position relative to the camera
If so, I think that would work, but would be a lot of coding to do all that CV...
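Step 4 above is mostly bookkeeping with homogeneous transforms once the CV part has done its job. A sketch with made-up poses (the fiducial solver that would produce `cam_T_array` is assumed, and the numbers are invented):

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical poses: the fiducial/CV step would give the array's pose in
# camera coordinates; the sound system reports a source in array coordinates.
cam_T_array = make_pose(np.eye(3), [0.5, 0.0, 2.0])   # array offset, 2 m ahead
source_in_array = np.array([0.1, 0.2, 1.0, 1.0])      # homogeneous coordinates
source_in_cam = cam_T_array @ source_in_array
print(np.round(source_in_cam[:3], 3))
```

Chaining more frames (world, tripod, viewfinder) is just more matrix multiplies; the hard, error-prone part is getting each pose accurately in the first place, which is the "lot of coding to do all that CV".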
Awesome article. Deserves a spot on the front page by itself! Makes me want to learn a lot more about W. L. Bragg's physics exploits, along with those of his father.
Based on my experience building corrscope, I feel this is the kind of project that will outgrow Python once you want to implement your own low-level algorithms, make it embeddable or shippable as an application, or parallelize it. I wonder what's the easiest way to port Python DSP code and UIs to a compiled/etc. language.
Not a single microphone, but there is the acoustic vector sensor, which can also give you the sound's direction. Very expensive though; several Israeli companies use them for detecting the direction of gunshots.
https://www.ese.wustl.edu/~nehorai/research/avs/avssensor.ht...
Do you know anything about these "acoustic vector sensors"?
When I first saw a popular science article about them, I got excited about incorporating them into an array, but couldn't find any technical details, just a lot of what looked like vaporware. Is it anything more than three orthogonal pressure sensors? (aka.... 3 microphones?)
Microflown makes one, it uses very small temperature differences. You can look for acoustic particle velocity sensor to find more about how they work. I can't remember the paper, otherwise I'd provide a better link.
https://www.microflown.com/products/standard-probes
That would work for amplitude-based location but this is using phase correlation to find time of flight difference to each microphone. With a fan idea you would get a lot of phase drop outs and smearing that I think would make that difficult.
Not to say it wouldn't work; you would get results, but they would be based on a different strategy.
I was interested in making my own Alexa-like device, but it seems mic arrays are sooooo expensive; the least expensive one I can find costs more than an Alexa device itself :/
The mike hardware used in the UMA-16 USB mic array [0] is the Knowles SPH1668LM4H-1 which runs about a buck and a quarter [1]. The DSP, SHARC ADSP21489, is pricier as an eval board >$500 [2].
Not that I can find! Building the array is way more expensive than it needs to be.
I have limited EE knowledge, so I have been stumbling through it on my own, building my first array out of reference microphones, another with $10 omnis from Guitar Center, and one with 8 cheap, repurposed webcams.
Right now, my limiting factor on driving the cost of a future array down is that I haven't figured out how to get a lot of I2S inputs (at least 8) into a microcontroller. If that were solved, it would be easier.
Main limitation is USB 1.1 IO, so ~1MB/s, unless you are fine with recording to SDcard then probably around 10MB/s. Pico itself can interface 29 microphones with no sweat (30 GPIOs, 2.0 GB/s internal bandwidth).
Funny you should link that! I have gotten as far as getting a raspberry pi zero to be a USB audio gadget, but I haven't played with the Pico yet. The raspi zero has one i2s input and one i2s output.
> Pico itself can interface 29 microphones with no sweat.
I... had no idea. I thought that since it didn't have an i2s peripheral I was going to have to either find a micro that did, or do something bitbangy using SPI and perhaps an external buffer. I see that it might be possible to get a few I2S inputs using the PIOs. Thanks for this, will certainly give it a shot.
(though I don't see how you're getting 29 microphones :P "prove it" ;)
The post links to an inexpensive array at the end. I don't really get why the 16-mic one he used is so expensive, those smd mics can't be more than $1 or so each...
The actual mic capsules are likely far cheaper than $1 a piece (probably closer to $0.10 than $1), but the mics in an array need to be phase-matched. The two approaches to getting phase-matched microphones are 1) building them using precision techniques so they are phase-matched from the start (which is expensive, and why pro phase-matched mics are around $1,000 each), or 2) getting a whole pile of cheap mics, testing them one-by-one (or really, pair-by-pair), and selecting the mics that are best phase-matched for the array. The #2 approach is cheaper, but does add cost.
I don't know enough about this so maybe dumb question, but couldn't you use DSP to correct phase between microphones if you knew their relative differences?
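In principle the answer seems to be yes, at least for fixed mismatches: once each mic's relative delay has been measured with a shared calibration signal, it can be compensated digitally. A minimal frequency-domain sketch of applying a fractional-sample delay correction (illustrative only; real calibration also has to handle frequency-dependent gain and phase, which a single delay number can't capture):

```python
import numpy as np

def apply_fractional_delay(x, delay_samples):
    """Delay a signal by a (possibly fractional) number of samples by
    multiplying its spectrum with a linear phase ramp (circular shift)."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    bins = np.arange(len(spectrum))
    spectrum *= np.exp(-2j * np.pi * bins * delay_samples / n)
    return np.fft.irfft(spectrum, n=n)

# To correct a mic measured to lag the reference by 10.5 samples, apply
# the opposite shift:
# aligned = apply_fractional_delay(mic_signal, -10.5)
```

Because the correction is circular, in practice you would pad the ends of each capture or work block-by-block with overlap.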
The mics are probably cheap but 16x ADC in decent quality with decent power supply and low time offset between channels? How much is a 16chn audio interface with 16 mic preamps these days?
It would be interesting to see how well this could detect non-incident sounds, for instance detecting reflective/resonant hotspots in an audio mixing/recording room.
I found that quite hard! Curious if there's anything public about your approach :)
With high enough frequencies I can see reflections, but not at any distance, and the sound source has to be loud. Of course, I'm relying on line of sight, and perfect reflection. Any bumps in the wall would add some phase error I think.
If I ever get a chance to work on the problem again I'd love to see if anything interesting can be done with multipath.
I feel like there has to be a cheaper way to do this than a $275 acoustic array. It's only 16 elements. You couldn't do this with 16 cheap microphones?
How cheap do you want them? $275/16 = $17 per microphone, or $15 per + $35 worth of additional materials to make it into an array. Or $10 mics + $115 of metal and plastic.
$275 doesn't really seem exorbitant for niche hardware given than you need 16 decently high quality microphones. I eagerly await a ShowHN using $2 mics and cardboard instead!
> given that you need 16 decently high quality microphones...
But... you don't. :) The challenge I find is getting the data into the computer. That's what always costs the most. I've done it with 8x $1 mics and a used $100 sound card.
CY7C68013A FX2LP can synchronously sample and transfer over USB 2.0 16 bits at up to 20MHZ. You can set it up with external IFCLK (5MHz minimum) and just pump data from 16 MEMS microphones to the PC all day long.
I know enough EE to do simple things, about the amount you'd expect someone who's worked as a firmware engineer to know. The fact that you're saying "Can't you ... cheap..."? makes me think this must be a viable path, though.
My hands shake too much for anything but the simplest soldering; is there a cheap FPGA board you'd recommend? And getting all the data into the computer... it could easily be ~70Mbps (16 mics, 192/24). Making a custom USB class could be a mess; I wonder how hard it would be to just dump it over Ethernet.
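For reference, the ~70Mbps figure falls straight out of mics x sample rate x bit depth:

```python
# Raw data rate for 16 microphones at 192 kHz / 24-bit, before any framing
# or protocol overhead.
mics, sample_rate_hz, bits_per_sample = 16, 192_000, 24
mbps = mics * sample_rate_hz * bits_per_sample / 1e6
print(round(mbps, 1))   # → 73.7
```

That comfortably exceeds USB full speed (12 Mbps) but fits easily inside 100 Mbps Ethernet, which is one argument for the Ethernet route.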
If you search around on Alibaba you can find some cheap dev boards that appear to have ethernet already (instead of you having to provide an ethernet PHY, which I think takes up something like 18 IOs?). Beware that even though most dev boards have a USB interface for programming it probably doesn't function as a USB for comms (i.e. you'd probably need to provide another USB PHY). Each I2S device probably requires 3 GPIOs. Someone who knows a lot more about this can probably make a much better recommendation.
If you do the beamforming on the FPGA, then you probably only need to output a lot less data, which might be easily done over a simple UART.
In the end, designing your own board with the right-size FPGA is probably the right solution, but that's only cheap per-unit in quantity and requires someone who more than sort of knows what they're doing. Though for someone who knows what they're doing it'd be a relatively quick project...
These are very commonly used in manufacturing plants to find leaks in compressed air lines. I had a Fluke vendor visit the ol' airplane factory to see if we could use their tools to find air leaks in low-pressure ECS system ducting.
How many times cheaper would a competing product need to be for you to consider buying one?
Obviously, Fluke, and the positive reputation that brand is known for, and reliable product support are worth a LOT, but I'm sure there's some $$ divisor beyond which you/someone like you would take a risk on something substantially cheaper.
Are you kidding me???! It costs so... so... so... much less. I thought automotive might be a good application, considering all the doors opened by using more DSP tricks layered on top of source localization. (I can localize coherent sound patterns as well as coherent sound.)
I would love to chat with you, happy to buy a coffee or beer for your time. My email's on my profile.
Some of these devices for automotive are large enough to surround a car on 3-4 sides, with several hundreds of microphones and the associated cables and positioning arms. Depending on where the devices he mentioned are being used, there are things like mannequins with heads and models of how humans hear for identifying sources inside a car.
> there are things like mannequins with heads and models of how humans hear for identifying sources inside a car.
HRTF stuff is fun, if that's what you're referring to! :) I've worked with some of that stuff before, including the stupidly overpriced mannequin heads.
> Some of these devices for automotive are large enough to surround a car on 3-4 sides, with several hundred microphones and the associated cables and positioning arms. Depending on where the devices he mentioned are being used, there are things like mannequins with heads and models of how humans hear for identifying sources inside a car.
Do you work in a field that would benefit from the same results, for a fraction of the cost? Or, if not, do you have any advice on how to find and talk to these mythical industries that could pay me? It looks like Porsche wanted to build their own, in house, but I'm hoping if it costs less than a tenth as much, maybe more people would want one.
Please do!!! I spent about 10 years of my life obsessed with this problem/area of research, and, when I have ability to pay rent and eat, it's the problem I'll go back to.
This is super cool. I was thinking about making a 4x4 mems mic array on pcb exactly like that one. I had no idea you could just buy one off the shelf these days. Has anyone put four together to make a 64 mic 3D acoustic camera?
Awesome work! How computationally intensive is Acoular / how complex would doing this from a live feed instead of recorded files be? Thanks for posting your project.
I'm not familiar with Acoular, but the math involved in computing the sound coherence function over a large space is quite involved!
In my implementation, there are multiple stages using a dataflow approach with lots of compile-time optimization. In 2011 I could image a roughly 2 m^3 space using 8 microphones at ~10 fps in real time on 3 desktop computers; by 2015 I was able to do 12 mics and a 3 m^3 space on 2 laptops, but that involved a LOT of custom numeric programming to shave cycles.
If I had access, I'd love to see what could be done with a well-tuned implementation and modern GPUs. An efficient scatter/gather op (like what AVX-512 has) would increase performance by an order of magnitude.
Audio is low enough frequency that you can process the signal directly. The time delay/phase information between each mic lets you work out which direction the sound is coming from. This is essentially the opposite of beam forming. Theoretically you could do it with visible light and not need a lens, if you had a computer and sensors that could operate fast enough. But optical sensors typically only tell you the amplitude of light, not its phase.
Edit: To clarify, the "opposite" of beam forming means using processing you can choose which direction you want to listen at any one time, like a beam. Then you can scan the beam across x,y and make an image.
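The "scan the beam" idea can be sketched as a toy delay-and-sum beamformer (my own illustration, not anyone's production code): delay each mic's signal by the amount a plane wave from a candidate direction would impose, sum the channels, and pick the direction with the most power.

```python
import numpy as np

C = 343.0    # speed of sound in air, m/s
FS = 48_000  # sample rate, Hz

def delay_and_sum(signals, mic_x, angle_deg):
    """Steer a linear mic array toward angle_deg by delaying and summing.

    signals: (n_mics, n_samples) time-domain samples
    mic_x:   (n_mics,) mic positions along one axis, in meters
    """
    delays = mic_x * np.sin(np.deg2rad(angle_deg)) / C  # seconds per mic
    shifts = np.round(delays * FS).astype(int)          # whole-sample delays
    out = np.zeros(signals.shape[1])
    for sig, s in zip(signals, shifts):
        out += np.roll(sig, -s)  # coarse integer-sample alignment
    return out / len(signals)

def scan(signals, mic_x, angles_deg):
    """Sweep the beam across candidate angles; return the loudest one."""
    power = [np.sum(delay_and_sum(signals, mic_x, a) ** 2) for a in angles_deg]
    return angles_deg[int(np.argmax(power))]
```

Real systems use fractional delays (or frequency-domain steering) instead of the integer-sample `np.roll` shortcut above, but the principle is the same.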
The major difference between a microphone array and an imaging sensor is the availability of phase information for the received wave. A microphone oscillates with the sound pressure wave, and that oscillation is translated directly to a voltage. Your software can see the full time series of that wave, so the information about it is 'complete'.
An optical image sensor, essentially, turns photons into electrons. The optical wave is too fast to turn into a voltage time series, so you only see the wave's amplitude at a given sample in time. Therefore, in order to turn it into an image, you need to recover some fraction of the phase information in some way.
A pinhole is one way to do that. One way to think of a pinhole is that it maps every source point to a distinct imaging plane point, so the phase of the wave doesn't matter as much to the final image. It acts as a filter that cuts out ambiguous information that phase would have disambiguated.
A lens performs a similar operation by interacting with the light wave's phase to bend wavefronts in a way that maps points on the object to an imaging plane.
Those approaches don't recover 100% of the phase information, but they recover or filter enough to form the image you care about. Light field cameras attempt to recover more complete phase information in various ways, better explored in the Wikipedia link.
Could you create a sound-blocking plane with a pinhole that makes an acoustic camera following similar principles to an optical camera obscura? I bet at some level you could, but I also bet it would not be very advantageous. You still need a microphone array to act as the imaging plane. The size of the pinhole is probably very constrained by sound wave diffraction (it's a pretty long wave after all, compared to light). Using the directly available acoustic phase information is more compact and efficient.
I figured if you were to create an optical camera on the same principles of an acoustic camera you would get into trouble with the very short coherence length of sunlight. It's easy enough to build something that can deal with a laser, but sunlight has a coherence length of just a couple of dozen micrometers. If you are working on a larger scale than that, the phase information effectively becomes useless.
Fun fact: we manage to record amplitude and phase of radio waves, though. That allows us to record them at different points on Earth, ship the recorded data to a datacenter, and computationally merge them to get a planet-wide virtual telescope dish with a much better angular resolution than a single telescope dish ever could have.
We don't record phase; there is no way to recover the phase from a single signal.
What we do is make sure that all receivers are synchronized, i.e. take samples at the exact same time.
Then you can correlate the signal received between dishes (which will arrive at different times due to delays in propagation) and find the time difference of the signal, which then points to the signal's origin (beam forming). This is how phased-array radar works.
Once you align the signals, you can use the minute differences between them to compute a synthetic aperture, i.e. improve the angular resolution.
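As a toy illustration of that correlation step (my own sketch, nothing like a real VLBI correlator), here is how two synchronized recordings can be cross-correlated to recover the arrival-time difference in samples:

```python
import numpy as np

def tdoa_samples(a, b):
    """Integer-sample delay of b relative to a, via cross-correlation.

    Assumes a and b were sampled synchronously on the same clock,
    which is exactly what the comment above says the receivers ensure.
    """
    corr = np.correlate(b, a, mode="full")      # full cross-correlation
    return int(np.argmax(corr)) - (len(a) - 1)  # peak position -> lag
```

With the lag in hand, the geometry of the baseline between receivers turns the time difference into a direction, which is the beam-forming step.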
There are two main ways to do it: algorithms based on time difference of arrival, and algorithms based on estimating sound energy on a predefined grid. You can also estimate the distance, but it will not be as accurate as the direction.
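For the time-difference-of-arrival family, the far-field direction estimate for a single two-mic pair is just trigonometry (a minimal sketch, assuming a plane wave and a speed of sound of 343 m/s):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly room temperature

def bearing_from_tdoa(tau, mic_spacing):
    """Far-field bearing in degrees from a time difference of arrival.

    tau:         arrival-time difference between the two mics, seconds
    mic_spacing: distance between the mics, meters
    Only meaningful when |tau| <= mic_spacing / SPEED_OF_SOUND (plane wave).
    """
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Note that range drops out of this plane-wave model entirely, which is one way to see why direction comes out much more accurately than distance.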