DIY Acoustic Camera (navat.substack.com)
394 points by tomsonj on Oct 27, 2021 | hide | past | favorite | 163 comments


I'm not the original author; however, ages ago I invented what's now being called an "acoustic camera". (Specifically, the state of the art on the math side for precision and accuracy.)

The resolution is fine enough that with COTS parts, I can record my signature simply by sketching it out with my fingernail on a table.

Every few years I dust this off and play with it, wondering if there's some application or other way to "turn this into money" (an increasing concern in the coming months... <tiny>PLZ HIRE ME</tiny>), but I'm not a "product guy".

I'll answer some questions about the technology, but would really love to know if anyone here has advice on somehow using this achievement to pay rent. :)


Videoconferencing applications are implementing this in conjunction with face detection to "steer" wide angle cameras into "talking head" shots.

The current tech is spooky trash, parlor trick-quality from what I've used. Every time we use some of the automatic gizmos in conference rooms, we get tickets to make it stop.

Pick your favorite top video conferencing platform or camera maker and they'll want to improve what they have. The jump from "creepy" to "just works" is a big one.

PS., the industry trade show is happening RIGHT NOW at https://www.infocommshow.org/


I am an acoustics consultant who designs buildings with architects, then sees them through construction. Doors intended to isolate noisy rooms regularly underperform, whether due to manufacturing or installation problems. Lots of fingerpointing when we call it out on project sites, and having a camera show the weaknesses due to the perimeter gasketing, frame, door leaf, or wall construction surrounding the door would provide the necessary visual for contractors to see the problems we are pointing out.


I wonder how many building projects engage acoustic engineers (unless sound is central to the use, as for theatres or conference facilities)?

I had the "opportunity", as a patient in a very new hospital wing some 7 years ago, to spend about 13 days there. While I had a private room, the door was always open to the walkway, I guess as is normal to allow quick response by medical staff. But at one point I really felt overwhelmed by all the external noise that I could hear from the other rooms in the ward, nurse stations, and so on. I really felt that the hard surfaces and even the angular nature of the floor layout were conspiring against me - almost focussing noise into my room. I imagine that, given enough data, you could show that the poorer rest prolonged patients' recovery periods and hence increased bed occupancy and cost to the health system (important in the state-run public hospital service that is predominant in Australia).

As such, it would be nice to think hospitals, schools, and offices would include a thorough acoustic assessment, to at least allow appropriate mitigation of noise during design (before having to invoke more active measures like soft furnishings, etc.).


My response is US-centric, but most hospital projects do have an acoustics consultant involved. Improvements have been made, but not nearly enough. Challenges include the need for sound absorption to be porous, but this is at odds with cleanability. Increased focus has been on patient room doors, which are increasingly sliders. They work well when they're closed, but the challenges include getting a good bottom seal when the threshold has to accommodate beds and equipment being rolled in and out. Hospital equipment manufacturers are also improving the sounds of their equipment, away from the cheap piezo beepers.

Schools very often have acoustics reviews, although more often in cities than rural areas: classrooms, in addition to auditoria, gyms, and common areas. Standards exist for those too.

Office buildings are hit or miss. The developer may hire us for a base building review. Tenants' architects hire us as they design their workplaces. There's a lot of push and pull to find a balance of the modern open ceiling industrial aesthetic and glass conference rooms with reasonable acoustical goals.


Some, but not enough.

Working in conference rooms, people are far too often concerned about the look (big windows, natural light, great table) and less concerned about acoustics than would be reasonable. In my experience, architects think people just sit around the table and chat.

I once had a brand-new office upfit project with a "flagship room" that was a large flex space (could be a board meeting, could be a hackathon). There were three sides of glass, a concrete floor, and a metal ceiling structure. If you clapped in the room, the sound could be heard for 1.5 seconds as it bounced across all those surfaces before decaying.

My bosses were furious at the microphone quality, and the installer was unhelpful and bailed on us. We hired a consultant to perform an evaluation; he told us that the room was awful, with lots of numbers (figures for acoustic reverberation at different frequencies), and told us of certain products that could help in those ranges.

It is a lot easier to design rooms with acoustic features than it is to retrofit them in terrible sounding rooms.


I think you're talking about a different problem, though that one could also be helped with the right tool.

After setting up the array, one could answer questions like "where is that darn squeak coming from?!", and even characterize the undesirable noise, both spatially and spectrally. You could also measure how effective noise isolation materials are, but I don't see what the spatial information gets you.

However, if weaknesses due to "perimeter gasketing, frame, door leaf, or wall construction" result in some sort of localized noise, then a system like this would certainly pinpoint it.


Yes, this is in reference to noise passing through a door that is localizable. Typically we would play pink noise (broadband, equal energy per octave band) on one side and listen for "hotspots" on the quiet side. I would wonder about the precision of being able to tell in the image the contribution of the door frame from the perimeter gasketing.

Acoustic cameras like from Noiseless Acoustics are on the market though they seem to be marketed to industrial customers. There are similar mapping systems using a scanning mic like from Soft dB.


Ah! In that case yeah, this tech could possibly help. At least, I'd love to give it a shot! (I hope you don't mind an email from me later...)

It's sensitive enough to noise that I can pick up (and locate) the air vents in a room, even when the sound is at the threshold of hearing. Noise (pink and white), and even more so MLS (maximum length sequence) really "jumps out" (it's very obvious), well below my threshold of hearing.

There's so many interesting areas of research I've never had the time/money to fully investigate. I'd love to play with an "active" system, not just "passive", with a goal of experimentally finding modes of resonance of objects in a room.

I bet one can tell the relative contribution to noise of one physical object over another, but I don't know enough about construction to know if one would be able to separate the door frame from the perimeter gasketing. You do need line of sight for it to work. At the least, you'd have a way to quantify the sound leak, with numbers and reproducibility.

FWIW the Noiseless Acoustics camera costs ~$18k(!)


I have absolutely no knowledge about this domain, so this idea 1) might not be viable and 2) is not a fully developed product idea - but I thought it would be fun to get your thoughts.

Could you put some sound sensors inside of some mechanical structure and use the acoustics to figure out where some physical contact is happening on the outside of the structure?

Specific Application: prosthetics devices that can - with only a few acoustic sensors - determine where the 'touch' was on the outside of the device.

If viable, it may be similarly useful for robots - or any machine in general - that needs a low-hardware (and thus low-cost) method of getting coarse tactile information around its boundary.


I would suggest asset protection. I used to work for a company (Droneshield) that would provide protection from drone-based sabotage/intelligence gathering/smuggling.

It's a big field, but if you could, for instance, set up an alert for when a noise source crosses a property border, or when something that sounds like a human comes within x meters of a particular building, I think that would be valuable. I'm no longer there, but happy to talk if you're interested.


This sort of tech was really good at tracking drones. Neat thing about drones is they put off so much waste audio, rich with information.

I know very little about drones, but, in a demo I made years ago (trying to convince someone to give me money for using this in a drone defense product), I was able to "fingerprint" different drone models, even sometimes distinguish between two different drones of the same model. As long as there's line of sight, you can sometimes "see", in the data, slight changes in the speed of the propeller, as well as the rough "shape" of the drone itself.

But for all those applications, even though I love sound, I always think to myself.... "wouldn't a pair of cheap cameras do this better?" :)

> no longer there, but happy to talk if you're interested.

Thanks!


> "wouldn't a pair of cheap cameras do this better?"

That's part of where they've gone (though the cameras are far from cheap) as well as RF, with some AI magic sprinkled on top.

I wouldn't suggest doing drone defense, but smaller-scale asset protection might be more approachable. There's what, 3000 local governments in the US? I'm sure a lot of them have had a tractor stolen from a road construction site. Or maybe if they lease their equipment you could sell a solution to the leasing company.


It would be great if I could point something at a noisy machine and find out precisely which panel is loose and vibrating!


Really? Would it be "great" enough that you'd pay for such a device? If so how much? (would love to continue over email)


Looks like it's found use in automotive applications for tracking down NVH (noise, vibration, and harshness) issues.

https://www.youtube.com/watch?v=C18AA_nlQN8


I’d definitely pay for it. Though I’d use it outdoors in the woods to see where birds and other critters are hiding. Though I reckon the noise will pose a challenge.


Yes. Email sent ;)


There was a thread on here a while back where people would set up cameras and be able to make a motion image showing what parts moved with a regular frequency. Was being used at factories to determine points of failure via vibration. Can't find the thread now though.


Aha, it was a video by Steve Mould: https://www.youtube.com/watch?v=rEoc0YoALt0


Not saying it's exactly the same thing, as I don't think a video overlay was involved, but I know someone that got their PhD in this area in the 1970s and had a long career working for a U.S. military contractor doing this. The U.S. military has a significant interest in acoustic beamforming, both in the air and underwater, for obvious reasons.


Oh absolutely! Sorry, I was excited in my typing. It's not every day you see "your baby" on HN :) Everything I built was on the shoulders of giants, and lol, I didn't invent beamforming itself (of course).

The problem I have trying to find the niche application is that there's not much (at least, that I could think of) where you can have high quality audio data, but where a simple camera wouldn't work. Also, full imaging (as opposed to just tracking the largest/loudest source via TDOA) is quite different math, stupendously more computationally intensive.
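To make the distinction concrete, here's a minimal sketch of the simpler TDOA side (my own toy code with synthetic signals; the function name is hypothetical, not part of the actual system):

```python
# Minimal TDOA sketch: brute-force cross-correlation to find the sample
# delay between two microphone channels (pure Python, synthetic data).

def cross_correlate_delay(a, b, max_lag):
    """Return the lag (in samples) at which channel b best matches channel a."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag]
                    for i in range(max_lag, len(a) - max_lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic example: the same pulse reaches mic B three samples later.
pulse = [0.0] * 64
for i in range(20, 28):
    pulse[i] = 1.0
mic_a = pulse
mic_b = [0.0] * 3 + pulse[:-3]   # delayed copy of the pulse

print(cross_correlate_delay(mic_a, mic_b, max_lag=10))   # prints 3
```

Full imaging, by contrast, effectively evaluates a delay-and-sum like this for every voxel of a volume, which is where the computational cost explodes.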


Industrial automation monitoring is a major commercial application. I was going to look at https://www.minidsp.com/products/usb-audio-interface/uma-16-... for doing it. Do you know what limits of detection you could get, and on what equipment?

Monitoring structural vibrations is also useful, and I think is an ongoing research area. I mention this because it's possible to sell it and research it at the same time.

What about synchronized cameras in different locations?


In industrial maintenance, such cameras are in use to find air leaks in pneumatics. The model from Fluke costs something like 20k USD. Because of this super-high price, I don't know anybody who has such a model. So ask other companies if they are interested.


You want to pay the rent? Don't make a product, make a service, namely the "WHERE IS THAT ANNOYING SOUND COMING FROM?" service. People will pay you to locate sources of irritating sounds.


I use handheld sound amplitude meters and spectrogram software on smartphones to track down sounds I want to eliminate. This is often for musical purposes, but also for annoying sounds. An amplitude meter can be useful for finding the source of a sound, but lower frequency sounds are not as directional, and standing waves in rooms can make a sound source appear to come from somewhere other than the source (though perhaps moving a sound camera around physically could help eliminate this ambiguity?). The spectrogram can provide evidence of a specific periodicity or frequency range that can help identify a source. However, these tools, while useful, are not sufficient all the time and are certainly not quick to use.

I've often thought it would be really useful to have something like a gunshot detector for ordinary sounds, to be able to locate sounds both indoors and outdoors. It would help in answering questions like "is that sound a bear/dog raiding my trash?", "is the fan failing in my air purifier?", "is my air conditioner rattling something nearby?", "is my fridge transmitting a low level hum throughout the house?", "is that engine noise a truck pulling up to my house or a truck climbing a hill on the nearby highway?", all problems I have actually faced.

Not sure if a sonic camera can deal well with low frequency sounds, but it would certainly make it easier to deal with higher frequency sources of noise. I'd buy one on impulse for $100, with some consideration of features at $300.


Use this to find a cricket indoors. I defy you.


I have "inverse aptitude" at knowing what should be a product. :) Never trust me to know what would sell or not. In fact, "bet against whatever I think". ;)

In my ideal (and in the best jobs I've had in the past), someone finds me (or I find someone) who I can share a list of "cool things I've figured out how to do, but don't know the usefulness of", and that person then tells me what to build.

For your "where is that annoying sound coming from?" service, what sort of scale do you imagine, and what form factor?

A handheld consumer device with a range of ~10m which points in the direction of the loudest thing?

How much would you pay for such a device? (would love your thoughts in email)


A "squid game"-style kids game.

Kids must tiptoe across in front of the camera.

If they make any noise, the camera uses a powered gimbal to aim a hose at them and squirt them with water.

And a terrifying siren goes off.


How precise can I make it? And how do I think about optimal microphone layout?

I've been considering putting multiple microphones on one side of the house to track birds. I mainly want to isolate the audio for recognition. But setting up a PTZ camera to get good shots would be even cooler.


> multiple microphones on one side of the house to track birds.

This was one of my first at home demos! Tracking birds and cars. If you want to get started DIY and figuring it out yourself, try about 5 microphones, each 1m apart. The trick is having all the mics in line of sight to the object in question. Otherwise you're left with doing an "intersection of angles" method, which is much simpler, but horrible in terms of precision.
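The "intersection of angles" method mentioned above can be sketched with two bearing rays (my own toy example; the coordinates and angles are made up):

```python
import math

def intersect_bearings(p1, theta1, p2, theta2):
    """Each array at point p reports a bearing theta (radians from the +x
    axis); the source is where the two rays p + t*d cross."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]      # 2D cross product
    if abs(denom) < 1e-9:
        raise ValueError("bearings are parallel; no unique intersection")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Source at (5, 5); one array at the origin, one at (10, 0).
src = intersect_bearings((0, 0), math.atan2(5, 5), (10, 0), math.atan2(5, -5))
print(round(src[0], 3), round(src[1], 3))   # 5.0 5.0
```

The poor precision comes from how bearing errors propagate: at shallow crossing angles, a degree of angular error turns into meters of position error.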

In a later demo I used a PTZ laser pointer to track the moving sound sources, so, it's at least possible!


Thanks! And do you have suggestions about how to think about extracting relatively pure sound from that point? It seems intuitive to me that it's doable, and I'm sure there must be a lot of work already done, but I'm clearly not googling the right terms.


sound tracking + laser == mosquito zapper


I have seen a similar product used in automotive acoustic wind tunnels to help identify the locations of acoustic hot spots in the flow around a car.


What kind of range/sensitivity could you get outdoors? Maybe spotting deer for hunters. (caveat I'm not a hunter)


How does this compare with commercial ultrasonic flaw detection systems for physical objects?


No idea! For a few months' stipend and the cost of ultrasonic parts, I can find out for you. :)


Combining this with Motion Amplification/Video Magnification [1] could result in some very interesting visuals and applications for factory equipment.

[1] https://www.youtube.com/watch?v=rEoc0YoALt0 Explainer Youtube video about Motion Amplification


Interesting. I'm casually familiar with Video Magnification (the approach from SIGGRAPH a decade ago, IIRC), but have never implemented it myself. A really cool result, using the changes in the phase of the basis vectors over time to infer motion, without having to do dense optic flow.

I'm curious how you would combine acoustic localization in 3-space with motion amplification. I unreservedly agree that they are both "super cool", but don't see how they tie together to make something greater than the sum of their parts.

The only thing I thought of is, if two data channels (video, audio) are registered accurately enough, one could maybe combine the spatially limited frequency information from both channels for higher accuracy?

For example: voxel 10,10,10 is determined (by the audio system) to have a high amount of coherent sound with a fundamental frequency of 2khz. Can that 2khz + 10,10,10 be passed to the video system to do something.... cool? useful? If we know that sound of a certain spectral profile is coming from a specific region, is it useful to amplify (or deaden) video motion with a same frequency?


I don't suppose you have any idea if there are publicly available motion amplification tools, yet?


A starting point for the MIT research in question can be found here https://people.csail.mit.edu/mrub/vidmag/


The authors of the predecessor method released some of their code:

https://people.csail.mit.edu/mrub/vidmag/#code


Motion and color amplification from Wu et al. are underused, in my opinion. Maybe because they're under patent?


Patents will expire starting from 2035 up to 2040 depending on the method used.


Thus another surveillance tool is born.


It has been rumored that the US military has heartbeat sensors (aka real-life minimap) for decades now, would this really be a new one?


You can pick up heartbeat and breathing rate (if the person is relatively still) with simple CW radar; I did it with a $10 HB100 module and Audacity.

I bet if you really worked on tuning and filtering you might even be able to pick up the vibration of a person's throat to hear what they say.


Then put a gun on it and you have something even worse!


That is a great video.


Stupid idea(?): Back-project onto some sunglasses (or onto the corners/edges for sounds from behind), and give deaf people some basic level of sound-based situational awareness. Combine with some voice -> text tech, and you could have something pretty interesting.


There is a long trail of dead startups attempting this. But don't let this dissuade you, please do a Show HN when you launch.


Probably requires the technology to reach some tipping point. It was the same with VR and motion tracking. We've been able to do those things for nearly half a century, but it hasn't been anywhere near commercially viable until recently.


The tipping point is the availability of quality, affordable AR glasses. Until recently, AR has been too niche, so even if the acoustic camera technology is fine, the company also has to build AR glasses to go with the other part of the system. Whenever Apple comes out with AR glasses, writing an acoustic camera app becomes almost trivial compared to also having to design the glasses. Not having to design the glasses makes acoustic-camera overlay software way more commercially viable.


> availability of quality, affordable AR glasses

And since Intel, Google, Facebook etc keep buying startups that produce cool things and preventing them from producing more cool things (North Focals being the most recent I'm aware of) it's gonna be a while


Epson and others are still going strong, but AR has been pigeonholed into drone piloting and industrial applications space for several years. The software ecosystem (especially compelling and usable interfaces for input and programming) is what's really lacking for broad adoption, the hardware works well and will only have incremental improvements and price reductions at this point.


Are any of those accessible to hobbyists? As in, can I order a pair of glasses and have a reasonably open SDK?


And if someone actually makes something remotely useful and successful, Google or Amazon will create a direct competitor and totally run you over.


Seems like the scenario the patent system is supposed to prevent.


Have a look at Microsoft Research's Seeing AI. It is still under development but can describe scenes and objects within, plus a bunch of other stuff such as documents, people, light, colour, currency, products etc. The app is only on Apple (no Android!) but the home page does have videos of each feature.

https://www.microsoft.com/en-us/ai/seeing-ai


There are people with fear of unexpected strangers walking up behind them, frequently from PTSD. This can be crippling in outdoor walking situations. A bag/fanny-pack sized device which can tap the wearer to let them know it really is time to check behind them would find a market… of some size… at some price.


One application for those that I think might be interesting is to record a scene and retain all of the raw audio. On playback, allow people to click on parts of the image and use beamforming to focus on that part of the audio.

Does anyone know if the array used here supports timestamped samples and/or clock sync to support multiple arrays? Or is it a single 16-channel stream?

Having done some very primitive dabbling with this stuff, the DSP programming is always the most interesting part to me. These folks are killing it with some really cool 3D scanning integration to the acoustic analysis

https://youtube.com/user/gfaitechgmbh


>On playback, allow people to click on parts of the image and use beamforming to focus on that part of the audio.

You can do that, but the gain isn't as pronounced as you'd like. A 12-16dB gain doesn't sound that dramatic.
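For context, that 12-16 dB figure matches the textbook rule of thumb (my arithmetic, with assumed mic counts): delay-and-sum beamforming over N mics gains about 10*log10(N) dB of SNR against spatially uncorrelated noise.

```python
import math

def array_gain_db(n_mics):
    """Ideal delay-and-sum SNR gain against uncorrelated noise."""
    return 10 * math.log10(n_mics)

print(round(array_gain_db(16), 1))   # 12.0 -> a 16-mic array buys ~12 dB
print(round(array_gain_db(64), 1))   # 18.1 -> diminishing returns per mic
```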

Now, combined with some other newfangled math, like neural source separation, you might be able to do something spectacular...


Could you do the inverse and use Melodyne to isolate a particular sound, click on it, and then have the camera scan for and isolate its location?

https://www.celemony.com/en/start


I was thinking the inverse: transmit beamformed noise cancellation and create the 'shut-up gun' lol


That would be a lot of data. Instead of a few bytes of color data per pixel per frame, you'd need thousands of samples per second per unit of spatial resolution.

Another approach to this is the Ambisonics method of capturing the directional soundfield at a point. But you'd need to use a high-degree multipole expansion to get resolution anywhere close to video.

https://en.m.wikipedia.org/wiki/Ambisonics


No, you need a few audio channels. If the algorithm can filter out what's where based on this 16 or even 4 microphone array, so can your client when provided with those sound tracks which are just a few kilobytes each. Probably all audio streams together are still less data than the video stream (given typical audio and video track quality combinations). You don't need a preprocessed stream for every point. Even if you want to keep the algorithm secret, you can have the client send the desired position to the server and do the magic there.


I think you're mostly right. If you want to be able to preserve enough bandwidth for music, you're talking about kilobytes per second of audio, but yeah, you could just store the streams from each mic in the array, so multiply the bitrate by the array size. And it's probably possible to find ways to save bandwidth with joint encoding.
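Rough numbers (all figures assumed, just to size the claim):

```python
# Back-of-envelope: raw bitrate of storing every mic channel separately,
# versus a ballpark compressed 1080p video stream.
mics, fs, bits = 16, 48_000, 16
audio_bps = mics * fs * bits        # all 16 raw channels
video_bps = 8_000_000               # typical compressed 1080p (assumed)

print(audio_bps)                    # 12288000 -> ~12.3 Mb/s raw
print(audio_bps < 2 * video_bps)    # True: same order as the video stream
```

Lossless audio compression and joint (inter-channel) coding would shrink the audio side further, since the channels are highly correlated.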


Is it possible to tune this to specific frequencies to detect mosquitos? Their audio signal is pretty weak, but it's also a very specific frequency. This would definitely help in the hunting and killing of the little bastards.


No need for the hi-tech equipment for finding mosquitoes. Just take me to the spot you're looking, and they will find me.


Following Someone1234's comment[1], stupid idea(?): attach a mid-power laser to zap bugs. Could even be a DIY project with an Arduino and a PTZ mount.

[1] https://news.ycombinator.com/item?id=29015202


Just be careful not to blind yourself. Even a narrow flashlight cone could be enough for you to find and kill the bug by hand.

Link to a previous post about this mosquito turret concept: https://news.ycombinator.com/item?id=27552516


Exactly. The laser introduces all kinds of safety issues. Simply shining a light and pointing me in the right direction vastly increases my lethality, especially combined with one of those electric fly-swatter rackets.

My cat would serve a similar role for larger insects. Her eyes would track them for me so I could locate and destroy them. Unfortunately she either cannot, or more likely chooses not to, track mosquitos.


Yes. One surprising result is that weak sounds, even some below the threshold of hearing, are easy to detect, provided you have clear line of sight and no turbulence.


If the mosquito frequency is less than half of the sampling rate of these mics, then yes. Very basically, these algorithms work by looking at the delay between each mic picking up the same wavefront, and calculating the direction of the sound wave from that.
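For a single far-field mic pair, the delay-to-direction step is just an arcsine; a sketch with assumed spacing and sample rate:

```python
import math

C = 343.0      # speed of sound in air, m/s
D = 0.1        # mic spacing, m (assumed)
FS = 48_000    # sample rate, Hz (mosquito whine ~0.4-1 kHz is far below FS/2)

def bearing_from_delay(delay_samples):
    """Angle off broadside (degrees) implied by an inter-mic delay."""
    x = C * (delay_samples / FS) / D
    if abs(x) > 1:
        raise ValueError("delay too large for this mic spacing")
    return math.degrees(math.asin(x))

print(round(bearing_from_delay(7), 1))   # 30.0 -> 7 samples of lag ~ 30 degrees
```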


A mosquito racket increases efficiency at least 10 times. If you direct the output of that camera to a VR visor you can chase and zap them in the dark.


Coupled with a VR headset, this would recreate the sound goggles that were featured in the Halloween Magic School Bus special 30 or so years ago.


FYI if you put your microphones in a random pattern you can reduce aliasing artefacts. It's basically the same as dithering / noise shaping.
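A tiny numerical check of that claim (my own sketch; the geometry and frequency are assumed): with uniform spacing wider than half a wavelength, a grating lobe (spatial alias) responds as strongly as the true look direction, and position jitter breaks its coherence.

```python
import cmath, math, random

C, F = 343.0, 4000.0
LAM = C / F                      # wavelength ~ 8.6 cm

def array_factor(positions, theta_deg):
    """Normalized response of an unweighted line array to a plane wave."""
    k = 2 * math.pi / LAM
    s = sum(cmath.exp(1j * k * p * math.sin(math.radians(theta_deg)))
            for p in positions)
    return abs(s) / len(positions)

uniform = [i * 0.10 for i in range(8)]     # 10 cm spacing > lambda/2
random.seed(0)
jittered = [p + random.uniform(-0.03, 0.03) for p in uniform]

# The uniform array's first grating lobe sits at asin(lambda / spacing).
grating = math.degrees(math.asin(LAM / 0.10))
print(round(array_factor(uniform, grating), 2))    # 1.0: full-strength alias
print(array_factor(jittered, grating) < 0.9)       # True: jitter suppresses it
```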


This took me years to figure out. :) Even cooler is that you can put them in a random pattern AND have the system determine its own geometry w/o measurements.


This could be really useful for finding birds in trees when I'm out with my camera...


Would work! One of the first applications I made was a processing layer returning, in spherical coordinates centered at an arbitrary reference point, what the system determined as the "primary sound source".

In the demo, the two angles drove a pair of servos steering a laser pointer. It followed the loudest object around the room :)

IME, finding a way to communicate the information to the user is often non-intuitive. That is to say, once a device has located birds in trees, how would you like it to inform you?


My first thought is for something like an arrow visible through the viewfinder, a bit like a damage indicator in an FPS game like Halo.


Hm. Doable. I think the hard part for that then might be getting the real time information about the position and orientation of the viewfinder in high enough resolution.

Keep in mind "the black box" can output the position in 3-space (x, y, z, measured in mm) of coherent sound sources, but knowing where those are relative to the camera, so that one can draw a little arrow, can be hard.

I'd like to try hooking it up to a VR/AR headset, since I imagine those already handle the task of knowing precisely where my head is and where it's looking.


I think this might be possible with a phone that has AR support - you'd scan a QR code on the sound camera to capture its position relative to the world, then the phone could display a 3D view through the camera of where the sound source(s) are.


Oh that's interesting! Is your thinking something like:

1. mount the array on a tripod somewhere in the frame of the camera
2. the array is covered with an assortment of fiducials
3. software uses the known intrinsics and extrinsics of the camera to figure out the array position relative to the camera
4. do the obvious thing with chaining transforms until you get the sound source position relative to the camera

If so, I think that would work, but would be a lot of coding to do all that CV...

> phone that has AR support

I take it cell phones now do much of this work?


Is this sensitive enough to find flying insects in a room?


Not speaking to OP's device, but yes, I was able to track a loud fly buzzing through a room in real-time. ~cm accuracy, but that can be improved on.


I think with some filtering it definitely could be.


If so, it’d definitely be worth coupling it with lasers to kill mosquitoes!


Being able to visualize that would be super cool.


They usually find me.


Actual information on how it's done is here:

http://www.acoular.org/literature/index.html


FWIW, they did this in World War I with microphone arrays + seismometer tape (picture of the tape on p. 5):

https://acousticstoday.org/wp-content/uploads/2020/06/Battle...


Awesome article. Deserves a spot on the front page by itself! Makes me want to learn a lot more about W. L. Bragg's physics exploits, along with those of his father.


Based on my experience building corrscope, I feel this is the kind of project that will outgrow Python once you want to implement your own low-level algorithms, make it embeddable or shippable as an application, or parallelize it. I wonder what's the easiest way to port Python DSP code and UIs to a compiled/etc. language.


I'd like to see this done with a single microphone and a moving 'sound mirror' like a fan.

The fan blades should cause doppler shift and changing amplitude that varies based on the location of the sound.

I suspect that after just a few seconds, this would give better information than an array of 16 microphones.
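To put rough numbers on the fan idea (blade size and speed are my assumptions): a reflector moving at speed v shifts a tone by approximately f*(c+v)/(c-v), so each blade position tags the reflected sound with a distinct Doppler signature.

```python
import math

def reflected_freq(f, v, c=343.0):
    """Frequency of a tone f after reflecting off a surface approaching at v."""
    return f * (c + v) / (c - v)

# Assumed fan: 15 cm blades at 1200 rpm -> tip speed of 2*pi*0.15*20 m/s
tip = 2 * math.pi * 0.15 * (1200 / 60)
print(round(tip, 1))                          # 18.8 m/s at the tip
print(round(reflected_freq(1000.0, tip), 1))  # ~1116 Hz for a 1 kHz tone
```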


Not with a single microphone, but there are acoustic vector sensors, which can also give you the sound's direction. Very expensive though; several Israeli companies use them for detecting a gunshot's direction. https://www.ese.wustl.edu/~nehorai/research/avs/avssensor.ht...


Do you know anything about these "acoustic vector sensors"?

When I first saw a popular science article about them, I got excited about incorporating them into an array, but couldn't find any technical details, just a lot of what looked like vaporware. Is it anything more than three orthogonal pressure sensors? (aka.... 3 microphones?)


Microflown makes one, it uses very small temperature differences. You can look for acoustic particle velocity sensor to find more about how they work. I can't remember the paper, otherwise I'd provide a better link. https://www.microflown.com/products/standard-probes

This PDF may be helpful http://past.isma-isaac.be/downloads/isma2010/papers/isma2010...


_temperature_ differences. Ah. Thanks for that paper, it's enlightening.


Somehow I'm imagining doing the inverse kinematic model of a Leslie speaker cabinet by measuring a single point of SPL over time.


That would work for amplitude-based location, but this is using phase correlation to find the time-of-flight difference to each microphone. With the fan idea you would get a lot of phase dropouts and smearing that I think would make that difficult.

Not to say it wouldn't work; you would get results, but they would be based on a different strategy.


But the phase smearing is useful information...

Sure, the maths is complex... But there is only one source soundwave and location which causes a given smearing. The challenge is to find it...


Are there any inexpensive microphone arrays?

I was interested in making my own alexa-like device, but it seems mic arrays are sooooo expensive - more than the cost of an alexa device for the least expensive one i can find :/


The mic hardware used in the UMA-16 USB mic array [0] is the Knowles SPH1668LM4H-1, which runs about a buck and a quarter [1]. The DSP, a SHARC ADSP-21489, is pricier: >$500 as an eval board [2].

0. https://www.minidsp.com/products/usb-audio-interface/uma-16-...

1. https://www.digikey.com/en/products/detail/knowles/SPH1668LM...

2. https://www.digikey.com/en/products/detail/analog-devices-in...


The UMA-16's main audio processor is XMOS, though.


> Are there any inexpensive microphone arrays?

Not that I can find! Building the array is way more expensive than it needs to be.

I have limited EE knowledge, so I have been stumbling through it on my own, building my first array out of reference microphones, another with $10 omnis from Guitar Center, and one with 8 cheap, repurposed webcams.

Right now, the limiting factor in driving down the cost of a future array is that I haven't figured out how to get a lot of I2S inputs (at least 8) into a microcontroller. If that were solved, it would be easier.


https://www.hackster.io/sandeep-mistry/create-a-usb-micropho...

Main limitation is USB 1.1 IO, so ~1MB/s, unless you are fine with recording to SD card, in which case probably around 10MB/s. The Pico itself can interface 29 microphones with no sweat (30 GPIOs, 2.0 GB/s internal bandwidth).


Funny you should link that! I have gotten as far as getting a raspberry pi zero to be a USB audio gadget, but I haven't played with the Pico yet. The raspi zero has one i2s input and one i2s output.

> Pico itself can interface 29 microphones with no sweat.

I... had no idea. I thought that since it didn't have an i2s peripheral I was going to have to either find a micro that did, or do something bitbangy using SPI and perhaps an external buffer. I see that it might be possible to get a few I2S inputs using the PIOs. Thanks for this, will certainly give it a shot.

(though I don't see how you're getting 29 microphones :P "prove it" ;)


30 GPIOs available to PIO state machines = 29 input pins from PDM MEMS microphones recorded at the same time.


The post links to an inexpensive array at the end. I don't really get why the 16-mic one he used is so expensive, those smd mics can't be more than $1 or so each...


those smd mics can't be more than $1 or so each

The actual mic capsules are likely far cheaper than $1 a piece (probably closer to $0.10 than $1), but the mics in an array need to be phase-matched. The two approaches to getting phase-matched microphones are 1) building them using precision techniques so they are phase-matched from the start (which is expensive, and why pro phase-matched mics run around $1,000 each), or 2) getting a whole pile of cheap mics, testing them one by one (or really, pair by pair), and selecting the ones that are best phase-matched for the array. The #2 approach is cheaper, but does add cost.


IME, the array only needs phase-matched mics if you're doing SDB, or something else that cares deeply about audio fidelity.

I've never used phase-matched mics in my arrays (can't afford it!) and have never needed to "bin" them (pair-by-pair testing).


Couldn't one instead record at a higher sample rate (192 kHz+) and then align in software instead of phase-matching the mics?


I don't know enough about this so maybe dumb question, but couldn't you use DSP to correct phase between microphones if you knew their relative differences?


Yes you can. Another problem though is that the microphones need to have a common timebase, unless you have More Magic.
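Assuming you've already measured each mic's relative delay against a shared reference (the numbers below are made up), the software correction is essentially a fractional-delay filter. A minimal sketch using a linear phase ramp in the frequency domain:

```python
import numpy as np

def apply_delay(x, delay_samples):
    """Delay a signal by a (possibly fractional) number of samples,
    implemented as a linear phase ramp in the frequency domain."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))   # frequency in cycles/sample
    return np.fft.irfft(X * np.exp(-2j * np.pi * freqs * delay_samples), n=len(x))

# Toy check: simulate a mic lagging by half a sample, then correct it
t = np.arange(256)
mic_a = np.sin(2 * np.pi * 0.0625 * t)   # 16 full cycles in 256 samples
mic_b = apply_delay(mic_a, 0.5)          # mic_b lags mic_a by 0.5 samples
aligned = apply_delay(mic_b, -0.5)       # undo the measured offset
print(np.max(np.abs(aligned - mic_a)))   # effectively zero
```

The parent's caveat still applies: this only works if all channels share one sample clock; otherwise the offsets drift over time.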


The mics are probably cheap, but 16x ADCs of decent quality with a decent power supply and low time offset between channels? How much is a 16-channel audio interface with 16 mic preamps these days?


It can be done w/o phase-locked ADCs :D it just takes... More Cleverness.

I made a 16 mic array out of a bunch of trash-picked and cheap 4ch ADCs.


I see they have an XMOS 200 chip on there; maybe they do some DSP magic with that?


Yeah, the ReSpeaker ones I have looked at, but at $25+ they still feel very expensive to me. Are they that complicated/expensive to make?


It’s probably a niche enough product that the engineering / marketing work that goes into it is a higher fraction of the cost than the raw components.


You can estimate 2D direction with 2 microphones (most phones and laptops have at least 2 mics).


It would be interesting to see how well this could detect non incident sounds - for instance detecting reflective/resonant hotspots in an audio mixing/recording room.


Works very well for that! At least on high enough frequencies. Source: have done something similar.


I found that quite hard! Curious if there's anything public about your approach :)

With high enough frequencies I can see reflections, but not at any distance, and the sound source has to be loud. Of course, I'm relying on line of sight, and perfect reflection. Any bumps in the wall would add some phase error I think.

If I ever get a chance to work on the problem again I'd love to see if anything interesting can be done with multipath.


I feel like there has to be a cheaper way to do this than a $275 acoustic array. It's only 16 elements. You couldn't do this with 16 cheap microphones?


How cheap do you want them? $275/16 = $17 per microphone, or $15 per + $35 worth of additional materials to make it into an array. Or $10 mics + $115 of metal and plastic.

$275 doesn't really seem exorbitant for niche hardware given that you need 16 decently high quality microphones. I eagerly await a ShowHN using $2 mics and cardboard instead!


> given that you need 16 decently high quality microphones...

But... you don't. :) The challenge I find is getting the data into the computer. That's what always costs the most. I've done it with 8x $1 mics and a used $100 sound card.


The CY7C68013A FX2LP can synchronously sample 16 bits at up to 20 MHz and transfer it over USB 2.0. You can set it up with an external IFCLK (5 MHz minimum) and just pump data from 16 MEMS microphones to the PC all day long.

$5 dev boards on ebay with free shipping.

https://www.cypress.com/file/138911/download

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73...


Can't you buy a bunch of i2s microphones and use a cheap FPGA dev board with a USB interface?

I may just order a bunch of i2s microphones...


Can one? Probably. Can I? Never done so before.

Can you? :)

I know enough EE to do simple things, about the amount you'd expect someone who's worked as a firmware engineer to know. The fact that you're saying "Can't you ... cheap..." makes me think this must be a viable path, though.

My hands shake too much for anything but the simplest soldering; is there a cheap FPGA board you'd recommend? And getting all the data into the computer... it could easily be ~70Mbps. (16 mics, 192/24) Making a custom USB class could be a mess, I wonder how hard it would be to just dump ethernet.


Can I do right now... no but probably could figure it out :).

I don't have any specific recommendations for a dev board, though a selection of some cheap dev boards is here: https://www.joelw.id.au/FPGA/CheapFPGADevelopmentBoards

If you search around on Alibaba you can find some cheap dev boards that appear to have ethernet already (instead of you having to provide an ethernet PHY, which I think takes up something like 18 IOs?). Beware that even though most dev boards have a USB interface for programming, it probably doesn't function as a USB comms interface (i.e. you'd probably need to provide another USB PHY). Each I2S device probably requires 3 GPIOs. Someone who knows a lot more about this can probably make a much better recommendation.

If you do the beamforming on the FPGA, then you probably only need to output a lot less data, which might easily be done over a simple UART.

In the end, designing your own board with the right-size FPGA is probably the right solution, but that's only cheap per-unit in quantity and requires someone who more than sort of knows what they're doing. Though for someone who knows what they're doing it'd be a relatively quick project...


Why in the world would you need to sample at rates that high? The quality of the audio is next to irrelevant, you just need amplitude and position.


Right, it's not like the microphone probably has much of a response above 20 kHz, and you can upsample while beamforming anyway.


What $100 sound card has eight analog (TRS?) inputs?


M-Audio Fast Track Ultra 8R. It's old. :) 8 in/8 out, class compliant, and has integrated preamps.


Wow, this is amazing. I work directly in automotive, and we use super expensive stuff which does exactly this (for 500k) lol


These are very commonly used in manufacturing plants to find leaks in compressed air lines. I had a Fluke vendor visit the ol' airplane factory to see if we could use their tools to find air leaks in low-pressure ECS system ducting.

But c'mon, they are not $500k. More like $20K.

https://www.fluke.com/en-us/product/industrial-imaging/sonic...


How many times cheaper would a competing product need to be for you to consider buying one?

Obviously, Fluke, and the positive reputation that brand is known for, and reliable product support are worth a LOT, but I'm sure there's some $$ divisor beyond which you/someone like you would take a risk on something substantially cheaper.


Anything in a couple hundred dollar range is an impulse buy for small independent car shops if it has potential of saving substantial time.


If you want to upgrade, we can bolt one onto a Spot for you.

https://youtu.be/0cu2bT6fdZY


> $500k

Are you kidding me???! It costs so... so... so... much less. I thought automotive might be a good application, considering all the doors opened by using more DSP tricks layered in addition to source localization. (I can localize coherent sound patterns as well as coherent sound)

I would love to chat with you, happy to buy a coffee or beer for your time. My email's on my profile.


Some of these devices for automotive are large enough to surround a car on 3-4 sides, with several hundreds of microphones and the associated cables and positioning arms. Depending on where the devices he mentioned are being used, there are things like mannequins with heads and models of how humans hear for identifying sources inside a car.

Here's an article about a large installation at Porsche. https://www.azom.com/article.aspx?ArticleID=18378


> there are things like mannequins with heads and models of how humans hear for identifying sources inside a car.

HRTF stuff is fun, if that's what you're referring to! :) I've worked with some of that stuff before, including the stupidly overpriced mannequin heads.

> Some of these devices for automotive are large enough to surround a car on 3-4 sides, with several hundreds of microphones and the associated cables and positioning arms. Depending on where the devices he mentioned are being used, there are things like mannequins with heads and models of how humans hear for identifying sources inside a car.

Do you work in a field that would benefit from the same results, for a fraction of the cost? Or, if not, do you have any advice on how to find and talk to these mythical industries that could pay me? It looks like Porsche wanted to build their own, in house, but I'm hoping if it costs less than a tenth as much, maybe more people would want one.


Do you mind if I send you an email later at the address in your profile?


Please do!!! I spent about 10 years of my life obsessed with this problem/area of research, and, when I have ability to pay rent and eat, it's the problem I'll go back to.


This is super cool. I was thinking about making a 4x4 mems mic array on pcb exactly like that one. I had no idea you could just buy one off the shelf these days. Has anyone put four together to make a 64 mic 3D acoustic camera?


finally hardware to nail the guy who leans on his car horn outside my place


Awesome work! How computationally intensive is Acoular / how complex would doing this from a live feed instead of recorded files be? Thanks for posting your project.


I'm not familiar with Acoular*, but the math involved in computing the sound coherence function over a large space is quite involved!

In my implementation, there are multiple stages using a dataflow approach with lots of compile-time optimization. In 2011 I could image a roughly 2m^3 space using 8 microphones at ~10fps in real time on 3 desktop computers; by 2015 I was able to do 12 mics and a 3m^3 space on 2 laptops, but that involved a LOT of custom numeric programming to shave cycles.

If I had access, I'd love to see what could be done given a well tuned implementation and modern GPUs. An efficient scatter gather OP (like what AVX3 has) would increase performance by an order of magnitude.

*OK, I've skimmed Acoular.


I imagine this + AR glasses can become quite the lifesaver for deaf folks. Throw in some voice recognition and you can have real-life speech bubbles!


Am I understanding it correctly that this is not using anything analogous to a lens? How does this not need a lens when optical cameras need them?


Audio is low enough frequency that you can process the signal directly. The time delay/phase information between each mic allows you to know which direction the sound is coming from. This is essentially the opposite of beamforming. Theoretically you could do it with visible light and not need a lens if you had a computer and sensors that could operate fast enough. But optical sensors typically only tell you the amplitude of light, not its phase.

Edit: To clarify, the "opposite" of beamforming means that with processing you can choose which direction you want to listen at any one time, like a beam. Then you can scan the beam across x,y and make an image.
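As a toy illustration of that scan-the-beam idea (not from the article; the array geometry and signals here are invented), a frequency-domain delay-and-sum beamformer scanned over a 1D angle grid:

```python
import numpy as np

def delay_and_sum_power(mics, positions, angles, fs, c=343.0):
    """Steer a linear array across candidate angles; return output power per angle.
    mics: (n_mics, n_samples) recordings; positions: mic x-coords in meters."""
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, d=1/fs)
    spectra = np.fft.rfft(mics, axis=1)
    powers = []
    for theta in angles:
        delays = positions * np.sin(theta) / c             # far-field plane wave
        phases = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        beam = np.sum(spectra * phases, axis=0)            # align, then sum
        powers.append(np.sum(np.abs(beam) ** 2))
    return np.array(powers)

# Toy scene: a white-noise source at 30 degrees, 4 mics spaced 2 cm apart
rng = np.random.default_rng(0)
fs, c = 48000, 343.0
positions = np.arange(4) * 0.02
src = rng.standard_normal(4096)
theta0 = np.deg2rad(30)
n = len(src)
freqs = np.fft.rfftfreq(n, d=1/fs)
S = np.fft.rfft(src)
mics = np.stack([np.fft.irfft(S * np.exp(-2j * np.pi * freqs * x * np.sin(theta0) / c), n=n)
                 for x in positions])
angles = np.deg2rad(np.arange(-90, 91))
p = delay_and_sum_power(mics, positions, angles, fs)
print(np.rad2deg(angles[np.argmax(p)]))   # peaks at the simulated 30 degrees
```

A 2D "acoustic image" is the same idea with a grid of steering points instead of a single angle axis, plotting the power at each point.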


Wouldn't a light version of this basically just be a fancy camera obscura?


The light version of this would be closer to a light field camera [https://en.wikipedia.org/wiki/Light_field_camera]

The major difference between a microphone array and an imaging sensor is the availability of phase information for the received wave. A microphone oscillates with the sound pressure wave, and that oscillation is translated directly to a voltage. Your software can see the full time series of that wave, so the information about it is 'complete'.

An optical image sensor, essentially, turns photons into electrons. The optical wave is too fast to turn into a voltage time series, so you only see the wave's amplitude at a given sample in time. Therefore, in order to turn it into an image, you need to recover some fraction of the phase information in some way.

A pinhole is one way to do that. One way to think of a pinhole is that it maps every source point to a distinct imaging plane point, so the phase of the wave doesn't matter as much to the final image. It acts as a filter that cuts out ambiguous information that phase would have disambiguated.

A lens performs a similar operation by interacting with the light wave's phase to bend wavefronts in a way that maps points on the object to an imaging plane.

Those approaches don't recover 100% of the phase information, but they recover or filter enough to form the image you care about. Light field cameras attempt to recover more complete phase information through various ways better explored in the wikipedia link.

Could you create a sound-blocking plane with a pinhole that makes an acoustic camera following similar principles to an optical camera obscura? I bet at some level you could, but I also bet it would not be very advantageous. You still need a microphone array to act as the imaging plane. The size of the pinhole is probably very constrained by sound-wave diffraction (it's a pretty long wave, after all, compared to light). Using the directly available acoustic phase information is more compact and efficient.


I figured if you were to create an optical camera on the same principles of an acoustic camera you would get into trouble with the very short coherence length of sunlight. It's easy enough to build something that can deal with a laser, but sunlight has a coherence length of just a couple of dozen micrometers. If you are working on a larger scale than that, the phase information effectively becomes useless.


Thanks!


Fun fact: we manage to record amplitude and phase of radio waves, though. That allows us to record them at different points on Earth, ship the recorded data to a datacenter, and computationally merge them to get a planet-wide virtual telescope dish with a much better angular resolution than a single telescope dish ever could have.


We don't record phase; there is no way to recover the phase from a single signal.

What we do is make sure that all receivers are synchronized, i.e. take samples at the exact same time.

Then you can correlate the signal received between dishes (it will arrive at different times due to delays in propagation) and find the time difference of the signal, which then points to the signal's origin (beamforming); this is how phased-array radar works.

Once you align the signals, you can use the minute differences between them to compute a synthetic aperture, i.e. improve the angular resolution.


There are two main ways to do it: algorithms based on time difference of arrival, and algorithms based on estimating sound energy over a predefined grid. You can also estimate the distance, but it will not be as accurate as the direction.


In fact, you can use a sort of lens, such as a parabolic reflector, but it makes the problem very complicated to solve.


Could this work underwater? It would be interesting to have an acoustic tracking system for diver safety.


That's pretty cool as far as adding to a sensor fusion stack


I'd like to see a video with an acoustic mirror.



