
The conversion isn't so much "automatic" or opinionated as clearly defined in a spec. And it _is_ mathematically correct in terms of sound energy entering the ears. A device may be doing the conversion incorrectly, of course.

But there is still an omission: the centre speaker was introduced (as I understand it) to "pin" dialogue to the screen more effectively. That implies there _is_ some physical phenomenon taking place (e.g. phasing/interference) which is not compensated for in the spec.

My own system is set up to deliberately boost the centre channel in the mix, and it does help a lot. However, I'm interested to know how to define this amount of compensation in terms of an actual physical or acoustic phenomenon.
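
For reference, here's a minimal sketch of what the defined downmix looks like, assuming the ITU-R BS.775 coefficients (centre and surrounds folded in at -3 dB; actual devices may differ). The `center_boost_db` knob is my own addition to model the compensation I'm describing, not part of any spec:

    import numpy as np

    def downmix_5_1_to_stereo(L, R, C, Ls, Rs, center_boost_db=0.0):
        """Stereo downmix using the ITU-R BS.775 coefficients (-3 dB on
        centre and surrounds). LFE handling varies and is omitted here.
        center_boost_db is a hypothetical extra gain, not in the spec."""
        c_gain = 10 ** (center_boost_db / 20) * 2 ** -0.5  # -3 dB plus boost
        s_gain = 2 ** -0.5                                 # -3 dB surrounds
        Lo = L + c_gain * C + s_gain * Ls
        Ro = R + c_gain * C + s_gain * Rs
        return Lo, Ro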



The relevant phenomena are known as localization and the cocktail party effect. If you put a microphone in the middle of a cacophonous cocktail party, it would be hard to follow any given conversation by listening to just that one combined signal. But if you're actually there, your brain can home in on any of several conversations.

Having dialog in a center speaker means it comes from a different location than the music/fx, so it's easy to home in on it even if it's a little quieter than the music/fx. Having dialog in the same speakers as the music/fx makes that much harder. The specified 5.1-to-2.x mixdown ratios might be adequate or not depending on how correlated the original left track is with the original right track. A ridiculously loud blast only on the 5.1 left means your brain can hear dialog from your 2.1 right unimpeded. A medium-volume explosion on the 5.1 left and right (but not center!) leaves you with no 2.1 speaker producing dialog that isn't masked by the explosion, especially if the explosion sound is mono-ish.
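
A toy numeric sketch of that correlation point (illustrative only; the signal levels and the -3 dB centre coefficient are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 48000
    dialog = 0.2 * rng.standard_normal(n)   # center-channel dialog
    boom = rng.standard_normal(n)           # loud effect
    g = 2 ** -0.5                           # -3 dB center downmix coefficient

    def d2e_db(effect):
        """Dialog-to-effect power ratio in one downmixed channel."""
        e = max(np.mean(effect ** 2), 1e-12)
        return 10 * np.log10(np.mean((g * dialog) ** 2) / e)

    # Effect hard-panned to 5.1 left: the downmixed right stays clean.
    print(d2e_db(boom), d2e_db(np.zeros(n)))   # ~ -17 dB left, clean right
    # Same effect energy spread mono across 5.1 L and R: masked everywhere.
    print(d2e_db(boom / np.sqrt(2)))           # ~ -14 dB in each channel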


> The relevant phenomena are known as localization and the cocktail party effect. If you put a microphone in the middle of a cacophonous cocktail party, it would be hard to follow any given conversation by listening to just that one combined signal. But if you're actually there, your brain can home in on any of several conversations.

That's because a human is not 1 microphone. It's 2 microphones with a known distance between them, which allows real-time 3D positioning and isolation of sound to an area.

The open source hardware "ReSpeaker" lets you start experimenting with how a microphone array works, including why the cocktail party effect doesn't really affect us in most cases.

The notable exception is a signal generated exactly on the plane perpendicular to the line between the two ears (the median plane). Then humans have a hard time localizing it front to back (a 180° swap): we can still get an angular vector toward the sound, but not which side of the plane it's on. However, simply turning your head removes this ambiguity.
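
Here's a toy sketch of the two-microphone math, a plain cross-correlation TDOA estimate rather than ReSpeaker's actual algorithm; the spacing and sample rate are assumptions:

    import numpy as np

    C = 343.0   # speed of sound, m/s
    D = 0.17    # mic spacing, m (roughly ear to ear)
    FS = 48000  # sample rate, Hz

    def azimuth_from_itd(left, right):
        """Direction-of-arrival estimate from the inter-channel time
        difference found by cross-correlation. Positive angle = source
        toward the left mic."""
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)  # negative when left leads
        itd = -lag / FS                           # seconds right lags left
        s = np.clip(itd * C / D, -1.0, 1.0)       # far-field plane-wave model
        return np.degrees(np.arcsin(s))

    # Simulate a source 30 degrees to the left: the right mic hears it late.
    d = int(round(D * np.sin(np.radians(30)) / C * FS))  # ~12 samples
    sig = np.random.default_rng(1).standard_normal(4000)
    left, right = sig[d:], sig[:len(sig) - d]
    print(azimuth_from_itd(left, right))  # ~30 degrees
    # Note: a source at 150 degrees (behind, same side) gives the same
    # time difference, which is exactly the front/back ambiguity above.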

(Also, being able to turn your head and move your body suggests a visual-acoustic SLAM algorithm going on in your brain.)


1. "That's because a human is not 1 microphone." true 2. "It's 2 microphones, with a known distance between the 2" 3. "which allows realtime 3d positioning and isolation of sound to an area"

#3 does not follow from statement #2.

The missing element is that 3D localization, elevation and front/back in particular, is due to the spectral filtering of the pinna.

https://royalsocietypublishing.org/doi/10.1098/rspb.1967.005...
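
A quick way to see the gap: with two bare microphones, the time difference depends only on the angle to the inter-ear axis, so an entire cone of directions (the "cone of confusion") is indistinguishable. A toy sketch assuming a free-field model with no head shadowing or pinna:

    import numpy as np

    C, D = 343.0, 0.17  # speed of sound (m/s), ear spacing (m)

    def itd(direction):
        """ITD for a far-field source along a unit direction vector,
        with ears at +/- D/2 on the x-axis."""
        u = np.asarray(direction, dtype=float)
        u /= np.linalg.norm(u)
        return D * u[0] / C  # only the x-component matters

    # Every direction with the same x-component produces an identical ITD:
    for v in [(0.5, 0.866, 0.0),    # front, ear level
              (0.5, -0.866, 0.0),   # behind
              (0.5, 0.0, 0.866),    # above
              (0.5, 0.6, 0.624)]:   # somewhere in between
        print(f"{v}: ITD = {itd(v) * 1e6:.1f} us")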


I have a 3.1 setup with the center boosted by the maximum amount allowable. Will eventually upgrade to 5.1 once the kids are old enough not to climb on the rear speakers.


You will find that even with a 5.1 setup, you will still want to boost the center a lot.



