
The conversion isn't so much "automatic" or opinionated as clearly defined in a spec. And it _is_ mathematically correct in terms of sound energy entering the ears. A device may be doing the conversion incorrectly, of course.

But there is still an omission: the centre speaker was introduced (as I understand it) to "pin" dialogue to the screen more effectively. That implies there _is_ some physical phenomenon taking place (e.g. phasing/interference) which is not compensated for in the spec.

My own system is set up to deliberately boost the centre channel in the mix, and it does help a lot. However, I'm interested to know how to define this amount of compensation in terms of an actual physical or acoustic phenomenon.
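
For reference, here's a minimal sketch of what the defined downmix looks like, assuming the ITU-R BS.775 coefficients (centre and surrounds folded in at -3 dB; actual devices may differ). The `center_boost_db` knob is my own addition to model the compensation I'm describing, not part of any spec:

    import numpy as np

    def downmix_5_1_to_stereo(L, R, C, Ls, Rs, center_boost_db=0.0):
        """Stereo downmix using the ITU-R BS.775 coefficients (-3 dB on
        centre and surrounds). LFE handling varies and is omitted here.
        center_boost_db is a hypothetical extra gain, not in the spec."""
        c_gain = 10 ** (center_boost_db / 20) * 2 ** -0.5  # -3 dB plus boost
        s_gain = 2 ** -0.5                                 # -3 dB surrounds
        Lo = L + c_gain * C + s_gain * Ls
        Ro = R + c_gain * C + s_gain * Rs
        return Lo, Ro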



The relevant phenomena are known as localization and the cocktail party effect. If you put a microphone in the middle of a cacophonous cocktail party, it would be hard to follow any given conversation by listening to just that one combined signal. But if you're actually there, your brain can home in on any of several conversations.

Having dialog in a center speaker means it comes from a different location than the music/fx, so it's easy to home in on it even if it's a little quieter than the music/fx. Having dialog in the same speakers as the music/fx makes that much harder. The specified 5.1-to-2.x mixdown ratios might be adequate or not depending on how correlated the original left track is with the original right track. A ridiculously loud blast only on the 5.1 left means your brain can hear dialog from your 2.1 right unimpeded. A medium-volume explosion on the 5.1 left and right (but not center!) leaves you with no 2.1 speaker producing dialog that isn't masked by the explosion, especially if the explosion sound is mono-ish.
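
A toy numeric sketch of that correlation point (illustrative only; the signal levels and the -3 dB centre coefficient are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 48000
    dialog = 0.2 * rng.standard_normal(n)   # center-channel dialog
    boom = rng.standard_normal(n)           # loud effect
    g = 2 ** -0.5                           # -3 dB center downmix coefficient

    def d2e_db(effect):
        """Dialog-to-effect power ratio in one downmixed channel."""
        e = max(np.mean(effect ** 2), 1e-12)
        return 10 * np.log10(np.mean((g * dialog) ** 2) / e)

    # Effect hard-panned to 5.1 left: the downmixed right stays clean.
    print(d2e_db(boom), d2e_db(np.zeros(n)))   # ~ -17 dB left, clean right
    # Same effect energy spread mono across 5.1 L and R: masked everywhere.
    print(d2e_db(boom / np.sqrt(2)))           # ~ -14 dB in each channel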


> The relevant phenomena are known as localization and the cocktail party effect. If you put a microphone in the middle of a cacophonous cocktail party, it would be hard to follow any given conversation by listening to just that one combined signal. But if you're actually there, your brain can home in on any of several conversations.

That's because a human is not 1 microphone. It's 2 microphones with a known distance between them, which allows real-time 3D positioning and isolation of sound to an area.

The open source hardware "ReSpeaker" lets you start experimenting with how a microphone array works, including why the cocktail party effect doesn't really affect us in most cases.

The notable exception is a signal generated exactly on the plane perpendicular to the line between the two ears (the median plane). Then humans have a hard time localizing it front to back (a 180° swap): we can still get an angular vector toward the sound, but not which side of the plane it's on. However, simply turning your head removes this ambiguity.
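
Here's a toy sketch of the two-microphone math, a plain cross-correlation TDOA estimate rather than ReSpeaker's actual algorithm; the spacing and sample rate are assumptions:

    import numpy as np

    C = 343.0   # speed of sound, m/s
    D = 0.17    # mic spacing, m (roughly ear to ear)
    FS = 48000  # sample rate, Hz

    def azimuth_from_itd(left, right):
        """Direction-of-arrival estimate from the inter-channel time
        difference found by cross-correlation. Positive angle = source
        toward the left mic."""
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)  # negative when left leads
        itd = -lag / FS                           # seconds right lags left
        s = np.clip(itd * C / D, -1.0, 1.0)       # far-field plane-wave model
        return np.degrees(np.arcsin(s))

    # Simulate a source 30 degrees to the left: the right mic hears it late.
    d = int(round(D * np.sin(np.radians(30)) / C * FS))  # ~12 samples
    sig = np.random.default_rng(1).standard_normal(4000)
    left, right = sig[d:], sig[:len(sig) - d]
    print(azimuth_from_itd(left, right))  # ~30 degrees
    # Note: a source at 150 degrees (behind, same side) gives the same
    # time difference, which is exactly the front/back ambiguity above.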

(Also, being able to turn your head and move your body suggests a visual-acoustic SLAM algorithm going on in your brain.)


1. "That's because a human is not 1 microphone." true 2. "It's 2 microphones, with a known distance between the 2" 3. "which allows realtime 3d positioning and isolation of sound to an area"

#3 does not follow from statement #2.

The missing element is that 3D localization, elevation and front/back in particular, is due to the spectral filtering of the pinna.

https://royalsocietypublishing.org/doi/10.1098/rspb.1967.005...
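
A quick way to see the gap: with two bare microphones, the time difference depends only on the angle to the inter-ear axis, so an entire cone of directions (the "cone of confusion") is indistinguishable. A toy sketch assuming a free-field model with no head shadowing or pinna:

    import numpy as np

    C, D = 343.0, 0.17  # speed of sound (m/s), ear spacing (m)

    def itd(direction):
        """ITD for a far-field source along a unit direction vector,
        with ears at +/- D/2 on the x-axis."""
        u = np.asarray(direction, dtype=float)
        u /= np.linalg.norm(u)
        return D * u[0] / C  # only the x-component matters

    # Every direction with the same x-component produces an identical ITD:
    for v in [(0.5, 0.866, 0.0),    # front, ear level
              (0.5, -0.866, 0.0),   # behind
              (0.5, 0.0, 0.866),    # above
              (0.5, 0.6, 0.624)]:   # somewhere in between
        print(f"{v}: ITD = {itd(v) * 1e6:.1f} us")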


I have a 3.1 setup with the center boosted by the maximum amount allowable. Will eventually upgrade to 5.1 once the kids are old enough not to climb on the rear speakers.


You will find that even with a 5.1 setup, you will still want to boost the center a lot.



