mutexes are inherently single-threaded code: they're *exclusive*. a mutex is a point where all of your threads have to serialize and run one at a time to perform a certain operation.
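to make that concrete, here's a minimal sketch in Python (picked only for brevity; the shape is the same in any language). `count_with_mutex` is a made-up name for illustration. however many threads you start, the increment inside the `with lock:` block only ever runs on one of them at a time:

```python
import threading


def count_with_mutex(num_threads: int, increments: int) -> int:
    """Hypothetical helper: several threads bump one shared counter."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Critical section: only one thread can be in here at a time,
            # so this part of the work is effectively single-threaded.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

the result is correct (`count_with_mutex(4, 1000)` gives 4000), but adding threads doesn't make the locked part any faster, it just adds more waiters.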

additionally, common implementations often behave badly as the number of threads increases. spinlocks don't work well if you have 32 threads trying to take the same lock: they can burn quite a lot of CPU time just trying to take it. they don't work at all if you have thousands of threads (which is a situation that occurs regularly in GPGPU programming!).
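a rough way to see the wasted work, as a sketch only: this is not a real spinlock (those are built on an atomic test-and-set instruction), it just imitates one by busy-looping on a non-blocking `acquire`. `spin_for_lock` and its parameters are hypothetical, and the `time.sleep` stands in for whatever work the lock holder does:

```python
import threading
import time


def spin_for_lock(num_threads: int, rounds: int, hold_seconds: float = 0.001):
    """Hypothetical demo: returns (completed critical sections, failed attempts)."""
    lock = threading.Lock()
    completed = 0
    failed = [0] * num_threads  # one slot per thread, so no extra locking needed

    def worker(idx: int) -> None:
        nonlocal completed
        for _ in range(rounds):
            # Spin: keep retrying instead of going to sleep.
            # Every failed attempt is CPU time spent achieving nothing.
            while not lock.acquire(blocking=False):
                failed[idx] += 1
            try:
                time.sleep(hold_seconds)  # stand-in for work done under the lock
                completed += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed, sum(failed)
```

with a single thread there is never a failed attempt. as you add threads, the useful work (`completed`) still goes through the lock one at a time, while the failed-attempt count generally climbs; the exact numbers depend on the machine and the scheduler.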

you can of course increase the number of locks, but that's basically what an SQL RDBMS does with row locks - and now you have the problem of deadlock too.
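a sketch of the per-row version, with the usual fix for the deadlock it introduces. `Row`, `transfer` and `run_opposing_transfers` are invented names for illustration, not how any particular database does it. two threads move money in opposite directions; if each one locked its source row first, they could each end up holding one lock and waiting forever on the other. taking the locks in a fixed global order (lowest `row_id` first) avoids that:

```python
import threading


class Row:
    """Hypothetical 'row' with its own lock, in the spirit of a row lock."""

    def __init__(self, row_id: int, balance: int) -> None:
        self.row_id = row_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Row, dst: Row, amount: int) -> None:
    # Assumes src and dst are different rows.
    # Lock in a fixed order (lowest row_id first), regardless of direction,
    # so two opposing transfers can never each hold one lock and wait on the other.
    first, second = sorted((src, dst), key=lambda row: row.row_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_opposing_transfers(n: int):
    a, b = Row(1, 1000), Row(2, 1000)

    def a_to_b() -> None:
        for _ in range(n):
            transfer(a, b, 1)

    def b_to_a() -> None:
        for _ in range(n):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

more locks means less contention per lock, but you pay for it with rules like this one that every piece of code touching two rows has to follow.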

obviously it all depends on specifics - how many threads are doing how much work on their own, compared to how many locks they're trying to take. it's not that they are inherently bad; they are in fact one of the basic primitives in concurrent programming. they just don't scale well with increased concurrency.

unfortunately, the deeper solutions involve restructuring your program and your data processing (or the data itself) to expose higher levels of concurrency and less interdependency, which is its own can of worms.
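one small example of what that restructuring can look like, as a sketch under simple assumptions (the work is a plain sum, and `sum_without_shared_lock` is a made-up name). instead of every thread taking a lock to add into one shared total, each thread works on its own slice and writes to its own slot, and the only combining step happens once at the end:

```python
import threading


def sum_without_shared_lock(values, num_threads: int) -> int:
    """Hypothetical helper: per-thread partial sums, merged once at the end."""
    partials = [0] * num_threads  # each thread writes only to its own slot

    def worker(idx: int) -> None:
        # Thread idx handles every num_threads-th element, starting at idx.
        # No lock needed: nothing here is shared with the other threads.
        partials[idx] = sum(values[idx::num_threads])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The only point where results meet, and it runs after all threads finish.
    return sum(partials)
```

this only works because a sum can be split up and recombined. plenty of real workloads don't decompose that cleanly, which is where the can of worms comes in.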