
The problem isn't that algorithms are inherently sequential, though, but rather that parallel programming is a separate discipline.

In single-threaded programming you have almost infinite flexibility: you can acquire any number of resources in any arbitrary order. In multithreaded programming you must limit the number of accessible resources, and the order in which they are acquired should be well defined.

In my opinion, expecting people to write parallel algorithms is too much to ask, not because it is too difficult but because parallelism has to permeate your entire codebase. That is a nonstarter unless the required changes are reasonable.

The challenge then becomes: how do we let people write single-threaded programs that can run on multiple cores and gracefully degrade toward single-threaded execution as the code is less well optimized?

I don't have the perfect answer, but I think there is an opportunity for a trinity of techniques used in combination: lock hierarchies, software transactional memory (STM), and the actor model.

The unit of parallelism is something like an actor: it executes code sequentially, in a single-threaded fashion, and multiple such units run in parallel. However, rather than communicating through messages, the units use STM to execute code optimistically and to obtain a runtime heuristic of which locks were acquired. If there are no conflicts, performance scales linearly. If there are conflicts, then by carefully defining a hierarchy over the resources, you can calculate the optimal level in that hierarchy at which to execute the STM transaction. This lets the transaction succeed with a much higher probability, which eliminates the primary downside of STM: performance loss from failed transactions, whose failure rate creeps up as more resources are acquired.
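To make the optimistic-execution part concrete, here is a minimal single-process sketch of STM in Python. The names (`Ref`, `atomically`, the per-ref version counters) are my own invention, not a real library; a production STM would use finer-grained locking and contention management rather than one global commit lock.

```python
import threading

class Ref:
    """A transactional reference: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # one global commit lock keeps the sketch simple

def atomically(txn):
    """Optimistically run txn(read, write); retry if a read was stale at commit."""
    while True:
        reads = {}   # Ref -> version observed during the transaction
        writes = {}  # Ref -> new value to commit

        def read(ref):
            if ref in writes:          # read-your-own-writes
                return writes[ref]
            reads[ref] = ref.version   # record what we saw, for validation
            return ref.value

        def write(ref, value):
            writes[ref] = value

        result = txn(read, write)
        with _commit_lock:
            # Validate: did anyone commit to a ref we read?
            if all(ref.version == v for ref, v in reads.items()):
                for ref, value in writes.items():
                    ref.value = value
                    ref.version += 1
                return result
        # A conflicting commit bumped a version we read; retry the transaction.
```

The transaction body is plain single-threaded code; the `read`/`write` callbacks are what let the runtime observe which resources were touched.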

A lock hierarchy could look like this: object, person, neighborhood, city, state, country, planet.

You write an STM transaction that looks like single-threaded code. It loops over all the people in Times Square and thereby acquires their locks. However, if that transaction were executed at the object level, it would almost certainly fail: out of thousands of people, only one needs to be changed by another transaction for it to abort. Since the transaction acquired a thousand people, its optimal level in the hierarchy is the neighborhood, which means only one lock needs to be acquired to process thousands of people. If it turns out that the algorithm needs information like the postal addresses of these people, some of them may be tourists, so you end up acquiring resources from all over the world; in that case you might need to execute the transaction at the highest level of the hierarchy for it to finish.
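The level-selection step above amounts to finding the lowest common ancestor of the acquired resources in the lock hierarchy. This is a hypothetical sketch, assuming the STM runtime hands us the set of touched resources; `Node`, `optimal_lock_level`, and the `max_fine_grained` threshold are illustrative names, not an established API.

```python
class Node:
    """A node in the lock hierarchy: object, person, neighborhood, city, ..."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def ancestry(node):
    """Path from the hierarchy root down to node, e.g. [planet, ..., person]."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return list(reversed(path))

def optimal_lock_level(resources, max_fine_grained=64):
    """Return the locks to take: the resources themselves if there are few,
    otherwise their lowest common ancestor in the hierarchy."""
    if len(resources) <= max_fine_grained:
        return list(resources)  # fine-grained: lock each resource individually
    paths = [ancestry(r) for r in resources]
    lca = None
    for level in zip(*paths):             # walk down level by level from the root
        if all(n is level[0] for n in level):
            lca = level[0]                # still shared by every resource
        else:
            break                         # paths diverge below this level
    return [lca]  # one coarse lock covering every acquired resource
```

A transaction that touched a thousand people in one neighborhood would get back the single neighborhood node; one that touched tourists from several countries would escalate toward the planet.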

The primary objection would be the dependence on STM for discovering which resources have been acquired. This means that, ideally, all state is immutable, to make software transactional memory as cheap as possible to implement. That is not a killer, but it means that if you were to water this approach down, the acquired resources must be static, i.e. known at compile time, which removes the need for optimistic execution. That still works, and it lets you get away with a lot more single-threaded code than you might think, especially in legacy codebases.


