If you have a legacy application that demands threads, we can emulate threads by essentially using a coroutine approach.
http://en.wikipedia.org/wiki/Coroutine
What would be the impact of such a coroutine emulation if the threading is used to leverage multi-core hardware for high-performance computing, as is done by ATLAS [1], OpenBLAS [2] or MKL [3]? These libraries are tuned to maximize CPU cache hits. It seems to me that running each thread's task sequentially as a coroutine would leave all but one core idle and would probably break those cache optimizations as well.
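To make the concern concrete, here is a minimal sketch of what I understand by coroutine-based thread emulation. It is only an illustration in Python, not how any particular runtime implements it: `worker` and `run_round_robin` are hypothetical names, and each "thread" is a generator that yields control cooperatively to a round-robin scheduler.

```python
from collections import deque

def worker(name, steps, log):
    """Hypothetical 'thread' body written as a generator-based coroutine."""
    for i in range(steps):
        log.append((name, i))  # one slice of work
        yield                  # cooperatively hand control back to the scheduler

def run_round_robin(tasks):
    """Run every coroutine to completion on the calling thread, one slice at a time."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
            ready.append(task)  # not finished yet: requeue it
        except StopIteration:
            pass                # task finished: drop it

log = []
run_round_robin([worker("A", 2, log), worker("B", 2, log)])
print(log)  # [('A', 0), ('B', 0), ('A', 1), ('B', 1)]
```

The slices are interleaved, but they never run at the same time: everything executes on the single thread that calls `run_round_robin`. If the emulation behaves like this, a library that splits a matrix operation across N threads would still do all N pieces of work one after another on one core.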