It's a big deal because, while it has some downsides, being stalkless means they can have next to no overhead, meaning it can be performant to use coroutines to write asynchronous code for even very fast operations. The example given https://www.youtube.com/watch?v=j9tlJAqMV7U&t=13m30s is that you can launch multiple coroutines to issue prefetch instructions and process the fetched data, so you can have clean code that issues multiple prefetches and process the results. Whereas in Python (don't get me wrong, I love Python) you might use a generator to "asynchronize" slow operations like requesting and processing data from remote servers, C++ coroutines can be fast enough to asynchronously process "slow" operations like requesting data from main memory.
Wow, that talk is a fantastic link. He actually gets negative overhead from using coroutines, because the compiler has more freedom to optimize when humans don't prematurely break the logic into multiple functions.