Can someone explain the use of "volatile int running" here?
I'm not sure it's correct, but it might work here because the only place it's read from without locking is the main thread (`while(cond->running)`), which is also the only place it's written to (with locks this time).
I'd err on the side of caution here and go with the rule of thumb that "volatile is almost always wrong". The slightest change to the code (e.g. allowing another thread to terminate the app) will cause it to fail. It's not much effort to switch to __atomic_load/__atomic_store and remove the volatile qualifier, or to lock/unlock the mutex every time the flag is accessed (which costs about 50 nanoseconds).
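For illustration, a minimal sketch of what the __atomic-builtin version might look like, assuming GCC/Clang builtins; the struct layout and helper names are guesses based on the article's `cond->running` and 'lock':

```c
#include <pthread.h>

struct cond {
    int running;             /* no volatile qualifier needed */
    pthread_mutex_t lock;
};

/* Writer: request shutdown, callable from any thread. */
static void stop(struct cond *c)
{
    __atomic_store_n(&c->running, 0, __ATOMIC_RELEASE);
}

/* Reader: poll the flag without holding the mutex. */
static int is_running(struct cond *c)
{
    return __atomic_load_n(&c->running, __ATOMIC_ACQUIRE);
}
```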
volatile is usually neither necessary nor sufficient for concurrency as it doesn't have correct semantics. In some simple notification cases (just a polled exit flag for example), given a few assumptions about the compiler and machine model, volatile might have worked. But C11 has _Atomic and there is no excuse for volatile.
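A sketch of the C11 _Atomic version of such a polled exit flag, using the C11 threads API; the names here are illustrative, not the article's:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <threads.h>

static atomic_bool running = true;   /* _Atomic flag instead of volatile */

static int worker(void *arg)
{
    (void)arg;
    while (atomic_load(&running)) {
        /* ... do work ... */
    }
    return 0;
}

int main(void)
{
    thrd_t t;
    thrd_create(&t, worker, NULL);

    /* ... later, from any thread: */
    atomic_store(&running, false);

    thrd_join(t, NULL);
    return 0;
}
```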
In the specific example given in the article, not only volatile but even _Atomic would be completely superfluous. 'running' is set once in main before any threads are spawned, so there are no concurrency issues there; after that, every read and write access is done under the protection of the 'lock' mutex, so there is no need for any special qualification in POSIX or C11.
Interestingly, a 'volatile sig_atomic_t' is both required and sufficient for a signal handler to communicate with the thread of execution it interrupted, and it is the only portable use of the keyword (sketch below).
edit: there is another portable use in conjunction with {sig,}{set,long}jmp of course.
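A minimal sketch of that one portable, correct use of volatile: a flag shared with a signal handler (the signal choice and names are just for the example):

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig)
{
    (void)sig;
    got_signal = 1;   /* the only thing the handler does */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    while (!got_signal)
        pause();      /* wait until a signal interrupts us */

    puts("interrupted");
    return 0;
}
```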
I'm pretty sure the example code is not correct, but it may work by luck.
The first 'running = 1' is not guaranteed to be visible to other threads because there is no synchronization around it. In practice, pthread_create will probably "fix" the race condition.
My best understanding is that you need a mutex or an atomic every time you access shared data in order to guarantee visibility. Most CPU architectures don't strictly need the implicit memory barriers from a mutex or an atomic, but it's not wise to rely on that.
The creation of the thread (i.e. the call to pthread_create) in main "synchronizes with" the invocation of 'worker' in the child thread; the store is sequenced before pthread_create, and in turn the invocation of 'worker' is sequenced before the load in 'worker', which implies that the store 'happens before' the load, guaranteeing visibility and the absence of races (see the sketch below).
This is formally guaranteed by C++11 and C11 for std::thread and thrd_create, and also by POSIX for pthread_create, although the language there is more informal.
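A sketch of the pattern being discussed, with names loosely based on the article (the struct layout is an assumption): the store to 'running' is sequenced before pthread_create, which synchronizes with the start of 'worker', so the initial value is visible there even without the mutex; every later access goes through the lock.

```c
#include <pthread.h>

struct cond {
    int running;
    pthread_mutex_t lock;
};

static void *worker(void *arg)
{
    struct cond *cond = arg;
    for (;;) {
        pthread_mutex_lock(&cond->lock);
        int keep_going = cond->running;   /* all later reads under the lock */
        pthread_mutex_unlock(&cond->lock);
        if (!keep_going)
            break;
        /* ... do work ... */
    }
    return NULL;
}

int main(void)
{
    struct cond cond = { .running = 1,
                         .lock = PTHREAD_MUTEX_INITIALIZER };

    /* running = 1 happens before the load in worker, via pthread_create */
    pthread_t t;
    pthread_create(&t, NULL, worker, &cond);

    pthread_mutex_lock(&cond.lock);
    cond.running = 0;                     /* later writes also under the lock */
    pthread_mutex_unlock(&cond.lock);

    pthread_join(t, NULL);
    return 0;
}
```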
That's wildly inaccurate. It's a language-level feature that indicates values can change in ways other than what's suggested by the code that manipulates them.