Yes. But we mostly have one memory model today - one big flat memory space, because C.
There are other approaches, but most of the clever ones died off. x86-32 has lots of interesting things you can do with segments. Nobody used that stuff, so amd64 has flat memory.
Burroughs once had a memory system that worked like pathnames. Think of an address as being "job/program/module/function/variable[n]", not a number. User programs could not see the physical memory address. This allowed the OS to do swapping and compaction underneath the program. The hardware would catch subscripting out of range. Worked fine, never caught on.
Today we do that with collection classes, rather than at the OS/hardware level.
Memory models are making a comeback, as we hit the limits of caches. Machines with some memory shared, some memory unshared, and some slower if shared have appeared.
> But we mostly have one memory model today - one big flat memory space, because C.
Not really. The C (and C++) memory model isn't fully settled yet; there are still fights going on regarding e.g. out-of-thin-air values (or maybe they've been resolved recently, I haven't checked).
In contrast, Java has a well-defined memory model without any undefined behaviour, though it can get quite tricky at times, e.g. what happens if you write to a final field (!!), which first requires making the field non-final via runtime reflection or using the Unsafe module.
Interesting, thanks! I'll read the paper in its entirety later, but suffice it to say, Hans and I disagree about whether all data races are errors, and about what a reasonable compiler is.
For example, take this code:
    // thread 1:                  // thread 2:
    with lock(counter_lock):      with lock(counter_lock):
        print(counter)                atomic { counter += 1 }
There are only two possible linear executions of this code: either you print the old value of the counter (n), or the new value (n+1). That's it.
I argue that this code is correct, and should remain legal, even if locks are elided (provided that the write to counter is actually atomic, i.e. doesn't result in a torn write / read). In fact, if the lock is implemented as a spin lock, that's exactly what happens even with locks! I also think that no compiler is "reasonable" if it breaks this code, or invalidates the whole program ("undefined behaviour means anything can happen").
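As a minimal Rust sketch of the same two-thread program (names and types are mine, not from the thread), the two linearizations show up directly:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    let c1 = Arc::clone(&counter);
    // thread 1: read the counter under the lock
    let reader = thread::spawn(move || *c1.lock().unwrap());

    let c2 = Arc::clone(&counter);
    // thread 2: increment the counter under the same lock
    let writer = thread::spawn(move || *c2.lock().unwrap() += 1);

    let seen = reader.join().unwrap();
    writer.join().unwrap();

    // Only two linearizations exist: the read happened either before
    // or after the increment, so the reader saw 0 or 1.
    assert!(seen == 0 || seen == 1);
    assert_eq!(*counter.lock().unwrap(), 1);
}
```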
Edit: He actually mentions a similar problem in the paper, and argues that it's valid for the compiler to "optimize" the code and break it. I think the underlying problem is that C++ compilers implicitly assume the right to perform global optimisations. I would argue it's better to move to a local-optimisation model: it's easy to modify code to make it locally optimisable (just read all variables into local copies, and adopt a Rust-like ownership model for structs/arrays etc. so the programmer can tell the compiler that a block of memory is local).
The pseudo-code you listed does not contain a data race by the author's definition since there is no simultaneous read/write assuming words have their usual meanings. As a result I believe it would not be UB in C++ today.
The protest that languages like C or C++ should just leave Undefined Behaviour alone so it will Do What You Mean is probably almost as old as I am and even more futile. You don't actually want that, the languages go as fast as they do because optimising compilers assume undefined behaviour can't happen.
The Java model tries to deliver what you wanted here and it fails. You still get surprising behaviour sometimes (defeating the objective of ensuring that the behaviour with data races is something you can reason about) but now you can't do certain optimisations that seem otherwise reasonable because they defy the promises the model made.
There are people in this space who believe that if they tighten the model even further they can get rid of the surprising behaviour while retaining most of the performance. That's where OCaml seems to be going. Maybe it will work.
The Rust model is the opposite, Rust evades the problem with its rule about modification XOR multi-reference. Either one thread can write this data, or one or more threads might read the data, but never both and thus no data races. Instead of ensuring you're able to understand the residual data races (and maybe conclude they're "benign") Rust ensures your program doesn't have any.
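A short Rust sketch of that rule (my own illustrative example, not from the thread): any number of threads may read through shared references, but a write through the same handle is rejected at compile time:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3]);

    // Any number of threads may hold shared (&) access and read...
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let d = Arc::clone(&data);
            thread::spawn(move || d.iter().sum::<i32>())
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 6);
    }

    // ...but a write through the Arc, e.g. `data[0] = 9;`, does not
    // compile: Arc only hands out shared references, so mutation
    // requires a Mutex, RwLock, or atomics. Writes XOR shared reads.
}
```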
The pseudocode isn't UB because of the locks, but my argument is that it shouldn't be UB even if the locks were gone. In that case, there is a simultaneous read/increment, but it's still obviously linearizable exactly the same way as with locks, so it still shouldn't be UB. (Or, if the memory fence caused by atomic increment instruction makes it non-UB, then just imagine overwriting a flag without any fences instead.) In Rust it's not even possible to write such code.
Another example is Figure 5 in this paper [1] (referenced from Boehm's paper above):
    x = y = 0

    // thread 1:      // thread 2:
    a = x;            c = y;
    b = x;            x = c;
    if (a == b)
        y = 1;
My point is that the assumption that a == b must hold is invalid if x is a global variable (the compiler rewrites b = x as b = a, which makes the branch unconditional and allows hoisting y = 1 above both loads). That's not a valid assumption for the compiler to make, ever, unless it can prove that x isn't accessed from different threads, which is exactly what a human would have to prove to perform the optimisation manually.
I haven't been tracking OCaml's progress too closely (multicore is just taking too long), but hopefully it's going in this direction... if you've any relevant references / descriptions of their memory model, let me know!
> In Rust it's not even possible to write such code.
What does "such code" mean though? Rust won't let you say that your mutable variables are just shared between threads, because that's unsafe. But if you say OK, I want atomic data types (so they lack tearing and the other things you were bothered about), then you can clone an Arc of them, and write code to manipulate the atomics whose consequences are just as underspecified. It's just that now you have to spell out that this is what you wanted, writing e.g. std::sync::atomic::Ordering::Relaxed
If you later complain to me that this code surprised you because the memory ordering wasn't what you expected, I will point at that symbol. If you wanted some specific ordering then "Relaxed" was wrong.
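A hedged sketch of what that spelled-out version looks like (the counter and the thread count are my own invention):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                // The ordering must be spelled out. Relaxed guarantees
                // the increment itself is atomic (no torn read/write)
                // but promises nothing about ordering relative to
                // other memory operations.
                c.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // The increments are atomic, so no updates are lost.
    assert_eq!(counter.load(Ordering::Relaxed), 4);
}
```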
> My point is that the assumption that a == b is invalid if x is a global variable.
No it isn't? This is an important optimisation, which you would prefer to sacrifice rather than just writing what you mean in the first place.
Sure, however now your atomic has 'static lifetime so it lives forever and that's sort of cheating. The reference counter means nobody in particular needs to be in charge of cleaning up, whoever has it last will drop it. Call it personal preference.
Also, is the kinda broken look of your blog an aesthetic choice or a bug? Is "![Today%20is%20my%20first" really what ought to display under the title? I feel like it probably isn't.
I don't see how 'static is cheating, that's the semantic you want: to be able to live as long as the various threads. If you used scoped threads you wouldn't need 'static, but those are less general, so I didn't get into it.
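For comparison, a sketch of the scoped-thread version (my example, using std::thread::scope from the standard library), where the atomic is a plain local and no 'static bound is needed:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    // A plain local atomic: no Arc and no 'static, because scoped
    // threads are guaranteed to finish before `counter` is dropped.
    let counter = AtomicU64::new(0);

    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                counter.fetch_add(1, Ordering::Relaxed);
            });
        }
    }); // all spawned threads are joined here

    assert_eq!(counter.into_inner(), 4);
}
```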
No, it's just buggy. :( Thanks for mentioning it though, that's a bug I hadn't seen yet.
> C (or C++) memory model isn’t really well-defined yet
To the limited extent this is true, it's about pragmatism.
In Java if the situation is that correct Java now has to be 5% slower then a new JRE is shipped that's 5% slower and some people are unhappy but life goes on.
In C++ if the situation is that correct C++ now has to be 5% slower then C++ programmers will mostly demand an "Incorrect-but-faster" setting on their compiler, and they continue to use the faster code.
So, in practice the fine details of the memory model must reflect the practical reality of the physical computers or else they're useless.
> But we mostly have one memory model today - one big flat memory space, because C.
This is not what is meant by 'memory model'. What it really specifies is how memory operations can be ordered and when a write made by one thread becomes visible to another: acquire, release, relaxed, etc.
That's not what GP is talking about. "Memory model" in a multithreaded context means the guarantees about memory-level interaction between concurrent threads.
I think the flat memory model predates C significantly. Didn't the earliest LISP (195x) already have a flat, garbage collected memory? In fact, C is one of the few languages which actually exposes the possibility of a segmented memory model to programs.
This topic is outside the realm of my knowledge, but at the language level, we do have that (compacting GCs), and I think we do have it at the OS level as well due to virtual memory (but hardware accelerated TLB).