I’ve had an idea for a while now for a tutorial that works similarly to this one, but backwards. I have no time to write it, but here’s the idea:
Behind the scenes, write a short C program, and compile it into LLVM IR through Clang. Present the code first as its LLVM IR representation, then work through how you’d manually decompile it back into C, hopefully giving a result that will compile back to the same LLVM IR.
(You’d be teaching some of the fundamentals of compiler theory here, in “reverse order”, but IMHO that’s exactly what you need if you’re to understand where C and Rust really diverge.)
Then, walk through the decompilation process again, except this time with Rust semantics, to end up with unsafe Rust that compiles to the same LLVM IR as the C code does.
And then, finally, treating the unsafe Rust as the new IR, walk through how you’d decompile that unsafe Rust into safe Rust, introducing each static-analysis compiler pass that Rust does in reverse, adding one feature at a time, until the code is all safe Rust, but can be theoretically partial-compiled into the earlier unsafe-Rust code†, and from there into the same LLVM IR as the other two examples.
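To make that last step concrete, here is a minimal sketch of what one "decompilation" from unsafe Rust to safe Rust could look like. Everything in it is an assumption made for illustration: the function names (`sum_raw`, `sum_slice`, `sum_iter`) are hypothetical, the starting point imitates a C loop over a pointer and a length, and nothing here claims that the three versions compile to identical LLVM IR.

```rust
/// "Unsafe Rust as IR": a near-literal transcription of a C-style loop over
/// a pointer and a length. (Hypothetical example, not from any real codebase.)
///
/// # Safety
/// `p` must point to `n` readable, initialized `i32` values.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total: i32 = 0;
    let mut i: usize = 0;
    while i < n {
        // Raw pointer arithmetic and dereference: the compiler checks nothing here.
        // wrapping_add is a choice made for this sketch; C leaves signed overflow undefined.
        total = total.wrapping_add(unsafe { *p.add(i) });
        i += 1;
    }
    total
}

/// Step 1: fold the (pointer, length) pair into a slice, so the bounds
/// travel with the type and the borrow checker tracks the lifetime.
fn sum_slice(xs: &[i32]) -> i32 {
    let mut total: i32 = 0;
    for &x in xs {
        total = total.wrapping_add(x);
    }
    total
}

/// Step 2: replace the explicit loop with an iterator adapter.
fn sum_iter(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let data = [1, 2, 3, 4];
    // The caller upholds the safety contract by hand in the unsafe version.
    let raw = unsafe { sum_raw(data.as_ptr(), data.len()) };
    assert_eq!(raw, sum_slice(&data));
    assert_eq!(raw, sum_iter(&data));
    println!("{raw}");
}
```

Each step removes an obligation the caller had to uphold manually and hands it to the type system, which is roughly the "one feature at a time" progression described above.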
The idea here is that this tutorial would be a way of “reading history backwards” (à la https://slatestarcodex.com/2013/04/11/read-history-of-philos...). I.e., rather than coming into a world with a fully-formed Rust and a fully-formed C, you’d start with a world that only has ASM; and then introduce C into it; and then introduce Rust into it, seeing how each introduction changes what you can do.
† It’d be helpful if rustc had a mode where it could emit some sort of “core Rust” IR, that was all unsafe{} and had only the barest types, in the vein of early C++ compilers that emit C, or how the Erlang compiler works. Sadly, I don’t think rustc is architected in a way that’d allow this. (Though, paging steveklabnik to correct me.)
> It’d be helpful if rustc had a mode where it could emit some sort of “core Rust” IR, that was all unsafe{} and had only the barest types, in the vein of early C++ compilers that emit C, or how the Erlang compiler works. Sadly, I don’t think rustc is architected in a way that’d allow this. (Though, paging steveklabnik to correct me.)
MIR seems to be the closest thing to a "Core Rust". You still have access to all the types, but everything is desugared down to the barest possible syntax. While and for loops are desugared to loop+break, complex field accesses are split into temporaries to ease borrow analysis, etc.
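As a rough illustration of the loop+break shape, here is a hand-written approximation in ordinary Rust. It is only a sketch: the function names are hypothetical, and real MIR is a control-flow graph of basic blocks rather than surface syntax, so the compiler's actual output looks quite different. To see the real thing, `rustc --emit=mir` should write it to a `.mir` file.

```rust
/// A `for` loop as you would normally write it.
fn sum_for(xs: &[u32]) -> u32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Roughly the same loop with the sugar removed by hand:
/// an explicit iterator, a bare `loop`, and a `break` on `None`.
fn sum_desugared(xs: &[u32]) -> u32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match iter.next() {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}

/// A `while` loop...
fn steps_while(mut n: u32) -> u32 {
    let mut steps = 0;
    while n > 0 {
        n -= 1;
        steps += 1;
    }
    steps
}

/// ...and its hand-desugared form: `loop` plus a conditional `break`.
fn steps_loop(mut n: u32) -> u32 {
    let mut steps = 0;
    loop {
        if !(n > 0) {
            break;
        }
        n -= 1;
        steps += 1;
    }
    steps
}

fn main() {
    let data = [1, 2, 3];
    assert_eq!(sum_for(&data), sum_desugared(&data));
    assert_eq!(steps_while(5), steps_loop(5));
    println!("ok");
}
```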
I'm not sure what happens to unsafe blocks, but I suspect those get stripped as well, and MIR is assumed to be valid. There's a tool called miri that can be used to find UB by running the MIR: if you manage to trigger UB, miri will error out and tell you what went wrong.
> It’d be helpful if rustc had a mode where it could emit some sort of “core Rust” IR, that was all unsafe{} and had only the barest types
Rust has a "core Rust" called MIR. It has nothing to do with unsafe and definitely doesn't remove types though. It's mostly about desugaring constructs, so by and large a simplified version of Rust (though it has things normal Rust doesn't, namely goto).
Compiling is a destructive process. While you can definitely learn a lot by reading IR and figuring a program out, and while it is easy to show why programming in ASM is time consuming, I don't think it makes sense to teach compiler theory and compiler passes that way.
In general, moving from one language to a "higher level" or "stricter one" (such as unsafe Rust to safe Rust) requires a redesign to properly do it (if at all possible).
The notes in the tutorial are well placed and answer all the questions I ask myself while reading, which makes it much more enjoyable than the tutorials I usually read. A great read if you're interested in Rust but haven't played with it yet, IMO.
Do you mean the hypothetical part 6? It doesn't exist. Part 5 was published on December 14th but the author hasn't released anything since (I wouldn't be surprised if they were busy given it's the end-of-year).
Direct. Tried again, works fine. It probably got a bit under the weather when it finally reached the front page, especially given it's also at the top of /r/programming right now.
Nowadays it seems quite rare to find someone using C for low-level work without a solid CS background. In fact, I bet a good percentage of CS undergraduates, or even graduates, can’t write good C code.
My girlfriend, who doesn't really program, worked with C on an Arduino. I don't have any CS background at all and use it constantly on embedded systems (be it Arduino, Teensy, or some MCU).
It might have made more sense back when programming was still considered a specialty; CS as a major had some value then, and it was largely a prerequisite. Of course nowadays the bar is low enough