I can believe that such a data structure would be harder to use, since if you bubble the unsafety upwards in your API then you'll force your users to use the `unsafe` keyword. But I'm afraid I don't see how this implies that data structures would be limited or harder to implement: raw pointers exist in Rust, and they're exactly as powerful and unrestricted as they are in C++. The only way it would be harder would be if you were trying to use references to ensure safety guarantees about your program above and beyond what C++ is capable of guaranteeing (i.e. doing more than what C++ does).
> See this example, the problem is trivial to solve in most
> languages
I disagree. That proposed code (again using references which provide guarantees that no other language provides) is failing because Rust statically prevents data races, which is not only non-trivial to solve in most languages, but actually impossible to solve in general in most languages. The way around this is to, again, regress to what other languages do and use raw pointers directly.
About raw pointers — no you can’t do same as in C++. There’s no malloc/free, and placement new is still unstable. Also they feel foreign to the language, and sometimes I can’t even get them from the standard collections (like when I need 2 mutable pointers to different elements).
About the safety, it’s important to understand there’s a price. In some cases, it causes simple things hard to implement, or cost performance.
For example, there’re algorithms out there that modify items of the same collection at the same time, both single threaded like sorting, and parallel.
Can such algorithm cause a data race? Yes.
Is it good the compiler tells me about that? Sure.
Is it good the compiler prevents me from doing that at all, and forces me to code some workarounds, that will cost me time to implement, will be ugly, and/or will sacrifice runtime performance? Not sure, I like having a choice.
That's a fair criticism, the onus is on us to get these features finalized and polished. :) Placement new is certainly going to be a high priority once MIR and specialization are wrapped up.
> Is it good the compiler prevents me
> from doing that at all
But it's not preventing you from doing anything, it's just telling you that it can't automatically prove that a use of references is safe. This is precisely what raw pointers are intended for: Rust isn't about entirely eliminating potential unsafety, it's about encapsulating it behind an "I know better than the compiler" interface that can be feasibly hand-audited.
> workarounds, that will cost me time to
> implement, will be ugly, and/or will
> sacrifice runtime performance
I still don't quite understand this criticism. If you were so inclined, you could forgo references altogether and code Rust as though it were C, with nothing but raw pointers, and it shouldn't take any more time to write than C, be no more ugly (syntactic differences aside), and have no difference in runtime performance.
> you could forgo references altogether and code Rust as though it were C, with nothing but raw pointers, and it shouldn't take any more time to write than C
Disagree.
In C++ we have sophisticated debuggers, we have standard C library, STL with its containers, algorithms and lots more, boost and other third-party libraries.
When coding typical Rust, we have descent-quality standard library, also big set of third-party libraries.
When coding Rust as though it were C, we have almost nothing, because the rust’s standard library doesn’t expose unsafe API.
Unsafe Rust only takes same time to write as C++ if you’re coding something trivial, like a “Programming 101” homework. For anything even remotely useful, debugger and libraries make huge difference.
A single bounds checking feature (C++ has that with STL containers in debug builds, controlled by _GLIBCXX_DEBUG or _ITERATOR_DEBUG_LEVEL) can sometimes save hours.
What features do you look for in a sophisticated debugger? There's upstream support for Rust in GDB (https://np.reddit.com/r/rust/comments/4merw3/gdb_now_support...), I've seen people do reverse debugging with rr (http://huonw.github.io/blog/2015/10/rreverse-debugging/) as well as Valgrind, and the compiler knows how to emit debuginfo, and so should work with any tool that can read DWARF (like LLDB). Mozilla is investing even more into dedicated IDE support this year and the next.
> the rust’s standard library doesn’t expose unsafe API
Well, here’s a simple example I’ve wrote above: “Some container that keeps some items + a linked list to track and evict the least recently used item from that container”.
You know, LRU caches are everywhere. I remember in C++, I recently wrote something similar in three completely unrelated projects. Well, a linked list alone isn’t usually enough, but a linked list + unordered map that maps elements to list iterators do the job very well. To update the LRU order, I query the hashmap and get a list position, then I move that element to the list’s tail. And when I need to evict an element, I grab the list’s head and evict those.
Note the following good things about this C++ solution:
(1) All those operations (hashmap lookup, linked list reorder, linked list remove head) are O(1), i.e. I can have millions items in my cache, and still update my LRU order for practically no cost (a few pointer writes).
(2) I didn’t wrote much code, only creatively combined what was already here in the C++ standard library.
Now Rust.
The safety means pointers to list nodes aren’t part of the list’s API.
And without raw pointers, linked lists are nearly useless, because we can’t split or reorder them efficiently. If you’ll read the documentation for LinkedList<T>::split_off, you’ll find the complexity is O(n), while for most other languages, linked lists are famous for being able to do that for O(1).
I’m not sure any efficient linked list API is possible within Rust memory management model, safe or unsafe.
> The safety means pointers to list nodes aren’t part of the list’s
> API.
A quick glance at LinkedList leads me to believe that there's no technical reason why we couldn't support a `split_at_node` function that takes a raw pointer to a list node (along with additional APIs that actually return raw pointers to list nodes, which themselves wouldn't need to be unsafe).
And even if my intuition is wrong and there's some reason why the implementation of LinkedList in the standard library can't do this for some reason, I see no additional reason why some other implementation of linked lists couldn't do this. (The standard library implementation likely optimizes for convenience rather than power, given that our data structures guru has a complicated relationship with linked lists: http://cglab.ca/~abeinges/blah/too-many-lists/book/README.ht... ). The fact that Rust doesn't favor these sorts of things hasn't stopped others from going ahead and writing implementations anyway: https://crates.io/search?q=intrusive
> I didn’t wrote much code, only creatively combined what
> was already here in the C++ standard library.
> I’m not sure any efficient linked list API is possible
> within Rust memory management model, safe or unsafe.
I'm still unsure what you're implying here, given that, again, unsafe code can use raw pointers willy-nilly (though as we've established, using the stdlib to compose data structures via raw pointers isn't always convenient).
> there's no technical reason why we couldn't support a `split_at_node` function that takes a raw pointer to a list node
How about rust memory management model?
When I split linked list in two, the new list must somehow become the owner of the newly detached elements.
For both unmanaged and garbage-collected languages there’s a heap (garbage collected or not) that’s shared across all threads of the process. List nodes are freed either explicitly with delete/free, or implicitly with GC.
Rust memory managed model forces to transfer the ownership when the list is split. AFAIK that’s exactly what happens inside split_off, and that operation is O(n).
> unsafe code can use raw pointers willy-nilly
If so, why the implementation of LinkedList<T> doesn’t? You know, O(1) and O(n) is a big difference.
> When I split linked list in two, the new list must
> somehow become the owner of the newly detached elements.
Yes, which is what `split_off` is already doing. It doesn't take unsafe code or breaking the "Rust memory management model" to achieve this; ownership transfer is a fundamental part of the language. The `split_at_node` function that I describe could simply copy the implementation of the existing `split_off` function while removing the part where `split_off` iterates through the list to find the node at the specified index (which is the O(n) part of the function).
> If so, why the implementation of LinkedList<T> doesn’t?
Because the stdlib doesn't want to expose too many unsafe APIs on its datastructures, giving it freedom to evolve the internals. And LinkedLists are considered a niche type anyway, not for general use.
As kibwen mentioned, you are free to create your own linkedlist implementation that exposes this API.
Rust has a heap, and you can free things directly if you need.
Let’s summarize. I want to implement efficient LRU cache data structure.
In C++, I don’t need to spend time implementing and debugging my own linked lists, because standard library implementation already good enough for the task. I only need to encapsulate two standard containers together, and write a pair of methods to add/remove stuff.
In Rust, I need to roll out my own lists, because stdlib doesn't want to expose too many unsafe APIs on its data structures.
You agree with all of the above?
If yes, hope now you see how Rust isn’t good enough for creating complex specialized data structures.
That's a non sequitur. The choice made by thee stdlib to not support many linked list operations (there probably are crates in the ecosystem for this anyway) does not imply Rust is unsuitable for complex specialized datastructures. It implies that Rust has opinions about what should and shouldn't be in the stdlib.
In fact, you said it yourself, "specialized". Since linkedlists are generally used in specific cases with different required ergonomics, you do not want to expose a catch-all.
Why not? An LRU cache is moderately complex specialized data structure. As you see, for some reasons, it’s much simpler to implement in C++ then in Rust.
> The choice made by thee stdlib to not support many linked list operations does not imply Rust is unsuitable for complex specialized data structures
I never said it’s unsuitable. Technically, all modern programming languages are Turing complete, so in some sense they’re equivalent. I only said Rust isn’t good enough for that. Meaning, in many other languages doing that is much simpler than in Rust.
Linked lists are special case of trees, which in turn special case of graphs. Both trees and graphs are used a lot: DOM trees, syntax trees, KD/PK trees, quad edge graphs, and so on.
Rust’s memory management model is neither manual nor garbage collected. This feature makes it hard to implement those pointer-based data structures, where items are in shared ownership of the container, where you want to split/merge those containers, and move items across containers.
Not impossible, just hard and/or inefficient (like those ref.counted implementation that increment/decrement those counters each time I traverse the graph, even if I only reading stuff).
> it’s much simpler to implement in C++ then in Rust.
Because of the standard library. This is not a language issue, this is a library issue. This has nothing to do with Rust's memory model. It has to do with Rust's choices of having a smaller, maintainable stdlib and relying on the crates ecosystem for most libraries (which can then evolve independently of the language, which is nice).
Someone has to write a linkedlist implementation of the type you describe and put it in a crate, and that's it; everyone can use that. It should be trivial to make a pull request to https://github.com/contain-rs/linked-list to add the method you describe, for example.
--------
Edit:
Rust does make it harder to implement the bottom level of low level abstractions (doubly linked lists, LRU caches, etc), since these deal with memory the most and can easily become unsafe. You implement them once with a safe API, and use them as much as you want.
The reason std doesn't expose too many unsafe APIs on its datastructures is because if you need that level of control, it is better to have your own implementation. Safety is hard; and the more unsafe APIs you expose, the more you run the risk of external unsafe code depending on the internals of the implementation in the stdlib. Due to this, it would be preferred if you write the LRU cache with a linkedlist implementation embedded in it (which could be a separate crate for reusability), with the invariants well documented. This is a choice made by the stdlib, but has nothing to do with the language itself and is a choice that external crates are free to not follow.
> Rust does make it harder to implement the bottom level of low level abstractions
Yeah, prohibitively harder. If you really need an efficient linked list, or tree, or graph, Rust makes it nearly impossible to accomplish.
> The reason std doesn't expose too many unsafe APIs on its datastructures is because if you need that level of control, it is better to have your own implementation
This might be true for Rust, but the argument is language-specific. In C++ I often need “that level of control” and yet I’m happy with many stock implementations. Moreover, I can combine different data structures to produce whatever I need. Can’t in Rust, because the library designers (not just std, the rest of them as well) think I better have my own implementation if I need that level of control.
> Safety is hard
Not necessarily. C# is very safe, and yet it’s much friendlier to developers: I never used unsafe C# code to overcome language limitations, only to interop with unmanaged code.
No it doesn't. It's literally adding one extra simple function to a library.
> Not necessarily.
Well-performing safety is hard, then. Safety is hard in C++ and Rust, less so in C#. Yes, it is easier to write a linked list in C++, but ensuring it is used safely is hard.
> Can’t in Rust, because the library designers (not just std, the rest of them as well) think I better have my own implementation if I need that level of control.
Right, and that is solely a library issue. Rust doesn't prohibit having composable unsafe libraries either. The ecosystem just hasn't produced those yet, because people probably didn't need it yet; and it is mostly preferred for libraries to keep the unsafety encapsulated inside behind a safe abstraction.
> No function will transfer ownership of N nodes to the new container for O(1) complexity.
No safe function. It is easy to do with unsafe. This has been explained already.
> And a tree. And a graph. Trees / graphs structures and algorithms are everywhere, too bad Rust language designers forgot that.
Rust's designers are mainly from the C++ world. They know this. And we have seen tons of these datastructures crop up that are written in Rust code everywhere.
The idea is that you find a safe abstraction boundary for you low level data structure, and stick to it. For an LRU cache, the boundary is the LRU cache (not the internal linked list, which needs to be accessed using `unsafe`). The stdlib exposes a linked list that can be used standalone, but as part of a larger unsafe datastructure you should copy it out and integrate directly to get finer grained unsafe access.
Sure, they wanted a safe language. What they actually did, they created a language where data is a second-class citizen.
The strategy “find a safe abstraction boundary for you low level data structure, and stick to it” works well for a scripting language. Or for high-level code that’s not performance-critical, like business logic.
It doesn’t work well for a general-purpose language that’s used for algorithms-heavy performance critical tasks.
There’s no “low level data structures”, because data structures are all the way down, and composed all the way down. Very often I need unsafe things at my API boundaries, because performance or because interop. I can’t implement my data structures once and stick to them, because as I work on the project I change both code and data structures (the main input here is profiler).
That safety thing makes data structures impractical. As you see from my linked list example, they are hard to create, use, and compose.
BTW, graphs (including trees and lists) are just one example how pointers are useful in data structures. Here’s another one.
In C++, I can add an index to a collection using unordered_map<tKey, Value*>. This index allows constant-time lookups, and if the tKey is something small like an int, the RAM cost is very low, because only keys + pointers are stored in the hash map, not the values themselves.
If I’ll want to do same in Rust, I suppose you’ll tell me I need to copy-paste the collection source code and add index there, because what’s in stdlib ain’t composable the way I want :-)