*The C++ 'std::optional' type makes calling '*v' undefined behaviour if 'v' is e...

CJefferson · on Dec 24, 2024

No, I'm talking about std::optional<T>. It's a type designed to store an object of type T, or nothing. You use '*v' to get the value out (if one is present).

It was originally discussed as a way of avoid null pointers, for the common case where they are representing the possibility of having a value. However, it has exactly the same UB problems as a null pointer, so isn't really any safer.

I'm going to be honest, saying that memory corruption in C++ is 'almost never a problem' just doesn't match with my experience, of working on many large codebases, games, and seeing remote security holes caused by buffer overflows and memory corruption occurring in every OS and large software program ever written.

Also, that 'penalty' really does seem to be tiny, as far as I can tell. Rust pays it (you can use unsafe to disable it, but most programs use unsafe very sparingly), and as far as I've seen benchmarks, the introduced overhead is tiny, a couple of percent at most.

CyberDildonics · on Dec 24, 2024

It was originally discussed as a way of avoid null pointers,

I don't think this is true. I don't think it has anything to do with null pointers, it is a standard way to return a type that might not be constructed/valid.

I think you might be confusing the fact that you can convert std::optional to a bool and do if(opt){ opt.value(); }

You might also be confusing it for the technique of passing an address into a function that takes a pointer as an argument, where the function can check the pointer for being null before it puts a value into the pointer's address.

I'm going to be honest, saying that memory corruption in C++ is 'almost never a problem' just doesn't match with my experience,

In modern C++ it is easy to avoid putting yourself in situations where you are calculating arbitrary indices outside of data structures. Whether people do this is another story. It would (or should) crash your program in any other language anyway.

Also, that 'penalty' really does seem to be tiny

The penalty is not tiny unless it is elided all together, which would happen in the same iteration scenarios that in C++ wouldn't be doing raw indexes anyway.

Maxatar · on Dec 24, 2024

>I don't think this is true. I don't think it has anything to do with null pointers

The paper that proposed std::optional literally uses examples involving null pointers as one of the use cases that std::optional is intended to replace:

https://isocpp.org/files/papers/N3672.html

Here is an updated paper which is intended to fix some flaws with std::optional and literally mentions additional use cases of nullptr that the original proposal did not address but the extended proposal does:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p29...

I am going to be somewhat blunt, but if you're not familiar with how some of the use cases for std::optional is as a replacement for using raw pointers, along with some of your comments about how C++ treats undefined behavior, as if it's just results in an exception, suggests you may not have a rigorous enough understanding of the language to speak so assertively about the topic.

CyberDildonics · on Dec 24, 2024

I am going to be somewhat blunt, but if you're not familiar with how some of the use cases for std::optional is as a replacement for using raw pointers

I never said there wasn't a use case, I said it wasn't specifically about protecting you from them. If you put a null pointer in and reference the value directly, it doesn't save you.

If you don't understand the context of the thread start arguments over other people's simplified examples. I'm not going to write huge paragraphs to try to avoid someone's off topic criticisms, I'm just telling someone their problems can be avoided.

Maxatar · on Dec 24, 2024

Yes you did and I literally quoted it, here it is, your words:

"I don't think this is true. I don't think it has anything to do with null pointers, it is a standard way to return a type that might not be constructed/valid."

This was in response to:

"It was originally discussed as a way of avoid null pointers,"

I have provided you with the actual papers that proposed std::optional<T> and they clearly specify that a use case is to eliminate the use of null pointers as sentinel values.

CyberDildonics · on Dec 24, 2024

It's a template, you can put whatever you want in it including a pointer. All your example shows is that you can bind a pointer to a reference and it will wrap it but won't access the pointer automatically.

I'm not sure what your point here is other than to try to mince words and argue. It's a standard way to put two values together and what people were probably already doing with structs. You can put pointers into a vector too, but that doesn't mean it's all about pointers.

saghm · on Dec 24, 2024

I'm pretty sure the issue that the parent commenter is referring to isn't about wrapping a pointer type in an optional, but wrapping a _non-pointer_ type in an optional, and then trying to access the value inside. std::optional literally provides a dereference operator operator[1] which contains the following documentation:

> The behavior is undefined if this does not contain a value.

The equivalent to this in Rust isn't `Option::unwrap`, which will (safely) panic if the value isn't present; the equivalent is `Option::unwrap_unchecked`, which can't be invoked without manually marking the code as unsafe. I've been writing Rust professionally for a bit over five years and personally for almost ten, and I can definitively say that I've never used that method a single time. I can't say with any amount of certainty whether I've accidentally used the deference operator on an optional type in C++ despite writing at least a couple orders of magnitude less C++ code because it's not something that's going to stick out; the deference operator gets used quite often and wouldn't necessarily be noticeable on a variable that wasn't declared nearby, and the compiler isn't going to complain because it's considered entirely valid to do that.

[1]: From https://en.cppreference.com/w/cpp/utility/optional/operator

jpc0 · on Dec 24, 2024

Is your argument that rust with panic if you are a bad programmer and C++ says its Ub of you are a bad programmer?

That's just fundamental difference of opinion, Rust isn't designed for efficiency, it's designed for safety first. C++ unofficial motto is, don't pay for what you don't use.

If I type *X why would I pay for a check if its empty, I literally should have checked the value isn't empty.

If you work in a code base with people who don't check, your codebase doesn't have static analysers, you do no code review and dereferencing an uncheck optional get's to production, do you think a .unwrap in rust wouldn't have made it to production?

CJefferson · on Dec 24, 2024

Your basis seems to be no-one is ever going to write bad code, anywhere, ever, and invoke undefined behavior. That doesn’t seem reasonable.

Also, an unwrap isn’t perfect, but it’s much better than UB. It asserts. No memory corruption, no leaking all your user’s data, no massive fines.

The equivalent to C++ would be an unchecked unwrap in an unsafe code block, and that would throw up flags during review in any Rust codebase.

jpc0 · on Dec 24, 2024

An unchecked dereference should also throw up flags during review in a C/C++ codebase. I didn't assume that nobody would make mistakes. My argument has always been that you use a language like C++ where needed. Most of your code should be in a GC language. Going in with that mentality, even if I wrote that code in Rust, I'm exporting a C API, which means I may as well have written the code in C++ and spend some more time in code review.

EDIT: an unwrap that crashes in a panic is a dos condition. In severity this might be worse or better depending where it happens.

Both are programmer error, both should be caught in review, both aren't checked by the compiler.

jodrellblank · on Dec 24, 2024

> "Rust isn't designed for efficiency"

Citation needed, because Graydon Hoare the original Rust creator (who has not been involved with Rust development for quite a long time) wrote about how the Rust that exists is not like the original one he was designing:

- "Tail calls [..] I got argued into not having them because the project in general got argued into the position of "compete to win with C++ on performance" and so I wound up writing a sad post rejecting them which is one of the saddest things ever written on the subject. It remains true with Rust's priorities today"

- "Performance: A lot of people in the Rust community think "zero cost abstraction" is a core promise of the language. I would never have pitched this and still, personally, don't think it's good. It's a C++ idea and one that I think unnecessarily constrains the design space. I think most abstractions come with costs and tradeoffs, and I would have traded lots and lots of small constant performancee costs for simpler or more robust versions of many abstractions. The resulting language would have been slower. It would have stayed in the "compiled PLs with decent memory access patterns" niche of the PL shootout, but probably be at best somewhere in the band of the results holding Ada and Pascal."

https://graydon2.dreamwidth.org/307291.html

jpc0 · on Dec 24, 2024

The fact that by default array access is bounds checked in Rust and by default it isn't in C++ disproves that.

I think you would have a hard time convincing the C++ standards committee to put a checked container in the standard, maybe now with the negative publicity maybe but definitely not before.

I'm guessing it would be impossible to get an unchecked container into the rust stdlib.

estebank · on Dec 26, 2024

https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get...

saghm · on Dec 25, 2024

My point isn't that people aren't going to write bugs in Rust; my point is that people _will_ write bugs in literally any language, and bugs that cause panics in Rust are not going to expose the same level of vulnerability as bugs in C++ that cause UB.

There clearly are people who think that UB isn't as dangerous as I do, so if that's where you stand, I guess it is just a "fundamental difference of opinion". If you actually believe that you (or anyone else) is capable of being careful enough that you aren't going to accidentally write code that causes undefined behavior, then I don't think you're wrong as much as delusional.

Maxatar · on Dec 24, 2024

It's okay to admit you were wrong.

CyberDildonics · on Dec 24, 2024

I think if you had something real to say here you would have done it already.

CJefferson · on Dec 24, 2024

I just want to pick out one thing:

It would (or should) crash your program in any other language anyway.

To me there is a massive difference between "there is a way a malicious user can trigger an assert, which will crash your server / webbrowser", and "there is a way a malicious user can use memory corruption to take over your server / webbrowser".

And bounds-checking penalties are small. I can tell they are small, because Rust doesn't have any kind of clever 'global way' of getting rid of them, and I've only noticed the cost showing up in profiles twice. In those cases I did very carefully verify my code, then turn off the bounds checking -- but that's fine, I'm not saying we should never bounds check, just we should do it by default. Do you have any evidence the costs are expensive?

CyberDildonics · on Dec 24, 2024

If you profile looping through an array of pixels in linear memory order vs bounds checking every access it should be more expensive due to the branching.

As someone else mentioned you can boundary check access if you want to in C++ anyway and it's built in to a vector.

My point here is that it isn't really a big problem with C++, you can get what you want.

Maxatar · on Dec 24, 2024

Iterating over an array in Rust does not involve bounds checking every access since Rust guarantees immutability to the container over the course of the iteration. Hence all that's needed is to access the size of the array at the start of the iteration and then it is safe to iterate over the entire array.

Note that this is not something C++ is able to do, and C++ has notoriously subtle iterator invalidation rules as a consequence.

CyberDildonics · on Dec 24, 2024

They asked for evidence of bounds checks being expensive, please try to keep the context in mind.

Note that this is not something C++ is able to do

Or you could just not mutate the container.

jodrellblank · on Dec 24, 2024

> "They asked for evidence of bounds checks being expensive"

They said "bounds-checking penalties are small" not that it was free. You've described a situation where bounds checking happens and declared that it "should be" expensive because it happens. You haven't given evidence that it's expensive.

CyberDildonics · on Dec 26, 2024

I didn't give direct evidence, that's true.

You can profile it yourself, but it is the difference between direct access to cached memory (from prefetching) and two comparisons plus two branches then the memory access.

To someone who doesn't care about performance having something trivial become multiple times the cost might be acceptable (people use scripting languages for a reason), but if your time is spent looping through large amounts of data, that extra cost is a problem.

Maxatar · on Dec 24, 2024

It's the optimization that C++ is unable to perform, not the immutability.

CyberDildonics · on Dec 24, 2024

Are you talking about optimizing out bounds checking that isn't happening in the first place?

pptr · on Dec 24, 2024

If you declare using an invalidated iterator as UB, the compiler can optimize as if the container was effectively immutable during the loop.

Gibbon1 · on Dec 24, 2024

I've been wondering how much you'd pay if you just required that out of bounds accesses be a no-op with an option of calling an error handler. De-reference a null or out of bounds pointer you get zero. Write out of bounds, does nothing.

  int a = v[-100]; // yolo a is set to 0

I really suspect that unlike the RISC mental model compiler writers think in terms of a superscalar processor would barely be slowed down. That is if the compiler doesn't delete the checks because it can prove they aren't needed.

superblas · on Dec 25, 2024

My understanding is that the cost that is mentioned comes from the branch introduced by checking if the dereference is in-bounds (and the optimizations these checks allegedly prohibit), not from what happens if an OOB access is detected

Gibbon1 · on Dec 28, 2024

Where I feel that breaks down while probably leet code would be faster. But high quality code tends to have checks to make sure operations are safe. So you got branches anyways in real code.

carom · on Dec 24, 2024

You have to do a very similar operation to has_value() .value() with Rust's optional though... How is opt.ok_or("Value not found.")?; so different from the C++ type?

Kranar · on Dec 24, 2024

It seems like people think dereferencing an unengaged optional results in a crash or exception, but ironically you are actually less likely to get a crash from an optional than you would from dereferencing an invalid pointer. This snippet of code, for example, is not going to cause a crash in most cases, it will simply return a garbage and unpredictable value:

    auto v = std::optional<int>();
    std::cout << *v << std::endl;

While both are undefined behavior, you are actually more likely to get a predictable crash from the below code than the above:

    int* v = nullptr;
    std::cout << *v << std::endl;

I leave it to the reader to reflect on the absurdity of this.

d0mine · on Dec 24, 2024

What is the desired behavior? I see at least 3 options: panic (abort at runtime predictably), compiler error (force handling both Some&Nothing cases [needs language support otherwise annoying], exceptions (annoying to handle properly). There is too much undefined behavior already.

Perhaps, 3 types could exist for 3 options.

carom · on Dec 24, 2024

I understand what undefined behavior is, I just don't dereference pointers or optionals without first checking them against nullptr or nullopt (respectively). In fact, I generally use the .has_value() and .value() interface on the optional which, to my point in the above comment, is a very similar workflow to using an optional in Rust.

I think if you adopted a more defensive programming style where you check your values before dereferencing them, handle all your error cases, you might find C++ is not so scary. I would also recommend not using auto as it makes the types less clear.

    std::optional<int> v = std::nullopt;
    if (v == std::nullopt) {
        return std::unexpected("Optional is empty.");
    }

    std::println("{}", *v);

If you are dereferencing things without checking that they can be dereferenced I don't know what to tell you.

Maxatar · on Dec 24, 2024

One is undefined behavior which may manifest as entirely unpredictable side-effects. The other has semantics that are well specified and predictable. Also while what you wrote is technically valid, it's not really idiomatic to write it the way you did in Rust, you usually write it as:

    if let Some(value) = some_optional {
    }

At which point value is the "dereferenced" value. Or you can take it by reference if you don't want to consume the value.

carom · on Dec 24, 2024

I prefer ok_or for optionals and map_err for for results in Rust. I believe the way you proposed ends up with very deeply nested code which I try to avoid.

calo_star · on Dec 25, 2024

`std::optional<T>`'s style is more akin to using

  if x.is_some() {
    let x_value = unsafe { x.unwrap_unchecked() };
  }

everywhere.

tangus · on Dec 24, 2024

You can pay the penalty and get safe container access, too. AFAIK, C++ standard containers provide both bounds checked (.at()) and unchecked ([]) element retrieval.

CJefferson · on Dec 24, 2024

Now we are getting down to a philosophical issue (but I think an important one).

In Rust, they made the safe way of writing things easy (just write v[x]), and the unsafe one hard (wrap your code in 'unsafe'). C++ is the opposite, it's always more code, and less standard (I don't think I've seen a single C++ tutorial, or book, use .at() as standard rather than []), to do bounds checked.

So, you can write safe code in both, and unsafe code in both, but they clearly have a default they push you towards. While I think C++'s was fine 20 years ago, I feel nowadays languages should push developers to write safer code wherever possible, and while sometimes you need an escape hatch (I've used unsafe in a very hot inner loop in Rust a couple of times), safer defaults are better.