
I prefer that language constructs define new storage to be zero-initialized. It doesn't prevent all bugs (e.g. application logic bugs), but at least it gives deterministic results. These days it's zero cost for local variables and near-zero cost for fields. This is the case in Virgil.


That makes things worse if all-zero is not a valid value for the datatype. I'd much prefer a set-up that requires you to initialise explicitly. Rust, for example, has a `Default` trait that you can implement if there is a sensible default, which may well be all-zero. It also has a `MaybeUninit` holder which doesn't do any initialisation, but needs an `unsafe` to extract the value once you've made sure it's OK. But if you don't have a suitable default, and don't want/need to use `unsafe`, you have to supply all the values.


C & C++ run on systems where it may not be zero cost. If you need low latency startup it could be a liability to zero out large chunks of memory.


I think it's acceptable to leave an escape hatch for these situations instead of leaving it to easy-to-misunderstand nooks and crannies of the standard.

You don't want to zero out the memory? Slap a "foo = uninitialized" in there to get that exact behavior, plus the "here be demons" sign for free.
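
Amusingly, C++26 lands close to this design: reading an uninitialized automatic variable becomes "erroneous behavior" by default, and the [[indeterminate]] attribute from P2795 is essentially the opt-in demons sign. A sketch of the semantics as I understand them (compiler support is still rolling out):

    int sketch() {
        int a;                    // C++26: reading a is erroneous behavior --
                                  // it yields some concrete erroneous value and
                                  // should be diagnosed, but is no longer plain UB
        int b [[indeterminate]];  // explicit opt-out: reading b stays UB,
                                  // i.e. "foo = uninitialized" with the sign up
        return a;                 // erroneous, not undefined
    }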


Yeah this issue is super obvious and non-controversial.

Uninitialized state is totally fine as an opt-in performance optimization. But a well-defined, non-garbage default value should obviously be the default.

Did C fuck that up 50 years ago? Yeah probably. They should have known better even then. But that’s ok. It’s a historical artifact. All languages are full of them. We learn and improve!


I don't know, I expect all variables to be uninitialized until proven otherwise. It makes it easier for me to reason about code, especially convoluted code. But I also like C a lot and actually explicitly invoke UB quite often, so there is that.


I like C and it's great. I wish more people wrote C instead of C++. But there's a reason that literally no modern language makes this choice.

If uninitialized memory were opt-in, you would still be free to "assume uninitialized until proven otherwise". But uninitialized memory is such a monumental, catastrophic footgun that there really is no justifiable reason to make it the default behavior. Which, again, is why no modern language makes that (terrible) design choice.


I am talking about random convoluted code that I neither wrote nor control. The UB does not only help the compiler; it also helps me, the reverse engineer, since I can assume that an access without a previous write is either a bug or a sign that I misinterpreted the control flow.


You can assume whatever initialization you want when reading code, even if it's not in the standard. Is your concern that people would start writing code assuming zero-init behavior (as they already do)?

That purpose would be better served by reclassifying uninitialized reads as erroneous behavior, which they are for C++26 onwards. What useful purpose is served by having them be UB specifically?


> Is your concern that people would start writing code assuming zero-init behavior (as they already do)?

Yes, then I couldn't assume that such code can be deleted safely. Not sure if people really rely on it, though, given that it doesn't work.

> erroneous behavior

So they finally did the thing and made the crazy optimizations illegal?

> If the execution of an operation is specified as having erroneous behavior, the implementation is permitted to issue a diagnostic and is permitted to terminate the execution of the program.

> Recommended practice: An implementation should issue a diagnostic when such an operation is executed. [Note 3: An implementation can issue a diagnostic if it can determine that erroneous behavior is reachable under an implementation-specific set of assumptions about the program behavior, which can result in false positives. — end note]

I don't get it at all. The implementation is already allowed to issue diagnostics as it likes, including when the line number of the input file changes. In the case of UB it is also permitted to emit code that terminates the program. This all sounds like saying nothing. The question is what the implementation is NOT allowed to do for erroneous behaviour that would be allowed for undefined behaviour.

Also, if they do this, does that mean that most optimizations are suddenly illegal?
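
For what it's worth, my reading of the practical difference (a sketch, not normative wording): under erroneous behaviour the object holds some concrete erroneous value, so a read must behave like an ordinary read of that value; what the compiler loses is the license to assume the read never happens.

    int f() {
        int x;           // no initializer
        if (x == 42)     // pre-C++26: reading x is UB, so the compiler may
                         // assume this never executes and delete the branch
            return 1;    // C++26: x holds some concrete erroneous value and
                         // the comparison must behave like an ordinary read;
                         // the implementation may also diagnose or terminate
        return 0;
    }

Ordinary optimizations stay legal because they only have to preserve behaviour for the values objects actually hold; what dies is the reachability reasoning that treated the read itself as impossible.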

Well, yeah, the compiler can assume UB never happens, optimizes accordingly, and that can sometimes surprise the programmer. But I, the programmer, also program based on that assumption. I don't see how defining all the UB serves me.


UB doesn't mean there will be nasal demons. It means there can be nasal demons, if the implementation says so. It means the language standard does not define a behavior. POSIX can still define the behavior. The implementation can still define the behavior.

Plenty of things are UB just because major implementations do things wildly differently. For example:

    realloc(p, 0)
Having reads of uninitialized memory be UB means that implementations where zeroing is zero cost can initialize to zero, or implementations designed for safety-critical systems can initialize to zero, or what have you, without the standard forcing all implementations to do so.
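
To make the realloc(p, 0) point concrete, here's the portability trap as I understand the major libcs (hedged; check your platform's manual):

    #include <cstdlib>

    int main() {
        void* p = std::malloc(16);
        void* q = std::realloc(p, 0);
        // Implementations genuinely disagree at this point:
        //   - some free p and return a null pointer,
        //   - some return a valid minimum-size allocation,
        //   - and null could also mean "allocation failed, p still alive",
        //     which is why C23 gave up and made realloc(p, 0) undefined.
        if (q) std::free(q);  // no portable way to tell whether p leaked
        return 0;
    }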


> UB doesn't mean there will be nasal demons. It means there can be nasal demons, if the implementation says so.

Rather "if the implementation doesn't say otherwise".

Generally speaking, compiler writers are not mustache-twirling villains stroking a white cat thinking of the most dastardly miscompilation they could implement as punishment. Rather they implement optimisation passes hewing as closely as they can to the spec's requirements. Which means if you're out of the spec's guarantees, you get whatever emergent behaviour occurs when the optimisation passes run rampant.


This is both factually incorrect and philosophically unsound.

Every asm or IR instruction is emitted by the compiler. It isn't a "doesn't say otherwise" kind of thing. Whatever the motivations are, the compiler and its authors are responsible for everything that results.

"if you're out of the spec's guarantees you get whatever emergent behaviour occurs" is simply and patently not factual. There isn't a single compiler in existence for which this is true. Every compiler makes additional guarantees beyond the ISO standard, sometimes due to local dialect, sometimes due to other standards like POSIX, sometimes controlled by configuration or switches (e.g., -fwrapv).


Yeah that’s just really bad language design. Which, again, literally no modern language does, because it’s just terrible, horrible, awful, no-good, very bad design.


It's describing rather than prescribing, which yeah isn't really design. Most modern languages don't even (plan to) have multiple implementations, much less a standard.


All of that implementation freedom is also available if the behavior is erroneous instead. Having it defined as UB just gets you nasal demons, which, incidentally, this rule leads to on modern compilers. For example:

https://godbolt.org/z/ncaKGnoTb
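
For reference, the classic shape of such demonstrations (a sketch; not necessarily the exact code behind the link):

    int f(bool flag) {
        int x;        // uninitialized
        if (flag) x = 7;
        return x;     // UB when flag is false; GCC and Clang routinely drop
                      // the branch and return 7 unconditionally, and less
                      // benign transformations are equally permitted
    }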


There are non-standard mechanisms to control variable initialization. GCC has -ftrivial-auto-var-init=zero for zero-init of locals (with some caveats). For globals, you can link them into a different section than .bss to skip the startup zeroing.
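
A sketch of both mechanisms together (the uninitialized attribute exists in recent GCC and Clang; the .noinit section is a common linker-script convention rather than something every toolchain guarantees):

    // Compile with: g++ -ftrivial-auto-var-init=zero ...

    // Global kept out of startup zeroing, provided the linker script
    // places .noinit outside the normally cleared .bss region:
    __attribute__((section(".noinit")))
    char big_buffer[1 << 20];

    void fill(char* dst, unsigned n);  // assumed to write before any read

    char use_scratch(unsigned n) {
        // With -ftrivial-auto-var-init=zero this 64 KiB buffer would be
        // memset on every call; the attribute opts this one variable out.
        char scratch[65536] __attribute__((uninitialized));
        fill(scratch, n);
        return scratch[0];
    }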



