
I can't help but notice that in this code, as well as in the article, there are no checks for a) a NULL list or b) that the item is in fact part of the list. A simple fix is:

  while (!!walk && walk != entry) {
     prev = walk;
     walk = walk->next;
  }
or

  while (!!indirect && !!(*indirect) && (*indirect) != entry)
          indirect = &(*indirect)->next;

...because without those checks it's pretty easy to see where you'll crash... but by putting in those checks, perhaps the second method might actually be less 'elegant'.

Edit: did I miss something? Re-checking this, the second method will still crash if/when next is NULL (because you can't get the address of NULL), which means the 'elegant' method seems to be a quagmire of lurking faults.



That code is not a general purpose library function; it assumes the element exists.

If you call that method and the element doesn't exist, presumably something has gone wrong already. Adding a NULL pointer check is not going to fix it. You just silently ignore the error. You'll prevent the crash, but there's no mechanism for handling the error.


And instead of adding a check and crashing visibly here you wait even longer until your program misbehaves due to the error? The longer the program runs after something went wrong, the harder it is to debug, and the more likely it becomes that you get an exploitable bug.


Where you add safety checks like that is a top-down system organization issue, not something you can armchair-quarterback while looking at twelve lines from a million-line codebase. There is simply no way to determine if an assert is appropriate without further context, and it has nothing to do with the choice of algorithm.


Performance is a feature and that null check has a cost. The code is correct. The null check is unnecessary. The bug is passing null in.


NULL being passed in is not the only problem. Dangling pointers, coding mistakes by other team members and race conditions during initialisation can all contribute to this blowing up. Some argue (correctly) that it's best to have it blow up, others that assertions are the way to go. I'm of the latter ideology. Make your debug build do assertions, but make release code crash-proof by doing NULL checks.

There are several comments indicating the performance cost of a NULL check. Go and look at the disassembly and see the code. You should find that it equates to a JZ, which in the worst case costs 1 clock cycle. Am I incorrect?

So, another comment below mentioned 10 million deletions per second. Ignoring the fact that you should be using a doubly linked list at that point (like the kernel does), on a single-threaded 2.4GHz CPU a one-cycle opcode (JZ) executed 10 million times equates to a bit over 4ms per second. With branch prediction, I expect it to be far less.

So, the NULL check is almost free. Even on embedded systems, it'd still be a good idea to keep it in, to insulate yourself from flipped bits (eg, solar flares, or degrading memory) and coding mistakes elsewhere.

In these days of security research, I strongly advise defensive code: it means that each function has had its edge cases thought about, which means the developer has spent time thinking about the stability of their program, which cannot be a bad thing. Don't make your code a deliberate point of exploitation.


This makes a big assumption: that the null, which is the result of a programmer error, can be recovered from. What would you do once you catch the null? Log that there was a null, maybe with some debug information, then halt? I personally would prefer a proper crash handler to do all of that for me.

I guess my question is, what's wrong with crashing, in the case of a real screwup, where something has gone horrible wrong, assuming you have a proper crash handler?


It's one instruction in the generated code, and we can reasonably expect the branch predictor to do the right thing, but branches and error handling can inhibit vectorization, which has a much higher performance cost. While in this particular case you probably aren't vectorizing much, it is not always the case that error handling is almost free.


Safety is a feature and that null check prevents bugs. I'd rather have slow but safe code than fast but buggy code.


If the null check silently ignores the attempt to remove a non-existent item from the list, then the null check did not prevent a bug, it merely hid it.

IMO, even though this is C, the Zen of Python still applies, which states:

> Errors should never pass silently unless explicitly silenced.

It's the caller with the bug, not the library code.


Can we all at least agree that if the language could express non-nullable pointers, we'd all be better off? (Something like `Option<T>` in Rust, or Swift's `?` sugar for their `Optional`, etc etc)

Then the type of the function would simply declare that the pointer can't be null, and code which passes a nullable pointer would refuse to compile.


Adding checks creates more code branches and tests. Passing null in is the bug.


Can you show it has any impact on benchmarks? Isn't this something speculative execution should take care of quickly?

"Premature optimization.... " you get the drift.


Make it work

Make it right

Make it fast

This is an ordered list, and the people who forget that make a lot of work for the people who don’t


I would argue that if you have code that is trying to remove a non-existent item from a list, then making it silently fail is only giving the appearance of making it work.

It's definitely not right, while probably not really working either. The immaterial performance gain from removing the null check is completely irrelevant.

It's a foot-gun for sure, but it's up to you to not pull the trigger.


Are you arguing that we should leave known crash bugs in the code?


I'm arguing that we should use asserts or similar defensive coding mechanisms to ensure that preconditions are met instead of merrily continuing to run our program with cascading error effects.


I've had this argument a ton with people used to new high level languages that make null safety "too easy" like:

  next?.doThing()
Where doThing is never called if next is null.

Languages that do this use "scary" operators to crash on null:

  next!!.doThing()
And it's drilled into people's heads the latter is a Bad Thing (tm)

-

You really need to consider context in null handling.

Imagine an app that alerts a nurse when the patient's heartbeat is out of range.

In an application where the UI context might have been closed out, it's common to see

  someUiContext?.showAlert()
But what actually happens if the context is gone?

It's better to crash and have part of your startup procedure be communicating that a crash occurred, and the doctor should check that something went wrong, than silently continuing.

-

The problem is when you tell people this, the kneejerk reaction is always "are you saying we intentionally add crashes"!

Because safe null handling was specifically added to avoid the situation where that crashes...

(For example, if the app was a news reader and the alert was "Article failed to load", you wouldn't want the app to crash just because the user left a certain page before the alert was shown)

But I think the pendulum has swung too far at this point, people are so used to just sweeping nulls under the rug, and it's not great for finding issues


Yes - and that is the point the OP you responded to is making. This is not a generic library function. It's used in a specific setting where preconditions exist and presumably are checked _prior_ to calling this code.

Sure you could make the argument that we don't _know_ they are being checked, but it's a pointless discussion. Who cares? _If_ preconditions are met, this code is safe, if they aren't, it's not safe. Since we don't know one way or the other, there's no point in discussing further. The kernel developers know their stuff...


Imo every piece of code (for reasonable definitions of "piece") is supposed to check its own preconditions and not rely on the caller to check them.


It's a bad idea, especially in kernel/low-level/performant code. It's OK to check some values at specific points (like when a value is passed from user space), but in general it's bad practice to check it at every single function. You trust in your program flow.

Imagine if at every function down a complex stack you go with:

    if (!ptr1 || !*ptr1 || *ptr1 > 5 || param1 < 0 || param2) {
        /* etc.... */
        return NULL;
    }
(used arbitrary names and values).


This is nice in theory but a bad idea in practice (as a blanket statement, I am all for checking preconditions in general). An easy example is binary search, which I think is a reasonable "piece" of code by your definition. One should never check its preconditions _inside_ the binary search function (that the list of elements being searched is partitioned by the search predicate). Checking the precondition is O(n) while the algorithm is O(lg n).


You're right, but checking for null is not terribly expensive compared to iterating over a linked list.


The thing is that you aren't really advocating for just that... Since this is pretty general support library code, the result of this philosophy is defensive checks in all support functions, and that means checks in every spinlock, mutex/semaphore, and god knows how many other places... everything, really. We would also want these checks at multiple levels, since we shouldn't trust that other people checked things. This would have a significant effect on performance, and many would prefer their system not run however-many percent slower. All to avoid a possible issue that can't be very large, or systems would be kernel panic'ing all over the place.

From a black box, zero-knowledge, perspective it’s maybe worth remembering that code built this way successfully runs mission critical systems all over the world every day. Thoughtful people would have switched to other (slower, more pedantic) systems if doing things differently was a real overall life improvement. People have had the choice, there are plenty of OS’s out there... some far more formal/pedantic. Linux wasn’t the safe choice back in the day, it was just better by several important metrics consistently over time.

Perhaps these insanely experienced kernel devs know what they are doing.


This isn't possible in many cases - consider the simple C library function strlen, one of whose preconditions is that it must only be called on a zero-terminated string. There is no way to write code in strlen to check that this is true.


Which is one of the reasons why coding standards like MISRA forbid using strlen (at least in C++, I guess even MISRA doesn't want to force you to write safer C with sane string types).


Asserts have a run-time cost. What you need is to ensure those preconditions are met in the first place, statically, by formal methods for example.


Which is ideal, but generally beyond reach for now. So assertions it is.


It’s not a bug, in this case. It depends on the behaviour the library is specified to handle, but if you ask me to remove a specific item from a list, and that item isn’t in the list, what should I do?

I can fail so you can catch it, or if you don’t catch it, you’ll at least see exactly what went wrong. You expected the item to be in the list and it wasn’t. Your preconditions were wrong from the beginning and you have an (unknown) bug in the code.

The alternative is that I can’t find the item and I silently suck it up and don’t notify you. If that’s the api we’ve all agreed on, that’s fine. It was on you to check for the thing in the list first and then remove it. But it’s a bit weird if you think about it - now you need to traverse the list twice. Once to check for the existence (and fail out yourself) and once to remove.


I think a better approach is to return a bool or something like that, rather than crashing. But I agree that silently handling this case is the worst of the options.


What you are espousing is known as “defensive programming” which is very much a bad thing (tm).


I'm interested in hearing more about why defensive programming is bad. Do you have some links for me?


Clearly you've never worked with contract developers whose only purpose is to write a crap ton of code to get a "feature" out that crashes as soon as it encounters the real world. As they say, "doveryay, no proveryay" and it's served me well.


> As they say, "doveryay, no proveryay"

I'm sorry if this might be common knowledge, but I've honestly never heard this and can't understand it.

Is it saying that it's better to do something than to (formally) prove it, like the opposite of (I think) Knuth's famous quote?



Then it ought to document that it assumes that entry exists and that this walk will not hit the end of the list. Which also implies the list is not empty.


It should, yeah, but if it can safely assume there will be no nulls, it will save a nontrivial amount of defensive programming checks for each cycle. And IIRC, the Linux kernel uses tons of linked lists, millions of items per second sometimes; any operation that can be removed will make a big impact. Relatively.

But for general purpose code, e.g. libraries where performance isn't super important but stability is, more defensive programming would be a thing to do.


I believe the implicit requirement is that the item is known to exist within the list. With that requirement in place the extra guards aren't requisite (they're implicit) and in kernel code this probably offers a performance edge and is a reasonable requirement. Either having an implementation that has an extra check once at the end of the list, or that verifies the presence of the item and passes that node through to a deletion subroutine, would be good alternatives but still utilize the same pointer to structure view.


That's a pretty big freakin thing to leave out, though. Code that depends on its context is, some would say, bad.


Simply put, safety slows code down. It's a matter of whether you know what's happening underneath or not. The more you try to make C completely safe, the more you slow it down and therefore remove the need to have written it in C in the first place. Whether that's a good thing or not is an exercise for the implementer.


True, but costs of comparison to NULL aren't massive. It is one of the fastest things out there.

It was my experience when tuning a linear algebra library (just for internal use) that such comparisons are almost unobservable in total performance tests.


If you're writing a linear algebra library, then the bulk of executing code should be floating point vector operations in tight loops with known bounds. A NULL pointer check in the prolog where you set up the loop is going to be negligible.

This is very different from kernel code, which by its nature isn't very computational but doing resource management all day long and pretty much all it does is stuff that looks like walking linked lists.


That was a very specific linear algebra, we toyed around with huge but sparse matrices. Not anything graphics-related, rather number theory-related. But there was a lot of integer comparisons in there.

This was around 2001-2002 anyway. Things might have changed.


> costs of comparison to NULL aren't massive

Depends on how often you are doing the comparison. Is it 1 time/second, or 10 million times a second?

There is a difference.

If the code above is the one in charge of putting/removing network packets in a queue, or putting threads ordered by priority for the scheduler, then you should consider the side effects of checking for NULL.

If you are going to implement this function in a library for Jimmy The Programmer[1], then check for NULL.

[1] https://blog.codinghorror.com/new-programming-jargon/


It's written in C because C is popular, has platform support and the code was originally written in C, not because the kernel should be a security nightmare.


That's irrelevant to what Linus is saying. He's not saying "this is good code, but only with the caveat that it's written in C and run in the Linux Kernel". He's saying that changing it to make it far more difficult to read and modify, but cleverer, has made it better code in general.


See, and this is where I (and I'm guessing you) might beg to differ. I'm a web developer, so the vast majority of my time is spent working in JavaScript. Our team has been working in ES6 for the past ~2 years. Our lead developer just LOVES him some ES6 - destructuring, aliasing, lambdas, you name it, and he's all-in.

Me, though? Well, let's just put it like this: when we find some framework-level bug that's written in "clever" ES6 syntax, our first step in debugging is almost ALWAYS to rewrite the given function traditionally, without any of the ES6 shorthand. And the reason we do that is because reading and debugging a whole stack of anonymous lambda calls is a PAIN IN THE ASS. Or figuring out where a certain variable is coming from when someone uses overly-complex destructuring syntax to magically pull a value from deep within a nested object.

I mean, don't get me wrong, I do like and use almost all of the modern ES6 niceties, but I also feel like it's much more difficult to parse and understand code compared to what we were all writing a few years back. People will, I'm sure, be arguing about what constitutes "good code" for decades to come, but to me, when working in an evolving codebase, especially with other people, plain ol' human readability is paramount. If people can't figure out what your code is doing without throwing in a breakpoint and stepping through line-by-line, you've failed at writing good code. And this will be my opinion right up until the day humans stop writing code by hand.


This is why Linus' code shouldn't be copied verbatim and used in the real world. The Kernel is an extremely isolated and contained system.

At the very least, the code should add comments on the assumptions that are vital... otherwise n00bs will copy the code as gospel and run into all sorts of problems.


The actual code in the kernel is well documented (and different, lines 103-149): https://github.com/torvalds/linux/blob/master/include/linux/...


You don't need to check !!indirect. There is no case where the indirect pointer is NULL:

  while (!!(*indirect) && (*indirect) != entry)
    indirect = &(*indirect)->next;
Also I would rather use *indirect instead of !!(*indirect).


Correct. I was just being overly pedantic.


Why the !! ? I thought that only makes sense in JS.


> I thought that only makes sense in JS.

In C any nonzero value is considered "truthy", and on most architectures NULL is defined to be (void *)0 or similar. The logical not operator, AKA the bang operator, will replace truthiness with 0 and falsiness with 1. So applying it twice collapses all nonzero values to 1.


Well, yes, but that does absolutely nothing in this code, as it is immediately used as the argument to an if statement.


Indeed, I was only commenting on the JS vs C part.


I'm not sure about the current state of compiler optimisations, but IIRC in the olde days, a not-not in an if() statement would assemble to a JZ, saving a clock cycle (& an opcode?), and wouldn't need to stall to load in the full width of the register. Today's branch-predicting hardware is beyond me.

I still use it as a clarification that it's a deliberate boolean operation, rather than implicit.


I highly doubt that any modern compiler would generate different code with or without the !!. Not even sure why an old compiler would do that? It's the exact same semantics, surely.


That had nothing to do with pedantry.



