If you're willing to throw a large part of the existing code away to move to a microkernel architecture, you might as well rewrite it in a language that doesn't have race conditions by design.
Race conditions can be constructed in almost any language. It's more of a systemic thing than a language artifact, though some languages are more prone to race conditions than others.
You can even have race conditions in hardware. Multi-threaded or distributed software is an excellent recipe for introducing race conditions, sometimes so subtle that the code looks good and the only proof that something bad is going on is the once-in-a-fortnight system lockup.
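To make the systemic point concrete, here's a minimal Rust sketch (the function name and iteration counts are mine, purely illustrative). The code is entirely "safe" and contains no data race by the compiler's definition, yet it still has a logical race: a load followed by a separate store is not an atomic read-modify-write, so increments get lost when threads interleave.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread performs a non-atomic read-modify-write on a shared
// counter: load the value, then store value + 1. When two threads
// load before either stores, one increment is silently lost.
fn racy_increments(threads: usize, per_thread: usize) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    let v = c.load(Ordering::SeqCst); // check...
                    c.store(v + 1, Ordering::SeqCst); // ...then act: not one atomic step
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    let total = racy_increments(4, 100_000);
    // The only guarantee is an upper bound; lost updates mean the
    // actual result varies from run to run.
    assert!(total <= 400_000);
    println!("expected up to 400000, got {}", total);
}
```

The fix would be a single `fetch_add(1, Ordering::SeqCst)`, which makes the read-modify-write atomic — the point being that the compiler accepted both versions, so the race is a property of the design, not the language.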
I believe the undocumented instructions in the 6502 were caused by race conditions in the CPU. Some of these instructions are actually usable, while others give random results.
I wrote a 6502 assembler and had a really hard time not giving $88 its own mnemonic in the main table rather than loading it at the start of every file from an include. It was just too useful :)
Why wouldn't you? These opcodes seem to be used by most people these days, so there seems to be no reason not to treat them just like any other opcode.
Race conditions can also be created by I/O paths. I'm willing to state that unless said language abstracts away all I/O, it will always be possible to introduce such a bug :)
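For instance, the classic check-then-use race on a filesystem path. This is a sketch with the "other process" simulated inline in the same program and the filename made up; in real life the window between the check and the open is where another process sneaks in, and no language-level guarantee about your code can keep `exists()` true by the time `open()` runs.

```rust
use std::fs::{self, File};
use std::path::PathBuf;

// TOCTOU (time-of-check to time-of-use) sketch: the filesystem can
// change between the "check" and the "use", so a successful exists()
// proves nothing by the time open() executes.
fn toctou_window() -> bool {
    let path: PathBuf = std::env::temp_dir().join("toctou_demo_example.txt");
    fs::write(&path, b"hello").unwrap();

    let existed = path.exists(); // check: file is there

    // Simulate another process winning the race in the window:
    fs::remove_file(&path).unwrap();

    let open_ok = File::open(&path).is_ok(); // use: fails despite the check
    existed && !open_ok
}

fn main() {
    assert!(toctou_window());
    println!("the check said the file existed; the open failed anyway");
}
```

The usual remedy is to skip the check entirely and just attempt the open, handling the error — i.e., make the check and the use one operation.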
It's an OS with a microkernel, written in Rust. It's literally the first thing you see on the website. So it's both a microkernel and Rust, which makes it an example of people doing what the parent mentioned.
Is the trade-off efficiency? Because I'm sure a language that is willing to wait 5 seconds between operations can do pretty well at eliminating race conditions.
Race conditions do not have anything to do with absolute time. A race condition is simply an assumption about the sequence in which things are going to happen, combined with the discovery in practice that the implementation allows other sequences, the output of which is undefined.
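A deterministic sketch of that definition (the two operations are invented for illustration): the "race" is that the implementation permits more than one ordering of the same operations, and the orderings disagree on the result. Adding delays only shifts which ordering you happen to observe; it doesn't shrink the set of allowed orderings.

```rust
// The bug is the set of allowed orderings, not the clock. The same two
// operations produce different results depending on which runs first;
// if both orders are possible, the output is effectively undefined.
fn run(order: [u8; 2]) -> i32 {
    let mut x = 0;
    for op in order {
        match op {
            0 => x += 10, // operation A: e.g. a deposit
            1 => x *= 2,  // operation B: e.g. applying interest
            _ => unreachable!(),
        }
    }
    x
}

fn main() {
    let ab = run([0, 1]); // A then B: (0 + 10) * 2
    let ba = run([1, 0]); // B then A: (0 * 2) + 10
    assert_eq!(ab, 20);
    assert_eq!(ba, 10);
    assert_ne!(ab, ba); // order changes the result: a race if unsynchronized
    println!("A then B = {ab}; B then A = {ba}");
}
```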
Most absolute statements about anything are fallacies. The truth is that they can. Hell, I see Apple devices that don't update anything, including the time, until you take them out of screen lock, and you sometimes see the previous date and charge level even after unlocking. That is an efficiency-based form of race condition.
Perhaps the microkernel is free of race conditions. But now you've introduced a zoo of interacting processes (drivers, servers, etc.) and with that a new class of race condition in their interactions, which may or may not be harder to debug than what you started out with.
That zoo will still have a lot of advantages over a monolith, and one of those advantages is that if the components stick to their defined interfaces for interaction, your chances of race conditions are substantially limited.
That they allow for reduction of scope to the point that race conditions can be mostly eliminated. A macro kernel has untold opportunities for race conditions because the kernel itself is multi-threaded. Effectively every driver is a potential candidate.
In a properly implemented microkernel you will have each of those drivers isolated in its own user process, and if there is an issue (memory corruption, a crash, a logic error, or a race condition) the possibility of its effects spreading is limited.
Isolation is half the battle when dealing with issues such as these.
Do you mean that each driver is forced to be single-threaded, somehow? Or just that the drivers are isolated, so they can have as many race conditions as they like, but the microkernel will keep running?
Because I don't see the advantage here. So what if my filesystem driver is isolated? A race condition in there is still going to corrupt all my data, and I'm not going to be cheered up by the fact that the kernel can keep on ticking.
Another way to look at it is as layers of essential services: kernel, memory management, networking services, file services and so on. For a layer to be reliable, the layers below it have to be reliable too. Isolating the layers in processes allows for faster resolution of where a problem originates, which in turn helps to solve it.
And being able to identify the source of a problem opens up several other options: restarting the process, and proper identification of a longer-term resolution, because the error can be logged much closer to where it originated.
These are not silver bullets but structural approaches acknowledging the fact that software over a certain level of complexity is going to be faulty.
In my opinion the only system that gets this right is Erlang. (not the language, but the whole system)
I take your point about Erlang, and its systems for restarting/replacing failed code & processes, but I don't see how this eliminates race conditions or their consequences.
One problem here is that it's not always a simple layer-on-layer order of dependencies; they can work both ways. Take swapping/paging, for example. It can't be isolated as a separate driver or service: the kernel relies upon it, and so do processes, and yet the service itself relies upon filesystems, a separate layer again. A race condition in the swap system (like one of the bugs found by the tool in question) would break multiple layers, and a race in the filesystem could cause the swap system to fail. There is no real isolation here that saves an OS from failures.
It's not 'magic', but it will eliminate a whole swath of possibilities.
One reason race conditions are so common in monolithic kernels is the requirement that code be re-entrant, because the kernel itself executes multi-threaded. In a microkernel this multi-threading can be avoided: there are enough processes to spread the CPU cores over, so a number of them can be active at the same time without two or more threads being active within the same process.
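A sketch of that structure in Rust, with a channel-served, single-threaded "server" standing in for a microkernel service process (the `Request` type and the counter are made up for illustration). The state is owned by exactly one thread, so the handler code never has to be re-entrant and needs no locks; everything else interacts with it only through messages.

```rust
use std::sync::mpsc;
use std::thread;

// Messages that clients can send to the server process.
enum Request {
    Increment,
    Get(mpsc::Sender<u64>), // reply channel for the current count
}

// Spawn a single-threaded server that exclusively owns its state.
// Because only this thread touches `count`, the handler loop runs
// one message at a time: no locks, no re-entrancy requirement.
fn spawn_counter_server() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let mut count: u64 = 0; // owned by this thread alone
        for req in rx {
            match req {
                Request::Increment => count += 1,
                Request::Get(reply) => {
                    let _ = reply.send(count);
                }
            }
        }
    });
    tx
}

fn main() {
    let server = spawn_counter_server();
    for _ in 0..3 {
        server.send(Request::Increment).unwrap();
    }
    // Messages from one sender arrive in order, so Get sees all
    // three increments.
    let (reply_tx, reply_rx) = mpsc::channel();
    server.send(Request::Get(reply_tx)).unwrap();
    let count = reply_rx.recv().unwrap();
    assert_eq!(count, 3);
    println!("count = {}", count);
}
```

The design choice here mirrors the microkernel argument: concurrency lives between processes (many single-threaded servers on many cores), not inside any one of them.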
> Another way to look at it is as layers of essential services: kernel, memory management, networking services, file services and so on. For a layer to be reliable, the layers below it have to be reliable too. Isolating the layers in processes allows for faster resolution of where a problem originates, which in turn helps to solve it.
Is there any reason to believe that most (or even just a considerable part) of the race conditions in a kernel are cross-module?
Not necessarily, but the effects of a race condition (for instance, a privilege escalation) would be limited to the process the race was found in. That would substantially limit the potential for damage.
Why would a microkernel be smaller than a monolithic kernel, given the same feature set? Unless you claim that async message passing leads to shorter code than procedure calls.
Because it gets spread around over multiple processes, you get process-based isolation for each module that would normally all live within the same address space. So you may be able to break a module, but you can't use that to elevate yourself to lord of the realm.
I very much agree that a microkernel can be both more robust and more secure. I'm arguing against it being easier to understand. Modularity does not require address-space separation, which is a runtime property, not a code-organization feature.
The one tends to go hand in hand with the other, in my experience. Of course, that's anecdata, but the more complex projects I've worked on that used message passing separated much more naturally and cleanly into isolated chunks of code that produced their own binaries. Essentially, message passing and microkernels dictate a services-based approach, which is an excellent match for OS development.