If you're willing to throw a large part of the existing code away to move to a microkernel architecture, you might as well rewrite it in a language that doesn't have race conditions by design.
Race conditions can be constructed in almost any language. It's more of a systemic thing than a language artifact, though some languages are more prone to race conditions than others.
You can even have race conditions in hardware. Multi-threaded or distributed software is an excellent recipe for introducing race conditions, sometimes so subtle that the code looks good and the only proof that something bad is going on is the once-in-a-fortnight system lockup.
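To make the systemic point concrete, here's a minimal Rust sketch (the function name and iteration counts are mine, purely illustrative). The code is entirely "safe" and contains no data race by the compiler's definition, yet it still has a logical race: a load followed by a separate store is not an atomic read-modify-write, so increments get lost when threads interleave.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread performs a non-atomic read-modify-write on a shared
// counter: load the value, then store value + 1. When two threads
// load before either stores, one increment is silently lost.
fn racy_increments(threads: usize, per_thread: usize) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    let v = c.load(Ordering::SeqCst); // check...
                    c.store(v + 1, Ordering::SeqCst); // ...then act: not one atomic step
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    let total = racy_increments(4, 100_000);
    // The only guarantee is an upper bound; lost updates mean the
    // actual result varies from run to run.
    assert!(total <= 400_000);
    println!("expected up to 400000, got {}", total);
}
```

The fix would be a single `fetch_add(1, Ordering::SeqCst)`, which makes the read-modify-write atomic — the point being that the compiler accepted both versions, so the race is a property of the design, not the language.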
I believe the undocumented instructions in the 6502 were caused by race conditions in the CPU. Some of these instructions are actually usable, while others give random results.
I wrote a 6502 assembler and had a really hard time not giving $88 its own mnemonic in the main table rather than loading it at the start of every file from an include. It was just too useful :)
Why wouldn't you? These opcodes seem to be used by most people these days, so there seems to be no reason not to treat them just like any other opcode.
Race conditions can also be created by I/O paths. I'm willing to state that unless said language abstracts away all I/O, it will always be possible to introduce such a bug :)
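For instance, the classic check-then-use race on a filesystem path. This is a sketch with the "other process" simulated inline in the same program and the filename made up; in real life the window between the check and the open is where another process sneaks in, and no language-level guarantee about your code can keep `exists()` true by the time `open()` runs.

```rust
use std::fs::{self, File};
use std::path::PathBuf;

// TOCTOU (time-of-check to time-of-use) sketch: the filesystem can
// change between the "check" and the "use", so a successful exists()
// proves nothing by the time open() executes.
fn toctou_window() -> bool {
    let path: PathBuf = std::env::temp_dir().join("toctou_demo_example.txt");
    fs::write(&path, b"hello").unwrap();

    let existed = path.exists(); // check: file is there

    // Simulate another process winning the race in the window:
    fs::remove_file(&path).unwrap();

    let open_ok = File::open(&path).is_ok(); // use: fails despite the check
    existed && !open_ok
}

fn main() {
    assert!(toctou_window());
    println!("the check said the file existed; the open failed anyway");
}
```

The usual remedy is to skip the check entirely and just attempt the open, handling the error — i.e., make the check and the use one operation.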
It's an OS with a microkernel, written in Rust. It's literally the first thing you see on the website. So it's both a microkernel and Rust, which makes it an example of people doing what the parent mentioned.
Is the trade-off efficiency? Because I'm sure a language that is willing to wait 5 seconds between operations can do pretty well at eliminating race conditions.
Race conditions do not have anything to do with absolute time. A race condition is simply an assumption about the sequence in which things are going to happen, combined with the discovery in practice that the implementation allows other sequences, the output of which is undefined.
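A deterministic sketch of that definition (the two operations are invented for illustration): the "race" is that the implementation permits more than one ordering of the same operations, and the orderings disagree on the result. Adding delays only shifts which ordering you happen to observe; it doesn't shrink the set of allowed orderings.

```rust
// The bug is the set of allowed orderings, not the clock. The same two
// operations produce different results depending on which runs first;
// if both orders are possible, the output is effectively undefined.
fn run(order: [u8; 2]) -> i32 {
    let mut x = 0;
    for op in order {
        match op {
            0 => x += 10, // operation A: e.g. a deposit
            1 => x *= 2,  // operation B: e.g. applying interest
            _ => unreachable!(),
        }
    }
    x
}

fn main() {
    let ab = run([0, 1]); // A then B: (0 + 10) * 2
    let ba = run([1, 0]); // B then A: (0 * 2) + 10
    assert_eq!(ab, 20);
    assert_eq!(ba, 10);
    assert_ne!(ab, ba); // order changes the result: a race if unsynchronized
    println!("A then B = {ab}; B then A = {ba}");
}
```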
Most absolute statements about anything are fallacies. The truth is that they can. Hell, I see Apple devices that don't update anything, including the time, until you take them out of screen lock, and you sometimes see the previous date and charge level even after unlocking. That is an efficiency-based form of race condition.
Perhaps the microkernel is free of race conditions. But now you've introduced a zoo of interacting processes (drivers, servers, etc.) and with that a new class of race condition in their interactions, which may or may not be harder to debug than what you started out with.
That zoo will still have a lot of advantages over a monolith, and one of those advantages is that if the components stick to their defined interfaces for interaction, your chances of race conditions are substantially limited.
That they allow for reduction of scope to the point that race conditions can be mostly eliminated. A macro kernel has untold opportunities for race conditions because the kernel itself is multi-threaded. Effectively every driver is a potential candidate.
In a properly implemented microkernel you will have each of those drivers isolated in its own user process, and if there is an issue (memory corruption, a crash, a logic error, or a race condition) the possibility of its effects spreading is limited.
Isolation is half the battle when dealing with issues such as these.
Do you mean that each driver is forced to be single-threaded, somehow? Or just that the drivers are isolated, so they can have as many race conditions as they like, but the microkernel will keep running?
Because I don't see the advantage here. So what if my filesystem driver is isolated? A race condition in there is still going to corrupt all my data, and I'm not going to be cheered up by the fact that the kernel can keep on ticking.
Another way to look at it is as layers of essential services: kernel, memory management, networking services, file services and so on. For a layer to be reliable, the layers below it have to be reliable too. Isolating the layers in processes allows for faster resolution of where a problem originates, which in turn helps to solve it.
And being able to identify the source of a problem opens up several other options: restarting the process, and proper identification of a longer-term resolution, because the error can be logged much closer to where it originated.
These are not silver bullets but structural approaches acknowledging the fact that software over a certain level of complexity is going to be faulty.
In my opinion the only system that gets this right is Erlang. (not the language, but the whole system)
I take your point about Erlang, and its systems for restarting/replacing failed code & processes, but I don't see how this eliminates race conditions or their consequences.
One problem here is that it's not always a simple layer-on-layer order of dependencies; they can work both ways. Take swapping/paging, for example. It can't be isolated as a separate driver or service: the kernel relies upon it, and so do processes, and yet the service itself relies upon filesystems, a separate layer again. A race condition in the swap system (like one of the bugs found by the tool in question) would break multiple layers, and a race in the filesystem could cause the swap system to fail. There is no real isolation here that saves an OS from failures.
It's not 'magic', but it will eliminate a whole swath of possibilities.
One reason race conditions are so common in monolithic kernels is the requirement that code be re-entrant, because the kernel itself executes multi-threaded. In a microkernel this multi-threading can be avoided: there are enough processes to spread the CPU cores over, so a number of them can be active at the same time without two or more threads being active within the same process.
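A sketch of that structure in Rust, with a channel-served, single-threaded "server" standing in for a microkernel service process (the `Request` type and the counter are made up for illustration). The state is owned by exactly one thread, so the handler code never has to be re-entrant and needs no locks; everything else interacts with it only through messages.

```rust
use std::sync::mpsc;
use std::thread;

// Messages that clients can send to the server process.
enum Request {
    Increment,
    Get(mpsc::Sender<u64>), // reply channel for the current count
}

// Spawn a single-threaded server that exclusively owns its state.
// Because only this thread touches `count`, the handler loop runs
// one message at a time: no locks, no re-entrancy requirement.
fn spawn_counter_server() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let mut count: u64 = 0; // owned by this thread alone
        for req in rx {
            match req {
                Request::Increment => count += 1,
                Request::Get(reply) => {
                    let _ = reply.send(count);
                }
            }
        }
    });
    tx
}

fn main() {
    let server = spawn_counter_server();
    for _ in 0..3 {
        server.send(Request::Increment).unwrap();
    }
    // Messages from one sender arrive in order, so Get sees all
    // three increments.
    let (reply_tx, reply_rx) = mpsc::channel();
    server.send(Request::Get(reply_tx)).unwrap();
    let count = reply_rx.recv().unwrap();
    assert_eq!(count, 3);
    println!("count = {}", count);
}
```

The design choice here mirrors the microkernel argument: concurrency lives between processes (many single-threaded servers on many cores), not inside any one of them.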
> Another way to look at it is as layers of essential services: kernel, memory management, networking services, file services and so on. For a layer to be reliable, the layers below it have to be reliable too. Isolating the layers in processes allows for faster resolution of where a problem originates, which in turn helps to solve it.
Is there any reason to believe that most (or even just a considerable part) of the race conditions in a kernel are cross-module?
Not necessarily, but the effects of a race condition (for instance, a privilege escalation) would be limited to the process the race was found in. That would substantially limit the potential for damage.
Why would a microkernel be smaller than a monolithic kernel, given the same feature set? Unless you claim that async message passing leads to shorter code than procedure calls.
Because it gets spread around over multiple processes, you get process-based isolation for each module that would normally all live within the same address space. So you may be able to break a module, but you can't use that to elevate yourself to lord of the realm.
I very much agree that a microkernel can be both more robust and more secure. I'm arguing against it being easier to understand. Modularity does not require address-space separation, which is a runtime property, not a code-organization feature.
The one tends to go hand in hand with the other, in my experience. Of course, that's anecdata, but the more complex projects I've worked on that used message passing separated much more naturally and cleanly into isolated chunks of code that produced their own binaries. Essentially, message passing and microkernels dictate a services-based approach, which is an excellent match for OS development.