I'd hazard that every design is "flawed" in some regard: there's no way to achieve all the desirable qualities and none of the undesirable ones. For one, some desirable qualities contradict each other.
So "${thing} is flawed" is not precise enough; an interesting statement would be "${thing} is not the best choice for ${conditions}". A monolithic OS is not the best choice for a high-reliability system on unreliable hardware. A microkernel OS that widely uses hardware memory protection is not the best choice for a controller with 4KB of RAM. A unikernel setup is not the best choice for a desktop system where the user is expected to constantly install new software. Etc, etc.
In other words, the ancient concept of "right tool for the job" still applies.
The main point remains: server and desktop OS kernels are a critical piece of infrastructure, worthy of full machine-checked verification.
In this context, it should be pretty obvious that a design that fails to minimise the trusted computing base is flawed. Even if a kernel vulnerability is unlikely to kill any given user, the sum of all the crashes, hacks, patching effort… is huge.
I bet my hat that rewriting the entire kernel for all popular OSes would be far cheaper worldwide than not doing it. (Of course, this won't happen any time soon, because of path dependence and network effects. Unless maybe someone works seriously on the 30 million lines problem described by Casey Muratori.)
> In other words, the ancient concept of "right tool for the job" still applies.
Absolutely. Monolithic kernels are clearly the wrong tool for the job.
So say Simon Biggs, Damon Lee, and Gernot Heiser. Have you even read the abstract?
> The security benefits of keeping a system’s trusted computing base (TCB) small has long been accepted as a truism, as has the use of internal protection boundaries for limiting the damage caused by exploits. Applied to the operating system, this argues for a small microkernel as the core of the TCB, with OS services separated into mutually-protected components (servers) – in contrast to “monolithic” designs such as Linux, Windows or MacOS. While intuitive, the benefits of the small TCB have not been quantified to date. We address this by a study of critical Linux CVEs, where we examine whether they would be prevented or mitigated by a microkernel-based design. We find that almost all exploits are at least mitigated to less than critical severity, and 40% completely eliminated by an OS design based on a verified microkernel, such as seL4.
If the effect is that huge, of course security trumps pretty much all other considerations. These are consumer OS kernels we're talking about. One that fails to more or less maximise security is obviously the wrong tool for the job. As the title of the paper suggests, in case you failed to read that as well.
I don't see why any of those are mutually exclusive. Capability based operating systems provide least privilege security, experiments with capability-based UIs have shown they are quite intuitive and secure because they align user actions with explicit access grants, and they don't perform any worse than monolithic or microkernel operating systems.
The real problem is the flawed mental models people insist on bringing to these questions, despite decades of research showing those models are irreparably flawed.
Capability chains are relatively more expensive to check than a bitmask, or than nothing at all (as e.g. in an embedded system). Also, performance asks for the shortest code paths and out-of-order execution; security asks for prevention of timing attacks and Spectre-like attacks.
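To put the cost comparison in concrete terms, here's a minimal sketch of the cheap end of the spectrum (the flag names are made up for illustration, not any real kernel's): a bitmask permission check is one AND and one compare, with no indirection at all.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical permission bits, purely for illustration --
 * not any real kernel's flags. */
#define PERM_READ  (1u << 0)
#define PERM_WRITE (1u << 1)
#define PERM_EXEC  (1u << 2)

/* The cheap case: one AND and one compare, no memory indirection
 * beyond loading the mask itself. */
static inline bool has_perm(uint32_t granted, uint32_t wanted)
{
    return (granted & wanted) == wanted;
}
```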
Security requires identifying yourself by hard-to-fake means, which takes time and effort (recalling and typing a password, fumbling with a 2FA token); ease of use asks for trust and an immediate response (light switch, TV, etc).
Performance asks for uninterrupted execution and scheduling of tasks based on throughput; ease of use asks for maximum resources given to the interactive application, and scheduling based on lowest interactive latency.
Also feature set vs source code observability, simplicity vs configurability, performance vs modularity, build time vs code size vs code speed, etc.
I'm not sure you and I are referring to the same thing by "capability". There is no chain of capabilities that needs to be checked; the reference you hold is necessary and sufficient for the operations it authorizes. In existing capability operating systems, this is a purely local operation, requiring only two memory load operations. Hardly expensive.
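Roughly what I mean, as a sketch (the structure and field names are invented for illustration; this is not seL4's or any other kernel's actual layout): resolving a capability is just indexing a per-process table and checking the rights the entry carries.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative capability slot: an object pointer plus a rights mask.
 * Made-up layout for this sketch, not a real kernel's data structure. */
struct cap_entry {
    void     *object;   /* kernel object this capability designates */
    uint32_t  rights;   /* operations the holder may perform on it  */
};

struct cap_space {
    struct cap_entry *slots;   /* per-process capability table */
    size_t            nslots;
};

/* Resolving a capability is a purely local lookup: index the table,
 * check the rights carried by the entry, hand back the object.
 * There is no chain to walk. */
static void *cap_lookup(const struct cap_space *cs, size_t idx,
                        uint32_t required_rights)
{
    if (idx >= cs->nslots)
        return NULL;
    const struct cap_entry *e = &cs->slots[idx];
    if ((e->rights & required_rights) != required_rights)
        return NULL;
    return e->object;
}
```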
Every OS is vulnerable to the hardware upon which it runs, but capability security at least makes side channel attacks somewhat more difficult because of least privilege and limiting access to non-deterministic operations, like the clock.
Identity-based security built on an authorization-based model like capabilities also ensures it's difficult to make promises you can't keep. Access list models let you easily claim unenforceable properties and then people are surprised when they are easily violated.
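To make that concrete, here's a sketch using POSIX file descriptors as stand-ins for capabilities (the function names are hypothetical; this is the textbook confused-deputy contrast, not code from any real service).

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Access-list style: the service opens whatever path the client names,
 * using the service's own (ambient) authority. The promise "clients only
 * read what they're allowed to read" is not enforceable here -- the
 * classic confused-deputy setup. */
ssize_t serve_by_path(const char *client_supplied_path, char *buf, size_t len)
{
    int fd = open(client_supplied_path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}

/* Capability style: the client hands over a descriptor it already holds
 * (e.g. passed over a Unix socket with SCM_RIGHTS). The service can only
 * touch what it was given, so the access it exercises is exactly the
 * access that was granted. */
ssize_t serve_by_fd(int client_supplied_fd, char *buf, size_t len)
{
    return read(client_supplied_fd, buf, len);
}
```

In the second version there is nothing left for the service to promise that it can't actually enforce: the grant is the descriptor itself.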
> ease of use asks for maximum resources given to the interactive application, and scheduling based on lowest interactive latency.
That's unnecessary. You need a low latency upper bound, not the "lowest latency". This does not necessarily conflict with throughput; furthermore, workloads that require high throughput and interactive workloads rarely overlap.
I'm not even sure what the rest of the properties are supposed to be about. I don't think most of those are mutually exclusive either.
So "${thing} is flawed" is not precise enough; an interesting statement would be "${thing} is not the best choice for ${conditions}". A monolithic OS is not the best choice for a high-reliability system on unreliable hardware. A microkernel OS that widely uses hardware memory protection is not the best choice for a controller with 4KB of RAM. A unikernel setup is not the best choice for a desktop system where the user is expected to constantly install new software. Etc, etc.
In other words, the ancient concept of "right tool for the job" still applies.