You've just described the hardest parts of a web browser, while ignoring the necessity of a JIT compiler for your bytecode.
To give proof by counterexample, the "simple wrapper" is a graphics stack like GTK alongside the POSIX APIs. It turns out this is insufficient for application development.
Some things you're missing are text layout and accessibility. Neither is small or simple.
Browsers are the new OS, and they're big and complicated like OSes by necessity, not by accident.
All the things you mention could be available as a shared library, to be run inside the sandbox along with the application.
Let's keep the sandbox as simple as possible. We can build complexity inside it, and not make complexity part of it. This makes security guarantees much simpler.
The issue I see is that we don't need a sandbox at all for this; we need capability-based security models for applications, provided by the OS vendors (as on iOS and Android). Apple has taken the right steps toward such a model, yet gets lambasted by developers for it. We have Flatpaks and snaps for Linux; the former isn't really targeted by many developers, and the latter is similarly lambasted by users and developers. I don't even know what Microsoft recommends today, since no developer is going to use it and that recommendation will change tomorrow.
While complexity is not technically necessary, in practice the sandbox offers next to nothing for users or developers unless OS vendors enforce its usage, or unless the sandbox is so compelling a target that developers accept its limitations. That is why the only platforms with robust sandboxes are mobile and the browser.
IMO iOS and Android are terrible examples of how to do this. They break decades of convention in how to write POSIX-like applications, and force developers into locked-in practices.
Android pays lip service to the NDK, but in practice you can't really run many useful C/C++/Golang/Rust/etc programs on Android, because the system breaks POSIX. You can't keep a simple webserver running without starting a foreground service, which requires Java/Kotlin. There's no /etc/resolv.conf for DNS resolution, so you have to use custom servers or use the Android NDK compiler (which can be a huge pain). Plus they lock down the filesystem more each year. Android 10/11 are a complete mess on this front. Fortunately they've rolled back some of the more draconian changes, but only because of developer outcry. The fact that they even wanted to do some of these things in the first place is sad.
I'm not saying this is an easy problem to solve, but it's obvious that the big boys have only made a token (if that) effort to build a good, open way of doing it.
I think proper sandboxing on a traditional Unix is hard enough that ~nobody does it right; you can't let any apps run as the user if you want to prevent them from snooping on each other's files, but the user needs to be able to control all of them.
I don't know of a justification for removing /etc/resolv.conf, but I guess "applications' networking should be under the control of the user" is sorta necessary; I think it'd make more sense to do it with network namespaces or iptables rules or something, but /shrug.
>> you can't let any apps run as the user if you want to prevent them from snooping on each others' files, but the user needs to be able to control all of them.
I was thinking about that recently and my conclusion will be highly controversial.
In the same way Wayland implements security that X never could, I think the GUI might be a place to implement file and folder access permissions. If the user does not grant access to a file through a dialog, the app can't read it. The same goes for folder access, which would grant an application access to an entire folder or perhaps an entire hierarchy.
Obviously this would break some things, but real change may require some of that. How much, I don't know.
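To make the dialog-grants-access idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `FileGrantBroker`, `user_picks`, and `open_for` are made-up names, and `user_picks` merely stands in for a system file dialog. In a real system the app process would be unable to call `open()` itself and would have to go through a trusted broker like this one; plain Python cannot enforce that, so treat it as an illustration of the bookkeeping only.

```python
import os
import tempfile


class FileGrantBroker:
    """Hypothetical trusted broker: the only component allowed to open paths."""

    def __init__(self):
        self._grants = {}  # app_id -> set of real paths the user approved

    def user_picks(self, app_id, path):
        # Stands in for the user choosing a file in a system dialog.
        self._grants.setdefault(app_id, set()).add(os.path.realpath(path))

    def open_for(self, app_id, path):
        # The app receives an open file object only for files the user granted.
        real = os.path.realpath(path)
        if real not in self._grants.get(app_id, set()):
            raise PermissionError(f"{app_id} has no grant for {path}")
        return open(real, "rb")


def demo():
    broker = FileGrantBroker()
    with tempfile.TemporaryDirectory() as d:
        picked = os.path.join(d, "picked.txt")
        secret = os.path.join(d, "secret.txt")
        for p, text in ((picked, b"hello"), (secret, b"private")):
            with open(p, "wb") as f:
                f.write(text)

        # The user selects one file in the (simulated) dialog.
        broker.user_picks("photo-app", picked)
        with broker.open_for("photo-app", picked) as f:
            data = f.read()

        # The file the user never selected stays out of reach.
        try:
            broker.open_for("photo-app", secret)
            denied = False
        except PermissionError:
            denied = True
        return data, denied
```

Folder or hierarchy grants would need a prefix check on the resolved path instead of exact matching, which is where most of the subtle bugs would probably live.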
>IMO iOS and Android are terrible examples of how to do this. They break decades of convention in how to write POSIX-like applications
That's a good thing. We should break "decades of convention in how to write POSIX-like applications" if we want to get any further than POSIX-like OSes and programming models...
> Squeeze extra performance out of a device to achieve low latency or run computationally intensive applications, such as games or physics simulations.
> Reuse your own or other developers' C or C++ libraries.
Writing POSIX CLI applications is not a goal.
ISO C and ISO C++, alongside OpenGL ES, Vulkan and OpenSL ES, are more than enough to have something like Qt running.
I’m a big fan of the capability-based photo access in iOS: when an app tries to access my photos, it opens a system dialog that lets me select the set of images the app can see.
That's on them ("regular computers"). Their OSes could offer the same capabilities to map to.
That said, it's not:
(a) as if there aren't cross platform photo apps that run on iOS and "regular computers". Photoshop and Lightroom come to mind immediately, Affinity Photo also. And those aren't just iOS and Mac, they're also on Windows. So there's that.
(b) as if what would slow you down / prevent you from doing such a cross platform photo app would be the iOS capability / image access request feature.
You're completely right, of course. Seeing the amount of effort put into virtualization, sandboxing, containers, etc. is enough to make one wonder why every enterprise is so shortsighted. Capabilities obviate all of the above, and are neither a new untested idea nor more complicated than what we have--as we clearly see in the repeated attempts to build things that approximate them with ACLs and sandboxes and seccomp rules and so on.
Linux is slowly moving toward capability-centric design with more and more comprehensive namespacing and file-descriptor-based interfaces to things like pidfd and memfd, but there's still a long way to go before we can jettison ambient authority such as the filesystem entirely. Meanwhile, Google's Fuchsia may deploy a capability-based operating system to the masses, but it will likely only be used to sandbox applications written for Android anyway. The real potential of capabilities is to simplify the interface of a power-user operating system by eliminating the race conditions, side channels, privacy leaks, firewalls, virus scanners, and unix-style permissions from developers' and users' day-to-day experience completely.
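As a rough illustration of the file-descriptor-as-capability style, here is a Python sketch that hands code an open directory descriptor and only lets it open names relative to that descriptor. `os.open` with `dir_fd` is the standard library's wrapper around `openat`; `open_beneath` is a made-up helper, and its name check is a crude userspace stand-in for what a kernel would have to enforce. It assumes a Unix-like system where `dir_fd` and `O_NOFOLLOW` are available.

```python
import os
import tempfile


def open_beneath(dir_fd, name):
    """Open `name` relative to an already-open directory descriptor.

    Hypothetical helper. Only plain file names are accepted, and O_NOFOLLOW
    rejects a symlink as the final component. This is a sketch, not a
    complete confinement mechanism.
    """
    if os.sep in name or name in ("", ".", ".."):
        raise PermissionError(f"refusing path outside the granted directory: {name!r}")
    return os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)


def demo():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "note.txt"), "wb") as f:
            f.write(b"granted")

        dir_fd = os.open(d, os.O_RDONLY)  # the "capability" handed to the app
        try:
            fd = open_beneath(dir_fd, "note.txt")
            try:
                data = os.read(fd, 100)
            finally:
                os.close(fd)

            # An attempt to climb out of the granted directory is refused.
            try:
                open_beneath(dir_fd, "../etc/passwd")
                escaped = True
            except PermissionError:
                escaped = False
        finally:
            os.close(dir_fd)
        return data, escaped
```

The point of the sketch is that the holder never names an absolute path: what it can reach is determined by which descriptors it was given, not by ambient access to the whole filesystem.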
There will still be memory-safety zero days, of course, until we abandon languages where humans are statistically incapable of writing memory-safe code.
People (developers) want a target (sandbox) that is as "open" as their browser. This means that by definition the slightest presence of business-motives would ruin the entire idea (your example of Ubuntu's snap and Apple's iOS and Google's Android).
I think you're confusing two very different concepts - the fact the browser is sandboxed is orthogonal to its "openness." Sandboxes need not be open, and open platforms need not be sandboxed.
Fundamentally, sandboxes restrict the possible functionality of executable code to prevent it from violating some preconditions within the context a user invokes it. Rarely do developers desire that behavior (it makes it harder for your app to work!), and users complain about it too (when macOS began requiring explicit permissions for many entitlements, many users complained about the constant nag screens). Browsers have historically needed to reinvent every wheel provided by operating systems to provide features developers require that can't work within the sandbox (cookies, websockets, etc.), which prevented entire classes of application from existing.
Browsers have enormous business incentives behind them; I'm not sure why you think that ruins the idea. Google spends gobs of money to prevent anyone from developing a competing non-Google browser, for example. Apple bans any non-Apple browser from existence on iOS and iPadOS.
I like this vision of shared libraries sandboxed inside a minimal runtime to replace the browser. But there are some serious problems with it (that I hope can be resolved somehow).
Replicating a significant fraction of the functionality of a browser ends up being a lot of code and data. For a web-like experience with instant navigation between apps it will be necessary for these shared libraries to be really shared, as in almost all apps use the same libraries and the same version of those libraries. Otherwise you'd be downloading and unpacking and JIT compiling dozens or hundreds of megabytes of code and data before you could display a single line of text, every time you click a link to a new site. The linkable nature of the web would be lost. (Yes, many websites are dozens of megabytes already, but the bulk of that is delay-loaded and happens in the background after the initial content is displayed).
In order for the standard libraries to be actually shared, everyone would have to agree on the standard libraries to use, and the standard libraries would have to be designed to be backwards compatible so that older sites could be force-upgraded to use newer versions. This is essentially the role that browser updates play today. But the browser evolved over decades and it's not clear how you'd get everyone to adopt a new set of standard libraries today, nor how you'd get everyone to agree on what functionality to add over time (the role that W3C plays today).
For libraries outside of the standard set, it sounds tempting to allow them to be shared too. But this is actually an unavoidable privacy violation. If you have a shared cache, then any app can discover the set of previously loaded libraries with a timing attack, thus revealing information about the user's browsing history. This is why all browsers are now moving to partition all caches by site: https://www.jefftk.com/p/shared-cache-is-going-away. Even if multiple sites reference the same wasm blob, it must be downloaded and JITed separately with no sharing.
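A toy model of the difference, in Python. The class names, the site names, and the URL are all made up, and a boolean "hit" stands in for the timing difference a real attacker would measure; this is a sketch of the keying scheme, not of any browser's actual cache.

```python
class SharedCache:
    """Single cache keyed by URL only."""

    def __init__(self):
        self._entries = {}

    def fetch(self, top_level_site, url):
        # Returns True on a cache hit; top_level_site is ignored on purpose.
        hit = url in self._entries
        self._entries.setdefault(url, f"<contents of {url}>")
        return hit


class PartitionedCache:
    """Cache keyed by (top-level site, URL), so sites never share entries."""

    def __init__(self):
        self._entries = {}

    def fetch(self, top_level_site, url):
        key = (top_level_site, url)
        hit = key in self._entries
        self._entries.setdefault(key, f"<contents of {url}>")
        return hit


def probe(cache):
    """Simulate one site loading a library, then another site probing for it."""
    lib = "https://cdn.example/lib.wasm"  # hypothetical URL
    cache.fetch("bank.example", lib)          # user visits the first site
    return cache.fetch("snoop.example", lib)  # True means the visit leaked
```

With `SharedCache`, `probe` reports a hit, so the second site learns the library was already loaded somewhere. With `PartitionedCache` it reports a miss, at the cost of storing (and in the wasm case, compiling) the same blob once per site.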
As I said, I like the vision of a tiny low-level runtime executing shared libraries inside a sandbox. But I'm not sure how it could be done in an efficient and privacy-respecting way if you want to replace the Web with it.
Text rendering and accessibility APIs both need to access the host OS’s APIs to work correctly. (Text rendering is subtly different on macOS and Windows, and app rendering behaviour should follow the platform.)
I think we can expose standardised APIs to applications for this stuff, but the API will probably need on the order of hundreds of methods.
While it's probably true that a full gui ecosystem could be built on top of wasi+webgpu+windowing, it would mean that programs would often reinvent the wheel. Perhaps that's not a bad thing.
Speaking of moving complexity into sandboxed code, browsers could potentially run JS engines inside the Wasm VM, reducing a lot of complexity there.
>While it's probably true that a full gui ecosystem could be built on top of wasi+webgpu+windowing, it would mean that programs would often reinvent the wheel.
You can compile existing GUI libraries; you just need to port the rendering backend and input handling.