Using btrfs subvolumes as the image format, that's a nice touch. On the same road as the hypothetical systemd packaging system (not that I'm very enthusiastic about that).
The network, PID and mount namespaces are the ones unshared, plus a private /proc.
I like tools like this because they're reality checks on how the basics of Linux containers are just a few essential system calls, and particularly that they're limited.
One of the things I found really interesting here is how much could be done with just basic userland tools, and how old some of those tools are.
Docker was released in 2013, but support for kernel namespacing has been around since ~2007. That's quite a long time for such a great feature to go mainstream.
Yes. The basic technology has been there for a while. And the Docker source code has some eyebrow-raising parts to it.
However, I've stated this in other threads: Docker isn't about containment. It's really about the packaging system. I don't think this technology demo gets that.
If Docker is about packaging, then it's one of the worst package management systems I've ever used. I use Docker for the abstraction over namespaces and cgroups mainly, and get frustrated with the layers of disk images, bad caching system, poor security story, and the weak Dockerfile DSL.
Perhaps the parent poster was wryly indicating that he doesn't think that much of Docker. Certainly I think both of you are correct: Docker is about packaging, and it absolutely sucks at that. The only reason that that is not obvious is that Docker is piggybacking on the relatively excellent and well-developed package management of distributions like Debian and Fedora.
Debian and Fedora do packaging better than Docker, relatively speaking, but they still have major issues that have lead to "solutions" like Docker, Chef/Omnibus, etc. They install packages globally which doesn't allow for having multiple versions of the same software/library, they don't allow for unprivileged package management so users are at the mercy of the sysadmin, there's no transactional upgrades and rollbacks for package updates gone bad, and builds aren't reproducible (Debian is doing great work to fix this, though), to name the most important issues.
I work on the GNU Guix project, which can do all of these things. Additionally, with Guix, I have access to a configuration management system that can create disk images, VMs, and (in a future release) containers (replace Chef/Puppet/Docker), a tool for quickly creating isolated dev environments without polluting the rest of the system (replaces virtualenv and Vagrant when combined with a VM/container), and more.
I'm convinced that more featureful package managers can and will solve a lot of our software deployment problems, and I'm also convinced that simply layering more tools on top of a shaky foundation isn't going to work well in the long term.
> Debian and Fedora do packaging better than Docker, relatively speaking, but they still have major issues that have lead to "solutions" like Docker, Chef/Omnibus, etc.
I get what you're saying, but the way you've phrased it makes it seem like it wasn't intentional when in fact before immutable git-style packages were discovered, you were forced to choose between packaging that works well for developers/ops and packaging that works well for end users.
Debian is the best example we have of the latter, but it's a mistake to say they did a bad job at making ops-friendly packaging. They are solving a different, mutually-exclusive (until recently) problem.
With a bunch more elbow grease and polish, the nix/guix approach allows us to have the best of both worlds, but this is a very new development; arguably it isn't even "there" yet.
Debian and Fedora do it better, yes. But it's not quite as easy to get started.
However once you are at a certain size, both solutions are horrible. (Docker and RPM).
Especially when you need to target more than one Fedora / CentOS / RHEL / etc...
Also editing Spec files is quite horrible.
>The only reason that that is not obvious is that Docker is piggybacking on the relatively excellent and well-developed package management of distributions like Debian and Fedora.
s/well-developed/widely-used
Just because a package manager has a broad user base does not make it excellent nor well-developed. pacman[1] user base is far smaller, but (IMHO) it's a much more refined package manager than apt or rpm.
Which is probably why people were concerned about Docker's expansion into the clustering and orchestration markets, even if from a business perspective those are their only real holdouts to avoid commodification. The base Docker is easy to replace if the project goes out of hand, the services around it are trickier.
There are several tech streams converging there. A bigger chunk is the space that Mesos and Kubernetes occupy.
And to be fair, I suspect the Docker folks were thinking less about clustering and orchestration so much as: (1) clustering and orchestration still sucks; (2) people want as good of an experience using docker as they do when spanning across multiple nodes; (3) let's make clustering and orchestration less sucky and use the 'Docker Way'[1]
[1] 'Docker Way' is a pointer to the fuzzy, difficult-to-verbalize thing that Docker enables, namely in packaging.
I don't see that changing until we get proper single-system imaging, location transparency, process and IPC migration, process checkpointing and RPC-based servers for representing network and local resources as objects (be they file-based or other) in our mainstream systems.
These things only really caught on in the HPC and scientific computing spaces, where you've had distributed task managers and workload balancers like HTCondor and MOSIX for decades. They've also been research interests in systems like Amoeba and Sprite, but sans that, not much.
The likes of Mesos, the Mesosphere ecosystem with Marathon and Chronos, and Docker Swarm bring only the primitive parts of the whole picture. Some other stuff they can half-ass by (ab)using file system features like subvolumes, but overall I don't see them improving on all the suck.
I ran an OpenMOSIX cluster as a hobby. The alternative was Beowulf (the meme of the day was "Imagine a Beowulf cluster of these things").
It seems the mainstream server industry has moved to more isolation rather than more interconnectedness, which is probably better for most public-facing systems.
Isolation is orthogonal to what I listed. MOSIX has decent sandboxing. I don't know about Linux-PMI or OpenMOSIX, though. They died off years ago anyway.
Two of the things I'd love to see is having the image format be:
(1) Open standards (and I'm hearing things moving in that direction)
(2) Content-addressability, so that images can be stored on IPFS.
Point (2) really plays the "packaging" rather than the "containment" aspect. I'm not really thinking about Docker Hub or any proprietary services like that.
There's a project put out by Chef (formerly Opscode) called Omnibus. It allows you to build a monolithic the package, complete with all the library dependencies and such. Chef Server is distributed with that monolithic omnibus. What had happened was that various library dependencies would cause problems with the various systems that needed to come together. It was easier to specify the precise version of the components needed. (But it also put the onus of security fixes on Chef).
That is the real problem that Docker solves. Packaging. It enables a kind of shift in thinking that's difficult to put into words. People say "light-weight containers' or whatever, but none of that really nails the conceptual shift that Docker enables. In about five years, it'll become obvious the way 'cloud' is obvious now, and non-obvious back in 2005.
Omnibus is a step backwards. Every monolithic Omnibus package has its own copy of each dependency, so you end up with duplicated binaries. You can no longer patch a library system-wide, you have to figure out which Omnibus packages have the library and rebuild it with the patched version. Package management was invented to deduplicate files across the system, and people seem to have given up on that.
You say that Docker solves this problem, but it doesn't really. Sure, it creates isolated runtime environments that avoid the clashes you described, but it only further obscures the problem of system-wide deduplication of dependencies. The real solution here is better package managers, such as GNU Guix, that can easily handle multiple programs on the same machine that require different versions of the same dependencies whilst also deduplicating common files system-wide. Once such a foundation is in place, a container system no longer needs to deal with disk images, it can just bind-mount the needed software builds from the host into the container thereby deduplicating files across all containers, too.
Omnibus and Docker are papering over problems, not solving them.
It turns out that the kernel is smart enough to deduplicate (in memory) the same version of a shared library across VM boundaries, so I don't really see why we need to make packaging handle this, especially if this can be applied to containers (if it isn't already). Duplicates of the same file on disk is not a big deal, and can be solved by a good file system which handles deduplication. I don't see why we necessarily want to do all of this in a package manager.
It's actually a good thing that we have these duplicates from a packaging perspective, because you completely remove the host requirements altogether, and can focus on what an app needs, and if you want to take advantage of deduplication (on disk, or in memory), then you can let another subsystem resolve that for you.
To resolve the patching something system-wide, you simply use a common base image, it's orthogonal to containers. Just because you can have different versions of a dependency, doesn't mean you need to. The main advantage of the container having it's own version is that you can independently upgrade components without worrying that a change will effect another application.
You might argue that this is a security concern, but I'd argue that it's more secure to have an easily updatable application than an easy way to update a particular library across all applications. In the latter case, we already know what happens, people don't update the library nearly as often as they should, because it could break other applications which might rely on that particular version. At least in the first case we can upgrade with confidence, meaning we actually do the upgrades.
This means your security depends on the app maintainer, which is a terrible place to be in. I don't want to have to wait for the latest image of 100 apps and hope they didn't break anything else just to deal with an openssl vulnerability.
If your system consists of 100 apps, you have a bigger problem, and likely is a shop big enough to deal with it.
I'm working on a production deployment of a CoreOS+Docker system for a client now, and the entire system consists of about a dozen container-images, most of which have small, largely non-overlapping dependencies.
Only two have a substantial number of dependencies.
This is a large part of what excites people about Docker and the like: It gives us dependency isolation that often results in drastically reducing the actual dependencies.
None of this e.g. requires statically linked binaries, so no, you don't have to wait for the latest image of 100 apps. You need to wait for the latest package of whatever library is an issue, at which point you rebuild your images, if necessary overriding the package in question for them.
One of the touted benefits of containers is shipping images to people with your software. That means as a customer you cant rebuild the image yourself.
A lot of the cgroups and namespaces functionality too time to mature and stabilize. User name spaces for instance was only available with 3.8. Cgroups and namespaces still don't play well with each other.
Cgroups was initially added by some folks from Google in 2007. A lot of the early work on Linux containers was done by Daniel Lezcano and Serge Hallyn of the LXC project, supported by IBM. It was initially a kernel patch and userland tools. You can still see it on the IBM website. It was merged in 2.6.32.
Then around 2012 the LXC project started being supported by Ubuntu and Stephane Graber of Ubuntu continued the work with Serge Hallyn. LXC was of course focused on OS containers and they didn't really market themselves.
Around 2013 when LXC was finally becoming usable by end users, Docker who were probably using it in their previous avatar in dotcloud as a PAAS platform, took it as a base, modified the container OS's init to run single apps, removed storage persistence, and built it with aufs layers, and took it to market aggressively.
But if you look beyond the PAAS centric use case, OS containers are simpler to use, offer near seamless migration of VM workloads, more flexibility in storage and networking scenarios and are more easily used with the ecosystem of apps and tools with a normal multi-process OS environment.
The ability to gain the advantages of containers without needing to re engineer how you deploy applications is an incredible value proposition.
LXC is mature, pretty advanced and simpler to use than Docker, but a lot of users and media have got the impression that its 'low level' or difficult to use.
The Docker, PAAS and micro services folks are the only ones really messaging and going out there to gain adoption and there is an unfortunate conflation of containers to Docker and monoculture developing. The 'Open Container Standard' is an example. Shouldn't that be 'Open App Container Standard'?
App containers are a constrained OS environment and add complexity, and the various Docker specific solutions being developed for everything from networking to storage is evidence of the additional complexity. There is obviously a huge devops PAAS case here that people see value in. And the sheer amount of money and engineering deployed means something good has to come out of it. But containers cannot be just about PAAS.
I run Flockport that provides an app store [1] based on OS containers that are as easy to use as Docker hub and extensive documentation [2] on using containers so do give it a look.
Systemd-nspawn is way easier to use than LXC imo in that it replicates the simplicity of chroot with the power of cgroups. The security story is unfinished though.
Not really, Nspawn is extremely promising and is developing fast. Systemd 220 adds support for user namespaces so you can run nspawn containers as non root users.
But containers need minimal OS templates, networking and a way to configure it properly, storage support for things like cloning and snapshots, a way to configure cgroups, and management and those are still not available beyond some basic machinectl commands, and neither is the documentation. Nspawn is going to be a very strong solution, especially given Systemd is now there by default on most mainstream distros, but its not there yet.
User namespaces while letting non root users run containers brings with it a whole bunch of problems on accessing host resources like mounting file systems, networking devices etc that LXC has faced and addressed.
I have an article up on using nspawn containers here [1]. There are a lot of wild misconceptions floating around about LXC. It is actually pretty mature and easy to use, has supported user namespaces since 2013, has advanced networking and storage support for things like cloning and snapshots with btrfs, zfs, overlayfs, LVM thin, aufs, a nice set of tools to manage containers, and a wide choice of minimal container OS templates.
We have a lightweight boot2lxc VM image based on Alpine Linux for those who want to give it a go [2]
It's openly pointed out in the docs that it's intended to prevent unintentional system alterations, not stop an actively hostile program - i.e. there's not a lot of confidence from the devs in it's isolation levels yet.
User namespaces actually weren't completely done until late 2013 (Ubuntu didn't have them enabled until 13.10 or 14.04 because it didn't work with XFS).
also that they dont use a root daemon and random hard to debug, hard to verify things.
it feels like most of them build large products around things to justify making money, while i like the simple, elegant, easy to verify little pieces of software (some call it "the unix way")
The network, PID and mount namespaces are the ones unshared, plus a private /proc.
I like tools like this because they're reality checks on how the basics of Linux containers are just a few essential system calls, and particularly that they're limited.