I am your target audience and few things put me off: - no SSH access to nodes. A...

andrewrynhard · on Sept 26, 2019

First of all, thank you for taking the time to write this out. The feedback is very valuable. I will do my best to address each comment.

Let me start by laying out our design constraints. We knew we wanted a handful of simple features:

- minimal

- immutable

- and secure

and we approached them with the willingness to do whatever it took to achieve them, no matter how different it would be from any Linux distribution today.

The degree to which we want to obtain minimalism is what I like to call "ultra". Not a single file should be on the image that isn't absolutely needed. Furthermore, not a single process should be allowed to run that isn't required to obtain the goal of running Kubernetes. So we started by creating an image with just enough to run the kubelet and the kubelet only. Obviously, this isn't practical, but it was a place to start.

In implementing our immutability design constraint we decided to:

- make the root filesystem read-only

- have no package manager

- not allow any generic use of the OS (i.e. it would be only for the purposes of running Kubernetes)

When optimizing for one thing, you often degrade another. In our case, if we optimize for minimalism, then immutability becomes degraded. We still need a way to manage and debug the node, and that requires libraries and binaries. With no package manager, this means everything must be baked into the image, and thus we degrade immutability.

Tacking on yet another design constraint, security, things become even more interesting. The more you add to a system, the higher the risk of vulnerabilities. The more permissions a system allows, the higher the risk of vulnerabilities. So minimalism and immutability actually complement security. In our case, security has the highest priority of all, which means we aren't willing to degrade anything that supports the security of the system. So minimalism and immutability must be present.

Aside from our design constraints of minimalism and immutability, we also avoid C as much as possible. We want to build something using a modern language for all the reasons you would choose a modern language over C today, but mostly for security purposes.

Taking all of the above into consideration, we were still left with figuring out how to manage a machine without degrading minimalism, immutability, and security. So without tooling on the rootfs, without a package manager, and without a way to run custom processes, we still needed a way to obtain the information we need from a machine. Thus the API was born.

The API doesn't only solve the management issue, it also reinforces all of our design constraints:

- we can keep the image minimal with a single binary serving the API

- we can keep the image immutable by building a robust API

- we could retain security by using mutual TLS and offering a read-only API

- we could write it in a modern language, using modern tooling (golang and gRPC)
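To make the mutual TLS point concrete, here is a minimal sketch of a server-side TLS configuration in Go that refuses any client without a certificate signed by a trusted CA. It uses only the standard library and leaves gRPC out entirely. The function names `newMutualTLSConfig` and `selfSignedCA` are made up for illustration, and the throwaway self-signed CA exists only so the example runs on its own. This is not how Talos itself is wired up, just the general shape of the technique.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newMutualTLSConfig (hypothetical name) returns a server-side TLS config that
// rejects any client that cannot present a certificate signed by the given CA.
func newMutualTLSConfig(caPEM []byte, serverCert tls.Certificate) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientCAs:    pool,                           // CAs trusted to sign client certs
		ClientAuth:   tls.RequireAndVerifyClientCert, // the "mutual" part
		MinVersion:   tls.VersionTLS12,
	}, nil
}

// selfSignedCA (hypothetical helper) creates a throwaway CA certificate so the
// demo is self-contained. A real deployment would load existing key material.
func selfSignedCA() ([]byte, tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	// Template and parent are the same certificate, so the result is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, tls.Certificate{}, err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	return certPEM, tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

func main() {
	caPEM, cert, err := selfSignedCA()
	if err != nil {
		panic(err)
	}
	cfg, err := newMutualTLSConfig(caPEM, cert)
	if err != nil {
		panic(err)
	}
	fmt.Println("client auth policy:", cfg.ClientAuth)
}
```

A config like this can be handed to any TLS listener. The important line is `ClientAuth`: with `tls.RequireAndVerifyClientCert`, a connection without a valid client certificate fails during the handshake, before any API handler runs.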

At this point, what need is there for SSH/console access if the design constraints essentially remove all usefulness of console access? The problem isn't necessarily the need for SSH/console, it's the need for a way to get the data to make informed decisions.

There are also additional benefits to an API. There is a reason the concept exists. With an API you get standardization, strong types, and consistent, well-known output formats. The benefits are many.

I'd also like to point you in the direction of an excellent talk given this year at Black Hat: https://swagitda.com/speaking/us-19-Shortridge-Forsgren-Cont.... The section on D.I.E. in particular will add some additional support to the reasons I gave above.

That is my lengthy response to the reasoning behind the removal of SSH. Remember, just because we don't have SSH baked in, nothing is stopping you from running a DaemonSet that has SSH.

As for a custom kernel, we would love to support this. Happy to take in feedback here. We create Talos in containers and our goal is to create the necessary tooling to make this dead simple.

As for node joins, they do not happen with the trustd username and password. We use kubeadm under the hood, so it's token-based, and the token can have a TTL. We have since moved to a token-based approach for trustd as well. Note that the trustd token simply gives a worker node the ability to request a certificate for OSD, so that you can hit the node's API.
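For anyone unfamiliar with what a token with a TTL buys you, here is a rough sketch in Go. It is not kubeadm's or trustd's implementation. The `joinToken` type, its methods, and the demo values are all hypothetical, and the only point is that a leaked token stops being useful once it expires.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"time"
)

// Demo values, chosen arbitrarily for the example.
const demoTTL = 24 * time.Hour

var demoIssued = time.Date(2019, 9, 26, 12, 0, 0, 0, time.UTC)

// joinToken is a hypothetical shared secret with an expiry time.
type joinToken struct {
	secret    string
	expiresAt time.Time
}

// newJoinToken issues a token that expires ttl after now.
func newJoinToken(secret string, ttl time.Duration, now time.Time) joinToken {
	return joinToken{secret: secret, expiresAt: now.Add(ttl)}
}

// valid reports whether the presented secret matches and the token has not
// yet expired at the given time.
func (t joinToken) valid(presented string, now time.Time) bool {
	if !now.Before(t.expiresAt) {
		return false // expired: reject regardless of the secret
	}
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(t.secret), []byte(presented)) == 1
}

func main() {
	tok := newJoinToken("example-secret", demoTTL, demoIssued)
	fmt.Println("within TTL:", tok.valid("example-secret", demoIssued.Add(time.Hour)))
	fmt.Println("after TTL:", tok.valid("example-secret", demoIssued.Add(2*demoTTL)))
	fmt.Println("wrong secret:", tok.valid("something-else", demoIssued.Add(time.Hour)))
}
```

Passing the current time in as a parameter, instead of calling `time.Now()` inside `valid`, keeps the expiry logic easy to test.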

We are currently working on an upgrade operator and it is planned for v0.3. If you would like to have some say in the direction we go, we would be happy to have you in our community meetings!

You make good points about the diagrams. It is clear from this post that we have work to do around the documentation.

And finally, Talos is not based on any distribution. We have a toolchain that we build, and subsequently build our entire distribution from.

I hope I have answered your questions well enough. I look forward to hearing back from you. Your input is valued, and we really would like to use it to turn this into something great!