I wanted to refrain from commenting because honestly I’m not the biggest fan of relatively opaque complexity, and Kubernetes tries its hardest to be exactly that (cloud providers doing magic things with annotations, for example).
But I have to say that Kubernetes is not the devil. Lock-in is the devil.
I recently underwent the task of getting us off of AWS, which was not as painful as it could have been (I talk about it here[0]).
But the thing is: I like auto healing, auto scaling and staggered rollouts.
I had previously implemented/deployed this all myself using custom C++ code, salt and a lot of python glue. It worked super well but it was also many years of testing and trial and error.
Doing all of that again is an insane effort.
Kubernetes is 80% of the same stuff if your workload fits in it, but you have to learn the edge cases, and that is a tremendous step up from the standard Python, Linux, Terraform stuff most operators know.
Anyway.
I’m not saying go for it. But don’t replace it with lock-in.
I don't think lock-in should be a problem for most startups - in the same sense most startups don't need kubernetes.
For almost everyone, I'd say: just pick a cloud provider, stick to it, and (unless your whole business model is about computing resource management) your time is almost certainly better spent on other things.
I'm working at a company that had moved into AWS before I joined and I don't see it ever moving out of AWS. Of course we have some issues with our infrastructure, but "we're stuck at AWS" is the least of my concerns. Any project to move stuff out of AWS is not going to be worth the engineering cost.
Usually I’m not working in startups. Usually I’m responsible for projects worth tens to hundreds of millions of euros.
Of course being pragmatic is a large part of what has led me to a successful career; and in that spirit of course whatever works for you is the best and I’m not going to refute it.
I would also argue for a single VPS over Kubernetes for a startup; Kubernetes is incredibly unnecessary for an MVP or a relatively limited number of users.
But I wouldn’t argue for the kind of lock-in you describe.
I have seen many times how hitching your wagon to another company can hurt your long term goals. Using a couple of VPSs leaves your options open.
As soon as you’re buying the Kool-Aid of something that can’t easily be replicated elsewhere, you’re hoping that none of the scenarios I’ve seen happen again.
Things I’ve seen:
Locayta, a search system, was so deeply integrated that our SaaS search was permanently crippled. That company went under but we could not possibly move away. It was a multi-year effort.
One of our version control systems changed its pricing model so that we paid 90x more overall. We could do nothing because we’d completely hitched our wagon to this system. Replacing all build scripts, documentation and training users was a year-long effort at the least. So we paid the piper.
This happens all the time: Adobe being another example that hasn’t impacted me directly.
It’s important in negotiations to be able to walk away.
I made a relatively large list of reasons. Most are going to sound fickle, but I consider some to be very problematic if you’re woken up at 3am and have to orient yourself; others I consider problematic because they cause an order of magnitude increase in complexity.
Mostly it’s an issue of perception too: a cloud is supposed to save me time. If it doesn’t save me time it is not worth the premiums, and in our case it would not save time (due to the complexity mentioned before).
But here’s part of the list (with project-specific items redacted):
3am topics:
* Project name (impossible to see which project you're in, usually it's based on "account" but that gets messed up with SSO)
* Instance/object names (`i-987348ff`, `eip-7338971`, `sub-87326`) tell you nothing about what the thing is for.
* Terminated instances fill the UI.
* Resources in other regions may as well not exist; they're invisible, and sometimes only found after checking the bill for that month.
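For the "invisible regions" point, a rough sketch of the kind of sweep that catches stragglers before the bill does. `sweep_regions` and `ec2_instance_ids` are names made up for this example; the second one assumes boto3 is installed and credentials are configured, and only looks at EC2 instances, so treat it as a starting point rather than a complete audit.

```python
from typing import Callable, Dict, Iterable, List


def sweep_regions(
    regions: Iterable[str],
    list_ids: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Return {region: [resource ids]} for every region that is not empty."""
    found: Dict[str, List[str]] = {}
    for region in regions:
        ids = list_ids(region)
        if ids:
            found[region] = ids
    return found


def ec2_instance_ids(region: str) -> List[str]:
    """One possible lister: non-terminated EC2 instances in a region.

    Assumption: boto3 is available and credentials are already set up.
    """
    import boto3  # imported lazily so the sweep itself has no dependencies

    ec2 = boto3.client("ec2", region_name=region)
    ids: List[str] = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                if instance["State"]["Name"] != "terminated":
                    ids.append(instance["InstanceId"])
    return ids
```

Something like `sweep_regions(["eu-west-1", "us-east-1"], ec2_instance_ids)` would then report only the regions that actually contain something. The lister is passed in so the same loop can be reused for volumes, elastic IPs and so on.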
Time cost topics (stumbling things that make things slower):
* Placements only supported on certain instances
* EBS optimised only supported on certain instances
Other:
* Launch configurations (user_data) are limited to 16KiB and are hard to life-cycle; also, user-data is a terrible name.
* 58% more objects and relationships (239 -> 378 LoC after terraform graph)
* networking model does not make best practice easy (Zonal based network, not regional)
* Committed use (vs sustained use) discounts means you have to run cost projections _anyway_ (W.R.T. cost planning on-prem vs cloud)
* no such thing as an unmanaged instance group (you need an ASG, which can be provisioned exclusively with user-data, i.e. a launch script in real terms)
* I managed to create a VPC where nothing could talk to anything. Even cloud experts couldn't figure it out; not very transparent or easy to debug.
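On the 16KiB user-data limit above: a small guard of the sort that can run in CI before a launch script is shipped. This is a sketch; the function names are made up here, and it assumes the limit applies to the raw UTF-8 bytes of the script, which is worth checking against the provider's docs.

```python
USER_DATA_LIMIT_BYTES = 16 * 1024  # the 16KiB figure from the list above


def user_data_headroom(script: str, limit: int = USER_DATA_LIMIT_BYTES) -> int:
    """Bytes left before the script exceeds the limit (negative means too big)."""
    return limit - len(script.encode("utf-8"))


def check_user_data(script: str) -> None:
    """Raise ValueError if the launch script does not fit in the limit."""
    headroom = user_data_headroom(script)
    if headroom < 0:
        raise ValueError(
            f"user data is {-headroom} bytes over the "
            f"{USER_DATA_LIMIT_BYTES}-byte limit"
        )
```

Counting encoded bytes rather than characters matters as soon as the script contains anything non-ASCII.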
Sticky topics (things that make you buy more AWS services or lock-in):
* Use Managed ES! -> AWS ES Kibana requires the separately billed Cognito service if you want SAML SSO.
[0]: https://www.gcppodcast.com/post/episode-265-sharkmob-games-w...