Right up until you needed to do one of the very many things k8s implements.
For example, at multiple previous employers we had cronjobs: you just set up a cronjob on the server. I mean, really, how hard is that to do?
And that server was a single point of failure: we couldn't just spin up a second server running crond, obviously, as then the job would run twice. Something would need to provide some sort of locking, then the job would need to take advantage of that, we'd need the job to be idempotent … all of which, except the last, k8s does out of the box. (And it mostly forces your hand on the last.)
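For a sense of scale, the k8s version of that is roughly the following (a sketch; the name, image, and command are all hypothetical):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-report            # hypothetical
    spec:
      schedule: "0 3 * * *"           # ordinary cron syntax
      concurrencyPolicy: Forbid       # don't start a new run while the last one is still going
      jobTemplate:
        spec:
          backoffLimit: 2             # retry a failed run a couple of times
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: report
                  image: registry.example.com/report:latest   # hypothetical
                  command: ["/usr/local/bin/run-report"]      # hypothetical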
Need to reboot for security patches? We just didn't do that, unless it was something like Heartbleed where it was like "okay we have to". k8s permits me to evict workloads while obeying PDBs — in previous orgs, "PDBs" (hell, we didn't even have a word to describe the concept) were just tribal knowledge known only by those of us who SRE'd enough stuff to know how each service worked, and what you needed to know to stop/restart it, and then do that times waaay too many VMs. With k8s, a daemonset can just handle things generically, and automatically.
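Concretely, the reboot dance becomes something like this (a sketch; the node name is made up):

    # Evictions respect PDBs, so this refuses to take a service below its allowed availability:
    kubectl drain node-a1 --ignore-daemonsets --delete-emptydir-data
    # ...patch/reboot the node, then let workloads schedule back onto it:
    kubectl uncordon node-a1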
Need to deploy? Pre-k8s, that was just bespoke scripts, e.g., in something like Ansible. If a replica failed to start after deployment, did the script cease deployment? Not the first time it brought everything down, it didn't: it had to grow that feature by learning the hard way. (Although I suppose you can decide that you don't need that readiness check in k8s, but it's at least a hell of a lot easier to get off the ground with.)
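What I had to hand-roll in Ansible is just fields on a Deployment now; a rough sketch (names, image, and port are hypothetical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels: { app: api }
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1          # never take more than one replica down mid-rollout
          maxSurge: 1
      template:
        metadata:
          labels: { app: api }
        spec:
          containers:
            - name: api
              image: registry.example.com/api:v2        # hypothetical
              ports:
                - containerPort: 8080
              readinessProbe:                           # I still write the check itself...
                httpGet: { path: /healthz, port: 8080 }
              # ...but "halt the rollout if new replicas never go Ready" is the
              # controller's job, not my deploy script's.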
Need a new VM? What are the chances that the current one actually matches the Ansible, and wasn't snowflaked? (All it takes is one dev, and one point in time, doing one custom command!)
The list of operational things k8s supports, the ones common to nearly every "I need to serve this, in production" situation, goes on.
The worst part of k8s thus far has been Azure's half-aaS'd version of it. I've been pretty satisfied with GKE, but I've only recently gotten to know it, and I've not pushed it quite as hard as AKS yet. So we'll see.
I've never heard the term "resource budget" used to describe this concept before. Got a link?
That'd be an odd set of words to describe it. To be clear, I'm not talking about budgeting RAM or CPU, or trying to determine whether I have enough of those things. A PodDisruptionBudget describes the manner in which one is permitted to disrupt a workload: i.e., how can I take things offline?
Your bog-simple HTTP REST API service, for example, might have 3 replicas behind a load balancer. As long as any one of those replicas is up, it will continue to serve. That's a PodDisruptionBudget: here, "at least 1 must be available". (minAvailable: 1, in k8s's terms.)
A database that's using, e.g., Raft would require a majority of replicas to be alive in order to serve. That would be a minAvailable of roughly "51%".
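In manifest form, those two look roughly like this (sketches only; the names and labels are made up):

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: api-pdb
    spec:
      minAvailable: 1                # "at least one replica must stay up"
      selector:
        matchLabels: { app: api }
    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: db-pdb
    spec:
      minAvailable: "51%"            # roughly "keep a quorum alive"
      selector:
        matchLabels: { app: db }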
So, some things I can do with the webservice, I cannot do with the DB. PDBs encode that information, and since it is in actual data form, that then lets other things programmatically obey that. (E.g., I can reboot nodes while ensuring I'm not taking anything offline.)
A PDB is a good example of Kubernetes's complexity escalation. It's a problem that arises when you have dynamic, controller-driven scheduling. If you don't need that you don't need PDBs. Most situations don't need that. And most interesting cases where you want it, default PDBs don't cover it.
> A PDB is a good example of Kubernetes's complexity escalation. It's a problem that arises when you have dynamic, controller-driven scheduling. If you don't need that you don't need PDBs. Most situations don't need that.
No, and that's my point: PDBs always exist. Whether your org has a term for them, or whether you're aware of them, is an entirely different matter.
When I did work on services running on VMs, there was still a (now, spiritual) PDB associated with each service. I couldn't just take out nodes willy-nilly, or I would be the cause of the next production outage.
In practice, I was just intimately familiar with the entire architecture, out of necessity, and so I knew what actions I could and could not take. But it was not unheard of for a less-cautious or less-skilled individual to act before thinking. And it inhibited automation: automation needed to be aware of the PDB, and honestly we'd probably just hard-code the needs on a per-service basis. PDBs, as k8s structures them, solve the problem far more generically.
Sounds like a PDB isn't a resource budget then. We were using that concept in ESX farms 20 years ago, but it seems PDBs are more what most SREs would describe as minimum availability.
Because they're completely different things you're comparing. The functionality that I describe as having to have built out as part of Ansible (needing to check that the deploy succeeded, and not move on to the next VM if not) is not present in any Helm chart (as that's not the right layer / doesn't make sense), as it's part of the deployments controller's logic. Every k8s Deployment (whether from a Helm chart or not) benefits from it, and doesn't need to build it out.
> needing to check that the deploy succeeded, and not move on to the next VM if not
It's literally just waiting for a port to open and maybe checking for an HTTP response, or running an arbitrary command until it succeeds; all the orch tools can do that in some way.
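E.g., in Ansible that's a couple of tasks (a sketch; the port and path are made up):

    - name: Wait for the service port to open
      ansible.builtin.wait_for:
        port: 8080
        timeout: 60

    - name: Poll the health endpoint until it returns 200
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/healthz"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5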
… there's a difference between "can do it" and "is provided."
In the case of either k8s or VMs, I supply the health check. There's no getting around that part, really.
But that's it in the case of k8s. I'm not building out the logic to do the check, or the logic to pause a deployment if a check fails: that is inherent to the deployments controller. That's not the case with Ansible/Salt/etc.¹, and I end up re-inventing portions of the deployments controller every time. (Or, far more likely, it just gets missed/ignored until the first time it causes a real problem.)
¹ and that's not what these tools are targeting, so I'm not sure it's really a gap, per se.