Right up until you needed to do one of the very many things k8s implements.
For example, at multiple previous employers we had cronjobs: you just set up a cronjob on the server. I mean, really, how hard is that to do?
And that server was a single point of failure: we couldn't just spin up a second server running crond, obviously, as then the job would run twice. Something would need to provide some sort of locking, then the job would need to take advantage of that, we'd need the job to be idempotent … all of which, except the last, k8s does out of the box. (And it mostly forces your hand on the last.)
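For a sense of scale, the k8s version of that is roughly the following (a sketch; the name, image, and command are all hypothetical):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-report            # hypothetical
    spec:
      schedule: "0 3 * * *"           # ordinary cron syntax
      concurrencyPolicy: Forbid       # don't start a new run while the last one is still going
      jobTemplate:
        spec:
          backoffLimit: 2             # retry a failed run a couple of times
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: report
                  image: registry.example.com/report:latest   # hypothetical
                  command: ["/usr/local/bin/run-report"]      # hypothetical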
Need to reboot for security patches? We just didn't do that, unless it was something like Heartbleed where it was like "okay we have to". k8s permits me to evict workloads while obeying PDBs — in previous orgs, "PDBs" (hell, we didn't even have a word to describe the concept) were just tribal knowledge known only by those of us who SRE'd enough stuff to know how each service worked, and what you needed to know to stop/restart it, and then do that times waaay too many VMs. With k8s, a daemonset can just handle things generically, and automatically.
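Concretely, the reboot dance becomes something like this (a sketch; the node name is made up):

    # Evictions respect PDBs, so this refuses to take a service below its allowed availability:
    kubectl drain node-a1 --ignore-daemonsets --delete-emptydir-data
    # ...patch/reboot the node, then let workloads schedule back onto it:
    kubectl uncordon node-a1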
Need to deploy? Pre-k8s, that was just bespoke scripts, e.g., in something like Ansible. If a replica failed to start after deployment, did the script cease deployment? Not the first time it brought everything down, it didn't: it had to grow that feature by learning the hard way. (Although I suppose you can decide that you don't need that readiness check in k8s, but it's at least a hell of a lot easier to get off the ground with.)
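What I had to hand-roll in Ansible is just fields on a Deployment now; a rough sketch (names, image, and port are hypothetical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels: { app: api }
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1          # never take more than one replica down mid-rollout
          maxSurge: 1
      template:
        metadata:
          labels: { app: api }
        spec:
          containers:
            - name: api
              image: registry.example.com/api:v2        # hypothetical
              ports:
                - containerPort: 8080
              readinessProbe:                           # I still write the check itself...
                httpGet: { path: /healthz, port: 8080 }
              # ...but "halt the rollout if new replicas never go Ready" is the
              # controller's job, not my deploy script's.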
Need a new VM? What are the chances that the current one actually matches the Ansible, and wasn't snowflaked? (All it takes is one dev, and one point in time, doing one custom command!)
The list of operational things k8s supports, the ones common to nearly every "I need to serve this, in production" situation, goes on.
The worst part of k8s thus far has been Azure's half-aaS'd version of it. I've been pretty satisfied with GKE, but I've only recently gotten to know it, and I've not pushed it quite as hard as AKS yet. So we'll see.
I've never heard the term "resource budget" used to describe this concept before. Got a link?
That'd be an odd set of words to describe it. To be clear, I'm not talking about budgeting RAM or CPU, or trying to determine whether I have enough of those things. A PodDisruptionBudget describes the manner in which one is permitted to disrupt a workload: i.e., how can I take things offline?
Your bog-simple HTTP REST API service, for example, might have 3 replicas behind a load balancer. As long as any one of those replicas is up, it will continue to serve. That's a PodDisruptionBudget: here, "at least 1 must be available". (minAvailable: 1, in k8s's terms.)
A database that's using, e.g., Raft would require a majority of replicas to be alive in order to serve. That would be a minAvailable of roughly "51%".
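In manifest form, those two look roughly like this (sketches only; the names and labels are made up):

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: api-pdb
    spec:
      minAvailable: 1                # "at least one replica must stay up"
      selector:
        matchLabels: { app: api }
    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: db-pdb
    spec:
      minAvailable: "51%"            # roughly "keep a quorum alive"
      selector:
        matchLabels: { app: db }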
So, some things I can do with the webservice, I cannot do with the DB. PDBs encode that information, and since it is in actual data form, that then lets other things programmatically obey that. (E.g., I can reboot nodes while ensuring I'm not taking anything offline.)
A PDB is a good example of Kubernetes's complexity escalation. It's a problem that arises when you have dynamic, controller-driven scheduling. If you don't need that you don't need PDBs. Most situations don't need that. And most interesting cases where you want it, default PDBs don't cover it.
> A PDB is a good example of Kubernetes's complexity escalation. It's a problem that arises when you have dynamic, controller-driven scheduling. If you don't need that you don't need PDBs. Most situations don't need that.
No, and that's my point: PDBs always exist. Whether your org has a term for them, or whether you're aware of them, is an entirely different matter.
When I did work on services running on VMs, there was still a (now, spiritual) PDB associated with each service. I couldn't just take out nodes willy-nilly, or I would be the cause of the next production outage.
In practice, I was just intimately familiar with the entire architecture, out of necessity, and so I knew what actions I could and could not take. But it was not unheard of for a less-cautious or less-skilled individual to act before thinking. And it inhibited automation: automation needed to be aware of the PDB, and honestly we'd probably just hard-code the needs on a per-service basis. PDBs, as k8s structures them, solve the problem far more generically.
Sounds like a PDB isn't a resource budget then. We were using that concept in ESX farms 20 years ago, but it seems PDBs are more what most SREs would describe as minimum availability.
Because they're completely different things you're comparing. The functionality that I describe as having to have built out as part of Ansible (needing to check that the deploy succeeded, and not move on to the next VM if not) is not present in any Helm chart (as that's not the right layer / doesn't make sense), as it's part of the deployments controller's logic. Every k8s Deployment (whether from a Helm chart or not) benefits from it, and doesn't need to build it out.
> needing to check that the deploy succeeded, and not move on to the next VM if not
It's literally just waiting for a port to open and maybe checking for an HTTP response, or running an arbitrary command until it succeeds; all the orch tools can do that in some way.
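E.g., in Ansible that's a couple of tasks (a sketch; the port and path are made up):

    - name: Wait for the service port to open
      ansible.builtin.wait_for:
        port: 8080
        timeout: 60

    - name: Poll the health endpoint until it returns 200
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/healthz"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5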
… there's a difference between "can do it" and "is provided."
In the case of either k8s or VMs, I supply the health check. There's no getting around that part, really.
But that's it in the case of k8s. I'm not building out the logic to do the check, or the logic to pause a deployment if a check fails: that is inherent to the deployments controller. That's not the case with Ansible/Salt/etc.¹, and I end up re-inventing portions of the deployments controller every time. (Or, far more likely, it just gets missed/ignored until the first time it causes a real problem.)
¹ and that's not what these tools are targeting, so I'm not sure it's really a gap, per se.