Maybe, but the point with containers and Kubernetes is to treat infrastructure like cattle, not pets.
If something blows up or dies, then with Kubernetes it's often faster to just tear down the entire namespace and bring it up again. If the entire cluster is dead, then just spin up a new cluster and run your yaml files on it and kill your old cluster.
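In practice that "tear down and re-apply" workflow is just a few commands. A rough sketch, assuming your manifests all live in a `./manifests/` directory and you provision clusters with something like eksctl (the names and tool are illustrative, not prescriptive):

```shell
# Nuke the broken namespace and rebuild it from declarative manifests
kubectl delete namespace my-app        # tears down everything in the namespace
kubectl create namespace my-app
kubectl apply -n my-app -f ./manifests/

# If the whole cluster is dead: stand up a fresh one, re-apply, kill the old one
eksctl create cluster --name my-app-v2     # any cluster provisioner works here
kubectl apply -f ./manifests/
eksctl delete cluster --name my-app-v1
```

This only works if everything about the workloads really is captured in those manifests, which is the whole point of the cattle approach.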
Treat it like cattle, when it doesn't serve your purpose anymore then shoot it.
This is one of the biggest advantages of Kubes, but often overlooked because traditional Ops people keep treating infrastructure like a pet.
The only thing you should treat like a pet is your persistence layer, which is presumably outside Kubes: something like DynamoDB, Firestore, CosmosDB, SQL Server, whatever.
This is not good engineering. If somebody told me this at a business, I'd no longer trust them with my infrastructure.
So you're saying that problems happen, and you consciously don't want to know about or solve them. In your view, a recurring problem is solved by constantly building new K8s clusters, and your whole infrastructure in them, every time!?
Simple example: a microservice that leaks memory... just let it keep restarting as it crashes?!
I remember at one of my first jobs, on a healthcare system for a hospital in India, their Java app was so poorly written that it kept leaking memory, bloated beyond what GC could help with, and crashed every morning around 11 AM and then again around 3 PM. The end users (doctors, nurses, pharmacists) knew about this behavior and took breaks during that time. Absolute bullshit engineering! Shame on those who wrote that shitty code, and shame on anyone reckless enough to suggest ever-rebuilding K8s clusters as the fix.
Yes, "let it keep restarting while it crashes and while I investigate the issue" is MUCH preferred to "everything's down and my boss is on my ass to fix the memory issue."
The bug exists either way, but in one world my site is still up while I fix the bug and prioritize it against other work and in another world my site is hard-down.
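For what it's worth, the "keep restarting while I investigate" posture is something you configure deliberately, not something you just hope for. A minimal illustration (all names and numbers here are hypothetical): a memory limit so the leaking container gets OOM-killed and restarted instead of starving the node, plus multiple replicas so the other pods keep serving during each restart:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: leaky-service          # hypothetical service name
spec:
  replicas: 3                  # the other replicas keep serving while one restarts
  selector:
    matchLabels:
      app: leaky-service
  template:
    metadata:
      labels:
        app: leaky-service
    spec:
      containers:
      - name: app
        image: example/leaky-service:1.0   # placeholder image
        resources:
          limits:
            memory: "512Mi"    # the leak hits this cap -> OOMKilled -> restart
          requests:
            memory: "256Mi"
      # restartPolicy defaults to Always for Deployment pods
```

The memory limit is the key part: without it, the leak can take the whole node down with it rather than just the one pod.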
That only works if the bug actually gets fixed. Once you've normalized the idea that restarting the cluster fixes a problem, all of a sudden you don't have a problem anymore. So now your motivation to get the bug properly fixed has gone away.
Sometimes feeling a little pain helps get things done.
You and I wish that's what happened in real life. Instead, people now normalize the behavior thinking it'll sort itself out automatically over time without ever trying to fix it.
Self-healing systems are good but only if you have someone who is keeping track of the repeated cuts to the system.
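Keeping track of those repeated cuts can be as simple as watching restart counts and last-termination reasons. A sketch (the restart threshold of 5 is arbitrary, and the pod name is a placeholder):

```shell
# List pods whose first container has restarted more than 5 times
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
  | awk '$3 > 5'

# Check why a given pod last died (e.g. OOMKilled points at a memory leak)
kubectl describe pod <pod-name> | grep -A3 'Last State'
```

In a real setup you'd alert on this (e.g. on restart-rate metrics) rather than eyeball it, but the point stands: self-healing without restart accounting is just hiding the bleeding.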
This is something that has been bothering me for the last couple of years. I consistently work with developers who no longer care about performance issues, assuming that k8s and the ops team will take care of it by adding more CPU or RAM or just restarting. What happened to writing reliable code that performed well?
Business incentives. It's a classic incentive tension: spend more time on nicer code that does the same thing, or build more features. Code expands to fill its performance budget, and all that.
At least on the backend you can quantify the cost fairly easily. If you bring it up to your business people, they'll notice the easy win and then push the devs to write more efficient code.
If it's a small $$ difference, though, the devs are probably prioritizing correctly.
I've witnessed the same thing; however, there is nothing mutually exclusive about having performant code running in Kubernetes. There's a trade-off between performance and productivity, and maintaining a sense of pragmatism is a good skill to have (that's directed at those who use scaling up/out as an excuse for being lax about performance).
Nothing is this black and white. I tried to emphasise a simple philosophy: life gets a lot easier if you make things easily replaceable. That was the message I tried to convey. Of course, if there is a deep problem with something it needs proper investigation and fixing, but that's an actual code/application problem.
That's not what cattle vs pets is. Treating your app as cattle means that it deploys, terminates, and re-deploys with minimal thought at the time of where and how. Your app shouldn't care which Kubernetes node it gets deployed to. There shouldn't be some stateful infrastructure that requires hand-holding (e.g. logging into a named instance to restart a specific service). Sometimes network partitions happen, a disk starts going bad, or some other funky state happens and you kill the Kubernetes pod and move on.
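That "kill the pod and move on" step really is a one-liner, because the Deployment's controller immediately reschedules a replacement somewhere else (the pod name and label below are illustrative):

```shell
# Delete the misbehaving pod; the ReplicaSet spins up a fresh one on another node
kubectl delete pod web-7d4b9c-xk2lp

# Watch the replacement come up
kubectl get pods -l app=web -w
```

Contrast that with the pet model, where you'd SSH into a named box and nurse a specific process back to health.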
You should try to fix mem leaks and other issues like the one you described, and sometimes you truly do need pets. Many apps can benefit from being treated like cattle, however.
When cattle are sick, you need to heal them, not shoot them in the head and bring in new cattle. If your software behaves badly, you need to FIX THE SOFTWARE.
Just doing the old "restart everything" routine is typical Windows-admin behavior and a recipe for bad, unstable systems.
Kubernetes absolutely does do strange things, crashes on strange things, and doesn't tell you about it.
I like the system, but pretending it's this unbelievably great thing is an exaggeration.