I think the point is that there are abstractions that require you to know almost...

root_axis · 2025-01-26T23:15:45 1737933345

Yes, k8s is an abstraction, and it's a useful one, even though not everyone needs it. At this new level of abstraction, your hardware becomes homogeneous, making it trivial to scale and recover from hardware failures since k8s automatically distributes your application instances across the hardware in a unified manner. It also has many other useful capabilities downstream of that (e.g. zero downtime deployment/rollback/restart). There's not really any other (well supported) alternative if you want that. Of course, most organizations don't need it, but it's very nice to have in a service oriented system.

zug_zug · 2025-01-27T00:24:30 1737937470

> There's not really any other (well supported) alternative if you want that

You don't think AWS autoscale groups give you both of those things?

stouset · 2025-01-27T00:31:10 1737937870

I think you comically underestimate what Kubernetes provides.

Autoscaling groups give you instances, but Kubernetes automatically and transparently distributes all your running services, jobs, and other workloads across all those instances.

Amongst a laundry list of other things.

zug_zug · 2025-01-27T13:56:45 1737986205

I think you’re comically misunderstanding what 95% of companies actually are doing with kubernetes

hadlock · 2025-01-27T23:21:30 1738020090

A big part of what kubernetes provides is a standard interface. When your infra guy gets hit by a bus, someone else (like a contractor) can plug in blindly and at least grasp the system in a day or two.

enoent · 2025-01-27T08:22:27 1737966147

AWS autoscaling does not take your application logic into account, which means that aggressive downscaling will, at worst, cause your applications to fail.

I'll give a specific example with Apache Spark: AWS provides a managed cluster via EMR. You can configure your task nodes (i.e. instances that run the bulk of your submitted jobs to Spark) to be autoscaled. If these jobs fetch data from managed databases, you might have RDS configured with autoscaling read replicas to support higher volume queries.

What I've frequently seen happen: tasks fail because the task node instances were downscaled near the end of the job, since they were no longer consuming enough resources to stay up, even though the tasks themselves hadn't finished. Or tasks fail because database connections were suddenly cut off, since RDS read replicas were no longer transmitting enough data to stay up.

The workaround is to have a fixed number of instances up, and pay the costs you were trying to avoid in the first place.

Or you could have an autoscaling mechanism that is aware of your application state, which is what k8s enables.
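
To make the difference concrete, here is a minimal sketch in plain Python. It is not the Kubernetes or AWS API; the `Node` class, its fields, and the 20% CPU threshold are hypothetical names and values chosen only for illustration. It contrasts a metric-only scale-down policy with one that also asks the application whether work is still in flight.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_utilization: float  # fraction in [0.0, 1.0]
    running_tasks: int      # tasks the application still has in flight on this node


def naive_scale_down(nodes, cpu_threshold=0.2):
    """Metric-only policy: remove every node whose CPU usage is below the threshold."""
    return [n.name for n in nodes if n.cpu_utilization < cpu_threshold]


def task_aware_scale_down(nodes, cpu_threshold=0.2):
    """Application-aware policy: additionally require that no task is in flight."""
    return [
        n.name
        for n in nodes
        if n.cpu_utilization < cpu_threshold and n.running_tasks == 0
    ]


nodes = [
    Node("task-1", cpu_utilization=0.05, running_tasks=2),  # quiet, but still finishing work
    Node("task-2", cpu_utilization=0.03, running_tasks=0),  # genuinely idle
    Node("task-3", cpu_utilization=0.80, running_tasks=5),  # busy
]

print(naive_scale_down(nodes))       # ['task-1', 'task-2']
print(task_aware_scale_down(nodes))  # ['task-2']
```

The metric-only policy removes `task-1` while it still has two tasks running, which is the failure mode described above; the task-aware policy keeps it until the work drains.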

znpy · 2025-02-01T18:42:50 1738435370

> since RDS read replicas were no longer transmitting enough data to stay up.

As an infra guy, I’ve seen similar things happening multiple times. This could be a non-problem if developers handled the lost-connection case with reconnection, retries and so on.

But most developers just don’t bother.

So we’re often building elastic infrastructure that is consumed by people who write code as if we were still in the late 90s, with single-instance DBs expected to be always available.
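
For what it's worth, the lost-connection handling described above does not need much code. A rough sketch, assuming a hypothetical `with_retries` helper and Python's built-in `ConnectionError` (a real database driver raises its own exception types, so `retry_on` would need adjusting):

```python
import random
import time


def with_retries(operation, attempts=5, base_delay=0.1, max_delay=2.0,
                 retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(); on a retryable error, back off and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Hypothetical stand-in for a query against a replica that disappears twice.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("replica went away")
    return "rows"


# sleep is stubbed out here only so the example runs instantly.
result = with_retries(flaky_query, sleep=lambda _: None)
print(result, calls["count"])  # rows 3
```

In real code `operation` would open a fresh connection on each call rather than reuse the dead one, and it should only wrap work that is safe to repeat.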

zug_zug · 2025-01-27T14:02:07 1737986527

ASGs can do both of those things. It’s a 5% use case, so it takes a little more work, but not much.

enoent · 2025-01-27T19:56:43 1738007803

Can you elaborate on that "little more work", given that resizing on demand isn't sufficient for this use-case, and predictive scaling is also out of the question?

root_axis · 2025-01-27T06:21:27 1737958887

Nothing wrong with ASGs, but they're not really comparable to k8s. k8s isn't simply "scaling", it's a higher level of abstraction that has granular control and understanding of your application instances in a manner that allows it to efficiently spread workloads across all your hardware automatically, all while managing service discovery, routing, lb, rollbacks and countless more. Comparing it to ASG suggests you may not be that familiar with k8s.

I think it's fair to argue that k8s is overkill for many or even most organizations, but ASG is not even close to an alternative.

roncesvalles · 2025-01-27T16:06:27 1737993987

It seems that you don't understand ASGs. They do all the things that you listed.

K8s is essential when working with a fleet of bare metals. It's an unneeded abstraction if you're just going to deploy it on AWS or similar.

stouset · 2025-01-26T21:21:00 1737926460

Those only require you to understand them because you’re working directly on top of them. If you were writing a filesystem driver you would absolutely need to know those details. If you’re writing a database backend, you probably need to know a lot about the filesystem. If you’re writing an ORM, you need to know a lot about databases.

Some of these abstractions are leakier than others. Web development coordinates a lot of different technologies, so oftentimes you need to know about a wide variety of topics, and sometimes a layer below those. Part of it is that there’s a lot less specialization in our profession than in others, so we need lots of generalists.

zug_zug · 2025-01-26T22:13:32 1737929612

I think you're sort of hand-waving here.

I think the concrete question is -- do you need to learn more or fewer abstractions to use kubernetes versus say AWS?

And it looks like Kubernetes is more abstractions in exchange for more customization. I can understand why somebody would roll their eyes at a system that has as much abstraction as Kubernetes does if their use case is very concrete - they are scaling a web app based on traffic.

stouset · 2025-01-26T23:10:23 1737933023

Kubernetes and AWS aren’t alternatives. They occupy vastly different problem spaces.

zug_zug · 2025-01-27T00:21:47 1737937307

Not really.

SJC_Hacker · 2025-01-27T03:15:52 1737947752

Kubernetes isn't locked to any vendor

Try moving your AWS solution to Google Cloud without a massive rewrite.

Also, Kubernetes doesn't actually deal with the underlying physical devices directly. That would be done with something like Terraform or, if you're still hardcore, shell scripts.

zug_zug · 2025-01-27T13:58:42 1737986322

I’ve never seen a single company use Kubernetes or Terraform to move vendors; the feasibility of that has been massively overstated.

SJC_Hacker · 2025-01-28T18:52:35 1738090355

Well we did at my company when we moved from AWS to GCP

stouset · 2025-01-27T00:36:14 1737938174

Sure, what do I know, I only operate the Kubernetes platform (on AWS) that runs most of a $50bn public company.

zug_zug · 2025-01-27T02:45:57 1737945957

"It is difficult to get a man to understand something when his salary depends on his not understanding it." - Upton Sinclair

stouset · 2025-01-27T03:13:14 1737947594

My salary directly depends upon me deeply understanding both AWS and Kubernetes. Better luck next time.

saynay · 2025-01-27T19:26:26 1738005986

I wrote a tiny one that worked as glue between our application's opinion on how node DNS names should be, and what ExternalDNS controller would accept automatically. When GKE would scale the cluster, or upgrade nodes, it was requiring manual steps to fix the DNS. So, instead of rewriting a ton of code all over in our app, and changing the other environments we were running on, I just wrote a ~100 line controller that would respond to node-add events by annotating the node in a way ExternalDNS would parse, and in turn automatically create DNS entries in the form we wanted.
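
Roughly, the core of such a controller looks like the sketch below. This is a simplified illustration, not the original code: it works on plain dicts shaped like Kubernetes watch events and leaves out the API client entirely. The annotation key and the `nodes.example.com` suffix are assumptions made for the example; the actual key and naming scheme used are not stated above.

```python
# Assumption: the annotation key ExternalDNS is configured to read for hostnames.
HOSTNAME_ANNOTATION = "external-dns.alpha.kubernetes.io/hostname"
DNS_SUFFIX = "nodes.example.com"  # hypothetical zone


def desired_hostname(node_name, suffix=DNS_SUFFIX):
    """The DNS name the application expects for a given node."""
    return f"{node_name}.{suffix}"


def patch_for_event(event):
    """Return a metadata patch for a node-add event, or None if nothing to do."""
    if event.get("type") != "ADDED":
        return None
    metadata = event["object"]["metadata"]
    hostname = desired_hostname(metadata["name"])
    annotations = metadata.get("annotations") or {}
    if annotations.get(HOSTNAME_ANNOTATION) == hostname:
        return None  # already annotated, keep the handler idempotent
    return {"metadata": {"annotations": {HOSTNAME_ANNOTATION: hostname}}}


event = {"type": "ADDED", "object": {"metadata": {"name": "node-a"}}}
print(patch_for_event(event))
```

A real controller would wrap this in a watch loop on nodes and send the returned patch to the API server; everything else is the handful of lines above plus error handling.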

vel0city · 2025-01-27T20:24:00 1738009440

I both agree this is exactly what these kinds of small custom operators should be, and also see the same nuisance as awkward database triggers: it bubbles up into the "I dunno why it works, just magic" kind of lost knowledge about how systems actually function.