If a distributed system relies on clients gracefully exiting to work the system w...

Rhapso · 2025-05-05T01:43:16 1746409396

And I believe that so much that I don't even consider graceful shutdown in design. Components should be able to safely (and even frequently) hard-crash, and so long as a critical percentage of the system is WAI (working as intended), it shouldn't meaningfully impact the overall system.

The only way to make sure a system can handle components hard-crashing is for hard crashes to be a normal thing that happens all the time.

All glory to the chaos monkey!

ikiris · 2025-05-04T22:40:48 1746398448

There's a big gap between graceful shutdown to be nice to clients / workflows, and clients relying on it to work.

smcleod · 2025-05-04T22:19:28 1746397168

Way back when, in physical land - I used STONITH for that! https://smcleod.net/2015/07/delayed-serial-stonith/

XorNot · 2025-05-04T22:31:21 1746397881

There are valid reasons to want the typical exit not to look like a catastrophic one, even if that's a recoverable situation.

Whether my application went down from SIGINT or from a kill makes a big difference.

Blue-Green migrations for example require a graceful exit behavior.
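
To make the SIGINT-versus-kill distinction concrete, here is a minimal sketch of how a supervisor might tell the two apart. It assumes a POSIX system and a child started through Python's `subprocess` module, where a negative `returncode` of `-N` means the child was terminated by signal `N`. The function name `classify_exit` and the category labels are hypothetical, chosen only for illustration.

```python
import signal


def classify_exit(returncode: int) -> str:
    """Classify a child's exit as a supervisor on POSIX might see it.

    Assumes the subprocess convention: a negative return code -N means
    the child was terminated by signal N.
    """
    if returncode == 0:
        return "clean"
    if returncode < 0:
        sig = signal.Signals(-returncode)
        if sig in (signal.SIGINT, signal.SIGTERM):
            # Someone asked the process to stop; not an incident.
            return "requested-stop"
        if sig == signal.SIGKILL:
            # Could not have been handled by the process at all.
            return "hard-kill"
        return f"signal:{sig.name}"
    # Positive non-zero code: the process exited on its own with an error.
    return "error"
```

A process that installs its own SIGINT handler and then exits with status 0 would show up as "clean" here instead, which is arguably the point: the typical exit stops looking like a crash.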

shoo · 2025-05-04T22:45:01 1746398701

> Blue-Green migrations for example require a graceful exit behavior.

It may not always be necessary. E.g. if you are deploying a new version of a stateless backend service, and there is a load balancer forwarding traffic to current-version and new-version backends, the load balancer could be responsible for cutting over: it lets in-flight requests finish on the current-version backends while forwarding new requests only to the new backends. Then the old backends could be ungracefully terminated once the LB says they are not processing any requests.
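
A toy in-memory model of that cut-over, just to show the bookkeeping involved. Everything here (`Backend`, `LoadBalancer`, `cut_over`, `safe_to_kill`) is a hypothetical name for illustration, not the API of any real load balancer, and it assumes the LB can see request start and end for every backend.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    in_flight: int = 0  # requests currently being processed


@dataclass
class LoadBalancer:
    active: list = field(default_factory=list)    # receive new requests
    draining: list = field(default_factory=list)  # only finish old ones

    def cut_over(self, new_backends):
        """Stop sending new requests to the current backends."""
        self.draining.extend(self.active)
        self.active = list(new_backends)

    def start_request(self):
        """Route a new request to the least-loaded active backend."""
        backend = min(self.active, key=lambda b: b.in_flight)
        backend.in_flight += 1
        return backend

    def finish_request(self, backend):
        backend.in_flight -= 1

    def safe_to_kill(self):
        """Draining backends with nothing in flight can be hard-killed."""
        return [b for b in self.draining if b.in_flight == 0]
```

Note that the backend itself never needs a shutdown hook in this model; the graceful part lives entirely in the load balancer.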

eknkc · 2025-05-05T08:54:01 1746435241

Yeah. However, I do not need to pull the plug to shut things down even if the software was designed to tolerate that.

On second thought, though, maybe I do. That might be the only way to ensure the assumption is true. Like the Netflix's chaos monkey thing a couple years ago.

antonvs · 2025-05-06T08:36:06 1746520566

> Like the Netflix's chaos monkey thing a couple years ago.

That was released 15 years ago.

eknkc · 2025-05-06T09:34:28 1746524068

Thanks for reminding me how old I am.

icedchai · 2025-05-05T13:19:56 1746451196

Relying on graceful exit and supporting it are two different things. You want to support it so you can stop serving clients without giving them nasty 5xx errors.
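
A rough sketch of what "supporting it" might look like inside the process: on SIGTERM or SIGINT, stop taking new requests and let the ones already running finish. The `Drainer` class and `install` helper are hypothetical names; how the server actually refuses new work (closing the listener, failing health checks) is left out and would depend on the framework.

```python
import signal
import threading


class Drainer:
    """Tracks in-flight requests and whether new ones may still start."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._stopping = False

    def request_stop(self, signum=None, frame=None):
        # Signature matches a signal handler. Only flips a flag, so it is
        # safe to run even while another call holds the lock.
        self._stopping = True

    def try_begin(self):
        """Return True if a new request may start, False once draining."""
        with self._lock:
            if self._stopping:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one in-flight request as finished."""
        with self._lock:
            self._in_flight -= 1

    def drained(self):
        """True once a stop was requested and nothing is in flight."""
        with self._lock:
            return self._stopping and self._in_flight == 0


def install(drainer):
    # Must be called from the main thread. SIGKILL cannot be handled,
    # so the system still has to tolerate a hard crash.
    signal.signal(signal.SIGTERM, drainer.request_stop)
    signal.signal(signal.SIGINT, drainer.request_stop)
```

The main loop would exit once `drained()` returns True, ideally with a timeout so a stuck request cannot hold the process open forever.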

Thaxll · 2025-05-05T01:15:13 1746407713

No one said that.