You’re conflating unattended-upgrades (mutating servers in place, hard to roll back) with automated patching in general. Do automated patching, but run the changes through your CI so you can catch breaking changes and roll them out in a way that’s easy to debug (you can diff images) and easy to revert.
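For example, a CI job can rebuild the image with the latest patches and diff the installed packages before promoting it. A minimal sketch in Python, assuming a Docker-based pipeline, a Debian/Ubuntu base image, and hypothetical image tags (app:current, app:candidate):

    # Hypothetical CI step: rebuild with today's patches, then diff
    # installed packages against the currently deployed image.
    import subprocess

    def package_list(image: str) -> set[str]:
        # Query dpkg inside the image; assumes a Debian/Ubuntu base.
        out = subprocess.run(
            ["docker", "run", "--rm", image,
             "dpkg-query", "-W", "-f=${Package}=${Version}\n"],
            capture_output=True, text=True, check=True)
        return set(out.stdout.splitlines())

    subprocess.run(["docker", "build", "--pull", "--no-cache",
                    "-t", "app:candidate", "."], check=True)
    old, new = package_list("app:current"), package_list("app:candidate")
    print("removed or downgraded:", sorted(old - new))
    print("added or upgraded:", sorted(new - old))
    # Run the test suite against app:candidate before promoting it;
    # reverting is just redeploying app:current.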

I bet when you update your software dependencies you run those changes through your tests, but your OS is a giant pile of code that usually gets updated differently and independently, mostly for historical reasons.



> I bet when you update your software dependencies you run those changes through your tests, but your OS is a giant pile of code that usually gets updated differently and independently, mostly for historical reasons.

Close. We are moving toward defining our server state through Ansible, but that project is nowhere near completion. Once it's further along, we could use Ansible Molecule + CI to test the new server state whenever a patch becomes available, but that's not an option on the table today.
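Once that lands, the Molecule verify step could be a few Testinfra assertions checking that a converge against the patched state leaves services healthy. A sketch, with sshd and openssl standing in as placeholder names:

    # Hypothetical Molecule/Testinfra verifier (pytest-testinfra
    # injects the `host` fixture).
    def test_sshd_still_running(host):
        sshd = host.service("sshd")
        assert sshd.is_running
        assert sshd.is_enabled

    def test_patched_package_installed(host):
        # Placeholder package; in practice, assert on whatever the
        # security update actually touched.
        assert host.package("openssl").is_installed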

The system we had in place for /today/ worked: lower-priority or redundant servers were set to auto-reboot after applying security updates, while critical servers required a manual reboot at low-risk times. By then, the patch had already been tested on the lower-risk servers.
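For reference, that split maps onto a couple of stock unattended-upgrades settings on Debian/Ubuntu; roughly:

    // /etc/apt/apt.conf.d/50unattended-upgrades on a lower-priority box:
    Unattended-Upgrade::Automatic-Reboot "true";
    Unattended-Upgrade::Automatic-Reboot-Time "02:00";
    // Critical servers keep Automatic-Reboot at "false" and get
    // rebooted by hand during a low-risk window.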

As a result, this issue caused no user-visible downtime for us, and thanks to the staggered runs of unattended-upgrades it affected only a minimal number of servers.

And this was the first time in 10+ years that something like this has happened; we have to prioritize our process-improvement time based on likelihood and impact.



