> Remember the first time you saw the AWS console? And the last time?
There was a time in between for me - that was Rightscale.
For me, the real thing that k8s brings is not hardware infra but reliable ops automation.
Rightscale was the first place where I encountered scripted ops steps, and my current view on k8s is that it is a massively superior operational automation framework.
The SRE teams which used Rightscale at my last job used to have "buttons to press for things", which roughly translated to "If the primary node fails, first promote the secondary, then get a new EC2 box, format it, install software, setup certificates, assign an elastic IP, configure it to be exactly like the previous secondary, then tie together replication and notify the consistent hashing."
The value was in automating the steps across about four domains: monitoring, node allocation, package installation and configuration realignment.
The Nagios, Puppet and Zookeeper combos for this were a complete pain, and the complexity of k8s comes from it being a "second system" for that problem space. The complexity was always there, but now it lives in the reactive ops code, which is the final resting place for it (unless you make your arch simpler).
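To make "reactive ops code" a bit more concrete, here is a minimal sketch of a reconcile loop in Python. It is not how k8s or Rightscale is implemented; `Cluster`, `provision` and `reconcile` are hypothetical names invented for illustration, and node health is just a boolean in a dict. The point is only that the "buttons to press" collapse into one function that compares desired state with observed state and acts on the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: desired size plus observed node health."""
    desired_replicas: int
    nodes: dict = field(default_factory=dict)  # node name -> healthy?
    _counter: int = 0

    def provision(self) -> str:
        # Hypothetical: a real system would allocate, format and configure a box here.
        self._counter += 1
        name = f"node-{self._counter}"
        self.nodes[name] = True
        return name


def reconcile(cluster: Cluster) -> list:
    """Drive observed state toward desired state; return the actions taken."""
    actions = []
    # Drop every node that is reporting unhealthy.
    for name, healthy in list(cluster.nodes.items()):
        if not healthy:
            del cluster.nodes[name]
            actions.append(f"removed {name}")
    # Provision replacements until the desired count is met again.
    while len(cluster.nodes) < cluster.desired_replicas:
        actions.append(f"provisioned {cluster.provision()}")
    return actions
```

Running `reconcile` repeatedly is safe: once the cluster matches the desired state it returns an empty list, which is what lets it run on a timer instead of waiting for someone to press a button.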
> The SRE teams which used Rightscale at my last job used to have "buttons to press for things", which roughly translated to "If the primary node fails, first promote the secondary, then get a new EC2 box, format it, install software, setup certificates, assign an elastic IP, configure it to be exactly like the previous secondary, then tie together replication and notify the consistent hashing."
If I understand this correctly, all of these things could have been automated in AWS fairly easily.
"If the primary node fails": Health check from EC2 or ELB.
"get a new EC2 box": The ASG will replace the host if it fails the health check.
"format it": The AMI should do it.
"install software, setup certificates": Userdata, or cloud-init.
"assign an elastic IP, configure it to be exactly like the previous secondary, then tie together replication and notify the consistent hashing": This could be orchestrated by some kind of SWF workflow if it takes a long time, or just a Lambda function if it finishes within a few minutes.
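As a rough sketch of the last part of the reply (the steps left over once health checks, the ASG, the AMI and userdata have done their work), the remaining orchestration could look something like the Lambda-style handler below. Everything here is an assumption for illustration: the step functions are stubs that only record what they would do, the event keys `promoted` and `replacement` are made up, and no real AWS API is called.

```python
# Hypothetical step stubs: each one stands in for a real call
# (EIP association, config sync, replication setup, hash ring update).
def assign_elastic_ip(state):
    state["elastic_ip_owner"] = state["replacement"]


def copy_secondary_config(state):
    state["config_source"] = state["promoted"]


def start_replication(state):
    state["replicating"] = (state["promoted"], state["replacement"])


def notify_hash_ring(state):
    state["ring_members"] = sorted([state["promoted"], state["replacement"]])


# Order matters: replication only makes sense once the new box is configured.
STEPS = [assign_elastic_ip, copy_secondary_config, start_replication, notify_hash_ring]


def handler(event, context=None):
    """Run the remaining failover steps in order and report what completed."""
    state = dict(event)  # copy so the caller's event is not mutated
    state["completed"] = []
    for step in STEPS:
        step(state)
        state["completed"].append(step.__name__)
    return state
```

Calling `handler({"promoted": "db-2", "replacement": "db-3"})` returns the final state with the four step names listed under `completed`. For anything that runs longer than a few minutes, the same ordered list would map more naturally onto a workflow service than onto a single function invocation.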