My question is: why is k8s alone so popular when there are better alternatives for a large swath of users? I believe the answer is "Manufactured Hype". From a purely architectural standpoint, k8s is the way to go, even for smaller setups, but the concrete project is still complex enough to require dozens of different setup tools, and it will keep hordes of consultants, as well as the many hosted solutions from Google/AWS/etc, in business for some time to come. So there's a vested interest in continuing to push it. Everyone wins: users get a solid tool (even if it's not the best one for the job) and cloud providers retain their unique selling point over people setting up their own servers.

I still believe 90% of users would be better served by Nomad. And if someone says "developers want to use the most widely used tech", then I'm here to call bullshit, because the concepts behind workload schedulers and orchestrators like k8s and Nomad carry over easily from one to the other. Learning either, even if you end up using the other, is not a waste of time. Heck, I started out using CoreOS with fleetctl, and even that taught me many valuable lessons.



What are these alternatives with more users?

Where is the momentum?

Hosted GKE costs the same per month as an hour of DevOps time. What's wrong with paid management for k8s?


I didn't say more users, I said appropriate for more users. The alternative I mentioned is Nomad, and I wish more people would give it a try and decide for themselves. The momentum behind it is HashiCorp, makers of Vault, Consul, Terraform, and Vagrant, all battle-proven tools. The fact that there's one big player behind it really shows in how polished the tool, UI, and documentation are.

The issue I have with managed k8s is that these products decrease the pressure to improve k8s documentation, tooling, and setup itself. And then there are folks (like me) who want or need to run something like k8s on bare-metal hardware outside of a cloud, where a cloud-managed solution isn't available.


I got a bit disillusioned with k8s and looked at Nomad as an alternative.

As a relatively noob sysadmin, I liked it a lot. Easy to deploy and easy to maintain. We've got a lot of mixed rented hardware + cloud VPS, and having one layer to unify them all seemed great.

Unfortunately I had a hard time convincing the org to give it a serious shot. At the crux of it, it wasn't clear what 'production-ready' Nomad should look like. It seemed like Nomad is useless without Consul, and you really should use Vault to do the PKI for all of it.

It's a bit frustrating how so many of the HashiCorp products are 'in for a penny, in for a pound' type deals. I know there are _technically_ ways for you to use Nomad without Consul, but it didn't seem like the happy path, and the community support was non-existent.

Please tell me why I'm wrong lol, I really wanted to love Nomad. We are running a mix of everything and it's a nightmare.


Nomad + Consul is the happy path. Adding Vault into the mix is nice, but not required.

Consul by itself is the game-changer. Even in k8s it's a game-changer. It solves so many questions in an elegant way.

"How do I find and reach the things running in (orchestrator) with (unknown ip/random port) from (legacy)?" being the most important. You run 5 servers, and a relatively lightweight client on everything (which isn't even outright required, but it sure is useful!), and you get a _lot_ with that.

Consul provides multiple interfaces and ingress points to find everything. It's also super easy to operate and has a pretty big community.
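
To make that concrete, here's a rough sketch of a Consul service definition on one of those clients (name, port, and health check are made up for illustration); once the agent registers it, even a legacy box that can resolve DNS can reach it at web.service.consul:

    service {
      name = "web"
      port = 8080

      # health check so Consul only returns healthy instances
      check {
        http     = "http://localhost:8080/health"
        interval = "10s"
      }
    }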

If you absolutely cannot have Consul, Nomad is still a really good batch job engine, and it makes a great "distributed cron" that is more extensible, more scalable, and easier to use than something like Jenkins for the same task.
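
For example, a minimal sketch of such a "distributed cron" job (the name, schedule, and image are all placeholders):

    job "nightly-report" {
      datacenters = ["dc1"]
      type        = "batch"

      # run every night at 02:00; don't start a new run
      # if the previous one is still going
      periodic {
        cron             = "0 2 * * *"
        prohibit_overlap = true
      }

      group "report" {
        task "generate" {
          driver = "docker"

          config {
            image = "example/report:latest"
          }
        }
      }
    }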

My team is pretty small (was 4 people, now 6) and we manage one of the world's largest Nomad and Consul clusters (there are some truly staggeringly large users of Vault, so I won't make that claim there). Even when shit really hits the fan, everything is designed in a way that stuff mostly keeps working, and there are enough operator-friendly entry points that we can always figure out the problem.


Interesting, thanks for sharing!

I'm assuming your team is using Vault for PKI, but is there a similarly happy path for issuing certs without Vault?

I started off just using `openssl`, but it all felt very janky, and I didn't really have any idea how CRLs should be set up.


Vault is great for just PKI, even if you aren't using it for anything else. There are some tools that only do PKI, but Vault works a real treat at it. Any Terraform backend that supports encryption, plus Terraform, plus Vault gives you an amazing workflow. We use a mix of short- and long-lived certs, with different roles based on what's getting a cert.

For now, we have CRLs disabled on all short-lived backends and enabled on long-lived backends, and we're actually looking at not storing short-lived certs in the storage system at all, just cranking the TTL down to truly short. We've tested it as low as 30m, but a more real-world max TTL is 1 week, with individual apps setting it as low as they can handle. For reference, we run more than 10 PKI backends, and adding one (or a bunch) more is just a little Terraform snippet for us, as sketched below.
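
To give a flavour, a hedged sketch of such a snippet (the mount path, role name, and domains are made up, and root/intermediate cert generation is omitted for brevity):

    resource "vault_mount" "pki_short" {
      path                  = "pki-short"
      type                  = "pki"
      max_lease_ttl_seconds = 604800  # 1 week max TTL
    }

    resource "vault_pki_secret_backend_role" "app" {
      backend          = vault_mount.pki_short.path
      name             = "app"
      allowed_domains  = ["example.internal"]
      allow_subdomains = true
      max_ttl          = "168h"
      no_store         = true  # don't persist issued certs in Vault storage
    }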

The way it works in HashiCorp template land is that you just plop

    {{ with secret "name-of-pki/issue/name-of-role" "common_name=my.allowed.fqdn" "ttl=24h" }} {{ .Data.certificate }} {{ end }}
into your Nomad template stanza, or use consul-template directly as a binary, or use Vault Agent with its template capability. You can get the CA chain the same way if required, just by hitting a different PKI endpoint.
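
Inside a Nomad job, that might look roughly like this (the task name, destination, and change_mode are just one sensible choice, not the only way):

    task "web" {
      driver = "docker"

      template {
        destination = "secrets/cert.pem"
        change_mode = "restart"  # bounce the task when the cert rotates
        data        = <<-EOF
          {{ with secret "name-of-pki/issue/name-of-role" "common_name=my.allowed.fqdn" "ttl=24h" }}
          {{ .Data.certificate }}
          {{ .Data.private_key }}
          {{ end }}
        EOF
      }
    }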

Also, as of Vault 1.4, Vault's internal raft backend is now production ready, making it a snap to run.
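
For reference, a sketch of what a single node's server config looks like with integrated storage (paths and addresses are placeholders):

    storage "raft" {
      path    = "/var/lib/vault"
      node_id = "vault-1"
    }

    listener "tcp" {
      address       = "0.0.0.0:8200"
      tls_cert_file = "/etc/vault/tls/vault.crt"
      tls_key_file  = "/etc/vault/tls/vault.key"
    }

    api_addr     = "https://vault-1.example.internal:8200"
    cluster_addr = "https://vault-1.example.internal:8201"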

Try running through a few of the Vault quick-start guides and replicating them in Terraform as much as possible. There are a few things TF does not handle gracefully last I checked (initial bootstrap), but you can get around that with a null_resource or by handling that step outside Terraform.
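
A rough sketch of the null_resource workaround (the flags are real `vault operator init` flags, but piping the unseal keys to a local file is purely illustrative; handle them with far more care in practice):

    resource "null_resource" "vault_init" {
      provisioner "local-exec" {
        # one-time initialization; Terraform only tracks that it ran
        command = "vault operator init -key-shares=5 -key-threshold=3 > vault-init.txt"
      }
    }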


Nomad Team Lead here.

Edit: just noticed an actual Nomad user replied as well, and I like their answer better. Consider mine an addendum. :)

Batch workloads rarely require Consul, but for deploying your standard network services on Nomad: Consul is basically required. You could likely use any number of service mesh systems instead (either as sidecars, Docker network plugins, or soon CNI), but you'll be doing a lot of research and development on your own I'm afraid.

The Nomad team is by no means opposed to becoming more flexible in the future (and indeed better CNI support is landing soon as a first step), but we wanted to focus on getting one platform right and a pleasure to use before trying to genericize and modularize it.


Thanks for reaching out! Since I have the chance I'll add - Nomad is pretty awesome, and I love the work your team is doing.

My org looked at Nomad at a time when there was a lot of pressure from above to deliver something as soon as possible. Two weeks just weren't enough to get the full lay of the land ¯\_(ツ)_/¯

Funny thing is even if I could plug in my own service discovery into Nomad, I would probably chuck it away and replace it with Consul after a few weeks anyway haha


I'm sympathetic toward the idea of a system made of interchangeable parts, but I also kinda feel like it's a bit unrealistic, maybe? Even with well-defined interfaces, there will always be interop problems due to bugs or just people interpreting the interface specs differently. Every new piece to the puzzle adds another line (or several) to a testing matrix, and most projects just don't have the time and resources to do that kind of testing. It's unfortunate, but IMO understandable that there's often a well-tested happy path that everyone should use, even when theoretically things are modular and replaceable.


Nomad isn't really feature-mature or user-friendly enough; you still eventually need 100 bolt-ons.

I think a Distributed OS is the only sane solution. Build the features we need into the kernel and stop futzing around with 15 abstractions just to run an isolated process on multiple hosts.


As the Nomad Team Lead I sympathize with your first statement, but I hope our continued efforts will dissuade you from the second.

Linux (and the BSDs) are remarkably stable, featureful, and resilient operating systems. I would hate to give up such a strong foundation. Nomad can crash without affecting your running services. Nomad can be upgraded or reconfigured without affecting your running services. Nomad can be observed, developed, and debugged as a unit, often without having to consider the abstractions that sit above or below it. The right number of abstractions is a beautiful thing: no more, no less. :)


I'm resisting Kubernetes and might go with Nomad (currently I'm "just using systemd" and I get HA from the BEAM VM). But I do also get the argument that the difference between Kubernetes and Nomad is that, increasingly, Kubernetes is supported by the cloud vendors, while Nomad supports the cloud vendors.


I second this.


> I still believe 90% of users would be better served by Nomad.

Well sure, but if the story just ended with "everyone use the least exciting tool", then there'd be few articles for tech journals to write.

But Kubernetes promises so much, and deep down everyone subtly thinks "what if I have to scale my project?" Why settle for "good enough" when you could settle for "awesome"? It's just human nature to choose the most exciting thing. And given that I do agree there's some manufactured hype around Kubernetes, it isn't surprising to me that few are talking about Nomad.



