To add another data point: after running ES in production for the past 10 years, I have to say it has never given us any headaches. We've had issues with ScyllaDB, Redis, etc., but ES just chugs along and works.
The one issue I remember: early on, on ES 5, it regularly went down. It turned out that some _very long_ input was being passed into the search by a scraper, which killed the cluster.
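A cheap guard against that failure mode is to validate search input before it ever reaches the cluster. A minimal sketch; the names (`sanitize_query`, `MAX_QUERY_LEN`) and the length cap are illustrative assumptions, not anything from a real codebase:

```python
# Hypothetical input guard: refuse oversized search strings before they
# are forwarded to Elasticsearch. The cap is arbitrary; tune it to the
# longest legitimate query your users actually type.
MAX_QUERY_LEN = 1024

def sanitize_query(raw: str, max_len: int = MAX_QUERY_LEN) -> str:
    """Trim whitespace and reject inputs longer than max_len characters."""
    q = raw.strip()
    if len(q) > max_len:
        raise ValueError(f"query too long: {len(q)} > {max_len} chars")
    return q
```

Rejecting (rather than silently truncating) keeps the behaviour predictable and makes abusive clients visible in your error logs.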
I agree, and I don't get where the claims that ES is hard to operate come from. Yeah, if you allow arbitrary aggregations that exceed the heap, or expensive queries that effectively iterate over everything, you're gonna have a bad time. But apart from that, as long as you understand your data model, your searches, and how your data is indexed, ES is absolutely rock-solid and scales and performs like a beast. We run a 35-node cluster with ~240TB of disk, 4.5TB of RAM, and about 100TB of documents, and it serves hundreds of queries per second. The whole thing needs no maintenance beyond replacing nodes that fail for unrelated reasons (hardware, hosting). Version upgrades are smooth as well.
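One way to keep those "expensive queries that iterate over everything" off the cluster is to wrap every request body in defensive limits. A sketch, assuming the standard Elasticsearch search-body parameters (`size`, `timeout`, `terminate_after`); the helper name and the specific numbers are my own illustrative choices:

```python
def guarded_search_body(query: dict, size: int = 20) -> dict:
    """Wrap an arbitrary query with limits so one request can't
    monopolise the cluster. All values here are illustrative defaults."""
    return {
        "query": query,
        "size": min(size, 100),       # never let callers page huge result sets
        "timeout": "2s",              # ask ES to cut off slow shard searches
        "terminate_after": 100_000,   # cap documents examined per shard
    }
```

Note that `timeout` and `terminate_after` are best-effort and can return partial results, so this is a safety net on top of sane query design, not a substitute for it.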
The only bigger issue we had was when we added 10 nodes to double the cluster's initial capacity. Performance tanked, and it took us about half a day to figure out that the new nodes were using dmraid (Linux RAID0), and as a result their block devices had a much higher default read-ahead value (8192) than the existing nodes, which caused heavy read amplification. The ES manual specifically documents this, but since we hadn't run into it ourselves, it took us a while to realise what was at fault.
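That kind of drift is easy to catch with a pre-flight check that reads the kernel's read-ahead setting from sysfs (`/sys/block/<dev>/queue/read_ahead_kb`, in KiB) before a node joins the cluster. A sketch; the function names and the threshold are assumptions for illustration, and `sys_root` is parameterised only so the logic can be tested without a real block device:

```python
from pathlib import Path

def read_ahead_kb(device: str, sys_root: str = "/sys") -> int:
    """Return the kernel read-ahead (in KiB) for a block device,
    e.g. read_ahead_kb("md0")."""
    path = Path(sys_root) / "block" / device / "queue" / "read_ahead_kb"
    return int(path.read_text().strip())

def check_read_ahead(device: str, max_kb: int = 128, sys_root: str = "/sys") -> bool:
    """True when read-ahead is at or below a sane threshold; a high value
    on a search workload causes heavy read amplification."""
    return read_ahead_kb(device, sys_root) <= max_kb
```

Running a check like this across all nodes whenever capacity is added would have flagged the mismatched RAID0 nodes immediately instead of after half a day of debugging.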
The thing I like about ES: when the business comes around with new requirements out of nowhere, the answer is always "Yup, we can do it!" Unlike other tools such as Cassandra, which force a data design from the get-go and make it expensive to change later on.
And they can pay the vendors for "bring your own cloud" or similar. If data sovereignty matters to them, they can probably afford it. And if cost is the issue, they wouldn't be looking at hosted solutions in the first place.
Nobody actively looks after it. We have good alerting and monitoring, and if an alert fires, say a node going down during some Kubernetes node shuffling, or a version upgrade needs to be performed, one of our few infra people handles it.
It's really not something that needs much attention in my experience.