There's also rqlite. There's definitely a place for this kind of stuff. But we a...

otoolep · on May 9, 2022

rqlite author here. Anything else you can tell me about why you decided against it? Just simpler, as you say, to avoid a distributed system when you can (something I understand).

tptacek · on May 9, 2022

We like rqlite a lot. There are some comments in your issue tracker from Jerome about it at the time. The decision wasn't against rqlite as a piece of software so much as it was us deliberately deciding not to introduce more Raft into our architecture; any place there is Raft, we're concerned we'll essentially need to train our whole on-call rotation on how to handle issues.

The annoying thing about global consensus is that the operational problems tend to be global as well; we had an outage last night (correlated disk failure on 3 different machines!) in Chicago, and it slowed down deploys all the way to Sydney, essentially because of invariants maintained by a global Raft consensus and fed in part by malfunctioning machines.

I think rqlite would make a lot of sense for us for applications where we run multiple regional clusters; it's just that our problems today tend to be global. We're not just looking for opportunities to rip Raft out of our stack; we're also trying to build APIs that regionalize nicely. In nicely-regionalized, contained settings, rqlite might work a treat for us.