There's also rqlite. There's definitely a place for this kind of stuff. But we already use a bunch of stuff that does distributed consensus in our stack, and the experience has left us wary of it, especially for global distribution. We almost used rqlite for a statekeeping feature internally, but today we'd certainly just use sqlite+litestream for the same kinds of features, just because it's easier to reason about and to deal with operationally when there are problems.

https://fly.io/blog/a-foolish-consistency/



rqlite author here. Anything else you can tell me about why you decided against it? Was it just simpler, as you say, to avoid a distributed system when you can (something I understand)?


We like rqlite a lot. There are some comments in your issue tracker from Jerome about it at the time. The decision wasn't against rqlite as a piece of software so much as it was us deliberately deciding not to introduce more Raft into our architecture; any place there is Raft, we're concerned we'll essentially need to train our whole on-call rotation on how to handle issues.

The annoying thing about global consensus is that the operational problems tend to be global as well; we had an outage last night (correlated disk failure on 3 different machines!) in Chicago, and it slowed down deploys all the way to Sydney, essentially because of invariants maintained by a global Raft consensus and fed in part from malfunctioning machines.

I think rqlite would make a lot of sense for us for applications where we run multiple regional clusters; it's just that our problems today tend to be global. We're not just looking for opportunities to rip Raft out of our stack; we're also trying to build APIs that regionalize nicely. In nicely-regionalized, contained settings, rqlite might work a treat for us.



