Hi Chris, great question. There are a couple of advantages actually.
- You don't have to write logic to figure out which clients need to be updated with some data. You just run the queries you need to generate the data for the client, and RethinkDB sends you an update only if the result of that specific query changes.
- Having the code that modifies the data send updates to the connected clients becomes increasingly difficult if you have multiple application servers: you would then need separate message-passing/broadcasting infrastructure to also update clients connected to other servers. RethinkDB takes care of "passing the news around", even in a distributed environment.
- RethinkDB supports changefeeds not just on the raw data, but also on transformations of the data. Not all transformations are supported yet (map-reduce queries, for example, are not), but our goal is definitely to support changefeeds on pretty much any query. Just knowing that the underlying data has changed isn't enough: in a traditional setting, you would still need to either recompute the whole query or implement your own logic for incrementally updating the results of every type of query you want to run. RethinkDB updates query results incrementally and efficiently.
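To make the last point concrete, here is roughly the recompute-and-diff loop you'd otherwise have to write by hand for every query. This is a minimal Python sketch of the manual approach, not RethinkDB code; `run_query` and `notify` are hypothetical stand-ins for your query and your push-to-client mechanism:

```python
# Minimal sketch of the manual approach a changefeed replaces:
# re-run the query on every check and notify the client only
# when the result actually changes.
# run_query and notify are hypothetical stand-ins, not a RethinkDB API.

def make_watcher(run_query, notify):
    """Return a function that re-runs the query and pushes only real changes."""
    last = {"value": None, "seen": False}

    def check():
        result = run_query()
        if not last["seen"] or result != last["value"]:
            last["value"] = result
            last["seen"] = True
            notify(result)

    return check

# Usage: each check() recomputes the whole query; notify fires only on change.
data = {"top_score": 10}
updates = []
check = make_watcher(lambda: data["top_score"], updates.append)
check()                    # first run: pushes 10
check()                    # unchanged: pushes nothing
data["top_score"] = 12
check()                    # changed: pushes 12
print(updates)             # [10, 12]
```

Note that this sketch recomputes the full query on every check, which is exactly the inefficiency the incremental changefeed machinery avoids.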
As I mentioned, the code that writes the event to the db notifies the client about it, so it already knows about the client. This isn't really an issue.
In the case of an exchange, I'd never split a single order book across multiple servers, but I can imagine a lot of applications where this could be an issue. How do you handle data consistency across nodes? Ultimately you have to solve the same issue...
Again, this isn't a big limitation for my use case. That said, your answer has certainly given me a greater understanding of other circumstances where it would be very useful. Thank you.
If you shard a RethinkDB table to split it across multiple servers, and then create a changefeed on the table, the database will automatically send changes from both servers. Basically, server management/sharding in RethinkDB is visible to ops people, but is completely abstracted from the application developer. All writes are immediately consistent.
RethinkDB doesn't provide multi-document ACID guarantees, though (writes are atomic only per document). If you want to change multiple documents in a table atomically, I'd stick with traditional RDBMSes.
How are all writes immediately consistent? What if two clients make simultaneous writes to each shard? Or more extreme, what if there is a net split and writes continue on both shards?
I suppose I could just ask if you have an architecture document floating around :)
(I work for RethinkDB)