It conveniently doesn't mention the hardest part: conflicts.
If you just drop a conflicting edit then it's a stretch to call your app "offline". Yes, it "works", but who wants to use an application that drops your edits?
I used to think this a lot, but then I found products that do exactly this and work reliably most of the time by simply asking you "Local Version" or "Server Version". This is what Steam does, what Nextcloud does, what PS5 does, and probably others. It's a naive approach but it seems to work well enough.
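For what it's worth, the "Local Version or Server Version" prompt can be sketched in a few lines. This is only an illustration of the idea, not how Steam, Nextcloud, or PS5 actually implement it; `LocalCopy`, `sync`, and the `ask_user` callback are hypothetical names, and the revision counter is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LocalCopy:
    content: str
    base_revision: int  # server revision this copy was derived from (assumed)
    dirty: bool         # True if edited locally since base_revision


def sync(local: LocalCopy, server_content: str, server_revision: int,
         ask_user: Callable[[str, str], str]) -> str:
    """Return the content that should survive the sync."""
    if not local.dirty:
        # Nothing changed locally: just take whatever the server has.
        return server_content
    if local.base_revision == server_revision:
        # Server has not moved since we loaded: the local edit is safe to upload.
        return local.content
    # Both sides changed: hand the decision to the user.
    choice = ask_user(local.content, server_content)
    return local.content if choice == "local" else server_content
```

The user is only asked in the last branch, which is why the prompt shows up rarely in practice.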
Beyond that, you get into complex territory but maybe we've all been overthinking the problem space.
It only works for certain classes of products. A game has decidedly few copies (will just be your own saves across multiple devices), and is already serializing in a singular high-level format. When you have business software that is used by multiple people, and has multiple different 'kinds' of things that are being reconciled (at varying levels of granularity), it gets really messy and really hard to hit the sweet spot.
Conflict resolution and the inability to come to a business decision on what to do has stopped offline-first support in a couple of apps I've done. It's a really messy conversation to have, especially with people that aren't used to sweating the details. You can't just handwave away the complexity!
It doesn’t happen often, until you make e.g. a ticketing system with a pool of ticket handlers who each claim a ticket to work on it, instead of a single handler responsible for every ticket, and now it happens dozens of times a day and the customer is irate at such incompetent software.
I may at an earlier part of my career, when I knew far less about building robust software, have stumbled into such an experience.
Conflicts themselves are not hard: Keep a directed acyclic graph of immutable records. Changes to a record point to the parent/prior record. Two users update the same record, now you have a tree.
The challenge is interpreting what that tree structure should mean.
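A minimal sketch of that structure, assuming each revision points to exactly one parent (the `Revision` class and `heads` helper are made-up names for illustration). A record is in conflict when it has more than one revision that nothing else points to:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    rev_id: str
    parent_id: Optional[str]  # None for the very first revision
    title: str


def heads(revisions):
    """Return the revisions that no other revision names as its parent."""
    parent_ids = {r.parent_id for r in revisions if r.parent_id is not None}
    return [r for r in revisions if r.rev_id not in parent_ids]


history = [
    Revision("r1", None, "Draft"),
    Revision("r2", "r1", "Draft, edited by Alice offline"),
    Revision("r3", "r1", "Draft, edited by Bob"),
]

# Two heads share the parent r1, so the record has forked.
conflicting = heads(history)
```

Detecting the fork is the easy part; deciding what to do with `conflicting` is where the real work starts.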
If you can, let a user decide how to resolve the conflict.
- User logs in, they have a “conflict inbox” of things that need to be resolved.
- Two coworkers make conflicting edits, maybe the manager gets a notification in their conflict inbox and they decide
Conceptually that's not hard, but in practice an approach like that can significantly increase the complexity of the app:
1. Do you store the tree structure for every table in your app? If you have 20 tables that could be edited offline, do you re-implement it for each table, or try to have a generic implementation?
2. Do you design all your tables around the tree structure, or do you just store it in addition to your "normal" tables?
3. Every piece of code that modifies one of these tables needs to do it via the tree structure - if you update your tables directly from any place it could effectively cause conflicts.
4. Do you build separate UI to resolve conflicts for every table?
5. Do you query and cache the tree structure on the device, or does it have to be online to resolve conflicts?
6. Do you expose the tree structure via external APIs, or keep it internal?
I find that "last-write-wins" is sufficient for a large percentage of cases, and much simpler to implement. Or in some cases, just doing conflict detection is sufficient (notify the user that the data has changed between loading and saving, and they need to re-apply their edits).
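To make the difference concrete, here is a rough in-memory sketch of both options, assuming each row carries a version counter that is bumped on every save. `Store`, `StaleWriteError`, and the method names are hypothetical, not from any particular framework.

```python
class StaleWriteError(Exception):
    """Raised when the row changed between loading and saving."""


class Store:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def load(self, key):
        # Rows that do not exist yet are reported as version 0.
        return self._rows.get(key, (0, None))

    def save_last_write_wins(self, key, value):
        # Whoever saves last overwrites whatever was there.
        version, _ = self.load(key)
        self._rows[key] = (version + 1, value)

    def save_if_unchanged(self, key, value, loaded_version):
        # Conflict detection: refuse the write if someone saved in between.
        current_version, _ = self.load(key)
        if current_version != loaded_version:
            raise StaleWriteError(
                f"{key!r} is at version {current_version}, "
                f"but the edit was based on version {loaded_version}"
            )
        self._rows[key] = (current_version + 1, value)
```

With `save_if_unchanged`, the second of two concurrent editors gets an error and has to reload and re-apply their edit, instead of silently overwriting the first.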
If you do need conflict resolution on a large scale (many different tables), I'd recommend using data structures designed for that. CRDTs are one example - while they are typically used for automatic conflict resolution, they often store enough data to allow manual resolution if desired.
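As one illustration of a CRDT that keeps enough data for manual resolution, here is a toy multi-value register built on version vectors. It is a sketch for a single value, not a production library; `MVRegister` and `dominates` are names chosen for this example. Concurrent writes are both retained after a merge, and a later write that has seen both replaces them.

```python
def dominates(a, b):
    """True if version vector a has seen every event recorded in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


class MVRegister:
    def __init__(self, node_id):
        self.node_id = node_id
        self.entries = []  # list of (version_vector, value)

    def set(self, value):
        # A new write supersedes everything this replica has seen so far.
        merged = {}
        for vv, _ in self.entries:
            for node, count in vv.items():
                merged[node] = max(merged.get(node, 0), count)
        merged[self.node_id] = merged.get(self.node_id, 0) + 1
        self.entries = [(merged, value)]

    def merge(self, other):
        # Keep every entry that is not strictly superseded by another one.
        combined = self.entries + other.entries
        kept = []
        for vv, value in combined:
            superseded = any(dominates(o, vv) and o != vv for o, _ in combined)
            if not superseded and (vv, value) not in kept:
                kept.append((vv, value))
        self.entries = kept

    def values(self):
        # More than one value means concurrent writes that need resolving.
        return [value for _, value in self.entries]


phone, laptop = MVRegister("phone"), MVRegister("laptop")
phone.set("Meeting at 10")
laptop.merge(phone)
phone.set("Meeting at 11")   # edited offline on the phone
laptop.set("Meeting at 9")   # edited concurrently on the laptop
phone.merge(laptop)          # phone now holds both candidate values
```

An app can show `phone.values()` to the user when it has more than one element, then call `set` with the chosen value to record the resolution.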
The library works at the database level and stores database modifications as binary change sets in a separate table that models a graph similar to a git repository. Capturing modifications is as simple as wrapping the transaction with a handler provided by the library. The graph of database modifications is stored locally on each device, but can easily be synced with an online repository.
The library detects conflicts, and provides a handler to the application for conflict resolution.
If we want to discuss terminology, I agree I should have said "handling conflicts" instead of just "conflicts".
What you are describing is just the beginning of conflict handling. The consequences of bad handling are dire: data loss. If handling and resolving conflicts were easy (hint: it's not), the article would have mentioned it.
Thank you. Do you see the difficulty in implementing the data structure to handle this? Or in the decision about what to do about it? Or about how to automate resolution?
Let’s take git as an example. If we both push changes to our branch, there’s no problem. We have a git repo with different branches. For a single record, this is even simpler, just an append-only table with a foreign key to the prior state.
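Roughly what I mean by the append-only table, sketched with SQLite from the Python standard library. The table and column names (`record_versions`, `parent_id`, and so on) are invented for this example, and a single `parent_id` column is an assumption; recording a merge of two versions would need something extra, such as a second parent reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE record_versions (
        id        INTEGER PRIMARY KEY,
        record_id TEXT NOT NULL,
        parent_id INTEGER REFERENCES record_versions(id),  -- prior state
        author    TEXT NOT NULL,
        body      TEXT NOT NULL
    )
""")


def append_version(record_id, parent_id, author, body):
    """Insert a new version; existing rows are never updated or deleted."""
    cur = conn.execute(
        "INSERT INTO record_versions (record_id, parent_id, author, body) "
        "VALUES (?, ?, ?, ?)",
        (record_id, parent_id, author, body),
    )
    return cur.lastrowid


def current_heads(record_id):
    """Versions of a record that no later version points back to."""
    return conn.execute(
        """
        SELECT v.id, v.author, v.body
        FROM record_versions v
        WHERE v.record_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM record_versions c WHERE c.parent_id = v.id
          )
        ORDER BY v.id
        """,
        (record_id,),
    ).fetchall()


root = append_version("invoice-7", None, "alice", "total: 100")
append_version("invoice-7", root, "alice", "total: 120")
append_version("invoice-7", root, "bob", "total: 90")
```

Here `current_heads("invoice-7")` returns two rows, one per author, and both edits are still in the table for whoever reviews the conflict.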
If someone reviews a PR and finds a merge conflict, it gets handled. Maybe one wins, both get rejected, or both get accepted (a fork). But there’s no requirement that data be discarded.
But automating it seems impossible in all circumstances since it depends on the human intent.
The scenarios where people are simultaneously editing the same space online and offline are very rare. When both users are online, I use the UI to communicate who's editing what to avoid conflicts. I would like to improve this in the future, but for now it's a utilitarian scope/resource issue.
That's only solving the conflicts at a technical level. The hard problem to solve is what conflict resolution means at a business level. The even harder problem to solve is auditing and coming up with an appropriate answer for _every single entity in the system_.
For most products, there isn't a one-size-fits-all answer to how to handle conflicts.