
MongoDB has supported the equivalent of Postgres' serializable isolation for many years now. I'm not sure what "with strong consistency benefits" means.


Or is it? Jepsen reported a number of issues like "read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level. Moreover, the snapshot read concern did not guarantee snapshot unless paired with write concern majority—even for read-only transactions."

That report (1) is 4 years old; many things could have changed. But so far every reviewed version has been faulty with regard to consistency.

1 - https://jepsen.io/analyses/mongodb-4.2.6


Jepsen found a more concerning consistency bug than the above results when Postgres 12 was evaluated [1]. Relevant text:

We [...] found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable

I have run Postgres and MongoDB at petabyte scale. Both of them are solid databases that occasionally have bugs in their transaction logic. Any distributed database that is receiving significant development will have bugs like this. Yes, even FoundationDB.

I wouldn't avoid Postgres because of this problem, just like I wouldn't avoid MongoDB because they had bugs in a new feature. In fact, I'm more likely to trust a company that consistently pays to have its work reviewed in public.

1. https://jepsen.io/analyses/postgresql-12.3


That’s been resolved for a long time now (not to say that MongoDB is perfect, though).


I just want to point out that 4 years is not a long time in the context of consistency guarantees of a database engine.

I have listened to Mongo evangelists a few times despite my skepticism and been burned every time. Mongo is way oversold, IMO.


That is for Mongo 4.x, but the latest stable is 6.0.7, whose release notes mention "more resilient operations" and "additional data security".

https://www.mongodb.com/blog/post/big-reasons-upgrade-mongod...


FWIW, the latest stable release is 7.0.12, released a week or so ago: https://www.mongodb.com/docs/upcoming/release-notes/7.0/. (I'm not sure why the URL has /upcoming/ in it, actually: 7.0 is definitely the stable release.)


> I'm not sure what "with strong consistency benefits" means.

"Doesn't use MongoDB" was my first thought.


MongoDB had "strong consistency" back in 2013 when I studied it for my thesis. The problem is that consistency is a much bigger space than a simple on/off switch, and MongoDB inhabited the lower classes of consistency for a long time while calling it strong consistency, which lost a lot of developer trust. Postgres has a range of options, but the default is typically consistent enough to make most use-cases safe, whereas Mongo's default wasn't anywhere close.

They also had a big problem trading off performance against consistency, to the point that for a long time (v1-2?) they ran in a default-inconsistent mode to meet the numbers marketing was putting out. Postgres has never done this, partly because it doesn't have a marketing team, but again this lost a lot of trust.

Lastly, even at the stronger end of their consistency guarantees, and as they have increased those guarantees, problems have been found again and again. It's common knowledge that it's better to find your own bugs than to have your customers tell you about them, but for database consistency this is even more true than usual. This is why FoundationDB is famous for having built a database testing setup before the database itself (somewhat true). It's clear from history that MongoDB doesn't have a sufficiently rigorous testing procedure.

All of these factors come down to trust: the community lacks trust in MongoDB because of repeated issues across a number of areas. As a result, just shipping "strong consistency" or the like doesn't actually solve the root problem: that people don't want to use the product.


It's fair to distrust something because you were burned by using it in the past. However, both the examples you named -- Postgres and FoundationDB -- have had similar concurrency and/or data loss bugs. I have personally seen FoundationDB lose a committed write. Writing databases is hard and it's easy to buy into marketing hype around safety.

I think you should reconsider your last paragraph. MongoDB has a massive community, and many large companies opt to use it for new applications every day. Many more people want to use that product than FoundationDB.


Can you elaborate on why ‘many large companies’ are choosing MongoDB over alternatives, and what their use cases are? I’ve been using Mdb for a decade, and with how rich the DB landscape is for optimising particular workloads, I just don’t see what the value proposition for Mdb is compared to most of them. I certainly wouldn’t use it for any data-intensive application when there are other fantastic OLAP dbs, nor for some battle-hardened distributed-nodes use case, so that leaves a ‘general-purpose db with very specific queries and limited indexes’. But then why not just use PG, as others say?


I’d be curious to hear more detail about the FoundationDB data loss issue that you saw. Do you remember what version / what year you saw it?


Have you looked at versions in the last couple years to see if they've made progress?


This kinda misses my point. By having poor defaults in the past, making marketing claims at odds with reality, and being repeatedly found to have bugs that reduce consistency, the result is that customers have no reason to trust current claims.

They may have fixed everything, but the only way to know that is to use it and see (because the issue was trusting marketing/docs/promises). And why should people put that time in when they've repeatedly got it wrong, especially when there are options that are just better now?


Right, I was curious if you put even more time in :)

I see lots of comments from people insisting it's fixed now but it's hard to validate what features they're using and what reliability/durability they're expecting.


    > my thesis
Can you share a link? I would like to read your research.


> MongoDB has supported the equivalent of Postgres' serializable isolation for many years now.

That would be the "I" in ACID.

> I'm not sure what "with strong consistency benefits" means.

Probably the "C" in ACID: Data integrity, such as constraints and foreign keys.

https://www.bmc.com/blogs/acid-atomic-consistent-isolated-du...


> Pongo - Mongo but on Postgres and with strong consistency benefits.

I don't read this as saying it's "MongoDB but with...". I read it as saying that it's Postgres.


Have you tried it in production? It's absolute mayhem.

Deadlocks were common; it uses a system of retries if the transaction fails; we had to disable transactions completely.

Next step is either writing a writer queue manually or migrating to postgres.

For now we fly without transactions and fix the occasional concurrency issue.


Yes, I have worked on an application that pushed enormous volumes of data through MongoDB's transactions.

Deadlocks are an application issue. If you built your application the same way with Postgres you would have the same problem. Automatic retries of failed transactions with specific error codes are a driver feature you can tune or turn off if you'd like. The same is true for some Postgres drivers.
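For concreteness, this is roughly what the driver-level retry looks like with PyMongo (a minimal sketch, not the parent's actual setup; assumes a replica set at localhost and a hypothetical transfer between two accounts — with_transaction retries the callback on TransientTransactionError and the commit on UnknownTransactionCommitResult):

    from pymongo import MongoClient

    # Transactions require a replica set (or sharded cluster).
    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    accounts = client.mydb.accounts

    def transfer(session):
        # Hypothetical callback: both updates commit or neither does.
        accounts.update_one({"_id": "a"}, {"$inc": {"balance": -100}},
                            session=session)
        accounts.update_one({"_id": "b"}, {"$inc": {"balance": 100}},
                            session=session)

    with client.start_session() as session:
        # Retries transient errors and unknown commit results for you;
        # drop down to session.start_transaction() for manual control.
        session.with_transaction(transfer)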

If you're seeing frequent deadlocks, your transactions are too large. If you model your data differently, deadlocks can be eliminated completely (and this advice applies regardless of the database you're using). I would recommend you engage a third party to review your data access patterns before you migrate and experience the same issues with Postgres.


> Deadlocks are an application issue.

Not necessarily, and not in the very common single-writer-many-reader case. In that case, PostgreSQL's MVCC allows all readers to see consistent snapshots of the data without blocking each other or the writer. TTBOMK, any other mechanism providing this guarantee requires locking (making deadlocks possible).
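To make that concrete, a small sketch against a local Postgres (assumes psycopg2 and an existing accounts table; illustrative only): the reader's REPEATABLE READ snapshot is unaffected by a concurrent committed write, and neither side blocks the other.

    import psycopg2

    # Two independent connections: one reader, one writer.
    reader = psycopg2.connect("dbname=test")
    writer = psycopg2.connect("dbname=test")
    reader.set_session(isolation_level="REPEATABLE READ")

    rc = reader.cursor()
    rc.execute("SELECT sum(balance) FROM accounts")  # snapshot taken here
    before = rc.fetchone()[0]

    # The writer commits while the reader's transaction is still open;
    # under MVCC neither side waits on the other.
    wc = writer.cursor()
    wc.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 1")
    writer.commit()

    # Same transaction, same snapshot: the reader still sees the old sum.
    rc.execute("SELECT sum(balance) FROM accounts")
    assert rc.fetchone()[0] == before
    reader.rollback()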

So: Does Mongo now also implement MVCC? (Last time I checked, it didn't.) If not, how does it guarantee that reads see consistent snapshots without blocking a writer?


Locking doesn't result in deadlocks, assuming that it's implemented properly.

If you know the set of locks ahead of time, just sort them by address and take them in that order, which will always succeed with no deadlocks.
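A minimal sketch of the ordering trick in Python (using id() as the address-like sort key; any total order works):

    import threading

    def acquire_all(locks):
        # Every caller acquires in the same global order, so no cycle of
        # waiters can ever form.
        for lock in sorted(locks, key=id):
            lock.acquire()

    def release_all(locks):
        for lock in locks:
            lock.release()

    a, b = threading.Lock(), threading.Lock()
    acquire_all([a, b])  # another thread calling acquire_all([b, a])
    release_all([a, b])  # takes them in the same order: no deadlock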

If the set of locks isn't known, then assign each transaction an increasing ID.

When a transaction tries to take a lock that is already held: if the holder has a higher ID, signal it to abort (and to retry after waiting for this transaction to terminate), then sleep until it releases the lock.

Otherwise, if the holder has a lower ID, abort this transaction, wait for the conflicting transaction to finish, and then retry.

This guarantees that all transactions will terminate as long as each would terminate in isolation and that a transaction will retry at most once for each preceding running transaction.
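Just the decision rule from that scheme, as a sketch (the surrounding lock manager, signaling, and retry loop are elided):

    def on_conflict(requester_id, holder_id):
        # IDs increase over time, so a smaller ID means an older transaction.
        if requester_id < holder_id:
            # Older requester: tell the younger holder to abort (it retries
            # after we finish), then sleep until the lock is released.
            return "signal holder to abort; wait for lock"
        else:
            # Younger requester: abort ourselves, wait for the older
            # transaction to finish, then retry from the start.
            return "abort self; retry after holder finishes"

The only time a transaction sleeps behind a younger one is after that younger transaction has already been told to abort, so the wait is bounded and no cycle of waiters can persist.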

It's also possible to detect deadlocks by keeping track of which transaction each transaction is waiting for, then signaling either the highest transaction ID in the cycle, or the one the lowest ID is waiting for, to abort; the aborted transaction waits for the transaction it was waiting on to terminate, then retries.


Yes, I'm aware that deadlock can be avoided if the graph having an edge uv whenever a task tries to acquire lock v while already holding lock u is acyclic, and that this property can either be guaranteed by choosing a total order on locks and then only ever acquiring them in that order, or dynamically maintained by detecting tasks that potentially violate this order and terminating them, plus retries.

However, those techniques apply only to application code where you have full control over how locks are acquired. This is generally not the case when feeding declarative SQL queries to a DBMS, part of whose job is to decide on a good execution plan. And even in application code, assuming a knowledgeable programmer, they need to either know about all locks in the world or run complex and expensive bookkeeping to detect and break deadlocks.

The fundamental problem is that locks don't compose the way other natural CS abstractions (like, say, functions) do: https://stackoverflow.com/a/2887324


MongoDB (via WiredTiger) has used MVCC to solve this problem since transactions were introduced.


> Next step is either writing a writer queue manually

You can just use a connection pool and limit writer threads.

You should be using one to manage your database connections regardless of which database you are using.
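E.g. with PyMongo (a sketch, not a recommendation for your exact setup; the client already maintains a connection pool, and a semaphore caps concurrent writers application-side):

    import threading
    from pymongo import MongoClient

    # maxPoolSize caps the driver's built-in connection pool.
    client = MongoClient("mongodb://localhost:27017", maxPoolSize=50)
    events = client.mydb.events

    # Cap concurrent writers without a hand-rolled queue; readers unaffected.
    write_slots = threading.Semaphore(4)

    def write(doc):
        with write_slots:
            events.insert_one(doc)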



