Hacker News | uds5501's comments

I noticed the de-optimization bits when the compiler handles corner cases. How often did Vitess notice that the VM had to switch back to the old AST implementation?


Very, very seldom! Obviously we need to be correct in 100% of cases, which is why the de-optimizer is there, but in practice it essentially never triggers. Right now the things that trigger a deopt are very limited: malformed time literals, overflowing integer negations, and very little else. The performance impact is essentially zero.


[not-author] but I found it really fascinating. My open questions:

- Is the version monotonically increasing?
- How would one handle wraparounds once the versions no longer fit the datatype?
- Who decides the version to be assigned to a thread's attempt?


The version is just a monotonically increasing atomic integer assigned to that btree node. Each writer increments the version when it releases the lock IF it has modified the node.

Wraparounds are only a theoretical issue. The counter would have to wrap all the way around (2^64 increments) between a reader first reading the version and re-checking it.
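The scheme described above can be sketched as a seqlock-style optimistic read. This is a minimal illustration, not Vitess's or any particular B-tree's actual code: the `node` type, its fields, and the odd/even version convention are assumptions made for the example. A real implementation would also need atomic or fenced field reads, since the plain read of `key` here is formally a data race.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is a hypothetical B-tree node guarded by a version counter.
// Writers take the mutex and bump the version twice: once on entry
// (making it odd, signalling "write in progress") and once on exit
// (even again, publishing the change). Readers never lock: they read
// the version, read the data, and re-check the version to detect
// interference, retrying if it moved.
type node struct {
	mu      sync.Mutex
	version atomic.Uint64 // odd while a writer is active
	key     int
}

func (n *node) write(k int) {
	n.mu.Lock()
	n.version.Add(1) // now odd: concurrent readers will retry
	n.key = k
	n.version.Add(1) // even again: change is published
	n.mu.Unlock()
}

func (n *node) read() int {
	for {
		v1 := n.version.Load()
		if v1%2 != 0 {
			continue // writer in progress, retry
		}
		k := n.key // NOTE: a real implementation needs an atomic/fenced read here
		if n.version.Load() == v1 {
			return k // version unchanged: the read was consistent
		}
	}
}

func main() {
	var n node
	n.write(42)
	fmt.Println(n.read())
}
```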


"Optimistic locking" is a bad choice of words because no locking is optimistic.

In the well-known access method described here, the writers access the shared data with mutual exclusion, i.e. they use locking.

The readers use no locking, but they access the shared data concurrently and optimistically, hoping that the access will succeed at the first attempt.

When the readers are unlucky, they must retry the access.

So there is locking used by writers for mutual exclusion and there is optimistic access used by the readers.

There is no "optimistic locking", which is a contradiction in terms (locking is pessimistic).

In general, there are only 3 methods for accessing shared data:

- mutual exclusion (a.k.a. pessimistic access), where locking forces the accesses to be sequential;

- optimistic access (a.k.a. lock-free), where accesses may be concurrent and retries may be necessary;

- dynamic partitioning of the shared data (typically used for shared arrays or for shared buffers/queues), where neither locking nor retries are needed.

The method described here for accessing B-trees employs a combination of all 3 methods, because releasing the locks at higher levels amounts to restricting future accesses to only a part of the shared data: the writers start by accessing the root sequentially, but then they partition the tree between themselves, so subsequent accesses that fall in distinct subtrees may proceed concurrently.


“Optimistic locking” is a well-established and widespread terminology though, not the least due to its catchiness. The more factual, but unwieldy term is “optimistic concurrency control”.


I agree that it is widespread, but whenever you see such illogical terms you have to wonder whether the authors who use them do not understand what they are really doing, or whether they do understand but succumb to a widespread inappropriate usage in the hope of being better understood by naive readers.

Understanding the difference between pessimistic access (mutual exclusion implemented by locking) and optimistic access (concurrent accesses with retries when necessary) is absolutely critical for the correct and efficient implementation of algorithms that use shared data structures.

It is common to combine both methods in one algorithm, as here, but in English that is not "optimistic locking"; at most it is "locking and optimistic access" or "optimism and locking".

Pessimistic access means that you expect that another process will attempt to access the shared data concurrently, so you must use a lock to prevent this. Optimistic access means that you expect that no other process will attempt a concurrent access, so you may proceed to access the data immediately, but then you must have some means to detect that your assumption was wrong and another process has interfered, in which case the transaction must be retried.
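The contrast above can be made concrete with a shared counter. This is a generic sketch, not code from the article: the pessimistic variant excludes other writers up front with a mutex, while the optimistic variant reads, computes, and then uses compare-and-swap to detect interference after the fact, retrying when its assumption was wrong.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Pessimistic: assume contention, exclude everyone else before touching
// the data.
type pessimisticCounter struct {
	mu sync.Mutex
	n  int64
}

func (c *pessimisticCounter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Optimistic: assume no contention, detect interference afterwards with
// compare-and-swap, and retry when the assumption fails.
type optimisticCounter struct {
	n atomic.Int64
}

func (c *optimisticCounter) inc() {
	for {
		old := c.n.Load() // read, assuming nobody interferes
		if c.n.CompareAndSwap(old, old+1) {
			return // assumption held, update succeeded
		}
		// another goroutine changed n in between: retry
	}
}

func main() {
	var p pessimisticCounter
	var o optimisticCounter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); p.inc() }()
		go func() { defer wg.Done(); o.inc() }()
	}
	wg.Wait()
	fmt.Println(p.n, o.n.Load()) // both counters reach 100
}
```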

Depending on the application, either pessimistic or optimistic access results in better performance; neither is always better than the other. Optimistic (lock-free) access improves the best case but makes the worst case much worse. Depending on the frequency distribution of those cases, optimistic access increases or decreases performance.

Pessimistic access and optimistic access have the advantage of being applicable to any kind of shared data structure. But dynamic data partitioning, where applicable, as for this shared B-tree example, which can be partitioned into subtrees accessed concurrently, normally results in better performance than both, by being deterministic and avoiding both locking and retries.

Dynamic data partitioning may require locking for a very short time in order to partition the shared resource before accessing the allocated part, though the mutual exclusion provided by atomic instructions may be sufficient for this purpose. It is frequently the case that a single atomic fetch-and-add instruction is enough to partition a shared data structure, such as a shared array or a shared message queue, between the concurrent processes attempting to access it.
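The fetch-and-add partitioning idea can be sketched as follows. The function name, chunk size, and workload are made up for illustration: each worker claims a disjoint range of indices with one atomic fetch-and-add and then writes its range with no lock and no retry, since no two workers ever touch the same index.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// squareAll fills data[i] = i*i using several workers. A single atomic
// fetch-and-add on `next` hands each worker a disjoint chunk of the
// shared array; within its chunk a worker needs no further
// coordination, because the partitioning guarantees exclusive access.
func squareAll(data []int, workers, chunk int) {
	var next atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// fetch-and-add claims the next chunk's start index
				start := int(next.Add(int64(chunk))) - chunk
				if start >= len(data) {
					return // array exhausted
				}
				end := min(start+chunk, len(data))
				for i := start; i < end; i++ {
					data[i] = i * i // exclusive access: no lock, no retry
				}
			}
		}()
	}
	wg.Wait()
}

func main() {
	data := make([]int, 1000)
	squareAll(data, 4, 64)
	fmt.Println(data[10], data[999]) // 100 998001
}
```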


I chuckle every time anyone tries to handle long (64-bit) overflows while incrementing by one.

Other than that, version checks + retries have been a thing since forever (they are the most bog-standard way to do any lock-free data structure, along with database updates done in the same manner). They do need a backoff, though.


When incrementing the version at 4 GHz, it takes over 100 years of non-stop incrementing for a 64-bit counter to wrap around.
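The arithmetic behind that claim, as a back-of-the-envelope check (the function name is just for illustration): 2^64 increments at 4e9 increments per second is roughly 146 years.

```go
package main

import "fmt"

// yearsToWrap estimates how long a 64-bit counter incremented hz times
// per second takes to wrap around: 2^64 / hz seconds, converted to years.
func yearsToWrap(hz float64) float64 {
	const twoTo64 = float64(1 << 63) * 2 // 2^64 as a float
	return twoTo64 / hz / (365.25 * 24 * 3600)
}

func main() {
	fmt.Printf("%.0f years\n", yearsToWrap(4e9)) // about 146 years
}
```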


Likely even more, as it has to be an atomic operation, which incurs extra latency and coherency traffic.

