
S3 is strongly consistent now: https://aws.amazon.com/s3/consistency/


What a ridiculous marketing term. This is a RYW (Read your writes) level of consistency which is a far cry from Strong consistency (see https://jepsen.io/consistency). Seems like eventual consistency with some affinity bolted on.


That page doesn't give a hard definition of strong consistency; it says it uses the terms informally, as relative descriptions. AWS is not claiming serializability; they call it "strong read-after-write consistency." I don't see the problem here? For a long time, S3 famously wasn't guaranteed to return data you had just written, and now it is. That's significant.

Here's more about the specifics: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcom...

In particular:

> Any read (GET or LIST request) that is initiated following the receipt of a successful PUT response will return the data written by the PUT request.

So this is stronger than RYW (emphasis mine).
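The quoted guarantee can be sketched with a toy in-memory model (purely illustrative; `StrongStore` is a hypothetical stand-in, not S3's actual implementation): a PUT does not return success until the write is visible to every subsequent GET, from any client.

```python
import threading

class StrongStore:
    """Toy single-node model of the read-after-write guarantee.
    Real S3 is distributed; this only illustrates the contract."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        # The PUT "succeeds" only once the write is visible to readers.
        with self._lock:
            self._data[key] = value

    def get(self, key):
        # Any GET initiated after a successful PUT sees that PUT's data,
        # regardless of which client issued the PUT.
        with self._lock:
            return self._data.get(key)

store = StrongStore()
store.put("color", "red")            # client 1's PUT completes...
assert store.get("color") == "red"   # ...so any later GET sees it
store.put("color", "ruby")           # a different client's PUT completes
assert store.get("color") == "ruby"  # everyone's later GET sees "ruby"
```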


I have always understood strong consistency to refer to linearizability or sequential consistency - i.e. all clients have the same view of the global order but with sequential consistency permitting slightly more flexibility in how operations of different clients can be reordered wrt each other.
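One way to see the difference: a stale read that begins after a write has completed is permitted under sequential consistency but not under linearizability. A minimal sketch (times are illustrative, not from any real system):

```python
# write(x=1) completes at t=2; a read started at t=3 returns the old 0.
write_end, write_val = 2, 1
read_start, read_val = 3, 0

# Linearizability respects real time: a read that starts after a write
# has finished must observe that write, so this history is NOT
# linearizable.
history_is_linearizable = (read_start <= write_end) or (read_val == write_val)
assert not history_is_linearizable

# Sequential consistency only preserves each client's program order, so
# the read may legally be ordered before the write: the same history IS
# sequentially consistent.
```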


I asked below but which property of linearizability is missing here? Is it the way it handles concurrent writes?


Hey, having actually looked at the link you provided: both examples they give are in fact valid linearizations, so they could plausibly be providing linearizability (with respect to a single key). It's hard to say whether there are corner cases in which different clients could observe different orderings, but if not then I stand corrected!


There was another thread where somebody claimed it was causally consistent. I’m sort of surprised Amazon hasn’t been clearer about this, but my feeling is that they would say it was linearizable if they were sure it was linearizable. Would love to read a real deep dive on this, I checked to see if Kyle Kingsbury had looked into it yet but he hasn’t.


> > Any read (GET or LIST request) that is initiated following the receipt of a successful PUT response will return the data written by the PUT request.

> So this is stronger than RYW.

I'm not sure that it is? The examples listed below that description only specify making an update and then immediately reading it back from the same process.


Look at the graphics in the section "Concurrent applications," specifically the first one.

At T0 Client 1 writes 'color = red.' Write 1 completes at T1.

At T2 Client 2 writes 'color = ruby.' Write 2 completes at T3.

At T4 Client 1 reads 'color = ruby,' the result of Write 2 from Client 2.

The explanation above says "Because S3 is strongly consistent, R1 and R2 both return color = ruby." There are clearly some subtleties (as explained further down the page) but I don't think Amazon are really being deceptive here.
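The diagram's claim can be restated as a small check. Times are taken from the diagram's single global clock, which is itself a simplification:

```python
# Timeline from the "Concurrent applications" diagram:
# (value, start, finish) for each write, on one global clock.
writes = [("red", 0, 1), ("ruby", 2, 3)]
read_start = 4  # Client 1's read begins at T4

# Strong read-after-write: a read must return the value of the latest
# write that finished before the read began.
visible = max((w for w in writes if w[2] <= read_start), key=lambda w: w[2])
assert visible[0] == "ruby"
```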


Maybe it's just my suspicious-of-everything-distributed brain, but that diagram seems to assume a single universal time scale without any discussion of the implications.


You successfully nerdsniped me and I'm having a lot of trouble finding discussion of the formal implications of what they call "strong consistency" here, other than reading that they did in fact formally verify it to some extent. The best that I could find is this other HN thread where people claim it is causally consistent in a discussion about a Deep Dive (frustratingly shallow, as it happens): https://news.ycombinator.com/item?id=26968627


I have never heard strong consistency used to describe such a weak guarantee before - i.e. it's marketing bs. Usually strong consistency refers to linearizability (or at the least sequential consistency). The diagram a few pages into this paper gives a nice overview: https://arxiv.org/abs/1512.00168


Yes, I actually read that paper while digging around, but it didn't seem to help in this case because Amazon don't specify whether reads made after a concurrent write are guaranteed to return the same value as each other. If they are, I think the system would be linearizable, yes? Either way, they don't say "linearizable" anywhere and they describe it specifically as "read-after-write," so I think it would be wrong to assume linearizability. What's missing from this model for linearizability? S3 doesn't have transactions, after all.


Isn't this definition CAP consistency?


CAP is defined with respect to linearizability, yes.


I'm not convinced that issuing requests from multiple clients for the same key actually matters. My speculation is that they map a key to their backend via some type of (consistent/rendezvous) hash and then ensure that all requests for a given key land on the same process/server* that contains the state for that key.

This means that for a specific key, you end up on 1 specific process. If you can ensure this, you basically get monotonic reads/writes along with RYW and Writes-Follow-Reads. All this maps to causal consistency so it is believable.

* The request could probably be sent to a read-only replica first but it could then forward it to the leader replica handling writes by examining some logical timestamp.
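The routing speculated about above can be sketched with rendezvous (highest-random-weight) hashing. This is a guess at the scheme, not S3's known design, and the server names are made up:

```python
import hashlib

SERVERS = ["node-a", "node-b", "node-c"]  # hypothetical backend processes

def owner(key: str) -> str:
    """Rendezvous hashing: every client independently computes the same
    owner for a key, so all requests for that key land on one process,
    which can then serialize reads and writes for it."""
    def score(server: str) -> str:
        return hashlib.sha256(f"{server}:{key}".encode()).hexdigest()
    return max(SERVERS, key=score)

# Every caller agrees on the owner, giving per-key serialization.
assert owner("color") == owner("color")
assert owner("color") in SERVERS
```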



