Redlock discussion


  1. On Monday I published a blog post called “How to do distributed locking”. It was a kind of two-in-one post: firstly, I wanted to explain why fencing is necessary when a shared resource is protected by a lock, and how to implement it. Secondly, I wanted to discuss a case study: the Redlock algorithm, proposed by Salvatore Sanfilippo (antirez) as a way of implementing distributed locks on top of Redis. Redlock is interesting because it makes fairly strong timing assumptions, and so it's worth exploring to what degree those assumptions are reasonable.
  2. Several people found the blog post useful and/or interesting, for example:
  3. There was also an extensive discussion of the article on Hacker News (a website that I normally don't pay much attention to, but I'll mention for the sake of completeness):
  4. Statements made confidently on Hacker News should be taken with a grain of salt, as exemplified by a commenter who cited Herlihy's consensus numbers, but got a key fact wrong:
  5. One of my assertions about distributed locking was that the safety properties of an algorithm should not assume bounded clock skew. Julia Evans wrote a follow-up post exploring how realistic a bounded-clock-skew assumption is, and came to the conclusion that it is not a safe assumption (also, Julia's enthusiastic and inquisitive writing style is wonderful):
  6. Flavio Junqueira, who knows way more about distributed consensus than me (after all, he's one of the key people behind ZooKeeper), wrote a follow-up post giving some more detail on how distributed locks and fencing are used in the context of Kafka and BookKeeper:
  7. However, most keenly awaited was Salvatore's response. Although my post was only partly a critique of Redlock, people wanted to know what the author of Redlock would have to say. I had sent a draft of my article to Salvatore a week in advance of publication, and we had a good discussion by email. He posted his public response on Tuesday:
  8. I think Salvatore's post is helpful, as it explains the assumptions and thought process behind Redlock, so users can decide for themselves whether they think the assumptions are reasonable. It was followed by a heated HN debate:
  9. I mostly stayed out of the discussion, because I think my original article already says everything I want to say on the topic. There is certainly room for debate about which timing assumptions are reasonable, but in the end that comes down to how your systems are managed, which is up to you. I can only point out the trade-offs, potential pitfalls, and existing research in this area.
  10. Salvatore correctly pointed out that I had missed one particular clock check in my analysis of Redlock:
  11. However, I don't think that check substantively changes the properties of the algorithm. It removes a dependency on network delay in one place, but the dependency on network delay remains in other places:
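To make the clock check concrete, here is a minimal sketch of how I understand it, not Salvatore's implementation. After trying to acquire the lock on every node, the client measures how long acquisition took and considers the lock held only if a majority of nodes granted it and some validity time remains. The function names, the millisecond timestamps passed in as arguments, and the drift allowance (1% of the TTL plus 2 ms) are assumptions chosen for illustration.

```python
def remaining_validity_ms(ttl_ms, start_ms, end_ms, drift_factor=0.01):
    """Validity time left once acquisition has finished.

    start_ms and end_ms are readings of the client's local clock taken
    before the first node was contacted and after the last reply arrived.
    The drift allowance (drift_factor * ttl_ms + 2 ms) is an assumed value.
    """
    elapsed_ms = end_ms - start_ms
    drift_ms = ttl_ms * drift_factor + 2
    return ttl_ms - elapsed_ms - drift_ms


def lock_acquired(votes, n_nodes, ttl_ms, start_ms, end_ms):
    """True if a majority granted the lock and validity time remains."""
    quorum = n_nodes // 2 + 1
    return votes >= quorum and remaining_validity_ms(ttl_ms, start_ms, end_ms) > 0


# Fast acquisition: 3 of 5 nodes answered within 500 ms of a 10 s TTL.
print(lock_acquired(3, 5, 10_000, 0, 500))    # True
# Slow acquisition: the replies took 9950 ms, so no validity time is left.
print(lock_acquired(3, 5, 10_000, 0, 9_950))  # False
```

The check only bounds the delay that occurs while the lock is being acquired. It cannot say anything about a delay that happens after it has returned, for instance between the check and the client's write to the shared resource.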