# Distributed Systems

Nov 06, 2020

My plan with this post is to give the initial pointers for everyone's individual further research if they are so inclined. I'm clarifying that I'm learning all of this in a more formal way myself and am inviting other previously inexperienced (younger?) programmers and thinkers in this area to consider investing time into this fascinating and very broad subject.

We cannot possibly cover everything in one post, not even in one book. This area is an inexhaustible source of interesting problems and very effective but partial solutions to specific subsets of problems. The beauty of it is that all of it can very be well described with mathematics and computation primitives.

The most important notion to take away from this introduction is that there are always tradeoffs in Distributed Systems. We cannot have all the nice properties at the same time, something always has to give. The reason for this is the underlying "imperfection" (or better: reality) of the physical medium on which we are computing. See fallacies of distributed computing.

Distributed Systems are intimately connected with state, please read the previous (recently updated and expanded) post about State Machines as well.

### What are distributed systems?

Let's now begin with a definition from a famous research paper written in 1978:

A distributed system consists of a collection of distinct processes which are spatially separated and which communicate with one another by exchanging messages.

Here is one computer:

And here are two spatially separated computers:

They communicate with one another by exchanging messages:

If time behaved as we usually think it does, everything would be very simple and we always knew the Order of Things.

Second part of definition from the famous paper by Lamport:

A system is distributed if the message transmission delay is not negligible compared to the time between events in a single process.

Also in the paper:

A single computer can also be viewed as a distributed system in which the central control unit, the memory units, and the input-output channels are separate processes.

When considering a network of computers (or even a pair of computers) our logical unit (process) is one computer. Time needed for inter-process messages is much higher than time for internal messaging: $$t \ggg t_{int}$$.

As we saw with our state machines post where we draw the abstraction line matters. We have to decide at which level of detail are we actually looking.

Distributed systems as a network of spatially separated computers with non-negligible messaging delays are thus not concerned with implementation details of each computer separately. Each computer is a single process from this perspective. Abstraction is a very powerful mental model when working with both mathematics and computation. We only bring important things in focus at each level and we ignore all the internal details of implementation.

This concludes the introduction to the paper but does not explain any of its results or findings or explains its importance. Please watch an interview with the author of the paper linked at the bottom of this document. Interview takes some time but is very well worth it. It flows nicely and builds the tempo masterfully.

### Background story of the paper

With each historically important and broadly cited scientific paper it is great to know a little bit (a lot) of context of how it came about and other such "trivia". Fortunately we are able to obtain this incredibly interesting contextual information about this particular paper 🤩. Here it is:

💡 Consider reading the really succinct background story written by Leslie Lamport, the author of the paper.

These are the two important pieces of information:

Jim Gray once told me that he had heard two different opinions of this paper: that it’s trivial and that it’s brilliant. I can’t argue with the former, and I am disinclined to argue with the latter.

Read this excerpt to see what was the brilliant part and what was actually trivial:

The origin of this paper was the note The Maintenance of Duplicate Databases by Paul Johnson and Bob Thomas. I believe their note introduced the idea of using message timestamps in a distributed algorithm. I happen to have a solid, visceral understanding of special relativity. This enabled me to grasp immediately the essence of what they were trying to do.
Special relativity teaches us that there is no invariant total ordering of events in space-time; different observers can disagree about which of two events happened first. There is only a partial order in which an event e1 precedes an event e2 iff e1 can causally affect e2. I realized that the essence of Johnson and Thomas’s algorithm was the use of timestamps to provide a total ordering of events that was consistent with the causal order. This realization may have been brilliant. Having realized it, everything else was trivial.
Because Thomas and Johnson didn’t understand exactly what they were doing 🥺, they didn’t get the algorithm quite right; their algorithm permitted anomalous behavior that essentially violated causality. I quickly wrote a short note pointing this out and correcting the algorithm.

↑ 1st important piece of information is the importance of the notion of relative, subjective time for each process and implications for possible ordering of events that is even theoretically achievable.

Fundamental theory of distributed systems should be based on fundamental theory of information propagation through space-time and that is special relativity published in 1905 by Albert Einstein. Leslie Lamport was the first person to really understand the importance of this fundamental connection between mathematics, physics and (distributed) computing.

Read more about Special Relativity, 4D Space-Time Mathematical Construct and Partial Ordering of Events here:

And especially from this great book from 2018:

### Why is ordering of events so important in a distributed system?

The answer is: most distributed systems have aspects of State Machines and it is very important to apply state transitions in the 'correct order' or else we end up with inconsistent state.

Further comment from Leslie Lamport from the background story of his paper:

This is my most often cited paper. Many computer scientists claim to have read it. But I have rarely encountered anyone who was aware that the paper said anything about state machines 🙄. People seem to think that it is about either the causality relation on events in a distributed system, or the distributed mutual exclusion problem. People have insisted that there is nothing about state machines in the paper. I’ve even had to go back and reread it to convince myself that I really did remember what I had written 😒.

↑ 2nd important notion is that if we want to maintain a consistent distributed state, we have to take great care when thinking about the ordering of events.

We haven't talked about the byzantine type of distributed systems failures, this is not the topic for this introductory post, these developments came some time later in the research history of distributed systems. They bring the level of difficulty and attention needed when designing such systems to yet another level. With byzantine fault tolerant systems we have to account for not just physical reality of computation and proper ordering of events but also for lost messages, intentionaly faulty processes emitting wrong information (attacks on distributed system) etc. All of this builds firmly on Leslie Lamport's findings. All of it, be it modern blockchain systems or big data center synchronization protocols used by a few of our well known technological giants.

Leslie Lamport modestly says that people from Amazon told him that although they don't understand his Paxos algorithm (1989) completely, the practical and empirically tested difference between this and other algorithms from similar category is that Paxos always works and it never fails in any of the weird edge cases.

## Lost messages

Asynchronous nature of messaging between different processes of a Distributed System means that there is no common clock between processes and also no reliable upper bound of how much time messages are expected to take (we never know how long a message roundtrip should be, it's an arbitrary number). If we add in unreliable communication medium in the mix, we have a perfect storm ⛈️ How do we know if the message actually arrived? The answer is that usually we have to receive a confirmation (another message in reverse direction). But how long do we wait for a confirmation until deciding that the message is lost? What if the confirmation message gets lost?

Let's keep this subject open and investigate it further in the post about synchronous vs. asynchronous messaging.

## Distributed data stores

Second part of this post takes us one level higher: it is about distributed data stores. Distributed data stores can be thought of as a upgrade of the definition of Distributed Systems in some sense. What Leslie Lamport managed to define are not distributed data stores per se but rather a mechanism for distributed synchronization. This, however, is needed for any kind of distributed data store. Many more people built on top of Leslie Lamport's firm foundation and they created a whole blossoming ever growing field based on his lifetime of theories about concurrency.

The most mentioned concept at the conceptual level of distributed data stores is the famous  CAP theorem, also named Brewer's theorem.

CAP theorem is a trilemma (we have 3 possible options):

⚠️ Common misconception: our options are not Consistency (C), Availability (A) and Partition Tolerance (P) but any combination of two of these three concepts:

We can never have all three great things together.

More formally we calculate the number of our options with this equation:

$$\binom{n}{k}$$ where $$n=3$$ and $$k=2$$:

This means all possible combinations of k items out of a set of n items. This is how we calculate the actual result denominated with this special bracket symbol:

$\binom{n}{k} =\frac{n!}{(n-k)!k!}$

$$\binom{3}{2}=3$$, we interpret this as: there are 3 possible combinations when choosing 2 options out of 3 options that are available...

All this very obvious and kind of useless to write binomial equations but I had to throw this in because binomials are otherwise great and besides a small LaTeX excercise seemed useful. Sorry! This has nothing to do with distributed systems though, sorry again, back to the gist of distributed data stores now! 😸

As we saw, we can only pick two from the menu of three. This gives us these possibilities:

It should be mentioned that the exact description of what these CA, CP and AP options actually mean in practice can differ greatly. It depends on who exactly is explaining, what practical experience they have etc. There are indeed many compatible possible viewpoints.

Also be aware that the CAP theorem is not without its critiques.

### CAP theorem resources

I hope that you sensed (I certainly did, today, finally 🙃) that the CAP theorem is not in the same category of scientific results as some other formulations and theories, for example Lamport's. It is much muddier but still a very useful framework for thinking about distributed systems / data stores.

Here are some interesting slides suggesting that Partitions are almost always a part of distributed system and it's then a question of which of the remaining two properties Availability or Consistency we decide to give up. We further learn that there are at least three types of consistency:

• Weak consistency
• Strong consistency
• Eventual consistency

Each one with subcategories of their own! Each one not formally defined but still somewhat agreed on between modern researchers.

Partial conclusion: it is complicated but still understandable, however this is not strict math with formal proofs, it is more like a thinking framework which is evolving over time as well. CAP theorem might mean something a bit different a few years down the road. It will probably not change in its basic form but we will come to understand these tradeoffs even better as we test more systems in practice with ever growing scale.

Here is another interesting article called CAP Twelve Years Later: How the “Rules” Have Changed by the author of the original theorem himself.

To get even more sense of the nuance see this excerpt from here:

One last resource that hints about derived concepts of safety and liveness of distributed systems is here. This is just for my personal reference when I come back for more information / research but feel free to 🐠 explore as well, of course :)

## Conclusion

If we are not aware of the basics of what is possible and what can we expect, then the whole subject of computing is rather frustrating instead of enjoyable. We get used to be told that restarting devices or programs frequently is normal and that this is just how things are. The truth is that we can do better but the difference between 95% correct systems and 100% formally correct ones is more than just 5% in perspiration and time invested and usually it is not economical for companies to do a proper job when sometimes even 80% of the job is good enough for an average user. Note that 100% formally correct distributed system does not mean 100% reliability instead it means:

best theoretical raliability possible under given conditions and according to the agreed upon tradeoffs in the specification step

In general we can be certain that some "mysteries" of Distributed Systems are going to be a common knowledge in a decade or less. These kinds of systems are becoming prevalent around and among us.

Formal verification ✓

Formal verification of distributed systems is the future. You can learn more about it here, here and here. If you have additional 5.5h to spare for a more lightweight interview with Leslie Lamport, by all means listen to him speak about his life of research: part 1 | part 2. Somewhere in the interview he describes why he believes there are no errors in any of his many papers.

✓✓✓

UPDATE: this is probably the best resource for hard-core dive into different Byzantine-Fault Tolerant distributed protocols. Check out the entire tarilabs site for more great information.

🐇 Deeper and deeper down we went... Time to resurface!

Please enjoy the upcoming week as much as currently possible.

#### davidhq

Great! You've successfully subscribed.
Great! Next, complete checkout for full access.
Welcome back! You've successfully signed in.