VoidPapers: Time, Clocks, and the Ordering of Events in a Distributed System
Reading
-Time, Clocks, and the Ordering of Events in a Distributed System
-Leslie Lamport
-Massachusetts Computer Associates, Inc.
-https://www.microsoft.com/en-us/research/uploads/prod/2016/12/Time-Clocks-and-the-Ordering-of-Events-in-a-Distributed-System.pdf
Recently, I read the classic "Time, Clocks, and the Ordering of Events in a Distributed System" paper by Leslie Lamport. I had lots of fun learning about time and order in distributed systems. The preface to the paper is interesting, you should definitely check it out.
The concept of time is fundamental to how we view the world and our existence. Similarly, [computer] systems need to “think” about how to deal with time. When thinking about time, one has to ask questions about the order in which an event occurred in relation to time — before a certain time or after the said time. There are various concepts discussed in this paper, but for the sake of this post, we shall discuss only partial ordering and logical clocks. In this post, I gloss over the mathematical aspects of the concepts — I highly recommend reading the original paper for a more in-depth view of the topics discussed in this post.
Early on in the paper, the author establishes that order is a building block of the concept of time and highlights the nuances of thinking about order in distributed systems we shall begin by talking about the "happens before" relationship and how the relationship is different in distributed systems than physical theories of time.
Partial Ordering
When thinking about time in distributed systems, it becomes tricky to say which of multiple events occurred first. The “happened before” relation in distributed systems is often a partial ordering which is dependent on other events in the system.
Following the physical theories of time that governs our universe we can define event a as happening before event b if it occurred at an earlier time, however within distributed systems the happened before relation with respect to physical time does not apply — well unless the distributed system is based on physical clocks and even then, there is still the problem that such clocks are not accurate. For most of the rest of the paper, the author goes on to discuss the happened before relation independent of physical clocks[&time].
The author explains that for a system to meet a specification correctly, the specification must be given in terms of events observable within the system. To dive deeper, we need to have a better picture of the system being discussed. The author describes the system as being composed of a collection of processes with each process consisting of a sequence of events — an event can be the execution of a subprogram or the execution of a single machine instruction. Another assumption being made is that the events of a process make up a sequence, where the a event occurs before the b event in this sequence if a happens before the b event. Assuming that sending a message is an event in a process, we can say a happens before b if a sends a message before b does.
The happened before relation is denoted as “—>” and is defined on a set of events of a system as the smallest relation satisfying the following three conditions:
If a and b are events in the same process, and a comes before b, then a —> b.
If a is sending a message by one process and b is the recipient of the same message by another process, then a —> b.
If a —> b and b —> c then a —> c
If you have three events a, b & c, then they are totally ordered if there is a clear requirement that they always have to happen in the order a —> b —> c. However, if there is a requirement that a must happen before c irrespective of when the b event happens, then the events a, b & c are partially ordered.
Logical Clocks
While you might be tempted to think about [logical] clocks in relation to time, It is important you don't. Logical clocks provide a way of assigning a number to an event in a process. This number assigned by the clock can be thought of as the time the event occurred. Each process in a distributed system is expected to hold its own clock that is incremented periodically by a positive number. The logical clocks are essentially functions that assign numbers to events in a process. While logical clocks can relate to physical time, It is also possible that they can be implemented by counters with no actual timing logic.
The author explores what it means for a clock to be accurate. The definition of accuracy cannot be based on physical time but instead on the order of events in a system. We should look at the happened before the relationship of events in a process to see if the clock is accurate. The clock condition defines the correctness of time — If an event a occurs before another event b then a should happen at an earlier time than b.
Building on our earlier assumption, Every event being sent from one process to another must contain a timestamp[Time at which it was sent]. On receiving a timestamped message, a process must increase its clock to a time later than the timestamp message. The image below further illustrates this idea.
Summary
So far we've been able to cover some concepts Lamport discussed in this paper. I hope you had fun reading this.
Here are some further resources I'll recommend you checkout to get more understanding of logical clocks and partial ordering.
Martin Kleppman in his awesome distributed systems lecture series discusses logical clocks.
https://levelup.gitconnected.com/distributed-systems-physical-logical-and-vector-clocks-7ca989f5f780
I hope you enjoyed reading this VoidPaper issue! Remember to subscribe to get the latest posts in your inbox as soon as they go live.