26. Clocks Revisited

Part of the 22C:116 Lecture Notes for Fall 2002
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Real-Time Systems

Real-time systems need to be able to measure time! Consider the example of a program to control a stepping motor. Stepping motors are motors that rotate a fixed increment every time an electronic step signal is sent to them, so long as the step signals are not sent in a manner that would require exceeding the motor's speed, acceleration or torque limits. Stepping motors are quite common in computer controlled mechanisms, and the step signals for the motor are usually issued by software. The following example illustrates a fragment of a simple stepping motor control program:

       loop until position = destination
         delay( steptime() );
         step(); position++;
       end loop

The function steptime computes the appropriate time per step as a function of the current motor velocity and whether the motor should be accelerating or decelerating.

Clock Servers

On a centralized system, this bit of code requires a real-time clock with an interface that privides a separate delay timer for each process. In a distributed real-time system, the problem arises of how to deal with time. A natural solution is to install a single real-time clock somewhere on the system. A clock server process could await messages from client processes that wish to be informed of the arrival of some particular time or the passage of some particular interval.

A typical message to the clock server process might have the meaning "send me a reply after one second" or "send me a message at ten AM". The clock server would maintain a data structure listing the outstanding requests, and when one of the requested times arrives, it would send a reply.

There are three problems with this approach. First, if communication delays are significant, the accuracy of the clock service will be degraded. Second, if the system is large, the clock server may become a bottleneck, limiting system performance. Third, if messages are lost, recovery is difficult because most protocols for detecting message loss rely on the availability of a clock!

Multiple Clock Servers

All three of these objections lead to the desire to use multiple clock servers, usually one on each computer in the network; this eliminates the possibility of lost messages to the clock server and it eliminates communication delays. The problem with this is that all of these clocks must be synchronized if the distributed system is to provide useful real-time service!

Absolute clock synchronization is impossible! The best that can be done is to guarantee that two clocks are within some error bound of each other. As a consequence, the relative times of two events on a system cannot be exactly compared. Either one event is clearly before the other, or the two events occurred within the error bound of the clocks used, in which case we cannot say which event occurred first, an in fact, different observers may draw different conclusions about which was first!

Characteristics of Real Clocks

Real clocks never keep exact time. Consider an ideal clock. The clock hands on this clock show the real time, and the ideal clock ticks exactly once per second. Unfortunately, this ideal cannot be built! Real clocks never keep exact time; the builder of the clock hopes it will tell the real time and the builder hopes that it will tick exactly once per second, but all of those who build real clocks know that the best that they can achieve are clocks that show the approximate time and that tick approximately once per second.

Inaccuracies in real clocks can be classified into two large categories, drift and jitter. Drift is the result of systematic errors that cause a clock to run fast or slow; these errors are the result of our inability to construct any two mechanisms that measure exactly the same in any dimension, be it length, mass, volume, speed or frequency. Jitter comes from the fact that we cannot build systems that remain constant in any dimension. Unless we keep the temperature at absolute zero, which we cannot do, atoms constantly diffuse, thermal effects add random physical vibration and electronic noise, and many other effects. As a result of jitter, real clocks do not tick exactly once per second, but only at an average rate that the builder hopes will be once per second. One consequence of jitter is that the ticking of a real clock is described mathematically as a random walk, and this, in turn, guarantees that the clock will eventually drift from its initial settings.

If you could graph the time of day shown on a real clock versus the time of day shown on an ideal clock, where you would hope for a straight diagonal line, you would instead see a slightly wavy line that starts out with errors centered on the diagonal but slowly and systematically departs from the 45 degree diagnonal. The wavyness of this line is a random walk determined by the clock jitter, while the slow departure from the diagonal is the result of clock drift.

As a result of these considerations, the times shown on any collection of independent real clocks will slowly diverge, even if they are all identical and perfectly calibrated. In fact, if you graph the times shown on any two independent physical clocks, the result will be very similar to the graph of one clock versus an ideal clock, except that the jitter and drift you observe will be the sum of the errors of the two clocks.

It should be noted that truly independent clocks are very hard to make! When two clocks are physically near each other, they may influence each other in unexpected and subtle ways. There's a well known story of an 19th century observatory that had three pendulum clocks, and to the surprise of the timekeepers, the pendulums in all three clocks remained perfectly synchronized. It turned out that the clocks were so precisely tuned to the same frequency that once one clock was within a few milliseconds of ticking, the sound of the tick of another clock would jar its escapement free, forcing it to tick. As a result, if the clocks were started randomly, the natural random drift of the clocks would slowly bring them into near synchronization, and once they were synchronized, they remained exactly synchronized. Radio frequency coupling can similarly synchronize modern quartz clocks that are near each other.

Clock Synchronization Using a Central Authority

If we have a central authority that is trusted, for example, the National Institute of Standards atomic clocks in Fort Collins Colorado, we can synchronize all clocks to that authority. The time from the NIST clocks is broadcast continuously on the WWV radio station; For many years, Heathkit sold a WWV receiver with an RS-232 port on the back. This receiver even had a correction you could set to take into account the propagation delay of the time signal as it traveled from Fort Collins to wherever your receiver was located.

The GPS (Global Positioning System) satelites provide another relatively authoritative time base; each of these satelites includes a local clock of very high quality, and the broadcast of their local time-of-day is available anywhere on or near the earth.

Adjusting the time.

Given an authority, for example, GPS or WWV, what do you do if the authority disagrees with your local clock? A simple assignment of a new value to the local time of day can cause chaos! If the time of day jerks forward suddenly, there may be no damage (but in real-time control systems, for example, the stepping motor example, even this may cause loss of control). If the time of day jerks backward, some time of day will occur twice, and time will briefly appear to move backward, with the consequence that, for example, some timestamps will appear to have been made in the future. This can lead to severe problems even in non-realtime applications such as database systems.

The solution to this problem is to avoid direct assignments to the time of day. Instead, if the official time of day is ahead of the local realtime clock, the local clock can be run fast until the clocks agree, and if the official time of day is behind the local clock, the local clock can be run slower until the clocks agree.

The Unix system service adjtime() works this way. A call to adjtime() can be used to request the system clock run fast or slow until it gains or loses the required amount. This is, of course, a privileged system call!

The adjustments to the speed of the clock made by these services do not involve changing the frequency of the oscillator inside the clock (typically a quartz crystal). Instead, the adjustments are done entirely in software.

For clock software based on a periodic clock interrupt, for example, one that gives a nominal 100 interrupts per second, the clock might maintain the time of day by counting milliseconds. Normally, the clock software would add 10 milliseconds to the time of day every time there is a clock interrupt. If the clock is instructed to "run fast" in order to gain time, the clock software might add 11 milliseconds for every clock interrupt, while if the clock has been instructed to "run slow", it might add only 9 milliseconds per clock interrupt.

The problem is a bit more complex when the underlying clock hardware is based on a programmable interval timer, but the solution is equivalent. If the next clock event is to be scheduled 100 milliseconds into the future and the clock has been instructed to make up time, it might set the timer to 90 milliseconds and then, when the timer interrupt arrives, it would update the time of day variable by 100 milliseconds. Similarly, if the clock has been instructed to lose time, it might set the timer to interrupt after 110 milliseconds.

Note, in real systems, that the hardware clock is usually precise to much better than 1 part in 100,000 (1 second per day), and that when clock is found to be in error, it is typically reset to its correct value by adjustments to the clock speed on the order of 1 part in 100 or slower. The 1 part in 10 adjustments used in the above examples would be far too abrupt for many applications.

Synchronizing one clock to another

To keep one clock in a network synchronized to another more authoritative clock on the same network, the following code would typically be called periodically:

       send time request to remote server;
       t1 = local time;
       await t, the time sent by the remote server;
       t2 = local time;
       advance local time by t - (t2+t1)/2 time units.

This code assumes that the remote server responds immediately when it gets a time request, and that the outgoing network delay is comparable to the incoming network delay. As a consequence, (t2+t1)/2 is an estimate of the time at which the remote server inspected its clock in order to get the time t.

If the interval t2-t1 is significantly greater than the shortest measured interval in previous transactions, it may be prudent to assume that there was contention for resources somewhere in the system, and thus that (t2+t1)/2 is not a good estimate, and thus, that the returned time should not be accepted. This leads to the following more robust version of the code:

       send time request to remote server;
       t1 = local time;
       await t, the time sent by the remote server;
       t2 = local time;
       triptime = t2-t1;
       if triptime < mintriptime
         then mintriptime = triptime;
         else mintriptime = mintriptime + epsilon;
       if triptime < (mintriptime + tolerance)
         then try to gain t - (t2+t1)/2 time units;

Exercise: Why, in the above code, does the variable mintriptime get incremented by epsilon when we do not see a new minimum? Why not leave it at the lowest value ever recorded?

Synchronization in the Absence of Authority

Given two computers, each with an unreliable clock, and each running the above algorithm periodically to keep their clocks synchronized, it is not hard to show that the maximum difference between the two clocks will be bounded by the rate at which they drift apart without synchronization times the interval at which they are resynchronized, and that if the resynchronizations occur randomly, not favoring one or the other clock, the rate of drift of the composite time relative to real time will be the average of drift rates of the individual clocks.

There are several generalizations of this algorithm to more than two clocks. For example, each clock in the group could synchronize to one other, for example, by arranging the clocks in a logical ring, with each clock synchronizing itself to the time given by the clock to its left.

A more interesting alternative is to have each clock estimate its error relative to each of the other clocks in the system and then compute an average of the error estimates, possibly weighting this average by how much trust each clock places in each of the others and discarding measurements from other clocks that appear to be unacceptable, for example, because of trip times that are too long. After each cycle of estimating errors, this average can be used to adjust the speed of the local clock.

       samples = 0
       sum = 0
       for i set to each of n remote clocks
         send time request to remote server i;
         t1 = local time;
         await t, the time sent by remote server i;
         t2 = local time;
         triptime = t2-t1;
         if triptime < mintriptime[i]
           then mintriptime[i] = triptime;
           else mintriptime[i] = mintriptime[i] + epsilon;
         if triptime < (mintriptime[i] + tolerance)
           error = t - (t2+t1)/2
           sum = sum + error
           samples = samples + 1
       endfor
       average = sum / (samples + 1)
       try to gain average time units;

Exercise: Why is the divisor used in computing the average of the errors relative to the remote clocks one more than the number of clocks sampled?

The expected drift rate of a composite clock constructed from a number of identical local clocks using the above algorithm will be the average of the drift rates of each independent clock. If all clocks are weighted equally, the resulting drift rate will be the average of the independent clock drift weights. It is not hard to construct a weighted average, so that the clocks tend to synchronize with the subset of the clocks believed to be more accurate.

The expected error of any particular clock relative to the composite clock will depend on the inherent drift rate of that clock and on the frequency with which it is resynchronized. Frequent resynchronization does not improve the drift rate of the composite, it merely keeps the times reported by the clocks that make up that composite closer to each other.

This scheme still works if each clock only synchronizes with a subset of the others. For example, if each clock periodically synchronizes with a randomly selected subset of the clocks, there is only one effect: each clock will, on the average, stay within a larger bound of the average seen on all the clocks than if all clocks were accounted for on each round of synchronization. Because the random subsets do not weight one clock more than another, the drift of the composite clock will be exactly what it would have been with total synchronization on each step.

Similarly, the use of a ring topology, where each clock synchronizes itself to only one other, its neighbor to the right, does not change the drift rate, only the error of individual clocks relative to the composite.

Variations on these kinds of clock synchronization algorithms are used by the various national timekeeping authorities (NIST in the United States, the Bureau de L'Heure in France, and others) to reconcile their clocks. This is done within the set of clocks used by each national standars organization, and also between the clocks maintained independently by different nations in order to arrive at the value for Coordinated Universal Time, the composite representing all of the high quality national standards around the globe.