You may also wish to see my Clock Synchronization and NTP page.

Definitions of the terms used in this document:

Wall Clock Time: Time displayed in a format understandable to humans, such as "12:00 am, Jan 1st, 1970."
Real-Time Clock: A hardware-based clock on the system motherboard that functions independently of the computer's power state.
System Time: A software-based clock, or time line, initialized during system boot from the real-time clock.
Reference Clock: A highly precise and stable time source used to synchronize and calibrate other clocks.
Clock Cycles: A measure of the transitions in an electronic circuit, akin to the movement of a production line.
Counter: A device that increments based on specific events or time periods, independent of CPU-executed code.
Timer: A device that tracks the number of increments occurring between two points in time.
Clock Source: A device queried by CPU-executed code to obtain the current value of a counter.
Monotonic Counter: A counter that increments at regular intervals, such as once per second, analogous to a metronome.
System: A self-contained computing entity, such as a Linux server, Windows desktop, or Mac laptop.
Time Line: A graphical or numerical depiction of the passage of time.
Time Scale: A representation of time, either graphical or numerical, adjusted to illustrate drift or skew.

x86 Clock Sources and System Timing

RTC (Real Time Clock)

  • Maintains an approximation of Wall Clock time when the x86 computer is powered off, powered by a battery on the motherboard.
  • Similar to a battery-operated digital wristwatch.
  • Utilizes a quartz crystal as a stable oscillator (32.768 kHz).

Problems with the RTC:

  • Prone to inaccuracies due to temperature changes and manufacturing tolerances.
  • Limited precision compared to modern CPU clock rates.
  • OS polling of the RTC can be computationally expensive and slow.

ACPI PM (Advanced Configuration and Power Interface Power Management)

  • The ACPI PM timer, defined by the ACPI specification, is a free-running counter that ticks at a fixed 3.579545 MHz, independent of CPU frequency scaling and power-state changes, which makes it usable as a fallback clock source. ACPI PM also uses hardware timers and other mechanisms to manage power states and handle timing-related tasks during system sleep and wake cycles.

TSC (Time Stamp Counter)

The TSC was introduced by Intel with the original Pentium CPU in 1993.

  • Located within the CPU.
  • When an Intel Pentium computer is powered up the TSC register is set to zero.
  • The value in the 64-bit TSC register was incremented by 1 for every CPU ClockCycle (this is no longer the case).
    • For example, if the CPU clock ran at 60 MHz, the TSC was also incremented at a rate of 60 MHz.
  • Any code running on the system could read this register to determine its current value.
  • Because both the CPU frequency and the TSC frequency were fixed, this type of counter was known as a Monotonic Counter.

The TSC was often preferred over many of the alternative Clock Sources because it was simple, effective, and computationally inexpensive to use. For a running process (application) to query the current value of the TSC, only a single x86 instruction is required: either "rdtsc" or "rdtscp". Notably, both instructions are nonprivileged; they can be called from any privilege level and do not require a CPU ring transition.

Over the years, as Intel x86 platforms became more complicated, it was no longer possible to rely on the simple monotonic TSC to keep track of Wall Clock Time. Intel introduced SMP, frequency scaling, and power states, all of which necessitated advancements to the original TSC and the creation of new Clock Sources.

HPET (High Precision Event Timer)

  • The HPET provides a high-resolution timer with a frequency typically ranging from 10 MHz to 25 MHz, allowing for precise time measurements with resolutions down to tens of nanoseconds (100 ns per tick at 10 MHz).
  • HPETs typically consist of multiple timer channels, each capable of operating independently and providing precise timing information.
  • The HPET operates independently of the CPU's clock speed and is not affected by frequency scaling or changes in the CPU's power state. This ensures consistent and accurate timing information regardless of the system's power management settings.
  • An HPET can generate interrupts at specified intervals, allowing the operating system to perform tasks such as scheduling processes, managing I/O operations, and handling time-sensitive events.
  • HPETs are integrated into the ACPI specification and are managed by the operating system's ACPI driver.

PIT (Programmable Interval Timer)

The PIT is a hardware timer device used in PCs and other systems. It can be programmed to generate periodic interrupts and to perform various tasks such as timekeeping, scheduling, and generating audio signals.

  • The PIT can be programmed to generate interrupts at regular intervals by setting specific values in its registers. This allows the system to perform tasks on a periodic basis, such as updating system time or scheduling tasks.

  • The PIT can generate square wave signals at specific frequencies, which are commonly used to generate audio tones or drive other timing-sensitive peripherals.

  • The PIT generates hardware interrupts to notify the CPU when a programmed interval has elapsed. This allows the operating system to perform tasks in response to timer events, such as servicing device requests or scheduling processes.

  • Limited Precision: The PIT's precision is limited compared to modern timer devices like the HPET (High Precision Event Timer). It is driven by a fixed 1.193182 MHz input clock and is typically programmed to fire interrupts at rates from roughly 18 Hz up to a few kilohertz, resulting in far lower timing resolution than modern high-resolution timers.

Wall Clock time

The term "wall time" refers to the actual time elapsed from the start to the end of a particular process or task, as measured by a clock on the wall; in other words, a real-world clock. It is measured in the time units that we humans use: hours, minutes, seconds, milliseconds, microseconds, etc. For example, "12:00:01.102 am, Jan 1st, 1970".

System Time

System time refers to the current time and date as maintained by the operating system of a computer system. It is a representation of the current time in a standardized format that can be used by software applications and humans. System time is initialised and set during system boot from the Real Time Clock. It is often kept in sync with the WallClockTime / Reference time by using an NTP client to poll a reference NTP time source. When a System is powered off, the OS is not running, and therefore SystemTime is no longer being maintained.

Reference Clock / Time

A high precision Reference Clock is used to keep track of the agreed-upon time, examples include...

  • Atomic clocks,

    • They are the most precise timekeeping devices available today.
    • They use the oscillation frequency of atoms, typically cesium or rubidium atoms, to measure time.
    • They can be accurate to within a few billionths of a second per day.
    • Note: You can buy these in various form factors, I had one in a Sun Sparc server in the early 2000s.
  • GPS Time, GPS Clocks

    • The Global Positioning System (GPS) relies on a network of atomic clocks on satellites to provide precise timing information.
    • GPS time is accurate to within a few tens of nanoseconds.
  • NTP Devices and Servers (Network Time Protocol)

    • NTP Devices and Servers synchronize time across networks and rely on high-precision time sources such as atomic clocks or GPS receivers.
    • NTP servers can achieve sub-millisecond accuracy when properly configured.
  • Quartz oscillators.

    • Commonly used in consumer electronic devices such as wristwatches and computer clocks.
    • While not as precise as atomic clocks, well built quartz oscillators can still provide accurate timekeeping to within a few seconds per month.

Maintaining System Time on a single core fixed frequency x86 system

Below is a simplified timeline of what happens when an Intel Pentium I single-core fixed Clock Frequency x86 computer is powered up and running a standard Linux Kernel 2.6.

  1. RTC continues working independently of the state of the computer. (Battery backed)
  2. CPU is powered on and initialised.
  3. The ClockSource Counters built into the CPU are initialized to Zero.
  4. The OS is booted.
  5. The OS System Time is set to the WallClock time based on the value in the RTC. (Generally, the RealTime clock is not read again until the next boot)
  6. The OS queries to see what ClockSources are available.
  7. The OS determines which ClockSource to use, based on manual configuration overrides and/or the calculated stability of the available ClockSources.
    • For simplicity's sake let's say that the OS has decided to use the TSC (Time Stamp Counter) as the ClockSource.
  8. The OS determines the ratio of the ClockSource frequency vs the frequency of the CPU Clock Cycles. (on this computer it is 1:1)
    • For simplicity's sake let's assume that the CPU Clock Frequency is 10Hz and the ClockSource Counter frequency is also 10Hz.
  9. The OS takes note of the value of the ClockSource counter and the System WallClock time.
    • The Computer has been powered on for 10 seconds, so the ClockSource counter = 10 (seconds) x 10Hz = 100
    • The SystemTime & WallClock time is 12:00:10 Jan 1 2016
  10. The OS runs the various process and applications.
  11. The OS-maintained SystemTime can now be kept in sync with the estimated WallClock time by advancing it by an offset derived from the ClockSource Counter.
    • For simplicity's sake let's say that the OS has been busy running processes and applications and the SystemTime has not been updated for 10 seconds in WallClock time.
    • The OS queries the ClockSource Counter (which continues independently of CPU load) and notices that the counter has increased from 100 to 200.
    • The OS calculates the current SystemTime by...
      • TSC Counter difference (current) 200 - (previous) 100 = 100
      • The OS knows that the TSC frequency is 10Hz
      • 100 / 10Hz = 10 seconds
      • (previous) SystemTime 12:00:10 + 10 seconds = 12:00:20 Jan 1 2016

There are 2 major problems with this method of maintaining system time.

  1. There is no way to determine the initial accuracy of the RTC's approximation of the True Wall Clock time.
    • You cannot determine what time zone it is set to.
    • You cannot determine if it is even set to the correct year.
  2. There is no way to validate the accuracy or the stability of the ClockSource.
    • Is it exactly 10Hz or is it 10.01Hz?
    • Is it varying between 9Hz and 11Hz?

NTP solves problem [1] by polling a remote time source across the network to get an accurate reading for the True WallClock time in reference to the prime epoch.

NTP goes a long way toward mitigating problem [2] by comparing the apparent time drift between multiple values returned by the local ClockSource and multiple values returned by the more accurate remote NTP time source.

If it is determined that the local ClockSource is drifting from its stated speed of 10Hz, then the local NTP daemon calculates the drift offset and adjusts the system clock.


Advancements in x86 CPUs and their effects upon the Monotonic TSC

Brief overview of the advances in X86 system architecture that affected the previously used method of using a simple monotonic counter (TSC) to maintain time.

For most purposes, the methodology for maintaining system time described above (Intel Pentium I - III) worked fairly well. However, times have changed: as x86 systems have evolved they have become far more complicated, and as a result the old method of determining the WallClock time needed to change to work on modern CPUs.

The following 3 advances in X86 CPUs nullified the previously used method of using a simple monotonic counter (TSC) to maintain time.

  • x86 systems evolved from having a single CPU core to multiple CPU cores.
  • x86 CPUs could change Clock frequency whilst a system was running, to both save power and turbo boost.
  • x86 CPU cores could change power states whilst a system was running, individual CPU cores could be turned off and then be turned back on.

As X86 system architecture became more complex, the TSCs and how they were used had to adapt.

Below I have detailed 5 points in the evolution of X86 CPUs that I believe had the most effect on system timekeeping.

  1. Single CPU fixed clock frequency. (tsc)

    • Historically back when x86 CPU frequencies were fixed, the TSC incremented in synchronization with the CPU cycles and everything was fine.
  2. SMP (symmetric multi-processing) fixed clock frequency. (tsc)

    • When SMP (symmetric multi-processing) x86-based systems were released, each CPU had its own TSC and there was no mechanism to keep them in sync.
    • Having separate TSCs in the same x86 system that could drift out of sync with each other sometimes caused issues when applications/processes were moved between CPUs, as the time could appear to jump either forwards or backwards.
    • As a fix for the out-of-sync TSCs on SMP systems, the BIOS/firmware kept the TSCs in sync across CPU resets, which improved the situation.
  3. SMP (symmetric multi-processing) variable clock frequency (SpeedStep). (constant_tsc)

    • When x86 CPUs were released with new features such as variable clock speed, once again the multiple TSCs within a single x86 system fell out of sync.
    • To solve this issue, the TSC frequency was decoupled from the variable CPU clock frequency.
    • The TSCs are incremented at a constant rate regardless of the variable CPU frequency.
    • To communicate to the OS that the TSCs tick at a constant rate, the CPU presents the cpuid bit (flag) constant_tsc to the OS.
  4. SMP (symmetric multi-processing), variable clock frequency, and Power States (ACPI P-, C-. and T-states, etc..) (nonstop_tsc)

    • Now individual CPU cores can change their clock frequency independently of the other cores, and even the uncore can be shut down.
    • When the uncore is shut down, the associated TSC is frozen and falls out of sync with the other TSCs.
    • To solve the issue of the TSC being stopped and restarted along with the CPUs, Intel introduced the "nonstop_tsc" feature, which keeps the TSC running across C-states.
  5. Sleep x86 (nonstop_tsc_s3)

    • x86 systems can be put to sleep (ACPI S3, suspend-to-RAM), where all system context is lost except system memory.
    • The nonstop_tsc_s3 flag indicates that the TSC keeps running even in the S3 sleep state.

To summarise, the TSC-related CPUID flags are:

  • tsc: TSC is present
  • constant_tsc: TSC ticks at a constant rate
  • nonstop_tsc: TSC does not stop in C states
  • nonstop_tsc_s3: TSC doesn't stop in S3 state

Plus 1 more TSC CPUID flag that is only set if the kernel determines that the TSC is reliable.

  • tsc_reliable: TSC is known to be reliable

CPU Bugs and Clock Sources

There have been a few different CPU, Firmware, and Operating System bugs related to CPU clock sources.

One example is an issue with the Intel Xeon E5 v2 CPUs which results in the TSC not being cleared upon a CPU warm reset.

This can result in the various TSCs on a system getting out of sync with each other.

The bug is referenced in the following Intel Xeon E5 v2 spec update as "CA105 TSC is Not Affected by warm reset"

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v2-spec-update.pdf

If CPUs with this bug are warm reset or the uncores are powered down, the multiple TSCs within a system can get out of sync with each other.

This problem can be mitigated by the motherboard firmware/BIOS keeping the various TSCs in sync.

However, if the firmware does not reset/sync the TSCs, then they may stay out of sync until a cold reset (power off/power on) is performed.

The Linux operating system should be able to detect if the TSCs are out of sync and flag the TSC clock source as unstable. If this occurs, Linux should then fail over to one of the alternative clock sources and continue functioning normally. Unfortunately, certain kernel builds in certain Linux distributions do not successfully fail over to alternate clock sources.

For example, kernel-2.6.32-504.30.3.el6.x86_64 continues to use the unstable TSC as a clock source. This can result in timewarping of the System Time so severe that it may drift too much to be corrected by NTP.

Example C programs to poll the TSC, HPET, and ACPI_PM clock sources

Example C Program to read the TSC

#include <stdio.h>
#include <stdint.h>

int main() {
    unsigned int tsc_val_lo, tsc_val_hi;
    uint64_t timestamp;

    // Execute the rdtsc instruction
    __asm__ __volatile__ ("rdtsc" : "=a"(tsc_val_lo), "=d"(tsc_val_hi));

    // Combine the low and high parts to form the full 64bit timestamp
    timestamp = ((uint64_t)tsc_val_lo) | (((uint64_t)tsc_val_hi) << 32);

    printf("TSC value: %llu\n", (unsigned long long)timestamp);

    return 0;
}

Example C program to read the HPET

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPET_BASE_ADDRESS 0xFED00000 // Base address of the HPET on most x86 machines
#define HPET_SIZE 0x400              // HPET register block size in bytes

int main() {
    int hpet_fd;
    volatile void *hpet_base;
    uint64_t hpet_counter;

    // Open the /dev/mem device to access physical memory
    hpet_fd = open("/dev/mem", O_RDONLY);
    if (hpet_fd < 0) {
        perror("Error opening /dev/mem");
        return 1;
    }

    // Memory-map the HPET region
    hpet_base = mmap(NULL, HPET_SIZE, PROT_READ, MAP_SHARED, hpet_fd, HPET_BASE_ADDRESS);
    if (hpet_base == MAP_FAILED) {
        perror("Error mapping HPET region");
        close(hpet_fd);
        return 1;
    }

    // Read the HPET main counter value (register offset 0xF0)
    hpet_counter = *(volatile uint64_t *)((volatile char *)hpet_base + 0xF0);

    // Unmap the HPET region and close /dev/mem
    munmap((void *)hpet_base, HPET_SIZE);
    close(hpet_fd);

    printf("HPET Counter: %llu\n", (unsigned long long)hpet_counter);

    return 0;
}

Example C program to read the RTC

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main() {
    int rtc_fd;
    struct rtc_time rtc_tm;

    // Open the RTC device file
    rtc_fd = open("/dev/rtc", O_RDONLY);
    if (rtc_fd == -1) {
        perror("Failed to open RTC device");
        return 1;
    }

    // Read the current time from the RTC
    if (ioctl(rtc_fd, RTC_RD_TIME, &rtc_tm) == -1) {
        perror("Failed to read RTC time");
        close(rtc_fd);
        return 1;
    }

    // Print the current time
    printf("Current RTC time: %02d:%02d:%02d\n", rtc_tm.tm_hour, rtc_tm.tm_min, rtc_tm.tm_sec);

    // Close the RTC device file
    close(rtc_fd);

    return 0;
}
Example C program to read the ACPI_PM

The I/O port of the ACPI PM timer is platform-specific and is published by the firmware in the ACPI FADT (the PM_TMR_BLK field); 0x408, used below, is a common value on Intel chipsets. The program must be run as root.

#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>

#define PM_TIMER_PORT 0x408 // Common ACPI PM timer port; the authoritative
                            // value comes from the ACPI FADT (PM_TMR_BLK)

int main() {
    uint32_t pm_counter;

    // Gain access to I/O ports (requires root privileges)
    if (iopl(3) < 0) {
        perror("Error enabling I/O port access");
        return 1;
    }

    // Read the free-running ACPI PM timer (fixed 3.579545 MHz).
    // The counter is 24 bits wide on most platforms, so mask the result.
    pm_counter = inl(PM_TIMER_PORT) & 0xFFFFFF;

    printf("ACPI PM Timer Counter: %u\n", pm_counter);

    return 0;
}

Example C program to Read the TSC on each of 4 CPU cores

This program reads the TSC on each of 4 CPU cores, then checks that the values retrieved from the TSCs are in ascending order. Because the reads are performed one after another, values that are not in ascending order could be an indication that the TSCs are not being kept in sync.

#define _GNU_SOURCE // Required for sched_setaffinity, cpu_set_t, and CPU_SET on glibc
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <stdbool.h>

#define NUM_CORES 4 // Adjust this according to the number of CPU cores

// Function to read the Time Stamp Counter (TSC)
unsigned long long read_tsc(void);

int main() {
    int core;
    unsigned long long tsc_values[NUM_CORES]; // Array to store TSC values for each core
    bool in_order = true; // Flag to indicate if TSC values are in ascending order

    // Loop through each CPU core
    for (core = 0; core < NUM_CORES; core++) {
        // Set CPU affinity to a specific core
        cpu_set_t set;
        CPU_ZERO(&set); // Initialize CPU set to empty
        CPU_SET(core, &set); // Add current core to the CPU set
        if (sched_setaffinity(0, sizeof(cpu_set_t), &set) == -1) { // Set CPU affinity
            perror("sched_setaffinity");
            exit(EXIT_FAILURE);
        }

        // Read the TSC value for the current core and store it in the array
        tsc_values[core] = read_tsc();

        // Check if the TSC value is higher than the previous core's TSC value
        if (core > 0 && tsc_values[core] <= tsc_values[core - 1]) {
            in_order = false; // If not, set the flag to false
        }
    }

    // Print core and TSC values
    for (core = 0; core < NUM_CORES; core++) {
        printf("Core %d: %llu\n", core, tsc_values[core]);
    }

    // Print whether the TSC values are in ascending order or not
    if (in_order) {
        printf("TSC values are in ascending order.\n");
    } else {
        printf("TSC values are not in ascending order.\n");
    }

    return 0;
}

// Function to read the Time Stamp Counter (TSC)
unsigned long long read_tsc(void) {
    unsigned int low, high;
    __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
    return ((unsigned long long)high << 32) | low;
}

Sources for this document:

Intel CPU specifications and developer guides

Linux Kernel and Linux Distribution documentation

NTP (Official Documentation)

Wikipedia links

Extra Info