PTP VS NTP

overview

The basic principles of the two protocols are the same. 
Computers or other devices that have a clock are connected in a network and form a hierarchy of 
time sources in which time is distributed from top to bottom. 

The devices on top are normally synchronized to a reference time source (e.g. a timing signal from a GPS receiver). 
Devices "below" periodically exchange timestamps with their time sources in order to measure the offset of their clocks. 

The clocks are continuously adjusted to correct for random variations in their rate (due to effects like thermal changes) 
and to minimize the observed offset.

NTP可以有多个源，但是PTP只能有一个源。

NTP

The reference implementation of NTP is provided in the ntp package. 
Starting with Red Hat Enterprise Linux 7.0 (and now in Red Hat Enterprise Linux 6.8) a more 
versatile NTP implementation is also provided via the chrony package, 
which can usually synchronize the clock with better accuracy and has other advantages over the reference implementation. 

NTP was primarily designed for synchronization over the Internet using unicast, 
where it can usually achieve accuracy in the single-digit millisecond range. 

PTP

PTP is implemented in the linuxptp package.

PTP was designed for local networks with broadcast/multicast transmission and, in ideal conditions, 
the system clock can be synchronized with sub-microsecond accuracy to the reference time. 

Source Selection

NTP由client选择source，所有的source只负责发送协议包，client从多个source中计算出正常的source和不正常的source，
然后根据最短路径和最佳匹配选择source。

PTP 根据BMC选择source

The first important difference between NTP and PTP is in how time sources are selected when 
multiple paths to the top of the hierarchy are available. 

In NTP, the selection happens on the client side. 
An NTP client is receiving timestamps from all of its possible sources and it's up to the client to 
decide which sources are acceptable for synchronization. 
The servers just tell the client what time they think it is and its maximum expected error. 
The client checks if the measurements overlap and a majority of the sources agree on what time it is. 
Sources that are off are rejected as serving false time (falsetickers). 
From the remaining sources are selected sources with best statistics and shortest paths to the reference time sources on stratum 1 servers. 
The measurements are combined and the result is used for the adjustment of the local clock. 
This source selection process / procedure makes the protocol very resilient against failures. 
The client will stay synchronized for as long as it has enough good sources to outvote falsetickers. 
With three sources, it can detect one falseticker, with five sources it can detect two falsetickers, and so on.

In PTP, each slave has only one master and there is only one grandmaster in a PTP domain. 
Masters on the communication paths are selected by a Best Master Clock (BMC) algorithm, 
which runs independently on each clock in the domain. 
There is no negotiation between the clocks. 
The current master announces attributes of its grandmaster (class, accuracy, priority, etc.) to other clocks 
on the path and if there is a clock with a better grandmaster, or is closer to the same grandmaster in the network topology, 
it will switch to the master state and the previous master will stop when it sees it's no longer the best master on the communication path. 
The selection may take several iterations before it stabilizes, but ultimately there will be just one grandmaster and all slaves will 
be synchronized with it through masters on the shortest paths.

# ptp的failover功能并不是很好
When a network link or master fails, another clock on a path to the grandmaster can take over. 
When the grandmaster fails, another clock with a reference time source can be the grandmaster. 
There are optional mechanisms in PTP that allow fast reselection. But the assumption here is that it fails completely, 
or at least is able to detect its failure and stop working. There is no resiliency against other failures. 
When the synchronization of a (grand)master fails or degrades for some reason 
(e.g. its clock becomes unstable or a network link becomes congested or asymmetric), 
the master will still be the clock with the best attributes on the communication path and all clocks synchronized to it will fail with it. 
A single failure can disrupt synchronization in whole PTP domain, 
even if there are redundant network paths and multiple clocks with a reference time source.

Synchronization in NTP

client/server mode:
In this mode, the client periodically sends a request to the server using a client mode packet and 
the server responds immediately with a server mode packet. 

From each exchange the client gets four timestamps: time when the request was sent (t1), time when it was received by the server (t2), 
time when the response (from the server) was sent (t3) and time when it was received back at the client (t4). 

Note that t1 and t4 are measured by the client's clock and t2 and t3 are measured by the server's clock. 
The offset of the client's clock is the difference between the midpoints of intervals [t1, t4] and [t2, t3]. 

The delay is the round-trip time not including the server's processing time 
(i.e. the length of the local interval [t1, t4] without remote interval [t2, t3]).

symmetric mode:
The symmetric mode is similar to the client/server mode, except it allows synchronization in both directions. 
It's typically used between NTP servers operating at the same stratum as a backup in case one of them 
loses its upstream synchronization.

broadcast mode:
The main purpose of the broadcast mode is to simplify configuration of clients in very large networks. 
Instead of configuring each client with a list of NTP servers (or distributing the list via DHCP for instance), 
all clients can use the same configuration file, which just enables reception of broadcast or multicast NTP packets.

A broadcast server periodically sends a broadcast mode packet to an IP broadcast or multicast address. 
Its clients normally don't send any requests. After each received packet they have only two timestamps, 
the remote time when the packet was sent by the server (t3) and the local time when the packet was received (t4). 
That's not enough to calculate both offset and delay. The delay is measured independently using client mode packets first 
and the offset can be then calculated by subtracting half of the delay from the difference between t4 and t3.

Synchronization in PTP

A PTP master periodically sends sync messages, which are received by its slaves. 
This gives them two timestamps, remote time when the message was sent and local time when the message was received. 
As in the NTP broadcast mode, that's not enough to calculate the offset of the clock. 
The network delay between the master and slave has to be measured first.

There are two mechanisms how it can be measured: 
  end-to-end (E2E) using delay request/response and peer-to-peer (P2P) using peer delay request/response. 
With the E2E mechanism the slave sends a delay request and the master immediately responds with the timestamp when it received the request. 
This gives the slave the two missing timestamps, which allow it to calculate the delay exactly as in the NTP client/server mode. 
With the P2P mechanism, both request and response are timestamped, but the slave generally doesn't know exactly the remote timestamps, only their difference. 
This is sufficient to calculate the delay directly without using the timestamps from the sync message. 
When the slave knows the delay, it can calculate the offset with each sync message.

Unlike in the NTP broadcast mode, where clients normally measure the delay only once on start, PTP slaves measure the delay periodically. 
The rate is controlled by the master and it's normally a fraction of the rate of sync messages 
(i.e. slaves usually have more timestamps from the sync messages than delay response messages). 
The standard PTP approach is to calculate the offset and delay independently. 
Alternatively, with the E2E delay mechanism it's possible to use the four most recent timestamps (two for sync message and two for delay response) 
and calculate the offset and delay at the same time as in the NTP client/server mode. 
This allows more effective filtering, but reduces the number of samples. 
Which of the two works better depends on the stability of the clock and the amount of jitter in the measurements.

Accuracy of Synchronization

The accuracy of the NTP and PTP synchronization ultimately depends on the accuracy of the measured offset of the clock against the source. 
This error accumulates in the synchronization hierarchy.

The error has a variable component, which can be reduced with multiple measurements by filtering and averaging, 
and a constant component, which generally can't be detected. If the error is stable and the clock is also stable, 
the offset can be reduced to very small values, but that doesn't necessarily mean the clock is also accurate.

This is very important when looking at the offsets reported in the ptp4l, chronyd, or ntpd logs, or values printed by the pmc, chronyc, or ntpq programs. 
Small offsets generally indicate low network jitter and a stable clock, but it doesn't say much about the accuracy as there still may be a large constant error.

Asymmetry in Network Delay

One source of the error is asymmetry in the network delay. 
The calculation of the offset assumes the delay in the two directions is exactly the same, but that's rarely the case. 
For instance, if packets sent from A to B take 200 microseconds and packets sent from B to A only 100 microseconds, 
the measured offset will have an error of 50 microseconds. 
If A can keep its offset close to zero, it will actually be running 50 microseconds ahead of B.

The asymmetry has multiple sources. It may be in the physical or link layer, 
the packets may go over different network paths (e.g. due to asymmetric routing), 
and there may be different processing and queueing delays in switches and routers. 
There is nothing the clients/slaves can do to measure this error. 
It can be corrected only if it's measured externally by other means (e.g. with a reference time source connected directly to the client/slave). 
In PTP there is an option called delayAsymmetry intended for this correction.

Fortunately, this error has an upper bound. 
The clients/slaves don't know the asymmetry between the delays, but they do know the round-trip delay. 
If the packet was received instantly after it was sent in one direction and the other direction took the whole delay, 
the error in the offset would be equal to the half of the round-trip delay. 
This is the maximum error due to asymmetry in the network delay between two devices. 
In order to determine the maximum error relative to the reference time source, it's necessary to 
know the accumulated round-trip delay over all levels of the hierarchy to the reference time source.

In NTP this value is called root delay. It's exchanged in NTP packets together with root dispersion, 
which estimates the maximum error of the synchronization itself, 
taking into account stability of the clocks on the path to the reference time source. 
Both values can be monitored using the chronyc or ntpq programs. 
In PTP this information is not exchanged, which means only slaves 
of the grandmaster can estimate their maximum error and it's also less reliable due to 
the limitations of the broadcast synchronization as the delay is normally calculated independently from the offset.

The asymmetry in the network delay due to switches and routers can be corrected if they support a special correction field in PTP messages. 
A PTP-aware switch or router that supports this correction is called a transparent clock (TC). 
When it's forwarding a PTP message which includes this field, the difference between the time of 
reception and time of transmission is added to the value in the field. 
The slave includes these corrections in the calculation of the delay and offset, which improves the accuracy significantly. 
In an ideal case, the error drops to zero as if the slave and master were connected directly. 
NTP doesn't have anything like that. In order to avoid this error all NTP devices would have to be connected directly.

Timestamping Errors

Another source of the error in the offset is inaccuracy of the timestamping itself 
(i.e. the transmit or receive timestamp doesn't correspond exactly to the time when the packet was actually sent or received by the NIC).

On a Linux machine, there are basically three different places where the timestamps can be generated:

In user space (i.e. the NTP/PTP daemon), typically before making a send() system call and after a select() or poll() system call.
In the kernel, before the packet is copied to the NIC ring buffer and when the NIC issues an interrupt after receiving a packet. 
This is called software timestamping.
In the NIC itself, when the packet enters and leaves the link or physical layer. This is called hardware timestamping.

Software timestamping is more accurate than user-space timestamping, because it doesn't include the context switch, 
processing of the packet in the kernel and waiting in the network stack. 
Hardware timestamping is more accurate than software timestamping, because it doesn't include waiting in the NIC. 
However, there are several issues with HW and SW timestamping that make them more difficult to use than user-space timestamping.

The first issue is that not every NIC and network driver supports HW timestamping. 
In the past only few selected NICs had support for HW timestamping, but it's more common with modern hardware. 
Also, some NICs don't allow for the timestamping of arbitrary packets and support is limited to PTP packets. 
SW timestamping depends entirely on the driver. 
Supported timestamping features can be verified with the ethtool -T command.

The second issue is that with HW timestamping the NIC has its own clock, which is independent from the system clock, 
and there has to be some form of synchronization between these two clocks. 
HW timestamping can be so accurate that the weakest link of the synchronization may actually be between the NIC and the system clock (!). 
Measuring the offset between the two clocks involves sending messages over PCIe bus and the round-trip delay is typically a few microseconds. 
As the asymmetry in the delay on the bus and the time the NIC needs to respond are unknown, 
the error in the offset may actually be close to the half of the round-trip delay. 
There is no easy way to measure this error. Even if the NIC clock is accurate to few nanoseconds, the system clock may still be off by a microsecond.

The third issue is with servers/masters sending packets which are supposed to include the transmit timestamp of the packet itself. 
With SW timestamping the kernel would have to know where to put the timestamp in the packet. 
With HW timestamping it would have to be done by the NIC. 
The Linux kernel supports some NICs that can do this with PTP packets, which are called one-step clocks. 
Some NICs have a special "launch time" feature that would allow sending packets with an accurate transmit 
timestamp by pre-programming the time of the transmission 
instead of making modifications in the packet, but the kernel doesn't support that yet.

If the NIC can't modify the packet, the protocol itself has to provide some mechanism to send the transmit timestamp to the client/slave later. 
PTP has follow-up messages and the devices that use them are called two-step clocks. 
The NTP specification doesn't have anything like that (yet). 
The reference implementation supports special interleaved variants of the symmetric and broadcast modes, 
which allow the peer/server to send the transmit timestamp of the previous packet, 
but it doesn't support HW timestamping on Linux, so there is currently no practical use for it. 
The client/server mode could have an interleaved variant too if the server was allowed to keep some state for each client.

Both NTP implementations currently use SW timestamping for reception and user-space timestamping for transmission; 
linuxptp supports SW and HW timestamping for both reception and transmission. 
With HW timestamping the synchronization of the two clocks is separate. 
The NIC clock is synchronized by ptp4l and the system clock is synchronized to the NIC clock by phc2sys.

Similarly to the network delay, the error in the measured offset doesn't depend on the absolute error in the timestamping, 
but asymmetry in the errors between the server/master and client/slave. 
If the sum of the error in the transmit timestamp and receive timestamp for packets sent from A to B is similar 
to the sum of errors in timestamps for packets sent from B to A, 
the errors will cancel out. There may be a large asymmetry between the errors in transmit and receive timestamps on one side, 
but that's not a problem if the other side has a similar asymmetry.
For this reason it's recommended to use the same combination of timestamping on both sides and ideally also the same model of NIC. 
Mixing different combinations of timestamping or different NICs may increase the error in the measured offset.

One source of the error in user-space and SW timestamping is interrupt coalescing. 
When the NIC receives a packet, it may wait for more packets before interrupting the kernel in order to reduce the number of interrupts, 
but this means the user-space or SW timestamp is made later and has a larger error. 
On some NICs interrupts coalescing can be configured with the ethtool -C command. Different NICs and drivers have different configurations. 
Adjusting the values for a shorter delay may reduce the error in the receive timestamp, but without a reference time source it's difficult to tell 
how the asymmetry between the server/master and client/slave has changed and whether the accuracy has actually improved.

Combining PTP with NTP

In order to get both accuracy and resiliency at the same time, it would be useful if PTP and NTP could be combined. 
PTP would be the primary source for synchronization of the clock when everything is working as expected. 
NTP would keep the PTP sources in check and allow for fallback between different PTP sources, or to NTP servers when all PTP sources fail.

In Red Hat Enterprise Linux, this is possible. Programs from the linuxptp package can be used in a combination with an NTP daemon. 
A PTP clock on a NIC is synchronized by ptp4l and is used as a reference clock by chronyd or ntpd for synchronization of the system clock. 
The phc2sys program has an option to work as a shared memory (SHM) reference clock, which is supported by both NTP daemons. 
With multiple NICs supporting HW timestamping, for each PTP clock there is one ptp4l instance and one phc2sys instance. 
To make the configuration easy, linuxptp includes also a timemaster program, which from a single configuration file 
can create configuration files for all other programs and start them as needed. 
It supports both chronyd and ntpd. chronyd is preferred as it can synchronize the clock with better accuracy.

The timemaster configuration file is in /etc/timemaster.conf. It specifies NTP servers, network interfaces in PTP domains, 
and also options for chronyd/ntpd, ptp4l or phc2sys. 
This is a minimal example of the configuration file using a single PTP source and an NTP source:

[ptp_domain 0]
interfaces eth0
delay 10e-6

[ntp_server ntp.example.com]
minpoll 4
maxpoll 4

[timemaster]
ntp_program chronyd

In this configuration timemaster configures and starts chronyd with one NTP server and one SHM reference clock. 
The NTP server is polled every 16 seconds. The PTP clock on the eth0 interface is synchronized by ptp4l in the PTP domain number 0 and 
phc2sys provides the PTP clock as a SHM reference clock to chronyd.

The delay option sets the maximum expected error of the PTP clock due to asymmetry in network delays and timestamping to ±5 microseconds. 
This value can't be provided by PTP, so it needs to be specified in the configuration file. 
The default value is 100 microseconds (maximum error of ±50 microseconds), which should cover errors in SW timestamping, 
but in most cases with HW timestamping it would be unnecessarily large. 
Setting the delay to a smaller value will prevent chronyd from combining the PTP source with close NTP servers in local network, 
which are expected to have a much larger error than the PTP source, and it will also make the detection of falsetickers more sensitive.

In normal conditions the system clock is synchronized to the PTP clock. 
If ptp4l switches to an unsynchronized state (e.g. after a complete failure of a PTP master), phc2sys will stop updating the SHM refclock and 
chronyd will switch to the NTP source when the estimated error of the system clock becomes comparable to the estimated error of the NTP source. 
A short loss of the PTP source doesn't cause an immediate switch to a significantly worse NTP source. 
If synchronization of one source fails in such a way that it gives false time, there will be a problem. 
The two sources won't agree with each other and chronyd will have to stop the synchronization as it doesn't know which one is correct. 
A warning message will be written to the system log. If the configuration file specified a second NTP server, 
the falseticker could be identified and the synchronization would continue uninterrupted.

The next example shows a more resilient configuration using two different PTP domains and three NTP servers.
[ptp_domain 0]
interfaces eth0
delay 10e-6

[ptp_domain 1]
interfaces eth1
delay 10e-6

[ntp_server ntp1.example.com]
minpoll 4
maxpoll 4

[ntp_server ntp2.example.com]
minpoll 4
maxpoll 4

[ntp_server ntp3.example.com]
minpoll 4
maxpoll 4

[timemaster]
ntp_program chronyd

timemaster will now start chronyd with two ptp4l instances and two phc2sys instances.

This configuration is resilient against up to four sources failing completely and up to two sources giving false time. 
In normal conditions all five sources are expected to give true time, but only the two PTP sources are used for synchronization of the system clock. 
Combining the PTP sources may improve its accuracy as their average is likely to be closer to the true time. 
If one PTP source fails, the accuracy won't degrade significantly. 
If both PTP sources fail completely or they start giving false time, the synchronization will fall back to a combination of the NTP sources.


The following examples show what the chronyc sources command prints in this configuration when different failures of PTP sources are simulated. 
In the first example everything is working as expected and all sources agree with each other. 
Only the two PTP sources are used for synchronization (indicated with the * and + symbols).

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PTP0                          0   2   377     4    +23ns[   +5ns] +/- 5052ns
#+ PTP1                          0   2   377     4   -116ns[ -116ns] +/- 5083ns
^- ntp1.example.com              1   4   377     6  -9983ns[  -10us] +/-  122us
^- ntp2.example.com              1   4   377     5    +11us[  +11us] +/-   85us
^- ntp3.example.com              1   4   377     5    -12us[  -12us] +/-  155us

In the next example synchronization in one PTP domain failed. 
The source is off by about 200 microseconds and it's drifting away. 
It doesn't agree with other sources, so it's rejected as a falseticker (indicated with the x symbol). 
The other PTP source still works as expected and is used for synchronization.

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PTP0                          0   2   377     3    +59ns[  +73ns] +/- 5057ns
#x PTP1                          0   2   377     2   -239us[ -239us] +/- 5079ns
^- ntp1.example.com              1   4   377    11  -9996ns[  -10us] +/-  122us
^- ntp2.example.com              1   4   377    10  +8005ns[+8000ns] +/-   82us
^- ntp3.example.com              1   4   377     9    -13us[  -13us] +/-  163us

In the last example the other PTP source failed completely (? symbol) and NTP sources are used for synchronization. 
The accuracy of the clock is significantly worse than before.

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? PTP0                          0   2     0   17m    +25us[  -22ns] +/- 5075ns
#x PTP1                          0   2   377     3   -382us[ -382us] +/- 5088ns
^* ntp1.example.com              1   4   377    11  -3078ns[-3000ns] +/-  121us
^+ ntp2.example.com              1   4   377     9  +6000ns[+6000ns] +/-   81us
^+ ntp3.example.com              1   4   377     7    -23us[  -23us] +/-  180us

When the PTP sources are fixed, they will be used for synchronization and the accuracy will improve again. 
Failure of any NTP source would be handled in the same way. 
The synchronization will work correctly for as long as the number of remaining good sources is larger than the number of falsetickers.

上篇PTP BASIC

下篇OVN - localnet + virtualVM + qos