PTP BASIC

redhat doc

FUJITSU doc

NetTimeLogic doc

FUJITSU DOC

Event ordering is very important for incident analysis, performance analysis and so on.
Event ordering is based on timestamps.
Timestamps are collected from multiple servers, so clock synchronization is important.
If the precision and accuracy of clock synchronization are poor, the recorded event order can be the reverse of the actual order.

NTP provides millisecond-level synchronization.
NTP may be enough for remote machines, but it is not enough for locally cooperating machines.
Many events occur within a millisecond across multiple servers, so event ordering will frequently be reversed.
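
A minimal sketch of how a sub-millisecond clock error reverses event order. The event names, the times and the 0.8 ms error are made-up values for illustration only.

```python
# Server B's clock is assumed to run 0.8 ms behind true time
# (a hypothetical error of the size NTP may leave behind).
CLOCK_ERROR_B_NS = -800_000


def stamp(true_time_ns, clock_error_ns=0):
    """Return the timestamp a server with the given clock error would log."""
    return true_time_ns + clock_error_ns


# True order: A happens first, B happens 0.2 ms later.
events = [
    ("A: request sent", stamp(1_000_000)),
    ("B: request received", stamp(1_200_000, CLOCK_ERROR_B_NS)),
]

# Sorting by logged timestamp puts B before A, although A really came first.
ordered_by_timestamp = [name for name, _ in sorted(events, key=lambda e: e[1])]
print(ordered_by_timestamp)
```

As long as the clock error is larger than the gap between two events, sorting by timestamp cannot recover the true order.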

Another protocol is needed:
  Higher precision and accuracy
  No need to synchronize a large area, only local servers and devices

PTP:
Standardized protocol, IEEE 1588
Synchronizes the clocks in local computing systems and devices
Microsecond to sub-microsecond accuracy and precision
Capability to autonomously select the time server (master) --> using the BMC algorithm (BMCA)

Grandmaster Clock (Ordinary Clock):
  Original time source for the PTP network
  Typically synchronizes its clock to an external time source (GPS, NTP and so on)
  An end point of the PTP network is called an Ordinary Clock

Boundary Clock:
  Typically a switch
  Synchronizes its clock to a master
  Serves as a time source to other (slave) clocks
  May become Grandmaster clock if the current Grandmaster is lost

Master: serves as a time source
Slave: synchronizes to another clock

Slave Clock (Ordinary Clock):
  Synchronizes its clock to a master (to the boundary clock in this example)
  May become Grandmaster clock if the current Grandmaster is lost

The time offset between master and slave clocks is calculated from timestamps taken when packets are sent and received
Ideally, the timestamp is taken at the exact moment the packet is sent (or received)
In reality, there is a difference between the moment the timestamp is taken and the moment the packet is actually sent (or received)

Software timestamping:
  Timestamp taken at the application or OS layer
  Time is read from the system clock
  Error is relatively large
    Application
       |
       |<---------timestamp (from sys clock)
      OS
       |
       |
      MAC
       |
       |
      PHY<--------transmit

Hardware Timestamping:
  Hardware-assisted timestamp taken at the PHY or MAC layer
  Time is read from the PTP Hardware Clock (PHC) on the NIC
  Minimizes the error
    Application
       |
       |
      OS
       |
       |
      MAC<--------timestamp (from PHC)
       |
       |
      PHY<--------transmit

The PTP protocol itself is implemented in user space (linuxptp)

Kernel features for PTP:
  Socket option SO_TIMESTAMPING for packet timestamping
  PHC subsystem: allows access to the PHC via the clock_gettime/clock_settime/clock_adjtime system calls
  Drivers: some drivers support hardware and/or software timestamping (e.g. e1000e, igb, ixgbe, and so on)
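
A minimal sketch of reading a PHC through the clock_gettime interface from Python. It assumes the dynamic clock id convention used by the Linux kernel (CLOCKFD = 3, as in the kernel's testptp example), a device node named /dev/ptp0, and sufficient permissions; the function names are illustrative.

```python
import os
import time

CLOCKFD = 3  # assumed: value used by the kernel's dynamic POSIX clock convention


def fd_to_clockid(fd):
    """Map an open /dev/ptpN file descriptor to a dynamic clock id."""
    return (~fd << 3) | CLOCKFD


def read_phc_ns(device="/dev/ptp0"):
    """Read the PHC time in nanoseconds (needs a PTP-capable NIC)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        return time.clock_gettime_ns(fd_to_clockid(fd))
    finally:
        os.close(fd)
```

On a machine with a PHC, comparing read_phc_ns() with time.clock_gettime_ns(time.CLOCK_REALTIME) gives a rough view of the PHC to system clock difference that phc2sys corrects.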

linuxptp applications:
  ptp4l: Implementation of PTP
    Ordinary / Boundary clock
    Hardware / Software timestamping
    Delay request-response / Peer delay mechanism
    IEEE 802.3 (Ethernet) / UDP IPv4 / UDP IPv6 network transport
  phc2sys: Synchronize two clocks (typically PHC and system clock)
    When you are using hardware timestamping:
      ptp4l adjusts the PHC
      phc2sys adjusts the system clock
    When you are using software timestamping:
      ptp4l directly adjusts the system clock
      phc2sys is not needed
  pmc (PTP Management Client): Send PTP management messages to PTP nodes

Typical usage of ptp4l:
  Start as a slave node
  Use eth0 to send/receive messages
  Use /etc/ptp4l.conf as the configuration file
  # ptp4l -i eth0 -f /etc/ptp4l.conf -s
    -s: Slave-only mode. Without it, this node can become master.
    -i: Interface
    -f: Configuration file

Typical usage of phc2sys:
  Adjust the system clock based on eth0's PHC
  Wait until ptp4l has started synchronizing to the master
  # phc2sys -s eth0 -c CLOCK_REALTIME -w
    -s: Source clock. If a network interface is given, the related PHC is selected automatically. You can also specify the PHC directly, e.g. -s /dev/ptp0
    -w: Wait until ptp4l is synchronized.
    -c: The clock you want to adjust. CLOCK_REALTIME is the system clock.
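
A minimal sketch that builds the command lines for a slave-only node, depending on the timestamping mode. The helper name and its defaults are illustrative; the flags are the ones listed above, plus ptp4l's -S option for software timestamping.

```python
def linuxptp_commands(interface, hardware_timestamping=True,
                      config="/etc/ptp4l.conf"):
    """Return the list of command lines to start on a slave-only node."""
    ptp4l = ["ptp4l", "-i", interface, "-f", config, "-s"]
    if not hardware_timestamping:
        # With software timestamping ptp4l adjusts the system clock
        # itself, so phc2sys is not started.
        return [ptp4l + ["-S"]]
    # With hardware timestamping ptp4l adjusts the PHC and phc2sys
    # copies the PHC time to the system clock.
    phc2sys = ["phc2sys", "-s", interface, "-c", "CLOCK_REALTIME", "-w"]
    return [ptp4l, phc2sys]


for command in linuxptp_commands("eth0"):
    print(" ".join(command))
```

Each returned list can be passed to subprocess.Popen as-is; running the daemons requires root privileges.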

pmc (PTP Management Client):
Send PTP management messages to PTP nodes
  GET action: Get current values of data
  SET action: Update current values of variables
  CMD action: Initiate some events
PTP management messages are specified in IEEE 1588
Many PTP devices do not support management messages yet
  linuxptp also does not support many SET and CMD messages yet

Typical usage of pmc:
  Send a message to the local node
  Get the values of CURRENT_DATA_SET
  # pmc -u -b 0 'GET CURRENT_DATA_SET'
    -u: Use a Unix Domain Socket (UDS). The UDS is used to receive PTP management messages from the local host.
    -b: Allowed number of boundary hops. With 0, the management message is sent only to the local node.
    GET CURRENT_DATA_SET: Action and management ID.

Dynamic ticks make system clock stability worse:
  Dynamic ticks disable the periodic timer tick interrupt
  It is a useful feature for power saving, but…
  The error correction mechanism in the kernel is not aware of dynamic ticks

You can disable dynamic ticks:
  Specify nohz=off as a kernel boot option
  nohz=on is the default
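
A minimal sketch for checking whether the running kernel was booted with nohz=off. It assumes the boot options can be read from /proc/cmdline; the function names are illustrative.

```python
def dynamic_ticks_disabled(cmdline):
    """Return True if the kernel command line contains nohz=off."""
    return "nohz=off" in cmdline.split()


def current_cmdline(path="/proc/cmdline"):
    """Read the boot options of the running kernel."""
    with open(path) as f:
        return f.read()


print(dynamic_ticks_disabled("BOOT_IMAGE=/vmlinuz ro quiet nohz=off"))
```

On a live system, dynamic_ticks_disabled(current_cmdline()) performs the same check against the real boot options.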

NetTimeLogic DOC

Packet-based time synchronization protocol
  It describes how to distribute time (phase, frequency and absolute time) over a packet-based network (Ethernet)
  Data and synchronization use the same network

The standard defines how Master and Slaves communicate, where timestamps are generated and how the differences are calculated
The standard says nothing about how to correct the Slave

To have synchronous time, both frequency and phase have to be corrected
• Frequency Correction
	• The Slave's oscillator does not have exactly the same frequency as the Master's
	• The Slave's oscillator frequency varies over time (due to environmental conditions)
• Phase Correction
	• The Slave and Master don't start at the same time
	• The Master makes a jump in time

The simplest setup consists of two PTP nodes
 
    PTP MASTER ---------------- PTP SLAVE

1. All nodes listen for so-called «Announce» messages
   An Announce message contains quality information (Class, Priorities and Qualities) about the Clock which sends it
2. When no «Announce» message has been received for a defined interval, a node becomes Master and starts to send its own «Announce» messages
3. If a node receives an «Announce» message of better quality than its own, it stops sending «Announce» messages and becomes Slave
4. If a node receives an «Announce» message of worse quality than its own, it stays Master and continues to send «Announce» messages at a defined interval
	When the network has determined the best node, that node is the
	only one sending Announce messages => 1 Master, N Slaves

	This algorithm runs all the time: if another node becomes better,
	or the current Master becomes worse than another node, the
	topology changes
5. This algorithm is called the Best Master Clock Algorithm (BMCA)

The comparison is based on the following attributes, in this order:
	1. Priority1: a configurable clock priority
	2. ClockClass: a clock's traceability
	3. ClockAccuracy: a clock's accuracy
	4. OffsetScaledLogVariance: a clock's stability
	5. Priority2: a configurable second-order clock priority
	6. ClockIdentity: a clock's unique identifier (the tiebreaker if all other attributes are equal)

Based on this comparison a state decision is taken and the port state is set to either MASTER, SLAVE, or PASSIVE
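
A minimal sketch of the clock comparison as an ordered tuple comparison, where the lower value wins for each attribute. The class and function names and the attribute values are illustrative; the topology part of the BMCA (StepsRemoved, port identities) is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: bytes  # 8-byte identifier, the final tiebreaker

    def rank(self):
        """Attributes in comparison order; a lower tuple is a better clock."""
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def best_clock(datasets):
    """Return the dataset that wins the attribute comparison."""
    return min(datasets, key=ClockDataset.rank)


# Hypothetical values: a GPS-locked grandmaster and a default ordinary clock.
gm = ClockDataset(128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("00a069fffe0b552d"))
oc = ClockDataset(128, 248, 0xFE, 0xFFFF, 128, bytes.fromhex("90e2bafffe20c7f8"))
print(best_clock([oc, gm]) is gm)
```

Because Priority1 is compared first, an operator can force a particular node to win by giving it a lower Priority1, regardless of its clock quality.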

This is only half of the BMCA: it determines which clock is the best in the
network. The BMCA also checks other values to determine the network topology
	1. StepsRemoved: the number of hops the frame has traversed
	2. SenderPortIdentity: a port's unique identifier (the
	first tie-breaker if all other attributes are equal)
	3. ReceiverPortIdentity: a port's unique identifier (the
	tie-breaker if all other attributes are equal)
	• This is needed e.g. in a multiport PTP device which
	receives Announce frames over multiple ports, to decide
	which port to synchronize to

The accuracy of the PTP system heavily depends on the accuracy of the timestamps

There are several levels where timestamps can be taken
    Application
        |
        |
    Network Stack
        |
        |
      Driver
        |
        |
       MAC
        |
        |
       PHY

  Accuracy differs: the higher in the network stack, the worse
  The timestamp point for Ethernet is the detection of the Start of Frame Delimiter (SFD) on the cable
  Most implementations take timestamps between the MAC and PHY, which needs hardware support

  When timestamping is done above the PHY, two values have to be compensated for
    The RX PHY delay has to be subtracted from the timestamp (taken too late)
    The TX PHY delay has to be added to the timestamp (taken too early)
  RX and TX PHY delays are not the same and have to be handled separately
    Otherwise asymmetries are introduced

  PHYs introduce additional jitter on the timestamps due to FIFOs and clock domains
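
A minimal sketch of the PHY delay compensation described above. The function names and the delay values are illustrative; real delays come from the PHY datasheet or from calibration.

```python
def compensate_rx(timestamp_ns, rx_phy_delay_ns):
    """RX timestamps are taken after the PHY (too late): subtract the delay."""
    return timestamp_ns - rx_phy_delay_ns


def compensate_tx(timestamp_ns, tx_phy_delay_ns):
    """TX timestamps are taken before the PHY (too early): add the delay."""
    return timestamp_ns + tx_phy_delay_ns


# Hypothetical, unequal PHY delays handled separately per direction.
RX_PHY_DELAY_NS = 300
TX_PHY_DELAY_NS = 100

print(compensate_rx(1_000_000, RX_PHY_DELAY_NS))  # time the SFD hit the cable
print(compensate_tx(2_000_000, TX_PHY_DELAY_NS))
```

Keeping the two delays as separate parameters avoids the asymmetry that a single averaged value would introduce.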

Adjust the frequency:
Master               Slave
|                      |
| --Announce-->        |
|                      |
|t1 --sync-->       t2 |
|   --sync followup--> |
|                      |
|t1' --sync-->      t2'|
|    --sync followup-->|
|                      |

1. The node which is Master sends a so-called «Sync» message and takes a timestamp (T1)
• A Sync message contains the timestamp when the Sync message was sent (T1)
• If a node can insert the sending timestamp (T1) into the Sync message on the fly, it is called a «OneStep» clock
2. If the node cannot insert the sending timestamp (T1) on the fly, it sends a so-called «FollowUp» message
• A FollowUp message contains the sending timestamp of the Sync message (T1)
• In this case the Sync carries either an estimate of the sending timestamp or 0
• This is called a «TwoStep» clock
3. The Slave node takes a timestamp when it receives the «Sync» message (T2)
• This timestamp is stored for later use
• No timestamp is taken when a FollowUp is received
4. The Master repeats the sending of a «Sync» and optional «FollowUp» at a defined interval
• Timestamps are taken on both sides again (T1' & T2' …)
After two «Sync» messages the Slave can calculate the frequency difference to its Master
• This is called «Drift»
• The calculation is as follows:
  Drift = ((T2'-T2) - (T1'-T1)) / (T1'-T1)
• This Drift can then be used to align the frequency of the Slave with that of the Master
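
A worked example of the drift formula, as a minimal sketch. The function name and the timestamps (in nanoseconds) are illustrative.

```python
def drift(t1, t2, t1_next, t2_next):
    """Relative frequency error of the Slave, from two consecutive Syncs."""
    master_interval = t1_next - t1
    slave_interval = t2_next - t2
    return (slave_interval - master_interval) / master_interval


# The Master sends two Syncs exactly 1 s apart. The Slave clock counts
# 1 s + 50 ns between the two receptions, so it runs 50 ppb fast.
d = drift(t1=0, t2=5_000, t1_next=1_000_000_000, t2_next=1_000_005_050)
print(f"{d * 1e9:.1f} ppb")
```

The constant 5 µs transmission delay cancels out because only the intervals are compared, which is why drift can be computed before the delay is known.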

Adjust the phase:
To adjust the phase, the timestamps of the sending (T1) and receiving (T2) of the «Sync» are used
• Unfortunately these two timestamps (T1 & T2) are not enough to calculate the phase
• If the Slave just subtracted T1 from T2 and adjusted by the result, the two nodes would still be off by the transmission delay
• The delay between the sending and the receiving of the «Sync» needs to be calculated first
• There are two modes to measure the delay:
  End to End (E2E) and Peer to Peer (P2P)

• The E2E delay mechanism measures the delay from the Slave to the Master
• The P2P delay mechanism measures the delay between two neighboring nodes only, independent of their states
• We will see later in detail how this works for a larger topology
• Both delay mechanisms assume a symmetrical transmission delay

Calculate the delay (E2E)
Master               Slave
|                      |
| --Announce-->        |
|                      |
|t1 --sync-->       t2 |
|   --sync followup--> |
|                      |
|t4<--delay req--   t3 |
|  --delay resp-->     |
|                      |
|t1' --sync-->      t2'|
|    --sync followup-->|
|                      |

1. The Master sends a «Sync» and an optional «FollowUp» message
• Timestamps are taken on both sides (T1 & T2)
2. Shortly after receiving the «Sync» message, the Slave sends a so-called «DelayReq» message and takes a timestamp (T3)
• The DelayReq does not contain any timestamp
• The Slave takes a timestamp (T3) when it sends the DelayReq and stores it
3. The Master receives the «DelayReq» message and takes a timestamp (T4)
• The Master takes a timestamp (T4) when it receives the DelayReq and stores it
4. The Master sends a so-called «DelayResp» message
• The DelayResp message contains the timestamp when the DelayReq was received (T4)
5. The Slave receives the «DelayResp» message
• No timestamp is taken when a DelayResp is received
• Now it can calculate the delay between the nodes
• The Master does not initiate any delay measurement
• The timestamps used for this measurement must come from the synchronized clock.
• After the Slave has received all frames it can calculate the delay
• The calculation is as follows:
  Delay = ((T4 - T1) - (T3 - T2)) / 2
• This delay is then used for adjusting the phase of the Slave
• This measurement is repeated at a defined interval
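
A worked example of the E2E delay formula, as a minimal sketch. The function name and the numbers (nanoseconds) are illustrative: a true one-way delay of 1000 ns and a Slave clock that is 500 ns ahead of the Master.

```python
def e2e_delay(t1, t2, t3, t4):
    """Mean path delay from the four E2E timestamps (symmetric path assumed)."""
    return ((t4 - t1) - (t3 - t2)) / 2


t1 = 0        # Master clock: Sync sent
t2 = 1_500    # Slave clock: Sync received (1000 ns delay + 500 ns offset)
t3 = 2_500    # Slave clock: DelayReq sent (true time 2000)
t4 = 3_000    # Master clock: DelayReq received (2000 + 1000 ns delay)

print(e2e_delay(t1, t2, t3, t4))
```

The Slave's 500 ns offset appears once with each sign and cancels, so the result is the pure path delay, provided both directions take the same time.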


Calculate the delay (P2P)
Master                      Slave
|                            |
| --Announce-->              |
|                            |
|t1 --sync-->       t2       |
|   --sync followup-->       |
|                            |
|t4<--pdelay req--  t3       |
|t5--pdelay resp--> t6       |
|--pdelay resp followup->    |
|                            |
|t3--pdelay req-->  t4       |
|t6<--pdelay resp-- t5       |
|<-pdelay resp followup--    |
|                            |
|t1' --sync-->      t2'      |
|    --sync followup-->      |
|                            |

1. The Slave sends a so-called «PDelayReq» message and takes a timestamp (T3)
• The PDelayReq does not contain any timestamp
• The Slave takes a timestamp (T3) when it sends the PDelayReq and stores it
2. The Master receives the «PDelayReq» message and takes a timestamp (T4)
• The Master takes a timestamp (T4) when it receives the PDelayReq and stores it
3. The Master sends a so-called «PDelayResp» message and takes a timestamp (T5)
• The Master takes a timestamp (T5) when it sends the PDelayResp
• The PDelayResp message contains either the timestamp when the PDelayReq was received (T4),
  in "TwoStep" mode, or, in "OneStep" mode, the delta between the timestamps of sending the
  PDelayResp (T5) and receiving the PDelayReq (T4), which is inserted on the fly
4. If the node cannot insert the delta (T5-T4) on the fly, it sends a so-called «PDelayRespFollowUp» message
• A PDelayRespFollowUp message contains the sending timestamp of the PDelayResp message (T5)
5. The Slave receives the «PDelayResp» message and takes a timestamp (T6)
• Now it can calculate the delay between the nodes
• No timestamp is taken when a PDelayRespFollowUp is received
• There are 3 options for getting the timestamps from the responder to the requestor:
	• PDelayResp(T5-T4), «OneStep»
	• PDelayRespFollowUp(T5-T4), «TwoStep»
	• PDelayResp(T4)
	  PDelayRespFollowUp(T5) , «TwoStep»
• In this example the Slave measured the delay to the Master; the same is also done from the Master to the Slave (between all peers)
• The timestamps used for this measurement can come from a different clock than the synchronized one.
After the Slave has received all frames it can calculate the delay
• The calculation is as follows:
  Delay = ((T6 - T3) - (T5 - T4)) / 2
• This delay is then used for adjusting the phase of the Slave
• This measurement is repeated at a defined interval
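
A worked example of the P2P delay formula, as a minimal sketch. The function name and the numbers (nanoseconds) are illustrative: a link delay of 400 ns, a responder turnaround of 2000 ns and a responder clock that is 7 ms away from the requestor's.

```python
def p2p_delay(t3, t4, t5, t6):
    """Link delay from the four peer delay timestamps (symmetric link assumed)."""
    return ((t6 - t3) - (t5 - t4)) / 2


RESPONDER_OFFSET = 7_000_000   # arbitrary difference between the two clocks

t3 = 10_000                            # requestor clock: PDelayReq sent
t4 = 10_400 + RESPONDER_OFFSET         # responder clock: PDelayReq received
t5 = t4 + 2_000                        # responder clock: PDelayResp sent
t6 = 12_800                            # requestor clock: PDelayResp received

print(p2p_delay(t3, t4, t5, t6))
```

Only the differences T6-T3 and T5-T4 enter the formula, each taken on a single clock, which is why the two peers do not need to be synchronized for this measurement.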

Adjust the phase:
Now that the Slave has calculated the delay it can calculate the phase
• This is called «Offset»
• The calculation is as follows:
  Offset = T2 - T1 - Delay
• This offset is then used for adjusting the phase of the Slave
• It doesn't matter which delay mechanism is used in this example
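
A worked example that combines the E2E delay and the offset formula, as a minimal sketch. The function names and numbers (nanoseconds) are illustrative: a true one-way delay of 1000 ns and a Slave clock that is 500 ns ahead.

```python
def e2e_delay(t1, t2, t3, t4):
    """Mean path delay from the four E2E timestamps."""
    return ((t4 - t1) - (t3 - t2)) / 2


def offset_from_master(t1, t2, delay):
    """Phase error of the Slave: Sync receive time minus send time minus delay."""
    return t2 - t1 - delay


t1, t2, t3, t4 = 0, 1_500, 2_500, 3_000
delay = e2e_delay(t1, t2, t3, t4)
offset = offset_from_master(t1, t2, delay)
print(delay, offset)
```

A positive offset means the Slave clock is ahead of the Master and has to be moved back by that amount.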


• Topology changes are considered every «Announce» interval
• Offset and drift are adjusted every «Sync» interval
• Delays are calculated every «Delay» interval
• «Announce», «Sync» and «Delay» intervals don’t have to be (and often are not) the same


Ordinary Clock (OC)
• A PTP node with only one port
• Can be Master or Slave
• If it is the best clock according to the BMCA it will act as Master otherwise as Slave

Grandmaster Clock (GM)
• An OC with either an external time source (GPS…) or a very high accuracy time source (atomic clock)
• Can only be Master
• If it is the best clock according to the BMCA it will act as Master, otherwise it will go into a passive state

Slave Only Clock (SO)
• A PTP node with only one port
• Can only be Slave
• If no Master capable device is in the network it will be just free running

Boundary Clock (BC)
• A PTP node with more than one port
• Can be Master or Slave
• Normally a Switch
• PTP frames are not forwarded through the Switch, the BC is source and Sink for all PTP frames
• Each port has its own state
• Slave on one port and Master on all other ports, or Master on all ports determined by the BMCA
• The BC synchronizes itself to a Master on its Slave port and distributes the time on all its Master ports
• On each port a different delay mechanism and frame rates can be used
• Normal Switches (without PTP) have a nondeterministic forwarding delay, which has a very bad influence on the accuracy
• No PTP messages (Announce, Sync, FollowUp, DelayReq, DelayResp, PDelayReq, PDelayResp, PDelayRespFollowUp) are forwarded by a Boundary Clock


Transparent Clock (TC)
• A PTP node with more than one port
• Stateless, does not take part in the BMCA
• Normally a Switch
• PTP frames are forwarded through the Switch (except P2P messages) and their
  residence time is added to the so-called «correction field» of the Sync and
  DelayReq in «OneStep» mode, or of the corresponding «FollowUp» or other
  non-time-critical messages in «TwoStep» mode
• On each port the same delay mechanism has to be used
• The TC might syntonize (frequency align) itself to the Master to calculate the residence times
• When in P2P mode the TC measures the delay on all its ports and adds the delay of the
  port where the Sync is received to the correction field
• All PTP messages except PDelayReq, PDelayResp and PDelayRespFollowUp are forwarded by a Transparent Clock

• P2P TC:
	• All link delays are measured on a peer to peer basis.
	• Sum of switch residence time and link delay along the path is reported to the Slave.

• E2E TC:
	• Delay measurement end to end between slave and master.
	• Sum of switch residence time along the path is reported to the slave.

• TC can also be «OneStep» or «TwoStep»
	• Residence time either on the fly added to Sync or later in the Follow Up

• «Correction Fields» of the «Sync» and «FollowUp» have to be added together at the Slave and T2 corrected accordingly
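
A minimal sketch of how a Slave could apply the correction fields before computing the offset. The function name and numbers are illustrative, and the corrections are assumed to be already converted to nanoseconds (on the wire the field is carried in scaled nanoseconds, i.e. multiplied by 2^16).

```python
def corrected_t2(t2_ns, sync_correction_ns, followup_correction_ns=0):
    """Remove the accumulated residence time (and link delay) from T2."""
    return t2_ns - (sync_correction_ns + followup_correction_ns)


# Hypothetical path through one two-step E2E TC: 1000 ns of cable delay,
# 3000 ns residence time in the switch, Slave clock 500 ns ahead.
t1 = 0
t2 = 1_000 + 3_000 + 500
t2c = corrected_t2(t2, sync_correction_ns=0, followup_correction_ns=3_000)

cable_delay = 1_000              # as measured by the delay mechanism
print(t2c - t1 - cable_delay)    # remaining offset of the Slave
```

Without the correction the 3000 ns residence time would be mistaken for clock offset, and it would change with the switch load.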


BC vs. TC
• BC
+ Different Delay mechanisms and message rates on each port
+ Can lower the network load for E2E
+ Can take over the Master role
- Higher complexity, requires a PTP stack
- Cascaded PI Servo Loops (e.g. bad for Daisy-Chain)
- No fast topology changes possible

• TC
+ No cascaded PI Servo Loops
+ Fast topology changes possible
+ Lower complexity (no BMCA, no Synchronization)
+ No PTP stack required
- Requires the same Delay mechanisms and message rates on each port
- Cannot take over the Master role

E2E vs. P2P
E2E
• Always from the Slave port to a Master port
• In case of a TC all nodes see the Delay messages from all other nodes, doesn’t scale well
+ Works with legacy Switches (no PTP support)
- High network load when TCs are used
- Doesn’t scale well
- Cannot handle topology changes seamlessly, has to measure the path to the new master first
- Most industrial profiles do not support E2E

P2P
• Between every two neighbor ports
• Each node sees only the Delay messages of its neighbor, scales well
+ Low network load when TCs or BCs are used
+ Scales well
+ Can handle topology changes seamlessly, all delays to all neighbors are pre-measured
+ Easier to combine with HSR/PRP
+ Most industrial profiles support P2P
- Doesn't work with legacy Switches (no PTP P2P support)

PTP over UDP/IPv4
• UDP Port 319 for Sync, DelayReq, PDelayReq & PDelayResp messages (time critical messages)
• UDP Port 320 for all other messages (non critical messages)
• IP Addr. 224.0.0.107 for PDelayReq & PDelayResp & PDelayRespFollowUp messages
• IP Addr. 224.0.1.129 for all others
• Ethertype 0x0800 for IP

PTP over UDP/IPv6
• UDP Port 319 for Sync, DelayReq, PDelayReq & PDelayResp messages (time critical messages)
• UDP Port 320 for all other messages (non critical messages)
• IP Addr. FF02:0:0:0:0:0:0:6B for PDelayReq & PDelayResp & PDelayRespFollowUp messages
• IP Addr. FF0x:0:0:0:0:0:0:181 for all others
• Ethertype 0x86DD for IPv6

PTP over 802.3
• MAC Addr. 01-80-C2-00-00-0E for PDelayReq & PDelayResp & PDelayRespFollowUp messages
• MAC Addr. 01-1B-19-00-00-00 for all others
• Ethertype 0x88F7 for PTP
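
A minimal sketch of the UDP/IPv4 addressing rules listed above as a lookup function. The function and constant names are illustrative; the message names follow the spelling used in these notes.

```python
# Time-critical (event) messages go to port 319, everything else to 320.
EVENT_MESSAGES = {"Sync", "DelayReq", "PDelayReq", "PDelayResp"}
# Peer delay messages use their own link-local multicast group.
PEER_DELAY_MESSAGES = {"PDelayReq", "PDelayResp", "PDelayRespFollowUp"}


def udp_ipv4_destination(message):
    """Return (multicast address, UDP port) for a PTP message type."""
    port = 319 if message in EVENT_MESSAGES else 320
    address = "224.0.0.107" if message in PEER_DELAY_MESSAGES else "224.0.1.129"
    return address, port


for name in ("Sync", "FollowUp", "PDelayReq", "PDelayRespFollowUp", "Announce"):
    print(name, udp_ipv4_destination(name))
```

A capture filter for all PTP traffic over UDP therefore needs both ports and, with P2P, both multicast groups.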

redhat DOC

Introduction

The Precision Time Protocol (PTP) is a protocol used to synchronize clocks in a network. 
When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy, 
which is far better than is normally obtainable with NTP. 
PTP support is divided between the kernel and user space. 
The kernel in Red Hat Enterprise Linux 6 now includes support for PTP clocks, which are provided by network drivers. 
The actual implementation of the protocol is known as linuxptp, a PTPv2 implementation according to the IEEE standard 1588 for Linux.

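** The kernel part of PTP is implemented mainly by the drivers, along with some other supporting code.
** The user-space part of PTP consists of ptp4l and phc2sys, which are included in the linuxptp package ("yum install linuxptp").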

The linuxptp package includes the ptp4l and phc2sys programs for clock synchronization. 

The ptp4l program implements the PTP boundary clock and ordinary clock. 
With hardware time stamping, it is used to synchronize the PTP hardware clock to the master clock, 
and with software time stamping it synchronizes the system clock to the master clock. 

The phc2sys program is needed only with hardware time stamping, 
for synchronizing the system clock to the PTP hardware clock on the network interface card (NIC).


        GPS
         |
         |
         |
        PTP Grandmaster
         |
         |
         |s
        Boundary Clock ------> Time Slave(ordinary clock (OC))
         |m
         |
         |s
        Boundary Clock ------> Time Slave(ordinary clock (OC))
         |m
         |
         |s
        Time Slave(ordinary clock (OC))


The clocks synchronized by PTP are organized in a master-slave hierarchy. 
The slaves are synchronized to their masters which may be slaves to their own masters. 
The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. 
When a clock has only one port, it can be master or slave, such a clock is called an ordinary clock (OC). 
A clock with multiple ports can be master on one port and slave on another, such a clock is called a boundary clock (BC). 
The top-level master is called the grandmaster clock, which can be synchronized by using a Global Positioning System (GPS) time source. 
By using a GPS-based time source, disparate networks can be synchronized with a high-degree of accuracy.
        

Advantages of PTP

One of the main advantages that PTP has over the Network Time Protocol (NTP) is hardware support present 
in various network interface controllers (NIC) and network switches. 
This specialized hardware allows PTP to account for delays in message transfer, and greatly improves the accuracy of time synchronization. 

While it is possible to use non-PTP enabled hardware components within the network, this will often cause an increase in jitter or 
introduce an asymmetry in the delay resulting in synchronization inaccuracies, 
which add up with multiple non-PTP aware components used in the communication path. 

To achieve the best possible accuracy, it is recommended that all networking components between PTP clocks are PTP hardware enabled. 
Time synchronization in larger networks where not all of the networking hardware supports PTP might be better suited for NTP.

With hardware PTP support, the NIC has its own on-board clock, which is used to time stamp the received and transmitted PTP messages. 
It is this on-board clock that is synchronized to the PTP master, and the computer's system clock is synchronized 
to the PTP hardware clock on the NIC. 

With software PTP support, the system clock is used to time stamp the PTP messages and it is synchronized to the PTP master directly. 

Hardware PTP support provides better accuracy since the NIC can time stamp the PTP packets at the exact moment they are sent and received 
while software PTP support requires additional processing of the PTP packets by the operating system.

In order to use PTP, the kernel network driver for the intended interface has to support either software or hardware time stamping capabilities.
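The driver must support either hardware or software timestamping.

If the NIC has a hardware clock, the PTP transmit and receive timestamps are generated by that hardware clock; the hardware clock is first synchronized to the master, and the system clock is then synchronized to the hardware clock.
If the NIC supports only software PTP, the PTP transmit and receive timestamps are generated by the driver from the system time, and the system time is synchronized directly to the master.

To check which kind of timestamping the NIC supports: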
# ethtool -T eth3
Time stamping parameters for eth3:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)

For software time stamping support, the parameters list should include:
SOF_TIMESTAMPING_SOFTWARE
SOF_TIMESTAMPING_TX_SOFTWARE
SOF_TIMESTAMPING_RX_SOFTWARE
For hardware time stamping support, the parameters list should include:
SOF_TIMESTAMPING_RAW_HARDWARE
SOF_TIMESTAMPING_TX_HARDWARE
SOF_TIMESTAMPING_RX_HARDWARE
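
A minimal sketch that applies the two checklists above to the text printed by ethtool -T. The function and constant names are illustrative, and the sample text is an abbreviated copy of the output shown earlier.

```python
import re

HARDWARE_FLAGS = {
    "SOF_TIMESTAMPING_RAW_HARDWARE",
    "SOF_TIMESTAMPING_TX_HARDWARE",
    "SOF_TIMESTAMPING_RX_HARDWARE",
}
SOFTWARE_FLAGS = {
    "SOF_TIMESTAMPING_SOFTWARE",
    "SOF_TIMESTAMPING_TX_SOFTWARE",
    "SOF_TIMESTAMPING_RX_SOFTWARE",
}


def timestamping_support(ethtool_output):
    """Report which timestamping modes the listed capabilities allow."""
    flags = set(re.findall(r"\((SOF_TIMESTAMPING_\w+)\)", ethtool_output))
    return {
        "hardware": HARDWARE_FLAGS <= flags,
        "software": SOFTWARE_FLAGS <= flags,
    }


sample = """
Capabilities:
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
"""
print(timestamping_support(sample))
```

The real text can be obtained with subprocess.run(["ethtool", "-T", "eth3"], capture_output=True, text=True).stdout on a machine where ethtool is installed.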

ptp4l

ptp4l uses hardware timestamping by default

# ptp4l -i eth3 -m
selected eth3 as PTP clock
port 1: INITIALIZING to LISTENING on INITIALIZE
port 0: INITIALIZING to LISTENING on INITIALIZE
port 1: new foreign master 00a069.fffe.0b552d-1
selected best master clock 00a069.fffe.0b552d
port 1: LISTENING to UNCALIBRATED on RS_SLAVE
master offset -23947 s0 freq +0 path delay       11350
master offset -28867 s0 freq +0 path delay       11236
master offset -32801 s0 freq +0 path delay       10841
master offset -37203 s1 freq +0 path delay       10583
master offset  -7275 s2 freq -30575 path delay   10583
port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
master offset  -4552 s2 freq -30035 path delay   10385

The master offset value is the measured offset from the master in nanoseconds. 
The s0, s1, s2 strings indicate the different clock servo states: s0 is unlocked, s1 is clock step and s2 is locked. 
Once the servo is in the locked state (s2), the clock will not be stepped (only slowly adjusted) unless 
the pi_offset_const option is set to a positive value in the configuration file (described in the ptp4l(8) man page). 
The freq value is the frequency adjustment of the clock in parts per billion (ppb). 
The path delay value is the estimated delay of the synchronization messages sent from the master in nanoseconds. 
Port 0 is a Unix domain socket used for local PTP management. Port 1 is the eth3 interface (based on the example above.) 
INITIALIZING, LISTENING, UNCALIBRATED and SLAVE are some of the possible port states, 
which change on the INITIALIZE, RS_SLAVE and MASTER_CLOCK_SELECTED events. 
In the last state change message, the port state changed from UNCALIBRATED to SLAVE 
indicating successful synchronization with a PTP master clock.
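
A minimal sketch for extracting the values described above from ptp4l log lines. The regular expression is written against the sample output shown in these notes; the function and key names are illustrative, and other linuxptp versions may format the line differently.

```python
import re

MASTER_OFFSET = re.compile(
    r"master offset\s+(-?\d+)\s+(s\d)\s+freq\s+([+-]?\d+)\s+path delay\s+(-?\d+)"
)


def parse_master_offset(line):
    """Return the fields of a 'master offset' line, or None for other lines."""
    match = MASTER_OFFSET.search(line)
    if match is None:
        return None
    offset, state, freq, delay = match.groups()
    return {
        "offset_ns": int(offset),
        "servo_state": state,       # s0 unlocked, s1 clock step, s2 locked
        "freq_ppb": int(freq),
        "path_delay_ns": int(delay),
    }


print(parse_master_offset("master offset  -7275 s2 freq -30575 path delay   10583"))
```

Collecting offset_ns only from lines with servo_state "s2" gives a simple picture of the steady-state synchronization quality.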

The ptp4l program can also be started as a service by running:
# service ptp4l start

When running as a service, options are specified in the /etc/sysconfig/ptp4l file. 
More information on the different ptp4l options and the configuration file settings can be found in the ptp4l(8) man page.

By default, messages are sent to /var/log/messages. 
However, specifying the -m option enables logging to standard output which can be useful for debugging purposes.

To enable software time stamping, the -S option needs to be used as follows:
# ptp4l -i eth3 -m -S

For ptp4l there is also a directive, summary_interval, to reduce the output and print only statistics, 
as normally it will print a message every second or so. 
The interval is specified as a power of two in seconds. For example, to reduce the output to every 1024 seconds (2^10), add the following line to the /etc/ptp4l.conf file:
  summary_interval 10

ptp4l Selecting a Delay Measurement Mechanism

There are two different delay measurement mechanisms and they can be selected by means of an option added to the ptp4l command as follows:
-P
The -P selects the peer-to-peer (P2P) delay measurement mechanism.
The P2P mechanism is preferred as it reacts to changes in the network topology faster, and may be more accurate in measuring the delay, than other mechanisms. 
The P2P mechanism can only be used in topologies where each port exchanges PTP messages with at most one other P2P port. 
It must be supported and used by all hardware, including transparent clocks, on the communication path.

-E
The -E selects the end-to-end (E2E) delay measurement mechanism. This is the default.
The E2E mechanism is also referred to as the delay “request-response” mechanism.

-A
The -A enables automatic selection of the delay measurement mechanism.
The automatic option starts ptp4l in E2E mode. It will change to P2P mode if a peer delay request is received.

Note
All clocks on a single PTP communication path must use the same mechanism to measure the delay. 
A warning will be printed when a peer delay request is received on a port using the E2E mechanism. 
A warning will be printed when an E2E delay request is received on a port using the P2P mechanism.

Specifying a Configuration File

No configuration file is read by default, so it needs to be specified at runtime with the -f option. For example:
~]# ptp4l -f /etc/ptp4l.conf

A configuration file equivalent to the -i eth3 -m -S options shown above would look as follows:
~]# cat /etc/ptp4l.conf
[global]
verbose               1
time_stamping         software
[eth3]

Using the PTP Management Client

The PTP management client, pmc, can be used to obtain additional information from ptp4l as follows:
~]# pmc -u -b 0 'GET CURRENT_DATA_SET'
sending: GET CURRENT_DATA_SET
        90e2ba.fffe.20c7f8-0 seq 0 RESPONSE MANAGMENT CURRENT_DATA_SET
                stepsRemoved        1
                offsetFromMaster  -142.0
                meanPathDelay     9310.0
~]# pmc -u -b 0 'GET TIME_STATUS_NP'
sending: GET TIME_STATUS_NP
        90e2ba.fffe.20c7f8-0 seq 0 RESPONSE MANAGMENT TIME_STATUS_NP
                master_offset              310
                ingress_time               1361545089345029441
                cumulativeScaledRateOffset   +1.000000000
                scaledLastGmPhaseChange    0
                gmTimeBaseIndicator        0
                lastGmPhaseChange          0x0000'0000000000000000.0000
                gmPresent                  true
                gmIdentity                 00a069.fffe.0b552d
Setting the -b option to zero limits the boundary to the locally running ptp4l instance. 
A larger boundary value will retrieve the information also from PTP nodes further from the local clock. 

The retrievable information includes:
	stepsRemoved is the number of communication paths to the grandmaster clock.
	offsetFromMaster and master_offset are the last measured offset of the clock from the master in nanoseconds.
	meanPathDelay is the estimated delay of the synchronization messages sent from the master in nanoseconds.
	If gmPresent is true, the PTP clock is synchronized to a master and the local clock is not the grandmaster clock.
	gmIdentity is the grandmaster's identity.
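
A minimal sketch that turns the name/value lines of a pmc response into a dictionary. It is written against the sample output shown above and assumes each field is printed as one name followed by one value; the function name is illustrative.

```python
def parse_pmc_fields(output):
    """Collect 'name value' lines from a pmc response into a dict of strings."""
    fields = {}
    for line in output.splitlines():
        parts = line.split()
        # Field lines have exactly two tokens; the 'sending:' line and the
        # response header have more and are skipped.
        if len(parts) == 2 and not parts[0].endswith(":"):
            fields[parts[0]] = parts[1]
    return fields


sample = """sending: GET CURRENT_DATA_SET
        90e2ba.fffe.20c7f8-0 seq 0 RESPONSE MANAGMENT CURRENT_DATA_SET
                stepsRemoved        1
                offsetFromMaster  -142.0
                meanPathDelay     9310.0
"""
fields = parse_pmc_fields(sample)
print(fields["stepsRemoved"], float(fields["offsetFromMaster"]))
```

Values are kept as strings because their types differ per field (integers, floats, booleans, identities); the caller converts the ones it needs.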

For a full list of pmc commands, type the following as root:
~]# pmc help

Additional information is available in the pmc(8) man page.

phc2sys

~]# phc2sys -h

usage: phc2sys [options]


 automatic configuration:
 -a             turn on autoconfiguration
 -r             synchronize system (realtime) clock
                repeat -r to consider it also as a time source
 manual configuration:
 -c [dev|name]  slave clock (CLOCK_REALTIME)
 -d [dev]       master PPS device
 -s [dev|name]  master clock
 -O [offset]    slave-master time offset (0)
 -w             wait for ptp4l
 common options:
 -f [file]      configuration file
 -E [pi|linreg] clock servo (pi)
 -P [kp]        proportional constant (0.7)
 -I [ki]        integration constant (0.3)
 -S [step]      step threshold (disabled)
 -F [step]      step threshold only on start (0.00002)
 -R [rate]      slave clock update rate in HZ (1.0)
 -N [num]       number of master clock readings per update (5)
 -L [limit]     sanity frequency limit in ppb (200000000)
 -M [num]       NTP SHM segment number (0)
 -u [num]       number of clock updates in summary stats (0)
 -n [num]       domain number (0)
 -x             apply leap seconds by servo instead of kernel
 -z [path]      server address for UDS (/var/run/ptp4l)
 -l [num]       set the logging level to 'num' (6)
 -t [tag]       add tag to log messages
 -m             print messages to stdout
 -q             do not print messages to the syslog
 -v             prints the software version and exits
 -h             prints this message and exits

The -a option causes phc2sys to read the clocks to be synchronized from the ptp4l application. 
It will follow changes in the PTP port states, adjusting the synchronization between the NIC hardware clocks accordingly. 

The system clock is not synchronized, unless the -r option is also specified. 
If you want the system clock to be eligible to become a time source, specify the -r option twice.

Alternatively, use the -s option to synchronize the system clock to a specific interface's PTP hardware clock. For example:

~]# phc2sys -s eth3 -w

The -w option waits for the running ptp4l application to synchronize the PTP clock and then retrieves the TAI to UTC offset from ptp4l.

Normally, PTP operates in the International Atomic Time (TAI) timescale, while the system clock is kept in Coordinated Universal Time (UTC). 
The current offset between the TAI and UTC timescales is 37 seconds (in effect since 1 January 2017). 
The offset changes when leap seconds are inserted or deleted, which typically happens every few years. 
When -w is not used, the -O option must be used to set this offset manually, as follows:

~]# phc2sys -s eth3 -O -37
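
The sign of the -O value is easy to get wrong, so the arithmetic is sketched below. This is only an illustration: both helper functions are hypothetical, the TAI reading is an invented value, and the offset is passed as a parameter because it changes with every leap second and should be verified for the current date.

```shell
#!/bin/sh
# tai_to_utc TAI_SECONDS TAI_UTC_OFFSET: print the UTC seconds value that
# corresponds to a TAI seconds value. Hypothetical helper for illustration.
tai_to_utc() {
    echo $(( $1 - $2 ))
}

# phc2sys_O_value TAI_UTC_OFFSET: the value to pass to "phc2sys -O" when the
# system clock (UTC, slave) follows a PHC (TAI, master): slave minus master.
phc2sys_O_value() {
    echo $(( 0 - $1 ))
}

tai_utc_offset=37            # assumed offset in seconds; verify before use
tai_seconds=1700000037       # hypothetical TAI reading

echo "UTC seconds: $(tai_to_utc "$tai_seconds" "$tai_utc_offset")"
echo "phc2sys -O argument: $(phc2sys_O_value "$tai_utc_offset")"
```

Because UTC lags TAI, the slave-master offset handed to -O is negative.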

Once the phc2sys servo is in a locked state, the clock will not be stepped, unless the -S option is used. 
This means that the phc2sys program should be started after the ptp4l program has synchronized the PTP hardware clock.
However, with -w, it is not necessary to start phc2sys after ptp4l as it will wait for it to synchronize the clock.

To reduce the output from phc2sys, call it with the -u option, where summary-updates is the number of clock updates to include in each summary statistics line:
~]# phc2sys -u summary-updates

Serving PTP Time With NTP

Use NTP to provide PTP time to other devices.
The ntpd daemon can be configured to distribute the time from the system clock synchronized by ptp4l or 
phc2sys by using the LOCAL reference clock driver. 
To prevent ntpd from adjusting the system clock, the ntp.conf file must not specify any NTP servers. 
The following is a minimal example of ntp.conf:

~]# cat /etc/ntp.conf
server   127.127.1.0
fudge    127.127.1.0 stratum 0

Serving NTP Time With PTP

NTP to PTP synchronization in the opposite direction is also possible. 
When ntpd is used to synchronize the system clock, ptp4l can be configured with 
the priority1 option (or other clock options included in the best master clock algorithm) 
to be the grandmaster clock and distribute the time from the system clock via PTP:

~]# cat /etc/ptp4l.conf
[global]
priority1 127
[eth3]
~]# ptp4l -f /etc/ptp4l.conf
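
Why priority1 127 is enough to win the election can be sketched as follows. This is a deliberately simplified model of the best master clock comparison: it looks only at priority1 (lower wins) and breaks ties on the clock identity, whereas the full algorithm also compares clockClass, clockAccuracy, variance and priority2 in between. The helper name and the third clock identity are hypothetical; 128 is the ptp4l default for priority1.

```shell
#!/bin/sh
# best_by_priority1: read "clockIdentity priority1" pairs on stdin and print
# the identity with the lowest priority1 (ties broken by identity).
# Simplified sketch, not the complete best master clock algorithm.
best_by_priority1() {
    sort -k2,2n -k1,1 | awk 'NR == 1 { print $1 }'
}

# Hypothetical candidates: two clocks at the default of 128, one set to 127.
candidates() {
cat <<'EOF'
00a069.fffe.0b552d 128
90e2ba.fffe.20c7f8 127
001b21.fffe.aabbcc 128
EOF
}

echo "grandmaster candidate: $(candidates | best_by_priority1)"
```

In this model the clock configured with priority1 127 is selected ahead of every clock left at the default.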

With hardware time stamping, phc2sys needs to be used to synchronize the PTP hardware clock to the system clock:
~]# phc2sys -c eth3 -s CLOCK_REALTIME -w

To prevent quick changes in the PTP clock's frequency, 
the synchronization to the system clock can be loosened by using smaller P (proportional) and I (integral) constants of the PI servo:

~]# phc2sys -c eth3 -s CLOCK_REALTIME -w -P 0.01 -I 0.0001
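
The effect of smaller constants can be seen with a one-step calculation. The sketch below implements a textbook PI update, not the linuxptp source, and the pi_step helper is hypothetical. It compares the correction produced for the same 1000 ns offset using the phc2sys defaults (0.7 and 0.3) and the loosened values from the command above.

```shell
#!/bin/sh
# pi_step OFFSET_NS INTEGRAL KP KI
# One update of a textbook PI controller (a simplification for illustration):
# prints "<correction> <new integral term>".
pi_step() {
    awk -v off="$1" -v integ="$2" -v kp="$3" -v ki="$4" 'BEGIN {
        integ = integ + ki * off          # accumulate the integral term
        adj   = kp * off + integ          # proportional part plus integral
        printf "%.1f %.1f\n", adj, integ
    }'
}

# Same 1000 ns offset, default constants versus the loosened ones.
echo "default  (P=0.7,  I=0.3):    $(pi_step 1000 0 0.7 0.3)"
echo "loosened (P=0.01, I=0.0001): $(pi_step 1000 0 0.01 0.0001)"
```

With the loosened constants the first correction is roughly a hundred times smaller, so the PTP clock's frequency changes much more gradually.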

Synchronize to PTP or NTP Time Using timemaster

timemaster can synchronize time from multiple PTP and NTP sources.
When there are multiple PTP domains available on the network, or fallback to NTP is needed, 
the timemaster program can be used to synchronize the system clock to all available time sources. 
The PTP time is provided by phc2sys and ptp4l via shared memory driver (SHM) reference clocks to 
chronyd or ntpd (depending on the NTP daemon that has been configured on the system). 
The NTP daemon can then compare all time sources, both PTP and NTP, and use the best sources to synchronize the system clock.

On start, timemaster reads a configuration file that specifies the NTP and PTP time sources, 
checks which network interfaces have their own or share a PTP hardware clock (PHC), 
generates configuration files for ptp4l and chronyd or ntpd, and starts the ptp4l, phc2sys, and chronyd or ntpd processes as needed. 
It will remove the generated configuration files on exit. It writes configuration files for chronyd, ntpd, and ptp4l to /var/run/timemaster/.

Red Hat Enterprise Linux provides a default /etc/timemaster.conf file with a number of sections containing default options. 
The section headings are enclosed in brackets.

~]$ less /etc/timemaster.conf
# Configuration file for timemaster

#[ntp_server ntp-server.local]
#minpoll 4
#maxpoll 4

#[ptp_domain 0]
#interfaces eth0

[timemaster]
ntp_program chronyd

[chrony.conf]
include /etc/chrony.conf

[ntp.conf]
includefile /etc/ntp.conf

[ptp4l.conf]

[chronyd]
path /usr/sbin/chronyd
options -u chrony

[ntpd]
path /usr/sbin/ntpd
options -u ntp:ntp -g

[phc2sys]
path /usr/sbin/phc2sys

[ptp4l]
path /usr/sbin/ptp4l


Notice the section named as follows:
[ntp_server address]
This is an example of an NTP server section; “ntp-server.local” is an example of a host name for an NTP server on the local LAN. 
Add more sections as required using a host name or IP address as part of the section name. 
Note that the short polling values in that example section are not suitable for a public server; 
see Chapter 22, Configuring NTP Using ntpd, for an explanation of suitable minpoll and maxpoll values.

Notice the section named as follows:
[ptp_domain number]
A “PTP domain” is a group of one or more PTP clocks that synchronize to each other. 
They may or may not be synchronized to clocks in another domain. Clocks that are configured with the same domain number make up the domain. 
This includes a PTP grandmaster clock. The domain number in each “PTP domain” section needs to correspond to one of the PTP domains configured on the network.
An instance of ptp4l is started for every interface that has its own PTP clock, and hardware time stamping is enabled automatically. 
Interfaces that support hardware time stamping have a PTP clock (PHC) attached; however, it is possible for a group of interfaces on a NIC to share a PHC. 
A separate ptp4l instance will be started for each group of interfaces sharing the same PHC and for each interface that supports only software time stamping. 
All ptp4l instances are configured to run as a slave. 
If an interface with hardware time stamping is specified in more than one PTP domain, then only the first ptp4l instance created will have hardware time stamping enabled.
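
The grouping rule above (one instance per shared PHC, plus one per software-only interface) can be sketched as a small counting script. The input format of "interface PHC-index" pairs is invented for this illustration, as are the helper names. On a real system the index is reported on the "PTP Hardware Clock" line of ethtool -T, which shows none for interfaces without a PHC.

```shell
#!/bin/sh
# count_ptp4l_instances: read "interface phc_index" pairs on stdin and print
# how many ptp4l instances the grouping rule would yield for one PTP domain.
# Hypothetical helper; "none" marks a software-time-stamping-only interface.
count_ptp4l_instances() {
    awk '$2 == "none"   { n++; next }             # one instance per such interface
         !($2 in seen)  { seen[$2] = 1; n++ }     # one instance per distinct PHC
         END            { print n + 0 }'
}

# Hypothetical host: eth0 and eth1 share PHC 0, eth2 has PHC 1, eth3 has none.
interfaces() {
cat <<'EOF'
eth0 0
eth1 0
eth2 1
eth3 none
EOF
}

echo "ptp4l instances: $(interfaces | count_ptp4l_instances)"
```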

Notice the section named as follows:
[timemaster]
The default timemaster configuration includes the system ntpd and chronyd configuration files (/etc/ntp.conf or /etc/chrony.conf) in order to 
include the configuration of access restrictions and authentication keys. That means any NTP servers specified there will be used with timemaster too.

The section headings are as follows:
[ntp_server ntp-server.local] — Specify polling intervals for this server. Create additional sections as required. Include the host name or IP address in the section heading.
[ptp_domain 0] — Specify interfaces that have PTP clocks configured for this domain. Create additional sections, with the appropriate domain number, as required.
[timemaster] — Specify the NTP daemon to be used. Possible values are chronyd and ntpd.
[chrony.conf] — Specify any additional settings to be copied to the configuration file generated for chronyd.
[ntp.conf] — Specify any additional settings to be copied to the configuration file generated for ntpd.
[ptp4l.conf] — Specify options to be copied to the configuration file generated for ptp4l.
[chronyd] — Specify any additional settings to be passed on the command line to chronyd.
[ntpd] — Specify any additional settings to be passed on the command line to ntpd.
[phc2sys] — Specify any additional settings to be passed on the command line to phc2sys.
[ptp4l] — Specify any additional settings to be passed on the command line to all instances of ptp4l.


To change the default configuration, open the /etc/timemaster.conf file for editing as root:
~]# vi /etc/timemaster.conf

For each NTP server you want to control using timemaster, create an [ntp_server address] section. 
Note that the short polling values in the example section are not suitable for a public server; 
see Chapter 22, Configuring NTP Using ntpd, for an explanation of suitable minpoll and maxpoll values.

To add interfaces that should be used in a domain, uncomment and edit the #[ptp_domain 0] section and add the interfaces. 
Create additional domains as required. For example:
[ptp_domain 0]
interfaces eth0

[ptp_domain 1]
interfaces eth1
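
Putting the pieces together, a configuration with two PTP domains and an NTP fallback could be drafted as in the sketch below. It writes the candidate file to a scratch location for review rather than to /etc/timemaster.conf. The host name and interface names are the placeholders used in this section and must be adapted to the actual network.

```shell
#!/bin/sh
# Write a candidate timemaster configuration to a scratch file for review.
# Host name and interface names are placeholders; adjust before use.
conf=$(mktemp)

cat > "$conf" <<'EOF'
# NTP fallback source
[ntp_server ntp-server.local]
minpoll 4
maxpoll 4

# One section per PTP domain present on the network
[ptp_domain 0]
interfaces eth0

[ptp_domain 1]
interfaces eth1

[timemaster]
ntp_program chronyd
EOF

# List the section headings that were written.
grep '^\[' "$conf"
```

After review, the relevant sections can be merged into /etc/timemaster.conf by hand.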

If ntpd is required as the NTP daemon on this system, change the default entry in the [timemaster] section from chronyd to ntpd. 
See Configuring NTP Using the chrony Suite for information on the differences between ntpd and chronyd.

If using chronyd as the NTP server on this system, add any additional options below the default include /etc/chrony.conf entry in the [chrony.conf] section. 
Edit the default include entry if the path to /etc/chrony.conf is known to have changed.

If using ntpd as the NTP server on this system, add any additional options below the default include /etc/ntp.conf entry in the [ntp.conf] section. 
Edit the default include entry if the path to /etc/ntp.conf is known to have changed.

In the [ptp4l.conf] section, add any options to be copied to the configuration file generated for ptp4l. 
This chapter documents common options and more information is available in the ptp4l(8) manual page.

In the [chronyd] section, add any command line options to be passed to chronyd when called by timemaster. 
See Configuring NTP Using the chrony Suite for information on using chronyd.

In the [ntpd] section, add any command line options to be passed to ntpd when called by timemaster. See Chapter 22, Configuring NTP Using ntpd for information on using ntpd.

In the [phc2sys] section, add any command line options to be passed to phc2sys when called by timemaster. 
This chapter documents common options and more information is available in the phc2sys(8) manual page.

In the [ptp4l] section, add any command line options to be passed to ptp4l when called by timemaster. 
This chapter documents common options and more information is available in the ptp4l(8) manual page.

Save the configuration file and restart timemaster by issuing the following command as root:
~]# service timemaster restart