Re: [Ntp] NTP over PTP

Heiko Gerstung <> Wed, 30 June 2021 13:43 UTC

Date: Wed, 30 Jun 2021 15:42:53 +0200
From: Heiko Gerstung <>
To: "" <>
Cc: Miroslav Lichvar <>

> On Tue, Jun 29, 2021 at 01:17:45PM +0200, Heiko Gerstung wrote:
>>> On Tue, Jun 29, 2021 at 09:15:22AM +0200, Heiko Gerstung wrote:
>>>>> Without full on-path support NTP should generally perform better than
>>>>> PTP as it doesn't assume the network has a constant delay.
>>>> Why do you think PTP assumes a constant network delay? PTP is measuring the
>>>> delay constantly in both directions and calculates the round trip.
>>> Yes, it does, but it is separate from the offset calculation.
>> Which is not the same as claiming that PTP assumes the "network has a
>> constant delay".
> Ok, let's try again. It is assumed to be constant in the interval
> between its measurement and the offset measurement. The delay is
> measured periodically in order to adapt to changes in network
> configuration and topology. It's not assumed to change frequently.

I understand your argument that the delay is assumed to be constant between two delay measurements, and I agree with it. PTP lets the client define the interval between those delay measurements, i.e. the client decides how often it wants to measure the client-to-server part of the round trip. In a network where delay is assumed to change infrequently, the delay measurements in the client-to-server direction can happen at longer intervals. However, if a client does not know how stable the network delay is, or expects it to be highly dynamic, it can send delay requests at the same rate as the sync messages (which are used to measure the server-to-client delay and calculate the offset). 

So PTP simply supports both cases: a network environment where delay changes are assumed to be infrequent (you choose a longer interval for the delay measurements and thereby save traffic and processing time), and a highly dynamic network environment where you need to perform the delay measurements more often. 

As I said earlier, this is conceptually inferior to the NTP delay measurement mechanism, which happens at the right time and ensures that the deltas t4-t1 and t3-t2 are each based on a single clock (client and server, respectively), which is not true in PTP. But it still works great ;-) ...
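For reference, the single-clock property discussed above falls straight out of the standard NTP on-wire calculation (RFC 5905): t1/t4 are read from the client clock, t2/t3 from the server clock, so each delta is taken on one clock only. A minimal sketch:

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (RFC 5905, sec. 8).

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds).
    (t4 - t1) uses only the client clock and (t3 - t2) only the
    server clock, so a constant clock offset cancels out of the
    round-trip delay.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Example: server clock 5 ms ahead, symmetric 10 ms one-way delay,
# 1 ms server processing time:
off, dly = ntp_offset_delay(0.000, 0.015, 0.016, 0.021)
# off = 0.005 s, dly = 0.020 s
```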
>>> The calculation is described in section 11.2 of 1588-2019. It uses
>>> <meanDelay> and there is only the TX and RX timestamp of the sync
>>> message like in the NTP broadcast mode. If the distribution of the
>>> actual delay is not symmetric, as is common without full on-path
>>> support, the average error of the measurements will not even get close
>>> to zero. PTP relies on full hardware support. Without that, it
>>> generally cannot perform as well as NTP.
>> Wrong, even without full on-path support unicast PTP uses delay
>> requests/responses to take the client-to-server delay into consideration as
>> well. See IEEE1588-2019 Subsection 11.3 for a description of how this works.

> Well, yes, but the delay measurement is separate from the offset
> measurement. If you can log and plot the offsets measured by PTP and
> NTP in a network without PTP support, you will see how the
> distribution is different due to the different calculation.

This depends, as pointed out earlier, on the network itself. If you have highly dynamic network paths where traffic patterns change quickly, resulting in wildly varying queueing and forwarding delays, you will see slightly different delay distributions in NTP and PTP logs. In less dynamic environments you will probably not notice anything of the kind. And in networks with partial or full on-path support, you will see that PTP performs great despite its inferior delay measurement approach. 
>>> Another issue with using PTP in network without PTP support is RX
>>> timestamping fixed to the beginning of the message. If the server is
>>> on a 1Gb/s link and the PTP client is on a faster link, there will be
>>> an asymmetry of hundreds of nanoseconds due to the asymmetric delay
>>> in forwarding of messages between different link speeds.
>> Yes, there are implementations which take that into account by applying
>> static correction values to compensate for link speed asymmetry. I believe this
>> also affects NTP, but in most cases hundreds of nanoseconds are not a problem
>> for applications relying on NTP synchronization.
> NTP is much less impacted as it timestamps the end of the reception.
> A software timestamp is captured after the packet is received.

That is a very confusing argument. 

We want to measure the delay of a server-to-client message (sync packets in PTP, NTP responses in NTP) in order to be able to compensate for the delay. There are three contributors to the error budget that you want to eliminate:

a) the latency of the packet in the server, i.e. from the point the software daemon wrote the timestamp into the packet until it actually goes out on the wire via the NIC port

b) the latency of the network, i.e. the time it takes the packet to travel from the server's network port to the network port of the client (passing through an arbitrary number of switches and routers, which all contribute to the delay in addition to the mostly static delay of the cable)

c) the latency of the packet in the client, i.e. from the point it arrived at the network port of the client until it reaches the software client.

Both (a) and (c) depend on a lot of factors, a few of them fairly static (CPU power, kernel + driver and network stack code) and some highly dynamic (current workload of the system, activity on unrelated hardware, e.g. disk access). Using hardware time stamping right at the network port eliminates the highly dynamic part of (a) and (c). In addition, commercial PTP products also calibrate the PHY delays for both the TX and RX paths and use the measured values to correct for those delays (this also addresses PHY delay asymmetry issues). The reason why VMs are not easily synchronized is mostly (a) and (c) and the fact that for a VM there are zero static factors.

If (b) were only a direct cable between the server and the client, it would be *very* static. If it consists of multiple hops and switches/routers, it can be highly dynamic too. The PTP concept of a transparent clock largely eliminates the dynamic part of (b) by measuring the delta between a packet arriving at a router and leaving it on another interface. That delta is added to the correction field by every switch/router the packet passes, and the client at the end can subtract the resulting total from its calculated delay. That way PTP eliminates most of the variable part of the network. 
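The transparent-clock correction described above can be sketched as follows; the function and variable names are illustrative, not taken from the IEEE 1588 data structures:

```python
def corrected_sync_delay(t_master_tx_ns, t_client_rx_ns, correction_field_ns):
    """Sketch of end-to-end transparent-clock handling for a Sync message.

    Each switch on the path adds its residence time (packet egress
    minus packet ingress) to the correctionField. The client subtracts
    the accumulated value from its raw delay measurement, leaving
    mostly the static propagation delay of the cables. All values
    in nanoseconds.
    """
    raw_delay = t_client_rx_ns - t_master_tx_ns  # includes switch queueing
    return raw_delay - correction_field_ns       # residence time removed

# Two switches held the packet for 1500 ns and 2500 ns respectively;
# the remaining 1000 ns is the (mostly static) cable delay:
residual = corrected_sync_delay(0, 5000, 1500 + 2500)
```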

In a network without any PTP support, hardware time stamping would still deal with (a) and (c) and would improve the accuracy. 

Please note that PTP typically uses the hardware clock implemented in the PTP network interface instead of the software clock maintained by the OS. This way you entirely remove (a) and (c) from the error budget. 

When we connect one of our grandmaster clocks (PTP server) to one of our PTP client devices over a direct crossover cable or fiber link, we can measure an offset of less than 20ns between the PPS generated by the GNSS synchronized master clock and the client. 

> Hardware timestamps are transposed as described in this document:
> In my testing with several different switches the ideal point of
> the transposition was around the beginning of the FCS or a couple of octets
> before it, instead of the end. I think the explanation is that it
> compensates for the preamble. In either case the error was much
> smaller than if it was not transposing at all.

The good thing here is that for a given link speed the delay from octet 1 to octet n of a received frame is pretty static. That means you can timestamp wherever you want (as soon as you have identified that it is a packet that needs to be timestamped) if you calibrate your timestamper engine to add that static delay. Some timestamper engines also simply take a timestamp for every packet at the very beginning and decide whether to store it in their TS queue once they have identified that it is a PTP event message. 
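The static correction is simple arithmetic: at 1 Gb/s one octet takes 8 ns on the wire, so the offset between the start of the frame and the actual timestamp point is a constant per link speed. A small sketch (names are mine, not from any timestamper API):

```python
def timestamp_point_offset_ns(octets_into_frame, link_speed_bps):
    """Static wire-time between the first octet of a frame and the
    octet at which a timestamper engine actually samples, for a given
    link speed. This is the constant a calibrated engine adds back so
    that all timestamps refer to the start of the frame.
    """
    return octets_into_frame * 8 * 1e9 / link_speed_bps

# Sampling 64 octets into the frame on a 1 Gb/s link:
corr = timestamp_point_offset_ns(64, 1_000_000_000)  # 512.0 ns
```

The same arithmetic explains the link-speed asymmetry mentioned earlier: the identical octet position corresponds to a different wire-time on a 1 Gb/s link than on a 10 Gb/s link.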
>> There are more challenges I see for NTS-over-PTP. You need to synchronize the
>> clock of the hardware timestamper itself, i.e. getting the time into the
>> silicon that creates the timestamp. PTP timestamps are TAI (not UTC), which
>> itself is not a problem as long as you know the TAI-UTC offset. On a server
>> (PTP Grandmaster) this is typically done by using some form of hardware sync
>> for the timestamper engine, e.g. setting the ToD to the upcoming TAI second and
>> then using the PPS to zeroize the fractions. In reality the solution is typically
>> more sophisticated as you do not want to see micro timesteps at the start of
>> every second.

> The same approach can be used with an NTP implementation. The hardware
> can keep time in TAI as long as the TAI-UTC offset is known.

Definitely yes, it needs to be implemented but is certainly solvable. 
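The TAI-to-UTC conversion itself is trivial as long as the offset is tracked; PTP distributes it in the Announce message (currentUtcOffset). A minimal sketch, assuming the offset of 37 s that has been in effect since 2017:

```python
# currentUtcOffset as carried in PTP Announce messages; 37 s since
# 2017-01-01 and must be updated whenever a leap second is inserted.
TAI_UTC_OFFSET_S = 37

def tai_to_utc(tai_seconds):
    """Convert a TAI timestamp (as kept by PTP hardware) to UTC.
    Valid only while TAI_UTC_OFFSET_S matches the current leap
    second count.
    """
    return tai_seconds - TAI_UTC_OFFSET_S
```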

>> On a client you have to synchronize your system time with the time of the hw
>> timestamper (e.g. the NIC). That time is synchronized by the hardware itself to
>> the PTP server. ptp4l uses phc2sys for this, but I am not sure about the
>> accuracy with which you can read out the PHC clock and correct the OS clock
>> with it. There is a delay when accessing a NIC over the PCI(e) bus, but this is
>> affecting PTP in the same way. So for the client, you should be on par with PTP
>> in this regard.
> I don't see a difference between PTP and NTP in this aspect. You can
> use the protocol to synchronize the NIC or the system clock, directly
> or indirectly. The PCIe latency is an issue for the system clock
> either way. Some NICs support PTM, which is an NTP-like protocol for
> PCIe with hardware timestamping, which can be used to avoid the error
> due to asymmetric PCIe latency.

Yes, I agree. Same for NTP and PTP. 
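The PHC read-out mentioned above is commonly done by bracketing the (slow, PCIe-bound) hardware clock read between two system clock reads, in the style of the PTP_SYS_OFFSET ioctl that phc2sys uses. A sketch, where `read_phc` stands in for the actual hardware access (hypothetical here):

```python
import time

def phc_system_offset(read_phc):
    """Estimate the offset between a NIC's PTP hardware clock (PHC)
    and the system clock. `read_phc` is a caller-supplied function
    returning the PHC time in seconds; the two surrounding system
    clock reads bound the PCIe round-trip uncertainty, and the PHC
    read is assumed to land at the midpoint of that bracket.
    """
    t_sys1 = time.clock_gettime(time.CLOCK_REALTIME)
    t_phc = read_phc()
    t_sys2 = time.clock_gettime(time.CLOCK_REALTIME)
    return t_phc - (t_sys1 + t_sys2) / 2.0
```

Averaging several such samples and discarding those with a wide bracket (large t_sys2 - t_sys1) is the usual way to suppress outliers caused by bus contention.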
>> But for a server you have to find a NIC that supports feeding the PPS of your
>> GNSS receiver (for example) to it, not impossible but also not an easy task for
>> someone who is responsible for maintaining highly accurate synchronization for
>> an entire corporate network.
> Same applies to both NTP and PTP. The Intel I210 is a popular NIC for
> these use cases.

Yes, for sure. Requires a lot of tinkering but it is solvable, too. 

>> The next challenge is on the server, which for unicast PTP requires a certain
>> timestamp queue size to support a usable number of clients. A lot of NICs that
>> claim they have IEEE1588 hardware support have small to tiny ts queue sizes,
>> one common example is 4 timestamps. That means you have to be able to read out
>> the hardware timestamps very quickly and you will not really have a chance on
>> high speed links with hundreds and thousands of incoming NTS-over-PTP requests
>> per second.  Those hardware timestamping engines have been designed to be used
>> for PTP clients only, and even then not for the high packet rates that PTP
>> supports (and sometimes requires to improve accuracy over partial
>> on-path-support networks). They cannot be used for servers expecting to handle
>> a high packet rate.
> Isn't that an issue for both NTP and PTP unicast using high rate sync
> and delay requests?

Yes, but unicast PTP servers with hardware time stamping are typically commercial products with dedicated network hardware to support the large number of timestamps that have to be stored and processed. The I210 has a time stamping queue length of 4, i.e. it stores a maximum of 4 timestamps in a ring buffer. A unicast PTP server with 10 clients running at 128 sync and delay pkt/s has to handle 1,280 time stamps per second in each direction. Even on some serious hardware, a significant number of timestamps will be lost to queue overruns.  
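The arithmetic behind the overrun concern, counting the server's TX timestamps (sent Sync messages) and RX timestamps (received Delay_Req messages) separately:

```python
def server_timestamp_rate(clients, sync_rate_hz, delay_req_rate_hz):
    """Timestamps/second a unicast PTP server must retrieve: one TX
    timestamp per Sync sent plus one RX timestamp per Delay_Req
    received, per client."""
    return clients * (sync_rate_hz + delay_req_rate_hz)

rate = server_timestamp_rate(10, 128, 128)  # 2,560/s (1,280 each way)

# With a 4-entry ring buffer (e.g. the I210), the driver must drain
# the queue within 4 / rate seconds on average to avoid overruns:
drain_budget_s = 4 / rate  # 1.5625 ms
```

At the 2,048 clients per port mentioned below, the same formula gives over half a million timestamps per second, which is why a 4-entry queue cannot serve that role.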

Our devices support 2,048 PTP clients on every Gigabit port, each client running at 128 sync pkt/s and 128 delay req pkt/s. This is only possible by designing your own hardware time stamping engine. What I wanted to say is that it will be very hard to build an NTS-over-PTP server which supports a reasonable number of NTS clients. Not impossible, of course. But a vendor will try to calculate whether the effort is worth it or not. There are solutions in the field that support hardware time stamping for NTP (our own products can do that, no need to put the NTP packets into a unicast PTP packet), and there are also solutions for hardware NTS (the Netnod implementation for example). 

The choice seems to be either to convince switch vendors and NIC manufacturers to add hardware time stamping support for NTP(+NTS), or to convince PTP server vendors to implement NTS-over-PTP to make use of the PTP hardware time stamping capabilities of the NICs and maybe also the switches. The latter seems to be the easier task, but without success there, the approach might be a dead end. 

>> Finally, I am not sure if IEEE1588 would be happy about an IETF standard
>> "hijacking" one of their protocols, but most probably they cannot do anything
>> about it. Personally I think it is a hack and should not be standardized, but
>> that's just me. I would rather like to see some standard way of flagging an
>> Ethernet frame that I send out to trigger a hardware timestamping engine to
>> timestamp that frame. Such a universal approach could be used by NTP, PTP and
>> other protocols and applications as well (not only time sync protocols), for
>> example to measure network propagation delays etc. It is incredibly hard to get
>> support for this into the silicon of companies like Intel or Broadcom etc., but
>> if it would be universal enough, the chances are higher that it will make its
>> way into products eventually.
> I think the best approach is for the hardware to timestamp all packets
> as many NICs already do. The problem is with existing hardware that
> cannot do that. I agree it doesn't look great when you have to run NTP
> over PTP, but that seems to be the only way to get the timestamping
> working on this hardware.

I believe there should be a standard that adds a hardware timestamp to the end of every Ethernet frame. It requires a NIC vendor to implement a hardware clock and a time stamping engine in their silicon. The bandwidth between the NIC and the OS layer (driver) has to accommodate the extra data (though you could of course reduce the MTU). 

I know from personal experience how long it took to convince the big NIC manufacturers to add something like that to their products. I do not know how long it takes IEEE 802.3 to standardize something like this, but until that has been done, those manufacturers will definitely not do it, IMHO. Bottom line: I will not hold my breath ...


Heiko Gerstung 
Managing Director 
MEINBERG® Funkuhren GmbH & Co. KG 
Lange Wand 9 
D-31812 Bad Pyrmont, Germany 
Phone: +49 (0)5281 9309-404 
Fax: +49 (0)5281 9309-9404 
Amtsgericht Hannover 17HRA 100322 
Geschäftsführer/Management: Günter Meinberg, Werner Meinberg, Andre Hartmann, Heiko Gerstung 
Do not miss our Time Synchronization Blog:
Connect via LinkedIn: