Re: [Ntp] CLOCK_TAI (was NTPv5: big picture)

Martin Burnicki <martin.burnicki@meinberg.de> Fri, 08 January 2021 14:53 UTC

To: Magnus Danielson <magnus@rubidium.se>, Miroslav Lichvar <mlichvar@redhat.com>
Cc: ntp@ietf.org
References: <20210102081603.1F63C40605C@ip-64-139-1-69.sjc.megapath.net> <cecaf661-92af-8b35-4c53-2f025c928144@rubidium.se> <20210104164449.GE2992437@localhost> <b1e61f7d-6cea-5e99-69f0-7eae815d9e19@rubidium.se> <20210105083328.GA3008666@localhost> <ba5d2cde-6b5e-d9b6-1877-c4060bf43e80@rubidium.se> <f8a1b9fa-887f-3402-d6e9-19dd4fa98e33@meinberg.de> <75348282-d6aa-e1f1-0ab1-4dfbc1379ff4@rubidium.se> <39e28d2c-454d-43f1-ee58-b136187212b1@meinberg.de> <f1592fa2-3922-e2ac-a9d4-6dfccaa17c36@rubidium.se> <b835a9bf-510d-c1a4-52f7-29607cff3a5b@meinberg.de> <881dd23a-39a4-c5a8-04f3-bc8686aa7ccb@rubidium.se>
From: Martin Burnicki <martin.burnicki@meinberg.de>
Organization: Meinberg Funkuhren GmbH & Co. KG, Bad Pyrmont, Germany
Message-ID: <c0a8d6f2-0346-94be-d3d1-113443047246@meinberg.de>
Date: Fri, 08 Jan 2021 15:52:59 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0
MIME-Version: 1.0
In-Reply-To: <881dd23a-39a4-c5a8-04f3-bc8686aa7ccb@rubidium.se>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/gK_33YisatRC06FBMYgwERsbOLE>
Subject: Re: [Ntp] CLOCK_TAI (was NTPv5: big picture)

Magnus,

Magnus Danielson wrote:
> On 2021-01-07 17:20, Martin Burnicki wrote:
>> My opinion is that if you simplify here, you just shift the problem
>> elsewhere, and you probably get even more problems if you have to care
>> about that.
> 
> But that is the whole point of the exercise! For sure the issue do not
> disappear, but the hope is that it can move into a corner where it makes
> less harm.

If you are not involved in the corner where it is shifted then it's fine
for you, but not for those involved in the corner.

For example, look at the specs of the IRIG codes. I'm sure you know that
the most commonly used IRIG codes (e.g. B122) only provided a
day-of-year number and the time of day, but the original IRIG specs
didn't even provide a way for a receiver to find out whether the
transported time was something like UTC, or some local time with an
offset to UTC.

This is why in 1995 the IEEE 1344 standard was released, which used some
"reserved" bits of the B122 format to provide the time code receivers
with some useful information, e.g. a 2 digit year number, a local time
offset that could be used to derive UTC from the transported local time,
and even some flags like a DST indicator and a leap second warning.

This extension was (and still is) very useful if you do synchronization
using these time codes. However, 10 years later, when IEEE 1344 was well
established and known, the specs were revised, and some folks decided that
the way the time offset is applied to convert local time to UTC
had to be reversed because in other (unrelated) scenarios the offset would
also have to be subtracted instead of being added (or vice versa, but
that doesn't matter here).

Anyway, the revised standard was called C37.118 according to the new
IEEE naming scheme. The information transported in the code was the
same as before, plus some more details, but the *existing*
information had to be handled exactly the other way round.

Even worse, some manufacturers adopted C37.118 but still called it "1344
extensions", which resulted in a huge mess.
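
As a rough illustration of why this matters (a sketch only, not taken from
either standard): the function name and the "add"/"subtract" labels below
are hypothetical, and which convention belongs to which standard is
deliberately left open, as in the text above. The point is that a receiver
applying the wrong convention is off by twice the transported offset.

```python
from datetime import datetime, timedelta


def utc_from_coded_time(coded, offset, convention):
    """Derive UTC from a time-code value and its transported offset.

    'convention' is a hypothetical label for how the offset is applied;
    it is not a field defined by IEEE 1344 or C37.118.
    """
    if convention == "add":
        return coded + offset
    if convention == "subtract":
        return coded - offset
    raise ValueError("unknown convention: %r" % (convention,))


# Same bits on the wire, two possible readings.
coded = datetime(2021, 1, 8, 15, 52, 59)
offset = timedelta(hours=1)

as_add = utc_from_coded_time(coded, offset, "add")
as_subtract = utc_from_coded_time(coded, offset, "subtract")

# The two interpretations disagree by exactly twice the offset.
print(as_add - as_subtract)
```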

But guess who had to take care of customers and other users, explain
to them why "1344" suddenly worked differently than before, and how they
could solve their problems?

It was *not* the guys who had written and changed the specs. It was the
guys that provide devices and services who had to provide support for
these "corner cases".

I really wouldn't appreciate it if this were repeated with NTP, where yet
another version works completely differently from earlier versions, and
manufacturers and service providers have to take care of it and teach the users.

As I said before, if you want to make such basic changes to a protocol
that has long existed and is known to use UTC time stamps, you
should give a very different name to the beast.

>>>>>>> Will that satisfy all needs? Maybe not. OK, but will this provide a
>>>>>>> vehicle for more variants. Seems likely if we have a standard way of
>>>>>>> augmenting the core timing with additional parameters and users add the
>>>>>>> mapping.
>>>>>> That's also fine, but by default it should IMO be possible to provide
>>>>>> simple clients with time in a way as compatible as possible.
>>>>> Indeed. I am confident that the client can be very simple and provide
>>>>> the right time, even with leap seconds occurring.
>>>> Of course. But the limitation is in the server. Right now, it is
>>>> sufficient to have a time source like DCF77. If you enforce using of
>>>> TAI, that's not sufficient anymore.
>>>>
>>>> You only need the UTC/TAI offset if your system supports TAI, but there
>>>> are many applications where this isn't supported, and where it's not
>>>> even a requirement.
>>> Actually, your logic is NTPv4 centric.
>> No, my logic is based on the application and usability, for systems with
>> high requirements as well as for simply systems, of which there are lots.
> How then is leap-seconds handled, easily. I fail to see that being done
> easily.

As has been pointed out before, you only shift the problem elsewhere but
don't solve it by using TAI.

Even if you do all the computation in TAI, and adjust the TAI system
clock, what benefit does an application that needs time based on UTC get?

How is the kernel improved by this so that applications don't see a time
step back in CLOCK_REALTIME when a leap second is inserted?
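
To make the question concrete, here is a minimal sketch of the conversion
from a TAI-like second count to POSIX seconds around the leap second at the
end of 2016. It assumes a TAI-like count that uses the POSIX epoch (so it is
simply POSIX time plus the TAI-UTC offset), and it assumes the convention
that the leap second repeats 23:59:59. Other systems map the leap second
differently. All names are made up for the illustration.

```python
LEAP_POSIX = 1483228800   # 2017-01-01T00:00:00Z in POSIX seconds
TAI_UTC_BEFORE = 36       # TAI-UTC before the leap second
TAI_UTC_AFTER = 37        # TAI-UTC after the leap second


def posix_from_tai(tai):
    """Return (posix_seconds, in_leap_second) for an integer TAI-like count.

    Assumption: the leap second 23:59:60 is reported as a repeated
    23:59:59, plus a status flag.
    """
    if tai >= LEAP_POSIX + TAI_UTC_AFTER:
        return tai - TAI_UTC_AFTER, False
    if tai == LEAP_POSIX + TAI_UTC_BEFORE:
        # This TAI second is the inserted leap second.
        return LEAP_POSIX - 1, True
    return tai - TAI_UTC_BEFORE, False


# The TAI-like count increases by one each step; the POSIX value does not.
for tai in range(LEAP_POSIX + 34, LEAP_POSIX + 39):
    posix, leap = posix_from_tai(tai)
    print(tai, posix, "leap second" if leap else "")
```

The TAI-like count never repeats, but the POSIX value derived from it still
shows the same number twice. An application that only sees the POSIX value,
without the flag, cannot tell the two seconds apart, whichever scale is used
underneath.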

>>> Now, I tried to make very clear
>>> it would push requirements onto servers if you would take the suggested
>>> path. I'm not going to hide that, rather, I want people to understand
>>> that this would be the logical consequence. However, once that is taken,
>>> the other parts would become much easier.
>> At the server side, if you want or need to provide time synchronization
>> for TAI-based systems and for systems that don't need/use it, in any
>> case you have to take care to get a timestamp and TAI/UTC offset
>> *consistently*.
>>
>> The only question is whether you put TAI into the base packet and apply
>> the offset to yield UTC, or vice versa, where IMO the latter is much
>> more appropriate for simple systems.
> 
> Except that when you do your core processing, you need that to handle
> the occurrence of leap-seconds in all it's processing.

Maybe, but it shouldn't be too hard if you have consistent time stamps,
and it also works like before on systems that don't support TAI.

> If you use a
> TAI-like time-base in the core. There is a whole line of checks and
> balances in that core which just never add to their complexity... in a
> simple system. It's this which is the actual point of achieving that.
> But then that will come at a cost. It will have the side-consequence of
> servers needing to know what to do. Simple clients will shift some of
> their complexity from core processing to the output adaptations, sure.
> It is only by looking at all those checks and balances we can make the
> informed decision.
> 
>>
>>> If you attempt to go the other route, in which you have a proliferation
>>> of how many time-scales servers can support, you then push out to
>>> intermediary and clients nodes to handle the hurdles. There are many
>>> part of the algorithms we are used to use that will become complex, or
>>> you would have to push to the users only to choose servers with
>>> compatible time-scales, which would be even worse as it would deepen the
>>> division we already seeing.
>>>
>>>>>>> Sure, I am advocating for a particular part of the solution space. But I
>>>>>>> think it is doable without any of the parts becoming cumbersomely
>>>>>>> complex to test and verify. An example is how PTP has extension fields
>>>>>>> extended by SMPTE 2059-2 and the various output mappings documented
>>>>>>> separately in SMPTE 2059-1 to show how the transported parameters
>>>>>>> generate all the legacy timing things. SMPTE 2110-10 then extends SMPTE
>>>>>>> 2059-2 and 2059-1 for the application within the SMPTE 2110 transport
>>>>>>> protocols and timing model of media used there. Anyway, I think it is a
>>>>>>> fair model that seems to work.
>>>>>> If a *simple* client needs to do this just to derive UTC (what's
>>>>>> probably the case for most not telco devices) than it's a wrong approach
>>>>>> to provide TAI and derive UTC from it.
>>>>> Well, it may seem so at first, but if we taken on the burden of getting
>>>>> TAI and TAI-UTC and transport that, the client can be very simple and
>>>>> provide TAI, UTC, POSIX, LINUX, PTP, GPS time scale replicas using very
>>>>> little code or complexity. The mappings provided will not require many
>>>>> pages and it will be easy to implement them all.
>>>>>
>>>>> Check your inbox for a separate example.
>>>> I've seen it. I know conversion between different time scales can be
>>>> done easily *if* all required information is available. I've also
>>>> written lots of functions that convert between different time scales.
>>> Yes, I know you know that field very well. I just wanted to be explicit
>>> so we where agreeing what we where talking about.
>>>> AFAIK, there's no current OS that doesn't support UTC, but there are
>>>> systems that don't support TAI. So why not use the time scale that is
>>>> most supported, and support other scales if they are supported/required?
>>> Exactly what do you mean with "support UTC"? Does all OSes you know
>>> support time as 23:59:60Z? If they don't, they do not fall into "support
>>> UTC" in my book. Some do support mechanisms that enables user layer to
>>> print 23:59:60Z, but far from all. Those that does just renumber UTC
>>> leap second to either 23:59:59 or 00:00:00 with no other indication does
>>> not support UTC in my book.
>> OK, then let us call it POSIX time. We all know that you'll never see a
>> second 60 in the kernel clock because it only counts binary seconds anyway.
> POSIX time is one such time, yes.
>> A huge part of the whole problem is that an inserted leap second is
>> originally defined as a numbering of seconds like 58, 59, 60, 0 in
>> human-readable format, which is totally unsuitable for timekeeping in an
>> OS kernel.
> Agree.
>> However, you can convert this properly from the binary format to a
>> second "60" on a display *if* you have some associated status
>> information available with the timestamps, *and* the conversion routines
>> in the runtime library evaluate that status.
> Completely agree.
>> For example, the API calls for Meinberg PCI cards return seconds in
>> POSIX format *and* an associated status that tells you the current
>> timestamps *is* part of the leap second.
> Which is just what a well designed interface do, and I am not surprised
> you did it well, rather I expect it without checking the details.
>> If that status information is not available, you just push the problem
>> to a different location, but you don't solve it.
> 
> Which is part of the problem that needs to be solved one way or another.
> I am very aware of these problems.
> 
> Now, the trouble is that people confuse POSIX time_t to represent UTC
> and think that any POSIX time_t like system will do UTC on it's own, and
> that does not work out if you do not one way or another have that extra
> information. Also, you need to have heads-up. Different systems enables
> or requires (max and min time) pre-information about pending
> leap-second. Some limited have ability to indicate without heads-up. So,
> if you now needs heads-up, and previous NTPv4 had 1 day heads-up, for
> NTPv5 we seem to opt for allowing more, you need to know that the
> TAI-UTC number is to shift.

The question of how far in advance to announce a leap second has also been
discussed a zillion times.

Some folks tend to accept a leap second only at the end of June or
December, which is current practice from IERS. Have you seen the new
Bulletin C 61, which has just recently been published? This approach tries to
avoid accepting "false alarms", e.g. as we have seen from broken GPS
receivers at the end of September.

Then other folks say that a leap second has to be accepted at the end of
*any* month, because this could happen in theory, in the worst case.

Also you want to pass the leap second warning down to a chain of clients
as soon as possible. On the other hand, a Linux kernel must not receive
it more than 24 hours in advance, so even if you implement the
transmission in the protocol in a more sophisticated way,
implementations have to take care how to evaluate this.
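
The two policies and the 24-hour limit mentioned above could be sketched
roughly as follows. This is an illustration under stated assumptions, not
how any particular NTP implementation does it: the function names, the
"strict" switch and the 24-hour default are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def leap_event_time(year, month):
    """UTC midnight that ends the given month (the leap second sits just before it)."""
    if month == 12:
        return datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(year, month + 1, 1, tzinfo=timezone.utc)


def accept_announcement(month, strict=True):
    """strict: only end of June/December; otherwise the end of any month."""
    allowed = (6, 12) if strict else range(1, 13)
    return month in allowed


def may_arm_kernel(now, year, month, max_lead=timedelta(hours=24)):
    """True once the event is at most max_lead away and has not yet passed."""
    remaining = leap_event_time(year, month) - now
    return timedelta(0) < remaining <= max_lead


# An announcement for the end of September is rejected by the strict policy.
print(accept_announcement(9), accept_announcement(9, strict=False))

# A warning for the end of December 2016 may be known days ahead,
# but is only handed to the kernel during the last 24 hours.
early = datetime(2016, 12, 30, 12, 0, tzinfo=timezone.utc)
late = datetime(2016, 12, 31, 12, 0, tzinfo=timezone.utc)
print(may_arm_kernel(early, 2016, 12), may_arm_kernel(late, 2016, 12))
```

Keeping "when do I believe the announcement" separate from "when do I tell
the kernel" lets the protocol carry the warning as early as it likes, while
each implementation decides how to evaluate it.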

> For NTP on wire protocol we have to recall,
> the leap-second can occur between a pair of packets, so we need to know
> in advance.

You *do* know in advance, if the server has the information available
early enough.

> So, if you need to know of the shift, you have almost the
> same problem as knowing which TAI-UTC you have and when that will step.

Again, in most cases where the OS supports leap seconds, the algorithm
just passes the warning down to the kernel, and as long as the OSs are
not fixed / improved, the problems seen by *applications* that use the
system time are the same, regardless of whether the on-wire protocol uses
TAI or UTC-like POSIX.

> The whole problem of leap-second handling to support UTC thus require
> heads-up already at the server, and for all the intermediary nodes all
> the way to the client, so that the client can do the right thing for
> it's kernel and user process when the leap second do occur. There is no
> way around that part.

Agreed, but this already works pretty well in NTPv4. If at the top of
the chain a wrong leap second announcement is inserted, you have a
problem. If at the system clock level the leap second is handled in a
strange way, you also have a problem, though a different one.

None of these can be solved by just using TAI timestamps on the wire.

> If you say that only non-leapsecond time-scales is supported, you can do
> much simpler things. Then again, you have ended up ruling out the proper
> UTC support. So either we make these properties split fully, or we fuse
> them together in such way we agree it will be the simplest and most
> robust way, and then certain UTC specific consequences will end up
> defining a number of properties.
> 
> Then, there is a strong wish to keep the core time-processing as clean
> as possible, and the properties of a TAI-like timestamp format is
> attractive. That will create challenges for servers and clients.

That's perfectly fine with me, but IMO it shouldn't be called NTP then.
For example, if you called it ETP, you could say:

"Unlike NTP, which assumes UTC timestamps, the ETP protocol works with
TAI, similar to PTP."

That would avoid a huge amount of confusion.

> However, making all servers provide compatible time has additional
> advantages as one wants to entertain the capability of combining
> responses from servers.

Of course, if you can determine UTC/POSIX or TAI from the server
response, everything is fine, regardless whether you can consistently
derive UTC from TAI, or vice versa. ;-)

In a previous email I gave an example how you can consistently derive
TAI from UTC, if you need TAI.
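
The example from that earlier email is not reproduced here. The following is
only a generic sketch of the idea, under these assumptions: the base
timestamp is UTC-like POSIX seconds, the TAI-UTC offset is optional
additional information, the leap second is reported as a repeated 23:59:59
with a flag, and the offset keeps its old value until the leap second is
over. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    posix: int                       # UTC-like POSIX seconds (base timestamp)
    tai_offset: Optional[int] = None # TAI-UTC in seconds, if the server knows it
    in_leap_second: bool = False     # status flag for the inserted second

    def tai(self) -> Optional[int]:
        """TAI-like seconds (POSIX epoch), or None if no offset is available."""
        if self.tai_offset is None:
            return None
        # During the leap second the POSIX value repeats, so add one extra.
        return self.posix + self.tai_offset + (1 if self.in_leap_second else 0)


LEAP_POSIX = 1483228800  # 2017-01-01T00:00:00Z

samples = [
    Sample(LEAP_POSIX - 1, 36),                        # 23:59:59
    Sample(LEAP_POSIX - 1, 36, in_leap_second=True),   # 23:59:60
    Sample(LEAP_POSIX, 37),                            # 00:00:00
    Sample(LEAP_POSIX),                                # server without an offset source
]
for s in samples:
    print(s.posix, s.tai())
```

With timestamp, offset and status taken consistently, the derived TAI values
increase without a gap or repeat. A server that has no source for the offset
can still provide the base timestamp, and clients that need TAI simply get
no TAI from it.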

> Now, traditionally that have not been an issue.
> However, we see servers doing UT1 and others doing smeared leap-seconds
> in parallel with thus providing UTC based time (more or less good,
> depending on implementation, driver etc). In fact, people wish there to
> be a more open bag of time-scales. That will end up driving further
> division rather than improving strength.

Hm, I disagree here. If you can properly determine UTC and TAI
from the core protocol and algorithms, there should be no problem
supporting other time scales for special cases as well, e.g. by using
associated extension fields.

> So, we either have to take favorite topics off the table, or find a way
> to fuse them into a system that achieves all the goals, and with the wish
> list I have seen, using TAI-like base and provide mappings in and out is
> the only one I can foresee to be fruitful, and even that will come with
> it's costs, where the servers will hurt most, but I am greatly
> considering that being worth the cost. Yes, a simple client will need to
> do a little more to convert time, but that will be the small pain
> compared to the server side, which I still consider worth having.

And what about simple servers that don't support TAI, or don't have a
source for the TAI/UTC offset?


Martin
-- 
Martin Burnicki

Senior Software Engineer

MEINBERG Funkuhren GmbH & Co. KG
Email: martin.burnicki@meinberg.de
Phone: +49 5281 9309-414
Linkedin: https://www.linkedin.com/in/martinburnicki/

Lange Wand 9, 31812 Bad Pyrmont, Germany
Amtsgericht Hannover 17HRA 100322
Geschäftsführer/Managing Directors: Günter Meinberg, Werner Meinberg,
Andre Hartmann, Heiko Gerstung
Websites: https://www.meinberg.de  https://www.meinbergglobal.com
Training: https://www.meinberg.academy