Re: [Ntp] CLOCK_TAI (was NTPv5: big picture)

Magnus Danielson <magnus@rubidium.se> Tue, 05 January 2021 15:51 UTC

Cc: magnus@rubidium.se, ntp@ietf.org
To: Miroslav Lichvar <mlichvar@redhat.com>
References: <20210102081603.1F63C40605C@ip-64-139-1-69.sjc.megapath.net> <cecaf661-92af-8b35-4c53-2f025c928144@rubidium.se> <20210104164449.GE2992437@localhost> <b1e61f7d-6cea-5e99-69f0-7eae815d9e19@rubidium.se> <20210105083328.GA3008666@localhost> <ba5d2cde-6b5e-d9b6-1877-c4060bf43e80@rubidium.se> <20210105144225.GH3008666@localhost>
From: Magnus Danielson <magnus@rubidium.se>
Message-ID: <35c4be55-b6af-82b5-aacd-d5a591383dec@rubidium.se>
Date: Tue, 05 Jan 2021 16:51:13 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/35Zw3d28MJKORDvDlET_V5Gc8sQ>

Miroslav,

On 2021-01-05 15:42, Miroslav Lichvar wrote:
> On Tue, Jan 05, 2021 at 03:10:55PM +0100, Magnus Danielson wrote:
>> On 2021-01-05 09:33, Miroslav Lichvar wrote:
>>> With hardware timestamping there is a separate (hardware) clock used
>>> for timestamping. It's not related to the CLOCK_TAI system clock
>>> discussed here.
>> So, you are saying that the kernel sense of TAI is not used to control
>> the time of the hardware clock? Fine, but some driver is doing that,
>> because the hardware clock of a typical is very stupid and is controlled
>> even with simpler forms than that for which the kernel clock is steered
>> by NTPD over the nanokernel interface.
> The kernel and driver have no idea in what timescale the hardware
> clock is keeping time, if any. That's up to the applications. Unlike
> with the system clock, there is no support for inserting/deleting leap
> seconds, so it needs to use a non-leaping representation.
>
>> Second challenge, if we have a TAI-like core format, and we indeed can
>> rebuild a TAI replica, and we do not consider this enough, would a TAI
>> format allows us to fairly cheaply adapt that to other formats? Turns
>> out that this comes very cheap and is easy to do robust, if and only if,
>> one has access to the TAI-UTC difference and information about an
>> upcoming one as that eventually approaches.
> Yes, that's the problem I'm trying to point out. The TAI-UTC offset is
> not universally available, but it is required for conversion between a
> leaping and non-leaping representation of time. It doesn't matter if
> it is in the kernel, libc, or the NTP implementation. When I want to
> synchronize two UTC clocks, which is probably the most common case,
> there is no point in converting the timestamps to a non-leaping
> representation and back. If it cannot be done reliably, as is the case
> with current systems, it will just make the synchronization less
> reliable.

Actually, here lies a problem. There seems to be fairly wide agreement
that we want the core time-stamping to use a TAI-like form of time. This
means it keeps ticking and behaves like a well-behaved linear ramp.
That makes any processing of it easy, and the same goes for encoding.

UTC fails to be such a ramp in many encodings. When the leap second is
inserted as 23:59:60, that becomes an encoding challenge for all the
formats I know.

TAI achieves this ramp property, and so do TAI-like time-scales such as
PTP and GPS.

UNIX/POSIX time_t fails this ramp property, as it recodes the 23:59:60
into 00:00:00, which is followed by another 00:00:00.

NTPv4 fails this ramp property, as it recodes the 23:59:60 into 00:00:00
and a leap-indication, which is followed by another 00:00:00.

Linux time_t fails this ramp property, as it recodes the 23:59:60 into a
second 23:59:59.
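
To make the ramp property concrete, here is a minimal sketch in Python
around the leap second at the end of 2016 (TAI-UTC went from 36 to 37).
The function names are made up for illustration, and the TAI count is
assumed to be "seconds since 1970-01-01 on a TAI-like scale". It only
models the two recodings described above, not any real kernel or libc
code.

```python
# TAI-UTC before and after the leap second of 2016-12-31 23:59:60 UTC.
OFFSET_BEFORE = 36
OFFSET_AFTER = 37

# TAI-like count during the inserted second 23:59:60
# (1483228800 is the time_t value of 2017-01-01 00:00:00 UTC).
LEAP_TAI = 1483228800 + OFFSET_BEFORE


def posix_style(tai):
    """Recode 23:59:60 as 00:00:00, which then occurs twice."""
    return tai - OFFSET_BEFORE if tai <= LEAP_TAI else tai - OFFSET_AFTER


def linux_style(tai):
    """Recode 23:59:60 as a second 23:59:59."""
    return tai - OFFSET_BEFORE if tai < LEAP_TAI else tai - OFFSET_AFTER


def is_ramp(values):
    """True if every tick advances the count by exactly one."""
    return all(b - a == 1 for a, b in zip(values, values[1:]))


# Four consecutive seconds: 23:59:59, 23:59:60, 00:00:00, 00:00:01.
tai_ticks = list(range(LEAP_TAI - 1, LEAP_TAI + 3))
posix_ticks = [posix_style(t) for t in tai_ticks]
linux_ticks = [linux_style(t) for t in tai_ticks]

print(is_ramp(tai_ticks), is_ramp(posix_ticks), is_ramp(linux_ticks))
```

The TAI-like count is the only one of the three that passes is_ramp();
both recodings repeat a value, just a different one.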

So, when people say they really, really want to get rid of these
encoding issues in the core transport of time-stamps, it's a real thing.
Such a core format will be completely incompatible with UTC transport
unless there is some means to convert between the two.

Now, if the source side is unable to derive the TAI-UTC difference from
its timing source (say a simple GPS receiver), then some other means of
resolving it is needed at the server. You could use a leap-seconds.list
file. You could use DNS-based methods. You could listen to other trusted
NTP servers only to obtain the TAI-UTC difference and then re-distribute
that. Whichever method is used, once the difference is known you can
provide the full service for all clients.
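
As an illustration of the file-based option, a minimal sketch in Python
of reading data lines in the style of leap-seconds.list (an NTP-era
timestamp, then the TAI-UTC value, with "#" starting a comment). The two
sample lines are the 2015 and 2017 entries; the function names are
hypothetical, and a real implementation would also want to check the
file's expiry and hash lines, which this sketch simply skips as
comments.

```python
# Seconds between the NTP era 0 epoch (1900-01-01) and the UNIX epoch.
NTP_TO_UNIX = 2208988800

# Sample in the style of leap-seconds.list (two most recent entries only).
SAMPLE = """\
# NTP timestamp, TAI-UTC, date the value takes effect
3644697600\t36\t# 1 Jul 2015
3692217600\t37\t# 1 Jan 2017
"""


def parse_leap_list(text):
    """Return sorted (unix_time, tai_minus_utc) pairs from the data lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.append((int(fields[0]) - NTP_TO_UNIX, int(fields[1])))
    return sorted(entries)


def tai_minus_utc(entries, unix_time):
    """TAI-UTC in effect at unix_time, or None if before the first entry."""
    offset = None
    for start, value in entries:
        if unix_time >= start:
            offset = value
    return offset


entries = parse_leap_list(SAMPLE)
print(tai_minus_utc(entries, 1609861873))  # time of this mail
```

Returning None rather than guessing keeps "difference unknown" visible
to the caller.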

Then, what happens if none of these methods is available? Should we
have a way to indicate this? That we should for sure discuss. So, say you
know you have a good UTC source, but it comes without the TAI-UTC
difference. Can I use it? Sure, it will provide the right frequency and
phase; its annotation of time can jump strangely on leap seconds, but
most of the time we will be able to handle it. OK, so a form of reduced
service, but still a service. Here comes the thing to discuss: do we
want to put the burden of handling this on the clients or push it to the
servers?

I think we need to decide whether we really have just one time-scale in
the core, the same for all NTPv5 users, and with that say that either
you are able to deliver it or you do not provide the service. In that
case you require a basic set of adaptation mappings as you go into the
network, and you can then pick and choose from a set of adaptation
mappings as you go out.

If we say that we indeed can have different time-scales, then each user
needs to be able to handle conflicting time-scales from different
servers, either one at a time or at the same time. I end up thinking
that way madness lies. One of the things that NTP traditionally
allows us to do is to use multiple servers and weigh them together into
a new composite clock based on their performance (as seen by the
client). This is really derived from David Allan's clock algorithm work
originally. That will be more difficult to implement if we have diverse
time-scales to source from; notice that I'm not saying impossible, but
more difficult. We may have to alter details in the algorithm to handle
the various ways clocks can behave. We would need to know how a
leap-second smoothing clock is doing its smoothing, for instance.
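
A small sketch in Python of why mixed time-scales complicate the
weighing. It uses a plain inverse-variance weighted mean as a stand-in,
which is an assumption for illustration and not the actual NTP combine
algorithm; the sample layout (timestamp, time-scale label, variance) and
the names are hypothetical too.

```python
TAI_MINUS_UTC = 37  # assumed to be known to the client


def to_tai(timestamp, timescale):
    """Map a server timestamp onto the common TAI-like ramp."""
    if timescale == "TAI":
        return timestamp
    if timescale == "UTC":
        return timestamp + TAI_MINUS_UTC
    # A leap-smearing source would need its smoothing function here.
    raise ValueError("cannot normalize time-scale: " + timescale)


def combine(samples):
    """Inverse-variance weighted mean of (timestamp, timescale, variance)."""
    weights = [1.0 / variance for _, _, variance in samples]
    values = [to_tai(ts, scale) for ts, scale, _ in samples]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Two equally good servers reporting the same instant on different scales.
samples = [(1000.0, "UTC", 1.0), (1037.0, "TAI", 1.0)]
print(combine(samples))
```

Averaging the raw timestamps here would land 18.5 s away from either
server, so every input has to be mapped to one scale first, and that
mapping is exactly what is missing for a source whose TAI-UTC difference
or smoothing method is unknown.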

It is exactly these things that make it such a great idea to have a
very linear ramp type of clock in the core timing; it is a very
attractive technical solution. It will require handling the TAI-UTC
difference as we get the clock, using one of many ways to do it, and
while that may be an implementation detail, we can provide hints and
potential solutions for many ways of achieving it. Once we have done
that, we will actually have both TAI and TAI-UTC knowledge, so as a side
effect we might as well distribute it along.

Now, if you know the current TAI-UTC difference, often you can apply
heuristics to the jump you see on an input clock. The problem is that
you are then late with that knowledge, while the ambition is to
distribute it before the event. For that reason I do not consider that
observation very useful for the problem at hand.
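
For illustration, such a heuristic could look roughly like the Python
sketch below. The sign convention (offset = local ramp minus input UTC
clock), the tolerance and the function name are all assumptions made for
this example.

```python
def detect_leap(prev_offset, new_offset, tolerance=0.1):
    """Guess the change in TAI-UTC from a step in the measured offset.

    Offsets are (local TAI-like ramp) - (input UTC clock), in seconds.
    Returns +1 for an inserted leap second, -1 for a deleted one and
    0 when the step does not look like a leap second.
    """
    step = new_offset - prev_offset
    if abs(step - 1.0) <= tolerance:
        return 1
    if abs(step + 1.0) <= tolerance:
        return -1
    return 0


tai_minus_utc = 36
tai_minus_utc += detect_leap(36.0000, 37.0002)
print(tai_minus_utc)
```

Note that new_offset only exists once the input clock has already
jumped, which is why this can confirm a leap second but never announce
one in advance.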

So I think we need to choose between two different scenarios: either one
where we allow multiple time-scales to be sent, not always accompanied
by the TAI-UTC difference, or one where we go for a unified time-scale
meeting the ideal model properties and say that we need to be able to
resolve the TAI-UTC issue. There is no real way around that central
question.

I lean towards pushing the conversion requirement onto the NTP source
nodes just to achieve the intended cleanness of the rest of the
infrastructure.

It is only as a side-consequence that I note that, once we have done
that, it will cost very little extra to also provide the TAI-UTC
difference as needed. That is not my primary motivation, but it is a
nice secondary gain.

Cheers,
Magnus