Re: [Ntp] Frequency transfer in NTP

Magnus Danielson <magnus@rubidium.se> Mon, 01 February 2021 11:01 UTC

Cc: magnus@rubidium.se, ntp@ietf.org
To: Miroslav Lichvar <mlichvar@redhat.com>
References: <20210128143137.GA1205378@localhost> <f60202de-d53f-4dea-6e2b-d59dbb0e1143@rubidium.se> <20210201093709.GF1205378@localhost>
From: Magnus Danielson <magnus@rubidium.se>
Message-ID: <a22737e3-05d0-e681-e32f-daada351e51c@rubidium.se>
Date: Mon, 01 Feb 2021 12:00:50 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:78.0) Gecko/20100101 Thunderbird/78.7.0
MIME-Version: 1.0
In-Reply-To: <20210201093709.GF1205378@localhost>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/rLuINQtMXIAzCyLnPRWU1bsk-NA>
Subject: Re: [Ntp] Frequency transfer in NTP

Miroslav,

On 2021-02-01 10:37, Miroslav Lichvar wrote:
> On Sun, Jan 31, 2021 at 11:30:03PM +0100, Magnus Danielson wrote:
>> perfectly. The first PLL loop will feed the "worst case" input into the
>> next where it will amplify it most, and so it goes. The problem is
>> already well analyzed classically, and what you need to do is to
>> increase the damping factor, and hence reduce Q, of the PLL and this is
>> done by increase the P factor of a PI-loop PLL.
> Yes, you can dampen the loops to avoid the overshoot, but that will
> have a negative impact on the timekeeping performance. You can
> minimize the phase error, or frequency error, but not both at the same
> time.
It turns out that when you measure phase stability with MTIE and TDEV,
and frequency stability with ADEV, you need to remove that overshoot /
resonance for best stability. This becomes especially true as you build
a chain of loops, as it will grow worse along the chain.
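
To illustrate how the resonance grows along a chain, here is a minimal
sketch. It assumes the textbook second-order PI-loop transfer function
H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2), identical
loops in every node, and a purely linear cascade. The function names
are made up for this example. Under those assumptions the gain peaking
in dB simply adds up node by node.

```python
import math


def closed_loop_gain(x, zeta):
    """|H(jw)| of a second-order PI-loop PLL, with x = w / wn (assumed model)."""
    num = 1.0 + (2.0 * zeta * x) ** 2
    den = (1.0 - x * x) ** 2 + (2.0 * zeta * x) ** 2
    return math.sqrt(num / den)


def peaking_db(zeta, points=20000, x_max=10.0):
    """Gain peaking of one loop in dB, found by a plain grid search."""
    peak = max(closed_loop_gain(i * x_max / points, zeta)
               for i in range(1, points + 1))
    return 20.0 * math.log10(peak)


def chain_peaking_db(zeta, nodes):
    """Peaking of `nodes` identical loops in cascade (gains multiply, dB add)."""
    return nodes * peaking_db(zeta)


if __name__ == "__main__":
    for zeta in (0.707, 3.0):
        print(zeta, peaking_db(zeta), chain_peaking_db(zeta, 60))
```

In this model a damping factor around 0.7 gives roughly 2 dB of peaking
per node, while a damping factor of 3 gives roughly 0.2 dB per node, so
the difference over tens of nodes is large.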
>> You also use a statement that you cannot in the protocol convey both
>> time and frequency errors. This is just plain incorrect. A phase
>> measurement will include both.
> The point that I'm trying to make is that transferring time and
> frequency separately works better than time transfer alone.

I have yet to see that work, or make a distinction that is not
expressible by other means.

It is only when you do frequency transfer by some completely different
means that you can get a benefit, but not within the same protocol. Been
there, done that, did not pan out.

>
>> I have before mentioned that incorrect setting of parameters compared to
>> polling rate will create overshoots. This is because P and I parameters
>> needs to be scaled correctly with regards to the polling rate, or else
>> the balance between them will alter the actual damping factor and thus
>> going from over-damped to under-damped.
> PI loops are extremely simple, but difficult to tune.

While they are extremely simple, difficulty in tuning has not been my
experience. Once one has done the gain scaling right, setting damping
and frequency properly orthogonally has been straightforward. Getting
optimum performance under a given condition is, as always, a bit
trickier, but not impossible.

As for simple, I have taught my colleagues how the loop works, and they
always got it to lock quickly. With very little training they have
been able to hand-tweak the parameters to about the right neighborhood,
after which we used good measurement tools to fine-tune.

>  I have plenty of
> experience with them in PTP. I find it interesting that you would
> consider using one in NTP, where the conditions are more variable,
> there is more filtering, non-constant update interval, etc. Making one
> comparable to the standard NTP loop would be tricky.

Non-constant update intervals are something you can parameterize in.
They go into the gain of the integrators of the loop, as always. The
polling-rate scale factor of the loop is not too hard to scale up.

The PI loop is simple, but very robust if treated right, so all one
needs to focus on is doing that, and that comes down to setting damping
factor and gain factors appropriately.
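
A minimal sketch of that scaling, under stated assumptions: the loop is
the textbook continuous PI loop with Kp = 2*zeta*wn and Ki = wn^2,
discretized with a simple Euler step, and the update interval dt enters
the integrator gain on every update. The class and function names are
hypothetical, and this is not the ntpd or chrony loop.

```python
class PILoop:
    """PI loop set by damping factor and natural frequency (assumed model)."""

    def __init__(self, zeta, omega_n):
        self.kp = 2.0 * zeta * omega_n   # proportional gain, 1/s
        self.ki = omega_n ** 2           # integral gain, 1/s^2
        self.integrator = 0.0            # accumulated frequency correction

    def update(self, phase_error, dt):
        """Return a frequency correction; dt may differ on every call."""
        self.integrator += self.ki * phase_error * dt
        return self.kp * phase_error + self.integrator


def step_overshoot(zeta, omega_n, dt, duration):
    """Overshoot of the phase error after a unit phase step."""
    loop = PILoop(zeta, omega_n)
    phase = 1.0      # initial phase error (unit step)
    worst = 0.0      # most negative phase error seen so far
    t = 0.0
    while t < duration:
        freq = loop.update(phase, dt)
        phase -= freq * dt               # correction steers the phase error
        worst = min(worst, phase)
        t += dt
    return -worst


if __name__ == "__main__":
    for dt in (16.0, 64.0):
        print(dt, step_overshoot(3.0, 0.001, dt, 20000.0))
    print(step_overshoot(0.707, 0.001, 16.0, 20000.0))
```

In this model the overshoot for a given damping factor stays about the
same whether the loop is updated every 16 s or every 64 s, as long as
omega_n * dt stays well below 1. A lower damping factor gives a clearly
larger overshoot.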

A Kalman filter would adapt dynamically, and that's nice, but as it
converges to a stable steady-state setting, it behaves just like a PI
loop and you have the same tuning issue with regard to the damping
factor. Semi-Kalman heuristics can be done fairly simply with a PI loop
if you want.
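
The convergence to fixed gains can be sketched as follows. This assumes
a two-state clock model (phase and frequency), a phase-only
measurement, and made-up noise values in arbitrary units; the function
name is hypothetical. Once the gains have settled, k_phase acts on the
phase error directly, much like a proportional term, and k_freq
accumulates into the frequency estimate, much like an integrator.

```python
def kalman_gains(dt, q_phase, q_freq, r, steps, p0=100.0):
    """Iterate the covariance recursion and return the gain history.

    State is (phase, frequency); only phase is measured.
    q_phase, q_freq: process noise variances per step (assumed values).
    r: measurement noise variance (assumed value).
    """
    p11, p12, p22 = p0, 0.0, p0
    history = []
    for _ in range(steps):
        # Predict: P = F P F^T + Q with F = [[1, dt], [0, 1]]
        p11 = p11 + 2.0 * dt * p12 + dt * dt * p22 + q_phase
        p12 = p12 + dt * p22
        p22 = p22 + q_freq
        # Update with measurement matrix H = [1, 0]
        s = p11 + r
        k_phase = p11 / s
        k_freq = p12 / s
        # P = (I - K H) P, using the predicted values on the right-hand side
        p11, p12, p22 = ((1.0 - k_phase) * p11,
                         (1.0 - k_phase) * p12,
                         p22 - k_freq * p12)
        history.append((k_phase, k_freq))
    return history


if __name__ == "__main__":
    gains = kalman_gains(dt=1.0, q_phase=0.001, q_freq=0.01, r=1.0, steps=200)
    print("first:", gains[0])
    print("last: ", gains[-1])
```

The first gains are large because the initial uncertainty is large.
They then settle to constants, at which point the filter is a
fixed-gain loop.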

I've worked with that assortment of tools, and they fit different needs.
It turns out that you can often do most of it just fine with a PI loop
and not waste too much time and energy on it. Some cases need the Kalman
filter for sure. Networks need not only Kalman but also non-linear
assistance, and that is what I have done when needed.

>
>> I can provide a range of references, unfortunately not all in online
>> sources. I could do a fairly long derivation of equations for it, and
>> you can find posts from me where I do that.
> If you wanted to be constructive, it would be best if you showed us an
> example of a loop that doesn't overshoot in the step response and
> otherwise performs similarly to ntp or chrony.
I could provide an example of scaling with poll rate, sure. I just don't
have the setup to measure a chain, so that would take time and
effort that I might not be able to pull together, but others may have that.
>
> We can compare it a chain of servers and see if it performs better
> than the ntpd and chronyd patched for frequency transfer.

Recommended reading: Trischitta & Varma "Jitter in Digital Transmission
Systems" from Artech House 1990. You can then find more context in ITU-T
G.810-813 and G.823-825. It's interesting to note that the reference
network used is a chain of 60 nodes, so they have for sure been looking
at the long chain issue. I can provide you with more references if you
like. I've always found Stefano Bregni "Synchronization of Digital
Telecommunications Networks" a very handy book, and you should be able to
locate his home page and lots of reading material.

You might object that this does not match up with how NTP works. Well,
it turns out that the terminology and some technical details might be
different, but once one looks at things in terms of time/phase
measures and frequency, it ends up being more or less the same. You can
do the same for GPS/GNSS tracking as well. While the technical details
may look different, the overall dynamics end up being quite similar. The
one thing that makes packet timing a bit unique is the statistics of
packets. As you do a two-way time transfer you can at least get a
balanced "noise", but using non-linear methods you can do much better.
The core timing dynamics of the control loop end up being more or less
the same situation.
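
As a small illustration of the packet side, the sketch below computes
offset and round-trip delay from the four timestamps of a two-way
exchange, using the standard NTP on-wire formulas. It then applies one
common non-linear method, picking the sample with the smallest delay.
That choice is an assumption for the example and not necessarily the
method meant above; the function names and timestamps are made up.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Two-way exchange: t1 client send, t2 server receive,
    t3 server send, t4 client receive (seconds)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def min_delay_offset(samples):
    """Non-linear selection: use the offset of the lowest-delay sample."""
    results = [offset_and_delay(*s) for s in samples]
    offset, _delay = min(results, key=lambda od: od[1])
    return offset


if __name__ == "__main__":
    # True offset 0.5 s, one-way delay 0.01 s, server turnaround 0.01 s.
    clean = (0.0, 0.51, 0.52, 0.03)
    # Same path, but 0.04 s of extra queueing on the forward direction.
    queued = (10.0, 10.55, 10.56, 10.07)
    print(offset_and_delay(*clean))
    print(offset_and_delay(*queued))
    print(min_delay_offset([clean, queued]))
```

The queued sample has an asymmetric delay, so its offset is biased by
half of the extra queueing time. A plain average would carry part of
that bias, while the minimum-delay selection discards it.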

I've also spent my time building chains at work. I recall a 26-node
chain at one time, and it shows for sure when we measure. The theory and
practice add up. It really does. We have one setup running constantly,
but it is not quickly adapted to test this. We learned this the hard
way; I've already done the exercise, and what you are saying does not
match up with my experience, both of failures and of ways to make things
work really well. In practice, both my customers and I run chains beyond
15 nodes in length on a daily basis, delivering critical signals (at
least that's what our customers say they are), so if we got this wrong,
let me assure you that I would hear about it, in particular as some of
the signals we carry are really very sensitive to timing disturbances.
So, when I advise about this, it's not just an opinion; it's based on
both my own and others' hard-earned experience. I have designed and
implemented a time-transfer system independent of PTP, so I have played
the same game, both over telecom links and packets.

It turns out that jitter tolerance is another aspect, and high deviations
end up being problematic there, again driving the Q down.

The same issue shows up with jitter in high-speed digital links, relating
to signal integrity and the link bit error rate. There is an extensive
report on that.

So I disagree with you, based on all that science and personal
experience. I've spent a lot of time understanding the issue; I've made
the same mistakes as you, and recovered from them.

Cheers,
Magnus