Re: [Ntp] Frequency transfer in NTP

Magnus Danielson <magnus@rubidium.se> Sun, 31 January 2021 22:30 UTC

Cc: magnus@rubidium.se
To: ntp@ietf.org
References: <20210128143137.GA1205378@localhost>
From: Magnus Danielson <magnus@rubidium.se>
Message-ID: <f60202de-d53f-4dea-6e2b-d59dbb0e1143@rubidium.se>
Date: Sun, 31 Jan 2021 23:30:03 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:78.0) Gecko/20100101 Thunderbird/78.7.0
MIME-Version: 1.0
In-Reply-To: <20210128143137.GA1205378@localhost>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/gVIHSGS9WL0WvzdIhSQBzivdHX0>
Subject: Re: [Ntp] Frequency transfer in NTP

Hi Miroslav,

On 2021-01-28 15:31, Miroslav Lichvar wrote:
> One of the new features that was proposed for NTPv5 is an additional
> receive timestamp to enable a frequency transfer to stabilize
> synchronization. I have tested this idea with two different
> implementations:
>
> https://fedorapeople.org/~mlichvar/ntp-freq-transfer/
>
> I think this would be a significant improvement of the protocol.
>
While I think there is room for improvements in NTP, I remain skeptical
about this one.

First of all, I see many issues with the implementations you use as
your reference settings. The fact that they are tuned so poorly that
they overshoot is a tell-tale. In the frequency domain, each of the
PLLs will have significant "jitter peaking" (this is a searchable term
for relevant articles and books). Jitter peaking gives a positive gain
(in dB) at the same frequency in each step as you progress down the
chain, and your time-domain plots illustrate that fact just perfectly.
The first PLL will feed the "worst case" input into the next, where it
is amplified the most, and so it goes. The problem is already well
analyzed classically, and what you need to do is to increase the
damping factor, and hence reduce Q, of the PLL. This is done by
increasing the P factor of a PI-loop PLL.
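
How the peaking accumulates down a chain can be sketched numerically.
The block below is a minimal illustration, not a model of any of the
tested implementations: it assumes every hop is an identical
second-order PI PLL with the textbook closed-loop response
H = (1 + 2j*zeta*x) / (1 - x^2 + 2j*zeta*x), where x is frequency
normalized to the resonance frequency. The function names and the
damping values are chosen here for illustration.

```python
import math

def closed_loop_gain(x, zeta):
    """|H| of a second-order PI PLL at normalized frequency x = f / f_n."""
    h = (1 + 2j * zeta * x) / (1 - x * x + 2j * zeta * x)
    return abs(h)

def chain_peak_db(zeta, hops, points=4000):
    """Worst-case gain in dB through `hops` identical loops in series."""
    peak = 0.0
    for i in range(points + 1):
        x = 10 ** (-2 + 4 * i / points)  # log sweep from 0.01 to 100
        peak = max(peak, closed_loop_gain(x, zeta))
    # Identical loops peak at the same frequency, so the dB values add.
    return hops * 20 * math.log10(peak)

if __name__ == "__main__":
    for hops in (1, 2, 4, 8):
        under = chain_peak_db(0.7, hops)  # assumed under-damped setting
        well = chain_peak_db(3.0, hops)   # assumed well-damped setting
        print(f"{hops} hops: {under:.2f} dB vs {well:.2f} dB")
```

Under these assumptions a single hop at damping 0.7 peaks at roughly
2 dB, while a hop at damping 3 stays near 0.2 dB, so even eight
well-damped hops in series amplify less than one under-damped hop.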

So, you need to do this up front for the comparison to be meaningful as
a reference. That you didn't shows up as a tell-tale that I will come
back to.

You also state that the protocol cannot convey both time and frequency
errors. This is just plain incorrect. A phase measurement will include
both. A static phase error reads as just that phase error. A phase ramp
is the tell-tale of a frequency difference. What you do in a PI-loop is
scale the phase difference measurement by the P and I factors; the I
part feeds an integrator, and the output of that is summed with the
phase scaled by the P factor. For simplicity let's assume that the loop
has locked. When it has locked, the loop has changed state such that the
integrator in the PI regulator holds a state that reflects the frequency
error, just around where it needs to be, as it steers the frequency of
the oscillator to keep the phase ramp of the source and the phase ramp
of the steered oscillator aligned on average. Any DC component out of
the phase detector would drive the integrator towards achieving that
balance. The I scaling factor is proportional to the square of the PLL's
resonance frequency. Now, when locked, the DC part of the phase error is
driven to zero, so what remains is the variation around the balance, and
that AC part goes through the P factor, which scales it in steering the
oscillator. The P factor is proportional to the resonance frequency and
the damping factor. Increase P and you reduce the fluctuations. Reduce P
and you increase them; in fact, as you drive it towards zero you go
towards a sine-oscillator.
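
The locked state described above can be reproduced with a few lines of
simulation. This is a minimal sketch under stated assumptions: a noise
free reference, an oscillator modeled as a pure integrator with unity
gain, a simple Euler update once per second, and gains set from the
textbook relations P = 2*zeta*omega_n and I = omega_n^2. The function
name and all numbers are illustrative and not taken from any NTP
implementation.

```python
def simulate_pi_loop(freq_offset, phase_offset, kp, ki, dt=1.0, steps=20000):
    """Euler simulation of a PI-steered oscillator tracking a reference.

    Returns (phase_error, integrator) after `steps` updates.
    """
    err = phase_offset  # reference phase minus local phase, in seconds
    integ = 0.0         # integrator state, a fractional frequency
    for _ in range(steps):
        integ += ki * err * dt    # I path accumulates the DC part
        steer = kp * err + integ  # P path summed with the integrator output
        # The error ramps with whatever frequency difference is not steered out.
        err += (freq_offset - steer) * dt
    return err, integ

zeta, omega_n = 3.0, 0.01                  # assumed damping and resonance (rad/s)
kp, ki = 2 * zeta * omega_n, omega_n ** 2  # textbook PI gain relations
err, integ = simulate_pi_loop(freq_offset=10e-6, phase_offset=1e-3, kp=kp, ki=ki)
print(f"residual phase error {err:.3e} s, integrator {integ:.3e}")
```

With a 10 ppm frequency difference and a 1 ms initial phase error, the
phase error decays towards zero and the integrator settles at the
frequency difference, which is the sense in which plain phase
measurements already carry the frequency information.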

Now, the resonance frequency I mentioned will also be the frequency of
the jitter peaking, and the gain there will always be positive, and a
function of the Q (or damping d, or P) chosen. So, as you increase Q
(that is, decrease damping d and factor P) the jitter peaking goes up.
A rule of thumb, and this is from memory, is that you want to keep
jitter peaking below 0.25 dB, which requires a damping factor of 3 or
higher. The overshoots you see already on a single link tell me you
have an underdamped system, that is Q > 1/sqrt(2). You want to be in
the overdamped regime.
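
The rule of thumb can be checked against the textbook second-order PI
PLL response. The sketch below assumes that model,
|H|^2 = (1 + a*u) / ((1 - u)^2 + a*u) with u = (f/f_n)^2 and
a = 4*zeta^2, and uses the peak location obtained by setting the
derivative with respect to u to zero. It also assumes the usual
relation Q = 1/(2*zeta). The function name is illustrative.

```python
import math

def jitter_peaking_db(zeta):
    """Peak closed-loop gain in dB of a second-order PI PLL with damping zeta."""
    a = 4.0 * zeta * zeta
    # Location of the maximum of |H|^2, from d|H|^2/du = 0.
    u = (math.sqrt(1.0 + 2.0 * a) - 1.0) / a
    mag2 = (1.0 + a * u) / ((1.0 - u) ** 2 + a * u)
    return 10.0 * math.log10(mag2)

if __name__ == "__main__":
    for zeta in (0.5, 0.707, 1.0, 2.0, 3.0, 5.0):
        q = 1.0 / (2.0 * zeta)  # assumed Q-to-damping relation
        print(f"damping {zeta:5.3f}  Q {q:5.3f}  peaking {jitter_peaking_db(zeta):.3f} dB")
```

In this model the peaking never reaches zero, it only shrinks as the
damping grows, and a damping factor of 3 lands a little under the
0.25 dB figure quoted above.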

Now, I did not cover how the PI-regulator achieves lock, and that is
because you can easily consider this a messy nonlinear thing, chaotic to
the untrained eye. It turns out that if you just let it, a PI-loop will
always lock. Without going into too many messy details, you need to go
through the loop twice, once through the AC path and once through the DC
path, to understand how the beat-notes make the integrator of the PI
move in the right direction. Once it is sufficiently close, the
remaining frequency and phase error is small enough for it to just decay
without losing cycles. The absence of those AC cycles is classically
used to indicate the lock of the loop.

Can we improve this? Well, we could either add a frequency estimator or
go to a full PID loop controller. I did this. The frequency error
produces a linear phase-ramp in the phase data. Now, as we differentiate
that to get the slope, we get the derivative path for the D in the PID.
It turns out that any other frequency estimator achieves the same basic
processing. Fine, but let's analyze that and balance it. To my great
surprise, it turns out that my P and D factors just add on top of each
other in a PID PLL. It's just the same. Also, a similar thing has been
analyzed in a mechanism called injection locking, in which the input
frequency of a signal will "pull" an oscillator to it. That also turns
out to shift the damping of the oscillator.
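
One way a derivative term can collapse into the proportional term is
sketched below. It rests on an assumption that is not confirmed by the
text above: the frequency estimate, taken as the derivative of the
phase error e, is applied to the integrator that holds the frequency
state, and the oscillator is modeled as a unity-gain 1/s. The symbols
u (steering signal), P, I and D are illustrative.

```latex
% Assumption: the frequency estimate s e(s) is fed to the integrator input.
u(s) = P\,e(s) + \frac{1}{s}\bigl(I\,e(s) + D\,s\,e(s)\bigr)
     = (P + D)\,e(s) + \frac{I}{s}\,e(s)

% Closed loop with the oscillator as 1/s:
H(s) = \frac{(P + D)\,s + I}{s^2 + (P + D)\,s + I},
\qquad \omega_n^2 = I,
\qquad \zeta = \frac{P + D}{2\sqrt{I}}
```

Under that assumption the loop is still an ordinary PI loop, only with
proportional gain P + D, so adding the frequency path changes nothing
but the damping factor.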

So, as you do this, all you end up doing is creating a more complex way
to set your damping factor and hence P. It will not be very helpful for
your understanding of the dynamics of the system. That you were able to
go from overshooting, and thus under-damped, to a more well-behaved
damping is because you ended up doing exactly this: altering the damping
factor. Your starting point was very good for illustrating the effect,
but not very representative of what it should be. Actually, you should
go from a boring response to an even more boring response.

No need to get dizzy over the Fokker-Planck graphs, always something.

I have mentioned before that incorrect setting of parameters relative to
the polling rate will create overshoots. This is because the P and I
parameters need to be scaled correctly with regard to the polling rate,
or else the balance between them will alter the actual damping factor
and thus take the loop from over-damped to under-damped.
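
The effect of a mis-scaled gain can be illustrated with a small
calculation. The sketch below assumes a loop that, once per poll,
corrects the phase by a fraction `a` of the measured error and grows
its accumulated correction by a fraction `b` of it, with `a` and `b`
derived from a continuous-time design as a = 2*zeta*omega_n*T and
b = (omega_n*T)^2 for poll interval T. That parameterization, the
function names and the numbers are assumptions for illustration; real
implementations structure their gains differently.

```python
import math

def per_poll_gains(zeta, omega_n, poll):
    """Dimensionless per-update P and I gains for a poll interval in seconds."""
    a = 2.0 * zeta * omega_n * poll  # proportional gain scales with the interval
    b = (omega_n * poll) ** 2        # integral gain scales with its square
    return a, b

def effective_damping(a, b):
    """Damping implied by per-update gains (reasonable while both are well below 1)."""
    return a / (2.0 * math.sqrt(b))

# Loop tuned for a 16 s poll with damping 3 (illustrative numbers only).
a16, b16 = per_poll_gains(3.0, 0.002, 16.0)
# Poll interval raised to 64 s: the I gain is rescaled, the P gain is forgotten.
_, b64 = per_poll_gains(3.0, 0.002, 64.0)

print(f"tuned:      damping {effective_damping(a16, b16):.2f}")
print(f"mis-scaled: damping {effective_damping(a16, b64):.2f}")
```

In this example, rescaling only one of the two gains for a four times
longer poll interval cuts the damping from 3 to 0.75, enough to turn a
quiet loop into one that overshoots.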

So, what I see with my experience is not a problem with the protocol,
but a problem with the implementations. I think those should be
addressed first, before going into considering changes to the protocol
making it more complex, especially as it may not be needed.

Sorry if this discourages you, but I did the exercise and ended up
learning, so I just want to share some experience and help get things
moving in the right direction.

I can provide a range of references, unfortunately not all available
online. I could do a fairly long derivation of the equations for it,
and you can find posts from me where I do that.

I would recommend that we do not progress further with this ID until the
issues I've pointed out have been addressed and we look at the remaining
benefit, if any, of the ID.

Cheers,
Magnus