Re: [Ntp] The bump, or why NTP v5 must specify impulse response

Miroslav Lichvar <mlichvar@redhat.com> Wed, 22 April 2020 10:40 UTC

Date: Wed, 22 Apr 2020 12:40:28 +0200
From: Miroslav Lichvar <mlichvar@redhat.com>
To: Daniel Franke <dfoxfranke@gmail.com>
Cc: Watson Ladd <watsonbladd@gmail.com>, NTP WG <ntp@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/VLB8CqU17wd0voj1-N1GeisEY9g>

On Mon, Apr 13, 2020 at 10:21:40AM -0400, Daniel Franke wrote:
> On Sun, Apr 12, 2020 at 6:11 PM Watson Ladd <watsonbladd@gmail.com> wrote:
> > Section 14.5 of Phase Lock Techniques by Floyd Gardner describes
> > difficult problems caused by the accumulation of errors in a chain of
> > PLLs under benign conditions, and section 2.2.4 describes the root
> > cause, namely an inevitable peak in the transfer function of a second
> > order filter. I don't think these are insolvable in NTP v5, but they
> > should give pause to the idea that we can avoid needing to specify
> > the synchronization algorithm. The peaking of the PLL needs to be
> > controlled, or something more complex needs to be specified.
> 
> Are the algorithms in RFC 5905 vulnerable to this? If so, I think that
> shoots down any implication that it's a practical necessity to solve
> this in the NTPv5 spec, since we've obviously been getting by okay so
> far.

That's a good question. I did some experiments to try to get an
answer. The code of the PLL/FLL that's provided in RFC 5905 seems to
be identical to the code in ntp-4.2.4. Here is the response of three
clients in a chain to a 10 ms time step, using the default minpoll
and maxpoll:

https://mlichvar.fedorapeople.org/ntpresponse/daemon.png

That looks ok to me.

However, that's not the loop which is actually used on most systems
running ntpd. By default it uses the kernel PLL/FLL (aka kernel
discipline), which has a different FLL. The response I saw in my test
was much worse:

https://mlichvar.fedorapeople.org/ntpresponse/kernel.png

There is a large overshoot and ringing, getting worse further down
the chain of servers. If that's how ntp is expected to work, or at
least what users have been happy with so far, we probably don't need
to worry about it too much.
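
As a rough illustration of how overshoot can build up along a chain,
here is a minimal sketch of three cascaded clocks, each disciplined
by a generic proportional-integral (PI) loop. This is not the RFC
5905 algorithm and not the kernel discipline; the function name, the
gains kp and ki, and the normalized one-update-per-poll time base are
all assumptions chosen only to make the effect visible.

```python
def simulate_chain(n_clocks=3, step=0.010, kp=0.1, ki=0.02, n_updates=600):
    """Offsets (seconds) of chained clocks after the reference steps by `step`.

    Hypothetical PI servo per clock, one update per poll interval:
        freq   += ki * err
        offset += kp * err + freq
    """
    offset = [0.0] * n_clocks
    freq = [0.0] * n_clocks
    history = [[] for _ in range(n_clocks)]
    for _ in range(n_updates):
        upstream = step  # the reference has already made its 10 ms step
        new_offset = []
        for i in range(n_clocks):
            err = upstream - offset[i]          # measured offset to upstream
            freq[i] += ki * err                 # integral (frequency) term
            new_offset.append(offset[i] + kp * err + freq[i])
            upstream = offset[i]                # next clock polls this one
        offset = new_offset
        for i in range(n_clocks):
            history[i].append(offset[i])
    return history


if __name__ == "__main__":
    for i, h in enumerate(simulate_chain(), start=1):
        overshoot_ms = (max(h) - 0.010) * 1e3
        print(f"clock {i}: peak overshoot {overshoot_ms:.2f} ms")
```

With these made-up gains every clock overshoots the 10 ms step, and
the peak gets larger with each hop, even though all three loops are
identical and individually stable.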

I think it should be the responsibility of individual implementations
to have a control loop that works reasonably well. Is it likely that
two different implementations would perform well in a homogeneous
environment, but fail horribly when mixed? If that happened, couldn't
they cooperate and make a fix on either side?

FWIW, the PTPv2 specification doesn't have any requirements on the
clock servos and different implementations seem to be able to
interoperate. As I understand it, the issue with long chains of clocks
(which is avoided with transparent clocks) is not specific to mixed
environments.
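
The point about long chains can be made in the frequency domain with
a small sketch. It assumes a hypothetical discrete PI servo (update
rule freq += ki*err; offset += kp*err + freq, with made-up gains kp
and ki), not any particular NTP or PTP implementation. If one such
loop has a closed-loop gain peak above 1, a chain of N identical
loops has a peak equal to that value raised to the N-th power, so the
growth does not depend on mixing implementations.

```python
import cmath
import math


def closed_loop_gain(w, kp=0.1, ki=0.02):
    """|H(e^{jw})| of the assumed PI servo, w in radians per update."""
    z = cmath.exp(1j * w)
    num = kp * (z - 1) + ki * z      # loop filter acting on the error
    den = (z - 1) ** 2 + num         # closed-loop characteristic polynomial
    return abs(num / den)


def peak_gain(kp=0.1, ki=0.02, n_points=2000):
    """Largest closed-loop gain on a grid of frequencies in (0, pi]."""
    return max(
        closed_loop_gain(math.pi * k / n_points, kp, ki)
        for k in range(1, n_points + 1)
    )


if __name__ == "__main__":
    single = peak_gain()
    for n in (1, 2, 4, 8):
        # Identical loops in series multiply their transfer functions.
        print(f"{n} hop(s): peak gain {single ** n:.2f}")
```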

-- 
Miroslav Lichvar