Re: [Ntp] Is one refid enough

Miroslav Lichvar <mlichvar@redhat.com> Thu, 05 September 2019 08:41 UTC

Date: Thu, 05 Sep 2019 10:41:21 +0200
From: Miroslav Lichvar <mlichvar@redhat.com>
To: ntp@ietf.org
Message-ID: <20190905084121.GL15024@localhost>
References: <CACsn0c=0VFPtYHkQnyjaukK3-TBS60J=cZ0LM1hVkuZg3yLG_Q@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/EDaIrzRIxmw1c4OQZPT1Q4y7738>

On Wed, Sep 04, 2019 at 09:52:40PM -0700, Watson Ladd wrote:
> The first use put forward was for redundancy: one would gather
> intermediate sources until enough root sources were gathered. But this
> isn't actually a reflection of the reliability: the NTP environment is
> a graph, and the stratum 1 sources are the roots of a dynamically
> created spanning tree. In particular if we have two stratum 1 sources
> A and B, and two intermediates C and D, then if both C and D are using
> both A and B then there is full redundancy, even if both have better
> connectivity and thus use A to synchronize with.

I'm not sure I understand this correctly. Do you mean that C and D
should or should not be synchronized to both A and B at the same time,
even if A is much better than B? I guess that would be up to the
clustering and combining algorithms later in the selection process,
and not anything related to refids.

> The second use was for preventing loop formation: by excluding a
> source that has synchronized to you, this prevents loops. Let's take a
> simple example: A and B are two stratum 1 sources, C and D take from A
> and B respectively, and are peered. Because A is so much more stable C
> synchronizes to it, and D synchronizes to C. Now assume that A goes
> down. What should eventually result is C synchronizing with D and D
> synchronizing to B. The question of which mechanism between using
> reference IDs and accumulating errors/stratum will work better is not
> obvious: it seems to me that not using reference IDs works just fine
> in this example and provides faster recovery: C can synchronize to D
> immediately as it is the best surviving timesource, and the error
> accumulation eventually means D prefers B (in practice quite quickly)
> vs. waiting for C to drift enough for D to switch before synchronizing
> to D.

No, that doesn't sound right to me. If C was significantly better than
B from the point of view of D when A stopped working, D might prefer C
over B for quite some time, and if C switched to D, there would be a
loop. They have to check the refid to prevent that from happening.
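
A rough sketch of that check, as seen from C right after A fails. The
Source class, the select() function and the single "distance" number
are made up for illustration, and server names stand in for real
refids; actual implementations do considerably more in the selection
process.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    name: str             # hypothetical label for the server
    refid: Optional[str]  # who this source says it is synchronized to
    distance: float       # hypothetical quality metric in seconds, lower is better


def select(own_id: str, sources: List[Source], check_refid: bool = True) -> Optional[Source]:
    """Return the best usable source, or None if nothing is usable."""
    # With the check enabled, drop any source that is synchronized to us.
    usable = [s for s in sources if not (check_refid and s.refid == own_id)]
    return min(usable, key=lambda s: s.distance, default=None)


# A is gone, and D still reports C as its reference.
d_seen_by_c = Source("D", refid="C", distance=0.010)

with_check = select("C", [d_seen_by_c])                        # C stays unsynchronized
without_check = select("C", [d_seen_by_c], check_refid=False)  # C picks D: a loop
```

With the check, C has no usable source and runs free. Without it, C
selects D while D is still following C.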

Fast reselection when something goes down is not a goal. As long as C
is better than B, we want C to run free and keep D synchronized to C.
When C accumulates so much dispersion that it's worse than B, D will
switch to B and only then C can synchronize to D. At least that's how
I think it's expected to work.
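
To get a feeling for the time scale, here is a back-of-the-envelope
sketch. It assumes the only thing that changes is the dispersion of the
free-running C, growing at the 15 ppm frequency tolerance (PHI) from
RFC 5905, and that D compares a single distance value per source. The
function name and the example distances are invented, and a real
selection also involves clustering and other criteria.

```python
PHI = 15e-6  # seconds of dispersion added per second (RFC 5905 frequency tolerance)


def seconds_until_switch(dist_c: float, dist_b: float, rate: float = PHI) -> float:
    """Time until free-running C looks worse than B from D's point of view.

    dist_c and dist_b are hypothetical distances in seconds at the
    moment A stops working; only C's value is assumed to grow.
    """
    if dist_c >= dist_b:
        return 0.0  # B is already at least as good, D can switch right away
    return (dist_b - dist_c) / rate


# Example: C starts at 5 ms and B at 20 ms, so the gap is 15 ms.
t = seconds_until_switch(dist_c=0.005, dist_b=0.020)
print(f"D keeps following C for about {t:.0f} seconds")
```

With these made-up numbers the 15 ms gap closes after roughly 1000
seconds, which is when D would move to B and C could start using D.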

-- 
Miroslav Lichvar