[Ntp] Antw: Re: Is one refid enough

"Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de> Thu, 05 September 2019 09:06 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/CCa3UhVhiSSI-dqs9MWkpv9T8wI>

>>> Miroslav Lichvar <mlichvar@redhat.com> wrote on 05.09.2019 at 10:41 in
message <20190905084121.GL15024@localhost>:
> On Wed, Sep 04, 2019 at 09:52:40PM -0700, Watson Ladd wrote:
>> The first use put forward was for redundancy: one would gather
>> intermediate sources until enough root sources were gathered. But this
>> isn't actually a reflection of the reliability: the NTP environment is
>> a graph, and the stratum 1 sources are the roots of a dynamically
>> created spanning tree. In particular if we have two stratum 1 sources
>> A and B, and two intermediates C and D, then if both C and D are using
>> both A and B then there is full redundancy, even if both have better
>> connectivity and thus use A to synchronize with.
> 
> I'm not sure I understand this correctly. Do you mean that C and D
> should or should not be synchronized to both A and B at the same time,
> even if A is much better than B? I guess that would be up to the
> clustering and combining algorithms later in the selection process,
> and not anything related to refids.

I think there is no possibility of a loop in this scenario. However, if C and
D were peered, C could get time directly from A and also (at a higher stratum)
from D. The question is whether C would still benefit from peering with D (I
think yes, if D is closer to C, in an NTP sense, than A is).
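
To make the peered case concrete, here is a minimal sketch of the kind of
refid check that keeps C from selecting D while D is taking its time from C.
The names (Source, usable_sources) and the use of plain strings as refids are
assumptions made for illustration; a real implementation compares 32-bit
reference identifiers and applies many more tests.

```python
from dataclasses import dataclass

MAX_STRATUM = 16  # stratum 16 means "unsynchronized" in NTP


@dataclass
class Source:
    name: str
    stratum: int
    refid: str  # identifier of whatever this source is synchronized to


def usable_sources(own_id, sources):
    """Return names of sources that are synchronized and do not point back at us."""
    return [
        s.name
        for s in sources
        if s.stratum < MAX_STRATUM and s.refid != own_id
    ]


# C's view of the world: A is a stratum 1 server, D is a peer.
a = Source("A", 1, "GPS")
d_follows_c = Source("D", 3, "C")  # D currently takes its time from C
d_follows_b = Source("D", 2, "B")  # D currently takes its time from B

print(usable_sources("C", [a, d_follows_c]))  # ['A']
print(usable_sources("C", [a, d_follows_b]))  # ['A', 'D']
```

In the second call D is an acceptable extra source for C, because D's time
comes from B and not from C itself.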

> 
>> The second use was for preventing loop formation: by excluding a
>> source that has synchronized to you, this prevents loops. Let's take a
>> simple example: A and B are two stratum 1 sources, C and D take from A
>> and B respectively, and are peered. Because A is so much more stable C
>> synchronizes to it, and D synchronizes to C. Now assume that A goes
>> down. What should eventually result is C synchronizing with D and D
>> synchronizing to B. The question of which mechanism between using
>> reference IDs and accumulating errors/stratum will work better is not
>> obvious: it seems to me that not using reference IDs works just fine
>> in this example and provides faster recovery: C can synchronize to D
>> immediately as it is the best surviving timesource, and the error
>> accumulation eventually means D prefers B (in practice quite quickly)
>> vs. waiting for C to drift enough for D to switch before synchronizing
>> to D.
> 
> No, that doesn't sound right to me. If C was significantly better than
> B from the point of view of D when A stops working, D might prefer C
> over B for quite some time and if C switched to D, there would be a
> loop. They have to check the refid to prevent that from happening.

In any case, the strata of C and D will go up in a ping-pong manner until they
stop syncing from each other (assuming there is no other refclock available).
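
The ping-pong effect can be sketched with a toy model. It assumes that C and
D alternately select each other with no refid check, that each update sets
the stratum to the other's stratum plus one, and that stratum 16 is treated
as unsynchronized. The function name ping_pong and the strict alternation are
assumptions for illustration; real servers update asynchronously and also
accumulate dispersion.

```python
MAX_STRATUM = 16  # a source at stratum 16 counts as unsynchronized


def ping_pong(stratum_c, stratum_d):
    """Alternate C and D syncing to each other; return the (C, D) stratum history."""
    history = [(stratum_c, stratum_d)]
    c_turn = True
    while True:
        # The server whose turn it is takes the other's stratum plus one.
        candidate = (stratum_d if c_turn else stratum_c) + 1
        if candidate >= MAX_STRATUM:
            break  # the other server is no longer an acceptable source
        if c_turn:
            stratum_c = candidate
        else:
            stratum_d = candidate
        history.append((stratum_c, stratum_d))
        c_turn = not c_turn
    return history


# Start with C at stratum 2 (formerly on A) and D at stratum 3 (on C).
history = ping_pong(2, 3)
print(history[:3])   # [(2, 3), (4, 3), (4, 5)]
print(history[-1])   # (14, 15)
```

Under these assumptions the two servers climb in steps of two until neither
can accept the other any more.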

> 
> Fast reselection when something goes down is not a goal. As long as C
> is better than B, we want C to run free and keep D synchronized to C.
> When C accumulates so much dispersion that it's worse than B, D will
> switch to B and only then C can synchronize to D. At least that's how
> I think it's expected to work.

NTP can be surprising sometimes. Some time in the past (NTPv3) we had a
handful of servers located in the same machine room. All the servers peered
with each other and had one reference clock, plus a LOCL fallback at stratum 12.
I observed that the "peer crowd" drifted away from the reference clock and
eventually declared it invalid, because the clock quality among the peers
seemed so much better (they had all agreed on some wrong time).
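
A simplified sketch of how a group of agreeing peers can outvote a single
reference clock: each source contributes an interval (offset plus or minus its
distance), and the sources whose intervals do not contain the most widely
shared point are treated as falsetickers. This is only a stand-in for NTP's
real selection algorithm, and the function name, source names, and numbers (in
milliseconds) are made up for illustration.

```python
def split_truechimers(sources):
    """sources maps name -> (offset_ms, distance_ms).

    Returns (truechimers, falsetickers), where truechimers are the sources
    whose interval contains the point covered by the most intervals.
    """
    def contains(name, point):
        offset, distance = sources[name]
        return offset - distance <= point <= offset + distance

    # The best point can always be found among the interval endpoints.
    endpoints = sorted(
        edge
        for offset, distance in sources.values()
        for edge in (offset - distance, offset + distance)
    )
    best_point, best_count = None, -1
    for point in endpoints:
        count = sum(1 for name in sources if contains(name, point))
        if count > best_count:
            best_point, best_count = point, count

    truechimers = sorted(n for n in sources if contains(n, best_point))
    falsetickers = sorted(n for n in sources if n not in truechimers)
    return truechimers, falsetickers


# Three peers agree on an offset near +50 ms; the reference clock says 0 ms.
sources = {
    "refclock": (0, 5),
    "peer1": (50, 2),
    "peer2": (51, 2),
    "peer3": (49, 2),
}
print(split_truechimers(sources))
# (['peer1', 'peer2', 'peer3'], ['refclock'])
```

The reference clock is correct here, yet it loses the vote because the peers
agree closely with each other.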


Regards,
Ulrich
