Re: [Ntp] Antw: Re: Calls for Adoption -- NTP Extension Field drafts -- Four separate drafts

Heiko Gerstung <> Tue, 03 September 2019 10:34 UTC

From: Heiko Gerstung <>
To: Harlan Stenn <>, "" <>

OK, I was so fed up with the way Outlook quotes emails that I put together a workaround involving a shell script, an AppleScript, and a tool that lets me run the AppleScript via a keyboard shortcut... Never mind, it seems to work, though.
I trimmed the quoted text in an attempt to make this all more readable. 
> On 03.09.19, 11:06 "ntp im Auftrag von Harlan Stenn" < im Auftrag von> wrote:
> > > Using an EF. Adding an EF which says "send me the chain" to a request, triggering the upstream server to send its response with an EF attached to it that includes the chain.
> > 
> > In NTPv4? I'm seeing incredible resistance to offering OPTIONAL EF
> > proposals (mechanisms) for v4.
> > 
> > Harlan, you can at any time implement optional EFs in ntpd; they do not need to be standardized in an RFC. If they are picked up by the users, it might happen that they end up in a standalone RFC later (or in v5). 
> For this to be true we'd need to fix the IANA EF table registry, and
> I've been trying to do that for years.

I believe you can have EFs that only ntpd instances understand and let people activate them in the configuration, if they want to use them and can be sure that they are only using ntpd and no other implementation (or make sure that they do not break other implementations). 
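
This works because a v4 receiver can skip extension fields it does not recognize using the length word in each field's header. A minimal sketch of such a tolerant EF walk (simplified: real NTPv4 EFs also have minimum-length and padding rules per RFC 7822, and the field type values here are made up):

```python
# Sketch: iterate over an NTPv4 extension-field area, skipping unknown
# field types by their declared length instead of rejecting the packet.
import struct

def iter_extension_fields(data: bytes):
    """Yield (field_type, value) for each EF in a v4 extension area."""
    off = 0
    while off + 4 <= len(data):
        ftype, flen = struct.unpack_from("!HH", data, off)
        if flen < 4 or off + flen > len(data):
            break  # malformed field; stop parsing
        yield ftype, data[off + 4 : off + flen]
        off += flen

# A hypothetical implementation-private EF (type 0x2000) followed by a
# second field; a receiver that only knows the second type still parses both.
efs = struct.pack("!HH", 0x2000, 8) + b"priv" + struct.pack("!HH", 0x0104, 4)
print([hex(t) for t, _ in iter_extension_fields(efs)])
```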

> > I'm seeing people saying they don't see the need for that mechanism
> > because they want to prevent that as a policy choice for others.
> > 
> > I do not want to prevent this. I want to prevent us ending up adding this and that (optional) feature to v4 and seeing a ton of v4 implementations, each providing a different subset of all the features. I also do not want to wait another 10 years before we start working on v5; we should shorten the release cycle to a handful of years instead, making more incremental changes to the protocol rather than introducing a new version every 20 years and keeping it alive by adding optional features during those two decades.
> We haven't seen that many implementations yet.
There is a growing number of implementations, as you may have noticed. 

> And with what I'm proposing there's a clear path forward for negotiating
> what EFs are supported.
And I believe this is a good thing, but I do not want to see this mechanism in v4. Adding it to v5 makes it easy to make all new features mandatory for v5 (those that should not be optional, of course) and does not require any negotiation, because a client knows that each v5 server supports that feature set. If the client wants to use optional features, it should be able to ask the server whether it supports them or not. 
> Without my proposals, and with no plan in sight to fix the EF table
> registry, and if you're correct that optional EFs can be implemented
> without an RFC, please tell me how you foresee interoperability.
How interoperable was NTPv4 in the period before RFC 5905 was published? I would prefer an RFC, but if people do not understand the requirement for a specific EF and do not agree that there is such a strong demand for it that we all should start working on publishing a separate RFC for it, just implement it and let the users decide. If a lot of people start using it, you have a strong argument for adding it as a mandatory part of v5 or, if that really again takes a decade or two, come up with a separate RFC for it later. 
> As for the time it takes to produce a new standard, I'm pretty sure you
> won't be happy with the results you get from the path you want to take.
No, this is *exactly* the reason why I do not want to work on 4 separate RFCs at the same time instead of kicking off work on a v5 RFC. It will take time, it will take lots of discussions and reviews and edits, and I would not want the working group to end up with 10 small RFCs instead of one larger one. 

> I am also having difficulties understanding how you are advocating for
> "more incremental changes to the protocol" in v5 while at the same time
> fighting "adding optional features" to v4.
I wanted to shorten the release cycle for new NTP RFCs in general, from 15-20 years to maybe 5 years. That, IMHO, requires limiting the number of changes that go into v5. I do not think this is a problem, as v4 just introduced a lot of additional things in the RFC compared to v3. If we stick to v4 and create v5 based on it plus some additional features and changes, we can probably avoid having to debate another 10 years about the v5 RFCs. This all needs to be more efficient: list all the problems and possible improvements we all see for v4, debate which of the problems we want to solve and which of the improvements we want to include as mandatory or optional in v5 (and which we do not want to include at all), then find a way to add those changes to the v4 RFC. 

> > If a bad actor can abuse an upstream system, all downstream systems are toast anyway, right? But again, the InstanceID is a way to avoid that mapping and at the same time provides a unique ID for each NTP instance. 
> I wrote about this in my response too. The issue goes to making sure
> InstanceIDs are unique. To do this with 100% certainty seems to require
> communications between all directly participating parties along with at
> least 1 more degree of "neighbors".
> This seems to require a lot of traffic, with ongoing updates.
> I don't think this passes the cost/benefit analysis.

Again: you do not need additional communication except for the "chain of InstanceIDs". We can create unique random InstanceIDs with an appropriate level of certainty (it might not be 100%, maybe we end up with 99.9999999999%).

> We could go to UUIDs which we can *assume* are unique. But we're also
> talking about UDP packets for lots of this traffic, which have limited
> payload size.
Oh Harlan, come on. The typical chain would consist of probably 2-5 NTP servers (I rarely see stratum 6 servers in the wild). If a chain entry were 4 bytes for flags etc. plus 16 bytes for the InstanceID, you have 20 bytes per entry, and 6*20 bytes = 120 bytes for a chain of 6 servers. The maximum depth of a chain is IMHO 15, resulting in 15*20 = 300 bytes. UDP and other headers will still keep this below 500 bytes, which is fine for a UDP packet as far as I can see.
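
The arithmetic above, as a one-liner (the 4 + 16 byte entry layout is the assumption from this paragraph, not a specified format):

```python
# Payload arithmetic for the sketched InstanceID chain EF: each entry is
# assumed to be 4 bytes of flags plus a 16-byte InstanceID.
ENTRY_SIZE = 4 + 16  # bytes per chain entry (assumed layout)

def chain_ef_size(depth: int) -> int:
    """Payload bytes needed for a chain covering `depth` servers."""
    return depth * ENTRY_SIZE

print(chain_ef_size(6))   # typical chain of 6 servers
print(chain_ef_size(15))  # at the assumed maximum stratum depth
```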

> Unless you believe that there should be 2 ongoing communications streams
> here, a UDP-based time sync channel and a TCP-based "ancillary
> information" channel.
No, not for this thing.

> I don't think that passes the cost/benefit analysis, either.
I agree. 

> > > Indeed, for this chain to work one would need to know all the players in
> > > the tree between the top of the heap and the "bottom", as the loop
> > > detection would need to know the various possible systems in the middle.
> > > 
> > > As far as I can see, you do not need to know anything about the players in the tree. You just get a list of InstanceIDs and as long as your own is not included, and there is none appearing twice, you can assume that there is no loop.
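
The check described there is small. A minimal sketch, assuming the chain arrives as a list of 16-byte IDs:

```python
# Sketch of the n-degree loop detection described above: a node inspects
# the received chain of InstanceIDs and rejects the source if its own ID
# appears, or if any ID appears twice.

def has_loop(own_id: bytes, chain: list[bytes]) -> bool:
    if own_id in chain:
        return True  # our own ID appears upstream: a timing loop
    return len(set(chain)) != len(chain)  # duplicate entry: loop upstream

a, b, c = b"A" * 16, b"B" * 16, b"C" * 16
print(has_loop(a, [b, c]))     # clean chain
print(has_loop(a, [b, a]))     # we appear upstream
print(has_loop(a, [b, c, b]))  # duplicate upstream entry
```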
> > 
> > I floated an idea for a proposal for this probably 20 years ago, and it
> > was shot down.
> > 
> > You have a great memory. I have no idea what I did 20 years ago to be honest. But just because it was shot down 20 years ago does not mean it is useless, unreasonable or not helpful today. And it definitely does not mean that we should leave it alone because something like this has been discussed in another time and people then decided it is not worth it. 
> > 
> > I think it requires negotiation between all connected parties out to
> > more than degree 2, as one must ensure that *none* of the current group
> > (which may have connections to members of outside groups) share a common
> > InstanceID (if I'm using that word correctly).
> > 
> > It means that changes in connectivity will cause ripples of
> > re-negotiations of the known participants *and likely the chains of
> > their neighbor groups*.
> > 
> > This probably depends on the length of the InstanceID. If that ID is created in a randomized way and the random ID is long enough to make it very unlikely that two NTP instances talking to each other end up having the same ID, it can work just fine without any negotiations etc.
> > What happens to IPv6 addresses that are hashed into 4 bytes to create a refid, what is the probability that two systems talking to each other come up with the same value? I know that a chain of 3-4 servers multiplies the chance that any two of them create the same random ID, 
> > but if we choose to have a 16-byte InstanceID for example, this should be fine. 
> We agree that space in an NTP UDP packet is limited, right?

Sigh, see above. A chain of 15 servers (which is not very common) would be 300 bytes. Do you think this creates any problem?

> And the most expensive part of this scenario is when a client (which may
> be a leaf, and it might be another "collection" of nodes that need to
> agree on the uniqueness of InstanceIDs) joins a new group.
Why? What happens in this scenario?

> This initial joining is going to be the predominant case during startup,
> when a clear goal is getting time synchronized quickly.
Yes, but why do you expect this to be a problem? 

> Do you see where this is going?
No, unfortunately not. 

> I'm not saying we shouldn't discuss this more, I'm just saying there are
> some really difficult and intricate tradeoffs here, and I'm pretty sure
> it's going to take a fair amount of time and a LOT of testing to get
> something that is as robust, predictable and reliable as the protocol
> and reference implementation have proven themselves to be.
OK, agreed. If you agree we should discuss this more, let's do it. 

> > Please imagine this in the face of the NTP Pool.
> > 
> > The pool typically provides stratum 2 or 3 servers. I think that the pool especially would benefit from being able to identify that my stratum 3 pool servers A,B and C are in fact synchronized to the same stratum 1 server and therefore do not really represent a multi-source solution 
> > for my client. At the moment I do not really have a chance to find that out. If I knew, the client could re-resolve the pool DNS entries until it finds 4 servers, all with different stratum 1 sources.
> It's not about the pool servers - it's about the folks who *sync* with
> the pool. Are you suggesting that *all* of the members of the pool
> should make sure they are using unique InstanceIDs to make it easier on
> folks who connect to the pool? If so, what would force different pool
> operators to cooperate here?

It would be possible to let the pool maintainers assign (a part of) the InstanceID for each registered pool server instead of creating it randomly at each startup. The pool monitoring system could check whether that assigned (part of the) InstanceID is actually used by a server; otherwise it is not allowed in the pool. This seems to be something that can easily be automated. Please note this is only required if a pool server runs v5; all the existing pool servers already out in the field do not need to care about this if they choose not to touch their configuration.
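
A hedged sketch of that idea: the pool operator assigns a fixed prefix per registered server, and the server fills the rest of the 16-byte InstanceID randomly at startup, so the monitoring system can verify the prefix. All names and the prefix length are illustrative assumptions:

```python
# Sketch: combine an operator-assigned prefix with random bytes to form
# a pool server's InstanceID; monitoring can verify the prefix part.
import os

def make_pool_instance_id(assigned_prefix: bytes, total_len: int = 16) -> bytes:
    """Operator-assigned prefix + random suffix = full InstanceID."""
    if len(assigned_prefix) >= total_len:
        raise ValueError("prefix leaves no room for a random suffix")
    return assigned_prefix + os.urandom(total_len - len(assigned_prefix))

iid = make_pool_instance_id(b"\x00\x01\x02\x03")
print(len(iid))                         # full 16-byte ID
print(iid[:4] == b"\x00\x01\x02\x03")   # prefix verifiable by monitoring
```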
> And why do you think it's important to find upstream servers that sync
> with different S1 sources? There's no timing loop involved there, so
> what exactly are you testing for?

I would like to avoid a single point of failure and make sure that a client gets its time from different sources, to be better protected against manipulation. If all your upstream servers are synchronized from one stratum 1 server, a bug or attack resulting in bad time on this stratum 1 server can end up impacting all your upstream servers and then your client. I know that those upstream servers should have multiple sources, too. But I do not know for sure and cannot tell if they do. 
> At some point is it important to identify different S0 sources? At what
> level of granularity do we need to identify "time sources"? GPS v. some
> other GNSS source? The receiver chip used? The firmware? The vendor
> of the refclock?

Differentiating between the various hardware reference clocks and their source could be useful, too. I would certainly want to be able to identify which of my servers are using GPS, which are using Galileo and which are using GLONASS. Just as you can change the refID string for your hardware reference clocks, we could allow users to define an InstanceID for each of their refclocks, with a default that represents "GPS", "GLONASS", "GALILEO" etc. This would allow clients to choose a group of upstream servers which all have a different "root source of time", or at least as many different ones as possible. I can protect my NTP infrastructure against GPS spoofing if I can make sure that the selection of servers I use in a client covers different GNSS constellations. If the end user wants to allow clients to identify different GPS receiver chips etc., it is possible to assign different InstanceIDs to different receiver chips, for example. You could also tackle this problem by allowing a chain of InstanceIDs to be defined for your hardware refclock. This would allow you to say that your server is synchronized to "GPS" via "GNSS receiver chip x", for example. Of course this would increase the theoretical maximum number of entries in the InstanceID chain, but if you do not have more than 4-5 stratum levels in your infrastructure, you do not really increase the packet size to a point where it exceeds the UDP limit.

> > The worst thing that can happen in the very unlikely event of two servers in a chain accidentally ending up with the same random InstanceID is that the downstream server wrongly detects a loop and rejects this reference. It would be possible to solve this by making it mandatory for an instance to recreate a new random InstanceID if all upstream servers are ruled out due to this loop detection. There are two possibilities here: if this is a false positive and there is no loop, changing my own InstanceID will solve it because the upstream server will stick to its InstanceID and the resulting chain does not trigger the loop detection anymore. If it really is a loop, the chain will, after a few polling cycles, again trigger the loop detection because the new InstanceID shows up in the chain. If we add information in the chain about the polling intervals or the age of the InstanceID entries, the instance that recreated its random InstanceID could wait until it is clear that the InstanceID triggering the loop detection remains unchanged. 
> What am I missing? Why would a downstream care if more than one of its
> upstreams have the same source? That is *not* a loop.
No, it is not a loop. But it represents a single point of failure and if you can avoid this because your clients can identify which of its upstream sources are using which root time source, why not?

> > I would not want something like a registry and I believe we do not need one. I am not sure about the dependency on an authoritative source. Can you explain why you believe that we do not need the loop detection as long as we have an authoritative source?
> I was too sparse with my words.
> My point is that with a registry (that has a means to authenticate
> entries) we don't need to worry about more than 1 machine using the same
> InstanceID.
> If we don't have a registry with a validation mechanism, one other
> option is to query all of the involved nodes (out to at least 1 degree
> of their involved nodes) to make sure InstanceIDs are unique.
Yes, good suggestion! If, at startup, a node does not respond to incoming requests and instead first asks all its configured upstream sources for their "chain", it will get a list of all used InstanceIDs. By choosing one that is not included in any of the upstream chains, you can dramatically reduce the probability of a random InstanceID accidentally being used twice. Obviously, the chains only contain the *currently used* InstanceIDs; if an upstream server later switches to another reference which has the same InstanceID that you came up with, you have a collision again. 
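
A minimal sketch of that startup procedure, assuming the upstream chains have already been collected as lists of 16-byte IDs (hypothetical, like everything about this proposal):

```python
# Sketch: pick a random InstanceID at startup that avoids every ID
# currently visible in the chains of the configured upstream sources.
import os

def choose_instance_id(upstream_chains: list[list[bytes]], id_len: int = 16) -> bytes:
    in_use = {iid for chain in upstream_chains for iid in chain}
    while True:
        candidate = os.urandom(id_len)
        if candidate not in in_use:  # avoid every currently visible ID
            return candidate

chains = [[b"A" * 16, b"B" * 16], [b"C" * 16]]
iid = choose_instance_id(chains)
print(iid not in {b"A" * 16, b"B" * 16, b"C" * 16})  # True
```

The same routine could be re-run whenever the loop detection fires, which is exactly the "throw away the ID and pick a new one" recovery discussed below.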

> Or we can decide that we don't care about InstanceIDs that are not
> unique and sometimes collide.
Yes, as I proposed, your instance could simply throw away its InstanceID and create a new random one whenever it detects that its current ID is used upstream. If it is a loop, the new InstanceID will show up again in the chain after a while; if it is not a loop, you have solved the situation. 

> Or we can find other ways to address making sure InstanceIDs are unique.
Yes, although I believe that the two suggestions above should be good enough.

> > I'm very curious about the quality of assertions that we can actually
> > rely on because "the whole chain of clocks talked to each other using
> > NTS" (or any other authentication model).
> > 
> > Well, we can only trust our direct upstream server. That server, in turn, can only trust its own upstream servers. This is a chain of trust and as I said, if I do not have any access, information or trust in any of the upstream servers of my own upstream server, I cannot assert anything. 
> > The intention of this approach/flag is to make sure that a client chooses a chain of NTS protected servers over an alternative that uses unauthenticated NTP. It would also allow the user to tell their clients to not accept any upstream source when its chain does not contain only NTS 
> > protected servers.
> This "protection" is not unique to NTS. Private key can be just as
> trustworthy, it just doesn't have the ephemeral negotiation that NTS
> offers. There are other choices here as well, that offer equivalent
> protection.
> And there are limits to what this protection "offers".
That's why I proposed that the flag does not say "this entry in my chain is protected by NTS", it just says that "I trust this upstream server of mine because I am reasonably sure that I can talk to it in a secure way". 

> I'll note again that Daniel and some others promised that NTS would be
> able to offer protection for peer associations as well, and that promise
> still hasn't materialized. This level of protection was a foundational
> aspect of the plan Dave Mills submitted years ago.
> If folks want robust and reliable enterprise time sources, I submit that
> peer mode is an important part of that mechanism. Peering also goes to
> better behavior for loop detection/avoidance as well as for things like
> orphan mode.
Not part of this discussion, therefore I will take the liberty of not commenting on it, to avoid making this thread even messier than it already is ;-) ...

> > The primary purpose of the REFID was degree-one loop detection. Once it
> > was there, in an IPv4 world we noticed we could use it for more things.
> > This eventually led to faulty assumptions and problematic behaviors.
> > Unintended consequences, if you will.
> > 
> > This is why I would like to see that a) the InstanceID is used as the RefID - but make it 16 bytes (or longer) and randomize it. This requires a change in the packet format (which was the start of this discussion). Using the InstanceID to create the chain, offering n-degree loop detection and an indication whether the upstream server relies on NTS/protected information, would be optional and can be implemented using EFs. 
> Do you see a way to do this with NTPv4?
Not without breaking interoperability with other NTPv4 implementations that have no idea about an InstanceID.

> We have a 3 bit version field for the packets. I expect that a "7" in
> that field will mean "version 7 or later" and that there will be a
> presumably immutable base structure that will give folks access to some
> other data value that will identify the actual (>6) version number.
> And while NTS may be one way to do this, it should not be the *only* way
> to do this.
I agree. If the version is 7, an additional version field in the v7 packet format would be required. No problem with this approach; problem solved.
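
The escape mechanism Harlan describes can be sketched in a few lines. This assumes the familiar 3-bit version field in NTP's first header byte and a hypothetical extended version field elsewhere in the base structure; none of this is a specified format:

```python
# Sketch: NTP's leap/version/mode byte carries a 3-bit version field;
# the idea is that the value 7 would mean "version 7 or later: consult
# an extended version field in the (presumably immutable) base header".

def effective_version(li_vn_mode: int, extended_version: int) -> int:
    vn = (li_vn_mode >> 3) & 0x07  # the classic 3-bit version field
    return extended_version if vn == 7 else vn

print(effective_version(0b00_100_011, 0))  # plain NTPv4 client packet
print(effective_version(0b00_111_011, 9))  # vn=7 escapes to the extended field
```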

> And to do this with EFs would require folks to allow EFs to be created,
> which means they'd have to start understanding that there's a difference
> between mechanism and policy, and that just because somebody doesn't
> agree with a policy choice doesn't mean they should not allow the
> mechanism to decide if that policy choice is used or not.
No need to do this with EFs. I would prefer that the version number is defined in the base packet header, as it needs to be the first information that any receiving node requires.

> > I see the same thing happening above, and with some proposals offered by
> > others.
> > 
> > I do not see unintended consequences, but that is probably the nature of unintended consequences (if you could see them, they would become intended consequences). The fact that this quickly sketched InstanceID thing can lead to unwanted side effects should not hold us back from investigating this and other possible enhancements. The chance of "unintended consequences" is *always* there. You can deal with that by discussion and research. My whole point here is that we should do this. I am not married to the idea of the InstanceID, the chain, the n-degree loop detection etc. - I just believe it would improve NTP and is worthwhile to investigate further. 
> Happy to discuss it.

> And sooner or later it needs to be implemented and tested, and holding
> off on that until a committee decides it's suitably advanced is, IMO,
> far too late.
I disagree. You and I know that implementation and testing resources are precious, and if we can, we should avoid wasting them until we know that we agree on a feature/function.

> The other side of this is that unless these changes are carefully and
> consciously made, we're going to be living with the "pollution" for a
> very long time.

> -- 
> Harlan Stenn, Network Time Foundation

All the best,

Heiko Gerstung 
Managing Director

MEINBERG® Funkuhren GmbH & Co. KG
Lange Wand 9
D-31812 Bad Pyrmont, Germany
Phone:    +49 (0)5281 9309-404
Fax:        +49 (0)5281 9309-9404

Amtsgericht Hannover 17HRA 100322
Geschäftsführer/Management: Günter Meinberg, Werner Meinberg, Andre Hartmann, Heiko Gerstung

