Re: [Ntp] Hard NO: Re: WGLC - draft-ietf-ntp-ntpv5-requirements

David Venhoek <david@venhoek.nl> Wed, 27 December 2023 13:04 UTC

MIME-Version: 1.0
References: <CA+mgmiMFLDRggrBUzdJyjhgbM6q0m8nY8PUoU5oxbR2HtZh51A@mail.gmail.com> <CAD4huA4+5R+tVQJQRFwR6vXuO0FZbtgTZwJeTfDjTVDaT4AwJg@mail.gmail.com> <2AEB577B-AEC3-4414-B8B7-9BA7382F3F54@gmail.com> <2f4226a3-484a-4f44-bd1b-758d648a30cd@nwtime.org> <ZXs4h46SERybNw_t@localhost> <CAMbSiYDeP9BObzQS+A2xKk5wN3LiW_zQ4S+D_d9WwhYyrq9Mkg@mail.gmail.com> <e8e35fef-96ec-4571-b842-100a7579263c@nwtime.org>
In-Reply-To: <e8e35fef-96ec-4571-b842-100a7579263c@nwtime.org>
From: David Venhoek <david@venhoek.nl>
Date: Wed, 27 Dec 2023 14:04:13 +0100
Message-ID: <CAPz_-SU9Uk8-UnibFzZOGAZx9drL61tEaoACwdfciUjavEPqWQ@mail.gmail.com>
To: ntp@ietf.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/aLrJVfHQlgjovb-kLRZhMZP0y1U>
Subject: Re: [Ntp] Hard NO: Re: WGLC - draft-ietf-ntp-ntpv5-requirements

Hi All,

From my perspective, the fact that apparently (at least as Harlan
seems to claim in his original email) the algorithm specified in
RFC5905 is fragile would be all the more reason to not specify it as
"the standard". If this is really the case, we should rather be
looking harder for solutions that are more resilient and not so
sensitive to

I've done enough experiments with real hardware and over the internet
to have realized that pretty much any noise profile that you could
imagine will exist somewhere. I've seen clocks showing 24-hour phase
oscillations due to temperature oscillations, as well as clocks that
had similar but shorter cycles due to a coarsely controlled AC system.
Over the internet, there are also connections with "interesting"
(read: hella annoying to deal with) patterns in their latency, from
weekly patterns to daily patterns and even some 1.5-hour patterns
whose origin I don't really understand. And honestly,
once I get to the point of looking at the 1-1000 Hz noise range in the
PTP implementation I am currently working on, I wouldn't be surprised
to at least find something related to the power grid and its
idiosyncrasies.
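
To illustrate what such a periodic pattern looks like in measurements, here
is a minimal sketch in Python. Everything in it is an assumption made for
illustration: the data is synthetic (a 24-hour sinusoid plus Gaussian noise,
sampled hourly), and the function names `synthetic_offsets` and
`dominant_period` are made up for this example rather than taken from any
NTP or PTP implementation. It simply picks the strongest component of a
naive discrete Fourier transform.

```python
import cmath
import math
import random


def synthetic_offsets(hours, period_h=24.0, amplitude_s=1e-3,
                      noise_s=1e-4, seed=0):
    """Hypothetical hourly clock offsets (seconds): sinusoid plus noise."""
    rng = random.Random(seed)
    return [
        amplitude_s * math.sin(2 * math.pi * t / period_h)
        + rng.gauss(0.0, noise_s)
        for t in range(hours)
    ]


def dominant_period(samples, sample_interval_h=1.0):
    """Return the period (hours) of the strongest DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT; fine for a few hundred samples.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n * sample_interval_h / best_k


# Ten days of synthetic hourly samples with a daily oscillation.
offsets = synthetic_offsets(hours=240)
print(dominant_period(offsets))
```

Real measurements will of course contain several such patterns at once, which
is exactly the point made above.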

Yes, another algorithm may introduce a new noise pattern, but given
the very wide range of variation that is already out there, it is
unlikely to.

Looking at the current moment, we are now in a situation where there
are three algorithms operating in the space: the one in the
"reference" implementation, the one from chrony and the one from
ntpd-rs. I have personally not seen any indication of problems caused
by this, rather the opposite. From measurements it looks very likely
that both chrony and ntpd-rs are capable of synchronizing, with
similar poll intervals, to about an order of magnitude better
precision.

And it is this precision that, to me at least, seems to be the thing
that end users want. Both legal requirements (e.g. for trading firms,
the requirement that clocks are to be within 30us of UTC) and practical
applications typically have just an upper bound: the clock must not
be more than x seconds from a reference.
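
As a minimal sketch of what such a requirement amounts to (Python; the
function names `within_bound` and `worst_offset` and the sample offsets are
hypothetical, and the 30us figure is used only as an example threshold), note
that the check looks purely at the resulting offsets and says nothing about
which algorithm produced them:

```python
def worst_offset(offsets_s):
    """Largest absolute offset from the reference, in seconds."""
    return max(abs(o) for o in offsets_s)


def within_bound(offsets_s, max_offset_s):
    """True if every measured offset stays within +/- max_offset_s."""
    return worst_offset(offsets_s) <= max_offset_s


# Hypothetical measured offsets from a reference clock, in seconds.
samples = [12e-6, -8e-6, 25e-6, -29e-6]

print(within_bound(samples, 30e-6))  # checks against a 30us bound
print(within_bound(samples, 20e-6))  # checks against a stricter 20us bound
```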

Given all this, strictly mandating an algorithm that is both fragile
and imprecise seems like exactly the opposite of what we should
want.

As for the discussion on differences between RFC5905 and the
"reference" implementation, I don't have concrete differences
(unfortunately). However, we had a lot of trouble with its performance
back when we did have it implemented in ntpd-rs, also compared to the
"reference" implementation. Part of this may have been due to bugs on
our end, but it wouldn't surprise me if there are further differences.

Also, the fact that David L. Mills apparently had the tendency to just
change the algorithm without changes to the standard feels, at least
on the surface, just wrong to me. That gives a lot of vibes of
"innovation, but only by us" from the "reference" implementation
people, which I think is highly harmful. I sincerely hope that that is
just a wrong impression on my side.

Kind regards,
David Venhoek

On Wed, Dec 27, 2023 at 8:22 AM Harlan Stenn <stenn@nwtime.org> wrote:
>
> On 12/26/2023 3:23 PM, Dave Hart wrote:
> > On Thu, 14 Dec 2023 at 17:18, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> >
> >     On Thu, Dec 14, 2023 at 03:16:29AM -0800, Harlan Stenn wrote:
> >      > The core "mission" of NTP is time synchronization with a (well)
> >      > defined response to a "time impulse".  This is the reason why
> >      > previous NTP specifications have included the algorithms.  Prof.
> >      > Mills and some others have done a LOT of testing to ensure
> >      > reliable and predictable behavior of time synchronization, in the
> >      > "normal" and "time impulse" cases over a very wide range of
> >      > circumstances.
> >
> >     If the RFC 5905 PLL+FLL is so great, why is nothing using it, not even
> >     the "reference" implementation in default configuration?
> >
> >
> > Would you mind elaborating how the reference implementation's PLL+FLL
> > feedback loop differs from the NTPv4 spec?  I'm not aware of any
> > intentional deviation, but Dr. Mills wasn't shy about making changes to
> > the implementation that he felt was an improvement before documenting it
> > in another RFC.
> >
> >     ntpd in default configuration has a poor response with longer polling
> >     intervals. It suffers from oscillations,
> >
> >
> > If verified that would seem to me a reason to improve the algorithms,
> > rather than decide it's time for a wild west where every NTP
> > implementation is free to behave in any way, as that would invite
> > pathological results in situations where differing implementations sit
> > on the synchronization path between the reference clock and the ultimate
> > client.
>
> Or better describe the conditions for these problems?
>
> If you are saying that the default config can show oscillations as poll
> intervals increase, all I can say is we haven't seen reports of this.
>
> If we had, we'd be taking steps to fix it.
>
> If you have seen this, perhaps you'd be kind enough to post about how
> one might change the default values to ones more suitable for longer
> poll intervals, or even telling us how to demonstrate the problem.
>
> >     which can be sometimes seen
> >     even on monitoring graphs of pool.ntp.org.
> >
> >
> > That public pool uses primitive monitoring that does not take into
> > account the delay or jitter between the monitoring station and the
> > server.  Moreover, the requirements for participating are very lenient,
> > allowing clocks that appear to be up to 70ms off of UTC.  That pool is
> > therefore not a good example of a well-engineered and well-maintained
> > synchronization source.  It's fine to get the clock within a few hundred
> > milliseconds, but stricter requirements call for a more precise source
> > and better error budgeting.
> >
> >     Nobody seems to care. Maybe
> >     it's a bug, but after so many years I think we can conclude that
> >     Internet will not break if all NTP implementations don't have the
> >     "well defined" response.
> >
> >
> > The internet will not break even if all NTP sources were only good to a
> > few seconds.  Those who require tight sync (such as distributed
> > databases) engineer solutions to meet their requirements.
>
> I'm negatively impressed with your conclusion, Miroslav.
>
> "The Internet" probably won't break, because "the internet" doesn't
> exchange time that way, and I would bet that you know this.
>
> A single machine will either have a crafted config file, well-tended or
> not, and static or pool servers.  How well do you think the vast
> majority of these machines are monitored to see if there are problems?
> How badly would they have to screw up to be noticed?
>
> If somebody bothers to look and sees one of the hosts in their static
> config file is bad, they will likely just throw out the bad site and
> replace it.
>
> If they are using the "pool" directive and there are misbehaving servers
> *that otherwise survive the pool monitoring service* then ntpd will
> notice the bad performers and throw them out automatically.
>
> In an enterprise, the odds are quite high that time for the enterprise
> is sync'd from a set of curated machines.  These machines are likely
> getting their time from reliable sources.  They won't be talking to
> poorly-behaving time sources.  This translates to the (internal)
> machines that get their time from the (well-behaved/reliable) internal
> time sources.
>
> So sure, the stuff the NTP Project has put out there is very resilient
> and well-behaved.  There's a good chance it will continue to behave well
> even in an increasingly hostile environment.
>
> But why would any <positive-intentioned> person want to take steps to
> increase the environment's hostility?
>
> As I have said before, the world of time-synchronization is not the
> place to use creative destruction as a method to promote evolution.
>
> > Cheers,
> > Dave Hart
> >
> >
> > _______________________________________________
> > ntp mailing list
> > ntp@ietf.org
> > https://www.ietf.org/mailman/listinfo/ntp
>
> --
> Harlan Stenn <stenn@nwtime.org>
> http://networktimefoundation.org - be a member!