Re: [Ntp] Hard NO: Re: WGLC - draft-ietf-ntp-ntpv5-requirements

David Venhoek <david@venhoek.nl> Wed, 07 February 2024 08:48 UTC

From: David Venhoek <david@venhoek.nl>
Date: Wed, 07 Feb 2024 09:47:40 +0100
Message-ID: <CAPz_-SVPfcKvXroLsLvK0C5OgacZd0VPSN42C5L4DUh+_siTTQ@mail.gmail.com>
To: Dave Hart <davehart@gmail.com>
Cc: ntp@ietf.org
Content-Type: multipart/mixed; boundary="0000000000000c79f90610c6c0ab"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/GOguAKeNGGxO3UW4fjyFo3kyRp4>
Subject: Re: [Ntp] Hard NO: Re: WGLC - draft-ietf-ntp-ntpv5-requirements

Apologies for the delay; processing the chrony data took me a lot
longer than expected due to a variety of factors. Here are the same
statistics for chrony:

standard deviation: 2.4 us
90 percent is within: 3.8 us
95 percent is within: 4.6 us
99 percent is within: 7.6 us

The same caveats mentioned in the previous email apply.
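
For anyone recomputing these figures from the attached offset data, here is a minimal Python sketch. It assumes the file format described in the quoted message below (one offset per line, in nanoseconds). It also assumes that "N percent is within" means the absolute deviation from the mean offset, taken as a nearest-rank percentile, and that the standard deviation is the population form. The function names are hypothetical, and the actual analysis may have used a different tool or convention.

```python
import statistics


def load_offsets(path):
    """Read one offset per line (nanoseconds), skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def offset_stats(offsets_ns):
    """Return spread statistics around the mean offset, in microseconds."""
    mean = statistics.fmean(offsets_ns)
    # Assumption: "N percent is within" refers to |offset - mean|.
    deviations = sorted(abs(x - mean) for x in offsets_ns)
    count = len(deviations)

    def within(percent):
        # Nearest-rank percentile; integer arithmetic avoids float rounding.
        rank = max(1, (percent * count + 99) // 100)
        return deviations[rank - 1] / 1000.0

    return {
        # Assumption: population standard deviation (divide by N).
        "stddev_us": statistics.pstdev(offsets_ns) / 1000.0,
        "p90_us": within(90),
        "p95_us": within(95),
        "p99_us": within(99),
    }
```

Calling offset_stats(load_offsets(path)) on one of the attached files gives the four figures in the same units as above.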

Kind regards,
David Venhoek

On Fri, Jan 12, 2024 at 10:41 AM David Venhoek <david@venhoek.nl> wrote:
>
> Apologies for this taking a while, but between needing access to the
> office (which was closed over the holidays) to get clean data
> demonstrating my point (I have internal data that shows the same, but
> it isn't to a standard I would expect others to accept), and making
> some mistakes it took me a while to get measurement data to share.
>
> The data presented here is for ntpd-rs and the reference
> implementation, chrony is still running as we speak and I will send it
> sometime next week when I have time. My argumentation for the order of
> magnitude improvement is this data and the comparison published at
> https://chrony-project.org/comparison.html.
>
> What I tested was the following specific situation: An endrun ninja gps
> reference clock running ntp, providing time to a raspberry pi 3 via a
> HP aruba switch. Each of ntpd-rs and the reference implementation was
> configured to use the endrun ninja as its only remote time reference,
> with poll interval of 8 seconds. The raspberry pi 3 and the endrun
> ninja were both configured to output a pulse per second signal, and
> the offset between these was measured.
>
> For each test, the implementation under test was started and given 2
> hours to stabilize, and then the offsets between rpi3 and ninja were
> measured for 4 hours.
>
> Below are provided for each the standard deviation from mean of the
> offset, and 90, 95 and 99 percent worst offsets from the mean offset.
> Mean offset was ignored because there are multiple known sources of
> asymmetries in the system, including from the ninja reference clock
> and the switch, and I haven't done sufficient analysis of all sources
> to be able to attribute a quantity to the ntp implementation
> specifically (hence making this value rather meaningless). Furthermore
> asymmetries in general are not a function of any particular algorithm,
> but rather of how the timestamping is done, which best I can tell is
> not standardized in the spec and hence not part of the current
> discussion. Please take the p99 with a grain of salt, as with the 4
> hour measuring period the amount of data supporting that is somewhat
> on the thin side.
>
> Attached as images are plots of the allan deviation and offsets of the
> clock when controlled by each of the implementations in the range
> 1-2048 seconds.
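
As an illustration of how an Allan deviation curve over 1-2048 seconds can be derived from once-per-second offset samples, here is a minimal Python sketch. It uses the standard overlapping estimator on phase (time offset) data. That estimator is an assumption, since the message does not say which one was used for the attached plots. The offsets must be converted from nanoseconds to seconds first, and the function name is hypothetical.

```python
import math

# Averaging times in seconds for 1 s samples: 1, 2, 4, ..., 2048.
TAUS = [2 ** k for k in range(12)]


def allan_deviation(phase_s, m, tau0=1.0):
    """Overlapping Allan deviation at tau = m * tau0 from phase samples.

    phase_s: time offsets in seconds, sampled every tau0 seconds.
    """
    terms = len(phase_s) - 2 * m
    if m < 1 or terms < 1:
        raise ValueError("need at least 2*m + 1 phase samples and m >= 1")
    # Sum of squared second differences of the phase at spacing m.
    total = sum(
        (phase_s[i + 2 * m] - 2.0 * phase_s[i + m] + phase_s[i]) ** 2
        for i in range(terms)
    )
    tau = m * tau0
    return math.sqrt(total / (2.0 * tau * tau * terms))
```

With four hours of data (about 14400 samples), the estimates at the largest averaging times rest on few independent samples, so the same caution as for the p99 figure applies there.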
>
> NTP reference implementation
> -----------------------------
>
> standard deviation: 36.5 us
> 90 percent is within: 24.6 us
> 95 percent is within: 59.4 us
> 99 percent is within: 192.2 us
>
> ntpd-rs
> -------
>
> standard deviation: 3.4 us
> 90 percent is within: 5.5 us
> 95 percent is within: 6.4 us
> 99 percent is within: 8.9 us
>
>
> Hope that answers your question regarding what I base my assertions
> on. If you are interested in further measurements please let me know,
> then we can discuss what we can do practically. Attached below are
> some of the technical bounds on errors in the measurement itself for
> those interested.
>
> For those wanting to play around with the data themselves, the
> measurement offset data is also attached. The format is one line per
> second, in order, with the offset in nanoseconds.
>
> Kind regards,
> David Venhoek
>
>
>
> Uncertainty estimates on the measurements:
>
> In the setup, we have identified the following sources of error:
> Cable length differences: We have differences in cable length between
> various parts of at most 1 meter. At 0.3c signal propagation speed
> this represents a 12 ns uncertainty
> Time interval measurement: The time difference measurement apparatus
> has a resolution of 1 ns, validated to 10 ns. Furthermore the clocking
> used internally gives a 0.01% relative uncertainty. For the range
> investigated here this represents an uncertainty of 30 ns
> endrun ninja pps edge uncertainty: Has been validated to be within
> 100ns of gps input.
> Rpi3 pps edge uncertainty: Has been validated to be within 100ns of
> internal clock.
>
> In combination, the measurements are accurate to within 300 ns.
> Confidence bounds on the derived statistics depend significantly on
> assumptions on the underlying distribution, but the differences are
> significant enough to be unlikely to be chance for the standard
> deviation, p90 and p95 statistics. For the p99 statistic, the sample
> size is likely on the small side, especially given correlation between
> the individual measurements.
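
The error budget above can be checked with a short Python sketch. The per-source bounds are the ones listed in the message. The two ways of combining them (a worst-case linear sum and a root-sum-square for independent errors) are illustrative assumptions, since the message does not say how the 300 ns figure was obtained. All names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def cable_delay_ns(length_m, velocity_factor):
    """Propagation delay over a cable, in nanoseconds."""
    return length_m / (velocity_factor * SPEED_OF_LIGHT_M_PER_S) * 1e9


# Bounds as stated in the message, in nanoseconds.
ERROR_SOURCES_NS = {
    "cable length differences": 12.0,
    "time interval resolution (validated)": 10.0,
    "time interval clocking (0.01% relative)": 30.0,
    "endrun ninja pps edge": 100.0,
    "rpi3 pps edge": 100.0,
}


def linear_sum_ns(sources):
    """Worst case: every error at its bound, all in the same direction."""
    return sum(sources.values())


def root_sum_square_ns(sources):
    """Combined bound if the errors are independent (an assumption)."""
    return math.sqrt(sum(value * value for value in sources.values()))
```

A 1 meter difference at 0.3c works out to roughly 11 ns, in line with the 12 ns bound quoted above. Both combinations stay below 300 ns.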
>
> On Thu, Dec 28, 2023 at 7:22 AM Dave Hart <davehart@gmail.com> wrote:
> >
> > On Wed, 27 Dec 2023 at 13:04, David Venhoek <david@venhoek.nl> wrote:
> >>
> >> From my perspective, the fact that apparently (at least as harlan
> >> seems to claim in his original email) the algorithm specified in
> >> RFC5905 is fragile would be all the more reason to not specify it as
> >> "the standard". If this is really the case, we should rather be
> >> looking harder for solutions that are more resilient and not so
> >> sensitive to
> >
> >
> > Fortunately this is not the case.  Harlan did not claim in his original email of this thread that NTPv4 is fragile.  Your apparent misunderstanding may come from his mention that Dr. Mills showed that _violating_ the specification can easily break a synchronization.  That is quite the opposite of claiming RFC5905's algorithms are fragile.
> >
> >> [...]
> >>
> >> Looking at the current moment, we are now in a situation where there
> >> are 3 algorithms operating in the space, with the one in the
> >> "reference" implementation, the one from chrony and the one from
> >> ntpd-rs. I have personally not seen any indication of problems caused
> >> by this, rather the opposite. From measurements it looks very likely
> >> that both chrony and ntpd-rs are capable of, with similar poll
> >> intervals, synchronizing to about an order of magnitude larger
> >> precision.
> >
> >
> > Your use of precision is confusing in this context.  In NTP parlance, precision is the minimum change in the system clock, in ancient systems, the clock tick, in modern systems, the time to read the system clock.  I look forward to you documenting your claim of an order of magnitude better performance from Chrony and ntpd-rs compared to the reference implementation.  I would love to improve the performance of NTP across a wide variety of situations and then promote standardizing those improvements.
> >>
> >> [...]
> >>
> >> Also, the fact that David L Mills apparently had the tendency to just
> >> change the algorithm without changes to the standard at least on the
> >> surface feels also just wrong to me. That gives a lot of vibes of
> >> "innovation, but only by us" from the "reference" implementation
> >> people, which is I think highly harmful. I sincerely hope that that is
> >> just a wrong impression from my side.
> >
> >
> > It sounds like you're not familiar with the history of NTP.  In the beginning, there were fuzzball routers and Arpanet...  I don't think it would be productive for me to spend a lot of time repeating what can be easily discovered with a little effort.  I refer you to Dr. Mills' papers at https://www.eecis.udel.edu/~mills/, the NTP RFCs, and his second edition NTP book "Computer Network Time Synchronization: The Network Time Protocol on Earth and in Space, Second Edition" ISBN 978-1439814635.
> >
> > Suffice it to say Mills invented NTP and nurtured it personally for 30 years.  He maintained various iterations of the reference implementations over that timeframe, eventually with the assistance of Harlan Stenn.  Dr. Mills' health no longer allows him to participate substantially.  During that time, he naturally evolved the design _before_ codifying it in standards, and I know of no other reasonable approach.  He never wavered from the position that NTP was not simply a wire protocol, but a suite of algorithms carefully engineered, simulated, and tested over time to have predictable and consistently beneficial behavior in synchronizing networks of devices within the limits of their oscillators, reference clocks, communications channels, hardware, and underlying software.
> >
> > If you want to use the on-wire protocol without the algorithms, please see the SNTP RFCs.
> >
> > Cheers,
> > Dave Hart
> >
> >
> >>
> >> On Wed, Dec 27, 2023 at 8:22 AM Harlan Stenn <stenn@nwtime.org> wrote:
> >> >
> >> > On 12/26/2023 3:23 PM, Dave Hart wrote:
> >> > > On Thu, 14 Dec 2023 at 17:18, Miroslav Lichvar
> >> > > <mlichvar@redhat.com> wrote:
> >> > >
> >> > >     On Thu, Dec 14, 2023 at 03:16:29AM -0800, Harlan Stenn wrote:
> >> > >      > The core "mission" of NTP is time synchronization with a (well)
> >> > >     defined
> >> > >      > response to a "time impulse".  This is the reason why previous NTP
> >> > >      > specifications have included the algorithms.  Prof. Mills and
> >> > >     some others
> >> > >      > have done a LOT of testing to ensure reliable and predictable
> >> > >     behavior of
> >> > >      > time synchronization, in the "normal" and "time impulse" cases
> >> > >     over a very
> >> > >      > wide range of circumstances.
> >> > >
> >> > >     If the RFC 5905 PLL+FLL is so great, why is nothing using it, not even
> >> > >     the "reference" implementation in default configuration?
> >> > >
> >> > >
> >> > > Would you mind elaborating how the reference implementation's PLL+FLL
> >> > > feedback loop differs from the NTPv4 spec?  I'm not aware of any
> >> > > intentional deviation, but Dr. Mills wasn't shy about making changes to
> >> > > the implementation that he felt was an improvement before documenting it
> >> > > in another RFC.
> >> > >
> >> > >     ntpd in default configuration has a poor response with longer polling
> >> > >     intervals. It suffers from oscillations,
> >> > >
> >> > >
> >> > > If verified that would seem to me a reason to improve the algorithms,
> >> > > rather than decide it's time for a wild west where every NTP
> >> > > implementation is free to behave in any way, as that would invite
> >> > > pathological results in situations where differing implementations sit
> >> > > on the synchronization path between the reference clock and the ultimate
> >> > > client.
> >> >
> >> > Or better describe the conditions for these problems?
> >> >
> >> > If you are saying that the default config can show oscillations as poll
> >> > intervals increase, all I can say is we haven't seen reports of this.
> >> >
> >> > If we had, we'd be taking steps to fix it.
> >> >
> >> > If you have seen this, perhaps you'd be kind enough to post about how
> >> > one might change the default values to ones more suitable for longer
> >> > poll intervals, or even telling us how to demonstrate the problem.
> >> >
> >> > >     which can be sometimes seen
> >> > >     even on monitoring graphs of pool.ntp.org.
> >> > >
> >> > >
> >> > > That public pool uses primitive monitoring that does not take into
> >> > > account the delay or jitter between the monitoring station and the
> >> > > server.  Moreover, the requirements for participating are very lenient,
> >> > > allowing clocks that appear to be up to 70ms off of UTC.  That pool is
> >> > > therefore not a good example of a well-engineered and well-maintained
> >> > > synchronization source.  It's fine to get the clock within a few hundred
> >> > > milliseconds, but stricter requirements call for a more precise source
> >> > > and better error budgeting.
> >> > >
> >> > >     Nobody seems to care. Maybe
> >> > >     it's a bug, but after so many years I think we can conclude that
> >> > >     Internet will not break if all NTP implementations don't have the
> >> > >     "well defined" response.
> >> > >
> >> > >
> >> > > The internet will not break even if all NTP sources were only good to a
> >> > > few seconds.  Those who require tight sync (such as distributed
> >> > > databases) engineer solutions to meet their requirements.
> >> >
> >> > I'm negatively impressed with your conclusion, Miroslav.
> >> >
> >> > "The Internet" probably won't break, because "the internet" doesn't
> >> > exchange time that way, and I would bet that you know this.
> >> >
> >> > A single machine will either have a crafted config file, well-tended or
> >> > not, and static or pool servers.  How well do you think the vast
> >> > majority of these machines are monitored to see if there are problems?
> >> > How badly would they have to screw up to be noticed?
> >> >
> >> > If somebody bothers to look and sees one of the hosts in their static
> >> > config file is bad, they will likely just throw out the bad site and
> >> > replace it.
> >> >
> >> > If they are using the "pool" directive and there are misbehaving servers
> >> > *that otherwise survive the pool monitoring service* then ntpd will
> >> > notice the bad performers and throw them out automatically.
> >> >
> >> > In an enterprise, the odds are quite high that time for the enterprise
> >> > is sync'd from a set of curated machines.  These machines are likely
> >> > getting their time from reliable sources.  They won't be talking to
> >> > poorly-behaving time sources.  This translates to the (internal)
> >> > machines that get their time from the (well-behaved/reliable) internal
> >> > time sources.
> >> >
> >> > So sure, the stuff the NTP Project has put out there is very resilient
> >> > and well-behaved.  There's a good chance it will continue to behave well
> >> > even in an increasingly hostile environment.
> >> >
> >> > But why would any <positive-intentioned> person want to take steps to
> >> > increase the environment's hostility?
> >> >
> >> > As I have said before, the world of time-synchronization is not the
> >> > place to use creative destruction as a method to promote evolution.
> >> >
> >> > > Cheers,
> >> > > Dave Hart
> >> > >
> >> > >
> >> > > _______________________________________________
> >> > > ntp mailing list
> >> > > ntp@ietf.org
> >> > > https://www.ietf.org/mailman/listinfo/ntp
> >> >
> >> > --
> >> > Harlan Stenn <stenn@nwtime.org>
> >> > http://networktimefoundation.org - be a member!
> >> >
> >>
> >
> >
> >
> > --
> > Cheers,
> > Dave Hart