Re: [Ntp] Hard NO: Re: WGLC - draft-ietf-ntp-ntpv5-requirements

David Venhoek <david@venhoek.nl> Fri, 12 January 2024 09:42 UTC

From: David Venhoek <david@venhoek.nl>
Date: Fri, 12 Jan 2024 10:41:17 +0100
Message-ID: <CAPz_-SV9KUwK_j3wkSRe7BcQhH_8f7hSTbZhOeMuOfwLajcncA@mail.gmail.com>
To: Dave Hart <davehart@gmail.com>
Cc: ntp@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/PUKQzpL1WgQswvhArwGg7d2vLac>
Subject: Re: [Ntp] Hard NO: Re: WGLC - draft-ietf-ntp-ntpv5-requirements

Apologies for this taking a while. Between needing access to the
office (which was closed over the holidays) to get clean data
demonstrating my point (I have internal data that shows the same, but
it isn't to a standard I would expect others to accept) and making
some mistakes along the way, it took me some time to get measurement
data to share.

The data presented here is for ntpd-rs and the reference
implementation. The chrony measurement is still running as we speak,
and I will send its results sometime next week when I have time. My
argument for the order of magnitude improvement rests on this data and
the comparison published at https://chrony-project.org/comparison.html.

What I tested was the following specific situation: an EndRun Ninja GPS
reference clock running NTP, providing time to a Raspberry Pi 3 via an
HP Aruba switch. ntpd-rs and the reference implementation were each
configured to use the Ninja as their only remote time reference, with
a poll interval of 8 seconds. The Raspberry Pi 3 and the Ninja were
both configured to output a pulse per second (PPS) signal, and the
offset between these two signals was measured.

For each test, the implementation under test was started and given 2
hours to stabilize, and then the offsets between the rpi3 and the ninja
were measured for 4 hours.

Below, for each implementation, are the standard deviation of the
offset around its mean, and the bounds within which 90, 95 and 99
percent of the offsets fall, measured from the mean offset. The mean
offset itself was ignored because there are multiple known sources of
asymmetry in the system, including the ninja reference clock and the
switch, and I haven't done sufficient analysis of all sources to be
able to attribute a quantity to the ntp implementation specifically
(which makes this value rather meaningless). Furthermore, asymmetries
in general are not a function of any particular algorithm, but rather
of how the timestamping is done, which as best I can tell is not
standardized in the spec and hence not part of the current discussion.
Please take the p99 with a grain of salt, as with the 4 hour measuring
period the amount of data supporting it is somewhat on the thin side.
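
As a rough illustration of the statistics described above, the sketch
below computes the standard deviation around the mean and the
"N percent is within" bounds from a list of offsets. The function name
offset_stats and the nearest-rank percentile rule are assumptions made
for this example; the exact method behind the reported figures is not
stated.

```python
import math

def offset_stats(offsets_ns):
    """Spread of offsets around their mean, in the same unit as the input."""
    n = len(offsets_ns)
    mean = sum(offsets_ns) / n
    # Absolute deviation of every sample from the mean offset.
    deviations = [abs(x - mean) for x in offsets_ns]
    # Population standard deviation around the mean.
    std = math.sqrt(sum(d * d for d in deviations) / n)
    ranked = sorted(deviations)

    def within(percent):
        # Nearest-rank percentile (assumed rule): the smallest deviation
        # that covers at least `percent` percent of the samples.
        rank = max(1, (percent * n + 99) // 100)
        return ranked[rank - 1]

    return {"std": std, "p90": within(90), "p95": within(95), "p99": within(99)}
```

With 4 hours of one-per-second samples, the p99 bound is set by only
the 144 largest deviations, which is one way to see why it rests on
little data.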

Attached as images are plots of the Allan deviation and offsets of the
clock when controlled by each of the implementations in the range
1-2048 seconds.
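
For readers who want to reproduce an Allan deviation curve from the
offset data, here is a minimal sketch of the standard overlapping
Allan deviation estimator for phase (time offset) samples. Whether the
attached plots used the overlapping estimator is not stated, so treat
that choice, and the function name overlapping_adev, as assumptions.

```python
import math

def overlapping_adev(phase_s, m, tau0=1.0):
    """Overlapping Allan deviation at averaging time m * tau0.

    phase_s: time offsets in seconds, sampled every tau0 seconds.
    """
    n = len(phase_s)
    terms = n - 2 * m
    if m < 1 or terms < 1:
        raise ValueError("need m >= 1 and more than 2*m samples")
    tau = m * tau0
    # Sum of squared second differences of the phase at spacing m.
    total = sum(
        (phase_s[i + 2 * m] - 2 * phase_s[i + m] + phase_s[i]) ** 2
        for i in range(terms)
    )
    return math.sqrt(total / (2 * tau * tau * terms))
```

For offsets recorded in nanoseconds once per second, convert to
seconds first and evaluate at m = 1, 2, 4, ..., 2048 to cover the
1-2048 second range.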

NTP reference implementation
-----------------------------

standard deviation: 36.5 us
90 percent is within: 24.6 us
95 percent is within: 59.4 us
99 percent is within: 192.2 us

ntpd-rs
-------

standard deviation: 3.4 us
90 percent is within: 5.5 us
95 percent is within: 6.4 us
99 percent is within: 8.9 us


Hope that answers your question regarding what I base my assertions
on. If you are interested in further measurements please let me know,
then we can discuss what we can do practically. Attached below are
some of the technical bounds on errors in the measurement itself for
those interested.

For those wanting to play around with the data themselves, the
measurement offset data is also attached. The format is one line per
second, in order, with the offset in nanoseconds.
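
A minimal sketch for reading data in the format just described,
assuming each line holds a single plain decimal number. The function
name parse_offsets is made up for this example, and the attachment's
file name is not given here.

```python
def parse_offsets(text):
    """Parse one offset in nanoseconds per line; blank lines are skipped."""
    return [float(line) for line in text.splitlines() if line.strip()]
```

Because the samples are one per second and in order, the list index
doubles as the elapsed time in seconds since the start of the
measurement.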

Kind regards,
David Venhoek



Uncertainty estimates on the measurements:

In the setup, we have identified the following sources of error:
- Cable length differences: We have differences in cable length between
  various parts of at most 1 meter. At 0.3c signal propagation speed
  this represents a 12 ns uncertainty.
- Time interval measurement: The time difference measurement apparatus
  has a resolution of 1 ns, validated to 10 ns. Furthermore, the
  clocking used internally gives a 0.01% relative uncertainty. For the
  range investigated here this represents an uncertainty of 30 ns.
- endrun ninja pps edge uncertainty: Has been validated to be within
  100 ns of the gps input.
- Rpi3 pps edge uncertainty: Has been validated to be within 100 ns of
  the internal clock.

In combination, the measurements are accurate to within 300 ns.
Confidence bounds on the derived statistics depend significantly on
assumptions on the underlying distribution, but the differences are
significant enough to be unlikely to be chance for the standard
deviation, p90 and p95 statistics. For the p99 statistic, the sample
size is likely on the small side, especially given correlation between
the individual measurements.
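
The arithmetic behind the combined bound can be checked with a short
sketch. It takes the four figures quoted above at face value (12 ns,
30 ns, 100 ns, 100 ns) and combines them both as a worst-case linear
sum and as a root-sum-square. Which combination rule was intended is
not stated, so both are shown as assumptions.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum

# Delay over 1 m of cable at 0.3c, in nanoseconds
# (the text quotes this, rounded up, as 12 ns).
cable_delay_ns = 1.0 / (0.3 * C_M_PER_S) * 1e9

# Error sources as quoted in the text, in nanoseconds.
budget_ns = {
    "cable length differences": 12.0,
    "time interval measurement": 30.0,
    "reference clock pps edge": 100.0,
    "rpi3 pps edge": 100.0,
}

# Worst case: all errors line up in the same direction.
worst_case_ns = sum(budget_ns.values())
# Root-sum-square: assumes the error sources are independent.
rss_ns = math.sqrt(sum(v * v for v in budget_ns.values()))

print(f"cable delay: {cable_delay_ns:.1f} ns")
print(f"worst case (linear sum): {worst_case_ns:.0f} ns")
print(f"root-sum-square: {rss_ns:.0f} ns")
```

Either way the total stays below the stated 300 ns, and well below the
microsecond-scale differences between the two implementations.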

On Thu, Dec 28, 2023 at 7:22 AM Dave Hart <davehart@gmail.com> wrote:
>
> On Wed, 27 Dec 2023 at 13:04, David Venhoek <david@venhoek.nl> wrote:
>>
>> From my perspective, the fact that apparently (at least as harlan
>> seems to claim in his original email) the algorithm specified in
>> RFC5905 is fragile would be all the more reason to not specify it as
>> "the standard". If this is really the case, we should rather be
>> looking harder for solutions that are more resilient and not so
>> sensitive to
>
>
> Fortunately this is not the case.  Harlan did not claim in his original email of this thread that NTPv4 is fragile.  Your apparent misunderstanding may come from his mention that Dr. Mills showed that _violating_ the specification can easily break a synchronization.  That is quite the opposite of claiming RFC5905's algorithms are fragile.
>
>> [...]
>>
>> Looking at the current moment, we are now in a situation where there
>> are 3 algorithms operating in the space, with the one in the
>> "reference" implementation, the one from chrony and the one from
>> ntpd-rs. I have personally not seen any indication of problems caused
>> by this, rather the opposite. From measurements it looks very likely
>> that both chrony and ntpd-rs are capable of, with similar poll
>> intervals, synchronizing to about an order of magnitude larger
>> precision.
>
>
> Your use of precision is confusing in this context.  In NTP parlance, precision is the minimum change in the system clock, in ancient systems, the clock tick, in modern systems, the time to read the system clock.  I look forward to you documenting your claim of an order of magnitude better performance from Chrony and ntpd-rs compared to the reference implementation.  I would love to improve the performance of NTP across a wide variety of situations and then promote standardizing those improvements.
>>
>> [...]
>>
>> Also, the fact that David L Mills apparently had the tendency to just
>> change the algorithm without changes to the standard at least on the
>> surface feels also just wrong to me. That gives a lot of vibes of
>> "innovation, but only by us" from the "reference" implementation
>> people, which is I think highly harmful. I sincerely hope that that is
>> just a wrong impression from my side.
>
>
> It sounds like you're not familiar with the history of NTP.  In the beginning, there were fuzzball routers and Arpanet...  I don't think it would be productive for me to spend a lot of time repeating what can be easily discovered with a little effort.  I refer you to Dr. Mills' papers at https://www.eecis.udel.edu/~mills/, the NTP RFCs, and his second edition NTP book "Computer Network Time Synchronization: The Network Time Protocol on Earth and in Space, Second Edition 2nd Edition" ISBN 978-1439814635.
>
> Suffice it to say Mills invented NTP and nurtured it personally for 30 years.  He maintained various iterations of the reference implementations over that timeframe, eventually with the assistance of Harlan Stenn.  Dr. Mills' health no longer allows him to participate substantially.  During that time, he naturally evolved the design _before_ codifying it in standards, and I know of no other reasonable approach.  He never wavered from the position that NTP was not simply a wire protocol, but a suite of algorithms carefully engineered, simulated, and tested over time to have predictable and consistently beneficial behavior in synchronizing networks of devices within the limits of their oscillators, reference clocks, communications channels, hardware, and underlying software.
>
> If you want to use the on-wire protocol without the algorithms, please see the SNTP RFCs.
>
> Cheers,
> Dave Hart
>
>
>>
>> On Wed, Dec 27, 2023 at 8:22 AM Harlan Stenn <stenn@nwtime.org> wrote:
>> >
>> > On 12/26/2023 3:23 PM, Dave Hart wrote:
>> > > On Thu, 14 Dec 2023 at 17:18, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> > >
>> > >     On Thu, Dec 14, 2023 at 03:16:29AM -0800, Harlan Stenn wrote:
>> > >      > The core "mission" of NTP is time synchronization with a (well) defined
>> > >      > response to a "time impulse".  This is the reason why previous NTP
>> > >      > specifications have included the algorithms.  Prof. Mills and some others
>> > >      > have done a LOT of testing to ensure reliable and predictable behavior of
>> > >      > time synchronization, in the "normal" and "time impulse" cases over a very
>> > >      > wide range of circumstances.
>> > >
>> > >     If the RFC 5905 PLL+FLL is so great, why is nothing using it, not even
>> > >     the "reference" implementation in default configuration?
>> > >
>> > >
>> > > Would you mind elaborating how the reference implementation's PLL+FLL
>> > > feedback loop differs from the NTPv4 spec?  I'm not aware of any
>> > > intentional deviation, but Dr. Mills wasn't shy about making changes to
>> > > the implementation that he felt was an improvement before documenting it
>> > > in another RFC.
>> > >
>> > >     ntpd in default configuration has a poor response with longer polling
>> > >     intervals. It suffers from oscillations,
>> > >
>> > >
>> > > If verified that would seem to me a reason to improve the algorithms,
>> > > rather than decide it's time for a wild west where every NTP
>> > > implementation is free to behave in any way, as that would invite
>> > > pathological results in situations where differing implementations sit
>> > > on the synchronization path between the reference clock and the ultimate
>> > > client.
>> >
>> > Or better describe the conditions for these problems?
>> >
>> > If you are saying that the default config can show oscillations as poll
>> > intervals increase, all I can say is we haven't seen reports of this.
>> >
>> > If we had, we'd be taking steps to fix it.
>> >
>> > If you have seen this, perhaps you'd be kind enough to post about how
>> > one might change the default values to ones more suitable for longer
>> > poll intervals, or even telling us how to demonstrate the problem.
>> >
>> > >     which can be sometimes seen
>> > >     even on monitoring graphs of pool.ntp.org.
>> > >
>> > >
>> > > That public pool uses primitive monitoring that does not take into
>> > > account the delay or jitter between the monitoring station and the
>> > > server.  Moreover, the requirements for participating are very lenient,
>> > > allowing clocks that appear to be up to 70ms off of UTC.  That pool is
>> > > therefore not a good example of a well-engineered and well-maintained
>> > > synchronization source.  It's fine to get the clock within a few hundred
>> > > milliseconds, but stricter requirements call for a more precise source
>> > > and better error budgeting.
>> > >
>> > >     Nobody seems to care. Maybe
>> > >     it's a bug, but after so many years I think we can conclude that
>> > >     Internet will not break if all NTP implementations don't have the
>> > >     "well defined" response.
>> > >
>> > >
>> > > The internet will not break even if all NTP sources were only good to a
>> > > few seconds.  Those who require tight sync (such as distributed
>> > > databases) engineer solutions to meet their requirements.
>> >
>> > I'm negatively impressed with your conclusion, Miroslav.
>> >
>> > "The Internet" probably won't break, because "the internet" doesn't
>> > exchange time that way, and I would bet that you know this.
>> >
>> > A single machine will either have a crafted config file, well-tended or
>> > not, and static or pool servers.  How well do you think the vast
>> > majority of these machines are monitored to see if there are problems?
>> > How badly would they have to screw up to be noticed?
>> >
>> > If somebody bothers to look and sees one of the hosts in their static
>> > config file is bad, they will likely just throw out the bad site and
>> > replace it.
>> >
>> > If they are using the "pool" directive and there are misbehaving servers
>> > *that otherwise survive the pool monitoring service* then ntpd will
>> > notice the bad performers and throw them out automatically.
>> >
>> > In an enterprise, the odds are quite high that time for the enterprise
>> > is sync'd from a set of curated machines.  These machines are likely
>> > getting their time from reliable sources.  They won't be talking to
>> > poorly-behaving time sources.  This translates to the (internal)
>> > machines that get their time from the (well-behaved/reliable) internal
>> > time sources.
>> >
>> > So sure, the stuff the NTP Project has put out there is very resilient
>> > and well-behaved.  There's a good chance it will continue to behave well
>> > even in an increasingly hostile environment.
>> >
>> > But why would any <positive-intentioned> person want to take steps to
>> > increase the environment's hostility?
>> >
>> > As I have said before, the world of time-synchronization is not the
>> > place to use creative destruction as a method to promote evolution.
>> >
>> > > Cheers,
>> > > Dave Hart
>> > >
>> > >
>> > > _______________________________________________
>> > > ntp mailing list
>> > > ntp@ietf.org
>> > > https://www.ietf.org/mailman/listinfo/ntp
>> >
>> > --
>> > Harlan Stenn <stenn@nwtime.org>
>> > http://networktimefoundation.org - be a member!
>> >
>>
>
>
>
> --
> Cheers,
> Dave Hart