[Ntp] Re: NTP speed of synchronization question

Dave Hart <davehart@gmail.com> Sat, 24 August 2024 12:21 UTC

From: Dave Hart <davehart@gmail.com>
Date: Sat, 24 Aug 2024 12:21:43 +0000
Message-ID: <CAMbSiYCkU1bMq7Svynf6cmYDs2LRzR2RXmr5gFURUCfq3hMJyA@mail.gmail.com>
To: Hal Murray <halmurray+ietf@sonic.net>
CC: Toerless Eckert <tte@cs.fau.de>, ntp@ietf.org
List-Id: Network Time Protocol <ntp.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/v1wsdrQK4VVKWY7FwEEfI_BH1cc>

On Wed, 21 Aug 2024 at 10:30, Hal Murray <halmurray+ietf@sonic.net> wrote:

>
> tte@cs.fau.de said:
> > Assuming typical high-end routers and their implementation of NTP:
>
> > 1. Is there any reason why we should NOT be able to achieve
> > synchronization of less than 5 msec ?
>
> > 2. Assuming synchronization only across some WAN connection...
>
> I know roughly nothing about high-end routers.
>
> The previous answers looked good.  Here is my 2 cents worth.
>
> You need to ask the router vendors what sort of NTP software they are
> running.  There is a lot of crapware out there claiming to be NTP.  It
> wouldn't surprise me if some vendor was still using their old
> works-well-enough software.  If it ain't broke, don't fix it.
>

I am not familiar with carrier-grade routers either, but I know in the past
Cisco used the reference implementation of NTP in some of their IOS
routers.  I have no idea how closely they track upstream changes to that
code.

> If you want to get in the 5 ms range using servers on the WAN, you need to
> pick your servers carefully.  The server needs to be accurate and the
> routing needs to be symmetric and NTP traffic needs to not be filtered.
>
> I'm in Silicon Valley with an AT&T fiber.  I have a good GPS setup to use
> as a reference.
>
> From here, the time from NIST servers in Boulder and Fort Collins is off
> by about 6 ms.  The round trip time is normally 45 ms.  It occasionally
> drops to 33 ms for a while.  When that happens, the time is accurate.  The
> normal case is classic asymmetric routing.  Those servers look good from a
> cloud server in San Francisco -- 28 ms round trip.
>

Good point, Hal.  It's worth noting that asymmetric routing is typical on
the internet, but the impact on latency varies.  Most ASes use hot potato
routing and push traffic off their own network ASAP, so even when the two
end ASes are directly peered with each other, your packets will ride the
destination's network most of the way, and the destination is of course
different in the request vs. the response.  The more upmarket the involved
ASes along the paths are, the less likely this asymmetry will result in
substantially asymmetric latency.  I've had 1 Gbps fiber in a rural area
(thank heaven) since 2020, but it uses bargain upstreams Cogent and
Windstream as much as possible, with Hurricane Electric providing the
fallback reachability to the networks the downmarket upstreams can't get
to.  As a result, I'm more likely to see asymmetric latency than if my ISP
were connected to, say, Level3 and UUnet, or even Hurricane only.
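
To put rough numbers on asymmetric latency, here is a minimal Python sketch
of the standard NTP on-wire offset and delay calculation from the four packet
timestamps.  The one-way delays are hypothetical, chosen only so that they
add up to the 45 ms round trip Hal reported, and the function names are mine.
It illustrates that a client cannot see the asymmetry: the round-trip delay
comes out right, but the computed offset is wrong by half the difference
between the two one-way delays.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset_ms, out_ms, back_ms, server_hold_ms=0.0):
    """Build the four timestamps for one exchange (all values in ms).

    true_offset_ms is server clock minus client clock.  out_ms and back_ms
    are the one-way delays; they are hypothetical inputs, since a real
    client has no way to measure them separately.
    """
    t1 = 0.0
    t2 = t1 + out_ms + true_offset_ms
    t3 = t2 + server_hold_ms
    t4 = t1 + out_ms + server_hold_ms + back_ms
    return ntp_offset_and_delay(t1, t2, t3, t4)


# In both cases the two clocks actually agree perfectly (true offset 0).
symmetric = simulate_exchange(0.0, out_ms=22.5, back_ms=22.5)
asymmetric = simulate_exchange(0.0, out_ms=28.5, back_ms=16.5)

print("symmetric : offset=%.1f ms, delay=%.1f ms" % symmetric)
print("asymmetric: offset=%.1f ms, delay=%.1f ms" % asymmetric)
```

With a 28.5 ms / 16.5 ms split the client sees the same 45 ms delay as in the
symmetric case but computes a 6 ms offset that does not exist, which is the
size of error Hal measured against his GPS reference.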

> Various ISPs, routers, and whatevers filter NTP packets.  That's leftover
> from a giant DDoS attack many years ago that used NTP's monlist option.
> That filtering is rarely documented.  It's a pain to debug and close to
> impossible for mere mortals to get fixed.
>

That whole 2014 monlist mess is a real sore point for me.  I wasn't actively
working on NTP development at the time, but I was aware of the risk of
amplification due to mode 7 (ntpdc) monlist in ntp-dev in 2010, and
provided a reflection-proof mode 6 (ntpq) mrulist replacement, and also
disabled all mode 7 responses by default in ntpd back then.  Unfortunately,
the release manager of ntpd didn't release an ntp-stable with those changes
until 2014 (!).  When the wave of monlist abuses started, the only way to
mitigate for people using older stable releases was to use "restrict
noquery" which also blocks mode 6 queries, which are not subject to
substantial amplification.  Worse yet, network operators trying to restrict
the carnage began filtering all NTP packets other than modes 3 & 4 (client
and server), blocking ntpq at the network level.  Now it seems common wisdom
is that mode 6 queries are a huge security and DDoS threat, so innocent
uses like probing the chain of NTP servers back to the reference clock using
ntptrace, or looking at public server read-only variables with "ntpq -crv"
or peers with "ntpq -p" almost never work.  What a monumental waste.
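
For readers unfamiliar with the mode numbers: the mode is the low three bits
of the first byte of every NTP packet, which is what makes this kind of
blanket filtering so easy to deploy.  A minimal Python sketch follows, using
the mode assignments from RFC 5905.  The passes_strict_filter helper is a
hypothetical model of the filtering described above, not any vendor's actual
rule.

```python
# Mode field values as assigned in RFC 5905 (modes 6 and 7 are the
# control and private modes used by ntpq and ntpdc respectively).
NTP_MODES = {
    0: "reserved",
    1: "symmetric active",
    2: "symmetric passive",
    3: "client",
    4: "server",
    5: "broadcast",
    6: "control (ntpq)",
    7: "private (ntpdc)",
}


def decode_first_byte(b):
    """Split the first header byte into leap indicator, version and mode."""
    leap = (b >> 6) & 0x3
    version = (b >> 3) & 0x7
    mode = b & 0x7
    return leap, version, mode


def passes_strict_filter(packet):
    """Hypothetical middlebox rule: allow only client and server modes."""
    _, _, mode = decode_first_byte(packet[0])
    return mode in (3, 4)


client_request = bytes([0x23]) + bytes(47)  # LI=0, VN=4, mode=3
control_query = bytes([0x16]) + bytes(11)   # LI=0, VN=2, mode=6

for name, pkt in (("client", client_request), ("control", control_query)):
    _, _, mode = decode_first_byte(pkt[0])
    verdict = "passes" if passes_strict_filter(pkt) else "dropped"
    print("%-8s mode %d (%s): %s" % (name, mode, NTP_MODES[mode], verdict))
```

Such a rule drops harmless mode 6 read-only queries along with the mode 7
monlist requests that caused the trouble.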
[...]

> You can't tell how far off a system's clock is unless you have a good
> local clock to use as a reference.  You can't just ask the system.  If it
> knew it was off it would fix it.  With asymmetric routing, it will think
> it is good but will be off by half the difference in transit times.
>

You're not wrong, but that's a bit oversimplified.  The NTP server has an
idea of how far its clock is off from its sources, but it's a short-term
estimate that is intentionally fed through a slower feedback loop, which
steers its clock into agreement and refines its correction for the clock's
frequency error.  It would be a mistake to, for example, use "ntpq -crv"
against your single source NTP server and apply its reported offset as a
correction to the time it reports.  That offset is a volatile estimate that
changes often as packets are exchanged, while (from our non-Einsteinian
perspective) the actual time always ticks at a fixed rate.  NTP's feedback
loop takes advantage of that fact to refine its local clock to match
without slavishly following the short-term errors it sees from its sources,
much of which can be due to the vagaries of internet latency.
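
As a toy illustration of why a slow loop beats chasing every sample, the
Python sketch below compares a clock that applies each measured offset in
full against one that applies only a fraction of it.  This is emphatically
not ntpd's clock discipline: the gain of 1/8, the 5 ms Gaussian jitter and
the absence of any frequency error are all simplifying assumptions of mine.

```python
import random


def rms_clock_error(gain, noise_ms=5.0, samples=2000, seed=1):
    """RMS error (ms) of a clock corrected by gain * measured offset.

    The local oscillator is assumed perfect, so every bit of error in the
    clock is injected by following noisy measurements.
    """
    rng = random.Random(seed)
    clock_err = 0.0  # local clock minus true time, in ms
    total = 0.0
    for _ in range(samples):
        # The measured offset is the correction the source suggests:
        # the true correction (-clock_err) plus latency-induced noise.
        measured = -clock_err + rng.gauss(0.0, noise_ms)
        clock_err += gain * measured
        total += clock_err ** 2
    return (total / samples) ** 0.5


naive = rms_clock_error(gain=1.0)     # zero the offset on every sample
damped = rms_clock_error(gain=0.125)  # apply 1/8 of each measured offset

print("follow every sample: RMS clock error %.2f ms" % naive)
print("damped loop        : RMS clock error %.2f ms" % damped)
```

The clock that zeroes its offset on every sample inherits the full
measurement noise, while the damped one averages most of it away.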

OP:  As you might have picked up, any question about how close NTP can sync
a clock is likely to get an "it depends" answer that really requires a bit
of a dive into how NTP and its carefully-refined algorithms work to be more
concrete.  Notably, the basic NTP packet carries a figure referred to as
"root dispersion" (so it is available even with the widespread blocking of
ntpq queries), which amounts to NTP's error budget all the way to the
reference clock at stratum 0 (that is, through all the intermediate NTP
servers to a reference clock like a GPS receiver or highly-stable
oscillator).  Assuming all the NTP servers along that chain implement the
NTPv4 algorithms, rather than just the NTP on-wire format plus some other
voodoo not in the spec as just about every implementation except ntpd does,
that error budget gives you a conservative maximum error bound on the time
at that NTP server.  If you use ntpd locally, its root
dispersion gives that error bound on your local clock.  You can retrieve it
easily with 'ntpq -c "rv 0 rootdisp"' in milliseconds.  For example, on a
machine of mine with a GPS reference clock:

C:\Users\daveh>ntpq -nc "rv 0 rootdisp offset" -p ntp.md.
rootdisp=1.255, offset=+0.020116

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.20.4    .GPPS.           0 l   17   32  377    0.000   +0.020   0.020

Notice that even with observed offset of 20us and jitter (variability in
that offset) of 20us, the conservative error bound is 1.26ms.
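
Since root dispersion travels in the ordinary NTP header, it can be read
without ntpq.  Below is a minimal Python sketch that decodes it from a raw
packet, assuming the RFC 5905 header layout (root delay and root dispersion
as 16.16 fixed-point seconds at byte offsets 4 and 8).  The function names
are mine, and query() is an untested-against-your-network convenience that
only handles IPv4.

```python
import socket
import struct


def short_to_seconds(raw):
    """Convert NTP short format (16.16 fixed point) to seconds."""
    return raw / 65536.0


def parse_header(packet):
    """Decode the first 16 bytes of an NTP packet into a dict."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    first, stratum, _poll, _precision, rootdelay, rootdisp, _refid = (
        struct.unpack("!BBbbII4s", packet[:16])
    )
    return {
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        # Reported in milliseconds to match ntpq's rootdelay/rootdisp.
        "rootdelay_ms": short_to_seconds(rootdelay) * 1000.0,
        "rootdisp_ms": short_to_seconds(rootdisp) * 1000.0,
    }


def query(host, timeout=2.0):
    """Send one client-mode request over UDP and parse the reply header."""
    request = bytes([0x23]) + bytes(47)  # LI=0, VN=4, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        reply, _ = sock.recvfrom(512)
    return parse_header(reply)
```

Calling query() with the name of a server you are permitted to use returns
the same rootdisp figure, in milliseconds, that ntpq would show for it.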

I spelunked the source code of one embedded NTP appliance using a GPS
source and found it had none of the NTPv4 algorithms.  It queried the GPS
chip on each request packet and filled in a response with a root dispersion
of 0.0.  Buyer beware.  OTOH, Meinberg sells NTP appliances which run ntpd
and are faithful to the specification.

I fear some NTP server implementations are tuned to bring the offset to
zero very quickly, because a well-meaning but mistaken user would think
that means their clock is accurate quickly, when in fact it may mean their
clock is quickly following the errors in short-term measurements that
should be fed through the clock filter, clustering, combine and clock
discipline algorithms.

-- 
Cheers,
Dave Hart