[Ntp] Antw: [EXT] Re: Comments on Miroslav's NTP v5 proposal.

Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> Wed, 21 September 2022 06:49 UTC

Message-Id: <632AB3F4020000A10004DF29@gwsmtp.uni-regensburg.de>
Date: Wed, 21 Sep 2022 08:49:24 +0200
From: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>
To: mlichvar@redhat.com
Cc: "ntp@ietf.org" <ntp@ietf.org>
References: <CACsn0cn8ULX5f_PQVbDsrirPnGHVPWGgMqXn52n_T4P5ELkKgQ@mail.gmail.com> <20201126110406.GQ1734865@localhost> <63242AEF020000A10004DD7A@gwsmtp.uni-regensburg.de> <YynRttjp80w29x7r@localhost>
In-Reply-To: <YynRttjp80w29x7r@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/lu9ntwl0sGtDlm2q8a7Ychlf5kc>
Subject: [Ntp] Antw: [EXT] Re: Comments on Miroslav's NTP v5 proposal.
Precedence: list

>>> Miroslav Lichvar <mlichvar@redhat.com> schrieb am 20.09.2022 um 16:44 in
Nachricht <YynRttjp80w29x7r@localhost>:
> On Fri, Sep 16, 2022 at 09:51:11AM +0200, Ulrich Windl wrote:
>> A pull request with comments on the draft will follow shortly on github,
but 
> for the git(hub)‑agnostic, I'm attaching a plain old unified diff (ignoring

> the white space changes that the editor applied automatically in some
cases).
> 
> Thanks for the comments and suggestions. I'm glad someone looked at it
> in such detail. Most of your suggestions make sense to me and I'll try
> to address them in the next version of the draft.
> 
> Here are my comments and questions for the rest. 
> 
>> @@ ‑74,6 +77,9 @@
>>              and server modes remain in NTPv5.</t>
>>  
>>              <t>Timestamps are clearly separated from values used as 
> cookies.</t>
>> +	    <!‑‑UW: In general it would be advisable to define phrases before
>> +		using them; "cookies", "MAC field" and "timescales" for
>> +		example. ‑‑>
> 
> This is just a part of the introduction comparing NTPv5 with NTPv4. Do
> we need to go in such detail and define these basic terms?
> 
>>          <list style="hanging">
>>            <t hangText="time16"><vspace/>
>> ‑            A 16‑bit fixed‑point type containing values in seconds. It has
1
>> ‑            signed integer bit (i.e. it is just the sign) and 15
fractional
>> ‑            bits. The minimum value is the fraction ‑32767/32768 (almost
‑1
>> +            A 16‑bit signed fixed‑point type containing values in seconds.
It 
> has 1
>> +            sign bit (i.e. it is just the sign) and 15 bits describing the

> fractional
>> +            part. The minimum value is the fraction ‑32767/32768 (almost
‑1
> 
> I think we might want to drop this type completely due to the
> possibility that UTC will be decoupled from UT1, so it couldn't
> represent larger offsets. It might work better as an extension field.
> 
>> +      <!‑‑UW: I would add more types like s_log8 (signed number expressed
as a
>> +          power of two using 8 bits, maybe including an undefined value),
>> +          u_log8 (unsigned number expressed as a power of two using 8
bits,
>> +          s_i16 (signed interger using 16 bits) ‑‑>
> 
> Where would be u_log8 used and do we need to define a type for each
> width of a plain integer? There are also 2‑bit and 3‑bit fields.

I guessed it was the polling interval, but it actually seems to be signed
(s_log8). But in general:
I think it's advisable to explain a type once as compared to explaining it
multiple times if it's used in more than one field. It makes the specification
easier to read (it's more obvious that both _are_ the same type).

2- and 3-bit fields could be true bitfields (with bits to defined separately),
or they could be "enums". (In the mode 6 protocol there is some funny mix of
bit fields and enums in the status fields, and it's not completely obvious
"who's who")

> 
>> @@ ‑191,6 +210,9 @@
>>  
>>        <figure align="center" anchor="message‑format"
>>                title="Format of NTPv5 messages">
>> +	<!‑‑UW: I'd assign a short label for the fields with a long name, used
>> +	     to reference them uniquely in the main text (like "CC" for
>> +	     "Client Cookie") ‑‑>
> 
> I suspect that could be confusing. I prefer the longer capitalized
> forms.

Or put the shorter names used later in parentheses. I see that it's hard to
get a descriptive name into a 2-bit field for example. But still I think
consistent naming is rather important.

> 
>> @@ ‑306,21 +334,32 @@
>>              timestamp. In requests it is always 0.</t>
>>  
>>              <t hangText="Timescale Offset"><vspace/>
>> +	    <!‑‑UW: This could be the s_i16 type ‑‑>
>>              A 16‑bit value specific to the selected timescale, which is
>>              referenced to the receive timestamp. In requests it is always
>>              0.
>> +	    <!‑‑UW: A high‑level description is missing ‑‑>
> 
> What would you describe here?

As for the beginning, define what "timescale" is, the define what sign the
offset will have (it's completely non obvious):
Is it always offset from the timescale (which direction?), or is it always
offset from UTC, or is it always offset from TAI?
You get the idea what I'm talking about...

> 
>>              <list style="symbols">
>>                <t>In the UTC (0) and TAI (1) timescales it is the TAI‑UTC 
> offset
>>                  (TAI minus UTC) as a signed integer, or 0x8000 if 
> unknown.</t>
>>                <t>In the UT1 timescale (2) it is the UT1‑UTC offset (UT1
minus
>>                UTC) using the time16 type (0x8000 if unknown).</t>
>> +	      <!‑‑UW: Shouldn't it be something like UT1‑TAI instead (or
>> +	          TAI‑UT1?)? ‑‑>
> 
> That's how DUT1 is defined and I'd say UTC is the most important
> timescale so everything else should be against it.

See above. I'm not convinced, however.

> 
>> +	      <!‑‑UW: I also think it's bad design to use two different data
>> +		  types in one field ‑‑>
> 
> Would it help if you considered it as two different fields sharing the
> same position?

It has to be implemented like that, anyway.

> 
>> @@ ‑428,6 +477,7 @@
>>          <t>As each client can be synchronized to an unlimited number of
>>            servers (and there can be up to 15 strata of servers), the 
> reference
>>            IDs are exchanged as a Bloom filter instead of a list to limit 
> the
>> +	  <!‑‑UW: Add a reference for "Bloom Filter" ‑‑>
>>            amount of data that needs to be exchanged.</t>
> 
> Any examples of a reference I could use?

Don't know: Any RFC? I guess not. Maybe English Wikipedia, if there a suitable
article on it, or some reference.
Honestly I did not hear of Bloom filters before, so I guess you cannot assume
that everyone knows what it is and what it's properties are. Maybe you could
also add your own description in an appendix.

> 
>>            synchronization loop with one of its servers, it SHOULD stop 
> using
>> ‑          the server for synchronization. When the client's reference ID
is
>> +	  <!‑‑UW: Formally define "synchronization loop"! ‑‑>
> 
> FWIW, RFCs for previous NTP versions didn't define it.

Interestingly, maybe that's exacly why there are lots of interesting
discussions about it. Assuming everybody "knows" what it is is more likely
everybody "thinks he/she knows" what it is in practice. To avoid that make a
formal definition.

> 
>> @@ ‑461,12 +517,12 @@
>>  
>>          <t>False positives are possible. The probability of a collision
grows
>>            with the number of reference IDs in the filter. With 26
reference 
> IDs
>> ‑          it is about 1e‑12. With 118 IDs it is about 1e‑6. The client MAY

> avoid
>> +          it is about 1e‑12.<!‑‑ please explain giving a formula! ‑‑> With
118 IDs 
> it is about 1e‑6. The client MAY avoid
> 
> That formula isn't very nice and I don't think it's important to know.
> Curious readers can find it in the reference which will be provided :).

I think "throwing around with numbers" in some spec that aren't explained is a
bad thing.
Like for NTPv3: Why is minpoll 6 and maxpoll 10? If one digs deeply into the
integer algorithms, it's mostly to avoid overflows and numeric issues (AFAIK).

> 
>> @@ ‑614,15 +673,21 @@
>>                 Transmit with all zeros.</t>
>>  
>>                 <t hangText="Delay Correction"><vspace/>
>> +	       <!‑‑UW: Is there a specific reason to add yet another datatype
>> +		   instead of using "timestamp64"? ‑‑>
> 
> Better resolution and compatibility with existing hardware.

But will existing PTP hardware be able to "massage" or understand NTPv5
packets without a change? I doubt that.

> 
>>                 A signed fixed‑point number of nanoseconds with 48 integer
>>                 bits and 16 binary fractional bits, which represents the 
> current
>>                 correction of the network delay that has accumulated for 
> this
>>                 packet on the path from the source to the destination. The
>>                 format of this field is identical to the PTP 
> correctionField.</t>
>> +	       <!‑‑UW: Is it necessary to use the PTP format here? The
>> +		   hardware owuld have to be adapted for NTP anyway ‑‑>
> 
> The conversion to NTP format is expensive and would make it more
> difficult to implement the support in the hardware.

But aren't 16 bits of fractional nanoseconds not just "bit noise"?


> 
>> @@ ‑633,12 +698,12 @@
>>            </list>
>>          </t>
>>  
>> ‑        <t>A correction capable client SHALL transmit the request with
the
>> +        <t>A correction capable client system SHALL transmit the request
with 
> the
> 
> Can you please explain the difference here?

The emphasis is on _who_ does the correction: "Client" could be the final
software implementation, while "client system" could be any combination of
hard- and software seen as "one system" (like today's storage system actually
consist of multiple computers (controller nodes) and networks inside). Like a
stratum-1 server being a hardware combination of a reference clock and a
computer system.

> 
>> @@ ‑747,6 +813,8 @@
>>  
>>      <section title="Measurement Modes" anchor="measurement‑modes">
>>  
>> +      <!‑‑UW: Shouldn't you give at least a sketch of the NTP algorithms
here?
>> +      ‑‑>
> 
> The algorithms from RFC 5905? I think the idea for this document was to
> avoid describing those.

As said else where, but you define offset and dispersion. Leaving out the
description of the NTP algorithms makes "interleaved mode" just a big riddle
IMHO.

> 
>> @@ ‑787,10 +858,15 @@
>>          interleaved mode enables the server to provide the client with a 
> more
>>          accurate transmit timestamp which is available only after the
>>        response was formed or sent.</t>
>> +      <!‑‑UW: Explain interleave mode in greater detail; leaving out the
>> +	  algorithms and just claiming the "correct" values should be in the
>> +	  packet is a strange concept. ‑‑>
> 
> I don't follow. Please elaborate.

As explained above: When only reading that document, the reader gets no idea
what interleaved mode is and what it is intended for.

> 
>> @@ ‑833,6 +911,7 @@ Tx  | 0  |    | t3 |      | 0  |    | t7 |      | 0  | 
  
> |t11 |
>>          The messages in the basic and interleaved mode are indicated with
B 
> and
>>          I respectively. The timestamps t3' and t11' correspond to the
same
>>          transmissions as t3 and t11, but they may be less accurate. The 
> first
>> +	<!‑‑UW: It's not obvious what the reason for t3' and T11' is ‑‑>
> 
> Do you mean something else that nothing better being available?

Basically I mean: What is "t3'" not just "t3"; what does "'" indicate there?

> 
>> @@ ‑884,6 +963,7 @@ Tx  | 0  |    | t3'|      | 0  |    | t3 |      | 0  | 
  
> |t11'|
>>          stability of the clock. When polling a public server on Internet,

> the
>>          client SHOULD use at least a polling interval of 64 seconds, 
> increasing
>>        up to at least 1024 seconds.</t>
>> +      <!‑‑UW: Why exactly those numbers; explain! ‑‑>
> 
> It's just a recommendation matching the current defaults of the known
> clients. No math behind it.

As said before: Even in the past those numbers weren't well-explained.
Continuing with "it's always been like that" doesn't make it a good
specification. (A bit like the use of semicolons in some programming languages:
Do they end a statement, or do they separate statements, or isn't there any
difference at all?)

> 
>> @@ ‑893,6 +973,11 @@ Tx  | 0  |    | t3'|      | 0  |    | t3 |      | 0  |
  
>  |t11'|
>>  
>>          <list style="numbers">
>>            <t>Generates a new random cookie.</t>
>> +	  <!‑‑UW: Is it a "new" coockie in the sense of NONCE, or is it a
>> +	      random cookie (which is also new in most cases); if it has to be
>> +	      random AND unique, say so!‑‑>
> 
> Ideally we want it to be a nonce, but that is difficult to do in the
> 64‑bit range of values, so I think cookie is a better description.

But "cookie" isn't formally defined anywhere!

> 
>> +	  <!‑‑UW: Also: Shouldn't the list start with the second item, and
>> +	      talking about the cookie when it it needed? ‑‑>
> 
> I don't see the advantage. It can work either way.

It works, but the natural order is different IMHO.

> 
>> @@ ‑973,6 +1062,8 @@ Tx  | 0  |    | t3'|      | 0  |    | t3 |      | 0  |
  
>  |t11'|
>>              dispersion. Otherwise, it MUST be 0, 1, 2, depending on
whether 
> a
>>              leap second is pending in the next 14 day and, if it is, 
> whether it
>>            will be inserted or deleted.</t>
>> +	  <!‑‑UW: Should there be an extension field describing the time (at
>> +	      least) of the leap event; maybe more details could be added. ‑‑>
> 
> The time of the event is the last day of the month. What other
> information do you think would be useful?

That's how it is now, and you must add the logic into the program. To be more
flexible and ready for the future, and to make it absolutely clear when the
event occurs, it should be included. Also that would allow more flexible
announcement intervals (like as early as it's known that there will be a leap
event).

> 
>>  
>> ‑              <t>Era is set to the NTP era of the receive timestamp.</t>
>> +              <t>Era is set to the NTP era of the receive timestamp if it
was 
> valid.</t>
> 
> Do you mean the case when the server is not synchronized and responding
> with leap=3? The receive timestamp is not expected to be zero.

I mean "blindly" copying input to output without validating might be a bad
idea.

> 
>> @@ ‑1039,6 +1130,9 @@ Tx  | 0  |    | t3'|      | 0  |    | t3 |      | 0 
|  
>   |t11'|
>>                  identifies a more accurate transmit timestamp of the 
> response,
>>                  which can be retrieved by the client later with another
>>                request. The cookie generation is
implementation‑specific.</t>
>> +	      <!‑‑UW: I'd expect "random" or "NONCE", but is this for
>> +		  compatibility where it would just be a timestamp? This
>> +		  should be discussed IMHO ‑‑>
> 
> In the NTPv4 interleaved mode it's the (unique in short term) receive
> timestamp. In NTPv5 it could be anything that allows the server to
> identify the previous request. It could be a simple counter, the same
> timestamp as in NTPv4, or something encrypted if there was a reason to
> make it more difficult for off‑path attackers to access to the stored
> timestamps. Unlike in NTPv4 servers don't have to drop it after first
> use, so I'm not sure how important that would be.

What I meant to say was: Explain why it moved from being a timestamp once to
something different now.

Reagrds,
Ulrich

> 
> ‑‑ 
> Miroslav Lichvar

[Ntp] Comments on Miroslav's NTP v5 proposal. Watson Ladd
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Daniel Franke
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Miroslav Lichvar
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Doug Arnold
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Doug Arnold
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Kurt Roeckx
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Miroslav Lichvar
[Ntp] Antw: [EXT] Re: Comments on Miroslav's NTP … Ulrich Windl
[Ntp] Comments on Miroslav's NTP v5 proposal. Ulrich Windl
Re: [Ntp] Comments on Miroslav's NTP v5 proposal. Miroslav Lichvar
[Ntp] Antw: [EXT] Re: Comments on Miroslav's NTP … Ulrich Windl