Re: [ntpwg] Comments on new drafts

Harlan Stenn <stenn@ntp.org> Sat, 09 April 2016 01:12 UTC

From: Harlan Stenn <stenn@ntp.org>
To: Miroslav Lichvar <mlichvar@redhat.com>
In-reply-to: <20160408132559.GA14867@localhost>
References: <20160407121447.GE20410@localhost> <E1aoHHv-0005vf-03@stenn.ntp.org> <20160408085726.GA13598@localhost> <E1aoU0b-0006mQ-Sa@stenn.ntp.org> <20160408132559.GA14867@localhost>
Comments: In-reply-to Miroslav Lichvar <mlichvar@redhat.com> message dated "Fri, 08 Apr 2016 15:25:59 +0200."
Mime-Version: 1.0 (generated by tm-edit 1.8)
Date: Sat, 09 Apr 2016 00:59:39 +0000
Message-Id: <E1aohFT-0007TT-SE@stenn.ntp.org>
Subject: Re: [ntpwg] Comments on new drafts
Precedence: list
Cc: ntpwg@lists.ntp.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ntpwg-bounces+ntp-archives-ahfae6za=lists.ietf.org@lists.ntp.org
Sender: ntpwg <ntpwg-bounces+ntp-archives-ahfae6za=lists.ietf.org@lists.ntp.org>

Miroslav Lichvar writes:
> On Fri, Apr 08, 2016 at 10:51:25AM +0000, Harlan Stenn wrote:
> > Miroslav Lichvar writes:
> 
> > > > > draft-mayer-ntp-mac-extension-field-00
> 
> > > Besides knowing which hash function was used in the MAC, the type field
> > > will allow other MACs in the field, not just those used by
> > > authentication with symmetric keys.
> > 
> > The security folks tell me that it's better to not advertise the
> > algorithm.  If you get adequate consensus from them that it's not an
> > issue to publish the hash function then OK.
> 
> Ok, I can try. Who made the suggestion? Are they following this list?

I believe I heard this from Sharon Goldberg and her team (who are
following this, TTBOMK) and also from at least one of the Cisco teams.

> > Oh, what's stopping people from publishing 1 algorithm but actually
> > using another, again for security purposes?
> 
> Sure, they can do that and if it's not a common practice, it might
> work. That's security through obscurity. The reasonable approach is to
> use secure keys.

It's keeping keys secure *and* using acceptable hash algorithms.

The purpose of the MAC is twofold: to detect packet tampering and to
authenticate the packet data.
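
To make the two roles concrete, here is a minimal sketch of the
traditional symmetric-key check, assuming the usual layout of a 4-octet
key ID followed by a digest of key||packet.  The names make_mac and
check_mac and the key table are hypothetical, for illustration only.
Note that the receiver learns the algorithm from its local key table,
not from anything carried in the packet.

```python
import hashlib
import hmac


def make_mac(key_id: int, key: bytes, packet: bytes, algo: str = "sha1") -> bytes:
    """Return key ID (4 octets, big-endian) followed by digest(key || packet)."""
    digest = hashlib.new(algo, key + packet).digest()
    return key_id.to_bytes(4, "big") + digest


def check_mac(keys: dict, packet: bytes, mac: bytes) -> bool:
    """Verify a MAC against a local key table {key_id: (algo, key)}."""
    key_id = int.from_bytes(mac[:4], "big")
    entry = keys.get(key_id)
    if entry is None:
        return False  # unknown key ID: cannot authenticate
    algo, key = entry
    expected = make_mac(key_id, key, packet, algo)
    # Constant-time comparison; a wrong length simply compares unequal.
    return hmac.compare_digest(expected, mac)


# Hypothetical key table and an all-zero 48-octet packet for illustration.
keys = {7: ("sha1", b"example-secret")}
packet = bytes(48)
mac = make_mac(7, b"example-secret", packet)
tampered = b"\x01" + packet[1:]
```

Here check_mac(keys, packet, mac) succeeds, while the same MAC over the
tampered packet fails, and a MAC made with a key the receiver does not
hold fails as well.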

> > The NTS designers have told me they do not want to do this.  The NTS
> > stuff uses its packet for a bunch of different things.  So you're
> > suggesting we don't use a new EF for NTS?
> 
> Having all MACs in one extension field looks like a cleaner approach
> to me, but I don't want to make that suggestion unless it can be done
> without further delaying the NTS draft.

I have no strong opinion on this - I was originally expecting this to be
the approach, in that NTS would do its thing and pass us a MAC that we
would use.  But the NTS designers said "No, we're including the MAC in
our packet and we'll return appropriate status codes to you so you'll
know whether or not the EF contained an authenticator, and whether
or not the packet passed this MAC check."

This goes to the bit in the EF that says "MAC-INCLUDED", as the
timestamps on packets that use NTS cannot be trusted until they are
authenticated or can be reality-checked with other trusted packets.

That's also why I want to *allow* NTS packets to also be covered, at
least initially, with symmetric key MACs.

> > > > > draft-stenn-ntp-i-do-00
> 
> > As for examples of how the bits for MAC-OPTIONAL and MAC-INCLUDED are
> > used, in what way have I provided incomplete examples or supporting
> > explanation before?
> 
> I'd like to see an example where the client or server does anything
> differently when the bit is set the other way, i.e. why it needs to be
> transferred in NTP packets.

It goes to allowing the local NTP instance to do some high-level sanity
checks, and to have more information about "should I drop a packet that
contains an EF with no MAC protection or just ignore that EF?"

> > > > > draft-stenn-ntp-ipv6-refid-hash-00
> 
> > 5) There are a number of different mechanisms proposed around REFIDs.
> > They are not mutually exclusive.  You should have the flexibility here
> > to implement a local policy you are comfortable with.
> 
> If you only want to prevent loops, they might not be exclusive. But if
> you want to use refid to detect you are using the same source as your
> server, mixing different kinds of refids in the same network will
> break it.

One is (usually) able to make that choice with the traditional REFID.

It was never a defined usage, and how valuable is this capability?

I'm game to tweak the proposed language changes to show that sending the
legacy IPv6 refid hash is deprecated, because we're starting to use
values in the first octet in the 240.0.0.0/4 range for various special
cases.  The calculation is the same, but the proposed new IPv6 refid
puts 0xFF as the first octet.
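
As a rough sketch of the difference: the legacy value is the first four
octets of the MD5 hash of the 16-octet IPv6 address.  For the proposed
form I am assuming the first octet of that value is overwritten with
0xFF and the other three are kept; the function names are hypothetical
and the address is a documentation-prefix example.

```python
import hashlib
import ipaddress


def legacy_ipv6_refid(addr: str) -> bytes:
    """First four octets of MD5 over the packed 16-octet IPv6 address."""
    packed = ipaddress.IPv6Address(addr).packed
    return hashlib.md5(packed).digest()[:4]


def proposed_ipv6_refid(addr: str) -> bytes:
    """Same hash, with the first octet forced to 0xFF (assumed layout)."""
    return b"\xff" + legacy_ipv6_refid(addr)[1:]


def dotted(refid: bytes) -> str:
    """Render a 4-octet refid the way the ntpq billboard shows it."""
    return ".".join(str(octet) for octet in refid)


old = legacy_ipv6_refid("2001:db8::1")
new = proposed_ipv6_refid("2001:db8::1")
print(dotted(old), "->", dotted(new))
```

With the first octet pinned to 255, a hashed IPv6 refid can no longer
land on any of the other special-case first-octet values.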

Returning to your point about other ways refids can be used, do you want
to avoid using a system that is sync'd with a specific refclock driver
just because another of your servers is also using that refclock driver?
What about, for example, all GPS?

If you already sync with a NIST server, does that mean you do not want
to sync with another server that syncs with NIST?

The national labs all sync with each other.  If you are syncing with
NIST do you then want to avoid syncing with PTB?

What exactly is the problem you are trying to solve here?

> > > > > draft-stenn-ntp-leap-smear-refid-00
> 
> > > > Where else can it go that it's obvious, visible, and almost totally
> > > > backward-compatible?
> > > 
> > > I think the reference timestamp would be a much better choice.
> 
> > 1) How does somebody know when the reference timestamp is a reference
> > timestamp and when it's an offset?
> 
> You can use a special refid to indicate leap smearing is active and
> reftime has special meaning.

That seems like a lot of work for little gain, and it's a whole lot
messier, IMO.  It's also, as I mentioned, not backward compatible.

> > Whether or not the reference
> > timestamp in its current form is sufficiently useful is a different
> > discussion.  But we should not "hijack" it without a compelling reason.
> 
> Very few client implementations look at the reference timestamp. ntpd
> as a client ignores it completely. From how it's specified in NTPv3
> and NTPv4, there are about 44 bits that can be set arbitrarily without
> breaking things. It just needs to be not newer than the transmit
> timestamp and not older than a few hours.

And several of those conditions you mention will not be valid during the
leap smear, and older software *will not know this* so it seems like
this approach is creating new problems in the very population of clients
where we are trying to solve the leap smear issue.

If folks think this is a useful alternative way to go, however, let's
write it up and get it passed as an alternative mechanism.

> You could use the encoding specified in the refid draft directly in
> the fractional part of the timestamp. You would just need to decrement
> seconds by one to make sure it's not newer than the transmit
> timestamp. No client would have a problem with that and refid would
> stay clean.

Yes, and one of the points of this is that we want to CLEARLY identify
that a leap smear is in effect.  So we'd still be offering a "unique"
refid in this case, right?  To keep the proposals somewhat compatible,
we could use 254.a.b.c, where if a.b.c are all 0 the offset is encoded
as you describe in the reference time, and otherwise they are encoded as
I describe.

I still think your approach is suboptimal.  The 'ntpq -p' billboard will
only show a refid of 254.0.0.0 during the entire time of the leap smear,
and one will have to take extra steps to try and get the amount of
smear.

With my proposal, the amount of smear is visible, in an encoded form, in
a single ntpq query.
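
For illustration, a minimal sketch of packing a smear offset into
254.a.b.c.  This message does not spell out the bit layout, so the
signed 2:22 fixed-point format used here (2 integer bits, 22 fraction
bits, in seconds) is an assumption, as are the function names.

```python
SMEAR_PREFIX = 254   # first octet marking a leap-smear refid
FRACTION_BITS = 22   # assumed 2:22 signed fixed-point layout


def encode_smear_refid(offset_s: float) -> bytes:
    """Pack a smear offset in seconds into the octets 254.a.b.c."""
    raw = round(offset_s * (1 << FRACTION_BITS))
    if not -(1 << 23) <= raw < (1 << 23):
        raise ValueError("offset does not fit in 24 signed bits")
    # Two's complement in 24 bits, big-endian.
    return bytes([SMEAR_PREFIX]) + (raw & 0xFFFFFF).to_bytes(3, "big")


def decode_smear_refid(refid: bytes) -> float:
    """Recover the smear offset in seconds from a 254.a.b.c refid."""
    if len(refid) != 4 or refid[0] != SMEAR_PREFIX:
        raise ValueError("not a leap-smear refid")
    raw = int.from_bytes(refid[1:], "big")
    if raw & 0x800000:          # sign bit set: negative offset
        raw -= 1 << 24
    return raw / (1 << FRACTION_BITS)


def dotted(refid: bytes) -> str:
    """Render a 4-octet refid as dotted decimal."""
    return ".".join(str(octet) for octet in refid)


print(dotted(encode_smear_refid(0.5)))     # 254.32.0.0
print(dotted(encode_smear_refid(-0.25)))   # 254.240.0.0
```

Under this layout a zero offset encodes as 254.0.0.0, which is the value
that would signal "offset is in the reference time" in the compromise
described above.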

If one is logging this information, it's arguably easier to log a
timestamp and a refid than it is to log a timestamp, a refid, and a
reference time.  Unless all that information is already being logged.

> > 2) How is this easily visible from ntpq?
> 
> ntpq -c rv <assid> prints the reference timestamp.

That's not my idea of easy, and it may not be visible soon as we're
about to implement access restriction control to that sort of data.

> > First, I'm not sure I buy the 1/256 collision thing.  How is this not a 1
> > in 4B chance of collision that would only happen on 1 poll exchange?
> 
> On average 1 in 256 IPv6 servers (using the current definition of
> refid) will appear as constantly performing leap smear. If a client
> should act on that, it would be a problem. If it's meant only for
> humans monitoring the network (who probably can figure out the offset
> is not changing and it's a false positive), the draft should make it
> clear.

The proposal is designed to make it easy for monitoring, as the
premise is that the clients are too old/simple to handle a leap second
insertion themselves.

1) They would appear to be constantly applying a *constant* smear, and
only when they were *serving time to a client*.  One does not see the
refid smear if one does an 'ntpq -p' query to the server itself.  One
does not get a refid smear if one is using [...]  This refid is only
sent in a time response packet to a *client mode* request.

2) Clients do not act on this information.  The whole purpose of sending
leap-smeared time is to send it:

- only in response to client mode requests

- to systems that are expected to be unable to deal with leap seconds

3) if you are in a network where you have upgraded your servers to send
leap-smeared time, how much harder is it to upgrade those servers before
the next leap-second so they include this fix too?  It's already
established that the leap smear is only needed in environments where
there are old clients that do not properly handle leap seconds.  If
these old clients *were* updated there would be no need to use a leap
smear.

4) I had another important point to make about this but I got
interrupted before I could write it down.  It's gone now :)
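
On the 1-in-256 figure quoted above, a quick sketch that counts how
often a legacy IPv6 refid (first four octets of the MD5 of the address)
happens to start with 254.  The address range is an arbitrary
documentation prefix chosen for illustration, and the helper name is
hypothetical.

```python
import hashlib
import ipaddress


def legacy_ipv6_refid(addr: ipaddress.IPv6Address) -> bytes:
    """First four octets of MD5 over the packed 16-octet IPv6 address."""
    return hashlib.md5(addr.packed).digest()[:4]


base = int(ipaddress.IPv6Address("2001:db8::"))
total = 1 << 16

# Count addresses whose legacy refid has 254 as its first octet.
hits = sum(
    1
    for i in range(total)
    if legacy_ipv6_refid(ipaddress.IPv6Address(base + i))[0] == 254
)
fraction = hits / total
print(f"{hits} of {total} refids start with 254 ({fraction:.4%})")
```

The fraction should come out close to 1/256 because only the first
octet has to match.  The match is a fixed property of the server's
address, so it shows up on every poll rather than on one exchange.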

> > > If you need to see the fact that the server is leap smearing in the
> > > ntpq output in the refid field, pick a constant value for refid and
> > > don't encode the offset. That's what I did in chrony.
> > 
> > What refid value did you choose?
> 
> 127.127.1.255

I implemented what I considered to be the best policy.  You implemented
what you consider to be best policy.

If there is no clear consensus on the "one way to go" let's put both in
the Standard and give folks the option to pick the one they want.

> > What happens when you have different servers offering different
> > applications of the leap smear?
> 
> Things might break horribly. 
> 
> > How can you tell what they are doing?
> 
> There is a command that prints the current leap smearing status and
> offset in the chrony monitoring protocol.
> 
> > > > > draft-stenn-ntp-suggest-refid-00
> 
> > If folks want to know about an upstream server and they have
> > authorization to ask, they can issue a mode 6 request and see what the
> > source address is for that association.
> 
> Mode 6 is an ntpd-specific extension of NTP.

No it's not.  It's part of the NTPv3 Standard that was inadvertently
omitted from the NTPv4 Standard.  Karen mentioned this again last week
at the IETF WG meeting.

> Even if the server runs
> ntpd, it's typically disabled. NTP clients can't work with that.

That's a local policy choice.  It's trivial in the reference
implementation to control whether or not mode 6/7 queries are answered.

Clients do not need mode 6 for their operation.

Mode 6 is about monitoring and control.

> > Was it you who mentioned that you saw a 3-system loop, and that loop was
> > on systems that were using neither orphan mode nor local refclock?  If
> > that's the case, then please show how increasing the protocol complexity
> > (including a security analysis) to detect these loops is worth the cost.
> > Why not just fix the configuration so this doesn't happen?  Why didn't
> > your systems monitoring catch this when the stratum of your servers went
> > out-of-spec?
> 
> I think the monitoring caught it, that's not the problem. The problem
> was they were unusable for clients below them, it's an unexpected
> situation. A local reference was not an option as the servers and
> clients were required to always report real synchronisation distance.

What is the real synchronization distance from a time island?

Does it depend on the reference time and the sync distance at that time,
adding "PHI*elapsed seconds since sync" to that value?
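
To put numbers on that question, a simplified sketch: take the distance
at the last sync as root delay / 2 + root dispersion and let it grow by
PHI per second, with PHI = 15 ppm and a 1 s distance threshold as in
RFC 5905.  The full calculation has further terms (peer delay,
dispersion, jitter) that are left out here, and the function names are
hypothetical.

```python
import math

PHI = 15e-6      # frequency tolerance, seconds per second (RFC 5905)
MAXDIST = 1.0    # distance threshold in seconds (RFC 5905)


def sync_distance(root_delay: float, root_dispersion: float, elapsed: float) -> float:
    """Distance at last sync plus PHI * seconds elapsed since that sync."""
    return root_delay / 2 + root_dispersion + PHI * elapsed


def seconds_until_limit(root_delay: float, root_dispersion: float) -> float:
    """How long an unsynchronized island stays under MAXDIST."""
    start = sync_distance(root_delay, root_dispersion, 0.0)
    return max(0.0, (MAXDIST - start) / PHI)


# Example: 20 ms root delay, 5 ms root dispersion, one hour since last sync.
after_hour = sync_distance(0.020, 0.005, 3600)
print(f"distance after 1 h: {after_hour * 1000:.1f} ms")
print(f"hours until 1 s:    {seconds_until_limit(0.020, 0.005) / 3600:.1f}")
```

With these example figures the distance grows by 54 ms per hour, so an
island that started at 15 ms would take roughly 18 hours to reach the
1 s threshold.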

H
_______________________________________________
ntpwg mailing list
ntpwg@lists.ntp.org
http://lists.ntp.org/listinfo/ntpwg
