[Ntp] NTS pool support

Christer Weinigel <christer@weinigel.se> Sat, 20 July 2019 20:37 UTC

To: ntp@ietf.org
From: Christer Weinigel <christer@weinigel.se>
Message-ID: <3cd8c65b-a37c-863e-ea2c-2de0a5aeee96@weinigel.se>
Date: Sat, 20 Jul 2019 22:37:22 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/ntp/ruLDmeZ8cfIxgWhI9v-AA9DLq0U>
Subject: [Ntp] NTS pool support

Hi all,

There have been some discussions on how to handle NTS pool support, but
I don't think they reached any definite conclusion.

Here's my naive suggestion on how to handle NTS pools.

Considerations
==============

1) A user should not have to do anything to start using NTS if the
    NTP client that comes with the OS adds NTS support.

2) It should be possible for someone to manage a pool of NTS servers
    without having to manage the TLS private keys for each individual
    NTSKE server.

3) Reuse as much existing infrastructure as possible.

Proposal
========

Today an NTP server pool is implemented with A/AAAA records, where a
pool domain name resolves to multiple IP addresses.  For example I've
set up a small pool for my company's domain which looks like this:

$ dig time.weinigel.se a

time.weinigel.se.       86400   IN      A       77.72.227.121
time.weinigel.se.       86400   IN      A       71.19.158.104
time.weinigel.se.       86400   IN      A       37.46.169.123

(These IP addresses correspond to fpga-lab.sth.netnod.se,
www.weinigel.se and zoo.weinigel.se respectively.)

This is not suitable for a pool of NTS servers though, mostly because
the NTSKE server has to have a server certificate which matches the
domain name used to contact the NTSKE server.  This means that each
NTSKE server would have to be issued a certificate with the name of
the pool.  It also makes it trickier to talk to a specific server,
unless the server is issued multi-domain certificates, which will
probably be an administrative nightmare when the pool is part of
one domain and the server is in another domain (such as
time.weinigel.se and the NTSKE server on fpga-lab.sth.netnod.se).

Additionally, there is no way for a client to know if a certain IP
address returned by a pool supports NTS or not.  It could try a well
known port (when/if NTS is assigned its own port, or if NTS will use
tcp/123) but the client would have to attempt a TCP connection to see
if it's supported or not and this would be rather wasteful.  An ad hoc
solution would be to have a naming convention where one looks for the
same domain prefixed with nts, i.e. nts.time.weinigel.se, but that
feels rather ugly IMNSHO.

Luckily, some smart people have already solved most of this problem:
DNS SRV records.  There is a pretty decent explanation of DNS SRV
records on Wikipedia:

https://en.wikipedia.org/wiki/SRV_record

Basically, it's a way of associating services with a domain with a DNS
record which has the form:

_service._proto.name. TTL class SRV priority weight port target.

Just add a "_ntske._tcp" service for each domain that wants to add NTS
support.  (I'm not sure if "_ntske" is the best name, maybe it should
just be "_nts", or "_ntske1" or some other name, and it has to be
registered with IANA anyway.  But I'll use "_ntske" for my example.)

For my domain "time.weinigel.se", two of my NTP servers also support
NTS, and the NTSKE server listens on port 4446 on both machines.  This
means that I have added the following DNS SRV records to my pool:

$ dig _ntske._tcp.time.weinigel.se srv

_ntske._tcp.time.weinigel.se. 86400 IN SRV 1 10 4446 zoo.weinigel.se.
_ntske._tcp.time.weinigel.se. 86400 IN SRV 2 10 4446 fpga-lab.sth.netnod.se.

Since the DNS SRV records are in a completely different namespace than
the A records used for an NTP pool today, they won't affect any existing
NTP clients.  A new client which supports NTS can look up the SRV
records to find the target (host) and port for the NTSKE servers.
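
As a rough illustration of the client side, the sketch below parses SRV
answers in the text form shown above and turns them into NTSKE
endpoints.  The helper names (srv_query_name, parse_srv_line,
ntske_endpoints) are made up for this example, "_ntske" is still only
the placeholder service name, and a real client would get the records
from a resolver library rather than from text.

```python
from typing import List, NamedTuple, Tuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(domain: str, service: str = "ntske", proto: str = "tcp") -> str:
    """Build the owner name to query; "_ntske" is only a placeholder name."""
    return "_%s._%s.%s." % (service, proto, domain.rstrip("."))


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one answer line: name TTL class SRV priority weight port target."""
    fields = line.split()
    if len(fields) != 8 or fields[3].upper() != "SRV":
        raise ValueError("not an SRV record: %r" % line)
    priority, weight, port = (int(f) for f in fields[4:7])
    return SrvRecord(priority, weight, port, fields[7].rstrip("."))


def ntske_endpoints(lines: List[str]) -> List[Tuple[str, int]]:
    """Return (host, port) pairs, lowest priority value first."""
    records = sorted(parse_srv_line(line) for line in lines if line.strip())
    return [(r.target, r.port) for r in records]


ANSWER = """
_ntske._tcp.time.weinigel.se. 86400 IN SRV 1 10 4446 zoo.weinigel.se.
_ntske._tcp.time.weinigel.se. 86400 IN SRV 2 10 4446 fpga-lab.sth.netnod.se.
"""

print(srv_query_name("time.weinigel.se"))
for host, port in ntske_endpoints(ANSWER.splitlines()):
    print("%s:%d" % (host, port))
```

Sorting by priority is just one possible ordering; a client that treats
all SRV records as pool members may well ignore it.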

The best thing about DNS SRV records is that the target (host) is a
fully qualified domain name, which means that when the NTS client
connects to the NTSKE server it will match the name that's present in
the TLS certificate for the server.  This means that the administrator
of the pool does not have to manage the TLS certificates and private
keys, that's up to whoever manages each NTSKE server.
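
To make the certificate point concrete, here is a minimal sketch using
Python's standard ssl module.  The function names are hypothetical, and
the ALPN identifier "ntske/1" is my assumption about what the NTS draft
specifies.  The important part is that server_hostname is the SRV
target, not the pool name, so the certificate is checked against the
name of the individual NTSKE server.

```python
import socket
import ssl


def make_ntske_context() -> ssl.SSLContext:
    """TLS client context that verifies certificates against the system CAs."""
    # create_default_context() enables check_hostname and CERT_REQUIRED.
    ctx = ssl.create_default_context()
    # Assumption: "ntske/1" is the ALPN protocol id used for NTS key exchange.
    ctx.set_alpn_protocols(["ntske/1"])
    return ctx


def connect_ntske(target: str, port: int, timeout: float = 5.0) -> ssl.SSLSocket:
    """Connect to an SRV target; the certificate must match `target`."""
    raw = socket.create_connection((target, port), timeout=timeout)
    try:
        return make_ntske_context().wrap_socket(raw, server_hostname=target)
    except Exception:
        raw.close()
        raise
```

A client would call connect_ntske("zoo.weinigel.se", 4446) for each
target it got from the SRV lookup.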

In my case, I manage all of these machines, and have used Let's Encrypt
to create the TLS certificates for both zoo.weinigel.se and
fpga-lab.sth.netnod.se, but it could just as well have been the case
that Netnod's operations team handled the certificates for the machine
on their network using their preferred CA.

Notes
=====

Microsoft does something similar for plain NTP if you run a Skype for
Business server.  They use an SRV record for "_ntp._udp" on domains to
point at the NTP server:

https://docs.microsoft.com/en-us/skypeforbusiness/plan-your-deployment/network-requirements/dns

Just for fun, I have actually set up time.weinigel.se just the way I
have described above, and created a simple application in Python to do
the SRV record lookup needed in a client:

https://github.com/Netnod/nts-poc-python/blob/master/lookup.py

$ python lookup.py time.weinigel.se

NTP servers
     37.46.169.123
     71.19.158.104
     77.72.227.121

NTS servers
     fpga-lab.sth.netnod.se.weinigel.se:4446
     zoo.weinigel.se.weinigel.se:4446

Policy for load balancing/failover
==================================

Load balancing and failover can be done at multiple levels.  One can
have multiple DNS SRV records in a pool and achieve load balancing
that way.  If someone wants to have multiple NTSKE servers with the
same host name and do load balancing using A/AAAA records for that
name they can do that as long as the TLS certificates match the host
name.  And someone who wants to have multiple NTS/UDP timestamping
servers using the cookies from one or more NTSKE servers which share
the server keys (the keys needed to decrypt the cookie) can do that
too.

How a client behaves when it gets multiple responses, or when there
are both DNS SRV and plain old NTP A/AAAA records, is up to the
implementation.  I would suggest something like this though:

Treat multiple DNS SRV _ntske._tcp.$DOMAIN records as members of a
pool.  A smart client should try to use as many as it can at the same
time and weigh the responses from the servers.

Treat multiple A/AAAA records for the target pointed out by the DNS
SRV record as alternate addresses for the same server.  The main use
for this is load balancing/failover.  The client should treat responses
from all of these IP addresses as coming from the same server and not
weigh them separately.  The client can try all addresses at the same
time, or try one at a time, but should only use the response from one
of them.
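
The two rules above could be sketched roughly as follows.  Everything
here is hypothetical: build_pool and pick_one_address_per_member are
invented names, the resolver is passed in as a plain function so the
example runs without network access, and 192.0.2.10 is a documentation
address used only to give one target a second address.

```python
from typing import Callable, Dict, List, Tuple

Member = Tuple[str, int]                # (SRV target, port) is one pool member
Resolver = Callable[[str], List[str]]   # host name -> list of IP addresses


def build_pool(endpoints: List[Member], resolve: Resolver) -> Dict[Member, List[str]]:
    """Map each SRV target (one pool member) to its alternate addresses."""
    pool = {}
    for host, port in endpoints:
        addresses = resolve(host)
        if addresses:  # skip targets that do not resolve
            pool[(host, port)] = addresses
    return pool


def pick_one_address_per_member(pool: Dict[Member, List[str]]) -> Dict[Member, str]:
    """Use a single address per member so no server is weighed twice."""
    return {member: addresses[0] for member, addresses in pool.items()}


# Hypothetical resolver data for the example.
FAKE_DNS = {
    "zoo.weinigel.se": ["37.46.169.123"],
    "fpga-lab.sth.netnod.se": ["77.72.227.121", "192.0.2.10"],
}

pool = build_pool(
    [("zoo.weinigel.se", 4446), ("fpga-lab.sth.netnod.se", 4446)],
    lambda host: FAKE_DNS.get(host, []),
)
print(len(pool), "pool members")
print(pick_one_address_per_member(pool))
```

The point of the data structure is that the number of pool members is
the number of SRV targets, no matter how many addresses each target
has.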

If the NTPv4 Server Negotiation record contains a host name and the
A/AAAA lookup returns multiple addresses, treat them as alternate
addresses for the same server.  The main use for this is load
balancing / failover.  The client should treat responses from all of
these IP addresses as coming from the same server and not weigh them
separately.  To be as robust as possible a client should remember the
list of IP addresses, so that if the NTS UDP timestamping server it's
currently using goes down it can switch to one of the other addresses.
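
A minimal sketch of that failover behaviour, with a made-up class name
and documentation addresses (192.0.2.x) standing in for real ones.  How
a client decides that a server is down is left out entirely.

```python
from typing import List


class AddressFailover:
    """Remember every address of one timestamping server and rotate on failure."""

    def __init__(self, addresses: List[str]) -> None:
        if not addresses:
            raise ValueError("need at least one address")
        self._addresses = list(addresses)
        self._index = 0

    @property
    def current(self) -> str:
        """The address the client is using right now."""
        return self._addresses[self._index]

    def mark_down(self) -> str:
        """Call when the current address stops answering; returns the next one."""
        self._index = (self._index + 1) % len(self._addresses)
        return self.current


server = AddressFailover(["192.0.2.1", "192.0.2.2"])
print(server.current)      # first address in the list
print(server.mark_down())  # switch to the next address
```

Since all addresses belong to the same server, the cookies the client
already holds should still be usable after the switch.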

I believe that this ought to work for just about all cases of pool
management and gives enough flexibility that one can do load balancing
and failover at multiple levels.  But I don't know if I have missed
some use cases here.

TLS Certificate PKI
===================

I don't think we want to build up our own PKI for the TLS certificates
used with NTS, so I'd suggest that we do the same thing everyone does
for HTTPS.  Yes, this means trusting every CA out there, and yes, it
means that the security of the certificates is generally set by the
least secure CA, but that's something I think we'll have to live with.
And it makes life easier: if you know how to get an HTTPS server
certificate you also know how to get a certificate for your NTSKE
server.  We shouldn't require any special extensions, just plain
"same certs as for HTTPS".

It is possible to raise the security a bit using DNSSEC and TLSA
(RFC6698), just as for HTTPS.  With DNSSEC whoever maintains the DNS
entries for a pool or DNS entries for an NTSKE server can ensure that
the DNS information is signed all the way from the root servers.  With
TLSA it's possible to put a hash of the NTSKE server certificate (or
CA) in DNS and thus bridge the gap from DNS to the TLS session.

A TLSA record for zoo.weinigel.se would look something like this:

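_4446._tcp.zoo.weinigel.se. IN TLSA (3 0 1 deadbeef...)

It basically says that deadbeef... is a SHA-256 hash of the certificate
used for TLS connections to TCP port 4446 of zoo.weinigel.se.  (In
RFC6698 matching type 1 is SHA-256; matching type 0 would mean the full
certificate rather than a hash.)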
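
For illustration, this is roughly how the record data could be computed
and checked, assuming usage 3 (end-entity certificate), selector 0
(full certificate) and matching type 1 (SHA-256) from RFC6698.  The
function names are invented, and the "certificate" is a dummy byte
string; a real client could get the DER bytes from
SSLSocket.getpeercert(binary_form=True).

```python
import hashlib


def tlsa_owner_name(host: str, port: int, proto: str = "tcp") -> str:
    """Owner name of the TLSA record, e.g. _4446._tcp.zoo.weinigel.se."""
    return "_%d._%s.%s." % (port, proto, host.rstrip("."))


def tlsa_record(host: str, port: int, cert_der: bytes) -> str:
    """Usage 3, selector 0, matching type 1: SHA-256 of the full DER certificate."""
    digest = hashlib.sha256(cert_der).hexdigest()
    return "%s IN TLSA 3 0 1 %s" % (tlsa_owner_name(host, port), digest)


def tlsa_matches(cert_der: bytes, published_hex: str) -> bool:
    """Compare the certificate seen in the TLS session with the hash from DNS."""
    return hashlib.sha256(cert_der).hexdigest() == published_hex.lower()


fake_cert = b"not a real certificate"  # stand-in for DER-encoded certificate bytes
print(tlsa_record("zoo.weinigel.se", 4446, fake_cert))
```

The comparison only means something if the TLSA record itself was
validated with DNSSEC.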

How to use this information is up to a client.  The default would
probably be for an NTP client to use both plain NTP servers and NTS
servers to begin with and only check the certificates against the
normal HTTPS CAs.  An NTS client which understands DNSSEC and TLSA
could add even more weight to NTSKE servers which use those.  A client
could also be configured to require NTSKE, DNSSEC and/or TLSA (maybe
per pool).
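
One way such a per-pool policy could be expressed is sketched below.
The class and field names are hypothetical and the weights are
arbitrary; the only point is that "require" settings reject a server
outright while the other checks merely add weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolPolicy:
    """What a client insists on for one pool (all optional by default)."""
    require_nts: bool = False
    require_dnssec: bool = False
    require_tlsa: bool = False


@dataclass(frozen=True)
class ServerInfo:
    """What the client has learned about one server."""
    has_nts: bool = False
    dnssec_validated: bool = False
    tlsa_matched: bool = False


def server_weight(info: ServerInfo, policy: PoolPolicy) -> int:
    """Return 0 to reject the server, otherwise a weight that grows with assurance."""
    if policy.require_nts and not info.has_nts:
        return 0
    if policy.require_dnssec and not info.dnssec_validated:
        return 0
    if policy.require_tlsa and not info.tlsa_matched:
        return 0
    # Arbitrary illustrative weights: plain NTP gets 1, plus 1 per extra check.
    return 1 + int(info.has_nts) + int(info.dnssec_validated) + int(info.tlsa_matched)


plain = ServerInfo()
secure = ServerInfo(has_nts=True, dnssec_validated=True, tlsa_matched=True)
print(server_weight(plain, PoolPolicy()), server_weight(secure, PoolPolicy()))
print(server_weight(plain, PoolPolicy(require_nts=True)))
```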

Question: Are there any best practices for HTTPS somewhere out there
that we could reuse?  Since NTSKE is a new protocol we could drop any
deprecated features of HTTPS that web browsers still have to support
for compatibility reasons.

Still not solved
================

There's one major problem with all of this.

To be able to validate the certificate of the NTSKE server the client
needs to know the time, and to be able to securely get the time a
client needs to be able to use NTS.  A client which doesn't have any
idea at all about what time it is will have to ignore the time-related
checks of the NTSKE server certificate (or fall back to plain NTP).

This becomes even worse if one uses DNSSEC, since DNSSEC validation
requires the DNS client to know the current time well enough to check
that it falls within each signature's validity period.  Without time
it can't do the DNSSEC validation.

A client will need some kind of policy on what to do here.
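
Just to show what the simplest such policy might look like, here is a
sketch where a client without any idea of the time skips the validity
period check instead of failing it.  The function name and the three
result strings are invented for this example; a real client would
presumably redo the check once it has obtained time.

```python
from datetime import datetime, timezone
from typing import Optional


def certificate_time_check(not_before: datetime, not_after: datetime,
                           now: Optional[datetime]) -> str:
    """Classify the validity-period check; `now` is None if the time is unknown."""
    if now is None:
        return "skipped"  # no clock yet: defer the check instead of failing
    return "ok" if not_before <= now <= not_after else "failed"


not_before = datetime(2019, 7, 1, tzinfo=timezone.utc)
not_after = datetime(2019, 10, 1, tzinfo=timezone.utc)

print(certificate_time_check(not_before, not_after, None))
print(certificate_time_check(not_before, not_after,
                             datetime(2019, 7, 20, tzinfo=timezone.utc)))
```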

Thoughts, flames, have I missed anything vital?

   /Christer