[DNSOP] One review of draft-huston-kskroll-sentinel-04.txt

Edward Lewis <edward.lewis@icann.org> Tue, 21 November 2017 15:13 UTC

From: Edward Lewis <edward.lewis@icann.org>
To: "dnsop@ietf.org" <dnsop@ietf.org>
Date: Tue, 21 Nov 2017 15:13:45 +0000
Message-ID: <802F3877-2156-4C2C-A7D2-9D996381E8D2@icann.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/NOUvnRIYyL1czLOOpou4iA7D9H4>

A review of: https://tools.ietf.org/html/draft-huston-kskroll-sentinel-04

This is not a blow-by-blow, nit-picking review; it tries to dive into architecture-level issues:

1. I don't think the Root Zone should be specifically called out; this mechanism ought to work for any domain name.

The Intro has an example:

## store.  In particular, this response mechanism can be used to
## determine whether a certain Root Zone KSK is ready to be used as a
## trusted key within the context of a key roll by this resolver.

As an example, sure, but there seems to be confusion that "the root is special" when it comes to the management of trust anchors. That notion, that any name is special, ought to be put to rest.

Not all deployments of the DNS protocol need to have the same namespace; I used to work with an operational inter-network with its own "everything".

The Root Zone is mentioned in a few places in the document. I haven't seen that it needs to be "called out" for this proposal; whatever the final result is, it should work anywhere in any DNS tree.

2. Need to reserve labels according to regular expression

All labels matching <_is-ta-*> and <_not-ta-*> would have to be reserved to prevent a collision with a configured name.  As we don't know the future key tags, we have to clear them all out.  For the root zone, that's a swath of TLDs.  The same is true for all names (the root is not special!) where some validator has a trust anchor.  (That raises an interesting point - how does a zone operator know whether their zone apex is considered to be a point of trust?  This is an open link when thinking of Automated Updates of DNSSEC Trust Anchors as a protocol between two entities.)  There is no in-DNS-band indication by a zone administrator that they expect to implement timings compatible with STD 74 (RFC 5011); a trust anchor manager needs out-of-band indications (and updates).

Note that we don't have a protocol definition for matching "partial" labels.  Yes, that's a problem.
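To make the size of that reservation concrete, here is a minimal sketch of a collision check a zone administrator could run over configured left-most labels.  The function names are hypothetical, and the check is a plain prefix match (not anything DNS itself can express), which is exactly the gap noted above.

```python
# Prefixes taken from the draft's templates; everything after the
# prefix is treated as reserved because future key tags are unknown.
RESERVED_PREFIXES = ("_is-ta-", "_not-ta-")


def is_reserved(label):
    """True if a single label falls in the reserved space (case-insensitive)."""
    return label.lower().startswith(RESERVED_PREFIXES)


def colliding_labels(configured_labels):
    """Return the configured labels that would collide with the reservation."""
    return [label for label in configured_labels if is_reserved(label)]
```

For example, `colliding_labels(["www", "_NOT-TA-1111", "mail"])` reports only the second label.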

3. Structure of the draft

The draft covers two things - the query/response protocol and the use of the results in a test.  This I find confusing:

## 3.  Sentinel Processing
##    This proposed test that uses the DNS resolver mechanism described in

Until I got to that line, I was expecting that this document covered only the query/response protocol.  I'll structure the rest of the review into "query/response" and then "use of that in testing."

4. The query/response protocol

I'm a little unclear, from the description, what is happening at the query/response level.

Let's say (for the sake of this email), I want to ask whether my resolver (which is a fuzzy statement) has a trust anchor for example.com, key_id=0x4034.  I would then send a query for (_is-ta-4034{*}.example.com./IN/A), with the CD bit clear and the EDNS DO bit set, and expect one of the following responses:

-1. A response with a return code of SERVFAIL (value=2), which would imply that either the responder is saying "no" to the trust anchor status of the key or that there is data matching the query but it has either failed DNSSEC validation or the servers authoritative for the name could not (ultimately) be reached.

-2. A positive response with data, indicating that the responder treated this as a normal query and happened to have query-matching data to return.

-3. A negative response, either a name error (return code=NXDOMAIN) or an empty answer section (indicating the queried name exists but has other data).  In either case, the responder is treating the query as "normal."

-4. An indication that the key is held as trusted by the responder.  This is where I'm lost - what is returned?  The draft says:

##    ..., then the resolver should return a response indicating
##    that the response contains authenticated data according to section
##    5.8 of [RFC6840].  ...

Dereferencing the pointer, there's mumbo-jumbo about the AD bit value in the response.
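The four outcomes above can be sketched as a classifier over the 12-byte DNS header (layout per RFC 1035, AD bit per RFC 4035).  This is only an illustration: the outcome names are my own, and treating "data with AD set" as outcome 4 is an assumption, since what the draft intends to be returned is exactly what is unclear.  An empty answer section is also reported without distinguishing a no-data response from a referral.

```python
import struct

RCODE_NOERROR, RCODE_SERVFAIL, RCODE_NXDOMAIN = 0, 2, 3
AD_BIT = 0x0020  # "authentic data" flag in the header flags word


def classify(response):
    """Map a raw DNS response to a coarse outcome using only its header."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!6H", response[:12])
    rcode = flags & 0x000F
    if rcode == RCODE_SERVFAIL:
        return "servfail"          # outcome 1
    if rcode == RCODE_NXDOMAIN:
        return "nxdomain"          # outcome 3, name error
    if rcode == RCODE_NOERROR and ancount == 0:
        return "empty-answer"      # outcome 3, or a referral
    if rcode == RCODE_NOERROR:
        # Assumption: AD set on a data response stands in for outcome 4.
        return "data-with-ad" if flags & AD_BIT else "data"
    return "other"                 # e.g. REFUSED
```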

Perhaps my confusion begins here, in earlier text in "Sentinel Mechanism":

##    If the outcome of the DNS response validation process indicates that
##    the response is authentic, and if the left-most label of the original
##    query name matches the template "_is-ta-<tag-index>.", then the

DNSSEC validation is not performed on responses, it is performed on RRsets.  DNSSEC does not check the headers, does not make a blanket statement over the sections (answer, authority, etc.), does not cover the EDNS stuff (OPT record in the additional), etc.  This sentinel mechanism has to have an answer, and it has to be signed (up to some trust anchor on the responder) for there to be validation.  Would this be an IP address (v4/v6 for A/AAAA as appropriate)?  What would that address be, given this is not meant to "flow data"?  (IPv4 has a loopback range; IPv6 only has ::1/128 for local.)

For the negative case, I'm more at sea.  For _not-ta-1111.example.com. to be able to return a set that DNSSEC validation can approve there would need to be a - as yet not defined - wildcard that matches prefixes of labels.  There would have to be fake address records for _not-ta-*.example.com. for all values that didn't match held trust anchors.  Again, with the validator determining the existence of a trust anchor and not the authority, this is hard to generate at the zone administration's DNSSEC signing time.

One factor not covered is the setting of the recursion desired bit, and this might be useful.  By clearing the RD bit (RD=0), the responder ought to be consulting only its local information.  Responders unaware of this feature would only consult their local cache, which ought to never have an entry for the queried name (expecting that the authority neither configures the name, with or without a delegation, nor has a wildcard).
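As a sketch of what such a query looks like on the wire, the following builds a message with RD=0, CD=0 and the EDNS DO bit set, using only the Python standard library.  The function names, the default query ID and the 1232-byte UDP payload size are assumptions for illustration; the header layout follows RFC 1035 and the OPT record layout follows RFC 6891.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("label must be 1..63 octets")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(qname, qid=0, rd=False, cd=False, do=True, qtype=1):
    """Build a DNS query (default type A, class IN) carrying one OPT record."""
    flags = (0x0100 if rd else 0) | (0x0010 if cd else 0)
    header = struct.pack("!6H", qid, flags, 1, 0, 0, 1)  # QD=1, AR=1
    question = encode_name(qname) + struct.pack("!2H", qtype, 1)
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size,
    # TTL=ext-rcode/version/flags (DO is the top bit of flags), RDLEN=0.
    opt_ttl = 0x00008000 if do else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, opt_ttl, 0)
    return header + question + opt


query = build_query("_is-ta-4034.example.com.", qid=0x1234)
```

The resulting bytes could be sent over UDP to port 53 of the responder under test; that step is left out here because it depends on the network at hand.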

With a cleared RD bit in the query handled by a responder that is aware of this mechanism, the answer would be something generated from data in the responder's trust anchor data store.  For a positive response to a positive query (i.e., yes to _is-ta-*) if something is to be sent back signed, a valid signature is needed.  For a negative response to a positive query, ... I don't know.

What if the query/response relies on having RD=0 in the query, hoping the responder does not send the query onward (recursion or forwarding), and we use some form of transaction security (like TSIG) for hop-by-hop security?

For unaware responders, they'd treat the query as one for an address record.  Unless there's a matching name in the cache, they'd never return a data set.  Quick check: BIND 9.9.5 returns a referral to the root, unbound 1.5.8 responds with return code of REFUSED.  Different answers from different implementations, hmmm, would make this a bit more difficult.

To me, the basics of the query/response mechanism aren't very clear; an example would certainly help.  (And not one sitting at the root!)

Consider how the flags in the response might be used - particularly the recursion desired bit.

5. The "test"

In the sense of "crawl before you walk", start with the description of queries and responses to specific targets as a network manager would do.  To accurately use the results, one would need to know the specifics of DNS query husbandry in place.  This is something only the network manager could know.

First, target at an IP address.  There may be one or more responders.  There could be a load balancer before a number of independent processes answering on port 53. There could be an anycast constellation of servers with some routing instability.  TSIG could be used to "secure" the exchange.

Next, any single end-consumer of DNS services will likely have a list of IP addresses to target.  The different IP addresses may have different network managers running the DNS service (i.e., some do DNSSEC validation, some don't, some have special trust anchors, etc.).  That would lead to uncertainty in trying to "convert symptoms into a diagnosis."

There are two ways to test.  One assumes that the tester is the manager, one with knowledge of the layout.  A significant ingredient is that the tester, in this case, would have a list of IP addresses to specifically ask, knowing the use of anycast and load balancing.  The other assumes that it is an end-user view, one that does not have the list of IP addresses to target, does not have knowledge of the layout, and is only able to have a black-box (opaque) view of the DNS as a service.

For the former, with a well-designed query/response mechanism, scripts can be written to verify that configurations are as expected.  Managing anycast deployments could use "service addresses" or an appropriately distributed monitoring source network.  Load balancers can be handled too.  (Eliding ways of doing that.)  And TSIG is an option (for security).
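A minimal sketch of such a verification script, assuming the manager already has a per-address plan and a way to collect one outcome per address (both hypothetical here; the addresses are documentation addresses and the outcome strings are placeholders):

```python
def audit(expected, observed):
    """Return {address: (expected, observed)} for responders that differ from plan."""
    problems = {}
    for addr, want in expected.items():
        got = observed.get(addr, "no-response")
        if got != want:
            problems[addr] = (want, got)
    return problems


# Hypothetical plan: which responders should hold the key as trusted.
plan = {"192.0.2.1": "trusted", "192.0.2.2": "trusted", "192.0.2.3": "not-trusted"}
seen = {"192.0.2.1": "trusted", "192.0.2.2": "not-trusted"}
report = audit(plan, seen)
```

Here `report` flags 192.0.2.2 (wrong state) and 192.0.2.3 (no response), which is the kind of per-address diagnosis only someone with knowledge of the layout can make.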

The latter, I infer, is the ultimate intent of the draft: to estimate when a (as opposed to "the") Root Zone KSK rollover (being specific here to the root zone on the global public Internet) can proceed.  There's been experience here, leading to the notion that the results would give a coarse approximation in aggregate, as the way in which DNS recursion is done is quite complex.  Given that the goal is to measure the trust anchors configured in individual validation engines via testing of units that see the DNS as a black box service, a coarse approximation is all we can hope for.  Not that this is bad, but we have to begin to evaluate where the returns diminish.

Summary -

The document has a laudable goal.  The first suggestion is to structure it in layers.  Clean up the query/response section to describe the footwork of the effort; I'd suggest experimenting with the idea of using RD=0 and figuring out what a trust anchor store can originate as a response (as the response is not originating from the administration authoritative for the name).

When it comes to the use of the footwork, divide this into how a network manager could incorporate this into a trust anchor maintenance activity and then into a more widespread, third-party, measurement of trust anchor deployment.

Other considerations, well beyond the scope of the subject line's document: revamp the Automated Updates of DNSSEC Trust Anchors from a validator-side mechanism into a two-party protocol, with the goal of increased manageability (monitoring, measurement) and, as something not yet raised, greater efficiency in the sizes of network responses (e.g., make MISSING an expected state and then manage around that).