Re: [DNSOP] Simplified Updates of DNS Security Trust Anchors, for rolling the root key

On Tue, Jun 30, 2015 at 10:53 AM, Edward Lewis <edward.lewis@icann.org> wrote:
> On 6/30/15, 9:57, "Tony Finch" wrote:
>
>>John Dickinson <jad@sinodun.com> wrote:
>>>
>>> I have been planning to write a draft to address 1 by having validators
>>> send the DS of known TAs in an EDNS0 option. This info could then be
>>> logged by the authoritative nameservers.
>>
>>Good idea, though just the key tags should be enough. (I think key
>>management software ensures that tags don't collide.) If you only include
>>the EDNS option when querying for the DNSKEY RRset, then that tells the
>>server which zone the trust anchor key tags belong to.
>
> Is this the right path?  (I'm asking, not driving towards a solution.)
>
> I've been one of the folks struggling with this for some time.
>
> It's true that the key tag does not uniquely identify the key.
> Operationally, most key management tools will discard potential conflicts
> which is good but makes us forget the trivial point mentioned here.
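To make the "key tag is not unique" point concrete: the key tag is just a 16-bit checksum over the DNSKEY RDATA. A minimal Python sketch of the calculation, following the reference C code in RFC 4034 Appendix B (which covers every algorithm except 1, RSA/MD5). The two "public keys" below are made-up toy bytes, not real key material, chosen so that one is a permutation of the other's even-offset bytes, which the checksum cannot see.

```python
import struct


def dnskey_rdata(flags: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (16 bits), protocol (always 3), algorithm, public key."""
    return struct.pack("!HBB", flags, 3, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1, RSA/MD5)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Bytes at even offsets count as the high byte, odd offsets as the low byte.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF  # fold the carry back in
    return acc & 0xFFFF


# Toy keys (flags 257 = KSK, algorithm 8): key_b swaps the bytes at
# public-key offsets 0 and 2, both even, so the sum is unchanged.
key_a = dnskey_rdata(257, 8, bytes([1, 2, 3, 4]))
key_b = dnskey_rdata(257, 8, bytes([3, 2, 1, 4]))
assert key_a != key_b
print(key_tag(key_a), key_tag(key_b))  # 2063 2063
```

Real key management tools avoid this by regenerating a key whose tag collides with one already in use, but nothing in the protocol prevents a collision between keys from different sources.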
>
> There's a larger set of questions.
>
> Should the source of a trust anchor have the right or ability to ask any
> other (validating, recursive) name server whether it is holding the (any)
> trust anchor?

I think that there is a philosophical and operational difference
between asking other validators what TA they have, and expecting them
to volunteer the information to you.

Having the user of a TA come along and report that it is using the TA
*feels* less intrusive than going off and asking it.

There are also operational issues with asking each TA user, including:
- you have to be able to find them - they may be behind firewalls / NATs
- they may have ACLs that prevent them from being queried by random
people (they really should have this!)
- you could scan for them, but this could be (and often is) viewed as
anti-social. You could simply look through logs and query each address
that asks you -- but a number of resolvers send their outgoing queries
from a different interface than the one they accept queries on
- you could make some addresses "special" and expect resolver
operators to allow queries from these special addresses, but this is
kinda icky, and still addresses neither FWs and NATs nor finding them
-- this also opens the system to a DoS from A: all or B: the (spoofed)
special address
- you could have probes or sensors all over the place perform the
query and ship the results to a central location
-- this requires a lot of additional infrastructure
-- getting probes that reach everywhere is really hard - Atlas is
good, but quite biased towards some geographies and networks with clue.
Geoff Huston's Ad network system gets really good coverage, but
requires a bunch of machinery, and still doesn't get full coverage.

Having the resolver send its list to the server(s) serving the zone
with the TA is much simpler:
- it feels more opt-in
- the resolver knows where the auth servers are, and can reach them
- it doesn't rely on a bunch of other infrastructure
- it gets coverage from resolvers in e.g. corporations, remote geographies, etc.
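A minimal sketch of the resolver side of this, assuming Tony's variant (key tags rather than full DS records) and his rule of attaching the option only to DNSKEY queries, so the server can infer which zone the tags belong to. No EDNS option code existed for this at the time of this thread, so the sketch uses 65001, from the EDNS0 local/experimental range in RFC 6891, purely as a placeholder; the function names are hypothetical too. It only builds the option bytes (OPTION-CODE, OPTION-LENGTH, OPTION-DATA), not a whole query.

```python
import struct

QTYPE_DNSKEY = 48  # DNSKEY RR type (RFC 4034)

# Placeholder, NOT an assigned code: taken from the EDNS0
# local/experimental range (65001-65534, RFC 6891).
HYPOTHETICAL_TA_OPTION_CODE = 65001


def encode_ta_option(key_tags):
    """Encode trust anchor key tags as one EDNS0 option: code, length, data."""
    data = b"".join(struct.pack("!H", tag) for tag in sorted(set(key_tags)))
    return struct.pack("!HH", HYPOTHETICAL_TA_OPTION_CODE, len(data)) + data


def options_for_query(qname, qtype, trust_anchors):
    """Return the EDNS0 option bytes to attach to an outgoing query.

    trust_anchors maps a lowercase, fully qualified zone name (e.g. ".")
    to the key tags of the TAs configured for it. The option is only sent
    on DNSKEY queries for a zone we hold a TA for; everything else gets
    nothing, so the resolver leaks no more than it has to.
    """
    zone = qname.lower().rstrip(".") + "."
    if qtype != QTYPE_DNSKEY or zone not in trust_anchors:
        return b""
    return encode_ta_option(trust_anchors[zone])
```

For example, `options_for_query(".", 48, {".": [19036]})` yields a 6-byte option carrying the single tag, while the same call with an A query (qtype 1) yields `b""`.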

> Should the source of the trust anchor build a process
> relying on the ability to know who has or does not have the trust anchor?

"Did everyone get the new trust anchor?" Well, to tell you the truth, in
all this excitement I kinda lost track myself. But being this is the
root trust anchor, the most powerful TA in the world, and would break
your Internet completely, you've gotta ask yourself one question: "Do
I feel lucky?" Well, do ya, punk?
  -- with apologies to Dirty Harry.

Ideally you shouldn't have to rely on knowing who has, or who does not
have the trust anchor -- but, without this information you have no way
to predict what the impact of the keyroll will be and who all will be
affected - you are basically flying blind.

>
> ICANN is looking at changing the trust anchor for the root zone.  It would
> be good if, before removing/retiring/revoking the trust anchor in place,
> it was known that everyone out there had the new trust anchor in place.
> That would be perfect.

Yes, it would be perfect -- unfortunately I don't think you'll ever be
able to tell that /everyone/ has the new TA.
But, if someone handed me a black box, and said "Pulling this lever
will make life X% safer for Y% of people, but will significantly
negatively impact Z% of people" I'd want to know what exactly X, Y and
Z are. If I couldn't get better answers than X="some", Y="a bunch" and
Z="not very many... probably" I don't think I'd be able to, in good
conscience, pull the lever. I know that I sure wouldn't want to be in
that situation...

> What's the cost?  First there is identifying all holders of the trust
> anchor.  If it were a NAT-free, IPv4 world, then it would be doable with a
> simple scan of all IP(v4) addresses.  But we have NAT, we have firewalls,
> we have IPv6.  We have techniques folks do to either extend address
> spaces, provide security, improve availability.
>
> If we could manage to contact every DNSSEC validator out there, (so,
> second question,) is it a good idea to let an outsider poke/snoop into
> configuration data?  (At what point does this become Snowden-esque?)

What if people came along and *told* you, instead of you asking? That
way you are not prying / snooping as much -- and people who don't want
to tell you are (implicitly) agreeing to take care of themselves. I
don't think that someone saying "I'm using TAs #23 and #42" is really
leaking private information to you, the operator of the TA, any more
than them asking "what are the NS for .com?" -- actually, I think it
is much less info that is being leaked...

> Leaving that as an open question.
>
> What's the alternative if it is deemed infeasible or impossible to
> determine when "everyone" has the new trust anchor?  The alternative
> perhaps is to make a best effort to estimate when as many as should know
> do, to inform as many as one can (not having a definitive roster of the
> appropriate audience), to prepare "what to do when something fails"
> documentation, and then go ahead with the change.

Yup. If you cannot determine who has the new TA, I think all you can do is:
1: Get the word out to as many people as you can, so they can A: get
the new TA and B: build a contingency plan. This means a wide,
intensive communications plan, not just the DNS geeks who happen to show
up at OARC and DNSOP - they are not your audience...

3: Do as good a job at estimating the impact as you possibly can. This
includes trying to figure out the distribution of nameserver software
out there, how many users use what set of resolvers, if the top N will
have the new TA, etc. Then make a decision just what the risk vs
reward numbers are -- if you are 90% sure that 99.999% of people will
be fine, is this acceptable? 40% sure that 82% of people will be fine?

4: Figure out what to do if things go pear shaped -- if you estimated
99.999% of people will be OK, but you discover it is only 92%, what do
you do? If it is unacceptable, how do you roll back -- and then
what?

5: You should collect as much info as possible during the roll, so
next time this isn't as painful...

Or, put another way -- SAC063 (full disclosure: contributor):
Recommendation 1: Internet Corporation for Assigned Names and Numbers
(ICANN) staff, in coordination with the other Root Zone Management
Partners (United States Department of Commerce, National
Telecommunications and Information Administration (NTIA), and
Verisign), should immediately undertake a significant, worldwide
communications effort to publicize the root zone KSK rollover
motivation and process as widely as possible.

Recommendation 3: ICANN staff should lead, coordinate, or otherwise
encourage the creation of clear and objective metrics for acceptable
levels of “breakage” resulting from a key rollover.

Recommendation 4: ICANN staff should lead, coordinate, or otherwise
encourage the development of rollback procedures to be executed when a
rollover has affected operational stability beyond a reasonable
boundary.

Recommendation 5: ICANN staff should lead, coordinate, or otherwise
encourage the collection of as much information as possible about the
impact of a KSK rollover to provide input to planning for future
rollovers.
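Item 3 (and Recommendation 3) comes down to fairly simple weighted arithmetic once the estimates are in hand. A rough sketch, assuming you can split the resolver population into groups, each with an estimated user count and an optimistic/pessimistic guess at what fraction of those users sit behind a validator missing the new TA. The class, the functions and every number here are hypothetical. Using a range rather than a point estimate is one crude way to capture the "how sure are you" part.

```python
from dataclasses import dataclass


@dataclass
class ResolverGroup:
    """A hypothetical group of resolvers, e.g. "large ISPs" or "everything else"."""
    name: str
    users: int            # estimated end users behind this group
    missing_low: float    # optimistic: fraction of these users whose validator lacks the new TA
    missing_high: float   # pessimistic estimate of the same fraction


def breakage_range(groups):
    """Return (best case, worst case) fraction of all users expected to break."""
    total = sum(g.users for g in groups)
    if total == 0:
        return (0.0, 0.0)
    low = sum(g.users * g.missing_low for g in groups) / total
    high = sum(g.users * g.missing_high for g in groups) / total
    return (low, high)


def go_no_go(groups, max_breakage):
    """Proceed only if even the pessimistic estimate meets the agreed metric."""
    return breakage_range(groups)[1] <= max_breakage
```

With 900 users at 0-1% missing the new TA and 100 users at 10-50% missing, the best case is 1% and the worst case 5.9% of users broken, so a 5% breakage metric would say "don't pull the lever yet". The hard part is getting defensible inputs, not the arithmetic.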

>
> Is it the responsibility of the source of the trust anchor to avoid all
> risk?

No, it is the responsibility of the source of the trust anchor to be
responsible -- this does not mean avoid all risk, rather *minimize the
risk* and only proceed when the risk vs reward payoff makes sense.

> Or is it the responsibility of the players on the Internet to
> handle all risk?

Ideally yes -- but that is like saying that it is the responsibility
of the person holding the gun not to shoot themselves in the foot.
There are many people (including children, and myself) who cannot be
trusted with this responsibility. Many people don't even know that
they have the responsibility - they took over from someone else, they
installed a piece of software that enabled this, etc.
People should all understand how their DNS works, and what all the
important bits are -- but, people should also know how their cars work
and what all the important bits in the engine do. Unfortunately
neither of these is true.

>  In my mind, DNSSEC is about protecting the recipient,
> not the publisher of data.  Does that sway where responsibility lies?
>
> This is not a rationalization for irresponsible behavior by a trust anchor
> manager - maintaining trust is larger than security, it includes
> availability/usability/better than the alternatives.  This is a debate in
> my mind about how to manage a change when it's nearly impossible to
> measure and manage all the angles.
>
> PS -  I would love *love* having the ability to detect and measure whether
> running the RFC 5011 process had any impact at all or how much impact it
> has.

Me too!

> Don't get me wrong, more data is helpful.  I'd love to have what I
> can in terms of collectable data for this.

Yah, me too! Luckily (for me!) I'm not the one who has to make any
decisions here. I can sit on the sidelines making snide comments...

>
> PPS - All this having been written, I like the idea of knowing who's
> queried for what in terms of the trust anchors.  Perhaps I'd use the DS
> hash as the owner name (details elided) to avoid the problem of key tags
> not being unique.  I wouldn't use TDS to learn keys but just to check on
> whether the trust anchors are up to date - an NXDOMAIN indicating the
> validator's administrator has some work to do.  (Treat this more as a
> checksum than an error correcting code.)

That's an option. Pulling the keyroll logic out of the draft would
make it shorter and simpler. Not necessarily better, but that's what
these threads are for....
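For what it's worth, Ed's DS-hash variant could look roughly like the sketch below. The DS digest itself is standard: digest type 2 is SHA-256 over the owner name in canonical wire format followed by the DNSKEY RDATA (RFC 4509). Since the naming details are elided above, the query-name layout, the `_ta-check.example.` suffix and the server-side check are all hypothetical. One wrinkle any layout has to handle: a hex SHA-256 digest is 64 characters, one more than the 63-octet DNS label limit, so it has to be split across labels.

```python
import hashlib
import struct


def name_to_wire(name):
    """Canonical (lowercased) DNS wire format for a fully qualified name."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(struct.pack("!B", len(l)) + l.encode("ascii") for l in labels)
    return wire + b"\x00"  # root label terminates the name


def ds_sha256(owner, dnskey_rdata):
    """DS digest type 2 (SHA-256, RFC 4509): hash of owner name | DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + dnskey_rdata).hexdigest()


# Hypothetical suffix: the actual naming scheme is not specified in the thread.
HYPOTHETICAL_SUFFIX = "_ta-check.example."


def ta_check_qname(owner, dnskey_rdata):
    """Validator side: turn a configured TA into a (hypothetical) query name."""
    digest = ds_sha256(owner, dnskey_rdata)
    # 64 hex characters exceed the 63-octet label limit, so use two labels.
    return f"{digest[:32]}.{digest[32:]}.{HYPOTHETICAL_SUFFIX}"


def server_answer(qname, published_digests):
    """Hypothetical server side: the name exists only for current TAs."""
    labels = qname.rstrip(".").split(".")
    digest = labels[0] + labels[1]
    return "NOERROR" if digest in published_digests else "NXDOMAIN"
```

A validator still holding only an outdated key would get NXDOMAIN for its check name, which is exactly the "checksum, not error correction" signal: it says something is wrong without telling the validator what the new key is.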

W

-- 
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf