Re: [DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations

Michael StJohns <msj@nthpermutation.com> Fri, 29 September 2017 18:49 UTC

To: dnsop@ietf.org
References: <yblmv5ya5yc.fsf@wu.hardakers.net>
From: Michael StJohns <msj@nthpermutation.com>
Message-ID: <840723c3-899a-f57b-caa1-48f14b3686e8@nthpermutation.com>
Date: Fri, 29 Sep 2017 14:49:06 -0400
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <yblmv5ya5yc.fsf@wu.hardakers.net>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/iA6ixxHtun78kPNuIyyrWPHQByw>
Subject: Re: [DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations
Precedence: list

Comments on -05 and your changes in line.

On 9/13/2017 1:01 PM, Wes Hardaker wrote:
> Mike, after your lengthy last review I went through and carefully made
> sure each of your comments were considered.  Most resulted in changes, a
> few seemed to be just comments and there was nothing to do, and two we
> didn't think were correct.  Below is the summary of the changes in the
> most recent version based on your review from July.
>
> Wes Hardaker
>
>
> Table of Contents
> _________________
>
> 1 Notes / Edits from Mike StJohn on <2017-07-06 Thu>
> .. 1.1 ACCEPTED Use "exclusively" rather than "only" where you can - "only" has
> .. 1.2 ACCEPTED In the introduction  - "validators can update" vs "validators can
> .. 1.3 ACCEPTED In the introduction - same issue with "only recently"
> .. 1.4 ACCEPTED In the introduction - "ceasing publication of a revoked key" vs
> .. 1.5 FIXED Section 1.1 - What is the definition of "rolled"?   Your use of it
> .. 1.6 FIXED Section 2 "which apply to" vs "as required from".
> .. 1.7 FIXED Section 2 "Note that it does not..." - which "it" are you talking about?
> .. 1.8 ACCEPTED Section 2 "using only recently" - same issue as above.
> .. 1.9 ACCEPTED Section 2. New para at "Failure of a DNSKEY..."
> .. 1.10 ACCEPTED Section 2. "... leaving that key in their trust anchor storage
> .. 1.11 ACCEPTED Section 3: Attacker.  "An entity intent..." vs "An attacker
> .. 1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
> .. 1.13 ACCEPTED Section 4.1, bullet 3:  Same problem with "only".  Instead "Begin
> .. 1.14 REJECTED Section 4.2 last para:  This is only an attack if the private key
> .. 1.15 REJECTED Section 6 - general comment.   You're doing interval calculations
> .. 1.16 FIXED Section 5.1.1 - You're missing a *very* important point here -
> .. 1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 rollover
> .. 1.18 FIXED Section 5.1.1, T+35:  "since the hold down time of 30 days + 1/2
> .. 1.19 NOTHINGTODO Section 6 - the formulas are wrong.  I also  don't understand
> .. 1.20 NOTHINGTODO The final formula should be:
> .. 1.21 NOTHINGTODO In any event, the point needs to be made that this attack
> .. 1.22 FIXED The value in 6 regardless of what it is is the wrong value for
<<< skipped lots of stuff that's OK >>>
>
>
> 1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    specifically bullet 3 was modified to clarify your concerns, but isn't
>    what was in 5011.  You need to have both here.
>
>    + Result: Intro paragraph added to note we're paraphrasing 5011 at a
>      very high level only.

Nope - not fixed.  You have "This document discusses the following
scenario, which is one of many possible combinations of operations
defined in Section 6 of RFC5011:" followed by "3. Begin to exclusively 
use recently published DNSKEYs to sign the
appropriate resource records."

5011 does not define this as a possible operation.

Instead: "This document discusses the following scenario which is a 
combination of the operations of sections 6.1 and 6.2 but ignores the 
guidance of the first paragraph of section 6 which recommends at least 
two keys per trust point at all times."


>
>
>
>
>
> 1.14 REJECTED Section 4.2 last para:  This is only an attack if the private key
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    is compromised.  In which case, with only a steady state of a single
>    key, you've got lots of other problems.  Basically, in your one in one
>    out with a steady state of one model, once the current private key is
>    compromised you have no ability to fix the problem.  But getting the
>    numbers for figuring out when this change becomes "sticky" correct is
>    useful.
>
>    + Result: I disagree.  The attack is not just about reusing the stale
>      key beyond it's life time.  The attack in this document describes
>      the ability to affect the state of the validator so that it's state
>      doesn't match the desired state of the Trust Anchor Publisher.
>      You're right, that having that state be out of sync isn't useful to
>      an attacker until they can break the key for the trust anchor.  But
>      an attacker performing this "old state" attack means they have years
>      and years to potentially break the key and introduce fake data into
>      the dns zone years potentially years the future.  But in the end,
>      this *is* an attack against the state of the validator, just not of
>      the severity you allude to above.
Strawman alert.

Um... OK.  If you get to this point then you've (the attacker has) got 
to a) identify those resolvers which have the old key, and b) BREAK THE 
KEY and c) figure out how a 40 year old computer is still working and d) 
was NEVER touched by an administrator in all that time.... given that it 
took you 40 years and $100s of Millions of dollars to break the key.

*sheesh*

>
>
> 1.15 REJECTED Section 6 - general comment.   You're doing interval calculations
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    and you want to do date/time wall clock calculations instead.  While
>    the 5011 values are based on intervals from the RRs get to the
>    validator, the publisher has to be looking at absolute times first
>    (e.g. signature inception and expiration, original TTL) and then
>    deltas from those. [Note, this is NOT a new comment - I've made it
>    previously and strongly]
>
>    + Result: You've failed to convince me the text needs to change.
>      Please propose exact wording (or at least an example) that you feel
>      better serves the purpose.  The wall clock and the interval are
>      mutatable between each other.  We are calculating the interval to
>      wait beyond the publication point (waitTime) which is the same as
>      wallclock_now + waitTime.

Here's the deal.  ICANN today signs a whole group of root DNSKEY RRSets 
once every 3 or 6 months (I forget what the interval is for when they 
meet).  I'm not aware of any real security placed on that data once 
signed, but I would venture to guess its not great since its just signed 
data.

What I would do is look at when the current root KSK rollover process 
began and see what the actual expiration dates are for the old signed 
RRSets and then compare it to your calculations.  I know my "Look at the 
last expiration date of any signed RRSet with the old key in it" gives 
me a good fixed point in time to work from.  I know that your "look at 
the signature interval" doesn't without a lot of additional knowledge.

E.g. given the above is your statement in 1.2 unconditionally true? 
Under which circumstances would it not be true?  Could your document be 
interpreted in a way to make 1.2 not true?


>
>    + That being said, I changed the intro text a bit to be a bit more
>      clear about the fact it starts from the publication time.
>
>      + Based on the conversation in Prague, I'll leave it as is but try
>        to clarify a bit about the issue.
>
>
> 1.16 FIXED Section 5.1.1 - You're missing a *very* important point here -
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    that DNSKEY RRSets may be signed ahead of their use.  You need to
>    assume that once signed, they are available to be published - even by
>    an attacker.  So wherever you have "signature lifetime" you want
>    something like "latest signature expiration of any DNSKEY RRSet not
>    containing the new key" or at least you want to calculate the
>    date/time value based on that.
>
>    + Result: There are two issues here:
>      1. When we discuss the exact requirements for publication, we should
>         be very very clear about the timing requirements.  I agree.
>      2. We're trying to pass on the concept of the attack in this
>         section, not necessarily a description that exactly covers all
>         possibilities.  So, though I'm all for making it as accurate as
>         we can, I don't think we should make the example text so
>         confusing to cover all the corner cases that no one can follow
>         it.
>      3. It doesn't benefit an attacker to publish the signatures ahead of
>         time. So you're right that anyone can publish new signatures, it
>         really doesn't affect the timing required by the publisher to
>         wait, which is the point of this draft.

The attacker doesn't publish the signatures - the publisher has 
signatures it won't be using....    the publisher signs stuff way in 
advance of publication because getting people together and getting the 
HSM unlocked to sign things is a big huge expensive production. If the 
publisher doesn't think to modify its signing schedule in advance of a 
5011 action, then your interval calculations are less then useless.

>      4. The important take away I take from your text is that any delay
>         between signing and publication will affect the length of time to
>         wait, and I'm sure this is what you mean by needing to calculate
>         via wall-clock (since everything should be based on
>         sigexpiration).
>
>    XXX: With this goal in mind, I've cleaned up the text a bit to make it
>    a bit more clear.
>
>
> 1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 rollover
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    approach (section 6.3).  E.g. K_old (and any RRSets it signed) will be
>    revoked the first time K_new is seen and K_standby is the signing key.
>    At this point this reduces to a normal denial of service attack (where
>    you prevent new data from being retrieved by the resolver).  You'd
>    need a different section to do that analysis. [And thinking about it,
>    why is there any practical difference between this attack and a normal
>    denial of service attack in the first place?]
>
>    + Result: As we've both agreed in the past, the attack described in
>      our 5.1.1 section only applies when you sign exclusively with a key
>      that is too new.  So, yes when you are using K_standby, the attack
>      in question doesn't work.  We're only describing the case where
>      there either isn't a K_standby, or when K_standby is also newer than
>      our 'waitTime' time.
>
>    + And, yes by preventing a new key from being accepted as a trust
>      anchor, this is a denial of service attack.  Though one with
>      potentially serious ramifications since it may require manual
>      intervention on all the devices affected by it (unlikely a
>      network-based DDoS attack, it doesn't stop when the attacker stops
>      sending packets; this is long lived until the configuration is
>      manually fixed).

Section 5.1.1 does not apply to preventing a resolver from seeing a 
revocation.  The calculations are different.   You could add a new 
section describing the revocation attack, but I think all you need to do 
is note that at the beginning of 5.1.1 and point to section 6.2 as the 
mitigation math.

>
>
> 1.18 FIXED Section 5.1.1, T+35:  "since the hold down time of 30 days + 1/2
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    the signature validity... " - Two items: Wordsmithing: The hold down
>    time is just the 30 days, not the plus 1/2...  which I *think* given
>    the reference to 2.3 is actually the queryInterval. clarify please.
>    And queryInterval is not actually 1/2 the signature validity - its the
>    MIN (15 days, 10 days (sig life)/2 and 1 day(orig ttl)/2) or 1/2 a
>    day.
>
>    + Result: you're misreading the sentence thinking the holddown time
>      includes the math with the + and onward, where the holddown time was
>      only the 30 minutes.  I'll reword and though I was trying to avoid
>      the much more complex math in the example, you're correct that it's
>      1/2 the queryInterval which has more terms to calculate it exactly.
The new language works.

>
>
> 1.19 NOTHINGTODO Section 6 - the formulas are wrong.  I also  don't understand
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    where you got MAX(originalTTL/2,15 days) - there's no support for this
>    in the text.
>
>    You're misreading the commas.  One of the terms in the outer max
>    clause is "MAX(original TTL of K_old DNSKEY RRSet) / 2".  This is
>    slightly different than what is in the "1/2*OrigTTL" clause in 5011
>    itself.  This is because if the publisher changes TTLs over the course
>    of signing, you have to take the maximum value of any of them, not
>    just the most recent.  (though, to be super-accurate you need to do
>    some math which we might want to describe about when a given TTL is
>    published vs when Knew is introduced).
>
>    Anyway, in the end, the formula in ours draft directly derives from
>    what is in yours.  We do take into account the possibility of multiple
>    TTLs for a given signature set, which 5011 doesn't take into account
>    (and to some extent, it's less important, but only further shows how
>    much variance a resolver might have before accepting a new trust
>    anchor).
>
>    A clear piece of advice for an eventual BCP would be to not change
>    TTLs at the same time you start any 5011 publication or revocation
>    process.

Note that in 6.1 you have 5 terms, but in the fully expanded equation in 
6.1.6 you have 4.  You're missing the safetyMargin which you didn't 
actually define completely in section 6.1.5.
>
>
> 1.20 NOTHINGTODO The final formula should be:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    EarliestDateWhereAttackFails::= latest SignatureExpiration of any
>    DNSKEY RRSet not containing the new key
>
>    [ ... alternate math analysis deleted for brevity ... ]
>
>    + Result: there are multiple ways to list out the math / timing behind
>      this.  You do spell out an alternate way to lay out the math; I
>      believe your purpose was not to offer a replacement but to ensure we
>      agree upon the timing.  I do see that you're using wall-clock type
>      math, which certainly works and should be equivalent to your example
>      start tiem of 5/1/2017 00:00UTC + our waitTime.
>
>    + I do see your point that anyone could publish the data once signed,
>      not just the

Then please write this document to assume people are stupid in how they 
do things (e.g. signing a lot of data that then doesn't get used...)
>
>
> 1.21 NOTHINGTODO In any event, the point needs to be made that this attack
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    - while real -
>    is a "retail" attack that would be difficult to prosecute by a single
>    attacker across a broad range of end-entities.  This goes to the
>    general model that no publisher of DNS data knows each and every
>    consumer of that data, nor can wait its publication on every consumer
>    getting published data.  The data protocol for DNS is unidirectional
>    and the update protocol in 5011 was designed with that in mind.
>
>    + Result: We never state in the document that the attack was against
>      every validator out there.  It affects only the validators that
>      received replayed key sets and signatures.  We don't dive into the
>      analysis of how difficult it is to achieve such a replay, either
>      directly or by cache poisoning upstream resolvers, or....  That's
>      outside the scope of the document.  Thus, I don't see any change to
>      the text that is needed as I think the text is pretty clear that a
>      replay attack is already necessary in the attack model.

OK.   Section 5 first para should probably be more clear on this, but I 
can live with it.

Re: [DNSOP] Responding to MSJ review of the previ… Michael StJohns
[DNSOP] Responding to MSJ review of the previous … Wes Hardaker
Re: [DNSOP] Responding to MSJ review of the previ… Wes Hardaker
Re: [DNSOP] Responding to MSJ review of the previ… Michael StJohns
Re: [DNSOP] Responding to MSJ review of the previ… Wes Hardaker
Re: [DNSOP] Responding to MSJ review of the previ… Michael StJohns