[DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations

Wes Hardaker <wjhns1@hardakers.net> Wed, 13 September 2017 17:01 UTC

From: Wes Hardaker <wjhns1@hardakers.net>
To: dnsop@ietf.org
Date: Wed, 13 Sep 2017 10:01:15 -0700
Message-ID: <yblmv5ya5yc.fsf@wu.hardakers.net>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/dXHOzftd7Cd8ofpAIHosHkeYjho>
Subject: [DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations
Precedence: list

Mike, after your lengthy last review I went through and carefully made
sure each of your comments were considered.  Most resulted in changes, a
few seemed to be just comments and there was nothing to do, and two we
didn't think were correct.  Below is the summary of the changes in the
most recent version based on your review from July.

Wes Hardaker


Table of Contents
_________________

1 Notes / Edits from Mike StJohn on <2017-07-06 Thu>
.. 1.1 ACCEPTED Use "exclusively" rather than "only" where you can - "only" has
.. 1.2 ACCEPTED In the introduction  - "validators can update" vs "validators can
.. 1.3 ACCEPTED In the introduction - same issue with "only recently"
.. 1.4 ACCEPTED In the introduction - "ceasing publication of a revoked key" vs
.. 1.5 FIXED Section 1.1 - What is the definition of "rolled"?   Your use of it
.. 1.6 FIXED Section 2 "which apply to" vs "as required from".
.. 1.7 FIXED Section 2 "Note that it does not..." - which "it" are you talking about?
.. 1.8 ACCEPTED Section 2 "using only recently" - same issue as above.
.. 1.9 ACCEPTED Section 2. New para at "Failure of a DNSKEY..."
.. 1.10 ACCEPTED Section 2. "... leaving that key in their trust anchor storage
.. 1.11 ACCEPTED Section 3: Attacker.  "An entity intent..." vs "An attacker
.. 1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
.. 1.13 ACCEPTED Section 4.1, bullet 3:  Same problem with "only".  Instead "Begin
.. 1.14 REJECTED Section 4.2 last para:  This is only an attack if the private key
.. 1.15 REJECTED Section 6 - general comment.   You're doing interval calculations
.. 1.16 FIXED Section 5.1.1 - You're missing a *very* important point here -
.. 1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 rollover
.. 1.18 FIXED Section 5.1.1, T+35:  "since the hold down time of 30 days + 1/2
.. 1.19 NOTHINGTODO Section 6 - the formulas are wrong.  I also  don't understand
.. 1.20 NOTHINGTODO The final formula should be:
.. 1.21 NOTHINGTODO In any event, the point needs to be made that this attack
.. 1.22 FIXED The value in 6 regardless of what it is is the wrong value for


1 Notes / Edits from Mike StJohn on <2017-07-06 Thu>
====================================================

1.1 ACCEPTED Use "exclusively" rather than "only" where you can - "only" has
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  some issues with misunderstanding.  E.g. in the abstract you have
  "only recently" which can be interpreted applying to "recently" rather
  than meaning "exclusively" with " recently added DNSKEYs"


  Instead in the abstract: "... before exclusively signing with recently
  added DNSKEYs"

  + Result: I don't object to the wording change, so I made it.  I don't
    think it was needed as only is pretty concrete in understanding
    everywhere I've seen it.


1.2 ACCEPTED In the introduction  - "validators can update" vs "validators can
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  extend".

  + Result:


1.3 ACCEPTED In the introduction - same issue with "only recently"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  + Result:


1.4 ACCEPTED In the introduction - "ceasing publication of a revoked key" vs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  "removing a key from a zone"

  + Result:


1.5 FIXED Section 1.1 - What is the definition of "rolled"?   Your use of it
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  doesn't match 5011s and it's not defined either here or in 4033 or .
  Also "before signing a zone" - I don't think this is exactly what you
  asked them and the summary is probably inaccurate.

  + Result: rolled -> introduced
  + Result: added "exclusively" to fix the other aspect


1.6 FIXED Section 2 "which apply to" vs "as required from".
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  + Result: to just "from" as the other words are more confusing


1.7 FIXED Section 2 "Note that it does not..." - which "it" are you talking about?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  + Result: "this document"


1.8 ACCEPTED Section 2 "using only recently" - same issue as above.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  + Result:


1.9 ACCEPTED Section 2. New para at "Failure of a DNSKEY..."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  + Result:


1.10 ACCEPTED Section 2. "... leaving that key in their trust anchor storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  beyond the key's expected lifetime." vs "... leaving a key in their
  trust anchor storage beyond their expected lifetime" -"their" applies
  to two different things

  + Result:


1.11 ACCEPTED Section 3: Attacker.  "An entity intent..." vs "An attacker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  intent..." - self referencing definition problem

  + Result:


1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  specifically bullet 3 was modified to clarify your concerns, but isn't
  what was in 5011.  You need to have both here.

  + Result: Intro paragraph added to note we're paraphrasing 5011 at a
    very high level only.


1.13 ACCEPTED Section 4.1, bullet 3:  Same problem with "only".  Instead "Begin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  to exclusively use recently published DNSKEYs..."

  + Result:


1.14 REJECTED Section 4.2 last para:  This is only an attack if the private key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  is compromised.  In which case, with only a steady state of a single
  key, you've got lots of other problems.  Basically, in your one in one
  out with a steady state of one model, once the current private key is
  compromised you have no ability to fix the problem.  But getting the
  numbers for figuring out when this change becomes "sticky" correct is
  useful.

  + Result: I disagree.  The attack is not just about reusing the stale
    key beyond it's life time.  The attack in this document describes
    the ability to affect the state of the validator so that it's state
    doesn't match the desired state of the Trust Anchor Publisher.
    You're right, that having that state be out of sync isn't useful to
    an attacker until they can break the key for the trust anchor.  But
    an attacker performing this "old state" attack means they have years
    and years to potentially break the key and introduce fake data into
    the dns zone years potentially years the future.  But in the end,
    this *is* an attack against the state of the validator, just not of
    the severity you allude to above.


1.15 REJECTED Section 6 - general comment.   You're doing interval calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  and you want to do date/time wall clock calculations instead.  While
  the 5011 values are based on intervals from the RRs get to the
  validator, the publisher has to be looking at absolute times first
  (e.g. signature inception and expiration, original TTL) and then
  deltas from those. [Note, this is NOT a new comment - I've made it
  previously and strongly]

  + Result: You've failed to convince me the text needs to change.
    Please propose exact wording (or at least an example) that you feel
    better serves the purpose.  The wall clock and the interval are
    mutatable between each other.  We are calculating the interval to
    wait beyond the publication point (waitTime) which is the same as
    wallclock_now + waitTime.

  + That being said, I changed the intro text a bit to be a bit more
    clear about the fact it starts from the publication time.

    + Based on the conversation in Prague, I'll leave it as is but try
      to clarify a bit about the issue.


1.16 FIXED Section 5.1.1 - You're missing a *very* important point here -
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  that DNSKEY RRSets may be signed ahead of their use.  You need to
  assume that once signed, they are available to be published - even by
  an attacker.  So wherever you have "signature lifetime" you want
  something like "latest signature expiration of any DNSKEY RRSet not
  containing the new key" or at least you want to calculate the
  date/time value based on that.

  + Result: There are two issues here:
    1. When we discuss the exact requirements for publication, we should
       be very very clear about the timing requirements.  I agree.
    2. We're trying to pass on the concept of the attack in this
       section, not necessarily a description that exactly covers all
       possibilities.  So, though I'm all for making it as accurate as
       we can, I don't think we should make the example text so
       confusing to cover all the corner cases that no one can follow
       it.
    3. It doesn't benefit an attacker to publish the signatures ahead of
       time. So you're right that anyone can publish new signatures, it
       really doesn't affect the timing required by the publisher to
       wait, which is the point of this draft.
    4. The important take away I take from your text is that any delay
       between signing and publication will affect the length of time to
       wait, and I'm sure this is what you mean by needing to calculate
       via wall-clock (since everything should be based on
       sigexpiration).

  XXX: With this goal in mind, I've cleaned up the text a bit to make it
  a bit more clear.


1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 rollover
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  approach (section 6.3).  E.g. K_old (and any RRSets it signed) will be
  revoked the first time K_new is seen and K_standby is the signing key.
  At this point this reduces to a normal denial of service attack (where
  you prevent new data from being retrieved by the resolver).  You'd
  need a different section to do that analysis. [And thinking about it,
  why is there any practical difference between this attack and a normal
  denial of service attack in the first place?]

  + Result: As we've both agreed in the past, the attack described in
    our 5.1.1 section only applies when you sign exclusively with a key
    that is too new.  So, yes when you are using K_standby, the attack
    in question doesn't work.  We're only describing the case where
    there either isn't a K_standby, or when K_standby is also newer than
    our 'waitTime' time.

  + And, yes by preventing a new key from being accepted as a trust
    anchor, this is a denial of service attack.  Though one with
    potentially serious ramifications since it may require manual
    intervention on all the devices affected by it (unlikely a
    network-based DDoS attack, it doesn't stop when the attacker stops
    sending packets; this is long lived until the configuration is
    manually fixed).


1.18 FIXED Section 5.1.1, T+35:  "since the hold down time of 30 days + 1/2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  the signature validity... " - Two items: Wordsmithing: The hold down
  time is just the 30 days, not the plus 1/2...  which I *think* given
  the reference to 2.3 is actually the queryInterval. clarify please.
  And queryInterval is not actually 1/2 the signature validity - its the
  MIN (15 days, 10 days (sig life)/2 and 1 day(orig ttl)/2) or 1/2 a
  day.

  + Result: you're misreading the sentence thinking the holddown time
    includes the math with the + and onward, where the holddown time was
    only the 30 minutes.  I'll reword and though I was trying to avoid
    the much more complex math in the example, you're correct that it's
    1/2 the queryInterval which has more terms to calculate it exactly.


1.19 NOTHINGTODO Section 6 - the formulas are wrong.  I also  don't understand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  where you got MAX(originalTTL/2,15 days) - there's no support for this
  in the text.

  You're misreading the commas.  One of the terms in the outer max
  clause is "MAX(original TTL of K_old DNSKEY RRSet) / 2".  This is
  slightly different than what is in the "1/2*OrigTTL" clause in 5011
  itself.  This is because if the publisher changes TTLs over the course
  of signing, you have to take the maximum value of any of them, not
  just the most recent.  (though, to be super-accurate you need to do
  some math which we might want to describe about when a given TTL is
  published vs when Knew is introduced).

  Anyway, in the end, the formula in ours draft directly derives from
  what is in yours.  We do take into account the possibility of multiple
  TTLs for a given signature set, which 5011 doesn't take into account
  (and to some extent, it's less important, but only further shows how
  much variance a resolver might have before accepting a new trust
  anchor).

  A clear piece of advice for an eventual BCP would be to not change
  TTLs at the same time you start any 5011 publication or revocation
  process.


1.20 NOTHINGTODO The final formula should be:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  EarliestDateWhereAttackFails::= latest SignatureExpiration of any
  DNSKEY RRSet not containing the new key

  [ ... alternate math analysis deleted for brevity ... ]

  + Result: there are multiple ways to list out the math / timing behind
    this.  You do spell out an alternate way to lay out the math; I
    believe your purpose was not to offer a replacement but to ensure we
    agree upon the timing.  I do see that you're using wall-clock type
    math, which certainly works and should be equivalent to your example
    start tiem of 5/1/2017 00:00UTC + our waitTime.

  + I do see your point that anyone could publish the data once signed,
    not just the


1.21 NOTHINGTODO In any event, the point needs to be made that this attack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  - while real -
  is a "retail" attack that would be difficult to prosecute by a single
  attacker across a broad range of end-entities.  This goes to the
  general model that no publisher of DNS data knows each and every
  consumer of that data, nor can wait its publication on every consumer
  getting published data.  The data protocol for DNS is unidirectional
  and the update protocol in 5011 was designed with that in mind.

  + Result: We never state in the document that the attack was against
    every validator out there.  It affects only the validators that
    received replayed key sets and signatures.  We don't dive into the
    analysis of how difficult it is to achieve such a replay, either
    directly or by cache poisoning upstream resolvers, or....  That's
    outside the scope of the document.  Thus, I don't see any change to
    the text that is needed as I think the text is pretty clear that a
    replay attack is already necessary in the attack model.


1.22 FIXED The value in 6 regardless of what it is is the wrong value for
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  revocation.  revocationPublicationWaitTime is basically
  EarliestDateAttackFails + queryInterval + slop.  Revocations take
  place immediately. You can delay them only as long as you have old
  valid signed RRSets.

  + Result: wording changed to addWaitTime and remWaitTime, and math
    adjusted to drop the 30 day part of the timer.  Thanks.


-- 
Wes Hardaker
USC/ISI

Re: [DNSOP] Responding to MSJ review of the previ… Michael StJohns
[DNSOP] Responding to MSJ review of the previous … Wes Hardaker
Re: [DNSOP] Responding to MSJ review of the previ… Wes Hardaker
Re: [DNSOP] Responding to MSJ review of the previ… Michael StJohns
Re: [DNSOP] Responding to MSJ review of the previ… Wes Hardaker
Re: [DNSOP] Responding to MSJ review of the previ… Michael StJohns