Re: [DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations

Michael StJohns <msj@nthpermutation.com> Wed, 18 October 2017 13:58 UTC

To: Wes Hardaker <wjhns1@hardakers.net>
Cc: dnsop@ietf.org
References: <yblmv5ya5yc.fsf@wu.hardakers.net> <840723c3-899a-f57b-caa1-48f14b3686e8@nthpermutation.com> <yblpo9mepz0.fsf@wu.hardakers.net>
From: Michael StJohns <msj@nthpermutation.com>
Message-ID: <52a489db-11fe-4142-2135-c363f171dc47@nthpermutation.com>
Date: Wed, 18 Oct 2017 09:58:18 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/1nUrmIWSoD_UA6lYxjHskEr60uo>

Top posting because of the "last call" note.

Still not ready.

You said in 4.1:

  which is the principal
     way RFC5011 is currently being used (even though Section 6 of
     RFC5011 suggests having a stand-by key available)


And then in T-1 you say:

    Note that for simplicity we assume the signer is not pre-signing and
     creating "valid in the future" signature sets that may be stolen and
     replayed even later.


But "the way RFC5011 is currently being used" is with "pre-signing"
and "valid in the future" signature sets.  So you want to have your
cake and eat it too.  You're now dealing neither with the way 5011
said you should do things nor with the way the root actually does
things.


All of this can be fixed by moving to a wall-clock model, which is how
the publisher works, instead of the interval model, which is only
appropriate for the client.

You continue to have a problem with "sigExpirationTime".
> The amount of time remaining before any existing
>        RRSIG's Signature Expiration time is reached.  Note that for
>        organizations pre-creating signatures this time may be fairly
>        lengthy unless they can be significantly assured their signatures
>        can not be replayed at a later date.


The problem is: amount of time measured from when?  (And it should at
least be "latest RRSIG's Signature Expiration time is reached".)

> sigExpirationTime will
>        fundamentally be the RRSIG's Signature Expiration time minus the
>        RRSIG's Signature Inception time when the signature is created.
This is no longer fundamentally the difference between one RRSIG's
inception and expiration times.  You can't even describe it as the
difference between the earliest inception and the latest expiration,
because that changes as the earlier RRSIGs expire.  The only fixed (and
easiest) value is the "latest expiration date".  You could say "latest
expiration date - now", which then gets added to "now" to get back to
the latest expiration date...
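To make that concrete, a rough Python sketch with three made-up, overlapping presigned RRSIG validity windows (the `Rrsig` class and all dates are illustrative assumptions, not taken from the draft or from the root's actual schedule). Each RRSIG's own lifetime is 21 days, the earliest-inception-to-latest-expiration span shrinks as older RRSIGs lapse, and only the latest expiration stays put:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Rrsig:
    """Illustrative stand-in for one RRSIG's validity window."""
    inception: datetime
    expiration: datetime


# Hypothetical presigned RRSIGs over DNSKEY RRsets, staggered by 10 days.
SIGS = [
    Rrsig(datetime(2017, 10, 1), datetime(2017, 10, 22)),
    Rrsig(datetime(2017, 10, 11), datetime(2017, 11, 1)),
    Rrsig(datetime(2017, 10, 21), datetime(2017, 11, 11)),
]


def lifetime(sig: Rrsig) -> timedelta:
    """Expiration minus inception for a single RRSIG."""
    return sig.expiration - sig.inception


def live_span(sigs, now: datetime) -> timedelta:
    """Earliest inception to latest expiration among RRSIGs still valid at `now`."""
    live = [s for s in sigs if s.expiration > now]
    return max(s.expiration for s in live) - min(s.inception for s in live)


def latest_expiration(sigs) -> datetime:
    """The only one of these values that does not change as time passes."""
    return max(s.expiration for s in sigs)


now = datetime(2017, 10, 15)
print(live_span(SIGS, now))                     # 41 days, 0:00:00
print(live_span(SIGS, datetime(2017, 10, 25)))  # 31 days, 0:00:00
print(now + lifetime(SIGS[0]))                  # 2017-11-05 00:00:00 (too early)
print(now + (latest_expiration(SIGS) - now))    # 2017-11-11 00:00:00
```

Measuring one RRSIG's lifetime from "now" lands six days short of the point at which no presigned RRSIG can be replayed, while the "latest expiration - now" form just round-trips to the same fixed date.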

Please, please, please just move to a wall-clock value based on the
latest expiration date plus the appropriate intervals.  All the minor
twiddles you've made to avoid doing this have made the document less
clear.
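As a rough illustration of what a wall-clock formulation could look like, here is a Python sketch. The decomposition of "appropriate intervals" used here (one activeRefresh for a resolver to first see K_new once replays stop, RFC 5011's 30-day minimum add hold-down, one more activeRefresh for the resolver to accept K_new after the hold-down ends, plus an optional safety margin) is an assumption made for illustration, not the draft's formula, and the function name is hypothetical:

```python
from datetime import datetime, timedelta

# RFC 5011's minimum add hold-down time (the RFC also lets the original
# TTL's expiry extend it; that refinement is ignored here).
ADD_HOLD_DOWN = timedelta(days=30)


def earliest_exclusive_use(latest_expiration: datetime,
                           active_refresh: timedelta,
                           safety_margin: timedelta = timedelta(0)) -> datetime:
    """Hypothetical wall-clock date after which signing only with K_new is safe.

    Assumed decomposition (illustrative only):
      1. until latest_expiration, presigned RRsets lacking K_new can be
         replayed, so a resolver may not see K_new at all;
      2. within one active_refresh after that, it sees K_new and starts
         its add hold-down timer;
      3. once the hold-down ends, it accepts K_new at its next query,
         at most one more active_refresh later.
    """
    return (latest_expiration + active_refresh + ADD_HOLD_DOWN
            + active_refresh + safety_margin)


print(earliest_exclusive_use(datetime(2017, 11, 11), timedelta(days=1)))
# 2017-12-13 00:00:00
```

Every input is either a fixed date or a fixed interval, so the result is a calendar date the publisher can put on its signing schedule, rather than an interval whose starting point has to be re-derived each time.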


NOT READY FOR LAST CALL.

More below.




On 10/16/2017 5:34 PM, Wes Hardaker wrote:
>
> Mike,
>
> Here are some responses to your comments from last time out.  I'm
> only including the ones that needed a response or had an actionable
> item.
>
>
> 1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    specifically bullet 3 was modified to clarify your concerns, but isn't
>    what was in 5011.  You need to have both here.
>
>    + Result: Intro paragraph added to note we're paraphrasing 5011 at a
>      very high level only.
>
>    + MSJ responds:
>
>      Nope - not fixed.  You have "This document discusses the following
>      scenario, which is one of many possible combinations of operations
>      defined in Section 6 of RFC5011:" followed by "3. Begin to
>      exclusively use recently published DNSKEYs to sign the appropriate
>      resource records."
>
>      5011 does not define this as a possible operation.
>
>      Instead: "This document discusses the following scenario which is a
>      combination of the operations of sections 6.1 and 6.2 but ignores
>      the guidance of the first paragraph of section 6 which recommends at
>      least two keys per trust point at all times."
>
>    + Result: changed the sentence in question to:
>
>      This document discusses the following scenario, which is the principal
>      way RFC5011 is currently being used (even though Section 6 of
>      RFC5011 suggests having a stand-by key available)

OK.  Not perfect but works.
>
>
> 1.16 FIXED2 Section 5.1.1 - You're missing a *very* important point here -
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    that DNSKEY RRSets may be signed ahead of their use.  You need to
>    assume that once signed, they are available to be published - even by
>    an attacker.  So wherever you have "signature lifetime" you want
>    something like "latest signature expiration of any DNSKEY RRSet not
>    containing the new key" or at least you want to calculate the
>    date/time value based on that.
>
>    + Result: There are two issues here:
>      1. When we discuss the exact requirements for publication, we should
>         be very very clear about the timing requirements.  I agree.
>      2. We're trying to pass on the concept of the attack in this
>         section, not necessarily a description that exactly covers all
>         possibilities.  So, though I'm all for making it as accurate as
>         we can, I don't think we should make the example text so
>         confusing, in trying to cover all the corner cases, that no one
>         can follow it.

No.  You floated this document as describing an attack on 5011 updates.
Your comments have been that we should concentrate on how 5011 is
actually used by the root.  So you need an example that actually meshes
with how the root does things.  The root presigns data, and the example
should deal with that.
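A Python sketch of the quantity suggested earlier in this thread, the latest signature expiration of any DNSKEY RRset not containing the new key. The `SignedDnskeySet` record and identifying keys by key tag are simplifying assumptions (key tags can collide, so real code would compare the keys themselves); 19036 and 20326 are the root's KSK-2010 and KSK-2017 tags, used here only as labels:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SignedDnskeySet:
    """Illustrative record of one presigned DNSKEY RRset and its RRSIG expiry."""
    key_tags: FrozenSet[int]
    expiration: datetime


def replay_deadline(signed_sets: Iterable[SignedDnskeySet],
                    new_key_tag: int) -> datetime:
    """Latest RRSIG expiration among presigned DNSKEY RRsets lacking the new key.

    Once signed, each of these sets can be served by anyone holding a copy
    until its RRSIG expires, whether or not the publisher ever publishes it.
    """
    candidates = [s.expiration for s in signed_sets
                  if new_key_tag not in s.key_tags]
    if not candidates:
        raise ValueError("every presigned set already contains the new key")
    return max(candidates)


SETS = [
    SignedDnskeySet(frozenset({19036}), datetime(2017, 10, 22)),
    # Presigned for a schedule that was later changed; never published.
    SignedDnskeySet(frozenset({19036}), datetime(2017, 11, 1)),
    SignedDnskeySet(frozenset({19036, 20326}), datetime(2017, 11, 11)),
]
print(replay_deadline(SETS, 20326))  # 2017-11-01 00:00:00
```

The unpublished set still moves the deadline, which is exactly why presigning has to be in the example.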

>      3. It doesn't benefit an attacker to publish the signatures ahead of
>         time.  So, while you're right that anyone can publish new
>         signatures, it really doesn't affect how long the publisher is
>         required to wait, which is the point of this draft.
>      4. The important takeaway from your text is that any delay
>         between signing and publication will affect the length of time to
>         wait, and I'm sure this is what you mean by needing to calculate
>         via wall-clock (since everything should be based on
>         sigexpiration).
>
>    With this goal in mind, I've cleaned up the text a bit to make it
>    a bit more clear.
>
>    + MSJ Responds about point 3:
>
>      The attacker doesn't publish the signatures - the publisher has
>      signatures it won't be using.  The publisher signs stuff way in
>      advance of publication because getting people together and getting
>      the HSM unlocked to sign things is a big, hugely expensive
>      production.  If the publisher doesn't think to modify its signing
>      schedule in advance of a 5011 action, then your interval
>      calculations are less than useless.
>
>    + Fair point, thanks for clarifying.  I've added the following text to
>      5.1.1:
>
>      Note that for simplicity we assume the signer is not pre-signing and
>      creating "valid in the future" signature sets that may be stolen and
>      replayed even later.
>
>      I've also changed the terminology of sigExpirationTime:
>
>      sigExpirationTime: The amount of time remaining before any existing
>      RRSIG's Signature Expiration time is reached.  Note that for
>      organizations pre-creating signatures this time may be fairly
>      lengthy unless they can be significantly assured their signatures
>      can not be replayed at a later date.  sigExpirationTime will
>      fundamentally be the RRSIG's Signature Expiration time minus the
>      RRSIG's Signature Inception time when the signature is created.

See my top posting.
>
>
> 1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 rollover
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    approach (section 6.3).  E.g. K_old (and any RRSets it signed) will be
>    revoked the first time K_new is seen and K_standby is the signing key.
>    At this point this reduces to a normal denial of service attack (where
>    you prevent new data from being retrieved by the resolver).  You'd
>    need a different section to do that analysis. [And thinking about it,
>    why is there any practical difference between this attack and a normal
>    denial of service attack in the first place?]
>
>    + Result: As we've both agreed in the past, the attack described in
>      our 5.1.1 section only applies when you sign exclusively with a key
>      that is too new.  So, yes when you are using K_standby, the attack
>      in question doesn't work.  We're only describing the case where
>      there either isn't a K_standby, or when K_standby is also newer than
>      our 'waitTime' time.
>
>    + And yes, by preventing a new key from being accepted as a trust
>      anchor, this is a denial of service attack, though one with
>      potentially serious ramifications, since it may require manual
>      intervention on all the devices affected by it (unlike a
>      network-based DDoS attack, it doesn't stop when the attacker stops
>      sending packets; it persists until the configuration is
>      manually fixed).
>
>    + MSJ responds:
>
>      Section 5.1.1 does not apply to preventing a resolver from seeing a
>      revocation.  The calculations are different.   You could add a new
>      section describing the revocation attack, but I think all you need
>      to do is note that at the beginning of 5.1.1 and point to section
>      6.2 as the mitigation math.
>
>    + Result: 5.1.1 does not now say that it applies to revocation and
>      specifically discusses "The timing schedule listed below is based on
>      a Trust Anchor Publisher publishing a new Key Signing Key (KSK),
>      with the intent that it will later become a trust anchor."
OK
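For reference, the applicability condition summarized above (signing exclusively with a key younger than waitTime, with either no K_standby or a K_standby that is also too new) fits in a small predicate. This is a Python sketch; the function and parameter names are hypothetical, and wait_time stands in for whatever the draft's formula produces:

```python
from datetime import datetime, timedelta
from typing import Optional


def addition_attack_applies(signing_key_published: datetime,
                            standby_published: Optional[datetime],
                            now: datetime,
                            wait_time: timedelta) -> bool:
    """True if the key-addition scenario discussed for 5.1.1 applies at `now`.

    The attack needs the publisher to sign exclusively with a key published
    less than wait_time ago, with no K_standby old enough to fall back on.
    Revocation is out of scope, as agreed above.
    """
    def too_new(published: datetime) -> bool:
        return now - published < wait_time

    if not too_new(signing_key_published):
        return False
    return standby_published is None or too_new(standby_published)
```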
>
>
> 1.19 FIXED Section 6 - the formulas are wrong.  I also  don't understand
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>    where you got MAX(originalTTL/2,15 days) - there's no support for this
>    in the text.
>
>    You're misreading the commas.  One of the terms in the outer max
>    clause is "MAX(original TTL of K_old DNSKEY RRSet) / 2".  This is
>    slightly different than what is in the "1/2*OrigTTL" clause in 5011
>    itself.  This is because if the publisher changes TTLs over the course
>    of signing, you have to take the maximum value of any of them, not
>    just the most recent.  (Though, to be super-accurate, you need to do
>    some math, which we might want to describe, about when a given TTL
>    is published vs. when K_new is introduced.)
>
>    Anyway, in the end, the formula in our draft directly derives from
>    what is in yours.  We do take into account the possibility of multiple
>    TTLs for a given signature set, which 5011 doesn't take into account
>    (and to some extent, it's less important, but only further shows how
>    much variance a resolver might have before accepting a new trust
>    anchor).
>
>    A clear piece of advice for an eventual BCP would be to not change
>    TTLs at the same time you start any 5011 publication or revocation
>    process.
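A Python sketch of taking the maximum over every original TTL the publisher has used, plugged into the RFC 5011 query interval as read from its Section 2.3 (no less often than the lesser of 15 days, half the original TTL, or half the RRSIG expiration interval, and no more often than once per hour). The function and parameter names are hypothetical:

```python
from datetime import timedelta
from typing import Iterable


def active_refresh(orig_ttls: Iterable[timedelta],
                   sig_expiration_interval: timedelta) -> timedelta:
    """RFC 5011-style query interval, using the largest original TTL seen.

    Taking max() over every TTL published during the process, instead of
    only the most recent one, covers resolvers that computed their interval
    from an earlier, larger TTL.
    """
    max_ttl = max(orig_ttls)
    return max(timedelta(hours=1),
               min(timedelta(days=15), max_ttl / 2,
                   sig_expiration_interval / 2))


# Publisher lowered the DNSKEY TTL from 2 days to 6 hours mid-process.
print(active_refresh([timedelta(days=2), timedelta(hours=6)],
                     timedelta(days=21)))  # 1 day, 0:00:00
print(active_refresh([timedelta(hours=6)],
                     timedelta(days=21)))  # 3:00:00
```

The eightfold difference between those two results is one concrete reason not to change TTLs at the start of a 5011 process.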
>
>    + MSJ responds:
>
>    Note that in 6.1 you have 5 terms, but in the fully expanded equation
>    in 6.1.6 you have 4.  You're missing the safetyMargin which you didn't
>    actually define completely in section 6.1.5.
>
>    + Result: I think you mean activeRefreshOffset, as safetyMargin was defined
>      (though we had to change it again due to the possibility of
>      extremely short TTLs).  But I have added the expansion of
>      activeRefreshOffset to the equation; thanks for catching that.  I
>      haven't changed the definition since I don't see any missing pieces
>      to it (or to the safetyMargin definition).
>
You've got a new term, "(addHoldDownTime % activeRefresh)", for which
you don't appear to have any text covering the "why".  It's not
"safetyMargin", since you expanded the last term to match that
definition.  Is it activeRefreshOffset?  It doesn't match the text at
the beginning of 6.1.6, so I'm confused.
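For what it's worth, here is the arithmetic of that term for a few activeRefresh values, next to one candidate meaning: the wait from the end of the 30-day hold-down to the resolver's next query, assuming (only for this illustration) that queries land on exact multiples of activeRefresh after K_new is first seen. Python, with hypothetical function names:

```python
from datetime import timedelta

ADD_HOLD_DOWN = timedelta(days=30)  # RFC 5011 minimum add hold-down


def modulo_term(active_refresh: timedelta) -> timedelta:
    """The (addHoldDownTime % activeRefresh) term as it appears in the formula."""
    return ADD_HOLD_DOWN % active_refresh


def gap_to_next_query(active_refresh: timedelta) -> timedelta:
    """Time from hold-down expiry to the next query, ASSUMING queries occur
    at exact multiples of active_refresh after K_new is first seen."""
    return (active_refresh - ADD_HOLD_DOWN % active_refresh) % active_refresh


for days in (3, 4, 7):
    refresh = timedelta(days=days)
    print(days, modulo_term(refresh).days, gap_to_next_query(refresh).days)
# 3 0 0
# 4 2 2
# 7 2 5
```

The two agree for some activeRefresh values and not others, so the text needs to say which quantity the term is meant to capture.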


Mike
