[dnsop] Re: comments on dnsop draft

Edward Lewis <Ed.Lewis@neustar.biz> Wed, 08 June 2005 18:22 UTC

Message-Id: <a06200701becc9faab7e9@[192.168.1.101]>
In-Reply-To: <20050608124259.221219ee.olaf@ripe.net>
References: <a0620070bbeca61cb60a0@[10.31.32.108]> <20050608124259.221219ee.olaf@ripe.net>
Date: Wed, 08 Jun 2005 13:03:57 -0400
To: "Olaf M. Kolkman" <olaf@ripe.net>
From: Edward Lewis <Ed.Lewis@neustar.biz>
Subject: [dnsop] Re: comments on dnsop draft
Cc: Edward Lewis <Ed.Lewis@neustar.biz>, miek@nlnetlabs.nl, OKolkman@ripe.net, dnsop@lists.uoregon.edu
Reply-To: Edward Lewis <Ed.Lewis@neustar.biz>

At 12:42 +0200 6/8/05, Olaf M. Kolkman wrote:

>>
>>  #1.2  Time Definitions
>>
>>  #   o  "Key effectivity period"
>>  #         The period which a key pair is expected to be effective.  This
>>  #         period is defined as the time between the first inception time
>>  #         stamp and the last expiration date of any signature made with
>>  #         this key.
>>  #         The key effectivity period can span multiple signature validity
>>  #         periods.
>>
>>  Can it be discontinuous?  I.e., only on Tuesdays and Thursdays in May?
>
>I do not know what the "formal definition" is.
>
>But I take it to be that even when signatures made with this key
>appear with a sig-inception Tuesday 00:00 and sig-expiration Tuesday
>23:59 and signatures with sig-inception Thursday 00:00 and
>sig-expiration Thursday 23:59, the Key effectivity period would be
>Tuesday 00:00 to Thursday 23:59.
>
>If there is a better way of phrasing the definition as it appears now,
>I will need text.

Formal or not, I think you want the effectivity period to be from 
first signing to last (useful) time, regardless of discontinuities. 
It's just simpler.
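A minimal Python sketch of that reading, assuming the signatures made with a key are available as (inception, expiration) pairs; the function name is hypothetical and the dates simply reuse Olaf's Tuesday/Thursday example:

```python
from datetime import datetime


def key_effectivity_period(signatures):
    """Return (start, end) of a key's effectivity period.

    `signatures` holds (inception, expiration) pairs for every RRSIG made
    with the key. The period runs from the earliest inception to the latest
    expiration; gaps between signature validity periods are ignored.
    """
    sigs = list(signatures)
    if not sigs:
        raise ValueError("key has not been used to sign anything")
    start = min(inception for inception, _ in sigs)
    end = max(expiration for _, expiration in sigs)
    return start, end


# Signatures valid only on Tuesday 7 June and Thursday 9 June 2005.
tuesday = (datetime(2005, 6, 7, 0, 0), datetime(2005, 6, 7, 23, 59))
thursday = (datetime(2005, 6, 9, 0, 0), datetime(2005, 6, 9, 23, 59))
print(key_effectivity_period([tuesday, thursday]))
```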

Of course, if you are likely to have keys that are in use in a 
discontinuous manner, you are probably not a conventional 
administrator.

>----------------------------------------------------------------------
>
>
>>  #   o  "Maximum/Minimum Zone TTL"
>>  #         The maximum or minimum value of the TTLs from the complete set
>>  #         of RRs in a zone.
>>
>>  This has nothing to do with the last number in the SOA RR, right?  Right?
>>
>
>Correct: you take all the TTL values in the zone and take the minimum
>and the maximum of that set. Is the text that is there now ambiguous?

The problem is this text from 1035:

#RFC 1035        Domain Implementation and Specification    November 1987
#
#MINIMUM         The unsigned 32 bit minimum TTL field that should be
#                exported with any RR from this zone.

Maybe you should state explicitly that the Minimum Zone TTL is
unrelated to, and may be less than, the MINIMUM field in the SOA RR.
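To illustrate the distinction, a small Python sketch over a hypothetical zone (records assumed to be (owner, TTL, type, RDATA) tuples): the Minimum Zone TTL comes from the TTLs actually present, while the SOA MINIMUM field is just the last number in the SOA RDATA and can be larger:

```python
def zone_ttl_bounds(records):
    """Return (minimum, maximum) TTL over every RR in the zone.

    The SOA MINIMUM field inside the SOA RDATA is deliberately not
    consulted: the Minimum Zone TTL is a property of the TTLs in use.
    """
    ttls = [ttl for _, ttl, _, _ in records]
    return min(ttls), max(ttls)


zone = [
    # SOA RDATA: mname rname serial refresh retry expire minimum
    ("example.", 86400, "SOA",
     "ns.example. host.example. 1 3600 900 1209600 86400"),
    ("example.", 86400, "NS", "ns.example."),
    ("www.example.", 300, "A", "192.0.2.1"),
]
low, high = zone_ttl_bounds(zone)
soa_minimum = int(zone[0][3].split()[-1])
print(low, high, soa_minimum)  # 300 86400 86400
```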

>----------------------------------------------------------------------
>
>
>>  #2.  Keeping the Chain of Trust Intact
>>
>>  #   For the verifying clients it is important that data from secured
>>  #   zones can be used to build chains of trust regardless of whether the
>>  #   data came directly from an authoritative server, a caching nameserver
>>  #   or some middle box.  Only by carefully using the available timing
>>  #   parameters can a zone administrator assure that the data necessary
>>  #   for verification can be obtained.
>>
>>  I don't think that the last sentence is right.  If an admin is unaware
>>  of the timing parameters, data will only be delayed.
>
>Let me try to rephrase that last sentence.
>
>Only by carefully timing their actions can zone administrators
>assure that the data necessary for verification can be obtained by
>validating clients.

I think that the sentence is too ambiguous to be useful.  "Carefully 
timing" sounds like there is much to be fearful of, without defining 
the concerns.

How about...

DNS has inherent delays in the propagation of data, e.g., in the
transfer from master to slave servers and in caches timing out old
copies before retrieving fresh authoritative data.  Because of this,
administrators need to prepare for changes in advance and sequence
actions appropriately to achieve smooth transitions.  Preparations
include adjusting the TTLs of records and the RRSIG expirations.
Sequencing has to account for the worst-case scenarios of data lying
in compliant caches.

...that should be a high level description of the problem.

>----------------------------------------------------------------------
>
>>  It might be good to note that
>>  the time from master to slave is negligible when using NOTIFY and IXFR,
>>  increasing by reliance on AXFR, and more if you rely on the SOA timing
>>  parameters for zone refresh.  (Non-standard means of zone transfers have
>>  other timing concerns.)  When it comes to freshness of data within caches,
>>  the TTL is the only pertinent parameter, with a shorter setting increasing
>>  freshness at the cost of fewer cache "hits."
>
>I propose to modify the 3rd paragraph of section 2 by appending your
>text:
>
>    Administrators of secured zones will have to keep in mind that data
>    published on an authoritative primary server will not be
>    immediately seen by verifying clients; it may take some time for
>    the data to be transferred to other secondary authoritative
>    nameservers and clients may be fetching data from caching
>    non-authoritative servers. In this light ist is good to note that
>    the time from master to slave is negligible when using NOTIFY and
>    IXFR, increasing by reliance on AXFR, and more if you rely on the
>    SOA timing parameters for zone refresh.

s/ist/it/ - in there somewhere.  Any other comments?

>----------------------------------------------------------------------
>
>>
>>  I don't know what detail you want to include, but you should mention the
>>  sliding scale of performance between master and slave, and a note on what
>>  parameter(s) affect cache performance.
>
>Is the detail in the above paragraph sufficient? If not could you help
>us out with suggestions for text?

Probably enough.  I'd be curious to hear from others on this too. 
Maybe I'm glossing over something.

>----------------------------------------------------------------------
>
>>
>>  #3.1  Zone and Key Signing Keys
>>  #
>>  #   The DNSSEC validation protocol does not distinguish between DNSKEYs.
>>  #   All DNSKEYs can be used during the validation.  In practice operators
>>
>>  Is that true?  Is that because DNSKEYs must have the zone key bit set?
>>  I forget how we resolved the TKEY issue in the type code roll.
>
>What about:
>
>  The DNSSEC validation protocol does not distinguish between DNSKEYs
>  with the SEP flag set or cleared. Both DNSKEYs can be used during the
>  validation. ...

Probably the original text is better.  There is no reason to mention
the SEP flag here; I had forgotten that we kept the old KEY RR for TKEY stuff.

>----------------------------------------------------------------------
>
>>
>>  #3.3  Key Effectivity Period
>>
>>  #   For Key Signing Keys a reasonable key effectivity period is 13
>>  #   months, with the intent to replace them after 12 months.  An intended
>>  #   key effectivity period of a month is reasonable for Zone Signing
>>  #   Keys.
>>
>>  Shouldn't this be linked to some minimum size of the key?
>>
>
>
>What about prepending a line to the above paragraph and introducing a
>new one-sentence paragraph directly after it:
>
>+  From a purely operational perspective a reasonable key effectivity
>+  period for Key Signing Keys is 13 months, with the intent to
>    replace them after 12 months.  An intended key effectivity period
>    of a month is reasonable for Zone Signing Keys.
>
>+  For a key size that matches these effectivity periods, see Section 3.5.
>
>    Using these recommendations will lead to rollovers occurring
>    frequently enough to become part of 'operational habits'; the
>    procedure does not have to be reinvented every time a key is
>    replaced.
>
>    Key effectivity periods can be made very short, on the order of a
>    few minutes.  But when replacing keys one has to take the
>    considerations from Section 4.1 and Section 4.2 into account.

I have two answers to this.

One is that I think it is good to recommend an interval between
changes and to say that the key size should be chosen to fit that
interval.  (As opposed to trying to work out how long a key of a
certain size will last.)  Operations can deal with the calendar better
than with less tangible "events" (such as the time until a key is
exposed or guessed).

The other answer is that I have heard suggestions that the KSK ought 
to be longer lived - like 3 or so years.  For the root, because of 
the pain of putting it into anchor positions, even longer.  This is 
counter to "keep it regular so you get used to it" but it has appeal 
to non-operations people.

I think it would be wise to recommend timing because there aren't a 
lot of clear statements on this important issue.  But perhaps you 
need to recommend different time scales for different kinds of zone 
administrations.
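As a sketch of "dealing with the calendar", the Python below turns the draft's operational numbers into dates. The helper names and start date are hypothetical, and using the one-month ZSK effectivity period as its replacement interval is an assumption; a longer-lived KSK, as some have suggested, is just a larger interval.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift `d` by whole calendar months, clamping the day if needed."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def rollover_schedule(first_use, replace_every, effectivity, count):
    """(introduced, planned replacement, effectivity end) per key generation."""
    schedule = []
    start = first_use
    for _ in range(count):
        schedule.append((start,
                         add_months(start, replace_every),
                         add_months(start, effectivity)))
        start = add_months(start, replace_every)
    return schedule


# KSK: replaced after 12 months within a 13-month effectivity period.
ksk = rollover_schedule(date(2005, 7, 1), 12, 13, 3)
# ZSK: one-month effectivity, assumed to be replaced monthly as well.
zsk = rollover_schedule(date(2005, 7, 1), 1, 1, 3)
for generation in ksk:
    print(generation)
```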

Others?

>----------------------------------------------------------------------
>
>>
>>  #3.6  Private Key Storage
>>  #
>>  #   It is recommended that, where possible, zone private keys and the
>>  #   zone file master copy be kept and used in off-line, non-network
>>  #   connected, physically secure machines only.  Periodically an
>>  #   application can be run to add authentication to a zone by adding
>>  #   RRSIG and NSEC RRs.  Then the augmented file can be transferred,
>>
>>  The problem with this recommendation is that a lot of the upper (sensitive)
>>  zones have a "response time" pressure that pushes them to use dynamic
>>  update.  Currently, dynamic update (tools) don't allow the inclusion of
>>  signatures; this might have to be fixed.  Left in limbo is the
>>  recommendation to keep keys entirely off-line.
>
>I understand your observation but I find it difficult to turn this
>into document text. Do you have a suggestion?

Maybe...

When relying on dynamic update to manage a signed zone, be aware that
at least one of the zone's private keys will have to reside on the
master server.  This key is only as secure as the host, and its
exposure grows with the server's exposure to unknown clients.
Although not mandatory, administering DNS in this manner benefits from
keeping the master unreachable from the Internet and out of the NS
RRSet, an approach known as a "hidden master."

(I don't know if we've ever defined "hidden master.")

>>  #   perhaps by sneaker-net, to the networked zone primary server machine.
>>
>>  It's ironic that this basic tenet of the DNSSEC world is somewhat out of
>>  whack with what are labeled as the most sensitive zones.
>
>The irony is not intentional. How can we improve?

I think it's a case of time passing the technology by.  When we 
started this process registries ran much slower.  We've added dynamic 
update, IXFR, etc., and there has been a demand for immediate 
gratification when registering domain names.

The only way around this was to have DNSSECbis happen years ago. ;)

>----------------------------------------------------------------------
>
>
>>  #4.1.1  Time Considerations
>>  #
>>
>>  #   o  We suggest the signature publication period to be at least one
>>  #      maximum TTL smaller than the signature validity period.
>>  #         Resigning a zone shortly before the end of the signature
>>  #         validity period may cause simultaneous expiration of data from
>>  #         caches.  This in turn may lead to peaks in the load on
>>  #         authoritative servers.
>>
>>  This is confusing.
>>
>>  Are you suggesting that the publication period of a signature end at least
>>  one maximum TTL duration before the end of the signature's validity period?
>
>Yes, suggested rephrase:
>
>  o   We suggest the publication period of a signature end at least one
>      maximum TTL duration before the end of the signature's validity
>      period.
>
>          Resigning a zone shortly before the end of the signature
>          validity period may cause simultaneous expiration of data
>          from caches.  This in turn may lead to peaks in the load on
>          authoritative servers.
>
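A minimal Python sketch of the rephrased rule, with hypothetical function names: the latest moment a signature may still be published is its expiration minus one Maximum Zone TTL, so a copy cached just before it is withdrawn times out of caches no later than the signature itself expires.

```python
from datetime import datetime, timedelta


def latest_publication_end(sig_expiration, max_zone_ttl):
    """Latest time a signature may still be published (served)."""
    return sig_expiration - max_zone_ttl


def publication_ok(publication_end, sig_expiration, max_zone_ttl):
    """True if publication ends at least one Maximum Zone TTL before expiry."""
    return publication_end <= latest_publication_end(sig_expiration, max_zone_ttl)


expiration = datetime(2005, 7, 1)
max_ttl = timedelta(days=1)
print(latest_publication_end(expiration, max_ttl))  # 2005-06-30 00:00:00
```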
>----------------------------------------------------------------------
>
>
>>
>>  #   o  We suggest the minimum zone TTL to be long enough to both fetch
>>  #      and verify all the RRs in the authentication chain.  A low TTL
>>  #      could cause two problems:
>>  #         1.  During validation, some data may expire before the
>>  #         validation is complete.  The validator should be able to keep
>>  #         all data, until it is completed.  This applies to all RRs needed
>>  #         to complete the chain of trust: DSs, DNSKEYs, RRSIGs, and the
>>  #         final answers i.e. the RR set that is returned for the initial
>>  #         query.
>>  #         2.  Frequent verification causes load on recursive nameservers.
>>  #         Data at delegation points, DSs, DNSKEYs and RRSIGs benefit from
>>  #         caching.  The TTL on those should be relatively long.
>>
>>  A low TTL has been demonstrated in workshops to be detrimental.  (Not a
>>  "could.")  Even in a close-in workshop, TTLs of under 5 or 10 minutes
>>  disrupted operations.  In the wide Internet, the floor of the TTL will
>>  have to be much higher.
>
>Do we have a reference to the minutes of these workshops? If not I
>propose to start the above paragraph with:
>
>
>   o We suggest the minimum zone TTL to be long enough to both fetch
>     and verify all the RRs in the authentication chain.  In workshop
>     environments it has been demonstrated [E.Lewis: private
>     communication] that a low TTL (under 5 to 10 minutes) caused
>     disruptions because of the following two problems:

We should locate notes...or use the WG list as reference.  (Usually 
docs don't explicitly reference the list of a WG, they indicate that 
the doc is a product of a WG.)

>----------------------------------------------------------------------
>
>>
>>  #         When a slave server is out of sync with its master and data in
>>  #         a zone is signed by expired signatures it may be better for the
>>  #         slave server not to give out any answer.
>>
>>  #         We suggest the SOA expiration timer being approximately one
>>  #         third or one fourth of the signature validity period.  It will
>>  #         allow problems with transfers from the master server to be
>>  #         noticed before the actual signature time out.
>>
>>  One wording choice I noticed - "smaller" rather than "shorter."  When we
>>  are talking time durations, "longer" and "shorter" are more appropriate.
>>
>>  I agree with the recommendation here, but I am not sure about the build up.
>>  I think that a slave ought to continue to serve up RRSIGs whose time has
>>  passed in the face of having lost contact with the master.  For two reasons,
>>  one is that the clock on the slave might be wrong and the other is that
>>  resolvers might be willing to accept past-due data or are completely
>>  ignoring DNSSEC.
>
>But it will cause a black out for part of the clients that pull from
>that "SOA timed out" server if they do not ignore DNSSEC and do not
>ignore signature validity time.  Lameness is probably better than
>complete blackouts.
>
>We use the "it may be better" consciously.
>
>Unless there are objections or alternative text I intend to keep the
>text as is.
>
>
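For concreteness, a Python sketch of the suggested SOA expire range; the function name and the 30-day signature validity period are only illustrative. SOA EXPIRE is carried in seconds.

```python
from datetime import timedelta


def suggested_soa_expire(sig_validity):
    """Return the (shortest, longest) suggested SOA EXPIRE: one fourth to
    one third of the signature validity period."""
    return sig_validity / 4, sig_validity / 3


expire_min, expire_max = suggested_soa_expire(timedelta(days=30))
# SOA EXPIRE is expressed in seconds.
print(int(expire_min.total_seconds()), int(expire_max.total_seconds()))  # 648000 864000
```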
>----------------------------------------------------------------------
>
>>
>>  #4.2.1.1  Pre-publish key set Rollover
>>  #
>>
>>  #    normal          pre-roll         roll            after
>>  #
>>  #    SOA0            SOA1             SOA2            SOA3
>>  #    RRSIG10(SOA0)   RRSIG10(SOA1)    RRSIG11(SOA2)   RRSIG11(SOA3)
>>  #
>>  #    DNSKEY1         DNSKEY1          DNSKEY1         DNSKEY1
>>  #    DNSKEY10        DNSKEY10         DNSKEY10        DNSKEY11
>>  #                    DNSKEY11         DNSKEY11
>>  #    RRSIG1 (DNSKEY) RRSIG1 (DNSKEY)  RRSIG1(DNSKEY)  RRSIG1 (DNSKEY)
>>  #    RRSIG10(DNSKEY) RRSIG10(DNSKEY)  RRSIG11(DNSKEY) RRSIG11(DNSKEY)
>>
>>                        RRSIG11(DNSKEY)  RRSIG10(DNSKEY)
>>
>>  Those too?
>
>No, just as written.
>
>You introduce the key but you do not sign with it yet. You just allow
>the public key to get "introduced" into the caches.
>
>The public key is published but the key's effectivity period starts
>only at the "roll".
>
>I hope this is clear. It is the core of the document.
>
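Since this is the core of the document, here is a hedged Python sketch that transcribes the pre-publish diagram into a phase table and checks the two properties discussed: a new ZSK is published one phase before it signs, and the previous signer stays published while its signatures may still be cached. The minimum phase durations follow the quoted pre-roll and roll paragraphs; all names are hypothetical and times are in seconds.

```python
# Pre-publish ZSK rollover, transcribed from the diagram above.
# DNSKEY 1 (the KSK) is omitted; 10 is the current ZSK, 11 its successor.
PHASES = [
    # (phase name, ZSKs in the key set, ZSK signing the zone data)
    ("normal", {10}, 10),
    ("pre-roll", {10, 11}, 10),
    ("roll", {10, 11}, 11),
    ("after", {11}, 11),
]


def check_prepublish(phases):
    """Raise ValueError if a phase violates the pre-publish ordering."""
    for (_, prev_keys, prev_signer), (name, keys, signer) in zip(phases, phases[1:]):
        # The signing key must already have been published one phase earlier.
        if signer not in prev_keys:
            raise ValueError(f"{name}: DNSKEY {signer} was not pre-published")
        # The previous signer stays published so cached signatures still verify.
        if prev_signer not in keys:
            raise ValueError(f"{name}: DNSKEY {prev_signer} removed too early")
    return True


def minimum_phase_durations(propagation, dnskey_ttl, max_zone_ttl):
    """Minimum lengths (seconds) of the pre-roll and roll phases."""
    return {
        # new key set must reach all authoritative servers and outlive cached key sets
        "pre-roll": propagation + dnskey_ttl,
        # data signed with the old key must expire from caches
        "roll": propagation + max_zone_ttl,
    }


print(check_prepublish(PHASES))  # True
print(minimum_phase_durations(3600, 3600, 86400))
```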
>----------------------------------------------------------------------
>
>>  #      DNSKEY 10 is used to sign all the data of the zone, the zone-
>>  #      signing key.
>>  #   pre-roll: DNSKEY 11 is introduced into the key set.  Note that no
>>  #      signatures are generated with this key yet, but this does not
>>  #      secure against brute force attacks on the public key.  The minimum
>>  #      duration of this pre-roll phase is the time it takes for the data
>>  #      to propagate to the authoritative servers plus TTL value of the
>>  #      key set.  This equates to two times the Maximum Zone TTL.
>>
>>  Aren't all keys required to sign the key set?
>
>Only the keys a DS record points to.
>
>
>----------------------------------------------------------------------
>>
>>  #   roll: At the rollover stage (SOA serial 2) DNSKEY 11 is used to sign
>>  #      the data in the zone exclusively  (i.e. all the signatures from
>>  #      DNSKEY 10 are removed from the zone).  DNSKEY 10 remains published
>>  #      in the key set.  This way data that was loaded into caches from
>>  #      version 1 of the zone can still be verified with key sets fetched
>>  #      from version 2 of the zone.
>>  #      The minimum time that the key set including DNSKEY 10 is to be
>>  #      published is the time that it takes for zone data from the
>>  #      previous version of the zone to expire from old caches i.e. the
>>  #      time it takes for this zone to propagate to all authoritative
>>  #      servers plus the Maximum Zone TTL value of any of the data in the
>>  #      previous version of the zone.
>>
>>  Not the maximum TTL, but the TTL of the key set.
>
>
>You want data that is still cached and signed with signature 10 (as in
>version SOA1 of the zone) to expire before you remove key10 from
>SOA2. That timing is set by the TTL of the zone data, not the DNSKEY.

And also the REFRESH time in the SOA.  "Pulling" a record from the
master won't pull it from a slave that isn't going to come back for
REFRESH seconds.  It's at least REFRESH + TTL, even if you use
NOTIFY; recall that NOTIFY messages are sent over UDP, hence delivery
is not guaranteed.

I also thought about this (the advantages of taking a real long time 
to reply, I suppose).  DNS allows a server to act as master and 
slave, so it's possible that the first round of slaves wait REFRESH 
seconds to get an updated zone from the master, and then their slaves 
REFRESH seconds after that.  "Regional masters" is one label I have 
heard for this practice.  In the extreme, it's conceivable that the 
true time for a zone's propagation to all authoritative servers is 
unbounded.

The text here ought to be general, saying that the admin should
allow for the maximum propagation delay to all authoritative servers
plus the TTL.  Usually the propagation delay is small with NOTIFY; it
may be REFRESH seconds for one layer of master server to slave server,
and 2*REFRESH for slave servers that rely on other slave servers
for zone updates.  Further chaining of slave servers increases the
potential delay.  For this and other timing reasons, chaining of
servers ought to be kept to a minimum.
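That rule could be expressed as a small Python sketch. The function names are hypothetical, chain_depth counts master-to-slave hops on the longest path, and treating a delivered NOTIFY as zero delay is a simplification; the conservative default assumes every hop waits the full REFRESH because a NOTIFY may be lost.

```python
def worst_case_propagation(refresh, chain_depth, notify_delivered=False):
    """Upper bound (seconds) for a zone change to reach all authoritative servers.

    chain_depth: master-to-slave hops on the longest path (1 when every
    slave transfers directly from the master). A delivered NOTIFY is
    treated as negligible delay; otherwise each hop may wait REFRESH.
    """
    if notify_delivered:
        return 0
    return refresh * chain_depth


def minimum_wait(refresh, chain_depth, ttl, notify_delivered=False):
    """Propagation delay plus the TTL of the data being replaced."""
    return worst_case_propagation(refresh, chain_depth, notify_delivered) + ttl


print(minimum_wait(refresh=3600, chain_depth=2, ttl=86400))  # 93600
```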

>>   (This could be more
>>  complicated as there would have been a TTL of one week yesterday, then
>>  shortened to an hour today.)
>
>Agreed... but I think mentioning this would complicate rather than
>clarify.
>
>You could actually also mention that one could take the 'maximum'
>SIGNATURE expiration time of the data at the roll to determine when
>the post-roll should occur. Caches should throw out the data when that
>expiration time occurs. Since that provides you with an absolute time
>to do the rollover, that may actually be a good recommendation to give
>too. (Also see RFC 4035 section 4.3: "The resolver SHOULD discard the
>entire atomic entry when any of the RRs contained in it expire".)
>
>Any opinion from the working group?

I should draw some ascii art of the timeline.  I'll do that after 
sending this message and see what happens.
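Olaf's alternative, using the latest signature expiration as an absolute time for the post-roll, can be sketched in Python as follows. It assumes that caches discard data once its signature has expired; the function name and record list are hypothetical.

```python
from datetime import datetime


def post_roll_time(old_key_signatures):
    """Absolute time after which no cached data signed with the old ZSK
    should remain, assuming caches drop data whose signature has expired.

    old_key_signatures: (covered RRset, signature expiration) pairs for
    every RRSIG made with the old key in the previous zone version.
    """
    return max(expiration for _, expiration in old_key_signatures)


sigs_by_key10 = [
    ("example. SOA", datetime(2005, 7, 1)),
    ("www.example. A", datetime(2005, 7, 3)),
]
print(post_roll_time(sigs_by_key10))  # 2005-07-03 00:00:00
```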

>----------------------------------------------------------------------
>
>>  #4.2.1.3  Pros and Cons of the Schemes
>>
>>  #   Pre-publish-key set rollover: This rollover does not involve signing
>>  #      the zone data twice.  Instead, before the actual rollover, the new
>>  #      key is published in the key set and thus available for
>>  #      cryptanalysis attacks.  A small disadvantage is that this process
>>  #      requires four steps.  Also the pre-publish scheme involves more
>>  #      parental work when used for KSK rollovers as explained in
>>  #      Section 4.2.
>>
>>  I don't think that cryptanalysis is possible without a signature to go
>>  along with the public key, however, dictionary attacks are possible.
>>  (As in "where have I seen this public key before and did I break it?")
>
>The cryptanalysis attack was mentioned in 4.2.1.3, should we remove
>that line?
>
>Your editor needs guidance.

Does anyone else have an opinion?  I was led to my comment by others
professing cryptology expertise.  Maybe I've been misled or I
misunderstood the problem.

>----------------------------------------------------------------------
>
>
>>  #   The scenario above puts the responsibility for maintaining a valid
>>  #   chain of trust with the child.  It also is based on the premises that
>>  #   the parent only has one DS RR (per algorithm) per zone.  An
>>  #   alternative mechanism has been considered.  Using an established
>>  #   trust relation, the interaction can be performed in-band, and the
>>  #   removal of the keys by the child can possibly be signaled by the
>>  #   parent.  In this mechanism there are periods where there are two DS
>>  #   RRs at the parent.  Since at the moment of writing the protocol for
>>  #   this interaction has not been developed further discussion is out of
>>  #   scope for this document.
>>
>>  Perhaps you should also show the DS set at the parent in the example.
>>  Later you have one, but it is for the 2 DS at the parent option.
>
>Ack, proposed diagram:
>
>        Parent:
>        normal                  between "roll"
>                                and "after"
>        SOA0                    SOA3
>        RRSIGpar(SOA0)          RRSIGpar(SOA3)
>        DS1                     DS2
>        RRSIGpar(DS)            RRSIGpar(DS)
>
>
>        normal          roll                           after
>
>        SOA0            SOA1                           SOA2
>        RRSIG10(SOA0)   RRSIG10(SOA1)                  RRSIG10(SOA2)
>
>        DNSKEY1         DNSKEY1                        DNSKEY2
>                        DNSKEY2
>        DNSKEY10        DNSKEY10                       DNSKEY10
>        RRSIG1 (DNSKEY) RRSIG1 (DNSKEY)                RRSIG2(DNSKEY)
>                        RRSIG2 (DNSKEY)
>        RRSIG10(DNSKEY) RRSIG10(DNSKEY)                RRSIG10(DNSKEY)

I would label this as four events - initial, new key, DS change, key removal.

I would also make the four look more asynchronous, like below:


        initial         new key           DS change       key removal

        Parent Zone:
        SOA0            -------->         SOA3            -------->
        RRSIGpar(SOA0)  -------->         RRSIGpar(SOA3)  -------->
        DS1             -------->         DS2             -------->
        RRSIGpar(DS)    -------->         RRSIGpar(DS)    -------->

        Child Zone:
        SOA0            SOA1              -------->       SOA2
        RRSIG10(SOA0)   RRSIG10(SOA1)     -------->       RRSIG10(SOA2)
        DNSKEY1         DNSKEY1           -------->       DNSKEY2
                        DNSKEY2           -------->
        DNSKEY10        DNSKEY10          -------->       DNSKEY10
        RRSIG1 (DNSKEY) RRSIG1 (DNSKEY)   -------->       RRSIG2(DNSKEY)
                        RRSIG2 (DNSKEY)   -------->
        RRSIG10(DNSKEY) RRSIG10(DNSKEY)   -------->       RRSIG10(DNSKEY)

This isolates the steps at the child vs the parent.
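A Python sketch of the ordering the four events imply, with hypothetical names: the parent changes the DS only after the child publishes the new KSK, and the child removes the old KSK only after it sees the DS change. The optional hold parameter (for example, the TTL of the parent's DS RRset) is an assumption, not something stated in this thread.

```python
from datetime import datetime, timedelta


def check_ksk_rollover(new_key_at, ds_change_at, key_removal_at,
                       hold=timedelta(0)):
    """Check the ordering of: new key, DS change, key removal.

    `hold` is an assumed extra wait after the DS change before removal.
    """
    if ds_change_at < new_key_at:
        raise ValueError("parent published DS2 before the child published DNSKEY2")
    if key_removal_at < ds_change_at + hold:
        raise ValueError("child removed DNSKEY1 before the DS change had settled")
    return True


t_new = datetime(2005, 7, 1)
t_ds = datetime(2005, 7, 8)
t_removal = datetime(2005, 7, 10)
print(check_ksk_rollover(t_new, t_ds, t_removal, hold=timedelta(days=1)))  # True
```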

>----------------------------------------------------------------------
>
>>
>>  #4.2.3  Difference Between ZSK and KSK Rollovers
>>  #
>>  #   Note that KSK rollovers and ZSK rollovers are different.  A zone-key
>>  #   rollover can be handled in two different ways: pre-publish (Section
>>  #   4.2.1.1) and double signature (Section 4.2.1.2).
>>
>>  They really aren't that different - it's just the interaction with the
>>  parent and waiting on the parent that is different.  To a KSK, the "entire"
>>  zone is the DNSKEY set, as opposed to all sets for the ZSK.
>
>Suggestion
>
>    Note that KSK rollovers and ZSK rollovers are slightly different.
>                                                  ^^^^^^^^

Maybe that's oversimplifying it.

Note that a KSK rollover and a ZSK rollover are similar but differ in 
one fundamental aspect.  KSK rollovers involve requesting action by 
the parent and the ensuing delay in waiting for it.  Other than that, 
both can be achieved by pre-publishing the new key or by using double 
signatures during the rollover.

>----------------------------------------------------------------------
>
>
>
>>
>>  #4.3  Planning for Emergency Key Rollover
>>  #
>>  #   This section deals with preparation for a possible key compromise.
>>  #   Our advice is to have a documented procedure ready for when a key
>>  #   compromise is suspected or confirmed.
>>  #
>>  #   When the private material of one of your keys is compromised it can
>>  #   be used for as long as a valid authentication chain exists.  An
>>  #   authentication chain remains intact for:
>>  #   o  as long as a signature over the compromised key in the
>>  #      authentication chain is valid,
>>  #   o  as long as a parental DS RR (and signature) points to the
>>  #      compromised key,
>>
>>  This is a considerable problem.  A reminder that DS records ought to be
>>  conservatively signed.
>
>
>Suggestion:
>
>       o as long as a parental DS RR (and signature) points to the
>         compromised key (also see 4.4.4  DS Signature Validity Period)
>
>----------------------------------------------------------------------
>
>>
>>  #4.3.1  KSK Compromise
>>  #
>>  #   When the KSK has been compromised the parent must be notified as soon
>>  #   as possible using secure means.  The key set of the zone should be
>>  #   resigned as soon as possible.  Care must be taken to not break the
>>  #   authentication chain.  The local zone can only be resigned with the
>>  #   new KSK after the parent's zone has created and reloaded its zone
>>  #   with the DS created from the new KSK.  Before this update takes place
>>  #   it would be best to drop the security status of a zone altogether:
>>  #   the parent removes the DS of the child at the next zone update.
>>  #   After that the child can be made secure again.
>>
>>  During any emergency impacting a system, I don't expect the system to
>>  continue operating smoothly.  As here, if there is a compromised key,
>>  I don't expect maintaining the authentication chain is a priority.  Two
>>  things might be reasonable - dropping security and publication of the
>>  problem via other channels.
>>
>>  Minimizing an outage is a priority of course, meaning that one key ought
>>  not cause disruption for sibling domains.
>>
>>  #
>>  #   An additional danger of a key compromise is that the compromised key
>>  #   can be used to facilitate a legitimate DNSKEY/DS and/or nameserver
>>  #   rollover at the parent.  When that happens the domain can be in
>>  #   dispute.  An authenticated out of band and secure notify mechanism to
>>  #   contact a parent is needed in this case.
>>
>>  It's never wise to secure a system only by using the system's security.
>
>We already suggested alternative text please see:
>http://darkwing.uoregon.edu/~llynch/dnsop/msg03461.html
>
>I hope that text is clearer.


What I don't understand is how the "chain of trust is broken."

It's the resolver that evaluates the trustworthiness of an RRSet, 
it's not set by the server.  Hence, with the potential that someone 
is maliciously publishing data via a compromised key, the honest 
server can't break the chain of trust.

All the server can do is limit the exposure caused by the key and the 
fallout of the problem.

First, use of the private key must stop and (if it is a KSK) the DS RR
must no longer be signed by the parent.  The latter begins to limit
the duration of the vulnerability.

You don't gain by pulling the key or DS RR before the latest 
expiration of a legitimate key or DS because the attacker could be 
using that in a replay tactic.  The benefit of this is that you don't 
"break" legitimate data still in caches.

However, overriding all of this, is the fact that until the latest 
expiring copy of the key or DS expires, you do not have security 
control of the zone.  The big question is - how long is a zone admin 
willing to allow this window of vulnerability?

Pulling data early doesn't effectively shorten this window.
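The window described above can be sketched in Python (names hypothetical): it closes only when the last signature that could be replayed, over a DNSKEY RRset holding the compromised key or over a parental DS pointing at it, has expired. Pulling the records earlier does not change the result, because the inputs are the signature expirations already handed out.

```python
from datetime import datetime


def vulnerability_window_end(dnskey_sig_expirations, ds_sig_expirations=()):
    """Time after which a compromised key can no longer anchor a chain of trust.

    dnskey_sig_expirations: expirations of every RRSIG ever published over
    a DNSKEY RRset that contains the compromised key.
    ds_sig_expirations: expirations of parental RRSIGs over a DS RR that
    points at the key (empty for a ZSK).
    """
    return max([*dnskey_sig_expirations, *ds_sig_expirations])


ksk_window = vulnerability_window_end(
    [datetime(2005, 7, 1), datetime(2005, 7, 15)],
    [datetime(2005, 7, 20)],
)
print(ksk_window)  # 2005-07-20 00:00:00
```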

>----------------------------------------------------------------------
>
>>
>>  #4.4.3  Security Lameness
>>  #
>>  #   Security Lameness is defined as what happens when a parent has a DS
>>  #   RR pointing to a non-existing DNSKEY RR.  During key exchange a
>>  #   parent should make sure that the child's key is actually configured
>>  #   in the DNS before publishing a DS RR in its zone.  Failure to do so
>>  #   could cause the child's zone to be marked as Bogus.
>>
>>  I think it is dangerous to suggest that the parent check the health of the
>>  child.  During key rollover, I think the child ought to be looking to see
>>  when the parent has changed the DS record before changing the child's
>>  DNSKEYs.
>
>Now I am confused. I can imagine you would not like to see the
>suggestion that the parent checks the health of the child, but the child
>not having a DNSKEY before the parent publishes a DS would really break
>things. It would at least put the "double signature rollover" to the
>dustbin and registries would then have to deal with multiple DSs in
>their zone.

A child can list two KSKs in its zone and ask the parent to DS the
new one.  When the new DS is signed, the old KSK is yanked.  There is 
no need for the parent to see if the requested DS corresponds to any 
published KSK, the child ought to be indicating that the new KSK is 
in place by making the request.

I don't see any reason that the parent has to have multiple DS 
records (per algorithm).  True, if the child requests a new DS and 
hasn't put the KSK in the zone, none of the child's data will 
validate.  In this case, the child feels the pain and the child is 
the one that can fix this.

If the parent has to find it - it's like looking for the bent needle 
in a haystack.  And the parent really can't fix the situation - 
putting back in the old DS might prolong a vulnerability that 
prompted the botched rollover.
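To make the child-driven rollover concrete, here is a minimal sketch of the check a child might run before yanking its old KSK. It assumes the child has already fetched the parent's DS RRset and reduced each record to a (key tag, algorithm, digest type, hex digest) tuple; fetching and validating that RRset is left out. The digest follows RFC 4034 section 5.1.4 (SHA-1, digest type 1, over the owner name in canonical wire format followed by the DNSKEY RDATA) and the key tag follows RFC 4034 Appendix B. The function names and key material are hypothetical.

```python
import hashlib
import struct

def name_to_wire(name: str) -> bytes:
    # Canonical wire format of an owner name (RFC 4034 section 6.2):
    # lowercase labels, each prefixed by its length, ending with the root label.
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"

def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    # DNSKEY RDATA: 16-bit flags, 8-bit protocol, 8-bit algorithm, then the key.
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key

def key_tag(rdata: bytes) -> int:
    # Key tag algorithm from RFC 4034 Appendix B (not for algorithm 1, RSA/MD5).
    ac = 0
    for i, byte in enumerate(rdata):
        ac += byte if i & 1 else byte << 8
    ac += (ac >> 16) & 0xFFFF
    return ac & 0xFFFF

def ds_for(owner: str, rdata: bytes, algorithm: int) -> tuple:
    # DS digest type 1: SHA-1 over owner name (wire format) || DNSKEY RDATA.
    digest = hashlib.sha1(name_to_wire(owner) + rdata).hexdigest().upper()
    return (key_tag(rdata), algorithm, 1, digest)

def safe_to_retire_old_ksk(parent_ds_set, owner: str, new_ksk_rdata: bytes, algorithm: int) -> bool:
    # Hypothetical child-side check: retire the old KSK only once the parent
    # publishes a DS that matches the new KSK (digests compared case-insensitively).
    wanted = ds_for(owner, new_ksk_rdata, algorithm)
    return any(
        (tag, alg, dtype, digest.upper()) == wanted
        for tag, alg, dtype, digest in parent_ds_set
    )

# Placeholder key material, for illustration only (flags 257 = zone key + SEP).
new_ksk = dnskey_rdata(257, 3, 5, b"placeholder-new-key")
old_ksk = dnskey_rdata(257, 3, 5, b"placeholder-old-key")
parent_ds = {ds_for("example.com.", old_ksk, 5)}
print(safe_to_retire_old_ksk(parent_ds, "example.com.", new_ksk, 5))  # False
parent_ds.add(ds_for("example.com.", new_ksk, 5))
print(safe_to_retire_old_ksk(parent_ds, "example.com.", new_ksk, 5))  # True
```

The parent never looks at the child's zone here; the burden stays with the child, which is the direction argued above.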

>>  If you have the child looking up and the parent looking down, you run the
>>  risk of a control "loop."  I think the burden ought to be on the child to
>>  always make sure it is well represented in the parent, to keep the attention
>>  focused in one direction.  Also, a large delegating parent might waste time
>>  on the well-run children instead of helping out the needy kids.
>>
>>  Consider too that a child knows better what its outage (connectivity)
>>  situation is (than does the parent), which could account for any "missing"
>>  keys.
>
>
>Does changing the "should" into "could" in the above paragraph address
>your uneasiness?
>
>   "During key exchange a parent could make sure that the child's key is
>    actually configured"

Could, yes, but I would add "as part of a comprehensive delegation 
check."  I don't think we want to say that DNSSEC also incurs higher 
maintenance just because it can - which is what I read here.

>----------------------------------------------------------------------
>
>>
>>  #4.4.4  DS Signature Validity Period
>>  #
>>  #   Since the DS can be replayed as long as it has a valid signature, a
>>  #   short signature validity period over the DS minimizes the time a
>>  #   child is vulnerable in the case of a compromise of the child's
>>  #   KSK(s).  A signature validity period that is too short introduces the
>>  #   possibility that a zone is marked Bogus in case of a configuration
>>  #   error in the signer.  There may not be enough time to fix the
>>  #   problems before signatures expire.  Something as mundane as operator
>>  #   unavailability during weekends shows the need for DS signature
>>  #   validity periods longer than 2 days.  We recommend a minimum DS
>>  #   signature validity period of a few days.
>
>>
>>  Weeks.  For a large zone, days are not enough.
>>
>>  It's not the signing that's a problem, it's the management of the registry
>>  that is.
>
>If the signing is not a problem then I do not understand why the
>management of the registry is a problem; the DS RRs that are published
>by the parent are not subject to change.

It has long been said that signing is the easiest part of the DNSSEC 
problem, ever since the first complaints that the early signers took 
a lot of time.

The problem is that, in operations, things go wrong.  Hardware fails, 
demand fluctuates, the world is not inherently cyclical no matter how 
hard the ops staff tries to make it.  Because of this, you want to 
set up events that are easily identifiable (so you can tell if it 
happened or not) and you want to leave spare time to catch up.  If 
the signing process doesn't fire on Tuesday night because of an 
upgrade to the air conditioning units, you have to allow for it to 
happen a week later because maybe there is other work due on 
Wednesdays.
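Put as arithmetic, the scheduling argument above says the DS signature validity period has to outlast the runs you are prepared to miss. A rough sketch, assuming signatures are refreshed only by scheduled signing runs; the function name, parameters and margins are illustrative, not recommendations:

```python
from datetime import timedelta

def min_ds_signature_validity(resign_interval: timedelta,
                              missed_runs_tolerated: int,
                              safety_margin: timedelta) -> timedelta:
    # Signatures from the last successful run must stay valid through every
    # missed run and until the next run that does fire, plus some slack.
    return resign_interval * (missed_runs_tolerated + 1) + safety_margin

# Weekly signing that may slip by one week, with two days of slack.
print(min_ds_signature_validity(timedelta(weeks=1), 1, timedelta(days=2)).days)  # 16
# Daily signing that must survive an unattended weekend, with one day of slack.
print(min_ds_signature_validity(timedelta(days=1), 2, timedelta(days=1)).days)  # 4
```

Even the daily case lands above the two days mentioned in the draft; a weekly operational rhythm pushes the figure into weeks.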

>Or is "the management of the registry" not a piece of software but
>pieces of bio-ware that set the minimal values?

A registry is not a software chunk; it's a conglomeration of 
subsystems, including things like billing that are usually 
forgotten, layered on top of utilities that have a mind of their own.

We've often said that DNSSEC complicates the delegations in DNS. 
Delegations are hard enough to manage today.  For this section, what 
I ask is that we do not appear too optimistic in our timing 
recommendations.

>----------------------------------------------------------------------
>>  #   The maximum signature validity period of the DS record depends on how
>>  #   long child zones are willing to be vulnerable after a key compromise.
>>  #   Other considerations, such as how often the zone is (re)signed, can
>>  #   also be taken into account.
>>  #
>>  #   We consider a signature validity period of around one week to be a
>>  #   good compromise between the operational constraints of the parent and
>>  #   minimizing damage for the child.
>>
>>  One week is not realistic; one month is what to prepare for.  IMHO,
>>  putting any timescale in this document might create unrealistic
>>  expectations unless the timescale is a necessary piece of the
>>  protocol.
>
>Hmmm, the document doesn't force you to choose this particular compromise.
>
>I do think that putting timescales in is something that the "DNSOP"
>group can do even though the timing is not a piece of the protocol. In
>the document we argue the tradeoff and argue that the timescale is a
>compromise.
>
>I do get your point though, you would not like to see this document
>being stuffed in your face if you do not meet these
>recommendations. Maybe we can address this in the introduction of the
>text by adding a line to the first paragraph of the Introduction:
>
>
>    During workshops and early operational deployment tests, operators
>    and system administrators gained experience about operating the DNS
>    with security extensions (DNSSEC).  This document translates these
>    experiences into a set of practices for zone administrators.  At the
>    time of writing, there exists very little experience with DNSSEC in
>    production environments; this document should therefore explicitly
>+  not be seen as representing 'Best Current Practices'. The intention of
>+  this document is to provide guidance; it should not be used to argue
>+  that operators violate best practices when they choose not to follow
>+  recommendations herein.

On the one hand, I see a few places where IETF documents have been 
taken too seriously.  On the other hand, I have seen a few places 
where a lack of a clear statement has led to a morass of problems.

For example, statements made about IPv6 allocations that appeared in 
IETF documents 5 years ago still rule conversations in the RIRs. 
By itself this is not bad, but some have been slow to acknowledge 
that the IETF of 5 years ago had significantly less experience than 
the operator community of today.  (To emphasize my point without 
engendering a flamefest - the IETF of 5 years ago had much less 
operational experience than the IETF of today too, but I didn't 
mean to compare that.)  The documentation of years ago is still 
pertinent, but some have refused to question it.

I also see where not saying enough leads to problems.  Service level 
agreements are needed to judge the expectations and performance on 
either side of a contractual agreement.  When there are many 
contracts for similar service, there is a natural tendency to compare 
the performers.  This can't be done fairly if the service levels are 
measured on an ad hoc basis, which is what happens in the absence of 
a standard recommendation.

This is a long-winded way of saying that I am concerned the words 
above may not be taken seriously, but I think softening the document 
in other areas would be a mistake.

>>  #   In addition to the signature validity period, which sets a lower
>>  #   bound on the number of times the zone owner will need to sign the
>>  #   zone data and which sets an upper bound to the time a child is
>>  #   vulnerable after key compromise, there is the TTL value on the DS
>>  #   RRs.  By lowering the TTL, the authoritative servers will see more
>>  #   queries; on the other hand, a low TTL increases the speed with which
>>  #   new DS RRs propagate through the DNS.  As argued in Section 4.1.1,
>>  #   the TTL should be a fraction of the signature validity period.
>>
>>  A lower TTL doesn't really increase "the speed with which new DS RRs
>>  propagate through the DNS."  What is true is that it "lowers the persistence
>>  of DS RRSets in caches, forcing more queries to the authoritative servers."
>
>
>How about:
>
>   By lowering the TTL, the authoritative servers will see more
>   queries; on the other hand, a low TTL lowers the persistence of old
>   DS RRSets in caches, thereby increasing the speed with which new DS
>   RRs propagate through the DNS.

Propagation speed comes from lowering the TTL, lowering the REFRESH, and flattening the zone transfer tree.

I would reword as this:

   Shortening the TTL means that the authoritative servers will see more
   queries.  But on the other hand, a short TTL lowers the persistence of
   DS RRSets in caches, thereby increasing the rapidity with which updated
   DS RRSets propagate through the DNS.
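The trade-off in that wording can be put in rough numbers. This sketch assumes every active resolver keeps the DS RRset in use and re-queries as soon as its cached copy expires, so the query figure is an upper bound; the resolver count and validity period are made-up inputs, and the names are hypothetical.

```python
from typing import NamedTuple

class DsTtlTradeoff(NamedTuple):
    queries_per_day: float        # upper bound on load at the parent's authoritative servers
    max_cache_persistence_s: int  # how long an outdated DS RRset may linger in a cache
    fraction_of_validity: float   # TTL as a fraction of the signature validity period

def ds_ttl_tradeoff(ttl_s: int, active_resolvers: int, sig_validity_s: int) -> DsTtlTradeoff:
    # Each active resolver re-fetches at most once per TTL, and a cached copy
    # can outlive a DS change by at most one TTL.
    return DsTtlTradeoff(
        queries_per_day=active_resolvers * 86400 / ttl_s,
        max_cache_persistence_s=ttl_s,
        fraction_of_validity=ttl_s / sig_validity_s,
    )

WEEK_S = 7 * 86400
for ttl in (3600, 86400):
    print(ttl, ds_ttl_tradeoff(ttl, active_resolvers=10_000, sig_validity_s=WEEK_S))
```

Going from a one-day to a one-hour TTL multiplies the worst-case load by 24 while cutting the time stale DS RRSets can persist from a day to an hour; both TTLs remain a fraction of a one-week validity period.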

>----------------------------------------------------------------------
>>
>>  #Appendix A.  Terminology
>>
>>  #   Secure Entry Point key or SEP Key: A KSK that has a parental DS
>>  #      record pointing to it.  Note: this is not enforced in the
>>  #      protocol.  A SEP Key with no parental DS is security lame.
>
>
>Yes... this looks weird... the last sentence is just wrong...
>
>How about just:
>
>   Secure Entry Point key or SEP Key: A KSK that has a parental DS
>       record pointing to it or is configured as a trust-anchor.  Note:
>       this is not enforced in the protocol.

I'd leave off the "Note:..."  I still don't know what that means.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis                                                +1-571-434-5468
NeuStar

If you knew what I was thinking, you'd understand what I was saying.