[DNSOP] Comments/Additions on I-D Action:draft-ietf-dnsop-rfc4641bis-01.txt

Paul Wouters <paul@xelerance.com> Mon, 27 April 2009 22:22 UTC

Date: Mon, 27 Apr 2009 18:23:21 -0400
From: Paul Wouters <paul@xelerance.com>
To: "Olaf M. Kolkman" <olaf@nlnetlabs.nl>
Message-ID: <alpine.LFD.1.10.0904271749420.12224@newtla.xelerance.com>
User-Agent: Alpine 1.10 (LFD 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format="flowed"; charset="US-ASCII"
Cc: namedroppers@ops.ietf.org, dnsop@ietf.org
Subject: [DNSOP] Comments/Additions on I-D Action:draft-ietf-dnsop-rfc4641bis-01.txt
Precedence: list

Below are my comments on draft-ietf-dnsop-rfc4641bis-01.txt, as well
as a proposed addition regarding NSEC3 (which postdates the original
rfc4641 and is not yet included in its successor).

1.
 	This document describes how to run a DNS Security (DNSSEC)-enabled
 	environment.

It seems this document only describes how to run authoritative DNS
Security-enabled servers. Do we not have recommendations for validating
resolvers?

I guess these are described from an implementation point of view in
draft-wijngaards-dnsext-resolver-side-mitigation, but do we have any
operational practices for validating resolvers? For example, regarding
loading and maintaining trust anchors, RFC 5011 handling, etc.?

2.

 	In this light, note that the time for a zone transfer from master
 	to slave is negligible when using NOTIFY [8] and incremental
 	transfer (IXFR) [7].

Is this still the case when NSEC3 is used?

3.1.1

 	A KSK can have a longer key effectivity period.

The use of "can" here is unclear to me. Can it because of key size? Because
of offline storage? Because of an HSM? At least it is not because of the
protocol, which is kind of what this section focuses on.

 	For almost any method of key management and zone signing, the KSK is
 	used less frequently than the ZSK.  Once a key set is signed with the
 	KSK, all the keys in the key set can be used as ZSKs.  If a ZSK is
 	compromised, it can be simply dropped from the key set.  The new key
 	set is then re-signed with the KSK.

This paragraph seems a little incoherent. It starts out saying you hardly
need the KSK, then jumps to one case of using a KSK, and then even seems
to suggest the KSK can make a previously signed ZSK "go away".

 	The Zone Signing Key can be used to sign all the data in a zone

Since having multiple ZSKs is not explained here, this also reads as a
little confusing. Either say "a" instead of "the" ZSK, or say that the
ZSK "is used to sign all data".

 	Thus, rolling a KSK with a parent is only done for two reasons:
 	to test and verify the rolling system to prepare for an emergency,
 	and in the case of an actual emergency.

I don't think this is true at all. If so, no one would be scheduling
KSK rollovers in 1-3 years once they go into production. In addition,
it makes no sense to me. In case of an emergency, something could go wrong
with the key rollover, but the real damage here is the cause of the
KSK rollover, not the cause of the failure of the KSK rollover. In other
words, doing a test that fails would be more harmful than waiting on the
KSK emergency, doing a rollover and then failing.

 	[ straw-man by Paul Hoffman]

This is already voiced in the 2nd bullet point above. Either list the
schools of thought in the bullet points and explain them only in
the paragraphs underneath, or merge the explanations into the bullet points.
Probably the "in order to reduce...." can be removed.

3.3

 	Key Effectivity Period

Didn't 3.1.1 just state in one of the bullet points that no rollover should
happen unless there is reason to believe there is a compromise? So there
would be no effectivity period. [Note that I don't agree with that :)]

 	a reasonable effectivity period for KSKs [...] is of the order of 2 decades

Two decades? Are we predicting mathematical and hardware improvements for
a period of 20 years? That seems to me to be equivalent to tea leaf reading
(or for Olaf and Miek, "koffiedik kijken").
And again, I find it somewhat contradictory to the "let's practise and incur
damage, in case we need to do this in two decades and would incur damage
for doing it wrong". In fact, here you say that doing this annually 20 times
will cause less chance of an error than doing it once every 2 decades?

3.4 Key Algorithm

RFC 3447 Section 8 (Feb 2003) recommends gradually migrating from
RSASSA-PKCS1-v1_5 to RSASSA-PSS for signature algorithms. However,
six years later, we still have RSASHA1 (algorithm 5 in DNSSEC)
using RSASSA-PKCS1-v1_5 as our main algorithm, and are about to
recommend it in a new RFC. I would not call this "gradual migration"
anymore.

Has a procedure been started to define a new algorithm number for use
of RSASSA-PSS with DNSSEC? If no work has started on this, I'm happy to
initiate it so that rfc4641bis can reference it.

3.5 Key Sizes

 	If an attacker has the capability of breaking a 1024-bit DNSSEC key,
 	he also has the capability of breaking one of the many 1024-bit TLS
 	trust anchor keys that are installed with web browsers.

I really don't like this argument, because the TLS people will then
argue the reverse for DNSSEC and both parties will have used each other in a
false self-reference to justify their key length. The same applies to the
next paragraph. If TLS slowly migrates to 2048 bit keys, there will not
be "a huge amount of publicity" on 1024 bit keys. And in general, protocols
should not rely on CNN for their security.

 	"heavily used".  That advice may have been true 15 years ago, but it
 	is not true today when using RSA or DSA algorithms .

Paul Hoffman and I talked about this a bit on the list. I still believe
(and my lunching cryptographer agreed) that since DSA relies on a piece of
randomness for every signature, the more you sign, the more vulnerable one
is to the bias of a random number generator, and therefore to a compromise
of the private key. I agree that the heavily used argument is not valid
for RSA keys though.
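
To make the DSA concern concrete, here is a minimal sketch of the most
extreme form of random number generator bias: the same per-signature value k
used for two signatures. The parameters (p, q, g), the private key and the
message hashes are toy values picked for illustration only, and real attacks
on a merely biased k are more involved than this.

```python
# Toy DSA parameters, far too small for real use: q divides p - 1 and
# g has order q modulo p. All values here are illustrative assumptions.
p, q, g = 23, 11, 4
x = 7            # private key (what the attacker wants)
k = 3            # per-signature random value, wrongly used twice


def sign(h, k):
    """Return a DSA signature (r, s) over the message hash h."""
    r = pow(g, k, p) % q
    s = (pow(k, -1, q) * (h + x * r)) % q
    return r, s


h1, h2 = 5, 9    # hashes of two different messages
r, s1 = sign(h1, k)
_, s2 = sign(h2, k)

# Attacker's view: only h1, h2, r, s1, s2 and the public parameters.
k_found = ((h1 - h2) * pow(s1 - s2, -1, q)) % q
x_found = ((s1 * k_found - h1) * pow(r, -1, q)) % q

print(k_found == k, x_found == x)  # prints "True True"
```

RSA signatures with PKCS1-v1_5 padding have no such per-signature secret,
which is why the "heavily used" concern does not carry over to them in the
same way.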

3.6 Private Key Storage

(applies also to 4.3.2 ZSK Compromise)
Database compromise is harder to recover from than ZSK compromise, since
the database has also changed legitimately over time compared to
the backup. This might be taken into consideration when deciding on an HSM
for a ZSK, as the database and the ZSK most likely would reside on the
same machine and would be simultaneously affected by a compromise.
Therefore, putting the ZSK in private/offline storage really only makes
sense for zones that do not change frequently and are therefore the smaller
zones. It makes sense for xelerance.com, but not for .com.

4.2.1.1. Pre-Publish Key Rollover

Call this Pre-Publish Zone Key Rollover to avoid confusion.

It should also mention that the pre-published private ZSK should NOT be
stored online (not even in an HSM, since it would have to be marked
as "exportable" for later, defeating the security of the HSM). Without
this, one cannot use the pre-published ZSK to recover from a machine "key
compromise", as both will be compromised at once. Also note that if the key compromise
is based on a mathematical breakthrough, the pre-published key, assumed
to be of equal key algorithm and key size, should also be considered
compromised. All in all, I probably would not associate pre-publishing
with key compromise, but associate it with "addressing caching issues"
or "smooth key rollover procedures" instead.

I would also change RRSIGx(DNSKEY) to RRSIGx(DNSKEYs) to avoid confusion
that two sigs are needed because there are two keys, and more clearly
show that two sigs are needed because two keys sign each rrset. Since
signing of the DNSKEY RRset is different from the remainder of the zone,
it might make sense to list it separately as RRSIGx(zonedata), especially
since that more clearly shows the difference compared to 4.2.1.2.

Perhaps even not include using the KSK in this example at all to avoid
confusion.

Paul Hoffman pointed out that the term "brute force attacks" should not
be used.  There might be attacks on the key that take considerable time,
but are not brute force attacks.

4.2.1.3

 	"A  small disadvantage is that this process requires four steps."

No, as pointed out, it requires 3 steps if done right. And actually, it
requires 2 steps but involves 3 states.

 	"thus is available for cryptanalysis attacks"

Perhaps add "therefore, a relatively short period (order of months for a
1024 bit key) is advised" for the entire lifespan of a ZSK from
pre-publishing to retirement.

 	"An advantage is that it only requires three steps."

Again, 2 steps, 3 phases.

I am kind of confused why we don't recommend 4.2.1.1 over 4.2.1.2, since I
know everyone feels that way anyway. In fact, if this is a BCP, I would
more likely just recommend NOT using 4.2.1.2 at all. It feels like a
grocery store putting a more expensive version on the shelf so you will
just buy the medium priced version. I think we can be honest instead :)


4.2.2 Key Signing Key Rollovers

A mention of multiple DS records in case of Registrant/Registrar/ZoneAdmin
hand-over might be made here.

Also, the method of the parent polling, combined with RFC 5011 should be
mentioned as a possible approach with responsibility at the parent.

Why is the worse alternative for ZSK double signing explained and not
rejected, while a perfectly valid alternative for KSK rollover is mentioned
but rejected out of hand by not getting explained? Some TLDs do or plan on
doing just this.

4.2.4 Key algorithm rollover

 	"you may not have a key of an algorithm for which you do not have signatures."

 	I quite understand not this....

I am not sure I understand why this needs to be different from a regular
rollover. Either the algorithm has been broken, and cannot be trusted anyway,
and an emergency key rollover could be started (if deemed less damaging than
a regular key rollover in a few days), or the algorithm change is not
urgent and can be done when a regular key rollover is scheduled.

I think this boils down to "Why not limit algorithm rollovers to coincide
with key rollovers"?

4.3

The use of "the chain is intact as long as Foo and Bar" does not work very
well, given that the individual sentences each read "as long as" while it
really should say "as long as all these items remain true". Perhaps just move
the "as long as" from the bullet points to the sentence before the enumeration.

 	"as long as a parental DS RR (and signature) points to the compromised key,"

That should be "as long as someone has (a cached version of) the parental DS RR
with a signature lifetime that is still valid". The DS RR might no longer point
there, but an old version can be injected into, or remain in, a cache.

And in some sense, bullet point 1 and 2 are the same (well 2 is a superset)

4.3.1 "That zone could be used to poison the DNS."

I would not use the word "poison" anywhere, and keep that word to mean
"modify unsigned data".

4.3.1.2  "the other" -> "the second"

It does not list "removing the KSK" as a method to break the chain of trust.

4.3.2
 	"Also note that as long as the RRSIG over the compromised ZSK is not
 	 expired the zone may be still at risk."

I'd say "Also note that as long as the RRSIG over the compromised ZSK still
covers a valid time period, the zone is still at risk until the ZSK is removed
from the zone".

I would probably rewrite it a little to show the arguments here are really
for the zone administrator to balance the immediate removal of the ZSK over
the gradual removal of the ZSK, and are not just "notes".

4.3.3

This section talks about the dangers of trust anchor key management, and
as an example names the unusual, non-representative situation where one
would only have one trust anchor preconfigured.

4.4.1


 	"A DNSKEY query tool can make use of the SEP bit [5] to select the
 	proper key from a DNSSEC key set"

This does not take into account the revoke bit or any other possible future
bits denoting certain key properties. I would make this statement more
generic by not naming the SEP bit specifically.

4.4.3 Security lameness

I am not sure if the term "security lame" is a good one. As I understand it,
as a non-native English speaker, lame (or crippled) means "less than
normal (or optimal)", e.g. things still work but slower. But that's not the
case for a wrong DS. It's not lame, it is bogus (or broken or non-existent).

4.4.4

I am missing some arguments on being able to re-use more signatures in a
zone when the validity period is longer, requiring a signer to do less work
by, for instance, generating signatures with a 30 day period and re-using
them for two weeks in a zone that is re-signed hourly or daily.
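
A minimal sketch of that re-use policy, assuming the 30 day validity and
two week re-use window from the example above. The function names and the
dates are hypothetical and not taken from any particular signer.

```python
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(days=30)   # signature lifetime when generated
REUSE_FOR = timedelta(days=14)  # how long a signature is kept in the zone


def needs_resign(inception, now):
    """True once a signature has been re-used for the full re-use window."""
    return now - inception >= REUSE_FOR


def remaining_validity(inception, now):
    """Validity left on a signature at a given moment."""
    return inception + VALIDITY - now


now = datetime(2009, 4, 27, tzinfo=timezone.utc)
fresh = now - timedelta(days=3)    # signed three days ago: keep it
old = now - timedelta(days=15)     # signed fifteen days ago: replace it
print(needs_resign(fresh, now), needs_resign(old, now))  # prints "False True"
```

With these numbers, a signature that is replaced exactly when its re-use
window ends still has 16 days of validity left, so each hourly or daily
signing run only has to touch the signatures that have aged out.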

This section also mentions TTL concerns on DS records, but offers no
recommendations like it does for RRSIG lifetimes.

4.4.5

(remove use of "registrar A" and "registrar B" and only use "gaining
registrar" and "losing registrar")

 	"One could proceed with a pre-publish ZSK rollover whereby registrar
 	A pre-publishes the ZSK of registrar B [...]"

I assume KSK, not ZSK is meant here? I don't think ZSK handovers should ever
be needed (see below)

The gaining registrar adds the losing registrar's public KSK to their zone
(this can be done without cooperation). They then add their own KSK and ZSK,
and then do the transfer. Next, the DS has changed, but anything cached and
signed with the old (cached) KSK is still considered valid. Even if the losing
registrar tried to be evil, e.g. by giving the losing KSK insane lifetimes, the
gaining registrar could opt to keep that losing KSK valid for longer.

 	"The only viable option for the registrant is to publish its zone
 	unsigned and ask the registry to remove the DS"

The case of cached data with long evil TTLs for ZSKs that don't exist anymore
(or with short ones that don't exist where no maliciousness is involved) could
be handled by validators by simply removing any signed data from the cache for
which no valid key (through DNSSEC validation) can be found and optionally
retrying to obtain that data with exponential back-off.


I am furthermore missing a discussion on NSEC vs NSEC3 and NSEC3 parameters.
I've tried to write something sensible that is included below, as a presumed
section 5:


5 Next Record type

There are currently two types of next records that provide
authenticated denial of existence of DNS data in a zone.

- The NSEC (RFC 4034) record builds a linked list of sorted RRlabels
   with their record types in the zone.

- The NSEC3 (RFC 5155) record builds a similar linked list, but
   uses hashes instead of the RRlabels.

5.1 Reasons for the existence of NSEC and NSEC3

The NSEC record requires no cryptographic operations aside from validating
its associated signature record. It is human readable and can be used in
manual queries to determine correct operation.  The disadvantage is that it
allows for "zone walking", where one can request all the entries of a zone
by following the next RRlabel pointed to in each subsequent NSEC record.
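
The zone walk can be sketched in a few lines. The zone contents and the
`nsec_chain` dictionary below are made up for illustration; a real walker
would send DNS queries and read the next owner name from each NSEC answer
instead of doing a dictionary lookup.

```python
# Hypothetical NSEC chain for a small zone: owner name -> next owner name.
# The last NSEC record points back to the zone apex, closing the loop.
nsec_chain = {
    "example.": "mail.example.",
    "mail.example.": "secret-project.example.",
    "secret-project.example.": "www.example.",
    "www.example.": "example.",
}


def walk_zone(apex, chain):
    """Follow the NSEC 'next' pointers from the apex until we are back."""
    names = [apex]
    current = chain[apex]
    while current != apex:
        names.append(current)
        current = chain[current]
    return names


print(walk_zone("example.", nsec_chain))
```

The walker learns "secret-project.example." without ever having to guess it,
which is exactly the property NSEC3 tries to make more expensive.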

Though some claim all data in the DNS should be considered public, it
sometimes is considered to be more than private, but less than public data.

The NSEC3 record uses a hash of the requested RRlabel.
To increase the workload required to guess entries in the zone, the number
of hashing iterations can be specified in the NSEC3 record. Additionally,
a salt can be specified that also modifies the hashes. Note that NSEC3
does not give full protection against information leakage from the zone.
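
As a rough sketch of how the iterations and the salt enter the hash, the
following follows my reading of the RFC 5155 hash calculation for SHA-1: the
owner name in lowercase wire format is hashed together with the salt, and
the result is re-hashed with the salt for each additional iteration. The
helper names, the salt and the iteration count are assumptions for
illustration only.

```python
import base64
import hashlib

# Translate the standard base32 alphabet to the "extended hex" alphabet
# that NSEC3 owner names use.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def name_to_wire(name):
    """Encode a domain name in lowercase, uncompressed DNS wire format."""
    wire = b""
    for label in name.lower().rstrip(".").split("."):
        if label:
            wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"


def nsec3_hash(name, salt_hex, iterations):
    """SHA-1 NSEC3 hash: one initial hash plus `iterations` extra rounds."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(name_to_wire(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    encoded = base64.b32encode(digest).translate(_B32_TO_B32HEX)
    return encoded.decode("ascii").lower()


# Same name, different salt: the resulting hashed owner names differ.
print(nsec3_hash("www.example.", "aabbccdd", 12))
print(nsec3_hash("www.example.", "11223344", 12))
```

An attacker who wants to test a guess such as "www" has to repeat the same
computation, so each extra iteration raises the cost per guess for the
attacker, the signer and the validating resolver alike.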

5.2 NSEC or NSEC3

For small zones that only contain records in the apex and a few
common (guessable) RRlabels such as "www" or "mail", NSEC3 provides no
real additional security, and the use of NSEC is recommended to ease
the work required by signers and validating resolvers.

For large zones where there is an implication of "not readily available"
RRlabels, such as those where one has to sign an NDA before obtaining it,
NSEC3 is recommended.

5.3 NSEC3 parameters

The NSEC3 hashing includes the FQDN in its uncompressed form. This
ensures brute force work done by an attacker for one (FQDN) RRlabel
cannot be re-used for another (FQDN) RRlabel attack, as these entries
are by definition unique.

5.3.1 NSEC3 Algorithm

The NSEC3 algorithm is specified as a version of the DNSKEY algorithm.
The current options are:

Algorithm 6, DSA-NSEC3-SHA1 is an alias for algorithm 3, DSA.
Algorithm 7, RSASHA1-NSEC3-SHA1 is an alias for algorithm 5, RSASHA1.

The algorithm choice therefore depends solely on the DNSKEY algorithm
picked.

[Note that there is an issue here as well as mentioned in Section 3.4
  regarding RSASSA-PKCS1-v1_5 vs RSASSA-PSS as well as no algorithm choice
  for SHA-256]

5.3.2 NSEC3 Iterations

RFC-5155 describes the useful limits of iterations compared to RSA key
size. These are 150 iterations for 1024 bit keys, 500 iterations for
2048 bit keys and 2,500 iterations for 4096 bit keys. Choosing 2/3rd of
the maximum is deemed to be a sufficiently costly yet not excessive value.
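
For reference, the "2/3rd of the maximum" rule works out as follows. This is
only the arithmetic on the limits quoted above; rounding down is my own
choice and the function name is hypothetical.

```python
# Iteration limits per RSA key size, as quoted above from RFC 5155.
MAX_ITERATIONS = {1024: 150, 2048: 500, 4096: 2500}


def suggested_iterations(key_bits):
    """Two thirds of the maximum for this key size, rounded down."""
    return MAX_ITERATIONS[key_bits] * 2 // 3


for bits in sorted(MAX_ITERATIONS):
    print(bits, suggested_iterations(bits))
# prints:
# 1024 100
# 2048 333
# 4096 1666
```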

5.3.3 NSEC3 Salt

The salt with NSEC3 is not used to increase the work required by an
attacker attacking multiple domains, but as a method to enable creating a
new set of hash values if at some point that might be required. The salt
is used as a "roll over". The FQDN RRlabel, which is part of the value
that is hashed, already ensures that brute force work for one RRlabel
cannot be re-used to attack other RRlabels due to their uniqueness.

Key rollovers limit the amount of time attackers can spend on attacking
a certain key before it is retired. The salt serves the same purpose for
the hashes, which are independent of the key being used, and therefore
do not change when rolling over a key. Changing the salt would cause an
attacker to lose all precalculated work for that zone.

The salt of all NSEC3 records in a zone needs to be the same.
Since changing the salt requires the NSEC3 records to be regenerated,
and thus requires generating new RRSIGs over these NSEC3 records, it
is recommended to only change the salt when changing the Zone Signing Key,
as that process in itself already requires all RRSIGs to be regenerated.

5.3.4 Opt-out

An Opt-Out NSEC3 RR does not assert the existence or non-existence of the
insecure delegations that it may cover. This allows for the addition or
removal of these delegations without recalculating or re-signing RRs in
the NSEC3 RR chain. Therefore, Opt-Out should be avoided if possible. A
scenario where one of the authoritative nameservers of a zone does not
have enough resources to hold the additional NSEC3 records in memory is
one of very few reasons to deploy with Opt-Out.
