[DNSOP] Review of draft-ietf-dnsop-rfc5011-security-considerations-11
Viktor Dukhovni <ietf-dane@dukhovni.org> Wed, 21 February 2018 21:55 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/f2VstCRkxJ1e757dPo6HOFbG2xE>
1. Introduction

   Because of this lack of guidance, zone publishers may derive
   incorrect assumptions about safe usage of the RFC5011 DNSKEY

s/derive/arrive at/

   and is intended to complement the guidance offered in RFC5011 (which
   is written to provide timing guidance solely to a Validating
   Resolver's point of view).

s/solely to/solely from/

1.1. Document History and Motivation

   To verify this lack of understanding is wide-spread, the authors

s/verify/confirm that/

   All 5 experts answered with an insecure value, and we determined that
   this lack of mathematical understanding might cause security concerns

s/mathematical//

   in deployment.  We hope that this companion document to RFC5011 will
   rectify this understanding and provide better guidance to zone

s/understanding//

   publishers that wish to make use of the RFC5011 rollover process.

s/that/who/

1.2. Safely Rolling the Root Zone's KSK in 2017/2018

   One important note about ICANN's (currently in process) 2017/2018 KSK
   rollover plan for the root zone: the timing values chosen for rolling
   the KSK in the root zone appear completely safe, and are not affected
   by the timing concerns introduced by this draft

s/introduced by this draft/discussed in this draft./

2. Background

   The RFC5011 process describes a process by which a RFC5011 Resolver

s/The RFC5011 process/RFC5011/
s/a RFC5011/an RFC5011/

   operational guidance or recommendations about the RFC5011 process and
   restricts itself to solely the security and operational ramifications

s/to solely/solely to/

   of switching to exclusively using recently added keys or removing
   revoked keys too soon.

s/of switching to ... too soon/of prematurely switching to .../

4. Timing Associated with RFC5011 Processing

   These sections define a high-level overview of [RFC5011] processing.

s/These sections define/The subsections below give/

OLD:
   These steps are not sufficient for proper RFC5011 implementation, but
NEW:
   The description is not by itself sufficient for a full RFC5011
   implementation, but

4.1. Timing Associated with Publication

   RFC5011's process of safely publishing a new DNSKEY and then assuming
   RFC5011 Resolvers have adopted it for trust falls into a number of
   high-level steps to be performed by the SEP Publisher.  This document

s/falls into/can be broken down into/

   discusses the following scenario, which the principle way RFC5011 is

s/principle/principal/

5. Denial of Service Attack Walkthrough

   If an attacker is able to provide a RFC5011 Resolver with past
   responses, such as when it is in-path or able to perform any number

s/in-path/on-path/

5.1. Enumerated Attack Example

   The following example settings are used in the example scenario

s/example settings/settings/

   attack.  The timing schedule listed below is based on a SEP Publisher

s/timing schedule listed/timeline/

   "T+0".  All numbers in this sequence refer to days before and after

s/sequence/timeline/

   was introduced into the fictitious zone being discussed.

s/fictitious/example/

   In this dialog, we consider two keys within the example zone:

s/dialog/exposition/

   K_old:  An older KSK and Trust Anchor being replaced.

   K_new:  A new KSK being transitioned into active use and expected to
      become a Trust Anchor via the RFC5011 automated trust anchor
      update process.

5.1.1. Attack Timing Breakdown

   The steps shows an attack that foils the adoption of a new DNSKEY by

s/The steps shows/Below we examine/

6. Minimum RFC5011 Timing Requirements

   First, we define the term components used in all equations in
   Section 6.1.

s/term components/component terms/

6.1.6. timingSafetyMargin

   Mentally, it is easy to assume that the period of time required for

s/Mentally/Naively/

   will be entirely based off the length of the addHoldDownTime.

s/based off/based on/

   protocol and in operational realities in deploying it require waiting

s/and in/and the/

   and additional period of time longer.  In subsections Section 6.1.6.1

s/and/an/

6.1.6.1. activeRefreshOffset

   Security analysis of the timing associated with the query rate of

s/Security analysis/An analysis/

   (at time T), the resolver would send checking queries at T+7, T+14,

s/(at time T)/(at time T+0)/

   The activeRefreshOffset term defines this time difference and
   becomes:

     activeRefreshOffset = addHoldDownTime % activeRefresh

   The % symbol denotes the mathematical mod operator (calculating the
   remainder in a division problem).  This will frequently be zero, but
   can be nearly as large as activeRefresh itself.

Given imperfect clocks, lost packets, ... I would argue that it is
necessary to pessimistically just set "activeRefreshOffset =
activeRefresh" and NOT assume that exact divisibility is operationally
meaningful.  Very small differences in either value in the expression
can easily change values near zero to values near the upper bound, so
the upper bound is the only sound choice, I would think.  Perhaps the
"clockskewDriftMargin" in the next section accounts for this, but I am
a bit skeptical at first glance.

6.1.6.3. retryDriftMargin

   that it becomes impossible to predict, from the perspective of the
   PEP Publisher, when the final important measurement query will

Should PEP be SEP here?

s/final important/conclusive/

6.1.6.4. timingSafetyMargin Value

   The activeRefreshOffset, clockskewDriftMargin, and retryDriftMargin
   parameters all deal with additional wait-periods that must be
   accounted for after analyzing what conditions the client will take
   longer than expected to make its last query while waiting for the
   addHoldDownTime period to pass.  But these values may be merged into
   a single term by waiting the longest of any of them.  We define
   timingSafetyMargin as this "worst case" value:

     timingSafetyMargin = MAX(activeRefreshOffset,
                              clockskewDriftMargin,
                              retryDriftMargin)

     timingSafetyMargin = MAX(addWaitTime % activeRefresh,
                              activeRefresh,
                              activeRefresh)

     timingSafetyMargin = activeRefresh

Here we see that the choice of "addWaitTime % activeRefresh" vs. just
"activeRefresh" is not material, and could probably have been made at
the outset.

6.1.7. retrySafetyMargin

   None the less, we do offer the following as one method considering

s/None the less/Nonetheless/

   numResolvers:  The number of client RFC5011 Resolvers

   With the successRate and numResolvers values selected and the
   definition of retryTime from RFC5011, one method for determining how
   many retryTime intervals to wait in order to reduce the set of
   uncompleted servers to 0 assuming normal probability is thus:

Here the text needs to be considerably clearer.  The first observation
is that "uncompleted servers" is not defined.  It seems this is the
expected number of resolvers that failed to acquire the new trust
anchor.  If so, this should be stated clearly.  It is also rather
unclear what is normally distributed, and why such a distribution is
reasonable to assume.

It seems to me that tuning to achieve zero failure cases for each size
of the resolver population is (to put it kindly) not necessarily sound.
Instead, one might want to achieve an acceptably low probability of any
chosen resolver failing due to random packet loss, and to handle
non-random "short-term" outages (which may last days if, say,
hypothetically the US East-coast grid goes down for 3 days, or, much
more likely, some home computer is shut down for a few weeks while the
administrator is on vacation).

So here, the zone administrator needs to stretch the retry interval by
a fudge factor and/or to a minimum time they're comfortable with, but I
don't see much relevance of "numResolvers" or plausibility of any sort
of "normal distribution" model.  When an ISP has an outage, or power is
lost, or there's a DDoS attack, lost queries are highly correlated.

Therefore, I would discard the table, and just recommend a sensible
fudge factor that is likely to work well enough in practice.  Say,
stretch the retry time by a factor of 5 to account for transient
connectivity loss, software restarts, ... and also set a minimum time
for less random "short-term" outages one is willing to tolerate.  No
finite time can protect a resolver or set of resolvers subject to a
sustained long-term DoS, and these will need to be manually rekeyed
once reliable connectivity is restored.

--
        Viktor.
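
As an illustration of the divisibility point raised under 6.1.6.1 and
6.1.6.4, here is a minimal Python sketch.  The concrete durations (a
28-day addHoldDownTime, a 7-day activeRefresh, a one-second
perturbation) and the function names are assumptions chosen for the
example and are not taken from the draft; the two activeRefresh-valued
margins follow the expanded MAX() expression quoted under 6.1.6.4.

```python
DAY = 86_400  # seconds per day


def active_refresh_offset(add_hold_down_time: int, active_refresh: int) -> int:
    """activeRefreshOffset = addHoldDownTime % activeRefresh (all in seconds)."""
    return add_hold_down_time % active_refresh


def timing_safety_margin(add_hold_down_time: int, active_refresh: int) -> int:
    """Worst case of the three margins, using the expanded MAX() form.

    clockskewDriftMargin and retryDriftMargin are both taken to equal
    activeRefresh, as in the expanded expression quoted from the draft.
    """
    offset = active_refresh_offset(add_hold_down_time, active_refresh)
    return max(offset, active_refresh, active_refresh)


# Hypothetical values: hold-down is an exact multiple of the refresh interval.
exact = active_refresh_offset(28 * DAY, 7 * DAY)

# Same hold-down, but the refresh interval is one second longer.
skewed = active_refresh_offset(28 * DAY, 7 * DAY + 1)

print(exact)   # 0
print(skewed)  # 604797, i.e. three seconds short of a full 7 days

# The modulo term never exceeds activeRefresh, so the margin is unchanged.
print(timing_safety_margin(28 * DAY, 7 * DAY) == 7 * DAY)  # True
```

With these assumed inputs, a one-second change moves the offset from
zero to nearly the full refresh interval, while the overall margin
stays at activeRefresh in both cases.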