Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

Michael StJohns <msj@nthpermutation.com> Tue, 12 December 2017 19:34 UTC

To: Wes Hardaker <wjhns1@hardakers.net>
Cc: dnsop@ietf.org
References: <151199364931.4845.3034001091375154653@ietfa.amsl.com> <yblvahshg6z.fsf@wu.hardakers.net> <9c71768d-4807-3d0a-b4b1-0ac8d066fe9f@nthpermutation.com> <yblindiavlm.fsf@w7.hardakers.net> <6d239b9a-fd1e-46a3-c705-6851dd8ffe0a@nthpermutation.com> <ybl8te8kbaq.fsf@wu.hardakers.net> <142cad85-1e0e-b4c9-1561-ad590984739a@nthpermutation.com> <yblshcfhnai.fsf@wu.hardakers.net>
From: Michael StJohns <msj@nthpermutation.com>
Message-ID: <1ca3daed-d521-0fcd-d1e7-eef2b781707b@nthpermutation.com>
Date: Tue, 12 Dec 2017 14:34:38 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <yblshcfhnai.fsf@wu.hardakers.net>
Content-Type: multipart/alternative; boundary="------------83FD11E17F27E5447AA1F40C"
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/OgHuKuO3hx7eK-18F1oJBNTo0aY>
Subject: Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

On 12/12/2017 12:24 PM, Wes Hardaker wrote:
> Michael StJohns <msj@nthpermutation.com> writes:
>
>> 2) T + activeRefresh  is the time at which the server sees the last
>> query from the last resolver just starting their trust anchor
>> installation.
>> 3) T + activeRefresh + addHoldDownTime is the time at which the server
>> sees the first query from any resolver finalizing its trust anchor
>> installation.
> There is where we disagree.  Given 2, where you state "last query from
> the last resolver is just starting", I argue that exactly an
> addHoldDownTime beyond that is when that *exact same resolver* will
> finish because it will have sampled the first at (2) and again at
> exactly T + activeRefresh + addHoldDownTime and accept it, per 5011:
>
>     Once the timer expires, the new key will be added as a trust anchor
>     the next time the validated RRSet with the new key is seen at the
>     resolver.
>
> And the last query from the last resolver will be at T + activeRefresh +
> addHoldDownTime.

Seriously no.

This isn't that hard.  You need to stop thinking about what's happening 
from the point of view of one client and think about how the server 
views the behavior of the collection of clients.

Dealing with your "attack" scenario and assuming no retransmissions for 
any client before it gets a response to its first query, the earliest 
time that the server can assume that the first client can start its 
addHoldDown timer is right after T (the lastExpirationTime).   The 
latest time that the server can assume that the last client will start 
its addHoldDown timer is the activeRefresh interval after T.

(Assume a queryInterval of 14 hours and a set of 840 clients evenly 
distributed, with their refreshes happening one per minute - the last 
client (#840) will make its query at T + activeRefresh and start its 
addHoldDown timer then.)
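
As a rough illustration of the example above, the following Python 
sketch spreads 840 clients one minute apart across a 14-hour refresh 
interval and reports when the first and last of them would start their 
addHoldDown timers.  The function name and the choice to measure time in 
minutes after T are assumptions made for this sketch, not anything taken 
from the draft or from RFC 5011.

```python
def hold_down_start_offsets(active_refresh_min, client_count):
    """Return, for each client, the minutes after T at which it makes
    its first query and so starts its addHoldDown timer.

    Assumes clients are evenly spaced across one activeRefresh interval
    and that no query is lost or retransmitted.
    """
    spacing = active_refresh_min / client_count
    return [spacing * n for n in range(1, client_count + 1)]


# Values from the example: 14-hour interval, 840 clients.
offsets = hold_down_start_offsets(14 * 60, 840)

print("first client starts at T +", offsets[0], "minutes")
print("last client starts at T +", offsets[-1], "minutes")
```

The last value equals the full activeRefresh interval (840 minutes), 
which is the point the server has to wait for before it can assume every 
client has started its timer.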

So, dealing only with that last client, the server has to wait until at 
least T + activeRefresh before it can assume that the client has started 
its addHoldDown timer, and until T + activeRefresh + addHoldDown before 
it can assume that the client's timer has expired and the client is 
about to make its last query.

The best case scenario (from the server's point of view) is when that 
same client makes its last query before its addHoldDown timer expires 
just under one activeRefresh interval before the expiration (e.g. if the 
expiration is at noon and the activeRefresh is 1 hour, then the best 
case is a last query at 11:00:00.00001), causing the query after the 
addHoldDown time to occur at 12:00:00.00001.   The worst case scenario is 
when ANY client makes its last query .00001 before the addHoldDown time 
expires, making the final query happen almost a full activeRefresh 
interval after the expiration, or in the example at 12:59:59.99999.
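
The noon example can be checked with a few lines of date arithmetic.  
This is only a sketch: the calendar date is arbitrary, the helper name 
is hypothetical, and it assumes the client's next query comes exactly 
one activeRefresh after its previous one.

```python
from datetime import datetime, timedelta


def accepting_query_time(last_query_before_expiry, active_refresh):
    """Time of the first query after the addHoldDown timer has expired,
    assuming queries are spaced exactly one activeRefresh apart."""
    return last_query_before_expiry + active_refresh


expiry = datetime(2017, 12, 12, 12, 0, 0)  # addHoldDown expires at noon
active_refresh = timedelta(hours=1)
epsilon = timedelta(microseconds=10)       # the ".00001" in the example

# Best case: last pre-expiry query just under one activeRefresh early.
best = accepting_query_time(expiry - active_refresh + epsilon, active_refresh)

# Worst case: last pre-expiry query just before the timer expires.
worst = accepting_query_time(expiry - epsilon, active_refresh)

print("best case: ", best.time())
print("worst case:", worst.time())
print("worst-case wait after expiry:", worst - expiry)
```

The worst-case wait comes out at one activeRefresh minus epsilon, which 
is why the server has to budget a full activeRefresh after the 
addHoldDown expiration.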

For a given client, assuming no query losses, there are FLOOR 
(addHoldDown/activeRefresh) queries in the addHoldDown interval (between 
when the client starts its timer and when it goes off). The difference 
addHoldDown - (FLOOR (addHoldDown/activeRefresh) * activeRefresh) is 
this activeRefreshOffset interval you keep trying to put in.  However, 
we do assume losses, and we do (and MUST) assume the worst case: that at 
least one client out of 10K, 100K or 1M is going to end up doing fast 
queries and changing that difference, such that its last query before 
its addHoldDown timer expires lands JUST before the expiration.
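
For concreteness, here is a small sketch of those two quantities for the 
loss-free case.  The 14-hour activeRefresh comes from the example 
earlier in this message and the 30-day addHoldDown from the figure 
quoted in this thread; the function names and the use of whole hours as 
the unit are assumptions of the sketch.

```python
def queries_in_hold_down(add_hold_down, active_refresh):
    """FLOOR(addHoldDown / activeRefresh): whole refresh intervals that
    fit inside the addHoldDown period when no query is lost."""
    return add_hold_down // active_refresh


def active_refresh_offset(add_hold_down, active_refresh):
    """addHoldDown - FLOOR(addHoldDown / activeRefresh) * activeRefresh."""
    whole = queries_in_hold_down(add_hold_down, active_refresh)
    return add_hold_down - whole * active_refresh


ADD_HOLD_DOWN_HOURS = 30 * 24   # 30 days
ACTIVE_REFRESH_HOURS = 14

print(queries_in_hold_down(ADD_HOLD_DOWN_HOURS, ACTIVE_REFRESH_HOURS))   # 51
print(active_refresh_offset(ADD_HOLD_DOWN_HOURS, ACTIVE_REFRESH_HOURS))  # 6
```

The 6-hour offset holds only when every query is answered on schedule.  
A single fast retry changes it, which is why it cannot be relied on as 
the bound for the worst client.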

>
>> Between (2) and (3) any given resolver may drift/retransmit with the
>> result that any given resolver may end up making a query just before
>> (3) placing its next and final query at (3) plus activeRefresh.
> Please forget drift in the top half of the equation.  There is zero
> drift in the mathematically precise section.  We will deal with drift,
> delays, and everything else in the safetyFactor alone, with many terms
> or concepts within it to get it right.

And here's where you go off the rails.   You don't need to include the 
safety factor to deal with the (2) to (3) interval, as retransmits can't 
reduce the addHoldDown period, but can reduce - for a given client - the 
number of queries in the period.  Retransmits and drift can also change 
*when* in that interval the given client produces its last query before 
expiration - i.e. they cause a "phase shift".
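
A toy model of the phase shift, under loud assumptions: the cumulative 
effect of retransmits and drift is collapsed into a single 
`phase_shift` offset applied to the client's query schedule, times are 
whole minutes, and a query landing exactly on the expiration counts as 
accepting the key.  None of these names come from RFC 5011 or the draft.

```python
def first_query_at_or_after_expiry(timer_start, active_refresh,
                                   add_hold_down, phase_shift=0):
    """Return (expiry, accepting_query) for one client.

    The expiry depends only on when the timer started; phase_shift moves
    the query schedule without touching the timer.
    """
    expiry = timer_start + add_hold_down
    query = timer_start + phase_shift
    while query < expiry:
        query += active_refresh
    return expiry, query


ACTIVE_REFRESH = 14 * 60        # minutes
ADD_HOLD_DOWN = 30 * 24 * 60    # minutes

for shift in (0, 359):
    expiry, accept = first_query_at_or_after_expiry(
        0, ACTIVE_REFRESH, ADD_HOLD_DOWN, phase_shift=shift)
    print("shift", shift, "expiry", expiry, "wait after expiry", accept - expiry)
```

In both runs the expiry is the same, but with a shift of 359 minutes the 
wait after expiry grows to 839 minutes, one minute short of a full 
activeRefresh.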


>
>>>     5) will query again at lastSigExpirationTime + 30 days - .000001
>> No - from the servers point of view, the worst client (which is the
>> only one the server cares about) will make its last query before trust
>> anchor installation at lastSigExpirationTime + activeRefresh (when the
>> last CLIENT saw its first valid update)  + 30 days -.0000001.
> Yes, I said that in 6 stating that it was *still waiting*.  IE, #5 was
> supposed to describe the second to last query.
>
>>>     6) notes this is still in waiting period
> Let me put together something, per Paul's request, to work at this from
> another angle where one of us can be shown right or wrong.
>
> [... retry, delay text by me deleted ...]
>
>> And again. NO.  The retransmits over a given set of clients in the
>> addHoldDown period will result in at least one client (the "worst"
>> client) ending up making a query just before the expiration of ITS
>> addHoldDown timer.  Assuming the worst case of at least one client
>> making a query just before the lastSigExpirationTime and that same
>> client drifting/retransmitting enough to make a query just before its
>> addHoldDown time the activeRefreshOffset is a useless value to
>> calculate.
> If you want to put an extra activeRefresh into the safetyMargin to
> account for drift, I'm willing to do so.  Or we can insert a new term
> labeled "driftSafetyMargin" and define it as activeRefresh if you want.
> But that goes below my math line, not above it (and we can relabel
> safetyMargin as retryFailureSafetyMargin).
>
>
>  From a purely security analysis point of view, the first thing we have
> to agree upon is the precise moment at which all clients in a perfect
> world, with *no errors at all* (no drift, no retries, no transmission
> line delays, no CPU processing delays, no clock failures, etc).  Once we
> have this line in the sand in place, then we can introduce real-world
> correctional elements to account for reality sneaking into our perfect
> world.  I'm trying to talk only about the perfectionists world line in
> the sand first, and then introduce needed operational components *after
> that line*.  You keep inserting "drift" (eg) everywhere in the process
> of this argument, which I absolutely agree needs to be dealt with.  But
> below my perfect-world line only.  The way I keep reading everything
> you've written is that your perfect line includes two activeRefreshes,
> which I (still) argue is incorrect.  As I said last time and this time,
> I'd be happy to insert a "drift" term, but lets please label it what it
> is.  If you agree with that, I'll make that change and push.
>

*sigh*

No.   You screw up the analysis doing it that way because it doesn't 
account for the possible "phase shift" that can happen with a client 
during its addHoldDown period.

I've been using "sigExpire + activeRefresh + addHoldDown + activeRefresh 
+ safetyFactor" because it's actually easier to calculate.  The actual 
formula is

"sigExpire + (activeRefresh + retransSlop) + addHoldDown + 
(activeRefresh + retransSlop)"  where retransSlop is about 1/2 the 
safety factor.

Both the interval after sigExpire and before addHoldDown and the 
interval after the last activeRefresh require a safety factor to account 
for retransmissions.   The addHoldDown interval does not because retrans 
and drift result in a phase shift within the interval, but do not affect 
the length of the total interval.
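
The two forms of the formula can be compared directly.  In the sketch 
below the function names are hypothetical, times are in hours measured 
from sigExpire, and the 28-hour safetyFactor is a made-up value chosen 
only so that retransSlop comes out as exactly half of it.

```python
def wait_until_detailed(sig_expire, active_refresh, add_hold_down,
                        retrans_slop):
    """sigExpire + (activeRefresh + retransSlop) + addHoldDown
    + (activeRefresh + retransSlop)."""
    return (sig_expire
            + (active_refresh + retrans_slop)
            + add_hold_down
            + (active_refresh + retrans_slop))


def wait_until_simplified(sig_expire, active_refresh, add_hold_down,
                          safety_factor):
    """sigExpire + activeRefresh + addHoldDown + activeRefresh
    + safetyFactor."""
    return (sig_expire + active_refresh + add_hold_down
            + active_refresh + safety_factor)


SIG_EXPIRE = 0
ACTIVE_REFRESH = 14
ADD_HOLD_DOWN = 30 * 24
SAFETY_FACTOR = 28  # hypothetical value, for illustration only

detailed = wait_until_detailed(
    SIG_EXPIRE, ACTIVE_REFRESH, ADD_HOLD_DOWN, SAFETY_FACTOR / 2)
simplified = wait_until_simplified(
    SIG_EXPIRE, ACTIVE_REFRESH, ADD_HOLD_DOWN, SAFETY_FACTOR)

print(detailed, simplified)
```

With retransSlop set to exactly half the safety factor the two totals 
agree.  The detailed form just shows where the slop applies: to each 
activeRefresh interval, not to the addHoldDown period.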


A "perfect" system will behave the way you've described - but adding a 
safety factor while ignoring the phase shift brought on by retransmits 
within the addHoldDown interval will not characterize the actual system.




I hope this is visible.  The first group is the "perfect" one, where we 
start at a random point in the interval [0..activeRefresh].  The second 
group is the one with retransmits inside the addHoldDown period and 
where the signature expired just after the last refresh.  The third is 
Wes's perfect case, with the queries starting just as the sigPeriod 
expires.  Wes would add the safety factor at the end of the third one 
and call it done.  The second one represents the actual worst case, to 
which we'd add the safetyFactor to account for drift and retransmits 
for the two activeRefresh intervals.

Can we stop now?


Mike