Re: [DNSOP] I-D Action: draft-ietf-dnsop-rfc5011-security-considerations-08.txt

Michael StJohns <msj@nthpermutation.com> Tue, 12 December 2017 02:14 UTC

To: Wes Hardaker <wjhns1@hardakers.net>
Cc: dnsop@ietf.org
References: <151199364931.4845.3034001091375154653@ietfa.amsl.com> <yblvahshg6z.fsf@wu.hardakers.net> <9c71768d-4807-3d0a-b4b1-0ac8d066fe9f@nthpermutation.com> <yblindiavlm.fsf@w7.hardakers.net> <6d239b9a-fd1e-46a3-c705-6851dd8ffe0a@nthpermutation.com> <ybl8te8kbaq.fsf@wu.hardakers.net>
From: Michael StJohns <msj@nthpermutation.com>
Message-ID: <142cad85-1e0e-b4c9-1561-ad590984739a@nthpermutation.com>
Date: Mon, 11 Dec 2017 21:14:35 -0500
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/moTTzqWtbybHZzLVDMSM3RUAdiA>

On 12/11/2017 8:03 PM, Wes Hardaker wrote:
> Michael StJohns <msj@nthpermutation.com> writes:
>
> Hi Mike,
>
> Thanks for explaining your thinking because I think, after reading it:
> we're actually in agreement but using different terms for where to put
> in the slop you're worried about.
>
> Specifically:
>
>> A perfectly operating resolver with perfect clock and perfect
>> connectivity and no outages MIGHT possibly keep a perfect interval
>> between each query it makes (making your activeRefreshOffset
>> meaningful), but 10000 resolvers ALL keeping perfect intervals?
> Yes, I agree.  But, this is why I want the majority of the equation to
> be defining the mathematical perfect certainty.  And then *after* that,
> add the operational slop factor (safetyMargin) to account for both
> problems and reality (you forgot to add "speed of light issues" in your
> text above, for example).
(Sigh. The safety factor deals with speed-of-light issues... DUH.)


No, no, no, no, no.

> Thus, I break the equation into two critical parts:
>
> addWallClockTime = lastSigExpirationTime
>                     + addHoldDownTime
>                     + activeRefresh                   ^
>                     + activeRefreshOffset             |
>                                                       |
> Precise Math                                         |
> -----------------------------------------------------|
> Needed Fuzz                                          |
>                                                       |
>                     + safetyMargin                    |
>                                                       v

If you reorder this properly, you can probably get the right answer.
For the first part of the discussion, assume no drift or retransmits
between (1) and (2) or between (3) and (4).

1) T == lastSigExpirationTime; microscopically after this is the time
at which the server sees the first query from any resolver starting its
trust anchor installation.
2) T + activeRefresh  is the time at which the server sees the last 
query from the last resolver just starting their trust anchor installation.
3) T + activeRefresh + addHoldDownTime is the time at which the server 
sees the first query from any resolver finalizing its trust anchor 
installation.
4) T + activeRefresh + addHoldDownTime + activeRefresh is the time at 
which the server sees the last query from the last resolver finalizing 
its trust anchor installation.
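The four server-observed instants can be written down directly. This is a minimal sketch, assuming times in integer seconds and, as stated above, no drift or retransmits between (1) and (2) or between (3) and (4); the helper `server_timeline`, the field names `t1` to `t4`, and the 12-hour activeRefresh in the example are illustrative choices, not taken from the draft.

```python
from typing import NamedTuple

HOUR = 3600
DAY = 24 * HOUR


class ServerTimeline(NamedTuple):
    """Instants (1)-(4) as seen by the server, in seconds (hypothetical helper)."""
    t1: int  # (1) first query from any resolver starting installation
    t2: int  # (2) last query from the last resolver starting installation
    t3: int  # (3) first query from any resolver finalizing installation
    t4: int  # (4) last query from the last resolver finalizing installation


def server_timeline(last_sig_expiration_time: int,
                    active_refresh: int,
                    add_hold_down_time: int) -> ServerTimeline:
    # Assumes no drift or retransmits between (1)-(2) and (3)-(4).
    t = last_sig_expiration_time
    return ServerTimeline(
        t1=t,
        t2=t + active_refresh,
        t3=t + active_refresh + add_hold_down_time,
        t4=t + active_refresh + add_hold_down_time + active_refresh,
    )


# Example: an assumed 12-hour activeRefresh and a 30-day hold-down, T = 0.
timeline = server_timeline(0, 12 * HOUR, 30 * DAY)
print(timeline.t4 / DAY)  # 31.0
```

With these inputs the last finalizing query arrives a full day (two activeRefresh intervals) after the 30-day hold-down alone would suggest.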

(1) is the earliest time any resolver can start its installation
(assuming an attack), because it's also the time when all of the old
signatures expire.
(2) is the time at which a resolver that made its last activeRefresh
query just before T (and therefore could not yet start its installation)
will send its first installation query.
Between (2) and (3), any given resolver may drift or retransmit, with the
result that some resolver may end up making a query just before (3),
placing its next and final query at (3) plus activeRefresh.

Finally, to deal with drift and retransmits between (1) and (2) and
between (3) and (4), we add a safetyFactor. That deals with about
99.9999% of drift and retransmits, but it will never deal with resolvers
that have been offline or otherwise unable to complete their queries.
The retransmits in a given client's addHoldDown period only move the end
point for that resolver and don't affect the overall safetyFactor for
the set of resolvers.
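Putting the safetyFactor on the end of (4) gives the add-time formula argued for here. The sketch below sets it beside the ordering quoted above (lastSigExpirationTime + addHoldDownTime + activeRefresh + activeRefreshOffset + safetyMargin) so the difference is visible. Both function names are hypothetical, activeRefreshOffset is set to 0 because an assumed 12-hour activeRefresh divides 30 days evenly, and the 2-hour safety value is an arbitrary placeholder, not a recommended figure.

```python
HOUR = 3600
DAY = 24 * HOUR


def add_wall_clock_time_quoted(last_sig_expiration_time: int,
                               add_hold_down_time: int,
                               active_refresh: int,
                               active_refresh_offset: int,
                               safety_margin: int) -> int:
    """Ordering from the quoted message: one activeRefresh plus an offset."""
    return (last_sig_expiration_time + add_hold_down_time + active_refresh
            + active_refresh_offset + safety_margin)


def add_wall_clock_time_worst_client(last_sig_expiration_time: int,
                                     add_hold_down_time: int,
                                     active_refresh: int,
                                     safety_factor: int) -> int:
    """Point (4) plus safetyFactor: two activeRefresh terms, no offset."""
    return (last_sig_expiration_time + active_refresh + add_hold_down_time
            + active_refresh + safety_factor)


# Hypothetical inputs: T = 0, 30-day hold-down, 12-hour activeRefresh,
# and the same placeholder safety value for both formulas.
safety = 2 * HOUR
quoted = add_wall_clock_time_quoted(0, 30 * DAY, 12 * HOUR, 0, safety)
worst = add_wall_clock_time_worst_client(0, 30 * DAY, 12 * HOUR, safety)
print((worst - quoted) // HOUR)  # 12
```

With identical safety terms the two results differ by exactly one activeRefresh, which is the interval the worst client can lose between (2) and (3).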

>
> I.e., if a perfect resolver is hitting an RFC5011 zone with an
> activeRefresh that evenly divides into 30 days, it:
>
>    1) queries at T--- = lastSigExpirationTime - .000001
>    2) queries at T+1--- = lastSigExpirationTime - .000001 + activeRefresh
Yes.
>    3) Notes that it just saw a new key (assuming worst case #1 is replayed)
>    4) starts timer
>    5) will query again at lastSigExpirationTime + 30 days - .000001
No. From the server's point of view, the worst client (which is the only
one the server cares about) will make its last query before trust anchor
installation at lastSigExpirationTime + activeRefresh (when the last
CLIENT saw its first valid update) + 30 days - .0000001.
>    6) notes this is still in waiting period
>    7) will query again at lastSigExpirationTime + 30 days - .000001 + activeRefresh
Nope. The worst client will query again at (from the server's point of
view) lastSigExpiration + activeRefresh + addHoldDown (30 days) + activeRefresh.

From a given client's point of view, the last query can happen anywhere
from (lastSigExpiration + addHoldDown + .00000001) to
(lastSigExpiration + activeRefresh + addHoldDown + activeRefresh).
The server only cares about the worst (latest) case.
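The per-client range above can be checked with a small simulation. This is a sketch under simplifying assumptions: the client's first query after lastSigExpiration starts its hold-down timer, each later query follows the previous one by a given gap (activeRefresh, or less after a retransmit), and the key is accepted at the first query at or after the timer expires. `final_query_time`, the 12-hour activeRefresh, and the 60-second shortening are illustrative, not taken from RFC 5011 or the draft.

```python
from typing import Iterable, Optional

HOUR = 3600
DAY = 24 * HOUR
ACTIVE_REFRESH = 12 * HOUR   # assumed activeRefresh
HOLD = 30 * DAY              # addHoldDown
T = 0                        # lastSigExpiration


def final_query_time(last_sig_expiration: int,
                     first_delay: int,
                     hold: int,
                     gaps: Iterable[int]) -> Optional[int]:
    """Time of the query at which one client accepts the new key.

    first_delay: how long after lastSigExpiration the client's first query
    (the one that starts its hold-down timer) happens.
    gaps: successive intervals between that client's later queries.
    Returns None if the gaps run out before the hold-down timer expires.
    """
    start = last_sig_expiration + first_delay
    t = start
    for gap in gaps:
        t += gap
        if t - start >= hold:
            return t
    return None


# Latest-starting client: its last pre-T refresh was just before T, so its
# first post-T query lands a full activeRefresh later.
perfect = final_query_time(T, ACTIVE_REFRESH, HOLD, [ACTIVE_REFRESH] * 100)
# Same client, but one (hypothetical) retransmit pulls its schedule 60 s earlier.
drifted = final_query_time(T, ACTIVE_REFRESH, HOLD,
                           [ACTIVE_REFRESH - 60] + [ACTIVE_REFRESH] * 100)

print((perfect - T) / DAY)                         # 30.5
print(T + 2 * ACTIVE_REFRESH + HOLD - drifted)     # 60
```

The perfectly periodic client finishes at T + activeRefresh + addHoldDown, while a single small shift leaves a query just short of the timer and pushes acceptance to within 60 seconds of T + activeRefresh + addHoldDown + activeRefresh.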


>    8) now notes that it's been 30 days and accepts key
>
> There is only 1 activeRefresh in that sequence.  And that's what's in
> the equation.  Because the timing distance between #7 and #2 is still 30
> days when queried to the perfect sub-nano second.

Nope. Not from the server's point of view.


>
> Then there should be a bunch of delays inserted, network timeouts, etc.
> That's where the safetyMargin should come in and catch all the issues
> with the imprecision of the real world.  Now, if you want to add an
> activeRefresh to the already defined safetyMargin suggested term, I'm
> willing to consider that.  But it shouldn't be listed as part of
> anything but the slop term for security analysis clarity.
>
> Would you like to add more time to the safetyMargin to deal with the
> non-perfect world, including clock drift because of time delays in a
> bunch of queries back to back or any other reason?
>
>
> Ending note about the precise timeline: when 30 days isn't divisible by
> the activeRefresh, then you need to add the other term we haven't talked
> about much which is the activeRefreshOffset which accounts for this
> case.

And again: NO. The retransmits over a given set of clients in the
addHoldDown period will result in at least one client (the "worst"
client) making a query just before ITS addHoldDown timer expires.
Assuming the worst case, in which at least one client makes a query just
before lastSigExpirationTime and that same client drifts or retransmits
enough to make a query just before its addHoldDown time expires, the
activeRefreshOffset is a useless value to calculate.
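The argument that activeRefreshOffset drops out can be illustrated by sweeping every possible schedule shift, at one-second resolution, for the latest-starting client. In this sketch (hypothetical helpers, integer seconds, T = 0) a drift of d seconds pulls every query after the first one d seconds earlier, and d = 0 is the perfect client. With an assumed 7-hour activeRefresh, which does not divide 30 days, the perfect client overshoots T + activeRefresh + addHoldDown by one hour (presumably the non-divisibility case the quoted text assigns to activeRefreshOffset), yet the worst drifted client still lands one second short of T + 2 * activeRefresh + addHoldDown, just as it does for a 12-hour activeRefresh that divides 30 days evenly.

```python
HOUR = 3600
DAY = 24 * HOUR
HOLD = 30 * DAY  # addHoldDown


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    return -(-a // b)


def accept_time(start: int, active_refresh: int, hold: int, drift: int) -> int:
    """Accepting-query time for a client whose queries after `start` are all
    shifted `drift` seconds earlier (0 <= drift < active_refresh)."""
    # Queries at start - drift + k * active_refresh for k >= 1; accept at the
    # first one that is at least `hold` seconds after `start`.
    k = ceil_div(hold + drift, active_refresh)
    return start - drift + k * active_refresh


def worst_client(active_refresh: int, hold: int = HOLD) -> int:
    """Latest accepting query over all one-second drift values, for the client
    whose first post-T query is a full activeRefresh after T (T = 0)."""
    start = active_refresh
    return max(accept_time(start, active_refresh, hold, d)
               for d in range(active_refresh))


for ar_hours in (12, 7):
    ar = ar_hours * HOUR
    perfect_overshoot = accept_time(ar, ar, HOLD, 0) - (ar + HOLD)
    gap_to_bound = (2 * ar + HOLD) - worst_client(ar)
    print(ar_hours, perfect_overshoot, gap_to_bound)
# 12 0 1
# 7 3600 1
```

Whether or not 30 days divides evenly by activeRefresh, the worst case is governed by the second activeRefresh term; the divisibility correction only matters for a client that never drifts.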

Later, Mike

>
> Cheers,
> Wes