Re: [Int-area] Adam Roach's Discuss on draft-ietf-intarea-provisioning-domains-10: (with DISCUSS and COMMENT)

Adam Roach <adam@nostrum.com> Thu, 23 January 2020 00:09 UTC

To: Tommy Pauly <tpauly@apple.com>
Cc: The IESG <iesg@ietf.org>, ek@loon.com, draft-ietf-intarea-provisioning-domains@ietf.org, int-area@ietf.org, intarea-chairs@ietf.org
References: <157967080772.28909.16443816599872682093.idtracker@ietfa.amsl.com> <6AFB6A09-59BF-411D-816F-914BAAF86A9B@apple.com> <a1daf959-3331-e86d-2734-1f63a98d7625@nostrum.com> <BF4953C0-2502-4E08-B8B3-B55D04475416@apple.com> <3c3fb029-be06-02a2-1ac2-d23a3183d09a@nostrum.com> <7BBB92DD-7C0D-4A30-AE5D-3DB6A8424B9A@apple.com>
From: Adam Roach <adam@nostrum.com>
Message-ID: <4b2b529f-9e67-b6d0-1a9c-b6ad5cd96f01@nostrum.com>
Date: Wed, 22 Jan 2020 18:08:51 -0600
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/eUJ2MYknxd3YDtsMPwP_eLCwSwE>

Thanks again for the quick turn-around on this.

Using your proposed 2**(Delay + 10) seems to strike an okay balance, if
I'm understanding the situation correctly. Double-check my thinking
here: an attacker's RAs can reach hosts only on a single local link,
which deployments typically limit to on the order of 500 clients. If
all 500 are triggered at the same time and smooth their requests out
over a one-second window, we're looking at a 500 TPS load on a web
server. That's about 25% of the capacity of a relatively low-end web
server (e.g., Apache running on a 1.66 GHz Atom), which seems small
enough to avoid major issues.
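
To sanity-check that arithmetic, here is the back-of-the-envelope
version as a tiny Python sketch; the 500-client figure and the 2,000
requests/second capacity are the assumptions from the paragraph above,
not measurements:

    # Back-of-the-envelope check of the worst-case burst described above.
    # All numbers are assumptions, not measurements.
    clients_on_link = 500        # typical upper bound for one local link
    spread_window_s = 1.0        # smallest window with 2**(10 + Delay) ms, Delay = 0
    low_end_capacity_tps = 2000  # assumed capacity of a low-end web server

    burst_tps = clients_on_link / spread_window_s
    share = 100 * burst_tps / low_end_capacity_tps
    print(f"burst: {burst_tps:.0f} requests/s ({share:.0f}% of assumed capacity)")
    # -> burst: 500 requests/s (25% of assumed capacity)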

So, unless one of my assumptions above is wrong, I think your proposal 
below is a good solution to the issue. I'll clear my DISCUSS when a new 
version of the draft comes out (I would propose that you wait for 
instructions from your AD about when to do so).

/a

On 1/22/20 17:51, Tommy Pauly wrote:
> Hi Adam,
>
> Thanks for the feedback! The updated paragraph in the retrieval 
> section, to indicate a maximum failure count per attachment, is:
>
>     If the request for PvD Additional Information fails due to a TLS error,
>     an HTTP error, or because the retrieved file does not contain valid PvD
>     JSON, hosts MUST close any connection used to fetch the PvD Additional
>     Information, and MUST NOT request the information for that PvD ID again
>     for the duration of the local network attachment. If a host detects 10
>     or more such failures to fetch PvD Additional Information, the local
>     network is assumed to be misconfigured or under attack, and the host
>     MUST NOT make any further requests for PvD Additional Information,
>     belonging to any PvD ID, for the duration of the local network
>     attachment. For more discussion, see {{security}}.
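>
>     (Purely as an illustration of the host-side bookkeeping this rule
>     implies; the class and method names below are made up, not from the
>     draft:)
>
>         # Illustrative per-attachment state; reset whenever the host
>         # attaches to a new local network. Hypothetical names only.
>         MAX_FAILURES_PER_ATTACHMENT = 10
>
>         class PvdFetchState:
>             def __init__(self):
>                 self.failed_pvd_ids = set()  # PvD IDs we will not retry
>                 self.failure_count = 0       # total failures this attachment
>
>             def should_fetch(self, pvd_id):
>                 if self.failure_count >= MAX_FAILURES_PER_ATTACHMENT:
>                     return False  # network assumed misconfigured or attacked
>                 return pvd_id not in self.failed_pvd_ids
>
>             def record_failure(self, pvd_id):
>                 # Called on a TLS error, an HTTP error, or invalid PvD JSON.
>                 self.failed_pvd_ids.add(pvd_id)
>                 self.failure_count += 1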
>
> I've also expanded the security considerations DoS section as follows:
>
>     An attacker generating RAs on a local network can use the H-flag and the
>     PvD ID to cause hosts on the network to make requests for PvD Additional
>     Information from servers. This can become a denial-of-service attack, in
>     which an attacker can amplify its attack by triggering TLS connections to
>     arbitrary servers in response to sending UDP packets containing RA
>     messages. To mitigate this attack, hosts MUST:
>
>     - limit the rate at which they fetch a particular PvD's Additional
>       Information;
>     - limit the rate at which they fetch any PvD Additional Information on a
>       given local network;
>     - stop making requests for a PvD ID that does not respond with valid JSON;
>     - stop making requests for all PvD IDs once a certain number of failures
>       is reached on a particular network.
>
>     Details are provided in {{retr}}. This attack can be targeted at generic
>     web servers, in which case the host behavior of no longer sending
>     requests to any server that doesn't behave like a PvD Additional
>     Information server is critical. Limiting requests for a specific PvD ID
>     might not be sufficient if the attacker changes the PvD ID values
>     quickly, so hosts also need to stop requesting if they detect consistent
>     failures while on a network that is under attack. For cases in which an
>     attacker is pointing hosts at a valid PvD Additional Information server
>     (but one that is not actually associated with the local network), the
>     server SHOULD reject any requests that do not originate from the
>     expected IPv6 prefix, as described in {{serverop}}.
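>
>     (Purely as an illustration of that last server-side check; the prefix
>     value, function name, and status codes below are made up, not from the
>     draft:)
>
>         # Sketch: reject requests whose source address falls outside the
>         # IPv6 prefix this PvD Additional Information server expects to
>         # serve. Illustrative only.
>         from ipaddress import ip_address, ip_network
>
>         EXPECTED_PREFIX = ip_network("2001:db8:1234::/48")  # example prefix
>
>         def check_source(client_addr: str) -> int:
>             """Return an HTTP status: 200 to proceed, 403 to reject."""
>             if ip_address(client_addr) in EXPECTED_PREFIX:
>                 return 200
>             return 403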
>
> For the delay calculation, you make a good point that the larger
> values get unnecessarily large! I'm a bit concerned about making the
> smallest possible delay window ~4 seconds, as that could end up being
> user-visible in some valid scenarios. How about making the formula
> "2**(10 + Delay)":
>
>     The target time for the delay is calculated
>     as a random time between zero and 2**(10 + Delay) milliseconds,
>     where 'Delay' corresponds to the 4-bit unsigned integer in
>     the last received PvD Option.
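>
>     (A literal reading of that formula as a tiny sketch; the function name
>     is made up:)
>
>         # Random fetch delay per the formula above; 'delay_field' is the
>         # 4-bit unsigned integer from the last received PvD Option.
>         import random
>
>         def fetch_delay_ms(delay_field: int) -> float:
>             assert 0 <= delay_field <= 15
>             upper_bound_ms = 2 ** (10 + delay_field)  # 1024 ms up to ~9.3 hours
>             return random.uniform(0, upper_bound_ms)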
>
> This makes 1 second the fastest delay window an RA can request. That
> isn't incredibly fast, and combined with the overall limits on how many
> requests a client can make (which provide the larger portion of the DoS
> prevention, I'd argue), I think this strikes a good balance between
> usability and precaution. Thoughts?
>
> I've updated the GitHub text for anyone wanting to see the full flow: 
> https://github.com/IPv6-mPvD/mpvd-ietf-drafts/pull/25
>
> Thanks,
> Tommy
>
>> On Jan 22, 2020, at 2:58 PM, Adam Roach <adam@nostrum.com> wrote:
>>
>> Thanks for the explanation and the further proposed mitigation.
>>
>> Allowing the RA to specify an arbitrarily small "Delay" parameter 
>> seems to still allow for a pretty big burst of traffic. If I read the 
>> proposed interpretation of the "Delay" bits correctly (2**(Delay * 
>> 2)), the current behavior is specified to allow a delay upper bound 
>> selected from one of the following (approximate) values:
>>
>>   * 1 ms
>>   * 4 ms
>>   * 16 ms
>>   * 64 ms
>>   * 256 ms
>>   * 1 second
>>   * 4 seconds
>>   * 16 seconds
>>   * 1 minute
>>   * 4 minutes
>>   * 17 minutes
>>   * 70 minutes
>>   * 4 hours, 40 minutes
>>   * 18 hours, 38 minutes
>>   * 3 days, 3 hours
>>   * 1 week, 5 days
>>
>>
>> That's a pretty breathtaking scope, and it's hard to imagine that the
>> first six or so are strictly needed, yet all six are in a range that
>> might overload a DDoS target. The final several seem a bit
>> questionable as well, given normal operational timelines for network
>> attachment. If the formula were revised to, e.g., "2**(Delay + 12)"
>> instead of the current one, you would have an enforced lower bound of
>> roughly four seconds (which should be enough to blunt most DDoS
>> attacks) and an upper bound of roughly 37 hours (which still seems
>> excessive, although not quite as much as the previous upper bound).
>>
>> Assuming the additional mitigation you propose below (a maximum of 10
>> failures per attachment) as well as some means of enforcing a lower
>> bound for "Delay" on the order of multiple seconds, I think I'm good
>> to clear when a new version comes out.
>>
>> Thanks for your work in thinking through practical solutions to this 
>> issue.
>>
>> /a
>>
>