Re: [Int-area] Adam Roach's Discuss on draft-ietf-intarea-provisioning-domains-10: (with DISCUSS and COMMENT)

Tommy Pauly <tpauly@apple.com> Wed, 22 January 2020 23:51 UTC

Sender: tpauly@apple.com
From: Tommy Pauly <tpauly@apple.com>
Message-id: <6A108BAC-9748-4A93-9909-ACD87E8059B4@apple.com>
Date: Wed, 22 Jan 2020 15:50:54 -0800
In-reply-to: <3c3fb029-be06-02a2-1ac2-d23a3183d09a@nostrum.com>
Cc: The IESG <iesg@ietf.org>, ek@loon.com, draft-ietf-intarea-provisioning-domains@ietf.org, int-area@ietf.org, intarea-chairs@ietf.org
To: Adam Roach <adam@nostrum.com>
References: <157967080772.28909.16443816599872682093.idtracker@ietfa.amsl.com> <6AFB6A09-59BF-411D-816F-914BAAF86A9B@apple.com> <a1daf959-3331-e86d-2734-1f63a98d7625@nostrum.com> <BF4953C0-2502-4E08-B8B3-B55D04475416@apple.com> <3c3fb029-be06-02a2-1ac2-d23a3183d09a@nostrum.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/XHROCUF66zV0XQzWFcJJkzRnbGc>
Subject: Re: [Int-area] Adam Roach's Discuss on draft-ietf-intarea-provisioning-domains-10: (with DISCUSS and COMMENT)
Precedence: list

Hi Adam,

Thanks for the feedback! The updated paragraph in the retrieval section, to indicate a maximum failure count per attachment, is:

If the request for PvD Additional Information fails due to a TLS error,
an HTTP error, or because the retrieved file does not contain valid PvD JSON,
hosts MUST close any connection used to fetch the PvD Additional Information,
and MUST NOT request the information for that PvD ID again for the duration
of the local network attachment. If a host detects 10 or more such failures
to fetch PvD Additional Information, the local network is assumed to be
misconfigured or under attack, and the host MUST NOT make any further
requests for PvD Additional Information, belonging to any PvD ID, for
the duration of the local network attachment. For more discussion, see {{security}}.
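The failure rule above can be modeled as a small piece of per-attachment host state. This is a minimal sketch, not the draft's normative algorithm. The names `AttachmentFetchState`, `may_fetch`, and `record_failure` are hypothetical. The sketch assumes the host creates a fresh instance on every new network attachment and calls `record_failure` after closing the connection on a TLS error, an HTTP error, or invalid PvD JSON.

```python
from dataclasses import dataclass, field
from typing import Set

# Threshold from the proposed text: 10 or more failures ends all fetching.
MAX_FAILURES_PER_ATTACHMENT = 10


@dataclass
class AttachmentFetchState:
    """Fetch-failure state for one local network attachment (hypothetical)."""

    failed_pvd_ids: Set[str] = field(default_factory=set)
    failure_count: int = 0

    def may_fetch(self, pvd_id: str) -> bool:
        # Network assumed misconfigured or under attack: stop fetching for any PvD ID.
        if self.failure_count >= MAX_FAILURES_PER_ATTACHMENT:
            return False
        # Never re-request a PvD ID whose fetch already failed on this attachment.
        return pvd_id not in self.failed_pvd_ids

    def record_failure(self, pvd_id: str) -> None:
        # The caller is assumed to have already closed the connection after a
        # TLS error, an HTTP error, or a response that is not valid PvD JSON.
        self.failed_pvd_ids.add(pvd_id)
        self.failure_count += 1
```

On a new attachment, the host would discard the old object and start again from `AttachmentFetchState()`, which resets both the per-PvD and the global limits.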

I've also expanded the security considerations DoS section as follows:

An attacker generating RAs on a local network can use the H-flag and the PvD ID
to cause hosts on the network to make requests for PvD Additional Information
from servers. This can become a denial-of-service attack, in which an attacker
can amplify its attack by triggering TLS connections to arbitrary servers in response
to sending RA messages, which are small ICMPv6 packets. To mitigate this attack, hosts
MUST:

- limit the rate at which they fetch a particular PvD's Additional Information;
- limit the rate at which they fetch any PvD Additional Information on a given local
network;
- stop making requests for a PvD ID that does not respond with valid JSON;
- stop making requests for all PvD IDs once a certain number of failures is reached
on a particular network.

Details are provided in {{retr}}. This attack can be targeted at generic web servers,
in which case it is critical that hosts stop making requests to any server that doesn't
behave like a PvD Additional Information server. Limiting requests for
a specific PvD ID might not be sufficient if the attacker changes the PvD ID values
quickly, so hosts also need to stop making requests altogether if they detect consistent
failures, which indicate that the network is misconfigured or under attack. For cases in which an attacker is pointing hosts at
a valid PvD Additional Information server (but one that is not actually associated
with the local network), the server SHOULD reject any requests that do not originate
from the expected IPv6 prefix as described in {{serverop}}.
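The first two MUST items in the list above can be sketched as a sliding-window limiter. This version uses the RECOMMENDED values from the Section 4.1 text quoted further down in this thread: at least 10 seconds between requests for a given PvD, and no more than five requests in total on a network within 10 seconds. `FetchRateLimiter` and its injectable clock are hypothetical illustrations. A real host would also consult its failure state before fetching.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# RECOMMENDED values from the proposed Section 4.1 text.
PER_PVD_MIN_INTERVAL_S = 10.0
NETWORK_WINDOW_S = 10.0
NETWORK_MAX_REQUESTS = 5


class FetchRateLimiter:
    """Per-PvD and per-network request limits for one network attachment."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._last_fetch: Dict[str, float] = {}  # PvD ID -> last request time
        self._recent: Deque[float] = deque()     # request times on this network

    def try_acquire(self, pvd_id: str) -> bool:
        """Return True and record the request if both limits allow it."""
        now = self._clock()
        # Forget network-wide requests that have left the sliding window.
        while self._recent and now - self._recent[0] >= NETWORK_WINDOW_S:
            self._recent.popleft()
        if len(self._recent) >= NETWORK_MAX_REQUESTS:
            return False
        last = self._last_fetch.get(pvd_id)
        if last is not None and now - last < PER_PVD_MIN_INTERVAL_S:
            return False
        self._last_fetch[pvd_id] = now
        self._recent.append(now)
        return True
```

Injecting the clock keeps the limiter deterministic in tests. A host would call `try_acquire` only after the random delay has elapsed, and skip the fetch when it returns False.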

For the delay calculation, you make a good point that the larger values get unnecessarily large! I'm a bit concerned about making the smallest delay upper bound ~4 seconds, as that could end up being user-visible in some valid scenarios. How about making the formula "2**(10 + Delay)":

The target time for the delay is calculated
as a random time between zero and 2**(10 + Delay) milliseconds,
where 'Delay' corresponds to the 4-bit unsigned integer in
the last received PvD Option.

This means the shortest delay upper bound an RA can request is about 1 second (2**10 ms = 1,024 ms). This isn't incredibly fast, and with the overall limits on how many requests a client can make (which provide the larger portion of the DoS prevention, I'd argue), I think this strikes a good balance between usability and precaution. Thoughts?
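A host-side sketch of the revised delay computation follows. It assumes the 4-bit Delay value has already been parsed from the last received PvD Option. `fetch_delay_ms` is a hypothetical name, and `random.uniform` stands in for whatever random source an implementation actually uses.

```python
import random
from typing import Optional


def fetch_delay_ms(delay_field: int, rng: Optional[random.Random] = None) -> float:
    """Pick a random fetch delay between 0 and 2**(10 + Delay) milliseconds."""
    if not 0 <= delay_field <= 15:
        # The PvD Option carries Delay as a 4-bit unsigned integer.
        raise ValueError("Delay must be in the range 0..15")
    upper_ms = 2 ** (10 + delay_field)
    source = rng if rng is not None else random.Random()
    return source.uniform(0, upper_ms)
```

With Delay = 0 the upper bound is 2**10 = 1,024 ms. With Delay = 15 it is 2**25 = 33,554,432 ms, roughly 9.3 hours.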

I've updated the GitHub text for anyone wanting to see the full flow: https://github.com/IPv6-mPvD/mpvd-ietf-drafts/pull/25

Thanks,
Tommy

> On Jan 22, 2020, at 2:58 PM, Adam Roach <adam@nostrum.com> wrote:
> 
> Thanks for the explanation and the further proposed mitigation.
> 
> Allowing the RA to specify an arbitrarily small "Delay" parameter seems to still allow for a pretty big burst of traffic. If I read the proposed interpretation of the "Delay" bits correctly (2**(Delay * 2)), the current behavior is specified to allow a delay upper bound selected from one of the following (approximate) values:
> 
> 1 ms
> 4 ms
> 16 ms
> 64 ms
> 256 ms
> 1 second
> 4 seconds
> 16 seconds
> 1 minute
> 4 minutes
> 17 minutes
> 70 minutes
> 4 hours, 40 minutes
> 18 hours 38 minutes
> 3 days, 3 hours
> 1 week, 5 days
> 
> That's a pretty breathtaking scope, and it's hard to imagine that the first six or so are strictly needed, while all six are in a range that might overload a DDoS target. The final several seem a bit questionable as well, given normal operational timelines for network attachment. If the formula were revised to, e.g., "2**(Delay + 12)" instead of the current formula, you would have an enforced lower bound of roughly four seconds (which should be enough to blunt most DDoS attacks), and an upper bound of roughly 37 hours (which still seems excessive, although not quite as much as the previous upper bound).
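The three candidate formulas discussed in this thread can be compared by computing the range of delay upper bounds over all 16 values of the 4-bit Delay field. This is an illustrative sketch. `FORMULAS` and `bound_range_ms` are hypothetical names.

```python
# Candidate formulas for the delay upper bound, in milliseconds.
FORMULAS = {
    "2**(2*Delay)": lambda d: 2 ** (2 * d),      # reading of the current text
    "2**(Delay + 12)": lambda d: 2 ** (d + 12),  # alternative suggested above
    "2**(10 + Delay)": lambda d: 2 ** (10 + d),  # alternative proposed in the reply
}


def bound_range_ms(formula):
    """Return (smallest, largest) upper bound in ms over all 4-bit Delay values."""
    bounds = [formula(d) for d in range(16)]  # Delay is 0..15
    return min(bounds), max(bounds)


if __name__ == "__main__":
    for name, formula in FORMULAS.items():
        lo, hi = bound_range_ms(formula)
        print(f"{name}: min {lo} ms, max {hi / 3_600_000:.1f} h")
    # 2**(2*Delay): min 1 ms, max 298.3 h
    # 2**(Delay + 12): min 4096 ms, max 37.3 h
    # 2**(10 + Delay): min 1024 ms, max 9.3 h
```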
> 
> Assuming the additional mitigation you propose below (10 maximum failures per attachment) as well as some means of achieving a lower-bound for "Delay" on the order of multiple seconds, I think I'm good clearing when a new version comes out.
> 
> Thanks for your work in thinking through practical solutions to this issue.
> 
> /a
> 
> On 1/22/20 16:22, Tommy Pauly wrote:
>> Hi Adam,
>> 
>> Thanks for taking a look! I'd like to avoid adding extra checks, such as looking for particular DNS records, to avoid deployment complexity and more opportunities for incomplete configuration. As such, I'd like to dig into this a bit further.
>> 
>> If the attacker in this case is a rogue actor on a local network sending out RAs on their local link, any given attacking host would be restricted in their scope to the devices they can reach. Of course, there could be coordination across many different local networks simultaneously, but that also requires more work on the side of the attacker.
>> 
>> The reason for the delay was to limit the impact on a relatively low-powered and unsophisticated local HTTPS server for serving PvD information, which may itself be on the router. I imagine that any large web server deployment would not have any issue with the load generated from a particular local network. Specifically, if we limit any given host to only a few requests within a 10-second window on a given network, and the number of hosts on the network is bounded, the number of opportunities for the attacker to cause load on the servers is limited.
>> 
>> Another option, to address the remaining concern about hitting wildcarded hosts, is to simply say that if the host keeps receiving PvD IDs with bogus (failing) servers, it disables all fetching of additional information for the duration of the network attachment. Networks that do implement better control over RAs (RA-Guard, etc.) presumably won't have this issue, and since the additional info is optional, it shouldn't cause any major connectivity issues.
>> 
>> If we require such a limit (you only get to fail to fetch 10 times total per attachment, say), does that mitigate things?
>> 
>> Thanks,
>> Tommy
>> 
>>> On Jan 22, 2020, at 1:51 PM, Adam Roach <adam@nostrum.com> wrote:
>>> 
>>> Thanks! The new text is good, but I don't think it's sufficient. I have two remaining concerns in particular:
>>> 1. The mitigation for wildcarded web hosts appears inadequate, especially given that the mechanism clearly anticipates a scale at which it can generate a *single* short, torrential burst sufficient to knock an average server over (hence the random delay mechanism for fetching data over HTTP). Given that fact, simple rate-limiting will never be enough if a single tight burst of traffic can be orchestrated.
>>> 2. The more I think about it, the more I believe the TXT-based opt-in solution I proposed in my earlier email is a reasonable approach to protect general-purpose web servers from PvD-client-based attacks.
>>> 
>>> One further comment inline below.
>>> 
>>> /a
>>> 
>>> On 1/22/20 15:17, Tommy Pauly wrote:
>>>> Hi Adam,
>>>> 
>>>> Thanks again for bringing this up! I've updated our text to include mitigations for this attack. It can be found here (https://github.com/IPv6-mPvD/mpvd-ietf-drafts/pull/25), but here's an overview of the proposed text:
>>>> 
>>>> In Section 4.1, I've added two new paragraphs. The first describes time limits on fetching PvD info:
>>>> 
>>>> In addition to adding a random delay when fetching Additional Information, hosts
>>>> MUST enforce a minimum time between requesting Additional Information
>>>> for a given PvD on the same network. This minimum time is RECOMMENDED
>>>> to be 10 seconds, in order to avoid hosts causing a denial-of-service on the
>>>> PvD server. Hosts also MUST limit the number of requests that are made to
>>>> different PvD Additional Information servers on the same network within a short
>>>> period of time. A RECOMMENDED value is to issue no more than five PvD
>>>> Additional Information requests in total on a given network within 10 seconds.
>>>> For more discussion, see {{security}}.
>>>> 
>>>> The second also specifies the behavior on failure, which is what will happen with non-PvD web servers:
>>>> 
>>>> If the request for PvD Additional Information fails due to a TLS error,
>>>> an HTTP error, or because the retrieved file does not contain valid PvD JSON,
>>>> hosts MUST close any connection used to fetch the PvD Additional Information,
>>>> and MUST NOT request the information for that PvD ID again for the duration
>>>> of the local network attachment. For more discussion, see {{security}}.
>>>> 
>>>> In addition, I added text to the Security Considerations:
>>>> 
>>>> An attacker generating RAs on a local network can use the H-flag and the PvD ID
>>>> to cause hosts on the network to make requests for PvD Additional Information
>>>> from servers. This can become a denial-of-service attack if not mitigated.
>>> 
>>> This doesn't really convey the amplification involved, which I think is highly relevant.
>>> 
>>> 
>>> 
>>>> To mitigate
>>>> this attack, hosts MUST limit the rate at which they fetch a particular PvD's
>>>> Additional Information, limit the rate at which they fetch any PvD Additional
>>>> Information on a given local network, and stop making requests to any PvD ID
>>>> that does not respond with valid JSON. Details are provided in {{retr}}. This attack
>>>> can be targeted at generic web servers, in which case it is critical that hosts stop
>>>> making requests to any server that doesn't behave like a PvD Additional Information
>>>> server. For cases in which an attacker is pointing hosts at a valid PvD Additional
>>>> Information server (but one that is not actually associated with the local network),
>>>> the server SHOULD reject any requests that do not originate from the expected IPv6
>>>> prefix as described in {{serverop}}.
>>>> 
>>>> The existing text referenced here about server behavior is:
>>>> 
>>>> The server providing the JSON files SHOULD also check whether the
>>>> client address is contained by the prefixes listed in the additional
>>>> information, and SHOULD return a 403 response code if there is no
>>>> match.
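The server-side check quoted above reduces to a prefix-membership test on the client address. This is a minimal sketch using Python's standard `ipaddress` module. `status_for_client` is a hypothetical helper. The sketch assumes the server already has the list of prefixes from the PvD's Additional Information, and that it returns 200 when it would serve the JSON file.

```python
import ipaddress
from typing import Iterable


def status_for_client(client_addr: str, pvd_prefixes: Iterable[str]) -> int:
    """Return 200 if client_addr falls inside a listed prefix, otherwise 403."""
    addr = ipaddress.ip_address(client_addr)
    for prefix in pvd_prefixes:
        # Membership is False when the address and network families differ.
        if addr in ipaddress.ip_network(prefix):
            return 200
    return 403
```

For example, if the Additional Information lists 2001:db8:1::/48, a client at 2001:db8:1::42 would be served, while a client at 2001:db8:2::42 would receive 403.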
>>>> 
>>>> Let me know if this addresses your concerns!
>>>> 
>>>> Best,
>>>> Tommy
>>>> 
>>>> On Jan 21, 2020, at 9:26 PM, Adam Roach via Datatracker <noreply@ietf.org> wrote:
>>>>> 
>>>>> ----------------------------------------------------------------------
>>>>> DISCUSS:
>>>>> ----------------------------------------------------------------------
>>>>> 
>>>>> Thanks to the authors and working group for their work on this document.  I
>>>>> have one major concern about the ability for this mechanism to be abused to
>>>>> form DDoS attacks, described below. Unfortunately, while I have identified the
>>>>> attack, I don't have an easy solution to propose that mitigates it satisfactorily.
>>>>> 
>>>>> I also have a handful of mostly editorial comments on the document.
>>>>> 
>>>>> ---------------------------------------------------------------------------
>>>>> 
>>>>> §6:
>>>>> 
>>>>> I was expecting to see a discussion of the DDoS attack that may result from a
>>>>> large network (or a rogue host on such a network) sending out a PvD ID
>>>>> containing the hostname of a victim machine, and setting the "H" flag.
>>>>> 
>>>>> Since the messages used to trigger these HTTP connections are extremely
>>>>> lightweight, unauthenticated ICMPv6 messages, and the resulting HTTP connections
>>>>> require the exchange of a significant number of packets in addition to a
>>>>> number of cryptographic operations, this is a very high ratio amplification
>>>>> attack, both in terms of network and CPU resources.
>>>>> 
>>>>> Given that the delay setting comes from the network instead of being
>>>>> independently computed by the host, such an attack could be honed to be
>>>>> particularly devastating.  Although it isn't a complete mitigation, one
>>>>> approach to consider would be moving computation of the delay upper bound to
>>>>> the host, or specifying a minimum upper bound of several minutes (where a
>>>>> smaller value will cause the host to use this minimum upper bound).
>>>>> 
>>>>> Regardless of how this is ultimately handled, I think this is a pretty severe
>>>>> risk that needs addressing in the document prior to publication.
>>>>> 
>>>> 
>>> 
>> 
>
