Re: Is the invariants draft really standards track?

Martin Duke <martin.h.duke@gmail.com> Sat, 20 June 2020 00:56 UTC

References: <CAM4esxQBqfrz24riPQA_VGKcGp_TzW0pqb97KfFMtNdW9pUfDg@mail.gmail.com> <833A693C-62E6-4889-9954-FCE65A839A7C@eggert.org> <CAKcm_gPMO2DtqvKucqVw0zDjSniSOmFD4p1Tp4YLjr9WSWdEUw@mail.gmail.com> <CAJU8_nUN42gGmQof24XD9-EjXedyzcarDyRP8MGe1qW-BZ=+Aw@mail.gmail.com> <9cd91c24-c730-22a4-7aa0-baf61613b3ce@huitema.net> <f4922cdb59014202900de44cc5fea0ff@usma1ex-dag1mb5.msg.corp.akamai.com> <CAM4esxQvwkTvpUcu6-+W5zWo22m-R1jvN7DcCpXfuw8Hb55qsw@mail.gmail.com> <95dd02c92b32472d9cab0dd47b98c637@usma1ex-dag1mb5.msg.corp.akamai.com> <CAM4esxQxxXn27rZEY75-AsHD5VF0fqiV1VDyeSrzQ=-sM7JNCg@mail.gmail.com> <9c2e300c30f74d1794d11cf4334ea07b@usma1ex-dag1mb5.msg.corp.akamai.com> <2c40f3d9-fa40-9834-ac30-36bc9a1a6303@huitema.net> <CA+9kkMBQt001xOVgT=9G8YOOTOJ+9S=OWDwGEeYuKVm46Cq3iQ@mail.gmail.com> <3bb42dfc-17ba-c5c8-03f7-35428756b4c2@huitema.net> <CAM4esxRWVRjVhxyYuuzwDGq_wfTjQHkY6KHG2rEPErO2aHXA0w@mail.gmail.com> <f9e2c611-bb4d-bc80-dfe3-e323a08bfc5b@huitema.net>
In-Reply-To: <f9e2c611-bb4d-bc80-dfe3-e323a08bfc5b@huitema.net>
From: Martin Duke <martin.h.duke@gmail.com>
Date: Fri, 19 Jun 2020 17:56:03 -0700
Message-ID: <CAM4esxQsZba_Ck80hNX23Yd6tZfgLTfG-VNWvBDhn9V7UfNnbg@mail.gmail.com>
Subject: Re: Is the invariants draft really standards track?
To: Christian Huitema <huitema@huitema.net>
Cc: Ted Hardie <ted.ietf@gmail.com>, Jared Mauch <jared@puck.nether.net>, Kyle Rose <krose@krose.org>, "Lubashev, Igor" <ilubashe=40akamai.com@dmarc.ietf.org>, Ian Swett <ianswett=40google.com@dmarc.ietf.org>, Lars Eggert <lars@eggert.org>, IETF QUIC WG <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/zuMIpP3tTqk-pFpCzyot76O0Isc>

Ah, I understand now. Thanks for explaining it again.

Oddly enough, the load-balancing part of QUIC-LB provides some DoS
protection by making it easier to check for valid conn IDs on 1RTT packets
further away from the server.
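
A rough Python sketch of what "checking for valid conn IDs further away from the server" could look like. This is NOT the actual QUIC-LB encoding: it assumes a hypothetical scheme in which each CID ends in a short keyed tag (truncated HMAC) that a device sharing the key can verify without any per-connection state. The names make_cid, cid_looks_valid and TAG_LEN are invented for illustration.

```python
import hashlib
import hmac

TAG_LEN = 4  # hypothetical tag length in bytes, not taken from any draft


def make_cid(key: bytes, nonce: bytes) -> bytes:
    """Server side: append a truncated HMAC of the nonce to form the CID."""
    tag = hmac.new(key, nonce, hashlib.sha256).digest()[:TAG_LEN]
    return nonce + tag


def cid_looks_valid(key: bytes, cid: bytes) -> bool:
    """Upstream device: recompute the tag and compare in constant time."""
    if len(cid) <= TAG_LEN:
        return False
    nonce, tag = cid[:-TAG_LEN], cid[-TAG_LEN:]
    expected = hmac.new(key, nonce, hashlib.sha256).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)
```

With a 4-byte tag, a randomly guessed CID passes the check with probability about 2^-32, which is the "hard to guess your way to a valid CID" property in miniature. A CID copied from one real connection would still pass, so this does not replace rate limiting.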

On Fri, Jun 19, 2020 at 4:52 PM Christian Huitema <huitema@huitema.net>
wrote:

>
> On 6/19/2020 4:00 PM, Martin Duke wrote:
>
> >  The problem of the DOS protection box is to drop as much traffic as
> possible while still letting some good in, and also without causing
> performance hits when there are no attacks. There are indeed some thorny
> traffic management issues there. More information helps, and Ben's
> suggestion to look at DOTS is certainly on point.
>
> I am not sure how any of this can work when inbound 1RTT packets can come
> with essentially random IP/port and CID and potentially be valid, given
> there might have been a migration. Some of the QUIC-LB modes at least make
> it hard to guess your way to a valid CID.
>
> The DOS box does not have to worry about what kind of traffic is coming
> in. It just has to open a context for the 5 tuple, and check whether it
> sees 1-RTT packets coming back. And then maybe count the volume of 1-RTT
> packets coming back.
>
> The worry is that one of the bots might start a legitimate connection,
> then disclose its five tuple to the rest of the botnet. The whole botnet
> can then spoof the 5 tuple that was just pin-holed by the DOS box. A simple
> "open-close" logic is thus not good enough. The DOS box must also enforce
> some kind of rate limiting per 5 tuple.
>
> Which also means that if a botnet can predict the 5 tuple used by a
> legitimate connection and then spoof it, it can DOS it. Once you start
> digging that particular rabbit hole, the joy never stops...
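
The "open a context per 5 tuple, watch for 1-RTT coming back, then rate limit" logic described above can be sketched roughly as follows. This is a minimal sketch under assumptions: the class names, the token-bucket choice, and the three return labels are invented for illustration and do not come from any draft or product.

```python
from dataclasses import dataclass


@dataclass
class FlowContext:
    validated: bool = False   # True once a 1-RTT packet from the server was seen
    return_packets: int = 0   # count of 1-RTT packets seen coming back
    tokens: float = 0.0       # token-bucket fill for inbound packets
    last_refill: float = 0.0  # time of the last token refill


class PinholeTable:
    """Hypothetical per-5-tuple state kept by a DoS protection box."""

    def __init__(self, rate_pps: float, burst: float):
        self.rate_pps = rate_pps
        self.burst = burst
        self.flows = {}

    def _context(self, five_tuple, now: float) -> FlowContext:
        ctx = self.flows.get(five_tuple)
        if ctx is None:
            ctx = FlowContext(tokens=self.burst, last_refill=now)
            self.flows[five_tuple] = ctx
        return ctx

    def on_server_one_rtt(self, five_tuple, now: float) -> None:
        """Return traffic from the server pin-holes the flow."""
        ctx = self._context(five_tuple, now)
        ctx.validated = True
        ctx.return_packets += 1

    def classify_inbound(self, five_tuple, now: float) -> str:
        """Return 'unvalidated', 'pass', or 'drop' for one inbound packet."""
        ctx = self._context(five_tuple, now)
        if not ctx.validated:
            return "unvalidated"  # caller may queue this at low priority
        elapsed = max(0.0, now - ctx.last_refill)
        ctx.tokens = min(self.burst, ctx.tokens + elapsed * self.rate_pps)
        ctx.last_refill = now
        if ctx.tokens >= 1.0:
            ctx.tokens -= 1.0
            return "pass"
        return "drop"  # pin-holed tuple is over its budget, possibly spoofed
```

The per-tuple bucket is what limits the damage when a botnet spoofs a pin-holed 5 tuple. It also shows the downside noted above: a flood on a legitimate tuple exhausts that tuple's budget and starves the real client.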
>
> -- Christian Huitema
>
>
>
> DOTS is an intriguing idea, but again I'm not sure what signatures one
> could use to configure it.
>
> On Fri, Jun 19, 2020 at 3:43 PM Christian Huitema <huitema@huitema.net>
> wrote:
>
>>
>> On 6/19/2020 3:05 PM, Ted Hardie wrote:
>>
>> Hi Christian,
>>
>> A question in-line,
>>
>> On Fri, Jun 19, 2020 at 2:49 PM Christian Huitema <huitema@huitema.net>
>> wrote:
>>
>>> When under DOS attack, you want to "minimize blowback", i.e., as much as
>>> possible avoid generating packets in response to attack traffic. So, yes, a
>>> server may choose to not send stateless resets to anyone when under attack;
>>> in fact, my recommendation would be that a server SHOULD NOT send stateless
>>> resets to anyone when under attack.
>>>
>>> That said, Igor raised an interesting point about return traffic. It
>>> would be very nice if DOS protection boxes could distinguish between
>>> "validated traffic" that the server presumably intends to process, and
>>> "unsolicited traffic" that will just consume resource. The box can then
>>> reserve some share of the resource for validated traffic, and place the
>>> rest of the traffic in a lower priority queue. Fine, but there needs to be
>>> a test. The classic test is that incoming traffic is "validated" if the
>>> protection box can match it with return traffic coming from the server --
>>> for some definition of matching.
>>>
>>> From that point of view, stateless reset is definitely not helpful. But
>>> problematic traffic goes beyond that. The server will reply to a client's
>>> initial with a server's initial packet. Does that validate the response
> >>> traffic? OK, maybe the protection box can be programmed to only validate
>>> traffic if it sees 1RTT packets. But many servers will send 1-RTT packets
>>> as 0.5 RTT. Does that validate the response traffic?
>>>
>>> We might say that traffic is validated when the handshake is confirmed,
>>> but the protection box does not understand the TLS handshake, it just sees
>>> packet types and packet sizes. It cannot distinguish between 0.5RTT data
>>> and 1RTT data, and thus the closest approximation of "validation" would be
>>> seeing more than an initial window worth of traffic coming back from the
>>> server. That does not sound great.
>>>
>>> On the other hand, things get much better if the server under attack can
>>> adopt a defensive posture and help the DOS protection box do its job.
>>> Suppose that a server can detect that it is under attack -- or be
>>> explicitly configured so. The simplest defensive posture would be to (1)
>>> disable stateless reset and (2) not send any 0.5RTT packet, including in
>>> response to 0-RTT. The protection boxes can at that point take the 1-RTT
>>> packets from the server as indicating validation.
>>>
>> Perhaps I'm misreading you here, but this sounds like you expect the
>> protection boxes to be able to distinguish when servers see themselves as
>> under DOS from when they don't, so that they can tell that the lack of
>> 0.5RTT is an indication of an attack response.  Given distribution patterns
>> of DOS attacks, I'm struggling to see whether that will commonly be the
>> case.
>>
>> Maybe I wasn't too clear. I don't think that DOS protection boxes are
>> trying to take directions from servers. I also don't believe that DOS
>> protection boxes have any practical means of distinguishing between 0.5 RTT
>> and 1 RTT traffic. That might actually be a problem.
>>
>>
>> You could, of course, always put handshake traffic into low-priority
>> queues until you see the 1-RTT packets that validate the server's
>> interest.  That would make 1-RTT traffic effectively a path signal of
>> validation.
>>
>> Yes, that's more or less what I was considering. But of course it only
>> works if the server refrains from sending 1RTT packets until the handshake
>> is confirmed. Otherwise, the protection boxes will have to count the number
>> of 1RTT packets coming back from the server, or maybe the amount of data
>> coming from the server, and only consider the traffic validated if number
>> of packets and amount of data are larger than common values of IW. You
>> might end up with 3 classes instead of 2 -- validated if 1RTT > IW,
> >> validation in progress if 1RTT seen but < IW, not validated yet if no 1RTT
>> seen.
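
The three classes described above can be written down as a small classifier. This is only a sketch: the IW thresholds below (10 packets, 10 x 1200 bytes) are assumed placeholder values, not numbers from the thread, and a real box would have to pick "common values of IW" itself.

```python
from enum import Enum


class Validation(Enum):
    NOT_VALIDATED = "no 1RTT seen"
    IN_PROGRESS = "1RTT seen but < IW"
    VALIDATED = "1RTT > IW"


# Assumed thresholds for illustration only.
ASSUMED_IW_PACKETS = 10
ASSUMED_IW_BYTES = 10 * 1200


def classify(one_rtt_packets: int, one_rtt_bytes: int) -> Validation:
    """Classify a flow from the 1-RTT traffic seen coming back from the server."""
    if one_rtt_packets == 0:
        return Validation.NOT_VALIDATED
    # Both the packet count and the byte count must exceed the assumed IW.
    if one_rtt_packets > ASSUMED_IW_PACKETS and one_rtt_bytes > ASSUMED_IW_BYTES:
        return Validation.VALIDATED
    return Validation.IN_PROGRESS
```

For example, classify(3, 3600) lands in IN_PROGRESS, since the server could still be sending 0.5-RTT data within its initial window.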
>>
>> My concern with that is that using those low priority queues during the
>> handshake phase seems likely to result in worse latency and increased risk
>> of packet loss (which can be tricky during that phase).  That seems a heavy
>> price to pay during non-attack times for better protection when under
>> attack.
>>
>> The problem of the DOS protection box is to drop as much traffic as
>> possible while still letting some good in, and also without causing
>> performance hits when there are no attacks. There are indeed some thorny
>> traffic management issues there. More information helps, and Ben's
>> suggestion to look at DOTS is certainly on point.
>>
>> I am just doing a thought experiment so far. One immediate observation is
>> that in the absence of other data, DOS protection designers are likely to
>> look at packet types, and certainly reason about 1RTT packets versus long
>> header packets. This is definitely an ossification risk.
>>
>> The other observation is that we have to be careful about the realism of
>> thought experiments. If I remember correctly, the payload of the Blaster
>> worm was something like:
>>
>>     On each bot
>>
>>         Open 256 threads
>>
>>             In each thread, loop on "GET some large page from the server"
>>
>> Reasoning about validated traffic would not stop that. But reasoning
>> about validated traffic forces the attacker to disclose the "real" IP
>> addresses of the attacking bots, which then enables a second line of
>> defense.
>>
>> -- Christian Huitema
>>
>>
>>
>>
>> Am I misunderstanding how this is distinguished?
>>
>> Clue appreciated,
>>
>> Ted
>>
>>
>>
>>> Maybe we should specify that.
>>>
>>> -- Christian Huitema
>>> On 6/19/2020 1:42 PM, Lubashev, Igor wrote:
>>>
>>> > There is no need for servers to decrypt CIDs in QUIC-LB. Presumably
>>> the server has a lookup table for its CIDs.
>>>
>>>
>>>
>>> Sending a stateless reset in response to a junk packet would cost more
>>> CPU than verifying CID integrity.  But, yes, a server may choose to not
>>> send stateless resets to anyone when under attack.
>>>
>>>
>>>
>>>
>>>
> >>> *From:* Martin Duke <martin.h.duke@gmail.com>
>>> *Sent:* Friday, June 19, 2020 2:44 PM
>>>
> >>> > Unfortunately, the Retry system protects only the server's memory state and
>>> some CPU cycles spent on crypto.  (Servers still need to decrypt CID to
>>> decide it is invalid, and if the attacker is clever enough to establish one
>>> valid connection and use that CID in a flood, the server will also be
>>> decrypting packets.)
>>>
>>>
>>>
>>> There is no need for servers to decrypt CIDs in QUIC-LB. Presumably the
>>> server has a lookup table for its CIDs.
>>>
>>>
>>>
>>> It is true that Retry Services (and indeed, the Retry concept as a
>>> whole) does nothing to protect network capacity.
>>>
>>>
>>>
>>> On Fri, Jun 19, 2020 at 8:08 AM Lubashev, Igor <ilubashe@akamai.com>
>>> wrote:
>>>
>>> It looks like
>>> https://tools.ietf.org/html/draft-ietf-quic-load-balancers-02#section-5
>>> is an excellent discussion of Retry mechanics.  It definitely deserves a
>>> reference from this manageability draft.
>>>
>>>
>>>
> >>> The Retry mechanisms described in the LB draft are all cooperating boxes,
> >>> and servers must be aware of them.  Unfortunately, the Retry system protects
> >>> only the server's memory state and some CPU cycles spent on crypto.  (Servers
>>> still need to decrypt CID to decide it is invalid, and if the attacker is
>>> clever enough to establish one valid connection and use that CID in a
>>> flood, the server will also be decrypting packets.)  Retry does nothing to
>>> protect network resources.
>>>
>>>
>>>
> >>> The PR I opened (https://github.com/quicwg/ops-drafts/pull/94)
>>> is about uncooperating devices that try to mitigate volumetric network
>>> attacks.
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Duke <martin.h.duke@gmail.com>
>>> *Sent:* Wednesday, June 17, 2020 8:16 PM
>>>
>>> Hi Igor, you might want to check out the "Retry Services" bit of the
>>> QUIC-LB draft. This has something to do with the DDoS use case you discuss.
>>>
>>>
>>>
>>> On Wed, May 27, 2020 at 9:07 AM Lubashev, Igor <ilubashe@akamai.com>
>>> wrote:
>>>
>>> I’m working on a manageability draft PR for this (how to rate limit UDP
>>> to reduce disruption to QUIC if you have to rate limit UDP).  ETA end of
>>> the week (if I do not get pulled into something again).
>>>
>>>
>>>
>>> The relevant observation is that DDoS with UDP that is indistinguishable
>>> from QUIC will happen.  UDP is already the most prevalent DDoS vector,
>>> since it is easy for a compromised non-admin app to send a flood of huge
>>> UDP packets (with TCP you get throttled by the congestion controller).  So
>>> there WILL be DDoS protection devices out there to try to mitigate the
>>> problem, possibly by observing both directions of the flow and deciding
>>> whether a packet belongs to a valid flow or not.
>>>
>>>
>>>
>>> Since such middle boxes will be created, the more explicit and normative
>>> Invariants are about what one can expect, the less such middle boxes may
>>> decide for themselves.  For example (I did not think long about it), if
>>> some elements of path validation could land into Invariants (roughly, “no
>>> more than X packets/bytes can be sent on a new path w/o a return packet”),
>>> a DDoS middle box may use this fact and active connection migration might
>>> still have a chance during an attack (NAT rebinding could be linked by DDoS
>>> boxes to an old connection via unchanged CID).
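
If an invariant of the form "no more than X bytes can be sent on a new path w/o a return packet" existed, a middle box could enforce it roughly as sketched below. The class name, the byte-based budget, and the way a "path" is keyed are all assumptions made for illustration, and X is left as a parameter because the thread does not give a value.

```python
class NewPathBudget:
    """Hypothetical middle-box check: cap bytes on a path until a return packet."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes  # the "X" in the text, supplied by the caller
        self.pending = {}           # path -> bytes seen before any return packet
        self.confirmed = set()      # paths on which a return packet was seen

    def on_forward(self, path, size: int) -> bool:
        """Return True if a client-to-server packet on this path may pass."""
        if path in self.confirmed:
            return True
        used = self.pending.get(path, 0) + size
        if used > self.max_bytes:
            return False
        self.pending[path] = used
        return True

    def on_return(self, path) -> None:
        """A server-to-client packet on this path lifts the cap."""
        self.confirmed.add(path)
        self.pending.pop(path, None)
```

A box that also keyed this on an unchanged CID could carry the "confirmed" status over a NAT rebinding, which is the linkage suggested above.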
>>>
>>>
>>>
>>>    - Igor
>>>
>>>
>>>
>>>
>>>
>>> *From:* Christian Huitema <huitema@huitema.net>
>>> *Sent:* Wednesday, May 27, 2020 11:34 AM
>>>
>>> On 5/27/2020 8:28 AM, Kyle Rose wrote:
>>>
>>> On Wed, May 27, 2020 at 10:34 AM Ian Swett <ianswett=
>>> 40google.com@dmarc.ietf.org> wrote:
>>>
>>> I was agreeing with MT, but I'm happy to see some more MUSTs added if
>>> people feel that'd be helpful.
>>>
>>>
>>>
>>> Coincidentally, we were just talking about this internally at Akamai
>>> yesterday. IMO, an invariants document isn't really helpful if it isn't
>>> normative, and for it to be normative it (or a related practices doc for
>>> operators) really needs to spell out clear boundaries for operators with
> >>> MUSTs.
>>>
>>>
>>>
>>> The example that came up yesterday was around operators filtering QUIC
>>> in the event of a DDoS: one recommendation based on some conversations
>>> going back at least to Prague 2019 was to hash packets on 4-tuple and
>>> filter those below a hash value chosen for a desired ingress limit instead
>>> of doing what most operators do with UDP today, which is to cap UDP
>>> throughput and just drop packets randomly or tail drop.
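
A minimal sketch of the hash-on-4-tuple filter described above, in Python. The function names, the SHA-256 choice, and the optional salt are assumptions for illustration. The point it shows is that every packet of a given flow gets the same verdict, so the flows that survive are left intact rather than each losing a random share of packets.

```python
import hashlib

HASH_SPACE = 2 ** 64  # the hash is reduced to a 64-bit integer


def flow_hash(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              salt: bytes = b"") -> int:
    """Map a 4-tuple to a stable 64-bit value."""
    data = salt + f"{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def should_drop(four_tuple, drop_fraction: float, salt: bytes = b"") -> bool:
    """Drop flows whose hash falls below the threshold for the desired limit."""
    threshold = int(drop_fraction * HASH_SPACE)
    return flow_hash(*four_tuple, salt=salt) < threshold
```

Raising drop_fraction tightens the ingress limit by widening the set of dropped flows. As the follow-up points out, anything that changes the 4-tuple mid-connection (migration, preferred address) can move a connection from the passing set to the dropped set.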
>>>
>>> Interesting. Did they consider using the CID, or a fraction of it? This
>>> looks entirely like the scenario for which we developed stateless retry.
>>>
>>>
>>>
>>> This recommendation certainly imposes some constraints on future
>>> protocol development that motivate new invariants: for instance, it would
>>> preclude sharding a connection across multiple source ports (not that there
>>> is necessarily a reason to do this; it's just an example). But more
>>> importantly, it goes beyond invariants: it's one among many practices
>>> compatible with the current set of invariants, some reasonable and some
>>> terrible.
>>>
>>> This would break the "preferred address" redirection. Preferred address
>>> migration may or may not be spelled out in the invariants.
>>>
>>>
>>>
>>> Operators are going to do things to QUIC traffic, so it would be good to
>>> offer them recommendations that are compatible with broad deployability.
>>>
>>>
>>>
>>> Yes, we do need the invariants for that.
>>>
>>> -- Christian Huitema
>>>
>>>