Re: Too many GREASE codepoints

Lucas Pardue <lucaspardue.24.7@gmail.com> Wed, 18 November 2020 00:17 UTC

Return-Path: <lucaspardue.24.7@gmail.com>
X-Original-To: quic@ietfa.amsl.com
Delivered-To: quic@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8BBB23A10C1 for <quic@ietfa.amsl.com>; Tue, 17 Nov 2020 16:17:56 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.847
X-Spam-Level:
X-Spam-Status: No, score=-1.847 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id T5Xp0MJ1W5dC for <quic@ietfa.amsl.com>; Tue, 17 Nov 2020 16:17:54 -0800 (PST)
Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id E185E3A10B7 for <quic@ietf.org>; Tue, 17 Nov 2020 16:17:53 -0800 (PST)
Received: by mail-ed1-x536.google.com with SMTP id t9so88098edq.8 for <quic@ietf.org>; Tue, 17 Nov 2020 16:17:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=WeQGnJWVtyIE5dgpANM5jx0HOqQLSkQPcO4XjOFVvy0=; b=fXoSDK4VNL5YTbObzkh7GsqHx6T/ZZcsevBHGPRsKfeSgayiFeaIDVEhfC/BDh2MNQ 7gzYAESmhWGA71390k/44BMSzSht91SgvyHZUGXqITGWJscFZwby+I0wy6DN5vuNjcV1 ByXU4ORxFc7pQI78V2eNqtsAdYLhg+DlAkMs1Zl2Kkz7Y0ugGieKSecBhIBc/p7x9Y7x bdx0HbFS+B7TZ7nhwPD5wvLL/CVf1xJTE8xI6L+l2zvjVmfEsVanWr55rK/MqeKzfNA4 lwwdWqqx8a8vDdDfWPyxHqtSYayA1d7E9gIV4FBUaeOEciTB+JpTJs8TIZhxwNvoUb2S WyqA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=WeQGnJWVtyIE5dgpANM5jx0HOqQLSkQPcO4XjOFVvy0=; b=MrPI9VO94voxgHKV7Uok2eTSepWPaCo/gXFjm/QFai60u0PZrGuwg0f0RfKbpCRRr5 hUwLIelzcOvM5iyXWkCd/sLIoDbbLlMmKUR2zjuQ5Ne7MnFQ02Zr4uSTlW6b1rJ52puz wgjF8BDxLfmsQuWJPYOPLlUNfj5tG4KMmx2hUYQSoVDKktvsRk3kTWRvkRuk1sZCVQYH +qg+7Bu4Aobl1aeoaE/ya7wGXF0NNe8DFLQgLhp2RpAoGGNR3upPlHFCorg/n7c8lXds MqH2KpoPyeYdm18MizU3+AZVbESrseKriJQrXzDzgaRP7DwDkDDDTr9kDWa63ZIYL84q rpzA==
X-Gm-Message-State: AOAM531idJ4Gai3ucyieC4Fa7EHbm9Fag05UrgcMbNBaqTs7XVASiFlh 5Hof6kniLAYRzQbw98VBzqZQkVfNb9p51QhoABA=
X-Google-Smtp-Source: ABdhPJxrfuntzlsAnQsEjOcdYriUuvQsN35w5CIgdMySjb29a8B/HNNrhnpbBDQ5qjCMIoDKHnwzT3WkfGSyLYi9dW0=
X-Received: by 2002:a50:af65:: with SMTP id g92mr8122760edd.273.1605658672190; Tue, 17 Nov 2020 16:17:52 -0800 (PST)
MIME-Version: 1.0
References: <59a19e58-ced4-4a38-89da-e9441df162fb@www.fastmail.com> <2a594580-dae0-a678-b5e9-81f040cfa264@huitema.net>
In-Reply-To: <2a594580-dae0-a678-b5e9-81f040cfa264@huitema.net>
From: Lucas Pardue <lucaspardue.24.7@gmail.com>
Date: Wed, 18 Nov 2020 00:17:39 +0000
Message-ID: <CALGR9oZ9_G2rGrpz=qH3j1+17rd8o8gsYk1EHq_9fRLO2q_LFQ@mail.gmail.com>
Subject: Re: Too many GREASE codepoints
To: Christian Huitema <huitema@huitema.net>
Cc: Martin Thomson <mt@lowentropy.net>, QUIC WG <quic@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000c95acc05b4568aba"
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/51-_pUjUEagLuu22awHamOszaK8>
X-BeenThere: quic@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Main mailing list of the IETF QUIC working group <quic.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/quic>, <mailto:quic-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/quic/>
List-Post: <mailto:quic@ietf.org>
List-Help: <mailto:quic-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/quic>, <mailto:quic-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 18 Nov 2020 00:17:57 -0000

I don't know.

This is just a proposal but do you know if the new grease code points would
collide with any existing extensions in QUIC or HTTP/3? Some short term
pain should not preclude a design change but I'd be more confident knowing
either way.

Having written some extension I-D, the current process of code point
generation is simple but tedious. The process I came to was to extract the
quiche implementation grease code into a standalone application, run it a
couple of times to find something not ugly (but not vanity) and then add 1
to it. That "buys me some land" that I can use for successive
Internet-Draft revisions, each new one simply adds 1 again. Even then I had
some comments that my long codepoint was a bit annoying and so I relented
to shortening the code even further by lopping off N top bits. I took care
to check for grease overlap again but that's a step easily missed.

I'm not advocating my process but it highlights some definciencies in the
UX of random codepoint selection. There's some game theory here wrt to
boundary conditions. Starting on work with a random value
lower-than-but-too close to a grease point risks some discontinuity if your
aim is to use successive codepoints. It's tempting to clamp to the other
side of the boundary (or natural divisions within each grease point gap) in
the name of safety.

I wonder if we could address some of these problems with assistive
technologies, like hosting a JavaScript generator on quicwg.org that does
some collision avoidance based on registries.

Cheers
Lucas

On Tue, Nov 17, 2020 at 11:53 PM Christian Huitema <huitema@huitema.net>
wrote:

>
> On 11/17/2020 3:41 PM, Martin Thomson wrote:
> > (I've been meaning to write this for a few days, but the IANA review
> prompted me to sit down and work through it.)
> >
> > I have come to believe that we are reserving too many codepoints for
> greasing.
> >
> > The greasing function we currently use is 27+N*31.  That's about 2^57
> values.  148764065110560896 by my estimation.
> >
> > The waste I can tolerate.  This is a very large number, but it leaves
> plenty of values free for use.  The problem is that the odds of collision
> are too low.
> >
> > This problem was highlighted when draft-iyengar-quic-delayed-ack-02
> chose a greased codepoint.  The failure mode there is terrible:
> occasionally an implementation will generate a greased value that collides
> with this value.  If an implementation of that draft encounters that
> collision it will fail.  It will probably fail when trying to parse the
> transport parameter, but it could instead fail when it receives the frame.
> Alternatively, is also plausible that an implementation of the draft could
> include a greased value with the same codepoint, which would be similarly
> bad.  Such failures are virtually impossible to reproduce and unless you
> are lucky enough to have good logging, you might have trouble catching the
> reason for the error.
> >
> > The odds that an implementation of that draft would encounter a peer who
> used a greased codepoint with the same value is minuscule.  There are
> around 2^57 valid greasing values, so even if every connection ever
> included greased values, the odds of a given connection containing that
> value is tiny. (Coincidentally, this is the same as the probability we
> allow for an attacker to attack the packet protection AEAD.)
> >
> > I think that we want a higher risk of collision than that.  Then
> mistakes like this will result in a higher probability of failure when
> deployed.  Ideally, failures will be discovered in testing.
> >
> > The reason we have a non-trivial number of reserved codepoints is to
> avoid having implementations that filter out these reserved values by just
> enumerating them.  However, for that we don't need so many reserved
> values.  Even as little as 1000 will create a disincentive to tabulate the
> values.  That many would add a large 8k table to a simple implementation.
> >
> > If we were to select 1000 (or 1024) values in a way that is relatively
> simple to generate from a random number, then that would be better than the
> current situation.
> >
> > So I'm going to propose that we define a new function that takes a
> 10-bit random value and produces a greasing codepoint.  This can be used to
> produce a greasing codepoint.  I have two requirements:
> >
> > * It should be a tiny bit more complex to implement a function that
> tests a value for whether it is reserved.  The current scheme is easy to
> filter.  (X - 27) % 31 == 0 isn't much code to write.
> >
> > * The scheme needs to produce values that encode to 1, 2, and 4 bytes in
> varints with reasonable probability.  An even distribution over 2^62 with
> 1024 values has a gap of 2^52 between each value, which means that 2 byte
> and 4 byte values aren't likely to be represented.
> >
> > The function I've been thinking of looks like this:
> >
> > def grease(n):
> >     v = 27
> >     i = 1
> >     while n > 0:
> >         v = (v << i) | (n & 1)
> >         i = i + 1
> >         n = n >> 1
> >     return v
> >
> > This is bit spreader, taking the bits of the random value and adding
> progressively more space between each bit.
> Looking for a nerd feast, are we?
> >
> > When passed 0x3ff, the resulting value is the 60-bit binary value:
> 0b110111010010001000010000010000001000000010000000010000000001.  Smaller
> values result in different 1 bits being turned off.  If it is passed 0x1ff,
> it produces the shorter 50-bit value of
> 0b11011101001000100001000001000000100000001000000001.
> >
> > The result is two values that encode to 1 byte, 14 values that encode to
> 2 bytes, 48 values that encode to 4 bytes, with the remaining 960 values
> encoding to 8 bytes.
> >
> > Detecting this is fairly easy, but also probably not worth writing out
> the code to do so.  This could be made considerably harder by flipping more
> than one bit at a time.  For example, `v = (v << i) ^ ((n & 1) * 19)`.
> >
> > What do people think?  Is this a change we could tolerate?  Or is my
> logic faulty here in some way?
>
> I don't know. I observe that most extension designers pick a 12 to 16
> bit code that looks cute, and then prepend something like "ff01" to deal
> with draft numbers. A nice property would be to make grease collisions
> up to 16 bits easy to evaluate, and then ensure that patterns like
> "ff121dea" do not collide if the root pattern "1dea" does not.
>
> -- Christian Huitema
>
>