Re: [tsvwg] Deprecating RFC 3168 for future ECN experimentation

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Tue, 30 March 2021 11:15 UTC

To: Pete Heist <pete@heistp.net>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <1b673100019174d056c44339d3b1758df058a2aa.camel@petri-meat.com> <fc0e7ffe6cb66896000be498bf2be8ca1abd3fd7.camel@heistp.net> <4ddecbd6-184e-bc38-aae0-22a64d0de29b@kit.edu> <88026fb8957278dce24b8f6cbdbfecc0c3bb5233.camel@heistp.net>
From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Message-ID: <dddecdb7-4514-318f-4183-b2f79d3994eb@erg.abdn.ac.uk>
Date: Tue, 30 Mar 2021 12:15:46 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/9Y9f9tIuXhcvsOPh1ieW9lar81I>

Please see below:

On 30/03/2021 11:56, Pete Heist wrote:
> ...
>
> On Mon, 2021-03-29 at 10:26 +0200, Bless, Roland (TM) wrote:
>> Hi,
>>
>> On 27.03.21 at 15:41 Pete Heist wrote:
>>> I agree overall. If we want to introduce a proposal that's
>>> incompatible with RFC3168, we should first make it historic.
>> I just want to add that RFC 3168 is a proposed standard, whereas we
>> want to introduce L4S as an _experiment_. All the recent work of the
>> AQM WG would also be void, and I thought that vendors are actually
>> implementing this in home routers. I think that we must be really sure
>> about RFC3168 (non-)deployment before deprecating RFC3168. Even RFC 8311
>> still stated: "Forwarding behavior as described in RFC 3168 remains the
>> preferred approach for routers that are not involved in ECN
>> experiments,".
> Just to clarify, although I think it *would* be the right approach to
> deprecate RFC3168 in order to redefine its semantics, I don't
> personally support doing that. Its benefits to typical Internet traffic
> may not be dramatic, but on the other hand it is fully compatible with
> non-ECN traffic, and avoids some packet loss and retransmissions, while
> serving as a reliable base to build upon.
> I also think recent discussion suggests that there is enough support
> for the integrity of RFC3168 that traffic from new proposals, if not
> guarded by a DSCP, must be fully compatible with existing deployed 3168
> middleboxes, effectively including those with single queue AQMs.
>>> Before we do that though, we should make sure that the current CE is
>>> not actually useful. Figure 5 in this paper suggests some benefit to
>>> two bits of signal as opposed to one:
>>> http://buffer-workshop.stanford.edu/papers/paper34.pdf
>>>
>>> A second signal provides a harder backoff without packet loss, for
>>> example during capacity changes or flow introductions. It wouldn't be
>>> ideal to deprecate RFC3168, only to find out that another bit of
>>> signal in line with CE, along with ABE in RFC8511, or something
>>> similarly deployable with today's equipment, is still useful.
>> I'm also in favor of having better (i.e., more fine grained)
>> explicit congestion signals as end-systems may use them to
>> come up with better decisions. I expressed this clearly in the
>> ECT(1) input vs. output signal discussion.
>> Now, if we come up with a DSCP as additional safeguard qualifier for
>> L4S, which I support, then it makes sense to revisit the earlier
>> WG decision and use ECT(1) as additional congestion signal.
>>> It's also my position that we can't ignore existing RFC3168
>>> bottlenecks, not just for safety but also for performance. The recent
>>> ISP study we did suggested RFC3168 AQMs may be present on ~10% of
>>> Internet paths there. Prior to that we heard 5% elsewhere. Whatever
>>> the number is exactly, these AQMs do exist and mark in response to
>>> both ECT(0) and ECT(1). If you introduce traffic that backs off much
>>> less in response to CE, the AQMs may operate sub-optimally, since
>>> they weren't designed with that kind of traffic in mind
>>> (https://github.com/heistp/l4s-tests/#intra-flow-latency-spikes).
>> In case home routers are using RFC 3168 and congestion is in the
>> downstream direction, one would only be able to detect this by seeing
>> ECE in the other direction. Maybe servers from large content providers
>> could also provide data about this...
> That's also what we saw in the ISP study at the gateway, that ECE is a
> more reliable indicator of AQM deployment than CE:
>
> https://www.ietf.org/archive/id/draft-heist-tsvwg-ecn-deployment-observations-02.html#name-tcp-initiated-from-lan-to-w
>
> Pete

Thanks, I think I would agree with your summary of the "non-dramatic"
benefits for existing traffic using the currently standardised RFC 3168
marking, and hopefully what you say is not far from the summary we
arrived at in RFC 8087.

I see arguments in recent discussions for retaining backwards 
compatibility with this and also arguments stressing other aspects.

Gorry

>> Regards,
>>    Roland
>>
>>> On Fri, 2021-03-26 at 13:01 -0400, Steven Blake wrote:
>>>> A lot (not all) of the recent arguments revolve around the assumption
>>>> by some that RFC 3168 ECN deployment barely exists in the Internet,
>>>> and the few networks where it does can be safely ignored, or cleaned
>>>> out, or be expected to take proactive measures to protect themselves,
>>>> which may in practice require them to lobby their router vendors to
>>>> spin patch releases to enable (some of) the mitigation measures
>>>> detailed in -l4ops-02 Sec. 5.
>>>>
>>>> If that is the WG consensus, then I *strongly urge* the WG to do the
>>>> following:
>>>>
>>>> 1. Push to move RFC 3168 ECN to Historic
>>>>
>>>> 2. Adopt the following "New ECN" signals for future ECN
>>>> experimentation:
>>>>
>>>> - Not-ECT
>>>> - ECT
>>>> - CE-a
>>>> - CE-b
>>>>
>>>> This second step would allow for two sets of experiments. The
>>>> semantics of CE-a and CE-b for the first set of experiments would be
>>>> as follows:
>>>>
>>>> - CE-a: "Decelerate"
>>>> - CE-b: "Decelerate harder" (multiplicative decrease)
>>>>
>>>> The exact behavior elicited by the "Decelerate" signal would be the
>>>> subject of investigation. Since we are certain that any remaining RFC
>>>> 3168 deployments can be safely ignored, then ECT/CE-a/CE-b can be used
>>>> as unambiguous signals to steer packets into a low-latency queue, if
>>>> desired.
>>>>
>>>> The semantics of CE-a and CE-b for the second set of experiments would
>>>> be as follows:
>>>>
>>>> - CE-a: "Decelerate"
>>>> - CE-b: "Accelerate"
>>>>
>>>> An aggressive fraction (100%?) of CE-b marked packets traversing a
>>>> queue not in "Accelerate" state would be re-marked to either CE-a or
>>>> ECT. Any packet discard (or detection of high delay variation?) must
>>>> disable the transport's "Accelerate" mechanism for some interval and
>>>> should cause the transport to revert to "TCP-friendly" behavior for
>>>> some (different?) interval. The exact behaviors of "Accelerate" and
>>>> "Decelerate" signals would be the subject of investigation. Again,
>>>> since we are certain that any remaining RFC 3168 deployments can be
>>>> safely ignored, then ECT/CE-a/CE-b can be used as unambiguous signals
>>>> to steer packets into a low-latency queue.
>>>>
>>>> The differences between these two sets of experiments hinge on whether
>>>> there is more utility in an "Accelerate" signal coupled with a
>>>> "Decelerate" signal, or with two separate levels of "Decelerate"
>>>> signals. Since it is WG consensus that the RFC 3168 ECN experiment
>>>> failed after two decades, we probably only get one more chance to get
>>>> this right, so careful and exhaustive experimentation which explores
>>>> the design space is in order.
>>>>
>>>> Obviously, both sets of experiments cannot be run simultaneously on
>>>> intersecting parts of the Internet. I leave the options for safely
>>>> isolating these experiments as an exercise for the reader. Since we
>>>> are certain that any remaining RFC 3168 ECN deployments can be safely
>>>> ignored, I suggest choosing bit assignments for the four signals that
>>>> induce maximum pain in the obstinate minority that might still deploy
>>>> RFC 3168 ECN.
>>>>
>>>> Now, *if it is not WG consensus* that any existing RFC 3168 ECN
>>>> deployments can be safely ignored, then I *strongly urge* the WG *to
>>>> not adopt* experimental proposals that place burden and/or risk on
>>>> networks that have deployed it.
>>>>
>>>>
>>>> TL;DR: Either RFC 3168 ECN exists in the Internet, or it doesn't.
>>>> Decide, and act appropriately.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> // Steve
>>>>
>>>>
>>>>
>>>
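
For readers following the thread, two points made above may be easier to see in code: that an RFC 3168 AQM marks CE on both ECT(0) and ECT(1) packets, and that downstream marking is observed from the other side as ECE on returning TCP segments. The following is only a minimal sketch. The codepoint values and TCP flag bits are those defined in RFC 3168; the function names (ecn_codepoint, rfc3168_signal, has_ece) are hypothetical and not taken from any implementation discussed here, and the AQM's decision of *when* to signal congestion is deliberately left out.

```python
from typing import Optional

# ECN field codepoints from RFC 3168: the low two bits of the
# IPv4 TOS byte / IPv6 Traffic Class byte.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11

# TCP header flag bits used for ECN feedback (RFC 3168).
TCP_ECE = 0x40
TCP_CWR = 0x80


def ecn_codepoint(tos: int) -> int:
    """Extract the two-bit ECN field from a TOS / Traffic Class byte."""
    return tos & 0b11


def rfc3168_signal(tos: int) -> Optional[int]:
    """Hypothetical helper: apply a congestion signal to one packet.

    Assumes the AQM has already decided to signal congestion.
    ECT(0) and ECT(1) are treated identically and become CE (the
    DSCP bits are preserved); a Not-ECT packet cannot be marked,
    so None is returned to stand for a drop.
    """
    if ecn_codepoint(tos) == NOT_ECT:
        return None
    return tos | CE


def has_ece(tcp_flags: int) -> bool:
    """True if a TCP segment echoes congestion (ECE flag set)."""
    return bool(tcp_flags & TCP_ECE)
```

Under these assumptions, rfc3168_signal(ECT_0) and rfc3168_signal(ECT_1) both yield CE, which is why traffic that treats CE differently cannot be told apart at such a bottleneck by the ECN field alone. An observer at the gateway that sees only the reverse direction would check has_ece() on returning segments rather than looking for CE.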