Re: [tsvwg] Comments on L4Sops [was Adoption call for draft-white-tsvwg-l4sops ...]

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Tue, 16 March 2021 12:03 UTC

To: Pete Heist <pete@heistp.net>, "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <FC0AE9F0-0F85-441E-B555-51A5B6A6A009@cablelabs.com> <7b58524d41c222878a79655195e6e052372f5999.camel@heistp.net>
From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Message-ID: <105ec2e2-6d68-f9e2-75ae-8e3516fabd46@erg.abdn.ac.uk>
Date: Tue, 16 Mar 2021 12:02:52 +0000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:78.0) Gecko/20100101 Thunderbird/78.8.1
MIME-Version: 1.0
In-Reply-To: <7b58524d41c222878a79655195e6e052372f5999.camel@heistp.net>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/N0lFWICHbBJioVOUktiTsNyaDTc>

See below.

On 16/03/2021 11:06, Pete Heist wrote:
> One comment below... [PH]
>
> On Mon, 2021-03-15 at 23:41 +0000, Greg White wrote:
>> Hi Jonathan,
>>
>> I think you've misinterpreted the intent of the draft a bit.  Maybe
>> there is an opportunity for further clarification in the draft
>> itself, but fwiw I'll try to explain it better here.
>>
>> Accurate real-time, in-band detection of RFC3168 by an L4S congestion
>> controller is what you are referring to in your first item labeled
>> 1.  If such an algorithm were generally agreed to be simple and
>> reliable, then it could be argued that there wouldn't need to be an
>> L4Sops draft. The detection algorithm that was studied earlier was
>> found to suffer false positives at low data rates or high RTTs
>> (though the referenced 'fallback' paper does discuss another approach
>> for in-band detection that warrants looking into).  While the draft
>> intends to include in-band detection as a potential option, it
>> recognizes that it isn't the only solution.  I think there is
>> confidence that out-of-band tests can be reliably performed that
>> determine whether RFC3168 bottlenecks exist, and this information can
>> be used to make decisions about deployment and use of L4S.
>>
>> Further, I don't agree that network operators are always "innocent
>> bystanders" in your taxonomy.  Some might be, but others are
>> definitely not.  I think it is totally appropriate to provide
>> guidance to network operators that want to participate in L4S but
>> have RFC3168 deployed in their networks.
> On Tue, 2021-03-16 at 07:17 +0200, Jonathan Morton wrote:
>> Since the harm is primarily caused to "innocent
>> bystanders" rather than "involved participants" or "interested
>> observers", the acceptable level of harm and risk is especially low,
>> and the mitigations need to be correspondingly robust.
> [PH] I think Jon was referring here to end users as opposed to network
> operators. End users are "innocent bystanders" because they may be
> using a 3168 AQM without knowing it. Examples:
>
> * Their ISP uses fq_codel in its network.
> * They bought a WiFi AP that has fq_codel in the driver.
> * They're using a public WiFi AP that has fq_codel in the driver.
> * They enabled an AQM in their home router.
>
> The draft can't immediately help those users, because it will take
> years for the existing AQMs to be updated, reconfigured or undeployed.
>
> Pete
>
>> -Greg
>>
>>
>>
>> On 3/15/21, 4:35 AM, "tsvwg on behalf of Jonathan Morton" <
>> tsvwg-bounces@ietf.org on behalf of chromatix99@gmail.com> wrote:
>>
>>      I do not think this document is ready for adoption in its current
>> form.  Let me explain why, and suggest some ways it could be
>> improved.
>>
>>      L4S has a fundamental incompatibility with conventional AIMD
>> traffic in the presence of RFC-3168 ECN AQMs, just like DCTCP upon
>> which it was based.  L4S therefore requires mitigations to ensure
>> that the harm caused by this incompatibility is minimised to an
>> acceptable level.  Since the harm is primarily caused to "innocent
>> bystanders" rather than "involved participants" or "interested
>> observers", the acceptable level of harm and risk is especially low,
>> and the mitigations need to be correspondingly robust.
>>
>>      However, robust mitigations are not what l4s-ops currently
>> describes.  Most of the measures described fall into three
>> categories:
>>
>>      1: Reliance on detecting an RFC-3168 AQM and disabling the L4S
>> behaviour, using heuristics that have not yet been shown in a
>> reliably working state, even under lab conditions.  It is impossible
>> to state that such a heuristic can be relied upon until such a
>> showing has been made.  A previous attempt at implementing such a
>> heuristic was unsuccessful and is now disabled by default in the
>> reference implementation.  Hence, the reliability of such a heuristic
>> would necessarily be a subject of the experiment, not the primary
>> safeguard.
>>
>>      2: Requirements placed upon "innocent bystanders" to avoid the
>> harm, mostly by reconfiguring, replacing, or disabling their RFC-3168
>> AQMs (sometimes in an RFC-ignorant manner).  This is obviously
>> unworkable, since by definition "innocent bystanders" are unaware of
>> the experiment, and even if made aware, are disinterested in doing
>> work to accommodate it.
>>
>>      3: Recommendation to deploy L4S hosts on networks that have been
>> prepared to receive it.  Which is a step in the right direction.  But
>> this is not accompanied by a corresponding requirement to *contain*
>> L4S traffic to each prepared network.  Without such a requirement, it
>> would be very easy for L4S hosts on different networks, which may
>> individually have been prepared, to communicate over the path between
>> those networks that has *not* been prepared, and upon which the risk
>> of disrupting bystander traffic therefore exists.
>>
>>      It is perhaps noteworthy that gaps in the second and third
>> classes of mitigation are proposed to be covered by the first class
>> of mitigation.  I also note that there is still an assertion in the
>> text that RFC-3168 AQMs are "rare", which is refuted by recent data.
>> Finally, in the context of a CDN-ISP pairing for an experimental
>> deployment, the ISP subscribers' LANs and WLANs are technically
>> separate networks that would be difficult to "prepare" for L4S in
>> advance; it would be wise to consider the ramifications of that.
>>
>>      I also note in passing that a modification of tunnel
>> encapsulation semantics is also proposed.  Given that tunnel
>> implementations are more diverse than RFC-3168 AQM implementations, I
>> also consider this unlikely to be practical, though I haven't studied
>> in detail whether it would be effective if achieved.
>>
>>
>>      I am currently aware of four theoretical methods of robustly
>> mitigating the risk posed by L4S.  I think that l4s-ops would be
>> considerably improved by proposing that at least one of them be
>> employed as a prerequisite to the L4S experiment actually taking
>> place:
>>
>>      1: Develop, implement, demonstrate, and open for scrutiny an RFC-
>> 3168 detection heuristic that is reliable and prompt enough to serve
>> as a primary safeguard for the experiment.  In my opinion this will
>> be difficult and will take significant time, but is not impossible to
>> achieve.
>>
>>      2: Deprecate RFC-3168, or amend it to remove drop-equivalent
>> marking of ECT(1) packets, and require the removal of all unmodified
>> ECN AQMs from the Internet.  This is unlikely to get much support
>> given the increasing deployment rates of RFC-3168 AQMs at the present
>> time.  In any case it would take a very long time to eliminate
>> existing RFC-3168 AQM deployments at Internet scale, so I consider
>> this impractical.
>>
>>      3: Explicitly contain L4S traffic to networks that have been
>> prepared or designated for the experiment.  That could be done by
>> marking all L4S traffic with a designated DSCP at origin, and
>> blocking traffic carrying that DSCP from traversing border gateways
>> into unprepared networks.  This has the effect of making users and
>> administrators of these networks at least "interested observers" and
>> isolating L4S traffic from "innocent bystanders".  Within the
>> designated networks, observing the practical interactions between L4S
>> and conventional traffic would be part of the experiment.
>>
>>      4: Redesign L4S to shift the risk burden away from "innocent
>> bystanders".  The most obvious way to do so is to implement
>> unambiguous signalling by the network, so that the receiver knows for
>> certain whether it is receiving congestion signals from an RFC-3168
>> AQM requesting an immediate MD response, or from an AQM of the new
>> type requesting a new type of response.  The risk of performance
>> trouble is then restricted to network nodes that produce the new
>> signals and transport endpoints that understand them - in other
>> words, to the relatively small number of "involved participants" who
>> have the knowledge and incentive to study the problem and find
>> solutions.  The incentives are thus aligned correctly and risks are
>> not "externalised".
>>
>>      The SCE proposal does exactly that, in a manner that is totally
>> transparent to existing RFC-3168 endpoints and middleboxes.  It
>> becomes practical, for example, to use a Differentiated Services Code
>> Point to differentiate a low-latency service onto a second bearer and
>> provide a single-queue SCE AQM there, while providing a single-queue
>> RFC-3168 AQM (without SCE) on the primary bearer.  Because of the
>> unambiguous signalling, SCE traffic missing the DSCP would still
>> compete on equal terms with conventional traffic, instead of
>> dominating it or being dominated.
>>
>>      I realise that this last method is not strictly in scope for the
>> l4s-ops draft (and that mentions of SCE tend to raise hackles among
>> L4S proponents), but I include it because it appears to be the most
>> robust mitigation method available.  It also has the advantage of
>> running code being available to try it out immediately.
>>
>>
>>      I am not hugely optimistic that the l4s-ops draft will
>> incorporate the above advice before the adoption call ends.  But
>> unless and until it does, my position is that it SHOULD NOT be
>> adopted.
>>
>>       - Jonathan Morton
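
For readers following the containment idea in method 3 of the quoted message (mark L4S traffic with a designated DSCP and stop it at border gateways into unprepared networks), here is a minimal sketch of the check a border device might apply. It is only an illustration: the DSCP value, the Packet type and the function names are all hypothetical, since the thread does not name a codepoint or an implementation. The ECN codepoint values are the ones defined in RFC 3168.

```python
from dataclasses import dataclass

# ECN field values from RFC 3168 (low two bits of the TOS / Traffic Class byte).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

# Hypothetical placeholder: the thread does not say which DSCP would be designated.
L4S_EXPERIMENT_DSCP = 0b101011


@dataclass(frozen=True)
class Packet:
    tos: int  # full 8-bit TOS / Traffic Class byte


def dscp(pkt: Packet) -> int:
    """Upper six bits of the TOS byte."""
    return pkt.tos >> 2


def ecn(pkt: Packet) -> int:
    """Lower two bits of the TOS byte."""
    return pkt.tos & 0b11


def border_permits(pkt: Packet, next_network_prepared: bool) -> bool:
    """Return False when a packet carrying the experiment DSCP
    would cross into a network that has not been prepared."""
    if dscp(pkt) == L4S_EXPERIMENT_DSCP and not next_network_prepared:
        return False
    return True
```

Note that this check keys on the DSCP alone, so it only contains traffic that was actually marked at origin; unmarked ECT(1) traffic would pass through unchanged.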

Over the years there have been a whole bunch of non-standard proposals 
to use the ECN technique, with SCE and other methods more recently. From a 
standards perspective, I'm assuming these experimental methods are in 
private use and we don't need to worry about them.

I see that all the examples of RFC3168 from PH use FQ to trigger the CE 
marking. From an L4S perspective, this at least means these devices meet 
one of the coexistence scenarios, where a form of scheduling is used to 
isolate the queues.
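
To illustrate the isolation point, here is a toy sketch comparing per-flow queues with a single shared queue. It is a deliberate simplification and not how fq_codel works internally: real fq_codel hashes flows into a fixed set of queues and marks on CoDel sojourn time, whereas this sketch assumes one queue per flow ID, a fixed backlog threshold, and no dequeuing. All names and the threshold are made up for the example.

```python
from collections import defaultdict, deque

# Assumed backlog (in packets) at or above which an arriving packet is CE-marked.
MARK_THRESHOLD = 5


class FlowQueueAqm:
    """One queue per flow; marking depends only on that flow's own backlog."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, flow_id, pkt_id):
        q = self.queues[flow_id]
        ce = len(q) >= MARK_THRESHOLD
        q.append((pkt_id, ce))
        return ce


class SingleQueueAqm:
    """One shared queue; marking depends on the backlog of all flows together."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, flow_id, pkt_id):
        ce = len(self.queue) >= MARK_THRESHOLD
        self.queue.append((flow_id, pkt_id, ce))
        return ce


def sparse_flow_marked(aqm):
    """An aggressive flow sends 20 packets back to back, then a sparse
    flow sends one. Return whether the sparse flow's packet is CE-marked."""
    for i in range(20):
        aqm.enqueue("aggressive", i)
    return aqm.enqueue("sparse", 0)
```

With the per-flow queues the sparse flow's packet is not marked, because the backlog belongs to the other flow; with the single queue it is marked. That is the sense in which the scheduling isolates the flows from each other.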

Gorry