Re: [tsvwg] plan for L4S issue #29

Sebastian Moeller <moeller0@gmx.de> Wed, 23 September 2020 17:47 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/cn5QyMYEl0HDaCd_OTA4EPZqoQI>

Hi Koen,


> On Sep 23, 2020, at 17:46, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi Pete,
> 
> I don't think the goal is to wait for a perfect RFC3168 detection solution and we already have an implementation that works well under a wide variety of conditions without additional configuration.

	How confident are we that the "wide variety of conditions" under which this supposedly works is actually representative of the existing Internet?


> This also means that it can still be improved, which I think can be done more appropriate when facing real world issues. The goal of the operational guidelines is to provide a larger set of tools for solving problems with classic ECN bottlenecks, exactly to avoid the need to rely on perfect end-host detection mechanism only.

	I grudgingly agree that trying to fix L4S's AQM shortcomings by mandating fancy heuristics in the endpoints seems to be a losing proposition.

> 
> Related to the "false negatives" and "false positives" naming, I agree that it is very confusing. As the goal is to detect Classic ECN network AQM behavior, maybe better and shorter names could be "false-detect" and "false-non-detect"? 

	I do not think that is helpful; "false positive" and "false negative" have well-known definitions that actually apply here. So if other nomenclature is to be used, let's make it clearly different or, better yet, use already established terms.

	We could switch to signal detection theory terms (which is just one way to think about a classifier with just two options); the four combinations of truth and classification for our RFC3168 detector would then be described with the following terms:

rfc3168 present; rfc3168 detected:      hit (true positive)
rfc3168 present; rfc3168 not-detected:  miss (false negative)
rfc3168 absent;  rfc3168 detected:      false alarm (false positive)
rfc3168 absent;  rfc3168 not-detected:  correct rejection (true negative)

These terms are pretty much standard for similar detection problems (and also offer a decent way to assess the effectiveness of the detector).
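
To make the four outcomes concrete, here is a minimal sketch of how they could be tallied into a hit rate and a false alarm rate. The function names and the trial data are made up for illustration; nothing here is taken from TCP Prague or from any L4S draft.

```python
from collections import Counter


# Hypothetical helper names; not part of TCP Prague or any L4S draft.
def classify(rfc3168_present: bool, rfc3168_detected: bool) -> str:
    """Map ground truth and detector output to a signal detection theory outcome."""
    if rfc3168_present:
        return "hit" if rfc3168_detected else "miss"
    return "false alarm" if rfc3168_detected else "correct rejection"


def rates(trials):
    """Return (hit_rate, false_alarm_rate) for an iterable of (present, detected) pairs.

    A rate is None when there are no trials of the relevant kind.
    """
    counts = Counter(classify(p, d) for p, d in trials)
    present = counts["hit"] + counts["miss"]
    absent = counts["false alarm"] + counts["correct rejection"]
    hit_rate = counts["hit"] / present if present else None
    false_alarm_rate = counts["false alarm"] / absent if absent else None
    return hit_rate, false_alarm_rate


# Made-up example data: 10 paths with an RFC3168 AQM, 10 paths without.
trials = ([(True, True)] * 8 + [(True, False)] * 2
          + [(False, True)] * 1 + [(False, False)] * 9)
hit_rate, false_alarm_rate = rates(trials)
print(f"hit rate: {hit_rate:.2f}, false alarm rate: {false_alarm_rate:.2f}")
```

Reporting both rates together is what makes the detector assessable: a detector that always says "rfc3168 detected" has a perfect hit rate but also a false alarm rate of 1.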

Let's stick to some already established nomenclature, please, whether that is the true/false positive/negative one or the SDT one (see e.g. https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers).

The bigger issue I have with this is that it is still just a "fudge" to be able to write in the Internet-Draft "transports are required to implement RFC3168 detection, as demonstrated in TCP Prague" even though everybody here should know that this is not going to happen. People will mark whatever with ECT(1) and we will have to live with the fallout. IMHO this is sufficient reason to put the smarts in the AQM, instead of the current approach of assuming that endpoints will just do this out of the goodness of their hearts.

That is why I believe that rfc3168 detection in TCP Prague is a red herring that distracts from fixing L4S's true issues, such as demonstrating that the current implementation actually performs as intended over long multi-hop, high-RTT, high-bandwidth links, over asymmetric links, and over uni- and bidirectionally saturated links.

Best Regards
	Sebastian

> 
> Koen.
> 
> -----Original Message-----
> From: Pete Heist <pete@heistp.net> 
> Sent: Wednesday, September 23, 2020 2:19 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>; Wesley Eddy <wes@mti-systems.com>
> Cc: tsvwg@ietf.org
> Subject: Re: [tsvwg] plan for L4S issue #29
> 
> Hi Koen,
> 
> I can definitely understand the need to turn bottleneck detection on and off for testing, or for additional knobs during development.
> 
> Overall, I suspect that there will be more questions about potential problems if bottleneck detection is not a MUST for implementations in the draft, or not baked into the final implementation in a way that works well under a wide variety of conditions without additional configuration.
> 
> On an easier topic, I wonder if we shouldn't change the "false negatives" and "false positives" terminology to something clearer, like "mis-identification of RFC 3168 bottlenecks as L4S", or "mis-identification of L4S bottlenecks as RFC 3168", respectively. I might have opened up a can of worms there in trying to save a few words. :)
> 
> Pete
> 
> On Tue, 2020-09-15 at 12:43 +0000, De Schepper, Koen (Nokia -
> BE/Antwerp) wrote:
>> Hi Wes, Pete,
>> 
>> I think to make progress on avoiding both false negatives and false 
>> positives, a good view on the conditions that cause problems is 
>> needed. So we better have the means to detect the real life impact of 
>> Classic-ECN-FIFO deployments. This means we need to be able to switch 
>> off Classic ECN detection (under controlled or even known conditions).
>> 
>> Another point is that it would be useful also to have all control 
>> variables of the existing implementation configurable for everyone 
>> willing to further experiment (without necessarily needing to change 
>> code). As I understood, the right tuning of these can bring a lot of 
>> further improvement opportunities. Also depending on a typical 
>> deployment, these parameters could be tuned for that specific targeted 
>> case.
>> 
>> So the resolution of this issue is exactly to facilitate further 
>> improving the detection algorithm (preferably via tuning), and being 
>> able to disable it when conditions are controlled or safe to avoid 
>> these false negatives.
>> 
>> I think these are topics that can be covered by the Operational 
>> Guidelines draft.
>> 
>> Regards,
>> Koen.
>> 
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Pete Heist
>> Sent: Friday, July 31, 2020 8:53 PM
>> To: Wesley Eddy <wes@mti-systems.com>
>> Cc: tsvwg@ietf.org
>> Subject: Re: [tsvwg] plan for L4S issue #29
>> 
>> Hi Wesley,
>> 
>> One thing I noticed during testing was that the current implementation 
>> of TCP Prague in Linux allows disabling bottleneck detection through 
>> the prague_ecn_fallback kernel module parameter (
>> https://github.com/L4STeam/linux/blob/0e7cf8acb318873c3f61084453f8da15b2e398be/net/ipv4/tcp_prague.c
>> , line 158). I don’t know if that was 
>> left in only for testing.
>> 
>> In section 6.3.3 of l4s-arch, there is discussion around classic 
>> bottleneck detection. Since I don’t see an explicit MUST that it 
>> remain enabled (although I do see the text “an L4S sender will have to 
>> fall back to…”), it’s not completely clear to me if it’s actually 
>> required to be implemented and permanently enabled in all 
>> implementations. If it is, I suppose the implementation should reflect 
>> that also.
>> 
>> While I feel it best that detection identifies both types of queues 
>> accurately, if bottleneck detection were both an explicit MUST in the 
>> text *and* not possible to disable in any implementation, I think that 
>> would make the misidentification of L4S queues as classic ECN queues 
>> less of a safety concern, since it would be impossible to turn off. It 
>> would remain an issue for the architecture overall though.
>> 
>> Hope that helps...
>> 
>> Pete
>> 
>>> On Jul 31, 2020, at 5:41 PM, Wesley Eddy <wes@mti-systems.com>
>>> wrote:
>>> 
>>> Hello, ticket #29 for the L4S documents is about classic bottleneck 
>>> detection misidentifying L4S queues as classic ECN queues.
>>> 
>>> https://trac.ietf.org/trac/tsvwg/ticket/29
>>> 
>>> In contrast to other issues, it doesn't seem like this should block 
>>> a WGLC on the L4S drafts.
>>> 
>>> 	• It is specific to classic bottleneck detection algorithm, which 
>>> is planned to be worked on in the Prague ICCRG draft.
>>> 	• The result is sometimes failing to achieve the best possible L4S 
>>> behavior, but doesn't seem to be an Internet safety issue.  This 
>>> resulting in people turning off classic bottleneck detection would 
>>> be a different issue, and something maybe the operator guidelines 
>>> would address.
>>> 	• It seems like it can be worked on further in the course of L4S 
>>> experimentation, without negative effects to others.
>>> So, I believe we should track this work in the ICCRG, and close the 
>>> ticket here.  Please let me know in the next week if I've 
>>> misunderstood any aspect of this and it should remain open.
>>> 
>>> 
>