Re: [tsvwg] plan for L4S issue #29

"De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> Mon, 28 September 2020 09:54 UTC

From: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
To: Sebastian Moeller <moeller0@gmx.de>
CC: Pete Heist <pete@heistp.net>, Wesley Eddy <wes@mti-systems.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Mon, 28 Sep 2020 09:54:51 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ySi3DxiiD6-9S-MPSA8yS2nPHJY>
Subject: Re: [tsvwg] plan for L4S issue #29

Hi Sebastian,

>>	How confident are we that the "wide variety of conditions" under which this supposedly works is actually representative for the existing internet?

Only real-world deployments will answer this question. I guess any other answer would be pure speculation...

>>	rfc3168 present; rfc3168 detected:      hit (true positive)
>>	rfc3168 present; rfc3168 not-detected:  miss (false negative)
>>	rfc3168 absent;  rfc3168 detected:      false alarm (false positive)
>>	rfc3168 absent;  rfc3168 not-detected:  correct rejection (true negative)

Thanks for this. I like this classification and the tabular representation a lot. I guess it is OK then to use "missed detection" and "false detection" in the draft? I assume the other two (correct) cases will be referred to less often.

>>	That is why I believe that rfc3168 detection in TCP Prague is a red herring that distracts from fixing L4S's true issues, like demonstrating that the current implementation actually performs as intended over long multi-hop, high-RTT, high-bandwidth links, over asymmetric links, and over uni- and bidirectionally saturated links.

These are all known problems that also exist in Classic TCP on FIFO queues. Solutions are known for several (all?) of them, and they can just as well be implemented for L4S (Prague CCs and L4S AQMs). So I wouldn't blame L4S for those problems, or expect a reference implementation to include solutions to all known problems.

Koen.

-----Original Message-----
From: Sebastian Moeller <moeller0@gmx.de> 
Sent: Wednesday, September 23, 2020 7:47 PM
To: De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com>
Cc: Pete Heist <pete@heistp.net>; Wesley Eddy <wes@mti-systems.com>; tsvwg@ietf.org
Subject: Re: [tsvwg] plan for L4S issue #29

Hi Koen,


> On Sep 23, 2020, at 17:46, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper@nokia-bell-labs.com> wrote:
> 
> Hi Pete,
> 
> I don't think the goal is to wait for a perfect RFC3168 detection solution and we already have an implementation that works well under a wide variety of conditions without additional configuration.

	How confident are we that the "wide variety of conditions" under which this supposedly works is actually representative for the existing internet?


> This also means that it can still be improved, which I think can be done more appropriately when facing real-world issues. The goal of the operational guidelines is to provide a larger set of tools for solving problems with classic ECN bottlenecks, exactly to avoid the need to rely solely on a perfect end-host detection mechanism.

	I grudgingly agree that trying to fix L4S's AQM shortcomings by mandating fancy heuristics in the end points seems to be a losing proposition.

> 
> Related to the "false negatives" and "false positives" naming, I agree that it is very confusing. As the goal is to detect Classic ECN network AQM behavior, maybe better and shorter names could be "false-detect" and "false-non-detect"? 

	I do not think that that is helpful; false positive and false negative have well-known definitions that actually apply here. So if other nomenclature should be used, let's make it clearly different or, better yet, use already established terms.

	We could switch to signal detection theory terms (which is just one way to think about a classifier with just two options), then the four combinations of truth and classification for our RFC3168 detector would be described with the following terms:

rfc3168 present; rfc3168 detected:      hit (true positive)
rfc3168 present; rfc3168 not-detected:  miss (false negative)
rfc3168 absent;  rfc3168 detected:      false alarm (false positive)
rfc3168 absent;  rfc3168 not-detected:  correct rejection (true negative)

These terms are pretty much standard for similar detection problems (and also offer a decent approach to assess the effectiveness of the detector).
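
As a rough illustration of how these four outcomes could be used to assess a detector, here is a minimal Python sketch. The function names and the trial data are hypothetical and made up for illustration; they are not measurements of TCP Prague or of any real bottleneck.

```python
from collections import Counter

# Signal detection theory labels, keyed by
# (rfc3168_present, rfc3168_detected).
OUTCOME = {
    (True, True): "hit",
    (True, False): "miss",
    (False, True): "false alarm",
    (False, False): "correct rejection",
}


def classify(present, detected):
    """Name the outcome of one trial of the (hypothetical) detector."""
    return OUTCOME[(bool(present), bool(detected))]


def rates(trials):
    """Return outcome counts, hit rate and false-alarm rate.

    A rate is None when its denominator would be zero.
    """
    counts = Counter(classify(p, d) for p, d in trials)
    present = counts["hit"] + counts["miss"]
    absent = counts["false alarm"] + counts["correct rejection"]
    hit_rate = counts["hit"] / present if present else None
    fa_rate = counts["false alarm"] / absent if absent else None
    return counts, hit_rate, fa_rate


# Made-up trials: (rfc3168 bottleneck present?, detector said rfc3168?)
trials = [
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (False, False), (False, False),
]

if __name__ == "__main__":
    counts, hit_rate, fa_rate = rates(trials)
    print(dict(counts), hit_rate, fa_rate)
```

The hit rate and false-alarm rate are computed over different denominators (trials with and without an rfc3168 bottleneck), so both depend on how representative the tested conditions are.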

Let's stick to some already established nomenclature, whether that is the true/false positive/negative one or the SDT one, please (see e.g. https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers).

The bigger issue I have with this is that this is still just a "fudge" to be able to write in the internet draft "transports are required to implement RFC3168 detection, as demonstrated in TCP Prague" even though everybody here should know that that is not going to happen. People will mark whatever with ECT(1) and we will have to live with the fall-out. IMHO this is a sufficient reason to put the smarts in the AQM, instead of the current approach of assuming that end-points will just do this out of the goodness of their hearts.

That is why I believe that rfc3168 detection in TCP Prague is a red herring that distracts from fixing L4S's true issues, like demonstrating that the current implementation actually performs as intended over long multi-hop, high-RTT, high-bandwidth links, over asymmetric links, and over uni- and bidirectionally saturated links.

Best Regards
	Sebastian

> 
> Koen.
> 
> -----Original Message-----
> From: Pete Heist <pete@heistp.net>
> Sent: Wednesday, September 23, 2020 2:19 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) 
> <koen.de_schepper@nokia-bell-labs.com>; Wesley Eddy 
> <wes@mti-systems.com>
> Cc: tsvwg@ietf.org
> Subject: Re: [tsvwg] plan for L4S issue #29
> 
> Hi Koen,
> 
> I can definitely understand the need to turn bottleneck detection on and off for testing, or for additional knobs during development.
> 
> Overall, I suspect that there will be more questions about potential problems if bottleneck detection is not a MUST for implementations in the draft, or not baked into the final implementation in a way that works well under a wide variety of conditions without additional configuration.
> 
> On an easier topic, I wonder if we shouldn't change the "false 
> negatives" and "false positives" terminology to something clearer, 
> like "mis-identification of RFC 3168 bottlenecks as L4S", or 
> "mis-identification of L4S bottlenecks as RFC 3168", respectively. I 
> might have opened up a can of worms there in trying to save a few words. :)
> 
> Pete
> 
> On Tue, 2020-09-15 at 12:43 +0000, De Schepper, Koen (Nokia -
> BE/Antwerp) wrote:
>> Hi Wes, Pete,
>> 
>> I think to make progress on avoiding both false negatives and false 
>> positives, a good view on the conditions that cause problems is 
>> needed. So we better have the means to detect the real life impact of 
>> Classic-ECN-FIFO deployments. This means we need to be able to switch 
>> off Classic ECN detection (under controlled or even known conditions).
>> 
>> Another point is that it would be useful also to have all control 
>> variables of the existing implementation configurable for everyone 
>> willing to further experiment (without necessarily needing to change 
>> code). As I understood, the right tuning of these can bring a lot of 
>> further improvement opportunities. Also depending on a typical 
>> deployment, these parameters could be tuned for that specific 
>> targeted case.
>> 
>> So the resolution of this issue is exactly to facilitate further 
>> improving the detection algorithm (preferably via tuning), and being 
>> able to disable it when conditions are controlled or safe to avoid 
>> these false negatives.
>> 
>> I think these are topics that can be covered by the Operational 
>> Guidelines draft.
>> 
>> Regards,
>> Koen.
>> 
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Pete Heist
>> Sent: Friday, July 31, 2020 8:53 PM
>> To: Wesley Eddy <wes@mti-systems.com>
>> Cc: tsvwg@ietf.org
>> Subject: Re: [tsvwg] plan for L4S issue #29
>> 
>> Hi Wesley,
>> 
>> One thing I noticed during testing was that the current 
>> implementation of TCP Prague in Linux allows disabling bottleneck 
>> detection through the prague_ecn_fallback kernel module parameter
>> (https://github.com/L4STeam/linux/blob/0e7cf8acb318873c3f61084453f8da15b2e398be/net/ipv4/tcp_prague.c,
>> line 158). I don’t know if that was left in only for testing.
>> 
>> In section 6.3.3 of l4s-arch, there is discussion around classic 
>> bottleneck detection. Since I don’t see an explicit MUST that it 
>> remain enabled (although I do see the text “an L4S sender will have 
>> to fall back to…”), it’s not completely clear to me if it’s actually 
>> required to be implemented and permanently enabled in all 
>> implementations. If it is, I suppose the implementation should 
>> reflect that also.
>> 
>> While I feel it best that detection identifies both types of queues 
>> accurately, if bottleneck detection were both an explicit MUST in the 
>> text *and* not possible to disable in any implementation, I think 
>> that would make the misidentification of L4S queues as classic ECN 
>> queues less of a safety concern, since it would be impossible to turn 
>> off. It would remain an issue for the architecture overall though.
>> 
>> Hope that helps...
>> 
>> Pete
>> 
>>> On Jul 31, 2020, at 5:41 PM, Wesley Eddy <wes@mti-systems.com>
>>> wrote:
>>> 
>>> Hello, ticket #29 for the L4S documents is about classic bottleneck 
>>> detection misidentifying L4S queues as classic ECN queues.
>>> 
>>> https://trac.ietf.org/trac/tsvwg/ticket/29
>>> 
>>> In contrast to other issues, it doesn't seem like this should block 
>>> a WGLC on the L4S drafts.
>>> 
>>> 	• It is specific to classic bottleneck detection algorithm, which 
>>> is planned to be worked on in the Prague ICCRG draft.
>>> 	• The result is sometimes failing to achieve the best possible L4S 
>>> behavior, but doesn't seem to be an Internet safety issue.  This 
>>> resulting in people turning off classic bottleneck detection would 
>>> be a different issue, and something maybe the operator guidelines 
>>> would address.
>>> 	• It seems like it can be worked on further in the course of L4S 
>>> experimentation, without negative effects to others.
>>> So, I believe we should track this work in the ICCRG, and close the 
>>> ticket here.  Please let me know in the next week if I've 
>>> misunderstood any aspect of this and it should remain open.
>>> 
>>> 
>