Re: [tsvwg] plan for L4S issue #29

"De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com> Mon, 28 September 2020 10:17 UTC

From: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>
To: Sebastian Moeller <moeller0@gmx.de>, Bob Briscoe <ietf@bobbriscoe.net>
CC: "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Mon, 28 Sep 2020 10:17:50 +0000
Message-ID: <AM0PR07MB61144F9D74A5ABFD94C370AEB9350@AM0PR07MB6114.eurprd07.prod.outlook.com>
References: <ca8ede0e-53a2-f4ff-751d-f1065cf5e795@mti-systems.com> <4FE5E2A4-7853-487E-82E7-7B74AA2B6FC4@gmail.com> <5ebf850e-631f-4293-2ec8-7c80349e6a02@bobbriscoe.net> <154941B2-6E5A-466B-93FB-B1263FFC1D9A@gmx.de>
In-Reply-To: <154941B2-6E5A-466B-93FB-B1263FFC1D9A@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/0kMjbk1s207jxBgwduunNp9KGkw>

	[SM] Which deliverables did you agree to and with whom?

I assume Bob means the L4S drafts.

	[SM] Least by which criterion?

My main criteria were: an incentive to finally get ECN widely deployed by giving it a "big" advantage over non-ECT traffic, and a network AQM simple enough that it would be good enough for Classic traffic (better than tail-drop) and very good for low latency (and throughput) for the new-ECN users.

	[SM] Yes, that fleetingly appeared on the list, I commented on it and did not hear much of that thereafter.

Thanks for this. Your comments were noted and will certainly be taken into account by Greg (who is editing the draft). I understood from Greg that any contributions of text (filling the todo-gaps and rephrasing the existing text) are welcome.

Koen.


-----Original Message-----
From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Sebastian Moeller
Sent: Wednesday, September 23, 2020 10:01 PM
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: tsvwg@ietf.org
Subject: Re: [tsvwg] plan for L4S issue #29

Dear Bob,

a few questions in-line, below.


> On Sep 22, 2020, at 19:41, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Jonathan, Wes, list,
> 
> Sorry, I've been dead to the L4S conversations on this list for weeks now. But I've resolved to get back involved now. I stopped interacting on L4S over the summer because I became burned out by the L4S conversation being dominated by the same two or three people who do not want the agreed deliverables at all.

	[SM] Which deliverables did you agree to and with whom? 


> I have never had to work on a technology before where I am faced with a continual stream of blocking messages, rather than constructive criticism or solutions being offered.

	[SM] I take it you ignored the IPv4/v6 discussions then?

> 
> It has always been recognized that there are not enough bits in the IP header for a perfect encoding scheme, so that we will have to choose the least worst compromise.

	[SM] Least by which criterion?

> We had a consensus call in April,

	[SM] Did we? As far as I remember the decision back then was always a "vote" of confidence for L4S over SCE, to make team L4S continue with the experiment, but it did not (even try to) answer the question whether L4S is ready for deployment.

> in which the questions of Classic ECN fallback (#16) and false positives in the current algo (#29) were well understood by all concerned. The result was a decision not to go with the alternative solution (SCE) that inherently addresses issues #16 & #29, but instead to continue with the solution (L4S) that partially addresses these issues and fully addresses other concerns. The L4S design team were asked to prepare a new ops guidelines draft as part of that decision.

	[SM] Yes, that fleetingly appeared on the list, I commented on it and did not hear much of that thereafter.

> 
> I became disheartened about working further on the Classic ECN AQM fallback algo when all the work we'd done was just dismissed as futile.
> It became apparent that no amount of work on it would ever satisfy the few people who have been vocally opposing L4S as a whole.

	[SM] I have a feeling you are aiming in my direction here, but I have always maintained that I can be convinced by data, as long as the data is reasonably representative of the "bad" conditions to be expected over the open internet (that is to say, pure proofs of principle that something can work under some carefully selected conditions are not enough, no matter how many iterations of known-good conditions are shuffled).


> Now that's a shame, because I/we had worked out an additional heuristic to address the large bulk of the false positives that appeared at low link capacity (note that is different from low flow rate). I admit that this solution would not have addressed the few cases that arise in apparently random circumstances. And I admit that, without having tried it, it might not work at all.

	[SM] Not sure that piling up heuristic on top of heuristic is a safe way to engineer robust and reliable solutions, sorry.


> 
> I now prefer to work on Classic ECN AQM detection more than fallback. My logic for this is that L4S can be /unfair/ against Classic flow(s) in a Classic ECN AQM, but it is not /unsafe/. Unsafe means starvation, but Classic flows always still progress against L4S.

	[SM] Could you please give a definition of "starvation" in terms of share of a bottleneck? Without a proper numeric definition that sub-thread is going to derail quickly. Here is mine: giving a flow less than 1:8 of its equitable share on a bottleneck is starvation in my book. (I accept that too many flows will result in almost unusable throughput for everybody, but that is a different starvation criterion, absolute minimal throughput, which seems less suited to the discussion at hand.)
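
The 1:8 criterion above can be written down as a small check. This is only a sketch: the function names, the Mbit/s units and the assumption that the equitable share is simply capacity divided by the number of competing flows are illustrative choices, not something defined in this thread.

```python
def equitable_share(capacity_mbps: float, n_flows: int) -> float:
    """Equal split of the bottleneck capacity among competing flows (assumed definition)."""
    if n_flows <= 0:
        raise ValueError("n_flows must be positive")
    return capacity_mbps / n_flows


def is_starved(flow_rate_mbps: float, capacity_mbps: float, n_flows: int,
               threshold: float = 1 / 8) -> bool:
    """True if the flow gets less than `threshold` of its equitable share."""
    return flow_rate_mbps < threshold * equitable_share(capacity_mbps, n_flows)


# Two flows on a 50 Mbit/s bottleneck: the equitable share is 25 Mbit/s,
# so the starvation boundary under the 1:8 criterion is 3.125 Mbit/s.
print(is_starved(2.0, 50.0, 2))   # below the boundary
print(is_starved(5.0, 50.0, 2))   # above the boundary
```

The threshold is a parameter so that a different numeric definition of starvation can be plugged in without changing the comparison.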


> So, it should be sufficient for L4S senders to /monitor/ for Classic AQMs, at least in the initial stages of the L4S experiment.

	[SM] Or better keep the L4S experiment isolated from the rest of the internet, then you can do whatever you want, no? 

> 
> This can then become part of the monitoring infrastructure that the L4S ops guidance proposes for the L4S experiment. It can highlight where classic ECN AQMs might have been deployed.

	[SM] How so? You will need numbers for your classifier's performance versus the ground truth to assess that, but by running over the uncontrolled internet, you will not have ground truth information. So please elaborate how you intend to use your classifier's output to assess the quality of said classifier itself?


> Then, if it produces false positives,

	[SM] How do you plan to figure out whether a detection is a hit or a false positive, and whether a non-detection is a correct rejection or a miss, if you do not run the experiment over a known network with reliable information about the exact rfc3168-ness of each AQM along the path?
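
To illustrate why ground truth is needed, here is a minimal sketch of how a detector would be scored on a testbed where the rfc3168-ness of each path is known. The function name, the boolean encoding and the sample data are hypothetical; nothing here comes from the Prague code or from any measurement.

```python
from collections import Counter


def score_detector(detections, ground_truth):
    """Count hits, false positives, misses and correct rejections.

    detections[i]   : True if the detector reported a Classic ECN AQM on path i
    ground_truth[i] : True if path i really has an rfc3168 AQM (known testbed only)
    """
    if len(detections) != len(ground_truth):
        raise ValueError("need one ground-truth label per detection")
    counts = Counter(hit=0, false_positive=0, miss=0, correct_rejection=0)
    for detected, is_classic in zip(detections, ground_truth):
        if detected and is_classic:
            counts["hit"] += 1
        elif detected and not is_classic:
            counts["false_positive"] += 1
        elif not detected and is_classic:
            counts["miss"] += 1
        else:
            counts["correct_rejection"] += 1
    return counts


# Made-up labels for five testbed paths, purely for illustration.
detections = [True, True, False, False, True]
ground_truth = [True, False, True, False, True]
print(dict(score_detector(detections, ground_truth)))
```

Without the `ground_truth` list, which the uncontrolled internet does not provide, none of the four counts can be computed from the detector's output alone.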

> further digging should discover that there is not actually any classic ECN AQM where it says there is. Then at least false positives don't harm performance.

	[SM] This is IMHO the wrong mindset: do not optimise prematurely for your preferred case, but rather first see how reality looks and then optimise...

> 
> If it turns out that there are a lot of classic ECN AQMs out there, it then becomes important to improve the detection algo so it can be used in-band for fall-back. On the other hand, if we find there is very little, it becomes preferable to use the network configuration techniques described in the ops guidance to alter the classic ECN AQMs to isolate classic ECN and L4S ECN traffic.

	[SM] This all seems quite optimistic, unless we get an error-free rfc3168 detector first to assess the likelihood of rfc3168 AQMs on a typical internet path, no? Really, I must be missing something here, but this does not look like engineering for a proper solution.

Best Regards
	Sebastian

> 
> Regards
> 
> 
> 
> Bob
> 
> PS. There are also the cases where L4S flows don't perform well (particularly around start-up behaviour) in a Classic ECN AQM within an FQ scheduler (e.g. the experiments Pete did with L4S over FQ_CoDel and with Cake). Over the summer Joakim and I have been working on faster response to marking in the Prague algo, and we need to combine that with delay measurements for these cases.
> 
> I would also like to see a commitment to improving the CoDel and Cobalt control laws to increase ECN marking more rapidly in response to consistently increasing delay. Because this has always been a problem with unresponsive flows, irrespective of whether L4S had ever appeared on the scene. I'd be willing and interested to help with that sort of work,... assuming my day job left sufficient time for this.
> 
> 
> On 31/07/2020 19:03, Jonathan Morton wrote:
>>> On 31 Jul, 2020, at 6:41 pm, Wesley Eddy <wes@mti-systems.com> wrote:
>>> 
>>> Hello, ticket #29 for the L4S documents is about classic bottleneck detection misidentifying L4S queues as classic ECN queues.
>>> 
>>> https://trac.ietf.org/trac/tsvwg/ticket/29
>>> 
>>> In contrast to other issues, it doesn't seem like this should block a WGLC on the L4S drafts.
>>> 
>>> 	• It is specific to classic bottleneck detection algorithm, which is planned to be worked on in the Prague ICCRG draft.
>>> 	• The result is sometimes failing to achieve the best possible L4S behavior, but doesn't seem to be an Internet safety issue.  This resulting in people turning off classic bottleneck detection would be a different issue, and something maybe the operator guidelines would address.
>>> 	• It seems like it can be worked on further in the course of L4S experimentation, without negative effects to others.
>>> So, I believe we should track this work in the ICCRG, and close the ticket here.  Please let me know in the next week if I've misunderstood any aspect of this and it should remain open.
>> Presently, the Prague congestion control algorithm is unsafe on the Internet without a properly functioning classic-bottleneck detection heuristic.  Additionally, the classic-bottleneck detection heuristic that has been published and demonstrated does not function properly.  This combination of facts absolutely *is* a blocker for WGLC.
>> 
>> Therefore, this issue should not be closed until it has been concretely addressed.
>> 
>>  - Jonathan Morton
>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>