Re: [tsvwg] plan for L4S issue #29

Pete Heist <pete@heistp.net> Tue, 29 September 2020 08:26 UTC

From: Pete Heist <pete@heistp.net>
To: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>, Sebastian Moeller <moeller0@gmx.de>
Cc: Wesley Eddy <wes@mti-systems.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Tue, 29 Sep 2020 10:26:44 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/uBE5u-5O1Kpz0h7cuxIJqLew9-0>
Subject: Re: [tsvwg] plan for L4S issue #29

On Mon, 2020-09-28 at 09:54 +0000, De Schepper, Koen (Nokia -
BE/Antwerp) wrote:
> Hi Sebastian,
> 
> > > 	How confident are we that the "wide variety of conditions"
> > > under which this supposedly works is actually representative for
> > > the existing internet?
> 
> Only real world deployments will answer this question. I guess any
> other answer will be purely speculation...
> 
> > > 	rfc3168 present; rfc3168 detected:      hit (true positive)
> > > 	rfc3168 present; rfc3168 not-detected:  miss (false negative)
> > > 	rfc3168 absent;  rfc3168 detected:      false alarm (false positive)
> > > 	rfc3168 absent;  rfc3168 not-detected:  correct rejection (true negative)
> 
> Thanks for this. I like this classification and the tabular
> representation a lot. I guess it is OK then to use "missed detection"
> and "false detection" in the draft? I assume the other, correct cases
> will be referred to less often.

Personally, "missed detection" and "false detection" sound fine to me,
as I get the right sense of both the bottleneck and the result in both
cases.
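
To make the mapping concrete, here is a minimal sketch of the four-way
classification from the table quoted above, using "missed detection" and
"false detection" for the two error cases. The names Outcome, classify and
error_rates are hypothetical and chosen only for illustration; nothing here
is taken from TCP Prague or from any draft, and the trial data in the usage
note is made up.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """The four combinations of ground truth and detector verdict."""
    HIT = "hit (true positive)"
    MISSED_DETECTION = "missed detection (false negative)"
    FALSE_DETECTION = "false detection (false positive)"
    CORRECT_REJECTION = "correct rejection (true negative)"


def classify(rfc3168_present: bool, rfc3168_detected: bool) -> Outcome:
    """Map (bottleneck really is RFC 3168, detector said RFC 3168) to a term."""
    if rfc3168_present:
        return Outcome.HIT if rfc3168_detected else Outcome.MISSED_DETECTION
    return (Outcome.FALSE_DETECTION if rfc3168_detected
            else Outcome.CORRECT_REJECTION)


def error_rates(trials):
    """Return (missed-detection rate, false-detection rate) for a list of
    (present, detected) pairs; a rate is None if its denominator is zero."""
    counts = Counter(classify(present, detected) for present, detected in trials)
    n_present = counts[Outcome.HIT] + counts[Outcome.MISSED_DETECTION]
    n_absent = counts[Outcome.FALSE_DETECTION] + counts[Outcome.CORRECT_REJECTION]
    missed = counts[Outcome.MISSED_DETECTION] / n_present if n_present else None
    false = counts[Outcome.FALSE_DETECTION] / n_absent if n_absent else None
    return missed, false
```

For example, with four made-up trials, error_rates([(True, True),
(True, False), (False, False), (False, False)]) gives a missed-detection
rate of 0.5 and a false-detection rate of 0.0. Keeping the two rates
separate matters because they fail in different directions: a missed
detection leaves an RFC 3168 bottleneck unrecognized, while a false
detection costs L4S performance on an L4S bottleneck.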

> > > 	That is why I believe that rfc3168 detection in TCP Prague is a
> > > red herring that distracts from fixing L4S's true issues, like
> > > demonstrating that the current implementation actually performs
> > > as intended over long multi-hop, high-RTT, high-bandwidth links,
> > > over asymmetric links, and over uni- and bidirectionally
> > > saturated links.
> 
> These are all known problems that also exist in Classic TCP on FIFO
> queues. Solutions are known for several (all?) of them, and these can
> be implemented for L4S as well (Prague CCs and L4S AQMs). So I
> wouldn't blame L4S for those problems, or expect a reference
> implementation to include solutions to all known problems.
> 
> Koen.
> 
> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de> 
> Sent: Wednesday, September 23, 2020 7:47 PM
> To: De Schepper, Koen (Nokia - BE/Antwerp) <
> koen.de_schepper@nokia-bell-labs.com>
> Cc: Pete Heist <pete@heistp.net>; Wesley Eddy <wes@mti-systems.com>; 
> tsvwg@ietf.org
> Subject: Re: [tsvwg] plan for L4S issue #29
> 
> Hi Koen,
> 
> 
> > On Sep 23, 2020, at 17:46, De Schepper, Koen (Nokia - BE/Antwerp) <
> > koen.de_schepper@nokia-bell-labs.com> wrote:
> > 
> > Hi Pete,
> > 
> > I don't think the goal is to wait for a perfect RFC3168 detection
> > solution and we already have an implementation that works well
> > under a wide variety of conditions without additional
> > configuration.
> 
> 	How confident are we that the "wide variety of conditions"
> under which this supposedly works is actually representative for the
> existing internet?
> 
> 
> > This also means that it can still be improved, which I think can be
> > done more appropriately when facing real world issues. The goal of
> > the operational guidelines is to provide a larger set of tools for
> > solving problems with classic ECN bottlenecks, exactly to avoid the
> > need to rely on a perfect end-host detection mechanism alone.
> 
> 	I grudgingly agree that trying to fix L4S's AQM shortcomings
> by mandating fancy heuristics in the end points seems to be a losing
> proposition. 
> 
> > Related to the "false negatives" and "false positives" naming, I
> > agree that it is very confusing. As the goal is to detect Classic
> > ECN network AQM behavior, maybe better and shorter names could be
> > "false-detect" and "false-non-detect"? 
> 
> 	I do not think that that is helpful; false-positive and
> false-negative have much better-known definitions that actually apply
> here. So if other nomenclature should be used, let's make it clearly
> different or better yet use already established terms. 
> 
> 	We could switch to signal detection theory terms (which is just
> one way to think about a classifier with just two options), then the
> four combinations of truth and classification for our RFC3168
> detector would be described with the following terms:
> 
> rfc3168 present; rfc3168 detected:      hit (true positive)
> rfc3168 present; rfc3168 not-detected:  miss (false negative)
> rfc3168 absent;  rfc3168 detected:      false alarm (false positive)
> rfc3168 absent;  rfc3168 not-detected:  correct rejection (true negative)
> 
> these terms are pretty much standard for similar detection problems
> (and also offer a decent approach to assess the effectiveness of the
> detector).
> 
> Let's stick to some already established nomenclature, whether that is
> the true/false positive/negative one or the SDT one, please (see e.g.
> https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers).
> 
> The bigger issue I have with this is that this is still just a
> "fudge" to be able to write in the internet draft "transports are
> required to implement RFC3168 detection, as demonstrated in TCP
> Prague" even though everybody here should know that that is not going
> to happen. People will mark whatever with ECT(1) and we will have to
> live with the fall-out. IMHO this is a sufficient reason to put the
> smarts in the AQM, instead of the current approach of assuming that
> end-points will just do this out of the goodness of their hearts.
> 
> That is why I believe that rfc3168 detection in TCP Prague is a red
> herring that distracts from fixing L4S's true issues, like
> demonstrating that the current implementation actually performs as
> intended over long multi-hop, high-RTT, high-bandwidth links, over
> asymmetric links, and over uni- and bidirectionally saturated links.
> 
> Best Regards
> 	Sebastian
> 
> > Koen.
> > 
> > -----Original Message-----
> > From: Pete Heist <pete@heistp.net>
> > Sent: Wednesday, September 23, 2020 2:19 PM
> > To: De Schepper, Koen (Nokia - BE/Antwerp) 
> > <koen.de_schepper@nokia-bell-labs.com>; Wesley Eddy 
> > <wes@mti-systems.com>
> > Cc: tsvwg@ietf.org
> > Subject: Re: [tsvwg] plan for L4S issue #29
> > 
> > Hi Koen,
> > 
> > I can definitely understand the need to turn bottleneck detection
> > on and off for testing, or for additional knobs during development.
> > 
> > Overall, I suspect that there will be more questions about
> > potential problems if bottleneck detection is not a MUST for
> > implementations in the draft, or not baked into the final
> > implementation in a way that works well under a wide variety of
> > conditions without additional configuration.
> > 
> > On an easier topic, I wonder if we shouldn't change the "false
> > negatives" and "false positives" terminology to something clearer,
> > like "mis-identification of RFC 3168 bottlenecks as L4S", or
> > "mis-identification of L4S bottlenecks as RFC 3168", respectively. I
> > might have opened up a can of worms there in trying to save a few
> > words. :)
> > 
> > Pete
> > 
> > On Tue, 2020-09-15 at 12:43 +0000, De Schepper, Koen (Nokia -
> > BE/Antwerp) wrote:
> > > Hi Wes, Pete,
> > > 
> > > I think to make progress on avoiding both false negatives and
> > > false positives, a good view on the conditions that cause problems
> > > is needed. So we better have the means to detect the real life
> > > impact of Classic-ECN-FIFO deployments. This means we need to be
> > > able to switch off Classic ECN detection (under controlled or even
> > > known conditions).
> > > 
> > > Another point is that it would be useful also to have all control
> > > variables of the existing implementation configurable for everyone
> > > willing to further experiment (without necessarily needing to
> > > change code). As I understood, the right tuning of these can bring
> > > a lot of further improvement opportunities. Also depending on a
> > > typical deployment, these parameters could be tuned for that
> > > specific targeted case.
> > > 
> > > So the resolution of this issue is exactly to facilitate further
> > > improving the detection algorithm (preferably via tuning), and
> > > being able to disable it when conditions are controlled or safe to
> > > avoid these false negatives.
> > > 
> > > I think these are topics that can be covered by the Operational 
> > > Guidelines draft.
> > > 
> > > Regards,
> > > Koen.
> > > 
> > > -----Original Message-----
> > > From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Pete Heist
> > > Sent: Friday, July 31, 2020 8:53 PM
> > > To: Wesley Eddy <wes@mti-systems.com>
> > > Cc: tsvwg@ietf.org
> > > Subject: Re: [tsvwg] plan for L4S issue #29
> > > 
> > > Hi Wesley,
> > > 
> > > One thing I noticed during testing was that the current
> > > implementation of TCP Prague in Linux allows disabling bottleneck
> > > detection through the prague_ecn_fallback kernel module parameter
> > > (https://github.com/L4STeam/linux/blob/0e7cf8acb318873c3f61084453f8da15b2e398be/net/ipv4/tcp_prague.c,
> > > line 158). I don’t know if that was left in only for testing.
> > > 
> > > In section 6.3.3 of l4s-arch, there is discussion around classic
> > > bottleneck detection. Since I don’t see an explicit MUST that it
> > > remain enabled (although I do see the text “an L4S sender will
> > > have to fall back to…”), it’s not completely clear to me if it’s
> > > actually required to be implemented and permanently enabled in all
> > > implementations. If it is, I suppose the implementation should
> > > reflect that also.
> > > 
> > > While I feel it best that detection identifies both types of
> > > queues accurately, if bottleneck detection were both an explicit
> > > MUST in the text *and* not possible to disable in any
> > > implementation, I think that would make the misidentification of
> > > L4S queues as classic ECN queues less of a safety concern, since
> > > it would be impossible to turn off. It would remain an issue for
> > > the architecture overall though.
> > > 
> > > Hope that helps...
> > > 
> > > Pete
> > > 
> > > > On Jul 31, 2020, at 5:41 PM, Wesley Eddy <wes@mti-systems.com>
> > > > wrote:
> > > > 
> > > > Hello, ticket #29 for the L4S documents is about classic
> > > > bottleneck detection misidentifying L4S queues as classic ECN
> > > > queues.
> > > > 
> > > > https://trac.ietf.org/trac/tsvwg/ticket/29
> > > > 
> > > > In contrast to other issues, it doesn't seem like this should
> > > > block a WGLC on the L4S drafts.
> > > > 
> > > > 	• It is specific to classic bottleneck detection algorithm,
> > > > 	  which is planned to be worked on in the Prague ICCRG draft.
> > > > 	• The result is sometimes failing to achieve the best
> > > > 	  possible L4S behavior, but doesn't seem to be an Internet
> > > > 	  safety issue.  This resulting in people turning off classic
> > > > 	  bottleneck detection would be a different issue, and
> > > > 	  something maybe the operator guidelines would address.
> > > > 	• It seems like it can be worked on further in the course of
> > > > 	  L4S experimentation, without negative effects to others.
> > > > 
> > > > So, I believe we should track this work in the ICCRG, and close
> > > > the ticket here.  Please let me know in the next week if I've
> > > > misunderstood any aspect of this and it should remain open.
> > > > 
> > > >