Re: [tcpm] [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
"Black, David" <David.Black@dell.com> Tue, 09 July 2019 14:44 UTC
From: "Black, David" <David.Black@dell.com>
To: Bob Briscoe <ietf@bobbriscoe.net>, "ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>, tcpm IETF list <tcpm@ietf.org>
CC: tsvwg IETF list <tsvwg@ietf.org>, "Black, David" <David.Black@dell.com>
Thread-Topic: [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S
Thread-Index: AQHVIgfVHJuiHQ6zPkakYH0F258pwabCeiqw
Date: Tue, 09 Jul 2019 14:41:04 +0000
Message-ID: <CE03DB3D7B45C245BCA0D243277949363060CC9C@MX307CL04.corp.emc.com>
References: <24f7b15a-129f-ca44-60e0-32c7d23eadf4@bobbriscoe.net>
In-Reply-To: <24f7b15a-129f-ca44-60e0-32c7d23eadf4@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/PPBx9NSGcE0q3zoF2giNenXl1C8>

Bob,

Commenting as an individual, not a WG chair.

> Q#1: If this glosses over any concerns you have, please explain.

It does gloss over, at least for me. The TL;DR summary is that items 1-3 aren’t relevant or helpful, IMHO, leaving items 4 and 5, whose effectiveness depends on widespread deployment of RACK and FQ AQMs (e.g., FQ-CoDel) respectively.

Items 1 & 2: The general expectation for Internet transport protocols is that they’re robust against “stupid network tricks” like reordering, but existing transport protocols wind up being designed/implemented for the network we have, not the one we wish we had. I’m generally skeptical of “highly unlikely” arguments, as horrendous results in a highly unlikely scenario are not acceptable if that scenario occurs repeatedly, even with long intervals in between occurrences. In light of that, I view items 1 and 2 as defining the problem scenario that needs to be addressed, particularly if L4S is to be widely deployed, and prefer to focus on items 3-5 about how the problem is dealt with.

Item 3: This begins by correctly pointing out that 3DupACK is the criterion for triggering conventional TCP retransmission, e.g., 2DupACK doesn’t. An aspect that isn’t mentioned is that AQMs for classic (non-L4S) traffic should be randomly marking (above a queue threshold, CE marking probability depends on queue occupancy), not threshold marking (above a queue threshold, mark all packets with CE). If threshold marking is used, 3 CE marks in a row is a near certainty, as for non-mice flows, one can expect to have at least that many packets in an RTT window; this is a “Doctor it hurts when I do <this>.”/“Don’t do that!” scenario where the right answer is to fix the broken threshold marking implementation. Assuming probabilistic marking, one then needs to look at 3-in-a-row CE marking probabilities based on the marking rate.
These are not small - for example, at a 10% marking probability, the likelihood of CE-marking 3 packets in a row starting from a specific packet is 1 in 1,000 (1/10th of 1%), but across 500 packets in a flow, that probability is about 50%. My initial take-away from this is that if the two bottlenecks (conventional followed by L4S) persist, then the “unusual scenario” of 3 CE-marked packets in a row is nearly certain to happen, which suggests that item 3 is not particularly helpful, leaving items 4 (RACK) and 5 (FQ-CoDel).

So, while I don’t have a conclusion to draw, it appears to me that the countermeasures to this conventional TCP flow misbehavior with L4S are deployment of RACK at endpoints and deployment of FQ AQMs such as FQ-CoDel at non-L4S potential bottleneck nodes. Items 4 and 5 below effectively assert wide deployment of those algorithms; additional information and data on that would be of interest.

Thanks, --David

From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Bob Briscoe
Sent: Thursday, June 13, 2019 12:48 PM
To: ecn-sane@lists.bufferbloat.net; tcpm IETF list
Cc: tsvwg IETF list
Subject: [tsvwg] ECN CE that was ECT(0) incorrectly classified as L4S

[EXTERNAL EMAIL]

[I'm sending this to ecn-sane 'cos that's where I detect that this concern is still rumbling. I'm also sending to tcpm@ietf 'cos there's a question for TCP experts just before the quoted text below. And tsvwg@ietf is where it ought to be discussed.]

Now that the IPR issue with L4S has been put to bed, one by one I am going through the other concerns that have been raised about L4S.
In the IETF draft that records all the pros and cons of different identifiers to use for L4S, under the "ECT(1) and CE" choice (which is currently the one adopted at the IETF) there was already an explanation of why there would be vanishingly low risk of any harmful consequences from CE that was originally ECT(0) being classified into the L4S queue:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-06#page-32

Re-reading that, I have found some things unstated that I had thought were obvious. So I've spelled it all out long-hand in the text below, which is now in my local copy of the draft and will be in the next revision unless people suggest improvements/corrections here.

Q#1: If this glosses over any concerns you have, please explain. Otherwise I will continue to consider that this is effectively a non-issue, which is the conclusion everyone in the TCP community came to at the time the L4S identifier was chosen back in 2015.

Q#2: The last couple of lines are the only part I am not sure of. Do most of today's TCP implementations recover the reduction in congestion window when they discover later that a fast retransmit was spurious? There's a note at the end of the intro to rfc4015 saying there was insufficient consensus to standardize this behaviour, but that most likely means it's done in different ways, rather than it isn't done at all.

Bob

======================================

Risk of reordering classic CE packets: Classifying all CE packets into the L4S queue risks any CE packets that were originally ECT(0) being incorrectly classified as L4S. If there were delay in the Classic queue, these incorrectly classified CE packets would arrive early, which is a form of reordering. Reordering can cause TCP senders (and senders of similar transports) to retransmit spuriously. However, the risk of spurious retransmissions would be extremely low for the following reasons:

1. It is quite unusual to experience queuing at more than one bottleneck on the same path (the available capacities have to be identical).

2. In only a subset of these unusual cases would the first bottleneck support classic ECN marking while the second supported L4S ECN marking, which would be the only scenario where some ECT(0) packets could be CE marked by a non-L4S AQM then the remainder experienced further delay through the Classic side of a subsequent L4S DualQ AQM.

3. Even then, when a few packets are delivered early, it takes very unusual conditions to cause a spurious retransmission, in contrast to when some packets are delivered late. The first bottleneck has to apply CE-marks to at least N contiguous packets and the second bottleneck has to inject an uninterrupted sequence of at least N of these packets between two packets earlier in the stream (where N is the reordering window that the transport protocol allows before it considers a packet is lost).

   For example, consider N=3, and consider the sequence of packets 100, 101, 102, 103,... and imagine that packets 150,151,152 from later in the flow are injected as follows: 100, 150, 151, 101, 152, 102, 103... If this were late reordering, even one packet arriving 50 out of sequence would trigger a spurious retransmission, but there is no spurious retransmission here, because packet 101 moves the cumulative ACK counter forward before 3 packets have arrived out of order. Later, when packets 148, 149, 153... arrive, even though there is a 3-packet hole, there will be no problem, because the packets to fill the hole are already in the receive buffer.

4. Even with the current recommended TCP (N=3) spurious retransmissions will be unlikely for all the above reasons. As RACK [I-D.ietf-tcpm-rack] is becoming widely deployed, it tends to adapt its reordering window to a larger value of N, which will make the chance of a contiguous sequence of N early arrivals vanishingly small.

5. Even a run of 2 CE marks within a classic ECN flow is unlikely, given FQ-CoDel is the only known widely deployed AQM that supports classic ECN marking and it takes great care to separate out flows and to space any markings evenly along each flow.

It is extremely unlikely that the above set of 5 eventualities that are each unusual in themselves would all happen simultaneously. But, even if they did, the consequences would hardly be dire: the odd spurious fast retransmission. Admittedly TCP reduces its congestion window when it deems there has been a loss, but even this can be recovered once the sender detects that the retransmission was spurious.

--
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
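
Two quantitative points in the exchange above can be checked with a short sketch. The first function replays the packet sequence from item 3 of the draft text against a deliberately simplified cumulative-ACK receiver (packet-numbered, no SACK, no delayed ACKs). The second computes the chance of at least one run of 3 CE marks somewhere in a 500-packet flow at a 10% marking probability, as discussed in the reply. The function names are hypothetical, and the assumption that each packet is marked independently is an illustration only; it will not hold for an AQM that deliberately spaces its marks.

```python
def max_dupack_run(arrivals, next_expected):
    """Longest run of consecutive duplicate ACKs sent by a simplified
    cumulative-ACK receiver (assumed model: no SACK, no delayed ACKs)."""
    received = set()
    run = longest = 0
    for seq in arrivals:
        received.add(seq)
        if seq == next_expected:
            # In-order arrival: the cumulative ACK advances past this packet
            # and past anything already sitting in the receive buffer.
            while next_expected in received:
                next_expected += 1
            run = 0
        elif seq > next_expected:
            # Early (out-of-order) arrival: the receiver repeats its last ACK.
            run += 1
            longest = max(longest, run)
    return longest


def prob_run_of_ce(n_packets, p_mark, run_len=3):
    """P(at least one run of run_len consecutive CE marks in n_packets),
    ASSUMING each packet is marked independently with probability p_mark."""
    # state[k]: probability that no qualifying run has occurred yet and the
    # last k packets were all marked (k = 0 .. run_len - 1).
    state = [1.0] + [0.0] * (run_len - 1)
    for _ in range(n_packets):
        unmarked = sum(state) * (1.0 - p_mark)
        # A mark extends the current streak; a streak that would reach
        # run_len leaves the "no run yet" states and is dropped here.
        state = [unmarked] + [s * p_mark for s in state[:-1]]
    return 1.0 - sum(state)


if __name__ == "__main__":
    item3 = [100, 150, 151, 101, 152, 102, 103]
    contiguous = [100, 150, 151, 152, 101, 102, 103]
    print("item 3 sequence, max dupACK run:", max_dupack_run(item3, 100))
    print("contiguous early packets, max dupACK run:", max_dupack_run(contiguous, 100))
    print("P(run of 3 at a given packet):", prob_run_of_ce(3, 0.1))
    print("P(run of 3 within 500 packets):", prob_run_of_ce(500, 0.1))
```

Under these assumptions the item 3 sequence never accumulates more than 2 duplicate ACKs, whereas three contiguous early packets do reach 3. The per-position figure reproduces the 1-in-1,000 value, and the 500-packet figure comes out somewhat below the rough 50% estimate while remaining of the same order.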