Re: [tsvwg] Switch testing at 25G with ECN --SCE Draft

"Black, David" <David.Black@dell.com> Mon, 12 August 2019 21:58 UTC

From: "Black, David" <David.Black@dell.com>
To: Jonathan Morton <chromatix99@gmail.com>
CC: "tsvwg@ietf.org" <tsvwg@ietf.org>, "Black, David" <David.Black@dell.com>
Date: Mon, 12 Aug 2019 14:34:07 +0000
Message-ID: <CE03DB3D7B45C245BCA0D24327794936306675CD@MX307CL04.corp.emc.com>
References: <201908100002.x7A02e5h099876@gndrsh.dnsmgr.net> <1cb8e129-c22c-2b26-1149-e68305fee991@mti-systems.com>, <46D1EB74-2663-436B-A5FB-FC59A8BB2B8D@gmail.com>
In-Reply-To: <46D1EB74-2663-436B-A5FB-FC59A8BB2B8D@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/KkfODHDj8YbMWW_3QZ53EQl1KoA>
Subject: Re: [tsvwg] Switch testing at 25G with ECN --SCE Draft

Writing as an author of RFC 3168, not as a WG chair ...

> More generally, I notice it's a surprisingly common misconception on this list
> that RFC-3168 states that marking should happen when, or just before, packets
> would otherwise be dropped.  A little critical thought will show that this is
> absurd; AQM is supposed to leverage congestion-control behaviour to keep
> the queue depth well away from the hard tail-drop zone on average.
> What the spec *actually* says is that an AQM should substitute packet drops
> for marking, when the packet is not ECT and therefore cannot be marked;
> this is practically the inverse of the misconception noted.

+1 - RFC 3168 mandates drop-equivalence of CE, *not* tail-drop-equivalence, as can be seen from the numerous mentions of AQM and the RED AQM in RFC 3168's text (yes, we've moved on from RED, but RFC 3168 was written nearly 2 decades ago).  Tail-drop equivalence is allowed; in practice, however, most CE marks are generated by AQMs, e.g., ...
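
The drop-equivalence rule can be sketched roughly as follows: once an AQM has decided to signal congestion on a packet, it marks CE if the packet is ECN-capable and drops it otherwise. This is a minimal illustration, not text from RFC 3168; the function name and return shape are hypothetical, and the AQM's own decision logic (when to signal) is deliberately left out.

```python
from enum import Enum
from typing import Optional, Tuple


class Ecn(Enum):
    """The four ECN codepoints defined by RFC 3168 (two-bit field)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def apply_congestion_signal(ecn: Ecn) -> Tuple[str, Optional[Ecn]]:
    """Deliver one congestion signal that the AQM has already decided to send.

    Hypothetical helper: returns ("drop", None) for packets that cannot be
    marked, and ("forward", Ecn.CE) for ECN-capable packets.
    """
    if ecn is Ecn.NOT_ECT:
        # The packet cannot carry a mark, so the drop substitutes for it.
        return ("drop", None)
    # ECT(0), ECT(1), and already-CE packets are forwarded carrying CE.
    return ("forward", Ecn.CE)
```

The point of the sketch is that the mark and the drop come from the same AQM decision, so both occur well before the queue overflows.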

For completeness, I should note that RFC 8511 (Experimental) "TCP Alternative Backoff with ECN (ABE)" allows the TCP sender reaction to reported CE marks to differ from the TCP sender reaction to drops - the TCP sender is allowed to back off by a smaller amount on the assumption that CE marks have been applied by AQMs, as opposed to last-ditch tail-drop-avoidance mechanisms.
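
As a rough sketch of the ABE idea, the sender can apply a gentler multiplicative decrease when the congestion signal is a reported CE mark than when it is a loss. The factor 0.8 for CE is used here as an assumed illustrative value and should be checked against RFC 8511; 0.5 is the conventional halving on loss. The function and constant names are hypothetical.

```python
BETA_LOSS = 0.5  # conventional halving of cwnd on packet loss
BETA_ECN = 0.8   # assumed illustrative factor for CE marks; check RFC 8511


def cwnd_after_congestion(cwnd: float, signal: str, min_cwnd: float = 2.0) -> float:
    """Return the new congestion window (in segments) after one congestion event.

    `signal` is "loss" or "ce". A CE mark triggers a smaller reduction,
    on the assumption that it came from an AQM holding a short queue.
    """
    if signal == "loss":
        beta = BETA_LOSS
    elif signal == "ce":
        beta = BETA_ECN
    else:
        raise ValueError(f"unknown congestion signal: {signal!r}")
    return max(cwnd * beta, min_cwnd)
```

With a window of 100 segments, a loss would leave 50 segments and a CE mark would leave about 80.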

Thanks,
--David

________________________________________
From: tsvwg [tsvwg-bounces@ietf.org] on behalf of Jonathan Morton [chromatix99@gmail.com]
Sent: Sunday, August 11, 2019 3:27 AM
To: Wesley Eddy
Cc: tsvwg@ietf.org
Subject: Re: [tsvwg] Switch testing at 25G with ECN --SCE Draft

> The advice in 3168 is to do marking based on average queue length, but *not* just before it needs to start dropping.  Is it more correct instead to understand what you're saying as the experiment being based on using the instantaneous queue length rather than average?

I think that is right.  In software we like to use the actual sojourn time within the queue of the candidate packet, but in hardware it is often convenient to use the queue length instead.
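
The three signals under discussion (averaged queue length, instantaneous queue length, and sojourn time) can be compared in a small sketch. Everything here is illustrative: the thresholds, the EWMA weight, and the names are assumptions, not values taken from the experiment or from any particular switch.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    backlog_bytes: int  # instantaneous queue length when the packet is handled
    sojourn_s: float    # time this packet actually spent in the queue


def ewma(avg: float, sample: float, weight: float) -> float:
    """RED-style exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * sample


def mark_on_length(length_bytes: float, threshold_bytes: float) -> bool:
    """Queue-length test; works for instantaneous or averaged length."""
    return length_bytes >= threshold_bytes


def mark_on_sojourn(sample: QueueSample, target_s: float) -> bool:
    """Sojourn-time test, the form typically convenient in software."""
    return sample.sojourn_s >= target_s


# A sudden burst: the instantaneous length crosses the threshold at once,
# while the average (assumed weight 0.002) has barely moved.
burst = QueueSample(backlog_bytes=100_000, sojourn_s=0.0005)
avg_after_burst = ewma(0.0, burst.backlog_bytes, weight=0.002)
```

In this example the instantaneous test would mark against a 50,000-byte threshold while the averaged test would not, which is the behavioural difference the question above is probing.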

More generally, I notice it's a surprisingly common misconception on this list that RFC-3168 states that marking should happen when, or just before, packets would otherwise be dropped.  A little critical thought will show that this is absurd; AQM is supposed to leverage congestion-control behaviour to keep the queue depth well away from the hard tail-drop zone on average.  What the spec *actually* says is that an AQM should substitute packet drops for marking, when the packet is not ECT and therefore cannot be marked; this is practically the inverse of the misconception noted.

In particular, I believe marking should definitely occur by the time the queue is half-full, or more precisely so that there is a full BDP (including queuing delay already incurred at the point the mark is applied) of space remaining in the queue.  Why?  To accommodate the RTT of control feedback delay before the signal to exit slow-start takes effect at the queue, during which the cwnd of a typical TCP will double.  Most AQMs do in fact satisfy that principle.
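
One way to read the headroom rule above is as a small calculation. If the mark is applied at backlog T, the space that must remain is the base BDP plus the queuing delay already incurred, which at line rate is T bytes again. Setting capacity - T = base BDP + T gives T = (capacity - base BDP) / 2, which reduces to "half-full" when the base BDP is negligible. This is an interpretation of the paragraph, and the link rate, RTT, and buffer size below are made-up illustrative numbers.

```python
def base_bdp_bytes(rate_bps: float, base_rtt_s: float) -> float:
    """Bandwidth-delay product of the unloaded path, in bytes."""
    return rate_bps * base_rtt_s / 8.0


def latest_mark_threshold_bytes(
    capacity_bytes: float, rate_bps: float, base_rtt_s: float
) -> float:
    """Largest backlog at which marking can start while leaving a full BDP
    (base BDP plus the backlog already queued) of free space.

    Solves capacity - T = base_bdp + T for T, clamped at zero.
    """
    bdp = base_bdp_bytes(rate_bps, base_rtt_s)
    return max((capacity_bytes - bdp) / 2.0, 0.0)


# Illustrative numbers only: a 25 Gbit/s port, 100 microsecond base RTT,
# and a 1,000,000-byte buffer.
threshold = latest_mark_threshold_bytes(1_000_000, 25e9, 100e-6)
```

For these numbers the base BDP is about 312,500 bytes and the threshold comes out near 343,750 bytes, leaving roughly 656,250 bytes free for the window to double into.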

>> RFC3168 ECN marking is already in the switch; the switch was modified to behave in a different manner, using the RFC3168 CE bits.  This different manner was turned OFF to do the dctcp tests and turned ON to do the dctcp-sce tests.
>
> Since the marking probability function and marking decision aren't part of 3168/ECN, the description here sounds like the switch is always conforming to 3168 behavior, but just using some custom decision logic for marking.
>
> Just trying to be clear on what we're really talking about ...

I think it's best to consider this an experimental prototype, in which the goal is to investigate whether the obtained behaviour is desirable and useful before continuing with more involved development.  From what I hear, and without going into needless detail, it was easier in the short term to set it up with CE marking and transform that into SCE codepoints at the receiver.  This was always going to be a temporary expedient.
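
A receiver-side transform of this kind might look like the sketch below, which rewrites the two ECN bits of the IPv4 TOS / IPv6 Traffic Class byte from CE to the SCE codepoint. It assumes, following the SCE draft, that SCE reuses the ECT(1) codepoint; how the prototype actually performed the transform is not described here, and the function name is hypothetical.

```python
# ECN codepoints in the low two bits of the TOS / Traffic Class byte (RFC 3168).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
SCE = ECT_1  # assumption: SCE reuses the ECT(1) codepoint, per the SCE draft

ECN_MASK = 0b11


def remap_ce_to_sce(tos: int) -> int:
    """Return the TOS byte with CE rewritten to SCE; other codepoints pass through.

    The upper six bits (the DSCP) are preserved. A real IPv4 implementation
    would also need to update the header checksum, which is omitted here.
    """
    if tos & ECN_MASK == CE:
        return (tos & ~ECN_MASK & 0xFF) | SCE
    return tos
```

Note that such a remap makes genuine CE marks indistinguishable from SCE at the endpoint, which is consistent with treating it as a temporary expedient.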

Also from what I hear, experiments had previously been conducted using CE marking and the standard version of DCTCP in Linux, but without anywhere near the level of success seen in the results just posted.  It seems likely that faults in the DCTCP implementation were responsible, but this in turn raises the question of how well maintained that piece of code is, and how such faults were allowed to persist in the mainline codebase.

Linux is normally associated with better development practices than that.  The most reasonable explanation I can come up with is that DCTCP is not actually in widespread use, so when faults appeared they were not noticed by any party interested in having them fixed.  Or, perhaps more disturbingly, DCTCP *is* in widespread use in closed datacentre environments, but when the faults arose in production their effects were not recognised.

So in practice, this controlled experiment has validated certain principles associated with the DCTCP response function (well, a simplified version of it), and also shown that other response functions are similarly effective, when applied to an ultra-low-latency network.  It has also shown that the proposed feedback mechanism for SCE signals on TCP connections is viable, so the relatively complex AccECN mechanism isn't obviously needed.  We have also presented other results showing similarly good behaviour on Internet-scale latencies with real SCE marking, using a broadly similar marking function.  Overall these are encouraging results for us, and work will continue accordingly.
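
For reference, the DCTCP response function in its commonly described form (as in RFC 8257) can be sketched as below: the sender keeps a running estimate of the fraction of marked bytes and scales its window reduction by that estimate. This is the textbook form, not the simplified variant used in the experiment, whose details are not given here; the gain of 1/16 is the commonly cited default and the function names are hypothetical.

```python
G = 1.0 / 16.0  # estimation gain; commonly cited default, treat as an assumption


def update_alpha(alpha: float, marked_bytes: int, acked_bytes: int, g: float = G) -> float:
    """Update the running estimate of the marked fraction once per window."""
    fraction = marked_bytes / acked_bytes if acked_bytes else 0.0
    return (1.0 - g) * alpha + g * fraction


def dctcp_cwnd(cwnd: float, alpha: float, min_cwnd: float = 2.0) -> float:
    """Reduce cwnd in proportion to the estimated extent of congestion."""
    return max(cwnd * (1.0 - alpha / 2.0), min_cwnd)
```

When every byte is marked for long enough, alpha approaches 1 and the reduction approaches the conventional halving; with sparse marking, the reduction is only a few percent per round trip.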

 - Jonathan Morton