[tsvwg] Comments on L4S drafts

"Holland, Jake" <jholland@akamai.com> Wed, 05 June 2019 00:02 UTC

From: "Holland, Jake" <jholland@akamai.com>
To: "tsvwg@ietf.org" <tsvwg@ietf.org>, Bob Briscoe <ietf@bobbriscoe.net>
CC: "ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>
Date: Wed, 5 Jun 2019 00:01:27 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/WYNurfs72tf-y-hrGHzEazg1qcI>

Hi Bob,

I have a few comments and questions on draft-ietf-tsvwg-ecn-l4s-id-06
and draft-ietf-tsvwg-l4s-arch-03.

I've been re-reading these with an eye toward whether it would be
feasible to make L4S compatible with SCE[1] by using ECN capability alone
as the dualq classifier (roughly as described in Appendix B.3 of l4s-id),
and using ECT(1) to indicate a sub-loss congestion signal, assuming
some reasonable mechanism for reflecting the ECT(1) signals to the sender
(such as AccECN in TCP, or even just reflecting each SCE signal in the
NS bit from the receiver, if AccECN is not negotiated).
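
To make the comparison concrete, here is a rough sketch in Python of the
two classifier options as I understand them. The function names and the
"L"/"C" queue labels are made up for illustration; the codepoint values
are the RFC 3168 ones, and the ECT(1)-based variant reflects my reading of
l4s-id rather than any particular implementation.

```python
# ECN field codepoints (RFC 3168).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify_ect1(ecn: int) -> str:
    """ECT(1) as the classifier: ECT(1) and CE go to the low latency queue."""
    return "L" if ecn in (ECT_1, CE) else "C"


def classify_ecn_capable(ecn: int) -> str:
    """ECN capability alone as the classifier (roughly Appendix B.3):
    only Not-ECT packets go to the classic queue."""
    return "C" if ecn == NOT_ECT else "L"


if __name__ == "__main__":
    for name, cp in [("Not-ECT", NOT_ECT), ("ECT(0)", ECT_0),
                     ("ECT(1)", ECT_1), ("CE", CE)]:
        print(name, classify_ect1(cp), classify_ecn_capable(cp))
```

The only codepoint the two options treat differently is ECT(0), which is
where the classic ECN-capable traffic discussed in comment 1 shows up.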

I'm trying to understand the impact this approach would have on the
overall L4S architecture, and I thought I'd write out some of the
comments and questions that taking this angle on a review has left me
with.

This approach would of course require some minor updates to DCTCP or other
CCs that hope to make use of the sub-loss signal, but the changes seem
relatively straightforward (I believe there's a preliminary
implementation that was able to achieve similarly reduced RTT in the lab).

The idea also comes with some tradeoffs. I've tried to articulate the key
ones I noticed below. I think they are mostly already stated in the l4s
drafts, but I'd like your opinion on whether you agree with this
interpretation of what the tradeoffs would look like, or whether there
are other important points you'd like to mention for consideration.


1.
I understand that using SCE-style signaling with ECT capability as the
dualq classifier would come at a cost: wherever endpoints have classic ECT
behavior, the low latency queue would routinely see some queue-building,
until scalable controllers and feedback for the congestion signals are
pretty widely deployed at the endpoints.

This is a downside for the proposal, but even with this downside there
are still the gains described in Section 5.2 of l4s-arch:
   "State-of-the-art AQMs:  AQMs such as PIE and fq_CoDel give a
      significant reduction in queuing delay relative to no AQM at all."

On top of that, the same pressures that l4s-arch says should cause rapid
rollout of L4S should likewise cause rapid rollout of the endpoint
capabilities, especially if the network capability is there.

But regardless, the queue-building from classic ECN-capable endpoints that
only get 1 congestion signal per RTT is what I understand as the main
downside of the tradeoff if we try to use ECN-capability as the dualq
classifier.  Does that match your understanding?


2.
I ended up confused about how falling back works, and I didn't see it
spelled out anywhere.  I had assumed it was a persistent state-change
for the sender for the rest of the flow lifetime after detecting a
condition that required it, but then I saw some text that seemed to
indicate it might be temporary? From section 4.3 in l4s-id:
   "Note that a scalable congestion control is not expected to change
      to setting ECT(0) while it temporarily falls back to coexist with
      Reno."

Can you clarify whether the fall-back is meant to be temporary or not,
and whether I missed a more complete explanation of how it's supposed to
work?


3.
I also was a little confused on the implementation status of the fallback
logic.  I was looking through some of the various links I could find, and
I think these 2 are the main ones to consider? (from
https://riteproject.eu/dctth/#code ):
- https://github.com/L4STeam/sch_dualpi2_upstream
- https://github.com/L4STeam/tcp-prague

It looks like the prague_fallback_to_ca case so far only happens when
AccECN is not negotiated, right?

To me, the logic for when to do this (especially for rtt changes) seems
fairly complicated and easy to get wrong, especially if it's meant to be
temporary for the flow, or if it needs to deal with things like network path
changes unrelated to the bottleneck, or variations in rtt due to an endpoint
being a mobile device or on wi-fi.

Which brings me to:


*4.
(* I think this is the biggest point driving me to ask about this.)

I'm pretty worried about mis-categorizing CE marking from classic AQM
algorithms as L4S-style markings, when using ECT(1) as the dualq
classifier.

I did see this issue addressed in the l4s drafts, but reviewing it
left me a little confused, so I thought I'd ask about a point I
noticed for clarification:

From section 6.3.3 of l4s-arch:
   "an L4S sender will have to
   fall back to a classic ('TCP-Friendly') behaviour if it detects that
   ECN marking is accompanied by greater queuing delay or greater delay
   variation than would be expected with L4S"

From the abstract in l4s-arch:
   "In
   extensive testing the new L4S service keeps average queuing delay
   under a millisecond for _all_ applications even under very heavy
   load"

My reading of these seems to suggest that if the sender can observe
a variance or increase of more than 1 millisecond of rtt, it should fall
back to classic ECN?

I'm not sure yet how to square that with Section A.1.4 of l4s-id:
   "An increase in queuing delay or in delay variation would be
   a tell-tale sign, but it is not yet clear where a line would be drawn
   between the two behaviours."

Is the discrepancy here because the extensive testing (also mentioned in
the abstract of l4s-arch) was mainly in controlled environments, but the
internet is expected to introduce extra non-bottleneck delays even where
a dualq is present at the bottleneck, such as those from wi-fi, mobile
networks, and path changes?

Regardless, this seems to me like a worrisome gap in the spec, because if
the claim that dualq will get deployed and enabled quickly and widely is
correct, it means this will be a common scenario in deployment--basically
wherever there are existing classic AQMs deployed, especially since in CPE
devices the existing AQMs are generally configured to have a lower
bandwidth limit than the subscriber limit, so they'll (deliberately) be
the bottleneck whenever the upstream access network isn't overly
congested.

I guess if it's really a 1-2 ms variance threshold to fall back, that
would probably address the safety concern, but it seems like it would
have a lot of false positives, and unnecessarily fall back on a lot of
flows.
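
As a rough illustration of the false-positive concern, here is a sketch in
Python of the simplest threshold check I can imagine. To be clear, this is
a hypothetical heuristic of my own, not the logic in tcp-prague or in
either draft; the function name, the 2 ms default, and the RTT samples are
all made up for the example.

```python
def naive_should_fall_back(rtt_samples_ms, threshold_ms=2.0):
    """HYPOTHETICAL check: fall back to classic behaviour if the RTT
    rises more than threshold_ms above the minimum seen in the window."""
    base = min(rtt_samples_ms)
    return (max(rtt_samples_ms) - base) > threshold_ms


# Made-up samples: a steady wired path with sub-millisecond variation.
wired = [20.0, 20.3, 20.6, 20.2]
# Made-up samples: same base RTT, but with link-layer jitter (e.g. wi-fi)
# that has nothing to do with the bottleneck queue.
wifi = [20.0, 23.5, 21.0, 27.0]

if __name__ == "__main__":
    print(naive_should_fall_back(wired))  # no fallback
    print(naive_should_fall_back(wifi))   # falls back: a false positive
```

A check this simple cannot tell queuing delay at a classic AQM apart from
delay variation elsewhere on the path, so the second case falls back even
if the bottleneck is a dualq.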

But worse, if there's some (not yet specified?) logic that tries to reduce
those false positives by relaxing a simple very-few-ms threshold, it seems
like there's a high likelihood of logic that produces false negatives going
undetected.

If that's the case, to me it seems like it will remain a significant risk
even while TCP Prague has been deployed for quite a long time at a sender,
as long as different endpoint and AQM implementations roll out randomly
behind different network conditions, for the various endpoints that end
up connected with the sender.

It also seems to me there's a high likelihood of causing unsafe non-
responsive sender conditions in some of the cases where this kind of false
negative happens in any kind of systematic way.

By contrast, as I understand it an SCE-based approach wouldn't need the
same kind of fallback state-change logic for the flow, since any CE would
indicate an RFC 3168-style multiplicative decrease, and only ECT(1) would
indicate sub-loss congestion.

This is one of the big advantages of the SCE-based approach in my mind,
since there's no chance of mis-classifying the meaning of a CE mark and
no need for a state change for how the sender handles the ECT backoff logic
or sets the ECT markings.  (It just goes back to treating any CE as an
RFC 3168-style loss equivalent, and SCE as a sub-loss signal.)
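
A minimal sketch of what I mean, in Python. The class name, the field
names, and especially the 1/16 reduction per SCE mark are assumptions
chosen for illustration, not values from any draft or implementation; the
sketch also ignores details such as limiting the CE response to once per
RTT.

```python
from dataclasses import dataclass


@dataclass
class SceSender:
    cwnd: float                     # congestion window, in segments
    min_cwnd: float = 2.0           # floor for the window
    sce_decrease: float = 1.0 / 16  # ASSUMPTION: illustrative per-mark cut

    def on_ack(self, ce_echoed: bool, sce_marks: int) -> None:
        if ce_echoed:
            # CE keeps its RFC 3168 meaning: respond as for a loss
            # (Reno-style halving here), with no mode switch needed.
            self.cwnd = max(self.min_cwnd, self.cwnd / 2)
            return
        # Each reflected SCE mark is a sub-loss signal: a small reduction.
        for _ in range(sce_marks):
            self.cwnd = max(self.min_cwnd,
                            self.cwnd * (1 - self.sce_decrease))


if __name__ == "__main__":
    s = SceSender(cwnd=64.0)
    s.on_ack(ce_echoed=True, sce_marks=0)
    print(s.cwnd)
    s.on_ack(ce_echoed=False, sce_marks=1)
    print(s.cwnd)
```

The point is that the response to CE is the same regardless of which kind
of AQM produced it, so there is no per-flow state that can end up wrong.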

Since an SCE-based approach would avoid this problem nicely, I consider
the reduced risk of false negatives (and unresponsive flows) here one of the
important gains, to be weighed against the key downside mentioned in comment
#1.


5.
Something similar comes up again in some other places, for instance:

From A.1.4 in l4s-id:
   "Description: A scalable congestion control needs to distinguish the
   packets it sends from those sent by classic congestion controls.

   Motivation: It needs to be possible for a network node to classify
   L4S packets without flow state into a queue that applies an L4S ECN
   marking behaviour and isolates L4S packets from the queuing delay of
   classic packets."

Listing this as a requirement seems to prioritize enabling the gains of
L4S ahead of avoiding the dangers of L4S flows failing to back off in the
presence of possibly-miscategorized CE markings, if I'm reading it right?

I guess Appendix A says these "requirements" are non-normative, but I'm a
little concerned that framing it as a requirement instead of a design
choice with a tradeoff in its consequences is misleading here, and
pushes toward a less safe choice.


6.
If queuing from classic ECN-capable flows is the main issue with using
ECT as the dualq classifier, do you think it would still be possible to
get the queuing delay down to a max of ~20-40ms right away for ECN-capable
endpoints in networks that deploy this kind of dualq, and then hopefully
see it drop further to ~1-5ms as more endpoints get updated with AccECN or
some kind of ECT(1) feedback and a scalable congestion controller that
can respond to SCE-style marking?

Or is it your position that the additional gains from the ~1ms queuing delay
that should be achievable from the beginning by using ECT(1) (in connections
where enough of the key entities upgrade) are worth the risks?

(And if so, do you happen to have a pointer to any presentations or papers
that made a quantitative comparison of the benefits from those 2 options?
I don't recall any offhand, but there are a lot of papers...)


Best regards,
Jake