Re: [tsvwg] Requesting TSVWG adoption of SCE draft-morton-tsvwg-sce

"Holland, Jake" <jholland@akamai.com> Mon, 18 November 2019 00:45 UTC

From: "Holland, Jake" <jholland@akamai.com>
To: Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
CC: "gorry@erg.abdn.ac.uk" <gorry@erg.abdn.ac.uk>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Date: Mon, 18 Nov 2019 00:45:33 +0000
Message-ID: <0F5F9FA9-FC09-4679-8A6A-45F93A6A6ED5@akamai.com>
References: <HE1PR07MB4425A6B56F769A5925FF5AA0C2720@HE1PR07MB4425.eurprd07.prod.outlook.com>
In-Reply-To: <HE1PR07MB4425A6B56F769A5925FF5AA0C2720@HE1PR07MB4425.eurprd07.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/3sWvrCY2TnjJbAln9mb7Lb08s-0>
Subject: Re: [tsvwg] Requesting TSVWG adoption of SCE draft-morton-tsvwg-sce
List-Id: Transport Area Working Group <tsvwg.ietf.org>

Hi Ingemar,

If fragmenting the space will prevent other SDOs from prematurely
adopting the unproven L4S technology, that seems like exactly the
right thing to do at this stage.

I think we've seen strong evidence that L4S may still contain
show-stopping problems, and we have not yet seen strong evidence
that the problems stemming from the ambiguity in the L4S signaling
design can be fixed.

That ambiguity carries a demonstrated potential for breaking
existing ECN deployments, because L4S senders under-respond to the
already widely-deployed congestion feedback systems.
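
As a rough illustration of what "under-responding" means here (a
sketch only, not taken from the L4S drafts or from any
implementation): a classic RFC 3168 sender halves its window in a
round trip that saw a CE mark, while a DCTCP-style sender (RFC 8257)
scales its cut by alpha, its running estimate of the marked fraction.
The function names and the example numbers are hypothetical.

```python
def classic_ecn_cwnd(cwnd: float) -> float:
    """RFC 3168-style response: halve cwnd in an RTT that saw a CE mark."""
    return cwnd / 2.0


def dctcp_style_cwnd(cwnd: float, alpha: float) -> float:
    """RFC 8257-style response: cut cwnd in proportion to alpha (0..1)."""
    return cwnd * (1.0 - alpha / 2.0)


# Hypothetical example: both senders hold 100 packets in flight and
# both see CE marks in the same round trip.
cwnd = 100.0
alpha = 0.25  # assumed marking estimate, for illustration only

print(classic_ecn_cwnd(cwnd))         # 50.0
print(dctcp_style_cwnd(cwnd, alpha))  # 87.5
```

The two responses only match when alpha is 1.  With any smaller
alpha, the DCTCP-style sender gives up less window for the same
marks, so behind an AQM that marks at a rate meant for classic
senders it keeps more of the shared queue.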

Certainly L4S's implementation was demonstrated to contain an
issue that would have wrecked the latency of existing ECN
deployments, and it had not previously been detected, despite the
years of lab evaluation and repeated requests from reviewers to
test such scenarios earlier.

Although a fix was found for the specific initially-demonstrated
case, no fix has yet been demonstrated for what looks to be a very
similar issue occurring with staggered flow startup, which can't
be attributed to a wrong alpha starting value:
https://trac.ietf.org/trac/tsvwg/ticket/17#comment:8
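
For background on why a starting value for alpha matters at all in a
DCTCP-style sender, here is a minimal sketch of the RFC 8257 moving
average, alpha = (1 - g) * alpha + g * F, where F is the fraction of
marked bytes in the last window and g = 1/16 is the gain that RFC
suggests.  It does not model the staggered-startup case in the ticket
above; the helper names and the 0.5 threshold are hypothetical.

```python
from typing import Optional


def update_alpha(alpha: float, marked_fraction: float,
                 g: float = 1.0 / 16.0) -> float:
    """One per-window update of the DCTCP-style marking estimate."""
    return (1.0 - g) * alpha + g * marked_fraction


def rtts_to_reach(alpha0: float, target: float,
                  marked_fraction: float = 1.0,
                  g: float = 1.0 / 16.0,
                  max_rtts: int = 1000) -> Optional[int]:
    """Count update rounds until alpha >= target, or None if never reached."""
    alpha = alpha0
    for n in range(max_rtts + 1):
        if alpha >= target:
            return n
        alpha = update_alpha(alpha, marked_fraction, g)
    return None


# Assumed scenario: every window is fully marked (F = 1).
print(rtts_to_reach(alpha0=0.0, target=0.5))  # 11
print(rtts_to_reach(alpha0=1.0, target=0.5))  # 0
```

Under these assumptions a sender that starts with alpha at 0 needs
about eleven fully marked round trips before its cut reaches even a
quarter of the window, while one that starts with alpha at 1 responds
like a classic sender on the first mark.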

The proposed pseudocode fix (with up to 5 tuning parameters, IIRC)
may or may not be able to address this for specific cases, and it
may or may not be possible to discover a set of tuning values that
can address a wide range of conditions.  Given the history of such
proposals, some skepticism seems appropriate, at least until
successful operation is demonstrated under a wide range of
conditions.  This suggests that we do the opposite of encouraging
other SDOs to move broadly forward with L4S at this time.

I share your concern that we might lose the codepoint (and the low
latency functionality), and I acknowledge that a persistently
fragmented space introduces a risk that low-latency deployment never
happens, or takes an extra decade.

But the risk that concerns me even more is if L4S gets rolled out
and then these kinds of issues are discovered in production, after
other SDOs have prematurely standardized on this experiment, and it
therefore gets shut off with prejudice against future solutions.
That outcome also would lose the use of the codepoint, probably
even more permanently.

(Or even worse: if it does not get shut off in spite of the problems
it causes, which loses even the low-ish latency solutions we already
have, and adds to the congestion control aggression arms race.)

IMO, those would be even worse outcomes than a somewhat delayed
adoption of a fully vetted system (or at least one that can't break
existing deployed networks).

Best regards,
Jake

PS: I still don't understand why the gains available through the
use of regular AQM (especially with ECN) have not been more widely
adopted by the other SDOs that would want to make use of L4S.

It seems possible already to reduce the application-visible delay
spikes from ~200ms to ~20ms (provided that no overly aggressive
competing traffic improperly ignores the feedback, or that
flow-queuing or other queue protection mechanisms are more widely
deployed to prevent excessive damage from aggressive flows to less
aggressive competing flows).

I wonder if whatever would drive SDOs to start using L4S could
instead be leveraged to drive adoption of the far better-proven
existing ECN solutions, which at least already have a lot of
endpoint support deployed.

Endpoint support is a critical component to making this useful,
and I see no reason to believe L4S endpoint support will arrive any
quicker than regular ECN support did.  I'd even expect it to be
slower, since the behavior is much more complicated and harder to
test.

PPS: I agree it would be interesting to see paced chirping solutions
to help do better than slow start, and to quickly grow when new
capacity opens on-path.  I'll point out, though, that this is not
specific to L4S; it should apply to any CC that can avoid pushing
the network until queue overflow, which to me likely includes regular
ECN-enabled Reno or Cubic, as well as BBR.

However, as yet another unproven TBD, I'll suggest it's not very
useful as a strong influence on this debate, in spite of the early
demos using L4S.  Regardless of the ultimate low latency solution,
that part will need further development and might not work.