Re: [tsvwg] Requesting TSVWG adoption of SCE draft-morton-tsvwg-sce

Greg White <g.white@CableLabs.com> Mon, 18 November 2019 01:46 UTC

From: Greg White <g.white@CableLabs.com>
To: "Holland, Jake" <jholland@akamai.com>, Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
CC: "gorry@erg.abdn.ac.uk" <gorry@erg.abdn.ac.uk>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Date: Mon, 18 Nov 2019 01:46:41 +0000
Message-ID: <B99441C2-B57F-41A8-8E2C-AD80BC59F84C@cablelabs.com>
References: <HE1PR07MB4425A6B56F769A5925FF5AA0C2720@HE1PR07MB4425.eurprd07.prod.outlook.com> <0F5F9FA9-FC09-4679-8A6A-45F93A6A6ED5@akamai.com>
In-Reply-To: <0F5F9FA9-FC09-4679-8A6A-45F93A6A6ED5@akamai.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/EvNWMI9NgW3RCfl7brR2FW7sWqw>
Subject: Re: [tsvwg] Requesting TSVWG adoption of SCE draft-morton-tsvwg-sce

Hi Jake,

A couple of comments.

Please don't confuse L4S with TCP Prague.  TCP Prague is just one congestion controller that is L4S compatible.  There are others, and there will likely be more in the future.  The bug you are referring to was not in L4S congestion signaling; it was in the TCP Prague implementation, specifically in a part of that implementation (exit from slow start) where there is active research (paced chirping, etc.) and opportunity for continued evolution even after L4S gets deployed.

The second issue you are referring to is not similar to the first.  It does not affect any flows other than the TCP Prague flow itself, and it seems to be restricted to these existing IQRouter/CAKE implementations.  If we can consider improvements to the IQRouter/CAKE implementations (a not unreasonable fix for this sort of issue), a fairly minor change to fq_codel (low CE threshold marking for ECT(1) packets) would fix it.  Also (and this is speculation), a more responsive AQM such as PIE instead of CoDel probably would as well.
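
To make the fq_codel suggestion concrete, here is a rough sketch of what "low CE threshold marking for ECT(1) packets" could look like at dequeue time.  This is not the actual fq_codel code: the Packet type, the mark_on_dequeue function, and the 1 ms threshold are all hypothetical names and values chosen for illustration.  Only the ECN codepoint values are taken from RFC 3168.

```python
from dataclasses import dataclass

# ECN codepoints as defined in RFC 3168 (two low-order bits of the
# IPv4 TOS / IPv6 Traffic Class byte).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11

# Assumed value for illustration only; a real deployment would tune this.
ECT1_CE_THRESHOLD_MS = 1.0


@dataclass
class Packet:
    ecn: int           # ECN codepoint carried by the packet
    sojourn_ms: float  # time the packet spent in the queue


def mark_on_dequeue(pkt, codel_signal, threshold_ms=ECT1_CE_THRESHOLD_MS):
    """Return (ecn_codepoint, dropped) for a dequeued packet.

    codel_signal is True when the regular CoDel control law has decided
    to signal congestion on this packet.
    """
    # The proposed addition: ECT(1) packets get a CE mark as soon as
    # their queuing delay exceeds a low, fixed threshold.
    if pkt.ecn == ECT_1 and pkt.sojourn_ms > threshold_ms:
        return CE, False

    # Otherwise fall back to the classic behaviour: mark ECN-capable
    # packets, drop packets that are not ECN-capable.
    if codel_signal:
        if pkt.ecn in (ECT_0, ECT_1):
            return CE, False
        if pkt.ecn == NOT_ECT:
            return pkt.ecn, True

    # No signal (or the packet is already CE): forward unchanged.
    return pkt.ecn, False
```

With these assumed values, an ECT(1) packet that waited 2 ms is marked CE even when CoDel itself is not signaling, while an ECT(0) packet with the same delay passes unmarked.  The intent is that only the flow using ECT(1) sees the earlier, more frequent marks.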

Greg

On 11/18/19, 8:45 AM, "tsvwg on behalf of Holland, Jake" <tsvwg-bounces@ietf.org on behalf of jholland@akamai.com> wrote:

    Hi Ingemar,
    
    If fragmenting the space will prevent other SDOs from prematurely
    adopting the unproven L4S technology, that seems like exactly the
    right thing to do at this stage.
    
    I think we've seen strong evidence that L4S may still contain
    show-stopping problems, and we have not yet seen strong
    evidence that the problems stemming from the ambiguity in the
    L4S signaling design can be fixed.
    
    This carries a demonstrated potential for breaking existing
    ECN deployments by under-responding to the already widely-deployed
    congestion feedback systems.
    
    Certainly L4S's implementation was demonstrated to contain an
    issue that would have wrecked the latency of existing ECN
    deployments, and it had not previously been detected, despite the
    years of lab evaluation and repeated requests from reviewers to
    test such scenarios earlier.

    Although a fix was found for the specific initially-demonstrated
    case, no fix has yet been demonstrated for what looks to be a very
    similar issue occurring with staggered flow startup, which can't
    be attributed to a wrong alpha starting value:
    https://trac.ietf.org/trac/tsvwg/ticket/17#comment:8
    
    The proposed pseudocode fix (with up to 5 tuning parameters, IIRC)
    may or may not be able to address this for specific cases, and it
    may or may not be possible to discover a set of tuning values that
    can address a wide range of conditions, but it seems appropriate
    to have some skepticism, at least until demonstration of successful
    operation under a wide range of conditions, given the history of
    such proposals.  This suggests that we do the opposite of
    encouraging other SDOs to move broadly forward with L4S at this time.
    
    I share your concern that we might lose the codepoint (and the low
    latency functionality), and I acknowledge that a persistently
    fragmented space introduces a risk that it never happens, or takes
    an extra decade.
    
    But the risk that concerns me even more is if L4S gets rolled out
    and then these kinds of issues are discovered in production, after
    other SDOs have prematurely standardized on this experiment, and it
    therefore gets shut off with prejudice against future solutions.
    That outcome also would lose the use of the codepoint, probably
    even more permanently.
    
    (Or even worse: if it does not get shut off in spite of the problems
    it causes, which loses even the low-ish latency solutions we already
    have, and adds to the congestion control aggression arms race.)
    
    IMO, those would be even worse outcomes than a somewhat delayed
    adoption of a fully vetted system (or at least one that can't break
    existing deployed networks).
    
    Best regards,
    Jake
    
    PS: I still don't understand why the gains available through the
    use of regular AQM (especially with ECN) have not been more widely
    adopted by the other SDOs that would want to make use of L4S.
    
    It seems possible already to reduce the application-visible delay
    spikes from ~200ms to ~20ms (provided that no overly aggressive
    competing traffic improperly ignores the feedback, or that
    flow-queuing or other queue protection mechanisms are more widely
    deployed to prevent excessive damage from aggressive flows to less
    aggressive competing flows).
    
    I wonder if whatever would drive SDOs to start using L4S maybe
    could instead be leveraged to drive adoption of the much more well-
    proven existing ECN solutions, which at least already have a lot
    of endpoint support deployed.
    
    The endpoint support is a critical component to making this useful,
    and I see no reason to believe its rollout will be any quicker than
    it was for regular ECN.  I'd even expect it to be slower, since the
    behavior is much more complicated and hard to test.
    
    PPS: I agree it would be interesting to see paced chirping solutions
    to help do better than slow start, and to quickly grow when new
    capacity opens on-path.  But I'll point out that's not specific to L4S,
    but rather should have application for any CC that can avoid pushing
    the network until queue overflow, which to me likely includes regular
    ECN-enabled Reno or Cubic, as well as BBR.
    
    However, as yet another unproven TBD, I'll suggest it's not very
    useful as a strong influence on this debate, in spite of the early
    demos using L4S.  Regardless of the ultimate low latency solution,
    that part will need further development and might not work.