Re: [tsvwg] Requesting TSVWG adoption of SCE draft-morton-tsvwg-sce

Greg White <g.white@CableLabs.com> Mon, 18 November 2019 01:46 UTC

From: Greg White <g.white@CableLabs.com>
To: "Holland, Jake" <jholland@akamai.com>, Ingemar Johansson S <ingemar.s.johansson=40ericsson.com@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
CC: "gorry@erg.abdn.ac.uk" <gorry@erg.abdn.ac.uk>, Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Date: Mon, 18 Nov 2019 01:46:41 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/EvNWMI9NgW3RCfl7brR2FW7sWqw>

Hi Jake,

A couple of comments.

Please don't confuse L4S with TCP Prague.  TCP Prague is just one congestion controller that is L4S compatible.  There are others, and there will likely be more in the future.  The bug you are referring to was not in L4S congestion signaling; it was in the TCP Prague implementation, specifically in its exit from slow start, which is an area of active research (paced chirping, etc.) with opportunity for continued evolution even after L4S gets deployed.

The second issue you are referring to is not similar to the first.  It does not affect any flows other than the TCP Prague flow itself, and it seems to be restricted to these existing IQRouter/CAKE implementations.  If we can consider improvements to the IQRouter/CAKE implementations (a not unreasonable way to address this sort of issue), a fairly minor change to fq_codel (low CE threshold marking for ECT(1) packets) would fix it.  Also (and this is speculation), a more responsive AQM such as PIE instead of CoDel probably would as well.
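
To make the fq_codel idea concrete, here is a minimal sketch of low-threshold CE marking restricted to ECT(1) packets.  It is illustrative only, not code from fq_codel or CAKE: the `Packet` type, the `mark_on_dequeue` function, and the 1 ms default threshold are all assumptions, and the normal CoDel control law that would run alongside it is left out.

```python
from dataclasses import dataclass

# ECN field codepoints as defined in RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


@dataclass
class Packet:
    """Hypothetical packet record; only the fields this sketch needs."""
    ecn: int
    enqueue_time_ms: float


def mark_on_dequeue(pkt: Packet, now_ms: float, ce_threshold_ms: float = 1.0) -> Packet:
    """Mark an ECT(1) packet CE if it sat in the queue longer than a shallow threshold.

    The threshold value is an assumed placeholder. Packets carrying
    Not-ECT, ECT(0), or CE pass through unchanged here; in a real qdisc
    they would still be subject to the regular AQM (e.g. CoDel) logic.
    """
    sojourn_ms = now_ms - pkt.enqueue_time_ms
    if pkt.ecn == ECT_1 and sojourn_ms > ce_threshold_ms:
        pkt.ecn = CE
    return pkt
```

With these assumed values, an ECT(1) packet dequeued after 2 ms would be marked CE, while an ECT(0) packet with the same delay would be left for the regular AQM to handle.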

Greg

On 11/18/19, 8:45 AM, "tsvwg on behalf of Holland, Jake" <tsvwg-bounces@ietf.org on behalf of jholland@akamai.com> wrote:

    Hi Ingemar,
    
    If fragmenting the space will prevent other SDOs from prematurely
    adopting the unproven L4S technology, that seems like exactly the
    right thing to do at this stage.
    
    I think we've seen strong evidence that L4S may still contain
    show-stopping problems.  Also, we have not yet seen strong
    evidence that the problems stemming from the ambiguity in the
    L4S signaling design can be fixed.
    
    This carries a demonstrated potential for breaking existing
    ECN deployments by under-responding to the already widely-deployed
    congestion feedback systems.
    
    Certainly L4S's implementation was demonstrated to contain an
    issue that would have wrecked the latency of existing ECN
    deployments, and it had not previously been detected, despite the
    years of lab evaluation and repeated requests from reviewers to
    test such scenarios earlier.

    Although a fix was found for the specific initially-demonstrated
    case, no fix has yet been demonstrated for what looks to be a very
    similar issue occurring with staggered flow startup, which can't
    be attributed to a wrong alpha starting value:
    https://trac.ietf.org/trac/tsvwg/ticket/17#comment:8
    
    The proposed pseudocode fix (with up to 5 tuning parameters, IIRC)
    may or may not be able to address this for specific cases, and it
    may or may not be possible to discover a set of tuning values that
    can address a wide range of conditions, but it seems appropriate
    to have some skepticism, at least until demonstration of successful
    operation under a wide range of conditions, given the history of
    such proposals.  This suggests that we do the opposite of
    encouraging other SDOs to move broadly forward with L4S at this time.
    
    I share your concern that we might lose the codepoint (and the low
    latency functionality), and I acknowledge that a persistently
    fragmented space introduces a risk that it never happens, or takes
    an extra decade.
    
    But the risk that concerns me even more is if L4S gets rolled out
    and then these kinds of issues are discovered in production, after
    other SDOs have prematurely standardized on this experiment, and it
    therefore gets shut off with prejudice against future solutions.
    That outcome also would lose the use of the codepoint, probably
    even more permanently.
    
    (Or even worse: if it does not get shut off in spite of the problems
    it causes, which loses even the low-ish latency solutions we already
    have, and adds to the congestion control aggression arms race.)
    
    IMO, those would be even worse outcomes than a somewhat delayed
    adoption of a fully vetted system (or at least one that can't break
    existing deployed networks).
    
    Best regards,
    Jake
    
    PS: I still don't understand why the gains available through the
    use of regular AQM (especially with ECN) have not been more widely
    adopted by the other SDOs that would want to make use of L4S.
    
    It seems possible already to reduce the application-visible delay
    spikes from ~200ms to ~20ms (provided that no overly aggressive
    competing traffic improperly ignores the feedback, or that
    flow-queuing or other queue protection mechanisms are more widely
    deployed to prevent excessive damage from aggressive flows to less
    aggressive competing flows).
    
    I wonder if whatever would drive SDOs to start using L4S maybe
    could instead be leveraged to drive adoption of the much more well-
    proven existing ECN solutions, which at least already have a lot
    of endpoint support deployed.
    
    The endpoint support is a critical component to making this useful,
    and I see no reason to believe it'll be any quicker than the existing
    regular ECN was.  I'd even expect less so, since the behavior is much
    more complicated and hard to test.
    
    PPS: I agree it would be interesting to see paced chirping solutions
    to help do better than slow start, and to quickly grow when new
    capacity opens on-path.  But I'll point out that's not specific to L4S,
    but rather should have application for any CC that can avoid pushing
    the network until queue overflow, which to me likely includes regular
    ECN-enabled Reno or Cubic, as well as BBR.
    
    However, as yet another unproven TBD, I'll suggest it's not very
    useful as a strong influence on this debate, in spite of the early
    demos using L4S.  Regardless of the ultimate low latency solution,
    that part will need further development and might not work.
