Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Bob Briscoe <ietf@bobbriscoe.net> Wed, 13 May 2020 23:13 UTC

To: Neal Cardwell <ncardwell@google.com>, "Holland, Jake" <jholland=40akamai.com@dmarc.ietf.org>
Cc: tsvwg IETF list <tsvwg@ietf.org>
References: <CADVnQy=7f79Mj_GQBU-UsodTRORjB2U6rCPPQ+1Zck_gxr-rww@mail.gmail.com> <A4B43F47-9050-403D-B739-BF12C8F873EB@akamai.com> <CADVnQy=zbFSaJxosicyAjz0sbBRnq_N82LV=SeiCZqCx3BYqwA@mail.gmail.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <ac27adec-6752-141b-6cd5-092f2b52e6c7@bobbriscoe.net>
Date: Thu, 14 May 2020 00:13:11 +0100
In-Reply-To: <CADVnQy=zbFSaJxosicyAjz0sbBRnq_N82LV=SeiCZqCx3BYqwA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/1cU3g2KfBFiWOAJZnUOKybeOrGk>
Subject: Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Neal, Jake,

On 12/05/2020 21:55, Neal Cardwell wrote:
> On Fri, May 8, 2020 at 6:02 PM Holland, Jake
> <jholland=40akamai.com@dmarc.ietf.org> wrote:
>> Hi Neal,
>>
>> Thanks for posting this, it’s a helpful explanation.
>>
>> A couple of points I wanted to respond to:
>>
>> From: Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org>
> ..
>> I wanted to check to make sure I understand:
>> To what extent do you think it's an L4S goal to accommodate the
>> existing installed base of switch hardware that’s currently in use
>> for DCTCP in datacenters?
> I'm not sure if accommodating the existing installed base of switch
> hardware was an explicit goal of L4S or not, but it is a nice property
> that seems to help greatly in making L4S easier to deploy than it
> might otherwise be.
>
>> I thought the existing deployments generally wouldn’t be compliant
>> L4S-compatible dualq devices suitable for general internet traffic
>> anyway, and would continue to need traffic isolation the way they do
>> now.  Is that different from your understanding?
> My understanding is that dualq is not a required component of
> implementing L4S, and definitely would not be required at every hop or
> potential bottleneck along the network path. My understanding is that
> there would be sites that don't want to change the qdiscs on their
> senders/servers, and don't want to change their datacenter switches,
> but would like their connections over the public Internet to be able
> to use L4S.

[BB] Heterogeneous DC networks were actually the original use-case that 
motivated the work that became known as L4S.

Back in 2013, when I worked for BT, most of my work was for BT's finance 
sector customers (BT operated what used to be the Thomson-Reuters 
Radianz network, which is the largest finance network in the world, 
interconnecting all the finance centres and stock markets). We did an 
analysis of the solutions that gave the greatest latency gains with the 
least pain (deployment cost). DCTCP had great potential, but we couldn't 
deploy it because it needed a single flag-day deployment, which wasn't 
feasible on this private internetwork operated by hundreds of 
independent administrators. Glenn Judd at Morgan Stanley was using 
traditional Diffserv for isolation [Judd15], but the whole Radianz 
traffic matrix was too unpredictable to configure static partitions.

That's why I started on the problem of coexistence between classic TCP 
and DCTCP [Kühlewind14], which was just as applicable to the public 
Internet as to private DC networks. These remained the two strongest 
use-cases as we developed the DualQ Coupled AQM within the EU-funded 
RITE project, alongside Koen's team from Alcatel-Lucent, who were more 
focused on broadband access. This incidentally led BT's network 
architect to sponsor the work internally, in addition to the finance 
sector business unit.

After I left BT, I lost the link with the finance sector clients, but 
the DC use-case was still very relevant at Simula Research. Indeed, in 
2016 the Curvy RED pseudocode in draft-briscoe-tsvwg-aqm-dualq-coupled 
was picked up for implementation in merchant switch silicon.

However, once I started with CableLabs in 2017, the core L4S team no 
longer included anyone directly working on DCs, which is probably why 
you aren't aware of the DC heritage behind L4S.

____________
Quoting from the Use cases section of the L4S architecture 
[draft-ietf-tsvwg-l4s-arch]:

    o  Private networks of heterogeneous data centres, where there is no
       single administrator that can arrange for all the simultaneous
       changes to senders, receivers and network needed to deploy DCTCP:

       *  a set of private data centres interconnected over a wide area
          with separate administrations, but within the same company

       *  a set of data centres operated by separate companies
          interconnected by a community of interest network (e.g. for the
          finance sector)

       *  multi-tenant (cloud) data centres where tenants choose their
          operating system stack (Infrastructure as a Service - IaaS)


And quoting from the Scope section of draft-ietf-tsvwg-aqm-dualq-coupled:

    ... it is believed the
    Coupled AQM would be applicable and easy to deploy in all types of
    buffers; buffers in cost-reduced mass-market residential equipment;
    buffers in end-system stacks; buffers in carrier-scale equipment
    including remote access servers, routers, firewalls and Ethernet
    switches; buffers in network interface cards, buffers in virtualized
    network appliances, hypervisors, and so on.



[Kühlewind14] Kühlewind, M., Wagner, D.P., Espinosa, J.M.R. & Briscoe, 
B., "Using Data Center TCP (DCTCP) in the Internet," In: Proc. Third 
IEEE Globecom Workshop on Telecommunications Standards: From Research to 
Standards pp.583-588 (December 2014)

[Judd15] Judd, G., "Attaining the Promise and Avoiding the Pitfalls of 
TCP in the Datacenter," In: 12th USENIX Symposium on Networked Systems 
Design and Implementation (NSDI 15) pp.145-157 USENIX Association (May 2015)

more...

>
>>> - More generally, if there is any problem discovered with the L4S
>>>    experiment, either the algorithm or particular implementations,
>>>    bottlenecks can easily identify L4S traffic and bleach it into Not-ECT,
>>>    and treat it like Reno/CUBIC traffic.
>> To repeat in what's maybe a better thread for it:
>>
>> I agree that bleaching is an option where there’s RFC 3168 trouble,
>> that’s a good point and thanks for mentioning it.
>>
>> I’d consider it a poor choice to land there if there's other options,
>> because it works against the success of existing or under-way
>> deployments of RFC 3168 queues by imposing a new bleaching requirement
>> that may not be trivial for all networks with marking queues
>>
>> It would also be an unfortunate choice to endorse bleaching because it
>> would lose much of the potential value in the code point, as David
>> recently mentioned.
>>
>> But I do agree it would work, and that it’s a worthwhile mitigation
>> that should be mentioned in the L4S docs if we end up not finding a
>> way to support robust 3168 compatibility.
> Yes, I agree with all those points; bleaching is very far from ideal.

[BB] It's important not to change ECT(1) to Not-ECT, and there's no need 
to, as long as you can configure AQMs to treat ECT(1) as Not-ECT.

[I-D.ietf-tsvwg-ecn-l4s-id] says

    "the ECT(1) codepoint MUST NOT be
    changed to any other codepoint than CE"

See also 
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-10#section-5.4.1.2

5.4.1.2. Exclusion of Traffic From L4S Treatment
...
    The operator MUST NOT alter the end-to-end L4S ECN identifier from
    L4S to Classic, because its decision to exclude certain traffic from
    L4S treatment is local-only.  The end-to-end L4S identifier then
    survives for other operators to use, or indeed, they can apply their
    own policy, independently based on their own choice of locally-used
    identifiers.  This approach also allows any operator to remove its
    locally-applied exclusions in future, e.g.  if it wishes to widen the
    benefit of the L4S treatment to all its customers.
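To make the difference between bleaching and local exclusion concrete, 
here is a minimal Python sketch (illustrative only, not taken from any of 
the drafts). It models the two-bit ECN field with its RFC 3168 codepoints 
and an AQM whose hypothetical `exclude_l4s` knob stands for the 
operator's local exclusion policy. With exclusion on, ECT(1) is queued 
and signalled like Not-ECT (i.e. by drop), but the field itself is never 
rewritten; the only rewrite the code performs is to CE. Classifying CE 
alongside ECT(1) is an assumption based on the L4S identifier draft's 
default.

```python
from enum import IntEnum
from typing import Tuple


class ECN(IntEnum):
    """Two-bit ECN field in the IP header (RFC 3168 codepoints)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def classify(ecn: ECN, exclude_l4s: bool) -> str:
    """Return 'L' (L4S queue) or 'C' (Classic queue).

    exclude_l4s is a hypothetical, operator-configured local policy.
    The ECN field is only read here, never rewritten.
    """
    if not exclude_l4s and ecn in (ECN.ECT1, ECN.CE):
        return "L"
    return "C"


def signal_congestion(ecn: ECN, exclude_l4s: bool) -> Tuple[str, ECN]:
    """Apply one congestion signal; return (action, outgoing ECN field).

    With local exclusion on, ECT(1) is treated like Not-ECT: the signal
    is a drop and the end-to-end L4S identifier is left untouched.
    Otherwise an ECN-capable packet is marked CE, the only codepoint
    that ECT(1) may be changed to.
    """
    if ecn == ECN.NOT_ECT or (exclude_l4s and ecn == ECN.ECT1):
        return ("drop", ecn)
    return ("mark", ECN.CE)


# Excluded locally: queued and signalled as Classic, identifier intact.
action, ecn_out = signal_congestion(ECN.ECT1, exclude_l4s=True)
print(classify(ECN.ECT1, exclude_l4s=True), action, ecn_out.name)  # C drop ECT1
```

Because the exclusion lives only in this node's configuration, an 
operator further along the path still sees ECT(1) and can apply its own 
policy, which is exactly what rewriting the field to Not-ECT would 
prevent.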




>
>>> - Encapsulation/decapsulation is a widely prevalent and important
>>>    technology today in production networks. With the installed base of
>>>    encap/decap mechanisms, it is likely that for many implementations any
>>>    SCE marking applied to packets would just get stripped off with the outer
>>>    header when decapsulated.
>> I'd be interested in more information about this.  Is there anything
>> publicly known about the deployment footprint of different tunnel
>> implementations in the places likely to deploy L4S dualqs, and
>> whether they're managed by the same entities who control the queues?
>>
>> I'm not sure it changes my position much, but it seems at least
>> useful to know how big a lift it would be to do a tunnel
>> decapsulation change, if that's a necessary piece of a safe
>> approach to L4S that can achieve low latency effectively.
>>
>> (I know Bob gave a long list of tunneling protocols not long ago, but
>> it seems like only some of them are likely to be relevant to the L4S
>> question, and probably only in a reasonably specific set of
>> implementations for at least the early stages...)

All the tunnels and encapsulations that I mentioned are surely highly 
relevant to any "two-output" marking scheme (whether SCE or your 
proposal). I've updated the 2-slide briefing I posted a couple of weeks 
ago, adding a third slide that shows how marking applied by an AQM 
using your proposed scheme within a tunnel would just get stripped:
http://bobbriscoe.net/presents/2004ietf/2020-04ecn-tunnel-brief.pdf


Actually, my long list was in response to an off-list email from you, 
so I ought to repeat it here for the list:

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
Jake,

[Context: marking ECT(1) -> ECT(0)...]
I'm afraid there are two very strong reasons why we didn't do this:

#1/ That will be stripped by all stds track ECN tunnels and 
encapsulations, e.g. the following that specify ECN processing specifically:

  * RFC3168
  * IPsec [RFC4301]
  * those tunnels that have since been updated to RFC6040
  * all IP/UDP/IP encaps [RFC8085]
  * MPLS ECN [RFC5129]
  * CAPWAP [RFC5415]
  * LISP [RFC6830]
  * VXLAN [RFC7348]

And the following stds track drafts that are widely already implemented 
because they are close to RFC:

  * TRILL (altho that's waiting on the L4S drafts in the RFC Editor
    queue, even tho it's stds track).
  * Geneve [draft-ietf-nvo3-geneve-16]
  * GUE [draft-ietf-intarea-gue-09]

It would also be stripped by all the other stds track encaps that some 
implementers could have written to comply with either of RFC3168 or 
RFC6040, e.g.

  * L2TP [RFC3931]
  * GRE [RFC2784]
  * GTP [3GPP]
  * Teredo [RFC4380]

#2/ [...snip]


Bob
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
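Point #1 can be checked against the decapsulation rules themselves. The 
Python sketch below encodes the RFC 6040 decapsulation rules (to be 
checked against the RFC's own table) and assumes normal-mode 
encapsulation, which copies the inner ECN field into the outer header. 
It shows that an ECT(1) -> ECT(0) re-marking applied to the outer header 
inside the tunnel does not survive decapsulation, whereas a CE mark 
does. RFC 3168 full-functionality decapsulation gives the same result 
for this case, because it too propagates only CE from the outer header.

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """Two-bit ECN field in the IP header (RFC 3168 codepoints)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def rfc6040_decap(inner: ECN, outer: ECN) -> Optional[ECN]:
    """ECN field forwarded after RFC 6040 decapsulation.

    Returns None where RFC 6040 drops the packet (outer CE over a
    Not-ECT inner header, since the mark cannot be propagated).
    """
    if inner == ECN.NOT_ECT:
        return None if outer == ECN.CE else ECN.NOT_ECT
    if outer == ECN.CE:
        return ECN.CE          # congestion mark propagated to inner
    if outer == ECN.ECT1 and inner == ECN.ECT0:
        return ECN.ECT1        # the only other case where outer wins
    return inner               # everything else: outer field discarded


# L4S sender -> ECT(1); normal-mode ingress copies it to the outer header.
inner = outer = ECN.ECT1
# A hypothetical AQM inside the tunnel re-marks the outer ECT(1) -> ECT(0).
outer = ECN.ECT0
print(rfc6040_decap(inner, outer).name)    # ECT1: the re-marking is lost
print(rfc6040_decap(inner, ECN.CE).name)   # CE: a CE mark survives
```

Any "two-output" scheme that signals by moving between ECT codepoints 
therefore needs every decapsulator on the path to be updated, whereas CE 
already propagates through the installed base.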

> As I mentioned with Roland, here my understanding was based on a
> description of this issue by Bob Briscoe. I'm not sure what the
> original data source is for that issue. (And apologies if I have
> misinterpreted the issue.) But perhaps Bob will have time to fill in
> more of the details or pointers.
>
> Best regards,
> neal

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/