Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

Bob Briscoe <ietf@bobbriscoe.net> Thu, 25 July 2019 21:17 UTC

From: Bob Briscoe <ietf@bobbriscoe.net>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: "De Schepper, Koen (Nokia - BE/Antwerp)" <koen.de_schepper@nokia-bell-labs.com>, "Black, David" <David.Black@dell.com>, "ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>, "tsvwg@ietf.org" <tsvwg@ietf.org>, Dave Taht <dave@taht.net>
References: <364514D5-07F2-4388-A2CD-35ED1AE38405@akamai.com> <17B33B39-D25A-432C-9037-3A4835CCC0E1@gmail.com> <AM4PR07MB345956F52D92759F24FFAA13B9F50@AM4PR07MB3459.eurprd07.prod.outlook.com> <52F85CFC-B7CF-4C7A-88B8-AE0879B3CCFE@gmail.com> <AM4PR07MB3459B471C4D7ADAE4CF713F3B9F60@AM4PR07MB3459.eurprd07.prod.outlook.com> <D231681B-1E57-44E1-992A-E8CC423926B6@akamai.com> <AM4PR07MB34592A10E2625C2C32B9893EB9F00@AM4PR07MB3459.eurprd07.prod.outlook.com> <A6F05DD3-D276-4893-9B15-F48E3018A129@gmx.de> <AM4PR07MB3459487C8A79B1152E132CE1B9CB0@AM4PR07MB3459.eurprd07.prod.outlook.com> <87ef2myqzv.fsf@taht.net> <a85d38ba-98ac-e43e-7610-658f4d03e0f4@mti-systems.com> <CE03DB3D7B45C245BCA0D243277949363062879C@MX307CL04.corp.emc.com> <e1660988-3651-0c3b-cdc1-5518f067e42e@bobbriscoe.net> <4B02593C-E67F-4587-8B7E-9127D029AED9@gmx.de> <34e3b1b0-3c4c-bb6a-82c1-89ac14d5fd2c@bobbriscoe.net> <E031B993-DAAF-4BE4-A542-33C44310D6E9@gmx.de> <77522c07-6f2e-2491-ba0e-cbef62aad194@bobbriscoe.net>
Message-ID: <619092c0-640f-56c2-19c9-1cc486180c8b@bobbriscoe.net>
Date: Thu, 25 Jul 2019 17:17:30 -0400
In-Reply-To: <77522c07-6f2e-2491-ba0e-cbef62aad194@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ek7KfRNxVQUVpetJ_lXf3TIAFPI>
Subject: Re: [tsvwg] [Ecn-sane] Comments on L4S drafts

Sebastian,

Sorry, I sent that last reply too early, and it was not bottom-posted. 
Both are corrected below (tagged [BB]):


On 25/07/2019 16:51, Bob Briscoe wrote:
> Sebastian,
>
>
> On 21/07/2019 16:48, Sebastian Moeller wrote:
>> Dear Bob,
>>
>>> On Jul 21, 2019, at 21:14, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>>
>>> Sebastian,
>>>
>>> On 21/07/2019 17:08, Sebastian Moeller wrote:
>>>> Hi Bob,
>>>>
>>>>
>>>>
>>>>> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> On 19/07/2019 21:06, Black, David wrote:
>>>>>
>>>>>> Two comments as an individual, not as a WG chair:
>>>>>>
>>>>>>
>>>>>>> Mostly, they're things that an end-host algorithm needs
>>>>>>> to do in order to behave nicely, that might be good things anyways
>>>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>>>>
>>>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>>>>
>>>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>>>>
>>>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>>>>
>>>>> The reliable transports you are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>>>>
>>>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from scalable congestion controls (which is behind Jonathan's concern in response to you).
>>>>>
>>>> 	And this is why IPv4's protocol field / IPv6's next header field are the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only; this "classifier" still lives in the IP header, so no deeper layers need to be accessed; this is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with SCE-style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs' access nodes (BRAS/BNG/CMTS/whatever), middleboxes should not be an insurmountable problem, as ISPs control their own middleboxes and often even the CPEs, so protocol ossification is not going to be a showstopper for this part of the roll-out.
>>>>
>>>> Best Regards
>>>> 	Sebastian
>>>>
>>>>
>>> I think you've understood this from reading an abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
>>> 	A scalable congestion control MUST detect loss by counting in time-based units
>>> That's all. No more, no less.
>>>
>>> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.
>> 	Fair enough, but my argument was not really about RACK at all, it more-so applies to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? So give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses, but you could use the re-gained ECT(1) code point similarly to SCE to put the new fine-grained congestion signal into... while using CE in the RFC3168 compliant sense.

[BB] The protocol ID identifies the wire protocol, not the congestion 
control behaviour. If we had used a different protocol ID for each 
congestion control behaviour, we'd have run out of protocol IDs long ago 
(semi serious ;)

This is a re-run of a debate that has already been had (in Jul 2015 - 
Nov 2016), which is recorded in the appendix of ecn-l4s-id here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4
Quoted and annotated below:

> B.4.  Protocol ID
>
>     It has been suggested that a new ID in the IPv4 Protocol field or the
>     IPv6 Next Header field could identify L4S packets.  However this
>     approach is ruled out by numerous problems:
>
>     o  A new protocol ID would need to be paired with the old one for
>        each transport (TCP, SCTP, UDP, etc.);
>
>     o  In IPv6, there can be a sequence of Next Header fields, and it
>        would not be obvious which one would be expected to identify a
>        network service like L4S;

In particular, the protocol ID / next header stays next to the upper 
layer header as a PDU gets encapsulated, possibly many times. So the 
protocol ID is not necessarily (rarely?) in the outer header, 
particularly in IPv6, and it might be encrypted in IPsec.

>     o  A new protocol ID would rarely provide an end-to-end service,
>        because It is well-known that new protocol IDs are often blocked
>        by numerous types of middlebox;
>
>     o  The approach is not a solution for AQMs below the IP layer;

That last point means that the protocol ID is not designed to always 
propagate to the outer header on encap and back from the outer header 
on decap, whereas the ECN field is (and it's the only field that is).
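
To make the encap/decap point concrete, here is a minimal sketch in 
Python. It is only an illustration under stated assumptions: the 
Header and Packet types are hypothetical, the tunnel is plain 
IP-in-IP (protocol number 4), and the decap rules are my reading of 
the RFC 6040 decapsulation table, so check them against the RFC 
before relying on them.

```python
from dataclasses import dataclass
from typing import List, Optional

# ECN codepoints (two low-order bits of the TOS / Traffic Class byte)
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
# IANA protocol numbers: 4 = IPv4 encapsulation, 6 = TCP
PROTO_IPIP, PROTO_TCP = 4, 6


@dataclass
class Header:          # hypothetical, heavily simplified IP header
    protocol: int      # names the NEXT header, not the end-to-end transport
    ecn: int


@dataclass
class Packet:          # hypothetical: headers listed outermost first
    headers: List[Header]


def encapsulate(pkt: Packet) -> Packet:
    """Add an outer header: the ECN field is copied outwards, but the
    outer protocol field only says 'another IP header follows'."""
    inner = pkt.headers[0]
    return Packet([Header(PROTO_IPIP, inner.ecn)] + pkt.headers)


def decapsulate(pkt: Packet) -> Optional[Packet]:
    """Strip the outer header and fold its ECN field back into the inner
    one (assumed to follow RFC 6040). Returns None if the packet is dropped."""
    outer, inner = pkt.headers[0], pkt.headers[1]
    if inner.ecn == NOT_ECT:
        if outer.ecn == CE:
            return None            # cannot signal CE to a Not-ECT transport
        ecn = NOT_ECT
    elif outer.ecn == CE:
        ecn = CE                   # congestion seen in the tunnel propagates
    elif outer.ecn == ECT1 and inner.ecn == ECT0:
        ecn = ECT1
    else:
        ecn = inner.ecn
    return Packet([Header(inner.protocol, ecn)] + pkt.headers[2:])
```

After two levels of encapsulation the outermost protocol field reads 4, 
not 6, so a node that only looks at the outer header cannot tell it is 
carrying TCP, yet the outer ECN field still reads ECT(1). A CE mark 
applied to the outermost header survives both decapsulations.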

more....
>>
>>
>>> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just /can/ be).
>> 	Packets always "can" be forwarded without resequencing, the question is whether the end-points are going to like that...
>> And IMHO even RACK, with its reordering window of at most one RTT, gives intermediate hops not much to work with; without knowing the full RTT, a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require keeping per-flow statistics, which also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering, but that is also orthogonal to the L4S issues I try to raise).
[BB] No-one's suggesting reordering degree will adapt to measured RTT at 
run-time.

See the original discussion on this point here:
Vicious or Virtuous circle? Adapting reordering window to reordering 
degree 
<https://mailarchive.ietf.org/arch/msg/tcpm/QOhMjHEo2kbHGInH8eFEsXbdwkA>

In summary, the uncertainty for the network is a feature not a bug. It 
means it has to keep reordering degree lower than the lowest likely RTT 
(or some fraction of it) that is expected for that link technology at 
the design stage. This will keep reordering low, but not 
unnecessarily low (i.e. not 3 packets at the link rate).
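
As a rough illustration of "not 3 packets at the link rate", the 
sketch below compares the reordering that a 3-DupACK sender tolerates 
(expressed as time) with the number of packets that fit in a 
time-based budget. The function names, the 1 Gb/s rate, 1500-byte 
packets, 1 ms lowest likely RTT and the one-quarter fraction are all 
assumptions picked for the example, not figures from any draft.

```python
def serialization_time(pkt_bytes: int, rate_bps: float) -> float:
    """Seconds needed to put one packet on the wire at the given rate."""
    return pkt_bytes * 8 / rate_bps


def dupack_tolerance_seconds(rate_bps: float, pkt_bytes: int = 1500,
                             dupthresh: int = 3) -> float:
    """Reordering a packet-counting sender tolerates, expressed as time:
    being overtaken by `dupthresh` back-to-back packets looks like loss."""
    return dupthresh * serialization_time(pkt_bytes, rate_bps)


def time_budget_packets(rate_bps: float, min_rtt_s: float,
                        fraction: float = 0.25,
                        pkt_bytes: int = 1500) -> int:
    """Back-to-back packets that fit inside a reordering budget set to an
    assumed fraction of the lowest likely RTT for the link technology."""
    budget_s = fraction * min_rtt_s
    return int(budget_s / serialization_time(pkt_bytes, rate_bps))


if __name__ == "__main__":
    rate = 1e9        # assumed link rate: 1 Gb/s
    min_rtt = 1e-3    # assumed lowest likely RTT: 1 ms
    print(dupack_tolerance_seconds(rate))
    print(time_budget_packets(rate, min_rtt))
```

With these assumed numbers, three packets take about 36 microseconds to 
serialize, whereas a budget of a quarter of the RTT (250 microseconds) 
spans about 20 packets. The packet-counting threshold also shrinks in 
time terms as the link rate grows, while the time-based budget does not.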

>>
>>> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer.
>> 	But at that point you are multiplexing multiple things into the poor ECT(1) codepoint, the promise of a certain "linear" back-off behavior on encountered congestion AND an "allow relaxed ordering" ( "detect loss by counting in time-based units" does not seem to be fully equivalent with a generic tolerance to 'unordered delivery' as far as I understand). That seems asking too much of a simple number...
[BB] In a purist sense, it is a valid architectural criticism that we 
overload one codepoint with two architecturally distinct functions:

  * low queuing delay
  * low resequencing delay

But then, one has to consider the value vs cost of 2 independent 
identifiers for two things that are unlikely to ever need to be 
distinguished. If an app wants low delay, would it want only low queuing 
delay and not low resequencing delay?

You could contrive a case where the receiver is memory-challenged and 
needs the network to do the resequencing. But it's not a reasonable 
expectation for the network to do a function that will cause HoL 
blocking for other applications in the process of helping you with your 
memory problems.

Given we are header-bit-challenged, it would not be unreasonable for the 
WG to decide to conflate these two architectural identifiers into one.
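
A small sketch of what conflating the two identifiers looks like at a 
forwarding node, again in Python and only as an illustration. The 
function names and the Treatment type are hypothetical; the bit layout 
(DSCP in the upper six bits, ECN in the lower two) is the standard one. 
Handling of CE-marked packets is deliberately left out.

```python
from typing import NamedTuple, Tuple

# ECN codepoints (two low-order bits of the TOS / Traffic Class byte)
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class Treatment(NamedTuple):   # hypothetical per-packet forwarding decision
    low_queuing_delay: bool
    may_skip_resequencing: bool


def split_tos(tos: int) -> Tuple[int, int]:
    """Split the 8-bit TOS / Traffic Class byte into (DSCP, ECN)."""
    return tos >> 2, tos & 0b11


def bleach_dscp(tos: int) -> int:
    """Model a first hop that wipes the DSCP but leaves the ECN field."""
    return tos & 0b11


def classify(tos: int) -> Treatment:
    """One codepoint, two functions: ECT(1) selects both low queuing
    delay and permission to forward without resequencing."""
    _, ecn = split_tos(tos)
    is_ect1 = ecn == ECT1
    return Treatment(low_queuing_delay=is_ect1,
                     may_skip_resequencing=is_ect1)
```

Because the decision is keyed on the ECN field alone, a packet sent 
with some DSCP plus ECT(1) gets the same treatment before and after its 
DSCP has been wiped, whereas an identifier that needed both fields 
would stop matching after the first hop.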


Bob

>>
>> Best Regards
>> 	Sebastian
>>
>>>
>>> Bob
>>>
>>>
>>>
>>> -- 
>>> ________________________________________________________________
>>> Bob Briscoe
>>> http://bobbriscoe.net/
>> _______________________________________________
>> Ecn-sane mailing list
>> Ecn-sane@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/ecn-sane
>
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/