Re: [tcpm] [tsvwg] Cross-area alignment on "L4S and RACK"

Bob Briscoe <ietf@bobbriscoe.net> Mon, 22 July 2019 02:05 UTC

From: Bob Briscoe <ietf@bobbriscoe.net>
To: "Scharf, Michael" <Michael.Scharf@hs-esslingen.de>, Praveen Balasubramanian <pravb@microsoft.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-ID: <cb0a912b-4aee-f460-78fc-4e7d7ba5b38d@bobbriscoe.net>
Date: Mon, 22 Jul 2019 03:05:30 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/Qd7NzhjQBTLcdbVsVSGYt4Pl16A>
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>

Michael,

Sorry, I held off responding until the Low Latency additions to the 
DOCSIS 3.1 spec were published (which was imminent at the time, but then 
got delayed). Then I omitted to pick up this thread again.

I can now give one of the real world examples that I couldn't give 
before, even tho it's not the most interesting one...

On 10/11/2018 08:13, Scharf, Michael wrote:
>
> Hi Bob,
>
> I believe more empirical evidence is needed that this is a problem 
> that needs to be addressed.
>
> For instance, for what link technologies does this problem exist in 
> reality?
>
Main problem is radio links that do link-layer retransmission (due to 
intermittent corruption loss). RFC3366 explains the problem of needing 
different ARQ persistence for transports with different ordering 
sensitivities.


      LTE

LTE typically does a max of 3 link-layer retransmissions as part of 
HARQ. Each takes about 8ms. And after 3 re-transmissions, despite 
wasting 24ms, the block error rate is typically not significantly 
improved (implying that, once 1 retransmission is needed, many more 
retransmissions are often needed). So being allowed to immediately 
forward those packets that can be assembled saves 24ms - more often than 
not.

Packets from one flow (or stream in a QUIC/SCTP scenario) can be sitting 
behind a missing frame waiting for link-layer retransmission (HoL 
blocking).
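
As a rough illustration of that HoL blocking, here is a minimal toy model 
(not taken from the thesis below or from any spec). It assumes packets 
enter the link every 1ms, normal link delay is 1ms, and the one frame that 
needs HARQ takes 3 extra rounds of 8ms each, as in the figures above. The 
function and parameter names are hypothetical.

```python
def delivery_delays(n_packets, lost_index, base_delay_ms=1.0,
                    retx_rounds=3, retx_round_ms=8.0, resequence=True):
    """Per-packet delay (ms) across one link in a toy model.

    Packet i enters the link at time i * 1 ms (assumed spacing).
    The packet at lost_index needs retx_rounds link-layer retransmissions.
    With resequence=True the link egress releases packets strictly in
    arrival order; otherwise each packet is forwarded as soon as it is ready.
    """
    spacing_ms = 1.0
    delays = []
    release_floor = 0.0  # earliest time the next in-order packet may leave
    for i in range(n_packets):
        arrival = i * spacing_ms
        ready = arrival + base_delay_ms
        if i == lost_index:
            ready += retx_rounds * retx_round_ms
        if resequence:
            # An in-order egress cannot release this packet before the
            # packet in front of it has been released.
            ready = max(ready, release_floor)
            release_floor = ready
        delays.append(ready - arrival)
    return delays


in_order = delivery_delays(10, lost_index=2, resequence=True)
immediate = delivery_delays(10, lost_index=2, resequence=False)
print("resequenced:", in_order)
print("immediate:  ", immediate)
```

In this model only the retransmitted packet pays the extra 24ms when the 
link forwards immediately, whereas with resequencing every packet queued 
behind it (from whatever flow) inherits most of that delay.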

Here's a thesis about exploiting RACK by turning off resequencing at the 
PDCP layer in LTE:
Implementing immediate forwarding for 4G in a network simulator 
<https://www.duo.uio.no/handle/10852/68158>
Unfortunately the evaluations are limited - I plan to pick this up soon.


      WiFi

I haven't studied WiFi. The retry limits are higher than LTE (max 6 
short retries and 4 long = 10 total). The overall potential delay is 
greater than LTE, altho each round is somewhat faster.


      DOCSIS

Since DOCSIS 3.1 i17 when L4S was added, resequencing is specified to be 
disabled in the downlink for the L4S service flow. This doesn't give 
such a big gain in delay as it would in a radio link, cos the cable has 
much lower loss, of course. The saving is primarily in cable modem 
buffer memory usage and less processing.

Resequencing is necessary on the Classic service flow because 
traditional (3DupACK) TCPs would otherwise spuriously detect losses. The 
misordering happens on links that bond together multiple OFDMA channels 
for faster throughput. Data on different channels arrives faster or slower.
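
A minimal sketch of why a packet-counting sender trips over this kind of 
misordering while a time-based one need not. Both functions are 
simplified, receiver-side stand-ins (not the actual RACK algorithm, which 
runs at the sender on ACK timing). The trace, the 6ms window (a quarter of 
an assumed 24ms RTT) and all names are hypothetical, chosen only for 
illustration.

```python
def spurious_by_dupack(arrivals, dupthresh=3):
    """Segments a packet-counting rule would wrongly declare lost.

    arrivals: list of (arrival_time_ms, seq); every segment does arrive.
    A segment is flagged if at least dupthresh higher-numbered segments
    arrived before it.
    """
    flagged = set()
    seen = []
    for _, seq in sorted(arrivals):
        if sum(1 for s in seen if s > seq) >= dupthresh:
            flagged.add(seq)
        seen.append(seq)
    return flagged


def spurious_by_time(arrivals, reo_wnd_ms):
    """Segments a time-based rule would wrongly declare lost.

    A segment is flagged if it arrives more than reo_wnd_ms after the
    first higher-numbered segment arrived.
    """
    flagged = set()
    seen = []  # (time, seq) of earlier arrivals
    for t, seq in sorted(arrivals):
        later = [ts for ts, s in seen if s > seq]
        if later and t - min(later) > reo_wnd_ms:
            flagged.add(seq)
        seen.append((t, seq))
    return flagged


# Segment 1 travels on a slower bonded channel and lands behind 2, 3 and 4.
trace = [(0.00, 0), (0.06, 2), (0.09, 3), (0.12, 4), (0.15, 1), (0.18, 5)]
print(spurious_by_dupack(trace))
print(spurious_by_time(trace, reo_wnd_ms=6.0))
```

With this trace the counting rule flags segment 1 as lost even though it 
is only 0.09ms late, while the 6ms time window flags nothing.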


> Your example with 94us seems to imply a data rate of 384 Mbit/s for a 
> single TCP stream. That is probably not what the vast majority of TCP 
> connections use. So is there an engineering need?
>
Perhaps you've misunderstood. This is described in the spec. as a 
scaling requirement. That means it's so that it is easier to design 
faster links in future, with less constraint on ordering (typically 
using parallelism at the link layer).
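
The scaling arithmetic behind the 94us figure can be sketched as below, 
using the numbers from the presentation quoted further down this thread 
(RTT = 24ms; windows of 12, 96 and 768 segments). The 1500-byte packet 
size is an assumption, used only to convert a window into a bit-rate, and 
the function names are hypothetical.

```python
def reorder_window_s(rtt_s, window_segments, dupthresh=3):
    """Time spanned by dupthresh back-to-back segments of an evenly paced flow."""
    return rtt_s * dupthresh / window_segments


def flow_rate_bps(rtt_s, window_segments, packet_bytes=1500):
    """Bit-rate of a flow that sends window_segments packets per RTT."""
    return window_segments * packet_bytes * 8 / rtt_s


for window in (12, 96, 768):
    w = reorder_window_s(0.024, window)
    r = flow_rate_bps(0.024, window)
    print(f"window={window:4d}  rate={r / 1e6:6.1f} Mb/s  "
          f"3-DupACK window={w * 1e6:7.1f} us")
```

Each 8x increase in flow rate shrinks the time equivalent of 3 DupACKs by 
8x: 6ms, then 750us, then about 94us at roughly 384Mb/s.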

If the requirement on L4S senders is not made from the start, it will 
never be possible to introduce it later, because there will always be 
the risk of 'legacy' senders on the link.

Link rates have doubled every 1.6 years for the last couple of decades. 
Even if this slows down, it's not going to be long before 384Mb/s is 
normal.

Now compare that with the average time it takes to go from experimental 
WG draft to proposed standard RFC. By the time this is a PS, 384Mb/s 
will seem slow.
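
For a feel of the timescale, a small sketch that takes the 1.6-year 
doubling period above at face value (a straight-line extrapolation, not a 
prediction). The function name is hypothetical.

```python
import math


def years_to_scale(factor, doubling_years=1.6):
    """Years for link rates to grow by `factor`, assuming steady doubling."""
    return doubling_years * math.log2(factor)


print(years_to_scale(8))   # one 8x step in flow rate
print(years_to_scale(64))  # two 8x steps
```

On that assumption, each 8x step in the reordering example takes a little 
under 5 years, and two such steps a little under 10.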


> And I don’t get why keeping reordering within 94us is so much of a 
> challenge for a link layer technology. I would really like to learn 
> technical reasons why (mostly) in-order delivery on links cannot be 
> implemented now or in future.
>
If you still need me to explain, pls ask. But I assume it's obvious in 
the 3 cases above (LTE, WiFi and DOCSIS).

  * In the radio cases, it's a question of excessive HoL buffering delay.
  * In the fixed link example, it cuts the cost of high speed buffer
    memory and frees up processing for more useful purposes.
    You don't necessarily get these benefits when you first turn reseq
    off. You get them once the approach becomes the norm, so you can
    cost-reduce the equipment.


> If we indeed run into a real-world technical problem, it wouldn’t help 
> to solve that problem for L4S – it must be solved for all Internet 
> traffic then and the overall solution would (probably) be different to 
> the one used in an experiment.
>
Irrespective of the proportion of traffic that uses L4S:

  * Removing HoL blocking caused by retransmission delay on wireless links
    gives an immediate performance benefit for all the traffic that uses L4S
  * On fixed links, unless the equipment is designed so that resources
    are dedicated to configured comms channels (which would be a bad
    design), reducing buffer memory and processing for a fraction of the
    traffic, reduces those resources by that fraction (or frees up that
    fraction to be used by the other fraction).


The idea of the L4S experiment (as stated in l4s-arch) is ultimately low 
delay for all Internet traffic (because there would be no reason not to 
use it if it achieves its ambitions).

> To me, one of the fundamental requirements on a packet network is that 
> the network invests some reasonable effort to keep packets in order on 
> a given path – albeit a packet network is not always perfect and that 
> has always been OK. I am not convinced so far that it makes sense to 
> give up that design guideline for link technologies – not even for L4S.
>
Constraints change. Design principles change. What you and I learned at 
school was true for its time. There's a lot more parallelism in networks 
now, because it's much more expensive to get individual links to keep 
increasing in speed, than it is to stick a load in parallel.



Bob

> Michael
>
> *From:*Bob Briscoe [mailto:ietf@bobbriscoe.net]
> *Sent:* Saturday, November 10, 2018 2:35 AM
> *To:* Praveen Balasubramanian; Scharf, Michael; tcpm@ietf.org
> *Cc:* tsvwg@ietf.org
> *Subject:* Re: [tcpm] [tsvwg] Cross-area alignment on "L4S and RACK"
>
> Praveen, Michael,
>
> The idea is not to relax ordering requirement on links. It's to make 
> sure it doesn't keep getting tighter for ever.
>
> As flow-rates get faster, the requirements on links get tighter if the 
> reordering window is expressed in packets.
> For instance (as I showed in my presentation), with RTT = 24ms (say), 
> if the window is 12 segments, a reordering window of 3 DupACKs equates 
> to 6ms. But making flows, say, 8 times faster (window = 96) 
> reduces the time over which links have to keep packets in order to 
> 750μs. Another 8x faster, and links have to keep order within 94μs. 
> And so on.
>
> This cannot go on indefinitely, because the constraints on link-layer 
> retransmission are constant in time (length of link / speed of light + 
> processing time). If the 3 DupACK rule is not removed, link timings 
> will be continually squeezed, so radio links will eventually have to 
> give up ARQ, which will be awful for performance.
>
> It will take decades before /all/ legacy (non-RACK) TCPs have stopped 
> using any particular link. So links in general will still have to keep 
> tightening their re-sequencing time because of legacy 3 Dup-ACK flows. 
> But eventually the 3-DupACK flows that remain will tend to be those on 
> unmaintained machines that will not be taking advantage of higher 
> speeds. Then, assuming the RACK experiment remains successful over 
> these decades, links will be able to stop tightening their reordering 
> degree.
>
>
> The idea is that, for the L4S experiment, we can remove the 3 DupACK 
> rule entirely from the start, and solely express reordering in units 
> of time. Then, L4S transports will evolve, presumably using similar 
> RACK-like reordering logic to non-L4S transports. And we shall see 
> where the minimum reordering window ends up (in time units), and 
> hopefully converge on a value we can standardize - both in TCP and in 
> other transports.
>
> No L4S link would relax its ordering requirement if it led to worse 
> TCP performance. So layer 4 will be in control of this process. 
> Nonetheless, it is important to stop measuring reordering in packets. 
> But hopefully, from the start of the L4S experiment, logical links 
> solely bearing L4S traffic will not need to continually tighten their 
> ordering requirement.
>
> If an OS supports RACK, which yours does, then buffering on the 
> receiver will remain constant in time, which will mean the memory 
> requirement will grow as flow rates scale up.
>
> You (Praveen) supported and implemented RACK before I'd even thought 
> about all this. I'm also now a convert. I'm recognizing and 
> articulating its wider advantages.
>
>
> Bob
>
> On 09/11/2018 19:10, Praveen Balasubramanian wrote:
>
>     >> Therefore they can estimate the max reordering they allow their
>     links to introduce
>
>     IMO L2 shouldn’t /introduce/ more reordering just because L4
>     /congestion control/ is more resilient to it. There is other
>     performance impact due to reordering: LRO / GRO / RSC opportunity
>     will be reduced, more memory and CPU cost to do reassembly at L4
>     (this is an active area for remote attacks) etc. IMO L4 guidance
>     should remain that link vendors must avoid reordering as much as
>     possible and not build solutions that cause frequent or consistent
>     reordering.
>
>     *From:*tcpm <tcpm-bounces@ietf.org>
>     *On Behalf Of *Bob Briscoe
>     *Sent:* Friday, November 9, 2018 10:52 AM
>     *To:* Scharf, Michael <Michael.Scharf@hs-esslingen.de>;
>     tcpm@ietf.org
>     *Cc:* tsvwg@ietf.org <mailto:tsvwg@ietf.org>
>     *Subject:* Re: [tcpm] [tsvwg] Cross-area alignment on "L4S and RACK"
>
>     Michael,
>
>     The idea is that (as now in the RACK draft since discussion on the
>     tcpm list earlier this year under a subject something like vicious
>     circle or virtuous circle) there will be a limit on the initial
>     RACK reordering window expressed in fractional RTT units.
>
>     Link vendors know the min e2e RTT of the paths that their links
>     are usually deployed over. Therefore they can estimate the max
>     reordering they allow their links to introduce.
>
>     I don't expect this limit to be set in stone while both RACK and
>     L4S are experimental. But as both these experiments progress, I
>     think the industry will get a better idea of where the limit
>     should lie.
>
>     It was the realization that, if TCP adapted its reordering window
>     to the measured reordering degree without bound, the L2 world
>     would continually increase the reordering degree of their links
>     (the vicious circle).
>
>     If there is any aspect of this limit expressed in units of
>     packets, the limit will decrease in time as flow rates increase,
>     and therefore not serve to give the L2 world any clear guidance -
>     no de facto limit agreed between the L4 world and the L2 world.
>
>     If we keep the limit in units of RTT, hopefully then we have a
>     virtuous circle, not a vicious circle.
>
>
>
>     Bob
>
>
>     On 09/11/2018 18:28, Scharf, Michael wrote:
>
>         I may miss something, but I read the outcome of the appendix
>         as a requirement “must be more robust to reordering”. As far
>         as I understand, that can be implemented in different ways.
>
>         And I may be wrong, but those link technologies could be well
>         implemented in a router and then one needs to analyze how that
>         whole architecture fits into the scheduler, queue, policer,
>         and line card hardware design of a router. But that is
>         something routing people know better than me. My point is just
>         to get the experts into the loop early.
>
>         Michael
>
>         *From:*Bob Briscoe [mailto:ietf@bobbriscoe.net]
>         *Sent:* Friday, November 9, 2018 7:04 PM
>         *To:* Scharf, Michael; tcpm@ietf.org <mailto:tcpm@ietf.org>
>         *Cc:* tsvwg@ietf.org <mailto:tsvwg@ietf.org>
>         *Subject:* Re: [tsvwg] Cross-area alignment on "L4S and RACK"
>
>         Michael,
>
>         Please address the scaling rationale in the appendix for why
>         it does matter how a sender realizes this internally.
>
>         The list of access links types I mentioned are rarely
>         connected directly to routers.
>
>
>
>         Bob
>
>         On 09/11/2018 17:44, Scharf, Michael wrote:
>
>             I’ll not comment on link technology implementation,
>             scheduler and queue design assumptions in Appendix A.1.7
>             of draft-ietf-tsvwg-ecn-l4s-id-05. People e.g. in RTG area
>             may be more familiar with the implications of internal
>             designs e.g. inside a router. IMHO the best way is to get
>             routing experts into the loop before further discussing
>             what is currently listed in Appendix A.1.7.
>
>             Thus, I limit my response to the actual wording in the
>             main part of draft-ietf-tsvwg-ecn-l4s-id-05:
>
>                 o  A scalable congestion control MUST detect loss by
>                    counting in units of time, which is scalable, and
>                    MUST NOT count in units of packets (as in the 3
>                    DupACK rule of traditional TCP), which is not
>                    scalable (see Appendix A.1.7
>                    <https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-05#appendix-A.1.7>
>                    for rationale).
>
>             As individual contributor, I disagree with this
>             requirement. I would probably be OK with a wording that a
>             TCP sender must be able to tolerate reordering beyond 3
>             Duplicate ACKs. I believe it does not matter how a sender
>             realizes this internally.
>
>             And if that requirement is not just removed, I believe the
>             wording “traditional TCP” has to be replaced by “standard
>             TCP”.
>
>             BTW, also, in the rest of the document there has to be a
>             clear distinction between experiments and standards.
>             History tells us that some experiments can evolve into
>             standards track specifications, while other experiments
>             may fail. And there is no way of knowing that in advance.
>
>             Michael
>
>             *From:*Bob Briscoe [mailto:ietf@bobbriscoe.net]
>             *Sent:* Friday, November 9, 2018 11:08 AM
>             *To:* Scharf, Michael; tcpm@ietf.org <mailto:tcpm@ietf.org>
>             *Cc:* tsvwg@ietf.org <mailto:tsvwg@ietf.org>
>             *Subject:* Re: [tsvwg] Cross-area alignment on "L4S and RACK"
>
>             Michael,
>
>             On 09/11/2018 06:21, Scharf, Michael wrote:
>
>                 I don’t see the fundamental difference between a link
>                 technology that guarantees in-order delivery within a
>                 flow (say by flow hashing, see ECMP) vs. some link
>                 scheme that guarantees in-order delivery, say, only
>                 for some fraction of the traffic characterized by some
>                 other means (say, other header bits).
>
>             [The "other means" and "other bits" wording is pretty
>             ambiguous here. I reckon I can guess what you mean but
>             then I don't know why you're saying this, 'cos I don't
>             think I've disagreed with you on this. So perhaps you
>             could be more specific and we might be able to see why
>             you've brought this up.]
>
>             Also I wasn't really talking about all "link technology
>             that guarantees in-order delivery within a flow". {Note 1}
>             I was talking about link technology where all flows are
>             blocked while waiting to get the aggregate back in order
>             that it was in when it arrived. I specifically said:
>
>
>
>
>             Note that there can be multiple flows over one link. So
>             'in-order delivery' for a link does not mean in order of
>             TCP sequence number. It means 'delivered in the order that
>             packets arrived at the link' (the link ingress adds its
>             own sequence numbers, and the link egress buffers them
>             until it can send out in the same order, or until a
>             time-out or max no. of link retransmissions).
>
>             For example:
>
>               * A packet resequencing buffer after allowing for
>                 link-layer retransmissions is common for radio link
>                 technologies such as LTE (PDCP layer) and 802.11 WiFi.
>               * A packet resequencing buffer after merging multiple
>                 bonded link channels is common for other access link
>                 technologies such as downstream DOCSIS bonded channels.
>
>
>
>
>
>
>                 Clearly, reordering is a problem that needs to be
>                 discussed in the IETF as a whole. And we could
>                 discuss whether a revision of RFC 3366 is needed.
>
>             No. There is no proposal to alter general guidance for all
>             links like RFC 3366. No-one is saying this.
>
>             There is not even a proposal to require or even recommend
>             that L4S-specific links do anything.
>
>             The specific proposal is to require L4S /sources/ to use a
>             RACK-like scheme, as a condition for setting the ECT(1)
>             codepoint. See the last bullet in S.4.3 of
>             draft-ietf-tsvwg-ecn-l4s-id-05
>             <https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-05#section-4.3>.
>
>
>             This says nothing about what L4S links MUST, SHOULD or MAY
>             do. It just gives an L4S link an opportunity not to have
>             to do resequencing.
>
>
>
>
>
>                 But I am not convinced that this topic has to be
>                 tied into an experiment such as L4S.
>
>             It is an opportunity that currently only applies to L4S.
>
>             L4S is merely an example of a case where we can mandate
>             that there will be no non-RACK traffic in a link, which
>             creates an opportunity for those who are implementing the
>             L4S DualQ Coupled AQM to ignore the RFC3366 resequencing
>             advice for the L4S queue (but we are not changing the
>             advice, because L4S is only an experiment). The prize is
>             to remove the head-of-line blocking problem for their users.
>
>             This is not L4S-specific, altho there are no other
>             examples at the moment where this opportunity is possible
>             (so in that sense it is /currently/ L4S-specific).
>
>
>             The message has to be handled really carefully, because
>             the opportunities for misunderstanding are enormous (as
>             evidenced by this thread, and we haven't even started
>             talking to non-transport people yet).
>
>             So Gorry and David B want me to draft a short I-D that
>             they will word-smith and author as tsvwg chairs to make
>             the situation clear.
>
>             I have said I don't want this discussion to be officially
>             raised outside the transport area until we are sure that a
>             variant of RACK is even feasible without any packet
>             counting at all (i.e. without the initial 3-DupACK rule in
>             current RACK). So I'll prioritize working that out for the
>             next week or so, rather than drafting anything.
>
>
>
>             Bob
>
>             {Note 1}: I hadn't considered that this would affect
>             technologies like ECMP that maintain order without a
>             resequencing buffer and without introducing HoL blocking.
>             But actually, the consequence there would be that L4S
>             traffic could be sprayed by ECMP without per-flow hashing.
>
>
>
>
>
>             Michael
>
>             *From:*Bob Briscoe [mailto:in@bobbriscoe.net]
>             *Sent:* Friday, November 9, 2018 6:42 AM
>             *To:* Scharf, Michael; tcpm@ietf.org <mailto:tcpm@ietf.org>
>             *Cc:* tsvwg@ietf.org <mailto:tsvwg@ietf.org>
>             *Subject:* Re: [tsvwg] Cross-area alignment on "L4S and RACK"
>
>             Michael,
>
>             That is the point I thought you were making, and my point
>             /is/ about that.
>
>             If an application needs all the packets (i.e. a reliable
>             protocol), it's not useful to reduce the latency of some
>             packets but not all. That's where the in-order requirement
>             comes from.
>
>             However, if a stream-based transport is used (i.e. TCP)
>             the /application/ still gets in-order delivery from TCP.
>             It just doesn't have to require links to provide in order
>             delivery as well.
>
>             Note that there can be multiple flows over one link. So
>             'in-order delivery' for a link does not mean in order of
>             TCP sequence number. It means 'delivered in the order that
>             packets arrived at the link' (the link ingress adds its
>             own sequence numbers, and the link egress buffers them
>             until it can send out in the same order, or until a
>             time-out or max no. of link retransmissions).
>
>             So, if the link loses a packet from flow A (the link isn't
>             aware of microflows, I'm just saying the packet happens to
>             be from flow A), link-layer in-order delivery will hold
>             back all packets sent after that packet (from all flows,
>             not just flow A) until the link repairs the gap or gives up.
>
>             So hop-by-hop in-order delivery gives every flow the same
>             delay as the worst delayed flow.
>
>             Of course, someone building a network for DETNET would
>             avoid link technologies like WiFi or LTE that frequently
>             lose and retransmit packets. But wherever there is
>             variable delay in a link, guaranteed in-order delivery
>             per-hop means every flow is guaranteed to always get the
>             worst delay from every link, which would have been
>             experienced by only one flow without per-hop in-order
>             delivery.
>
>             And many of the applications of DETNET involve dumb
>             industrial machinery that just expects everything to
>             arrive in order and to a clocked schedule. But it's always
>             as effective to deploy a resequencing buffer as the
>             penultimate hop, screwed onto the receiving machine if
>             necessary, rather than require per-hop resequencing in the
>             network.
>
>             ________________
>             So, why do Internet links often ensure in-order delivery
>             even though TCP puts everything in order for the application?
>
>             Short answer: the 3 DupACK rule (which is to do with
>             loss-recovery, a different issue from the discussion above).
>
>             Long answer: As flow rates have scaled up, the typical
>             time between 3 packet arrivals has become so small that
>             TCP's 3-DupACK rule makes links have to provide delivery
>             with only a very small re-ordering degree - so small that
>             it's easier for a link to just deliver everything in the
>             order it received it; not because TCP doesn't put packets
>             back in order, but just so as not to trigger TCP into
>             generating spurious retransmissions.
>
>
>
>             Bob
>
>             On 08/11/2018 16:35, Scharf, Michael wrote:
>
>                 Inline [ms]
>
>                 *From:*Bob Briscoe [mailto:in@bobbriscoe.net]
>                 *Sent:* Thursday, November 8, 2018 12:56 PM
>                 *To:* Scharf, Michael; tcpm@ietf.org
>                 <mailto:tcpm@ietf.org>
>                 *Cc:* tsvwg@ietf.org <mailto:tsvwg@ietf.org>
>                 *Subject:* Re: [tsvwg] Cross-area alignment on "L4S
>                 and RACK"
>
>                 Michael,
>
>                 This is merely a symptom of a difference in opinion on
>                 where the resequencing function should be located.
>
>                   * L4S takes the end-to-end approach saying that it
>                     is sufficient to do resequencing in one place (the
>                     receiving host) if it is needed. Then any
>                     resequencing delay only affects that flow.
>                   * The DETNET approach is saying the resequencing
>                     must be done hop-by-hop. This means there appears
>                     to be no resequencing needed on the receiver
>                     (except if there's re-ordering on the final hop).
>
>                   * In order to guarantee no resequencing in the
>                     network (DETNET approach), only the worst case
>                     latency can be guaranteed, and every packet has to
>                     have that same worst-case latency.
>                   * When you leave resequencing to the receiver
>                     (end-to-end approach), most of the packets arrive
>                     earlier than they would with DETNET, but out of
>                     order ones might not. Then the application chooses
>                     whether it wants to wait.
>
>                 [ms] My point is different: There seem to be other
>                 working groups in the IETF that do talk about “low
>                 latency” as well but apparently they also do need
>                 in-order delivery of packets, too. So the assumption
>                 that e.g. a link layer technology can simply disable
>                 in-order delivery for all “low latency” applications
>                 seems not correct. The reality may be a bit more complex.
>
>                 The 3 DupACK rule in TCP led the Internet not to
>                 follow the end-to-end principle on re-sequencing. Now
>                 we're removing the 3 DupACK rule, we can take
>                 advantage of the e2e principle.
>
>                 [ms] For TCP, three DupACKS are a SHOULD in RFC 5681
>                 and RFC 5681 is standards track. So the bar for
>                 “removing” that rule for TCP is not low…
>
>                 [ms] Also, the end-to-end principle comes with
>                 tradeoffs. For instance, relying on the end-to-end
>                 principle for recovery of bit errors is not efficient.
>                 For in-order delivery there are tradeoffs as well, see
>                 e.g. Section 4 in draft-ietf-tcpm-rack-04. There may
>                 be no free lunch.
>
>                 Of course, an application might choose to use TCP and
>                 L4S, rather than UDP and L4S (e.g. QUIC). Then there
>                 could still be HoL blocking in the receiving TCP
>                 stack. But at least there's no HoL blocking in the
>                 network, and the app can choose between stream (TCP)
>                 or datagram (UDP).
>
>                 [ms] If LRO/GRO in the receiving TCP stack gets messed
>                 up, I fail to see the benefit (see
>                 draft-ietf-tcpm-rack-04).
>
>                 Michael
>
>
>                 Bob
>
>                 On 06/11/2018 19:20, Scharf, Michael wrote:
>
>                     A comment on the TCPM presentation "L4S and RACK":
>
>                     As far as I understand, the DETNET WG in the RTG area
>                     has quite a few use cases for ultra-low latency
>                     transport - in particular also with bounded
>                     jitter. And some of these applications (e.g. using
>                     UDP) apparently may not be able to tolerate *any*
>                     out-of-order packet delivery.
>
>                     So perhaps some cross-area alignment on
>                     ultra-low-latency application requirements would be
>                     useful?
>
>                     Michael
>
>                     (who recently had to review
>                     draft-ietf-detnet-architecture)
>
>
>                 -- 
>
>                 ________________________________________________________________
>
>                 Bob Briscoe http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/