Re: [tsvwg] Update to Position Statement on ECT(1)

Bob Briscoe <in@bobbriscoe.net> Wed, 20 May 2020 18:59 UTC

To: "Holland, Jake" <jholland=40akamai.com@dmarc.ietf.org>, Martin Duke <martin.h.duke@gmail.com>
Cc: TSVWG <tsvwg@ietf.org>
References: <BE44EAE9-5CFB-4F5D-85B8-05AFA516C151@akamai.com> <CACL_3VEbUHB-Omwp1-g5Tq3G3J-kKj9N3jPZLcfruicw3X=AsA@mail.gmail.com> <2CBBD8CD-2088-4E41-B113-EED665853D3C@akamai.com> <CAM4esxSFCBcxXjz5JJJg1z6+wwfN3mTrtJ8bKiBsj2TeOmmFSw@mail.gmail.com> <1D8D2AF8-F805-4BAC-8126-355A8337D830@akamai.com> <CAM4esxSMELAi0BMBRynYTx44iY6f-yLEWng4QQ2Pxt9J-haxFg@mail.gmail.com> <DE770902-CA1E-405C-A944-F12114AF2C3B@akamai.com> <CAM4esxQTyDNfNiAFhiHL9Zb3OPr9jivkrD2u8DtvhsMw_2Yv-g@mail.gmail.com> <789B55A8-A560-4C93-94FA-C62498B33411@akamai.com>
From: Bob Briscoe <in@bobbriscoe.net>
Message-ID: <f60e36b7-0c04-be4b-c907-be1fd8ec66a3@bobbriscoe.net>
Date: Wed, 20 May 2020 19:59:06 +0100
In-Reply-To: <789B55A8-A560-4C93-94FA-C62498B33411@akamai.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/h6DPXNbPR9kXXzsZTuD9x7_Bs88>

Jake, Martin,

On 20/05/2020 02:32, Holland, Jake wrote:
>
> Hi Martin,
>
> > Jake, I like your framing of the interesting trade-off between losing
> > the L4S signal and losing the 3168 signal. However, let me add yet
> > another concern about ECT1->0: by design, packets so marked may arrive
> > *much* later than neighbors not marked, which means they are likely to
> > be declared lost anyway. Do you see this as a problem, or is the
> > post-bottleneck latency difference likely to be small?
>
> My take is that this basically happens only in a dualq (leaving aside 
> other causes of reordering), and the classic portion of a dualq is not 
> a deep dumb fifo, but rather an ordinary PIE AQM with a good delay
> response like normal classic PIE AQMs.  So the time difference that’s 
> added by a single such queue is what you’d expect from a normal 
> classic PIE AQM, if you have multiple loaded links (which by the way 
> we’ve seen argued is quite uncommon, at least when L4S proponents are 
> supporting the notion that mis-classifying classic CE-marked packets 
> isn’t a problem).
>
> According to the same charts, the delay in such a PIE queue tends to
> vary from what looks like about 5ms near the median to about 50ms at
> the 5 9’s level:
>
> https://datatracker.ietf.org/meeting/105/materials/slides-105-tsvwg-sessa-5-l4s-presentation-00.pdf#page=2
>
> This time difference is well below RTO, but could easily hit a fast 
> rexmit from dupacks if the L4S endpoint isn’t using RACK, or if it is 
> using RACK it could reasonably at the higher end fall outside the RACK 
> reordering window if it’s a short enough path, since the reordering 
> window is min_rtt/4, according to
> https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-8.3. (And 
> of course this can cascade if there are multiple queues, so at, say, 
> 10-15 such loaded bottlenecks when the first one gets a high-fi mark, 
> you might be able to make it start to bump into RTO.)
>
> However, IIRC previous review found that a fast retransmit in a modern 
> linux would get retroactively fixed up to be a non-loss signal when 
> the late ack arrives, though you could end up with some extra traffic 
> generated when it happens.
>
> So to me the reordering issue doesn’t look offhand like a likely 
> stopper for this, though it’s certainly awkward when there are multiple
> loaded paths in sequence and this is a clear downside for this 
> signaling strategy, regardless of whether people agree that the MD 
> signal is the more important of the 2 signals to keep in tunneled 
> paths. The similar worry over classic CE reordering got a lot of 
> discussion IIRC, but I think the resolution concluded that it wasn’t a 
> big problem.  However, I think this would be somewhat worse, and could 
> also get some similar discussion, even if this otherwise looks like a 
> worthwhile path.
>
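
To put rough numbers on the RACK point in the quoted text, here is a minimal Python sketch. It assumes the reordering window is simply min_rtt/4, as cited above from the RACK draft, and ignores any later adaptation of the window. The function names are hypothetical, and the 5ms and 50ms delays are the approximate chart readings quoted above, not measurements.

```python
def rack_reordering_window_ms(min_rtt_ms: float) -> float:
    """Reordering window, taken here as min_rtt/4 (assumption: no adaptation)."""
    return min_rtt_ms / 4.0


def outside_rack_window(extra_delay_ms: float, min_rtt_ms: float) -> bool:
    """True if a packet delayed by extra_delay_ms would arrive after the
    reordering window has expired, so it would be treated as lost."""
    return extra_delay_ms > rack_reordering_window_ms(min_rtt_ms)


if __name__ == "__main__":
    # Approximate delays read off the chart: ~5ms near the median, ~50ms at the tail.
    for min_rtt_ms in (10.0, 40.0, 200.0, 400.0):
        for delay_ms in (5.0, 50.0):
            print(min_rtt_ms, delay_ms, outside_rack_window(delay_ms, min_rtt_ms))
```

Under these assumptions, a 50ms delay falls outside the window on any path with min_rtt below 200ms, while a 5ms delay only does so when min_rtt is below 20ms.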

It's worth revisiting "Risk of reordering Classic CE packets" in the
appendix here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-10#appendix-B.1
It explains the likelihood of harm if a packet that was originally sent 
as ECT0 from a 3168 sender gets marked CE at a first bottleneck, then 
gets classified into the L queue instead of the C queue at a subsequent 
dualQ coupled AQM.

I've quoted it below (indented), then added in commentary on how it 
would change for ECT1->0 (not indented).

The main difference is that ECT0->CE causes early reordering, whereas 
ECT1->0 causes late reordering. We designed L4S this way round because 
early reordering requires other ducks to all sit in a row before it 
causes any harm, whereas late reordering would generally cause harm at 
every marking.

       1.  It is quite unusual to experience queuing at more than one
           bottleneck on the same path (the available capacities have to
           be identical).

Still true.

       2.  In only a subset of these unusual cases would the first
           bottleneck support Classic ECN marking while the second
           supported L4S ECN marking, which would be the only scenario
           where some ECT(0) packets could be CE marked by an AQM
           supporting Classic ECN then the remainder experienced further
           delay through the Classic side of a subsequent L4S DualQ AQM.

For ECT0->CE, this problem exists only during transition from RFC3168 to 
L4S, but goes away if the L4S experiment succeeds in superseding 3168.
For ECT1->0, this problem occurs when there is more than one L4S 
bottleneck on a path (as opposed to a 3168 bottleneck followed by an L4S 
bottleneck). So it persists even after 3168 traffic on the Internet has 
reduced to near zero.

As 3168 sunsets, there would rarely be traffic in the C queue. However,
packets marked ECT1->0 at a first AQM would get held up in the second C
queue while those around them were scheduled in front of them (assuming
the rest of the flow, and potentially other traffic in the L queue, was
keeping it busy).

* With a WRR scheduler weighted as we recommend, these ECT0 packets
would usually pop out 15 packets late.
* With a Time-shifted FIFO, they would wait for the time shift. We
recommend 4*target in the dualQ draft, where target=15ms, i.e. 60ms.
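
As a rough feel for the two figures above, here is a minimal Python sketch. The 4*target shift and target=15ms are the values quoted above. Converting "15 packets late" into time is an added assumption: it treats the lateness as 15 full-sized packet serialization times, with a 1500-byte packet size and link rates chosen purely for illustration. The function names are hypothetical.

```python
TARGET_MS = 15.0        # target quoted above
TS_FIFO_FACTOR = 4      # time shift = 4 * target, as quoted above
WRR_L_PER_C = 15        # L packets served per C packet, as quoted above


def ts_fifo_shift_ms(target_ms: float = TARGET_MS,
                     factor: int = TS_FIFO_FACTOR) -> float:
    """Wait imposed on a misclassified packet by a time-shifted FIFO."""
    return factor * target_ms


def wrr_lateness_ms(link_rate_bps: float,
                    packet_bytes: int = 1500,   # assumed packet size
                    l_per_c: int = WRR_L_PER_C) -> float:
    """Time to serialize l_per_c packets at the given link rate, in ms."""
    return l_per_c * packet_bytes * 8 / link_rate_bps * 1000.0


if __name__ == "__main__":
    print("TS-FIFO shift (ms):", ts_fifo_shift_ms())
    for rate_bps in (10e6, 100e6, 1e9):  # assumed link rates
        print(rate_bps, "b/s -> WRR lateness (ms):", wrr_lateness_ms(rate_bps))
```

With these assumed values, 15 packets corresponds to about 18ms at 10Mb/s and about 1.8ms at 100Mb/s, against a fixed 60ms for the time-shifted FIFO.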

       3.  Even then, when a few packets are delivered early, it takes
           very unusual conditions to cause a spurious retransmission, in
           contrast to when some packets are delivered late.  The first
           bottleneck has to apply CE-marks to at least N contiguous
           packets and the second bottleneck has to inject an
           uninterrupted sequence of at least N of these packets between
           two packets earlier in the stream (where N is the reordering
           window that the transport protocol allows before it considers
           a packet is lost).

* ECT0->CE promotes C packets to L, causing early reordering, which 
leads to no harm except in the above extremely unlikely condition of N 
contiguous CE-marked packets. Also, the marking has to have been applied 
by a classic AQM that uses a low marking probability.
* For ECT1->0, the reordered packets are later, not earlier. So, if the 
reordering exceeds the reordering window, each delayed packet 
immediately causes a spurious retransmission. Also, the upstream marking
would be applied by an L4S AQM, which often applies a higher marking
probability, including frequent long runs of 100% marking.

Where more than one packet in a row was wrongly classified into the C 
queue, the delays for the later packets would accumulate. For instance, 
a WRR scheduler would release one C packet for every 15 L packets, even 
if the C packets were contiguous. In the other direction, the more L 
packets wrongly classified into C, the more the L queue would have time
to drain and leave a gap for a C packet to be released without waiting 
to be scheduled explicitly.
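
The accumulation described above can be illustrated with a toy model in Python. It is a sketch under simplifying assumptions that are not from the drafts: all packets are already queued at time zero, nothing else arrives, each round serves up to 15 L packets and then one C packet, and a C packet is released at once if the L queue is empty. The function name is hypothetical.

```python
def l_packets_ahead_of_each_c(n_l: int, n_c: int, l_per_c: int = 15) -> list:
    """For each of n_c contiguous C packets, return how many L packets
    were served before it, given n_l L packets queued at the start."""
    l_remaining = n_l
    served_l = 0
    ahead = []
    for _ in range(n_c):
        # Serve up to l_per_c L packets, fewer if the L queue drains first.
        burst = min(l_per_c, l_remaining)
        l_remaining -= burst
        served_l += burst
        # Then release one C packet.
        ahead.append(served_l)
    return ahead


if __name__ == "__main__":
    # Busy L queue: the delay grows by 15 packets for each contiguous C packet.
    print(l_packets_ahead_of_each_c(n_l=100, n_c=3))
    # L queue drains: later C packets are released without waiting a full round.
    print(l_packets_ahead_of_each_c(n_l=20, n_c=3))
```

In this toy model the busy case yields 15, 30 and 45 L packets ahead of the three C packets, while the draining case yields 15, 20 and 20.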

              For example consider N=3, and consider the sequence of
              packets 100, 101, 102, 103,... and imagine that packets
              150,151,152 from later in the flow are injected as follows:
              100, 150, 151, 101, 152, 102, 103...  If this were late
              reordering, even one packet arriving 50 out of sequence
              would trigger a spurious retransmission, but there is no
              spurious retransmission here, with early reordering,
              because packet 101 moves the cumulative ACK counter forward
              before 3 packets have arrived out of order.  Later, when
              packets 148, 149, 153... arrive, even though there is a
              3-packet hole, there will be no problem, because the
              packets to fill the hole are already in the receive buffer.
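
The quoted example can be checked with a small Python sketch of duplicate-ACK counting. It is a deliberate simplification, not a model of any real stack: one cumulative ACK per arriving packet, no SACK, no delayed ACKs, no RACK, and nothing is actually lost, so any trigger it reports is spurious. The function name is hypothetical.

```python
def spurious_fast_retransmits(arrivals, first_seq, dupthresh=3):
    """Return the sequence numbers that a dupack-counting sender would
    fast-retransmit, given the order in which packets reach the receiver."""
    expected = first_seq      # next in-order packet the receiver wants
    buffered = set()          # out-of-order packets held by the receiver
    dupacks = 0               # consecutive duplicate ACKs seen by the sender
    triggered = []
    for seq in arrivals:
        if seq == expected:
            # In-order arrival: cumulative ACK advances past any buffered run.
            expected += 1
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
            dupacks = 0
        elif seq > expected:
            # Out-of-order arrival: receiver repeats the same cumulative ACK.
            buffered.add(seq)
            dupacks += 1
            if dupacks == dupthresh:
                triggered.append(expected)
        # seq < expected would be a duplicate of delivered data; ignored here.
    return triggered


if __name__ == "__main__":
    early = [100, 150, 151, 101, 152, 102, 103]   # sequence from the quoted text
    late = [100, 102, 103, 104, 101]              # packet 101 delayed by 3
    print(spurious_fast_retransmits(early, first_seq=100))
    print(spurious_fast_retransmits(late, first_seq=100))
```

With N=3, the early-reordering sequence from the quoted text triggers nothing, whereas delaying a single packet (101) behind three later ones triggers a retransmission of 101.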

       4.  Even with the current TCP recommendation of N=3 [RFC5681]
           spurious retransmissions will be unlikely for all the above
           reasons.  As RACK [I-D.ietf-tcpm-rack] is becoming widely
           deployed, it tends to adapt its reordering window to a larger
           value of N, which will make the chance of a contiguous
           sequence of N early arrivals vanishingly small.

ECT1->0 causes harm whether or not the reordered packets are contiguous, 
so this factor does not reduce the chance of harm.

       5.  Even a run of 2 CE marks within a Classic ECN flow is
           unlikely, given FQ-CoDel is the only known widely deployed AQM
           that supports Classic ECN marking and it takes great care to
           separate out flows and to space any markings evenly along each
           flow.

Ditto.

       It is extremely unlikely that the above set of 5 eventualities
       that are each unusual in themselves would all happen
       simultaneously.  But, even if they did, the consequences would
       hardly be dire: the odd spurious fast retransmission. Admittedly
       TCP (and similar transports) reduce their congestion window when
       they deem there has been a loss, but even this can be recovered
       once the sender detects that the retransmission was spurious.

In summary,
* ECT1->0 needs only one unlikely condition (no.1) to cause spurious
retransmissions.
* ECT0->CE causes spurious retransmissions only if all 5 unlikely or
extremely unlikely conditions occur together.

I didn't quote a figure for condition #1, 'cos the only paper I found
that could quantify it describes how to get a figure, but doesn't
really have any sound experimental results.

Deng, L. & Kuzmanovic, A., "Pong: Diagnosing Spatio-temporal Internet 
Congestion Properties," SIGMETRICS Perform. Eval. Rev. 35(1):381--382 
ACM (June 2007)

Having engaged with this argument, I do want to emphasize that, by the
tunnel argument, ECT1->0 would not work at all over 60-70% of the
Internet for many years.

I find it weird that this thread is trying to count how many angels can 
dance on the point of a pin, when there's the tunnelling elephant 
standing in the room, squashing all the angels and bending the pin.

Bob

> Caveat emptor: this is all off-the-cuff reasoning, I haven’t tested 
> any of this of course.  Just laying out the line of reasoning as I 
> understand it, but if anybody ever tests this, it could reasonably 
> show up problems I haven’t thought of.  I do think it would be
> potentially quite a bit of reordering sometimes in paths with multiple 
> dualqs.
>
> Best,
>
> Jake
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
