From nobody Sun May 23 15:00:23 2021
Return-Path: <ietf@bobbriscoe.net>
X-Original-To: tsvwg@ietfa.amsl.com
Delivered-To: tsvwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1])
 by ietfa.amsl.com (Postfix) with ESMTP id 2CA703A27C1
 for <tsvwg@ietfa.amsl.com>; Sun, 23 May 2021 15:00:18 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.433
X-Spam-Level: 
X-Spam-Status: No, score=-1.433 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
 DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.001,
 NICE_REPLY_A=-0.001, SPF_HELO_NONE=0.001, SPF_SOFTFAIL=0.665,
 URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key)
 header.d=bobbriscoe.net
Received: from mail.ietf.org ([4.31.198.44])
 by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id pwg37_tIZgUi for <tsvwg@ietfa.amsl.com>;
 Sun, 23 May 2021 15:00:12 -0700 (PDT)
Received: from mail-ssdrsserver2.hosting.co.uk
 (mail-ssdrsserver2.hosting.co.uk [185.185.85.90])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ietfa.amsl.com (Postfix) with ESMTPS id 93E173A27C0
 for <tsvwg@ietf.org>; Sun, 23 May 2021 15:00:12 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=bobbriscoe.net; s=default; h=Content-Type:In-Reply-To:MIME-Version:Date:
 Message-ID:From:References:Cc:To:Subject:Sender:Reply-To:
 Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:
 Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:
 List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=rfu0Ue8Vt2nvx+dyTGt0dFDkKQjnc+B4J7K/JY7Tq94=; b=kZgDHyZtqq7NiaaJ4rwFR5/SEh
 ihNp8NBcy0fUDkulNQY0kus6rdbKR2maEJapftDkYmUsdi3pfLewcFYEU5Uxqz9N5CGAl5iLqPVBV
 EAJI9Qg9yqkxV0U91X3c/8SfhEQYTzUhQITiBSaVq0zga0LA8H8gDiDd7R9lh/movKmqs8faXxWb1
 vesv4FOA1iLwV/xv2PaxV9rQXdF9ulOxaW6JcswAmwTwWeRz6uGrYdtDjhePPXH+IDR70I49ZorFT
 Ql4jCIGjTD47lIsGEqsURZWG3/DqW/HiA0dEOZ+aI2AdbDaInI+yw3Sye5V8eg3SlDwJNVHgn1US1
 wzG1aT3A==;
Received: from 67.153.238.178.in-addr.arpa ([178.238.153.67]:45216
 helo=[192.168.1.11])
 by ssdrsserver2.hosting.co.uk with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2)
 (envelope-from <ietf@bobbriscoe.net>)
 id 1lkw8o-0001Qs-8C; Sun, 23 May 2021 23:00:09 +0100
To: "Black, David" <David.Black@dell.com>,
 Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Wesley Eddy <wes@mti-systems.com>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <162158815765.22731.15608328324211025925@ietfa.amsl.com>
 <f8ed1105-d1db-55ce-eb1f-00de8a83b0e8@bobbriscoe.net>
 <3F147A3D-BD68-4F0A-89FF-9A92284FF0A5@gmx.de>
 <MN2PR19MB404590F453EF73E681A7F17D83299@MN2PR19MB4045.namprd19.prod.outlook.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <0b7edf59-5bce-3189-8745-324083c98ce4@bobbriscoe.net>
Date: Sun, 23 May 2021 23:00:07 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.8.1
MIME-Version: 1.0
In-Reply-To: <MN2PR19MB404590F453EF73E681A7F17D83299@MN2PR19MB4045.namprd19.prod.outlook.com>
Content-Type: multipart/alternative;
 boundary="------------9B995F2954A4FDB4DF73FEC9"
Content-Language: en-GB
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - ssdrsserver2.hosting.co.uk
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - bobbriscoe.net
X-Get-Message-Sender-Via: ssdrsserver2.hosting.co.uk: authenticated_id:
 in@bobbriscoe.net
X-Authenticated-Sender: ssdrsserver2.hosting.co.uk: in@bobbriscoe.net
X-Source: 
X-Source-Args: 
X-Source-Dir: 
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ztnI5lFcSOvj9v_DwfiosHyYQkA>
Subject: Re: [tsvwg] New rev of draft-ietf-tsvwg-ecn-l4s-id-17
X-BeenThere: tsvwg@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Transport Area Working Group <tsvwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsvwg>,
 <mailto:tsvwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tsvwg/>
List-Post: <mailto:tsvwg@ietf.org>
List-Help: <mailto:tsvwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsvwg>,
 <mailto:tsvwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 23 May 2021 22:00:18 -0000

This is a multi-part message in MIME format.
--------------9B995F2954A4FDB4DF73FEC9
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

David,
See inline, tagged [BB]...

On 21/05/2021 23:02, Black, David wrote:
>> section 6.2 with its, "use two SAs, one for ECT(1) and one for the rest" seems a bit limited
> The situation is more severe than that, and it's problematic if there's any non-L4S TCP traffic that uses ECN involved.  The two SAs are actually for 2 ECN values each:
>
>     To avoid this limitation, a VPN ingress participating in the L4S
>     experiment SHOULD map packets onto two parallel SAs indexed by the
>     lowest significant bit of the IP-ECN field.
>
> That indexing puts Not-ECT [00b] and ECT(0) [10b] packets into one SA, and puts ECT(1) [01b] and CE [11b] packets into another.  Non-L4S TCP uses ECT(0) and CE, hence a flow that carries any congestion indications is split across those two SAs.  ECMP based on SPI (deployed technology) that puts those parallel SAs on different network paths is not assured to preserve packet order across those SAs, and packet reordering is not a good thing to do to a non-L4S TCP flow.

[BB] Yes, this would be a potential problem.

This particular path divergence occurs iff an ECT0 flow is CE-marked 
/prior/ to the VPN ingress:
* In general where the VPN ingress is on the end-system (transport mode) 
that doesn't happen.
* There is however a chance that CE could be introduced before a 'bump 
in the wire' VPN ingress (tunnel mode).

For a VPN participating in the L4S experiment with two SAs, if there was 
any CE before the ingress, we did think carefully about which way to 
classify it:
* If CE goes with ECT0, then if a subsequent L4S node gives ECT0 more 
queue delay than CE, the VPN anti-replay function could discard some 
ECT0 packets.
* If CE goes with ECT1, it avoids anti-replay discard completely. We had 
tried to think of possible problems with splitting Classic flows across 
SAs, but we hadn't thought of your point where ECMP on the SPI makes CE 
and ECT0 packets from the same flow take different paths.
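
As a minimal sketch of the classification being discussed (Python, purely 
illustrative: the name sa_index and the labels 0 and 1 are assumptions, 
not anything defined in the draft), indexing on the least significant bit 
of the IP-ECN field groups the four codepoints like this:

```python
# ECN codepoints as two-bit values (RFC 3168 naming).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def sa_index(ecn: int) -> int:
    """Pick one of two parallel SAs from the LSB of the IP-ECN field.

    The return values 0 and 1 are hypothetical labels, not real SPIs.
    """
    if ecn not in (NOT_ECT, ECT1, ECT0, CE):
        raise ValueError(f"not a 2-bit ECN value: {ecn!r}")
    return ecn & 0b01


# Group codepoint names by the SA they would be mapped onto.
groups = {0: [], 1: []}
for name, ecn in (("Not-ECT", NOT_ECT), ("ECT(0)", ECT0),
                  ("ECT(1)", ECT1), ("CE", CE)):
    groups[sa_index(ecn)].append(name)
print(groups)
```

So a Classic flow that is ECT0 at the ingress stays on one SA unless a 
packet was already CE-marked upstream, which is the case David describes.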

I believe it is still best to classify CE with ECT1, based on the 
following reasoning. I think the scenario would occur with very low 
probability {Note 1}, and if it did, the CE-marked packets might arrive 
either a little earlier or a little later than the data packets around them.
* If earlier, the benefit of ECN would not be lost, but there would be 
an extremely small chance (p_s * p_r * p_N) of an occasional spurious 
re-xmt {Note 2}.
* If later, the chance that a delayed CE would be mistaken for a drop 
and trigger a spurious re-xmt and a congestion response would probably 
be higher, but still very small (p_s * p_r * p_l) {Note 3}. That would 
lose the benefit of ECN but, at least for Classic flows, a single CE is 
meant to trigger a large congestion response as if it was a loss anyway.

I don't want to belittle the point you've made, because this is 
certainly a new problem. However, I do think the probabilities need to 
be put in perspective.

I suspect the ECMP problem with any ECN (3168 or L4S) is likely to be a 
greater concern. You might recall that João Taveira from Fastly pointed 
this out during a tsvwg discussion on ECN back in 2017. This link should 
jump to 14:25 mins into a 2017 NANOG talk from Lorenzo Saino of Fastly 
about how they had to disable ECN negotiation when Apple clients started 
to request ECN in 2015 - because a number of different ECMP vendors 
still hash on the ToS byte for ECMP and load balancing, so the data 
would have been routed to a different server from the handshake:
https://youtu.be/ciClZdwHelU?t=805

Regards



Bob

{Note 1}: Reasoning for the scenario occurring with low probability 
(we'll call the probability of CE marking before the VPN ingress p_s):
Usually, an institution provides a tunnel-mode VPN between physically 
secured networks (e.g. between their intranet and an employee's home 
laptop). AQM deployment would be unusual in the interior of corporate 
and institutional intranets, because the bottleneck is more likely 
elsewhere (e.g. in the access link between the site and the Internet). 
However, there are bound to be some cases of tunnel-mode VPNs where the 
traffic arriving could have already been ECN-marked (for instance 
Jonathan Morton's example of gamers providing a VPN end-point on the 
public Internet to hide their own IP address).
Then, to experience this problem, there also has to be some load 
balancing or ECMP after the VPN ingress but before the egress. I'm sure 
that will occur sometime somewhere (let's say with probability p_r). The 
likelihood of the scenario occurring will then be p_s * p_r.

{Note 2}:
* N packets in a row within a Classic flow have to be CE-marked for 
early CE packets to result in a spurious re-xmt, let's say that happens 
with probability p_N. As already explained in Appx B of ecn-l4s-id, N 
was traditionally 3, but RACK is making it larger. And the main sources 
of ECN marking for Classic flows are FQ_CoDel and CAKE, both of which 
take great care not to mark multiple packets in a row. So p_N is going 
to be tiny.
And the probability of a spurious re-xmt with early reordering will be 
the vanishingly small product (p_s * p_r * p_N).

{Note 3}:
* Let's say p_l (for late) is the probability that the reordering 
between the two SAs is sufficient to make each CE packet arrive late 
enough to be deemed a loss. p_l is unlikely to be as small as p_N, but 
the overall probability of this occurring is still reduced by the 
probability that CE marking precedes the VPN (p_s) and the probability 
that there's routing keyed on flow IDs and SPIs (p_r). So the overall 
probability of a spurious re-xmt due to late reordering is (p_s * p_r * p_l).
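
The products in Notes 1 to 3 can be sketched as below (Python). This 
assumes the events are independent, which the notes imply but do not 
state. The function name is hypothetical and the input values are 
arbitrary placeholders chosen only to show the arithmetic; they are not 
measurements or estimates, and realistic values would be far smaller.

```python
def combined_probability(*probabilities: float) -> float:
    """Multiply probabilities together, assuming independent events."""
    result = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p!r}")
        result *= p
    return result


# Placeholder inputs only (NOT estimates), just to exercise the formula.
p_s = 0.5     # CE marking occurs before the VPN ingress
p_r = 0.5     # load balancing or ECMP between ingress and egress
p_N = 0.125   # N consecutive CE marks within a Classic flow
p_l = 0.25    # a CE packet arrives late enough to be deemed a loss

early = combined_probability(p_s, p_r, p_N)  # Note 2: early reordering
late = combined_probability(p_s, p_r, p_l)   # Note 3: late reordering
print(early, late)
```

Whatever values are plugged in, both outcomes share the factor p_s * p_r, 
so the late case is more likely than the early case exactly when p_l 
exceeds p_N.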



Bob

> Thanks, --David
>
> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de>
> Sent: Friday, May 21, 2021 5:27 PM
> To: Bob Briscoe
> Cc: Gorry Fairhurst; Black, David; Wesley Eddy; tsvwg@ietf.org
> Subject: Re: [tsvwg] New rev of draft-ietf-tsvwg-ecn-l4s-id-17
>
>
> [EXTERNAL EMAIL]
>
> Bob, chairs,
>
> section 6.2 with its, "use two SAs, one for ECT(1) and one for the rest" seems a bit limited since it ignores that VPNs might propagate both DSCPs and ECN bits between the layers, so IMHO a better approach might be to recommend to treat DSCP+ECN bits as one aggregate byte (let's call it TOS ;) ) as the extra ECT(1)-SA seems to be required for all SAs that already exist to deal with multiple supported DSCPs. So in a sense the recommendation would be to double the number of SAs.

[BB] Yes, we ought to reword it to say that the VPN ingress should use 
two SAs indexed on the LSB of the ECN field, and, if it was also 
classifying on DSCPs, it could also consider classifying any low latency 
DSCP(s) with the L4S packets. To avoid the anti-replay problem, there 
would only need to be one SA configured per each degree of queuing 
delay, not one for every ECN x DSCP combination.
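
A rough sketch of that rewording (Python; the names sa_for_packet and 
LOW_LATENCY_DSCPS, the SA labels, and the choice of DSCP 46 are all 
assumptions for illustration, since which DSCPs count as low latency 
would be a local configuration matter):

```python
# Hypothetical set of DSCPs that an operator treats as low latency.
# 46 (EF) is used here only as an example value.
LOW_LATENCY_DSCPS = frozenset({46})


def sa_for_packet(dscp: int, ecn: int,
                  low_latency_dscps: frozenset = LOW_LATENCY_DSCPS) -> str:
    """Choose an SA per degree of queuing delay, not per DSCP x ECN pair."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"not a 6-bit DSCP: {dscp!r}")
    if not 0 <= ecn <= 3:
        raise ValueError(f"not a 2-bit ECN value: {ecn!r}")
    # LSB of the ECN field set (ECT(1) or CE), or a low latency DSCP.
    if ecn & 0b01 or dscp in low_latency_dscps:
        return "low-delay"
    return "classic"
```

With this shape, adding another low latency DSCP extends the set rather 
than adding another SA.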

>
> Also:
> "and the current draft of DTLS 1.3 says "The receiver	
>   	   SHOULD pick a window large enough to handle any plausible reordering,	
>   	   which depends on the data rate."  However, in practice, the size of	
>   	   the VPN's anti-replay window is not always scaled appropriately."
>
> L4S on a 10 ms path under load can introduce re-ordering in the range of 50 ms (roughly twice the difference between the L- and C-queue delay targets), re-ordering tolerance 5 times of the path RTT seems to be a bit on the high side to expect, no?

[BB] The above text that I quoted from the DTLS spec. is reasonable, 
both practically (see below) and in terms of taking responsibility for 
the problem. Beyond its window, the anti-replay function presumes a 
packet is guilty of a replay attack with no evidence, purely because it 
chooses not to hold that amount of evidence. Therefore it's proper that 
it holds a sufficient window of evidence for any plausible reordering.

BTW, the C-queue target has never been 25ms. I noticed JM said that 
incorrectly as well recently.
* A default C queue delay target of 15ms has always been recommended in 
aqm-dualq-coupled. That results in PI2 Qdelay of about 25ms at the 
99%ile or 30ms at the 99.9%ile. We have been considering whether to 
change the default target to 10ms for some time, but have not done so yet.
* Low Latency DOCSIS specifies a default C queue delay target of 10ms.

So a replay window allowing for 30ms of packets at the interface rate 
would be sufficient.
At 1Gb/s (say) using 1500B packets, that's a replay window of 2500 packets.
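
The arithmetic behind that figure, as a small sketch (Python; the function 
name is hypothetical, and it assumes every packet is the stated size and 
the link is fully loaded for the whole interval):

```python
def replay_window_packets(rate_bps: int, delay_ms: int,
                          packet_bytes: int) -> int:
    """Packets sent at rate_bps during delay_ms, rounded up."""
    if rate_bps <= 0 or delay_ms <= 0 or packet_bytes <= 0:
        raise ValueError("all arguments must be positive")
    # Integer arithmetic avoids floating-point rounding:
    # bits sent (scaled by 1000 because delay is in ms) ...
    scaled_bits = rate_bps * delay_ms
    # ... divided by bits per packet (scaled by the same 1000).
    scaled_bits_per_packet = packet_bytes * 8 * 1000
    return -(-scaled_bits // scaled_bits_per_packet)  # ceiling division


window = replay_window_packets(1_000_000_000, 30, 1500)
print(window)  # 2500
```

Smaller packets at the same rate would need a proportionally larger 
window in packets, so 1500B is the favourable case.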

Quoting Pete Heist's info here 
https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled 
:

    "Modern Linux kernels have a default maximum replay window size of
    4096 (|XFRMA_REPLAY_ESN_MAX| in xfrm.h
    <https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/xfrm.h>).
    Wireguard uses a hardcoded value of 8192 with no option for runtime
    configuration, increased from 2048 in May 2020 by this commit
    <https://git.zx2c4.com/wireguard-linux/commit/drivers/net/wireguard?id=c78a0b4a78839d572d8a80f6a62221c0d7843135>."

Regards



Bob

>
>
>
>
> Regards
> 	Sebastian
>
>
>> On May 21, 2021, at 11:21, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Chairs, list,
>>
>> We've posted a new rev of draft-ietf-tsvwg-ecn-l4s-id-17 attempting to address all the discussion since the last posting just before the interim. In particular:
>> * review comments on a careful read from Gorry and the chairs
>> * the VPN anti-replay problem
>> * added an out-of-band test for an RFC3168 ECN AQM in a shared queue.
>>
>> There are a couple of outstanding discussions, which I'm sure will continue on the list, e.g. the role of RFC4774 and whether to remove any of Appx C. But it was considered better to get the queued up changes out, to re-base the discussions.
>>
>> This is quite an extensive set of changes, so pls check and pass any comments to the list.
>>
>> Thanks to everyone who is contributing, and particularly to the chairs for continuing to referee this all. We've added appropriate thanks in the Acks section.
>>
>>
>> Bob
>>
>>
>> On 21/05/2021 10:09, internet-drafts@ietf.org wrote:
>>> A New Internet-Draft is available from the on-line Internet-Drafts directories.
>>> This draft is a work item of the Transport Area Working Group WG of the IETF.
>>>
>>>          Title           : Explicit Congestion Notification (ECN) Protocol for Very Low Queuing Delay (L4S)
>>>          Authors         : Koen De Schepper
>>>                            Bob Briscoe
>>> 	Filename        : draft-ietf-tsvwg-ecn-l4s-id-17.txt
>>> 	Pages           : 57
>>> 	Date            : 2021-05-21
>>>
>>> Abstract:
>>>     This specification defines the protocol to be used for a new network
>>>     service called low latency, low loss and scalable throughput (L4S).
>>>     L4S uses an Explicit Congestion Notification (ECN) scheme at the IP
>>>     layer that is similar to the original (or 'Classic') ECN approach,
>>>     except as specified within.  L4S uses 'scalable' congestion control,
>>>     which induces much more frequent control signals from the network and
>>>     it responds to them with much more fine-grained adjustments, so that
>>>     very low (typically sub-millisecond on average) and consistently low
>>>     queuing delay becomes possible for L4S traffic without compromising
>>>     link utilization.  Thus even capacity-seeking (TCP-like) traffic can
>>>     have high bandwidth and very low delay at the same time, even during
>>>     periods of high traffic load.
>>>
>>>     The L4S identifier defined in this document distinguishes L4S from
>>>     'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
>>>     migration path so that suitably modified network bottlenecks can
>>>     distinguish and isolate existing traffic that still follows the
>>>     Classic behaviour, to prevent it degrading the low queuing delay and
>>>     low loss of L4S traffic.  This specification defines the rules that
>>>     L4S transports and network elements need to follow with the intention
>>>     that L4S flows neither harm each other's performance nor that of
>>>     Classic traffic.  Examples of new active queue management (AQM)
>>>     marking algorithms and examples of new transports (whether TCP-like
>>>     or real-time) are specified separately.
>>>
>>>
>>> The IETF datatracker status page for this draft is:
>>> https://datatracker.ietf.org/doc/draft-ietf-tsvwg-ecn-l4s-id/
>>>
>>> There is also an htmlized version available at:
>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-17
>>>
>>> A diff from the previous version is available at:
>>> https://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-ecn-l4s-id-17
>>>
>>>
>>> Internet-Drafts are also available by anonymous FTP at:
>>> ftp://ftp.ietf.org/internet-drafts/
>>>
>>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/
>>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/


--------------9B995F2954A4FDB4DF73FEC9
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    David, <br>
    See inline, tagged [BB]...<br>
    <br>
    <div class="moz-cite-prefix">On 21/05/2021 23:02, Black, David
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:MN2PR19MB404590F453EF73E681A7F17D83299@MN2PR19MB4045.namprd19.prod.outlook.com">
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">section 6.2 with its, "use two SAs, one for ECT(1) and one for the rest" seems a bit limited
</pre>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
The situation is more severe than that, and it's problematic if there's any non-L4S TCP traffic that uses ECN involved.  The two SAs are actually for 2 ECN values each:

   To avoid this limitation, a VPN ingress participating in the L4S
   experiment SHOULD map packets onto two parallel SAs indexed by the
   lowest significant bit of the IP-ECN field.

That indexing puts Not-ECT [00b] and ECT(0) [10b] packets into one SA, and puts ECT(1) [01b] and CE [11b] packets into another.  Non-L4S TCP uses ECT(0) and CE, hence a flow that carries any congestion indications is split across those two SAs.  ECMP based on SPI (deployed technology) that puts those parallel SAs on different network paths is not assured to preserve packet order across those SAs, and packet reordering is not a good thing to do to a non-L4S TCP flow.</pre>
    </blockquote>
    <br>
    [BB] Yes, this would be a potential problem.<br>
    <br>
    This particular path divergence occurs iff an ECT0 flow is CE-marked
    /prior/ to the VPN ingress:<br>
    * In general where the VPN ingress is on the end-system (transport
    mode) that doesn't happen. <br>
    * There is however a chance that CE could be introduced before a
    'bump in the wire' VPN ingress (tunnel mode). <br>
    <br>
    For a VPN participating in the L4S experiment with two SAs, if there
    was any CE before the ingress, we did think carefully about which
    way to classify it:<br>
    * If CE goes with ECT0, then if a subsequent L4S node gives ECT0
    more queue delay than CE, the VPN anti-replay function could discard
    some ECT0 packets.<br>
    * If CE goes with ECT1, it avoids anti-replay discard completely. We
    had tried to think of possible problems with splitting Classic flows
    across SAs, but we hadn't thought of your point where ECMP on the
    SPI makes CE and ECT0 packets from the same flow take different paths.<br>
    <br>
    I believe it is still best to classify CE with ECT1, based on the
    following reasoning. I think the scenario would occur with very low
    probability {Note 1}, and if it did, the CE-marked packets might
    arrive either a little earlier or a little later than the data
    packets around them. <br>
    * If earlier, the benefit of ECN would not be lost, but there would
    be an extremely small chance (p_s * p_r * p_N) of an occasional
    spurious re-xmt {Note 2}.<br>
    * If later, the chance that a delayed CE would be mistaken for a
    drop and trigger a spurious re-xmt and a congestion response would
    probably be higher, but still very small (p_s * p_r * p_l) {Note 3}.
    That would lose the benefit of ECN but, at least for Classic flows,
    a single CE is meant to trigger a large congestion response as if it
    was a loss anyway.<br>
    <br>
    I don't want to belittle the point you've made, because this is
    certainly a new problem. However, I do think the probabilities need
    to be put in perspective. <br>
    <br>
    I suspect the ECMP problem with any ECN (3168 or L4S) is likely to
    be a greater concern. You might recall that João Taveira from Fastly
    pointed this out during a tsvwg discussion on ECN back in 2017. This
    link should jump to 14:25 mins into a 2017 NANOG talk from Lorenzo
    Saino of Fastly about how they had to disable ECN negotiation when
    Apple clients started to request ECN in 2015 - because a number of
    different ECMP vendors still hash on the ToS byte for ECMP and load
    balancing, so the data would have been routed to a different server
    from the handshake:
    <br>
        <a class="moz-txt-link-freetext"
      href="https://youtu.be/ciClZdwHelU?t=805">https://youtu.be/ciClZdwHelU?t=805</a><br>
    <br>
    Regards<br>
    <br>
    <br>
    <br>
    Bob<br>
    <br>
    {Note 1}: Reasoning for the scenario occurring with low probability
    (we'll call the probability of CE marking before the VPN ingress p_s):<br>
    Usually, an institution provides a tunnel-mode VPN between
    physically secured networks (e.g. between their intranet and an
    employee's home laptop). AQM deployment would be unusual in the
    interior of corporate and institutional intranets, because the
    bottleneck is more likely elsewhere (e.g. in
    the access link between the site and the Internet). However, there
    are bound to be some cases of tunnel-mode VPNs where the traffic
    arriving could have already been ECN-marked (for instance Jonathan
    Morton's example of gamers providing a VPN end-point on the public
    Internet to hide their own IP address). <br>
    Then, to experience this problem, there also has to be some load
    balancing or ECMP after the VPN ingress but before the egress. I'm
    sure that will occur sometime somewhere (let's say with probability
    p_r). The likelihood of the scenario occurring will then be p_s *
    p_r.<br>
    <br>
    {Note 2}:<br>
    * N packets in a row within a Classic flow have to be CE-marked for
    early CE packets to result in a spurious re-xmt, let's say that
    happens with probability p_N. As already explained in Appx B of
    ecn-l4s-id, N was traditionally 3, but RACK is making it larger. And
    the main sources of ECN marking for Classic flows are FQ_CoDel and
    CAKE, both of which take great care not to mark multiple packets in
    a row. So p_N is going to be tiny.<br>
    And the probability of a spurious re-xmt with early reordering will
    be the vanishingly small product (p_s * p_r * p_N).<br>
    <br>
    {Note 3}:<br>
    * Let's say p_l (for late) is the probability that the reordering
    between the two SAs is sufficient to make each CE packet arrive late
    enough to be deemed a loss. p_l is unlikely to be as small as p_N,
    but the overall probability of this occurring is still reduced by
    the probability that CE marking precedes the VPN (p_s) and the
    probability that there's routing keyed on flow IDs and SPIs (p_r).
    So the overall probability of a spurious re-xmt due to late
    reordering is (p_s * p_r * p_l).<br>
    <br>
    <br>
    <br>
    Bob<br>
    <br>
    <blockquote type="cite"
cite="mid:MN2PR19MB404590F453EF73E681A7F17D83299@MN2PR19MB4045.namprd19.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">
Thanks, --David

-----Original Message-----
From: Sebastian Moeller <a class="moz-txt-link-rfc2396E" href="mailto:moeller0@gmx.de">&lt;moeller0@gmx.de&gt;</a> 
Sent: Friday, May 21, 2021 5:27 PM
To: Bob Briscoe
Cc: Gorry Fairhurst; Black, David; Wesley Eddy; <a class="moz-txt-link-abbreviated" href="mailto:tsvwg@ietf.org">tsvwg@ietf.org</a>
Subject: Re: [tsvwg] New rev of draft-ietf-tsvwg-ecn-l4s-id-17


[EXTERNAL EMAIL] 

Bob, chairs,

section 6.2 with its, "use two SAs, one for ECT(1) and one for the rest" seems a bit limited since it ignores that VPNs might propagate both DSCPs and ECN bits between the layers, so IMHO a better approach might be to recommend to treat DSCP+ECN bits as one aggregate byte (let's call it TOS ;) ) as the extra ECT(1)-SA seems to be required for all SAs that already exist to deal with multiple supported DSCPs. So in a sense the recommendation would be to double the number of SAs.</pre>
    </blockquote>
    <br>
    [BB] Yes, we ought to reword it to say that the VPN ingress should
    use two SAs indexed on the LSB of the ECN field, and, if it was also
    classifying on DSCPs, it could also consider classifying any low
    latency DSCP(s) with the L4S packets. To avoid the anti-replay
    problem, there would only need to be one SA configured per each
    degree of queuing delay, not one for every ECN x DSCP combination.<br>
    <br>
    <blockquote type="cite"
cite="mid:MN2PR19MB404590F453EF73E681A7F17D83299@MN2PR19MB4045.namprd19.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">

Also:
"and the current draft of DTLS 1.3 says "The receiver	
 	   SHOULD pick a window large enough to handle any plausible reordering,	
 	   which depends on the data rate."  However, in practice, the size of	
 	   the VPN's anti-replay window is not always scaled appropriately."

L4S on a 10 ms path under load can introduce re-ordering in the range of 50 ms (roughly twice the difference between the L- and C-queue delay targets), re-ordering tolerance 5 times of the path RTT seems to be a bit on the high side to expect, no?</pre>
    </blockquote>
    <br>
    [BB] The above text that I quoted from the DTLS spec. is reasonable,
    both practically (see below) and in terms of taking responsibility
    for the problem. Beyond its window, the anti-replay function
    presumes a packet is guilty of a replay attack with no evidence,
    purely because it chooses not to hold that amount of evidence.
    Therefore it's proper that it holds a sufficient window of evidence
    for any plausible reordering.<br>
    <br>
    BTW, the C-queue target has never been 25ms. I noticed JM said that
    incorrectly as well recently.<br>
    * A default C queue delay target of 15ms has always been recommended
    in aqm-dualq-coupled. That results in PI2 Qdelay of about 25ms at
    the 99%ile or 30ms at the 99.9%ile. We have been considering whether
    to change the default target to 10ms for some time, but have not
    done so yet.<br>
    * Low Latency DOCSIS specifies a default C queue delay target of
    10ms. <br>
    <br>
    So a replay window allowing for 30ms of packets at the interface
    rate would be sufficient.<br>
    At 1Gb/s (say) using 1500B packets, that's a replay window of 2500
    packets.<br>
    <br>
    Quoting Pete Heist's info here
<a class="moz-txt-link-freetext" href="https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled">https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled</a>
    :<br>
    <blockquote>"Modern Linux kernels have a default maximum replay
      window size of 4096
      (<code>XFRMA_REPLAY_ESN_MAX</code> in
      <a
href="https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/xfrm.h"
        rel="nofollow">xfrm.h</a>).
      Wireguard uses a hardcoded value of 8192 with no option for
      runtime
      configuration, increased from 2048 in May 2020 by <a
href="https://git.zx2c4.com/wireguard-linux/commit/drivers/net/wireguard?id=c78a0b4a78839d572d8a80f6a62221c0d7843135"
        rel="nofollow">this
        commit</a>."<br>
    </blockquote>
    Regards<br>
    <br>
    <br>
    <br>
    Bob<br>
    <br>
    <blockquote type="cite"
cite="mid:MN2PR19MB404590F453EF73E681A7F17D83299@MN2PR19MB4045.namprd19.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">




Regards
	Sebastian


</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">On May 21, 2021, at 11:21, Bob Briscoe <a class="moz-txt-link-rfc2396E" href="mailto:ietf@bobbriscoe.net">&lt;ietf@bobbriscoe.net&gt;</a> wrote:

Chairs, list,

We've posted a new rev of draft-ietf-tsvwg-ecn-l4s-id-17 attempting to address all the discussion since the last posting just before the interim. In particular:
* review comments on a careful read from Gorry and the chairs
* the VPN anti-replay problem
* added an out-of-band test for an RFC3168 ECN AQM in a shared queue.

There are a couple of outstanding discussions, which I'm sure will continue on the list, e.g. the role of RFC4774 and whether to remove any of Appx C. But it was considered better to get the queued up changes out, to re-base the discussions.

This is quite an extensive set of changes, so pls check and pass any comments to the list.

Thanks to everyone who is contributing, and particularly to the chairs for continuing to referee this all. We've added appropriate thanks in the Acks section.


Bob


On 21/05/2021 10:09, <a class="moz-txt-link-abbreviated" href="mailto:internet-drafts@ietf.org">internet-drafts@ietf.org</a> wrote:
</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Transport Area Working Group WG of the IETF.

        Title           : Explicit Congestion Notification (ECN) Protocol for Very Low Queuing Delay (L4S)
        Authors         : Koen De Schepper
                          Bob Briscoe
	Filename        : draft-ietf-tsvwg-ecn-l4s-id-17.txt
	Pages           : 57
	Date            : 2021-05-21

Abstract:
   This specification defines the protocol to be used for a new network
   service called low latency, low loss and scalable throughput (L4S).
   L4S uses an Explicit Congestion Notification (ECN) scheme at the IP
   layer that is similar to the original (or 'Classic') ECN approach,
   except as specified within.  L4S uses 'scalable' congestion control,
   which induces much more frequent control signals from the network and
   it responds to them with much more fine-grained adjustments, so that
   very low (typically sub-millisecond on average) and consistently low
   queuing delay becomes possible for L4S traffic without compromising
   link utilization.  Thus even capacity-seeking (TCP-like) traffic can
   have high bandwidth and very low delay at the same time, even during
   periods of high traffic load.

   The L4S identifier defined in this document distinguishes L4S from
   'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
   migration path so that suitably modified network bottlenecks can
   distinguish and isolate existing traffic that still follows the
   Classic behaviour, to prevent it degrading the low queuing delay and
   low loss of L4S traffic.  This specification defines the rules that
   L4S transports and network elements need to follow with the intention
   that L4S flows neither harm each other's performance nor that of
   Classic traffic.  Examples of new active queue management (AQM)
   marking algorithms and examples of new transports (whether TCP-like
   or real-time) are specified separately.


The IETF datatracker status page for this draft is:
<a class="moz-txt-link-freetext" href="https://datatracker.ietf.org/doc/draft-ietf-tsvwg-ecn-l4s-id/">https://datatracker.ietf.org/doc/draft-ietf-tsvwg-ecn-l4s-id/</a>

There is also an htmlized version available at:
<a class="moz-txt-link-freetext" href="https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-17">https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-17</a>

A diff from the previous version is available at:
<a class="moz-txt-link-freetext" href="https://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-ecn-l4s-id-17">https://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-ecn-l4s-id-17</a>


Internet-Drafts are also available by anonymous FTP at:
<a class="moz-txt-link-freetext" href="ftp://ftp.ietf.org/internet-drafts/">ftp://ftp.ietf.org/internet-drafts/</a>


</pre>
        </blockquote>
        <pre class="moz-quote-pre" wrap="">
-- 
________________________________________________________________
Bob Briscoe                               <a class="moz-txt-link-freetext" href="http://bobbriscoe.net/">http://bobbriscoe.net/</a>

</pre>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
________________________________________________________________
Bob Briscoe                               <a class="moz-txt-link-freetext" href="http://bobbriscoe.net/">http://bobbriscoe.net/</a></pre>
  </body>
</html>

--------------9B995F2954A4FDB4DF73FEC9--

