Re: [tsvwg] New rev of draft-ietf-tsvwg-ecn-l4s-id-17

Bob Briscoe <ietf@bobbriscoe.net> Sun, 23 May 2021 22:00 UTC

To: "Black, David" <David.Black@dell.com>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Wesley Eddy <wes@mti-systems.com>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/ztnI5lFcSOvj9v_DwfiosHyYQkA>

David,
See inline, tagged [BB]...

On 21/05/2021 23:02, Black, David wrote:
>> section 6.2 with its, "use two SAs, one for ECT(1) and one for the rest" seems a bit limited
> The situation is more severe than that, and it's problematic if there's any non-L4S TCP traffic that uses ECN involved.  The two SAs are actually for 2 ECN values each:
>
>     To avoid this limitation, a VPN ingress participating in the L4S
>     experiment SHOULD map packets onto two parallel SAs indexed by the
>     lowest significant bit of the IP-ECN field.
>
> That indexing puts Not-ECT [00b] and ECT(0) [10b] packets into one SA, and puts ECT(1) [01b] and CE [11b] packets into another.  Non-L4S TCP uses ECT(0) and CE, hence a flow that carries any congestion indications is split across those two SAs.  ECMP based on SPI (deployed technology) that puts those parallel SAs on different network paths is not assured to preserve packet order across those SAs, and packet reordering is not a good thing to do to a non-L4S TCP flow.

[BB] Yes, this would be a potential problem.

This particular path divergence occurs iff an ECT0 flow is CE-marked 
/prior/ to the VPN ingress:
* In general where the VPN ingress is on the end-system (transport mode) 
that doesn't happen.
* There is however a chance that CE could be introduced before a 'bump 
in the wire' VPN ingress (tunnel mode).

For a VPN participating in the L4S experiment with two SAs, if there was 
any CE before the ingress, we did think carefully about which way to 
classify it:
* If CE goes with ECT0, then if a subsequent L4S node gives ECT0 more 
queue delay than CE, the VPN anti-replay function could discard some 
ECT0 packets.
* If CE goes with ECT1, it avoids anti-replay discard completely. We had 
tried to think of possible problems with splitting Classic flows across 
SAs, but we hadn't thought of your point that ECMP on the SPI can make CE 
and ECT0 packets from the same flow take different paths.
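
The two-SA classification discussed above can be sketched as follows. This is only an illustration of indexing on the least significant bit of the IP-ECN field; the function names are made up for the example and are not taken from the draft or from any VPN implementation.

```python
# Two-bit IP-ECN codepoints, as listed in David's message.
NOT_ECT = 0b00
ECT1 = 0b01
ECT0 = 0b10
CE = 0b11


def sa_index(ecn: int) -> int:
    """Return 0 or 1: the SA selected by the least significant ECN bit."""
    return ecn & 0b1


def sa_for_tos(tos: int) -> int:
    """Select the SA from a full ToS / Traffic Class byte.

    The ECN field occupies the two low-order bits of that byte.
    """
    return sa_index(tos & 0b11)


if __name__ == "__main__":
    for name, cp in [("Not-ECT", NOT_ECT), ("ECT(0)", ECT0),
                     ("ECT(1)", ECT1), ("CE", CE)]:
        print(f"{name:8s} {cp:02b} -> SA {sa_index(cp)}")
```

Not-ECT and ECT(0) map to SA 0, while ECT(1) and CE map to SA 1, which is why a Classic ECT(0) flow that is CE-marked before the ingress ends up split across both SAs.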

I believe it is still best to classify CE with ECT1, based on the 
following reasoning. I think the scenario would occur with very low 
probability {Note 1}, and if it did, the CE packets might arrive either 
a little earlier or a little later than the data packets around them.
* If earlier, the benefit of ECN would not be lost, but there would be 
an extremely small chance (p_s * p_r * p_N) of an occasional spurious 
re-xmt {Note 2}.
* If later, the chance that a delayed CE would be mistaken for a drop 
and trigger a spurious re-xmt and a congestion response would probably 
be higher, but still very small (p_s * p_r * p_l) {Note 3}. That would 
lose the benefit of ECN but, at least for Classic flows, a single CE is 
meant to trigger a large congestion response as if it was a loss anyway.

I don't want to belittle the point you've made, because this is 
certainly a new problem. However, I do think the probabilities need to 
be put in perspective.

I suspect the ECMP problem with any ECN (3168 or L4S) is likely to be a 
greater concern. You might recall that João Taveira from Fastly pointed 
this out during a tsvwg discussion on ECN back in 2017. This link should 
jump to 14:25 mins into a 2017 NANOG talk from Lorenzo Saino of Fastly 
about how they had to disable ECN negotiation when Apple clients started 
to request ECN in 2015 - because a number of different ECMP vendors 
still hash on the ToS byte for ECMP and load balancing, so the data 
would have been routed to a different server from the handshake:
https://youtu.be/ciClZdwHelU?t=805

Regards



Bob

{Note 1}: Reasoning for CE-marking before the VPN ingress occurring with 
low probability (we'll call it p_s):
Usually, an institution provides a tunnel-mode VPN between physically 
secured networks (e.g. between their intranet and an employee's home 
laptop). AQM deployment would be unusual in the interior of corporate 
and institutional intranets, because the bottleneck is more likely 
elsewhere (e.g. in the access link between the site and the Internet). 
However, there are bound to be some cases of tunnel-mode VPNs where the 
traffic arriving could have already been ECN-marked (for instance 
Jonathan Morton's example of gamers providing a VPN end-point on the 
public Internet to hide their own IP address).
Then, to experience this problem, there also has to be some load 
balancing or ECMP after the VPN ingress but before the egress. I'm sure 
that will occur sometime somewhere (let's say with probability p_r). The 
likelihood of the scenario occurring will then be p_s * p_r.

{Note 2}:
* N packets in a row within a Classic flow have to be CE-marked for 
early CE packets to result in a spurious re-xmt, let's say that happens 
with probability p_N. As already explained in Appx B of ecn-l4s-id, N 
was traditionally 3, but RACK is making it larger. And the main sources 
of ECN marking for Classic flows are FQ_CoDel and CAKE, both of which 
take great care not to mark multiple packets in a row. So p_N is going 
to be tiny.
And the probability of a spurious re-xmt with early reordering will be 
the vanishingly small product (p_s * p_r * p_N).

{Note 3}:
* Let's say p_l (for late) is the probability that the reordering 
between the two SAs is sufficient to make each CE packet arrive late 
enough to be deemed a loss. p_l is unlikely to be as small as p_N, but 
the overall probability of this occurring is still reduced by the 
probability that CE marking precedes the VPN (p_s), and the probability 
that there's routing keyed on flow IDs and SPIs (p_r). So the overall 
probability of a spurious re-xmt due to late reordering is (p_s * p_r * p_l).
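
As a rough illustration of how the factors in Notes 1 to 3 combine, the sketch below just multiplies them out. The numeric values in the usage part are arbitrary placeholders chosen to show the arithmetic; they are not measurements or estimates from this thread, and the function name is hypothetical.

```python
def spurious_rexmt_probabilities(p_s: float, p_r: float,
                                 p_N: float, p_l: float) -> dict:
    """Combine the per-factor probabilities from Notes 1 to 3.

    p_s: CE marking occurs before the VPN ingress
    p_r: load balancing / ECMP separates the two SAs
    p_N: N packets in a row of a Classic flow are CE-marked
    p_l: reordering makes a CE packet late enough to be deemed a loss
    """
    for p in (p_s, p_r, p_N, p_l):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return {
        "early": p_s * p_r * p_N,  # spurious re-xmt from early CE packets
        "late": p_s * p_r * p_l,   # spurious re-xmt from late CE packets
    }


if __name__ == "__main__":
    # Placeholder values only, to show how the product shrinks.
    result = spurious_rexmt_probabilities(p_s=0.5, p_r=0.5, p_N=0.25, p_l=0.5)
    print(result)
```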



Bob

> Thanks, --David
>
> -----Original Message-----
> From: Sebastian Moeller <moeller0@gmx.de>
> Sent: Friday, May 21, 2021 5:27 PM
> To: Bob Briscoe
> Cc: Gorry Fairhurst; Black, David; Wesley Eddy; tsvwg@ietf.org
> Subject: Re: [tsvwg] New rev of draft-ietf-tsvwg-ecn-l4s-id-17
>
> Bob, chairs,
>
> section 6.2 with its, "use two SAs, one for ECT(1) and one for the rest" seems a bit limited since it ignores that VPNs might propagate both DSCPs and ECN bits between the layers, so IMHO a better approach might be to recommend to treat DSCP+ECN bits as one aggregate byte (let's call it TOS ;) ) as the extra ECT(1)-SA seems to be required for all SAs that already exist to deal with multiple supported DSCPs. So in a sense the recommendation would be to double the number of SAs.

[BB] Yes, we ought to reword it to say that the VPN ingress should use 
two SAs indexed on the LSB of the ECN field, and, if it was also 
classifying on DSCPs, it could also consider classifying any low latency 
DSCP(s) with the L4S packets. To avoid the anti-replay problem, there 
would only need to be one SA configured for each degree of queuing 
delay, not one for every ECN x DSCP combination.
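
A minimal sketch of classifying per degree of queuing delay rather than per ECN x DSCP combination might look like this. The set of low latency DSCPs is purely an assumption for the example (EF, value 46, is used as a stand-in); the thread does not name any specific DSCP, and the function name is hypothetical.

```python
# Hypothetical set of DSCPs that an operator treats as low latency.
# EF (46) is only an example value, not something specified in this thread.
LOW_LATENCY_DSCPS = frozenset({46})


def delay_class_sa(tos: int, low_latency_dscps=LOW_LATENCY_DSCPS) -> int:
    """Return the SA index for a ToS / Traffic Class byte.

    SA 1 carries packets expected to see low queuing delay: those with
    the least significant ECN bit set (ECT(1) or CE), plus any packet
    whose DSCP is in the configured low latency set. SA 0 carries the rest.
    """
    dscp = tos >> 2      # upper six bits
    ecn = tos & 0b11     # lower two bits
    if (ecn & 0b1) or dscp in low_latency_dscps:
        return 1
    return 0
```

With this approach there are still only two SAs, however many DSCPs the VPN recognises, as long as there are only two degrees of queuing delay.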

>
> Also:
> "and the current draft of DTLS 1.3 says "The receiver SHOULD pick a
>    window large enough to handle any plausible reordering, which depends
>    on the data rate."  However, in practice, the size of the VPN's
>    anti-replay window is not always scaled appropriately."
>
> L4S on a 10 ms path under load can introduce re-ordering in the range of 50 ms (roughly twice the difference between the L- and C-queue delay targets), re-ordering tolerance 5 times of the path RTT seems to be a bit on the high side to expect, no?

[BB] The above text that I quoted from the DTLS spec. is reasonable, 
both practically (see below) and in terms of taking responsibility for 
the problem. Beyond its window, the anti-replay function presumes a 
packet is guilty of a replay attack with no evidence, purely because it 
chooses not to hold that amount of evidence. Therefore it's proper that 
it holds a sufficient window of evidence for any plausible reordering.
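
To make the point concrete, here is a minimal sketch of a sliding-window anti-replay check in the general bitmap style used by IPsec and DTLS. The class and method names are illustrative and not taken from any implementation; sequence numbers are assumed to start at 1.

```python
class ReplayWindow:
    """Sliding-window anti-replay check over packet sequence numbers."""

    def __init__(self, size: int):
        self.size = size    # how many sequence numbers are remembered
        self.highest = 0    # highest sequence number accepted so far
        self.bitmap = 0     # bit i set => (highest - i) was already seen

    def accept(self, seq: int) -> bool:
        if seq < 1:
            return False
        if seq > self.highest:
            # New highest: slide the window forward and record the packet.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            # Older than the window: discarded with no evidence of replay.
            return False
        if self.bitmap & (1 << offset):
            return False    # already seen inside the window: a duplicate
        self.bitmap |= 1 << offset
        return True


if __name__ == "__main__":
    w = ReplayWindow(size=4)
    for seq in (1, 2, 3, 10, 7, 6, 7):
        print(seq, w.accept(seq))
```

In the usage example, packet 6 is rejected only because it falls outside the 4-packet window after packet 10 has arrived, which is the case a larger window avoids.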

BTW, the C-queue target has never been 25ms. I noticed JM said that 
incorrectly as well recently.
* A default C queue delay target of 15ms has always been recommended in 
aqm-dualq-coupled. That results in PI2 Qdelay of about 25ms at the 
99%ile or 30ms at the 99.9%ile. We have been considering whether to 
change the default target to 10ms for some time, but have not done so yet.
* Low Latency DOCSIS specifies a default C queue delay target of 10ms.

So a replay window allowing for 30ms of packets at the interface rate 
would be sufficient.
At 1Gb/s (say) using 1500B packets, that's a replay window of 2500 packets.
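
The arithmetic behind that figure can be checked with a short sketch. The function name is hypothetical; it uses integer arithmetic and rounds up to a whole packet.

```python
def replay_window_packets(rate_bps: int, reorder_ms: int,
                          packet_bytes: int = 1500) -> int:
    """Packets that can arrive at rate_bps during reorder_ms milliseconds."""
    bits_in_window_x1000 = rate_bps * reorder_ms   # bits, scaled by 1000
    bits_per_packet_x1000 = packet_bytes * 8 * 1000
    # Ceiling division so a partial packet still needs a window slot.
    return -(-bits_in_window_x1000 // bits_per_packet_x1000)


if __name__ == "__main__":
    # 1Gb/s, 30ms of reordering, 1500B packets
    print(replay_window_packets(1_000_000_000, 30))  # 2500
```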

Quoting Pete Heist's info here 
https://github.com/heistp/l4s-tests/#dropped-packets-for-tunnels-with-replay-protection-enabled 
:

    "Modern Linux kernels have a default maximum replay window size of
    4096 (XFRMA_REPLAY_ESN_MAX in xfrm.h
    <https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/xfrm.h>).
    Wireguard uses a hardcoded value of 8192 with no option for runtime
    configuration, increased from 2048 in May 2020 by this commit
    <https://git.zx2c4.com/wireguard-linux/commit/drivers/net/wireguard?id=c78a0b4a78839d572d8a80f6a62221c0d7843135>."
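
Turning the calculation around, a fixed window size bounds the interface rate at which 30ms of reordering can be tolerated. The sketch below assumes 1500B packets and uses the window sizes quoted above; the function name is hypothetical.

```python
def max_rate_bps(window_packets: int, reorder_ms: int,
                 packet_bytes: int = 1500) -> int:
    """Highest rate (bit/s) at which reorder_ms of packets fits the window."""
    return window_packets * packet_bytes * 8 * 1000 // reorder_ms


if __name__ == "__main__":
    for label, window in [("Linux xfrm maximum", 4096),
                          ("WireGuard (current)", 8192),
                          ("WireGuard (before May 2020)", 2048)]:
        rate = max_rate_bps(window, reorder_ms=30)
        print(f"{label}: {rate / 1e9:.2f} Gb/s")
```

Under these assumptions, a 4096-packet window covers 30ms of reordering up to roughly 1.6Gb/s, and an 8192-packet window up to roughly 3.3Gb/s.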

Regards



Bob

>
>
>
>
> Regards
> 	Sebastian
>
>
>> On May 21, 2021, at 11:21, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Chairs, list,
>>
>> We've posted a new rev of draft-ietf-tsvwg-ecn-l4s-id-17 attempting to address all the discussion since the last posting just before the interim. In particular:
>> * review comments on a careful read from Gorry and the chairs
>> * the VPN anti-replay problem
>> * added an out-of-band test for an RFC3168 ECN AQM in a shared queue.
>>
>> There are a couple of outstanding discussions, which I'm sure will continue on the list, e.g. the role of RFC4774 and whether to remove any of Appx C. But it was considered better to get the queued up changes out, to re-base the discussions.
>>
>> This is quite an extensive set of changes, so pls check and pass any comments to the list.
>>
>> Thanks to everyone who is contributing, and particularly to the chairs for continuing to referee this all. We've added appropriate thanks in the Acks section.
>>
>>
>> Bob
>>
>>
>> On 21/05/2021 10:09, internet-drafts@ietf.org wrote:
>>> A New Internet-Draft is available from the on-line Internet-Drafts directories.
>>> This draft is a work item of the Transport Area Working Group WG of the IETF.
>>>
>>>          Title           : Explicit Congestion Notification (ECN) Protocol for Very Low Queuing Delay (L4S)
>>>          Authors         : Koen De Schepper
>>>                            Bob Briscoe
>>> 	Filename        : draft-ietf-tsvwg-ecn-l4s-id-17.txt
>>> 	Pages           : 57
>>> 	Date            : 2021-05-21
>>>
>>> Abstract:
>>>     This specification defines the protocol to be used for a new network
>>>     service called low latency, low loss and scalable throughput (L4S).
>>>     L4S uses an Explicit Congestion Notification (ECN) scheme at the IP
>>>     layer that is similar to the original (or 'Classic') ECN approach,
>>>     except as specified within.  L4S uses 'scalable' congestion control,
>>>     which induces much more frequent control signals from the network and
>>>     it responds to them with much more fine-grained adjustments, so that
>>>     very low (typically sub-millisecond on average) and consistently low
>>>     queuing delay becomes possible for L4S traffic without compromising
>>>     link utilization.  Thus even capacity-seeking (TCP-like) traffic can
>>>     have high bandwidth and very low delay at the same time, even during
>>>     periods of high traffic load.
>>>
>>>     The L4S identifier defined in this document distinguishes L4S from
>>>     'Classic' (e.g. TCP-Reno-friendly) traffic.  It gives an incremental
>>>     migration path so that suitably modified network bottlenecks can
>>>     distinguish and isolate existing traffic that still follows the
>>>     Classic behaviour, to prevent it degrading the low queuing delay and
>>>     low loss of L4S traffic.  This specification defines the rules that
>>>     L4S transports and network elements need to follow with the intention
>>>     that L4S flows neither harm each other's performance nor that of
>>>     Classic traffic.  Examples of new active queue management (AQM)
>>>     marking algorithms and examples of new transports (whether TCP-like
>>>     or real-time) are specified separately.
>>>
>>>
>>> The IETF datatracker status page for this draft is:
>>> https://datatracker.ietf.org/doc/draft-ietf-tsvwg-ecn-l4s-id/
>>>
>>> There is also an htmlized version available at:
>>> https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-17
>>>
>>> A diff from the previous version is available at:
>>> https://www.ietf.org/rfcdiff?url2=draft-ietf-tsvwg-ecn-l4s-id-17
>>>
>>>
>>> Internet-Drafts are also available by anonymous FTP at:
>>> ftp://ftp.ietf.org/internet-drafts/
>>>
>>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/
>>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/