Re: [tsvwg] SCE / L4S and fragmentation

Bob Briscoe <> Sun, 15 March 2020 13:28 UTC



I would like to draw the list's attention to this email, which is a 
classic example of the way Jonathan Morton spins untrue FUD (fear, 
uncertainty and doubt) over L4S, even when the failure mode under 
discussion is actually a downside of SCE, which L4S was deliberately 
designed to avoid.

Whatever,... now to you Jonathan,

It's hardly necessary to be so underhand. Even without fragmentation, 
SCE markings applied within a tunnel usually won't reach the receiver 
anyway, as I've explained elsewhere. So in the sub-case of 
fragmentation, if SCE markings also don't survive reassembly, 
SCE-in-a-tunnel is just something that was already broken, getting 
broken again.


On 13/03/2020 21:22, Jonathan Morton wrote:
>>> [JM] The existing language in RFC-3168 succeeds in preserving the number of CE marks applied to a flow.  Any deficiencies we should consider are in relation to handling the distinction between ECT(1) and ECT(0), as this is what newly becomes significant with both the L4S and SCE proposals.
>> [BB]
>> * For SCE to give any benefit in the presence of fragmentation, it needs reassembly of ECT1 fragments to be changed.
>> * In contrast, L4S needs no change to reassembly of ECT1 fragments, because it keeps to the RFC3168 transitions.  So there will never need to be a packet consisting of a mix of ECT0 & ECT1 fragments.
> [JM] I'll agree with this: where the ECN codepoint is set at origin (or, equivalently, before the fragmentation point), the fragments will all carry identical codepoints and will be reassembled correctly under RFC-3168 rules.  These rules also handle the case of CE marks being set on IP fragments on the tunnel path.  The case of marking with ECT(1) is indeed left undefined, which is undesirable.

[BB] You rewrite my description of a downside of SCE in waffle language 
that pads out the scenarios with irrelevant cases and obfuscates by 
removing all mention of the words SCE or L4S. Then you snip (reinstated 
below) where I asked you to retract your untrue assertion that L4S 
suffers from this as well.

>> [BB] Please confirm that this is only an issue with SCE, not L4S.
>> And please do not cast doubt over L4S by saying this affects L4S when 
>> it does not.
[BB] I just knew you were going to snip this. You're so predictable.

Moving on...

>> Before going on, I understand fragmentation in both IPv4 and IPv6 is better to be avoided by using PMTUD. But I believe fragmentation is still used, particularly where an IPv4 router ignores the DF flag, which I believe is particularly prevalent with tunnels.
>> DF does not exist in IPv6.
> [JM] I think it would be more accurate to say that IPv6 behaves as though DF is *always* set.  There is no defined mechanism for a router to perform fragmentation.  Fragmented packets may be generated *at source* to eg. convey an oversize datagram; TCP would just reduce the MSS to fit, when notified of a reduced MTU on the path.  So I don't consider IPv6 fragmentation to be a significant problem.

[BB] You've conveniently side-stepped oversize UDP datagrams over IPv6, 
where the IPv6 source has to do the fragmentation, and IPv6 reassembly 
logic at the destination would need *new logic* to propagate the SCE 
marking.

I drew attention to this in the IPv6 section of the email below (which 
you didn't respond to).

> Of course, you could end up tunnelling IPv6 packets through a fragmented IPv4 scheme.  But that mathematically reduces to the existing problem of fragmenting IPv4.
>> Explanation of the issue:
>> An SCE AQM changes some ECT0 markings to ECT1. So, when these packets happen to be fragments (having already been fragmented), there will usually be a mix of ECT1 and ECT0 fragments to be reassembled into a packet. RFC3168 identifies this case and explicitly says it does not specify what to do, which would require a new specification. So current behaviour will be implementation-dependent.
>> IPv6:
>> If fragmentation is needed, IPv6 fragments packets at source, and reassembles at the receiver.
>> So, for SCE to provide any benefit if the IPv6 source is fragmenting, the receiver implementation of IPv6 will need to be updated (once a spec has been written, agreed and approved).
>> Until now, I thought that, at least in QUIC, SCE would get feedback without changing the receiver. But, if the sender is fragmenting, the receiver's IPv6 layer will need to be changed as well. It's possible some receiver implementations might happen to do roughly the right thing (once we agree what the right thing is).
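
As an illustration of the reassembly gap described in the quoted explanation, here is a minimal Python sketch of the reassembly requirements as I read them in RFC 3168 section 5.3. The names ECN and reassemble_rfc3168 are hypothetical, and returning None for the case the RFC leaves open is an assumption of this sketch, not behaviour taken from any real IP stack.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four ECN codepoints (2-bit field in the IP header)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def reassemble_rfc3168(fragments):
    """Return the ECN codepoint for the reassembled packet.

    Returns "DROP" where the packet has to be discarded, and None
    where RFC 3168 places no requirement (assumption: None is just
    this sketch's way of saying "implementation-dependent").
    """
    if not fragments:
        raise ValueError("need at least one fragment")
    seen = set(fragments)
    # All fragments agree: the codepoint must not be changed.
    if len(seen) == 1:
        return fragments[0]
    if ECN.CE in seen:
        # CE must not be set if any fragment was Not-ECT, so the only
        # way left to preserve the congestion signal is to drop.
        if ECN.NOT_ECT in seen:
            return "DROP"
        return ECN.CE
    # Mixture without CE, e.g. ECT(0) and ECT(1): left unspecified.
    return None
```

Calling reassemble_rfc3168([ECN.ECT0, ECN.ECT1, ECN.ECT0]) returns None in this sketch, which is the mixed-fragment case an SCE AQM would create by re-marking only some fragments from ECT(0) to ECT(1).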

recommends (but does not mandate) that QUIC implementations clamp the 
max datagram size. Ideally it recommends using PLPMTUD as well.

The latest draft on RTP feedback over UDP mentions MTU issues but is not 
as concrete as QUIC:

>> IPv4:
>> RFC3168 advises to set the DF flag if a mix of ECT0 and ECT1 is expected.
>> However, many IPv4 tunnels ignore DF and fragment anyway, using "outer fragments" [draft-ietf-intarea-tunnels].
>> Therefore, the IPv4 reassembly behaviour will need to be specified. Then this ECT1 reassembly during tunnel decapsulation will need to be implemented.
> ISTR, at some point in the past, interim language was suggested which would require taking the ECN codepoint from one of the fragments constituting the packet, with the behaviour being otherwise unspecified except by the existing rules.  This would be a worthwhile improvement from SCE's point of view, and is likely to match at least some existing implementations.

[BB] The language was in the previous revision:

It was for PCN (pre-congestion notification), which is a standards track 
set of IETF RFCs that were implemented but not deployed, AFAICT. I 
removed all that text because you objected to the same language for CE. 
I generalized it into a requirement to achieve an externally observable 
effect, rather than saying how to do it.

I would be happy to reinstate it. But you can't have it both ways: you 
want standards track RFCs to cater for SCE, which isn't even chartered 
IETF work, but not to cater for other standards track RFCs.

You do not seem to be aware of the other WGs and other standards bodies 
(3GPP, IEEE802.1) that have been involved in this draft, and who are 
waiting for it to go to RFC (including an RFC being held in the RFC 
Editor's queue).

> An implementation which performs a bitwise-OR across the ECN fields of the fragments would effectively convert partial SCE marks into CE marks (as the ECT codepoints are 01 and 10).  This is less than ideal, but at least some form of congestion control is maintained by this.

[BB] We're talking about a standards track RFC-to-be here. You're asking 
for it to include requirements to support SCE, which isn't even chartered 
IETF work. If you want the IETF to give a signal to implementers of 
large numbers of existing layering protocols to update their code for 
you, you have to do a lot of work convincing people first.
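
For reference, the bitwise-OR behaviour in the quoted paragraph above can be checked with a few lines of Python. This is only a sketch of that one hypothetical reassembly strategy; or_reassemble is an illustrative name, and nothing here claims that any particular implementation works this way.

```python
# ECN codepoints as 2-bit values.
ECT1 = 0b01  # the codepoint SCE uses as its mark
ECT0 = 0b10
CE = 0b11


def or_reassemble(fragment_codepoints):
    """Combine the ECN fields of all fragments with bitwise OR."""
    result = 0
    for codepoint in fragment_codepoints:
        result |= codepoint
    return result
```

With this strategy, or_reassemble([ECT0, ECT1, ECT0]) yields CE because 10 | 01 = 11, so a packet with only some fragments SCE-marked arrives CE-marked. A packet whose fragments all carry ECT1 stays ECT1, and one whose fragments all carry ECT0 stays ECT0.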

> RFC-3168's existing recommendation to set DF seems like a good one to me, and is effectively automatic with IPv6.  Tunnels which perform outer fragmentation should be fixed to implement MTU Discovery support instead.  That seems like the easiest fix to me.
[BB] If you mean MTU Discovery between the tunnel endpoints, that 
doesn't help when a packet arrives that is too large for the discovered 
MTU. So you must mean supporting MTU Discovery by the endpoints (by the 
tunnel ingress sending a PTB ICMP). But most networks black-hole these.

So I can only conclude that you're masquerading as an expert again, and 
we should just move on. I wouldn't mind if you couched your objections 
with "I'm not sure but,..." or whatever.

Anyway, this is not how IETF RFCs work. They have to specify what to 
implement, in order to interoperate with what exists. Not what 
implementers and operators should have done. The only grey area is when 
what exists is non-compliant. But if non-compliance is widespread, it 
has to be catered for.

>> As well as IP-in-IP, and IPSec, here's a list of the 14 IP-shim-(L2)-IP encapsulations that are widely deployed and whether they comply with RFC6040 (needed for SCE tunnel decap). To support SCE, they will also need fragment reassembly to be specified and implemented.
> The list of 14 appears to be missing.

[BB] Sorry [got distracted]:


>   - Jonathan Morton

Bob Briscoe