Re: [tsvwg] [Int-area] 2nd TSVWG WGLC on ecn-encap-guidelines and rfc6040-update-shim drafts, closes 6 May 2019

Bob Briscoe <> Sat, 27 July 2019 01:53 UTC

To: Markku Kojo <>
Cc: Joe Touch <>, David Black <>, "" <>, tsvwg <>
From: Bob Briscoe <>
Date: Fri, 26 Jul 2019 21:53:25 -0400


On 25/07/2019 08:50, Markku Kojo wrote:
> Hi Bob, all,
> catching up ...
> The justification for the logical OR in RFC 3168 is not because TCP 
> only reacts to one ECN mark per RTT. Instead, it ensures that no 
> congestion signal is lost (that is MUST in RFC 3168), i.e., CE is 
> delivered to end hosts for ECN-capable traffic equivalent to drop for 
> non-ECN-capable traffic (if one fragment for a non-ECN-capable flow 
> gets dropped by an AQM router, all fragments of the packet get dropped 
> at reassembly).
> I'm very concerned if the reassembly behavior is changed as proposed 
> (e.g., for tunnels), because it makes it impossible for RFC3168-based 
> ("classic") ECN traffic to follow the leading guidelines of RFC 3168 
> for fair co-existence of ECN-capable and non-ECN-capable traffic in 
> the presence of such tunnels.
When packets are reassembled, there are fewer unmarked packets as well as
fewer marked packets.
This is not intuitive for most people, so I recommend people do not only 
think about it intuitively.

Yes, I know RFC3168 requires the logical OR approach. It was me that 
suggested it (see draft-ietf-tsvwg-ecn-ip-00 from 2001). But I was wrong 
(in my defence, at least I said at the time that a more accurate 
approach would be preferred).

I am not proposing that we should use rfc6040-update-shim to correct 
RFC3168. The WG chairs might choose to, but I doubt it - I suspect that's
mission creep. But at least we can ensure tunnels and lower layers are 
more correct than RFC3168 reassembly (I've only said "SHOULD" to allow 
less precise approaches as well).

Example: Imagine every packet is fragmented into two fragments, and the
outers of 1% of fragments are marked within the tunnel. By the logical
OR approach, that would lead to nearly 2% of reassembled packets being
marked (actually 1.99%, because some packets would be reassembled from
2 marked fragments).
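
The 1.99% figure can be checked with a short calculation. The following is a minimal Python sketch, assuming each fragment is marked independently with the same probability and that fragments are of equal size; both assumptions and both function names are illustrative and do not come from the message or the drafts.

```python
def or_mark_probability(p: float, fragments: int) -> float:
    """Probability that a reassembled packet is CE-marked under logical OR.

    Assumes (hypothetically) that each fragment is marked independently
    with probability p. The packet is marked unless every fragment is
    unmarked.
    """
    return 1.0 - (1.0 - p) ** fragments


def byte_preserving_mark_fraction(p: float) -> float:
    """Long-run fraction of marked octets leaving a byte-preserving egress.

    With equal-sized fragments (an assumption), marked octets out equal
    marked octets in, so the fraction is unchanged.
    """
    return p


p = 0.01  # 1% of fragments marked, as in the example
print(f"logical OR:      {or_mark_probability(p, 2):.4%}")
print(f"byte-preserving: {byte_preserving_mark_fraction(p):.4%}")
```

Under these assumptions the logical OR nearly doubles the marking probability for two fragments per packet, while the byte-preserving approach leaves it at the level applied within the tunnel.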

Certainly, TCP responds to 1 or more marks within a RTT, so marks within 
the same RTT will not make any difference to TCP, but at low marking 
levels, roughly doubling the marking probability will nearly double the 
number of TCP round trips with a mark in them (again the exact 
proportion depends on the window and the marking probability). This will 
cause utilization to reduce to about 1/sqrt(2) = 71% (assuming it was 
100% before).
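
The 71% figure is consistent with the widely used square-root approximation for Reno-style TCP throughput. That model is an assumption brought in here for illustration, not something stated in the message; the symbols r (rate), p (marking probability), C (a constant), MSS and RTT are likewise introduced only for this sketch.

```latex
\[
  r(p) \approx \frac{C \cdot \mathit{MSS}}{\mathit{RTT}\,\sqrt{p}}
  \quad\Longrightarrow\quad
  \frac{r(2p)}{r(p)} = \frac{\sqrt{p}}{\sqrt{2p}} = \frac{1}{\sqrt{2}} \approx 0.707
\]
```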

Not only is the logical OR approach incorrect and harmful to TCP 
performance, it is also just as harmful to other congestion controllers 
that use the proportion of marked or dropped packets, such as rmcat, etc.

> Moreover, it begs for justification why two ECN-capable flows (A and 
> B) that share the same ECN-enabled bottleneck within a tunnel should
> get different ECN-marking behavior, when flow A gets its packets 
> fragmented before the tunnel and flow B within the tunnel but before 
> the common bottleneck. And fragment sizes for A and B are (roughly) 
> equivalent.
By the above arguments, the byte-preserving approach is not different - 
quite the opposite - it results in the same marking probability. The 
logical OR approach would result in greater ECN marking for B.

See also


PS. Of course, fragmentation adds more bytes, because it adds more 
headers. So if a link within a tunnel is congested, the same rate of 
data sent in fragments will cause more congestion at this link than 
full-sized packets would have. This increase in congestion is 
appropriate. The byte-preserving recommendation that I suggest ensures 
that it is not accompanied by further inflation because of the logical OR.
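
To put a rough number on the extra header bytes, here is a small Python sketch. The 20-octet outer header (IPv4 without options), the 1480-octet payload and the function name are hypothetical values chosen for illustration only.

```python
def bytes_on_wire(payload_octets: int, fragments: int,
                  outer_header_octets: int = 20) -> int:
    """Total octets sent when a payload is carried in `fragments` pieces.

    Each piece carries its own outer header. The default of 20 octets
    assumes an IPv4 outer header without options (an assumption).
    """
    if fragments < 1:
        raise ValueError("fragments must be at least 1")
    return payload_octets + fragments * outer_header_octets


whole = bytes_on_wire(1480, 1)  # one full-sized packet
split = bytes_on_wire(1480, 2)  # same payload in two fragments
print(whole, split, f"{split / whole - 1:.2%} more octets")
```

The increase is small compared with the near doubling of marks that the logical OR would add on top of it.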
> Cheers,
> /Markku
> On Mon, 8 Jul 2019, Bob Briscoe wrote:
>> Joe,
>> Following up my email to you in May quoted further down, you made me
>> realize that RFC6040 did not address what to do with ECN during
>> fragmentation and reassembly. So I've added the following to my local
>> copy of draft-ietf-tsvwg-rfc6040-update-shim (to be posted later today),
>> which recently went through TSVWG last call, and will imminently be
>> last called on various int-area lists, I believe.
>> These are quite significant updates to outer fragment processing at the
>> tunnel egress. But, given something has to be said, I can't think of a
>> better way (see the original quoted email about why the logical OR of
>> the ECN codepoints as defined in RFC3168 is no longer sufficient - and
>> it's no simpler anyway).
>> 5.  ECN Propagation and Fragmentation/Reassembly
>>    The following requirements update RFC6040, which omitted handling of
>>    the ECN field during fragmentation or reassembly.  These changes
>>    might alter how many ECN-marked packets are propagated by a tunnel
>>    that fragments packets, but this would not raise any backward
>>    compatibility issues:
>>    If a tunnel ingress fragments a packet, it MUST set the outer ECN
>>    field of all the fragments to the same value as it would have set if
>>    it had not fragmented the packet.
>>    As a tunnel egress reassembles sets of outer fragments
>>    [I-D.ietf-intarea-tunnels] into packets, it SHOULD propagate CE
>>    markings on the basis that a congestion indication on a packet
>>    applies to all the octets in the packet.  On average, a tunnel egress
>>    SHOULD approximately preserve the number of CE-marked and ECT(1)-
>>    marked octets arriving and leaving (counting the size of inner
>>    headers, but not encapsulating headers that are being stripped).
>>    This process proceeds irrespective of the addresses on the inner
>>    headers.
>>    Even if only enough incoming CE-marked octets have arrived for part
>>    of the departing packet, the next departing packet SHOULD be
>>    immediately CE-marked.  This ensures that CE-markings are propagated
>>    immediately, rather than held back waiting for more incoming CE-
>>    marked octets.  Once there are no outstanding CE-marked octets, if
>>    only enough incoming ECT(1)-marked octets have arrived for part of
>>    the departing packet, the next departing packet SHOULD be immediately
>>    marked ECT(1).
>>    For instance, an algorithm for marking departing packets could
>>    maintain a pair of counters, the first representing the balance of
>>    arriving CE-marked octets minus departing CE-marked octets and the
>>    second representing a similar balance of ECT(1)-marked octets.  The
>>    algorithm:
>>    o  adds the size of every CE-marked or ECT(1)-marked packet that
>>       arrives to the appropriate counter;
>>    o  if the CE counter is positive, it CE-marks the next packet to
>>       depart and subtracts its size from the CE counter;
>>    o  if the CE counter is negative but the ECT(1) counter is positive,
>>       it marks the next packet to depart as ECT(1) and subtracts its
>>       size from the ECT(1) counter;
>>    o  (the previous two steps will often leave a negative remainder in
>>       the counters, which is deliberate);
>>    o  if neither counter is positive, it marks the next packet to depart
>>       as ECT(0);
>>    o  until all the fragments of a packet have arrived, it does not
>>       commit any updates to the counters so that, if reassembly fails
>>       and the partly reassembled packet has to be discarded, none of the
>>       discarded fragments will have updated any of the counters.
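
The counter steps listed above can be sketched in Python as follows. This is a minimal illustration, not the draft's normative algorithm: the class and method names are hypothetical, fragment sizes stand in for the octet counts the draft describes, all fragments are assumed to be ECN-capable, and a counter that is exactly zero is treated the same as a negative one.

```python
from dataclasses import dataclass

ECT0, ECT1, CE = "ECT(0)", "ECT(1)", "CE"


@dataclass
class ByteBalanceMarker:
    """Tracks arriving minus departing marked octets (hypothetical sketch)."""
    ce_balance: int = 0
    ect1_balance: int = 0

    def reassemble(self, fragments):
        """Return the ECN codepoint for one reassembled packet.

        `fragments` is a list of (ecn, size_in_octets) tuples for a packet
        whose fragments have ALL arrived. Counters are only updated here,
        so a failed reassembly (never passed in) leaves them untouched.
        """
        for ecn, size in fragments:
            if ecn == CE:
                self.ce_balance += size
            elif ecn == ECT1:
                self.ect1_balance += size
        out_size = sum(size for _, size in fragments)
        if self.ce_balance > 0:
            # May leave a negative remainder, which is deliberate.
            self.ce_balance -= out_size
            return CE
        if self.ect1_balance > 0:
            self.ect1_balance -= out_size
            return ECT1
        return ECT0


marker = ByteBalanceMarker()
# Two packets, each built from two 700-octet fragments, one CE fragment each.
print(marker.reassemble([(CE, 700), (ECT0, 700)]))
print(marker.reassemble([(ECT0, 700), (CE, 700)]))
```

In this usage, 1400 CE-marked octets arrive and a single 1400-octet packet departs CE-marked, whereas a logical OR would have marked both departing packets.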
>>    During reassembly of outer fragments [I-D.ietf-intarea-tunnels], if
>>    the ECN fields of the outer headers being reassembled into a single
>>    packet consist of a mixture of Not-ECT and other ECN codepoints, the
>>    packet MUST be discarded.
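
The discard rule above amounts to a simple check before reassembly. A minimal Python sketch follows; the function name and the string representation of codepoints are hypothetical choices for illustration.

```python
from typing import Sequence

NOT_ECT = "Not-ECT"


def reassembly_allowed(outer_ecn_fields: Sequence[str]) -> bool:
    """Return False if the fragments mix Not-ECT with any other codepoint.

    `outer_ecn_fields` holds the ECN field of each outer fragment header
    belonging to one packet. A False result means the packet is discarded.
    """
    has_not_ect = any(f == NOT_ECT for f in outer_ecn_fields)
    has_other = any(f != NOT_ECT for f in outer_ecn_fields)
    return not (has_not_ect and has_other)


print(reassembly_allowed(["ECT(0)", "CE"]))
print(reassembly_allowed(["Not-ECT", "CE"]))
```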
>>    A tunnel end-point that claims to support the present specification
>>    MUST NOT use an approach that results in a significantly different
>>    ECN-marking outcome to that defined by the "SHOULD" statements
>>    throughout this section.  "SHOULD" is only used to allow similar
>>    perhaps more efficient approaches that result in approximately the
>>    same outcome.
>> Bob
>> On 16/05/2019 22:14, Bob Briscoe wrote:
>>       Joe,
>>       Sorry I missed this posting at the time (my mail filters moved both
>>       cross-postings into my int-area box which I check only rarely).
>>       On 27/04/2019 18:13, Joe Touch wrote:
>>       Cross-posting to let both communities know:
>> - it would be useful for these documents to address how fragmentation and
>> reassembly affects these signals (esp. if reassembling fragments with
>> different ECN values)
>> [BB] This is addressed by the re-framing section in ecn-encap-guidelines,
>> altho it doesn't give examples of what might have caused frame boundary
>> misalignment, so fragmentation is not specifically mentioned. I think I
>> will add an explicit mention of fragmentation (if only so a search finds
>> that section).
>> Actually I've realized that this highlights an inconsistency between the
>> advice on ECN and fragment reassembly in RFC3168 and in
>> ecn-encap-guidelines:
>>  *  RFC3168 requires that the ECN marking of a reassembled packet is the
>>     logical OR of the ECN marks on the fragments,
>>  *  whereas ecn-encap-guidelines recommends marking the same number of
>>     outgoing as incoming octets when reassembling L2 frames or tunnelled
>>     packets with different boundaries - using a simple counter to track
>>     the balance.
>> In fact, it was the review of RFC3168 by me and Jon Crowcroft back in 2001
>> that originally raised the question of how to handle reassembly of
>> ECN-marked fragments. I'll quote a passage from the review, which I think
>> justifies the recommendation in ecn-encap-guidelines to count marked
>> bytes, rather than use the logical OR of RFC3168:
>> To use the logical OR of the marking of all fragments might be a pragmatic
>> solution, particularly for congestion control protocols like TCP where one
>> loss per round trip is treated identically to many. However, it is becoming
>> more common to see large numbers of packets per round trip time as data
>> rates increase while packet sizes and the speed of light haven't increased
>> for many years. Therefore it is to be expected that newer congestion
>> control protocols might take more accurate account of the number of packets
>> marked in a round trip. Hence, the inaccuracy of a logical OR during
>> re-assembly at the IP layer is best avoided.
>> I'm not too worried about the inaccuracy of using a logical OR, but I think
>> it best to recommend more accurate and no less costly counting. The only
>> justification for the logical OR was that TCP only reacted to one ECN mark
>> per RTT. But that is changing now, and the behaviour of one transport
>> should not be embedded in lower layers anyway.
>>       - it would be useful for these documents to consider
>>       draft-ietf-intarea-tunnels (which relates to the above) and its
>>       discussion on many of the protocols cited
>> I can't find anything in draft-ietf-intarea-tunnels that ought to be cited
>> from ecn-encap-guidelines or rfc6040-update-shim. Did you have something
>> specific in mind?
>> I do want to raise a question about the following sentence, which precedes
>> the mention of ECN:
>>    There are exceptions to this rule that are explicitly intended to
>>    relay signals from inside the tunnel to the network outside the
>>    tunnel, typically relevant only when the tunnel network N and the
>>    outer network M use the same network.
>> Was that last word meant to say "network protocol"?
>> Then, if that is what you meant, I would disagree. Many different network
>> protocols include concepts similar to Diffserv and/or ECN (e.g. IEEE802.1p,
>> MPLS and TRILL support both, etc), and there's rarely a reason /not/ to
>> propagate the concept between different network protocols when they
>> encapsulate each other, even tho it's not always straightforward to do so.
>> Bob
>> Joe
>>       On Apr 26, 2019, at 1:50 PM, Black, David <> wrote:
>> This may be of interest to INT folks who are interested in tunnels and
>> encapsulations.
>> Comments by the WGLC deadline are encouraged, but comments after the
>> deadline are ok, as they’d have to be dealt with anyway at IETF Last Call.
>> Thanks, --David
>> From: tsvwg <> On Behalf Of Black, David
>> Sent: Wednesday, April 17, 2019 2:51 PM
>> To:
>> Subject: [tsvwg] 2nd WGLC on ecn-encap-guidelines and rfc6040-update-shim
>> drafts, closes 6 May 2019
>> This email announces a 2nd TSVWG Working Group Last Call (WGLC) on two
>> drafts:
>> [1] Guidelines for Adding Congestion Notification to Protocols that
>>                              Encapsulate IP
>>                 draft-ietf-tsvwg-ecn-encap-guidelines-12
>> This draft is intended to become a Best Current Practice RFC
>> [2] Propagating Explicit Congestion Notification Across IP Tunnel Headers
>>                           Separated by a Shim
>>                  draft-ietf-tsvwg-rfc6040update-shim-08
>> This draft is intended to become a Proposed Standard RFC.
>> This WGLC will run through the end of the day on Monday, May 6, 2019.
>> Comments should be sent to the list, although purely
>> editorial comments may be sent directly to the author. Please cc: the
>> WG chairs at  if you would like the chairs to
>> track such editorial comments as part of the WGLC process.
>> No IPR disclosures have been submitted directly on either draft
>> Thanks,
>> David, Gorry and Wes
>> (TSVWG Co-Chairs)
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                     

Bob Briscoe