Re: [tsvwg] SCE / L4S and fragmentation (was: Status of ECN encapsulation drafts (i.e., stuck))

Jonathan Morton <> Fri, 13 March 2020 21:22 UTC

From: Jonathan Morton <>
In-Reply-To: <>
Date: Fri, 13 Mar 2020 23:22:12 +0200
Cc: "Black, David" <>, "" <>
To: Bob Briscoe <>

>> The existing language in RFC-3168 succeeds in preserving the number of CE marks applied to a flow.  Any deficiencies we should consider are in relation to handling the distinction between ECT(1) and ECT(0), as this is what newly becomes significant with both the L4S and SCE proposals.
> * For SCE to give any benefit in the presence of fragmentation, it needs reassembly of ECT1 fragments to be changed. 
> * In contrast, L4S needs no change to reassembly of ECT1 fragments, because it keeps to the RFC3168 transitions.  So there will never need to be a packet consisting of a mix of ECT0 & ECT1 fragments.

I'll agree with this: where the ECN codepoint is set at origin (or, equivalently, before the fragmentation point), the fragments will all carry identical codepoints and will be reassembled correctly under RFC-3168 rules.  These rules also handle the case of CE marks being set on IP fragments on the tunnel path.  The case of marking with ECT(1) is indeed left undefined, which is undesirable.

> Before going on, I understand fragmentation in both IPv4 and IPv6 is better to be avoided by using PMTUD. But I believe fragmentation is still used, particularly where an IPv4 router ignores the DF flag, which I believe is particularly prevalent with tunnels.

> DF does not exist in IPv6.

I think it would be more accurate to say that IPv6 behaves as though DF is *always* set.  There is no defined mechanism for a router to perform fragmentation.  Fragmented packets may be generated *at source*, e.g. to convey an oversize datagram; TCP would just reduce the MSS to fit when notified of a reduced MTU on the path.  So I don't consider IPv6 fragmentation to be a significant problem.

Of course, you could end up tunnelling IPv6 packets through an IPv4 tunnel that fragments.  But that reduces to the existing problem of fragmenting IPv4.

> Explanation of the issue:
> An SCE AQM changes some ECT0 markings to ECT1. So, when these packets happen to be fragments (having already been fragmented), there will usually be a mix of ECT1 and ECT0 fragments to be reassembled into a packet. RFC3168 identifies this case and explicitly says it does not specify what to do, which would require a new specification. So current behaviour will be implementation-dependent.
> IPv6:
> If fragmentation is needed, IPv6 fragments packets at source, and reassembles at the receiver.
> So, for SCE to provide any benefit if the IPv6 source is fragmenting, the receiver implementation of IPv6 will need to be updated (once a spec has been written, agreed and approved).
> Until now, I thought that, at least in QUIC, SCE would get feedback without changing the receiver. But, if the sender is fragmenting, the receiver's IPv6 layer will need to be changed as well. It's possible some receiver implementations might happen to do roughly the right thing (once we agree what the right thing is).
> IPv4:
> RFC3168 advises to set the DF flag if a mix of ECT0 and ECT1 is expected.
> However, many IPv4 tunnels ignore DF and fragment anyway, using "outer fragments" [draft-ietf-intarea-tunnels].
> Therefore, the IPv4 reassembly behaviour will need to be specified. Then this ECT1 reassembly during tunnel decapsulation will need to be implemented.

ISTR, at some point in the past, interim language was suggested which would require taking the ECN codepoint from one of the fragments constituting the packet, with the behaviour being otherwise unspecified except by the existing rules.  This would be a worthwhile improvement from SCE's point of view, and is likely to match at least some existing implementations.
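
As a rough illustration of that interim rule layered on the existing RFC-3168 reassembly rules, here is a minimal Python sketch.  The function name and the choice of the *first* fragment are assumptions made for the example, not taken from any draft or implementation; RFC-3168 also permits dropping whenever a CE fragment is present, whereas this sketch sets CE when allowed.

```python
# ECN field values (two low-order bits of the former ToS byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def reassembled_ecn(fragment_ecn):
    """Return the ECN codepoint for the reassembled packet, or None to drop.

    fragment_ecn: non-empty list of ECN codepoints, one per fragment, in order.
    """
    codepoints = set(fragment_ecn)
    if CE in codepoints:
        # RFC-3168: a CE indication must not be lost, but CE must not be
        # set if any contributing fragment was Not-ECT; drop in that case.
        return None if NOT_ECT in codepoints else CE
    # Assumed interim rule: take the codepoint from one fragment (here, the
    # first).  This also leaves the codepoint unchanged when all fragments agree.
    return fragment_ecn[0]
```

With this rule a mix of ECT(0) and ECT(1) fragments yields whichever of the two the chosen fragment carried, so an SCE mark survives only if it happens to be on that fragment.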

An implementation which performs a bitwise-OR across the ECN fields of the fragments would effectively convert partial SCE marks into CE marks (as the ECT codepoints are 01 and 10).  This is less than ideal, but at least some form of congestion control is maintained by this.
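
The arithmetic can be checked with a few lines of Python.  This is only a sketch of the hypothetical bitwise-OR behaviour described above; the function name is made up for the example.

```python
from functools import reduce
from operator import or_

# ECN field values: ECT(1) = 01, ECT(0) = 10, CE = 11.
ECT1, ECT0, CE = 0b01, 0b10, 0b11

def or_reassemble(fragment_ecn):
    """Combine the fragments' ECN fields with a bitwise OR."""
    return reduce(or_, fragment_ecn, 0)

# A packet whose fragments are partly SCE-marked comes out as CE:
#   10 | 01 | 10 == 11
mixed = or_reassemble([ECT0, ECT1, ECT0])
```

Fragments that all carry the same codepoint pass through unchanged, so only the partially marked case is promoted to CE.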

RFC-3168's existing recommendation to set DF seems like a good one to me, and is effectively automatic with IPv6.  Tunnels which perform outer fragmentation should be fixed to implement MTU Discovery support instead.  That seems like the easiest fix to me.

> As well as IP-in-IP, and IPSec, here's a list of the 14 IP-shim-(L2)-IP encapsulations that are widely deployed and whether they comply with RFC6040 (needed for SCE tunnel decap). To support SCE, they will also need fragment reassembly to be specified and implemented.

The list of 14 appears to be missing.

 - Jonathan Morton