[iccrg] On RFC3168 ECN signaling in TCP implementations

"Scheffenegger, Richard" <rs.ietf@gmx.at> Mon, 13 January 2020 14:44 UTC

To: "tcpm@ietf.org" <tcpm@ietf.org>, iccrg IRTF list <iccrg@irtf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/XUEq2PcbY5M62gqpu-J3bjYMePU>

Hi group,

I just wanted to report a few findings that Neal and I came across while
looking into corner cases of the RFC3168 ECN flag signaling in the TCP header.

tl;dr: Please confirm that the RFC3168 TCP header flags are
independent of the IP ECT codepoint, and that CWR shall be sent
immediately whenever cwnd is reduced, including during RTO events (for
ECN-enabled sessions that traverse a loss-only congestion point).

First, there is an implementation oversight in FreeBSD (FBSD) for
sessions with both ECN and SACK enabled. While RFC3168 is quite strict in
stipulating that retransmissions (and control packets) must not be sent
with an ECN-Capable Transport (ECT) codepoint in the IP header, that
stack currently honors this only for non-SACK loss recovery.
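The retransmission rule can be stated without reference to the loss-recovery path. The following is a minimal sketch in Python, not FreeBSD code; `SegmentKind` and `ip_ecn_codepoint` are hypothetical names, and the model only distinguishes the segment kinds mentioned here. Only new data on an ECN-enabled connection is marked ECT(0), whether or not SACK recovery is in progress:

```python
from enum import Enum


class SegmentKind(Enum):
    NEW_DATA = "new data"
    RETRANSMISSION = "retransmission"
    PURE_ACK = "pure ACK"
    SYN = "SYN"


# Two-bit IP ECN field codepoints: Not-ECT = 0b00, ECT(0) = 0b10.
NOT_ECT = 0b00
ECT0 = 0b10


def ip_ecn_codepoint(kind: SegmentKind, ecn_negotiated: bool,
                     in_sack_recovery: bool) -> int:
    """Choose the IP ECN codepoint for an outgoing segment (toy model).

    Only new data on an ECN-enabled connection is marked ECT(0);
    retransmissions and control packets are always Not-ECT.
    """
    # in_sack_recovery is intentionally unused: the rule is the same
    # for SACK-based and non-SACK loss recovery.
    if ecn_negotiated and kind is SegmentKind.NEW_DATA:
        return ECT0
    return NOT_ECT
```

In this model, the oversight described above corresponds to a stack whose SACK recovery path effectively treats its retransmissions as `NEW_DATA`.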

I have already raised this on the tcpm list under the subject "ECN++".

Second, I found that FreeBSD (I have not checked OpenBSD or NetBSD) does
not set the CWR (Congestion Window Reduced) flag on retransmission
timeout (RTO) retransmissions.
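A toy model of the sender side, assuming the interpretation this message asks the group to confirm: every cwnd reduction, RTO included, arms CWR for the next segment sent, which here includes the RTO retransmission itself. `EcnSender` and its methods are hypothetical names, not APIs of any stack:

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    """Toy model of the sender side of the RFC3168 CWR signal."""

    cwnd: int = 10             # congestion window, in segments
    cwr_pending: bool = False  # CWR must go out on the next segment

    def reduce_cwnd(self, new_cwnd: int) -> None:
        # Any cwnd reduction (ECE, fast retransmit, or RTO) arms CWR.
        self.cwnd = max(1, new_cwnd)
        self.cwr_pending = True

    def on_rto(self) -> None:
        # An RTO collapses cwnd to 1 segment. Arming CWR here is the
        # step reported above as missing for RTO retransmissions.
        self.reduce_cwnd(1)

    def flags_for_next_segment(self) -> set:
        # CWR is sent once, on the first segment after the reduction.
        flags = {"ACK"}
        if self.cwr_pending:
            flags.add("CWR")
            self.cwr_pending = False
        return flags
```

After `on_rto()`, the first call to `flags_for_next_segment()` returns `{"ACK", "CWR"}` and subsequent calls return only `{"ACK"}`.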

This is also problematic in a larger context (which is how we initially
found it): if the RTO is caused by lost retransmissions, and a CE mark
arrives in the same window while loss recovery is in progress, the
missing CWR bit leaves the latched ECE flag on the receiver set.
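The receiver-side latch that makes the missing CWR costly can be modeled as below. This is a sketch of the RFC3168 receiver behavior as described in this message (ECE is echoed on every ACK after a CE mark until a segment carrying CWR arrives); `EcnReceiver` is a hypothetical name:

```python
class EcnReceiver:
    """Toy model of the RFC3168 receiver's latched ECE state."""

    def __init__(self) -> None:
        self.ece_latched = False

    def on_segment(self, ip_ce: bool, tcp_cwr: bool) -> bool:
        """Process one arriving segment; return True if its ACK carries ECE."""
        if tcp_cwr:
            # CWR from the sender clears the latch ...
            self.ece_latched = False
        if ip_ce:
            # ... but a CE mark (on this or a later segment) sets it again.
            self.ece_latched = True
        return self.ece_latched
```

An RTO retransmission that arrives without CWR (`tcp_cwr=False`) therefore still gets an ECE-marked ACK.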

Thus, the ACK for the RTO retransmission comes back with ECE still set,
which prevents cwnd from growing during the first RTT after the RTO. On
top of that, the RTO retransmission has a probability of 0.5 of running
into the delayed ACK timeout. The first packet sent in response to the
ECE-marked ACK then has a probability of 1.0 of waiting for a delayed
ACK timeout.
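By linearity of expectation, the two probabilities above add up to an expected 1.5 delayed ACK timeouts of waiting in the first two round trips after the RTO. A trivial sketch; the timeout is a free parameter (the values below are hypothetical, not taken from any particular stack):

```python
def expected_delack_wait(delack_timeout_ms: float,
                         p_rto_retx: float = 0.5,
                         p_next_segment: float = 1.0) -> float:
    """Expected delayed-ACK waiting time in the first two RTTs after an RTO.

    The default probabilities are the ones stated in the message; the
    delayed ACK timeout is supplied by the caller.
    """
    # E[wait] = p1 * T + p2 * T
    return (p_rto_retx + p_next_segment) * delack_timeout_ms
```

For a hypothetical 100 ms delayed ACK timer, `expected_delack_wait(100)` gives 150 ms of expected extra waiting, in addition to the RTO itself.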

In that corner case (two different congestion points, one loss-only
and one ECN-marking), the above results in very sluggish recovery from
an RTO, since cwnd is also set to 1 MSS when the RTO fires.

While we were discussing this, Neal found a related oversight on the
Linux side: apparently, the TCP ECN header flags are only set when the IP
packet carrying the segment is also ECT-marked.

In other words, during an RTO the Linux stack does want to send the
CWR bit (which we believe is the correct behavior), but doesn't, because
the RTO retransmission is sent without the ECN-Capable Transport
codepoint in the IP header.
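The difference between the observed Linux behavior and the proposed one comes down to whether the TCP ECN flags are gated on the IP codepoint. A minimal sketch with the hypothetical function `tcp_ecn_flags`; it illustrates the two policies and does not reproduce the actual Linux code path:

```python
def tcp_ecn_flags(want_cwr: bool, want_ece: bool, segment_is_ect: bool,
                  gate_on_ect: bool) -> set:
    """Return the TCP ECN header flags to put on an outgoing segment.

    gate_on_ect=True models the coupling described above (TCP flags are
    only emitted when the IP header is ECT-marked); gate_on_ect=False
    models TCP flags that are independent of the IP ECT codepoint.
    """
    if gate_on_ect and not segment_is_ect:
        # A Not-ECT segment (e.g. an RTO retransmission) loses its flags.
        return set()
    flags = set()
    if want_cwr:
        flags.add("CWR")
    if want_ece:
        flags.add("ECE")
    return flags
```

For an RTO retransmission (`segment_is_ect=False`) with CWR pending, the gated policy emits no flags, while the decoupled policy emits `{"CWR"}`.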

Effectively, a scenario similar to the FreeBSD one plays out: the RTO
retransmission is sent but doesn't clear the latched ECE flag on the
receiver, so the ACK for the RTO retransmission still carries ECE. The
stack then responds by reducing cwnd once more (to 1 segment) and sets
CWR only at that point (or when the first new data segment is
transmitted).
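The combined effect on cwnd can be illustrated with a toy trace. The assumptions are listed in the docstring and are simplifications chosen for illustration (slow start modeled as doubling per round, delayed ACKs ignored); `rto_recovery_trace` is a hypothetical helper:

```python
def rto_recovery_trace(cwr_on_rto_retx: bool, rounds: int = 3) -> list:
    """Return the sender's cwnd (in segments) after each round following an RTO.

    Toy assumptions: the receiver's ECE latch is already set when the RTO
    fires (a CE mark arrived during the earlier loss recovery); no further
    CE marks or losses occur; slow start doubles cwnd per round when the
    ACKs carry no ECE; an ECE-marked ACK makes the sender hold cwnd at 1
    segment and send CWR on its next segment. Delayed ACKs are ignored.
    """
    ece_latched = True           # receiver state when the RTO fires
    cwnd = 1                     # the RTO collapses cwnd to 1 segment
    send_cwr = cwr_on_rto_retx   # does the RTO retransmission carry CWR?
    trace = []
    for _ in range(rounds):
        # Receiver: a segment carrying CWR clears the latch.
        if send_cwr:
            ece_latched = False
        ack_has_ece = ece_latched
        # Sender: react to the ACKs of this round.
        if ack_has_ece:
            cwnd = 1
            send_cwr = True
        else:
            cwnd *= 2
            send_cwr = False
        trace.append(cwnd)
    return trace
```

`rto_recovery_trace(True)` yields `[2, 4, 8]`, whereas `rto_recovery_trace(False)` yields `[1, 2, 4]`: in this model, omitting CWR on the RTO retransmission costs a full round trip of growth, before any delayed ACK waits are added.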

We would like to confirm whether our understanding of RFC3168 signaling
is correct, and what the expected interaction between RFC3168 signaling
and loss/RTO recovery should be.

Best regards,