Re: [tcpm] [tsvwg] [Ecn-sane] ECN CE that was ECT(0) incorrectly classified as L4S

Jonathan Morton <> Mon, 05 August 2019 10:59 UTC

> [JM] A progressive narrowing of effective link capacity is very common in consumer Internet access.  Theoretically you can set up a chain of almost unlimited length of consecutively narrowing bottlenecks, such that a line-rate burst injected at the wide end will experience queuing at every intermediate node.  In practice you can expect typically three or more potentially narrowing points:
> [RG] deleted. Please read , first two sentences. That's a sound starting point, and I don't think much has changed since 2005. 

As I said, that reference is *usually* true for *responsible* ISPs.  Not all ISPs, however, are responsible vis-à-vis their subscribers, as opposed to their shareholders.  There have been some high-profile incidents of *deliberately* inadequate peering arrangements in the USA (often involving Netflix vs major cable networks, for example), and consumer ISPs in the UK *typically* show diurnal cycles of general congestion due to under-investment in the high-speed segments of their networks.

To say nothing of what goes on in Asia Minor and Africa, where demand routinely far outstrips supply.  In those areas, solutions to make the best use of limited capacity would doubtless be welcomed.

> [RG] About the bursts to expect, it's probably worth noting that today's most popular application generating traffic bursts is watching video clips streamed over the Internet. Viewers dislike the movies to stall. My impression is, all major CDNs are aware of that and try their best to avoid this situation. In particular, I don't expect streaming bursts to overwhelm access link shaper buffers by design. And that, I think, limits burst sizes of the majority of traffic.

In my personal experience with YouTube, to pick a major video streaming service not-at-all at random, the bursts last several seconds and are essentially ack-clocked.  It's just a high/low watermark system in the receiving client's buffer; when it's full, it tells the server to stop sending, and after it drains a bit it tells the server to start again.  When traffic is flowing, it's no different from any other bulk flow (aside from the use of BBR instead of CUBIC or Reno) and can be managed in the same way.
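The high/low watermark scheme described above can be sketched as a small hysteresis loop; the class name, thresholds, and units here are illustrative assumptions, not YouTube's actual client logic:

```python
# Hypothetical sketch of a high/low watermark receive buffer; thresholds
# (in seconds of buffered video) are assumed values for illustration.

class StreamBuffer:
    def __init__(self, high_mark=30.0, low_mark=10.0):
        self.high = high_mark        # above this: tell server to stop sending
        self.low = low_mark          # below this: tell server to resume
        self.buffered = 0.0          # seconds of video currently buffered
        self.requesting = True

    def tick(self, received_seconds, played_seconds):
        """Update buffer state after one control interval."""
        if self.requesting:
            self.buffered += received_seconds
        self.buffered = max(0.0, self.buffered - played_seconds)
        # Hysteresis: only toggle at the watermarks, not in between.
        if self.requesting and self.buffered >= self.high:
            self.requesting = False  # buffer full: pause the burst
        elif not self.requesting and self.buffered <= self.low:
            self.requesting = True   # drained enough: start the next burst

buf = StreamBuffer()
buf.tick(received_seconds=35.0, played_seconds=1.0)
print(buf.requesting)  # False: above the high watermark, fetching pauses
```

While `requesting` is true, the resulting traffic is ack-clocked bulk transfer, exactly as described; the watermarks only determine when the multi-second bursts start and stop.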

The timescale I'm talking about, on the other hand, is sub-RTT.  Packet intervals may be counted in microseconds at origin, then gradually spaced out into the millisecond range as they traverse the successive bottlenecks en route.  As I mentioned, there are several circumstances when today's servers emit line-rate bursts of traffic; these can also result from aggregation in certain link types (particularly wifi), and hardware offload engines which try to treat multiple physical packets from the same flow as one.  This then results in transient queuing delays as the next bottleneck spaces them out again.
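The microsecond-to-millisecond spacing effect is simple arithmetic; the link rates and burst size below are assumptions chosen only to illustrate the scale:

```python
# A line-rate burst is re-spaced at each successively narrower bottleneck;
# the last packet of the burst transiently queues behind the rest of it.
# Link rates and burst size are assumed for the example.

PACKET_BITS = 1500 * 8           # one full-size packet
BURST_PACKETS = 10               # e.g. an IW10 burst

for name, rate_bps in [("server NIC, 10 Gb/s", 10e9),
                       ("core link, 1 Gb/s", 1e9),
                       ("access link, 100 Mb/s", 100e6)]:
    spacing_us = PACKET_BITS / rate_bps * 1e6
    transient_queue_us = (BURST_PACKETS - 1) * spacing_us
    print(f"{name}: {spacing_us:.1f} us/packet, "
          f"up to {transient_queue_us:.0f} us transient queue")
```

With these numbers, packet spacing grows from about 1.2 us at the origin to 120 us at the access link, and the burst alone induces roughly a millisecond of transient queuing at the narrowest hop.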

When several such bursts coincide at a single bottleneck, moreover, the queuing required to accommodate them may be as much as their sum.  This "incast effect" is particularly relevant in datacentres, which routinely produce synchronised bursts of traffic as responses to distributed queries, but can also occur in ordinary web traffic when multiple servers are involved in a single page load.  IW10 does not mean you only need 10 packets of buffer space, and many CDNs are in fact using even larger IWs as well.
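The incast sizing above can be made concrete with a rough worked example; the fan-out, packet size, and link rate are assumed values:

```python
# Rough sizing of the incast effect: when several IW10 bursts coincide at
# one bottleneck, the queue needed approaches their sum. Fan-out and
# link rate here are assumptions for illustration.

MSS_BYTES = 1500
IW = 10                          # initial window, in packets
responders = 8                   # e.g. servers answering one page load

burst_bytes = IW * MSS_BYTES
worst_case_queue = responders * burst_bytes
print(f"one burst: {burst_bytes} bytes; "
      f"{responders} coinciding bursts: {worst_case_queue} bytes")

# Time to drain that queue at a 100 Mb/s access link:
drain_ms = worst_case_queue * 8 / 100e6 * 1e3
print(f"drain time at 100 Mb/s: {drain_ms:.1f} ms")
```

Even this modest fan-out needs 80 packets of buffer, eight times what a single IW10 burst suggests, and larger initial windows scale the requirement up accordingly.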

These effects really do exist; we have measured them in the real world, reproduced them in lab conditions, and designed qdiscs to accommodate them as cleanly as possible.  The question is to what extent they are relevant to the design of a particular technology or deployment; some will be much more sensitive than others.  The only way to be sure of the answer is to be aware, and do the appropriate maths.

> [RG] Other people use their equipment to communicate and play games

These are examples of traffic that would be sensitive to the delay from transient queuing caused by other traffic.  The most robust answer here is to implement FQ at each such queue.  Other solutions may also exist.
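The FQ idea can be sketched in a few lines: hash each packet's flow onto its own queue and serve the queues round-robin, so a burst queues behind itself rather than in front of other flows. This is a toy model under those assumptions, not fq_codel or any deployed qdisc:

```python
# Toy flow-queuing scheduler: per-flow queues served round-robin.

from collections import OrderedDict

class ToyFQ:
    def __init__(self):
        self.queues = OrderedDict()   # flow id -> list of queued packets

    def enqueue(self, flow, packet):
        self.queues.setdefault(flow, []).append(packet)

    def dequeue(self):
        # Serve the flow at the head of the rotation, then move it to the back.
        while self.queues:
            flow, q = next(iter(self.queues.items()))
            self.queues.pop(flow)
            if q:
                pkt = q.pop(0)
                if q:
                    self.queues[flow] = q   # re-append: back of the rotation
                return pkt
        return None

fq = ToyFQ()
for i in range(3):
    fq.enqueue("bulk", f"bulk-{i}")    # a three-packet burst from one flow
fq.enqueue("game", "game-0")           # one latency-sensitive packet
print([fq.dequeue() for _ in range(4)])
# -> ['bulk-0', 'game-0', 'bulk-1', 'bulk-2']
```

The game packet waits behind at most one bulk packet instead of the whole burst, which is exactly the isolation property that makes FQ robust against transient queuing from other traffic.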

> Any solution for Best Effort service which is TCP friendly and supports communication expecting no congestion at the same time should be easy to deploy and come with obvious benefits. 

Well, obviously.  Although not everyone remembers this at design time.

> [RG] I found Sebastian's response sound. I think, there are people interested in avoiding congestion at their access.

> the access link is the bottleneck, that's what's to be expected.

It is typically *a* bottleneck, but there can be more than one from the viewpoint of a line-rate burst.

> [RG] I'd like to repeat again what's important to me: no corner case engineering. Is there something to be added to Sebastian's scenario?

He makes an essentially similar point to mine, from a different perspective.  Hopefully the additional context provided above is enlightening.

 - Jonathan Morton