Re: [tsvwg] ECN encapsulation draft - proposed resolution

Bob Briscoe <ietf@bobbriscoe.net> Mon, 26 July 2021 21:19 UTC

To: Jonathan Morton <chromatix99@gmail.com>, Markku Kojo <kojo@cs.helsinki.fi>
Cc: Donald Eastlake <d3e3e3@gmail.com>, John Kaippallimalil <kjohn@futurewei.com>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <MN2PR19MB40454BC50161943BC33AAAD783289@MN2PR19MB4045.namprd19.prod.outlook.com> <43e89761-d168-1eca-20ce-86aa574bd17a@bobbriscoe.net> <de8d355d-08b6-34fb-a6cc-56755c9a11ee@bobbriscoe.net> <MN2PR19MB4045DB9D2C45066AEB0762DB83259@MN2PR19MB4045.namprd19.prod.outlook.com> <alpine.DEB.2.21.2106021717300.4214@hp8x-60.cs.helsinki.fi> <BE497F82-5452-41A1-943F-7ABD0048C7F9@gmail.com> <56c2887b-5e9e-c2b6-c760-81e2627400a2@bobbriscoe.net> <3a66effa-9269-a9b0-48e8-d48bd46d70d1@bobbriscoe.net> <alpine.DEB.2.21.2106221542420.4160@hp8x-60.cs.helsinki.fi> <D7734F0E-3A3F-4E82-98E3-035B45AC5876@gmail.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <b483b55f-8a82-ef5b-9198-3b32f52d7245@bobbriscoe.net>
Date: Mon, 26 Jul 2021 22:19:18 +0100
In-Reply-To: <D7734F0E-3A3F-4E82-98E3-035B45AC5876@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/JJLMRR7lH802KHaVeg4G35vzviE>
Subject: Re: [tsvwg] ECN encapsulation draft - proposed resolution

Jonathan,

On 22/06/2021 20:38, Jonathan Morton wrote:
>> On 22 Jun, 2021, at 5:02 pm, Markku Kojo <kojo@cs.helsinki.fi> wrote:
> Up front, let me say that I mainly agree with your approach to this.
>
>> And, if the number of flows is increased, then the time between the marks decreases. So, we need to be careful in understanding what we are modelling. Otherwise, if we just play with the formulae, the outcome is just crap.
> So, rather than playing with formulae, I organised an empirical test over the weekend to see what the effects really are.  That accounts for the long pause before replying.
>
> 	https://sce.dnsmgr.net/results/mss-tests/
>
> The test involves three flows with the same CC algorithm running through a common single-queue, single-instance AQM.  The AQM is Codel as an exemplar of time-domain marking, or PIE as an exemplar of packet-mode marking, both with ECN enabled.  The test was repeated for Reno and CUBIC, as exemplars of standards-track CC, and DCTCP as an exemplar of 1/p response.  The path parameters are 50Mbps throughput, 80ms RTT, sufficient to keep CUBIC out of "Reno compatibility" mode.
>
> The three flows differ in terms of packet and segment size.  Flow 1 is a conventional bulk flow using a full 1460-byte MSS.  Flow 2 reduces this to 730-byte MSS.  Flow 3 uses a 1460-byte MSS, but goes through a path with reduced MTU (with PMTUD disabled) causing fragmentation into two packets per segment, so has the same packet rate at the AQM as Flow 2.

[BB] Can you quantify the MTU, so we know the sizes of the first and 
second fragment of each pair? I see a legend label later with 1280 in 
it. Is that the MTU? It doesn't actually matter to the main outcome; 
I'd just like to be able to double-check.
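
For what it's worth, here's the arithmetic I'm assuming while I wait 
for confirmation (a rough sketch; IPv4, a 20-byte IP header, no options 
and a 1280-byte MTU are all my guesses from the legend, not facts from 
your setup):

    # Hypothetical fragment-size calculation (assumes IPv4, 20-byte IP
    # header, no IP options, MTU = 1280 -- my guesses, not confirmed).
    IP_HDR = 20
    TCP_HDR = 20
    MSS = 1460
    MTU = 1280

    ip_payload = MSS + TCP_HDR                    # 1480 bytes to fragment
    # Every fragment except the last must carry a multiple of 8 bytes.
    first_data = ((MTU - IP_HDR) // 8) * 8        # 1256 bytes
    second_data = ip_payload - first_data         # 224 bytes
    print(first_data + IP_HDR, second_data + IP_HDR)  # 1276 and 244 on the wire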

> Fragmentation reassembly is as per RFC-3168.
>
> A well-conditioned congestion system should give each flow roughly equal application goodput on average.  Some allowance can be made for bandwidth lost to headers in the smaller packets, but this is a small effect.  But what do we actually get?

[BB] In /this/ exercise, we are not trying to make flows with different 
packet sizes achieve the same bit-rate if they didn't have the same 
bit-rate without fragmentation.

While that might be an interesting research project, it is boiling the 
ocean compared to the task in hand. We are only trying to ensure that 
flows that are reframed at a lower layer, then reassembled, end up with 
the same rate as if they had not been reframed and reassembled (modulo 
any slight differences due to total header sizes, as you say). That is 
the art of designing an experiment that changes only the one parameter 
of interest.
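
As an aside, for anyone who doesn't have RFC 3168 to hand: the 
reassembly rule referred to above boils down to an OR over the 
fragments' CE marks. A minimal sketch (my own illustration, ignoring 
RFC 3168's checks for invalid ECT/not-ECT mixtures):

    # RFC 3168 (Section 5.3) reassembly rule for the ECN field: if any
    # fragment arrived CE-marked, deliver the reassembled packet as CE.
    NOT_ECT, ECT0, ECT1, CE = 0b00, 0b10, 0b01, 0b11

    def reassembled_ecn(fragment_ecn_fields):
        if CE in fragment_ecn_fields:
            return CE
        return fragment_ecn_fields[0]   # otherwise the fragments should agree

    # e.g. one CE-marked fragment out of two => whole segment delivered as CE
    assert reassembled_ecn([ECT0, CE]) == CE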


> The theoretically ideal case is DCTCP with time-domain marking, where the expected goodput is exactly equal for all three flows.  The actual observed goodput is not quite equal, but reasonably close and very consistent over a 10-minute period:
>
> 	https://sce.dnsmgr.net/results/mss-tests/dctcp-fq_codel-plot.html

[BB] You started out the email saying CoDel. However, the title of this 
plot (and the file name) says fq_codel. Surely you didn't mean to use FQ 
for this experiment? If the bottleneck includes a /scheduler/ designed 
to give each flow the same bit-rate, it will do that - irrespective of 
packet sizes, fragmentation, framing, or anything else for that matter.
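
To spell out why: a byte-based fair scheduler like the DRR round inside 
fq_codel gives each backlogged flow the same byte budget per round, 
whatever the packet sizes sitting in its queue. A toy sketch of the 
principle (my own illustration, not fq_codel's actual code):

    # Toy deficit-round-robin scheduler: each backlogged flow earns
    # QUANTUM bytes of credit per round, so long-run byte-rates equalise
    # whether a flow queues 1460-byte packets, 730-byte packets or fragments.
    QUANTUM = 1514

    def drr_round(queues, deficits):
        """queues: {flow: [packet sizes]}; returns bytes sent per flow."""
        sent = {}
        for flow, q in queues.items():
            if not q:
                continue
            deficits[flow] = deficits.get(flow, 0) + QUANTUM
            sent[flow] = 0
            while q and q[0] <= deficits[flow]:
                deficits[flow] -= q[0]
                sent[flow] += q.pop(0)
        return sent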

>
> But if we apply packet-mode marking, the smaller packets are markedly disadvantaged compared to larger ones, approximately in proportion to the average packet size, and again this is consistent for 10 minutes straight:
>
> 	https://sce.dnsmgr.net/results/mss-tests/dctcp-pie-plot.html

[BB] This proves that the disadvantage of using small packets has 
nothing to do with fragmentation or reassembly: neither flow 1 nor 
flow 2 goes through fragmentation/reassembly, yet flow 2 gets half the 
rate of flow 1 because its packets are half the size. This is just the 
well-known fact that flows using smaller packets disadvantage 
themselves.

In fact, it shows exactly what I predicted. Even though flow 3 sends 
the same size segments as flow 1 and is identical in every other way, 
it ends up with half the rate; the only difference is that flow 3 goes 
through RFC 3168 fragmentation and reassembly, with the AQM marking ECN 
in between.
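
The back-of-envelope arithmetic behind that prediction (my own sketch, 
assuming the AQM marks each packet independently with the same 
probability p regardless of size, and a DCTCP-like 1/p response):

    # With packet-mode marking at per-packet probability p, a segment
    # that is split into two fragments is delivered CE whenever either
    # fragment is marked (the RFC 3168 OR-rule above).
    p = 0.02                                # per-packet marking probability
    p_unfragmented = p                      # flow 1: one packet per segment
    p_fragmented = 1 - (1 - p) ** 2         # flow 3: two fragments, OR-ed
    print(p_fragmented / p_unfragmented)    # ~1.98, i.e. nearly double the
                                            # marks per segment, so roughly
                                            # half the rate for a 1/p responder
    # Flow 2 gets the same marks per packet as flow 1 but carries half the
    # bytes per packet, so it too ends up at about half the rate.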


>
> I do not have running code with which to test a "marked bytes preserving" reassembly rule.  However, such a rule could only possibly influence Flow 3, as that is the only one that's fragmented.  In theory it would be elevated to the level of Flow 1, leaving Flow 2 as the only disadvantaged flow.
>
> You might say that using small MSS is not reasonable for capacity-seeking traffic.  But recall what Paul Vixie pointed out in the other thread about UDP Options, that the MTU over Internet paths could reasonably increase in future.  This implies that "jumbo" TCP segments would, at least for some time, share space with traffic still using today's Ethernet-sized packets, and the latter would be in the analogous position of Flow 2 in this test.  So do not dismiss Flow 2 as irrelevant to your interests.
>
> Moving on to Reno, we see roughly the same effect, though the large Reno sawtooth means we have to peer through a lot of noise to see the trends.  Nevertheless they are similar to DCTCP over a long average:
>
> 	https://sce.dnsmgr.net/results/mss-tests/reno-fq_codel-plot.html
> 	https://sce.dnsmgr.net/results/mss-tests/reno-pie-plot.html
>
> And further likewise for CUBIC:
>
> 	https://sce.dnsmgr.net/results/mss-tests/cubic-fq_codel-plot.html
> 	https://sce.dnsmgr.net/results/mss-tests/cubic-pie-plot.html
>
> These results with standards-track CC confirm those from DCTCP, and the same logic applies.
>
> Current practice on the Internet is to use time-domain ECN marking (because Codel is the most widely deployed AQM with ECN enabled) and RFC-3168 fragment reassembly.  The above results show that this is clearly superior to packet-mode marking with either RFC-3168 or "byte-preserving" reassembly rules.  Hence we should probably try to avoid encouraging the latter behaviour in new Internet specifications.

As above, the PIE results prove exactly what I predicted. And the 
FQ_CoDel results don't say anything about the effect of fragmentation 
and reassembly on flow rate, because flow rate is enforced by the scheduler.

Now, it will be interesting to see what happens if the AQM is CoDel 
rather than FQ-CoDel.


Cheers



Bob

>   - Jonathan Morton
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/