Re: [tsvwg] start of WGLC on L4S drafts

Sebastian Moeller <moeller0@gmx.de> Fri, 05 November 2021 11:31 UTC

To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Cc: Bob Briscoe <ietf@bobbriscoe.net>, "tsvwg@ietf.org" <tsvwg@ietf.org>

Hi Gorry,

thanks.


> On Oct 28, 2021, at 17:42, Gorry Fairhurst <gorry@erg.abdn.ac.uk> wrote:
> 
> I'll quote some words from RFC805.


[SM] I assume you mean RFC 8085 (https://www.rfc-editor.org/rfc/rfc8085); RFC 805, "Computer Mail Meeting Notes", seems a bit of a stretch ;)


> 
> This notes that transports need to "employ suitable mechanisms to prevent congestion collapse and
> establish a degree of fairness".
> 
> It also makes this recommendation:
>    "Bulk-transfer applications that choose not to implement TFRC or TCP-
>   like windowing SHOULD implement a congestion control scheme that
>   results in bandwidth (capacity) use that competes fairly with TCP
>   within an order of magnitude."

[SM] Question: am I right that we are talking about base-2 orders of magnitude here, or base-10? I presume base 2.
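
To spell out why the base matters (a throwaway illustration of my own, not text from RFC 8085): under a base-2 reading a sustained 3:1 throughput split would already fall outside "within an order of magnitude", while under a base-10 reading it would be comfortably inside.

# Illustrative sketch only (my own reading of "order of magnitude"):
def within_order_of_magnitude(rate_a, rate_b, base=2):
    """True if the throughput ratio stays within one order of magnitude."""
    ratio = rate_a / rate_b
    return 1.0 / base <= ratio <= base

print(within_order_of_magnitude(3.0, 1.0, base=2))   # False: 3:1 exceeds 2:1
print(within_order_of_magnitude(3.0, 1.0, base=10))  # True: 3:1 is within 10:1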

But the core of my question was more about whether a systematic bias of noticeable magnitude (say, because L4S flows consistently get higher throughput) should be considered differently from a random bias (where the fairness between flows depends on the exact conditions and the "winner" is not reliably predictable).
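
For concreteness, this is how I would tell the two apart in test results (my own sketch, with made-up per-run throughput numbers): a systematic bias shows up as a consistently non-zero mean of the per-run log2 rate ratio, while a random bias shows up as spread around a mean of roughly zero.

import math, statistics

# Sketch with made-up per-run throughputs (Mbit/s), one L4S flow vs one Classic flow.
def bias_summary(l4s_rates, classic_rates):
    log_ratios = [math.log2(l / c) for l, c in zip(l4s_rates, classic_rates)]
    return statistics.mean(log_ratios), statistics.stdev(log_ratios)

# Systematic bias: L4S reliably ~2x Classic -> mean ~ +1, small spread.
print(bias_summary([20, 21, 19, 20], [10, 10, 10, 11]))
# Random bias: the "winner" flips between runs -> mean ~ 0, large spread.
print(bias_summary([20, 10, 19, 11], [10, 20, 10, 21]))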


Now, I am also puzzled: if this one-order-of-magnitude fairness is a BCP, why are we considering RFC-ing DualQ, which at short RTTs robustly and reliably introduces imbalances on the order of 1:15 (as in the draft) or 1:10 (as in the Linux implementation), both of which are clearly >> 1:2? The argument about the added queueing delay for the Classic flow does not hold water either, because it is DualQ itself that inflicts that damage on Classic flows, due to the unfortunate choice of 15 ms for the Classic delay target and the unfortunate choice of the weights for the weighted priority scheduler. If we agree on the one-order-of-magnitude goal, these two parameters need to be re-visited, and the draft needs to spell out this 1:2 fairness goal explicitly as guidance for implementers of a dual-queue design. See the sketch below for the numbers I have in mind.
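
To make the short-RTT numbers concrete, this is the back-of-the-envelope model I mean (my own sketch, using the usual steady-state approximations: a Reno-like rate of sqrt(3/2)/(R*sqrt(p)), a DCTCP/Prague-like rate of 2/(R*p), and the DualQ coupling p_L = k*sqrt(p_C) with the default k = 2). Under those assumptions the ratio no longer depends on the marking level, only on the two effective RTTs, and the 15 ms Classic target dominates the Classic flow's RTT whenever the base RTT is short:

import math

# Back-of-the-envelope sketch (mine, not from the draft): steady-state rate
# ratio of one L4S (DCTCP/Prague-like) flow vs one Classic (Reno-like) flow
# through a DualQ Coupled AQM, assuming
#   Classic:  rate_C ~ sqrt(3/2) / (R_C * sqrt(p_C))
#   L4S:      rate_L ~ 2 / (R_L * p_L)
#   coupling: p_L = k * sqrt(p_C), default k = 2
# which gives rate_L/rate_C ~ (2 / (k*sqrt(3/2))) * R_C/R_L ~ 0.82 * R_C/R_L.
def rate_ratio(base_rtt_ms, classic_target_ms=15.0, l4s_target_ms=1.0, k=2.0):
    r_c = base_rtt_ms + classic_target_ms  # Classic flow sits behind its own 15 ms target
    r_l = base_rtt_ms + l4s_target_ms      # L4S queue stays around 1 ms
    return (2.0 / (k * math.sqrt(1.5))) * (r_c / r_l)

for rtt in (0, 1, 5, 10, 50, 100):
    print(f"base RTT {rtt:3d} ms -> L4S:Classic rate ratio ~ {rate_ratio(rtt):.1f}:1")

Obviously this is only a model and measured numbers will differ, but it reproduces the trend: roughly 1:1 once the base RTT dominates the 15 ms Classic target (around 100 ms), already well above 2:1 at a few milliseconds of base RTT, and heading towards the ~1:10 region as the base RTT approaches zero, which is exactly why I keep pointing at the Classic delay target and the scheduler weights.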

Regards
	Sebastian

P.S.: If it appears that I am repeating the same arguments over and over again, this is because sadly the same arguments/objections still apply.



> 
> - Which at least to me seems like a starting point for any conversation.
> 
> Gorry
> 
> On 28/10/2021 15:42, Sebastian Moeller wrote:
>> Hi list,
>> 
>> one question below, prefixed [SM].
>> 
>> 
>>> On Oct 26, 2021, at 01:12, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>> 
>>> Pete,
>>> 
>>> Replying to your points inline [BB] on each draft one at a time, starting with dualq-coupled-16...
>>> 
>>> On 24/08/2021 21:27, Pete Heist wrote:
>>>> # Comments on dualq-coupled-16
>>>> 
>>>> 
>>>>> dualq-coupled Abstract:
>>>>> 
>>>>> Analytical study and implementation testing of the Coupled AQM have
>>>>> shown that Scalable and Classic flows competing under similar
>>>>> conditions run at roughly the same rate.
>>>>> 
>>>> Testing DualPI2 with the defaults showed about a 2x advantage in
>>>> throughput for L4S flows vs non-L4S flows, at the same RTT (see #2
>>>> above). "Roughly the same rate" isn't quantified, but the results seem
>>>> different from that.
>>>> 
>>> [BB] I've changed it to "roughly equivalent", which I think is the briefest way of summarizing loads of results for words in an abstract.
>>> 
>>> 2:1 cannot cause a significantly negative impact [RFC5033] and is well within the "order of magnitude" required in the most recent RFC on this subject [RFC8085] (altho' that was about bulk UDP transfers, I can't see how fairness would depend on which transport protocol is in use).
>> 	[SM] Is there consensus, that a systematic rate-bias of 2:1 is equivalent? I was under the impression that if rates randomly vary in the 2:1 to 1:2 range that would be equivalent, but a robust and reliable advantage of one type over the other would not qualify as equivalent. What am I missing here?
>> 
>> 
>> Regards
>> 	Sebastian
>> 
>>>>> dualq-coupled Abstract:
>>>>> 
>>>>> When tested in a residential broadband setting, DCTCP also achieves
>>>>> sub-millisecond average queuing delay and zero congestion loss under
>>>>> a wide range of mixes of DCTCP and `Classic' broadband Internet
>>>>> traffic, without compromising the performance of the Classic traffic.
>>>>> The solution has low complexity and requires no configuration for the
>>>>> public Internet.
>>>>> 
>>>> Bursty traffic or link layers can raise the limits on low queueing
>>>> delay:
>>>> 
>>>> 
>>>> https://github.com/heistp/l4s-tests/#between-flow-induced-delay
>>> [BB] Gorry asked me to cut down the abstract, so I've removed all the stuff about testing in a residential broadband setting anyway.
>>> See the mail I just sent him for the new wording. However, I can see now that changing it is probably going to lead to further haggling over different wording...
>>> 
>>> Nonetheless, let's move from wording to your technical point itself. Before I was hired by CableLabs to work on Low Latency DOCSIS I wrote a tech report about why bursty effects like this might happen and how to deal with it if they were a problem [SigQ-Dyn]. It's about how sojourn time doesn't measure a burst even once the whole burst is queued. You'll find this is already referenced from the dualq draft. We implemented the alternative metric ('scaled sojourn time'), but didn't use it in the end because we couldn't find a good reason for it being necessary. Since that time, I've thought of a simpler way of implementing an improved variant of the metric, which I wrote into the next rev of the draft recently.
>>> 
>>> But before considering replacing sojourn time we need to bear in mind that there's only one run of one experiment that looks odd. So we need to set ourselves up with a lot more solid evidence. If there is a performance degradation in more typical scenarios, we need to know whether it's big enough to worry about. If it is, the modified metric should improve the performance again.
>>> 
>>> [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], September 2017, <https://arxiv.org/abs/1904.07044>.
>>> 
>>>>> dualq-coupled Section 1.1:
>>>>> 
>>>>> In contrast, a DualQ Coupled AQM addresses the root cause of the
>>>>> latency problem --- it is an enabler for the smooth low latency
>>>>> scalable behaviour of scalable congestion controls, so that every
>>>>> packet in every flow can enjoy very low latency, then there is no
>>>>> need to isolate each flow into a separate queue.
>>>>> 
>>>> I would like to be sure that this is true, not just with bursty traffic
>>>> and link layers, but also cross-flow traffic (tests needed on the
>>>> latter).
>>>> 
>>> [BB] Which part do you mean by 'this'? "...every flow can enjoy..."? or "No need to isolate..."?
>>> 
>>> I've edited the former to "...can potentially enjoy...".
>>> 
>>>> 
>>>> 
>>>>> dualq-coupled Section 1.4:
>>>>> 
>>>>> Thousands of tests have been conducted in a typical fixed residential
>>>>> broadband setting.  Experiments used a range of base round trip
>>>>> delays up to 100ms and link rates up to 200 Mb/s between the data
>>>>> centre and home network, with varying amounts of background traffic
>>>>> in both queues.
>>>>> 
>>>> It sounds like these tests were performed on a certain type of link
>>>> (residential cable?), in a certain RTT range (below 100ms).
>>>> 
>>> [BB] The draft says "below 100ms". And it gives the ref to the paper with the spec of the tests - it says they were over DSL, then emulated with an ethernet testbed for higher speeds than the DSL could achieve, after comparing the validity of the emulation.
>>> 
>>>> Bursty
>>>> links and higher RTTs should also be tested if L4S is to be used for
>>>> general purpose congestion control (see #3 above). We should end up
>>>> with a system that works well under most all conditions, otherwise it's
>>>> not clear how users and admins will know to change CCAs from one
>>>> activity to the next.
>>>> 
>>> See my posting on the list to Sebastian on 31 Aug 21:
>>> "[tsvwg] tests of Prague utilization with bursty competition in L and C [more burst sources]"
>>> https://mailarchive.ietf.org/arch/msg/tsvwg/LfcOHNupIusPHyjVV4vFWFwS8bI/
>>> 
>>> There's a lot in that conversation, and you'll see some results of it appear in the ecn-l4s-id draft shortly.
>>> At the risk of over-simplifying, two key sentences are:
>>> * "Updated link designs are not going to appear any time soon. In the mean time, the parameters of any flaky links and the L4S target Q delay can be set to levels that trade off between utilization and delay."
>>> * "Whichever technology enables reduced queuing delay variability, the world would want to change to reduce burst delays as well. "
>>> 
>>> 
>>>>> dualq-coupled Section 4.1.3:
>>>>> 
>>>>> Experiments with the DualPI2 AQM (Appendix A) have shown that
>>>>> introducing 'drop on saturation' at 100% L4S marking addresses this
>>>>> problem with unresponsive ECN as well as addressing the saturation
>>>>> problem.  It leaves only a small range of congestion levels where
>>>>> unresponsive traffic gains any advantage from using the ECN
>>>>> capability, and the advantage is hardly detectable [DualQ-Test].
>>>>> 
>>>> I might be misunderstanding the text, but two-flow tests of traffic
>>>> that is unresponsive to CE show that, as with RFC3168 ECN, there is an
>>>> advantage to setting ECT(0) or ECT(1), then not responding to CE,
>>>> that's easy to detect:
>>>> 
>>> [BB] That section is about unresponsive traffic gaining advantage from using the ECN capability.
>>> I think you're talking about gaining advantage from being unresponsive, which is not in doubt.
>>> 
>>> To be clearer, I'll add "... unresponsive traffic gains any advantage from using the ECN capability (relative to being unresponsive without ECN)"
>>> 
>>>> https://sce.dnsmgr.net/results/ect1-2020-05-20-s11-ce-unresponsive/l4s-s11-ce-unresponsive/
>>> [BB] What am I looking at here? Sorry, I'm not familiar with the framing of these tests.
>>> 
>>> ___________
>>> [BB] Whatever, thank you v much for your continued testing, reviewing and comments.
>>> 
>>> Cheers
>>> 
>>> 
>>> Bob
>>> 
>>> -- 
>>> ________________________________________________________________
>>> Bob Briscoe
>>> http://bobbriscoe.net/
> 
>