Re: [tsvwg] [Technical Errata Reported] RFC7141 (7237)

Sebastian Moeller <moeller0@gmx.de> Sun, 01 October 2023 17:20 UTC

From: Sebastian Moeller <moeller0@gmx.de>
Date: Sun, 01 Oct 2023 19:19:24 +0200
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, TSVWG <tsvwg@ietf.org>, jukka.manner@aalto.fi, RFC Errata System <rfc-editor@rfc-editor.org>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/IzXpEYEsBv6NTvwe2OgN0Dv9c6Y>

Hi Bob,


> On Sep 30, 2023, at 11:22, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastian, pls see [BB2]
> 
> On 22/09/2023 15:41, Sebastian Moeller wrote:
>> Dear Bob,
>> 
>> 
>>> On Sep 21, 2023, at 18:28, Bob Briscoe <ietf@bobbriscoe.net>
>>>  wrote:
>>> 
>>> Sebastian,
>>> 
>>> I've just read your erratum, which is about §2.2 & §2.3:
>>>     
>>> https://www.rfc-editor.org/errata/rfc7141
>>> 
>>> and I just read this thread, which is more about §2.4 (sorry I missed these at the time).
>>> See [BB]
>>> 
>>> 
>>> 
>>> On 04/11/2022 13:37, Sebastian Moeller wrote:
>>> 
>>>> Hi Gorry,
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Nov 4, 2022, at 14:03, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
>>>>> 
>>>>>  wrote:
>>>>> 
>>>>> On 04/11/2022 12:42, Sebastian Moeller wrote:
>>>>> 
>>>>> 
>>>>>> Hi Gorry,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Nov 4, 2022, at 11:56, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
>>>>>>> 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> On 04/11/2022 10:43, Sebastian Moeller wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> Hi Gorry,
>>>>>>>> 
>>>>>>>> See [SM] below.
>>>>>>>> 
>>>>>>>> On 4 November 2022 11:20:56 CET, Gorry Fairhurst 
>>>>>>>> 
>>>>>>>> <gorry@erg.abdn.ac.uk>
>>>>>>>> 
>>>>>>>>  wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Commenting as an individual on the Errata filing:
>>>>>>>>> 
>>>>>>>>> On 04/11/2022 09:40, RFC Errata System wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> The following errata report has been submitted for RFC7141,
>>>>>>>>>> "Byte and Packet Congestion Notification".
>>>>>>>>>> 
>>>>>>>>>> --------------------------------------
>>>>>>>>>> You may review the report below and at:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> https://www.rfc-editor.org/errata/eid7237
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --------------------------------------
>>>>>>>>>> Type: Technical
>>>>>>>>>> Reported by: Sebastian Moeller 
>>>>>>>>>> 
>>>>>>>>>> <moeller0@gmx.de>
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Section: 2
>>>>>>>>>> 
>>>>>>>>>> Original Text
>>>>>>>>>> -------------
>>>>>>>>>> 2.2.  Recommendation on Encoding Congestion Notification
>>>>>>>>>> 
>>>>>>>>>>     When encoding congestion notification (e.g., by drop, ECN, or PCN),
>>>>>>>>>>     the probability that network equipment drops or marks a particular
>>>>>>>>>>     packet to notify congestion SHOULD NOT depend on the size of the
>>>>>>>>>>     packet in question.
>>>>>>>>>> [...]
>>>>>>>>>> 2.3.  Recommendation on Responding to Congestion
>>>>>>>>>> 
>>>>>>>>>>     When a transport detects that a packet has been lost or congestion
>>>>>>>>>>     marked, it SHOULD consider the strength of the congestion indication
>>>>>>>>>>     as proportionate to the size in octets (bytes) of the missing or
>>>>>>>>>>     marked packet.
>>>>>>>>>> 
>>>>>>>>>>     In other words, when a packet indicates congestion (by being lost or
>>>>>>>>>>     marked), it can be considered conceptually as if there is a
>>>>>>>>>>     congestion indication on every octet of the packet, not just one
>>>>>>>>>>     indication per packet.
>>>>>>>>>> 
>>>>>>>>>>     To be clear, the above recommendation solely describes how a
>>>>>>>>>>     transport should interpret the meaning of a congestion indication, as
>>>>>>>>>>     a long term goal.  It makes no recommendation on whether a transport
>>>>>>>>>>     should act differently based on this interpretation.  It merely aids
>>>>>>>>>>     interoperability between transports, if they choose to make their
>>>>>>>>>>     actions depend on the strength of congestion indications.
>>>>>>>>>> 
>>>>>>>>>> Corrected Text
>>>>>>>>>> --------------
>>>>>>>>>> I am not sure the text is actually salvageable, as it appears to be a logic disconnect at the core of the recommendations.
>>>>>>>>>> 
>>>>>>>>>> Notes
>>>>>>>>>> -----
>>>>>>>>>> The recommendations do not seem self-consistent:
>>>>>>>>>> A) Section 2.2.  recommends that CE marking should be made independent of packet size, so *a CE-mark carries no information about packet size*.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> I did not understand that it needed to.
>>>>>>>>> 
>>> [BB] 
>>> 1/ I've emphasized the words in your erratum that help me see where a first misunderstanding might be:
>>> 
>>>> A) Section 2.2. recommends that CE marking should be made independent of packet size, so a CE-mark carries no information about packet size.
>>>> B) Section 2.3 then recommends to use the size of marked packets as direct indicators of congestion strength.
>>>> 
>>> An analogy might help explain: When a tail-drop queue discards a packet, or when a flow-agnostic AQM (like PIE or RED) ECN-marks a packet, they do so independent of flow ID. Nonetheless, because there is a flow-ID in each packet, these arbitrary/random algorithms blindly pick the flows that should respond, even though these algorithms don't read the flow-ID (it can even be encrypted). 
>>> 
>> 	[SM] Yes, in a sense a single/dual-queue AQM treats its traffic as a single aggregate, but it expects that aggregate to respond to congestion signaling. This is less important for drops, and considerably more important for congestion marks (to avoid easy abuse of the marking system). (Side-note, my take on hiding flow-ids is, you are free to do so if you desire, but that decision is not guaranteed to be without side-effects, after all the IETF recommends not to starve any flow/connection, and to be able to do so the network needs information about what constitutes a flow... but I digress)
> 
> [BB2] I was just giving this as another example of a mechanism that selects some attribute of packets (in this case flow ID) by randomly selecting packets without looking at the attribute. I didn't intend or expect anyone to start pronouncing on the merit of the technology that I used as an example.

	[SM2] Well, if you do not want something to be discussed, maybe do not introduce it into the conversation?


> 
>> 
>>> Returning to packet-size, most AQMs don't mark dependent on packet size.
>>> 
>> 	[SM] For a change I agree with you, 
> 
> [BB2] Again, I didn't intend or expect anyone to agree or disagree with the mechanism in this example. The example was merely given to show that the text from your erratum (that I have again emphasized below) is incorrect:
>     "Section 2.2.  recommends that CE marking should be made independent of packet size, so *a CE-mark carries no information about packet size*.
> 
> All I was proving was that:
> A marked packet carries information about flow ID, packet size, DSCP, IP version, TTL, etc irrespective of whether those attributes were used to select the packet for marking. 
> 
>> packet size does not seem a meaningful determinant of a flow's contribution to congestion, so not making the signaling decision contingent on it seems like a better choice than taking it into account. To illustrate this: if we compare two flows, each currently using up 80% of the bottleneck capacity (for a total of 160%, so we need to signal "slow down" as we are, or shortly will be, growing the bottleneck queue), one using a (hypothetical) MSS of 1500 and the other an MSS of 150 octets, both contribute equally to the impending (capacity) congestion and both should reduce their cwnd by an equal proportion... 
> 
> [BB2] But 10 times more of the smaller packets will be flowing through this AQM, so its size-independent marking will select 10 times more of the smaller packets. That doesn't seem to matter in your toy example with 2 flows and heavy overload, but it does matter in a more realistic example, say...
> * half the capacity is consumed by 100 capacity-seeking flows with MSS=1500 and
> * the other half is consumed by another 100 capacity-seeking flows with MSS=150, 
> * and within each flow there are a large number of round trips between each mark.

	[SM2] Please do not pivot away from my example, I gave it for a reason... On a home link that is mostly idle, having only a few active flows is not a rare occurrence (extrapolating from average usage numbers on residential internet access links, which sit in the low double-digit Mbps range largely decoupled from actual link capacity, it is clear that 200 concurrent capacity-seeking flows is not the norm, and hence any scheme that relies on averaging over large numbers will not really fare all that well).

> Then, your example AQM is 10 times more likely to select packets from the flows with the smaller MSS. So the 100 flows with smaller MSS react more often and gradually consume a smaller and smaller share of the capacity. 

	[SM2] Yes, and this is exactly a reason why single-queue AQMs are clearly not the way to go: they suffer from a bias against smaller packets and against longer RTTs, and any scheme trying to counter that from the end-points suffers from not knowing what happens at the bottleneck.
If you shape your response as if you were using a specific packet size, you end up betting on knowing the packet sizes of the competing flows (unless all flows used exactly the same scheme, which is neither the current state of things nor likely the future one). Similar issues appear when trying to normalize the response based on RTT from the end-points. IMHO this is a fool's errand without getting the network to help.
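(As a back-of-the-envelope illustration of the mark-rate asymmetry we are both referring to here; the rate and probability values below are purely illustrative, not taken from this thread:)

  /* Illustration only (my numbers): marks per second seen by two flows at equal
   * bit rate when the AQM marks each packet with the same size-independent
   * probability p. */
  #include <stdio.h>

  int main(void) {
      double rate_bps = 50e6;          /* assumed bit rate of each flow */
      double p = 0.01;                 /* assumed per-packet marking probability */
      int mss[2] = {1500, 150};        /* the two packet sizes from the example */

      for (int i = 0; i < 2; i++) {
          double pkts_per_s  = rate_bps / (8.0 * mss[i]);
          double marks_per_s = p * pkts_per_s;
          printf("MSS %4d: %8.0f pkt/s -> %5.0f marks/s\n",
                 mss[i], pkts_per_s, marks_per_s);
      }
      /* At equal bit rate the 150-octet flow sees ten times as many marks;
       * whether each response should then also scale with the (ten times
       * smaller) marked packet size is exactly the point in dispute. */
      return 0;
  }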


> This is the sort of problem that the early designers of AQMs were faced with. Some decided to solve the problem by adding packet-size dependent marking to AQMs. That was the problem that motivated RFC7141 to be written (which I've admitted isn't described in the intro). 

	[SM2] But we all agree that marking should not depend on packet size, so let this rest; it seems only historically relevant.


> First RFC7141 showed that size-dependent marking had rarely, if ever, been enabled. 
> 
> Then RFC7141 said there would be an interop mess if some AQMs tried to do packet-size dependent marking and others didn't, so it recommended only size-independent marking.
> 
> It wouldn't be so much of a problem if some CCAs did packet-size dependent response and others didn't, because most flows use PMTU-sized packets anyway, and if any used smaller packets it would be down to them if they still inflicted harm on *themselves* by responding as much to congestion as other flows did. That's why RFC7141 gives size-dependent response as a long-term goal, without requiring a CCA to act on it.

	[SM2] I am sorry, this still makes little sense: a size-dependent response makes assumptions about the sizes of the competing flows, which might work for the common case but is not generally robust and reliable. We can do much better, and hence we should recommend better solutions instead of arguing over which band-aid is preferable.


>> I would argue that the relevant factor an AQM might base signaling severity on would be something like the most recent capacity share a flow used, or, if you must, the percentage of current contribution to the queue (best measured in service time or aggregate size). Given that current ECN/L4S offers only a single signaling bit (assuming we want to avoid drops), what an AQM can do is increase the marking frequency*.
> 
> [BB2] Applications can open multiple flows, and larger flows cause congestion over a longer time than smaller flows. But let's not get into a debate about the merits of per-flow mechanisms in the network, which is beyond the scope of this RFC and this erratum. This RFC dealt with non-FQ AQMs as they were, and as they are still.
> 
> 
>> *) Mind you, I consider this to be a problem in its own right, especially for single/dual-queue AQMs, as unambiguous signaling of congestion magnitude has been shown in several papers to be beneficial.
>> 
>> 
>> 
>>> But packets do have a size (and a size field). And larger packets do contribute more per packet to link congestion than smaller packets.
>>> 
>> 
>> 	[SM] Let me cite your own words here:
>> "applying to just each frame in isolation, which would make absolutely no sense at all"
>> 
> 
> [BB2] The context you have removed this quote of my words from was about carrying a byte count over from one marked frame to the next frame, which is why I said applying it to each frame in isolation made no sense.
> 
>> This IMHO applies here as well: marking a packet only makes sense if the flow it belongs to reacts to that mark. As a corollary, an AQM should ideally send marks preferentially to those flows contributing most to the congestion, and that does not depend on packet size, as I hope I showed in my example above. So yes, if we force ourselves to only compare two random packets out of a shared queue, the larger packet occupies more of that queue's service time than the smaller one, but that is as true as it is inconsequential.
>> 
>> REQUEST to list members: If anybody can demonstrate that this is an incorrect interpretation on my side please do so.
> 
> [BB2] I hope I have shown that it is not inconsequential in realistic examples with large numbers of flows. 


	[SM2] No, you pivoted away from an example much more realistic at the edge of the network, where an AQM is realistically used (even today), and replaced it with an example more relevant for a backbone (200 concurrent bulk flows); I note that backbones, to my knowledge, are typically operated without AQMs but with sufficient over-provisioning so that congestion is rare there. I am an end-user and hence look at these things from an end-user perspective primarily, but I am not hiding this position.


> 
> 
>> 
>>> So, if we assume that the marking algorithm has not already taken packet size into account, the strength of a congestion signal for the purpose of congestion response can be taken to depend on the size of the packet it is applied to.
>>> 
>> 	[SM] Again, I disagree: you are now making a leap of faith from the comparison of two random packets (which I technically agree with) to deduce something about the two flows these two packets were picked from. And there I disagree that your assumption generally holds (just think of paths with different pMTU/pMSS).
> 
> [BB2] It's not a leap of faith. It's a way to design algorithms that use repeated operations over large numbers of packets by exploiting probability (like the AIMD algorithms).

	[SM2] Relying on the "law of large numbers" can work quite well on average, but will likely have pretty nasty outliers. For timely congestion control this is IMHO clearly a problematic mode of operation.


> The outcomes of the repetitive algorithms in existing CCAs have been derived as response functions and validated. So once you understand how those repetitive algorithms lead to those outcomes, you can modify the repeated operations to achieve the effect you want.
> 
> For instance, the well-known response function for the steady-state window (W) of Reno, which is a good approximation as long as p is small, is:
>     W = √(K/p), 
> where K is approx 3/2
> and bit rate,
>     x = s/R * W
>        =  s/R * √(K/p), 
> where s is the SMSS, R is the RTT and p is the marking/dropping probability.
> 
> It is well-known and widely validated that Reno flows with smaller SMSS converge on a proportionately lower rate, as can be seen from the presence of s in the above formula. This is simply because the additive increase per RTT is 1 SMSS, so a smaller SMSS will increase more slowly.

	[SM2] This is not in dispute, but I argue it is not relevant here, sorry.
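(For readers following along, the response function quoted above can be evaluated numerically; this is only a rough sketch with illustrative values for p and R:)

  /* Rough numeric sketch of the response function quoted above,
   * W = sqrt(K/p) and x = s/R * W; all input values are illustrative. */
  #include <math.h>
  #include <stdio.h>

  int main(void) {
      double K = 1.5;            /* Reno constant, approx 3/2 */
      double p = 0.001;          /* assumed marking/dropping probability */
      double R = 0.05;           /* assumed RTT in seconds */
      int smss[2] = {1500, 150};

      for (int i = 0; i < 2; i++) {
          double W = sqrt(K / p);               /* steady-state window in segments */
          double x = smss[i] * 8.0 / R * W;     /* bit rate in bit/s */
          printf("SMSS %4d: W = %.1f segments, x = %.2f Mbit/s\n",
                 smss[i], W, x / 1e6);
      }
      /* With the same p and R, the 150-octet flow converges on one tenth of
       * the bit rate: the size bias discussed in this thread. */
      return 0;
  }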



> 
>> Like, if there were a flow-scheduling bottleneck with an individual AQM for each "flow", then making the response contingent on packet size is arguably exactly the wrong thing. 
> 
> [BB2] I don't know what led you to believe that, but it's wrong in two ways.
> 
> 1/ Any packet-size dependence of a CCA will not determine flow rate anyway if there's an FQ scheduler at the bottleneck constraining each flow's rate (in FQ each CoDel AQM adjusts the time between marks independently for each flow so that each flow builds a similar queue in each equal-rate lane). 

	[SM2] Almost... longer-RTT flows, reacting more slowly to signals and with more data in flight, will transiently tend to have larger queue-size variations than shorter-RTT flows. I am on purpose not saying anything about average queue size, as that is IMHO not all that relevant if we want timely congestion response.


> 
> Packet-size dependent CCAs are only necessary for ensuring equal flow rates when flows all share a bottleneck queue, and therefore cannot have different marking or dropping probabilities. Over an FQ, that doesn't make them wrong; it just makes them unnecessary.

	[SM2] Again explain how these can work unless we either have:
a) veridical information about the packet size distribution of the competing flows
b) all involved flows use the same algorithm to normalize their response to a reference packet size.

My understanding is that they cannot. They will likely work okay for a typical bottleneck with typical packet size distributions, but will fail if e.g. the MTU over the bottleneck is 536 and our 150-octet flow scales as if the size were 1500 (I also note that ~1500 is what is typically achievable, but depending on where you are, larger MTUs might also be operational).
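(To make concrete what "normalize their response to a reference packet size" would mean, a minimal sketch; REFERENCE_MSS and the exact scaling rule are my assumptions, RFC7141 only states the principle:)

  /* Minimal sketch of a packet-size-normalized congestion response for a
   * Reno-style multiplicative decrease. REFERENCE_MSS and the scaling rule are
   * illustrative assumptions. */
  #include <stdio.h>

  #define REFERENCE_MSS 1500.0   /* the common reference all flows would need to share */

  /* New cwnd (bytes) after one congestion mark on a packet of marked_size bytes:
   * the usual halving is scaled by marked_size / REFERENCE_MSS. */
  double respond(double cwnd_bytes, double marked_size) {
      double beta = 0.5 * (marked_size / REFERENCE_MSS);
      return cwnd_bytes * (1.0 - beta);
  }

  int main(void) {
      printf("mark on 1500 B packet: cwnd 100000 -> %.0f\n", respond(100000, 1500));
      printf("mark on  150 B packet: cwnd 100000 -> %.0f\n", respond(100000, 150));
      /* The decrease only comes out "right" relative to competing flows if they
       * all use the same REFERENCE_MSS (or know each other's packet sizes),
       * which is the objection raised above. */
      return 0;
  }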



> 
> 2/ If you're thinking that you'd like the CCA not to need the FQ-CoDel AQMs to converge on a different time between marks for flows with different SMSS,

	[SM2] I do not; where did I give the impression I did? The core of FQ scheduling is giving each long-enough-duration flow approximately the amount of back-pressure required to achieve the desired offered traffic rate. I am not claiming, and have not claimed, that TCP does not have a size (and RTT) bias; my argument is that the end-points cannot really fix this on their own, so they need help from the network. We have data showing that such a scheme works pretty well. (Side-note: cake solved the flow-inflation issue essentially by adding an additional per-internal-IP fairness round before the per-flow scheduling, restricting the fall-out of flow-inflating applications to the host running them*).


*) And given that the operator might not actually consider that flow-inflation problematic (e.g. torrents) there might be no perceived fall-out at all.


> then you're already out of luck, 'cos existing Reno-Friendly CCAs already cause the CoDel AQMs to adjust to a longer time between marks for flows with smaller SMSS.
> 
> Specifically, for flows constrained to equal bit rates by the FQ scheduler, the ratio of the inter-mark times (T) that CoDel reaches for flows with small SMSS (index 1) and large SMSS (index 2) will be roughly as follows:
> Reno: 
>     T1/T2 = s2/s1
> Reno-SP: 
>     T1/T2 = s1/s2
> where s is the SMSS, so s1 and s2 are respectively the sizes of the smaller and larger SMSS.
> And Reno-SP (standing for small packets) is an invented algorithm that, whatever the SMSS, keeps the bit rate the same as it would have been if the SMSS were s2 (the larger SMSS that it considers as a reference), all else being unchanged.

	[SM2] Bob, I will not check your math here and will just stipulate for the discussion that it is correct, but this is as interesting as it is irrelevant; equal marking frequency is not a data point I am interested in.

> So as not to break up the flow of this already long email, see {Note 1} at the end for the working that proves the above two equations.
> Briefly, the factors that lead to these results are:
> * During a certain time between marks applied by CoDel, there will be more small packets than large.
> * FQ aims to equalize the bit rates of flows, but Reno aims to equalize their windows (in number of SMSS-sized segments, not in bits).

	[SM2] Again probably correct and IMHO definitely irrelevant.


>> Similarly, if a flow always sends pairs of packets, one large and one small (as seen in some games), by your logic the flow should respond differently depending on whether the small or the large packet was marked, which I consider not to be a defensible position.
> 
> [BB2] Why not?

	[SM2] Because you are again reading meaning into something the AQM did not put there; you could IMHO in theory better model this case as two packets of average size and generate a response scaled to that... 


> That's the robust way to design an algorithm: a repetitive process that responds differently to markings on different-sized packets in order to achieve the desired outcome dependent on the relative prevalence of different sized packets (picked at random by the marking process). Algorithms like this are robust compared to an alternative that might maintain an average packet size, which would then require an arbitrary choice of the averaging period, which would then never be right in all scenarios, eg small packets might sometimes be evenly spread, or other times in bursts, etc etc.

	[SM2] Well, you simply move the assumption about the "average packet size" of the competing traffic to the end-point.


>>> And that can be done, even if the people who originally developed the AQM algorithm thought that packet-size dependence ought to be done by the AQM (but it wasn't enabled). 
>>> 
>> 	[SM] Again, I fully agree with you, packet size is not a robust and reliable predictor of "contribution to congestion" and hence neither AQMs nor end-points should pretend it is. We seem to agree on the former, but not on the latter.
> 
> [BB2] You are agreeing with not making AQM algorithms dependent on packet size.

	[SM2] Yes.

> But you're not agreeing with me (nor with RFC7141) when you say 'packet size is not a robust and reliable predictor of "contribution to congestion"'. I admit that the size of a packet is only a robust and reliable indicator of its contribution to congestion if the congested resource is bit-congestible. However, that is the common case (but not universal - see rfc-editor.org/rfc/rfc7141.html#section-5.2 ). So I'd only go as far as saying packet size is a generally useful indicator of "contribution to congestion".

	[SM2] Under some assumptions there will be a correlation, hence my "robust and reliable predictor" qualification, so we also seem to agree here to some degree.

> So far, I've detected that you are hesitant to make the response of a CCA depend on the size of arbitrary packets picked by an AQM.

	[SM2] Again, I am; this is because I want my congestion response to be timely and correct and hence would very much like to remove the need to average over time... 
 

> But, here you're saying something much more controversial. But you're not saying *why* you think packet size is not a robust and reliable predictor of "contribution to congestion". 

	[SM2] Getting back to my two-flow example: both flows contribute equally to congestion, but taking packet size into account we would have to assume the smaller-packet flow contributes less... This clearly shows the limits of the correlation and hence the limits of robustness.

> If you follow your position to its logical conclusion...
> Would you *mandate* that, when a CCA (say Reno or CUBIC in Reno-friendly mode) has a smaller pMTU, it MUST end up with a lower bit-rate than if the pMTU is larger (like it does now, because it increases less, but decreases the same). Surely that is a perverse position to adopt.

	[SM2] No, I am saying that the CCA cannot rely on the sizes of marked packets to robustly and reliably counter that size bias. To do so it would need robust and reliable information about what happens at the bottleneck and how the competing flows look and behave... To give an extreme example: if our hypothetical 150-octet flow is on a path with MTU 151 and it still responds as if it had MTU 1500, it will not fall behind the other flows but take a larger share of the capacity. Now this is NOT a realistic example of a situation likely to happen, but it clearly shows that end-points cannot reliably extrapolate the magnitude of congestion from marked packet size. The kicker is again that they would need to know what the competing flows do...


> The recommendation in RFC7141 is that CCAs are recommended to respond dependent on packet size (for their own performance), but they can respond independent of packet-size (harming themselves) if they choose. I can't see why anyone would be against that, and you haven't given any reason, only an assertion.

	[SM2] Because it is not robust and reliable, I prefer to act on causal reasoning, not pure correlation.


> 
>>> Next step: If this does capture the your misunderstanding on /this/ point, pls say, then either you or I can suggest an edit to avoid the misunderstanding.
>>> 
>> 	[SM] As I tried to explain above, I do not see a misunderstanding; I do, however, see a logically problematic extrapolation from random packets to flows/connections, one that I consider to require quite a lot of evidential data to back it up...
> 
> [BB2] I understand that, and I hope I have helped with my earlier explanations.
> 
> 
> 
>> 
>>> 2/ A second problem might be that RFC7141 doesn't actually outline the original packet-size problem that byte-mode RED was trying to solve. I detect you're not familiar with that history, and why should you be? That /is/ something that could be corrected with an erratum.
>>> 
>> 	[SM] That is one (not the only) argument from rfc7141 that I am happy to agree with: that marking should not depend on packet-size. However, unlike you, I clearly see that if packet size is not a measure of congestion severity, then it should also not be interpreted as such.
>> 
>> 
>>> 
>>> 3/ Then this thread shows further misunderstanding regarding §2.4 on splitting and merging - see [BB] below...
>>> 
>>> 
>>>>>>>>> This RFC I think was intended to be independent of the transport.  I see the transport sender as responsible for determining the packetisation of the transport segments, and the (S)ACKs can often identify segments, hence the sender can determine the segments that have been acknowledged or the times when ECN marking was seen.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> [SM] This assumes that the relevant segment size does not change along the path, which generally is not true. Just think of fragmentation: if the sender sends a packet that gets fragmented along the path and only a single fragment gets CE-marked, the sender will see this as the whole packet being marked. Or, from the other side of the issue, if say a Linux router uses GRO/GSO and queues a larger meta-packet and CE-marks that, receiver and sender at best see a sequence of CE-marked packets. So the recommendation would need to be changed to calculate the consecutive sequence of CE-marked octets and take these as a correlate of congestion strength. So no, the sender really has no reliable knowledge of the size of the data unit that the marking node marked.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> I suggest IETF transports treat all IP fragments as one unit of retransmission/congestion at the transport layer.
>>>>>>> 
>>>>>>> 
>>>>>> 	[SM2] But what if the re-segmentation does not happen at the receiver, but, say, a fragmenting and CE-marking path tries to act transparently? According to the rules in both RFC3168 and RFC7141, a re-segmented packet containing even a single CE-marked fragment is to be CE-marked (or dropped). So the AQM might have marked a 576-octet segment but all the endpoint sees is a marked ~1460-octet segment.
>>>>>> 	This also illustrates how section 2.4 of RFC7141 proposes a method that does not achieve its aim of giving veridical "number of marked octets" information. It is simply impossible to do so generally (often it will work, but the endpoints cannot even know when it was correct and when it was not).
>>>>>> Section 2.4 has more issues, BTW: it tries to give recommendations on how to deal with splitting and merging but fails to achieve its goal of giving a veridical account of the marked octets:
>>>>>> 
>>>>>> Let's see what happens when applying the proposed counter method, with regard to the number of marked octets, under the conditions this section addresses.
>>>>>> Here, let's look at a toy problem with 20-byte headers and a total payload of 1200 octets that is split into or merged out of 3 fragments/segments with 400 octets of payload each:
>>>>>> 
>>>>>> Merging multiple segments pre-marking:
>>>>>> (20+400) + (20+400)-20 + (20+400)-20 -> 1220 total 1200 payload + CE
>>>>>> -> AQM marks 1220 or 1200 octets
>>>>>> (20+1200)+CE
>>>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>>>> sender can assume 1200 octets were marked
>>>>>> CORRECT
>>>>>> 
>>>>>> Merging multiple segments post-marking:
>>>>>> -> AQM marks segment 2 of 420 or 400 octets
>>>>>> (20+400) + (20+400+CE) + (20+400)
>>>>>> (20+400)+(20+400+CE)-20+(20+400)-20 = 1220 total 1200 payload + CE
>>>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>>>> sender must assume 1200 octets were marked
>>>>>> FALSE
>>>>>> 
>>>>>> Fragmenting a segment pre-marking
>>>>>> 1220 -> (20+400) + (20+400) + (20+400)
>>>>>> -> AQM marks segment 2 of 420 or 400 octets
>>>>>> (20+400) + (20+400+CE) + (20+400)
>>>>>> Resegmentation happens before protocol sees marking
>>>>>> (20+400) + (20+400+CE)-20 + (20+400)-20 -> 1220 total 1200 payload + CE
>>>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>>>> sender must assume 1200 octets were marked
>>>>>> FALSE
>>>>>> 
>>>>>> Fragmenting a segment post-marking
>>>>>> (20+1200)
>>>>>> -> AQM marks 1220 or 1200 octets
>>>>>> (20+1200)+CE
>>>>>> fragmentation happens:
>>>>>> (20+400+CE) + (20+400+CE) + (20+400+CE)
>>>>>> Resegmentation happens before protocol sees marking
>>>>>> (20+400+CE) + (20+400+CE)-20 + (20+400+CE)-20 -> 1220 total 1200 payload + CE
>>>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>>>> sender must assume 1200 octets were marked
>>>>>> CORRECT
>>>>>> 
>>>>>> So only in two out of four conditions does the proposed method actually achieve its goal.
>>>>>> 
>>>>>> 
>>> [BB] You seem to have interpreted §2.4 as applying to just each frame in isolation, which would make absolutely no sense at all.
>>> 
>> 	[SM] The example above does indeed look like that, but that is mainly to keep the number of conditions to look at low; it is possible that this results in too much simplification.
>> 
>> 
>> 
>> 
>>> It applies to a stream of frames that are being split or merged. Indeed, the text already makes this clear, as follows:
>>> 
>>>    even the smallest
>>>    positive remainder in the conceptual counter should trigger the next
>>>    outgoing packet to be marked (causing the counter to go negative)
>>> 
>>> 
>>> I detect that you prefer to be fed every little detail, so here's some pseudocode that might help.
>>> 
>> 	[SM] I respectfully argue that as the author of a method, the onus is on you to describe it in sufficient detail. Whether you consider that spoon feeding or not is irrelevant.
>> 
> 
> [BB2] Did you think the text description in the draft meant something other than what the pseudocode describes? If so what was misleading or ambiguous?
> 
>> 
>>>     https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing_goal2_pseudo.c
>>> 
>>> It's written in terms of frame decapsulation, not fragment reassembly, but you can see it works on the principle of ignoring the sizes of any headers that are not preserved during the process.
>>> 
>>> Applying the pseudocode to your examples:
>>> Ignoring the timeout block for a moment, you should be able to see that, in the two cases you've tagged 'FALSE', _diff will become negative (-800) after the reassembled packet in either of your 'FALSE' examples is marked. Then if more 400-byte-payload fragments arrived, it would take three marked ones to make _diff positive again, causing the packet reassembled from last to be marked before forwarding. 
>>> 
>> 	[SM] Sure this is averaging out quite nicely, 
> 
> [BB2] And would you say that it's not complex?

	[SM2] Too complex for my taste, compared to filling one variable when reading a frame and simply putting the mark on the first (or all) IP packets; that will work fine assuming your separation of streams by the en-framer, and even if ECN states are mixed within frames it will not take much more than one extra conditional. See the sketch below.
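(To spell out the simpler alternative I mean, a minimal sketch; the function names and structure are of course mine:)

  /* My reading of the simpler alternative described above: remember whether the
   * current incoming frame was CE-marked and transfer that mark to the first
   * (or every) IP packet reconstructed from it. No counters, timeout or wrap. */
  #include <stdbool.h>
  #include <stdio.h>

  static bool frame_was_marked = false;

  /* Called for each incoming L2 frame. */
  void on_frame(bool ce_marked) {
      frame_was_marked = ce_marked;
  }

  /* Called for each IP packet reconstructed from the current frame;
   * returns true if the outgoing packet should carry a CE mark. */
  bool on_packet(bool mark_all) {
      bool mark = frame_was_marked;
      if (!mark_all)
          frame_was_marked = false;   /* variant: mark only the first packet */
      return mark;
  }

  int main(void) {
      on_frame(true);                  /* a CE-marked frame arrives */
      printf("packet 1 marked: %d\n", on_packet(false));
      printf("packet 2 marked: %d\n", on_packet(false));
      return 0;
  }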

> 
> (BTW, I've fixed a bug in the wrap aspect.)

	[SM2] But that bug already convinced me that pseudocode is really just a waste of everybody's time unless it is the first step toward an actual implementation. The issue with pseudocode is that, unlike real code, it does not even get the structural checks you get from running it through a compiler.

> 
>> but it absolutely requires temporal averaging, because at its core it is not correct frame by frame.
> 
> [BB2]  I'm not sure, but I think you're saying that, if one compares the proportions of marked frames and marked packets over a window of one frame, you can only get them to be closer to equal if you take moving averages of each measurement. {Note 2}

	[SM2] No, I am saying that to get at the marking probability of the AQM we would need to average over multiple frames, and then we could apply the same marking probability to the reconstructed IP packets. If the AQM conveys information in the marking probability that we want to pass on to the end-points, that is what we would need to do. But we generally cannot do so in a timely fashion, as we would need to average over multiple frames to get a reliable estimate of the marking probability (assuming the marking probability actually stays constant over a long enough stream of frames, and that the AQM did not pick specific frames on purpose).
I am not debating that we can try to conserve marked octets; I am arguing that there is very little reason to actually do so, as it seems not in any meaningful way superior to the two simpler alternatives, and neither manages to veridically transfer the L2 marking...


> However, that shouldn't be confused with trying to improve the algorithm itself by using averaging within the algorithm. That would always make the algorithm worse. Addition of any form of averaging would require an arbitrary choice of averaging window, which would make the algorithm less correct for smaller measurement windows. Importantly, it would also make the algorithm less responsive to changes in signalling intensity.

	[SM2] But the value we really claim we want to conserve is the marking probability of the L2 AQM, and with a rate-coded signal we have no way of doing that robustly and reliably without averaging (the short-term observed probability switches between 0 and 1, but clearly for a RED-like AQM these zeros and ones are just realizations of a latent "marking probability proportional to the queue size").
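(To illustrate what such averaging would look like, a minimal EWMA sketch; the gain value is an arbitrary assumption and is exactly the averaging-window choice I object to having to make:)

  /* Sketch of recovering a marking probability from a 0/1 mark stream by
   * exponentially weighted averaging. The gain of 1/16 is an arbitrary assumption. */
  #include <stdio.h>

  static double p_est = 0.0;

  /* Call once per received frame; marked is 0 or 1. */
  void observe_frame(int marked) {
      const double gain = 1.0 / 16.0;
      p_est += gain * ((double)marked - p_est);
  }

  int main(void) {
      /* Feed 200 frames from a source whose true marking probability is 0.1
       * (every 10th frame marked, deterministically, for simplicity). */
      for (int i = 0; i < 200; i++)
          observe_frame((i % 10) == 9);
      printf("estimated marking probability: %.3f\n", p_est);
      /* The estimate needs many frames to settle, which is the timeliness
       * problem described above. */
      return 0;
  }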



> Of course, the proportion of marked frames and marked packets certainly each fluctuate up and down around each other. However, that's not an inherent problem with the algorithm; rather it's inherent to the problem. I mean, inherently, marks can only be applied to whole packets or frames, and we have unaligned packet and frame boundaries. Put it another way, I don't think another algorithm can exist that is more correct, because this algorithm is as correct as it can be at every packet boundary.

	[SM2] Glad we finally agree on this; it effectively answers my long-standing question: "Can we actually predict what the marking AQM would have done had it seen the IP packet stream instead of the L2 frame stream?" So now that we agree that "perfection" is not achievable, the question really boils down to which of the available heuristics can justify the required implementation complexity with the achieved "performance" (as in being close enough to the unachievable true IP marking from the L2 AQM).



> 
> 
>> 
>>> This carrying over of the balance only continues as long as the time between incoming marks is less than CE_TIMEOUT. If not, you can see that the counters reset to become equal. The value of CE_TIMEOUT is just for illustration.
>>> 
>>> The use of WRAP_GUARD is just a cheap emulation of modulo arithmetic that allows either counter to wrap to zero before the other and still work regardless.
>>> 
>> /* On completion of each reassembled packet */
>> void update_ecn_out(packet) {
>>     _diff = ce_oct_in > ce_oct_out;
>>     if ( (_diff > 0) || (_diff < WRAP_GUARD) {
>>         ce_mark(packet);    // Irrespective of whether it's already marked
>>         ce_oct_out += size(packet);   // Size including packet header
>>     }
>> }
>> 
>> 
>> Not sure WRAP_GUARD (#define WRAP_GUARD -(1<<63)) does that poor man's modulo thing you mention... given that _diff is either 1 or 0 (_diff = ce_oct_in > ce_oct_out; the > compares left versus right), and the initial (_diff > 0) check makes sure it is not zero... I would guess that somewhere you wanted to use the real difference between ce_oct_in and ce_oct_out.
>> 
>> Now, I might well be misreading that code, after all I am no compiler, but that illustrates why actual code, compared to pseudocode, seems desirable...
>> 
> 
> [BB2] That was a pseudocode bug (typo). The '>' was meant to be a minus.

	[SM2] Yes, I could clearly see where you wanted to go with this, but it nicely illustrates my point that there is value in simpler solutions.
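(For completeness, my reconstruction of the corrected counter logic, based on your note that the '>' was meant to be a minus; the input-side update is my assumption, and the timeout and counter-wrap handling of the original are omitted:)

  /* Reconstruction (not Bob's actual pseudocode) of the octet-counter carry-over
   * with the '>' replaced by '-' as per the correction above. */
  #include <stdio.h>

  static long ce_oct_in  = 0;   /* CE-marked octets seen on incoming frames */
  static long ce_oct_out = 0;   /* CE-marked octets emitted on outgoing packets */

  /* On each incoming frame. */
  void update_ecn_in(long size, int ce_marked) {
      if (ce_marked)
          ce_oct_in += size;
  }

  /* On completion of each reassembled packet; returns 1 if it should be CE-marked. */
  int update_ecn_out(long size) {
      long diff = ce_oct_in - ce_oct_out;   /* the corrected subtraction */
      if (diff > 0) {
          ce_oct_out += size;               /* may drive the balance negative */
          return 1;
      }
      return 0;
  }

  int main(void) {
      /* The "merging post-marking" case from earlier in the thread: three
       * 420-octet fragments, only the second CE-marked, reassembled into one
       * 1220-octet packet. */
      update_ecn_in(420, 0);
      update_ecn_in(420, 1);
      update_ecn_in(420, 0);
      int marked = update_ecn_out(1220);
      printf("reassembled packet marked: %d, balance: %ld\n",
             marked, ce_oct_in - ce_oct_out);
      /* Balance goes to -800, so later marked octets must first make up that
       * deficit before another reassembled packet is marked: the carry-over
       * Bob describes above. */
      return 0;
  }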


> As I said earlier, pls re-download from the same URL, 'cos I had also noticed another problem with the wrap, even once the typo was corrected. The new pseudocode for comparison with wrapping is a bit opaque, I'm afraid. I corrected it with a different cheap modulo trick.

	[SM2] I will not download and look at this again, as it would serve no purpose for me. You made your point and I made mine (methods 1a and 1b require less state and no timeout or counter-wrap management and hence are conceptually simpler; also, things only get more complex for your solution if we assume the likely case that the en-framer did not already sort different ECN codepoints into different frames).


>>>>>> Now add the complication that the RFC fails to mention what it considers marked octets, just the payload or payload+headers.
>>>>>> 
>>>>>> 
>>> [BB]  This RFC is not specifying a protocol, it's giving a statement of principle as a goal of future protocol design. Any protocol designer (or professional implementer) would be able to fill in details like which headers to include or ignore; or how to confine each instance of the algorithm to a stream of packets that had passed through the same AQM with the same type of ECN codepoint. 
>>> 
>>> Next steps:  Everyone else who discussed this section grokked the idea immediately.
>>> 
>> 	[SM] My take on this is that nobody ever tried to implement this and hence politely declined to engage with the proposal at that level of scrutiny.
>> 
>> 
>> 
>>> So I don't think more explanation is needed to understand §2.4. Certainly pseudocode would seem overkill, given splitting and merging is a fairly minor part of this RFC. But I'll leave that decision for whoever has to decide on errata - the AD I think?
>>> 
>> 
>> 
>> 
>> 
>>> 
>>>>>> This is important, as the sum of payload + headers of X fragments is larger than the payload + header of the single packet reconstituted out of these fragments. So the de-fragmenting process arguably needs to look only at payload size, but RFC7141 section 2.4 does not make that explicit.
>>>>>> If an implementation actually uses the full size instead of the payload size now the last condition also gets it wrong:
>>>>>> 
>>>>>> Fragmenting a segment post-marking
>>>>>> (20+1200)
>>>>>> -> AQM marks 1220 or 1200 octets
>>>>>> (20+1200)+CE
>>>>>> fragmentation happens:
>>>>>> (20+400+CE) + (20+400+CE) + (20+400+CE)
>>>>>> Resegmentation happens before protocol sees marking
>>>>>> (20+400+CE) + (20+400+CE) + (20+400+CE) -> 1220 total 1200 payload + CE
>>>>>> but (20+400)*3 = 1260 marked octets
>>>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>>>> sender must assume 1200 octets were marked
>>>>>> CORRECT
>>>>>> But now the left over 40 bytes in the marked-octet budget will result in CE feedback for the next (re-assembled) packet.
>>>>>> FALSE
>>>>>> 
>>>>>> For RFC3168 that will not matter much, as ECE is sustained until CWR is received anyway, but L4S-style signaling has now acquired an erroneous CE mark.
>>>>>> 
>>>>>> 
>>>>> Network fragmentation, be it in tunnels, extension headers or IPv4 fragments, is indeed fraught with all manner of issues. Nothing new - the IETF has long recommended the unit of loss/marking to be the same as the end-to-end PDU. PMTU is tricky, but does have benefits :-)
>>>>> 
>>>>> 
>>>> 	[SM3] Not a fan of fragmentation either, but I assume that fragmentation will stay a fact of life over the internet independent of my opinion. My point here is that if the IETF proposes a method that aims to correctly account for CE-marked octets, that method should actually deliver on its premise. Failing in 2-3 out of 4 of the conditions the method is designed to handle is IMHO a sign that the proposed method is/was not in proper shape to become an official recommendation.
>>>> This is fine in an informational RFC documenting subjective opinion, but problematic in a standards or BCP type document, if, as in this case, we feel that BCP methods (even incomplete or impossible-to-achieve ones) are binding precedent that needs to be respected in later RFC drafts.
>>>> 
>>>> 
>>> [BB] Instead of implying that everyone involved has been acting unprofessionally, or incompetently, it is good practice to word emails in such a way that allows for the possibility that you just haven't understood something.
>>> 
>> 	[SM] Yes, that is indeed good advice, thank you for that. However, it also ignores my argument that, especially in a BCP, the onus is on us to make sure our recommendations are water-tight, correct and useful.
>> 
>> 
>>>>>>> GSO/GRO and variants would/could change the fragmentation, that is true and need to be considered.
>>>>>>> 
>>>>>>> 
>>>>>> 	[SM2] I am confused: how do GRO/GSO affect fragmentation? IMHO these two cause larger aggregates that exist only locally (Linux will segment meta-packets in the sending process and will not send out, say, a large 64K TCP packet in fragments, but will re-segment the meta-packet into a neat sequence of complete, self-contained TCP packets). IMHO they primarily affect the unit size the AQM might CE-mark, in a way that is not transparent to the end-points. My point is that the unit size an AQM acts on is generally not precisely knowable by the end-points. At which point, making the end-points pretend that congestion strength somehow correlates with the size of marked packets really stops making sense.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> The segment delivered can be a different size to the unit of transmission. This is an implementation optimisation - if this is done without regard to the marking, then the results will be different and likely do not deliver what is expected - optimisations need to understand what they optimise.
>>>>> 
>>>>> 
>>>> 	[SM3] Yes, we seem to agree that it is impossible for endpoints to veridically measure the number of octets actually CE-marked, for a number of reasons. IMHO it follows directly from this observation that basing end-point decisions on the number of marked octets is not generally going to work, as that number is not robustly and reliably available at the end-point. That is, end-points do see a number, but that number is not guaranteed to actually match what happened at the marking entity, and hence this number cannot be a correlate of congestion strength. Interpreting that number nevertheless as an indicator of congestion strength hence seems sub-optimal and not something to unconditionally recommend.
>>>> 
>>> [BB] With a better understanding of preserving marking probability when re-framing, I hope you can now see that it would be possible for endpoints to respond based on packet size of congestion indications (even if any re-framing includes approximations).
>>> 
>>> 
>>>>>>>>>> B) Section 2.3 then recommends to use the size of marked packets as direct indicators of congestion strength.
>>>>>>>>>> 
>>>>>>>>>> C) Section 2.3 then later clarifies that transports should interpret the size of CE-marked packets as correlate for congestion strength but are in no way required to take this interpretation into account when acting based on the congestion signal.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> This has several problems:
>>>>>>>>>> 1) A) and B) are in direct contradiction to each other. If we ask marking nodes to ignore packet size while marking, but end nodes to take it into account we basically create random congestion strength "information" by the pure chance of a specific packet of a specific size "catching" a CE mark. At which point we might as well simply draw a random number at the end-point to interpret congestion strength (except that packet sizes are not distributed randomly).
>>>>>>>>>> 
>>>>>>>>>> 2) Asking endpoints to interpret CE_marks in this way but not act on it, is hardly actionable advice for potential implementers. If we can not recommend a specific way, we should refrain from offering recommendations at all to keep things as simple as reasonably possible.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> This doesn't appear to be textual errata, it seems more like the request is for more clarification or motivating an alternative?
>>>>>>>>> 
>>> [BB] I hope I have explained for Sebastian how A) & B) being different is a way of exploiting random selection, not a contradiction.
>>> Re #2, the draft doesn't say "don't act on it"; it says it's not mandatory, which is very different (and necessary when protocols already exist that do not comply).
>>> 
>> 	[SM] I thank you for your attempt at explaining these contradictory recommendations. I am sad to report that you failed to do so in a satisfactory fashion. End points should respond to the best possible estimate of congestion along the path, and if that information is independent of packet size, then interpreting packet size when deciding on a response is simply not acting on the best possible estimate... 
> 
> [BB2] I've now made a further attempt to explain why this is incorrect logic. Pls reconsider, because many of the tried and tested algorithms used in traffic control (CCAs, AQMs, policers, etc) use similar methods to those you seem to think are "incorrect".

	[SM2] No, you are conflating things here. That an FQ scheduler/AQM adjusts the marking/dropping for each flow to get the desired approximate rate equality is not what I mean; I note that such an AQM will not even look at the packet size, but simply at the behavior we try to equalize (or rather a proxy of that)... So even if such an AQM gives a different marking rate to small-packet flows than to large-packet flows, packet size is not the relevant parameter here. If you do not believe that, exchange the CCA in the small-packet flow with a much more aggressive one; the hallmark of a size-dependent AQM would be that such a flow would get a higher throughput... And as for single-queue AQMs, well, these are better than no AQM, I grant you that, but generally we should replace them ASAP with better designs.
	Now, I argue that end-points really cannot meaningfully use the size of marked packets to deduce a more meaningful measure of the state of congestion along the path. And I also argue that, without knowing what the competing flows' size distribution looks like, a small-packet flow cannot really robustly and reliably scale its response based on packet size.



> 
>> IMHO this looks like an attempt at increasing "fairness" at the bottleneck for flows of different packet sizes, but not a robust and reliable attempt at that. If we desire better bottleneck-sharing behaviour, we already know how to accomplish that by managing the bottleneck better and making better marking decisions (e.g. by taking a flow's contribution to congestion into account when deciding what and when to mark). Trying to solve this specific problem from the end-points via heuristics, like scaling the response to the size of received marked packets, is IMHO not a fruitful way forward.
> 
> [BB2] It is not an attempt at increasing rate 'fairness' per se, but it is an attempt to set down the 'rules of the road' that will ensure interoperability if anyone chooses to design something to increase 'fairness' wrt different packet sizes. 

	[SM2] Then I am missing a general mandate for all flows to respond as if they were using a common reference packet size; without such a common reference, these 'rules of the road' will not result in the desired behaviour. 


> As such, it is not appropriate to say (paraphrasing) "Well I think the whole class of algorithms that do not rely on per-flow queuing ought to be deprecated, and everyone ought to use FQ instead."

	[SM2] I agree to disagree with you here. For FQ schedulers/AQMs, scaling the response by marked packet size is irrelevant busy-work, and for single-queue AQMs doing so will only yield small changes at increased complexity. But humor me and show data that your proposed method actually delivers and is robust against arbitrary path characteristics (like pMTU). As I said before, I will (occasionally grudgingly) accept that data can settle such disputes. And I consider the onus to supply supporting data to be on the party making such claims.


> This RFC covers the whole set of traffic control algorithms: CCAs, AQMs, etc. without assuming FQ but allowing for it. You have raised an erratum on this RFC. I have explained why the RFC says what it says, which has taken a huge amount of my own time.

	[SM] Thank you for your responses.


> You seem to have stopped disagreeing with the network aspects of the draft (AQM and splitting/merging).

	[SM] Where did I do that? I still maintain that your goal 2 is a) generally unachievable, b) considerably more complex than the alternatives, and c) that you have shown no data demonstrating that your added complexity actually results in a better congestion response than the two alternative options for goal 1 (nor explained how we would assess that). 


> Now that we are left with the CCA response part, you seem to have decided that you would rather duck the question of whether any statement is actually wrong, and resorted instead to saying you wish the whole system didn't exist in the form that it does, and it all ought to be based on FQ.

	[SM] Then let me be explicit: the argument that an end-point can meaningfully scale its congestion response by the size of the marked packet is _generally_ incorrect. As I have explained repeatedly above, doing so either requires knowledge about the "scaling aggressiveness" of the competing flows or that all flows use the same size-adaptation method. Neither is true in the present situation, and IMHO both are also unlikely in the future.
	Now, if flows cycled through different packet sizes quickly, and AQMs encoded congestion magnitude based on the size of the packet they mark, the end-points could get a better estimate of the true congestion and hence should take packet size into account when evaluating marks. But that is not the case that RFC7141 recommends.


> Perhaps it would help you to be more constructive if you imagined an Internet where there was widespread per-flow rate policing (or FQ). Then consider how CCAs ought to deal with smaller packets without triggering any punishment from the policers.

	[SM] I assume that by policer you mean a traffic shaper (with a queue) rather than a strict rate limiter (packets exceeding a limit get dropped); how would a punishment for small packets come about in such a world? Any greedy flow, independent of packet size, is expected to ramp up until it hits a limit, either application-limited or network-limited; in the latter case I expect the flow to eventually receive a signal (be it a CE mark or a drop) and to respond to that signal.
See https://dl.acm.org/doi/pdf/10.5555/230719.230732 for how DRR does a decent job of arbitrating between flows of different packet sizes pretty efficiently; given that data I see no punishment for small packets. But please elaborate on what you meant here.
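(For reference, a minimal sketch of the DRR deficit/quantum idea from that paper, showing why flows of different packet sizes get equal byte shares; the quantum and the traffic mix are illustrative:)

  /* Minimal sketch of deficit round robin (after Shreedhar & Varghese): the
   * quantum, packet sizes and round count are illustrative assumptions. */
  #include <stdio.h>

  #define QUANTUM 1500   /* bytes credited to each backlogged flow per round */

  int main(void) {
      int  pkt_size[2] = {1500, 150};   /* the two flows from this discussion */
      long deficit[2]  = {0, 0};
      long sent[2]     = {0, 0};

      for (int round = 0; round < 1000; round++) {
          for (int f = 0; f < 2; f++) {
              deficit[f] += QUANTUM;
              /* Send head-of-line packets while credit remains
               * (both flows assumed permanently backlogged). */
              while (deficit[f] >= pkt_size[f]) {
                  deficit[f] -= pkt_size[f];
                  sent[f]    += pkt_size[f];
              }
          }
      }
      printf("flow with 1500 B packets: %ld bytes served\n", sent[0]);
      printf("flow with  150 B packets: %ld bytes served\n", sent[1]);
      /* Both flows are served the same number of bytes, i.e. the scheduler
       * itself imposes no penalty for using small packets. */
      return 0;
  }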

Regards
	Sebastian

> 
> 
> _______________________________
> [BB2] Notes
> 
> {Note 1} As promised, there follows the working for the FQ-CoDel example given earlier
> 
> Different long-running ECN flows use either all small (index 1) or all large (index 2) SMSS packets 
> Some flows of each size use Reno and some use Reno-SP, where Reno-SP is an invented algorithm defined earlier.
> It doesn't matter how many flows of each - all that's important is that FQ keeps them all to the same bit rate
> 
> Terminology:
> x: flow bit rate
> s: SMSS of all packets within a flow
> p: flow marking probability
> R: RTT of flow (assumed all the same, so as not to distract from the focus on size-dependence)
> T: inter-mark time converged on by CoDel AQM in steady-state.
> K: the Reno constant, roughly 3/2
> I've used unicode encoding, so apologies if you're still living in an ASCII world.
> 
> A) Reno
>     x1 = s1/R * √(K/p1)
>     x2 = s2/R * √(K/p2)
>     x1 = x2
> => s1/√p1 = s2/√p2
> => p1/p2 = (s1/s2)²                    (1)
> 
> A marking probability (assumed low) can be converted to a time interval between marks as follows:
>     T1 = s1/(p1*x1)
>     T2 = s2/(p2*x2)
> => T1/T2 = s1/s2 * p2/p1        (2)
> Substituting from eqn (1)
>     T1/T2 = s1/s2 * (s2/s1)²
>                = s2/s1
> 
> B) Reno-SP
>     x1 = s2/R * √(K/p1)
>     x2 = s2/R * √(K/p2)
>     x1 = x2
> => p1 = p2                                (3)
> Substituting into eqn (2), which still applies here
>     T1/T2 = s1/s2
> 
> 
> {Note 2}: I would add that, wherever the frame window is measured up to, the packet window has to be measured up to a point half a packet earlier in order to get the most accurate comparison. That's because the algorithm deliberately marks packets as early as possible (when the balance exceeds just 1 byte) to avoid any part of the congestion signal at the end of a congestion episode being delayed. That's a "good thing", so shifting the packet measurement earlier shouldn't be considered as "incorrect".

	[SM] Side-note: this essentially moves congestion marks between frames, an operation that assumes the marking entity did not select specific frames purposefully... that is likely true, but by no means guaranteed.


> 
> 
> 
> Regards
> 
> 
> Bob
> 
> 
>> 
>> 
>> Regards
>> 	Sebastian
>> 
> [snip]
> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/