Re: [tsvwg] [Technical Errata Reported] RFC7141 (7237)

Sebastian Moeller <moeller0@gmx.de> Fri, 22 September 2023 14:42 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <073c8aed-91f4-a11a-771e-9932032cedba@bobbriscoe.net>
Date: Fri, 22 Sep 2023 16:41:30 +0200
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, tsvwg@ietf.org, bob.briscoe@bt.com, jukka.manner@aalto.fi, RFC Errata System <rfc-editor@rfc-editor.org>
Message-Id: <25626F56-AA8E-4B7D-904E-F17E2B57642E@gmx.de>
References: <20221104094005.747A455F68@rfcpa.amsl.com> <4aef3037-fae5-68c9-661f-4ce89b1ce7e7@erg.abdn.ac.uk> <273A82C1-E675-4950-A7E0-E8C564B09834@gmx.de> <6672b32e-19b6-b295-1460-904481de2c83@erg.abdn.ac.uk> <1351054E-7647-40CA-B2FA-7A566DE09E24@gmx.de> <f02cfbb6-9a14-0c70-4986-358b9226033f@erg.abdn.ac.uk> <CC3F2650-2CC7-4EC9-B0BC-2200D482CDEC@gmx.de> <073c8aed-91f4-a11a-771e-9932032cedba@bobbriscoe.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/3oF63P_QiixN0ktv1kRvFsqDUVQ>
Subject: Re: [tsvwg] [Technical Errata Reported] RFC7141 (7237)

Dear Bob,

> On Sep 21, 2023, at 18:28, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastian,
> 
> I've just read your erratum, which is about §2.2 & §2.3:
>     https://www.rfc-editor.org/errata/rfc7141
> and I just read this thread, which is more about §2.4 (sorry I missed these at the time).
> See [BB]
> 
> 
> 
> On 04/11/2022 13:37, Sebastian Moeller wrote:
>> Hi Gorry,
>> 
>> 
>> 
>>> On Nov 4, 2022, at 14:03, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
>>>  wrote:
>>> 
>>> On 04/11/2022 12:42, Sebastian Moeller wrote:
>>> 
>>>> Hi Gorry,
>>>> 
>>>> 
>>>> 
>>>>> On Nov 4, 2022, at 11:56, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
>>>>>  wrote:
>>>>> 
>>>>> On 04/11/2022 10:43, Sebastian Moeller wrote:
>>>>> 
>>>>>> Hi Gorry,
>>>>>> 
>>>>>> See [SM] below.
>>>>>> 
>>>>>> On 4 November 2022 11:20:56 CET, Gorry Fairhurst 
>>>>>> <gorry@erg.abdn.ac.uk>
>>>>>>  wrote:
>>>>>> 
>>>>>>> Commenting as an individual on the Errata filing:
>>>>>>> 
>>>>>>> On 04/11/2022 09:40, RFC Errata System wrote:
>>>>>>> 
>>>>>>>> The following errata report has been submitted for RFC7141,
>>>>>>>> "Byte and Packet Congestion Notification".
>>>>>>>> 
>>>>>>>> --------------------------------------
>>>>>>>> You may review the report below and at:
>>>>>>>> 
>>>>>>>> https://www.rfc-editor.org/errata/eid7237
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --------------------------------------
>>>>>>>> Type: Technical
>>>>>>>> Reported by: Sebastian Moeller 
>>>>>>>> <moeller0@gmx.de>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Section: 2
>>>>>>>> 
>>>>>>>> Original Text
>>>>>>>> -------------
>>>>>>>> 2.2.  Recommendation on Encoding Congestion Notification
>>>>>>>> 
>>>>>>>>     When encoding congestion notification (e.g., by drop, ECN, or PCN),
>>>>>>>>     the probability that network equipment drops or marks a particular
>>>>>>>>     packet to notify congestion SHOULD NOT depend on the size of the
>>>>>>>>     packet in question.
>>>>>>>> [...]
>>>>>>>> 2.3.  Recommendation on Responding to Congestion
>>>>>>>> 
>>>>>>>>     When a transport detects that a packet has been lost or congestion
>>>>>>>>     marked, it SHOULD consider the strength of the congestion indication
>>>>>>>>     as proportionate to the size in octets (bytes) of the missing or
>>>>>>>>     marked packet.
>>>>>>>> 
>>>>>>>>     In other words, when a packet indicates congestion (by being lost or
>>>>>>>>     marked), it can be considered conceptually as if there is a
>>>>>>>>     congestion indication on every octet of the packet, not just one
>>>>>>>>     indication per packet.
>>>>>>>> 
>>>>>>>>     To be clear, the above recommendation solely describes how a
>>>>>>>>     transport should interpret the meaning of a congestion indication, as
>>>>>>>>     a long term goal.  It makes no recommendation on whether a transport
>>>>>>>>     should act differently based on this interpretation.  It merely aids
>>>>>>>>     interoperability between transports, if they choose to make their
>>>>>>>>     actions depend on the strength of congestion indications.
>>>>>>>> 
>>>>>>>> Corrected Text
>>>>>>>> --------------
>>>>>>>> I am not sure the text is actually salvageable, as there appears to be a logical disconnect at the core of the recommendations.
>>>>>>>> 
>>>>>>>> Notes
>>>>>>>> -----
>>>>>>>> The recommendations seem not self consistent:
>>>>>>>> A) Section 2.2.  recommends that CE marking should be made independent of packet size, so a CE-mark carries no information about packet size.
>>>>>>>> 
>>>>>>> I did not understand that it needed to.
> 
> [BB] 
> 1/ I've emphasized the words in your erratum that help me see where a first misunderstanding might be:
>> A) Section 2.2. recommends that CE marking should be made independent of packet size, so a CE-mark carries no information about packet size.
>> B) Section 2.3 then recommends to use the size of marked packets as direct indicators of congestion strength.
> 
> An analogy might help explain: When a tail-drop queue discards a packet, or when a flow-agnostic AQM (like PIE or RED) ECN-marks a packet, they do so independent of flow ID. Nonetheless, because there is a flow-ID in each packet, these arbitrary/random algorithms blindly pick the flows that should respond, even though these algorithms don't read the flow-ID (it can even be encrypted). 

	[SM] Yes, in a sense a single/dual-queue AQM treats its traffic as a single aggregate, but it expects that aggregate to respond to congestion signaling. This is less important for drops, and considerably more important for congestion marks (to avoid easy abuse of the marking system). (Side-note, my take on hiding flow-ids is, you are free to do so if you desire, but that decision is not guaranteed to be without side-effects, after all the IETF recommends not to starve any flow/connection, and to be able to do so the network needs information about what constitutes a flow... but I digress)


> 
> Returning to packet-size, most AQMs don't mark dependent on packet size.

	[SM] For a change I agree with you: packet size does not seem a meaningful determinant of a flow's contribution to congestion, so not making a signaling decision contingent on it seems like a better choice than taking it into account. To illustrate, compare two flows that each currently try to use 80% of the bottleneck capacity (for a total of 160%, so we need to signal "slow down", as we are growing the bottleneck queue or shortly will be), one using a (hypothetical) MSS of 1500 octets and the other an MSS of 150 octets. Both contribute equally to the impending (capacity) congestion and both should reduce their cwnd by an equal proportion... I would argue that the relevant factor an AQM might base signaling severity on would be something like the most recent capacity share a flow used or, if you must, its percentage of the current contribution to the queue (best measured in service time or aggregate size). Given that current ECN/L4S offers only a single signaling bit (assuming we want to avoid drops), what an AQM can do is increase the marking frequency*.



*) Mind you, I consider this to be a problem in its own right, especially for single/dual-queue AQMs, as unambiguous signaling of congestion magnitude has been shown in several papers to be beneficial.


> But packets do have a size (and a size field). And larger packets do contribute more per packet to link congestion than smaller packets.


	[SM] Let me cite your own words here:
"applying to just each frame in isolation, which would make absolutely no sense at all"
This, IMHO, applies here as well: marking a packet only makes sense if the flow it belongs to reacts to that mark. And as a corollary, an AQM ideally sends marks preferentially to those flows contributing most to the congestion, but that does not depend on packet size, as I hope I showed in my example above. So yes, if we force ourselves to compare only two random packets out of a shared queue, the larger packet occupies more equivalent service time of that queue than the smaller, but that is as true as it is inconsequential.




REQUEST to list members: If anybody can demonstrate that this is an incorrect interpretation on my side please do so.



> So, if we assume that the marking algorithm has not already taken packet size into account, the strength of a congestion signal for the purpose of congestion response can be taken to depend on the size of the packet it is applied to.

	[SM] Again, I disagree: you are now making a leap of faith from the comparison of two random packets (which I technically agree with) to deducing something about the two flows these packets were picked from. And there I disagree that your assumption generally holds (just think of paths with different pMTU/pMSS). For example, if there were a flow-scheduling bottleneck with an individual AQM for each "flow", then making the response contingent on packet size is arguably exactly the wrong thing. Similarly, if a flow always sends pairs of packets, one large and one small (as seen in some games), then by your logic the flow should respond differently depending on whether the small or the large packet was marked, which I do not consider a defensible position.

 
> 
> And that can be done, even if the people who originally developed the AQM algorithm thought that packet-size dependence ought to be done by the AQM (but it wasn't enabled). 

	[SM] Again, I fully agree with you, packet size is not a robust and reliable predictor of "contribution to congestion" and hence neither AQMs nor end-points should pretend it is. We seem to agree on the former, but not on the latter.

> 
> Next step: If this does capture your misunderstanding on /this/ point, pls say, then either you or I can suggest an edit to avoid the misunderstanding.

	[SM] As I tried to explain above, I do not see a misunderstanding; I do, however, see a logically problematic extrapolation from random packets to flows/connections, which I consider to require quite a lot of evidence to back it up...

> 
> 2/ A second problem might be that RFC7141 doesn't actually outline the original packet-size problem that byte-mode RED was trying to solve. I detect you're not familiar with that history, and why should you be? That /is/ something that could be corrected with an erratum.

	[SM] That is one (not the only) argument from rfc7141 that I am happy to agree with: that marking should not depend on packet-size. However, unlike you, I clearly see that if packet size is not a measure of congestion severity, then it should also not be interpreted as such.

> 
> 
> 3/ Then this thread shows further misunderstanding regarding §2.4 on splitting and merging - see [BB] below...
> 
>>>>>>> This RFC I think was intended to be independent of the transport.  I see the transport sender as responsible for determining the packetisation of the transport segments, and the (S)ACKs can often identify segments, hence the sender can determine the segments that have been acknowledged or times when ECN marking was seen.
>>>>>>> 
>>>>>> [SM] This assumes that relevant segment size does not change along the path. Which generally is not true. Just think fragmentation, if the sender sends a packet that gets fragmented along the path and only a single fragment gets CE marked the sender will see this as the whole packet being marked. Or from the other side of the issue, if say a Linux router uses GRO/GSO and queues a larger meta packet and CE marks that, receiver and sender at best see a sequence of CE marked packets. So the recommendation would need to be changed to calculate the consecutive sequence of CE marked octets and take these as correlate for congestion strength. So no, the sender really has no reliable knowledge about the size of the data unit the marking node marked.
>>>>>> 
>>>>>> 
>>>>> I suggest IETF transports treat all IP fragments as one unit of retransmission/congestion at the transport layer.
>>>>> 
>>>> 	[SM2] But what if the re-segmentation does not happen at the receiver, but say a fragmenting and CE-marking path tries to act transparently. According to the rules both in RFC3168 and RFC7141 a re-segmented packet containing even a single CE-marked fragment is to be CE-marked (or dropped). So the AQM might have marked a 576 octet segment but all the endpoint sees is a marked ~1460 octet segment.
>>>> 	This also illustrates how section 2.4 of RFC7141 proposes a method that does not achieve its aim of giving veridical "number of marked octets" information. It simply is impossible to do so in general (often it will work, but the endpoints cannot even know when it was correct and when not).
>>>> Section 2.4 has more issues BTW: it tries to give recommendations on how to deal with splitting and merging, but fails to achieve its goal of giving a veridical account of the marked octets:
>>>> 
>>>> Let's see what happens when applying the proposed counter method in regards to number of marked octets under the conditions this section addresses
>>>> Here let's look at a toy problem with 20 byte headers and a total payload of 1200 octets that is split in or merged out of 3 fragments/segments with 400 octets payload each
>>>> 
>>>> Merging multiple segments pre-marking:
>>>> (20+400) + (20+400)-20 + (20+400)-20 -> 1220 total 1200 payload
>>>> -> AQM marks 1220 or 1200 octets
>>>> (20+1200)+CE
>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>> sender can assume 1200 octets were marked
>>>> CORRECT
>>>> 
>>>> Merging multiple segments post-marking:
>>>> -> AQM marks segment 2 of 420 or 400 octets
>>>> (20+400) + (20+400+CE) + (20+400)
>>>> (20+400) + (20+400+CE)-20 + (20+400)-20 -> 1220 total 1200 payload + CE
>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>> sender must assume 1200 octets were marked
>>>> FALSE
>>>> 
>>>> Fragmenting a segment pre-marking:
>>>> 1220 -> (20+400) + (20+400) + (20+400)
>>>> -> AQM marks segment 2 of 420 or 400 octets
>>>> (20+400) + (20+400+CE) + (20+400)
>>>> Resegmentation happens before protocol sees marking
>>>> (20+400) + (20+400+CE)-20 + (20+400)-20 -> 1220 total 1200 payload + CE
>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>> sender must assume 1200 octets were marked
>>>> FALSE
>>>> 
>>>> Fragmenting a segment post-marking:
>>>> (20+1200)
>>>> -> AQM marks 1220 or 1200 octets
>>>> (20+1200)+CE
>>>> fragmentation happens:
>>>> (20+400+CE) + (20+400+CE) + (20+400+CE)
>>>> Resegmentation happens before protocol sees marking
>>>> (20+400+CE) + (20+400+CE)-20 + (20+400+CE)-20 -> 1220 total 1200 payload + CE
>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>> sender must assume 1200 octets were marked
>>>> CORRECT
>>>> 
>>>> So only in two out of four conditions does the proposed method actually achieve its goal.
>>>> 
> 
> [BB] You seem to have interpreted §2.4 as applying to just each frame in isolation, which would make absolutely no sense at all.

	[SM] The example above indeed looks like this, but that is mainly to keep the number of conditions to look at low; it is possible that this results in too much simplification.



> It applies to a stream of frames that are being split or merged. Indeed, the text already makes this clear, as follows:
> 
>    even the smallest
>    positive remainder in the conceptual counter should trigger the next
>    outgoing packet to be marked (causing the counter to go negative)
> 
> 
> I detect that you prefer to be fed every little detail, so here's some pseudocode that might help.

	[SM] I respectfully argue that as the author of a method, the onus is on you to describe it in sufficient detail. Whether you consider that spoon feeding or not is irrelevant.



>     https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing_goal2_pseudo.c
> It's written in terms of frame decapsulation, not fragment reassembly, but you can see it works on the principle of ignoring the sizes of any headers that are not preserved during the process.
> 
> Applying the pseudocode to your examples:
> Ignoring the timeout block for a moment, you should be able to see that, in the two cases you've tagged 'FALSE', _diff will become negative (-800) after the reassembled packet in either of your 'FALSE' examples is marked. Then if more 400-byte-payload fragments arrived, it would take three marked ones to make _diff positive again, causing the packet reassembled from last to be marked before forwarding. 

	[SM] Sure this is averaging out quite nicely, but it absolutely requires temporal averaging, because at its core it is not correct frame by frame.


> 
> This carrying over of the balance only continues as long as the time between incoming marks is less than CE_TIMEOUT. If not, you can see that the counters reset to become equal. The value of CE_TIMEOUT is just for illustration.
> 
> The use of WRAP_GUARD is just a cheap emulation of modulo arithmetic that allows either counter to wrap to zero before the other and still work regardless.

/* On completion of each reassembled packet */
void update_ecn_out(packet) {
    _diff = ce_oct_in > ce_oct_out;
    if ( (_diff > 0) || (_diff < WRAP_GUARD) {
        ce_mark(packet);    // Irrespective of whether it's already marked
        ce_oct_out += size(packet);   // Size including packet header
    }
}


Not sure WRAP_GUARD (#define WRAP_GUARD -(1<<63)) does that poor man's modulo thing you mention... Given that _diff is either 1 or 0 (in "_diff = ce_oct_in > ce_oct_out;" the ">" compares left against right), the first test (_diff > 0) is true exactly when ce_oct_in exceeds ce_oct_out, and the second test (_diff < WRAP_GUARD) can never be true... I would guess that somewhere you wanted to use the real difference between ce_oct_in and ce_oct_out.

Now, I might well misread that code, after all I am no compiler, but that illustrates why actual code compared to pseudocode seems desirable...


> 
>>>> 
>>>> Now add the complication that the RFC fails to mention what it considers marked octets, just the payload or payload+headers.
>>>> 
> 
> [BB]  This RFC is not specifying a protocol, it's giving a statement of principle as a goal of future protocol design. Any protocol designer (or professional implementer) would be able to fill in details like which headers to include or ignore; or how to confine each instance of the algorithm to a stream of packets that had passed through the same AQM with the same type of ECN codepoint. 
> 
> Next steps:  Everyone else who discussed this section grokked the idea immediately.

	[SM] My take on this is that nobody even tried to implement this at all, and hence everyone politely refrained from engaging with the proposal at that level of scrutiny.


> So I don't think more explanation is needed to understand §2.4. Certainly pseudocode would seem overkill, given splitting and merging is a fairly minor part of this RFC. But I'll leave that decision for whoever has to decide on errata - the AD I think?




> 
> 
>>>> This is important as the sum of payload + headers of X fragments is larger than the sum payload + header of the single packet re-constituted out of these fragments. So the de-fragmenting process arguably needs to only look at payload size, but RFC7141 section 2.4 does not make that explicit.
>>>> If an implementation actually uses the full size instead of the payload size now the last condition also gets it wrong:
>>>> 
>>>> Fragmenting a segment post-marking:
>>>> (20+1200)
>>>> -> AQM marks 1220 or 1200 octets
>>>> (20+1200)+CE
>>>> fragmentation happens:
>>>> (20+400+CE) + (20+400+CE) + (20+400+CE)
>>>> Resegmentation happens before protocol sees marking
>>>> (20+400+CE) + (20+400+CE) + (20+400+CE) -> 1220 total 1200 payload + CE
>>>> but (20+400)*3 = 1260 marked octets
>>>> receiver sees 1200 octets with CE and ACKs these with ECE
>>>> sender must assume 1200 octets were marked
>>>> CORRECT
>>>> But now the left-over 40 bytes in the marked-octet budget will result in CE feedback for the next (re-assembled) packet.
>>>> FALSE
>>>> 
>>>> For rfc3168 that will not matter much as ECE is sustained until CWR is received anyway, but L4S style signaling now acquired an erroneous CE mark.
>>>> 
>>> Network fragmentation, be it in tunnels, extension headers or IPv4 fragments, is indeed fraught with all manner of issues. Nothing new - the IETF has long recommended the unit of loss/marking to be the same as the end to end PDU. PMTU is tricky, but does have benefits:-)
>>> 
>> 	[SM3] Not a fan of fragmentation either, but I assume that fragmentation will stay a fact of life over the internet independent of my opinion. My point here is if the IETF proposes a method that aims to correctly account for CE-marked octets, that method should actually deliver on its premise. Failing in 2-3 out of 4 conditions the method is designed to handle is IMHO a sign that the proposed method is/was not in proper shape for becoming an official recommendation. 
>> This is fine in an informational RFC as documenting subjective opinion, but problematic in a standards or BCP type document, if like in this case we feel that BCP methods (even incomplete or impossible to achieve ones) are binding precedence that need to be respected in later RFC drafts.
>> 
> 
> [BB] Instead of implying that everyone involved has been acting unprofessionally, or incompetently, it is good practice to word emails in such a way that allows for the possibility that you just haven't understood something.

	[SM] Yes, that is indeed good advice, thank you for that. However, it also ignores my argument that, especially in a BCP, the onus is on us to make sure our recommendations are water-tight, correct and useful.

> 
>> 
>>>> 
>>>>> GSO/GRO and variants would/could change the fragmentation, that is true and need to be considered.
>>>>> 
>>>> 	[SM2] I am confused. How do GRO/GSO affect fragmentation? IMHO these two will cause larger aggregates that exist only locally (Linux will segment meta-packets in the sending process and will not send out, say, a large 64K TCP packet in fragments, but will re-segment the meta-packet into a neat sequence of complete, self-contained TCP packets). IMHO they primarily affect the unit size the AQM might CE-mark on, in a way that is opaque to the end-points. My point is that the unit size an AQM acts on is generally not precisely knowable by the end-points. At which point making the end-points pretend that congestion strength somehow correlates with the size of marked packets really stops making sense.
>>>> 
>>>> 
>>> The segment delivered can be a different size to the unit of transmission. This is an implementation optimisation - if this is done without regard to the marking, then the results will be different and likely do not deliver what is expected - optimisations need to understand what they optimise.
>>> 
>> 	[SM3] Yes, we seem to agree that it is impossible for endpoints to veridically measure the number of octets actually CE-marked, for a number of reasons. IMHO it follows directly from this observation that basing end-point decisions on the number of marked octets is not going to work in general, as that number is not robustly and reliably available at the end-point. That is, end-points do see a number, but that number is not guaranteed to match what actually happened at the marking entity, and hence it cannot be a correlate of congestion strength. Interpreting that number nevertheless as an indicator of congestion strength hence seems sub-optimal and not something to recommend unconditionally.
> 
> [BB] With a better understanding of preserving marking probability when re-framing, I hope you can now see that it would be possible for endpoints to respond based on packet size of congestion indications (even if any re-framing includes approximations).
> 
>> 
>>>>>>>> B) Section 2.3 then recommends to use the size of marked packets as direct indicators of congestion strength.
>>>>>>>> 
>>>>>>>> C) Section 2.3 then later clarifies that transports should interpret the size of CE-marked packets as correlate for congestion strength but are in no way required to take this interpretation into account when acting based on the congestion signal.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> This has several problems:
>>>>>>>> 1) A) and B) are in direct contradiction to each other. If we ask marking nodes to ignore packet size while marking, but end nodes to take it into account we basically create random congestion strength "information" by the pure chance of a specific packet of a specific size "catching" a CE mark. At which point we might as well simply draw a random number at the end-point to interpret congestion strength (except that packet sizes are not distributed randomly).
>>>>>>>> 
>>>>>>>> 2) Asking endpoints to interpret CE_marks in this way but not act on it, is hardly actionable advice for potential implementers. If we can not recommend a specific way, we should refrain from offering recommendations at all to keep things as simple as reasonably possible.
>>>>>>>> 
>>>>>>> This doesn't appear to be textual errata, it seems more like the request is for more clarification or motivating an alternative?
> 
> [BB] I hope I have explained for Sebastian how A) & B) being different is a way of exploiting random selection, not a contradiction.
> Re #2, the draft doesn't say "don't act on it"; it says it's not mandatory, which is very different (and necessary when protocols already exist that do not comply).

	[SM] I thank you for your attempt at explaining these contradictory recommendations. I am sad to report that you failed to do so in a satisfactory fashion. End-points should respond to the best possible estimate of congestion along the path, and if that information is independent of packet size, then interpreting packet size when deciding on a response is simply not acting on the best possible estimate... IMHO this looks like an attempt at increasing "fairness" at the bottleneck for flows of different packet sizes, but not a robust and reliable one. If we desire better bottleneck-sharing behaviour, we already know how to accomplish that: by managing the bottleneck better and making better marking decisions (e.g. by taking a flow's contribution to congestion into account when deciding what and when to mark). Trying to solve this specific problem from the end-points via heuristics like scaling the response to the size of received marked packets is IMHO not a fruitful way forward.


Regards
	Sebastian


> 
> 
> Bob
> 
>>>>>> [SM] What alternatives to changing incorrect text do exist? I do not think changing the status to historic is a realistic option in spite of the text recommending the impossible.
>>>>>> 
>>>>>> 
>>>>> Put simply:
>>>>> 
>>>>> An Erratum would normally specify either:
>>>>> 
>>>>>     a direct change of text to fix a mistake in production, but not a change of the spec from the original intended method;
>>>>> 
>>>>>     or specify something to inform a future revision.
>>>>> 
>>>>> An update in a new RFC is needed to change the method, or a process request to mark an RFC as historic.
>>>>> 
>>>> 	[SM2] Would it also be possible to request to re-classify as informative? This RFC with its impossible recommendations is causing issues with other RFCs and I think it would help if this could be ameliorated by moving away from BCP status.
>>>> 
>>>> 
>>> If there is consensus an RFC shouldn't be associated with a BCP, we can examine what to do. The first thing is to write a (short) ID and see if you can gain sufficient attention from the WG to enable this to be discussed.
>>> 
>> 	[SM3] Thank you very much. Is there an example for a similar process I could look at and take inspiration from?
>> 
>> Regards
>> 	Sebastian
>> 
>> 
>> 
>>> Gorry
>>> 
>>> 
>>>>> Gorry
>>>>> 
>>>>> 
>>>>>>>> Instructions:
>>>>>>>> -------------
>>>>>>>> This erratum is currently posted as "Reported". If necessary, please
>>>>>>>> use "Reply All" to discuss whether it should be verified or
>>>>>>>> rejected. When a decision is reached, the verifying party
>>>>>>>> can log in to change the status and edit the report, if necessary.
>>>>>>>> 
>>>>>>>> --------------------------------------
>>>>>>>> RFC7141 (draft-ietf-tsvwg-byte-pkt-congest-12)
>>>>>>>> --------------------------------------
>>>>>>>> Title               : Byte and Packet Congestion Notification
>>>>>>>> Publication Date    : February 2014
>>>>>>>> Author(s)           : B. Briscoe, J. Manner
>>>>>>>> Category            : BEST CURRENT PRACTICE
>>>>>>>> Source              : Transport Area Working Group
>>>>>>>> Area                : Transport
>>>>>>>> Stream              : IETF
>>>>>>>> Verifying Party     : IESG
>>>>>>>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/