Re: [tsvwg] Progress with draft-ietf-tsvwg-ecn-encap-guidelines

Bob Briscoe <ietf@bobbriscoe.net> Thu, 21 September 2023 16:45 UTC

Message-ID: <e80e355b-098b-ede4-71cd-560f2b480538@bobbriscoe.net>
Date: Thu, 21 Sep 2023 17:45:14 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.0
Content-Language: en-GB
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <23c00fae-e6a6-072c-0513-1c0d5c637c17@bobbriscoe.net> <3d442824-722f-90af-8d04-916b29bafca4@erg.abdn.ac.uk> <D2E7D6FA-C39D-44B4-BC27-8897CE24145C@gmx.de> <bdc9685f-77b1-50f6-63b7-8b167d850148@bobbriscoe.net> <C4D0E327-32E9-42E8-850F-DFF579612CD0@gmx.de> <52b12bcc-1aa7-9069-2b21-aeb00e9e39db@bobbriscoe.net> <646319E7-78D3-4BBF-9EC7-F069CE7124BA@gmx.de>
From: Bob Briscoe <ietf@bobbriscoe.net>
In-Reply-To: <646319E7-78D3-4BBF-9EC7-F069CE7124BA@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/DHAq1duJlbYibgL2cGVmAGInC64>
Subject: Re: [tsvwg] Progress with draft-ietf-tsvwg-ecn-encap-guidelines

Sebastian,

On 13/09/2023 11:47, Sebastian Moeller wrote:
> Hi Bob,
>
>
>> On Sep 12, 2023, at 17:19, Bob Briscoe <in@bobbriscoe.net> wrote:
>>
>> Sebastian,
>>
>> The draft is now going forward.
> 	[SM] As expected (and announced by the chairs). Had I not mentioned that method 2 was incorrect/incomplete in the past we would have ratified this long ago, in spite of it being incorrect. This does not fill me with confidence about our process.
>
>
>> But I will still respond...
>>
>> On 08/09/2023 08:23, Sebastian Moeller wrote:
>>> Bob,
>>>
>>>
>>>
>>>> On Sep 7, 2023, at 16:50, Bob Briscoe <in@bobbriscoe.net> wrote:
>>>>
>>>> Sebastian,
>>>>
>>>> Due to the impasse between the views of two 'camps', some time ago the chairs asked me (as editor of the draft) to write both goals in the draft without stating a preference for either. And two descriptions of example ways they might be implemented.
>>>>
>>> 	[SM] Yes, based on the observation that the same method was already sketched out in an earlier BCP. However, I question the validity of doing so given that:
>>>
>> [BB] I've addressed your point (c) first, because it is the root of the misunderstanding.
>>
>>> c) it flies in the face of how a flow should IMHO operate, it should try to generate the best possible estimate of the real state of congestion along the path and then react appropriately. And since IMHO no AQM marks based on packet-size (and hence does not effectively mark individual octets), it makes no sense to propagate ECN marks in an octet preserving fashion, as that does add noise to the data, making it harder to get a veridical estimate of what happened.
>> [BB] Just because the word 'octet' appears in the technique, doesn't make its marking size-dependent.
> 	[SM] Yes that is the point, rfc7141 recommends to mark size independent, but to interpret the marking size dependent, the whole rationale behind your method 2 hence must be to make this size dependent interpretation possible by conserving this in spite of the re-framing.

[BB2] The two ends are different (size-independent and size-dependent). 
So, there's no point trying to infer what "the whole rationale ... must 
be" by picking one end or the other.

>> I understand that the chairs of tsvwg have already asked you to write up a draft of your arguments against RFC7141, if you have any. This is the constructive way expected at the IETF.
>
> 	[SM] I opted to create an erratum for RFC 7141 instead, and the response to that (and lack thereof) convinced me that writing a new draft is going to be an exercise in futility, but I digress.


[BB2] I've just responded to your erratum 
<https://www.rfc-editor.org/errata/rfc7141> on the tsvwg list.
Subject: [Technical Errata Reported] RFC7141 (7237)

So pls read that before continuing with this thread.


>
>> On the contrary, after decap, preserving marked octets preserves size-independence better than preserving the presence of marks. I'll try to explain that at the end of this response (in A3).
> 	[SM] See, the real challenge here is to figure out post hoc how the marking entity would have decided, had it seen the IP packets individually... as this does not really seem achievable, I see the best approach as coming up with something that is simple and still covers the gist of the congestion signaling.

[BB2] Again, see thread about your erratum to RFC7141, and below.
TL;DR: it is achievable.

>
>> Before that, my first assertion might have raised a question in your mind:
>>      Q1. In the draft, why do I say goal 1 is to preserve proportion, but the example preserves octets? Especially given that using octets seems to cause controversy.
> 	[SM] Why is it confused to assume that this method was motivated by rfc7141's recommendation to take the size of marked packets into account when "interpreting" congestion marks?

[BB2] RFC7141 isn't actually written like that. It is broken into 
sections, where each is about a different part of the process: Encoding, 
Responding, or Splitting / Merging packets. RFC7141 never says 
congestion notifications can or should be 'interpreted' universally 
across all these stages (without specifying what is doing the interpreting).

(Except for that sentence in §2.4 that I already said (further down the 
last round of this thread) that I would now disown; where it wrongly 
says that octet preservation for splitting/merging is based on the 
principle of responding to congestion as if every octet of a marked 
packet is marked.)

> RFC7141 mainly talks about proportionality to packet size, not proportionality to proportion of marked bytes. (With the possible exception of Appendix B.1:
>    "Packet-mode drop actually gives flows sufficient information to
>     measure their loss rate in bits per second, if they choose, not just
>     packets per second.  Each flow can count the size of a lost or marked
>     packet and scale its rate response in proportion (as TFRC-SP does).)"
>
> If the justification to add method 2 is truly rfc7141's precedence then I would expect that method two actually is a consequence of rfc7141, which strictly does not seem to be the case.
>
> Side-note: looking at TFRC-SP I get:
> In TFRC-SP, the loss event rate is calculated by counting at most one
>     loss event in loss intervals longer than two round-trip times, and by
>     counting each packet lost or marked in shorter loss intervals.
>
> This implies that TFRC-SP does indeed not look at the size of marked/lost packets when registering marks... so not sure the TFRC-SP reference here is useful.
>
>
>
>
>> That might lead to a second question, which I'll also answer below:
>>      Q2. Why preserve marking proportion anyway? Is it "the best possible estimate of the real state of congestion along the path"?
>>
>> A1. Why preserving octets preserves proportion
>>
>>
>> Imagine a stream of packets with their headers all run back-to-back as a stream of octets then cut up into frame payloads at L2.
>>      For instance, consider scenario a) at: https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing.svg (which excludes frame headers)
>>
>> Now, take any window of data, e.g. the first 3 frame payloads (top left). 1/3 are marked pink.
>> Now count the packets encapsulated inside those 3 frames. There are about 15. So proportion preserving wants 1/3 of 15 = 5 to be marked. The marked octet preserving technique in the Goal2 row marks 6 packets, which is near enough, given it deliberately rounds up.
> 	[SM]  Looking at your example, I immediately note that you did NOT depict the variant to achieving goal 1 by marking all packets somehow "touched" by the marked L2-frame (aka method 1a)... this will result in something in between goal 1b and goal 2 ...

[BB2] I didn't depict method 1a, because I do not believe it implements 
Goal1 and I'm arguing against Goal1 here (which I believe only method 1b 
implements). Nonetheless, as you know, the chairs asked me (as editor) 
to include both methods 1a & 1b verbatim in the draft, in order to 
document the lack of consensus.

However, as you brought it up, here's the problem with *method 1a*:
Consider scenario b) at: 
https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing.svg
If there were a Method 1a row, an episode with infrequent frames being 
marked would translate into a run of 100% packet marking (or near-100%); 
similar to the right-hand episode in the Goal1 row of scenario b). Thus 
limiting the dynamic range available for frame marking (why that's such 
a problem is explained further below wrt method 1b).

So, having already shown (below) that Method 1b is problematic, both 
methods that purport to implement Goal 1 are problematic. That's 
because, as I've argued on this list, the rationale of Goal 1 is incorrect.

> [SM] ...with considerably less complexity than goal two (with its requirement for timeout and to account for dropped frames/packets). My point is goal 2 seems pretty complex with very little to show for it in regards to data demonstrating that is superior than methods 1a and 1b that both are considerably simpler in scope and implementation.

[BB2] So, you say method 2 has "got very little to show for it",
... other than you seem to be tending towards agreeing that it's the 
only one that works robustly - you just don't like the implementation!?

But you have made this complexity pronouncement without having seen code 
for either method and you don't know the specifics of the protocols 
involved, or even whether it's for hardware or software.

I'm not going to descend into "my pseudocode is less complex than 
yours."  But I've given pseudocode for method 2, so that others can 
judge complexity, depending on their particular scenario and protocols 
involved:
https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing_goal2_pseudo.c

Re your specific points on complexity:
   * Accounting for dropping due to incompatible ECN fields:
       o Method 2 doesn't need to because each ECN type is already 
classified into a separate stream of frames.
       o Methods 1a & 1b:
          - will either need code branches to handle all the possible 
combinations of ECN types,
          - or they will also have to rely on classification into a 
stream per type.
   * Timeout:
       o I'll leave others to judge whether a single timeout is complex.

To preserve proportion (Goal2) I proposed method 2 in preference to 
other candidates because, during normal operation, there is no 
branching, making it ideal for pipelining, and I've shown how to avoid a 
lock by not sharing writing of the balance between in and out. Also, in 
method 2, there are only two state variables per ECN class, whereas both 
method 1a & 1b require ECN state per-packet.
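
For illustration only, here is a minimal Python sketch of the marked-octet balance idea behind method 2. It is not the pseudocode linked above: the function name, the (payload_octets, marked) frame representation and the example sizes are assumptions made for this sketch, and it leaves out the timeout and the per-ECN-class separation discussed in this thread.

```python
def decap_marks(frames, packet_sizes):
    """Propagate frame marks to packets by preserving marked octets.

    frames:       list of (payload_octets, marked) tuples, in arrival order
                  (hypothetical representation; frame headers excluded).
    packet_sizes: sizes in octets of the packets carried in those payloads,
                  in order. Assumed to sum to the total payload octets.
    Returns a list of booleans, one per packet: True if it is forwarded marked.
    """
    balance = 0   # marked octets received minus marked octets forwarded
    arrived = 0   # payload octets received so far
    needed = 0    # octets that must have arrived to complete the next packet
    marks = []
    frame_iter = iter(frames)
    for size in packet_sizes:
        needed += size
        # Pull in frames until the whole packet has arrived.
        while arrived < needed:
            payload, marked = next(frame_iter)
            arrived += payload
            if marked:
                balance += payload
        # Mark while any marked octets remain; this rounds up, and the
        # resulting negative balance is paid back by later marked frames.
        mark = balance > 0
        if mark:
            balance -= size
        marks.append(mark)
    return marks


# Three 1500-octet payloads, the middle one marked, carrying 15 packets.
example_frames = [(1500, False), (1500, True), (1500, False)]
example_marks = decap_marks(example_frames, [300] * 15)
```

In this example 5 of the 15 packets come out marked, matching the 1/3 of frames that were marked. With unaligned sizes (say six 700-octet packets and one 300-octet packet in the same three frames) the sketch marks three packets, carrying 2100 octets against 1500 marked octets received, which is the rounding-up behaviour.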

>
>> Proportion is preserved because, when marking is approximately independent of frame and packet size:
>>      marked frames / total frames ~= marked octets before decap / total octets
>>      marked packets / total packets ~= marked octets after decap / total octets
>> In both cases frame headers are excluded, but packet headers are included, which makes total octets the same.
> 	[SM] This is fine, the question is still, is this complexity actually warranted.

[BB2] Complexity... according to your assessment.

>
>> By ensuring marked octets are preserved before and after decap, both the ratios on the right will be identical. This makes both the marking ratios on the left (frame and packet) approximately the same.
>>
>> A2. Why preserve proportion?
>>
>> For the same traffic and link scenario, irrespective of whether the decap preserves presence (Goal1) or proportion (Goal2), congestion control algorithms (CCAs) and the AQM will adjust so that the sum of the flows still fits into the link.
>>
>> The only reason to preserve marking proportion is to avoid the proportion of marks being shifted too far outside its normal operating range. I.e. not so high that it more often saturates at 100% and not so low that the AQM has to emit marks so far apart that control becomes very slack or jumpy.
> 	[SM] Then please show that methods 1a and/or 1b actually affect the proportion of marked bytes sufficiently strongly to make this more than a theoretical musing.

[BB2] Please don't trivialise other people's arguments with swipes 
like this.

>
>> For example, let's consider scenario a) in https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing.svg again. With the Goal1 decap (2nd row assuming the second implementation bullet from the draft), even though there are about 3 packets in each marked frame, only one packet gets marked each time.
> 	[SM] Yes, that is a direct consequence of the propagate the number of congestion marks policy that is at the heart of method 1b.
>
>
>> Assuming the IP packets in a frame will often belong to separate flows, this means fewer flows see each L2 mark. So the system will adjust to increase the L2 marks, until enough flows respond. But the AQM cannot mark more than 100% of the frames, so it cannot mark more than about 1 in 3 of the packets.
> 	[SM] Yes, and that is why we have method 1a, which certainly will hit all flows in a marked frame...

[BB2] By resorting to method 1a, I think you're admitting that method 1b 
is problematic.
I've now shown (above) that method 1a is also problematic. And here's 
why *method 1b* is not just slightly problematic (addressing your 
earlier swipe)...

You can imagine a longer period of 100% frame marking (say 100 frames) 
that would still translate to about 1 in 3 packet marking. So, if 
congestion has got bad enough to cause about 33% packet marking, if more 
flows joined, congestion at the AQM would get worse, but it wouldn't be 
able to mark more than 33% of packets, because it can't mark more than 
100% of frames. Now consider even further that the frames are 9,216 B 
and the packets are still 1500B, therefore the ratio is about 6:1, not 
3:1. Then transports would be limited to a range of just 0-17% marking.

Consider further that the 300 packets within those 100 frames might map 
to say 100 flows (assumed all equal rate for illustration), i.e. about 3 
packets per flow. So on average about 1 packet per flow would be marked. 
In practice some flows get more marks and some less. So a number of the 
flows wouldn't even see a mark, even though the AQM is marking 100% of 
the frames.

So you can see why I didn't want method 1b in the draft - it can 
severely limit the usable range of marking proportion.
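
The arithmetic in the paragraphs above can be checked with a short Python sketch. Both helper names are hypothetical, and the second function additionally assumes that the marks land on packets uniformly at random, which the text above does not claim.

```python
from math import comb


def max_packet_marking(frame_octets, packet_octets):
    """Upper bound on the packet-marking fraction when every frame is marked
    but each marked frame propagates only one mark (as in method 1b)."""
    packets_per_frame = frame_octets / packet_octets
    return min(1.0, 1.0 / packets_per_frame)


def prob_flow_sees_no_mark(total_packets, marked_packets, packets_per_flow):
    """Probability that none of one flow's packets is marked, ASSUMING the
    marked packets are a uniformly random subset of all packets."""
    unmarked_choices = comb(total_packets - packets_per_flow, marked_packets)
    return unmarked_choices / comb(total_packets, marked_packets)


jumbo_limit = max_packet_marking(9216, 1500)        # 9,216 B frames, 1500 B packets
no_mark = prob_flow_sees_no_mark(300, 100, 3)        # 300 packets, 100 marks, 3 per flow
```

For 9,216 B frames and 1500 B packets the first function gives roughly one in six. For 300 packets, 100 marks and 3 packets per flow the second gives roughly 0.29, so under the random-placement assumption more than a quarter of the flows would see no mark even with every frame marked.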

The point here is that the rationale for preserving presence/timing of 
congestion events doesn't carry over from IP fragmentation to L2 
encapsulation, where the packets inside a large frame will generally 
belong to different flows. Then, the timing of a congestion event should 
be propagated to all the flows within the frame.

Method 1a does that, but it also spreads to all partial packets covered 
by the frame, which can saturate packet marking, as explained earlier.


>
>> Conversely, in scenario b) with the Goal1 decap, if the AQM marks more than about 1 in 3 of the smaller frames, 100% of the packets will be marked. So, taking the system as a whole, the AQM will mark fewer frames before the CCAs slow down enough. This gives the AQM a smaller operating range and it is likely to make the system more jumpy, and less controllable.
>>
> 	[SM] As I said before here we are trying to second guess the AQM, we do NOT know how that AQM would have marked had it seen our actual IP packets, so it seems futile to figure out which of our approximations is "best", the goal should be "good enough" and as simple as possible.

[BB2] The question for the ecn-encap draft was: which principle or goal 
to recommend. You seem to be back-tracking away from Goal1 and veering 
towards method 1a being an approximation that could be used for Goal2. 
Pls confirm whether you no longer subscribe to Goal 1?

Method 2 illustrates Goal2 precisely, and I /believe/ it can be 
implemented very simply, possibly with less complexity than the Goal1 
methods.

Nonetheless, for the purposes of a design guidelines draft, it is 
expected that a protocol designer (and implementers) will optimize for 
their particular case, which could involve methods that only approximate 
the goal.

>> A3. Preserving size-independent marking
>>
>> It might help to visualize the following using https://bobbriscoe.net/projects/netsvc_i-f/consig/encap/ecn-encap-reframing.svg
>>
>> If L2 frame boundaries are completely independent of L3 packet boundaries:
>> 	• If the packet that includes the start of each marked frame is marked (as in each Goal1 row), packet marking will become size-dependent.
>> 		• Reason: the start of each frame is equivalent to a point picked at random in the stream of packets.
>> 		• any point picked at random within the stream is more likely to fall within a larger packet
>> 	• With the Goal2 approach, the start of a congestion episode will tend to fall within a larger packet by the same reason as for Goal1 above
>> 		• further packets in the congestion episode are marked depending on how much octet marking is left over from previous marking
>> 			• so subsequent packets are marked whatever their size
>> 		• however, the last packet to be marked in an episode will also tend to be a larger packet,
>> 			• because the last octet in a frame-marking episode is also a random point in the packet stream, like the start.
> 	[SM] As I said, this assumes a specific way the AQM would mark the IP packets in question

[BB2] The only presumption (adopted from you) is that frame marking 
starts out size-independent before decap.
BTW, the AQM is marking L2 frames, not IP packets.

> (say if variable size L2 frames would be used each only containing an individual IP packet).

[BB2] Er... that's the simple case with aligned frame and packet 
boundaries, which is outside the scope of this whole discussion.
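
The length bias described in A3 (a point picked at random in the octet stream is more likely to fall within a larger packet) can be illustrated with a small Python sketch. The function name and the example packet mix are made up for illustration.

```python
from collections import Counter


def prob_random_octet_by_size(packet_sizes):
    """For each distinct packet size, return the probability that an octet
    picked uniformly at random from the stream lies in a packet of that size."""
    total_octets = sum(packet_sizes)
    octets_by_size = Counter()
    for size in packet_sizes:
        octets_by_size[size] += size
    return {size: octets / total_octets for size, octets in octets_by_size.items()}


# Hypothetical stream: equal numbers of 1500-octet and 100-octet packets.
bias = prob_random_octet_by_size([1500, 100] * 50)
```

In this example the 1500-octet packets are only half of the packets, but a random point falls within one of them with probability 0.9375, because they carry 15/16 of the octets.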

>
>
>> If the traffic stream contains idle periods, of course, the first L2 frame and L3 packet boundaries after each idle will coincide.
>>
>>> a) in the years since that BCP was ratified no known implementation of that method has come to see the light of day (and hence no real data exists about it working as intended)
>> [BB] Before responding to that, with hindsight, I would make the following picky disagreement with one of my own sentences [in §2.4 of RFC7141]:
>>     "This [octet preserving when splitting or merging packets] is based on the principle used above;
>>     that an indication of congestion on a packet can be considered as an
>>     indication of congestion on each octet of the packet."
>>
>>
>> In the context of splitting or merging packets at decap, I would say today that octet preserving is an implementation technique not a principle. At decap, the appropriate principle is to preserve the proportion of marking (which happens to also preserve octets at any function where total octets are preserved - by the reasoning given earlier).
> 	[SM] Yes, you would need to do that, given that the justification for method 2 is not really obvious in RFC7141... which IMHO is a problem in itself as it is odd to claim precedence by an RFC if that RFC actually says something different (however rfc7141 still wants octet preservation and method 2 will also deliver that).
>
>
>> [BB] Now to your point,...
>> I'm not sure there have been many, if any, implementations of splitting or merging packets while propagating ECN since RFC7141. And I'm not sure how you know there haven't been any that follow §2.4 of BCP7141. (I admit though that, if you were omniscient, I wouldn't know that you were, because I'm not.)
> 	[SM] Oh, I asked here on the list and got back no response, which I think conservatively needs to be interpreted as non-existence for the scope of this discussion, the onus to show differently is on the proponents of method 2 IMHO.

[BB2] So presumably you also believe there have been no implementations 
of Goal1 either?

>
>
>> In two cases that I am aware of (both protocol specs, rather than implementations), ECN has been disabled over an encapsulating tunnel in order to avoid having to propagate ECN at decap:
>>
>> https://datatracker.ietf.org/doc/html/rfc9347#section-3.1
>> https://datatracker.ietf.org/doc/html/draft-ietf-masque-connect-ip-13#section-10.2
> 	[SM] Which can be counted against all methods, and to be honest if there is no data showing methods 1a and 1b being used, I would propose to rip out that whole section as well, not only method 2.

[BB2] On the contrary, if we had sorted out which goal to recommend, 
they would have been able to follow the recommendation.

>
>>> b) you yourself got involved in specifying a protocol (TCP Prague) that also does not seem to follow that method
>> [BB] When we were first using Linux DCTCP for L4S, it used acked_bytes to maintain the fraction of ce-marked bytes, which did follow the principle of treating all the octets in a marked packet as marked. See:
>> https://elixir.bootlin.com/linux/v3.18.9/source/net/ipv4/tcp_dctcp.c#L186
>>
>> But by the time Prague was forked off from DCTCP, someone had changed DCTCP to count marked packets, probably for efficiency because the relevant variables (delivered_ce and delivered) were already maintained by the kernel. But none of us noticed until a while later. We should now make the change back to bytes, but haven't got around to arguing it through with everyone yet.
> 	[SM] Well, come back after you made that change? It seems rather odd that you are willing to stall a draft for almost a year to fight for a method that you did not bother to actually use (consistently) in your own protocol.

[BB2] ecn-encap draft stalled on a recommendation about non-aligned 
frame/packet boundaries, which is for lower-layer network infrastructure 
and therefore has much wider impact than one CCA (Prague), which can be 
changed very easily if responding to small packets becomes problematic.

Also, I think you're getting your RFCs and drafts confused. The 
recommendation about end systems is in RFC7141, not ecn-encap. And BTW 
the recommendation about end systems in RFC7141 is quite liberally worded 
(because it is primarily about avoiding self-harm, not inter-flow 
interaction).

>
>
>> As you saw in my response to the review from Neal, it doesn't much matter when only a few ECN-capable packets are smaller than the SMSS, because the proportions of marked packets and marked octets aren't often that different. But once ACKs are ECN-capable, it becomes important to get this right, particularly in connections with significant 2-way data flow.
> 	[SM] I would respectfully argue that in connections with significant 2-way data flow the ACK will often be piggy backed onto data packets, hence will not be all that small.
> And then I ask: if we talk about pure reverse ACK flows what differential response to a marked ACK do you consider appropriate (keeping in mind that the AQM was supposed to mark packet based, hence marking probability should be decoupled from packet size)?

[BB2] The relevant scenario is where the direction of flow alternates, 
so you get a round-trip of pure ACKs at the end of each volley.
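
As a rough illustration of how counting marked packets and counting marked bytes can diverge once small ECN-capable packets such as pure ACKs are in the mix, here is a Python sketch. It is not the DCTCP or Prague code; the function name and the traffic numbers are invented for the example.

```python
def marked_fractions(packets):
    """packets: list of (size_octets, ce_marked) tuples.

    Returns (fraction of packets marked, fraction of octets marked)."""
    marked_count = sum(1 for _, marked in packets if marked)
    marked_octets = sum(size for size, marked in packets if marked)
    total_octets = sum(size for size, _ in packets)
    return marked_count / len(packets), marked_octets / total_octets


# Invented volley: ten 1500-octet data packets (one marked) followed by
# ten 40-octet pure ACKs (five marked).
volley = [(1500, i == 0) for i in range(10)] + [(40, i < 5) for i in range(10)]
by_packets, by_octets = marked_fractions(volley)
```

With these invented numbers the packet-based fraction is 0.3 while the octet-based fraction is about 0.11, so the two ways of counting would drive noticeably different responses.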

>
>>> d) I could poke severe holes into the described method (that seem fixed now) and it seems a clear indication that this method was truly never more than a sketch.
>> [BB] Indeed, it /was/ meant to be a sketch to give implementers an idea of how they might implement the design goal.
> 	[SM] And that is exactly what I think should be avoided for RFCs/BCPs unless said sketch is based on real data.

[BB2] But I had worked out the details and considered alternatives 
before sketching it at high level. Because I (also) subscribe to the 
view that it has to be known to be possible to implement a policy, 
before that policy can be recommended.

>> At the time you pointed this out, I thought it was obvious that an implementer would not mix non-ECN and ECN codepoints together in the same frames, but I was happy to say that explicitly when you asked.
> 	[SM] "I thought it was obvious" is not the most robust and reliable approach to write recommendations.
>
>
>> Both the examples for how to achieve Goal1 suffer from the same problem.
> 	[SM] How so? The issue is that method 2 needs special accounting for dropped packets as otherwise the counters go out of sync
> Let's look at method
> 1a) Mark all IP packets related to a marked L2 frame:
> This will not see dropped IP packets, but it does not matter as a drop in itself is a (slightly ambiguous) congestion signal, also all packets that can carry a mark will be marked, so this method is IMHO robust against that issue

[BB2] A Not-ECT packet covered by a marked frame would need to be 
dropped. That's admittedly not a counting problem, but there's a problem 
with large amounts of unnecessary drop unless packets are pre-classified 
into ECN types, which again I assumed they would be for this approach.

>
> 1b) propagate a single mark only to one IP packet out of the set related to a L2 frame:
> So what happens here is that we end up potentially sending more congestion signals per frame and hence slightly violate the method. If a single IP packet was dropped one could argue the precise way forward would be to only propagate a mark to an eligible packet if no packet was dropped; but given that a single marked frame could contain multiple Not-ECT IP packets (that need to be dropped) this is unfixable. It also will not matter all that much, since no end2end protocol will depend on the mark propagation following this method strictly.

[BB2] I think we can conclude that you're trying to wriggle out of your 
previous assertions.

>
>
> But since you say both goal 1 examples have the same issue, please elaborate how this affects 1a, and how it affects 1b in a relevant way.

[BB2] Surely you just did.

> As far as I can see it is only the elaborate dual counter method that requires special care in this regard.
>
>
>
>
>
>> But I didn't touch them 'cos I had been asked to include that wording verbatim (and like I said, I think it's obvious that one doesn't mix ECN codepoints within the same frame).
> 	[SM] "it's obvious that one doesn't mix ECN codepoints within the same frame" this as a policy will either introduce re-ordering at the en-framer, or will require variable sized frames (at which point one could do 1:1 frame to IP packet framing making things moot again), or partially empty frames (wasting utilisation); it will also make the en-framer (and de-framer) considerably more complex e.g. by requiring multiple queues...

[BB2] Classification by ECN codepoint shouldn't lead to data reordering. 
Whatever, these sorts of issues all depend on the specific circumstances 
- degree of aggregation, frame sizing constraints, whether flows consist 
of packets all of the same ECN type, etc. which is why only high level 
examples are given in ecn-encap.

>
>>> I understand that BCPs, in spite of what their name implies, are intended to inject some policy, but if such an injection attempt proves to be a complete dud, as here, I argue it is time to drop it.
>>>
>> [BB] As above, the principle of preserving proportion is held by most people.
> 	[SM] I accept that this is your assumption. Given the amount of people that participated in this discussion I am more inclined to believe most folks do not really have an opinion on that.

[BB2] You are seeing little discussion now, probably because you are 
raising stuff:
* 17 years after all the discussion on RFC7141 started and 9 years after 
it was published
* 12 years after all the discussion started on ecn-encap, and 4 years 
after the 2nd WGLC closed

>
>
>> Having digested my arguments above, I hope you might join them.
> 	[SM] Not really, as I said, let's not second guess the AQM and come up with something simple enough to describe in 1-2 sentences so an implementer will do the right thing.
>
>
>> And octet preserving is one of the few ways to preserve proportion without delaying the signal. Other ways that measure proportion (e.g. by counting the packets between marks, or the way Koen suggested) all delay changes to the signal.
>>
>>>> This draft is not a protocol spec. It's design guidelines for adding congestion notification to a L2 protocol in the future. The question of whether there is an implementation can only be relevant at the time a spec is written for a specific protocol.
>>>>
>>> 	[SM] I disagree, any RFC should (IMHO, apparently this is not commonly accepted) ONLY recommend methods that are known to work, and the easiest way to demonstrate that is to implement it and show data. Again, the TSV wg does not require this, but I consider that to be the wrong approach.
>> [BB] No-one can write an implementation of how an abstract unknown L2 protocol propagates ECN.
> 	[SM] This is why I would be satisfied to see an implementation for a known L2 protocol/framer...
>
>
>> The two approaches I'm aware of for propagating ECN between the layers (TRILL and for MPLS) are very different, because they are tailored to their protocols. When a requirement to propagate ECN with disjoint packet boundaries needs to be implemented, it will again be tailored to the protocol involved.
> 	[SM] At which point we might do best not to mention any method at all...
>
>>>> The question of which goal to aim for and which implementation to use will be decided when a specific protocol is designed. It's possible that the disagreements have been due to a difference in assumptions between the two 'camps'. Which assumptions are appropriate might become clearer when a specific protocol is on the table.
>>>>
>>> 	[SM] No, recommending an untested approach is not what an IETF document should do, at the very least it should clearly mark untested ideas as such, but that is not what the current draft does.
>>>
>>>
>>>
>>>> Hence, I believe the chairs are asking me to post the proposed wording because it's not relevant whether you or I or anyone else wants one or the other of the examples in the draft. You don't agree with goal 2. I don't agree with goal 1.
>>>>
>>> 	[SM] And there we have it, this whole thing is based on your personal dislike... is an RFC/BCP really the best place to voice personal opinions?
>> [BB] Please don't twist my words.
> 	[SM] Twisting of words was not intended, this was a rephrasing of "I don't agree with goal 1".
>
>
>>> Please show data that method 2 works better than method 1 (heck, show data that method 2 works at all) and we are talking. As far as I can tell, correct me if I am wrong, method one is actually implemented? If however neither method is implemented right now, the draft should clearly say that both are speculative and untested.
>> [BB] As above, design goals aren't implementable without a specific protocol. So criticism of lack of implementation is just pointless negativity and applies to all methods anyway.
> 	[SM] Again, I am asking for one example implementation that shows the practical feasibility of goal 2, having to actually implement something tends to highlight areas of underspecification pretty quickly and after having one (tested and) working implementation the issues encountered during the implementation can help improve the recommendation.
>
>
>>
>>>> But I have written both into the draft anyway, so the options are recorded, but the decision is essentially deferred until a specific protocol is written.
>>>>
>>> 	[SM] Well, how about ripping out both then? That clearly leaves even more freedom to implementers, without giving them wrong ideas.

[BB2] You may have noticed I prefer to be constructive, because protocol 
designers need guidance on this point, as illustrated by the two recent 
RFCs that bypassed the issue.

If you now agree that Goal1 is incorrect, perhaps you could help 
persuade its proponents that it is incorrect and have it removed.




Bob


>>>
>> [BB] No more changes. The draft is moving forward.
>> This email was just to pick up on points where I could see misconceptions.
> 	[SM] As I said, this is pretty much the outcome I expected.
>
> Regards
> 	Sebastian
>
>
>>
>> Bob
>>
>>>> Bob
>>>>
>>>> On 23/08/2023 13:39, Sebastian Moeller wrote:
>>>>
>>>>> Dear List,
>>>>>
>>>>> this is not going as it should. We are still promoting a method that was essentially proposed in 2014* and has since apparently never been implemented/properly tested. If anybody has evidence of an actual implementation and data showing this implementation actually working, please come forward.
>>>>>
>>>>>
>>>>> *) RFC7141 was ratified in 2014; the actual idea/method is probably older given that what resulted in rfc7141 was started in 2007.
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 23, 2023, at 10:21, Gorry Fairhurst<gorry@erg.abdn.ac.uk>
>>>>>>   wrote:
>>>>>>
>>>>>> As promised at the meeting in San Francisco, we will be progressing with draft-ietf-tsvwg-ecn-encap-guidelines.
>>>>>>
>>>>>> This (and its related dependency ID) have been on the Chairs' action list since completion of WGLC quite some time ago. At the last IETF meeting, the Chairs worked with the document editor to revise the text around the two possible design goals. The changes that we expect are summarised below and we are now expecting a new revision of this draft. This will allow us to complete a Shepherd writeup for these drafts.
>>>>>>
>>>>>> Gorry
>>>>>> (TSVWG Co-Chair)
>>>>>>
>>>>>> ----
>>>>>> BEFORE
>>>>>>
>>>>>> Two possible design goals for propagating congestion indications, described in section 5.3 of [RFC3168] and section 2.4 of [RFC7141], are:
>>>>>> 	• approximate preservation of the presence of congestion marks on the L2 frames used to construct an IP packet;
>>>>>> 	• approximate preservation of the proportion of congestion marks arriving and departing.
>>>>>>
>>>>>> In either case, an implementation SHOULD ensure that any new incoming congestion indication is propagated immediately, not held awaiting the possibility of further congestion indications to be sufficient to indicate congestion on an outgoing PDU [RFC7141]. Nonetheless, to facilitate pipelined implementation, it would be acceptable for congestion marks to propagate to a slightly later IP packet.
>>>>>>
>>>>>> Concrete example implementations of goal #1 include (but are not limited to):
>>>>>> 	• Every IP PDU that is constructed, in whole or in part, from an L2 frame that is marked with a congestion signal, has that signal propagated to it;
>>>>>> 	• Every L2 frame that is marked with a congestion signal, propagates that signal to one IP PDU which is constructed, in whole or in part, from it. If multiple IP PDUs meet this description, the choice can be made arbitrarily but ought to be consistent.
>>>>>>
>>>>>> Concrete example implementations of goal #2 include (but are not limited to):
>>>>>> 	• A counter ('in') tracks octets arriving within the payload of marked L2 frames and another ('out') tracks octets departing in marked IP packets. While 'in' exceeds 'out', forwarded IP packets are ECN-marked. If 'out' exceeds 'in' for longer than a timeout, both counters are zeroed, to ensure that the start of the next congestion episode propagates immediately;
>>>>>>
>>>>>> AFTER
>>>>>>
>>>>>> Two possible design goals for propagating congestion indications, described in section 5.3 of [RFC3168] and section 2.4 of [RFC7141], are:
>>>>>> 	• approximate preservation of the presence (and therefore timing) of congestion marks on the L2 frames used to construct an IP packet;
>>>>>> 	• a) at high frequency of congestion marking, approximate preservation of the proportion of congestion marks arriving and departing;
>>>>>> b) at low frequency of congestion marking, approximate preservation of the timing of congestion marks arriving and departing;
>>>>>>
>>>>>> In either case, an implementation SHOULD ensure that any new incoming congestion indication is propagated immediately, not held awaiting the possibility of further congestion indications to be sufficient to indicate congestion on an outgoing PDU [RFC7141]. Nonetheless, to facilitate pipelined implementation, it would be acceptable for congestion marks to propagate to a slightly later IP packet.
>>>>>>
>>>>> 	[SM] 1 and 2.b already mention conservation of timing, which is essentially a direct consequence of the next paragraph "In either case, an implementation SHOULD ensure that any new incoming congestion indication is propagated immediately". Either this does not hold for 2.a) (which should be noted somewhere) or the addition of timing in 1.) and 2.b) seems redundant.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> At decapsulation in either case:
>>>>>> 	• ECN marking propagation logically occurs before application of rule 1 in Section 4.4.  For instance, if ECN marking propagation would cause an ECN congestion indication to be applied to an IP packet that is a Not-ECN-PDU, then that IP packet is dropped in accordance with rule 1.
>>>>>> 	• if a mix of frames with different types of ECN capability arrives to construct the same IP packet, that packet MUST be discarded. This requirement uses the generalization 'types of ECN capability', because the L2 ECN protocol might not map exactly to the three types in IP, which are Not-ECN-capable, ECT(0) and ECT(1) [RFC8311].
>>>>>>
>>>>>> The following gives one way that goal #1 might be achieved, but it is not intended to be the only way:
>>>>>> 	• Every IP PDU that is constructed, in whole or in part, from an L2 frame that is marked with a congestion signal, has that signal propagated to it;
>>>>>> 	• Every L2 frame that is marked with a congestion signal, propagates that signal to one IP PDU which is constructed, in whole or in part, from it. If multiple IP PDUs meet this description, the choice can be made arbitrarily but ought to be consistent.
>>>>>>
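
The two bullets above can be read as two alternative propagation rules for goal #1. A minimal Python sketch of each reading follows, assuming a hypothetical model in which each reassembled IP PDU is given as the list of L2 frame indices it was built from. The function names and the "first PDU" tie-break are illustrative assumptions, not taken from the draft.

```python
from typing import List, Set


def mark_every_pdu(pdus: List[List[int]], marked_frames: Set[int]) -> List[bool]:
    """First bullet: every PDU built, in whole or in part, from a marked frame is marked."""
    return [any(f in marked_frames for f in frames) for frames in pdus]


def mark_one_pdu_per_frame(pdus: List[List[int]], marked_frames: Set[int]) -> List[bool]:
    """Second bullet: each marked frame propagates its mark to exactly one PDU.

    The consistent choice assumed here is the first PDU that contains the frame.
    """
    marks = [False] * len(pdus)
    for frame in sorted(marked_frames):
        for i, frames in enumerate(pdus):
            if frame in frames:
                marks[i] = True
                break  # only one PDU per marked frame
    return marks


# Frame 1 is marked and is shared by the first two PDUs.
pdus = [[0, 1], [1, 2], [2]]
print(mark_every_pdu(pdus, {1}))          # [True, True, False]
print(mark_one_pdu_per_frame(pdus, {1}))  # [True, False, False]
```

With one marked frame spanning two PDUs, the first rule marks both PDUs while the second marks only one, so the two rules diverge whenever frame and packet boundaries do not coincide.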
>>>>> 	[SM] I am confused:
>>>>> The first clause says: every IP PDU "inherits" the mark from a L2 frame. Which to me means all IP PDUs (even only partially) constructed from a marked L2 frame will inherit the mark.
>>>>> The second clause says that mark is propagated to only one (consistently selected) IP PDU.
>>>>> These appear to describe two mutually incompatible ways to achieve goal 1, not well described as "The following gives one way", no? So what am I missing here?
>>>>>
>>>>>
>>>>>
>>>>>> The following gives one way that goal #2 might be achieved, but it is not intended to be the only way:
>>>>>> 	• For each of the streams of frames encapsulating IP packets
>>>>>>
>>>>> 	[SM] "for each of the streams of the frames" so this now allows for multiple different frame types to arrive in the IP-decapsulator?
>>>>>
>>>>>
>>>>>
>>>>>> of each IP-ECN codepoint,
>>>>>>
>>>>> 	[SM] There are arguably 4 IP-ECN codepoints, is this supposed to result in 4 counters? I had thought that we really only care about propagating L2 congestion events to ECN-CE marks here?
>>>>>
>>>>>
>>>>>
>>>>>> a counter ('in') tracks octets arriving within the payload of marked L2 frames and another ('out') tracks octets departing in marked IP packets.
>>>>>> While 'in' exceeds 'out', forwarded IP packets are ECN-marked. If 'out' exceeds 'in' for longer than a timeout,
>>>>>>
>>>>> 	[SM] In this condition we dequeued more "CE-bits" than we enqueued, if the next marked L2 frame results in less bits than out-in we will not immediately mark this and essentially "swallow" that mark. This now means that this "a timeout" will need to be pretty short to still obey the "SHOULD ensure that any new incoming congestion indication is propagated immediately" rule. So this timeout needs to be equivalent to not more than "slightly later"?
>>>>>
>>>>>
>>>>>> both counters are zeroed, to ensure that the start of the next congestion episode propagates immediately. The 'out' counter includes octets in reconstructed IP packets that would have been marked, but had to be dropped because they were Not-ECN-PDUs (by rule 1 in Section 4.4).
>>>>>>
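
The 'in'/'out' counter scheme quoted above can be sketched roughly as follows, with one instance per stream. This is only an illustration under stated assumptions: the class name, the method names, the returned action strings and the way the timeout is measured (from the moment 'out' first exceeds 'in') are all hypothetical, and the draft text gives no timeout value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OctetProportionMarker:
    """Hypothetical per-stream state for the 'in'/'out' octet counters."""
    timeout: float                        # assumed parameter; no value is given in the text
    octets_in: int = 0                    # payload octets seen in marked L2 frames
    octets_out: int = 0                   # octets departed (or dropped) as marked IP packets
    excess_since: Optional[float] = None  # time at which 'out' first exceeded 'in'

    def _check_timeout(self, now: float) -> None:
        if self.octets_out > self.octets_in:
            if self.excess_since is None:
                self.excess_since = now
            elif now - self.excess_since > self.timeout:
                # 'out' has exceeded 'in' for longer than the timeout: zero both
                self.octets_in = self.octets_out = 0
                self.excess_since = None
        else:
            self.excess_since = None

    def frame_arrived(self, payload_octets: int, marked: bool, now: float) -> None:
        self._check_timeout(now)
        if marked:
            self.octets_in += payload_octets
        self._check_timeout(now)

    def packet_departing(self, size_octets: int, ecn_capable: bool, now: float) -> str:
        self._check_timeout(now)
        if self.octets_in <= self.octets_out:
            return "forward"
        # While 'in' exceeds 'out', the packet is marked; a Not-ECN packet is
        # dropped instead, but its octets still count towards 'out'.
        self.octets_out += size_octets
        self._check_timeout(now)
        return "mark" if ecn_capable else "drop"
```

In this sketch, a marked frame that arrives while 'out' still exceeds 'in' by more than its payload does not cause a mark until the timeout has zeroed the counters, so the choice of timeout determines how long a new congestion indication can be held back.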
>>>>> 	[SM] What about packets that would be marked but were dropped for other reasons (e.g. queue full)? Such dropped packets will also send a "slow down" signal to the end-points so why still follow this up with more marking?
>>>>>
>>>>>
>>>>> I really would like to see a working implementation of that method before putting it in an RFC/BCP*... yes this AFTER version is better than the BEFORE version, but I still think it would be prudent to drop this still speculative discussion of "method 2". Yes, this was essentially proposed in 2014 in rfc7141, but the apparent lack of implementations indicates lack of interest in the field.
>>>>>
>>>>> Regards
>>>>> 	Sebastian
>>>>>
>>>>> *) Which is not a requirement in tsvwg, I just think it should be.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> -- 
>>>> ________________________________________________________________
>>>> Bob Briscoe
>>>> http://bobbriscoe.net/
>>>>
>>>>
>>>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe
>> http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe
http://bobbriscoe.net/