Re: [iccrg] editorial comments on draft-briscoe-iccrg-prague-congestion-control-02

Sebastian,

See [BB3]

On 13/09/2023 13:33, Sebastian Moeller wrote:
> Hi Bob,
>
>
>> On Sep 13, 2023, at 14:07, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Sebastian, See [BB2] (and sry for delayed reply...)
>>
>> On 05/08/2023 09:37, Sebastian Moeller wrote:
>>> Question below, prefixed [SM]
>>>
>>>
>>>> On Aug 5, 2023, at 03:47, Bob Briscoe <ietf=40bobbriscoe.net@dmarc.ietf.org> wrote:
>>>>
>>> [...]
>>>
>>>>> -----------
>>>>> A system wide option is available to disable AccECN negotiation, but the Prague CC module will always override this setting, as it depends on AccECN. Then, solely in this case, AccECN will only be active for TCP flows using the Prague CCA.
>>>>> -->
>>>>> A system-wide sysctl is available to enable or disable AccECN negotiation. However, the Prague CC module overrides this sysctl and will always enable AccECN negotiation, since it depends on AccECN (i.e., when the system-wide sysctl disables AccECN negotiation, TCP flows using the Prague CCA will still attempt AccECN negotiation).
>>>>>
>>>> [BB] Yes, it was badly worded. I've had another go myself:
>>>>
>>>> A system-wide option is available to enable or disable AccECN negotiation. However, TCP flows using the Prague CCA module depend on AccECN; so they  always ignore this system-wide sysctl and enable AccECN negotiation anyway.
>>>>
>>> [SM] This seems to violate the principle of least surprise. If there is a toggle to disable AccECN system-wide it needs to be honored. TCP Prague could maybe write a message in the kernel/system log noting that AccECN is missing and TCP Prague might fail somehow. Alternatively at least introduce another sysctl to disable the use of TCP Prague completely. The administrator of a system should be in control and that means the administrator can also put the system in odd states.
>> [BB2] Here it's written how Linux Prague currently works. If we have to change it when submitting for Linux mainlining, we'll change the I-D as well.
>>
>> Altho this is really an issue for Linux netdev, I'll give the reasoning here (I'm no expert on Linux design principles, so pls bash).
>> * AccECN is a dependency for Prague. Admins expect Linux to automatically sort out dependencies.
>> * On a system with AccECN disabled, flows that are not using the prague cc module still have AccECN disabled.
>>
>> Surely an admin would be surprised if they tried to enable Prague, but the system replied that it was not going to enable a dependency even though it could.
> 	[SM] No if there is a toggle for AccECN it needs to be honored. You can introduce multiple modes to your AccECN sysctl, like:
> 0: disable unconditionally
> 1: enable unconditionally
> 2: enable only for TCP-Prague
>
> and for all it is worth you might even convincingly argue for 2 being the default, but "disabled" really needs to be disabled...
>
>
>> Wouldn't it be more normal to write a warning that AccECN had been enabled for Prague flows, even though it was disabled by a global sysctl?
> 	[SM] As you noted, not my call to make. However, as a Linux and sysctl user I certainly would prefer a message telling me that I need to change AccECN to 1 or 2 instead of silently ignoring my policy setting.

[BB3] In discussion with the implementers, we've decided to change the 
draft text to what Neal suggests, and leave the implementation as it is. 
Then we'll see what netdev says.
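
For illustration only, the three-mode sysctl that Sebastian lists above could be sketched in C as below. The enum and function names are hypothetical and are not taken from the Linux kernel or from the current Prague implementation, which (as described above) overrides the global setting instead.

```c
#include <stdbool.h>

/* Hypothetical modes for an AccECN sysctl, following the 0/1/2 list above.
 * These names are illustrative; they are not Linux kernel identifiers. */
enum accecn_mode {
    ACCECN_DISABLED    = 0,  /* disable unconditionally */
    ACCECN_ENABLED     = 1,  /* enable unconditionally */
    ACCECN_PRAGUE_ONLY = 2   /* enable only for flows using the Prague CCA */
};

/* Decide whether a flow should attempt AccECN negotiation. */
static bool accecn_should_negotiate(enum accecn_mode mode, bool flow_uses_prague)
{
    switch (mode) {
    case ACCECN_ENABLED:
        return true;
    case ACCECN_PRAGUE_ONLY:
        return flow_uses_prague;
    case ACCECN_DISABLED:
    default:
        return false;  /* "disabled" is honoured even for Prague flows */
    }
}
```

Under this sketch, a Prague flow on a system set to mode 0 would not negotiate AccECN, so the CC module would have to log a warning or refuse to load rather than override the setting.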

>>>>> -----------
>>>>> A Prague CCA triggers update of its moving average once per RTT by recording the packet it sent after the previous update, then watching for the ACK of that packet to return.
>>>>> -->
>>>>> A Prague CCA triggers update of its moving average ECN mark rate once per rtt_virt [see Section 2.4.4].
>>>>>
>>>>>
>>>> [BB] Thx for catching this.
>>>>
>>>>
>>>>> -----------
>>>>> To maintain its moving average, it measures the fraction, frac, of ACKed bytes or ACKed packets
>>>>> -->
>>>>> [IMHO the spec should specify whether the CCA is measuring using bytes or packets, since the answers may be very different depending on the approach, leading to unfairness between implementations with different approaches. I would argue for using the fraction of packets marked (as IIRC  I have argued on some IETF mailing list or another). And Linux TCP Prague is already doing this.]
>>>>>
>>>> [BB] Agreed that this ought to say just packets, to document what Linux Prague uses.
>>>>
>>>> If packet sizes were independently and identically distributed (IID), on average any differences would cancel out, 'cos the distribution of packet sizes is in both the top and bottom of the fraction. That assumes all L4S AQMs mark packets independently of size, which is currently true (and recommended by RFC7141).
>>>>
>>>> Nonetheless, if packet sizes do vary, they would very likely not be IID. For instance, if one end was sending ECN-capable pure ACKs, it would be likely to be sending a lot in a row, not just randomly. Then measuring bytes would be the right thing (adding a nominal header size to each packet if an exact one were not available).
>>>>
>>> 	[SM] Why? If you want rate fairness (as your "right thing" seems to imply) then just use a rate-equalizing scheduler...
>> [BB2] This is primarily about preventing harm to self, not rate-fairness. It's to ensure that the rate is not reduced more than is appropriate in response to any marked small packets.
>>
>> BTW, a CCA on a sending host cannot know what schedulers there might be in its path.
>>
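
To make the bytes-versus-packets difference concrete, here is a minimal sketch in C that computes the marked fraction both ways over a batch of ACKed packets. The struct and function names are assumptions made for this example and do not come from tcp_prague.c.

```c
#include <stddef.h>

/* One ACKed packet: its size in bytes and whether it carried a CE mark. */
struct pkt {
    unsigned int bytes;
    int ce_marked;
};

/* Fraction of ACKed packets that were CE-marked. */
static double frac_by_packets(const struct pkt *p, size_t n)
{
    size_t marked = 0;
    for (size_t i = 0; i < n; i++)
        marked += p[i].ce_marked ? 1 : 0;
    return n ? (double)marked / (double)n : 0.0;
}

/* Fraction of ACKed bytes that were carried in CE-marked packets. */
static double frac_by_bytes(const struct pkt *p, size_t n)
{
    unsigned long long total = 0, marked = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i].bytes;
        if (p[i].ce_marked)
            marked += p[i].bytes;
    }
    return total ? (double)marked / (double)total : 0.0;
}
```

For a batch of four unmarked 1500-byte packets followed by four marked 40-byte packets, the packet-based fraction is 0.5 while the byte-based fraction is 160/6160, about 0.026. That gap is the case where small marked packets arrive in a run rather than randomly.
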
>>>> BTW, I do remember you raising this on a list somewhere. I meant to reply, and I guess it's still in my todo list somewhere - I'll dig it out.
>>>>
>>>> If we conclude that RFC7141 is OK on this point, then we'll need to write something in the future work section under congestion metrics about this (and we'll have to implement it).
>>>>
>>> 	[SM] I have mentioned before that I for one consider RFC7141 to be wrong on the
>>> " When a transport detects that a packet has been lost or congestion
>>>     marked, it SHOULD consider the strength of the congestion indication
>>>     as proportionate to the size in octets (bytes) of the missing or
>>>     marked packet."
>>>
>>> section. A flow should try to get as veridical an estimate about a congestion event as possible and react to that best estimate of the congestion, and if, as RFC7141 recommends, congestion marking does not take packet size into account, nor should the receiver of the congestion signal.
>>>
>> [BB] Disagreeing with something because it seems odd to you that two things don't match is not a reasoned argument, let alone a scientific argument.
> 	[SM] I laid out my rationale, why the proposal in rfc7141 is logically not sound... that seems sufficiently reasonable and scientific to me. I did not write "odd"; I used the classification "wrong".

[BB3] I answer everything about the splitting and merging aspect of 
RFC7141 in the thread currently running on tsvwg about your RFC7141 erratum.
Subject: [Technical Errata Reported] RFC7141 (7237)

>> The many reasons for why size should be taken into account by the function responding to congestion signals, and not by the function doing the marking are given in RFC7141 (which was the consensus outcome of protracted WG discussions).
> 	[SM] And I am telling you that RFC7141 simply is wrong in that regard. As seen with other drafts, not all sections and sentences receive the scrutiny they require and sometimes things slip through that make little sense. IMHO this is one of those cases.
>
>> I understand that the chairs of tsvwg have already asked you to write up a draft of your arguments against RFC7141, if you have any. This is the constructive way expected at the IETF.
> 	[SM] I opted to create an erratum for rfc 7141 instead, and the response to that (and lack thereof) convinced me that writing a new draft is going to be an exercise in futility, but I digress.

[BB3] See the tsvwg thread about your erratum on the splitting and 
merging recommendation of RFC7141.

BTW, "I am telling you that xyz simply is wrong in that regard" is not a 
reasoned argument.

>>
>>> Sidenote: RFC7141 was ratified 9.5 years ago and has been arguing for this odd dichotomy between encoding congestion signals and interpreting congestion signals since the first draft in 2007. The fact that apparently ZERO implementations of the recommended approach seem to exist, let alone seem to be quantitatively used over the internet, IMHO really should end that folly. Protocol stacks should not make up congestion signals, but simply respond appropriately to the best congestion estimate they can reasonably maintain.
>>>
>>> [...]
>>>
>> [BB2] RFC7141 recognized that, at the time, packet size was rarely taken into account in either case (neither marking nor responding), and explains that it is more important not to introduce size-dependent marking in the network (MUST NOT), while introducing size-dependent response on hosts is not obligatory but recommended (SHOULD).
> 	[]
>
>
>> There are implementations of size-dependent response, e.g. TFRC-SP.
> 	[SM] Except that TFRC-SP simply counts marked packets.... the size dependence hence is not related to interpretation of congestion signals, see https://datatracker.ietf.org/doc/html/rfc4828:
> In TFRC-SP, the loss event rate is calculated by counting at most one
>     loss event in loss intervals longer than two round-trip times, and by
>     counting each packet lost or marked in shorter loss intervals.
>
> This is pretty clear in what is counted, and that is not related to the size of the received packets. TFRC-SP needs to calculate a loss rate because it intends to respond as if it was a flow with larger packet size, but that loss rate does not depend on the number of marked bytes... This is IMHO not an applicable example for "size-dependence" of interpretation of congestion signals.

[BB3] Yes, you're right. I was remembering a TFRC variant that uses 
variable packet sizes, scaling them down with congestion. But I can't 
find it at the mo.

Fortunately, I didn't make that mistake in RFC7141, which describes 
TFRC-SP correctly. As you say, TFRC-SP scales up the rate to allow for 
smaller packets (by the ratio of packet-size to nominal packet-size), 
rather than scaling losses by packet size. Indeed, the outcome is closer 
to scaling the loss event probability by the square of the relative 
packet size (relative to a nominal size).

That tells me that the recommendation in §2.3 of RFC7141 about 
considering a congestion signal as proportionate to packet size is too 
prescriptive. It should say 'dependent on', not 'proportionate to'. And 
it should say that techniques can be used that don't directly depend on 
the packet size of congestion signals (like scaling up the outcome of 
the whole congestion signalling process (the flow rate), as in TFRC-SP).

For the avoidance of doubt, this doesn't change the recommendation in 
§2.4 of RFC7141 about the number of octets being equivalent before and 
after splitting or merging. Please don't respond here about that, but 
see the parallel thread running about splitting and merging frames on tsvwg:
https://mailarchive.ietf.org/arch/msg/tsvwg/-ZiC1XgcOCaCrO_GFO_SKJjOPE0/
TL;DR: for splitting and merging packets, preserving octets is not the 
/principle/. The principle is to preserve the proportion of marks, and 
preserving octets is just one technique for doing that.

>
>> But implementation is not a voting system, because the message of RFC7141 is only if you're implementing size-dependence, do it in the host, not the network.  See https://www.rfc-editor.org/rfc/rfc7141#section-2.3
> [SM] Here is the first part of that section:
>
>     When a transport detects that a packet has been lost or congestion
>     marked, it SHOULD consider the strength of the congestion indication
>     as proportionate to the size in octets (bytes) of the missing or
>     marked packet.
>
> In other words, when a packet indicates congestion (by being lost or
>     marked), it can be considered conceptually as if there is a
>     congestion indication on every octet of the packet, not just one
>     indication per packet.
>
>     To be clear, the above recommendation solely describes how a
>     transport should interpret the meaning of a congestion indication, as
>     a long term goal.  It makes no recommendation on whether a transport
>     should act differently based on this interpretation.
>
> This is considerably stronger than a "if you must, do it at the end-hosts" argument, this is a clear recommendation to do it at the end hosts....
> IMHO the last sentence then makes this worse... interpret it differently but do not act on that interpretation...

[BB3] See the last sentence that you've quoted, and subsequent sentences 
that you haven't quoted. I'm afraid this is the result of wording 
"written by committee" to allow for existing implementations. It does 
eventually say what the WG intended, but you have to read it all to get 
the full idea. I'll leave others to judge what they think by reading the 
rest of the section without it having been taken out of context.

>
>> More generally, implementation is not a voting system on the correctness of science anyway.
> 	[SM] Indeed. However, a working implementation typically (especially if used in a peer-reviewed paper, or accepted by critical upstreams) implies that some level of testing was used and a few critical eyes have looked over it.
>
>
>>   Congestion control and AQM have a grounding in control engineering and the maths behind it. Yes, the IETF's mantra is "...running code". But that doesn't mean an implementation of bad science makes it good science.
> 	[SM] +1; but that is not my claim. My claim is that the attempt at injecting policy (not backed up by empirical data) failed and it is time to let go...

[BB3] You've switched from 'implementation' to 'empirical data'. But 
even then, you can't run an experiment without a goal to test against.

Bob

> Regards
> 	Sebastian
>
>
>> Bob
>>
>>>>> -----------
>>>>> Also, integer rounding bias ought to be removed from the multiplicative decrease calculation.
>>>>> -->
>>>>> [I would suggest spelling out how to do this correctly to increase the odds that this is implemented correctly by implementors that can't look at the GPL tcp_prague.c reference code]
>>>>>
>>>> [BB] I introduced a pseudocode name for the carry variable into the previous sentence, then added the pseudocode below:
>>>>      "... delay can be made significantly less jumpy by tracking a fractional value, cwnd_carry, alongside the integer window and carrying over any fractional remainder to the next reduction." ... Specifically:
>>>>
>>>> #define ONE_CWND (1LL << 20)        /* Must be signed */
>>>> #define MAX_ALPHA (1ULL << 20)
>>>>
>>>> /* On CE feedback, calculate the reduction in cwnd */
>>>>      /* Adding MAX_ALPHA to the numerator effectively adds 1/2
>>>>       *  which compensates for integer division always rounding down */
>>>>      reduction = (alpha * cwnd * ONE_CWND + MAX_ALPHA) / MAX_ALPHA / 2;
>>>>      cwnd_carry -= reduction;
>>>>
>>>> /* Round reduction into whole segments and carry the remainder */
>>>>      if (cwnd_carry <= -ONE_CWND) {
>>>>          cwnd_carry += ONE_CWND;
>>>>          cwnd = max(cwnd - 1, MIN_CWND);
>>>>          ssthresh = cwnd;
>>>>      }
>>>>
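
As a hedged, compilable rendering of the pseudocode above, the sketch below adds the types and state that the fragment leaves implicit. The struct, the function name and the value of MIN_CWND are assumptions for this example; the arithmetic follows the pseudocode as written, including removing at most one segment per call and leaving the rest of the reduction in cwnd_carry for later calls.

```c
#include <stdint.h>

#define ONE_CWND  (1LL << 20)   /* one segment in fixed point; must be signed */
#define MAX_ALPHA (1ULL << 20)  /* alpha == MAX_ALPHA means a 100% marking rate */
#define MIN_CWND  2U            /* assumed floor; not specified in this thread */

/* Hypothetical per-flow state for this sketch. */
struct cc_state {
    uint32_t cwnd;        /* congestion window in segments */
    uint32_t ssthresh;    /* slow-start threshold in segments */
    int64_t  cwnd_carry;  /* fractional remainder, in units of 1/ONE_CWND */
};

/* Apply one multiplicative decrease on CE feedback.
 * The intermediate product only fits in 64 bits if cwnd < 2^24. */
static void on_ce_feedback(struct cc_state *s, uint64_t alpha)
{
    /* Adding MAX_ALPHA to the numerator adds 1/2 after the two divisions,
     * which compensates for integer division always rounding down. */
    uint64_t num = alpha * s->cwnd * (uint64_t)ONE_CWND + MAX_ALPHA;
    int64_t reduction = (int64_t)(num / MAX_ALPHA / 2);

    s->cwnd_carry -= reduction;

    /* Round the reduction into whole segments and carry the remainder. */
    if (s->cwnd_carry <= -ONE_CWND) {
        s->cwnd_carry += ONE_CWND;
        /* Same result as max(cwnd - 1, MIN_CWND), without unsigned underflow. */
        s->cwnd = s->cwnd > MIN_CWND ? s->cwnd - 1 : MIN_CWND;
        s->ssthresh = s->cwnd;
    }
}
```

With cwnd = 8 and alpha = MAX_ALPHA/8, each call computes a reduction of half a segment: the first call leaves cwnd at 8 with cwnd_carry at minus half a segment, and the second call brings cwnd to 7 with cwnd_carry back at zero.
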
>>>>
>>>>
>>>>> -----------
>>>>> Example functions for the virtual RTT are:
>>>>> 	• rtt_virt = max(srtt, RTT_VIRT_MIN);
>>>>> 	• rtt_virt = srtt + AdditionalRTT;
>>>>> where RTT_VIRT_MIN and AdditionalRTT are constants. The current default is rtt_virt = max(srtt, 25ms), which addresses the main Prague requirement for when the RTT is smaller than typical.
>>>>> -->
>>>>> The virtual RTT, rtt_virt is computed as:
>>>>> 	• rtt_virt = max(srtt, RTT_VIRT_MIN);
>>>>> where RTT_VIRT_MIN = 25ms. This addresses the Prague requirement for Reduced RTT-Dependence when the RTT is smaller than typical public Internet RTTs.
>>>>>
>>>> [BB] The fluffiness is because this is a case where implementations might differ, so I've made it clearer what the Linux implementation does but also left in the other example. Also the constants depend on the deployment environment. Specifically:
>>>>
>>>> Example functions that implementations might use for the virtual RTT are:
>>>>      rtt_virt = max(srtt, RTT_VIRT_MIN);
>>>>      rtt_virt = srtt + AdditionalRTT;
>>>> where the parameters RTT_VIRT_MIN or AdditionalRTT would be set for a particular deployment environment.
>>>>
>>>> The Linux implementation of Prague uses the first example and, for the public Internet, it sets RTT_VIRT_MIN=25ms. Thus, Linux Prague defines
>>>> rtt_virt = max(srtt, 25ms), which addresses the Prague requirement for Reduced RTT-Dependence when the RTT is smaller than typical public Internet RTTs.
>>>>
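
The two example functions can be written out as the small C sketch below. Expressing times in microseconds and the function names are assumptions for illustration; only the 25 ms value and the two formulas come from the text above.

```c
#include <stdint.h>

/* 25 ms in microseconds: the public-Internet value quoted for Linux Prague. */
#define RTT_VIRT_MIN_US 25000U

/* First example function: rtt_virt = max(srtt, RTT_VIRT_MIN). */
static uint32_t rtt_virt_max(uint32_t srtt_us)
{
    return srtt_us > RTT_VIRT_MIN_US ? srtt_us : RTT_VIRT_MIN_US;
}

/* Second example function: rtt_virt = srtt + AdditionalRTT,
 * where additional_rtt_us would be set per deployment environment. */
static uint32_t rtt_virt_add(uint32_t srtt_us, uint32_t additional_rtt_us)
{
    return srtt_us + additional_rtt_us;
}
```

With the first function, a flow with a 10 ms srtt gets rtt_virt = 25 ms while a flow with a 160 ms srtt keeps 160 ms, so the ratio between the two virtual RTTs falls from 16 to 6.4. Flows whose srtt is already 25 ms or more are unaffected.
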
>>> 	[SM] I still think this is not a general solution*... this really just takes the edge off TCP Prague's increased RTT bias compared to other TCPs but will only be noticeable if the TCP Prague flow has an RTT below 25ms. If we look at competition say between TCP Prague @25ms RTT and TCP Prague @160ms we will still see a considerably larger RTT bias than with say TCP Cubic @25ms and TCP Cubic @160ms.
>>>
>>> *) IIRC this was only introduced as a counter-measure after some testing (
>>> https://github.com/heistp/l4s-tests
>>> ) demonstrated quite noticeable increased RTT bias over TCP Cubic:
>>>
>>> https://camo.githubusercontent.com/0ca81a2fabe48e8fce0f98f8b8347c79d27340684fe0791a3ee6685cf4cdb02e/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f73312d6368617274732f727474666169725f63635f71646973635f31306d735f3136306d732e737667
>>>
>>>
>>> Especially the middle section with the FIFO. This test was using 10ms versus 160ms RTT and making Prague always act as if having 25 ms RTT ameliorated the issue somewhat, but did not actually generally solve it. I find it odd to find a section "Reduced RTT-Dependence" in this draft given that essentially TCP Prague comes with a noticeably increased RTT bias (at least the default implementation for Linux). Big fan of truth in advertising....
>>>
>>>
>>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe
>> http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe
http://bobbriscoe.net/