Re: [iccrg] editorial comments on draft-briscoe-iccrg-prague-congestion-control-02

Bob Briscoe <ietf@bobbriscoe.net> Wed, 13 September 2023 12:07 UTC

Message-ID: <c5ec024a-0d7d-669c-5afd-4c6d5c00c5c0@bobbriscoe.net>
Date: Wed, 13 Sep 2023 13:07:20 +0100
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Neal Cardwell <ncardwell@google.com>, iccrg IRTF list <iccrg@irtf.org>, Greg White <g.white@cablelabs.com>, "De Schepper, Koen (Koen)" <koen.de_schepper@nokia.com>, Vidhi Goel <vidhi_goel@apple.com>
References: <CADVnQynoZxSX1biBDkGV-PV5zQP4vgxuG=9t8HfNm80_q+zdeg@mail.gmail.com> <176686bb-b75a-a545-5ab7-6a9cc6ce097a@bobbriscoe.net> <7553D74D-5050-4E4A-947A-7CC59F97E584@gmx.de>
From: Bob Briscoe <ietf@bobbriscoe.net>
In-Reply-To: <7553D74D-5050-4E4A-947A-7CC59F97E584@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/r2Tu3vpLiEaDa7QzbUPo7tcvGlM>
Subject: Re: [iccrg] editorial comments on draft-briscoe-iccrg-prague-congestion-control-02
Precedence: list

Sebastian, See [BB2] (and sry for delayed reply...)

On 05/08/2023 09:37, Sebastian Moeller wrote:
> Question below, prefixed [SM]
>
>> On Aug 5, 2023, at 03:47, Bob Briscoe <ietf=40bobbriscoe.net@dmarc.ietf.org> wrote:
> [...]
>>> -----------
>>> A system wide option is available to disable AccECN negotiation, but the Prague CC module will always override this setting, as it depends on AccECN. Then, solely in this case, AccECN will only be active for TCP flows using the Prague CCA.
>>> -->
>>> A system-wide sysctl is available to enable or disable AccECN negotiation. However, the Prague CC module overrides this sysctl and will always enable AccECN negotiation, since it depends on AccECN (i.e., when the system-wide sysctl disables AccECN negotiation, TCP flows using the Prague CCA will still attempt AccECN negotiation).
>> [BB] Yes, it was badly worded. I've had another go myself:
>>
>> A system-wide sysctl is available to enable or disable AccECN negotiation. However, TCP flows using the Prague CCA module depend on AccECN, so they always ignore this system-wide sysctl and enable AccECN negotiation anyway.
> [SM] This seems to violate the principle of least surprise. If there is a toggle to disable AccECN system-wide it needs to be honored. TCP Prague could maybe write a message in the kernel/system log noting that AccECN is missing and TCP Prague might fail somehow. Alternatively at least introduce another sysctl to disable the use of TCP Prague completely. The administrator of a system should be in control and that means the administrator can also put the system in odd states.

[BB2] The draft text describes how Linux Prague currently works. If we
have to change it when submitting for Linux mainlining, we'll change the
I-D as well.

Altho this is really an issue for Linux netdev, I'll give the reasoning 
here (I'm no expert on Linux design principles, so pls bash).
* AccECN is a dependency for Prague. Admins expect Linux to 
automatically sort out dependencies.
* On a system with AccECN disabled, flows that are not using the prague 
cc module still have AccECN disabled.

Surely an admin would be surprised if they tried to enable Prague, but 
the system replied that it was not going to enable a dependency even 
though it could.
Wouldn't it be more normal to write a warning that AccECN had been 
enabled for Prague flows, even though it was disabled by a global sysctl?
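
A minimal sketch of the behaviour described above, in C. The names
(flow_cfg, accecn_should_negotiate, sysctl_accecn_enabled) are
hypothetical and are not Linux kernel identifiers. It only illustrates
the dependency logic and the suggested warning, not how the kernel
implements either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-flow configuration; not a kernel structure. */
struct flow_cfg {
    bool uses_prague;   /* flow is using the Prague CC module */
};

/*
 * Returns true if the flow should attempt AccECN negotiation.
 * Sets *warn when the global sysctl was overridden for a Prague flow,
 * which is the case where a log message would be written.
 */
static bool accecn_should_negotiate(bool sysctl_accecn_enabled,
                                    const struct flow_cfg *f, bool *warn)
{
    assert(f != NULL && warn != NULL);
    *warn = f->uses_prague && !sysctl_accecn_enabled;
    return sysctl_accecn_enabled || f->uses_prague;
}
```

With the sysctl disabled, a non-Prague flow gets no AccECN and no
warning, while a Prague flow gets AccECN plus the warning.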

>
>>> -----------
>>> A Prague CCA triggers update of its moving average once per RTT by recording the packet it sent after the previous update, then watching for the ACK of that packet to return.
>>> -->
>>> A Prague CCA triggers update of its moving average ECN mark rate once per rtt_virt [see Section 2.4.4].
>>>
>> [BB] Thx for catching this.
>>
>>> -----------
>>> To maintain its moving average, it measures the fraction, frac, of ACKed bytes or ACKed packets
>>> -->
>>> [IMHO the spec should specify whether the CCA is measuring using bytes or packets, since the answers may be very different depending on the approach, leading to unfairness between implementations with different approaches. I would argue for using the fraction of packets marked (as IIRC I have argued on some IETF mailing list or another). And Linux TCP Prague is already doing this.]
>> [BB] Agreed that this ought to say just packets, to document what Linux Prague uses.
>>
>> If packet sizes were independently and identically distributed (IID), on average any differences would cancel out, 'cos the distribution of packet sizes is in both the top and bottom of the fraction. That assumes all L4S AQMs mark packets independently of size, which is currently true (and recommended by RFC7141).
>>
>> Nonetheless, if packet sizes do vary, they would very likely not be IID. For instance, if one end was sending ECN-capable pure ACKs, it would be likely to be sending a lot in a row, not just randomly. Then measuring bytes would be the right thing (adding a nominal header size to each packet if an exact one were not available).
> 	[SM] Why? If you want rate fairness (as your "right thing" seems to imply) then just use a rate-equalizing scheduler...

[BB2] This is primarily about preventing harm to self, not 
rate-fairness. It's to ensure that the rate is not reduced more than is 
appropriate in response to any marked small packets.

BTW, a CCA on a sending host cannot know what schedulers there might be 
in its path.
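
To make the difference concrete, here is a small C sketch that computes
the marked fraction both ways over the same set of ACKed packets. The
struct and function names are hypothetical and the byte counts are
illustrative. It takes no side on which metric is appropriate; per the
thread, Linux Prague counts packets.

```c
#include <assert.h>
#include <stddef.h>

/* One ACKed packet: size (payload plus a nominal header) and CE mark. */
struct acked_pkt {
    unsigned bytes;
    int ce;             /* non-zero if the packet was CE-marked */
};

/* Fraction of ACKed packets that carried a CE mark. */
static double frac_by_packets(const struct acked_pkt *p, size_t n)
{
    size_t marked = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        marked += p[i].ce ? 1 : 0;
    return n ? (double)marked / (double)n : 0.0;
}

/* Fraction of ACKed bytes that were in CE-marked packets. */
static double frac_by_bytes(const struct acked_pkt *p, size_t n)
{
    unsigned long long marked = 0, total = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += p[i].bytes;
        if (p[i].ce)
            marked += p[i].bytes;
    }
    return total ? (double)marked / (double)total : 0.0;
}
```

For two unmarked 1500-byte packets and two marked 40-byte packets, the
packet fraction is 2/4 = 0.5 while the byte fraction is 80/3080, about
0.026. A response driven by the packet fraction would therefore be much
stronger than one driven by the byte fraction.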

>
>> BTW, I do remember you raising this on a list somewhere. I meant to reply, and I guess it's still in my todo list somewhere - I'll dig it out.
>>
>> If we conclude that RFC7141 is OK on this point, then we'll need to write something in the future work section under congestion metrics about this (and we'll have to implement it).
> 	[SM] I have mentioned before that I for one consider RFC7141 to be wrong on the
> " When a transport detects that a packet has been lost or congestion
>     marked, it SHOULD consider the strength of the congestion indication
>     as proportionate to the size in octets (bytes) of the missing or
>     marked packet."
>
> section. A flow should try to get as veridical an estimate about a congestion event as possible and react to that best estimate of the congestion, and if, as RFC7141 recommends, congestion marking does not take packet size into account, neither should the receiver of the congestion signal.

[BB2] Disagreeing with something because it seems odd to you that two 
things don't match is not a reasoned argument, let alone a scientific 
argument.

The many reasons why size should be taken into account by the 
function responding to congestion signals, and not by the function doing 
the marking are given in RFC7141 (which was the consensus outcome of 
protracted WG discussions). I understand that the chairs of tsvwg have 
already asked you to write up a draft of your arguments against RFC7141, 
if you have any. This is the constructive way expected at the IETF.

> Sidenote: RFC7141 has been ratified for 9.5 years and has been arguing for this odd dichotomy between encoding congestion signals and interpreting congestion signals since the first draft in 2007. The fact that apparently ZERO implementations of the recommended approach seem to exist, let alone seem to be quantitatively used over the internet, IMHO really should end that folly. Protocol stacks should not make up congestion signals, but simply respond appropriately to the best congestion estimate they can reasonably maintain.
>
> [...]

[BB2] RFC7141 recognized that, at the time, packet size was rarely taken 
into account in either case (neither marking nor responding), and 
explains that it is more important not to introduce size-dependent 
marking in the network (MUST NOT), while introducing size-dependent 
response on hosts is not obligatory but recommended (SHOULD).

There are implementations of size-dependent response, e.g. TFRC-SP. But 
implementation is not a voting system, because the message of RFC7141 is 
only *if you're implementing size-dependence*, do it in the host, not 
the network.  See https://www.rfc-editor.org/rfc/rfc7141#section-2.3


More generally, implementation is not a voting system on the correctness 
of science anyway.  Congestion control and AQM have a grounding in 
control engineering and the maths behind it. Yes, the IETF's mantra is 
"...running code". But that doesn't mean an implementation of bad 
science makes it good science.


Bob

>
>>> -----------
>>> Also, integer rounding bias ought to be removed from the multiplicative decrease calculation.
>>> -->
>>> [I would suggest spelling out how to do this correctly to increase the odds that this is implemented correctly by implementors that can't look at the GPL tcp_prague.c reference code]
>> [BB] I introduced a pseudocode name for the carry variable into the previous sentence, then added the pseudocode below:
>>      "... delay can be made significantly less jumpy by tracking a fractional value, cwnd_carry, alongside the integer window and carrying over any fractional remainder to the next reduction." ... Specifically:
>>
>> #define ONE_CWND (1LL << 20)        /* Must be signed */
>> #define MAX_ALPHA (1ULL << 20)
>>
>> /* On CE feedback, calculate the reduction in cwnd */
>>      /* Adding MAX_ALPHA to the numerator effectively adds 1/2
>>       *  which compensates for integer division always rounding down */
>>      reduction = (alpha * cwnd * ONE_CWND + MAX_ALPHA) / MAX_ALPHA / 2;
>>      cwnd_carry -= reduction;
>>
>> /* Round reduction into whole segments and carry the remainder */
>>      if (cwnd_carry <= -ONE_CWND) {
>>          cwnd_carry += ONE_CWND;
>>          cwnd = max(cwnd - 1, MIN_CWND);
>>          ssthresh = cwnd;
>>      }
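
For readers who want to run the quoted pseudocode, the following is a
self-contained C sketch of it. Several things are assumptions of this
sketch and not taken from the thread or from tcp_prague.c: the struct
and function names, MIN_CWND = 2, the bound on cwnd that keeps the
64-bit arithmetic from overflowing, and the use of a loop (the
pseudocode uses a single `if`) so that one call applies every whole
segment of the reduction.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_CWND  (1LL << 20)   /* one segment in fixed point; signed */
#define MAX_ALPHA (1ULL << 20)  /* alpha == MAX_ALPHA means 100% marking */
#define MIN_CWND  2             /* assumed floor, in segments */

/* Hypothetical state holder; field names follow the pseudocode. */
struct cwnd_state {
    int64_t  cwnd;        /* congestion window, whole segments */
    int64_t  ssthresh;
    int64_t  cwnd_carry;  /* fractional remainder, units of 1/ONE_CWND */
    uint64_t alpha;       /* moving average of marking, 0..MAX_ALPHA */
};

/* On CE feedback: reduce cwnd by alpha/2, carrying the fraction. */
static void on_ce_feedback(struct cwnd_state *s)
{
    assert(s->alpha <= MAX_ALPHA);
    assert(s->cwnd >= MIN_CWND && s->cwnd < (1LL << 22)); /* no overflow */

    /* Adding MAX_ALPHA before dividing adds 1/2, which compensates
     * for integer division always rounding down. */
    uint64_t num = s->alpha * (uint64_t)s->cwnd * (uint64_t)ONE_CWND
                   + MAX_ALPHA;
    int64_t reduction = (int64_t)(num / MAX_ALPHA / 2);

    s->cwnd_carry -= reduction;

    /* Apply whole segments now; keep the remainder for next time. */
    while (s->cwnd_carry <= -ONE_CWND) {
        s->cwnd_carry += ONE_CWND;
        if (s->cwnd > MIN_CWND)
            s->cwnd--;
        s->ssthresh = s->cwnd;
    }
}
```

Starting from cwnd = 10 with alpha = MAX_ALPHA/2, the reduction is 2.5
segments: cwnd becomes 8 and half a segment stays in cwnd_carry. A
second call with alpha = MAX_ALPHA/8 computes another half segment,
which combines with the carried half to take cwnd to 7.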
>>
>>
>>>
>>> -----------
>>> Example functions for the virtual RTT are:
>>> 	• rtt_virt = max(srtt, RTT_VIRT_MIN);
>>> 	• rtt_virt = srtt + AdditionalRTT;
>>> where RTT_VIRT_MIN and AdditionalRTT are constants. The current default is rtt_virt = max(srtt, 25ms), which addresses the main Prague requirement for when the RTT is smaller than typical.
>>> -->
>>> The virtual RTT, rtt_virt is computed as:
>>> 	• rtt_virt = max(srtt, RTT_VIRT_MIN);
>>> where RTT_VIRT_MIN = 25ms. This addresses the Prague requirement for Reduced RTT-Dependence when the RTT is smaller than typical public Internet RTTs.
>> [BB] The fluffiness is because this is a case where implementations might differ, so I've made it clearer what the Linux implementation does but also left in the other example. Also the constants depend on the deployment environment. Specifically:
>>
>> Example functions that implementations might use for the virtual RTT are:
>>      rtt_virt = max(srtt, RTT_VIRT_MIN);
>>      rtt_virt = srtt + AdditionalRTT;
>> where the parameters RTT_VIRT_MIN or AdditionalRTT would be set for a particular deployment environment.
>>
>> The Linux implementation of Prague uses the first example and, for the public Internet, it sets RTT_VIRT_MIN=25ms. Thus, Linux Prague defines
>> rtt_virt = max(srtt, 25ms), which addresses the Prague requirement for Reduced RTT-Dependence when the RTT is smaller than typical public Internet RTTs.
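
The two example functions can be sketched in C as below. The function
names and the use of microseconds are assumptions of this sketch; only
the 25 ms value and the two formulas come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define RTT_VIRT_MIN_US 25000u  /* 25 ms, the public-Internet value above */

/* First example: rtt_virt = max(srtt, RTT_VIRT_MIN). */
static uint32_t rtt_virt_floor(uint32_t srtt_us, uint32_t rtt_virt_min_us)
{
    return srtt_us > rtt_virt_min_us ? srtt_us : rtt_virt_min_us;
}

/* Second example: rtt_virt = srtt + AdditionalRTT. */
static uint32_t rtt_virt_additive(uint32_t srtt_us, uint32_t additional_rtt_us)
{
    assert(srtt_us <= UINT32_MAX - additional_rtt_us); /* no wrap-around */
    return srtt_us + additional_rtt_us;
}
```

With the 25 ms floor, a flow with srtt = 10 ms behaves as if its RTT
were 25 ms, while a flow with srtt = 160 ms is unchanged, so the floor
only affects flows whose RTT is below RTT_VIRT_MIN.
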
> 	[SM] I still think this is not a general solution*... this really just takes the edge off TCP Prague's increased RTT bias compared to other TCPs but will only be noticeable if the TCP Prague flow has an RTT below 25ms. If we look at competition between, say, TCP Prague @25ms RTT and TCP Prague @160ms, we will still see a considerably larger RTT bias than with, say, TCP Cubic @25ms and TCP Cubic @160ms.
>
> *) IIRC this was only introduced as a counter-measure after some testing (https://github.com/heistp/l4s-tests) demonstrated quite noticeable increased RTT bias over TCP Cubic:
> https://camo.githubusercontent.com/0ca81a2fabe48e8fce0f98f8b8347c79d27340684fe0791a3ee6685cf4cdb02e/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f73312d6368617274732f727474666169725f63635f71646973635f31306d735f3136306d732e737667
>
> Especially the middle section with the FIFO. This test was using 10ms versus 160ms RTT, and making Prague always act as if it had a 25 ms RTT ameliorated the issue somewhat, but did not actually generally solve it. I find it odd to find a section "Reduced RTT-Dependence" in this draft given that essentially TCP Prague comes with a noticeably increased RTT bias (at least the default implementation for Linux). Big fan of truth in advertising....
>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
