Re: [iccrg] editorial comments on draft-briscoe-iccrg-prague-congestion-control-02

Sebastian Moeller <moeller0@gmx.de> Wed, 13 September 2023 12:34 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.4\))
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <c5ec024a-0d7d-669c-5afd-4c6d5c00c5c0@bobbriscoe.net>
Date: Wed, 13 Sep 2023 14:33:04 +0200
Cc: Neal Cardwell <ncardwell@google.com>, iccrg IRTF list <iccrg@irtf.org>, Greg White <g.white@cablelabs.com>, "De Schepper, Koen (Koen)" <koen.de_schepper@nokia.com>, Vidhi Goel <vidhi_goel@apple.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <F49FF965-F0F6-482D-B3E5-6C765D2353A7@gmx.de>
References: <CADVnQynoZxSX1biBDkGV-PV5zQP4vgxuG=9t8HfNm80_q+zdeg@mail.gmail.com> <176686bb-b75a-a545-5ab7-6a9cc6ce097a@bobbriscoe.net> <7553D74D-5050-4E4A-947A-7CC59F97E584@gmx.de> <c5ec024a-0d7d-669c-5afd-4c6d5c00c5c0@bobbriscoe.net>
To: Bob Briscoe <ietf@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/mspHVSmG34GYSKPUL8wn_NezN_Q>
Subject: Re: [iccrg] editorial comments on draft-briscoe-iccrg-prague-congestion-control-02
Precedence: list

Hi Bob,


> On Sep 13, 2023, at 14:07, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastian, See [BB2] (and sry for delayed reply...)
> 
> On 05/08/2023 09:37, Sebastian Moeller wrote:
>> Question below, prefixed [SM]
>> 
>> 
>>> On Aug 5, 2023, at 03:47, Bob Briscoe <ietf=40bobbriscoe.net@dmarc.ietf.org>
>>>  wrote:
>>> 
>> [...]
>> 
>>>> -----------
>>>> A system wide option is available to disable AccECN negotiation, but the Prague CC module will always override this setting, as it depends on AccECN. Then, solely in this case, AccECN will only be active for TCP flows using the Prague CCA.
>>>> -->
>>>> A system-wide sysctl is available to enable or disable AccECN negotiation. However, the Prague CC module overrides this sysctl and will always enable AccECN negotiation, since it depends on AccECN (i.e., when the system-wide sysctl disables AccECN negotiation, TCP flows using the Prague CCA will still attempt AccECN negotiation).
>>>> 
>>> [BB] Yes, it was badly worded. I've had another go myself:
>>> 
>>> A system-wide option is available to enable or disable AccECN negotiation. However, TCP flows using the Prague CCA module depend on AccECN; so they  always ignore this system-wide sysctl and enable AccECN negotiation anyway. 
>>> 
>> [SM] This seems to violate the principle of least surprise. If there is a toggle to disable AccECN system-wide, it needs to be honored. TCP Prague could maybe write a message to the kernel/system log noting that AccECN is missing and that TCP Prague might fail somehow. Alternatively, at least introduce another sysctl to disable the use of TCP Prague completely. The administrator of a system should be in control, and that means the administrator can also put the system in odd states. 
> 
> [BB2] Here it's written how Linux Prague currently works. If we have to change it when submitting for Linux mainlining, we'll change the I-D as well.
> 
> Altho this is really an issue for Linux netdev, I'll give the reasoning here (I'm no expert on Linux design principles, so pls bash).
> * AccECN is a dependency for Prague. Admins expect Linux to automatically sort out dependencies.
> * On a system with AccECN disabled, flows that are not using the prague cc module still have AccECN disabled.
> 
> Surely an admin would be surprised if they tried to enable Prague, but the system replied that it was not going to enable a dependency even though it could.

	[SM] No, if there is a toggle for AccECN it needs to be honored. You can introduce multiple modes to your AccECN sysctl, like:
0: disable unconditionally
1: enable unconditionally
2: enable only for TCP-Prague

and for what it is worth you might even convincingly argue for 2 being the default, but "disabled" really needs to be disabled...
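
The three modes listed above can be sketched as follows. This is a minimal sketch only: the enum names, the function names and the warning helper are hypothetical and are not taken from the Linux kernel or from the Prague patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state values for an AccECN sysctl, as proposed above.
 * The names are illustrative; they do not exist in the Linux kernel. */
enum accecn_mode {
    ACCECN_DISABLED    = 0, /* disable unconditionally */
    ACCECN_ENABLED     = 1, /* enable unconditionally */
    ACCECN_PRAGUE_ONLY = 2  /* enable only for flows using TCP Prague */
};

/* Decide whether a flow should attempt AccECN negotiation.
 * Unknown values are treated as "disabled" (an assumption of this sketch). */
bool accecn_should_negotiate(int mode, bool flow_uses_prague)
{
    switch (mode) {
    case ACCECN_ENABLED:
        return true;
    case ACCECN_PRAGUE_ONLY:
        return flow_uses_prague;
    case ACCECN_DISABLED:
    default:
        return false; /* "disabled" stays disabled, even for Prague */
    }
}

/* True when a Prague flow is about to run without its AccECN dependency,
 * i.e. the case where a log message to the admin would be appropriate. */
bool prague_should_warn(int mode, bool flow_uses_prague)
{
    return flow_uses_prague && !accecn_should_negotiate(mode, flow_uses_prague);
}

/* Usage example: the policy table spelled out as checks. */
void accecn_mode_example(void)
{
    assert(!accecn_should_negotiate(ACCECN_DISABLED, true));
    assert(accecn_should_negotiate(ACCECN_PRAGUE_ONLY, true));
    assert(!accecn_should_negotiate(ACCECN_PRAGUE_ONLY, false));
    assert(prague_should_warn(ACCECN_DISABLED, true));
}
```

With mode 0, a Prague flow does not negotiate AccECN and the warning condition is raised instead of the setting being overridden.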


> Wouldn't it be more normal to write a warning that AccECN had been enabled for Prague flows, even though it was disabled by a global sysctl?

	[SM] As you noted, not my call to make. However, as a Linux and sysctl user, I certainly would prefer a message telling me that I need to change AccECN to 1 or 2 instead of having my policy setting silently ignored.


> 
>> 
>>>> -----------
>>>> A Prague CCA triggers update of its moving average once per RTT by recording the packet it sent after the previous update, then watching for the ACK of that packet to return.
>>>> -->
>>>> A Prague CCA triggers update of its moving average ECN mark rate once per rtt_virt [see Section 2.4.4].
>>>> 
>>>> 
>>> [BB] Thx for catching this.
>>> 
>>> 
>>>> -----------
>>>> To maintain its moving average, it measures the fraction, frac, of ACKed bytes or ACKed packets
>>>> -->
>>>> [IMHO the spec should specify whether the CCA is measuring using bytes or packets, since the answers may be very different depending on the approach, leading to unfairness between implementations with different approaches. I would argue for using the fraction of packets marked (as IIRC  I have argued on some IETF mailing list or another). And Linux TCP Prague is already doing this.]
>>>> 
>>> [BB] Agreed that this ought to say just packets, to document what Linux Prague uses. 
>>> 
>>> If packet sizes were independently and identically distributed (IID), on average any differences would cancel out, 'cos the distribution of packet sizes is in both the top and bottom of the fraction. That assumes all L4S AQMs mark packets independently of size, which is currently true (and recommended by RFC7141).
>>> 
>>> Nonetheless, if packet sizes do vary, they would very likely not be IID. For instance, if one end was sending ECN-capable pure ACKs, it would be likely to be sending a lot in a row, not just randomly. Then measuring bytes would be the right thing (adding a nominal header size to each packet if an exact one were not available).
>>> 
>> 	[SM] Why? If you want rate fairness (as your "right thing" seems to imply) then just use a rate-equalizing scheduler... 
> 
> [BB2] This is primarily about preventing harm to self, not rate-fairness. It's to ensure that the rate is not reduced more than is appropriate in response to any marked small packets.
> 
> BTW, a CCA on a sending host cannot know what schedulers there might be in its path.
> 
>> 
>>> BTW, I do remember you raising this on a list somewhere. I meant to reply, and I guess it's still in my todo list somewhere - I'll dig it out.
>>> 
>>> If we conclude that RFC7141 is OK on this point, then we'll need to write something in the future work section under congestion metrics about this (and we'll have to implement it).
>>> 
>> 	[SM] I have mentioned before that I for one consider RFC7141 to be wrong on the 
>> " When a transport detects that a packet has been lost or congestion
>>    marked, it SHOULD consider the strength of the congestion indication
>>    as proportionate to the size in octets (bytes) of the missing or
>>    marked packet."
>> 
>> section. A flow should try to get as veridical an estimate of a congestion event as possible and react to that best estimate of the congestion, and if, as RFC7141 recommends, congestion marking does not take packet size into account, neither should the receiver of the congestion signal.
>> 
> 
> [BB] Disagreeing with something because it seems odd to you that two things don't match is not a reasoned argument, let alone a scientific argument.

	[SM] I laid out my rationale for why the proposal in RFC7141 is not logically sound... that seems sufficiently reasonable and scientific to me. I did not write "odd"; I used the classification "wrong".


> The many reasons for why size should be taken into account by the function responding to congestion signals, and not by the function doing the marking are given in RFC7141 (which was the consensus outcome of protracted WG discussions).

	[SM] And I am telling you that RFC7141 simply is wrong in that regard. As seen with other drafts, not all sections and sentences receive the scrutiny they require, and sometimes things slip through that make little sense; IMHO this is one of those cases.

> I understand that the chairs of tsvwg have already asked you to write up a draft of your arguments against RFC7141, if you have any. This is the constructive way expected at the IETF. 

	[SM] I opted to create an erratum for RFC7141 instead, and the response to that (and lack thereof) convinced me that writing a new draft is going to be an exercise in futility, but I digress.


> 
>> Sidenote: RFC7141 has been ratified for 9.5 years and has been arguing for this odd dichotomy between encoding congestion signals and interpreting congestion signals since the first draft in 2007. The fact that apparently ZERO implementations of the recommended approach seem to exist, let alone seem to be quantitatively used over the internet, IMHO really should end that folly. Protocol stacks should not make up congestion signals, but simply respond appropriately to the best congestion estimate they can reasonably maintain.
>> 
>> [...]
>> 
> 
> [BB2] RFC7141 recognized that, at the time, packet size was rarely taken into account in either case (neither marking nor responding), and explains that it is more important not to introduce size-dependent marking in the network (MUST NOT), while introducing size-dependent response on hosts is not obligatory but recommended (SHOULD).

	[]


> 
> There are implementations of size-dependent response, e.g. TFRC-SP. 

	[SM] Except that TFRC-SP simply counts marked packets... the size dependence hence is not related to the interpretation of congestion signals, see https://datatracker.ietf.org/doc/html/rfc4828:
   In TFRC-SP, the loss event rate is calculated by counting at most one
   loss event in loss intervals longer than two round-trip times, and by
   counting each packet lost or marked in shorter loss intervals.

This is pretty clear about what is counted, and that is not related to the size of the received packets. TFRC-SP needs to calculate a loss rate because it intends to respond as if it were a flow with a larger packet size, but that loss rate does not depend on the number of marked bytes... This is IMHO not an applicable example for "size-dependence" of interpretation of congestion signals.



> But implementation is not a voting system, because the message of RFC7141 is only if you're implementing size-dependence, do it in the host, not the network.  See https://www.rfc-editor.org/rfc/rfc7141#section-2.3

[SM] Here is the first part of that section:

   When a transport detects that a packet has been lost or congestion
   marked, it SHOULD consider the strength of the congestion indication
   as proportionate to the size in octets (bytes) of the missing or
   marked packet.

   In other words, when a packet indicates congestion (by being lost or
   marked), it can be considered conceptually as if there is a
   congestion indication on every octet of the packet, not just one
   indication per packet.

   To be clear, the above recommendation solely describes how a
   transport should interpret the meaning of a congestion indication, as
   a long term goal.  It makes no recommendation on whether a transport
   should act differently based on this interpretation.

This is considerably stronger than an "if you must, do it at the end-hosts" argument; this is a clear recommendation to do it at the end hosts... 
IMHO the last sentence then makes this worse... interpret it differently but do not act on that interpretation...
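
To make the difference between the two interpretations concrete, here is a minimal sketch that computes the marked fraction of one window of ACKed packets both per packet and per byte. The struct and function names are hypothetical and the packet sizes are made-up illustration values, not measurements; this is not how any particular stack implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One ACKed packet as seen by the sender (illustrative type). */
struct acked_pkt {
    uint32_t bytes;     /* size of the ACKed packet in bytes */
    int      ce_marked; /* non-zero if the packet carried a CE mark */
};

/* Fraction of ACKed packets that were CE-marked (packet counting). */
double frac_by_packets(const struct acked_pkt *p, size_t n)
{
    size_t marked = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (p[i].ce_marked)
            marked++;
    return n ? (double)marked / (double)n : 0.0;
}

/* Fraction of ACKed bytes that were CE-marked (byte counting, i.e. the
 * "indication on every octet" reading of RFC7141 section 2.3). */
double frac_by_bytes(const struct acked_pkt *p, size_t n)
{
    uint64_t marked = 0, total = 0;

    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += p[i].bytes;
        if (p[i].ce_marked)
            marked += p[i].bytes;
    }
    return total ? (double)marked / (double)total : 0.0;
}
```

For a window of two unmarked 1500-byte packets and two marked 64-byte packets, packet counting gives 2/4 = 0.5 while byte counting gives 128/3128, roughly 0.04, so the two approaches can differ by an order of magnitude once packet sizes are mixed.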



> More generally, implementation is not a voting system on the correctness of science anyway.

	[SM] Indeed. However, a working implementation typically (especially if used in a peer-reviewed paper, or accepted by critical upstreams) implies that some level of testing was done and a few critical eyes have looked it over.


>  Congestion control and AQM have a grounding in control engineering and the maths behind it. Yes, the IETF's mantra is "...running code". But that doesn't mean an implementation of bad science makes it good science. 

	[SM] +1; but that is not my claim. My claim is that the attempt at injecting policy (not backed up by empirical data) failed and it is time to let go...

Regards
	Sebastian


> 
> 
> Bob
> 
>> 
>>>> -----------
>>>> Also, integer rounding bias ought to be removed from the multiplicative decrease calculation.
>>>> -->
>>>> [I would suggest spelling out how to do this correctly to increase the odds that this is implemented correctly by implementors that can't look at the GPL tcp_prague.c reference code]
>>>> 
>>> [BB] I introduced a pseudocode name for the carry variable into the previous sentence, then added the pseudocode below:
>>>     "... delay can be made significantly less jumpy by tracking a fractional value, cwnd_carry, alongside the integer window and carrying over any fractional remainder to the next reduction." ... Specifically:
>>> 
>>> #define ONE_CWND (1LL << 20)        /* Must be signed */
>>> #define MAX_ALPHA (1ULL << 20)
>>> 
>>> /* On CE feedback, calculate the reduction in cwnd */
>>>     /* Adding MAX_ALPHA to the numerator effectively adds 1/2 
>>>      *  which compensates for integer division always rounding down */
>>>     reduction = (alpha * cwnd * ONE_CWND + MAX_ALPHA) / MAX_ALPHA / 2;
>>>     cwnd_carry -= reduction;
>>> 
>>> /* Round reduction into whole segments and carry the remainder */
>>>     if (cwnd_carry <= -ONE_CWND) {
>>>         cwnd_carry += ONE_CWND;
>>>         cwnd = max(cwnd - 1, MIN_CWND);
>>>         ssthresh = cwnd;
>>>     }
>>> 
>>> 
>>> 
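
The pseudocode quoted above can be wrapped into a compilable function roughly as follows. This is a sketch under stated assumptions, not the tcp_prague.c code: the struct, the function name and MIN_CWND = 2 are hypothetical, alpha is assumed to be scaled so that MAX_ALPHA means 100% marking, and a `while` loop replaces the quoted `if` so that a reduction of more than one segment is applied within a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ONE_CWND  (1LL << 20)   /* one segment in fixed point; must be signed */
#define MAX_ALPHA (1ULL << 20)  /* alpha == MAX_ALPHA means 100% marking */
#define MIN_CWND  2U            /* assumed floor; not given in the thread */

struct cc_state {
    uint32_t cwnd;       /* congestion window, in segments */
    uint32_t ssthresh;   /* slow-start threshold, in segments */
    int64_t  cwnd_carry; /* fractional remainder, in 1/ONE_CWND segments */
    uint64_t alpha;      /* moving average of marking, 0..MAX_ALPHA */
};

/* On CE feedback: reduce cwnd by alpha/2 of its value, carrying the
 * fractional part over to the next reduction. */
void on_ce_feedback(struct cc_state *s)
{
    uint64_t reduction;

    assert(s != NULL);

    /* Adding MAX_ALPHA before the divisions adds 1/2 (in fixed-point
     * units), compensating for integer division rounding down. */
    reduction = (s->alpha * s->cwnd * (uint64_t)ONE_CWND + MAX_ALPHA)
                / MAX_ALPHA / 2;
    s->cwnd_carry -= (int64_t)reduction;

    /* Convert whole segments of accumulated reduction into cwnd steps;
     * whatever is left stays in cwnd_carry. */
    while (s->cwnd_carry <= -ONE_CWND) {
        s->cwnd_carry += ONE_CWND;
        if (s->cwnd > MIN_CWND)
            s->cwnd--;
        s->ssthresh = s->cwnd;
    }
}
```

For example, with cwnd = 10 and alpha = MAX_ALPHA / 2 (50% marking) the ideal reduction is 0.5 * 10 / 2 = 2.5 segments: one call lowers cwnd to 8 and leaves half a segment in cwnd_carry for the next reduction.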
>>>> 
>>>> -----------
>>>> Example functions for the virtual RTT are:
>>>> 	• rtt_virt = max(srtt, RTT_VIRT_MIN);
>>>> 	• rtt_virt = srtt + AdditionalRTT;
>>>> where RTT_VIRT_MIN and AdditionalRTT are constants. The current default is rtt_virt = max(srtt, 25ms), which addresses the main Prague requirement for when the RTT is smaller than typical.
>>>> -->
>>>> The virtual RTT, rtt_virt is computed as:
>>>> 	• rtt_virt = max(srtt, RTT_VIRT_MIN);
>>>> where RTT_VIRT_MIN = 25ms. This addresses the Prague requirement for Reduced RTT-Dependence when the RTT is smaller than typical public Internet RTTs.
>>>> 
>>> [BB] The fluffiness is because this is a case where implementations might differ, so I've made it clearer what the Linux implementation does but also left in the other example. Also the constants depend on the deployment environment. Specifically:
>>> 
>>> Example functions that implementations might use for the virtual RTT are:
>>>     rtt_virt = max(srtt, RTT_VIRT_MIN);
>>>     rtt_virt = srtt + AdditionalRTT;
>>> where the parameters RTT_VIRT_MIN or AdditionalRTT would be set for a particular deployment environment.
>>> 
>>> The Linux implementation of Prague uses the first example and, for the public Internet, it sets RTT_VIRT_MIN=25ms. Thus, Linux Prague defines 
>>> rtt_virt = max(srtt, 25ms), which addresses the Prague requirement for Reduced RTT-Dependence when the RTT is smaller than typical public Internet RTTs.
>>> 
>> 	[SM] I still think this is not a general solution*... this really just takes the edge off TCP Prague's increased RTT bias compared to other TCPs, but will only be noticeable if the TCP Prague flow has an RTT below 25ms. If we look at competition between, say, TCP Prague @25ms RTT and TCP Prague @160ms, we will still see a considerably larger RTT bias than with, say, TCP Cubic @25ms and TCP Cubic @160ms.
>> 
>> *) IIRC this was only introduced as a counter-measure after some testing (
>> https://github.com/heistp/l4s-tests
>> ) demonstrated quite noticeable increased RTT bias over TCP Cubic:
>> 
>> https://camo.githubusercontent.com/0ca81a2fabe48e8fce0f98f8b8347c79d27340684fe0791a3ee6685cf4cdb02e/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f73312d6368617274732f727474666169725f63635f71646973635f31306d735f3136306d732e737667
>> 
>> 
>> Especially the middle section with the FIFO. This test was using 10ms versus 160ms RTT, and making Prague always act as if it had a 25 ms RTT ameliorated the issue somewhat, but did not actually solve it in general. I find it odd to find a section "Reduced RTT-Dependence" in this draft given that essentially TCP Prague comes with a noticeably increased RTT bias (at least the default implementation for Linux). Big fan of truth in advertising.... 
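
The effect of the 25 ms floor discussed above can be illustrated with a minimal sketch of rtt_virt = max(srtt, RTT_VIRT_MIN). The function name and the microsecond units are assumptions made for this illustration, not taken from the Linux code.

```c
#include <assert.h>
#include <stdint.h>

#define RTT_VIRT_MIN_US 25000U /* 25 ms, the value quoted for the public Internet */

/* Virtual RTT: the smoothed RTT, but never less than RTT_VIRT_MIN. */
uint32_t rtt_virt_us(uint32_t srtt_us)
{
    return srtt_us > RTT_VIRT_MIN_US ? srtt_us : RTT_VIRT_MIN_US;
}

/* Usage example with the RTTs mentioned in the thread. */
void rtt_virt_example(void)
{
    assert(rtt_virt_us(10000) == 25000);   /* 10 ms is raised to 25 ms */
    assert(rtt_virt_us(25000) == 25000);   /* 25 ms is unchanged */
    assert(rtt_virt_us(160000) == 160000); /* 160 ms is unchanged */
}
```

For the 10 ms versus 160 ms test the ratio of the RTTs that the flows act on shrinks from 16:1 to 6.4:1, while for 25 ms versus 160 ms it remains 6.4:1 because the floor never applies.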
>> 
>> 
>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/
