Re: [iccrg] editorial comments on draft-briscoe-iccrg-prague-congestion-control-02

Bob Briscoe <ietf@bobbriscoe.net> Sat, 05 August 2023 01:47 UTC

From: Bob Briscoe <ietf@bobbriscoe.net>
To: Neal Cardwell <ncardwell@google.com>
References: <CADVnQynoZxSX1biBDkGV-PV5zQP4vgxuG=9t8HfNm80_q+zdeg@mail.gmail.com>
Content-Language: en-GB
Cc: iccrg IRTF list <iccrg@irtf.org>, Greg White <g.white@cablelabs.com>, "De Schepper, Koen (Koen)" <koen.de_schepper@nokia.com>, Vidhi Goel <vidhi_goel@apple.com>
In-Reply-To: <CADVnQynoZxSX1biBDkGV-PV5zQP4vgxuG=9t8HfNm80_q+zdeg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/-HITWPF-JQI-z3F4LfmiIDVX1u8>

Neal, (and adding iccrg as suggested)

Thanks for these. All accepted except the one about connection v kernel.
I've given responses to some, but please take no response to mean 
'accepted'.

Anyone can find the diff in the editor's copy of the xml, linked via the 
draft's datatracker page.

On 28/07/2023 16:18, Neal Cardwell wrote:
> Hi,
>
> Spent the flight back home from the IETF reading the Prague CC draft 
> and skimming the tcp_prague.c code. Thanks for both! Below are some 
> editorial comments/suggestions from reading the draft. Hope some of 
> them are useful.
>
> Feel free to CC this e-mail to IETF mailing lists if you feel that 
> makes sense.
>
> best regards,
> neal
>
> -----------
>
> But if it is sharing the bottleneck with Classic ECN traffic, this is 
> more difficult to detect
>
> -->
>
> But if it is sharing a Classic ECN bottleneck with Classic ECN 
> traffic, this is more difficult to detect
>
>
> -----------
>
> A general response function has the form cwnd = K/p^B
>
> -->
>
> One family of response functions has the form cwnd = K/p^B
>

[BB] Yes, not the only family - you're right.
Nonetheless, I made this 'A widely used family of response functions...' 
'cos otherwise it could have been read as some little-known family.
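
For illustration, one reason this family is widely used is that the exponent B alone determines how the congestion signalling rate behaves as the window grows. The short derivation below is a sketch; the labels "Reno-like" for B = 1/2 and "scalable" for B = 1 are the usual steady-state approximations and are assumptions here, not text from the draft.

```latex
\begin{align*}
\text{cwnd} &= \frac{K}{p^{B}} \\
\text{marks per round trip} &= p \cdot \text{cwnd} = K\,p^{\,1-B} \\
B = \tfrac{1}{2}\ \text{(Reno-like)}: \quad p \cdot \text{cwnd} &= K\sqrt{p} = \frac{K^{2}}{\text{cwnd}}
  \quad \text{(falls as cwnd grows)} \\
B = 1\ \text{(scalable)}: \quad p \cdot \text{cwnd} &= K
  \quad \text{(constant, independent of cwnd)}
\end{align*}
```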

>
> -----------
>
> The meeting agreed a list
>
> -->
>
> The meeting agreed on a list
>

[BB] Happy to oblige, given British English allows both, but apparently 
a US ear doesn't.

>
> -----------
>
> Scaling down to a fractional window (no longer mandatory, see Section 3.5)
>
> -->
>
> [Perhaps this should be mandatory, to avoid having low-RTT flows 
> starve high-RTT flows? Sounds like IETF 117 testing had some 
> conclusions in this area? What were the findings there?]
>

[BB] For this draft, I've just added '(*Recommended but* no longer 
mandatory...' 'cos this debate really needs to be in the context of all 
scalable controls - i.e. a standards track update to RFC9331, not here.

>
> -----------
>
> An additional check is provided to verify that the kernel actually 
> does support AccECN
>
> -->
>
> An additional check is provided to verify that the connection actually 
> does support AccECN
>

[BB] This does actually intend to mean the kernel.
I've clarified with "*The Prague CCA module does a*n additional check 
to verify that the kernel actually does support AccECN"

>
> -----------
>
> A system wide option is available to disable AccECN negotiation, but 
> the Prague CC module will always override this setting, as it depends 
> on AccECN. Then, solely in this case, AccECN will only be active for 
> TCP flows using the Prague CCA.
>
> -->
>
> A system-wide sysctl is available to *enable or* disable AccECN 
> negotiation. However, the Prague CC module overrides this sysctl and 
> will always enable AccECN negotiation, since it depends on AccECN 
> (i.e., when the system-wide sysctl disables AccECN negotiation, TCP 
> flows using the Prague CCA will still attempt AccECN negotiation).
>

[BB] Yes, it was badly worded. I've had another go myself:

A system-wide option is available to enable or disable AccECN 
negotiation. However, TCP flows using the Prague CCA module depend on 
AccECN; so they always ignore this system-wide sysctl and enable AccECN 
negotiation anyway.

>
> -----------
>
> A Prague CCA triggers update of its moving average once per RTT by 
> recording the packet it sent after the previous update, then watching 
> for the ACK of that packet to return.
>
> -->
>
> A Prague CCA triggers update of its moving average ECN mark rate once 
> per rtt_virt [see Section 2.4.4].
>
>

[BB] Thx for catching this.

> -----------
>
> To maintain its moving average, it measures the fraction, frac, of 
> ACKed bytes or ACKed packets
>
> -->
>
> [IMHO the spec should specify whether the CCA is measuring using bytes 
> or packets, since the answers may be very different depending on the 
> approach, leading to unfairness between implementations with different 
> approaches. I would argue for using the fraction of packets marked (as 
> IIRC I have argued on some IETF mailing list or another). And Linux 
> TCP Prague is already doing this.]
>

[BB] Agreed that this ought to say just packets, to document what Linux 
Prague uses.

If packet sizes were independently and identically distributed (IID), on 
average any differences would cancel out, 'cos the distribution of 
packet sizes is in both the top and bottom of the fraction. That assumes 
all L4S AQMs mark packets independently of size, which is currently true 
(and recommended by RFC7141).

Nonetheless, if packet sizes do vary, they would very likely not be IID. 
For instance, if one end was sending ECN-capable pure ACKs, it would be 
likely to be sending a lot in a row, not just randomly. Then measuring 
bytes would be the right thing (adding a nominal header size to each 
packet if an exact one were not available).

BTW, I do remember you raising this on a list somewhere. I meant to 
reply, and I guess it's still in my todo list somewhere - I'll dig it out.

If we conclude that RFC7141 is OK on this point, then we'll need to 
write something in the future work section under congestion metrics 
about this (and we'll have to implement it).
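
To make the packets-versus-bytes difference concrete, here is a minimal C sketch that computes both fractions over the packets ACKed in one measurement interval. The struct and function names are hypothetical, and the byte counts are assumed to already include headers; it is only meant to show how a run of small marked packets makes the two metrics diverge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One ACKed packet as seen by the sender (hypothetical type). */
struct acked_pkt {
    uint32_t bytes;      /* size on the wire, headers included (assumed) */
    int      ce_marked;  /* non-zero if ECN feedback reported CE */
};

/* Fraction of ACKed packets that were CE-marked. */
static double frac_by_packets(const struct acked_pkt *p, size_t n)
{
    assert(n > 0);
    size_t marked = 0;
    for (size_t i = 0; i < n; i++)
        marked += p[i].ce_marked ? 1 : 0;
    return (double)marked / (double)n;
}

/* Fraction of ACKed bytes that were carried in CE-marked packets. */
static double frac_by_bytes(const struct acked_pkt *p, size_t n)
{
    assert(n > 0);
    uint64_t total = 0, marked = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i].bytes;
        if (p[i].ce_marked)
            marked += p[i].bytes;
    }
    assert(total > 0);
    return (double)marked / (double)total;
}
```

With two unmarked 1500-byte packets followed by two marked 40-byte pure ACKs, the packet metric gives 0.5 while the byte metric gives 80/3080, under 3%.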

>
> -----------
> that carried ECN feedback over the previous round trip
> -->
> that carried ECN feedback over the previous *rtt_virt*
>
>
> -----------
>
> where g is the gain of the EWMA (default 1/16).
>
> -->
>
> where *g=1/16* is the gain of the EWMA.
>
>
> -----------
>
> Linux represents alpha with a 10-bit integer (with resolution 1/1024).
>
> -->
>
> Linux DCTCP represents alpha with a 10-bit integer (with resolution 
> 1/1024).
>
>
> -----------
>
> Currently the above per-RTT update to the moving average, which was 
> inherited from DCTCP, is the default in the Linux Prague CCA.
>
> -->
>
> The above per-rtt_virt update to the moving average is the approach in 
> the Linux Prague CCA.
>
>
> -----------
>
> Subsequently, as the window grows, RACK shifts to using a fraction of 
> the RTT for loss detection.
>
> -->
>
> If the TCP connection detects reordering, then RACK shifts to using a 
> fraction of the RTT for loss detection.
>
>
> -----------
>
> as long as the reductions, due to ECN and the loss, when multiplied 
> together result in the reduction that the implementation usually makes 
> in response to loss (e.g. 50% to emulate Reno or 30% to emulate CUBIC).
>
> ->
>
> as long as the reductions, due to ECN and the loss, when combined, 
> result in a reduction that is at least as large as the implementation 
> usually makes in response to loss (e.g. 50% to emulate Reno or 30% to 
> emulate CUBIC).
>
> [In the spirit of the discussion in Section 3.2.1 "ECN with Loss", 
> with ECN+loss in a single round trip, presumably we want the reduction 
> to be at least as large as the max of the reductions dictated by the 
> ECN and loss responses separately. For example, if the ECN mark rate 
> is a sustained 100% and then there is a packet loss, we want a 
> CUBIC-ish Prague CCA to reduce cwnd by the ECN-dictated reduction of 
> 50% rather than the CUBIC-loss reduction of 30%].
>

[BB] Yes, good point.
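
One way to read this rule is sketched below in C. It assumes reductions are expressed as multiplicative cwnd factors (0.5 for a Reno-style loss response, 0.7 for a CUBIC-style one) and computes the smallest extra factor on loss that still satisfies "at least as large as" either response alone. The function name is hypothetical, and an implementation is free to reduce further.

```c
#include <assert.h>

/*
 * ecn_factor:  factor already applied to cwnd this round trip because of
 *              ECN marks, e.g. 1 - alpha/2 (1.0 means no ECN reduction yet).
 * loss_target: factor the implementation normally applies on loss,
 *              e.g. 0.5 (Reno-like) or 0.7 (CUBIC-like).
 *
 * Returns the additional factor to apply when the loss is detected, so
 * that ecn_factor * result == min(ecn_factor, loss_target).
 */
static double loss_factor_after_ecn(double ecn_factor, double loss_target)
{
    assert(ecn_factor > 0.0 && ecn_factor <= 1.0);
    assert(loss_target > 0.0 && loss_target <= 1.0);

    double extra = loss_target / ecn_factor;
    /* If ECN has already cut cwnd by more than the loss response would,
     * do not reduce again and never increase. */
    return extra < 1.0 ? extra : 1.0;
}
```

In the example given above (sustained 100% marking, then a loss, with a CUBIC-like loss response), ecn_factor is 0.5 and loss_target is 0.7, so the function returns 1.0 and the overall reduction stays at the ECN-dictated 50%.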

>
>
> -----------
>
> Linux Prague CCA clocks its moving average of ECN-marking, alpha, once 
> per round trip
>
> -->
>
> Linux Prague CCA clocks its moving average of ECN-marking, alpha, once 
> per rtt_virt [see Section 2.4.4]
>

[BB] Thank you. I really didn't do a good job of finding all the 
occurrences of 'round trip' or 'RTT', did I!
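
For reference, the once-per-rtt_virt clocking of alpha discussed in several of these comments could be sketched in C as below. This is a floating-point illustration under assumptions, not the Linux code: the struct, the function name, and the use of a timestamp deadline to trigger the update are all hypothetical. The gain g = 1/16 and the packet-based fraction come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define EWMA_GAIN (1.0 / 16.0)   /* g, as given in the draft text */

struct alpha_state {
    double   alpha;           /* moving average of the marking fraction */
    uint32_t acked_pkts;      /* packets ACKed since the last update */
    uint32_t marked_pkts;     /* of those, packets reported CE-marked */
    uint64_t next_update_us;  /* time at which the next update is due */
};

/* Call on every ACK; updates alpha at most once per rtt_virt. */
static void alpha_on_ack(struct alpha_state *s, uint64_t now_us,
                         uint32_t acked, uint32_t marked,
                         uint64_t rtt_virt_us)
{
    assert(marked <= acked);
    assert(rtt_virt_us > 0);

    s->acked_pkts  += acked;
    s->marked_pkts += marked;

    if (now_us < s->next_update_us || s->acked_pkts == 0)
        return;  /* not yet a full rtt_virt since the previous update */

    double frac = (double)s->marked_pkts / (double)s->acked_pkts;
    s->alpha += EWMA_GAIN * (frac - s->alpha);

    /* Start a new measurement interval of one rtt_virt. */
    s->acked_pkts = 0;
    s->marked_pkts = 0;
    s->next_update_us = now_us + rtt_virt_us;
}
```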

>
> -----------
>
> Also, integer rounding bias ought to be removed from the 
> multiplicative decrease calculation.
>
> -->
>
> [I would suggest spelling out how to do this correctly to increase the 
> odds that this is implemented correctly by implementors that can't 
> look at the GPL tcp_prague.c reference code]
>

[BB] I introduced a pseudocode name for the carry variable into the 
previous sentence, then added the pseudocode below:
     "... delay can be made significantly less jumpy by tracking a 
fractional value, cwnd_carry, alongside the integer window and carrying 
over any fractional remainder to the next reduction." ... Specifically:

#define ONE_CWND (1LL << 20)        /* Must be signed */
#define MAX_ALPHA (1ULL << 20)

/* On CE feedback, calculate the reduction in cwnd */
     /* Adding MAX_ALPHA to the numerator effectively adds 1/2
      *  which compensates for integer division always rounding down */
     reduction = (alpha * cwnd * ONE_CWND + MAX_ALPHA) / MAX_ALPHA / 2;
     cwnd_carry -= reduction;

/* Round reduction into whole segments and carry the remainder */
     if (cwnd_carry <= -ONE_CWND) {
         cwnd_carry += ONE_CWND;
         cwnd = max(cwnd - 1, MIN_CWND);
         ssthresh = cwnd;
     }
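
For anyone who wants to step through that pseudocode, here is a self-contained C rendering. It is a sketch under assumptions: the struct, the function name and the MIN_CWND value of 2 are hypothetical, alpha is taken to be scaled so that MAX_ALPHA represents 1.0, and cwnd is counted in segments. The arithmetic follows the pseudocode as written, including removing at most one segment per call.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_CWND  (1LL << 20)    /* one segment in fixed point; signed */
#define MAX_ALPHA (1ULL << 20)   /* alpha == MAX_ALPHA means 100% marking */
#define MIN_CWND  2              /* hypothetical floor for this sketch */

struct prague_sketch {
    int64_t cwnd;        /* congestion window in whole segments */
    int64_t cwnd_carry;  /* fractional remainder, in units of 1/ONE_CWND */
    int64_t ssthresh;
};

/* Apply one CE-triggered reduction of alpha/2 of the window. */
static void prague_sketch_on_ce(struct prague_sketch *s, uint64_t alpha)
{
    assert(alpha <= MAX_ALPHA);
    assert(s->cwnd > 0 && s->cwnd < (1 << 20));  /* keeps the product in 64 bits */

    /* Adding MAX_ALPHA before the divisions adds 1/2 after them, which
     * compensates for integer division always rounding down. */
    uint64_t reduction =
        (alpha * (uint64_t)s->cwnd * (uint64_t)ONE_CWND + MAX_ALPHA)
        / MAX_ALPHA / 2;
    s->cwnd_carry -= (int64_t)reduction;

    /* Round into whole segments and carry the remainder forward. */
    if (s->cwnd_carry <= -ONE_CWND) {
        s->cwnd_carry += ONE_CWND;
        s->cwnd = (s->cwnd - 1 > MIN_CWND) ? s->cwnd - 1 : MIN_CWND;
        s->ssthresh = s->cwnd;
    }
}
```

With cwnd = 4 and alpha at 25%, each call asks for a reduction of half a segment: the first call only moves cwnd_carry, and the second brings the carry to a whole segment and decrements cwnd to 3.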


>
>
> -----------
>
> Example functions for the virtual RTT are:
>
>  *
>
>     rtt_virt = max(srtt, RTT_VIRT_MIN);
>
>  *
>
>     rtt_virt = srtt + AdditionalRTT;
>
> where RTT_VIRT_MIN and AdditionalRTT are constants. The current 
> default is rtt_virt = max(srtt, 25ms), which addresses the main Prague 
> requirement for when the RTT is smaller than typical.
>
> -->
>
> The virtual RTT, rtt_virt is computed as:
>
>  *
>
>     rtt_virt = max(srtt, RTT_VIRT_MIN);
>
> where RTT_VIRT_MIN = 25ms. This addresses the Prague requirement for 
> Reduced RTT-Dependence when the RTT is smaller than typical public 
> Internet RTTs.
>

[BB] The fluffiness is because this is a case where implementations 
might differ, so I've made it clearer what the Linux implementation does 
but also left in the other example. Also the constants depend on the 
deployment environment. Specifically:

Example functions that implementations might use for the virtual RTT are:
     rtt_virt = max(srtt, RTT_VIRT_MIN);
     rtt_virt = srtt + AdditionalRTT;
where the parameters RTT_VIRT_MIN or AdditionalRTT would be set for a 
particular deployment environment.

The Linux implementation of Prague uses the first example and, for the 
public Internet, it sets RTT_VIRT_MIN=25ms. Thus, Linux Prague defines
rtt_virt = max(srtt, 25ms), which addresses the Prague requirement for 
Reduced RTT-Dependence when the RTT is smaller than typical public 
Internet RTTs.
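
The two example functions could be written in C roughly as follows. This is an illustrative sketch: the function names and the use of microsecond units are assumptions, and only the 25ms floor comes from the text above. No value is suggested for AdditionalRTT, so it is left as a caller-supplied parameter.

```c
#include <assert.h>
#include <stdint.h>

#define RTT_VIRT_MIN_US 25000u   /* 25ms, the Linux value for the public Internet */

/* First example: rtt_virt = max(srtt, RTT_VIRT_MIN). */
static uint32_t rtt_virt_max(uint32_t srtt_us)
{
    return srtt_us > RTT_VIRT_MIN_US ? srtt_us : RTT_VIRT_MIN_US;
}

/* Second example: rtt_virt = srtt + AdditionalRTT.
 * additional_us is deployment-specific and chosen by the caller. */
static uint32_t rtt_virt_add(uint32_t srtt_us, uint32_t additional_us)
{
    assert(srtt_us <= UINT32_MAX - additional_us);  /* no wrap-around */
    return srtt_us + additional_us;
}
```

With the first function, a 5ms flow and a 20ms flow both get rtt_virt = 25ms, while a 100ms flow keeps its own srtt. The second function preserves an offset between flows at every RTT.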

>
> -----------
>
> it spans 5 actual round trips
>
> ->
>
> cwnd_virt spans 5 actual round trips
>
> [the text looks correct; this is just a suggestion for clarity...]
>
>
> -----------
>
> bursts of packets that can occur, for example, when a jump in the 
> acknowledgement number opens up cwnd
>
> ->
> bursts of packets that can occur, for example, when restarting from 
> idle or when a jump in the acknowledgement number opens up cwnd
>
>
> -----------
>
> A Prague CCA SHOULD pace the packets it sends
>
> -->
>
> A Prague CCA MUST pace the packets it sends
>
> [Hard to see how an implementation can keep delays tolerable if it 
> bursts a full cwnd when restarting from idle?]
>
>
> -----------
>
> sleeping handset battery
>
> -->
>
> sleeping handset *radio*
>
>
> -----------
>
> it is also planned to allow the application
>
> -->
>
> It is also planned to allow the application
>
> [capitalization]
>
>
> -----------
>
> New EWMA and resposne algorithms
>
> -->
>
> New EWMA and response algorithms
>
>
> -----------
>
> stack ahs to be
>
> -->
>
> stack has to be
>
>

Thanks again for all these.

Cheers


Bob

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/