Re: [tcpm] [Technical Errata Reported] RFC8257 (6697)

Martin Duke <martin.h.duke@gmail.com> Thu, 04 November 2021 21:54 UTC

References: <20210928071818.BE0D7F40865@rfc-editor.org> <96ce4984-3678-9bdf-6b76-d7ba1bd42dcc@bobbriscoe.net> <CADVnQymMRzvs_4QRuSziYXfwu6ttKfak5cv5G=eBRvX8qOQKWw@mail.gmail.com> <d1514f76-fc40-fa73-c953-efcb70fe6901@bobbriscoe.net> <abd8609d-b643-2911-f082-9ce2ebe41bbd@bobbriscoe.net> <CB870F54-4E2B-43C4-9674-7D847081D96D@fh-muenster.de> <DFE6EB68-DC06-4958-88A4-FD8ADF769226@apple.com> <618410F1.1000809@btconnect.com> <CAM4esxReBRNfkf7GJ9z44e9wNs5z4=GkLATYj5x9QW-Eo0p-iw@mail.gmail.com>
In-Reply-To: <CAM4esxReBRNfkf7GJ9z44e9wNs5z4=GkLATYj5x9QW-Eo0p-iw@mail.gmail.com>
From: Martin Duke <martin.h.duke@gmail.com>
Date: Thu, 04 Nov 2021 14:54:00 -0700
Message-ID: <CAM4esxTg1CpxiLEQOp9_kKiY+7Do2fSiqMZRXn7n_KWB1n3dJg@mail.gmail.com>
To: t petch <ietfa@btconnect.com>
Cc: Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>, Michael Tuexen <tuexen@fh-muenster.de>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Dave Thaler <dthaler@microsoft.com>, "Eggert, Lars" <lars@netapp.com>, "tsv-ads@ietf.org" <tsv-ads@ietf.org>, "tcpm-chairs@ietf.org" <tcpm-chairs@ietf.org>, RFC Errata System <rfc-editor@rfc-editor.org>
Content-Type: multipart/alternative; boundary="000000000000276ae505cffd9103"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/7XZlO06v4lY2scoRxVe587V6dVs>
Subject: Re: [tcpm] [Technical Errata Reported] RFC8257 (6697)

As someone who hasn't implemented DCTCP, I'd like to understand the
proposed changes. While there's some editorial rearranging, which is
usually not good grounds for a verified erratum, there are three concrete
concerns I see in the thread.

1) WindowEnd should be initialized to SND.NXT, not SND.UNA.

These are the same at the time of initialization (IIUC, entering
ESTABLISHED), no? The very first ACK (if not ECE) will set WindowEnd to
SND.NXT = (end of the first flight). If the first ACK is ECE, then we'll
run through step 9 and have an immediate cwnd adjustment with Alpha=1. ISTM
there is an issue with the initial conditions in the latter case, but not
one fixed by initializing to SND.NXT.
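
A minimal sketch of the observation-window bookkeeping (steps 1 through 8 of
Section 3.3, paraphrased), to make the initial-condition argument concrete.
The class and function names, the gain g = 1/16, and the "<=" comparison
against WindowEnd are assumptions made for illustration and should be checked
against the RFC text; sequence-number wraparound is ignored.

```python
from dataclasses import dataclass

G = 1 / 16  # estimation gain g; assumed value, for illustration only


@dataclass
class DctcpState:
    snd_una: int           # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int           # SND.NXT: next sequence number to be sent
    window_end: int        # DCTCP.WindowEnd
    alpha: float = 1.0     # DCTCP.Alpha, starting at 1 as discussed above
    bytes_acked: int = 0   # DCTCP.BytesAcked
    bytes_marked: int = 0  # DCTCP.BytesMarked


def on_ack(s: DctcpState, seg_ack: int, ece: bool) -> bool:
    """Run steps 1-8 for one ACK; return True if an observation window closed."""
    acked = max(0, seg_ack - s.snd_una)
    s.snd_una = max(s.snd_una, seg_ack)
    s.bytes_acked += acked
    if ece:
        s.bytes_marked += acked
    if seg_ack <= s.window_end:  # assumed comparison: still inside the window
        return False
    m = s.bytes_marked / s.bytes_acked       # step 5: fraction of marked bytes
    s.alpha = s.alpha * (1 - G) + G * m      # step 6: update the EWMA
    s.window_end = s.snd_nxt                 # step 7: next window ends at SND.NXT
    s.bytes_acked = s.bytes_marked = 0       # step 8: reset the counters
    return True


# WindowEnd initialized to SND.UNA (== SND.NXT when entering ESTABLISHED),
# then a first flight of 10000 bytes is sent before any ACK arrives.
s = DctcpState(snd_una=1000, snd_nxt=1000, window_end=1000)
s.snd_nxt = 11000
closed = on_ack(s, seg_ack=2000, ece=False)
```

Under these assumptions the very first ACK closes the (empty) initial window
and moves WindowEnd to the end of the first flight, whichever of SND.UNA or
SND.NXT was used for initialization.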

2) If SEG.ACK < DCTCP.WindowEnd, it skips steps 5 through 9. It should skip
5-8 and still execute step 9.

My understanding from the paragraph after step 9 is that cwnd reduction in
step 9 should be done only once per RTT, so the original text seems
correct?
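
For comparison, the once-per-window behavior described in the SUGGESTED text
quoted further down this thread (reduce on arrival of ECE, then suppress
further ECN-triggered reductions until SND.NXT at the time of the reduction is
acknowledged) could be sketched as follows. This is only an illustration: the
names (Sender, cwr_end) are hypothetical, wraparound is ignored, and whether
the ACK that exactly reaches cwr_end may itself trigger a new reduction is a
choice made here, not something the thread settles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sender:
    cwnd: float
    alpha: float                   # DCTCP.Alpha, maintained elsewhere
    snd_nxt: int                   # SND.NXT
    cwr_end: Optional[int] = None  # hypothetical: sequence number ending CWR state


def on_ecn_ack(s: Sender, seg_ack: int, ece: bool) -> bool:
    """Return True if this ACK caused a cwnd reduction."""
    in_cwr = s.cwr_end is not None
    if ece and not in_cwr:
        # First ECN feedback outside CWR state: reduce once and remember
        # how far the sender had sent at that moment.
        s.cwnd *= 1 - s.alpha / 2
        s.cwr_end = s.snd_nxt
        return True
    if in_cwr and seg_ack >= s.cwr_end:
        # Everything outstanding at the time of the reduction is now
        # acknowledged, so later ECE may trigger a new reduction.
        s.cwr_end = None
    return False


snd = Sender(cwnd=100.0, alpha=0.5, snd_nxt=10000)
first = on_ecn_ack(snd, seg_ack=2000, ece=True)   # reduces, enters CWR state
second = on_ecn_ack(snd, seg_ack=3000, ece=True)  # suppressed
```

Here the reduction is tied to the arrival of feedback rather than to the end
of an observation window, which is the distinction the SUGGESTED text draws.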

3) Step 9 should only apply when the ECE flag is set.

As this suggests the rule would only apply if the last ACK in the window
was ECE, I think the correct condition would be M > 0 (i.e. something was
marked in the past window).
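
To illustrate how the candidate conditions for step 9 differ, here is a small
sketch that applies the reduction once at the end of an observation window
under three gating rules. The function name and rule labels are hypothetical,
and "any_mark" stands in for M > 0 on the assumption that every ACK in the
window acknowledges new data.

```python
from typing import Sequence


def step9_cwnd(cwnd: float, alpha: float, ece_flags: Sequence[bool], rule: str) -> float:
    """Return cwnd after step 9 at the end of one observation window.

    ece_flags holds the ECE bit of each ACK in the window, in arrival order.
      "always"   : step 9 as literally written, no condition
      "last_ece" : only if the ACK that closes the window carries ECE
      "any_mark" : only if something in the window was marked (M > 0)
    """
    if rule == "always":
        reduce = True
    elif rule == "last_ece":
        reduce = ece_flags[-1]
    elif rule == "any_mark":
        reduce = any(ece_flags)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return cwnd * (1 - alpha / 2) if reduce else cwnd


# A window with a mark in the middle but an unmarked final ACK.
flags = [False, True, False]
results = {r: step9_cwnd(100.0, 0.5, flags, r) for r in ("always", "last_ece", "any_mark")}
```

With this input, "last_ece" leaves cwnd untouched even though the window saw
a mark, while "any_mark" reduces it; with no marks at all, only "always"
reduces.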

If I've missed a concern or mangled one of the three listed, don't hesitate
to speak up!

Martin

On Thu, Nov 4, 2021 at 1:38 PM Martin Duke <martin.h.duke@gmail.com> wrote:

> Yes, I'm the one to apply the erratum to the RFC.
>
> The rfc-editor page presents both the original and corrected version of
> the RFC. A future RFC that obsoletes this one would be expected to
> incorporate all verified errata.
>
> On Thu, Nov 4, 2021 at 9:57 AM t petch <ietfa@btconnect.com> wrote:
>
>> On 04/11/2021 00:37, Vidhi Goel wrote:
>> >>> The status of this erratum is 'Reported'. I think some consensus was
>> reached on the list. What happens now? Who is meant to propose updated text
>> for the erratum based on the discussion?
>> >> Hi Bob,
>> >>
>> >> as far as I know, the WG chairs can't change errata. That might be
>> >> possible for an AD, I think, but I'm not sure.
>> >
>> > I am a bit new to the errata process and would like to understand a bit
>> > more about next steps. Based on Bob’s suggestion about updated text and
>> > feedback from others, how do we proceed to make the necessary change to
>> > RFC 8257?
>> > Is the only thing we can do now to somehow update the Suggestion in
>> > the erratum itself? When would the update be applied to the RFC?
>>
>>
>> The usual form of an Erratum is along the lines of
>> OLD
>> <current text>
>> NEW
>> <proposed text>
>> Note
>> Section 31.102 is ambiguous and could be interpreted as ... or ....
>> This erratum .....
>>
>>
>> Discussion may result in a better version of <proposed text> in which
>> case the usual practice is to reject the Erratum and craft a new one.
>>
>> ADs sometimes add notes explaining why the action that is being taken is
>> being taken but I think it unusual for an AD to do more than that.
>>
>> This Erratum I do not understand and so would reject.  What is the
>> problem?  What is the fix?
>>
>> As others have said, an Erratum cannot change the consensus expressed in
>> the RFC.  The RFC is immutable.  A change of meaning is a new RFC.
>>
>> Perhaps discuss on the list what you perceive the problem to be and if
>> there is consensus that there is a problem and that it falls within the
>> limited scope of an Erratum, then craft an Erratum.
>>
>> Tom Petch
>>
>> > Thanks,
>> > Vidhi
>> >
>> >> On Nov 3, 2021, at 12:02 PM, tuexen@fh-muenster.de wrote:
>> >>
>> >>> On 3. Nov 2021, at 19:09, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>> >>>
>> >>> TCPM chairs,
>> >>>
>> >>> The status of this erratum is 'Reported'. I think some consensus was
>> reached on the list. What happens now? Who is meant to propose updated text
>> for the erratum based on the discussion?
>> >> Hi Bob,
>> >>
>> >> as far as I know, the WG chairs can't change errata. That might be
>> >> possible for an AD, I think, but I'm not sure.
>> >>
>> >> Best regards
>> >> Michael
>> >>>
>> >>>
>> >>> Bob
>> >>>
>> >>> On 01/10/2021 14:12, Bob Briscoe wrote:
>> >>>> Neal,
>> >>>>
>> >>>> On 30/09/2021 16:43, Neal Cardwell wrote:
>> >>>>> I agree with the points made by Vidhi and Bob, and really like
>> Bob's text.
>> >>>>>
>> >>>>> In the suggested text there may be a typo; I believe we want
>> s/SND.UNA/SND.NXT/.
>> >>>>
>> >>>> [BB] Agree (and your next point about solely ECN indications).
>> >>>>
>> >>>> I only said SND.UNA 'cos I was looking back at this earlier sentence
>> in the RFC, and I copied the idea without engaging brain:
>> >>>>    o  DCTCP.WindowEnd: the TCP sequence number threshold when one
>> >>>>       observation window ends and another is to begin; initialized to
>> >>>>       SND.UNA.
>> >>>>
>> >>>> Why does this say SND.UNA? Is this another erratum? I believe the
>> Linux code initializes to SND.NXT in dctcp_reset(), which is called from
>> dctcp_init():
>> >>>>
>> https://elixir.bootlin.com/linux/v4.7/source/net/ipv4/tcp_dctcp.c#L78
>> >>>>
>> >>>> BTW, step 7 correctly says SND.NXT:
>> >>>>    7.  Determine the end of the next observation window:
>> >>>>
>> >>>>           DCTCP.WindowEnd = SND.NXT
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> Bob
>> >>>>
>> >>>>> And probably we want to be more specific about only suppressing
>> further ECN-based reductions (further loss-triggered reductions would be
>> good to allow). I'm posting my suggested tweaks in blue, starting from
>> Bob's nice green text:
>> >>>>>
>> >>>>> SUGGESTED:
>> >>>>> ==========
>> >>>>>
>> >>>>> 3.4. Congestion Window Reduction
>> >>>>>    Rather than always halving the congestion window as described in
>> >>>>>    [RFC3168], on the arrival of ECN congestion feedback, the sender
>> >>>>>    SHOULD update cwnd as follows:
>> >>>>>
>> >>>>>       cwnd = cwnd * (1 - DCTCP.Alpha / 2)
>> >>>>>
>> >>>>>    Just as specified in [RFC3168], DCTCP does not react to congestion
>> >>>>>    indications more than once for every window of data.  Therefore, as
>> >>>>>    for RFC3168 ECN, it sets the variable for the end of congestion
>> >>>>>    window reduced (CWR) state to SND.NXT and suppresses further
>> >>>>>    ECN-triggered reductions until this TCP sequence number is
>> >>>>>    acknowledged.  Periods of CWR state are triggered by congestion
>> >>>>>    feedback, and therefore occur at times unrelated to the continuous
>> >>>>>    cycle of observation windows used to update DCTCP.Alpha in
>> >>>>>    Section 3.3.
>> >>>>>
>> >>>>>    The setting of the CWR bit is also as per [RFC3168].  This is
>> >>>>>    required for interoperation with classic ECN receivers due to
>> >>>>>    potential misconfigurations.
>> >>>>>
>> >>>>> 3.5.  Handling of Congestion Window Growth...
>> >>>>>
>> >>>>> neal
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Sep 30, 2021 at 11:22 AM Bob Briscoe <in@bobbriscoe.net>
>> wrote:
>> >>>>> Vidhi,
>> >>>>>
>> >>>>> You're right. It's incorrect to have the window reduction hanging
>> off the end of the list of steps for updating the EWMA.
>> >>>>>
>> >>>>> To make this concrete, here's some specific additional text (in
>> green for those with HTML mail readers). Also, rather than splitting into
>> sub-subsections, I have suggested that Item 9. of the list in subsection
>> 3.3 is moved out of the list, and instead forms the basis of a new
>> subsection 3.4. entitled "Congestion Window Reduction".
>> >>>>>
>> >>>>> CURRENT:
>> >>>>> ========
>> >>>>>    9.  Rather than always halving the congestion window as described in
>> >>>>>        [RFC3168], the sender SHOULD update cwnd as follows:
>> >>>>>
>> >>>>>           cwnd = cwnd * (1 - DCTCP.Alpha / 2)
>> >>>>>
>> >>>>>    Just as specified in [RFC3168], DCTCP does not react to congestion
>> >>>>>    indications more than once for every window of data.  The setting of
>> >>>>>    the CWR bit is also as per [RFC3168].  This is required for
>> >>>>>    interoperation with classic ECN receivers due to potential
>> >>>>>    misconfigurations.
>> >>>>>
>> >>>>>
>> >>>>> 3.4.  Handling of Congestion Window Growth...
>> >>>>>
>> >>>>>
>> >>>>> SUGGESTED:
>> >>>>> ==========
>> >>>>>
>> >>>>> 3.4. Congestion Window Reduction
>> >>>>>    Rather than always halving the congestion window as described in
>> >>>>>    [RFC3168], on the arrival of congestion feedback, the sender
>> >>>>>    SHOULD update cwnd as follows:
>> >>>>>
>> >>>>>       cwnd = cwnd * (1 - DCTCP.Alpha / 2)
>> >>>>>
>> >>>>>    Just as specified in [RFC3168], DCTCP does not react to congestion
>> >>>>>    indications more than once for every window of data.  Therefore, as
>> >>>>>    for RFC3168 ECN, it sets the variable for the end of congestion
>> >>>>>    window reduced (CWR) state to SND.UNA and suppresses further
>> >>>>>    reductions until this TCP sequence number is acknowledged.  Periods
>> >>>>>    of CWR state are triggered by congestion feedback, and therefore
>> >>>>>    occur at times unrelated to the continuous cycle of observation
>> >>>>>    windows used to update DCTCP.Alpha in Section 3.3.
>> >>>>>
>> >>>>>
>> >>>>>    The setting of the CWR bit is also as per [RFC3168].  This is
>> >>>>>    required for interoperation with classic ECN receivers due to
>> >>>>>    potential misconfigurations.
>> >>>>>
>> >>>>>
>> >>>>> 3.5.  Handling of Congestion Window Growth...
>> >>>>>
>> >>>>> Then the numbering of all subsequent subsections of section 3 will
>> >>>>> increment by 0.1.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Bob
>> >>>>>
>> >>>>> On 28/09/2021 08:18, RFC Errata System wrote:
>> >>>>>> The following errata report has been submitted for RFC8257,
>> >>>>>> "Data Center TCP (DCTCP): TCP Congestion Control for Data Centers".
>> >>>>>>
>> >>>>>> --------------------------------------
>> >>>>>> You may review the report below and at:
>> >>>>>>
>> >>>>>> https://www.rfc-editor.org/errata/eid6697
>> >>>>>>
>> >>>>>>
>> >>>>>> --------------------------------------
>> >>>>>> Type: Technical
>> >>>>>> Reported by: Vidhi Goel
>> >>>>>> <vidhi_goel@apple.com>
>> >>>>>>
>> >>>>>>
>> >>>>>> Section: 3.3
>> >>>>>>
>> >>>>>> Original Text
>> >>>>>> -------------
>> >>>>>> The below pseudocode follows after DCTCP.Alpha is updated on ACK
>> processing. This is wrong as cwnd should only be reduced using DCTCP.Alpha
>> when ECE is received.
>> >>>>>>
>> >>>>>> 9. Rather than always halving the congestion window as described in
>> >>>>>>        [RFC3168], the sender SHOULD update cwnd as follows:
>> >>>>>>
>> >>>>>>           cwnd = cwnd * (1 - DCTCP.Alpha / 2)
>> >>>>>>
>> >>>>>> Corrected Text
>> >>>>>> --------------
>> >>>>>> Instead, a new paragraph for Congestion Response to ECN feedback
>> would be much clearer. First start with RFC 3168's response to ECE and then
>> provide DCTCP's response to ECE.
>> >>>>>>
>> >>>>>> I am thinking splitting section 3.3 into two sub-sections -
>> >>>>>> 3.3.1 Computation of DCTCP.Alpha
>> >>>>>> 3.3.2 Congestion Response to ECE at sender
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Notes
>> >>>>>> -----
>> >>>>>> Although RFC 8257 refers to RFC 3168 congestion window halving at
>> >>>>>> step 9, it is confusing to put it right after step 8.
>> >>>>>>
>> >>>>>> Instructions:
>> >>>>>> -------------
>> >>>>>> This erratum is currently posted as "Reported". If necessary,
>> please
>> >>>>>> use "Reply All" to discuss whether it should be verified or
>> >>>>>> rejected. When a decision is reached, the verifying party
>> >>>>>> can log in to change the status and edit the report, if necessary.
>> >>>>>>
>> >>>>>> --------------------------------------
>> >>>>>> RFC8257 (draft-ietf-tcpm-dctcp-10)
>> >>>>>> --------------------------------------
>> >>>>>> Title               : Data Center TCP (DCTCP): TCP Congestion
>> Control for Data Centers
>> >>>>>> Publication Date    : October 2017
>> >>>>>> Author(s)           : S. Bensley, D. Thaler, P. Balasubramanian,
>> L. Eggert, G. Judd
>> >>>>>> Category            : INFORMATIONAL
>> >>>>>> Source              : TCP Maintenance and Minor Extensions
>> >>>>>> Area                : Transport
>> >>>>>> Stream              : IETF
>> >>>>>> Verifying Party     : IESG
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> tcpm mailing list
>> >>>>>>
>> >>>>>> tcpm@ietf.org
>> >>>>>> https://www.ietf.org/mailman/listinfo/tcpm
>> >>>>>
>> >>>>> --
>> >>>>> ________________________________________________________________
>> >>>>> Bob Briscoe
>> >>>>> http://bobbriscoe.net/
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> >
>>
>