Re: [core] Review of draft-ietf-core-fasor-01

Markku Kojo <kojo@cs.helsinki.fi> Thu, 16 March 2023 10:54 UTC

Date: Thu, 16 Mar 2023 12:53:37 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Carles Gomez Montenegro <carles.gomez@upc.edu>
cc: core@ietf.org, draft-ietf-core-fasor.authors@ietf.org
In-Reply-To: <5e1904ad67f56adb0001ee0d25488349.squirrel@webmail.entel.upc.edu>
Message-ID: <alpine.DEB.2.21.2303161250360.4394@hp8x-60.cs.helsinki.fi>
References: <5e1904ad67f56adb0001ee0d25488349.squirrel@webmail.entel.upc.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/csB4djMHRQKN-CWVTkoKINzFX94>
Subject: Re: [core] Review of draft-ietf-core-fasor-01

Dear Carles,

thank you very much for your thorough review. It is much appreciated and 
useful, although this reply comes so very late as the draft has been 
dormant for so long for various reasons. Apologies for that.

We believe we have addressed most of the suggestions in -02. In the few 
cases where we possibly did not (fully) address the suggestions we hope 
our answers and clarifications make sense and we may agree on how to 
address those cases (or whether they need to be addressed at all).

Please see inline some comments and answers.

On Sun, 7 Feb 2021, Carles Gomez Montenegro wrote:

> Dear FASOR authors,
>
> First of all, my apologies for providing you with this review later than I
> expected. I hope it is not too late, considering that the cutoff date is
> quickly approaching!
>
> Thank you for the document. Please find below a number of comments and
> suggestions:
>
> - Section 1, first paragraph: "Both default RTO mechanism for CoAP
> [RFC7252] and CoCoA [I-D.ietf-core-cocoa] have issues in dealing with
> unnecessary retransmissions". CoCoA is still work in progress, and there
> are ideas on the table to improve it. Perhaps some different phrasing (for
> the sentence  I highlighted here) might better support future updates of
> CoCoA.

Fair enough. Does it read better now that it has been changed to:

  "Both default RTO mechanism for CoAP [RFC7252] and the latest
   version of CoCoA [I-D.ietf-core-cocoa] have issues in dealing
   with unnecessary retransmissions"

> - Section 1, last paragraph: s/unnecessary retransmissions that a sender
> may have sent due to an inaccurate RTT estimate/unnecessary
> retransmissions due to an inaccurate RTT estimate

Done.

> - The RTO acronym is defined (preceded by its expanded form) twice in the
> document. Also, there are instances of both "RTO" and retransmission
> timeout throughout the document. Once "RTO" has been introduced, just the
> acronym should probably be used.

Good catch.
Hope it's better now.

> - Page 3: "However, there are couple of exceptions when the RTT estimation
> is not available". Well, the second bullet after that sentence corresponds
> rather to "is not updated" than "is not available".

I think the text is correct as it says *RTT estimation*, i.e., the 
process of making the estimate, is not available. However, it is now 
modified to read:
  "... when the RTT estimate is not available or it cannot be updated".

Hope that works better.

> - Page 3: "- At the beginning of a flow where an initial RTO of 2 seconds is
> used." As a reader, I am not sure if this is mentioned as a negative
> aspect of CoCoA. Perhaps not, but it might be good to emphasize that
> something like this is unavoidable (and, as I understand, it is the same
> with FASOR), whereas the second bullet is where a problem may actually
> happen.

True, unnecessary rexmits are unavoidable with an initial RTO if the
actual RTT is high enough. This holds for FASOR as well. But the problem
is not only the inability to avoid unnecessary rexmits and to take a new
RTT sample. With CoCoA, if the RTT is high enough, the unnecessary
rexmits are repeated for all subsequent messages because CoCoA has the
same problem as default CoAP: it does not carry the larger, backed-off
RTO over from the previous message exchange to the next one (which would
break the chain of unnecessary rexmits).

The new text tries to clarify this.
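
A minimal sketch of the chain of unnecessary rexmits described above.
This is a toy model and not code from the draft or from CoCoA: it
assumes a constant RTT, no loss, no RTT estimation, and binary
exponential backoff, and the function name and parameters are
hypothetical. It only contrasts a sender that restarts every exchange
from the same RTO with one that carries the backed-off RTO over.

```python
def spurious_retransmits(rtt, initial_rto, exchanges, carry_over):
    """Count unnecessary retransmissions per exchange in a loss-free toy model.

    carry_over=False: every exchange starts from initial_rto again.
    carry_over=True:  the backed-off RTO of one exchange seeds the next.
    """
    rto = initial_rto
    counts = []
    for _ in range(exchanges):
        if not carry_over:
            rto = initial_rto
        timer = rto          # timer armed with the original transmission
        fire_at = timer      # time at which the running timer expires
        n = 0
        while fire_at < rtt:  # timer expires before the reply can arrive
            n += 1            # unnecessary retransmission
            timer *= 2        # binary exponential backoff
            fire_at += timer
        counts.append(n)
        rto = timer           # backed-off value at the end of the exchange
    return counts


# RTT of 3 s against an initial RTO of 2 s (the value quoted above).
print(spurious_retransmits(3.0, 2.0, 4, carry_over=False))  # [1, 1, 1, 1]
print(spurious_retransmits(3.0, 2.0, 4, carry_over=True))   # [1, 0, 0, 0]
```

In the carry-over case the second exchange completes without a
retransmission, so an unambiguous RTT sample becomes available again.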

> - Page 3, last paragraph: "CoCoA being unable to take RTT samples at all".
> In my opinion, "at all" is probably not the right text here, and the
> phrasing used should be relative to the context of the problem mentioned.

Agreed. This text has now been removed and replaced with the new text 
I mentioned above. Hope it works better.

> - Page 3, last paragraph: similar to my first comment, some adaptation
> might be needed for this paragraph to better support possible future
> updates of CoCoA.

Not sure what this (Page 3, last paragraph) refers to, but now that we
mention the latest CoCoA and cite version -03, is it better (or
sufficient to address this)?

> - Section 4.1, first line: s/an CoAP/a CoAP

Done.

> - Section 4.1, third line: "normal RTO or FastRTO". I was wondering if
> handling two terms for the same concept is really useful. But perhaps it
> may make sense depending on the context in each instance of either term.
> So feel free to proceed as you prefer.

The new text is now using FastRTO in the context of FASOR.

> - Section 4.1, first paragraph: s/backup mechanisms/backup mechanism

Done.

> - Section 4.1, last paragraph: "FastRTO is updated only with unambiguous
> RTT samples.  Therefore, it closely tracks the actual RTT of the network
> and can quickly trigger a retransmission when the network state is not
> dubious."  Perhaps there may be lossy intervals during which the "actual
> RTT" might vary (regardless of losses), e.g. due to a change of the
> physical layer bit rate, but then FASOR would not be able to collect
> "actual RTT" samples during such interval... It might be interesting to
> consider this situation as well. If no modification to the FASOR algorithm
> is deemed necessary, then perhaps at least some discussion on this might
> be useful.

Sure, there may be periods during which no valid samples can be taken and
the FastRTO remains unchanged. If we used ambiguous RTT samples, there
would always be the danger of computing a (badly) incorrect RTO.

The most important thing is to catch up with an RTT increase. We think
that Slow RTO helps there as it makes it much more likely that a new RTT
sample is obtained, if there is no loss. In case the msgs get lost, it
should also be noted that the RFC 6298 algo is notably fast in catching
up with an increased RTT once an RTT sample is finally obtained.

I also think we already have this partly covered in Sec 4.4 and 4.5. Of 
course, without Retransmission Count (or a corresponding token) there is
not much we can do without sacrificing the principle of using
unambiguous RTT samples.

> - End of section 4.1. What is "K"? (I wasn't able to find a definition of
> "K" earlier in the document.)

K is from RFC 6298, like RTTVAR and SRTT, which we also use. It
is now clarified that K = 4 (as in RFC 6298).
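
For reference, a minimal sketch of the RFC 6298 computation in which K
appears: RTO = SRTT + max(G, K * RTTVAR), with alpha = 1/8 and
beta = 1/4. The class name is hypothetical, the clock granularity G
defaults to 0 here, and the RFC's lower bound on RTO is left out for
brevity. This illustrates the RFC 6298 formulas only, not the FASOR
draft's own text.

```python
K = 4          # as in RFC 6298
ALPHA = 1 / 8  # SRTT gain, RFC 6298
BETA = 1 / 4   # RTTVAR gain, RFC 6298


class Rfc6298Estimator:
    """Hypothetical helper: SRTT/RTTVAR smoothing per RFC 6298."""

    def __init__(self, granularity=0.0):
        self.srtt = None
        self.rttvar = None
        self.g = granularity  # clock granularity G (assumed 0 here)

    def update(self, r):
        """Feed one unambiguous RTT sample r (seconds), return the new RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        return self.srtt + max(self.g, K * self.rttvar)


est = Rfc6298Estimator()
for _ in range(8):
    rto = est.update(1.0)      # stable 1 s RTT, RTO settles toward 1 s
rto_after_jump = est.update(3.0)  # a single sample at the new 3 s RTT
print(rto, rto_after_jump)
```

With these constants, one sample at a higher RTT already lifts the RTO
above that RTT, because K * beta = 1 adds the whole difference between
the sample and the old SRTT to the variance term.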

> - The document assumes that responses are "acknowledgments". However,
> there may be responses that do not correspond to acknowledgments at the
> messages sublayer of CoAP, that still provide RTT samples. It may be good
> to clarify this, since otherwise perhaps a fraction of the full set of RTT
> samples may remain unused.

Good catch. You are right. A part of the text was written as if 
considering the case of request - piggybacked response exchanges only, 
whereas it should have considered CON - Ack exchanges. Tried to reflect 
this in the modified text.

> - The document also assumes in several instances that the message that
> triggers some response/ACK/packet from the other endpoint is a "request".
> Please note that there may be messages that do not carry requests which
> may anyway be sent as CON messages, from which an RTT sample can be
> obtained.

Good point.
Removed unnecessary/misleading "requests". Hope it works better now.

> - Section 4.2, first paragraph. A factor of 1.5 (by default) is mentioned
> at the end of the paragraph. Is there any hint on how this factor should
> be set?

We are proposing to use the default. Do you think we should 
recommend it explicitly? We briefly explain the tradeoff; having a 
much smaller factor would maybe not allow enough room for delay increase
while a larger factor would possibly result in unnecessarily conservative
behaviour. The factor of 1.5 worked well in our experiments.

This is intended to be Experimental, so later experimentation may provide
more information on whether the current advice is still valid or whether
the factor should be tuned up or down.
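
A small sketch of the tradeoff, under an assumption that is not stated
in this message: that Slow RTO is the time from the original
transmission of a message to the arrival of its acknowledgment,
multiplied by the factor. The function and parameter names are
hypothetical and the numbers are illustrative only.

```python
SLOW_RTO_FACTOR = 1.5  # default factor discussed above


def slow_rto(original_send_time, ack_time, factor=SLOW_RTO_FACTOR):
    """Assumed definition: factor times the total time the exchange took."""
    return factor * (ack_time - original_send_time)


# An exchange that took 4 s in total, followed by one whose delay is
# 30 % higher (5.2 s).
next_delay = 5.2
print(slow_rto(0.0, 4.0) > next_delay)              # True: enough headroom
print(slow_rto(0.0, 4.0, factor=1.1) > next_delay)  # False: timer fires early
```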

> - Section 4.2, first paragraph, last sentence. Perhaps adding some forward
> reference might help the reader?

Done.

> - Section 4.3.1, FAST_SLOW_FAST: "If the request message needs to be
> retransmitted, continue by using Slow RTO for the first retransmission in
> order to respond to congestion".  This text implies that it is *known*
> that the retransmission is due to congestion, but it could also be due to
> bad link quality.

Sure, but when discussing congestion, it is one of the fundamental 
principles that a packet loss should be considered as a congestion signal 
in the first place. When it is unknown whether the loss is due to
congestion or some non-congestion event, a sender should react as if it
were congestion. However, in the FAST_SLOW_FAST state, FASOR deliberately
allows for a one-shot probe with a quicker FastRTO to recover from a 
possible non-congestion related loss before using (much longer) Slow RTO 
to respond to congestion.
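
A rough sketch of the timer order this implies for FAST_SLOW_FAST:
FastRTO for the original transmission (the quick probe), then Slow RTO
once the first retransmission has been sent. What happens after that is
an assumption made here for illustration: a return to FastRTO with
binary exponential backoff. The function name, parameter names, and
max_retransmit default are hypothetical; the draft remains the
authority on the actual state machine.

```python
def fast_slow_fast_timers(fast_rto, slow_rto, max_retransmit=4):
    """Timer armed after each transmission of one message.

    Index 0 is the timer for the original transmission, index i the
    timer armed after the i-th retransmission.
    """
    timers = [fast_rto, slow_rto]   # quick probe first, then Slow RTO
    backoff = fast_rto              # assumed: back to FastRTO afterwards
    for _ in range(max_retransmit - 1):
        timers.append(backoff)
        backoff *= 2                # assumed binary exponential backoff
    return timers[: max_retransmit + 1]


print(fast_slow_fast_timers(1.0, 6.0))  # [1.0, 6.0, 1.0, 2.0, 4.0]
```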

> - Section 4.3.1, SLOW_FAST: s/transmisssion/transmission

Done.

> - Section 4.3.1: s/if further retransmission are/if further
> retransmissions are

Done.

> - Section 4.3.1, last paragraph: "if RTO expires also for that request
> message". When mentioning "that request message", which one does the text
> refer to? Perhaps, consider rephrasing.

Tried to clarify. Hope it is clearer now.

> I hope this feedback can be useful!

Very helpful, indeed. Thank you!


Cheers,

/Markku

> Cheers,
>
> Carles
>
>