Re: [core] Tsvart early review of draft-ietf-core-fasor-01

Markku Kojo <kojo@cs.helsinki.fi> Mon, 20 March 2023 21:35 UTC

Date: Mon, 20 Mar 2023 23:34:25 +0200
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Yoshifumi Nishida <nsd.ietf@gmail.com>
cc: tsv-art@ietf.org, core@ietf.org, draft-ietf-core-fasor.all@ietf.org, jaime@iki.fi, marco.tiloca@ri.se, barryleiba@computer.org, superuser@gmail.com
In-Reply-To: <160846750677.2364.7100486944014136268@ietfa.amsl.com>
Message-ID: <alpine.DEB.2.21.2303121603120.4394@hp8x-60.cs.helsinki.fi>
References: <160846750677.2364.7100486944014136268@ietfa.amsl.com>
MIME-Version: 1.0
Content-Type: text/plain; format="flowed"; charset="US-ASCII"
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/cQk2FXz0mXF2MFA1mvtBj2ZHs2M>
Subject: Re: [core] Tsvart early review of draft-ietf-core-fasor-01
Precedence: list

Hi Yoshi,

thank you very much for your review and useful comments, and sincere 
apologies for the very long delay in replying; for various reasons, no 
cycles were spent on this draft for a long time.

Please see inline.

On Sun, 20 Dec 2020, Yoshifumi Nishida via Datatracker wrote:

> Reviewer: Yoshifumi Nishida
> Review result: On the Right Track
>
> This document has been reviewed as part of the transport area review team's
> ongoing effort to review key IETF documents. These comments were written
> primarily for the transport area directors, but are copied to the document's
> authors and WG to allow them to address any issues raised and also to the IETF
> discussion list for information.
>
> When done at the time of IETF Last Call, the authors should consider this
> review as part of the last-call comments they receive. Please always CC
> tsv-art@ietf.org if you reply to or forward this review.
>
> Summary: This document needs clarifications on some points.
>
> I might miss something, but I have concerns on the proposed logic.
> Because it seems that it doesn't exactly follow exponential backoff requirement
> in rfc8961 and it's possible to become more aggressive than normal backoff
> algorithm in some cases.

It is true that FASOR is deliberately somewhat more aggressive than the 
normal (TCP) backoff algorithm, but it is so in a controlled way, and we 
believe this is also justified as per RFC 8961.

TL;DR:
RFC 8961 allows for deviating from its requirements when some 
specifics related to the protocol at hand are known. Unlike TCP and many 
other protocols, the CoAP protocol is not capacity-seeking; it never has 
more than one msg in flight, so it does not need a full backoff as 
conservative as that of TCP.

The expected CoAP operating environment is likely to have packet drops 
unrelated to congestion, which calls for faster loss detection than 
that of the TCP RTO with full backoff.

The full backoff also tends to result in unfair capacity allocation as 
some of the competing flows are likely to hit the full backoff several 
times in a row due to synchronization effects while other flows may 
simultaneously avoid the backoff.

A longer explanation:

RFC 8961 says:

In Sec 2:

  "The requirements in this document may not be appropriate in all cases.
   In particular, the guidelines in Section 4 are concerned with the
   general case, but specific situations may allow for more flexibility
   in terms of loss detection because specific facets of the environment
   are known ..."
   ...
  "The correct way to view this document is as the default case and
   not as one-size-fits-all guidance that is optimal in all cases."
   ...
  "The requirements in this document may not be appropriate in all cases;
   therefore, deviations and variants may be necessary in the future
   (hence the "SHOULD" in the last bullet). However, inconsistencies MUST
   be (a) explained and (b) gather consensus."

In Sec 3:

  "The requirements in this document are only directly applicable to
   last-resort loss detection. However, we expect that many of the
   requirements can serve as useful guidelines for more aggressive
   non-last-resort timers as well."

So, RFC 8961 leaves room for deviating from its requirements. 
Particularly, CoAP does not use RTO as the last-resort loss detection but
as the primary loss detection mechanism, meaning that the requirements in 
RFC 8961 are not directly applicable (see RFC 8961, Sec 3 above).

Below please find justifications for FASOR that we believe are well in 
line with RFC 8961. Of course, we need to gather consensus and also more 
explanation is likely needed.

The FASOR logic differs from that of TCP and the direct requirements of 
RFC 8961 mainly due to three reasons:

1) (RFC 8961 says: "... but specific situations may allow for more 
flexibility in terms of loss detection because specific facets of the 
environment are known.")

The traffic patterns generated by CoAP are known to be specific; CoAP 
senders never generate capacity-seeking traffic patterns, unlike TCP (and 
many other transports), which are capacity-seeking protocols. CoAP is a 
"request-reply" protocol that never has more than a single message in 
flight (i.e., it runs similarly to TCP with a fixed cwnd = 1 MSS, with 
message sizes often being smaller than the typical MSS with TCP).

This means that individual CoAP flows never contribute to increased load 
per flow because they send at a very limited maximum rate. The problem of 
congestion (and potential congestion collapse) with CoAP senders comes 
into the picture only when the number of co-existing flows increases to a 
high enough level. This is different from TCP, where an individual flow 
seeks more capacity at an exponentially increasing rate in slow start 
after an RTO has expired, meaning that the TCP RTO backoff needs to be 
more conservative than that of the non-capacity-seeking CoAP.

Moreover, the FASOR backoff is actually exponentially increasing. In the 
FAST state the behaviour is the same as that of TCP, i.e., it 
implements binary exponential backoff. If a message gets retransmitted in 
the FAST state, the FASOR sender switches to the FAST_SLOW_FAST state, 
which is a single-shot state (see items 2 and 3 below for the 
justification to use FastRTO in the FAST_SLOW_FAST backoff series). 
If congestion still persists and the message gets retransmitted in the 
FAST_SLOW_FAST state, the sender switches to the SLOW_FAST state and 
stays there until a message is delivered without retransmission. This 
means that when the sender has moved from the FAST_SLOW_FAST state to the 
SLOW_FAST state and the next message also needs to be retransmitted, the 
sender re-enters the SLOW_FAST state. That is, in case of persistent 
congestion the state changes that follow are:

FAST -> FAST_SLOW_FAST -> SLOW_FAST -> SLOW_FAST -> SLOW_FAST ...
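
The state changes above can be sketched in a few lines of Python. This 
is only an illustration, not code from the draft: the names State, 
next_state and trace are hypothetical, and the return to FAST after a 
message is delivered without retransmission is an assumption based on 
the description above.

```python
from enum import Enum


class State(Enum):
    FAST = "FAST"
    FAST_SLOW_FAST = "FAST_SLOW_FAST"
    SLOW_FAST = "SLOW_FAST"


def next_state(state: State, retransmitted: bool) -> State:
    """Return the state to use for the next message exchange.

    retransmitted is True if the exchange that just completed needed
    at least one retransmission.
    """
    if not retransmitted:
        # Assumption: delivery without retransmission brings the
        # sender back to FAST from any state.
        return State.FAST
    if state is State.FAST:
        return State.FAST_SLOW_FAST
    # FAST_SLOW_FAST is single-shot; SLOW_FAST is re-entered for as
    # long as messages keep needing retransmission.
    return State.SLOW_FAST


def trace(outcomes):
    """List the states visited for a sequence of exchange outcomes."""
    state = State.FAST
    visited = [state]
    for retransmitted in outcomes:
        state = next_state(state, retransmitted)
        visited.append(state)
    return visited


# Persistent congestion: every exchange needs a retransmission.
print(" -> ".join(s.name for s in trace([True, True, True, True])))
```

With four consecutive retransmitted exchanges the sketch visits the 
same chain of states as listed above.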

Each time FASOR re-enters the SLOW_FAST state, the Slow RTO is recomputed.
Slow RTO measures the worst-case RTT including all retransmissions from 
the previous message exchange, and it is multiplied by a factor of 
1.5, resulting in an exponential increase of the Slow RTO that is always 
applied first for the next message (Slow RTO from the previous msg is 
always included in the Slow RTO computed for the next msg). This does not 
guarantee full backoff (like in TCP with a factor of 2), but the backoff 
is often even more conservative than that of TCP because it sums up the 
time elapsed in Slow RTO as well as in the other potential FastRTO-based 
retransmissions, and the sum is then multiplied by a factor of 1.5.
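
The growth of Slow RTO under persistent congestion can be illustrated 
with a small sketch. The factor of 1.5 is the one mentioned above; 
everything else is a simplifying assumption for illustration only: the 
function names are hypothetical, and the model supposes that each 
exchange waits out its full Slow RTO and then spends a fixed extra time 
in FastRTO-based retransmissions before the ack arrives.

```python
SLOW_RTO_FACTOR = 1.5  # factor mentioned in the text above


def slow_rto(elapsed: float) -> float:
    """Slow RTO for the next message.

    elapsed is the time the previous exchange took from the original
    transmission to the ack, all retransmissions included.
    """
    return SLOW_RTO_FACTOR * elapsed


def persistent_congestion(first_elapsed: float, extra: float, rounds: int):
    """Slow RTO values over consecutive exchanges that all retransmit.

    Hypothetical model: every exchange waits out its Slow RTO and then
    spends `extra` seconds in FastRTO-based retransmissions.
    """
    values = []
    rto = slow_rto(first_elapsed)
    for _ in range(rounds):
        values.append(rto)
        elapsed = rto + extra  # the previous Slow RTO is always included
        rto = slow_rto(elapsed)
    return values


# 2.0 s for the first exchange, 0.5 s of extra retransmission time each.
print(persistent_congestion(2.0, 0.5, 4))  # [3.0, 5.25, 8.625, 13.6875]
```

In this model each Slow RTO is at least 1.5 times the previous one, 
because the previous Slow RTO is part of the elapsed time that gets 
multiplied.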

This should be more than sufficient to clear congestion, as any 
potential increase in the congestion level can come only from a 
continuous increase in the number of competing senders. And the number 
of senders cannot realistically be expected to increase (exponentially) 
forever.

In particular, the classic congestion collapse requires that the 
unnecessary retransmissions are buffered and finally appear on the 
bottleneck link to eat link capacity. This is efficiently addressed by 
Slow RTO, which is based on a recent measurement of the worst-case RTT, 
i.e., it ensures that all unnecessary retransmissions that this FASOR 
sender injected have left the network. The experiments in [JRCK18a]
show that the current standards track default congestion control for CoAP 
as specified in RFC 7252 is actually prone to congestion collapse in an 
environment with a bufferbloated bottleneck, while the results in 
[JRCK18b] show that FASOR is significantly less aggressive than the 
default CoAP in such an environment and effectively clears the 
congestion.

2) FASOR is designed to provide fast loss detection in case of 
non-congestion related losses that are common in the typical 
operating environments of CoAP (e.g., losses due to bit-corruption on 
wireless links). Therefore, FASOR allows a single FastRTO as the first 
RTO in the FAST_SLOW_FAST state and may continue after a Slow RTO with a 
backoff series based on FastRTO. In lossy environments, this allows for 
more timely loss detection than with full backoff without sacrificing 
safety in case of congestion-related losses. The full backoff tends to 
result in unnecessarily long idle periods, which prolong flow 
completion times.

The increased aggressiveness due to FastRTO-based backoff series is 
possible because CoAP is not capacity seeking and hence the potential 
FastRTO-based backoff series do not increase the overall aggressiveness 
of FASOR too much but provide a balanced tradeoff between 
congestion-related and non-congestion related losses.

3) FASOR backoff logic contributes to better fairness than full 
backoff when there are several competing flows creating a high level of 
congestion. The full backoff is well known to result in unfair 
capacity allocation as flows tend to get synchronized at the bottleneck; 
when a message that made it to the receiver triggers an ack that clocks 
out the next message, it usually means that the next message is also 
successful because the ack clock synchronizes the arrival of messages at 
the bottleneck. On the other hand, the unlucky flows that encounter a loss 
need to wait for an RTO. Hence, the retransmission is no longer 
synchronized and is likely to arrive at a full queue and get dropped, 
resulting in a longer, exponentially backed-off RTO after which the 
next retransmission is again likely to get dropped. Instead, FASOR does 
not directly enter such a chain of backed-off RTOs but may probe for 
success with a similar FastRTO backoff series as any flow that 
experiences its first drop and RTO (or newly started flows that also 
begin with a FastRTO backoff series). This induces better fairness and 
more stable flow completion times.

> In my understanding, in FAST_SLOW_FAST state, the 3rd or later RTOs can be
> shorter than 2nd RTO. Similarly,  in SLOW_FAST state, 2nd or later RTOs can be
> shorter than 1st RTO. For example, when slowrto = 4.0 sec and fastrto is 0.25
> sec, then RTOs in SLOW_FAST state will be updated 4.0, 0.5, 1.0, 2.0 secs,...
> on each retransmission. But, I am not very sure if setting shorter RTO on the
> later retransmissions is a good idea unless we have good knowledge on the
> congestion status in the network. I would like to see some more discussions and
> clarifications on this point in the draft.

I hope the descriptions above provide enough clarification, in 
particular the specific low-load traffic pattern of CoAP, items 2) 
and 3) above, and the effectiveness of Slow RTO in clearing the 
bottleneck queue of any unnecessary retransmissions that this flow 
potentially sent.

Note also what the draft says in Sec 4.2:
     "Slow RTO itself is a form of back off because it includes the
     accumulated time from the RTO back off of the previous
     exchange."

> BTW, if RTOs will be updated something like, 4.0, 1.0, 2.0, 4.0 secs in this
> case, it looks better to me as it'll be conservative than normal backoff
> algorithm. The algorithm would look setting longer RTOs in some special cases
> while following backoff algorithm.

We actually experimented with a variant of FASOR that did exactly this by 
starting with a two times larger FastRTO after a Slow RTO. The results 
didn't show any notable improvement in clearing up congestion but resulted 
in slightly less efficient loss detection and recovery with non-congestion 
related losses. As this is intended to be published as Experimental, it is 
possible to suggest experimenting more with such a variant. However, I 
don't see much of a reason to recommend it without any evidence of 
potential problems with the current approach being too aggressive. Of 
course, if any experiments encounter such problems, it may be 
useful to do it, although I think it might be better to first tune the 
factor of 1.5 slightly larger.
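
The two timer series under discussion can be compared with a short 
sketch. The function and parameter names are hypothetical, and the 
input values are simply the ones from the review comments quoted above 
(Slow RTO of 4.0 sec followed by a doubling series), not measurements.

```python
def rto_series(slow_rto: float, first_fast_rto: float, count: int):
    """Timer values for one message in the SLOW_FAST state.

    The Slow RTO is applied first; the following timers form a binary
    backoff series starting at first_fast_rto (hypothetical parameter:
    the first FastRTO-based timer used after the Slow RTO).
    """
    series = [slow_rto]
    rto = first_fast_rto
    while len(series) < count:
        series.append(rto)
        rto *= 2  # binary exponential backoff
    return series


current = rto_series(4.0, 0.5, 4)  # series from the review example
variant = rto_series(4.0, 1.0, 4)  # variant with a doubled first timer

print(current, sum(current))  # [4.0, 0.5, 1.0, 2.0] 7.5
print(variant, sum(variant))  # [4.0, 1.0, 2.0, 4.0] 11.0
```

With these numbers the variant takes 11.0 sec instead of 7.5 sec before 
the fourth timer expires, which is the extra delay in loss recovery 
that matters when the losses are not related to congestion.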

> Some minor comments:
>
> 4.1 "We call this normal RTO or FastRTO"
>        -> How about using just one term? It seems that both terms are used in
>        the doc. But, it looks a bit confusing.

Tried to improve this. Now using mainly "FastRTO", but some of the 
instances where we used normal RTO were intended to refer to both 
"normal" RFC 6298 RTO and FASOR FastRTO, so it was not that 
straightforward. Hope it is better now in this sense.

> 4.3  It might be good if there's example diagrams here so that readers can
> understand how algorithm work easily
>
>     "First perfom a probe"
>       -> What is 'probe'?

With "probe" we intended to indicate that the first RTO used in the 
FAST_SLOW_FAST state is exceptionally a shorter FastRTO that kinda probes 
the network faster to recover quickly in case it was a wireless loss 
(i.e., not congestion-related loss) as explained in the last para of Sec 
4.3.1. But maybe it is better now that the text has been modified to 
just drop "probe"?

I hope this overly long reply clears things up and helps us 
find a way towards consensus.

Thanks a lot,

/Markku
