Re: [tcpm] Review of draft-ietf-tcpm-alternativebackoff-ecn-02

Naeem Khademi <naeemk@ifi.uio.no> Wed, 15 November 2017 11:54 UTC

From: Naeem Khademi <naeemk@ifi.uio.no>
To: "Bless, Roland (TM)" <roland.bless@kit.edu>
CC: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Michael Welzl <michawe@ifi.uio.no>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, grenville armitage <garmitage@swin.edu.au>
Thread-Topic: Review of draft-ietf-tcpm-alternativebackoff-ecn-02
Date: Wed, 15 Nov 2017 11:53:57 +0000
Message-ID: <7447FBC9-6B81-4A97-AB45-C57555B30559@ifi.uio.no>
References: <bd5142c3-6ea9-f703-4a57-78ccb3679574@kit.edu>
In-Reply-To: <bd5142c3-6ea9-f703-4a57-78ccb3679574@kit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/OaZzOlcOUgsQGYVSMWVdOUuRzNA>
Subject: Re: [tcpm] Review of draft-ietf-tcpm-alternativebackoff-ecn-02

Hi Roland

Thanks a lot for the comments. Almost all of your comments are now addressed in -03 (submitted) and -04 (un-submitted but attached to the email I sent in response to L. Stewart’s comments). However, here are per-item responses. Please see inline:


On Oct 23, 2017, at 6:12 PM, Bless, Roland (TM) <roland.bless@kit.edu> wrote:

Hi,

as promised at the last IETF meeting, here is my (lengthy)
review of draft-ietf-tcpm-alternativebackoff-ecn-02.

In summary:
1) more precise terminology, e.g., better distinction between buffer and
queue,
  also use same terminology as in RFC 5681 (largely done in -02 already).

2) state more clearly that the concrete recommendations are tailored to
  specific congestion controls and be more open to other congestion
control variants

3) the abstract is too long

and section 4 contains a bit of redundancy w.r.t. sections 2 and 3.

I have provided answers to the above items in the “in more detail” part below:


in more detail
1) I would prefer to use _buffer_ for the maximum memory space that is
  allocated for potentially enqueued packets and _queue_ for the
  amount of actually queued packets, i.e., current buffer
  occupancy. Therefore, IMHO it makes sense to speak of shallow
  buffers and short queues, but not of "shallow queues". In
  particular, an AQM tries to keep the (longer-term) queue short in a
  buffer while accepting transient bursts -- the behavior therefore
  also differs from a "shallow buffer".

Agreed in general; changed all instances of “shallow buffer” to “short queue” both in the abstract and the main body,
except for “shallow AQM marking threshold” which remains the same as it refers to the “shallow threshold”.


2) the main point of this draft is: it makes sense to respond differently to
  CE-marked packets than to packet loss. One benefit is to
  achieve higher utilization by adjusting the backoff to be less. The
  recommendation for two backoff factors is specific for two
  congestion controls, CUBIC and New Reno. For other congestion
  controls it may also make sense to adapt differently, but the draft
  doesn't provide any recommendations for them. In general, there can
  be lots of different congestion controls that do not need this kind
  of modification to keep the utilization high.

This seems to have been already captured in Section 4.3 (below):

   beta_{ecn} depends on how the response of a TCP connection to shallow
   AQM marking thresholds is optimised. beta_{loss} reflects the
   preferred response of each congestion control algorithm when faced
   with exhaustion of buffers (of unknown depth) signalled by packet
   loss.  Consequently, for any given TCP congestion control algorithm
   the choice of beta_{ecn} is likely to be algorithm-specific, rather
   than a constant multiple of the algorithm's existing beta_{loss}.

So I’m not sure if we need to add anything more beyond what’s discussed in here without risking being redundant. If you think otherwise, please
suggest text (that differs from above) and a suitable (sub-)section.

I have also added this text to Section 4.3:

The recommended beta_{ecn} value in this document is only applicable for Standard TCP congestion control.


3) The abstract should be corrected according to 1 and shortened,
  such as: Recent Active Queue Management (AQM) mechanisms allow for
  burst tolerance while enforcing short queues to minimise the time
  that packets spend enqueued at a bottleneck. This can cause
  noticeable performance degradation for TCP connections traversing
  such a bottleneck, especially if they are only a few or their
  bandwidth-delay-product is large.  An Explicit Congestion
  Notification (ECN) signal indicates that an AQM mechanism is used
  at the bottleneck, and therefore the bottleneck network queue is
  likely to be short.  This document therefore proposes an update to
  the TCP sender-side ECN reaction in congestion avoidance to reduce
  the congestion window by a smaller amount than the congestion
  control algorithm's reaction to loss.

Simply used the above text suggestion, while keeping the abbreviation definition of cwnd.


------------------
Walk through:

Section 2.
==========

 Research has demonstrated the benefits of reducing network delays due
 to excessive buffering [BUFFERBLOAT]; this has led to the creation of
 new AQM mechanisms like PIE [RFC8033] and CoDel [CODEL2012]
 [I-D.CoDel], which avoid causing the bloated queues that are common
 with a simple tail-drop behaviour (also known as a First-In First-
 Out, FIFO, queue).

The first sentence is confusingly put: "reducing network delays due to
excessive buffering", better rephrase.

See below.

Moreover, I'd like to see a
more precise description of the problem here: The main problem here is
that existing loss-based congestion controls completely fill the available
bottleneck buffer capacity. So it's primarily _not_ the tail-drop
behavior causing bloated queues, but the congestion control.

Changed to:  Research has demonstrated the benefits of reducing network delays
   that are caused by interaction of loss-based TCP congestion control
   and excessive buffering [BUFFERBLOAT].


There
exist two approaches to reduce the queues: use a different
congestion control (modify end points) or enforce short queues in
routers by using AQMs (modify intermediate systems). So a delay-based
congestion control can use a tail-drop FIFO queue and still avoid
excessive queuing delays, i.e., not even requiring an AQM to control
the queue.

We (authors) think that it’s best to leave the discussion of delay-based CCs outside of this document. Although we initially talked about this in -03, we have now removed the text that mentions them (in the -04 draft). We would like to avoid detouring into specific mention of "i.e., delay-based" approaches and the dismissive "The... suffers from... out of scope...". There is a wide literature on techniques that are based on delay. However, mentioning delay-based CC distracts from our I-D. Delay-based algorithms aren't out of scope for our document because of coexistence problems (as we initially wrote in -03); they are out of scope because, by definition, they are completely irrelevant to our proposal.


These AQM mechanisms instantiate short queues that are designed to
tolerate packet bursts.

More precisely:
These AQM mechanisms aim to keep a sustained queue short while
tolerating transient (short-term) packet bursts.

Fixed.


However, congestion control mechanisms
cannot always utilise a bottleneck link well where there are short
queues.

=> However, currently used loss-based congestion control mechanisms

Fixed.


to compensate for TCP halving the "cwnd" and "ssthresh" variables in
response to a lost packet [RFC5681].

see 1), cwnd is set to FlightSize/2, not cwnd/2 (RFC5681 is quite
specific about this).

This language (using “halving”) is common throughout RFC3168 (perhaps wrongly). We have therefore changed “halving” to “reducing”. Since the text already cites RFC5681, it’s clear how the reduction is done. Saying “halving the FlightSize” would have been wrong, as TCP doesn’t change the FlightSize variable (it is measured/calculated), so this seems to be an easy way out.

Fixed and reads as:

   For example, a TCP sender must be able to store at least an
   end-to-end bandwidth-delay product (BDP) worth of data at the
   bottleneck buffer if it is to maintain full path utilisation in the
   face of loss-induced reduction of cwnd [RFC5681], which effectively
   doubles both the amount of data that can be in flight and the maximum
   round-trip time (RTT) experienced using the network path.


This requires the bottleneck
queue to be able to store at least an end-to-end bandwidth-delay

queue => buffer

Done!


product (BDP) of data, which effectively doubles both the amount of
data that can be in flight and the round-trip time (RTT) experience
using the network path.

it effectively doubles the RTT only if the buffer is completely
filled, usually the queue is varying over time.

Added “maximum” to the “round-trip time (RTT)”.
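To make the buffer-sizing argument above concrete, here is a minimal Python sketch under idealised assumptions: a single long-lived flow, a fixed-rate bottleneck, a congestion signal (loss or mark) arriving when the queue reaches a given occupancy, and one multiplicative decrease by beta. The function names are hypothetical and only illustrate the arithmetic; they are not taken from the draft.

```python
def min_queue_for_full_utilisation(bdp: float, beta: float) -> float:
    """Smallest queue (same units as bdp) at which a congestion signal can
    arrive while a single idealised flow still keeps the bottleneck busy
    after one multiplicative decrease by beta.

    Peak window = bdp + queue; after backoff the window is
    beta * (bdp + queue), which must stay >= bdp.
    Solving for queue gives queue >= bdp * (1 - beta) / beta.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return bdp * (1.0 - beta) / beta


def max_rtt(base_rtt: float, queue: float, bdp: float) -> float:
    """RTT when `queue` units are enqueued: the base RTT plus the time to
    drain the queue at the bottleneck rate (bdp / base_rtt)."""
    return base_rtt * (1.0 + queue / bdp)


# Halving (beta = 0.5) needs a full BDP of queue, which doubles the
# maximum RTT; a gentler beta of 0.8 needs only a quarter of a BDP.
print(min_queue_for_full_utilisation(100.0, 0.5))            # 100.0
print(round(min_queue_for_full_utilisation(100.0, 0.8), 2))  # 25.0
print(max_rtt(0.1, 100.0, 100.0))                            # 0.2
```

The doubling of the RTT only happens at the peak, when the queue is full; with a varying queue the RTT sits somewhere between the base RTT and this maximum.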


ABE improves the
performance when routers use shallow buffered AQM mechanisms.

See 1), e.g., "when routers use AQM controlled buffers that allow
  for short queues only."

Fixed.


Section 3.
==========
This specification describes an update to the congestion control
algorithm of an ECN-capable TCP transport protocol.

See 2.) This statement is very generic, whereas the recommendation is
quite specific to CUBIC and NewReno. It may be useful for other congestion
controls as well if they also require a more moderate response/backoff
in order to keep the utilization high. Their backoff modification may
however, be different. Moreover, there exist other congestion controls
that don't suffer from underutilization if they react to a congestion
signal.

Actually, the recommendation is purely for NewReno, which also stands as the IETF-standard “TCP congestion control”. We mention that we have also tested CUBIC and provide a value at which CUBIC works well, but the RECOMMENDATION given in the I-D (i.e., beta_{ecn}=0.8) only concerns Standard TCP.


It RECOMMENDS that a TCP
sender multiplies the cwnd by 0.8 and reduces the slow start
threshold (ssthresh) in congestion avoidance following reception of a
TCP segment that sets the ECN-Echo flag (defined in [RFC3168]).

See previous comment: here you should be explicit about the particular
congestion controls where the recommended behavior and parameter can
be applied to.

See above.

Moreover, "cwnd= max (FlightSize * beta_{ecn}, 2 * SMSS)",
which is a bit different from cwnd= cwnd  * beta_{ecn} (this is what
the text suggests).

Now reads as:

   It RECOMMENDS that a TCP sender sets the slow start threshold (ssthresh) to 0.8 times
   the FlightSize (with its minimum value set to 2 * SMSS) and reduces
   the cwnd in congestion avoidance following reception of a TCP segment
   that sets the ECN-Echo flag (defined in [RFC3168]).
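For illustration, the ssthresh update Roland wrote out, max(FlightSize * beta_{ecn}, 2 * SMSS), can be sketched in a few lines of Python. This is a hypothetical helper, not code from any TCP stack: it assumes byte-based FlightSize and SMSS, truncates the product to whole bytes, and does not model how the sender then reduces cwnd.

```python
BETA_ECN = 0.8  # beta_{ecn} recommended in the draft for Standard TCP


def abe_ssthresh_on_ece(flight_size: int, smss: int,
                        beta_ecn: float = BETA_ECN) -> int:
    """New ssthresh (bytes) after an ACK with ECN-Echo set, in congestion
    avoidance: max(FlightSize * beta_ecn, 2 * SMSS)."""
    # Truncate to whole bytes, then enforce the 2 * SMSS floor.
    return max(int(flight_size * beta_ecn), 2 * smss)


# Hypothetical values: 100 kB in flight, SMSS of 1448 bytes.
print(abe_ssthresh_on_ece(100_000, 1448))  # 80000
print(abe_ssthresh_on_ece(2_000, 1448))    # 2896 (the 2 * SMSS floor)
```

Note that the multiplier applies to FlightSize, not to the current cwnd, which is the distinction raised in the review.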


Section 4.
==========
 performance gains in lightly-multiplexed scenarios, without losing

"lightly-multiplexed scenarios" means presumably that only a few flows
traverse the considered bottleneck, but how many are "a few" then?
three or nine or twenty?
Later on it is defined as
"lightly-multiplexed case (few concurrent connections)",
better mention this at first use already.

Done.


loss is detected (regarded as a notification of congestion), Standard
TCP halves the cwnd and ssthresh [RFC5681], which causes the TCP
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See 1), roughly speaking yes, but not quite correct. Moreover, packet
loss can be detected by various ways, e.g., by using the retransmission
timer
or not; the congestion control response may differ then, too.

Complying with the description text of RFC5681, it now reads as:

   When packet loss is inferred using the retransmission timer and the given packet
   has not yet been resent by way of the retransmission timer (regarded
   as a notification of congestion), Standard TCP sets the ssthresh to
   the maximum of half of the FlightSize and 2*SMSS [RFC5681], which
   causes the TCP congestion control to go back to allowing only a BDP
   of packets in flight -- just sufficient to maintain 100% utilisation
   of the bottleneck on the network path.
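The timer-based loss response referenced above (equation (4) of RFC 5681, i.e., beta_{loss} = 0.5 for Standard TCP) can be sketched the same way. This is a hypothetical helper under stated assumptions: it returns the largest value RFC 5681 permits ("no more than" equation (4)), holds ssthresh constant when the segment was already resent by the timer, and leaves out the loss-window change to cwnd on timeout.

```python
def ssthresh_on_rto(flight_size: int, smss: int, ssthresh: int,
                    already_resent_by_timer: bool) -> int:
    """ssthresh (bytes) after a retransmission-timer expiry, per RFC 5681.

    Equation (4), max(FlightSize / 2, 2 * SMSS), applies only the first
    time a given segment is resent by the timer; on later timeouts for
    the same segment, ssthresh is held constant.
    """
    if already_resent_by_timer:
        return ssthresh
    return max(flight_size // 2, 2 * smss)


# Hypothetical values: 20 kB in flight, SMSS of 1448 bytes.
print(ssthresh_on_rto(20_000, 1448, 65_535, False))  # 10000
print(ssthresh_on_rto(20_000, 1448, 65_535, True))   # 65535 (held constant)
```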


delay target in routers and use congestion notifications to constrain
the queuing delays experienced by packets, rather than in response to

if AQMs set CE, they hope for an appropriate reaction; however,
they'll limit the queueing delays by actively dropping packets.

it uses the term "congestion notifications”, which can be loss (implicit congestion notification) or explicit (ECN). If the sender is unresponsive, then dropping packets is a “protection mechanism”, but the underlying assumption is the whatever traffic is traversing the path is congestion controlled at end-points.


that were not necessarily configured to emulate a
shallow queue

see 1), short queue vs. shallow buffer

Changed to "emulate a bottleneck with a short queue"


However, it interacts badly for a lightly-multiplexed
case (few concurrent connections) over a path with a large BDP.
Conventional TCP backoff in such cases leads to gaps in packet
transmission and under-utilisation of the path.

Maybe combine these two sentences:

However, in a lightly-multiplexed case (few concurrent connections)
over a path with a large BDP, conventional TCP backoff leads to
gaps in packet transmission and under-utilisation of the path.

Done.


hence the CE-mark likely came from a bottleneck with a shallow queue.

controlled and short queue

Changed to "controlled short queue".


Reacting differently to an ECN CE-mark than to packet loss can then
yield the benefit of a reduced back-off, as with CUBIC [I-D.CUBIC],
when queues are short, yet it can avoid generating excessive delay
when queues are long.

I'm not sure that I understood the gist in this statement, better
rephrase and split up into two sentences?

Now reads as:

   Reacting differently to an ECN-signalled
   congestion than to an inferred packet loss can then yield the benefit
   of a reduced back-off when queues are short.  Using ECN can also be
   advantageous for several other reasons [RFC8087].


For non-ECN-enabled
TCP connections,

Not fully clear what this means. Are the end-systems ECN capable, but
the routers in between do not mark? Or does it mean that at least one
end-system isn't ECN capable?

An ECN-enabled *connection* is one in which both end points have successfully negotiated ECN. A non-ECN-enabled connection is the opposite of that.


   ssthresh_(t+1) = max (FlightSize_t * beta_{loss}, 2 * SMSS)

RFC 5681 doesn't use any notation with "t". If you are using t, you
should specify what it means. My suggestion is to avoid introducing
it and to use the same terminology as in RFC 5681.

Done.


I think that the beginning of section 4.3 belongs more to section 3
while that rest fits to the section title (discussion of the ABE
multiplier).

Which sentence do you exactly prefer to be moved to Section 3?


Section 5.
===========

5.  Status of the Update

I don't understand the purpose of this section or the section
title is weird at least.
Is it meant to describe required changes? Then use this as section
title.

It’s the "status of the update to the congestion control" that is being proposed in this document. It addresses the “Requirement for the update to the congestion control”, but for the sake of brevity I have now changed this to “ABE requirements”.


congestion-control algorithms, it does not require any change to the

everywhere else it is "congestion control" without dash.

Fixed.


 The currently published ECN specification requires that the
 congestion control response to a CE-marked packet is the same as the
 response to a dropped packet [RFC3168].  The specification is
 currently being updated to allow for specifications that do not
 follow this rule [I-D.ECN-exp].  The present specification defines
 such an experiment and has thus been assigned an Experimental status
 before being proposed as a Standards-Track update.

This is largely a repetition from the introduction.

Unless repetition is bad, it is okay. Introductions can often be treated as places that summarise key messages contained in the body of a document, so repetition is to be expected.


Because this advantage applies only to ECN-marked packets and not to
loss indications, the new method cannot lead to congestion collapse.

I'm not sure that I can follow here. There are several forms of congestion
collapse and the classical one causes unnecessary retransmissions by a
timer mismatch. Maybe you can elaborate a bit more here.

It now reads as:

   Because this advantage applies only to ECN-marked packets and not to
   packet loss indications, in the worst-case (e.g., an ABE-compliant
   TCP sender using beta_{ecn} = 1.0) the ECN-capable bottleneck will
   still fall back to dropping packets, and the result is no different
   than if the TCP sender was using traditional loss-based congestion
   control.

Section 8.
==========

http://heim.ifi.uio.no/naeemk/research/ABE/ This code was used to

Full stop missing here (presumably to avoid problems with the URL).

Fixed.

Maybe put the (most important) changes into an appendix? I'm not sure
how long this URL will be valid after the RFC has been published.

We are not yet at WGLC, but will fix this in later revisions.


Regards,
Roland


Cheers,
Naeem
