[tcpm] Review of draft-ietf-tcpm-rack-07

" Ilpo Järvinen " <ilpo.jarvinen@cs.helsinki.fi> Fri, 14 February 2020 12:22 UTC

Return-Path: <ilpo.jarvinen@cs.helsinki.fi>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 20DBE120827 for <tcpm@ietfa.amsl.com>; Fri, 14 Feb 2020 04:22:41 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.301
X-Spam-Level:
X-Spam-Status: No, score=-4.301 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=cs.helsinki.fi
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wrhhp_PHCi6X for <tcpm@ietfa.amsl.com>; Fri, 14 Feb 2020 04:22:37 -0800 (PST)
Received: from script.cs.helsinki.fi (script.cs.helsinki.fi [128.214.11.1]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 815CD1200DE for <tcpm@ietf.org>; Fri, 14 Feb 2020 04:22:36 -0800 (PST)
X-DKIM: Courier DKIM Filter v0.50+pk-2017-10-25 mail.cs.helsinki.fi Fri, 14 Feb 2020 14:22:30 +0200
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.helsinki.fi; h=date:from:to:subject:message-id:mime-version:content-type; s= dkim20130528; bh=eWzQpJMdfcIZLM1cSo8nZIGv8TG3aQ6OZzXP0ZF46o0=; b= iY7q4qU0xN5zJGwvLGoBU4T5EoD8sqe+xkK8kymNZY+WfX91G4bm7jl4Epq3Egxk ISqqLg+Gqme3j8WGqtrMaLWnunKg1mDrZrdQAB6BfaGKLLW+qawUsuli2ZuTJIyG Y6V0svlSMRaAvnj0THfqkOLzpc3kf2QIkUetjLRvQqo=
Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) (TLS: TLSv1/SSLv3,256bits,AES256-GCM-SHA384) by mail.cs.helsinki.fi with ESMTPS; Fri, 14 Feb 2020 14:22:30 +0200 id 00000000005A0170.000000005E469106.00002BC5
Date: Fri, 14 Feb 2020 14:22:30 +0200
From: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
X-X-Sender: ijjarvin@whs-18.cs.helsinki.fi
To: tcpm@ietf.org, Yuchung Cheng <ycheng@google.com>, Neal Cardwell <ncardwell@google.com>, nanditad@google.com, priyarjha@google.com
Message-ID: <alpine.DEB.2.20.2002141414420.25645@whs-18.cs.helsinki.fi>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/sDSWzHZL_3S9eVMiM6bMh5zTdv8>
Subject: [tcpm] Review of draft-ietf-tcpm-rack-07
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Feb 2020 12:22:41 -0000

In general, I find the draft quite good shape already, it's quite easy
to read and follow (that is, except the TLP sections that are specifically 
mentioned below).

Here's the list of issues I came across:

Sect 6.1

"RACK.pkts_sacked" lacks the empty line separator.

Sect 7.2

  "This
   heuristic would fail to update RACK stats if the packet is spuriously
   retransmitted because of a recent minimum RTT decrease (e.g. path
   change)."
This defies my logic as RTT decrease should not result in spurious
retransmissions but help to avoid them.


  "Note that
   extensions that require additional TCP features (e.g.  DSACK) would
   work if the feature functions simply return false."
Please rephrase this. I suspect you meant that if some functions relate
to TCP features that are not available, the given code still works
as long as the functions related to such features return false when
the TCP feature is not available but that surely isn't what the
sentence currently says.

Sect 7.2 Step 5

  "For timely loss detection, the sender MAY
   install a "reordering settling" timer set to fire at the earliest
   moment at which it is safe to conclude that some packet is lost.  The
   earliest moment is the time it takes to expire the reordering window
   of the earliest unacked packet in flight."
vs
   RACK_detect_loss():
      ...
          timeout = max(remaining, timeout)
Which way it is, "some" or "all"? The use of max() in RACK_detect_lost()
seems to indicate that the timeout will be set according to the longest
time, that is, the latest moment rather than earliest moments. Which
is the intented behavior here? (I can there might be merits in both
approaches).


In general, I didn't like the presentation order here as this form looks
the most obvious to me:
   now >= Packet.xmit_ts + RACK.reo_wnd + (RACK.ack_ts - RACK.xmit_ts)
...and I find the derivation of it just complicating things rather
than helping but that's perhaps just my opinion.


   "It is also useful when the timestamp clock granularity is close to
    or longer than the actual round trip time."
Given that some things based on timestamps will obviously stop working
if the clock is slower than one tick per RTT I found this fragment a bit
odd. RFC 7323 says "the clock SHOULD tick at least once per window's worth 
of data". I can understand though that a host might find clock tick 
granularity challenging due to extensive RTT range but on the same
I'd not want such a "too slow clock" behavior to be endorsed as if
it would be just normal. That is, I'm not against RACK handling
also this case nicely but it would be nice to somehow downplay the
normalness tone in it with something along the lines of "This check is 
also useful if the recommended one tick per RTT cannot be achieved due
to timestamp clock granularity."

Is RACK_detect_lost() supposed to detect also lost rexmits? How?
The rexmit code itself is not in the draft so it's hard to evaluate
how the Packet booleans are supposed to work in the first place.
I'm not saying the rexmit code should be included but currently it
surely looks like any rexmitted packet just remains .lost = TRUE and
.retransmitted = TRUE state until RTO?


Sect 7.4

"PTO" just appears out of nowhere (and is not even written open
in the first instance).

   "2. ... in step (2) of the algorithm."
"step (2)" is forward-refing, it would naturally refer back to
"Step 2: Update RACK stats" so it makes very little sense as is.

Sect 7.5

Separator missing for PTO: and TLPRxtOut:

Sect 7.5.1

  "3.  Upon receiving a SACK that selectively acknowledges data that was
       last sent before the segment with SEG.SEQ=SND.UNA was last
       (re)transmitted"
This is hard to understand, perhaps rephrase it somehow?


  "But choosing PTO to be
   exactly an SRTT is likely to generate spurious probes given that
   network delay variance and even end-system timings can easily push an
   ACK to be above an SRTT.  We chose PTO to be the next integral
   multiple of SRTT.

   Similarly, network delay variations and end-system processing
   latencies and timer granularities can easily delay ACKs beyond
   2*SRTT, so senders SHOULD add at least 2ms to a computed PTO value
   (and MAY add more if the sending host OS timer granularity is more
   coarse than 1ms)."
Does this later paragraph depend on some assumptions (such as very
low RTT?) as I don't otherwise understand why you say what you say
in the given order. First you said x & y can push ACK over SRTT so
we pick 2* but then you say the same things can push over 2*SRTT
too?

WCDelAckT to terminology?

Sect 7.5.2

TLP_send_probe():
	- Perhaps set TLPRxtOut somewhere?
	- Add "Arm RTO" to the end?

  "In the case where
   there is only one original outstanding segment of data (N=1), the
   same logic (trivially) applies: an ACK for a single outstanding
   segment tells the sender the N-1=0 segments preceding that segment
   were lost.  Furthermore, whether there are N>1 or N=1 outstanding
   segments,"
If this distinction is useful to make at all, maybe just simply say:
"Applies also in case N=1."

Sect 7.6

  "SND.NXT at the time the episode started."
Isn't this TLPHighRxt, why not use it here?

Sect 7.6.2

  "Upon sending a TLP probe that is a retransmission, the sender sets
   TLPRxtOut to true and TLPHighRxt to SND.NXT."
This should appear in TLP_send_probe().

"Detecting recoveries accomplished by loss probes"
Did you want to start another section here?

"Step 1: Track ACKs ..." until what point? (It's told on later but
this question comes into mind when reading the draft in order).


Sect 7.4 - 7.6.2

In general, this particular part of the draft seems to require some
further editing. After reading it all through, I feel most bits are
present but the text is very hard to follow. I think the presentation
order needs significant work. There are random bits here and there
that are explained only later. Also, algorithm lacks some parts
discussed only in the text but variable like items should definitely
be included to the algorithm.


Sect 8.4

  "RACK is compatible with and does not interfere with the the standard
   RTO [RFC6298], RTO-restart [RFC7765], F-RTO [RFC5682] and Eifel
   algorithms [RFC3522]. ...
   It neither changes the RTO timer calculation nor detects spurious
   timeouts."
For F-RTO, also send order is significant.

Sect 8.5

SHOULDs PRR (and also placed to normative references) that is only
experimental and RACK aims to standards track. I feel that PRR is
disjoint enough anyway, which I think was also your intention when
maintaining the decoupling CC and RACK, that it's bit odd to take
this strong stance on PRR (indepedent of the stds track vs
experimental thing).

  "due to congestion window validation."
Does this refer to CWV as in RFC? Or some other mechanism in which
case you shouldn't use the same term here.

I find the described scenario somewhat unconvincing becase
the description ends with:
  "...
   The ending congestion window after recovery also impacts
   subsequent data transfer."
...First there wasn't enough data to fill cwnd of 20 packets and
then suddenly transferring that non-existing data is impacted.

References

QUIC-LR is very old ref and even the filename has changed a few
times from draft-tsvwg-...


Nits

"runt" made the sentence incomprehendable for me, I had to look up
that word (and only second source was descriptive enough that I
think I could understand the meaning, and I'd guessed wrong because
I thought there was some special intent here because of the quotes).

is dicussed -> is discussed
RACT.RTT -> RACK.rtt
First "TSO" -> TCP Segmentation Offload (TSO)
grainularity -> granularity
higher-sequenc -> higher-sequence


-- 
 i.