Re: [tcpm] WGLC for draft-ietf-tcpm-rack-08

Yuchung Cheng <> Wed, 15 July 2020 00:36 UTC

From: Yuchung Cheng <>
Date: Tue, 14 Jul 2020 17:35:34 -0700
Message-ID: <>
To: Gorry Fairhurst <>
Cc: tcpm IETF list <>, Michael Tuexen <>

On Mon, Apr 6, 2020 at 8:24 AM Gorry Fairhurst <> wrote:
> I reviewed the latest version of RACK as a part of the TCPM WGLC.
> I do not think this is ready. I think the specification is important,
> but the document has many editorial issues.
> These issues seem to be very important and I do not think the current
> document is ready to proceed without a revision being prepared that can
> be properly reviewed. Overall I am concerned that this draft provides
> little evidence or justification for the proposed change to an existing
> PS. I do not argue against the proposal, but  I do suggest the
> justification and trade-offs involved needs to be clear before the IETF
> should publish such a specification.
> Some of the comments below could be fixed by editorial work, but as
> currently written this makes many bold statements about IETF consensus
> that I would not agree with. Many of these could be easily rephrased too
> focus only on the topics being standardised, or by providing appropriate
> evidence for the claims.
Thanks for your review and comments. Sorry about the late response.
We've revised the document to remove all wording about the frequency or
prevalence of the issues we've observed, such as "prevalent reordering ...".

Instead we added an explicit high-level design section that focuses on the
actual problems this draft aims to fix, without claiming whether these
problems are common, because that varies across networks.

> Specific comments follow - some issues I will highlight in a separate mail.
> Gorry
> Section 2:
> —
> I do not see a reference that offers data for this:
> In recent years we have been observing several increasingly common loss
> and reordering patterns in the Internet.
> - Can you explain who “we” are and where this has been published or
> presented? I’d be quite comfortable if this said “in recent tears there
> is an increasing concern about the implications of loss or reordering” -
> but the present set asserts this is common, so I ask for evidence please?

We rewrote sections 1-3 to clarify the motivation, the design goals,
and the rationale for these changes, plus a section to
illustrate/overview the protocol.
Mostly we removed “we” and other claims based on unpublished data or
anecdotal evidence.

> —
> This is not explained:
> “ Structured request-response traffic turns more losses into tail drops.”
> - I don’t think the losses are turned into tail drops, maybe the losses
> are observed at the tails of transmission?
We removed the confusing term "tail drops". Instead, a new subsection
explains the problems more precisely.

> —
> This is not justified:
> “ Despite TCP stacks (e.g.  Linux) that implement many of the standard
>     and proposed loss detection algorithms
> [RFC4653][RFC5827][RFC5681][RFC6675][RFC7765][FACK][THIN-STREAM], we've
> found that together they do not perform well. “
> - Can you explain who “we” are and where this has been published/presented?
We removed all uses of "we" (it meant the authors). This sentence
is removed entirely in the new revision.

> —
> This sentence doesn’t read well: “They can either detect loss
>     quickly or accurately, but not both, especially when the sender is
>     application-limited or under reordering that is unpredictable. “
> - /They/A sender/ …. /or under reordering that is unpredictable/or when
> the reordering is unpredictable/.
> - What does unpredictable mean in this sentence? Is it about the pattern
> or reordering, the variation in the pattern, the presence or not within
> an RTT or something else?
The pattern of reordering. We put down two specific cases to explain:

> —
> This is not explained or justified:
> “And
>     under these conditions none of them can detect lost retransmissions
>     well.”
> - None of them can doe well? Can you explain this better, saying this is
> not “well” seems a very poor justification for changing a standard
Removed this sentence. Instead we compared their pros & cons in the

> —
> This is confusing:
> “Also, these algorithms, including RFCs, rarely address the
>     interactions with other algorithms.  “
> - Which? and why this this important?

For all the comments above: we agree this section was not clear and that
some statements lacked data citations.
As mentioned earlier, we divided it into a motivation section and a design
section to avoid conflating the two.

> —
> Since section 2 seems mainly set out to provide the reason for the
> change, would it not be more normal to precede the RFC2119 terminology
> declaration?

Sure, we moved the terminology section to the beginning.

> This could also apply to section 3, which introduces the method, but
> does not appear normative.
> ——
> Section 3.
> “The main idea behind RACK is that if a packet has been delivered out
>     of order, then the packets sent chronologically before that were
>     either lost or reordered.”
> - This is put forward as a TCP specification, and yet it describes
> things in network-layer “packets”. If the IP layer were to fragment,
> does this still work on IP packets?  I think this should be in terms of
> transport segments, or the relationship between packets and segments
> explained.

All occurrences of "packets" are replaced with "segments" (where appropriate).

> —
> “Using a threshold for counting duplicate acknowledgments (i.e.,
>     DupThresh) alone is no longer reliable because of today's prevalent
>     reordering patterns. “
> - To me this is quite an annoying statement to make without reference to
> a source of this consensus. Saying the method is robust to reodering
> seems good, saying the current Internet has "prevalent reordering
> patterns" makes me ask where? what evidence? how important? etc. I
> suggest this document should not be making these claims.
> ——
> “A common type of reordering is that the last
>     "runt" packet of a window's worth of packet bursts gets delivered
>     first, then the rest arrive shortly after in order. “
> - Cite data please and explain the cases, or please do not claim this is
> “common”!
> —
> “Today's prevalent lost retransmissions also cause problems with
>     packet-counting approaches “
> - Cite data please, or please do not claim this is “prevalent”.
Re: annoying claims -- explained earlier. We tried to be as thorough as
possible in removing all such adjectives in the new revision.

> ——
> “are often unable to infer and quickly repair losses”
> - “often” is being used here in a way that implies this is frequent? Is
> this strictly necessary to define the algorithm. I think it would be
> enough to explain these events can happen, and that the algorithm can
> detect these cases and appropriately respond. I do not see the need to
> make a statement of how common this is.
> —
> This isn’t clear text, I had to read several times to be sure:
> “On each ACK, RACK marks any already-expired packets lost,
>     and for any packets that have not yet expired it waits until the
>     reordering window passes and then marks those lost as well”
> - I think this could be explained more simply.
Is this better?
"...  For each ACK received, the sender
   calculates the latest RTT measurement (if eligible) and adjusts the
   expiration time of every segment sent but not yet delivered.  If a
   segment has expired, RACK marks it lost."
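
As a rough illustration of that wording (a sketch only, not text or
pseudocode from the draft): the names Segment and on_ack, and the rule that
a segment expires at send time + RTT + reordering window, are assumptions
made for this example. The full algorithm applies further conditions that
are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    xmit_ts: float           # time the segment was last sent, in seconds
    delivered: bool = False
    lost: bool = False


def on_ack(segments, now, rtt, reo_wnd):
    """Mark expired, undelivered segments lost; return their sequence numbers."""
    newly_lost = []
    for seg in segments:
        if seg.delivered or seg.lost:
            continue
        # Assumed expiry rule: one RTT plus the reordering window after send.
        expires_at = seg.xmit_ts + rtt + reo_wnd
        if now >= expires_at:
            seg.lost = True
            newly_lost.append(seg.seq)
    return newly_lost
```

Each ACK refreshes the RTT input, so the expiration time of every
outstanding segment moves with the latest measurement.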

> —
> “hurts” performance. Could be better phrased.
The motivation section tries to be more precise about the latency difference
with and without RACK-TLP.

> —
>   “TCP congestion control implicitly assumes the
>         feedback from ACKs are from the same bottleneck. “
> - I don’t think this is true. There is no assumption. The design was
> based on a single bottleneck at any one time along the path. However
> that does not imply that more than one bottleneck can not be present.
Fair enough. We removed this since it's more or less out of scope of RACK-TLP

> —
> “Therefore it
>         cannot handle well scenarios where packets are traversing largely
>         disjoint paths.”
> - I don’t know what “handle well” means, not do I understand “largely
> disjoint” really means in this context. Can this be explained?
> —
> “Having an excessively large reordering window to
>         accommodate widely different latencies from different paths would
>         increase the latency of loss recovery.”
> - This needs better phrasing. Why does this have to be “excessively”
> large and why does a TCP transport care about “different paths”
> irrespective of being “widely different” - I could not understand what
> was intended at all.

To be clearer about the reordering design of RACK-TLP, we added
an explicit new section.
Please let us know if that looks clearer.

> - —
> “An end-to-end transport protocol cannot tell immediately whether a
>     hole is reordering or loss. “
> - Please explain hole before using.
Removed the colloquial word "hole". New text:

"Upon receiving an ACK indicating an out-of-order data delivery, a
   sender cannot tell immediately whether that out-of-order delivery was
   a result of reordering or loss. "

> —
> “How long the sender waits for such
>     potential reordering events to settle is determined by the current
>     reordering window.”
> - NiT; except one is measured in packets, and “how long” is in units of
> seconds.
> —
> “The initial RACK reordering window SHOULD be set to a small
>         fraction of the round-trip time.”
> - Sounds good, is this after connection setup, or before? How does the
> sender know the RTT?
Nice catch. The sender is assumed to implement RFC6298 and estimate the RTT.
But in case the sender does not know the RTT (e.g. SYN retransmits w/o TS),
we advise keeping reo_wnd at 0, for lack of a better choice, so that it
mimics the old 3-dupack behavior.

"1. If the sender has observed some reordering since the connection
       was established, then the RACK reordering window SHOULD be set to
       a small fraction of the round-trip time, or zero if no round-trip
       time estimate is available."
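
A minimal sketch of case 1 as quoted (illustrative only): the function name
initial_reo_wnd and the default fraction of 0.25 are assumptions for this
example, not values taken from the text above. The case where no reordering
has been observed is not covered by the quoted text, so the sketch returns
None for it.

```python
def initial_reo_wnd(reordering_seen, rtt, fraction=0.25):
    """Reordering window in seconds for case 1 of the quoted text.

    rtt is the round-trip time estimate in seconds, or None if the sender
    has no estimate yet. fraction is a hypothetical "small fraction".
    """
    if not reordering_seen:
        return None          # case not covered by the quoted text
    if rtt is None:
        return 0.0           # no RTT estimate available: window is zero
    return rtt * fraction
```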

> —
> “ To accomplish this RACK places the
>     following mandates on the reordering window:”
> - The discussion ‘mandates’ appears to be different to the discussion of
> requirements. I don’t see the difference explained. I wonder whether the
> document should place all requirements in section 5, but if not, then
> the purpose should be better explained.
That's a good idea. We consolidated all reordering window requirements into

> —
> “   RACK does not need any change on the receiver.”
> - OK. However, there is a requirement that the receiver reports loss
> using SACK.
added that to be clear

> —
> “ RACK.dsack" indicates if a DSACK option .”
> - DSACK not mentioned until here. Should you cite DSACK [RFC3708] ?
cited now in -09

> ---
> “The RECOMMENDED value for  WCDelAckT is 200ms.”
> -Why this value? is this linked to TCP Delayed ACK value?
Yes, it's now named TLP.max_ack_delay to be explicit.
200ms originated from Linux, which we believe (but are not certain)
came from old BSD.

But this value is fairly inadequate in modern networking, as some
recent discussions reveal.
So we changed it to

Third, when FlightSize is one segment, the sender MAY inflate PTO by
   TLP.max_ack_delay to accommodate a potential delayed acknowledgment
   and reduce the risk of spurious retransmissions.  The actual value of
   TLP.max_ack_delay is implementation-specific.

to avoid hard-coding yet another number that'd become obsolete soon
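
To make the new rule concrete, here is a small sketch (illustrative only).
The function name probe_timeout, the base_pto input and the inflate flag are
assumptions for this example; how the base PTO is computed is outside the
quoted text, so it is passed in.

```python
def probe_timeout(base_pto, flight_size_segments, max_ack_delay, inflate=True):
    """Return the PTO in seconds, optionally inflated for a delayed ACK.

    The inflation is a MAY in the quoted text, hence the inflate flag.
    max_ack_delay stands in for the implementation-specific
    TLP.max_ack_delay value.
    """
    if inflate and flight_size_segments == 1:
        # A lone segment may trigger a delayed ACK at the receiver.
        return base_pto + max_ack_delay
    return base_pto
```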

> —
> “We have evaluated using the smoothed RTT”
> …. where is the data?
Not published

> —
> “They do not make any significant difference in terms of total recovery
> latency.”
> - Is this true of **ALL** cases? I strongly dislike such conclusions in
> RFCs. The document can only make conclusions based on available data,
> and claiming this for arbitrary paths is dangerous and unnecessary.
> -
> “While RACK can be a supplemental loss
>     detection mechanism on top of these algorithms, this is not
>     necessary, because RACK implicitly subsumes most of them.”
> - what is not necessary RACK? or one of them? why does it say “most”,
> please be specific.

We removed this. In the abstract we make it clear what RACK is intended to
be (an alternative to the conventional DUPACK-counting approach).

> -
> I think sections 8 and 9 are both informative. Is this the case? I like
> this information, but I would prefer the sections to explain that they
> are informative, if that is the case
I am not sure what to add to the section. "This is an informative section, ..."
> ---
> “"Common causes of RTOs include:"
> - This seems unnecessary, can’t you scope this to the ones that you
> think TLP will address, instead of making this very general statement.
> The current text seems to need a reference, whereas saying this is what
> TLP addresses would not.
Sure, removed.

> —
> “"A sender should schedule a PTO only if all of the following conditions
> are met"
> - what is the actual requirement here? I’m searching to understand what
> is actually needed or recommended.
The requirement is clear to me, but here is our newer text, which hopefully
makes the reasoning clear to others:
"After attempting to send a loss probe, regardless of whether a loss
   probe was sent, the sender MUST re-arm the RTO timer, not the PTO
   timer, if FlightSize is not zero.  This ensures RTO recovery remains
   the last resort if TLP fails."
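
The rule above reduces to a very small decision, sketched here for
illustration only (the function name and the string return values are
assumptions for this example):

```python
def timer_after_probe_attempt(flight_size):
    """Return which timer to arm after attempting to send a loss probe.

    The rule applies whether or not a probe was actually sent, so that
    attempt outcome is deliberately not an input.
    """
    if flight_size > 0:
        return "RTO"         # re-arm the RTO timer, never the PTO timer
    return None              # nothing in flight: the rule does not apply
```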

> ---
> NiTs:
> “eg.” should be “e.g.,”

> “is the easiest way to get a network to go faster.”
> - This needs to be explained better.
This was over-simplified, so we removed it.

> ---
> “Therefore their main
>     constraint on speed is reordering, and there is pressure to relax
>     that constraint.  “
> - I object to this statement within a TCPM document. This is not related
> to the TCPM Charter which is about maintaining transport protocols and
> not changing network-layer forwarding behaviour.
> I similarly think  the following sentence is entirely inappropriate:
> “If RACK becomes widely deployed, the underlying
>     networks may introduce more reordering for higher throughput. “
> - albeit with a lower case “may”, this is still something that tsvwg has
> spent many meetings discussing and one that has significant pushback and
> therefore a need for careful choice of words!
Fine. The "may" is not intended to be the normative "MAY". We *do not*
want to encourage reordering.

Here is our revised text on excessive reordering:
"However, the fact that the initial reordering window is low, and the
   reordering window's adaptive growth is bounded, means that there will
   continue to be a cost to reordering to disincentivize excessive
   network reordering over highly disjoint paths.  For such networks
   there are good alternative solutions, such as MPTCP."