Re: [tcpm] [EXTERNAL] WGLC for draft-ietf-tcpm-rack-08

Yuchung Cheng <> Wed, 15 July 2020 19:30 UTC

From: Yuchung Cheng <>
Date: Wed, 15 Jul 2020 12:29:22 -0700
Message-ID: <>
To: Praveen Balasubramanian <>
Cc: Michael Tuexen <>, tcpm IETF list <>

On Mon, Apr 6, 2020 at 3:54 PM Praveen Balasubramanian wrote:
> We reviewed the latest version of RACK as a part of the TCPM WGLC and this is some feedback:
> Section 5 Requirements
> >>  For each packet sent, the sender MUST store its most recent transmission time
> It is important to call out that in practice for sends that are batched together in time, an implementation may choose to keep per-send transmission timestamps instead of per-packet. This improves memory efficiency. The way the RFC requirement is worded it seems to exclude such an optimization.

Good idea. We have added
"   2.  For each data segment sent, the sender MUST store its most recent
       transmission time with a timestamp whose granularity is
       finer than 1/4 of the minimum RTT of the connection.  At the time
       of writing, microsecond resolution is suitable for intra-
       datacenter traffic and millisecond granularity or finer is
       suitable for the Internet.  Note that RACK-TLP can be implemented
       with TSO (TCP Segmentation Offload) support by having multiple
       segments in a TSO aggregate share the same timestamp."
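
As an illustration of the requirement quoted above, here is a minimal
Python sketch. The names (TsoAggregate, granularity_ok) and the
microsecond units are assumptions made for this example and are not
taken from the draft. It shows one timestamp stored per batched send
rather than per segment, plus a check that the clock granularity is
finer than 1/4 of the minimum RTT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TsoAggregate:
    """One batched send; every segment in it shares one timestamp."""
    xmit_ts_us: int        # most recent transmission time, in microseconds
    start_seq: int         # sequence number of the first byte in the batch
    seg_lens: List[int]    # payload length of each segment in the batch

    def segment_xmit_ts(self, index: int) -> int:
        """Return the transmission time of segment `index` in this batch."""
        if not 0 <= index < len(self.seg_lens):
            raise IndexError("segment index out of range")
        # All segments of the aggregate report the shared timestamp.
        return self.xmit_ts_us


def granularity_ok(granularity_us: float, min_rtt_us: float) -> bool:
    """True if the clock granularity is finer than 1/4 of the minimum RTT."""
    return granularity_us < min_rtt_us / 4


# A millisecond clock is too coarse for a 100 us datacenter RTT,
# but fine for a 40 ms Internet path.
datacenter_ok = granularity_ok(1000, 100)
internet_ok = granularity_ok(1000, 40_000)
```

Storing the timestamp on the aggregate keeps memory proportional to the
number of sends rather than the number of segments, which is the
optimization raised in the review comment.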

> Section 4 Design Rationale for Reordering Tolerance
> >>If RACK becomes widely deployed, the underlying networks may introduce more reordering for higher throughput.
> A better word here would be universally deployed because otherwise networks would be unfairly harming TCP stacks that have not deployed RACK. Plus as pointed out, reordering puts other strains on the network including ECMP, RSS, hardware offloads etc. Consider removing any recommendations for networks from this draft to prevent any unintentional relaxing of requirements on networks. This section should ideally only focus on why RACK chooses to relax reordering requirements as they pertain to loss detection and recovery.

Yes, the old text was being misinterpreted as saying that RACK
encourages reordering, so we tried our best to make clear that the
opposite is true, toward the end of a new "reordering design" subsection.

> >>temporarily override the reorder window
> Nit: Let's consistently call this reordering window throughout the draft.
Great idea. We have now replaced all the different names with "RACK
reordering window, RACK.reo_wnd" to be specific throughout, instead of
"reordering settle time", "reordering time window", etc.

> Section 7.2 Upon receiving an ACK   Step 5.
> >>For timely loss detection, the sender MAY install a "reordering settling" timer set to fire at the earliest moment at which it is safe to conclude that some packet is lost.
> At least in lab tests this timer seems to make a big difference. Recommend changing this to a SHOULD instead of MAY.

Sounds good: s/MAY/RECOMMENDED/ in step 5, which now reads:

For timely loss detection, the sender is RECOMMENDED to install a
   reordering timer.  This timer expires at the earliest moment when
   RACK would conclude that all the unacknowledged segments within the
   reordering window were lost.
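
The timer described above can be sketched roughly as follows. This is
a simplified illustration, not the draft's pseudocode: the function
and field names are assumptions, times are integer microseconds, and
"sent before" is decided by transmission time alone. Segments whose
deadline has passed are marked lost; the timer is armed for the latest
remaining deadline, so that when it fires every remaining candidate
can be declared lost.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    xmit_ts: int        # most recent transmission time (microseconds)
    lost: bool = False


def rack_detect_loss(unacked: Iterable[Segment], rack_xmit_ts: int,
                     rack_rtt: int, reo_wnd: int, now: int) -> int:
    """Mark overdue segments lost; return the reordering timer value.

    A return value of 0 means no timer is needed.
    """
    timeout = 0
    for seg in unacked:
        # Only segments sent before the most recently delivered segment
        # are candidates for loss.
        if seg.lost or seg.xmit_ts >= rack_xmit_ts:
            continue
        remaining = seg.xmit_ts + rack_rtt + reo_wnd - now
        if remaining <= 0:
            seg.lost = True
        else:
            # Wait long enough to cover every remaining candidate.
            timeout = max(timeout, remaining)
    return timeout


segs = [Segment(0), Segment(10_000), Segment(50_000)]
timer_us = rack_detect_loss(segs, rack_xmit_ts=40_000, rack_rtt=100_000,
                            reo_wnd=25_000, now=130_000)
```

In this example the first segment is already overdue and is marked
lost, the second needs another 5000 microseconds, and the third was
sent after the most recently delivered segment and is not considered.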

> >>This can be implemented by using a seperate doubly-linked list sorted in time order.
> Typo separate. We shouldn't recommend a specific data structure.
Fixed the typo and removed the RECOMMENDED.

> Some questions that need more explanation:
> 1. What about TLP and its relationship to Limited Transmit? Both of them are techniques trying to solve the same problem. Its important to clarify whether Limited Transmit is left unchanged from RFC 6675. It is also unclear whether Limited Transmit should be upgraded from a MAY to a SHOULD. The non-SACK case might benefit from Limited Transmit?
TLP is 100% compatible with Limited Transmit (LT). We now have some text on this:

7.2.  Scheduling a loss probe

   The sender schedules a loss probe timeout (PTO) to transmit a segment
   during the normal transmission process.  The sender SHOULD start or
   restart a loss probe PTO timer after transmitting new data (that was
   not itself a loss probe) or upon receiving an ACK that cumulatively
   acknowledges new data, unless it is already in fast recovery, RTO
   recovery, or the sender has segments delivered out-of-order (i.e.
   RACK.segs_sacked is not zero).  These conditions are excluded because
   they are addressed by similar mechanisms, like Limited Transmit
   [RFC3042], the RACK reordering timer, and F-RTO [RFC5682].
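
The arming conditions in the text above boil down to a small
predicate. The sketch below is only an illustration; the function and
parameter names are assumptions made for the example, apart from
segs_sacked, which mirrors RACK.segs_sacked.

```python
def should_arm_pto(*, sent_new_data: bool, was_loss_probe: bool,
                   acked_new_data: bool, in_fast_recovery: bool,
                   in_rto_recovery: bool, segs_sacked: int) -> bool:
    """Decide whether to start or restart the loss probe (PTO) timer."""
    # Triggers: new data was sent (and it was not itself a probe), or an
    # ACK cumulatively acknowledged new data.
    triggered = (sent_new_data and not was_loss_probe) or acked_new_data
    # Exclusions: cases already handled by other recovery mechanisms.
    excluded = in_fast_recovery or in_rto_recovery or segs_sacked != 0
    return triggered and not excluded


# Ordinary transmission of new data with nothing SACKed: arm the timer.
arm = should_arm_pto(sent_new_data=True, was_loss_probe=False,
                     acked_new_data=False, in_fast_recovery=False,
                     in_rto_recovery=False, segs_sacked=0)
```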

> 2. Why does TLP require SACK? In the case SACK negotiation fails, or SACK blocks are stripped in the network, TLP may still recover faster than an RTO when FlightSize is 1.
TLP requires RACK, which requires SACK. TLP without SACK is possible in
theory but has very limited benefit. It can repair a single loss well,
but consider this case of TLP with neither RACK nor SACK:
    t0: send 10 packets (P1...P10; all lost except P8)
    t1: dupack triggered by P8
    t2: send TLP (retransmitting P10)
    t3: dupack triggered by P10
    --> still an RTO, since only two dupacks are received, fewer than
        the three needed for fast retransmit.

We decided early on that there is no point in complicating TLP further
to support non-SACK connections, given that SACK is widely deployed.
Non-SACK connections can stay unoptimized (their choice).
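
The non-SACK scenario above can be replayed with a toy receiver that
sends only cumulative ACKs. This is a rough sketch: the function name
and the packet numbering are assumptions made for the example, and
DUP_THRESH is the classic three-dupack fast-retransmit threshold from
RFC 5681.

```python
from typing import Iterable

DUP_THRESH = 3  # duplicate ACKs needed to trigger fast retransmit


def count_dupacks(arrivals: Iterable[int], next_expected: int = 1) -> int:
    """Count duplicate ACKs generated by a cumulative-ACK-only receiver.

    Each arriving packet that does not advance the cumulative ACK point
    causes the receiver to repeat its previous ACK.
    """
    received = set()
    dupacks = 0
    for pkt in arrivals:
        received.add(pkt)
        advanced = False
        while next_expected in received:
            next_expected += 1
            advanced = True
        if not advanced:
            dupacks += 1
    return dupacks


# P1..P10 are sent and only P8 arrives; then the TLP retransmits P10.
arrivals = [8, 10]
dupacks = count_dupacks(arrivals)
fast_retransmit = dupacks >= DUP_THRESH
```

With only two dupacks the sender never reaches the threshold, so the
connection still has to wait for the RTO, as described above.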

> -----Original Message-----
> From: tcpm <> On Behalf Of Michael Tuexen
> Sent: Monday, March 30, 2020 3:28 PM
> To: tcpm IETF list <>
> Subject: [EXTERNAL] [tcpm] WGLC for draft-ietf-tcpm-rack-08
> Dear all,
> just a quick reminder:
> We are currently running a WGLC for TCP RACK until Tuesday, April 7th 2020.
> Please send any comments, including indications to support this document, to the TCMP mailing list by then.
> The ID is available at
> Best regards
> Michael
> _______________________________________________
> tcpm mailing list