Re: [tcpm] Review of draft-ietf-tcpm-rack-07

Yuchung Cheng <ycheng@google.com> Fri, 08 May 2020 23:08 UTC

Return-Path: <ycheng@google.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id EC5713A106F for <tcpm@ietfa.amsl.com>; Fri, 8 May 2020 16:08:37 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -17.6
X-Spam-Level:
X-Spam-Status: No, score=-17.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, ENV_AND_HDR_SPF_MATCH=-0.5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_DEF_DKIM_WL=-7.5, USER_IN_DEF_SPF_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NgWmjJBdzWrb for <tcpm@ietfa.amsl.com>; Fri, 8 May 2020 16:08:33 -0700 (PDT)
Received: from mail-vk1-xa44.google.com (mail-vk1-xa44.google.com [IPv6:2607:f8b0:4864:20::a44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 1E7403A105C for <tcpm@ietf.org>; Fri, 8 May 2020 16:08:33 -0700 (PDT)
Received: by mail-vk1-xa44.google.com with SMTP id i185so866129vki.12 for <tcpm@ietf.org>; Fri, 08 May 2020 16:08:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=daZizMW/hvMktefnFxXDsbHHmMMsRvPd53gytqD2HBk=; b=lyjPUx4710pHRs2NGYDQBPL3b5VU6epspAxu/bJQhSQAuAStWT5JROCimM1v4270QO Ef69Fn7YzSxSBbkIocP/Ox7xljWpOYPumkixjG9skWuhN07bPdjF7+dhGeBGiofKkRXb flx0eFlwyzwE7cRKQVBYdsr3VosCsycnMAJXettvClH26tphdvnwmo2ZZR6eVeT/+mun Xn12FLa3E/obli3bQ2F89Rrp4LSQKMB8gembeDRENCIVwsxODNk/XDMnRr8IWGmCjaxd ltpkJ84VclTiemNfxgpPz78p8l8y8c6VENlddYGsMfBCtDXLsxCOiLbByrBhY6GZ3Cgh nSnA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=daZizMW/hvMktefnFxXDsbHHmMMsRvPd53gytqD2HBk=; b=dOWotiu1G5MNk+ZjQdJb/hSM4tXXphWvtxbL/BiBHPGaA/9HeudhB1xVo1q0uvX6RF s5c5a3AoA2khJVD37jEq0t+qie+0lyHGAPLKBCpEyyCYMg6uxosGkXZ0jkdTBD6LUuaA nJJWzgKmiPuFdNowpLbodXWhEPXYyx6pSz3+/b2GZdU4D4TTie5NkoT23akBK71vtflY buHjyxI5YjmI7weLDvm5vR348qiMGb2EYiNfQto3PVxSMEX1qU8Zaeoftk72SSRoWPVG E6Zl1BxnP7HG5/1gu9KOnuXm/jxEOoXoZUwz7yQ/SUTme8RPxVVMkiO05QR5KHKJnMw6 rF7A==
X-Gm-Message-State: AGi0PuZAj3kYVCuOTfA1+CmcjJr59LKEoO3Y5CXDOYbYS8p2hrMO1Vpk OH87ZKYlaLF966Hnxo1MFofViupio2dUVVe2JYSUQw==
X-Google-Smtp-Source: APiQypLt3DCxX4g1oOqTtneQU7BwSuy7a/QEgx6NiZcD+NXDSlo8SJgn/BCpXQzrdqb8ECno1dK6CsiFgnTMvOsOsIY=
X-Received: by 2002:a1f:1757:: with SMTP id 84mr4224482vkx.68.1588979311678; Fri, 08 May 2020 16:08:31 -0700 (PDT)
MIME-Version: 1.0
References: <alpine.DEB.2.20.2002141414420.25645@whs-18.cs.helsinki.fi> <CAK6E8=dZN5PnCt7QFoyGJUnSoCcg0XUKxyZCbJAM_2tVN7A8wQ@mail.gmail.com> <alpine.DEB.2.20.2004062342530.14114@whs-18.cs.helsinki.fi>
In-Reply-To: <alpine.DEB.2.20.2004062342530.14114@whs-18.cs.helsinki.fi>
From: Yuchung Cheng <ycheng@google.com>
Date: Fri, 08 May 2020 16:07:55 -0700
Message-ID: <CAK6E8=eVjMACZSkUWmY9Z+fiid6cdfG310y4O8G32cEeXFAhFQ@mail.gmail.com>
To: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Neal Cardwell <ncardwell@google.com>, Nandita Dukkipati <nanditad@google.com>, Priyaranjan Jha <priyarjha@google.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/JoaC8pPaFLr3ygeJIj765k1Bj3I>
Subject: Re: [tcpm] Review of draft-ietf-tcpm-rack-07
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 08 May 2020 23:08:38 -0000

On Mon, Apr 6, 2020 at 2:01 PM Ilpo Järvinen
<ilpo.jarvinen@cs.helsinki.fi> wrote:
>
> Hi Yuchung,
>
> Certainly improved on many things, however, I'm surprised that the lost
> rexmit comment was not addressed at all. Aren't the rexmit losses
> supposed to be detected by RACK or have I misunderstood something?
>
> I don't understand how lost rexmits are detected by the algorithms in
> the draft. I think the algorithms are either flawed or lack something
> (mainly RACK_detect_loss is relevant I think). My original comment
> from below.
>
>        Is RACK_detect_loss() supposed to detect also lost rexmits? How?
>        The rexmit code itself is not in the draft so it's hard to evaluate
>        how the Packet booleans are supposed to work in the first place.
>        I'm not saying the rexmit code should be included but currently it
>        surely looks like any rexmitted packet just remains .lost = TRUE
>        and .retransmitted = TRUE state until RTO?

Sorry to miss this point earlier -- RACK definitely detects
lost-retransmits, but the current algorithm does not properly define
some variables. Here is how I propose to address these by adding
these:


Per-packet variables


Theses variables indicate the status of the most recent transmission
of a data packet:
        “Packet.lost” is true if the most recent (re)transmission of
the Packet has been considered lost and needs to be retransmitted.
False otherwise.
        “Packet.retransmitted” is tue if its sequence numbers were
retransmitted in the most recent transmission. False otherwise.
        “Packet.xmit_ts” is the time of the last transmission of a
data packet, including retransmissions, if any. The sender needs to
record the transmission time for each packet sent and not yet
acknowledged. The time MUST be stored at millisecond granularity or
finer.


Transmitting a data packet
Upon transmitting a new packet or retransmitting an old packet, record
the time in Packet.xmit_ts.

--- src

RACK_transmit_data(Packet):

Packet.time_ts = Now()

Packet.lost = FALSE


RACK_retransmit_data(Packet)

Packet.retransmitted = TRUE
      RACK_transmit_data(Packet)

---

When a packet is retransmitted again, its Packet.lost is reset and
transmission timestamp is renewed. Therefore, when its transmission
time falls out of the RACK reordering window in RACK_detect_loss(), it
will be marked as lost (again).




>
> Also my timeout "some" vs "all" internal inconsistency comment relates to
> this same algorithm:
>
>          "For timely loss detection, the sender MAY
>           install a "reordering settling" timer set to fire at the earliest
>           moment at which it is safe to conclude that some packet is lost.  The
>           earliest moment is the time it takes to expire the reordering window
>           of the earliest unacked packet in flight."
>        vs
>           RACK_detect_loss():
>              ...
>                  timeout = max(remaining, timeout)
>        Which way it is, "some" or "all"? The use of max() in RACK_detect_loss()
>        seems to indicate that the timeout will be set according to the longest
>        time, that is, the latest moment rather than earliest moments. Which
>        is the intented behavior here? (I can there might be merits in both
>        approaches).

Comment 1.2: good catch. We changed to: 'For timely loss detection,
the sender is RECOMMENDED to install a "reordering settling" timer.
The timer expires at the earliest moment when it is safe to conclude
that all the unacknowledged packets within the reordering window are
lost.'


>
>
> Then there are some other comments I made which seem also unaddressed but
> I'll not duplicate them all here.
>
>
> --
>  i.
>
>
> On Mon, 9 Mar 2020, Yuchung Cheng wrote:
>
> > Hi Ilpo,
> > Thanks for a detailed review and the great suggestions. Here's all the edits
> > we made.
> > https://www.ietf.org/rfcdiff?url2=draft-ietf-tcpm-rack-08
> >
> >  We defined the missing terms and addressed the nits, removed the less
> > critical parts like recommending PRR, and edited TLP algorithms (sections
> > 7.4 to 7.6). I hope you find it's better now and would be happy to hear any
> > improvements you'd like to make.
> >
> > Yuchung
> >
> > On Fri, Feb 14, 2020 at 4:22 AM Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
> > wrote:
> >       In general, I find the draft quite good shape already, it's
> >       quite easy
> >       to read and follow (that is, except the TLP sections that are
> >       specifically
> >       mentioned below).
> >
> >       Here's the list of issues I came across:
> >
> >       Sect 6.1
> >
> >       "RACK.pkts_sacked" lacks the empty line separator.
> >
> >       Sect 7.2
> >
> >         "This
> >          heuristic would fail to update RACK stats if the packet is
> >       spuriously
> >          retransmitted because of a recent minimum RTT decrease (e.g.
> >       path
> >          change)."
> >       This defies my logic as RTT decrease should not result in
> >       spurious
> >       retransmissions but help to avoid them.
> >
> >
> >         "Note that
> >          extensions that require additional TCP features (e.g.  DSACK)
> >       would
> >          work if the feature functions simply return false."
> >       Please rephrase this. I suspect you meant that if some functions
> >       relate
> >       to TCP features that are not available, the given code still
> >       works
> >       as long as the functions related to such features return false
> >       when
> >       the TCP feature is not available but that surely isn't what the
> >       sentence currently says.
> >
> >       Sect 7.2 Step 5
> >
> >         "For timely loss detection, the sender MAY
> >          install a "reordering settling" timer set to fire at the
> >       earliest
> >          moment at which it is safe to conclude that some packet is
> >       lost.  The
> >          earliest moment is the time it takes to expire the reordering
> >       window
> >          of the earliest unacked packet in flight."
> >       vs
> >          RACK_detect_loss():
> >             ...
> >                 timeout = max(remaining, timeout)
> >       Which way it is, "some" or "all"? The use of max() in
> >       RACK_detect_lost()
> >       seems to indicate that the timeout will be set according to the
> >       longest
> >       time, that is, the latest moment rather than earliest moments.
> >       Which
> >       is the intented behavior here? (I can there might be merits in
> >       both
> >       approaches).
> >
> >
> >       In general, I didn't like the presentation order here as this
> >       form looks
> >       the most obvious to me:
> >          now >= Packet.xmit_ts + RACK.reo_wnd + (RACK.ack_ts -
> >       RACK.xmit_ts)
> >       ...and I find the derivation of it just complicating things
> >       rather
> >       than helping but that's perhaps just my opinion.
> >
> >
> >          "It is also useful when the timestamp clock granularity is
> >       close to
> >           or longer than the actual round trip time."
> >       Given that some things based on timestamps will obviously stop
> >       working
> >       if the clock is slower than one tick per RTT I found this
> >       fragment a bit
> >       odd. RFC 7323 says "the clock SHOULD tick at least once per
> >       window's worth
> >       of data". I can understand though that a host might find clock
> >       tick
> >       granularity challenging due to extensive RTT range but on the
> >       same
> >       I'd not want such a "too slow clock" behavior to be endorsed as
> >       if
> >       it would be just normal. That is, I'm not against RACK handling
> >       also this case nicely but it would be nice to somehow downplay
> >       the
> >       normalness tone in it with something along the lines of "This
> >       check is
> >       also useful if the recommended one tick per RTT cannot be
> >       achieved due
> >       to timestamp clock granularity."
> >
> >       Is RACK_detect_lost() supposed to detect also lost rexmits? How?
> >       The rexmit code itself is not in the draft so it's hard to
> >       evaluate
> >       how the Packet booleans are supposed to work in the first place.
> >       I'm not saying the rexmit code should be included but currently
> >       it
> >       surely looks like any rexmitted packet just remains .lost = TRUE
> >       and
> >       .retransmitted = TRUE state until RTO?
> >
> >
> >       Sect 7.4
> >
> >       "PTO" just appears out of nowhere (and is not even written open
> >       in the first instance).
> >
> >          "2. ... in step (2) of the algorithm."
> >       "step (2)" is forward-refing, it would naturally refer back to
> >       "Step 2: Update RACK stats" so it makes very little sense as is.
> >
> >       Sect 7.5
> >
> >       Separator missing for PTO: and TLPRxtOut:
> >
> >       Sect 7.5.1
> >
> >         "3.  Upon receiving a SACK that selectively acknowledges data
> >       that was
> >              last sent before the segment with SEG.SEQ=SND.UNA was
> >       last
> >              (re)transmitted"
> >       This is hard to understand, perhaps rephrase it somehow?
> >
> >
> >         "But choosing PTO to be
> >          exactly an SRTT is likely to generate spurious probes given
> >       that
> >          network delay variance and even end-system timings can easily
> >       push an
> >          ACK to be above an SRTT.  We chose PTO to be the next
> >       integral
> >          multiple of SRTT.
> >
> >          Similarly, network delay variations and end-system processing
> >          latencies and timer granularities can easily delay ACKs
> >       beyond
> >          2*SRTT, so senders SHOULD add at least 2ms to a computed PTO
> >       value
> >          (and MAY add more if the sending host OS timer granularity is
> >       more
> >          coarse than 1ms)."
> >       Does this later paragraph depend on some assumptions (such as
> >       very
> >       low RTT?) as I don't otherwise understand why you say what you
> >       say
> >       in the given order. First you said x & y can push ACK over SRTT
> >       so
> >       we pick 2* but then you say the same things can push over 2*SRTT
> >       too?
> >
> >       WCDelAckT to terminology?
> >
> >       Sect 7.5.2
> >
> >       TLP_send_probe():
> >               - Perhaps set TLPRxtOut somewhere?
> >               - Add "Arm RTO" to the end?
> >
> >         "In the case where
> >          there is only one original outstanding segment of data (N=1),
> >       the
> >          same logic (trivially) applies: an ACK for a single
> >       outstanding
> >          segment tells the sender the N-1=0 segments preceding that
> >       segment
> >          were lost.  Furthermore, whether there are N>1 or N=1
> >       outstanding
> >          segments,"
> >       If this distinction is useful to make at all, maybe just simply
> >       say:
> >       "Applies also in case N=1."
> >
> >       Sect 7.6
> >
> >         "SND.NXT at the time the episode started."
> >       Isn't this TLPHighRxt, why not use it here?
> >
> >       Sect 7.6.2
> >
> >         "Upon sending a TLP probe that is a retransmission, the sender
> >       sets
> >          TLPRxtOut to true and TLPHighRxt to SND.NXT."
> >       This should appear in TLP_send_probe().
> >
> >       "Detecting recoveries accomplished by loss probes"
> >       Did you want to start another section here?
> >
> >       "Step 1: Track ACKs ..." until what point? (It's told on later
> >       but
> >       this question comes into mind when reading the draft in order).
> >
> >
> >       Sect 7.4 - 7.6.2
> >
> >       In general, this particular part of the draft seems to require
> >       some
> >       further editing. After reading it all through, I feel most bits
> >       are
> >       present but the text is very hard to follow. I think the
> >       presentation
> >       order needs significant work. There are random bits here and
> >       there
> >       that are explained only later. Also, algorithm lacks some parts
> >       discussed only in the text but variable like items should
> >       definitely
> >       be included to the algorithm.
> >
> >
> >       Sect 8.4
> >
> >         "RACK is compatible with and does not interfere with the the
> >       standard
> >          RTO [RFC6298], RTO-restart [RFC7765], F-RTO [RFC5682] and
> >       Eifel
> >          algorithms [RFC3522]. ...
> >          It neither changes the RTO timer calculation nor detects
> >       spurious
> >          timeouts."
> >       For F-RTO, also send order is significant.
> >
> >       Sect 8.5
> >
> >       SHOULDs PRR (and also placed to normative references) that is
> >       only
> >       experimental and RACK aims to standards track. I feel that PRR
> >       is
> >       disjoint enough anyway, which I think was also your intention
> >       when
> >       maintaining the decoupling CC and RACK, that it's bit odd to
> >       take
> >       this strong stance on PRR (indepedent of the stds track vs
> >       experimental thing).
> >
> >         "due to congestion window validation."
> >       Does this refer to CWV as in RFC? Or some other mechanism in
> >       which
> >       case you shouldn't use the same term here.
> >
> >       I find the described scenario somewhat unconvincing becase
> >       the description ends with:
> >         "...
> >          The ending congestion window after recovery also impacts
> >          subsequent data transfer."
> >       ...First there wasn't enough data to fill cwnd of 20 packets and
> >       then suddenly transferring that non-existing data is impacted.
> >
> >       References
> >
> >       QUIC-LR is very old ref and even the filename has changed a few
> >       times from draft-tsvwg-...
> >
> >
> >       Nits
> >
> >       "runt" made the sentence incomprehendable for me, I had to look
> >       up
> >       that word (and only second source was descriptive enough that I
> >       think I could understand the meaning, and I'd guessed wrong
> >       because
> >       I thought there was some special intent here because of the
> >       quotes).
> >
> >       is dicussed -> is discussed
> >       RACT.RTT -> RACK.rtt
> >       First "TSO" -> TCP Segmentation Offload (TSO)
> >       grainularity -> granularity
> >       higher-sequenc -> higher-sequence
> >
> >
> >       --
> >        i.
> >
> >
> >