Re: [tcpm] Review of draft-ietf-tcpm-rack-07

Yuchung Cheng <ycheng@google.com> Mon, 11 May 2020 16:40 UTC

Return-Path: <ycheng@google.com>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 860773A09B8 for <tcpm@ietfa.amsl.com>; Mon, 11 May 2020 09:40:35 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -17.6
X-Spam-Level:
X-Spam-Status: No, score=-17.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, ENV_AND_HDR_SPF_MATCH=-0.5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_DEF_DKIM_WL=-7.5, USER_IN_DEF_SPF_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ViH8rv3TuXDO for <tcpm@ietfa.amsl.com>; Mon, 11 May 2020 09:40:33 -0700 (PDT)
Received: from mail-vk1-xa41.google.com (mail-vk1-xa41.google.com [IPv6:2607:f8b0:4864:20::a41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 9DD203A0998 for <tcpm@ietf.org>; Mon, 11 May 2020 09:40:32 -0700 (PDT)
Received: by mail-vk1-xa41.google.com with SMTP id 134so1507072vky.2 for <tcpm@ietf.org>; Mon, 11 May 2020 09:40:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=thearFoBqJ3h8Flb0Vn6u25BvVfCTvXDqlRdh5JIU44=; b=gp1b7SpBFz9Vfa+fHxUOmczKpzo1P0vrvY75DJcl9RzaSA4CIWijNQt+pQruKj3kg6 vMxIa32bgtcmyaDjHsL0/oQCqMWnnjnowXxbDOesk5rn+MJ9cbPyUWtg+1hTwH5waTkA +io2E2u4stIuIodC0dsFhNWD7LX3kYP82mwaY1ktaxNH0xaErclbTJSdTli14pZzN6yQ Q3aUt9OwoC2FQfilzIjQ4VG8iW6cltY5ST5kbM839RXr7xh/SBIqUbhHmGA5dhnl5TKI AZzalCM4MSfCWUpZfAdY3bHYP9J+O3rNVOirWGnLJ+BMq9UmKrEWFocyc1dTDLWKsE5h Gt5g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=thearFoBqJ3h8Flb0Vn6u25BvVfCTvXDqlRdh5JIU44=; b=niXwnhRMpHy2vfMsSWOsLl2snphSpORTkrV9z+nQoPTiXdY0ov9Mhb871ryEnTWWjZ e0h4pl+QA7kNDO8j68ulqbGM2r1N4UWohaFjiYYUlOX2LIr0PKWcZFNw633iLSr7NJsU Wj0/KKBLBNCT7FKBVZzEAsUMT15BBhzlQxxkI08BhAl10oMHNZaFOy6VQeY8tl0kAMoM EJ/yqaoat7pEO+iiWscHGDBlAqk3xHrWqUsxwLfVwKzZzsmxBOfdS3EJtmNO2I1ibdem Dl9qhRAWDPUTLgVgExjHtU75coFNyRf1rIcLqimIbFqMrOvjh31Gsvnbhjw1ewsVUSYp f34Q==
X-Gm-Message-State: AGi0Pua2WSqdmIokyqnDnuzobeYOTdtS5HltQYr1POIfxeZ2nR3wI2Uz m7M5MTGN76KY7vdR+/6M0L8UiMkvU6tdf0kwSJisde8vA3g=
X-Google-Smtp-Source: APiQypJ8POXWz4WRg9/KOEzVBIBIkn0lklBS7aibvCtEbdweh7p+LDrg++FrXSQAK3sFCptuLInmRFc4B50waUAEQJQ=
X-Received: by 2002:a1f:cd83:: with SMTP id d125mr12367241vkg.35.1589215230786; Mon, 11 May 2020 09:40:30 -0700 (PDT)
MIME-Version: 1.0
References: <alpine.DEB.2.20.2002141414420.25645@whs-18.cs.helsinki.fi> <CAK6E8=dZN5PnCt7QFoyGJUnSoCcg0XUKxyZCbJAM_2tVN7A8wQ@mail.gmail.com> <alpine.DEB.2.20.2004062342530.14114@whs-18.cs.helsinki.fi> <CAK6E8=eVjMACZSkUWmY9Z+fiid6cdfG310y4O8G32cEeXFAhFQ@mail.gmail.com> <alpine.DEB.2.20.2005091114100.17711@whs-18.cs.helsinki.fi>
In-Reply-To: <alpine.DEB.2.20.2005091114100.17711@whs-18.cs.helsinki.fi>
From: Yuchung Cheng <ycheng@google.com>
Date: Mon, 11 May 2020 09:39:54 -0700
Message-ID: <CAK6E8=dAvskPOg6-eNPUNmiUojcs81k+Fvu6cAfVgw8C_bOQbg@mail.gmail.com>
To: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Neal Cardwell <ncardwell@google.com>, Nandita Dukkipati <nanditad@google.com>, Priyaranjan Jha <priyarjha@google.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/YkJ9x-6NdKxkitM0MML72ui2UDc>
Subject: Re: [tcpm] Review of draft-ietf-tcpm-rack-07
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 11 May 2020 16:40:36 -0000

On Sat, May 9, 2020 at 1:30 AM Ilpo Järvinen
<ilpo.jarvinen@cs.helsinki.fi> wrote:
>
> On Fri, 8 May 2020, Yuchung Cheng wrote:
>
> > On Mon, Apr 6, 2020 at 2:01 PM Ilpo Järvinen
> > <ilpo.jarvinen@cs.helsinki.fi> wrote:
> > >
> > > Hi Yuchung,
> > >
> > > Certainly improved on many things, however, I'm surprised that the lost
> > > rexmit comment was not addressed at all. Aren't the rexmit losses
> > > supposed to be detected by RACK or have I misunderstood something?
> > >
> > > I don't understand how lost rexmits are detected by the algorithms in
> > > the draft. I think the algorithms are either flawed or lack something
> > > (mainly RACK_detect_loss is relevant I think). My original comment
> > > from below.
> > >
> > >        Is RACK_detect_loss() supposed to detect also lost rexmits? How?
> > >        The rexmit code itself is not in the draft so it's hard to evaluate
> > >        how the Packet booleans are supposed to work in the first place.
> > >        I'm not saying the rexmit code should be included but currently it
> > >        surely looks like any rexmitted packet just remains .lost = TRUE
> > >        and .retransmitted = TRUE state until RTO?
> >
> > Sorry to miss this point earlier -- RACK definitely detects
> > lost-retransmits, but the current algorithm does not properly define
> > some variables. Here is how I propose to address these by adding
> > these:
>
> Np and I was pleased to note already during the interim presentation
> that you'd addressed this.
>
> > Per-packet variables
> >
> >
> > Theses variables indicate the status of the most recent transmission
> > of a data packet:
> >         “Packet.lost” is true if the most recent (re)transmission of
> > the Packet has been considered lost and needs to be retransmitted.
> > False otherwise.
> >         “Packet.retransmitted” is tue if its sequence numbers were
> > retransmitted in the most recent transmission. False otherwise.
> >         “Packet.xmit_ts” is the time of the last transmission of a
> > data packet, including retransmissions, if any. The sender needs to
> > record the transmission time for each packet sent and not yet
> > acknowledged. The time MUST be stored at millisecond granularity or
> > finer.
> >
> >
> > Transmitting a data packet
> > Upon transmitting a new packet or retransmitting an old packet, record
> > the time in Packet.xmit_ts.
> >
> > --- src
> >
> > RACK_transmit_data(Packet):
> >
> > Packet.time_ts = Now()
> >
> > Packet.lost = FALSE
> >
> >
> > RACK_retransmit_data(Packet)
> >
> > Packet.retransmitted = TRUE
> >       RACK_transmit_data(Packet)
> >
> > ---
> >
> > When a packet is retransmitted again, its Packet.lost is reset and
> > transmission timestamp is renewed. Therefore, when its transmission
> > time falls out of the RACK reordering window in RACK_detect_loss(), it
> > will be marked as lost (again).
> >
> >
> >
> >
> > >
> > > Also my timeout "some" vs "all" internal inconsistency comment relates to
> > > this same algorithm:
> > >
> > >          "For timely loss detection, the sender MAY
> > >           install a "reordering settling" timer set to fire at the earliest
> > >           moment at which it is safe to conclude that some packet is lost.  The
> > >           earliest moment is the time it takes to expire the reordering window
> > >           of the earliest unacked packet in flight."
> > >        vs
> > >           RACK_detect_loss():
> > >              ...
> > >                  timeout = max(remaining, timeout)
> > >        Which way it is, "some" or "all"? The use of max() in RACK_detect_loss()
> > >        seems to indicate that the timeout will be set according to the longest
> > >        time, that is, the latest moment rather than earliest moments. Which
> > >        is the intented behavior here? (I can there might be merits in both
> > >        approaches).
> >
> > Comment 1.2: good catch. We changed to: 'For timely loss detection,
> > the sender is RECOMMENDED to install a "reordering settling" timer.
> > The timer expires at the earliest moment when it is safe to conclude
> > that all the unacknowledged packets within the reordering window are
> > lost.'
>
> I was under impression it's likely what you intented to propose right
> from the start. In that case this change seems not enough to address my
> concern, the algorithm seems to do exactly opposite on
>   timeout = max(remaining,  timeout)
> line, no? That is, if there is a moment further away in the future,
> the line selects it rather than "the earliest moment". This inconsistency
> between the text and algorithm is why I brought this up and I think the
> algorithm requires some small tweak to remove the inconsistency here.

The diff is not the "earliest moment" wording. it was "the earliest
moment of some packet" but it is now "the earliest moment of all
unacked packets"

Here is the old and proposed new text

...; but this
   risks a timeout (RTO) if no more ACKs come back (e.g, due to losses
   or application limit).  For timely loss detection, the sender MAY
   install a "reordering settling" timer set to fire at the earliest
   moment at which it is safe to conclude that some packet is lost.  The
   earliest moment is the time it takes to expire the reordering window
   of the earliest unacked packet in flight.

new text:

... but this risks a timeout (RTO) if no more ACKs come back (e.g, due
to losses or application-limited transmissions). For timely loss
detection, the sender is RECOMMENDED to install a reordering settling
timer. This timer expires at the earliest moment when it is safe to
conclude that all the unacknowledged packets within the reordering
window are lost.

so the keyword all would make the description match the pseudo code.

>
>
> --
>  i.
>
> >
> >
> > >
> > >
> > > Then there are some other comments I made which seem also unaddressed but
> > > I'll not duplicate them all here.
> > >
> > >
> > > --
> > >  i.
> > >
> > >
> > > On Mon, 9 Mar 2020, Yuchung Cheng wrote:
> > >
> > > > Hi Ilpo,
> > > > Thanks for a detailed review and the great suggestions. Here's all the edits
> > > > we made.
> > > > https://www.ietf.org/rfcdiff?url2=draft-ietf-tcpm-rack-08
> > > >
> > > >  We defined the missing terms and addressed the nits, removed the less
> > > > critical parts like recommending PRR, and edited TLP algorithms (sections
> > > > 7.4 to 7.6). I hope you find it's better now and would be happy to hear any
> > > > improvements you'd like to make.
> > > >
> > > > Yuchung
> > > >
> > > > On Fri, Feb 14, 2020 at 4:22 AM Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
> > > > wrote:
> > > >       In general, I find the draft quite good shape already, it's
> > > >       quite easy
> > > >       to read and follow (that is, except the TLP sections that are
> > > >       specifically
> > > >       mentioned below).
> > > >
> > > >       Here's the list of issues I came across:
> > > >
> > > >       Sect 6.1
> > > >
> > > >       "RACK.pkts_sacked" lacks the empty line separator.
> > > >
> > > >       Sect 7.2
> > > >
> > > >         "This
> > > >          heuristic would fail to update RACK stats if the packet is
> > > >       spuriously
> > > >          retransmitted because of a recent minimum RTT decrease (e.g.
> > > >       path
> > > >          change)."
> > > >       This defies my logic as RTT decrease should not result in
> > > >       spurious
> > > >       retransmissions but help to avoid them.
> > > >
> > > >
> > > >         "Note that
> > > >          extensions that require additional TCP features (e.g.  DSACK)
> > > >       would
> > > >          work if the feature functions simply return false."
> > > >       Please rephrase this. I suspect you meant that if some functions
> > > >       relate
> > > >       to TCP features that are not available, the given code still
> > > >       works
> > > >       as long as the functions related to such features return false
> > > >       when
> > > >       the TCP feature is not available but that surely isn't what the
> > > >       sentence currently says.
> > > >
> > > >       Sect 7.2 Step 5
> > > >
> > > >         "For timely loss detection, the sender MAY
> > > >          install a "reordering settling" timer set to fire at the
> > > >       earliest
> > > >          moment at which it is safe to conclude that some packet is
> > > >       lost.  The
> > > >          earliest moment is the time it takes to expire the reordering
> > > >       window
> > > >          of the earliest unacked packet in flight."
> > > >       vs
> > > >          RACK_detect_loss():
> > > >             ...
> > > >                 timeout = max(remaining, timeout)
> > > >       Which way it is, "some" or "all"? The use of max() in
> > > >       RACK_detect_lost()
> > > >       seems to indicate that the timeout will be set according to the
> > > >       longest
> > > >       time, that is, the latest moment rather than earliest moments.
> > > >       Which
> > > >       is the intented behavior here? (I can there might be merits in
> > > >       both
> > > >       approaches).
> > > >
> > > >
> > > >       In general, I didn't like the presentation order here as this
> > > >       form looks
> > > >       the most obvious to me:
> > > >          now >= Packet.xmit_ts + RACK.reo_wnd + (RACK.ack_ts -
> > > >       RACK.xmit_ts)
> > > >       ...and I find the derivation of it just complicating things
> > > >       rather
> > > >       than helping but that's perhaps just my opinion.
> > > >
> > > >
> > > >          "It is also useful when the timestamp clock granularity is
> > > >       close to
> > > >           or longer than the actual round trip time."
> > > >       Given that some things based on timestamps will obviously stop
> > > >       working
> > > >       if the clock is slower than one tick per RTT I found this
> > > >       fragment a bit
> > > >       odd. RFC 7323 says "the clock SHOULD tick at least once per
> > > >       window's worth
> > > >       of data". I can understand though that a host might find clock
> > > >       tick
> > > >       granularity challenging due to extensive RTT range but on the
> > > >       same
> > > >       I'd not want such a "too slow clock" behavior to be endorsed as
> > > >       if
> > > >       it would be just normal. That is, I'm not against RACK handling
> > > >       also this case nicely but it would be nice to somehow downplay
> > > >       the
> > > >       normalness tone in it with something along the lines of "This
> > > >       check is
> > > >       also useful if the recommended one tick per RTT cannot be
> > > >       achieved due
> > > >       to timestamp clock granularity."
> > > >
> > > >       Is RACK_detect_lost() supposed to detect also lost rexmits? How?
> > > >       The rexmit code itself is not in the draft so it's hard to
> > > >       evaluate
> > > >       how the Packet booleans are supposed to work in the first place.
> > > >       I'm not saying the rexmit code should be included but currently
> > > >       it
> > > >       surely looks like any rexmitted packet just remains .lost = TRUE
> > > >       and
> > > >       .retransmitted = TRUE state until RTO?
> > > >
> > > >
> > > >       Sect 7.4
> > > >
> > > >       "PTO" just appears out of nowhere (and is not even written open
> > > >       in the first instance).
> > > >
> > > >          "2. ... in step (2) of the algorithm."
> > > >       "step (2)" is forward-refing, it would naturally refer back to
> > > >       "Step 2: Update RACK stats" so it makes very little sense as is.
> > > >
> > > >       Sect 7.5
> > > >
> > > >       Separator missing for PTO: and TLPRxtOut:
> > > >
> > > >       Sect 7.5.1
> > > >
> > > >         "3.  Upon receiving a SACK that selectively acknowledges data
> > > >       that was
> > > >              last sent before the segment with SEG.SEQ=SND.UNA was
> > > >       last
> > > >              (re)transmitted"
> > > >       This is hard to understand, perhaps rephrase it somehow?
> > > >
> > > >
> > > >         "But choosing PTO to be
> > > >          exactly an SRTT is likely to generate spurious probes given
> > > >       that
> > > >          network delay variance and even end-system timings can easily
> > > >       push an
> > > >          ACK to be above an SRTT.  We chose PTO to be the next
> > > >       integral
> > > >          multiple of SRTT.
> > > >
> > > >          Similarly, network delay variations and end-system processing
> > > >          latencies and timer granularities can easily delay ACKs
> > > >       beyond
> > > >          2*SRTT, so senders SHOULD add at least 2ms to a computed PTO
> > > >       value
> > > >          (and MAY add more if the sending host OS timer granularity is
> > > >       more
> > > >          coarse than 1ms)."
> > > >       Does this later paragraph depend on some assumptions (such as
> > > >       very
> > > >       low RTT?) as I don't otherwise understand why you say what you
> > > >       say
> > > >       in the given order. First you said x & y can push ACK over SRTT
> > > >       so
> > > >       we pick 2* but then you say the same things can push over 2*SRTT
> > > >       too?
> > > >
> > > >       WCDelAckT to terminology?
> > > >
> > > >       Sect 7.5.2
> > > >
> > > >       TLP_send_probe():
> > > >               - Perhaps set TLPRxtOut somewhere?
> > > >               - Add "Arm RTO" to the end?
> > > >
> > > >         "In the case where
> > > >          there is only one original outstanding segment of data (N=1),
> > > >       the
> > > >          same logic (trivially) applies: an ACK for a single
> > > >       outstanding
> > > >          segment tells the sender the N-1=0 segments preceding that
> > > >       segment
> > > >          were lost.  Furthermore, whether there are N>1 or N=1
> > > >       outstanding
> > > >          segments,"
> > > >       If this distinction is useful to make at all, maybe just simply
> > > >       say:
> > > >       "Applies also in case N=1."
> > > >
> > > >       Sect 7.6
> > > >
> > > >         "SND.NXT at the time the episode started."
> > > >       Isn't this TLPHighRxt, why not use it here?
> > > >
> > > >       Sect 7.6.2
> > > >
> > > >         "Upon sending a TLP probe that is a retransmission, the sender
> > > >       sets
> > > >          TLPRxtOut to true and TLPHighRxt to SND.NXT."
> > > >       This should appear in TLP_send_probe().
> > > >
> > > >       "Detecting recoveries accomplished by loss probes"
> > > >       Did you want to start another section here?
> > > >
> > > >       "Step 1: Track ACKs ..." until what point? (It's told on later
> > > >       but
> > > >       this question comes into mind when reading the draft in order).
> > > >
> > > >
> > > >       Sect 7.4 - 7.6.2
> > > >
> > > >       In general, this particular part of the draft seems to require
> > > >       some
> > > >       further editing. After reading it all through, I feel most bits
> > > >       are
> > > >       present but the text is very hard to follow. I think the
> > > >       presentation
> > > >       order needs significant work. There are random bits here and
> > > >       there
> > > >       that are explained only later. Also, algorithm lacks some parts
> > > >       discussed only in the text but variable like items should
> > > >       definitely
> > > >       be included to the algorithm.
> > > >
> > > >
> > > >       Sect 8.4
> > > >
> > > >         "RACK is compatible with and does not interfere with the the
> > > >       standard
> > > >          RTO [RFC6298], RTO-restart [RFC7765], F-RTO [RFC5682] and
> > > >       Eifel
> > > >          algorithms [RFC3522]. ...
> > > >          It neither changes the RTO timer calculation nor detects
> > > >       spurious
> > > >          timeouts."
> > > >       For F-RTO, also send order is significant.
> > > >
> > > >       Sect 8.5
> > > >
> > > >       SHOULDs PRR (and also placed to normative references) that is
> > > >       only
> > > >       experimental and RACK aims to standards track. I feel that PRR
> > > >       is
> > > >       disjoint enough anyway, which I think was also your intention
> > > >       when
> > > >       maintaining the decoupling CC and RACK, that it's bit odd to
> > > >       take
> > > >       this strong stance on PRR (indepedent of the stds track vs
> > > >       experimental thing).
> > > >
> > > >         "due to congestion window validation."
> > > >       Does this refer to CWV as in RFC? Or some other mechanism in
> > > >       which
> > > >       case you shouldn't use the same term here.
> > > >
> > > >       I find the described scenario somewhat unconvincing becase
> > > >       the description ends with:
> > > >         "...
> > > >          The ending congestion window after recovery also impacts
> > > >          subsequent data transfer."
> > > >       ...First there wasn't enough data to fill cwnd of 20 packets and
> > > >       then suddenly transferring that non-existing data is impacted.
> > > >
> > > >       References
> > > >
> > > >       QUIC-LR is very old ref and even the filename has changed a few
> > > >       times from draft-tsvwg-...
> > > >
> > > >
> > > >       Nits
> > > >
> > > >       "runt" made the sentence incomprehendable for me, I had to look
> > > >       up
> > > >       that word (and only second source was descriptive enough that I
> > > >       think I could understand the meaning, and I'd guessed wrong
> > > >       because
> > > >       I thought there was some special intent here because of the
> > > >       quotes).
> > > >
> > > >       is dicussed -> is discussed
> > > >       RACT.RTT -> RACK.rtt
> > > >       First "TSO" -> TCP Segmentation Offload (TSO)
> > > >       grainularity -> granularity
> > > >       higher-sequenc -> higher-sequence
> > > >
> > > >
> > > >       --
> > > >        i.
> > > >
> > > >
> > > >
> >