Re: [tcpm] A review for draft-ietf-tcpm-rack-09

Yuchung Cheng <ycheng@google.com> Thu, 20 August 2020 01:18 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 19 Aug 2020 18:17:59 -0700
Message-ID: <CAK6E8=eiJ35b_jnf19+rEvCOuw1YeHCAkGFH2zjy+aQCb7uH6g@mail.gmail.com>
To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Cc: tcpm IETF list <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/mwVescAp9ue7enYI2dvgeCoJorU>

On Wed, Aug 19, 2020 at 12:27 AM Gorry Fairhurst <gorry@erg.abdn.ac.uk>
wrote:

> See below.
> On 18/08/2020 21:28, Yuchung Cheng wrote:
>
>
>
> On Tue, Aug 18, 2020 at 6:42 AM Gorry Fairhurst <gorry@erg.abdn.ac.uk>
> wrote:
>
>> I have read draft-ietf-tcpm-rack-09 in preparation for TCPM to publish
>> this document, and do have some comments. I note that this new rev is
>> much easier to read this time, so thanks for the significant work to
>> produce a clear spec, and I see also that you have already addressed my
>> major concerns - thanks. (I'll separately catch a set of minor editorial
>> notes on this rev).
>>
> Thank you for the review!
>
>
>>
>> (1) I don’t understand this clause:
>> “  3.  The RACK reordering window SHOULD leverage that to adaptively
>> estimate the duration of reordering events, if the receiver uses
>> Duplicate Selective Acknowledgement (DSACK) [RFC2883].”
>> - What does “leverage that” actually mean?
>> - Might it mean something like: can be increased if the sender receives
>> a Duplicate Selective Acknowledgement (DSACK) [RFC2883], that suggests
>> the window is too small, and has resulted in spurious retransmission." …
>> or does it mean something different?
>>
> It means exactly what you said :-)
>
> Here is the latest rev based on Theresa's suggestion:
> "The RACK reordering window SHOULD adaptively increase if the sender
> receives the Duplicate Selective Acknowledgement (DSACK) [RFC2883],
> suggesting the window is too small which causes spurious retransmission."
>
> Let me know if that's better?
>
> WFM.
>
>
>
>
>>
>> (2) I did not understand this part in Section 3.3:
>> “However, the fact that the initial reordering window is low, and the
>>     reordering window's adaptive growth is bounded, means that there will
>>     continue to be a cost to reordering to disincentivize excessive
>>     network reordering over highly disjoint paths.  For such networks
>>     there are good alternative solutions, such as MPTCP.“
>> - Is this intended to read as /reordering that disincentivizes excessive/
>> - If that was intended, the spec appears to be for a single path, MPTCP
>> would still view this path with reordering as a single path, so I do not
>> understand how MPTCP helps unless the awareness of the disjoint paths is
>> somehow known at the endpoints? Please explain.
>>
>> - (You’d need a REF for MPTCP if you keep this sentence and the
>> explanation).
>>
> Yes that's the intention. Will removing the last sentence (about MPTCP)
> help?
>
>
> That would be an easy fix that would WFM.
>
>
>> (3)
>>   Question in Section 7.4.2.
>> /If the TLP
>>     sender does not receive such an indication, then it SHOULD assume
>>     that either the original data segment or the TLP retransmission were
>>     lost, for congestion control purposes./
>> - Why is this not a MUST?
>> - Under what conditions would it be safe to ignore a SHOULD?
>>
> We use SHOULD because the ACK could be lost instead of the data, so if the
> sender has some mechanism to detect ACK losses well, it may be safe not to
> assume data was lost.
>
> But I agree this is over-thinking the corner cases so how about
>
> If the TLP sender does not receive such an indication, then it *MUST*
> assume, for congestion control purposes, that either the original data
> segment, the TLP retransmission*, or their ACKs* were lost.
>
>
> I think the new text reads as safer to me.
>
>
>> (4) in Section 6.2. I was also left with a potential concern about
>> “min_RTT” with respect to “a simple global minimum of all RTT
>> measurements from the connection”.  This method results in a significant
>> path change from low to high RTT exhibiting an invalid RTT. Even a
>> simple windowed min-filtered estimate - mentioned here - would avoid
>> this effect on reordering detection.
>>
>> - I’d prefer adding a short sentence  so people know why the updates to
>> min_RTT might be useful.
>>
> Good point. (Some) windowed filter is definitely preferred. How about:
>
> The sender SHOULD track a windowed min-filtered estimate of recent RTT
> measurements, rather than a simple global minimum of all RTT measurements,
> so that it can adapt when migrating to significantly longer paths.
>
>
> I like that proposed text. I don't think the method has to be
> sophisticated, but some method seems the right thing to do.
>
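
For illustration, here is a minimal Python sketch of one possible windowed
min filter. The class name, the 10-second window, and the millisecond units
are assumptions made for this example; none of them come from the draft.
Unlike a global minimum, the estimate rises once the old low samples age
out, for example after a path change to a longer RTT.

```python
from collections import deque


class WindowedMinRtt:
    """Minimum RTT over the last `window_ms` milliseconds (hypothetical helper)."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        # (timestamp_ms, rtt_ms) pairs; RTT values strictly increase front to back.
        self._samples = deque()

    def update(self, now_ms, rtt_ms):
        # Older samples that are not smaller than the new one can never be
        # the windowed minimum again, so drop them.
        while self._samples and self._samples[-1][1] >= rtt_ms:
            self._samples.pop()
        self._samples.append((now_ms, rtt_ms))
        # Expire samples that fell out of the window. The sample just added
        # is never expired, so the deque stays non-empty.
        while self._samples[0][0] < now_ms - self.window_ms:
            self._samples.popleft()
        return self._samples[0][1]


f = WindowedMinRtt(window_ms=10_000)
f.update(0, 20)         # short path: estimate is 20 ms
f.update(1_000, 50)     # estimate stays at 20 ms
f.update(11_000, 100)   # path changed; the 20 ms sample aged out, estimate is 50 ms
f.update(12_000, 100)   # the 50 ms sample aged out too, estimate is 100 ms
```

A simple global minimum would have stayed at 20 ms for the lifetime of the
connection in this trace.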
>
>> (5) In section 8.1, I expected more discussion of the potential
>> disadvantages (even if the authors seem to suggest these are not
>> significant) :
>>
>> * I still was left without understanding whether there is an impact from
>> a path with a varying RTT (perhaps one that uses a link layer
>> retransmission or access technology). My feeling is that if there is
>> excessive variation the method could go wrong when the RTT increases,
>> but I wonder if this is usually captured by inflating the SRTT and that
>> more excessive variation is not common. I think this case should be
>> discussed briefly.
>>
> AFAIU the scenario you are referring to is when link-layer retransmitted
> packets take much longer than the current SRTT to be delivered.
>
> Yes.
>
> Then the RACK reordering window (and its SRTT bound) requires DSACKs and
> new RTT measurements to grow enough to capture this variation. Depending on
> the RTT variation, it may take multiple round trips to adapt. The idea is
> that for short flows, RACK may not be able to cope with it (and we welcome
> any simple idea to improve that).
>
> Understood.
>
> For long flows, RACK should eventually adapt to these highly varying RTTs.
> Hence we've tested RACK with highly varying degrees of reordering (in time) in
> synthetic benchmarks. In production experiments, we have not found such
> high variation too common tho: the wireless radio does not alter its
> transmission rate too often or change its retry count dynamically. But of
> course, we try to avoid any mention of "empirical case frequency" :-)
>
> If that addresses your question, we can add some words in the reordering
> design rationale about this.
>
> I think a few words would help - just so that if the corner case is
> encountered people understand there was a tradeoff.
>

Sure -- how about adding this right after the reordering window rules in
the 'reordering window adaptation section'

...
2. The RACK reordering window SHOULD adaptively increase if the sender
receives the Duplicate Selective Acknowledgement (DSACK) [RFC2883],
suggesting the window is too small which causes spurious retransmission.

3. The RACK reordering window MUST be bounded and this bound SHOULD be SRTT.

<new> *Rules 2 and 3 combined are required to adapt to reordering caused by
the extended link-layer recovery time described earlier. Then the reordering
window (and its SRTT-bound) requires DSACK and new RTT measurement to
increase. Depending on the RTT variation, it may take multiple round trips
to adapt. *</new>For short flows, the low initial reordering window is key
to recover quickly by risking spurious retransmissions. The rationale is
that spurious retransmissions for short flows are not expected to produce
excessive additional network traffic. For long flows the design tolerates
reordering within a round trip. This handles reordering caused by path
divergence in small time scales (reordering within the round-trip time of
the shortest path).
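
To make the "multiple round trips to adapt" point concrete, here is a
minimal Python sketch of rules 2 and 3. The function name, the base step of
min_RTT/4, and the growth of one step per round trip that sees a DSACK are
assumptions for illustration only; the rules above only require adaptive
increase on DSACK and a bound that SHOULD be SRTT.

```python
def update_reo_wnd(min_rtt_ms, srtt_ms, mult, dsack_seen):
    """Return (reo_wnd_ms, mult) after one round trip (hypothetical helper)."""
    if dsack_seen:
        # Rule 2: a DSACK suggests the window was too small, so grow it.
        mult += 1
    # Rule 3: the window is bounded by SRTT (assumed step: min_RTT / 4).
    reo_wnd_ms = min(min_rtt_ms * mult // 4, srtt_ms)
    return reo_wnd_ms, mult


# Six consecutive round trips that each see a DSACK,
# with min_RTT = 40 ms and SRTT = 50 ms.
mult = 1
history = []
for _ in range(6):
    wnd, mult = update_reo_wnd(40, 50, mult, dsack_seen=True)
    history.append(wnd)
# history is [20, 30, 40, 50, 50, 50]: growth takes several round trips
# and stops at the SRTT bound.
```

Under these assumptions a short flow may finish before the window has grown
enough, which is the tradeoff described above.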


>
>> * As I read it, RACK can also send more retransmissions after there has
>> been loss, where it is making spurious retransmissions - from timer
>> events, or as a result of multiple segments from a single TSO segment
>> being retransmitted. While it seems to be argued these are not
>> significant, it could nonetheless be a disadvantage in some scenarios,
>> and should be captured in section 8.1.
>>
> The first case, spurious RACK timer expiration, is possible. So we can
> append to the end of 8.1:
> "Another disadvantage is that the reordering timer may expire prematurely
> (like any other retransmission timer) and cause more spurious
> retransmissions, especially if DSACK is not supported".
>
> That would be fine for me.
>
> However RACK should not cause extra spurious retransmission with TSO (vs
> non-TSO) as a loss detection algorithm. Perhaps some text in TSO was not
> clear on this?
>
>
>>
>> * Also there seem cases where RACK allows TCP to continue to increase
>> the congestion window upon receiving ACKs after loss, making the sender
>> more aggressive. This is noted in 8.3, but I think should be listed in
>> the disadvantages in 8.1.
>>
> Section 8.3 clearly states RACK can make C.C. more aggressive.
> Disadvantage or not depends on how one looks at it. IMO this is really
> beyond the scope of this doc.
>
>
> :-)
>
> Indeed, how you view this will depend on what is regarded as best. I think
> it's kind of obvious there might be CC implications, maybe there is no need
> to discuss that.
>
>
>> (6) Section 8.4 talks about ACK loss or a delayed ACK without a DSACK,
>> but does not mention Stretch ACKs, which have also been experienced, and
>> need some form of mention!
>>
> There are two kinds of stretched ACKs I know of: (a) ACKs delayed by the
> receiver in the hope of accumulating more. (b) ACKs "compressed" or
> decimated by the receiver or middle-boxes.
>
> It's the former (a) that affects RACK so we use delayed ACK in the general
> delayed stretched ACK form.
>
>
> So, on this I think any stretch ACK caused by ACK delay, loss in the
> network, or intentional dropping has the same effect of extending the
> period it takes for the endpoint to observe the ACK. The typical stretch
> ACK intervals seem to me to be a few packets, so I am not sure typically
> this creates an issue for RACK, but if ACK delay is mentioned I don't
> understand why stretch ACKs from other cases are also not equally mentioned?
>
OK, thanks for the clarification. I propose we change to

Delayed *or stretched ACKs *complicate the detection of repairs done by
TLP, since with such ACKs the sender *takes a longer time to receive fewer
ACKs *than would normally be expected.



>> (7) Why isn't TCP-NCR (RFC4653) at least mentioned and discussed in the
>> intro?
>>
> It is briefly mentioned in
> https://tools.ietf.org/html/draft-ietf-tcpm-rack-09#section-2.2 and
> discussed a bit more in
> https://tools.ietf.org/html/draft-ietf-tcpm-rack-09#section-8.2
>
> The existing discussion is fine (thanks), I did not see the RFC listed in
> the References!
>
Thanks for catching that. Will fix it and double-check for other missing
refs. Probably another bug in our google-doc to rfc xml conversion script...

>
>> Gorry
>>
>>
>> Best wishes,
>
> Gorry
>