Re: [LOOPS] Transport recovery and interaction with, link, and LOOPS loss recovery

Liyizhou <liyizhou@huawei.com> Tue, 28 July 2020 09:23 UTC

From: Liyizhou <liyizhou@huawei.com>
To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
CC: loops <loops@ietf.org>
Date: Tue, 28 Jul 2020 09:22:55 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/loops/ggTgy5IR-mlx4oQPjrPcdWwiIG0>

Hi Gorry,

Thank you for taking the time to read through it.

Please find my inline replies marked with [yz].

-----Original Message-----
From: LOOPS [mailto:loops-bounces@ietf.org] On Behalf Of Gorry Fairhurst
Sent: Monday, July 27, 2020 11:22 PM
To: loops <loops@ietf.org>
Subject: [LOOPS] Transport recovery and interaction with, link, and LOOPS loss recovery

Thank you for taking the trouble to update draft-li-tsvwg-loops-problem-opportunities, the new text does help in many places, and I'm sorry not to have responded earlier to this new text.  I am still not clear what is anticipated from the transport congestion-control response. Please see comments below:

(1) Moving figure 1 seems helpful to introduce the context. Thanks.

(2) In Section 6: I read section 6 as describing two recovery cases: One is about DupACK-based recovery. The other says it is about RACK recovery, I wonder if this is mainly the case where the dupACK, where I think this is about TLP (which uses RACK) to short-cut the RTO by trying to recover after ~2RTTs?

[yz] Section 6 is intended to explain that, when e2e RACK is in use, e2e retransmission is less likely to compete with local retransmission.

Flows are aggregated into a LOOPS-enabled tunnel, so tail loss in the aggregated traffic is rare. LOOPS loss detection and recovery then usually takes one additional local_SRTT.

PTO = 2*SRTT  [RACK]

An ACK for a locally retransmitted packet takes (SRTT + local_SRTT) to reach the sender.
The recovery will be faster than RACK-TLP by (2*SRTT - local_SRTT), and at the same time the sender's PTO will not expire.
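
As a rough worked example of the timing above, the following Python sketch plugs illustrative numbers into the two expressions. The SRTT and local_SRTT values are assumptions chosen only for illustration, and the function names are hypothetical. The RACK-TLP recovery time is modelled here as the PTO plus one more SRTT for the probe's ACK to return, which is where the (2*SRTT - local_SRTT) difference comes from.

```python
def loops_recovery_time(srtt, local_srtt):
    """Time until the sender sees the ACK of a locally retransmitted packet."""
    return srtt + local_srtt


def rack_tlp_recovery_time(srtt):
    """PTO (2*SRTT) expires, then the probe's ACK needs one more SRTT."""
    pto = 2 * srtt
    return pto + srtt


srtt = 100       # end-to-end SRTT in ms (assumed value)
local_srtt = 10  # SRTT over the LOOPS segment in ms (assumed value)

# How much earlier the sender learns of the recovery with LOOPS.
gain = rack_tlp_recovery_time(srtt) - loops_recovery_time(srtt, local_srtt)

# The sender's PTO stays unexpired as long as the ACK arrives before 2*SRTT.
pto_avoided = loops_recovery_time(srtt, local_srtt) < 2 * srtt
```

Under this simple model the PTO is avoided whenever local_SRTT is smaller than SRTT, which should hold for a segment that is only part of the end-to-end path.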

I think the following two comments (3) & (4) are about RACK-TLP too. 

Basically, LOOPS leverages two properties to recover the tail loss of a single flow faster: the aggregated flows have no "tail loss" in the way an individual flow does, and the delay over the segment is shorter. This would make the sender trigger the RACK-TLP PTO less often.

[/yz]

(3) I think the text suggests these might have different implications on LOOPS, which seems reasonable, although I have questions:

“If LOOPS network does not buffer the out-of-order packets caused
       by packet loss, TCP sender which uses a time based loss detection
       like RACK [I-D.ietf-tcpm-rack] will perform well here.  It uses
       the notion of time to replace the conventional DUPACK threshold
       approach to detect losses.  Hence it prevents the TCP sender from
       invoking fast retransmit too early. “

(4) In Section 4.1:

“When a sender does not receive an ACK for a given segment within
    a certain amount of time called retransmission timeout (RTO), it re-
    sends the segment [RFC6298].  RTO can be as long as several seconds.”
…
“Even when
    the lost packet is not an exact tail, it can possibly add another RTT
    because there may not be enough packets in flight to trigger the fast
    retransmit).”

- I think this text argues simply for TLP?

- How would the method improve performance, if TLP were to be widely deployed, since it's been promoted for TCP and specified for QUIC?


(5) Also, maybe I misunderstand, can’t RACK trigger retransmission earlier than timeout for a TLP - which the start of the ID said was an important use-case?

[yz] Sorry, I did not quite get it. I think TLP is for tail-loss retransmission, while RACK-triggered retransmission is for non-tail loss. So one cannot replace the other? If the lost packet is a tail, only the PTO will fire at the sender?


“Local retransmission will not
       interfere the sender's retransmission generally in this case.  If
       time based loss detection is not supported at the sender, end to
       end retransmission may be invoked as usual.  It consumes extra
       bandwidth Because the lost packets (i.e. recovered packet) is
       normally a very small percentage of the total packets.  Then extra
       bandwidth cost is not significant.”
- I don’t really understand this argument. Is this saying there will be few losses, and hence the additional capacity used by retransmission is small? If that was my correct interpretation, how does LOOPS know there are few losses? Does it measure this and rate-control its retransmissions to ensure this, or maybe some other method is used to avoid there being significant load from two levels of retransmission?
  
[yz]
RACK has an adaptive reordering window algorithm to cope with packet reordering caused by wireless link-layer retransmission or load balancing (Section 2 of draft-ietf-tcpm-rack-09). So it works better with LOOPS than traditional DupACK-based loss detection, because it invokes fewer unnecessary e2e retransmissions, and the competition between retransmissions is minimal.
The text of (3) above tried to elaborate on this.
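
To illustrate why a time-based detector tolerates a short local recovery, here is a minimal Python sketch of a RACK-style check. It is a simplification written for this discussion, not the algorithm text from the draft: the function name and parameters are hypothetical, and the numbers in the usage lines (RTT, reordering window, local_SRTT) are assumed values.

```python
def rack_marks_lost(pkt_sent_at, newest_delivered_sent_at, now, rtt, reo_wnd):
    """Simplified RACK-style test for one unacknowledged packet.

    The packet is declared lost only if some packet sent after it has
    already been delivered AND it has been outstanding for at least
    rtt + reo_wnd. All times are in the same unit (ms here).
    """
    sent_before_delivered = pkt_sent_at < newest_delivered_sent_at
    waited_long_enough = now - pkt_sent_at >= rtt + reo_wnd
    return sent_before_delivered and waited_long_enough


rtt = 100        # end-to-end RTT in ms (assumed value)
reo_wnd = 25     # reordering window in ms (assumed value)
local_srtt = 10  # RTT over the LOOPS segment in ms (assumed value)

# The ACK of the next packet (sent at t=1) arrives at t=101: no loss declared yet.
early = rack_marks_lost(0, 1, 101, rtt, reo_wnd)

# The locally recovered packet is ACKed at about rtt + local_srtt, which is
# inside the reordering window, so the sender never retransmits it.
recovered_in_time = rtt + local_srtt < rtt + reo_wnd
```

In this model the e2e retransmission is avoided whenever local_SRTT stays below the reordering window. A DupACK-threshold detector has no such time allowance.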

LOOPS focuses on short flows and transactional flows. If a loss recovered by LOOPS is a tail loss for an individual flow, the flow's sender would not start to retransmit in most cases, since its RTO (or PTO) does not fire. Hence e2e retransmission does not compete with local retransmission in such a case. This is the majority of LOOPS recovery cases (the earlier TLP draft, draft-dukkipati-tcpm-tcp-loss-probe-01, said that 70% and 46% of retransmissions are RTO-based for different applications).

For the rest, there may be e2e and local retransmission at the same time.
It is expected that there will be some way to keep local retransmissions limited: for instance, making local retransmission non-persistent (i.e. retransmit once only), setting a threshold on the retransmission percentage, or limiting the buffer at the LOOPS ingress to hold one local RTT's worth of packets.
[/yz]    
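
Two of the limits mentioned in the reply above (retransmit once only, and a cap on the retransmission percentage) could be sketched as follows. This is only an illustration under assumptions: the class name, its methods, and the 5% cap are hypothetical and do not come from any LOOPS draft, and the ingress buffer limit is not modelled.

```python
class LocalRetransmitLimiter:
    """Hypothetical guard deciding whether a LOOPS ingress may retransmit."""

    def __init__(self, max_retx_percent):
        self.max_retx_percent = max_retx_percent  # cap, in percent of sent packets
        self.sent = 0                  # packets forwarded over the segment
        self.retransmitted = 0         # local retransmissions performed
        self.already_retx = set()      # sequence numbers retransmitted once

    def on_send(self, seq):
        """Count a first transmission over the segment."""
        self.sent += 1

    def may_retransmit(self, seq):
        """Return True and record the retransmission if it is allowed."""
        if seq in self.already_retx:
            return False  # non-persistent: retransmit once only
        # Integer comparison avoids floating-point rounding at the threshold.
        if (self.retransmitted + 1) * 100 > self.max_retx_percent * self.sent:
            return False  # would exceed the retransmission percentage cap
        self.already_retx.add(seq)
        self.retransmitted += 1
        return True


limiter = LocalRetransmitLimiter(max_retx_percent=5)  # 5% is an assumed value
for seq in range(100):
    limiter.on_send(seq)
first_try = limiter.may_retransmit(7)
second_try = limiter.may_retransmit(7)  # refused: already retransmitted once
```

When the limiter refuses, the loss is simply left to the end-to-end sender, so the load from two levels of retransmission stays bounded.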


(6) Is some sort of check needed anyway perhaps? Because LOOPS could potentially work over a path with L2 retransmissions (e.g. WiFi), and any LOOPS retransmissions multiply the number of L2 retransmissions?
- Can LOOPS be tunnelled over LOOPS in any cases?

[yz] It sounds like LOOPS and L2 retransmission could be complementary. But cross-layer information passing is always a problem.

(7) I now think I might understand the scope of the proposed work is constrained only to cases where ECN has been deployed and is enabled end-to-end. In Section 5.2:


“LOOPS can CE(Congestion Experienced) marks
    its recovered packets as the loss signal to end-to-end. Converting a
    packet loss signal to CE marking signal brings the benefits of
    reducing Head-of-Line blocking and probability of RTO expiry
    [RFC8087] without affecting TCP sender's loss based congestion
    control behaviour while enjoying the faster local recovery. ECN
    based indication is equivalent to a loss event at the TCP sender
    [RFC3168].”
…
“ In this way, a requirement is set for applying LOOPS.”

I still think this would benefit from being clearer at the start of the document, because only ECT (ECN-Capable Transport) flows should be directed to a LOOPS-enabled path segment, as I think it says?

[yz] Yes, that is what the current scope says.
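
The ECT requirement can be made concrete with a small Python sketch of the marking step. The two-bit ECN codepoints are those of RFC 3168; the function name is hypothetical, and whether a LOOPS node would apply this to the inner or the outer header is not decided here.

```python
# ECN codepoints: the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def mark_recovered_packet(tos_byte):
    """Return the TOS byte to use for a locally recovered packet.

    CE is set only when the packet is already ECN-capable. A Not-ECT
    packet is returned unchanged, because CE must not be set on it.
    """
    ecn = tos_byte & 0b11
    if ecn == NOT_ECT:
        return tos_byte
    return (tos_byte & 0xFC) | CE  # keep the DSCP bits, set ECN to CE


marked = mark_recovered_packet(ECT_0)      # ECT(0) becomes CE
untouched = mark_recovered_packet(NOT_ECT) # Not-ECT stays Not-ECT
```

A Not-ECT flow therefore gets no congestion signal from the local recovery, which is why such flows fall outside the scope discussed above.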

Best wishes,

Gorry
