Re: [tcpm] draft-ietf-tcpm-rack-05 review

Yuchung Cheng <ycheng@google.com> Wed, 11 September 2019 19:17 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 11 Sep 2019 12:16:47 -0700
Message-ID: <CAK6E8=ctkhc1_1ks2uRjAAj0gErJm+_ZeotiL3fW0hFPD9TT_A@mail.gmail.com>
To: Emmanuel Lochin <emmanuel.lochin@isae-supaero.fr>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Kuhn Nicolas <Nicolas.Kuhn@cnes.fr>, Priyaranjan Jha <priyarjha@google.com>, Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/teuPWZCM12urFXMpE2XFnqVx5Oc>
Subject: Re: [tcpm] draft-ietf-tcpm-rack-05 review

Hi Emmanuel,

Sorry for the late reply, and thanks for testing RACK. Some questions:
1) What is the congestion control? I assume it's the default (i.e. Cubic).

2) What is the value of sysctl net.ipv4.tcp_no_metrics_save? I assume it's
the default 0, meaning TCP saves some metrics to be reused by future flows
towards the same IP.

3) What is the bandwidth of the bottleneck? Or is the BDP generally beyond the
5000 packets in your test?

If my assumptions above are correct, I speculate that the performance gap
is caused by a spurious early exit from TCP Cubic slow start. This happens
because RACK does not catch some of the reordering well initially with its
initial reordering window of RTT/4. Many RACK parameters are tuned based on
Google's traffic, where the nominal RTT is much lower than 500ms.
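
As a rough illustration of why an RTT/4 window struggles here, the sketch below plugs the delays from your description into a deliberately simplified model. The helper names are hypothetical, and two things are assumptions rather than kernel behavior: that the minimum RTT is the one seen by the reordered (faster) packets, and that a packet is flagged as lost as soon as its extra delay exceeds the window. The real implementation also adapts the window over time, which this ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: initial reordering window, taken as min RTT / 4. */
uint32_t rack_initial_reo_wnd_us(uint32_t min_rtt_us)
{
    return min_rtt_us / 4;
}

/*
 * Simplified model (an assumption, not the kernel logic): a packet that
 * arrives extra_delay_us later than a packet sent after it is flagged as
 * lost when that extra delay exceeds the initial reordering window.
 */
bool rack_would_mark_lost(uint32_t min_rtt_us, uint32_t extra_delay_us)
{
    return extra_delay_us > rack_initial_reo_wnd_us(min_rtt_us);
}
```

Under this model, a 500ms nominal RTT with 250ms for reordered packets gives a 62.5ms window against a 250ms gap, so the delayed packets are flagged as lost. With 100ms and 90ms, the 10ms gap fits inside a 22.5ms window and nothing is flagged.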

In contrast, the Linux dupack implementation (referred to as 3DUPACK in your
graph) has two features:
a) It dynamically extends the dupthresh up to a maximum of 300 packets (sysctl
net.ipv4.tcp_max_reordering), which makes TCP very resilient when the
out-of-order delivery is caused only by reordering, not drops.
b) The high dupthresh is cached and reused for new connections toward the
same destination IP, because of (2).
Thus it's possible that all subsequent flows in one test start with the
dupthresh at the highest possible value, making them bullet-proof against any
reordering by never entering fast recovery falsely.
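
The effect of (a) can be sketched as follows. This is a minimal model, not the kernel code: the struct and function names are hypothetical, the reordering distance is assumed to be already measured in packets, and the cap of 300 is the tcp_max_reordering value mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define INITIAL_DUPTHRESH 3u    /* classic three-duplicate-ACK threshold */
#define MAX_REORDERING    300u  /* cap mentioned above (tcp_max_reordering) */

/* Hypothetical per-connection state, for illustration only. */
struct dupack_state {
    uint32_t dupthresh;
};

/* Raise dupthresh to the observed reordering distance, capped; never lower it. */
void dupack_observe_reordering(struct dupack_state *st, uint32_t distance_pkts)
{
    if (distance_pkts > st->dupthresh)
        st->dupthresh = distance_pkts < MAX_REORDERING ? distance_pkts
                                                       : MAX_REORDERING;
}

/* Fast recovery is entered only once the dupack count reaches dupthresh. */
bool dupack_should_enter_recovery(const struct dupack_state *st, uint32_t dupacks)
{
    return dupacks >= st->dupthresh;
}
```

Once the threshold has grown this way, three duplicate ACKs no longer trigger fast recovery. If the grown value is also cached per destination as in (b), a later flow would start from it rather than from 3.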

To check my theory, could you provide some tcpdump traces of each test (RACK vs
non-RACK)? Only header captures are needed.



On Sat, Sep 7, 2019 at 4:03 AM Emmanuel Lochin <
emmanuel.lochin@isae-supaero.fr> wrote:

> Hi Yuchung,
>
> Thanks for your answer. I asked this question because we observed a decrease
> in TCP transfer performance in some reordering scenarios when RACK is
> enabled. In brief, the problem was solved by disabling RACK.
> I am involved in a CNES (French space agency) project where SATCOM systems
> are used (GEO or LEO), and in particular I work with Nicolas Kuhn from CNES
> (copied on this email).
>
> To assess whether or not the problem actually came from RACK, we have
> started a series of experiments using Mininet or simply using the loopback
> interface. The results presented below were obtained with Mininet, except
> the last one, which was obtained on localhost.
>
> The x-axis reports the size of the flows. Basically, we send 5 TCP flows of
> a given size at the same time, and we repeat the experiment a hundred
> times over a Netem queue. We then compute the following statistics over
> these 500 flows for each flow length.
>
> No artificial losses are introduced with Netem; we use it only
> to reorder packets. The nominal RTT is 500ms (equiv. to a GEO system), where
> some packets are reordered every 5, 10, 15 or 20 packets sent. These
> reordered packets then have an RTT=250ms. As you can see, in all cases the
> completion time (time difference between the first packet received at
> destination and the last one) is worse when RACK is enabled.
> The kernel is 4.15.0-60-generic.
>
>
>
> We thought that maybe it was due to the reordering pattern being too
> deterministic. We thus applied several other random patterns.
> Below we present the Netem config documented in the TC manpage for
> simplicity (i.e. reorder 25% 50% gap 5 and 10).
>
>
> But the results obtained were the same.
>
> In all these previous experiments the reorder delay is 50% of the RTT, and
> indeed, the RTT is important.
> If we decrease it to 100ms and the reorder delay corresponds to 90% of
> the RTT (meaning that RTT=100ms and reordered packets get 90ms), then
> RACK performs much better, as shown below.
>
>
>
>
> Using the loopback interface, even with a low RTT (i.e. 100ms), and when the
> reordering delay is half the RTT (50ms), RACK still does not always perform
> better than 3DUPACK.
>
>
> We really believe that RACK is a good idea, but it seems that the kernel
> implementation does not lead to what we expected. Have you experienced
> such behavior, or do you have measurements that contradict mine?
>
> Thanks
>
> Emmanuel
>
>
>
> On 07/09/2019 at 00:32, Yuchung Cheng wrote:
>
> Hi Emmanuel,
>
> Yes, setting the sysctl to 0 will completely disable RACK. By default,
> though, RACK also supports DUPTHRESH-based detection (as a simple extension)
> besides time-based detection. You can disable that extension with 0x4. But if
> you want the old-style, purely DUPTHRESH-based detection, then you can set
> the sysctl to 0.
>
>
>
> On Fri, Sep 6, 2019 at 3:24 AM Emmanuel Lochin <
> emmanuel.lochin@isae-supaero.fr> wrote:
>
>> Hi Yuchung, all,
>>
>> I have a question concerning the GNU/Linux kernel implementation of RACK.
>> There are 3 sysctl options to enable RACK (0x1, 0x2 and 0x4) but none to
>> disable it (or maybe 0x0 is just not documented). As tcp_input.c implements
>> this boolean function:
>>
>> static bool tcp_is_rack(const struct sock *sk)
>> {
>>         return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION;
>> }
>>
>> can I assume that I can disable RACK by typing sysctl -w
>> net.ipv4.tcp_recovery=0?
>> I ask this question because I've seen several updates made inside the
>> kernel to implement it and I want to be sure that I can really go back to
>> the 3DUPACK scheme this way.
>>
>> Cheers,
>>
>> Emmanuel
>>
>> On 05/09/2019 at 04:44, Martin Duke wrote:
>>
>> I think the proposed text is fine.
>>
>> On Wed, Sep 4, 2019 at 5:55 PM Yuchung Cheng <ycheng@google.com> wrote:
>>
>>> On Tue, Sep 3, 2019 at 3:31 PM Martin Duke <martin.h.duke@gmail.com>
>>> wrote:
>>>
>>> <snip>
>>
>>
>>> > (3) sec 7.3
>>> >
>>> > We have evaluated using the smoothed RTT (SRTT from
>>> >    [RFC6298] RTT estimation) or the most recently measured RTT
>>> >    (RACK.rtt) using an experiment similar to that in the Performance
>>> >    Evaluation section.  They do not make any significant difference in
>>> >    terms of total recovery latency.
>>> >
>>> > If there is truly no difference, then why not use SRTT as the standard?
>>> > Every TCP implementation has to store this, while min_rtt is unneeded
>>> > for many (most?) congestion controls.
>>> >
>>> > Alternatively, you could strengthen this paragraph to not sound like
>>> > it makes no difference.
>>> That's a good point -- our experiment on Google servers indeed didn't
>>> show much difference. But I think it's still better to use min_RTT
>>> than SRTT. With buffer-bloat friendly C.C. the SRTT could be orders of
>>> magnitude longer than the actual path's RTT -- in my opinion factoring
>>> this network queuing delay into the reordering window is not a good idea.
>>> So how about adding
>>>
>>> "While the experiment does not show a difference between min RTT and
>>> SRTT, SRTT is less desirable to size the reordering window as it
>>> includes the effects of network congestion or delayed ACKs."
>>>
>>>
>>>
>> _______________________________________________
>> tcpm mailing list
>> tcpm@ietf.org
>> https://www.ietf.org/mailman/listinfo/tcpm
>>
>>
>> --
>> Emmanuel LOCHIN
>> Professeur ISAE
>> ISAE SUPAERO - Institut Supérieur de l'Aéronautique et de l'Espace
>> 10 avenue Edouard Belin - BP 54032 - 31055 TOULOUSE CEDEX 4 FRANCE - http://www.isae-supaero.fr
>> Tel +33 5 61 33 84 85 - Fax (+33) 5 61 33 83 45
>> Web : http://personnel.isae.fr/emmanuel-lochin/
>>
>>