Re: [tcpPrague] [tcpm] Fwd: I-D Action: draft-bagnulo-tcpm-tcp-low-rtt-00.txt

Yuchung Cheng <ycheng@google.com> Mon, 15 August 2016 16:50 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Mon, 15 Aug 2016 09:49:12 -0700
Message-ID: <CAK6E8=cYtP_KdvK9TEDov3j523OfcLfeBMgNSvaSueqzqwTC7Q@mail.gmail.com>
To: Karen Elisabeth Egede Nielsen <karen.nielsen@tieto.com>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpprague/hN0K2Jl9i_8HOxxhInnr1Tf09q8>
Cc: marcelo bagnulo braun <marcelo@it.uc3m.es>, tcpm IETF list <tcpm@ietf.org>, TCP Prague List <tcpPrague@ietf.org>
Subject: Re: [tcpPrague] [tcpm] Fwd: I-D Action: draft-bagnulo-tcpm-tcp-low-rtt-00.txt

On Mon, Aug 15, 2016 at 2:00 AM, Karen Elisabeth Egede Nielsen
<karen.nielsen@tieto.com> wrote:
> Hi Yuchung,
>
>> >
>> > An additional comment is that new approaches to retransmissions - like
>> > TCP TLP and TCP RACK (also SCTP TLR, which however is not progressing at
>> > the moment) - might fundamentally alter the picture. I.e., if
>> > retransmissions are sent pro-actively in tail loss situations, then more
>> > conservative RTOs may be kept for situations where it is prudent to wait
>> > longer. I don't know if TCP TLP is so widely deployed that it is something
>> > that you should relate to, even if it may be superseded by RACK. Just a
>> > thought.
>> TLP and RACK help reduce timeout cases in DC environments for sure, but
>> they still cannot completely avoid timeouts.
>
> [Karen Elisabeth Egede Nielsen] Agreed.
>
> For SCTP TLR we still see RTO-timeouts when probes/retransmissions are
> also lost.
> I assume that it would be something like this also for RACK/TLP, though
> potentially depending on how many consecutive probes are allowed.
> The role of RTO-timeouts is different when RACK/TLP is enabled, and it might
> be counterproductive to have RTO-timeouts be too aggressive, as the function
> with TLP/RACK becomes something much closer to the original intent
> (read: how I understand the original intent), namely to introduce a pause
> for the network to recover when things are really (consistently) bad.
>
> If the RTO is lowered to be comparable in order of magnitude with the
> RTT (with some delay_ack considerations), we are narrowing the situation
> where TLP/RACK probing is able to kick in before RTO-retransmissions start
> anyway.
>
> I understand that RTOs should be adjusted to the network dynamics, but I do
> think that it is important to understand - for the DC environment - whether
> the need for low RTOs in DCs is motivated mainly by having
> RTO-retransmissions fix the tail loss recovery deficiencies of TCP, which
> RACK/TLP addresses, or more generally by a need for more timely probing and
> reactivation for/after network recovery.
>
> Or perhaps you see the TLP probe timeout and the RTO timeout eventually
> being driven by the same timer, with the reactions (e.g. CC reactions) then
> possibly depending on state (probe already sent or not).
I agree with your thought here.

More generally, I'd like to evolve state-of-the-art recovery with
these principles:
1. React to ack events as much as possible

2. Send tiny probes to trigger (1) after 2-3 RTTs

3. Have conservative RTOs that (exp-backoff) expire at an order of
RTT to repair the head for reliability.


One thing about RTO: the RTO currently resets cwnd to 1 upon firing no
matter what. We should ONLY reset cwnd to 1 if there has been no news
(acks/data) for a long duration. This is quite important for performance:
today cwnd is reset to 1 even if every packet but the first one has been
recently delivered. This is unnecessarily conservative because the ack
clock isn't lost yet, and the collateral damage to performance is huge on
large BDP networks. This results in work-arounds to shorten and avoid RTOs.
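
A minimal sketch of the cwnd-reset rule described above, not an actual stack implementation. The names (SenderState, cwnd_after_rto, quiet_threshold) are hypothetical, and leaving cwnd unchanged when news is recent is an assumption made only for illustration; a real sender would probably still apply some reduction.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    cwnd: int               # congestion window, in packets
    last_news_time: float   # time of the latest ack or data arrival, in seconds


def cwnd_after_rto(state: SenderState, now: float, quiet_threshold: float) -> int:
    """Return the cwnd to use after an RTO fires.

    quiet_threshold is a hypothetical tunable: how long the connection must
    have heard nothing (no acks, no data) before the ack clock is treated
    as lost.
    """
    silent_for = now - state.last_news_time
    if silent_for >= quiet_threshold:
        # No news for a long duration: restart from one packet.
        return 1
    # Recent news: the ack clock is presumably intact, so keep the window
    # (assumption) and only repair the head of the window.
    return state.cwnd
```

With a 200 ms threshold, an RTO that fires 10 ms after the last ack keeps the window, while one that fires after a full second of silence falls back to cwnd = 1.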


I'd like to address these points in the upcoming RACK/TLP draft.



>
> BR, Karen
>
>> So I agree we should keep
>> timeouts conservative, but the current RFCs are way too conservative:
>> min RTO of 1 second is 4-5 orders of magnitude larger than DC RTTs.
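
As a rough sanity check of the "4-5 orders of magnitude" figure: assuming DC RTTs between 10 and 100 microseconds (illustrative values, not taken from the draft), the ratio to a 1-second min RTO can be computed as below. The function name is hypothetical.

```python
import math

MIN_RTO_S = 1.0  # the 1-second minimum RTO mentioned above


def orders_of_magnitude(min_rto_s: float, rtt_s: float) -> float:
    """Base-10 logarithm of the ratio between the min RTO and the RTT."""
    return math.log10(min_rto_s / rtt_s)


# Assumed DC RTTs, in microseconds.
for rtt_us in (10, 100):
    gap = orders_of_magnitude(MIN_RTO_S, rtt_us * 1e-6)
    print(f"RTT {rtt_us} us: about {gap:.1f} orders of magnitude")
```

Under these assumed RTTs the gap comes out at about 5 and 4 orders of magnitude respectively.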
>>
>> The issue is, the "right" min-RTO depends on the actual DC and stacks:
>> they all have different RTTs and timer granularity. The draft cites
>> Glenn's paper, but Morgan Stanley's DC could be different from others.
>>
>>
>> Other comments on the draft:
>>
>> 1. min ssthresh or cwnd: with very large incast ((tens of) thousands
>> of senders into one receiver), cwnd of 0.1 pkt can still be too big. The
>> only way is to pace the packets over N*RTT intervals.
>>
>>
>> 2. delayed acks:
>> As mentioned, the actual time depends a lot on the DC implementation,
>> which is really "vendor/owner"-specific. Instead, perhaps an option
>> during setup to inform the sender about the max delay in the ack works
>> better.
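
To make comment 1 above (pacing with a fractional window) concrete, here is a small sketch. It assumes an equal share of the bottleneck among all senders; the function names and the example numbers are hypothetical and only illustrate the arithmetic.

```python
def fair_share_pkts_per_rtt(bottleneck_pkts_per_rtt: float, n_senders: int) -> float:
    """Per-sender window if the bottleneck is split equally (assumption)."""
    if n_senders <= 0:
        raise ValueError("n_senders must be positive")
    return bottleneck_pkts_per_rtt / n_senders


def pacing_gap_in_rtts(cwnd_pkts: float) -> float:
    """Number of RTTs between packets for a window below one packet."""
    if cwnd_pkts <= 0:
        raise ValueError("cwnd_pkts must be positive")
    return 1.0 / cwnd_pkts


# A cwnd of 0.1 pkt means one packet every 10 RTTs.
gap_small_window = pacing_gap_in_rtts(0.1)

# Assumed incast: 10,000 senders sharing 100 packets per RTT.
share = fair_share_pkts_per_rtt(100, 10_000)
gap_incast = pacing_gap_in_rtts(share)
```

In the assumed incast case each sender's share is 0.01 pkt per RTT, so even a cwnd of 0.1 pkt is ten times too large and packets have to be spread over roughly 100 RTTs.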
>>
>>
>>
>> >
>> > BR, Karen
>> >
>> > > -----Original Message-----
>> > > From: tcpm [mailto:tcpm-bounces@ietf.org] On Behalf Of marcelo bagnulo braun
>> > > Sent: 8. juli 2016 23:19
>> > > To: tcpm IETF list <tcpm@ietf.org>
>> > > Subject: [tcpm] Fwd: I-D Action: draft-bagnulo-tcpm-tcp-low-rtt-00.txt
>> > >
>> > > Hi,
>> > >
>> > > We just submitted this draft for consideration of the WG. Comments are
>> > > appreciated.
>> > >
>> > > Regards, marcelo
>> > >
>> > >
>> > >
>> > >
>> > > -------- Forwarded Message --------
>> > > Subject:      I-D Action: draft-bagnulo-tcpm-tcp-low-rtt-00.txt
>> > > Date:         Fri, 08 Jul 2016 14:16:35 -0700
>> > > From:         internet-drafts@ietf.org
>> > > Reply-To:     internet-drafts@ietf.org
>> > > To:           i-d-announce@ietf.org
>> > >
>> > >
>> > >
>> > > A New Internet-Draft is available from the on-line Internet-Drafts directories.
>> > >
>> > >
>> > >          Title           : Recommendations for increasing TCP performance in low RTT networks.
>> > >          Authors         : Marcelo Bagnulo
>> > >                            Koen De Schepper
>> > >                            Glenn Judd
>> > >          Filename        : draft-bagnulo-tcpm-tcp-low-rtt-00.txt
>> > >          Pages           : 7
>> > >          Date            : 2016-07-08
>> > >
>> > > Abstract:
>> > >     This document compiles a set of issues that negatively affect TCP
>> > >     performance in low RTT networks as well as the recommendations to
>> > >     overcome them.
>> > >
>> > >
>> > > The IETF datatracker status page for this draft is:
>> > > https://datatracker.ietf.org/doc/draft-bagnulo-tcpm-tcp-low-rtt/
>> > >
>> > > There's also a htmlized version available at:
>> > > https://tools.ietf.org/html/draft-bagnulo-tcpm-tcp-low-rtt-00
>> > >
>> > >
>> > > Please note that it may take a couple of minutes from the time of submission
>> > > until the htmlized version and diff are available at tools.ietf.org.
>> > >
>> > > Internet-Drafts are also available by anonymous FTP at:
>> > > ftp://ftp.ietf.org/internet-drafts/