Re: [tcpm] Comments on draft-ietf-tcpm-accurate-ecn

"Scheffenegger, Richard" <rs.ietf@gmx.at> Sat, 14 July 2018 06:33 UTC

To: Bob Briscoe <ietf@bobbriscoe.net>, Yuchung Cheng <ycheng@google.com>, =?UTF-8?Q?Mirja_K=c3=bchlewind?= <mirja.kuehlewind@tik.ee.ethz.ch>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, "draft-ietf-tcpm-accurate-ecn@ietf.org" <draft-ietf-tcpm-accurate-ecn@ietf.org>
References: <AM5PR0701MB25477BD5BEB403A98AA2B983933F0@AM5PR0701MB2547.eurprd07.prod.outlook.com> <VI1PR0701MB2558F5DE5FCE5CDC6A43F94793D30@VI1PR0701MB2558.eurprd07.prod.outlook.com> <E729457B-96C5-493D-9B14-70663C24DFB4@tik.ee.ethz.ch> <db66271d-3654-6066-fecc-a405bb88b7f5@bobbriscoe.net> <CAK6E8=dkuyD+PJv9+4iwdXNu0pEv8n59acHx1Q-yBeCBQ=CcEg@mail.gmail.com> <646D10B9-FED7-4E2D-9A9F-0C052F1C908D@tik.ee.ethz.ch> <CAK6E8=evQwrEgYpmbu7GW1oTAkz-xG5HzyRW5e=uBsmJfdjfAQ@mail.gmail.com> <B0B81087-B740-43D5-BB79-FBF8DA9A2FD9@tik.ee.ethz.ch> <effb8c8f-0cf4-009d-6f94-d8d49e53769a@bobbriscoe.net> <CAK6E8=d14apJBf4f5z18PUQG_Si3T60RdPDeDnX3icd2RvtG0Q@mail.gmail.com> <64747841-13C7-43DC-AEA9-FA7EFA1FDD32@tik.ee.ethz.ch> <CAK6E8=c9VuvR46Sg7gtDcHKWsgGtF-jETT44DLoHkh7+KkESng@mail.gmail.com> <E9BA3522-72BE-427B-8198-3338E0D25D08@tik.ee.ethz.ch> <CAK6E8=cszgsHr1yUiWSnkLbPgdYj9ONY=X4xuduB58xheQR6dA@mail.gmail.com> <f76df54e-900d-28a4-387b-2c402c820b07@bobbriscoe.net>
From: "Scheffenegger, Richard" <rs.ietf@gmx.at>
Message-ID: <b49ada1a-904f-9a67-0cf6-b1eb08be9a20@gmx.at>
Date: Sat, 14 Jul 2018 08:32:40 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <f76df54e-900d-28a4-387b-2c402c820b07@bobbriscoe.net>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/SY_T9WpfXBIZRemlZ6x1nsVETko>
Subject: Re: [tcpm] Comments on draft-ietf-tcpm-accurate-ecn

Yuchung, Bob,

I really like this discussion!

Two points:


On 14.07.2018 at 02:52, Bob Briscoe wrote:
> Yuchung,
> 
> On 13/07/18 21:45, Yuchung Cheng wrote:
>> On Fri, Jul 13, 2018 at 12:42 PM, Mirja Kühlewind
>> <mirja.kuehlewind@tik.ee.ethz.ch>  wrote:
>>> Hi Yuchung,
>>>
>>>> On 13.07.2018 at 15:30, Yuchung Cheng <ycheng@google.com> wrote:
>>>>
>>>> On Fri, Jul 13, 2018 at 11:53 AM, Mirja Kühlewind
>>>> <mirja.kuehlewind@tik.ee.ethz.ch>  wrote:
>>>>> Hi Yuchung,
>>>>>
>>>>> please see below.
>>>>>
>>>>>> On 13.07.2018 at 14:38, Yuchung Cheng <ycheng@google.com> wrote:
>>>>>>
>>>>>> hi --
>>>>>>
>>>>>> I agree:
>>>>>> 1. delayed/stretched ACKs aren't going away (in fact will be more common)
>>>>>> 2. GRO isn't and should not be a show-stopper
>>>>>> 3. SYN option is running tight
>>>>>> 4. HW opt comes after SW
>>>>>>
>>>>>> I worry:
>>>>>> 1. GRO is an unavoidable issue in deployment (let's not produce an
>>>>>> undeployable RFC). the ACE counter won't work as GRO can pack up to
>>>>>> 64KB/MTU =~ 45 pkts under heavy congestion.
> [BB] I'm really not expert on offload, but I thought GRO collects (or 
> can be made to collect) a set of fields from the headers it strips off?
> 
> If I'm wrong (quite likely), bear in mind the following:
> * you only need the ACE counter (main header) when the AccECN Option is 
> being stripped by a middlebox
> * on those connections that need ACE (cos of middlebox meddling) 
> couldn't GRO be limited to 8 packets, at least while experimenting with 
> AccECN to see if it is useful and gather data?

[RS @ Group] To be pedantic, GRO can coalesce packets until the ACE
counter has increased by 7 counts (one less than the value in the
previously delivered "superpacket", modulo 8). Under heavy CE marking,
this may be as few as 7 or 8 packets, correct. However, that is exactly
when the network experiences congestion. Thus "slowing things down" by
reducing the processing speed on the client is not an acute problem
IMHO, as slightly delayed ACKing would implicitly also reduce the
sending speed somewhat.

With many flows, higher RTTs etc, I would expect GRO to be able to 
coalesce many more than just 7 packets.
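
The coalescing limit described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: ACE is
treated as a 3-bit counter that wraps modulo 8, the default limit of 45
segments is the 64KB/MTU figure quoted earlier in this thread, and the
function name `coalescable` is hypothetical. It is not taken from the
draft or from any GRO implementation.

```python
ACE_MOD = 8               # ACE is a 3-bit counter, so it wraps modulo 8
MAX_DELTA = ACE_MOD - 1   # 7: largest increase that is still unambiguous


def coalescable(prev_ace, ace_values, gro_limit=45):
    """Count how many arriving segments could be merged into one
    superpacket without the ACE increase becoming ambiguous.

    prev_ace   -- ACE value carried by the previously delivered superpacket
    ace_values -- ACE field of each arriving segment, in arrival order
    gro_limit  -- assumed cap on segments per superpacket (64KB/MTU ~ 45)
    """
    total = 0         # cumulative ACE increase since prev_ace
    last = prev_ace
    merged = 0
    for ace in ace_values:
        if merged == gro_limit:
            break
        step = (ace - last) % ACE_MOD   # per-segment increase, modulo 8
        if total + step > MAX_DELTA:
            break                       # merging this one would wrap the counter
        total += step
        last = ace
        merged += 1
    return merged
```

If every segment advances ACE by one, `coalescable(0, [1, 2, 3, 4, 5, 6, 7, 0, 1])`
stops after 7 segments. With sparser marking the counter is no longer
the limiting factor and the assumed GRO cap is reached first.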


[...]



>>>>>> 3. Leave ACE-count and ACE option optional (i.e. MAY)
>>>>> I don’t understand this. If both are optional, you don’t have any feedback. Or what do you mean by "leave ACE-count optional"?
>>>> use-case: We can negotiate DCTCP-style ECN for the internet.
>>>>
>>>> Then interested parties can progressively experiment on more accurate
>>>> "options" (!= TCP-option)
 >>>
 >>> As Appendix A of RFC7560 says I don’t think it is a safe option for 
 >>> the Internet where packet loss more likely then in a full
 >>> ECN-enabled data center.
 >>>
 >>> Again I disagree RFC7560 Appendix A is a big problem based on my
 >>> experience with at times loss-heavy ECN-enabled data-center (Google
 >>> data-center runs very hot and uses a DCTCP-variant).
 >>>
 >>> We can quabble forever w/o data. That's why I asked for some
 >>> (non-simulation) data.

[RS @ Yuchung] I'm afraid I'm with Mirja's comment (from the other fork
of this discussion, added here): I don't quite follow what you are
proposing. An AccECN-style negotiation, used in conjunction with ECN++
(and possibly L4S-style ECT-1 marking of all packets instead of ECT-0)?
But how do you envision the feedback scheme? Like the DCTCP state
machine, which sends change-triggered ACKs and signals runs of ECE and
non-ECE marked TCP header bits? (That does not cater well for ACK
thinning or ACK loss.)


However, I think I see where you are coming from. Let me try to 
summarize what I assume here:

With DCTCP-style feedback in a datacenter and some level of ACK loss,
the sender-side estimate of the CE level will, with equal probability,
over- or under-estimate the marking level during one specific RTT. If
the sender underestimated congestion, the network will provide a higher 
marking level during the next RTT, and thereby correct the false 
estimate. On overestimation, the flow would only have reduced its 
sending rate more than necessary, and regular CA across all flows of 
that bottleneck link will readily make use of that free bandwidth. All 
of that happens within datacenter-typical RTTs of a few dozen to a few
hundred microseconds... So any deviation will only persist for a very
short period of time, while most flows have very similar RTTs.
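
One way to see why a single mis-estimate is short-lived in RTT terms is
the sketch below. It assumes the sender smooths the marked fraction with
the EWMA from the DCTCP specification (RFC 8257), alpha = (1 - g) * alpha
+ g * F with g = 1/16. That choice of estimator is my assumption for
illustration, and the function names are hypothetical. It does not model
the network reacting with more marks in the next RTT.

```python
def alpha_trace(fractions, g=1 / 16, alpha0=0.0):
    """Run the DCTCP-style EWMA over per-RTT marked fractions."""
    alpha = alpha0
    trace = []
    for f in fractions:
        alpha = (1 - g) * alpha + g * f
        trace.append(alpha)
    return trace


def deviation_after_error(true_fraction, error, rounds, g=1 / 16):
    """Deviation of the estimate, per RTT, after one mis-measured RTT.

    The first RTT observes true_fraction + error (e.g. because of ACK
    loss); all later RTTs observe the true fraction again.
    """
    exact = alpha_trace([true_fraction] * (rounds + 1), g, true_fraction)
    skewed = alpha_trace(
        [true_fraction + error] + [true_fraction] * rounds, g, true_fraction
    )
    return [s - e for s, e in zip(skewed, exact)]
```

The deviation starts at g * error and shrinks by a factor (1 - g) with
every further RTT. The number of RTTs needed is the same on any path, so
the wall-clock time an error persists scales directly with the RTT.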

Now, the problem space AccECN tries to address includes the public
internet, where RTTs over a bottleneck can vary by many orders of
magnitude. Thus the goal should be to avoid over-estimation and, even
more so, under-estimation of congestion as far as possible, already
within the first RTT.

Or, summarized differently: in a datacenter, small errors in congestion
estimation have a predictable, very short lifetime before they are
corrected. On the public internet, it is better to avoid estimation
errors on congestion as far as possible, as these errors would result in
more unfairness among flows.

I hope that is a fair summary, but please expand on what your feedback
scheme would encompass and how it would work.

Best regards,
   Richard