Re: [tcpm] Fwd: Fwd: New Version Notification for draft-kuehlewind-tcpm-accurate-ecn-04.txt

Bob Briscoe <ietf@bobbriscoe.net> Wed, 14 October 2015 00:22 UTC

To: Praveen Balasubramanian <pravb@microsoft.com>, tcpm IETF list <tcpm@ietf.org>, Richard Scheffenegger <rscheff@gmx.at>, Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
References: <55F055AD.3050809@tik.ee.ethz.ch> <55F05D54.5060708@tik.ee.ethz.ch> <55FF2910.7080908@bobbriscoe.net> <BN1PR03MB008FCB491B06E80B6A9A915B64F0@BN1PR03MB008.namprd03.prod.outlook.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <561DA02D.8020001@bobbriscoe.net>
Date: Wed, 14 Oct 2015 01:22:05 +0100
Archived-At: <http://mailarchive.ietf.org/arch/msg/tcpm/vVmUwCxfGTs6ZlbeLRjz5rDcxoc>
Subject: Re: [tcpm] Fwd: Fwd: New Version Notification for draft-kuehlewind-tcpm-accurate-ecn-04.txt

Praveen,

Sorry I've taken so long to reply.
Inline...


On 28/09/15 18:35, Praveen Balasubramanian wrote:
>
> Hi all,
>
> I reviewed this draft and have a few comments below. Overall the draft 
> is well written and covers all the corner cases.
>
> Thanks
>
Thank you. We tried very hard to keep it clear, and to ensure that the 
logical flow was not interrupted by the corner cases.
>
> The section 2.3 summary immediately brings about a question around 
> stretch ACKs. I see that you explain the receiver requirement in 
> section 3.2.2 (the delayed ACK frequency limit of 6 for CE marked 
> packets) but it might be worth suggesting this up front.
>
That's a good point - as you know, section 2 is meant to be a discursive 
summary of the detailed rules in section 3. However, in this case, I 
think we rewrote 3.2.2 after we had written 2.3. So, you're right that 
the summary ought to give more of a hint of how this will be solved. 
Will do.
>
> Is there any model to suggest what would be the accuracy of the scheme 
> described in Appendix A.3 to estimate the number of marked bytes from 
> the ACE field? Using a new TCP option will have deployment issues.
>
Yes, it's important to check how functional the protocol will be for 
those cases where the TCP option won't get through.
But no, we haven't yet modelled the accuracy of estimating marked bytes 
from marked packets.
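
To make the estimation problem concrete, here is a minimal Python sketch.
It is an illustration only, not the scheme in Appendix A.3. It assumes the
3-bit CE packet counter described in the draft announcement quoted below,
and it assumes every marked packet had the same size. The names
ce_packet_delta, estimate_ce_bytes and assumed_segment_size are
hypothetical.

```python
ACE_BITS = 3               # the three ECN-related TCP header flags
ACE_MOD = 1 << ACE_BITS    # so the packet counter wraps every 8 CE marks


def ce_packet_delta(prev_ace: int, new_ace: int) -> int:
    """Smallest number of newly CE-marked packets consistent with the counter.

    If the counter wrapped a full cycle between two ACKs, this undercounts
    by a multiple of 8; the sketch does not try to detect that.
    """
    return (new_ace - prev_ace) % ACE_MOD


def estimate_ce_bytes(prev_ace: int, new_ace: int,
                      assumed_segment_size: int) -> int:
    """Estimate CE-marked bytes, assuming all marked packets were one size."""
    return ce_packet_delta(prev_ace, new_ace) * assumed_segment_size
```

The accuracy question is then how far the real sizes of the marked
segments stray from assumed_segment_size, which is what a sender that
records the sizes of its in-flight segments could avoid.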

I don't want to spend too much time on that until someone has attempted 
to implement AccECN (Mirja has made a good start). Because, at least in 
Linux, it might be relatively easy for the sender to record the sizes of 
all the segments in flight, so that the sender can calculate, rather 
than estimate, CE marked bytes.

Estimating, rather than calculating, might still be necessary for Linux, 
and certainly some other OSs such as FreeBSD. But I would rather 
understand what is possible in at least one OS before modelling this 
problem.

At present DCTCP measures the extent of congestion in packets not bytes. 
But we envisaged that an algorithm like Relentless TCP might be used 
where, every time new CE feedback arrives, the sender just decrements 
cwnd by (CE bytes)/2.
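
As a rough sketch of that response (an illustration under stated
assumptions, not Relentless TCP itself), the Python below subtracts half
of the newly reported CE-marked bytes from cwnd. The function name and the
min_cwnd_bytes floor are assumptions added for the example.

```python
def reduce_cwnd(cwnd_bytes: int, new_ce_bytes: int,
                min_cwnd_bytes: int) -> int:
    """Decrement cwnd by (CE bytes)/2 when new CE feedback arrives.

    The floor at min_cwnd_bytes is an assumption of this sketch, added so
    that the window can never fall to zero or below.
    """
    return max(min_cwnd_bytes, cwnd_bytes - new_ce_bytes // 2)
```

For example, reduce_cwnd(100_000, 3_000, 2_920) gives 98_500.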

> With AccECN if SYN can now be safely ECN-capable then what are the 
> drawbacks of making it a requirement?
>
First I need to correct the premise of your question. Just because an 
AccECN receiver /can/ feed back CE on a SYN, does not mean it is safe 
for a sender to set ECN-capable on a SYN. A feedback protocol has to 
feed back stuff it receives, whether or not that stuff is valid (so that 
the sender can know what stuff appeared at the other end in order to 
detect middlebox mangling as well as valid protocol transitions).

Quoting section 2.5  entitled "Generic (Dumb) Reflector":

    Providing feedback of CE marking on the SYN also supports future
    scenarios in which SYNs might be ECN-enabled (without prejudging
    whether they ought to be).


Indeed, an ECN-capable SYN is outside the scope of AccECN, which solely 
covers the feedback behaviour of a receiver. An ECN-capable SYN would 
have to be defined in an RFC that covered TCP sender behaviour. E.g. 
perhaps, once your DCTCP draft becomes RFC, we might work on an 
evolution of DCTCP for both data centres and the public Internet, as 
discussed in Prague.


Nonetheless, to answer your question, IMO making a SYN ECN-capable has 
no drawbacks, /as long as/ it is accompanied by the following safety 
measures:

a) At the SYN stage, the TCP client doesn't yet know whether the server 
even understands ECN. So if a SYN were ECN-capable and then it was 
marked CE in the network, a non-ECN capable receiver would just ignore 
the marking. Therefore, if the sender received a SYN-ACK that did not 
complete the ECN negotiation, it would have to conservatively assume 
that the SYN might have been CE-marked, and reduce its initial window as 
if it was.

b) There is some concern that ECN-capable SYN floods could starve 
non-ECN traffic. However, all AQMs are required to switch off ECN 
marking if the queue starts to overload. So, before I believe an 
ECN-capable SYN could be a DoS risk, I would like to see someone 
demonstrate an ECN-capable SYN flood that does more harm than a non-ECN 
SYN flood. I doubt it would be possible, because before the attack could 
have any effect, the cause of the effect (ECN) would be turned off.
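
Safety measure (a) could be sketched as the following Python decision
function. It is only an illustration: the function name, its parameters
and the two window values are hypothetical, and no RFC defines this
sender behaviour yet.

```python
def initial_window_after_handshake(syn_was_ect: bool,
                                   syn_feedback_negotiated: bool,
                                   syn_ce_reported: bool,
                                   default_iw: int,
                                   reduced_iw: int) -> int:
    """Choose the initial window once the SYN-ACK has arrived.

    syn_feedback_negotiated is True only if the SYN-ACK completed a
    negotiation under which the server would report a CE mark on the SYN.
    """
    if not syn_was_ect:
        # The SYN could not have been CE-marked, so nothing to react to.
        return default_iw
    if not syn_feedback_negotiated:
        # The server would have ignored a CE mark, so conservatively
        # assume the SYN was marked.
        return reduced_iw
    return reduced_iw if syn_ce_reported else default_iw
```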

Interestingly, the recent (2015) study on ECN traversal (Trammell et 
al.) found that ECT or CE on a SYN was dropped in only 0.82% and 0.61% 
of cases (v4 & v6 resp.), which was (to me) surprisingly low, given 
RFC 3168 says (without evidence) that this could be a DoS attack vector.


> Regarding the requirement that "a TCP server that confirms its support 
> for AccECN (as described in Section 3.1) MUST also include an AccECN 
> TCP Option in the SYN/ACK", it seems onerous to make the supplementary 
> part a MUST. The server may have prior information that packets in 
> 3WHS are dropped on a particular path. The server may also have a knob 
> which is set in cases where the middleware is known to discard packets 
> with unknown TCP option.
>
Good point. We need to allow an exception where the host has cached 
knowledge that the AccECN option is black-holed on a path (but it should 
test occasionally in case the black hole has been plugged).
This applies not only to the SYN/ACK from the server, but also to the 
first ACK and first data segment from the client.
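
One way such a cache with occasional re-testing might look is sketched
below in Python. This is an assumption-laden illustration, not text
proposed for the draft: the class name, the per-path key and the one-hour
retest interval are all hypothetical.

```python
import time
from typing import Dict, Optional


class OptionBlackholeCache:
    """Remembers paths on which the AccECN option appeared to be dropped."""

    def __init__(self, retest_interval_s: float = 3600.0) -> None:
        self.retest_interval_s = retest_interval_s
        self._blackholed_at: Dict[str, float] = {}

    def record_blackhole(self, path_key: str,
                         now: Optional[float] = None) -> None:
        """Note (or refresh) that the option was black-holed on this path."""
        self._blackholed_at[path_key] = (
            time.monotonic() if now is None else now)

    def record_success(self, path_key: str) -> None:
        """The option got through, so forget any cached black hole."""
        self._blackholed_at.pop(path_key, None)

    def should_send_option(self, path_key: str,
                           now: Optional[float] = None) -> bool:
        """Send the option unless a black hole was seen recently."""
        seen = self._blackholed_at.get(path_key)
        if seen is None:
            return True
        now = time.monotonic() if now is None else now
        # Once the retest interval has elapsed, probe the path again.
        return now - seen >= self.retest_interval_s
```

A failed probe would call record_blackhole again, which restarts the
interval; a successful one would call record_success.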

> From practical deployment point of view section 3.3 needs to be 
> expanded upon to include how LSO and LRO would work in presence of the 
> new option.
>
Yes, it might be useful to write another appendix with examples of how 
segmentation offload might work. If you could help write that, it would 
be very useful (I know Richard has expertise in this area too).
>
> For the LSO case the host can choose to generate an extra ACK with the 
> option rather than piggybacking it on a large send. However even if 
> the option is set on an LSO send, the NIC can repeat the option in 
> every segment without any adverse effects.
>
Care! We said the following for a reason:

    Nonetheless, such hardware MUST attempt to preserve the timing of
    each ACK (for example, if it coalesced ACKs it would not be AccECN-
    compliant).


If the large send coalesces the ACKs (whether data ACKs or pure ACKs) 
for a number of received segments, the other end would see possibly 
multiple counters incremented, representing different markings on 
different arriving packets.

...Ideally we wanted the feedback to make it possible for a sender to 
extract the order and the timing of each transition between one ECN 
marking and another, as they arrived at the receiver. For instance, in 
2010, Mirja and I wrote a paper about how to implement rapid detection 
of available capacity using packet chirps 
<http://www.bobbriscoe.net/pubs.html#chirp_impl>. It would be really 
useful for AccECN to be able to support any senders trying to do 
something similar, but with a shallow ECN threshold like that used in 
DCTCP, particularly during slow-start.

I'm not saying all senders would have to work like that, just that it 
would be nice if a generic receiver behaviour would allow some senders to.

Examples like this justify what we called "change-triggered ACKs". We 
(the co-authors) had long internal arguments about whether to make 
change-triggered ACKs a MUST or a SHOULD. We decided that the draft 
would say 'SHOULD', but the WG might want to continue our argument.
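
For readers unfamiliar with the idea, the Python sketch below shows one
reading of change-triggered ACKs. It assumes that "change" means the
IP-ECN codepoint of an arriving packet differs from that of the previous
packet, and that the receiver otherwise delays ACKs every
delayed_ack_every packets. The function name and that parameter are
hypothetical, and the draft's normative text takes precedence.

```python
from typing import Iterable, List, Optional


def ack_decisions(ecn_codepoints: Iterable[str],
                  delayed_ack_every: int = 2) -> List[bool]:
    """For each arriving packet, say whether an ACK is emitted right away."""
    decisions: List[bool] = []
    last: Optional[str] = None
    unacked = 0
    for codepoint in ecn_codepoints:
        unacked += 1
        changed = last is not None and codepoint != last
        if changed or unacked >= delayed_ack_every:
            # A change in marking triggers an immediate ACK, so the sender
            # can see where each transition happened.
            decisions.append(True)
            unacked = 0
        else:
            decisions.append(False)
        last = codepoint
    return decisions
```

With arrivals ECT0, ECT0, CE, CE, CE, ECT0 this returns
[False, True, True, False, True, True]: the third and sixth packets are
ACKed immediately because the marking changed.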

The problem: if a sender wants to rely on change-triggered ACKs to get 
rapid info early in a new flow, it needs to be sure that the receiver 
has implemented change-triggered ACKs. We wanted to keep it simple, so 
we didn't want this to be a negotiable option. Therefore, such a sender 
would want change-triggered ACKs to be a MUST. However, we didn't want 
to write a spec that might be impossible to implement with today's 
hardware. So we settled on a SHOULD. But then our chirping sender cannot 
be sure whether the feedback is doing what it SHOULD or not, so it has 
to detect what the receiver is probably doing, which wastes time - 
precisely what the sender doesn't want to do.

So, in the end, we decided to write the SHOULD with an extra-strong plea 
to say you really ought to think really hard before not implementing 
this SHOULD - what one might call a "MUST-SHOULD"!

> LRO will be complicated if data segments contain the new option. At 
> least in Windows, LRO does not coalesce pure ACKs.
>
My personal view: I don't worry if acceleration hardware cannot handle 
newly defined capabilities, as long as software implementations are fast 
enough for the new protocol to be usable and to become successful. IMO, 
acceleration hardware is designed to accelerate today's common 
operations, and no-one would expect it to always be able to accelerate a 
newly invented protocol. As long as AccECN is good enough to be 
successful, tomorrow's hardware will accelerate it.

This is definitely an area where we are looking for input from people 
like you and others in the WG.


Thanks again for your comments.
We've got a few minor edits, which we will fold in with the changes 
promised above, assuming you, my co-authors and the WG are all happy 
with my responses.

Cheers



Bob

> *From:*tcpm [mailto:tcpm-bounces@ietf.org] *On Behalf Of *Bob Briscoe
> *Sent:* Sunday, September 20, 2015 2:46 PM
> *To:* Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>; 
> tcpPrague@ietf.org
> *Cc:* tcpm IETF list <tcpm@ietf.org>; Richard Scheffenegger 
> <rscheff@gmx.at>
> *Subject:* Re: [tcpm] Fwd: Fwd: New Version Notification for 
> draft-kuehlewind-tcpm-accurate-ecn-04.txt
>
> tcpPrague list and Mirja,
>
> For those of you who don't follow the IETF so closely, the reason for 
> not using DCTCP's original feedback scheme was because it was not 
> designed to cope with ACK loss.
>
> This is explained in the requirements for more accurate ECN feedback 
> that were recently agreed and published as RFC 7560 
> <https://tools.ietf.org/html/rfc7560>. The appendix in that RFC gives 
> some examples where the original DCTCP would get highly confused by a 
> few ACK losses.
>
> In that RFC it was admitted that no-one expected that all the 
> requirements could be satisfied at once.
> * The previous version <draft-kuehlewind-tcpm-accurate-ecn-03> 
> satisfied them all except 'simple', but there was push-back on that.
> * So this time <draft-kuehlewind-tcpm-accurate-ecn-04 
> <https://tools.ietf.org/html/draft-kuehlewind-tcpm-accurate-ecn-04>> 
> has gone for simple, but compromised a bit on 'low to zero overhead'.
>
> As Mirja said, pls read it and tell the IETF tcpm list (in cc) whether 
> you support this new approach, and if not, why not.
> The IETF doesn't charter work if no-one is talking about it.
>
> Cheers
>
>
> Bob
>
> On 09/09/15 17:24, Mirja Kühlewind wrote:
>
>     Hi all,
>
>     find below the mail that I've just sent to the tcpm list. You
>     might have seen this already but I wanted to explicitly point this out to
>     the tcpprague list as we partly discussed this activity already at
>     the meeting in Prague.
>
>     In short AccECN can be used with DCTCP or any other (DCTCP-like)
>     scheme that needs more accurate information on how many ECN
>     markings have been received. If that's interesting for you, please
>     read the draft and provide feedback. Please make sure that you
>     also cc the tcpm list regarding all feedback that is directly on
>     AccECN and the draft!
>
>     If you would like to discuss future usage of AccECN, this can
>     happen on this list only, or you may cc tcpm if e.g. your use case
>     would require changes to AccECN as currently proposed.
>
>     Thanks!
>     Mirja
>
>
>     -------- Forwarded Message --------
>     Subject: [tcpm] Fwd: New Version Notification for
>     draft-kuehlewind-tcpm-accurate-ecn-04.txt
>     Date: Wed, 9 Sep 2015 17:52:13 +0200
>     From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
>     To: tcpm@ietf.org Extensions <tcpm@ietf.org>, Bob Briscoe
>     <ietf@bobbriscoe.net>, Richard Scheffenegger <rscheff@gmx.at>
>
>     Hi all,
>
>     we submitted a new draft for AccECN (see below). We significantly
>     simplified the draft because we believe it is more important to
>     have a solution that is easy to understand, and therefore easier
>     to implement correctly, than a solution that tries to fulfill all
>     requirements but is overly complex.
>
>     In short, we now only overwrite the 3 TCP header bits (no
>     additional bits in the TCP header are used) and use them simply
>     for a CE packet counter, while byte-wise information on all other
>     markings is provided by a new TCP option. In case the option is
>     not available, e.g. because it's blocked, this will still provide
>     the needed information to react to a CE-based congestion
>     feedback. However, any advanced mechanism that needs further
>     information on the other markings received will only work if the
>     option is available.
>
>     Further, we also tried to keep the draft as short as possible.
>     That means the actual normative part only has 9 pages. However,
>     we still provide an overview section with some reasoning as well
>     as some discussion on interaction with other mechanisms which
>     leads in total to 25 pages (without appendix).
>
>     I'm currently also working on an implementation of this proposed
>     solution. I will announce it as soon as I'm ready!
>
>     In any case we would like to discuss this at the next meeting. We,
>     the authors, think that this proposal is now the right way forward
>     and hope that this simplified proposal will allow us to proceed
>     quickly on AccECN, as the need for it is increasing.
>
>     Please let us know if you have any feedback on the draft!
>
>     Mirja
>
>
>     -------- Forwarded Message --------
>     Subject: New Version Notification for
>     draft-kuehlewind-tcpm-accurate-ecn-04.txt
>     Date: Sun, 06 Sep 2015 16:19:47 -0700
>     From: internet-drafts@ietf.org
>     To: Richard Scheffenegger <rs@netapp.com>, Mirja Kühlewind
>     <mirja.kuehlewind@tik.ee.ethz.ch>, Bob Briscoe
>     <ietf@bobbriscoe.net>
>
>
>     A new version of I-D, draft-kuehlewind-tcpm-accurate-ecn-04.txt
>     has been successfully submitted by Bob Briscoe and posted to the
>     IETF repository.
>
>     Name:        draft-kuehlewind-tcpm-accurate-ecn
>     Revision:    04
>     Title:        More Accurate ECN Feedback in TCP
>     Document date:    2015-09-06
>     Group:        Individual Submission
>     Pages:        36
>     URL:
>     https://www.ietf.org/internet-drafts/draft-kuehlewind-tcpm-accurate-ecn-04.txt
>
>     Status:
>     https://datatracker.ietf.org/doc/draft-kuehlewind-tcpm-accurate-ecn/
>     Htmlized:
>     https://tools.ietf.org/html/draft-kuehlewind-tcpm-accurate-ecn-04
>     Diff:
>     https://www.ietf.org/rfcdiff?url2=draft-kuehlewind-tcpm-accurate-ecn-04
>
>
>     Abstract:
>         Explicit Congestion Notification (ECN) is a mechanism where
>         network nodes can mark IP packets instead of dropping them to
>         indicate incipient congestion to the end-points.  Receivers
>         with an ECN-capable transport protocol feed back this
>         information to the sender.  ECN is specified for TCP in such a
>         way that only one feedback signal can be transmitted per
>         Round-Trip Time (RTT).  Recently, new TCP mechanisms like
>         Congestion Exposure (ConEx) or Data Center TCP (DCTCP) need
>         more accurate ECN feedback information whenever more than one
>         marking is received in one RTT.  This document specifies an
>         experimental scheme to provide more than one feedback signal
>         per RTT in the TCP header.  Given TCP header space is scarce,
>         it overloads the three existing ECN-related flags in the TCP
>         header and provides additional information in a new TCP
>         option.
>
>
>
>
>     Please note that it may take a couple of minutes from the time of
>     submission until the htmlized version and diff are available at
>     tools.ietf.org.
>
>     The IETF Secretariat
>
>
>     _______________________________________________
>     tcpm mailing list
>     tcpm@ietf.org
>     https://www.ietf.org/mailman/listinfo/tcpm
>
>
>
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/