Re: [tcpm] Review of draft-ietf-tcpm-dctcp-01

Bob Briscoe <ietf@bobbriscoe.net> Thu, 18 August 2016 16:35 UTC

Return-Path: <ietf@bobbriscoe.net>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 137F212D898 for <tcpm@ietfa.amsl.com>; Thu, 18 Aug 2016 09:35:53 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.89
X-Spam-Level:
X-Spam-Status: No, score=-1.89 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_KAM_HTML_FONT_INVALID=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xTiVdmtXKNGq for <tcpm@ietfa.amsl.com>; Thu, 18 Aug 2016 09:35:47 -0700 (PDT)
Received: from server.dnsblock1.com (server.dnsblock1.com [85.13.236.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7639612D09C for <tcpm@ietf.org>; Thu, 18 Aug 2016 09:35:45 -0700 (PDT)
Received: from host81-153-214-243.range81-153.btcentralplus.com ([81.153.214.243]:53567 helo=[192.168.1.82]) by server.dnsblock1.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from <ietf@bobbriscoe.net>) id 1baQIB-0002xf-9x; Thu, 18 Aug 2016 17:35:43 +0100
From: Bob Briscoe <ietf@bobbriscoe.net>
To: Praveen Balasubramanian <pravb@microsoft.com>
References: <563CF0F7.6070302@bobbriscoe.net> <BN1PR03MB008D16E15DCBE49E7C21A4AB6350@BN1PR03MB008.namprd03.prod.outlook.com>
Message-ID: <71a48e5c-c992-de4d-12c3-7779ee702bd6@bobbriscoe.net>
Date: Thu, 18 Aug 2016 17:35:42 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <BN1PR03MB008D16E15DCBE49E7C21A4AB6350@BN1PR03MB008.namprd03.prod.outlook.com>
Content-Type: multipart/alternative; boundary="------------24759AAECE54BB7370BFF4EE"
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.dnsblock1.com
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - bobbriscoe.net
X-Get-Message-Sender-Via: server.dnsblock1.com: authenticated_id: in@bobbriscoe.net
X-Authenticated-Sender: server.dnsblock1.com: in@bobbriscoe.net
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/WvKNOYcx3xb14RfYa6hIuZdcNpU>
Cc: tcpm IETF list <tcpm@ietf.org>
Subject: Re: [tcpm] Review of draft-ietf-tcpm-dctcp-01
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Aug 2016 16:35:53 -0000

Praveen,


On 17/07/16 21:25, Praveen Balasubramanian wrote:
>
> Thanks Bob. Sorry for the <ridiculous> delay.
>
[BB] And now sorry for my delay. I tried to finish this email a number 
of times, but kept getting interrupted.
>
> Comments inline prefixed by <Praveen>. Most of these edits will be in 
> the next revision.
>
[BB] I've tried very hard to only comment where I really do think there 
is still a problem (even if minor), and suggested specific text to help 
you...

> Thanks
>
> *From:*Bob Briscoe [mailto:ietf@bobbriscoe.net]
> *Sent:* Friday, November 6, 2015 10:27 AM
> *To:* Praveen Balasubramanian <pravb@microsoft.com>; EGGERT, Lars 
> <lars@netapp.com>; Stephen Bensley <sbens@microsoft.com>; Dave Thaler 
> <dthaler@microsoft.com>; glenn.judd@morganstanley.com
> *Cc:* tcpm IETF list <tcpm@ietf.org>
> *Subject:* Review of draft-ietf-tcpm-dctcp-01
>
> Praveen & co-authors,
>
> As promised, here's my review of draft-ietf-tcpm-dctcp-01
>
> *_General Comments
> _*
> In general, in the comments below I have tried to reduce the multiple 
> assumptions about controlled environments down to the one assumption 
> that they are controlled by a single authority. I believe DCTCP can 
> work without the other assumptions, so the following do not apply in 
> all controlled environments:
>   * not always low RTT (e.g. inter-DC scenarios, or global private 
> networks)
>   * not always no risk of traffic attacks (e.g. internally arranged 
> attacks)
> I may have missed some other occurrences of these assumptions, so pls 
> feel free to seek out them all.
>
> <Praveen> DCTCP can be extended to other environments but this draft 
> does not cover any of that. It is explicitly for deployment in a 
> datacenter where the RTT is low and there is no risk of internal 
> attack. We cannot cover other scenarios that have not been tested or 
> deployed. All the known deployments are in controlled datacenters. The 
> TCP Prague effort should result in a draft that covers additional 
> requirements that will make DCTCP work well on high latency links.
>
>
> Many of the comments argue with your choices of MUST, SHOULD, MAY etc. 
> This might seem nit-picking, given the primary purpose is to document 
> the algo. However, it would be wrong to require things that aren't 
> required or to allow exceptions when exceptions would not be 
> interoperable. An alternative approach would be to just remove all 
> capitalised language.
>
> <Praveen> I am fine with removing such capitalization except for when 
> removing it will lead to interop problems between different 
> implementations of DCTCP. I am responding to each comment below and 
> adding whether the new revision will fix it not.
>
>
> *_Section-by-section review.
> _*
> *Abstract
> *
> I think it needs to have an applicability statement in the abstract 
> (and introduction) that refers to the deployment and implementation 
> status sections at the end. Something like:
>
> "This is an informational specification of the implementation of DCTCP 
> in Microsoft Windows Server 2012. It is applicable to deployments in 
> controlled environments like data centres but it MUST NOT be deployed 
> over the public Internet without additional measures, as detailed in 
> sections 4-6. "
>
> <Praveen> This is already addressed in the introduction. I don’t see 
> the value of repeating it multiple times. An implementer will read the 
> introduction. If you still feel strongly, we can add it to the 
> abstract and remove it from the introduction section. This draft will 
> not cover any additional measures that are intended for deployments 
> outside the datacenter – that’s not the intended purpose.
>
[BB] It is very common for the IETF to require even much less minor 
applicability statements to be highlighted in the abstract (e.g. see TFO 
RFC). This is because nowadays implementers / readers often skip 
introductory text assuming they understand the problem, etc. and they 
just scan for the normative parts. Anyone reading an IETF RFCs assumes 
the context is the public Internet, so any RFC that doesn't have that 
context needs to flag that. I know the name says Data Centre, but the 
abstract needs to explain that this is not an IETF RFC that is 
standardizing DCTCP for the public Internet.

If you still don't think so, we should put this to the WG (which will 
slow it down).
>
>
> *1. Intro
> *s/its many servers/
>  /their many servers/
>
> <Praveen> Fixed in next revision.
>
>
> "...limited queue capacities..."
> I understand that the motivation has been abbreviated, but I think 
> this has lost too much. Perhaps mention memory shared between 
> interfaces, so either bloated buffers or too little, making traffic 
> susceptible to packet losses?
>
>    [RFC3168] describes a mechanism for using Explicit Congestion
>    Notification (ECN) from the switches for early detection of
>    congestion, rather than waiting for packet loss to occur.
>
> This isn't correct. 3168 requires ECN marking to occur no earlier than 
> drop. It is precisely that 3168 does not allow early marking unless 
> there is also early drop that is the problem.
>
> <Praveen> Fixed in next revision.
>
>
>    It is recommended that DCTCP be deployed...
>
> "recommended" is too weak. I suggest you use something like the 
> applicability text given above, both here and in the abstract.
>
> <Praveen> Added the word “only” to make this stronger per Michael’s 
> review. Fixed in next revision.
>
[BB] I suggest the applicability para (here in the intro) refers forward 
to section 5 for details (e.g. so you don't have to explain why it 
shouldn't be deployed in the intro).
>
>
>
> *2. Terminology
> *
> Suggest you explain that capitalised words are not quite the same as 
> in most RFCs:
> "Normative language is used to describe how necessary the various 
> aspects of the Microsoft implementation are for interoperability, but 
> even compliant implementations without the measures in sections 4-6 
> would still only be safe to deploy in controlled environments."
>
> <Praveen> Fair enough. Added text in next revision.
>
>
> *3. DCTCP Algo
> *
> The three bullets say no more than "this is normal stuff". Would it be 
> better to supplement each sentence with a brief phrase saying what is 
> different about each aspect compared to RFC3168?
>
> <Praveen> The goal here it to describe the changes needed on top of 
> 3168 so I don’t think further explanation is necessary.
>
>
> *3.1 Marking
> *
> This needs to say something about how you mark the IP header from L2. 
> I suggest you refer to the up-and-forward mode of 
> draft-ietf-tsvwg-ecn-encap.
>
> <Praveen> Why is that relevant to this document? It is not intended as 
> a specification for operation of L2 switches. Won’t fix.
>
[BB] Well, you specify a lot about the marking algo, e.g.
K > (RTT *  C)/7

A simple solution would be: s/switch/L3 switch/

I just thought that a reader would think, "You say switches (i.e. L2 
devices) mark the IP header - how can that work?" unless you explain.

>
> s/sending rate/
>  /link rate/
>
> <Praveen> Fixed in next revision
>
>
> *3.2
> *
> CURRENT:
>
>     1.  If the CE codepoint is set and DCTCP.CE is false, send an ACK for
>         any previously unacknowledged packets and set DCTCP.CE to true.
>     2.  If the CE codepoint is not set and DCTCP.CE is true, send an ACK
>         for any previously unacknowledged packets and set DCTCP.CE to
>         false.
>
> SUGGESTED:
>
>     1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
>         true and send an ACK for any previously unacknowledged packets.
>     2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
>         to false and send an ACK for any previously unacknowledged packets.
>
>     The figure would have to be changed to be consistent too.
> RATIONALE:
> If the receiver changes DCTCP.CE /after/ sending the ACK like this, it 
> delays each signal until the following ACK is sent. That could add a 
> delay of anything from 1 to m packets. I've never understood why the 
> order of these operations was specified in this way. If there is no 
> reason, then I suggest that it is specified in the opposite order, as 
> I describe above.
>
>
> A note could be added to say the Windows Server 12 implementation does 
> these steps in reverse order, but the order specified is preferred 
> because it reduces signalling delay and has no interoperability issues 
> with the original Windows implementation.
>
> <Praveen> Won’t fix. This is how the Windows implementation behaves so 
> the ordering is required. Both the paper and the draft are consistent 
> in this.
>
[BB] The paper and the draft are not consistent. In the draft, the 
labels on the top and bottom arrows in the diagram have been switched 
round relative to those in the paper.

However, the text in the draft has not (yet?) been changed to reflect this.
>
> There is no delay being introduced here. This ACK is sent immediately 
> so the order does not matter.
>

[BB] I agree that the ACK itself is not delayed. However, even tho a CE 
has been received, feedback of this event is held back until the next ACK.

This is a minor 'unnecessary delay' bug.
* I believe it was fixed in the FreeBSD implementation after I pointed 
it out.
* I've just checked the Linux code, and it's not been fixed there.
* Obviously I cannot see the Windows source code.

If this bug is still in the Windows implementation, it's up to you if 
you also want the spec to require that the bugs are implemented exactly 
as they are in Windows ;)

>        
>     The handling of the "Congestion Window Reduced" (CWR) bit is also
>     exactly as per [RFC3168 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2frfc3168&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=Ce9m5q0XzkbQNWEb1rqmXrqx12Uf4LDscrUAz5C78eY%3d>] including [RFC3168-ERRATA3639 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2fdraft-ietf-tcpm-dctcp-01%23ref-RFC3168-ERRATA3639&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=wDJv%2bcypXj5layfzvQGhXYqB6kwl8lW%2fpdQ4iYg2cN8%3d>].  That is, on
>     receipt of a segment with both the CE and CWR bits set, CWR is
>     processed first and then ECE is processed.
>
> Even tho this is the receiver section, I suggest:
> s/The handling/Receiver handling/
>
> <Praveen> Isn’t this assumed by a reader familiar with 3168? But this 
> is a minor editorial fix so fixed in the next revision.
>
>
> Whatever, this cannot be correct. A DCTCP receiver surely does nothing 
> on receipt of CWR, whereas an RFC3168 receiver does something. A DCTCP 
> receiver certainly cannot do exactly what RFC3168 says, because that 
> says stop setting ECE on receipt of CWR until the next CE arrives, 
> which would surely break this feedback protocol.
>
> <Praveen> I don’t see why this would break the feedback protocol. The 
> receiver will start echoing again as soon as it sees the first CE 
> arrives again. If packet has both CWR and CE set then CE takes 
> precedence and the echoing is done.
>
[BB] I'm not saying it breaks the feedback protocol. I'm just saying 
it's confusing for the reader.

Of course, when the receiver is already not doing something, ignoring a 
signal to stop has the same outcome as stopping. However, describing 
this as "handling CWR ... is also exactly as per RFC3168" is misleading 
for the reader/implementer.

I guess changing to "handling CWR ... is as per RFC3168" would be a 
compromise.
>
>
> *3.3 *
>
>     DCTCP.Alpha, which is initialized to 1 and MUST be updated
>     as follows:
>
> s/MUST/SHOULD/
> And add "An alternative congestion avoidance algorithm MAY be used, 
> but it MUST lead to flow rate inversely proportional to 1/M".
>
> Rationale: There are many ways to implement interoperable 1/p cc. This 
> is just one.
>
> <Praveen> Alternate algorithms are out of scope of this draft. We 
> cannot say without interop testing if a different algorithm will 
> perform well in datacenters or interop well with existing 
> implementations. If there are other ways to implement 1/p they should 
> be covered by other drafts. This draft will only describe the 
> algorithm that has been tested and deployed. That said I am willing to 
> make it a SHOULD so fixed in the next revision.
>
>     DCTCP.BytesSent
>
> This variable actually counts bytes received. it's pretty confusing to 
> call it BytesSent.
>
> <Praveen> No, this is tracking bytes that have been sent.
>
[BB] I meant it counts bytes now acknowledged as received by the other 
end (which is not the same as bytes sent by this end).
DCTCP.BytesAcked would have been more meaningful. I'm not insisting on 
this - the original comment was just to improve comprehensibility.
>
>
> s/TCP SACK options [RFC2018] are ignored./
>  /The sender MUST ignore TCP SACK options [RFC2018] for this computation./
> RATIONALE:
>   a) I think the idea is to ignore CE marks after a gap because, if 
> the gap becomes considered a loss, the response to loss will override 
> any need to count incoming CE marks for the following round trip. 
> However, if the gap is filled before it is considered a loss {Note 1}, 
> then CE marks that were ignored because they arrived after a gap will 
> need to be "unignored".
>   b) We need to make clear that SACK is not ignored altogether, only 
> for this computation.
>
> {Note 1}: I can think of two cases:
> * Less than three dupacks then the gap is filled
> * The gap is filled by a delayed segment so that an earlier 
> retransmission was spurious
>
> <Praveen> Ok I will update the text.
>
> the sender MUST update cwnd as follows:
>        cwnd = cwnd * (1 - DCTCP.Alpha / 2)
>
> s/MUST/SHOULD/
> Rationale: Alternative interoperable response functions are possible. 
> For instance an algorithm like Relentless TCP that simply reduces cwnd 
> by half the size of any CE-marked segment. Then there is no Alpha and 
> no g variable.
>
> <Praveen> Fixed
>
>
> CURRENT:
>
>     Just as specified in [RFC3168 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2frfc3168&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=Ce9m5q0XzkbQNWEb1rqmXrqx12Uf4LDscrUAz5C78eY%3d>], TCP should not react to congestion
>     indications more than once for every window of data.
>
> SUGGESTED:
>
>     Just as specified in [RFC3168 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2frfc3168&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=Ce9m5q0XzkbQNWEb1rqmXrqx12Uf4LDscrUAz5C78eY%3d>], DCTCP does not react to congestion
>     indications more than once for every window of data.
>
> I don't know whether you intended this to be an upper-case SHOULD NOT. 
> If so, I would argue against this being normative, because it isn't 
> necessary for interop; therefore I've suggested avoiding he 'should' word.
>
> <Praveen> Fixed
>
>     The setting of
>     the "Congestion Window Reduced" (CWR) bit is also exactly as per
>     [RFC3168 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2frfc3168&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=Ce9m5q0XzkbQNWEb1rqmXrqx12Uf4LDscrUAz5C78eY%3d>].
>   
>
> It is wrong to say "exactly". The sender certainly sets CWR when it 
> does a congestion response. But in DCTCP the congestion response 
> occurs every RTT, triggered by its own measurement-period logic, 
> whereas in RFC3168 it is triggered by the arrival of an ECE after the 
> ACK of any previous segment it sent with CWR set.
>
> I think it is worth saying that setting CWR is necessary for safety, 
> otherwise in the case where there has been a configuration management 
> error and the receiver complies with RFC 3168 not DCTCP, it will 
> continue sending ECE for ever and starve the session.
>
> It might help to explicitly confirm that:
> "Other aspects of TCP congestion avoidance and control such as the 
> slow start phase are no different to those in RFC5681."
>
> <Praveen> Fixed that CWR is needed for safety. Not adding the text 
> around 5681, it is presumed.
>
>
> *3.4 SYN, SYN-ACK, RST
> *
>    [RFC3168] requires that a compliant TCP MUST NOT set ECT on SYN or
>    SYN-ACK packets.
>
> s/MUST NOT/'MUST NOT'/
> Rationale: Best to make it clear this is quoting a normative statement 
> in another RFC, because it is going to be contradicted.
>
[BB] I don't know whether you saw this one - it's not been changed in 
draft-02.

I suggested adding quotes around 'MUST NOT'. This is good practice when 
quoting another RFC. Otherwise, to a careless reader (or non-native 
English speaker), it can look like it is also a requirement of this RFC.
>
>
> CURRENT
>
>     These RFCs, however, are
>     intended for general Internet use, and do not directly apply to a
>     controlled datacenter environment.
> ...
>     For DCTCP connections, the sender
>     SHOULD set ECT for SYN, SYN-ACK and RST packets.
>
> SUGGESTED:
>
>    Also, a SYN is sent before the ECN-capability has been
>    negotiated, so if it is CE-marked, a non-ECN TCP server would not
>    recognise the marking. RFC 3168 does not specify the ECN field on
>    a RST. The security concerns addressed by these RFCs
>    might not apply in controlled environments like data centres, and
>    it might not be necessary to cater for the presence of non-ECN
>    servers.
> ...
>
>     Therefore, the sender MUST set ECT for SYN/ACK and RST packets and a
>     configuration option SHOULD be provided to set ECT for SYNs.
>
> RATIONALE:
>    The draft should avoid assuming that security concerns do not apply 
> in all controlled environments.
> NOTE:
>    This might not reflect the way your implementation is coded, so you 
> might want to make that clear.
>
> <Praveen> Adding some text around security concerns.
>
[BB] The security sentence you have added in draft-02 doesn't make sense 
where it is. It ought to have been added once sentence earlier (and it 
would be better to start a new para). Specifically:
CURRENT:
                 be dropped with high probability.  For DCTCP 
connections, the sender
                 SHOULD set ECT for SYN, SYN-ACK and RST packets.  The 
security
                 concerns addressed by both these RFCs might not apply 
in controlled
                 environments like datacenters, and it might not be 
necessary to cater
                 to both the presence of non-ECN servers.
SUGGESTED:
                 be dropped with high probability.
<NEW PARA>
The security concerns addressed by both these RFCs might not apply in 
controlled
                 environments like datacenters, and it might not be 
necessary to cater
for the presence of non-ECN servers. For DCTCP connections, the sender
                 SHOULD set ECT for SYN, SYN-ACK and RST packets.

(Note I also fixed a glitch after "cater".)


>
> *4. Implementation Issues
> *
> CURRENT:
>
>     the implementation MUST choose a suitable
>     estimation gain.  [DCTCP10 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2fdraft-ietf-tcpm-dctcp-01%23ref-DCTCP10&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=33dRUUxyNa2tcX0xJtMl12nWwzIdHZa6j3kgAL1dxIM%3d>] provides a theoretical basis
>
> SUGGEST:
>
>     the implementation will need to choose a suitable
>     estimation gain.  [ADCTCP10 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2fdraft-ietf-tcpm-dctcp-01%23ref-DCTCP10&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=33dRUUxyNa2tcX0xJtMl12nWwzIdHZa6j3kgAL1dxIM%3d>] provides a theoretical basis ...
>
> RATIONALE:
>    a) As above, this is only one way to implement the cc, and others 
> could well prove to be interoperable when tested.
>    b) More particularly, it would be better for an implementation to 
> adapt the estimation gain to the RTT, so the spec ought not to imply 
> that one value has to be chosen.
>    c) The later paper [ADCTCP10] provides a corrected way to determine 
> the gain. From memory (I'm offline at the mo) it at least makes 
> throughput inversely proportional to 1/RTT, whereas the approach in 
> [DCTCP10] made it inversely proportional to 1/(RTT^2). I believe the 
> Linux implementation uses the [ADCTCP10] approach.
>
> <Praveen> Fair enough. Updated.
>
>     The implementation must also decide when to use DCTCP.
> ...
>     likely in the same datacenter network.
>
> The 2nd & 3rd paras are a mix of implementation and deployment issues, 
> but mostly deployment, so they might be better moved to the deployment 
> issues section.
>
> <Praveen> I think they are mainly implementation issues – the fact 
> that a lack of negotiation mechanism places a burden on the 
> implementation to provide knobs or heuristics to use DCTCP.
>
>
> CURRENT:
>
>     It is RECOMMENDED that an implementation deal with loss episodes in
>     the same way as conventional TCP.
> ...
>     DCTCP implementation MAY also allow configuration of resetting the
>     value of DCTCP.Alpha as part of processing any loss episodes.
>
> SUGGESTED: Move this whole para to 3.3 Sender Behaviour section, and:
>
>     An implementation MUST deal with loss episodes in
>     the same way as conventional TCP.
> ...
>     DCTCP implementation SHOULD also reset the
>     value of DCTCP.Alpha as part of processing any loss episodes.
>     This aspect of the behaviour MAY be configurable.
>
> RATIONALE:
>    a) There would never be a case for not falling back to conventional 
> TCP  (or do you have one?), so "RECOMMENDED" is too weak.
>    b) I have tried to interpret what you might have meant about 
> resetting Alpha. I'm not sure why you think it might need to be 
> configurable though?
>
> <Praveen> Yes this is a useful suggestion. Making loss handling a must 
> in sender section. Reset of alpha was discussed in tcpm list, and 
> Linux implementation supports it.
>
>     ...it is also RECOMMENDED that an implementation allow
>     configuration of restarting the congestion window (cwnd) of idle
>     DCTCP connections
>
> Where special implementation measures are necessary for low RTT, these 
> should be related to the measured RTT, not hard-coded. There is an 
> assumption in a number of places  that "data centre" implies low RTT. 
> Of course this is true in a single DC, but I believe the the single 
> admin aspect of a DC is what characterises DCTCP, not the low RTT 
> aspect. With the modifications in [ADCTCP], DCTCP generalises to a 
> network with larger RTTs as long as it is still  operated by a single 
> admin.
>
> <Praveen> this paragraph has now been removed per Michaels’ feedback.
>
>
> CURRENT:
>    [RFC3168] forbids the ECN-marking of pure ACK packets, because of the
>    inability of TCP to mitigate ACK-path congestion and protocol-wise
>    preferential treatment by routers.  However, dropping pure ACKs -
>    rather than ECN marking them - has disadvantages for typical
>    datacenter traffic patterns.  Because of the prevalence of bursty
>    traffic patterns that feature transient congestion, dropping of ACKs
>    causes subsequent retransmissions. It is RECOMMENDED that an
>    implementation provide a configuration knob that will cause ECT to 
> be set
>    on such control packes, which can be used in environments where such
>    concerns do not apply.
> SUGGESTED:
>    [RFC3168] forbids the ECN-marking of pure ACK packets, because of the
>    inability of TCP to mitigate ACK-path congestion and the extra
>    advantage to injection attackers that ECN is perceived to offer.
>    For the latter reason RFC 3168 also forbids ECN-marking of
>    retransmissions, window probes and RSTs.
>
>    However, dropping all these control packets -
>    rather than ECN marking them - has considerable performance
>    disadvantages.  It is RECOMMENDED that an
>    implementation provide a configuration knob that will cause ECT to 
> be set
>    on such control packes, which can be used in environments where such
>    concerns do not apply.
> NOTE:
>    This might not reflect the way your implementation is coded, so you 
> might want to make that clear.
>
> <Praveen> Fixed
>
[BB] A new comment...

As pointed out in draft-bagulo-tsvwg-generalized-ecn-00, RFC3168 only 
talks about injection attacks with retransmissions (not ACKs etc). But 
you imply injection attacks were the concern for other control packets 
as well.

I'm not insisting you change this. However, unwarranted accusations of 
security vulnerabilities tend to stick, whether or not they are correct. 
So it would be better not to throw mud around onto even more control 
packets than RFC3168 did.

Cheers


Bob

>
>
> *5. Deployment Issues*
>
>     If
>     the traffic in the datacenter is a mix of conventional TCP and DCTCP,
>     it is RECOMMENDED that DCTCP traffic be segregated from conventional
>     TCP traffic.
>
> In a data centre (or anywhere), is there any case where not 
> segregating them would make sense? If not, this needs to be a MUST, 
> not just a RECOMMENDED. I can imagine cases where there is no 
> long-running conventional TCP or no long-running DCTCP, but that's 
> covered by the "if" at the start of the sentence.
>
> You might want to refer to draft-briscoe-aqm-dualq-coupled as another 
> segregation approach here too.
>
>     Since DCTCP relies on congestion marking by the switches, DCTCP can
>     only be deployed in datacenters where the entire network
>     infrastructure supports ECN.
>
> Surely, the "fall-back to conventional TCP on loss" rule means you can 
> delete this constraint.
>
> <Praveen> Reworded to imply that DCTCP will only work as expected in 
> such cases.
>
>     A
>     variant of DCTCP that can be deployed unilaterally and only requires
>     standard ECN behavior has been described in [ODCTCP 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2fdraft-ietf-tcpm-dctcp-01%23ref-ODCTCP&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=ZwfJ0LgHQ3vk%2bMD0wiia5yLrcMMSUBcGgPYXmeRu8hE%3d>][BSDCAN],
>
>
> Which part of "standard ECN behaviour" do you mean? The host part? The 
> network part? The receiver part? And "can be deployed unilaterally" 
> ought to be in the active voice not the passive, which would help 
> resolve the ambiguity too.
>
> I haven't read [ODCTCP] and [BSDCAN] (I'm offline on a plane at the 
> mo). Are they similar to "Instant ECN" in [Wu12] listed below? If not, 
> that could be referred to as well for a technique that uses the 
> regular ECN host behaviour.
>
> <Praveen> This edit predates me and  I don’t have the full context, I 
> am leaving this as is.
>
>
> *6. Known Issues
> *
> First para could refer to appendix A of RFC7560, which describes the 
> problem precisely.
>
> A method for improving the fairness of DCTCP has been proposed
>     in [ADCTCP 
> <https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2ftools.ietf.org%2fhtml%2fdraft-ietf-tcpm-dctcp-01%23ref-ADCTCP&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=edU2TPGpLvyFGvgocDRJiYRQyjCT%2fWEOgS2MwmwzYuU%3d>], but requires additional experimental evaluation.
>
> s/fairness/RTT fairness/
>
> (In our experiments, it's much better than without.)
>
> <Praveen> Goal is to cite other work that can address potential 
> issues. It is not a recommendation. s/fairness/RTT fairness/ Fixed in 
> next revision.
>
>
> *11. References
> *
> I think the following are cited normatively, not just informatively:
> RFC5681, RFC5562.
>
> Additional informational Reference:
>
> [Wu12] Wu, H., Ju, J., Lu, G., Guo, C., Xiong, Y. & Zhang, Y., "Tuning 
> ECN for Data Center Networks," In: Proceedings of the 8th 
> International Conference on Emerging Networking Experiments and 
> Technologies CoNEXT '12 pp.25-36 ACM (2012)
>
> <Praveen> Citation to RFCs updated as normative. Not including the new 
> reference since even though it may be relevant – it was not used as a 
> reference in writing the draft.
>
>
> That's it. HTH.
>
>
> Bob
>
> -- 
> ________________________________________________________________
> Bob Briscoehttp://bobbriscoe.net/ 
> <https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fbobbriscoe.net%2f&data=01%7c01%7cpravb%40microsoft.com%7ca572b0ba8e1b47d3e44a08d2e6d7e03f%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=iEvhWj9FCPozWAp8rnxk5VZdNF%2fv7W7aD1Cp4MxOiSs%3d>

-- 
________________________________________________________________
Bob Briscoehttp://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoehttp://bobbriscoe.net/