[aqm] Review of draft-ietf-aqm-ecn-benefits-03
Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Thu, 23 April 2015 17:29 UTC
Message-ID: <55392BD6.501@tik.ee.ethz.ch>
Date: Thu, 23 Apr 2015 19:28:54 +0200
From: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
To: aqm@ietf.org, Michael Welzl <michawe@ifi.uio.no>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/z9bQJ0K2fHlx00FcqmM4LjXf4pY>
Subject: [aqm] Review of draft-ietf-aqm-ecn-benefits-03
Hi Gorry, hi Michael,

as promised, here is my review of draft-ietf-aqm-ecn-benefits-03.

My overall comment is that even after reading the document (or even slightly more than before) I'm not completely sure what the purpose of this document is, or which audience it is directed at. Currently this document seems to do two things:

1. it lists benefits (which is interesting for someone who is thinking about enabling ECN), and
2. it kind of outlines the steps needed for deployment (which would be directed at someone who gets the task from his manager to turn on ECN).

However, the second point is not clearly spelled out, and therefore the second part of the document might be rather confusing for some readers. Also, the second part is to some extent still work in progress; therefore I would recommend focusing this document on the first part only.

For the first part (listing benefits) it might also be good to make clear who gets these benefits. I think all benefits that are currently listed are only advantageous for the end host/application. Are there any benefits for a network operator? Would it be possible to write this document such that I could also point network operators to it and give them an incentive to enable ECN?

Another high-level comment is that you say in the introduction that this document "also identifies some potential problems that might occur when ECN is used", but then you don't really discuss them. I think showing both sides of the coin would make the document more useful (and more honest). One point that you touch on briefly is that cheating, by not providing the feedback, is easier with ECN than with loss. Another point might be fairness between ECN and non-ECN traffic, as marking will not reduce the queue length and therefore might lead to a higher loss rate for the non-ECN traffic instead. I guess there are papers about this; I don't have any at hand right now.
Are there any other problems that should be mentioned?

Find more detailed comments by section below:

Abstract
--------
...says "...potential benefits when applications enable Explicit Congestion Notification (ECN)" -> usually an application cannot enable ECN because it is usually a system setting...?

Section 1
---------
...says "..separate configuration of the drop and mark thresholds is known to be supported in some network devices and this is recommended [RFC2309.bis]." RFC2309bis does not recommend different settings; it only says that it should be possible to configure the two differently. Further, I think this should not only concern THE threshold (whatever this is); usually there are several parameters you might want to set independently of each other, e.g. the max mark/drop probability in RED.

Section 2
----------
1) I'm not sure I understand the purpose of this section, or maybe just the title is wrong. I currently see this section as one that provides the needed background knowledge rather than one that talks about deployment. For this purpose I'd put all references, and potentially a brief summary, to other RFCs/drafts on ECN in this section, including RFC2884, RFC4774, RFC5562, RFC6040, RFC6679, draft-briscoe-tsvwg-ecn-encap-guidelines and draft-ietf-tcpm-accecn-reqs (and rename it).

2) Second paragraph says: "Network devices must not drop packets solely because these codepoints are used [RFC2309.bis]." Not sure this is the right document to say this (because currently it does not seem to be directed at network operators/equipment vendors but at admins/application developers). However, if it says this, it should also say that network devices should not bleach these bits.

3) First bullet in the list says "A recent survey reported growing support for ECN on common network paths [TR15]." This sounds as if TR15 shows that ECN is actually used in the Internet.
However, TR15 only shows that there are only very few cases left where ECN packets are dropped or incorrectly altered. Please clarify or remove this sentence here.

4) You could cite draft-bensley-tcpm-dctcp-00 instead of the DCTCP Sigcomm paper (or both).

5) I would remove the subsection headings (both 2.1 and 2.2) and just add the text there to the main part of the section.

6) "An AQM algorithm that supports ECN needs to define the threshold and algorithm for ECN-marking." This is kind of redundant and therefore does not really make sense to me to say; of course an algorithm that supports ECN needs to say something about ECN...

7) You can use TR15 to provide a reference for the first paragraph in section 2.2: "Cases have been noted where a sending endpoint marks a packet with a non-zero ECN mark, but the packet is received with a zero ECN value by the remote endpoint."

8) I'd move the second paragraph of section 2.2 ("The current..") to a potentially new problems section, talking about known/previous deployment problems.

9) I would simply remove paragraphs 3-4 of section 2.2 because this was basically already mentioned by referring to 2309bis and RFC6040 in section 2.1.

Section 3.2
------------
1) Don't understand why there is a listing here...? Just remove the listing and make text out of it...?

2) The sentence "This also avoids the inefficiency of dropping data that has already made it across at least part of the network path." does not belong in this section. This sentence should just be moved to section 3.1 (or to its own section) and must be further explained, saying that a packet dropped at the end of the path has already blocked resources that other traffic could otherwise have used.

Section 3.3
-----------
1) I'd say this section misses one part of the discussion. It is true that if by chance your last packet(s) get lost, ECN can help. However, this section reads a little as if, with ECN, it is safe to send packet bursts.
That is not true, because even if ECN is used by a network device, the queue might be too small to hold the whole burst. I believe this case happens very often, which might be a reason for the higher tail loss probability that is sometimes experienced with IW10. Please add this point to the discussion.

2) I don't really get the point of the second paragraph. First of all, it is confusing that this paragraph starts with "In addition to avoiding HOL blocking,.."; I guess that is left over from a previous version of this text...? And then you talk about a connection that is currently idle, so why is the performance of this connection, which is currently not sending anything, reduced?

3) I don't understand what "applications that send intermittent bursts of data, and rely upon timer-based recovery of packet loss" are...? Isn't the transport responsible for not sending bursts and for taking care of recovery...?

4) For the last paragraph in section 3.3, note that stacks often remember RTT measurements for a certain IP address and set the initial RTO based on this information.

Section 3.4
-----------
You still need FEC or some kind of error concealment even if ECN is used, because you can never be sure that your packets do not get dropped (by non-ECN-enabled devices or for other reasons). Therefore using ECN will clearly not reduce complexity. The only thing you can do is to potentially reduce the amount of redundancy you send if you know that a certain path is ECN-enabled, or if you don't see losses at the beginning of a connection. This can save network resources but might not actually improve user experience; in fact, the user experience might be worse in case there are sudden losses.

Further, the text says "negative impact of using loss-hiding mechanisms"; I don't really think that FEC has a negative impact as long as you've sent enough redundancy...? Error concealment might, but it is used less and less.
I'd recommend talking about error concealment only in this last paragraph and explaining it a little further.

Section 3.5
----------
"Recording the presence of CE-marked packets can therefore provide information about the performance of the network path."

Would change to:

"Recording the presence of CE-marked packets in absence of loss can therefore provide information about the performance of the network path."

And also say more concretely what is meant by 'performance of the network path' -> congestion level, or no drops by other middleboxes on this path...

Section 3.6
-----------
1) I like the section but I would phrase it differently; also, it's not clear who needs to support what in this case. I'd like to propose the following text [not sure about the heading...]:

"3.6 Opportunity to provide an improved congestion feedback signal

Loss and ECN marking are both used as an indication of congestion. However, while the amount of feedback that is provided by loss should naturally be minimized, this is not the case for ECN. With ECN, a network node could provide richer and more frequent feedback on the congestion state of a link, which could then be used by the control mechanisms implemented in the end host to make a more appropriate decision on how to react to congestion and to react faster to changes in the congestion state.

Further, while drop-based AQM mechanisms usually operate on a smoothed queue length estimate (instead of the instantaneous queue length) and therefore slightly delay the feedback signal to avoid unnecessary losses in case of transient congestion, this would not be necessary for ECN. If congestion is only transient due to short traffic bursts that are active for less than one RTT, the congestion signal would reach the sender at a time when the congestion has already cleared up. However, instead of delaying the feedback in the network, the end host could reduce its sending rate incrementally based on the extent of congestion (that was experienced over e.g. the last RTT), similar to DCTCP. If the congestion is only transient, the end host would only reduce its rate slightly and be able to catch up quickly again. However, if the congestion is persistent, this would help to remove additional delays from the network and resolve congestion faster, which after all reduces the average queuing delay.

However, current ECN is defined as a 'drop equivalent' in RFC3168. To change the semantics of ECN, both the AQM in the network nodes and the control mechanism in the end hosts would still need to cope with nodes or end hosts that rely on the old semantics. Therefore changing the semantics can be done more easily in a confined environment such as a data center. DCTCP is an example that changes both the configuration of the AQM used and the congestion response in the end host, and relies on the fact that all nodes in a data center are configured the same way. [Deployment strategies to change the semantics of ECN in the Internet are currently under discussion in the IETF.]"

2) I'd move the first and second paragraphs of section 3.6.1 to the background/deployment section or to the intro, depending on what you are going to do with section 2.

Sections 4 & 5
---------
The first sentence talks about "operational difficulties when the network only partially supports the use of ECN, or to respond to the challenges due to misbehaving network devices and/or endpoints". I think these are two very different things. Misbehaving network devices are a point for a problems section (where the lesson learned is that we didn't think carefully enough about incremental deployment in the first place, but do now). However, partial deployment is not a problem but a thing we simply have to cope with. The text sounds as if the goal were that every router in the whole Internet would at some point in time be ECN-enabled. I don't think this will ever happen, and it is also not the goal for me.
Routers that are very unlikely to ever get congested should not be required to look at the ECN bits or monitor the queue length to calculate a mark/drop probability.

However, as I said at the beginning, I don't really think that sections 4 and 5 belong in this document. If you decide to keep them, you have to change the abstract, and I'd recommend renaming them, e.g. 4. 'Incremental Deployment Strategy' or 'Requirements to enable Incremental Deployment' and 5. 'Recommendations for enabling ECN in network nodes and end hosts'.

I hope that's helpful! Let me know if you have any questions!

Mirja
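
To illustrate the incremental reaction described in the proposed text for Section 3.6, here is a minimal sketch in Python, assuming a DCTCP-style estimator: the sender keeps a moving average (alpha) of the fraction of CE-marked packets per RTT and reduces its window by alpha/2 instead of halving it. The gain g = 1/16, the function names, and the toy one-window-per-RTT loop are assumptions for illustration and are not taken from the draft; window growth between marks is deliberately left out.

```python
def update_alpha(alpha, marked, total, g=1 / 16):
    """Moving average of the fraction of CE-marked packets in one window.

    g is the estimation gain (assumed value for this sketch).
    """
    frac = marked / total if total else 0.0
    return (1 - g) * alpha + g * frac


def reduce_cwnd(cwnd, alpha, min_cwnd=2.0):
    """Reduce the window in proportion to the estimated extent of congestion."""
    return max(min_cwnd, cwnd * (1 - alpha / 2))


def simulate(marks_per_rtt, pkts_per_rtt=10, cwnd=100.0):
    """Apply one estimator update (and at most one reduction) per RTT.

    marks_per_rtt lists how many of the pkts_per_rtt packets were CE-marked
    in each RTT. Hypothetical helper; it only models the reduction step.
    """
    alpha = 0.0
    for marked in marks_per_rtt:
        alpha = update_alpha(alpha, marked, pkts_per_rtt)
        if marked:
            cwnd = reduce_cwnd(cwnd, alpha)
    return cwnd, alpha


# Transient congestion: a single RTT in which every packet is marked.
transient_cwnd, transient_alpha = simulate([10])

# Persistent congestion: every packet is marked for five RTTs in a row.
persistent_cwnd, persistent_alpha = simulate([10] * 5)
```

With these assumed parameters, a single fully marked RTT reduces a window of 100 to 96.875, whereas a 'drop equivalent' response as in RFC3168 would halve it to 50. Under persistent marking alpha keeps growing, so the reductions become progressively stronger.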