Re: [aqm] Gen-art LC review of draft-ietf-aqm-recommendation-08
Elwyn Davies <elwynd@dial.pipex.com> Wed, 07 January 2015 23:40 UTC
Return-Path: <elwynd@dial.pipex.com>
X-Original-To: aqm@ietfa.amsl.com
Delivered-To: aqm@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0C8DC1A86E8; Wed, 7 Jan 2015 15:40:53 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -98.702
X-Spam-Level:
X-Spam-Status: No, score=-98.702 tagged_above=-999 required=5 tests=[BAYES_50=0.8, GB_ABOUTYOU=0.5, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OarSW3nspgGm; Wed, 7 Jan 2015 15:40:44 -0800 (PST)
Received: from mk-outboundfilter-1.mail.uk.tiscali.com (mk-outboundfilter-1.mail.uk.tiscali.com [212.74.114.37]) by ietfa.amsl.com (Postfix) with ESMTP id B632E1A7026; Wed, 7 Jan 2015 15:40:43 -0800 (PST)
X-Trace: 154078626/mk-outboundfilter-1.mail.uk.tiscali.com/PIPEX/$OFF_NET_AUTH_ACCEPTED/TUK-OFF-NET-SMTP-AUTH-PIPEX-Customers/81.187.254.252/-2.2/elwynd@dial.pipex.com
X-SBRS: -2.2
X-RemoteIP: 81.187.254.252
X-IP-MAIL-FROM: elwynd@dial.pipex.com
X-SMTP-AUTH: elwynd@dial.pipex.com
X-MUA: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
X-IP-BHB: Once
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Am8FAPTCrVRRu/78Tmdsb2JhbABSCg4Ig0JYxgqFcwKBUwEBAQEBBgEBI4ReAQEBAgEBGgECFQEFMwMKAQULCxQECRYPCQMCAQIBMRQGAQkDAQUCAQEFiBsMCcNwAQEBAQEFAQEBAQEBAQEBGY8cBVcHhCkFhDUCiB6FHIIBgzyBDjCCPoIEiCuDOYNTPW8BAYECgT8BAQE
X-IPAS-Result: Am8FAPTCrVRRu/78Tmdsb2JhbABSCg4Ig0JYxgqFcwKBUwEBAQEBBgEBI4ReAQEBAgEBGgECFQEFMwMKAQULCxQECRYPCQMCAQIBMRQGAQkDAQUCAQEFiBsMCcNwAQEBAQEFAQEBAQEBAQEBGY8cBVcHhCkFhDUCiB6FHIIBgzyBDjCCPoIEiCuDOYNTPW8BAYECgT8BAQE
X-IronPort-AV: E=Sophos;i="5.07,718,1413241200"; d="scan'208";a="154078626"
X-IP-Direction: OUT
Received: from neut-r.netinf.eu (HELO [81.187.254.252]) ([81.187.254.252]) by smtp.pipex.tiscali.co.uk with ESMTP/TLS/DHE-RSA-AES128-SHA; 07 Jan 2015 23:40:39 +0000
Message-ID: <54ADC3F5.3040706@dial.pipex.com>
Date: Wed, 07 Jan 2015 23:40:37 +0000
From: Elwyn Davies <elwynd@dial.pipex.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: "Fred Baker (fred)" <fred@cisco.com>, "gorry@erg.abdn.ac.uk (erg)" <gorry@erg.abdn.ac.uk>
References: <54947DCF.3030601@scss.tcd.ie> <40842d620667e7d2a33f451dcd8f502b.squirrel@spey.erg.abdn.ac.uk> <30819CFE-21D3-4EF8-ABFE-4C01940399B7@cisco.com>
In-Reply-To: <30819CFE-21D3-4EF8-ABFE-4C01940399B7@cisco.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: http://mailarchive.ietf.org/arch/msg/aqm/n82DmEYbVV4uJZYoinajjlDEGww
X-Mailman-Approved-At: Thu, 08 Jan 2015 01:19:23 -0800
Cc: draft-ietf-aqm-recommendation.all@tools.ietf.org, General area reviewing team <gen-art@ietf.org>, aqm@ietf.org
Subject: Re: [aqm] Gen-art LC review of draft-ietf-aqm-recommendation-08
X-BeenThere: aqm@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Discussion list for active queue management and flow isolation." <aqm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/aqm>, <mailto:aqm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/aqm/>
List-Post: <mailto:aqm@ietf.org>
List-Help: <mailto:aqm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/aqm>, <mailto:aqm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Jan 2015 23:40:53 -0000
(Copied to aqm mailing list as suggested by WG chair.)

Hi. Thanks for your responses.

Just a reminder... I am not (these days, anyway) an expert in router
queue management, so my comments should not be seen as a deep critique
of the individual items, but as things that come to mind as matters of
general control engineering, and as areas where I feel the language
needs clarification - that's what gen-art is for.

As a matter of interest, it might be useful to explain a bit what scale
of routing engine you are thinking about in this paper. I got the
feeling from your responses to the buffer bloat question that you are
primarily thinking of big iron here. The buffer bloat phenomenon has
tended to be in smaller boxes, where the AQM stuff may or may not be
applicable. I don't quite know what your target is here - or whether
you are thinking over the whole range of sizes. The responses below
clearly indicate that you have some examples in mind (Codel, for
example, which I know nothing about except (now) that it is an AQM WG
product), and I don't know what scale of equipment these are really
relevant to.

Some more responses in line.

Regards,
Elwyn

On 05/01/15 20:32, Fred Baker (fred) wrote:
>
>> On Jan 5, 2015, at 1:13 AM, gorry@erg.abdn.ac.uk wrote:
>>
>> Fred, I've applied the minor edits.
>>
>> I have questions to you on the comments below (see GF:) before I
>> proceed.
>>
>> Gorry
>
> Adding Elwyn, as the discussion of his comments should include him -
> he might be able to clarify his concerns. I started last night to
> write a note, which I will now discard and instead comment here.
>
>>> I am the assigned Gen-ART reviewer for this draft. For background
>>> on Gen-ART, please see the FAQ at
>>>
>>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>>>
>>> Please resolve these comments along with any other Last Call
>>> comments you may receive.
>>>
>>> Document: draft-ietf-aqm-recommendation-08.txt
>>> Reviewer: Elwyn Davies
>>> Review Date: 2014/12/19
>>> IETF LC End Date: 2014/12/24
>>> IESG Telechat date: (if known) -
>>>
>>> Summary: Almost ready for BCP.
>>>
>>> Possibly missing issues:
>>>
>>> Buffer bloat: The suggestions/discussions are pretty much all
>>> about keeping buffer size sufficiently large to avoid burst
>>> dropping. It seems to me that it might be good to mention the
>>> possibility that one can over-provision queues, and this needs to
>>> be avoided as well as under-provisioning.
>>>
>> GF: I am not sure - to me this depends on the use case.
>
> To me, this is lily-gilding. To pick one example, the Cisco ASR
> 8X10G line card comes standard from the factory with 200 ms of queue
> per 10G interface. If we were to implement Codel on it, Codel would
> try desperately to keep the average induced latency less than five
> ms. If it tried to make it 100 microseconds, we would run into the
> issues the draft talks about - we're trying to maximize rate while
> minimizing mean latency, and due to TCP's dynamics, we would no
> longer maximize rate. If 5 ms is a reasonable number (and for
> intra-continental terrestrial delays I would think it is), and we
> set that variable to 10, 50, or 100 ms, the only harm would be that
> we had some probability of a higher mean induced latency than was
> really necessary - AQM would be a little less effective. In the
> worst case (suppose we set Codel's limit to 200 ms), it would revert
> to tail drop, which is what we already have.
>
> There are two reasonable responses to this. One would be to note
> that in high-RTT cases, even if auto-tuning mostly works, manual
> tuning may deliver better results or tune itself correctly more
> quickly (on a 650 ms RTT satcom link, I'd start by changing Codel's
> 100 ms trigger to something in the neighborhood of 650 ms).
> The other is to simply say that there is no direct harm in
> increasing the limits, and there may be value in some use cases. But
> I would also tend to think that anyone who actually operates a
> network already has a pretty good handle on that fact. So I don't
> see the value in saying it - which is mostly why it's not there
> already.

My take on this would be "make as few assumptions about your audience
as possible, and write them down". It's a generally interesting topic
and would interest people who are not deeply skilled in the art - as
well as potentially pulling in some new researchers!

>
>>> Interaction between boxes using different or the same algorithms:
>>> Buffer bloat seems to be generally about situations where chains
>>> of boxes all have too much buffer. One thing that is not
>>> currently mentioned is the possibility that if different AQM
>>> schemes are implemented in various boxes through which a flow
>>> passes, then there could be inappropriate interaction between the
>>> different algorithms. The old RFC suggested RED and nothing else,
>>> so one just had to make sure multiple RED boxes in series didn't
>>> do anything bad. With potentially different algorithms in series,
>>> one had better be sure that the mechanisms don't interact in a
>>> bad way when chained together - another research topic, I think.
>>>
>> GF: I think this could be added as an area for continued research
>> mentioned in section 4.7. At least I know of some poor
>> interactions between PIE and CoDel on particular paths - where
>> both algorithms are triggered. However, I doubt this is worth much
>> discussion in this document - thoughts?
>>
>> Suggest: "The Internet presents a wide variety of paths where
>> traffic can experience combinations of mechanisms that can
>> potentially interact to influence the performance of applications.
>> Research therefore needs to consider the interactions between
>> different AQM algorithms, patterns of interaction in network
>> traffic and other network mechanisms to ensure that multiple
>> mechanisms do not inadvertently interact to impact performance."
>
> Mentioning it as a possible research area makes sense. Your
> proposed text is fine, from my perspective.
>

Yes, I think something like this would be good. The buffer bloat
example is probably an extreme case of things not having AQM at all
and interacting badly. It would maybe be worth mentioning that any
AQM mechanism has also got to work in series with boxes that don't
have any active AQM - just tail drop.

Ultimately, I would say this is just a matter of control engineering
principles: you are potentially making a network in which various
control algorithms are implemented on different legs/nodes, and the
combination of transfer functions could possibly be unstable. Has
anybody applied any of the raft of control-theoretic methods to these
algorithms? I have no idea!

> I start by questioning the underlying assumption, though, which is
> that bufferbloat is about paths in which there are multiple
> simultaneous bottlenecks. Yes, that occurs (think about paths that
> include both Cogent and a busy BRAS or CMTS; or, more generally, if
> any link has some probability of congesting, my sophomore
> statistics course maintained that any pair of links has the product
> of the two probabilities of being simultaneously congested), but
> I'd be hard-pressed to make a statistically compelling argument out
> of it. The research and practice I have seen has been about a
> single bottleneck.

Please don't fixate on buffer bloat!

>
>>> Minor issues:
>>>
>>> s3, para after end of bullet 3:
>>>> The projected increase in the fraction of total Internet
>>>> traffic for more aggressive flows in classes 2 and 3 could pose
>>>> a threat to the performance of the future Internet.
>>>> There is therefore an urgent need for measurements of current
>>>> conditions and for further research into the ways of managing
>>>> such flows. This raises many difficult issues in finding
>>>> methods with an acceptable overhead cost that can identify and
>>>> isolate unresponsive flows or flows that are less responsive
>>>> than TCP.
>>>
>>> Question: Is there actually any published research into how one
>>> would identify class 2 or class 3 traffic in a router/middle
>>> box? If so, it would be worth noting - the text's call for
>>> "further research" seems to indicate there is something out
>>> there.
>>>
>> GF: I think the text is OK.
>
> Agreed. Elwyn's objection appears to be to the use of the word
> "further"; if we don't know of a paper, he'd like us to call for
> "research". The papers that come quickly to my mind are various
> papers on non-responsive flows, such as
> http://www.icir.org/floyd/papers/collapse.may99.pdf or
> http://www2.research.att.com/~jiawang/sstp08-camera/SSTP08_Pan.pdf.
> We already have a pretty extensive bibliography...
>

Right: either remove/alter "further" if there isn't anything already
out there, or put in some reference(s).

>>> s4.2, next to last para: Is it worth saying also that the
>>> randomness should avoid targeting a single flow within a
>>> reasonable period, to give a degree of fairness.
>
> Network devices SHOULD use an AQM algorithm to determine the
> packets that are marked or discarded due to congestion. Procedures
> for dropping or marking packets within the network need to avoid
> increasing synchronization events, and hence randomness SHOULD be
> introduced in the algorithms that generate these congestion
> signals to the endpoints.
>
>> GF: Thoughts?
>
> I worry. The reasons for the randomness are (1) to tend to hit
> different sessions, and (2) when the same session is hit, to
> minimize the probability of multiple hits in the same RTT. It
> might be worth saying as much.
> However, to *stipulate* that algorithms should limit the hit rate
> on a given flow invites a discussion of stateful inspection
> algorithms. If someone wants to do such a thing, I'm not going to
> try to stop them (you could describe fq_* in those terms), but I
> don't want to put the idea into their heads (see later comment on
> privacy). Also, that is frankly more of a concern with Reno than
> with NewReno, and with NewReno than with anything that uses SACK.
> SACK will (usually) retransmit all dropped segments in the
> subsequent RTT, while NewReno will retransmit the Nth dropped
> packet in the Nth following RTT, and Reno might take that many RTO
> timeouts.

You have thought about what I said. Put in what you think it needs.

>
>>> s4.2.1, next to last para:
>>>> An AQM algorithm that supports ECN needs to define the
>>>> threshold and algorithm for ECN-marking. This threshold MAY
>>>> differ from that used for dropping packets that are not marked
>>>> as ECN-capable, and SHOULD be configurable.
>>>>
>>> Is this suggestion really compatible with recommendation 3 and
>>> s4.3 (no tuning)?
>>>
>> GF: I think making a recommendation here is beyond the "BCP"
>> experience, although I suspect that a lower marking threshold is
>> generally good. Should we add it also to the research agenda as
>> an item at the end of para 3 in S4.7?

I think you may have misunderstood what I am saying here. Rec 3 and
s4.3 say things should work without tuning. Doesn't having to set
these thresholds/algorithms constitute tuning? If so, then it makes
it difficult to see these ECN schemes as meeting the constraints. If
you disagree, then explain how it isn't - or suggest that there
should be research to see how to make ECN zero-config as well.

>
> I can see adding it to the research agenda; the comment comes from
> Bob Briscoe's research.
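As an aside, to make the quoted s4.2.1 text concrete: a toy sketch of an
AQM decision in which the ECN-marking threshold is lower than the drop
threshold. The function name and threshold values here are invented for
illustration only; they are not from the draft or from any published
algorithm.

```python
# Toy sketch only: an AQM per-packet decision where ECN-capable traffic
# is marked at a lower queue-delay threshold than the one at which
# non-ECN traffic is dropped.  Thresholds are illustrative, not
# recommendations.

def congestion_action(queue_delay_ms, ect_capable,
                      mark_threshold_ms=5, drop_threshold_ms=20):
    """Return 'forward', 'mark', or 'drop' for one packet."""
    if ect_capable and queue_delay_ms > mark_threshold_ms:
        return "mark"      # signal congestion early, without loss
    if queue_delay_ms > drop_threshold_ms:
        return "drop"      # non-ECN traffic only sees loss, and later
    return "forward"

assert congestion_action(10, ect_capable=True) == "mark"
assert congestion_action(10, ect_capable=False) == "forward"
assert congestion_action(25, ect_capable=False) == "drop"
```

The two thresholds (and whether they should differ at all) are exactly
the knobs whose configurability the quoted text discusses.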
>
> That said, any algorithm using any mechanism by definition needs to
> specify any variables it uses - Codel, for example, tries to keep a
> queue at 5 ms or less, and cuts in after a queue fails to empty for
> a period of 100 ms. I don't see a good argument for saying "but an
> ECN-based algorithm doesn't need to define its thresholds or
> algorithms". Also, as I recall, the MAY in the text came from the
> fact that Bob seemed to think there was value in it (which BTW I
> agree with). To my mind, SHOULD and MUST are strong words, but
> absent such an assertion, an implementation MAY do just about
> anything that comes to the implementor's mind. So saying an
> implementation MAY <do something> is mostly a suggestion that an
> implementor SHOULD think about it. Are we to say that an
> implementor, given Bob's research, should NOT think about giving
> folks the option?
>
> I also don't think Elwyn's argument quite follows. When I say that
> an algorithm should auto-tune, I'm not saying that it should not
> have knobs; I'm saying that the default values of those knobs
> should be adequate for the vast majority of use cases. I'm also not
> saying that there should be exactly one initial default; I could
> easily imagine an implementation noting the bit rate of an
> interface and the ping RTT to a peer and pulling its initial
> configuration out of a table.

That would be at least partially acceptable as a mode of operation.
But you might have a "warm-up" issue - would it work OK while the
algorithm was working out what the RTT actually was? And would the
algorithms adapt autonomously (i.e., auto-tune) to close in on
optimum values after picking initial values from the table?

>
>>> s7: There is an arguable privacy concern that if schemes are
>>> able to identify class 2 or class 3 flows, then a core device
>>> can extract privacy-related info from the identified flows.
>>> >> GF: I don't see how traffic profiles expose privacy concerns, sure >> users and apps can be characterised by patterns of interaction - >> but this isn't what is being talked about here. > > Agreed. If the reference is to RFC 6973, I don't see a violation of > https://tools.ietf.org/html/rfc6973#section-7. I would if we appeared > to be inviting stateful inspection algorithms. To give an example of > how difficult sessions are managed, RFC 6057 uses the CTS message in > round-robin fashion to push back on top-talker users in order to > enable the service provider to give consistent service to all of his > subscribers when a few are behaving in a manner that might prevent > him from doing so. Note that the "session", in that case, is not a > single TCP session, but a bittorrent-or-whatever server engaged in > sessions to tens or hundreds of peers. The fact that a few users > receive some pushback doesn't reveal the identities of those users. > I'd need to hear the substance behind Elwyn's concern before I could > write anything. My reaction was that if your algorithm identifies flows then you have On 05/01/15 20:32, Fred Baker (fred) wrote:> >> On Jan 5, 2015, at 1:13 AM, gorry@erg.abdn.ac.uk wrote: >> >> Fred, I've applied the minor edits. >> >> I have questions to you on the comments blow (see GF:) before I proceed. >> >> Gorry > > Adding Elwyn, as the discussion of his comments should include him - he might b able to clarify his concerns. I started last night to write a note, which I will now discard and instead comment here. > >>> I am the assigned Gen-ART reviewer for this draft. For background on >>> Gen-ART, please see the FAQ at >>> >>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>. >>> >>> Please resolve these comments along with any other Last Call comments >>> you may receive. 
>>> >>> Document: draft-ietf-aqm-recommendation-08.txt >>> Reviewer: Elwyn Davies >>> Review Date: 2014/12/19 >>> IETF LC End Date: 2014/12/24 >>> IESG Telechat date: (if known) - >>> >>> Summary: Almost ready for BCP. >>> >>> Possibly missing issues: >>> >>> Buffer bloat: The suggestions/discussions are pretty much all about >>> keeping buffer size >>> sufficiently large to avoid burst dropping. It seems to me that it might >>> be good to >>> mention the possibility that one can over provision queues, and this needs >>> to be avoided >>> as well as under provisioning. >>> >> GF: I am not sure - this to me depends use case. > > To me, this is lily-gilding. To pick one example, the Cisco ASR 8X10G line card comes standard from the factory with 200 ms of queue per 10G interface. If we were to implement Codel on it, Codel would try desperately to keep the average induced latency less than five ms. If it tried to make it be 100 microseconds, we would run into the issues the draft talks about - we're trying to maximize rate while minimizing mean latency, and due to TCP's dynamics, we would no longer maximize rate. If 5 ms is a reasonable number (and for intra-continental terrestrial delays I would think it is), and we set that variable to 10, 50, or 100 ms, the only harm would be that we had some probability of a higher mean induced latency than was really necessary - AQM would be a little less effective. In the worst case, (suppose we set Codel's limit to 200 ms), it would revert to tail drop, which is what we already have. > > There are two reasonable responses to this. One would be to note that high RTT cases, even if auto-tuning mostly works, manual tuning may deliver better results or tune itself correctly more quickly (on a 650 ms RTT satcom link, I'd start by changing Codel's 100 ms trigger to something in the neighborhood of 650 ms). The other is to simply say that there is no direct harm in increasing the limits, and there may be value in some use cases. 
But I would also tend to think that anyone that actually operates a network already has a pretty good handle on that fact. So I don't see the value in saying it - which is mostly why it's not there already. > >>> Interaction between boxes using different or the same algorithms: Buffer >>> bloat seems to >>> be generally about situations where chains of boxes all have too much >>> buffer. One thing >>> that is not currently mentioned is the possibility that if different AQM >>> schemes are >>> implemented in various boxes through which a flow passes, then there could >>> be inappropriate >>> interaction between the different algorithms. The old RFC suggested RED >>> and nothing else so >>> that one just had one to make sure multiple RED boxes in series didn't do >>> anything bad. With >>> potentially different algorithms in series, one had better be sure that >>> the mechanisms don't >>> interact in a bad way when chained together - another research topic, I >>> think. >>> >> GF: I think this could be added as an area for continued research >> mentioned in section 4.7. At least I know of some poor interactions >> between PIE and CoDel on particular paths - where both algorithms are >> triggered. However, I doubt if this is worth much discussion in this >> document? thoughts? >> >> Suggest: >> "The Internet presents a wide variety of paths where traffic can >> experience combinations of mechanisms that can potentially interact to >> influence the performance of applications. Research therefore needs to >> consider the interactions between different AQM algorithms, patterns of >> interaction in network traffic and other network mechanisms to ensure that >> multiple mechanisms do not inadvertently interact to impact performance." > > Mentioning it as a possible research area makes sense. Your proposed text is fine, from my perspective. 
> > I start by questioning the underlying assumption, though, which is that bufferbloat is about paths in which there are multiple simultaneous bottlenecks. Yes, that occurs (think about paths that include both Cogent and a busy BRAS or CMTS, or more generally, if any link has some probability of congesting, math sophomore statistics course maintained that any pair of links has the product of the two probabilities of being simultaneously congested), but I'd be hard-pressed to make a statistically compelling argument out of it. The research and practice I have seen has been about a single bottleneck. > >>> Minor issues: >>> s3, para after end of bullet 3: >>>> The projected increase in the fraction of total Internet traffic for >>>> more aggressive flows in classes 2 and 3 could pose a threat to the >>>> performance of the future Internet. There is therefore an urgent >>>> need for measurements of current conditions and for further research >>>> into the ways of managing such flows. This raises many difficult >>>> issues in finding methods with an acceptable overhead cost that can >>>> identify and isolate unresponsive flows or flows that are less >>>> responsive than TCP. >>> >>> Question: Is there actually any published research into how one would >>> identify >>> class 2 or class 3 traffic in a router/middle box? If so it would be >>> worth noting - >>> the text call for "further research" seems to indicate there is >>> something out there. >>> >> GF: I think the text is OK. > > Agreed. Elwyn's objection appears to be to the use of the word "further"; if we don't know of a paper, he'd like us to call for "research". The papers that come quickly to my mind are various papers on non-responsive flows, such as http://www.icir.org/floyd/papers/collapse.may99.pdf or http://www2.research.att.com/~jiawang/sstp08-camera/SSTP08_Pan.pdf. We already have a pretty extensive bibliography... 
> >>> s4.2, next to last para: Is it worth saying also that the randomness >>> should avoid targeting a single flow within a reasonable period to give >>> a degree of fairness. > > Network devices SHOULD use an AQM algorithm to determine the packets > that are marked or discarded due to congestion. Procedures for > dropping or marking packets within the network need to avoid > increasing synchronization events, and hence randomness SHOULD be > introduced in the algorithms that generate these congestion signals > to the endpoints. > >> GF: Thoughts? > > I worry. The reasons for the randomness are (1) to tend to hit different sessions, and (2) when the same session is hit, to minimize the probability of multiple hits in the same RTT. It might be worth saying as much. However, to *stipulate* that algorithms should limit the hit rate on a given flow invites a discussion of stateful inspection algorithms. If someone wants to do such a thing, I'm not going to try to stop them (you could describe fq_* in those terms), but I don't want to put the idea into their heads (see later comment on privacy). Also, that is frankly more of a concern with Reno than with NewReno, and with NewReno than with anything that uses SACK. SACK will (usually) retransmit all dropped segments in the subsequent RTT, while NewReno will retransmit the Nth dropped packet in the Nth following RTT, and Reno might take that many RTO timeouts. > On 05/01/15 20:32, Fred Baker (fred) wrote:> >> On Jan 5, 2015, at 1:13 AM, gorry@erg.abdn.ac.uk wrote: >> >> Fred, I've applied the minor edits. >> >> I have questions to you on the comments blow (see GF:) before I proceed. >> >> Gorry > > Adding Elwyn, as the discussion of his comments should include him - he might b able to clarify his concerns. I started last night to write a note, which I will now discard and instead comment here. > >>> I am the assigned Gen-ART reviewer for this draft. 
For background on >>> Gen-ART, please see the FAQ at >>> >>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>. >>> >>> Please resolve these comments along with any other Last Call comments >>> you may receive. >>> >>> Document: draft-ietf-aqm-recommendation-08.txt >>> Reviewer: Elwyn Davies >>> Review Date: 2014/12/19 >>> IETF LC End Date: 2014/12/24 >>> IESG Telechat date: (if known) - >>> >>> Summary: Almost ready for BCP. >>> >>> Possibly missing issues: >>> >>> Buffer bloat: The suggestions/discussions are pretty much all about >>> keeping buffer size >>> sufficiently large to avoid burst dropping. It seems to me that it might >>> be good to >>> mention the possibility that one can over provision queues, and this needs >>> to be avoided >>> as well as under provisioning. >>> >> GF: I am not sure - this to me depends use case. > > To me, this is lily-gilding. To pick one example, the Cisco ASR 8X10G line card comes standard from the factory with 200 ms of queue per 10G interface. If we were to implement Codel on it, Codel would try desperately to keep the average induced latency less than five ms. If it tried to make it be 100 microseconds, we would run into the issues the draft talks about - we're trying to maximize rate while minimizing mean latency, and due to TCP's dynamics, we would no longer maximize rate. If 5 ms is a reasonable number (and for intra-continental terrestrial delays I would think it is), and we set that variable to 10, 50, or 100 ms, the only harm would be that we had some probability of a higher mean induced latency than was really necessary - AQM would be a little less effective. In the worst case, (suppose we set Codel's limit to 200 ms), it would revert to tail drop, which is what we already have. > > There are two reasonable responses to this. 
One would be to note that high RTT cases, even if auto-tuning mostly works, manual tuning may deliver better results or tune itself correctly more quickly (on a 650 ms RTT satcom link, I'd start by changing Codel's 100 ms trigger to something in the neighborhood of 650 ms). The other is to simply say that there is no direct harm in increasing the limits, and there may be value in some use cases. But I would also tend to think that anyone that actually operates a network already has a pretty good handle on that fact. So I don't see the value in saying it - which is mostly why it's not there already. > >>> Interaction between boxes using different or the same algorithms: Buffer >>> bloat seems to >>> be generally about situations where chains of boxes all have too much >>> buffer. One thing >>> that is not currently mentioned is the possibility that if different AQM >>> schemes are >>> implemented in various boxes through which a flow passes, then there could >>> be inappropriate >>> interaction between the different algorithms. The old RFC suggested RED >>> and nothing else so >>> that one just had one to make sure multiple RED boxes in series didn't do >>> anything bad. With >>> potentially different algorithms in series, one had better be sure that >>> the mechanisms don't >>> interact in a bad way when chained together - another research topic, I >>> think. >>> >> GF: I think this could be added as an area for continued research >> mentioned in section 4.7. At least I know of some poor interactions >> between PIE and CoDel on particular paths - where both algorithms are >> triggered. However, I doubt if this is worth much discussion in this >> document? thoughts? >> >> Suggest: >> "The Internet presents a wide variety of paths where traffic can >> experience combinations of mechanisms that can potentially interact to >> influence the performance of applications. 
Research therefore needs to >> consider the interactions between different AQM algorithms, patterns of >> interaction in network traffic and other network mechanisms to ensure that >> multiple mechanisms do not inadvertently interact to impact performance." > > Mentioning it as a possible research area makes sense. Your proposed text is fine, from my perspective. > > I start by questioning the underlying assumption, though, which is that bufferbloat is about paths in which there are multiple simultaneous bottlenecks. Yes, that occurs (think about paths that include both Cogent and a busy BRAS or CMTS, or more generally, if any link has some probability of congesting, math sophomore statistics course maintained that any pair of links has the product of the two probabilities of being simultaneously congested), but I'd be hard-pressed to make a statistically compelling argument out of it. The research and practice I have seen has been about a single bottleneck. > >>> Minor issues: >>> s3, para after end of bullet 3: >>>> The projected increase in the fraction of total Internet traffic for >>>> more aggressive flows in classes 2 and 3 could pose a threat to the >>>> performance of the future Internet. There is therefore an urgent >>>> need for measurements of current conditions and for further research >>>> into the ways of managing such flows. This raises many difficult >>>> issues in finding methods with an acceptable overhead cost that can >>>> identify and isolate unresponsive flows or flows that are less >>>> responsive than TCP. >>> >>> Question: Is there actually any published research into how one would >>> identify >>> class 2 or class 3 traffic in a router/middle box? If so it would be >>> worth noting - >>> the text call for "further research" seems to indicate there is >>> something out there. >>> >> GF: I think the text is OK. > > Agreed. 
Elwyn's objection appears to be to the use of the word "further"; if we don't know of a paper, he'd like us to call for "research". The papers that come quickly to my mind are various papers on non-responsive flows, such as http://www.icir.org/floyd/papers/collapse.may99.pdf or http://www2.research.att.com/~jiawang/sstp08-camera/SSTP08_Pan.pdf. We already have a pretty extensive bibliography...

>>> s4.2, next to last para: Is it worth saying also that the randomness should avoid targeting a single flow within a reasonable period, to give a degree of fairness?

>    Network devices SHOULD use an AQM algorithm to determine the packets
>    that are marked or discarded due to congestion.  Procedures for
>    dropping or marking packets within the network need to avoid
>    increasing synchronization events, and hence randomness SHOULD be
>    introduced in the algorithms that generate these congestion signals
>    to the endpoints.

>> GF: Thoughts?

> I worry. The reasons for the randomness are (1) to tend to hit different sessions, and (2) when the same session is hit, to minimize the probability of multiple hits in the same RTT. It might be worth saying as much. However, to *stipulate* that algorithms should limit the hit rate on a given flow invites a discussion of stateful inspection algorithms. If someone wants to do such a thing, I'm not going to try to stop them (you could describe fq_* in those terms), but I don't want to put the idea into their heads (see later comment on privacy). Also, that is frankly more of a concern with Reno than with NewReno, and with NewReno than with anything that uses SACK. SACK will (usually) retransmit all dropped segments in the subsequent RTT, while NewReno will retransmit the Nth dropped packet in the Nth following RTT, and Reno might take that many RTO timeouts.

>>> s4.2.1, next to last para:
>>>> An AQM algorithm that supports ECN needs to define the threshold and algorithm for ECN-marking.
>>>> This threshold MAY differ from that used for dropping packets that are not marked as ECN-capable, and SHOULD be configurable.
>>>
>>> Is this suggestion really compatible with recommendation 3 and s4.3 (no tuning)?

>> GF: I think making a recommendation here is beyond the "BCP" experience, although I suspect that a lower marking threshold is generally good. Should we add it also to the research agenda as an item at the end of para 3 in S4.7?

> I can see adding it to the research agenda; the comment comes from Bob Briscoe's research.
>
> That said, any algorithm using any mechanism by definition needs to specify any variables it uses - Codel, for example, tries to keep a queue at 5 ms or less, and cuts in after a queue fails to empty for a period of 100 ms. I don't see a good argument for saying "but an ECN-based algorithm doesn't need to define its thresholds or algorithms". Also, as I recall, the MAY in the text came from the fact that Bob seemed to think there was value in it (which BTW I agree with). To my mind, SHOULD and MUST are strong words, but absent such an assertion, an implementation MAY do just about anything that comes to the implementor's mind. So saying an implementation MAY <do something> is mostly a suggestion that an implementor SHOULD think about it. Are we to say that an implementor, given Bob's research, should NOT think about giving folks the option?
>
> I also don't think Elwyn's argument quite follows. When I say that an algorithm should auto-tune, I'm not saying that it should not have knobs; I'm saying that the default values of those knobs should be adequate for the vast majority of use cases. I'm also not saying that there should be exactly one initial default; I could easily imagine an implementation noting the bit rate of an interface and the ping RTT to a peer and pulling its initial configuration out of a table.
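As a toy illustration of the knob structure under discussion - the function name, thresholds, and values below are entirely illustrative, my own sketch rather than anything the draft or any implementation specifies - an ECN-aware AQM might mark ECN-capable packets at a lower queue-delay threshold than the one at which it drops non-ECN packets, with both thresholds configurable but defaulted:

```python
# Illustrative dual-threshold AQM decision (a sketch, not a real algorithm):
# mark ECN-capable packets at a lower queue-delay threshold than the one at
# which non-ECN-capable packets are dropped. Both thresholds are knobs with
# defaults intended to cover the common case.
MARK_THRESHOLD_MS = 5.0    # hypothetical default, echoing Codel's 5 ms target
DROP_THRESHOLD_MS = 20.0   # hypothetical, higher threshold for dropping

def congestion_action(queue_delay_ms, ecn_capable,
                      mark_at=MARK_THRESHOLD_MS, drop_at=DROP_THRESHOLD_MS):
    """Return 'forward', 'mark', or 'drop' for one packet."""
    if ecn_capable and queue_delay_ms > mark_at:
        return "mark"   # signal congestion without losing the packet
    if not ecn_capable and queue_delay_ms > drop_at:
        return "drop"
    return "forward"

print(congestion_action(10.0, ecn_capable=True))    # -> mark
print(congestion_action(10.0, ecn_capable=False))   # -> forward
print(congestion_action(25.0, ecn_capable=False))   # -> drop
```

The point of the sketch is only that the knobs exist and have defaults; whether the ECN threshold should actually sit below the drop threshold is exactly the research question being deferred to s4.7.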
>>> s7: There is an arguable privacy concern that if schemes are able to identify class 2 or class 3 flows, then a core device can extract privacy related info from the identified flows.

>> GF: I don't see how traffic profiles expose privacy concerns; sure, users and apps can be characterised by patterns of interaction - but this isn't what is being talked about here.

> Agreed. If the reference is to RFC 6973, I don't see a violation of https://tools.ietf.org/html/rfc6973#section-7. I would if we appeared to be inviting stateful inspection algorithms. To give an example of how difficult sessions are managed, RFC 6057 uses the CTS message in round-robin fashion to push back on top-talker users in order to enable the service provider to give consistent service to all of his subscribers when a few are behaving in a manner that might prevent him from doing so. Note that the "session", in that case, is not a single TCP session, but a bittorrent-or-whatever server engaged in sessions to tens or hundreds of peers. The fact that a few users receive some pushback doesn't reveal the identities of those users. I'd need to hear the substance behind Elwyn's concern before I could write anything.

>> s4.7, para 3:
>>> the use of Map/Reduce applications in data centers
>>> I think this needs a reference or a brief explanation.
>> GF: Fred do you know a reference or can suggest extra text?

> The concern has to do with incast, which is a pretty active research area (http://lmgtfy.com/?q=research+incast). The paragraph asks a question, which is whether the common taxonomy of network flows (mice vs elephants) needs to be extended to include references to herds of mice traveling together, with the result that congestion control algorithms designed under the assumption that a heavy data flow contains an elephant merely introduce head-of-line blocking in short flows. The word "lemmings" is mine.
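To make the "herd of mice" point concrete, a back-of-the-envelope sketch (the numbers are purely illustrative, my own, not drawn from any of the papers): many synchronized short flows can overflow a shallow buffer even though no single flow looks heavy.

```python
# Toy arithmetic for the incast pattern: a "herd of mice" (many short,
# synchronized flows) can overflow a shallow switch buffer even though no
# single flow is an elephant. All values are illustrative.
def incast_overflow(n_senders, burst_pkts, buffer_pkts):
    """Packets lost if n_senders each burst burst_pkts into one buffer at once."""
    offered = n_senders * burst_pkts
    return max(0, offered - buffer_pkts)

# One 60-packet flow into a 100-packet buffer: no loss.
print(incast_overflow(1, 60, 100))   # -> 0
# Forty synchronized 8-packet flows into the same buffer: 220 packets lost.
print(incast_overflow(40, 8, 100))   # -> 220
```

The losses land on short flows that have no later packets to trigger fast retransmit, which is where the head-of-line blocking Fred mentions comes from.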
> I know of at least four papers (Microsoft Research, CAIA, Tsinghua, and KAIST) submitted to various journals in 2014 on the topic. It's also, at least in part, the basis for the DCLC RG. The only ones we could reference, among those, would relate to DCTCP, as the rest have not yet been published.
>
> Again, I'd like to understand the underlying issue. I doubt that it is that Elwyn doesn't like the question as such. Is it that he's looking for the word "incast" to replace "map/reduce"?

I was just looking for somebody to define the jargon - as far as I am concerned, at this moment "incast" would be just as "bad", since it would produce an equally blank stare followed by a grab for Google.

>> --- The edits below have been incorporated in the XML for v-09 ---
>>> Nits/editorial comments:
>>> General: s/e.g./e.g.,/, s/i.e./i.e.,/
>>>
>>> s1.2, para 2(?) - top of p4: s/and often necessary/and is often necessary/
>>> s1.2, para 3: s/a > class of technologies that/a class of technologies that/
>>>
>>> s2, first bullet 3: s/Large burst of packets/Large bursts of packets/
>>>
>>> s2, last para: Probably need to expand POP, IMAP and RDP; maybe provide refs??
>>>
>>> s2.1, last para: s/open a large numbers of short TCP flows/may open a large number of short duration TCP flows/
>>>
>>> s4, last para: s/experience occasional issues that need moderation./can experience occasional issues that warrant mitigation./
>>>
>>> s4.2, para 6, last sentence: s/similarly react/react similarly/
>>>
>>> s4.2.1, para 1: s/using AQM to decider when/using AQM to decide when/
>>>
>>> s4.7, para 3:
>>>> In 2013,
>>> "At the time of writing" ?
>>>
>>> s4.7, para 3:
>>>> the use of Map/Reduce applications in data centers
>>> I think this needs a reference or a brief explanation.

Returning to the s7 privacy concern:

>>> s7: There is an arguable privacy concern that if schemes are able to identify class 2 or class 3 flows, then a core device can extract privacy related info from the identified flows

... potentially helped a bad actor to pick off such flows or get to know who is communicating in a situation where currently it would be very difficult to know, as the queueing is basically flow-agnostic. OK, this is fairly far out, but we have seen some pretty serious stuff apparently being done around core routers, according to Snowden et al.