Re: [aqm] Gen-art LC review of draft-ietf-aqm-recommendation-08

Elwyn Davies <elwynd@dial.pipex.com> Wed, 07 January 2015 23:40 UTC

Return-Path: <elwynd@dial.pipex.com>
X-Original-To: aqm@ietfa.amsl.com
Delivered-To: aqm@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0C8DC1A86E8; Wed, 7 Jan 2015 15:40:53 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -98.702
X-Spam-Level:
X-Spam-Status: No, score=-98.702 tagged_above=-999 required=5 tests=[BAYES_50=0.8, GB_ABOUTYOU=0.5, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OarSW3nspgGm; Wed, 7 Jan 2015 15:40:44 -0800 (PST)
Received: from mk-outboundfilter-1.mail.uk.tiscali.com (mk-outboundfilter-1.mail.uk.tiscali.com [212.74.114.37]) by ietfa.amsl.com (Postfix) with ESMTP id B632E1A7026; Wed, 7 Jan 2015 15:40:43 -0800 (PST)
X-Trace: 154078626/mk-outboundfilter-1.mail.uk.tiscali.com/PIPEX/$OFF_NET_AUTH_ACCEPTED/TUK-OFF-NET-SMTP-AUTH-PIPEX-Customers/81.187.254.252/-2.2/elwynd@dial.pipex.com
X-SBRS: -2.2
X-RemoteIP: 81.187.254.252
X-IP-MAIL-FROM: elwynd@dial.pipex.com
X-SMTP-AUTH: elwynd@dial.pipex.com
X-MUA: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
X-IP-BHB: Once
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Am8FAPTCrVRRu/78Tmdsb2JhbABSCg4Ig0JYxgqFcwKBUwEBAQEBBgEBI4ReAQEBAgEBGgECFQEFMwMKAQULCxQECRYPCQMCAQIBMRQGAQkDAQUCAQEFiBsMCcNwAQEBAQEFAQEBAQEBAQEBGY8cBVcHhCkFhDUCiB6FHIIBgzyBDjCCPoIEiCuDOYNTPW8BAYECgT8BAQE
X-IPAS-Result: Am8FAPTCrVRRu/78Tmdsb2JhbABSCg4Ig0JYxgqFcwKBUwEBAQEBBgEBI4ReAQEBAgEBGgECFQEFMwMKAQULCxQECRYPCQMCAQIBMRQGAQkDAQUCAQEFiBsMCcNwAQEBAQEFAQEBAQEBAQEBGY8cBVcHhCkFhDUCiB6FHIIBgzyBDjCCPoIEiCuDOYNTPW8BAYECgT8BAQE
X-IronPort-AV: E=Sophos;i="5.07,718,1413241200"; d="scan'208";a="154078626"
X-IP-Direction: OUT
Received: from neut-r.netinf.eu (HELO [81.187.254.252]) ([81.187.254.252]) by smtp.pipex.tiscali.co.uk with ESMTP/TLS/DHE-RSA-AES128-SHA; 07 Jan 2015 23:40:39 +0000
Message-ID: <54ADC3F5.3040706@dial.pipex.com>
Date: Wed, 07 Jan 2015 23:40:37 +0000
From: Elwyn Davies <elwynd@dial.pipex.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: "Fred Baker (fred)" <fred@cisco.com>, "gorry@erg.abdn.ac.uk (erg)" <gorry@erg.abdn.ac.uk>
References: <54947DCF.3030601@scss.tcd.ie> <40842d620667e7d2a33f451dcd8f502b.squirrel@spey.erg.abdn.ac.uk> <30819CFE-21D3-4EF8-ABFE-4C01940399B7@cisco.com>
In-Reply-To: <30819CFE-21D3-4EF8-ABFE-4C01940399B7@cisco.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: http://mailarchive.ietf.org/arch/msg/aqm/n82DmEYbVV4uJZYoinajjlDEGww
X-Mailman-Approved-At: Thu, 08 Jan 2015 01:19:23 -0800
Cc: draft-ietf-aqm-recommendation.all@tools.ietf.org, General area reviewing team <gen-art@ietf.org>, aqm@ietf.org
Subject: Re: [aqm] Gen-art LC review of draft-ietf-aqm-recommendation-08
X-BeenThere: aqm@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Discussion list for active queue management and flow isolation." <aqm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/aqm>, <mailto:aqm-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/aqm/>
List-Post: <mailto:aqm@ietf.org>
List-Help: <mailto:aqm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/aqm>, <mailto:aqm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Jan 2015 23:40:53 -0000

(Copied to aqm mailing list as suggested by WG chair).
Hi.

Thanks for your responses.  Just a reminder... I am not (these days, 
anyway) an expert in router queue management, so my comments should not 
be seen as a deep critique of the individual items, but as things that 
come to mind as matters of general control engineering, and areas where 
I feel the language needs clarification - that's what gen-art is for.

As a matter of interest, it might be useful to explain a bit what scale 
of routing engine you are thinking about in this paper.  I got the 
feeling from your responses to the buffer bloat question that you are 
primarily thinking of big iron here.  The buffer bloat phenomenon has 
tended to show up in smaller boxes, where the AQM work may or may not be 
applicable.  I don't quite know what your target is here - or whether 
you are thinking across the whole range of sizes.  The responses below 
clearly indicate that you have some examples in mind (Codel, for 
example, which I know nothing about except (now) that it is an AQM WG 
product) and I don't know what scale of equipment these are really 
relevant to.

Some more responses in line.

Regards,
Elwyn

On 05/01/15 20:32, Fred Baker (fred) wrote:
>
>> On Jan 5, 2015, at 1:13 AM, gorry@erg.abdn.ac.uk wrote:
>>
>> Fred, I've applied the minor edits.
>>
>> I have questions to you on the comments below (see GF:) before I
>> proceed.
>>
>> Gorry
>
> Adding Elwyn, as the discussion of his comments should include him -
> he might be able to clarify his concerns. I started last night to
> write a note, which I will now discard and instead comment here.
>
>>> I am the assigned Gen-ART reviewer for this draft. For background
>>> on Gen-ART, please see the FAQ at
>>>
>>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>>>
>>> Please resolve these comments along with any other Last Call
>>> comments you may receive.
>>>
>>> Document: draft-ietf-aqm-recommendation-08.txt Reviewer: Elwyn
>>> Davies Review Date: 2014/12/19 IETF LC End Date: 2014/12/24 IESG
>>> Telechat date: (if known) -
>>>
>>> Summary:  Almost ready for BCP.
>>>
>>> Possibly missing issues:
>>>
>>> Buffer bloat:  The suggestions/discussions are pretty much all
>>> about keeping buffer size sufficiently large to avoid burst
>>> dropping.  It seems to me that it might be good to mention the
>>> possibility that one can over provision queues, and this needs to
>>> be avoided as well as under provisioning.
>>>
>> GF: I am not sure - this to me depends on the use case.
>
> To me, this is lily-gilding. To pick one example, the Cisco ASR 8X10G
> line card comes standard from the factory with 200 ms of queue per
> 10G interface. If we were to implement Codel on it, Codel would try
> desperately to keep the average induced latency less than five ms. If
> it tried to make it be 100 microseconds, we would run into the issues
> the draft talks about - we're trying to maximize rate while
> minimizing mean latency, and due to TCP's dynamics, we would no
> longer maximize rate. If 5 ms is a reasonable number (and for
> intra-continental terrestrial delays I would think it is), and we set
> that variable to 10, 50, or 100 ms, the only harm would be that we
> had some probability of a higher mean induced latency than was really
> necessary - AQM would be a little less effective. In the worst case,
> (suppose we set Codel's limit to 200 ms), it would revert to tail
> drop, which is what we already have.
>
> There are two reasonable responses to this. One would be to note that
> high RTT cases, even if auto-tuning mostly works, manual tuning may
> deliver better results or tune itself correctly more quickly (on a
> 650 ms RTT satcom link, I'd start by changing Codel's 100 ms trigger
> to something in the neighborhood of 650 ms). The other is to simply
> say that there is no direct harm in increasing the limits, and there
> may be value in some use cases. But I would also tend to think that
> anyone that actually operates a network already has a pretty good
> handle on that fact. So I don't see the value in saying it - which is
> mostly why it's not there already.
My take on this would be "make as few assumptions about your audience as 
possible, and write them down".  It's a generally interesting topic and 
would interest people who are not deeply skilled in the art - as well as 
potentially pulling in some new researchers!
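For readers new to the area, Fred's Codel numbers above (a 5 ms delay target, reacting after 100 ms) can be sketched in toy form. This is an illustrative simplification I am adding for exposition, not the real CoDel state machine (which also scales its drop rate over time):

```python
# Toy sketch of the CoDel idea discussed above: drop only after the
# queue's sojourn time has stayed above TARGET for a full INTERVAL.
# Constants follow the 5 ms / 100 ms figures mentioned in the thread.

TARGET = 0.005    # 5 ms target queue delay
INTERVAL = 0.100  # 100 ms grace period before the controller reacts

def should_drop(sojourn_time, now, state):
    """state holds 'first_above': the time delay first exceeded TARGET."""
    if sojourn_time < TARGET:
        state['first_above'] = None       # delay is fine; reset the timer
        return False
    if state['first_above'] is None:
        state['first_above'] = now        # start the INTERVAL timer
        return False
    # Drop once delay has been above TARGET continuously for INTERVAL.
    return (now - state['first_above']) >= INTERVAL
```

Raising TARGET (to 10, 50, or 100 ms, as Fred says) just delays the point at which the controller intervenes; set it at the buffer's tail-drop limit and the sketch degenerates to ordinary tail drop.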
>
>>> Interaction between boxes using different or the same algorithms:
>>> Buffer bloat seems to be generally about situations where chains
>>> of boxes all have too much buffer.  One thing that is not
>>> currently mentioned is the possibility that if different AQM
>>> schemes are implemented in various boxes through which a flow
>>> passes, then there could be inappropriate interaction between the
>>> different algorithms.  The old RFC suggested RED and nothing else
>>> so that one just had one to make sure multiple RED boxes in
>>> series didn't do anything bad.  With potentially different
>>> algorithms in series, one had better be sure that the mechanisms
>>> don't interact in a bad way when chained together - another
>>> research topic, I think.
>>>
>> GF: I think this could be added as an area for continued research
>> mentioned in section 4.7. At least I know of some poor
>> interactions between PIE and CoDel on particular paths - where both
>> algorithms are triggered. However, I doubt if this is worth much
>> discussion in this document? thoughts?
>>
>> Suggest: "The Internet presents a wide variety of paths where
>> traffic can experience combinations of mechanisms that can
>> potentially interact to influence the performance of applications.
>> Research therefore needs to consider the interactions between
>> different AQM algorithms, patterns of interaction in network
>> traffic and other network mechanisms to ensure that multiple
>> mechanisms do not inadvertently interact to impact performance."
>
> Mentioning it as a possible research area makes sense. Your proposed
> text is fine, from my perspective.
>
Yes, I think something like this would be good.  The buffer bloat 
example is probably an extreme case of things not having AQM at all and 
interacting badly.  It would maybe be worth mentioning that any AQM 
mechanism also has to work in series with boxes that don't have any 
active AQM - just tail drop.  Ultimately, I would say this is just a 
matter of control engineering principles: you are potentially making a 
network in which various control algorithms are implemented on different 
legs/nodes, and the combination of transfer functions could possibly be 
unstable.  Has anybody applied any of the raft of control-theoretic 
methods to these algorithms?  I have no idea!
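One small, easily stated piece of the series-interaction question is how independent droppers compound along a path. A minimal sketch (my illustration, assuming each hop drops independently - it says nothing about the harder stability question of interacting control loops):

```python
# Toy illustration of AQMs (or tail-drop boxes) in series: if each hop
# drops a packet independently with probability p_i, the end-to-end
# delivery probability is the product of the per-hop pass probabilities,
# so loss compounds along the chain.

def end_to_end_drop(per_hop_drop_probs):
    """Probability a packet is dropped somewhere along the chain,
    assuming independent per-hop drop decisions."""
    survive = 1.0
    for p in per_hop_drop_probs:
        survive *= (1.0 - p)
    return 1.0 - survive
```

Two hops each dropping 10% of packets already lose 19% end to end; whether the control loops driving those probabilities stay stable when chained is exactly the open research question.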

> I start by questioning the underlying assumption, though, which is
> that bufferbloat is about paths in which there are multiple
> simultaneous bottlenecks. Yes, that occurs (think about paths that
> include both Cogent and a busy BRAS or CMTS, or more generally, if
> any link has some probability of congesting, my sophomore
> statistics course maintained that any pair of links has the product
> of the two probabilities of being simultaneously congested), but I'd
> be hard-pressed to make a statistically compelling argument out of
> it. The research and practice I have seen has been about a single
> bottleneck.
Please don't fixate on buffer bloat!
>
>>> Minor issues: s3, para after end of bullet 3:
>>>> The projected increase in the fraction of total Internet
>>>> traffic for more aggressive flows in classes 2 and 3 could pose
>>>> a threat to the performance of the future Internet.  There is
>>>> therefore an urgent need for measurements of current conditions
>>>> and for further research into the ways of managing such flows.
>>>> This raises many difficult issues in finding methods with an
>>>> acceptable overhead cost that can identify and isolate
>>>> unresponsive flows or flows that are less responsive than TCP.
>>>
>>> Question: Is there actually any published research into how one
>>> would identify class 2 or class 3 traffic in a router/middle box?
>>> If so it would be worth noting - the text call for "further
>>> research" seems to indicate there is something out there.
>>>
>> GF: I think the text is OK.
>
> Agreed. Elwyn's objection appears to be to the use of the word
> "further"; if we don't know of a paper, he'd like us to call for
> "research". The papers that come quickly to my mind are various
> papers on non-responsive flows, such as
> http://www.icir.org/floyd/papers/collapse.may99.pdf or
> http://www2.research.att.com/~jiawang/sstp08-camera/SSTP08_Pan.pdf.
> We already have a pretty extensive bibliography...
>
Right - either remove/alter "further" if there isn't anything already 
out there, or put in some reference(s).

>>> s4.2, next to last para: Is it worth saying also that the
>>> randomness should avoid targeting a single flow within a
>>> reasonable period to give a degree of fairness.
>
> Network devices SHOULD use an AQM algorithm to determine the packets
> that are marked or discarded due to congestion.  Procedures for
> dropping or marking packets within the network need to avoid
> increasing synchronization events, and hence randomness SHOULD be
> introduced in the algorithms that generate these congestion signals
> to the endpoints.
>
>> GF: Thoughts?
>
> I worry. The reasons for the randomness are (1) to tend to hit
> different sessions, and (2) when the same session is hit, to minimize
> the probability of multiple hits in the same RTT. It might be worth
> saying as much. However, to *stipulate* that algorithms should limit
> the hit rate on a given flow invites a discussion of stateful
> inspection algorithms. If someone wants to do such a thing, I'm not
> going to try to stop them (you could describe fq_* in those terms),
> but I don't want to put the idea into their heads (see later comment
> on privacy). Also, that is frankly more of a concern with Reno than
> with NewReno, and with NewReno than with anything that uses SACK.
> SACK will (usually) retransmit all dropped segments in the subsequent
> RTT, while NewReno will retransmit the Nth dropped packet in the Nth
> following RTT, and Reno might take that many RTO timeouts.

You have thought about what I said.  Put in what you think it needs.
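Fred's two reasons for randomness - hitting different sessions, and not hitting the same session twice in one RTT - can be seen in a toy comparison I am adding here (illustrative only; real AQMs such as RED randomize the inter-drop gap as a function of average queue size):

```python
# Why randomness desynchronizes flows: with a regular round-robin
# interleaving of flows, a deterministic "drop every Nth packet" rule
# can land every hit on the same flow, while a random dropper with the
# same average rate spreads the hits across flows.

import random

def deterministic_hits(flows, every_n, npkts=1000):
    """Drop every Nth packet of a round-robin interleaving of flows."""
    hits = {f: 0 for f in flows}
    for i in range(npkts):
        if (i + 1) % every_n == 0:
            hits[flows[i % len(flows)]] += 1
    return hits

def random_hits(flows, p, npkts=1000, seed=1):
    """Drop each packet independently with probability p."""
    rng = random.Random(seed)
    hits = {f: 0 for f in flows}
    for i in range(npkts):
        if rng.random() < p:
            hits[flows[i % len(flows)]] += 1
    return hits
```

With two interleaved flows and every_n=2, the deterministic rule puts every single drop on one flow; the random rule at the same average rate splits the drops roughly evenly.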
>
>>> s4.2.1, next to last para:
>>>> An AQM algorithm that supports ECN needs to define the
>>>> threshold and algorithm for ECN-marking.  This threshold MAY
>>>> differ from that used for dropping packets that are not marked
>>>> as ECN-capable, and SHOULD be configurable.
>>>>
>>> Is this suggestion really compatible with recommendation 3 and
>>> s4.3 (no tuning)?
>>>
>> GF: I think making a recommendation here is beyond the "BCP"
>> experience, although I suspect that a lower marking threshold is
>> generally good. Should we add it also to the research agenda as an
>> item at the end of para 3 in S4.7.?

I think you may have misunderstood what I am saying here.  Rec 3 and 
s4.3 say things should work without tuning.  Doesn't having to set these 
thresholds/algorithms constitute tuning?  If so, then it makes it 
difficult to see these ECN schemes as meeting the constraints.  If you 
disagree, then explain how it isn't - or suggest that there should be 
research to see how to make ECN zero-config as well.
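For concreteness, the dual-threshold idea the draft text allows can be sketched as follows. The threshold values here are placeholders of my own, not recommendations from the draft or from Bob's research:

```python
# Sketch of the MAY in s4.2.1: mark ECN-capable packets at a lower
# queue-delay threshold than the one used for dropping non-ECN traffic.
# Threshold values are hypothetical illustrations only.

MARK_THRESHOLD = 0.002   # hypothetical: start ECN-marking at 2 ms
DROP_THRESHOLD = 0.005   # hypothetical: start dropping at 5 ms

def congestion_action(queue_delay, ect):
    """Return 'forward', 'mark', or 'drop' for one packet.
    ect: True if the packet carries an ECT (ECN-capable) codepoint."""
    if ect and queue_delay >= MARK_THRESHOLD:
        return 'mark'                     # signal congestion without loss
    if not ect and queue_delay >= DROP_THRESHOLD:
        return 'drop'
    return 'forward'
```

The "no tuning" question is then whether MARK_THRESHOLD and DROP_THRESHOLD can ship with defaults that work across the vast majority of use cases, or whether operators end up having to set them per deployment.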
>
> I can see adding it to the research agenda; the comment comes from
> Bob Briscoe's research.
>
> That said, any algorithm using any mechanism by definition needs to
> specify any variables it uses - Codel, for example, tries to keep a
> queue at 5 ms or less, and cuts in after a queue fails to empty for a
> period of 100 ms. I don't see a good argument for saying "but an
> ECN-based algorithm doesn't need to define its thresholds or
> algorithms". Also, as I recall, the MAY in the text came from the
> fact that Bob seemed to think there was value in it (which BTW I
> agree with). To my mind, SHOULD and MUST are strong words, but absent
> such an assertion, an implementation MAY do just about anything that
> comes to the implementor's mind. So saying an implementation MAY <do
> something> is mostly a suggestion that an implementor SHOULD think
> about it. Are we to say that an implementor, given Bob's research,
> should NOT think about giving folks the option?
>
> I also don't think Elwyn's argument quite follows. When I say that an
> algorithm should auto-tune, I'm not saying that it should not have
> knobs; I'm saying that the default values of those knobs should be
> adequate for the vast majority of use cases. I'm also not saying that
> there should be exactly one initial default; I could easily imagine
> an implementation noting the bit rate of an interface and the ping
> RTT to a peer and pulling its initial configuration out of a table.
That would be at least partially acceptable as a mode of operation.  But 
you might have a "warm-up" issue - would it work OK while the algorithm 
was working out what the RTT actually was?  And would the algorithms 
adapt autonomously (i.e., auto-tune) to close in on optimum values after 
picking initial values from the table?
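Fred's table-lookup idea for initial configuration could look roughly like this (a sketch under my own assumptions; the RTT bands and interval values are hypothetical, with the satcom row echoing the 650 ms example earlier in the thread):

```python
# Sketch of picking an initial AQM reaction interval from a measured
# ping RTT, then letting the algorithm auto-tune from there.
# Table rows are (max_rtt_seconds, initial_interval_seconds) and are
# hypothetical examples, not recommended values.

RTT_TABLE = [
    (0.150, 0.100),   # terrestrial paths: CoDel-style 100 ms interval
    (0.400, 0.300),   # long intercontinental paths
    (1.000, 0.700),   # satcom-range RTTs (cf. the 650 ms example)
]

def initial_interval(measured_rtt):
    """Pick the first row whose RTT bound covers the measured RTT."""
    for rtt_bound, interval in RTT_TABLE:
        if measured_rtt <= rtt_bound:
            return interval
    return RTT_TABLE[-1][1]   # fall back to the largest configured value
```

The warm-up question I raise above is what the algorithm does before `measured_rtt` is known - e.g. whether it starts from the terrestrial row and converges, or misbehaves until the measurement settles.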
>
>>> s7:  There is an arguable privacy concern that if schemes are
>>> able to identify class 2 or class 3 flows, then a core device can
>>> extract privacy related info from the identified flows.
>>>
>> GF: I don't see how traffic profiles expose privacy concerns, sure
>> users and apps can be characterised by patterns of interaction -
>> but this isn't what is being talked about here.
>
> Agreed. If the reference is to RFC 6973, I don't see a violation of
> https://tools.ietf.org/html/rfc6973#section-7. I would if we appeared
> to be inviting stateful inspection algorithms. To give an example of
> how difficult sessions are managed, RFC 6057 uses the CTS message in
> round-robin fashion to push back on top-talker users in order to
> enable the service provider to give consistent service to all of his
> subscribers when a few are behaving in a manner that might prevent
> him from doing so. Note that the "session", in that case, is not a
> single TCP session, but a bittorrent-or-whatever server engaged in
> sessions to tens or hundreds of peers. The fact that a few users
> receive some pushback doesn't reveal the identities of those users.
> I'd need to hear the substance behind Elwyn's concern before I could
> write anything.

My reaction was that if your algorithm identifies flows then you have

On 05/01/15 20:32, Fred Baker (fred) wrote:>
 >> On Jan 5, 2015, at 1:13 AM, gorry@erg.abdn.ac.uk wrote:
 >>
 >> Fred, I've applied the minor edits.
 >>
 >> I have questions to you on the comments blow (see GF:) before I proceed.
 >>
 >> Gorry
 >
 > Adding Elwyn, as the discussion of his comments should include him - 
he might b able to clarify his concerns. I started last night to write a 
note, which I will now discard and instead comment here.
 >
 >>> I am the assigned Gen-ART reviewer for this draft. For background on
 >>> Gen-ART, please see the FAQ at
 >>>
 >>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
 >>>
 >>> Please resolve these comments along with any other Last Call comments
 >>> you may receive.
 >>>
 >>> Document: draft-ietf-aqm-recommendation-08.txt
 >>> Reviewer: Elwyn Davies
 >>> Review Date: 2014/12/19
 >>> IETF LC End Date: 2014/12/24
 >>> IESG Telechat date: (if known) -
 >>>
 >>> Summary:  Almost ready for BCP.
 >>>
 >>> Possibly missing issues:
 >>>
 >>> Buffer bloat:  The suggestions/discussions are pretty much all about
 >>> keeping buffer size
 >>> sufficiently large to avoid burst dropping.  It seems to me that it 
might
 >>> be good to
 >>> mention the possibility that one can over provision queues, and 
this needs
 >>> to be avoided
 >>> as well as under provisioning.
 >>>
 >> GF: I am not sure - this to me depends use case.
 >
 > To me, this is lily-gilding. To pick one example, the Cisco ASR 8X10G 
line card comes standard from the factory with 200 ms of queue per 10G 
interface. If we were to implement Codel on it, Codel would try 
desperately to keep the average induced latency less than five ms. If it 
tried to make it be 100 microseconds, we would run into the issues the 
draft talks about - we're trying to maximize rate while minimizing mean 
latency, and due to TCP's dynamics, we would no longer maximize rate. If 
5 ms is a reasonable number (and for intra-continental terrestrial 
delays I would think it is), and we set that variable to 10, 50, or 100 
ms, the only harm would be that we had some probability of a higher mean 
induced latency than was really necessary - AQM would be a little less 
effective. In the worst case, (suppose we set Codel's limit to 200 ms), 
it would revert to tail drop, which is what we already have.
 >
 > There are two reasonable responses to this. One would be to note that 
high RTT cases, even if auto-tuning mostly works, manual tuning may 
deliver better results or tune itself correctly more quickly (on a 650 
ms RTT satcom link, I'd start by changing Codel's 100 ms trigger to 
something in the neighborhood of 650 ms). The other is to simply say 
that there is no direct harm in increasing the limits, and there may be 
value in some use cases. But I would also tend to think that anyone that 
actually operates a network already has a pretty good handle on that 
fact. So I don't see the value in saying it - which is mostly why it's 
not there already.
 >
 >>> Interaction between boxes using different or the same algorithms: 
Buffer
 >>> bloat seems to
 >>> be generally about situations where chains of boxes all have too much
 >>> buffer.  One thing
 >>> that is not currently mentioned is the possibility that if 
different AQM
 >>> schemes are
 >>> implemented in various boxes through which a flow passes, then 
there could
 >>> be inappropriate
 >>> interaction between the different algorithms.  The old RFC 
suggested RED
 >>> and nothing else so
 >>> that one just had one to make sure multiple RED boxes in series 
didn't do
 >>> anything bad.  With
 >>> potentially different algorithms in series, one had better be sure that
 >>> the mechanisms don't
 >>> interact in a bad way when chained together - another research topic, I
 >>> think.
 >>>
 >> GF: I think this could be added as an area for continued research
 >> mentioned in section 4.7. At least I know of some poor interactions
 >> between PIE and CoDel on particular paths - where both algorithms are
 >> triggered. However, I doubt if this is worth much discussion in this
 >> document? thoughts?
 >>
 >> Suggest:
 >> "The Internet presents a wide variety of paths where traffic can
 >> experience combinations of mechanisms that can potentially interact to
 >> influence the performance of applications. Research therefore needs to
 >> consider the interactions between different AQM algorithms, patterns of
 >> interaction in network traffic and other network mechanisms to 
ensure that
 >> multiple mechanisms do not inadvertently interact to impact 
performance."
 >
 > Mentioning it as a possible research area makes sense. Your proposed 
text is fine, from my perspective.
 >
 > I start by questioning the underlying assumption, though, which is 
that bufferbloat is about paths in which there are multiple simultaneous 
bottlenecks. Yes, that occurs (think about paths that include both 
Cogent and a busy BRAS or CMTS, or more generally, if any link has some 
probability of congesting, math sophomore statistics course maintained 
that any pair of links has the product of the two probabilities of being 
simultaneously congested), but I'd be hard-pressed to make a 
statistically compelling argument out of it. The research and practice I 
have seen has been about a single bottleneck.
 >
 >>> Minor issues:
 >>> s3, para after end of bullet 3:
 >>>>     The projected increase in the fraction of total Internet 
traffic for
 >>>>     more aggressive flows in classes 2 and 3 could pose a threat 
to the
 >>>>     performance of the future Internet.  There is therefore an urgent
 >>>>     need for measurements of current conditions and for further 
research
 >>>>     into the ways of managing such flows.  This raises many difficult
 >>>>     issues in finding methods with an acceptable overhead cost 
that can
 >>>>     identify and isolate unresponsive flows or flows that are less
 >>>>     responsive than TCP.
 >>>
 >>> Question: Is there actually any published research into how one would
 >>> identify
 >>> class 2 or class 3 traffic in a router/middle box? If so it would be
 >>> worth noting -
 >>> the text call for "further research" seems to indicate there is
 >>> something out there.
 >>>
 >> GF: I think the text is OK.
 >
 > Agreed. Elwyn's objection appears to be to the use of the word 
"further"; if we don't know of a paper, he'd like us to call for 
"research". The papers that come quickly to my mind are various papers 
on non-responsive flows, such as 
http://www.icir.org/floyd/papers/collapse.may99.pdf or 
http://www2.research.att.com/~jiawang/sstp08-camera/SSTP08_Pan.pdf. We 
already have a pretty extensive bibliography...
 >
 >>> s4.2, next to last para: Is it worth saying also that the randomness
 >>> should avoid targeting a single flow within a reasonable period to give
 >>> a degree of fairness.
 >
 >     Network devices SHOULD use an AQM algorithm to determine the packets
 >     that are marked or discarded due to congestion.  Procedures for
 >     dropping or marking packets within the network need to avoid
 >     increasing synchronization events, and hence randomness SHOULD be
 >     introduced in the algorithms that generate these congestion signals
 >     to the endpoints.
 >
 >> GF: Thoughts?
 >
 > I worry. The reasons for the randomness are (1) to tend to hit 
different sessions, and (2) when the same session is hit, to minimize 
the probability of multiple hits in the same RTT. It might be worth 
saying as much. However, to *stipulate* that algorithms should limit the 
hit rate on a given flow invites a discussion of stateful inspection 
algorithms. If someone wants to do such a thing, I'm not going to try to 
stop them (you could describe fq_* in those terms), but I don't want to 
put the idea into their heads (see later comment on privacy). Also, that 
is frankly more of a concern with Reno than with NewReno, and with 
NewReno than with anything that uses SACK. SACK will (usually) 
retransmit all dropped segments in the subsequent RTT, while NewReno 
will retransmit the Nth dropped packet in the Nth following RTT, and 
Reno might take that many RTO timeouts.
 >

On 05/01/15 20:32, Fred Baker (fred) wrote:>
 >> On Jan 5, 2015, at 1:13 AM, gorry@erg.abdn.ac.uk wrote:
 >>
 >> Fred, I've applied the minor edits.
 >>
 >> I have questions to you on the comments blow (see GF:) before I proceed.
 >>
 >> Gorry
 >
 > Adding Elwyn, as the discussion of his comments should include him - 
he might b able to clarify his concerns. I started last night to write a 
note, which I will now discard and instead comment here.
 >
 >>> I am the assigned Gen-ART reviewer for this draft. For background on
 >>> Gen-ART, please see the FAQ at
 >>>
 >>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
 >>>
 >>> Please resolve these comments along with any other Last Call comments
 >>> you may receive.
 >>>
 >>> Document: draft-ietf-aqm-recommendation-08.txt
 >>> Reviewer: Elwyn Davies
 >>> Review Date: 2014/12/19
 >>> IETF LC End Date: 2014/12/24
 >>> IESG Telechat date: (if known) -
 >>>
 >>> Summary:  Almost ready for BCP.
 >>>
 >>> Possibly missing issues:
 >>>
 >>> Buffer bloat:  The suggestions/discussions are pretty much all about
 >>> keeping buffer size
 >>> sufficiently large to avoid burst dropping.  It seems to me that it 
might
 >>> be good to
 >>> mention the possibility that one can over provision queues, and 
this needs
 >>> to be avoided
 >>> as well as under provisioning.
 >>>
 >> GF: I am not sure - this to me depends use case.
 >
 > To me, this is lily-gilding. To pick one example, the Cisco ASR 8X10G 
line card comes standard from the factory with 200 ms of queue per 10G 
interface. If we were to implement Codel on it, Codel would try 
desperately to keep the average induced latency less than five ms. If it 
tried to make it be 100 microseconds, we would run into the issues the 
draft talks about - we're trying to maximize rate while minimizing mean 
latency, and due to TCP's dynamics, we would no longer maximize rate. If 
5 ms is a reasonable number (and for intra-continental terrestrial 
delays I would think it is), and we set that variable to 10, 50, or 100 
ms, the only harm would be that we had some probability of a higher mean 
induced latency than was really necessary - AQM would be a little less 
effective. In the worst case, (suppose we set Codel's limit to 200 ms), 
it would revert to tail drop, which is what we already have.
 >
 > There are two reasonable responses to this. One would be to note that 
high RTT cases, even if auto-tuning mostly works, manual tuning may 
deliver better results or tune itself correctly more quickly (on a 650 
ms RTT satcom link, I'd start by changing Codel's 100 ms trigger to 
something in the neighborhood of 650 ms). The other is to simply say 
that there is no direct harm in increasing the limits, and there may be 
value in some use cases. But I would also tend to think that anyone that 
actually operates a network already has a pretty good handle on that 
fact. So I don't see the value in saying it - which is mostly why it's 
not there already.
 >
 >>> Interaction between boxes using different or the same algorithms: 
Buffer
 >>> bloat seems to
 >>> be generally about situations where chains of boxes all have too much
 >>> buffer.  One thing
 >>> that is not currently mentioned is the possibility that if 
different AQM
 >>> schemes are
 >>> implemented in various boxes through which a flow passes, then 
there could
 >>> be inappropriate
 >>> interaction between the different algorithms.  The old RFC 
suggested RED
 >>> and nothing else so
 >>> that one just had one to make sure multiple RED boxes in series 
didn't do
 >>> anything bad.  With
 >>> potentially different algorithms in series, one had better be sure that
 >>> the mechanisms don't
 >>> interact in a bad way when chained together - another research topic, I
 >>> think.
 >>>
 >> GF: I think this could be added as an area for continued research
 >> mentioned in section 4.7. At least I know of some poor interactions
 >> between PIE and CoDel on particular paths - where both algorithms are
 >> triggered. However, I doubt if this is worth much discussion in this
 >> document? thoughts?
 >>
 >> Suggest:
 >> "The Internet presents a wide variety of paths where traffic can
 >> experience combinations of mechanisms that can potentially interact to
 >> influence the performance of applications. Research therefore needs to
 >> consider the interactions between different AQM algorithms, patterns of
 >> interaction in network traffic and other network mechanisms to 
ensure that
 >> multiple mechanisms do not inadvertently interact to impact 
performance."
 >
 > Mentioning it as a possible research area makes sense. Your proposed 
text is fine, from my perspective.
 >
 > I start by questioning the underlying assumption, though, which is 
that bufferbloat is about paths in which there are multiple simultaneous 
bottlenecks. Yes, that occurs (think about paths that include both 
Cogent and a busy BRAS or CMTS, or more generally, if any link has some 
probability of congesting, math sophomore statistics course maintained 
that any pair of links has the product of the two probabilities of being 
simultaneously congested), but I'd be hard-pressed to make a 
statistically compelling argument out of it. The research and practice I 
have seen has been about a single bottleneck.
 >
 >>> Minor issues:
 >>> s3, para after end of bullet 3:
 >>>>     The projected increase in the fraction of total Internet
 >>>>     traffic for more aggressive flows in classes 2 and 3 could pose
 >>>>     a threat to the performance of the future Internet.  There is
 >>>>     therefore an urgent need for measurements of current conditions
 >>>>     and for further research into the ways of managing such flows.
 >>>>     This raises many difficult issues in finding methods with an
 >>>>     acceptable overhead cost that can identify and isolate
 >>>>     unresponsive flows or flows that are less responsive than TCP.
 >>>
 >>> Question: Is there actually any published research into how one would
 >>> identify class 2 or class 3 traffic in a router/middlebox? If so, it
 >>> would be worth noting - the text's call for "further research" seems
 >>> to indicate there is something out there.
 >>>
 >> GF: I think the text is OK.
 >
 > Agreed. Elwyn's objection appears to be to the use of the word
 > "further"; if we don't know of a paper, he'd like us to call for
 > "research". The papers that come quickly to my mind are various papers
 > on non-responsive flows, such as
 > http://www.icir.org/floyd/papers/collapse.may99.pdf or
 > http://www2.research.att.com/~jiawang/sstp08-camera/SSTP08_Pan.pdf. We
 > already have a pretty extensive bibliography...
 >
 >>> s4.2, next to last para: Is it worth saying also that the randomness
 >>> should avoid targeting a single flow within a reasonable period, to
 >>> give a degree of fairness?
 >
 >     Network devices SHOULD use an AQM algorithm to determine the packets
 >     that are marked or discarded due to congestion.  Procedures for
 >     dropping or marking packets within the network need to avoid
 >     increasing synchronization events, and hence randomness SHOULD be
 >     introduced in the algorithms that generate these congestion signals
 >     to the endpoints.
 >
 >> GF: Thoughts?
 >
 > I worry. The reasons for the randomness are (1) to tend to hit
 > different sessions, and (2) when the same session is hit, to minimize
 > the probability of multiple hits in the same RTT. It might be worth
 > saying as much. However, to *stipulate* that algorithms should limit the
 > hit rate on a given flow invites a discussion of stateful inspection
 > algorithms. If someone wants to do such a thing, I'm not going to try to
 > stop them (you could describe fq_* in those terms), but I don't want to
 > put the idea into their heads (see later comment on privacy). Also, that
 > is frankly more of a concern with Reno than with NewReno, and with
 > NewReno than with anything that uses SACK. SACK will (usually)
 > retransmit all dropped segments in the subsequent RTT, while NewReno
 > will retransmit the Nth dropped packet in the Nth following RTT, and
 > Reno might take that many RTO timeouts.
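As a hedged illustration of the randomness being recommended (a RED-style sketch, not any particular published AQM), the thresholds and probability below are invented for illustration only:

```python
import random

# Minimal sketch: rather than dropping deterministically when a queue
# threshold is crossed (which hits many flows in the same instant and
# synchronizes them), drop/mark each arrival with a probability that
# grows with the average queue depth. All constants are assumed values.
MIN_TH, MAX_TH, MAX_P = 20, 80, 0.1   # packets / max probability (assumed)

def should_signal(avg_queue_depth: float) -> bool:
    """Return True if this arriving packet should be marked or dropped."""
    if avg_queue_depth < MIN_TH:
        return False                  # short queue: never signal
    if avg_queue_depth >= MAX_TH:
        return True                   # far past threshold: always signal
    # Linear ramp between the thresholds; the random draw spreads signals
    # across flows and across time, avoiding synchronization events.
    p = MAX_P * (avg_queue_depth - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p
```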
 >
 >>> s4.2.1, next to last para:
 >>>>     An AQM algorithm that supports ECN needs to define the threshold
 >>>>     and algorithm for ECN-marking.  This threshold MAY differ from
 >>>>     that used for dropping packets that are not marked as
 >>>>     ECN-capable, and SHOULD be configurable.
 >>>>
 >>> Is this suggestion really compatible with recommendation 3 and s4.3 (no
 >>> tuning)?
 >>>
 >> GF: I think making a recommendation here is beyond the "BCP" experience,
 >> although I suspect that a lower marking threshold is generally good.
 >> Should we add it also to the research agenda, as an item at the end of
 >> para 3 in S4.7?
 >
 > I can see adding it to the research agenda; the comment comes from
 > Bob Briscoe's research.
 >
 > That said, any algorithm using any mechanism by definition needs to
 > specify the variables it uses - CoDel, for example, tries to keep a
 > queue at 5 ms or less, and cuts in after a queue fails to empty for a
 > period of 100 ms. I don't see a good argument for saying "but an
 > ECN-based algorithm doesn't need to define its thresholds or
 > algorithms". Also, as I recall, the MAY in the text came from the fact
 > that Bob seemed to think there was value in it (which, BTW, I agree
 > with). To my mind, SHOULD and MUST are strong words, but absent such an
 > assertion, an implementation MAY do just about anything that comes to
 > the implementor's mind. So saying an implementation MAY <do something>
 > is mostly a suggestion that an implementor SHOULD think about it. Are we
 > to say that an implementor, given Bob's research, should NOT think about
 > giving folks the option?
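To make the MAY concrete, here is a hypothetical sketch of an AQM with separate, configurable ECN-marking and drop thresholds; the names and values are invented, not taken from the draft or from Bob's work:

```python
# Hypothetical dual-threshold congestion signalling: ECN-capable traffic
# is marked at a lower standing-queue threshold than non-ECN traffic is
# dropped at. Both thresholds would be configurable knobs in practice.
ECN_MARK_TH = 5.0    # ms of standing queue before marking (assumed value)
DROP_TH = 20.0       # ms of standing queue before dropping (assumed value)

def congestion_action(queue_delay_ms: float, ecn_capable: bool) -> str:
    """Decide the fate of a packet given the current standing queue."""
    if ecn_capable and queue_delay_ms >= ECN_MARK_TH:
        return "mark"     # signal congestion early, without losing the packet
    if queue_delay_ms >= DROP_TH:
        return "drop"     # non-ECN traffic only gets the loss signal, later
    return "forward"
```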
 >
 > I also don't think Elwyn's argument quite follows. When I say that an
 > algorithm should auto-tune, I'm not saying that it should not have
 > knobs; I'm saying that the default values of those knobs should be
 > adequate for the vast majority of use cases. I'm also not saying that
 > there should be exactly one initial default; I could easily imagine an
 > implementation noting the bit rate of an interface and the ping RTT to a
 > peer, and pulling its initial configuration out of a table.
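The table-lookup idea can be sketched as follows; the factors and breakpoints are assumptions for illustration, not recommended values (though the 5 ms / 100 ms defaults echo the CoDel numbers mentioned above):

```python
# Sketch of auto-tuning: keep the knobs, but derive initial settings from
# the observed link rate and RTT instead of hard-coding one default.
# Every constant here is an assumption made for illustration.

def initial_config(link_mbps: float, rtt_ms: float) -> dict:
    """Pick illustrative (target, interval) AQM defaults, CoDel-style."""
    # The interval should cover at least one worst-case RTT; the target
    # is a small fraction of it.
    interval_ms = max(100.0, 2.0 * rtt_ms)
    target_ms = max(5.0, interval_ms / 20.0)
    # On very slow links, a single MTU-sized packet already adds more
    # serialization delay than the target; loosen it so it stays reachable.
    if link_mbps < 2.0:
        target_ms = max(target_ms, 1500 * 8 / (link_mbps * 1000))  # 1 MTU in ms
    return {"target_ms": target_ms, "interval_ms": interval_ms}
```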
 >
 >>> s7:  There is an arguable privacy concern that if schemes are able to
 >>> identify class 2 or class 3 flows, then a core device can extract
 >>> privacy-related info from the identified flows.
 >>>
 >> GF: I don't see how traffic profiles expose privacy concerns; sure,
 >> users and apps can be characterised by patterns of interaction - but
 >> this isn't what is being talked about here.
 >
 > Agreed. If the reference is to RFC 6973, I don't see a violation of
 > https://tools.ietf.org/html/rfc6973#section-7. I would if we appeared to
 > be inviting stateful inspection algorithms. To give an example of how
 > difficult sessions are managed, RFC 6057 uses the CTS message in
 > round-robin fashion to push back on top-talker users, in order to enable
 > the service provider to give consistent service to all of its
 > subscribers when a few are behaving in a manner that might prevent it
 > from doing so. Note that the "session", in that case, is not a single
 > TCP session, but a bittorrent-or-whatever server engaged in sessions to
 > tens or hundreds of peers. The fact that a few users receive some
 > pushback doesn't reveal the identities of those users. I'd need to hear
 > the substance behind Elwyn's concern before I could write anything.
 >
 >> s4.7, para 3:
 >>> the use of Map/Reduce applications in data centers
 >>> I think this needs a reference or a brief explanation.
 >> GF: Fred do you know a reference or can suggest extra text?
 >
 > The concern has to do with incast, which is a pretty active research
 > area (http://lmgtfy.com/?q=research+incast). The paragraph asks a
 > question, which is whether the common taxonomy of network flows (mice
 > vs elephants) needs to be extended to include references to herds of
 > mice traveling together, with the result that congestion control
 > algorithms designed under the assumption that a heavy data flow
 > contains an elephant merely introduce head-of-line blocking in short
 > flows. The word "lemmings" is mine.
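A back-of-envelope check of why a herd of mice hurts, with entirely invented numbers:

```python
# Incast in one arithmetic step: each individual reply is tiny, but when
# many arrive in the same RTT their sum can exceed a shallow switch
# buffer, so drops land on short flows that never had a chance to back
# off. All values below are invented for illustration.
reply_bytes = 32 * 1024         # one server's share of a fan-in request
servers = 100                   # fan-in of the Map/Reduce-style query
buffer_bytes = 1 * 1024 * 1024  # shallow switch buffer (assumed)

burst = reply_bytes * servers   # arrives nearly simultaneously
print(burst, ">", buffer_bytes, "->", burst > buffer_bytes)
```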
 >
 > I know of at least four papers (Microsoft Research, CAIA, Tsinghua,
 > and KAIST) submitted to various journals in 2014 on the topic. It's
 > also, at least in part, the basis for the DCLC RG. The only ones we
 > could reference, among those, would relate to DCTCP, as the rest have
 > not yet been published.
 >
 > Again, I'd like to understand the underlying issue. I doubt that it
 > is that Elwyn doesn't like the question as such. Is it that he's
 > looking for the word "incast" to replace "map/reduce"?
 >
 >> --- The edits below have been incorporated in the XML for v-09 ---
 >>> Nits/editorial comments:
 >>> General: s/e.g./e.g.,/, s/i.e./i.e.,/
 >>>
 >>> s1.2, para 2(?) - top of p4: s/and often necessary/and is often
 >>> necessary/
 >>> s1.2, para 3: s/a > class of technologies that/a class of technologies
 >>> that/
 >>>
 >>> s2, first bullet 3: s/Large burst of packets/Large bursts of packets/
 >>>
 >>> s2, last para: Probably need to expand POP, IMAP and RDP; maybe provide
 >>> refs??
 >>>
 >>> s2.1, last para: s/open a large numbers of short TCP flows/may open a
 >>> large number of short duration TCP flows/
 >>>
 >>> s4, last para: s/experience occasional issues that need moderation./can
 >>> experience occasional issues that warrant mitigation./
 >>>
 >>> s4.2, para 6, last sentence: s/similarly react/react similarly/
 >>>
 >>> s4.2.1, para 1: s/using AQM to decider when/using AQM to decide when/
 >>>
 >>> s4.7, para 3:
 >>>> In 2013,
 >>> "At the time of writing" ?
 >>>
 >>> s4.7, para 3:
 >>>> the use of Map/Reduce applications in data centers
 >>> I think this needs a reference or a brief explanation.
 >

Regarding the s7 privacy concern: my worry was that a scheme able to
identify class 2 or class 3 flows could potentially help a bad actor to
pick off such flows, or to learn who is communicating, in a situation
where currently that would be very difficult to know, as the queueing is
basically flow agnostic.  OK, this is fairly far out, but we have seen
some pretty serious stuff apparently being done around core routers,
according to Snowden et al.
>
>> s4.7, para 3:
>>> the use of Map/Reduce applications in data centers I think this
>>> needs a reference or a brief explanation.
>> GF: Fred do you know a reference or can suggest extra text?
>
> The concern has to do with incast, which is a pretty active research
> area (http://lmgtfy.com/?q=research+incast). The paragraph asks a
> question, which is whether the common taxonomy of network flows (mice
> vs elephants) needs to be extended to include references to herds of
> mice traveling together, with the result that congestion control
> algorithms designed under the assumption that a heavy data flow
> contains an elephant merely introduce head-of-line blocking in short
> flows. The word "lemmings" is mine.
>
> I know of at least four papers (Microsoft Research, CAIA, Tsinghua,
> and KAIST) submitted to various journals in 2014 on the topic. It's
> also, at least in part, the basis for the DCLC RG. The only ones we
> could reference, among those, would relate to DCTCP, as the rest have
> not yet been published.
>
> Again, I'd like to understand the underlying issue. I doubt that it
> is that Elwyn doesn't like the question as such. Is it that he's
> looking for the word “incast” to replace "map/reduce"?

I was just looking for somebody to define the jargon. As far as I am 
concerned, at this moment "incast" would be just as "bad", since it would 
produce an equally blank stare followed by a grab for Google.