Re: [tsvwg] [REVISED I-D]: Document writeup for draft-ietf-tsvwg-byte-pkt-congest

Andrew McGregor <andrewmcgr@gmail.com> Thu, 11 October 2012 22:24 UTC

From: Andrew McGregor <andrewmcgr@gmail.com>
Date: Fri, 12 Oct 2012 11:24:33 +1300
To: Bob Briscoe <bob.briscoe@bt.com>
Cc: gorry@erg.abdn.ac.uk, philip.eardley@bt.com, tsvwg IETF list <tsvwg@ietf.org>, tsvwg-chairs@ietf.org
Subject: Re: [tsvwg] [REVISED I-D]: Document writeup for draft-ietf-tsvwg-byte-pkt-congest

On 12/10/2012, at 5:06 AM, Bob Briscoe <bob.briscoe@bt.com> wrote:

> Hannes,
> 
> This doc is very general. The advice on AQM not biasing for small packets applies equally to CoDel as to RED (it has already been applied to PCN, which is another AQM). The doc gives specific advice for RED as an exemplar, but it's not just about RED (it's worrying that you think it is - which text gave this impression?). Everywhere that RED is mentioned, the wording should carefully say it's an example.
> 
> Similarly, it gives guidelines on end-to-end transports, and uses TCP as a concrete example.
> 
> [Changing the subject to "Who said CoDel is a panacea anyway?" CoDel isn't really applicable on hi-speed links, certainly not if one wants the main benefit of being insensitive to the chosen hard-coded parameters. Even in the CoDel paper's simulations, CoDel gave higher queuing delay than RED in all the cases shown except the lowest link speed experiments. CoDel's 100ms target would be a crazy delay target in a high speed network. To deploy CoDel in hi-speed networks, you have to tune the parameters away from the hard-coded ones, which loses the only benefit it has.
> ]
> 
> 
> Bob

In discussing CoDel, it is important to realise that the 100ms interval is not a delay target, nor is the 5ms.  They're both measurement intervals, and they pretty much form bounds on how long the algorithm will allow the queueing delay to spike above whatever value CoDel is going to converge to on that link.  The 100ms interval was chosen so that CoDel would still be able to converge even on the longest-delay links likely in the Internet (it degrades somewhat gracefully when the RTT is between 5x and 8x the interval, but performs horribly if the RTT is greater than that).  The queueing delay it converges to depends somewhat on the link characteristics, especially if it is an aggregating link or one that is limited by transmission opportunities rather than by bit rate, but it is generally around the transmission time of a few packets, between two and perhaps 30.  On most realistic links that is much less than 100ms of delay, and on most fast links much less than 5ms.  We've seen the actual delay be under 250 microseconds on 10GbE.
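
To put rough numbers on "the transmission time of a few packets", here is a minimal Python sketch.  The 1500-byte packet size and the list of link rates are assumptions chosen for illustration, not measurements from any deployment; the function only computes back-to-back serialization time and ignores framing overhead, aggregation and transmission-opportunity effects.

```python
# Serialization time of a burst of packets sent back to back on a link.
# Assumptions (illustrative only): 1500-byte packets and the link rates below.

PACKET_BYTES = 1500  # assumed full-size packet


def burst_time_ms(n_packets: int, link_bps: float,
                  packet_bytes: int = PACKET_BYTES) -> float:
    """Return the time in milliseconds to serialize n_packets on the link."""
    bits = n_packets * packet_bytes * 8
    return bits / link_bps * 1000.0


# Hypothetical link rates, in bits per second.
LINKS = {
    "10 Mbit/s": 10e6,
    "100 Mbit/s": 100e6,
    "1 Gbit/s": 1e9,
    "10 Gbit/s": 10e9,
}

if __name__ == "__main__":
    for name, bps in LINKS.items():
        low = burst_time_ms(2, bps)    # lower end: two packets
        high = burst_time_ms(30, bps)  # upper end: thirty packets
        print(f"{name:>11}: 2 pkts = {low:.4f} ms, 30 pkts = {high:.4f} ms")
```

Under these assumptions, 30 full-size packets at 10 Gbit/s serialize in a few tens of microseconds, which is consistent with a sub-250-microsecond queueing delay on 10GbE.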

However, my view is that CoDel itself is just an ingredient in a solution.  CoDel will pick victim packets from flows that are not congesting the queue, and does nothing for RTT fairness.  The two fq_codel variants that are around are much closer to a solution, but even they are strong hints rather than a completely general answer.  The 'fq' in the name is somewhat of a misnomer too, since neither is strictly classical fair queueing.

In an actual deployed situation, fq_codel (Eric Dumazet's version in the Linux kernel) is magic.  The interactive feel of a network where the bottleneck link is managed by fq_codel in both directions is vastly superior, RTT fairness is vastly superior, VoIP and video conferencing just work alongside throughput-maximising traffic, and in actual practice the problems of poor queue management are in fact fixed.

I've deployed fq_codel into four different networks now, three homes (one DSL, two cable) and one 120-odd person office (gigabit fiber ethernet), and in every case there was an immediate and noticeable improvement in web latency, VoIP and video call quality, and in the residential environments game transports benefited spectacularly as well.  The office required a transparent filtering web proxy, and the web browsing behaviour there is now totally dominated by the proxy's decision latency.  Even then, the latency is so much better than without fq_codel, it was a very noticeable benefit.

I'll note that the residential deployments I'm referring to do use the trick of rate-limiting both directions to a fraction less than the available bandwidth so as to gain control of the bottleneck queue.  This loses about 10% of the bandwidth, but you gain so much from fq_codel that it is a very worthwhile tradeoff.  Actual observed downloads usually complete faster due to fewer stalls anyway.
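
As a rough illustration of the rate-limiting trick, the Python sketch below computes a shaped rate about 10% under a measured link rate and builds Linux tc commands for the egress side.  Several things here are assumptions rather than a description of the deployments above: the use of HTB as the shaper, the interface name, and the measured rate are all hypothetical.  Shaping the inbound direction needs additional redirection that is not shown.

```python
from typing import List


def shaped_rate_kbit(measured_kbit: float, fraction: float = 0.90) -> int:
    """Rate to shape to, as a fraction of the measured link rate (kbit/s)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return round(measured_kbit * fraction)


def egress_tc_commands(dev: str, measured_kbit: float,
                       fraction: float = 0.90) -> List[str]:
    """Build tc commands: an HTB class below link rate, fq_codel beneath it.

    HTB is an assumed choice of shaper; the point is only that the local
    queue, not the modem's, becomes the bottleneck.
    """
    rate = shaped_rate_kbit(measured_kbit, fraction)
    return [
        f"tc qdisc replace dev {dev} root handle 1: htb default 10",
        f"tc class add dev {dev} parent 1: classid 1:10 htb rate {rate}kbit",
        f"tc qdisc add dev {dev} parent 1:10 fq_codel",
    ]


if __name__ == "__main__":
    # Hypothetical example: a 1000 kbit/s uplink on interface "eth0".
    for cmd in egress_tc_commands("eth0", 1000):
        print(cmd)
```

The commands are only printed, not executed, so the sketch can be checked before anything is applied to a real interface.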

In every case, the improvement was dramatic enough to be observed by non-technical users. In the office case the mostly very technical user base were not told of the deployment in advance, but most users noticed that something had changed immediately, just from the improvement in browsing and VoIP experience.

So, proof by example: It is possible to do much, much better with AQM.  A good AQM can spectacularly improve coexistence of very different transports sharing a bottleneck link, and in particular can allow throughput maximising and real-time traffic to share a link with very little impact on each other (yes, the real-time jitter does degrade slightly in the presence of other traffic, but this is well within the bounds of acceptable even on a fairly slow DSL link).

Andrew

> 
> At 13:57 11/10/2012, Tschofenig, Hannes (NSN - FI/Espoo) wrote:
>> Hi Jukka,
>> 
>> With QoS there is no such thing as "short term" (as we already had to
>> learn painfully).
>> 
>> That makes me wonder what is more useful: publish a document that
>> describes how to configure RED correctly or to go for an AQM technique
>> that (according to the authors) works with almost zero configuration.
>> 
>> My take-away from the Vancouver congestion control workshop was that it
>> is better to ignore RED. (Maybe my impression was wrong -- someone else
>> should confirm.)
>> 
>> Ciao
>> Hannes
>> 
>> 
>> > -----Original Message-----
>> > From: ext Manner Jukka [mailto:jukka.manner@aalto.fi]
>> > Sent: Thursday, October 11, 2012 3:50 PM
>> > To: Tschofenig, Hannes (NSN - FI/Espoo)
>> > Cc: Bob Briscoe; <gorry@erg.abdn.ac.uk>; <philip.eardley@bt.com>;
>> > <tsvwg-chairs@ietf.org>; tsvwg IETF list
>> > Subject: Re: [tsvwg] [REVISED I-D]: Document writeup for draft-ietf-
>> > tsvwg-byte-pkt-congest
>> >
>> > Hi, this document talks about the current and short term goals and
>> > operations, while AFAIK CoDel is a more futuristic concept, interesting
>> > concept, sure, but not deployed at large.
>> >
>> > cheers,
>> > Jukka
>> >
>> > On Oct 11, 2012, at 3:44 PM, Tschofenig, Hannes (NSN - FI/Espoo) wrote:
>> >
>> > > Hi Jukka, Hi Bob,
>> > >
>> > > having listened to the presentations about CoDel I am wondering whether
>> > > this document may have been overtaken by events and is obsolete by now.
>> > > What's your view on that?
>> > >
>> > > Ciao
>> > > Hannes
>> > >
>> > >
>> > >> -----Original Message-----
>> > >> From: tsvwg-bounces@ietf.org [mailto:tsvwg-bounces@ietf.org] On Behalf
>> > >> Of ext Manner Jukka
>> > >> Sent: Thursday, October 11, 2012 3:34 PM
>> > >> To: Bob Briscoe
>> > >> Cc: <gorry@erg.abdn.ac.uk>; <philip.eardley@bt.com>;
>> > >> <tsvwg-chairs@ietf.org>; tsvwg IETF list
>> > >> Subject: Re: [tsvwg] [REVISED I-D]: Document writeup for
>> > >> draft-ietf-tsvwg-byte-pkt-congest
>> > >>
>> > >> Hi,
>> > >>
>> > >> Yeah, please at the point about cheap devices, still, I wonder what the
>> > >> cost of AQM is on those devices. Maybe 1 cnt in mass production? Thus,
>> > >> I still wonder if it is worth arguing something about the future that
>> > >> strongly.
>> > >>
>> > >> Sorry, but the same goes for AQM in general, and where it is, or will
>> > >> not be deployed. I understand technical reasons why something is not
>> > >> feasible, but business or monetary reasons can change very quickly. ;)
>> > >>
>> > >> Jukka
>> > >>
>> > >> On Oct 11, 2012, at 3:08 PM, Bob Briscoe wrote:
>> > >>
>> > >>> Jukka,
>> > >>>
>> > >>> What I had in mind is all low level buffers in cheap pure L2 devices,
>> > >>> in cheap NATs etc. I was going to list some of these, but instead chose
>> > >>> to say "no-one expects AQM to be universally deployed". This clearly
>> > >>> missed the mark.
>> > >>>
>> > >>> Should we include some of these concrete examples of where AQM won't
>> > >>> be deployed instead?
>> > >>>
>> > >>> [I've cc'd the list, given we're starting to wordsmith significantly
>> > >>> new text added since WG last call.]
>> > >>>
>> > >>>
>> > >>> Bob
>> > >>>
>> > >>> At 20:04 10/10/2012, Manner Jukka wrote:
>> > >>>> Hi, looked good to me. Just one comment, I wouldn't put that much
>> > >>>> emphasis on what is a probable state of things, e.g.
>> > >>>>
>> > >>>> "no-one expects AQM to ever be .."  > in the future it might, we
>> > >>>> don't know, and since the paragraph says that it doesn't matter anyway,
>> > >>>> I would formulate this paragraph a bit towards that we have partial
>> > >>>> deployment now and even with a full deployment, ...
>> > >>>>
>> > >>>> and similarly with the unrealistic deployment of AQM in the
>> > >>>> subsequent paragraph.
>> > >>>>
>> > >>>> cheers,
>> > >>>> Jukka
>> > >>>>
>> > >>>> On Oct 10, 2012, at 6:25 PM, Bob Briscoe wrote:
>> > >>>>
>> > >>>>> Phil,
>> > >>>>>
>> > >>>>> Sorry for delay, my round-robin scheduler is taking so long to get
>> > >>>>> round all the tasks that I keep finding dead robins (that reminds me of
>> > >>>>> a bike ride my grandmother took. On her return, she said there were a
>> > >>>>> worryingly large number of robins run over on the roads. It turned out
>> > >>>>> that she had been going round the same small triangle of roads multiple
>> > >>>>> times).
>> > >>>>>
>> > >>>>> I've done a complete re-write of 3.4. I hope you will agree that
>> > >>>>> it really is a motivating argument now and it can stay where it is -
>> > >>>>> it is not just a preamble.
>> > >>>>>
>> > >>>>> I've gone back to the very first version of 3.4 I wrote and I now
>> > >>>>> realise that I had written the core of the argument so badly that even
>> > >>>>> I forgot what it was meant to say when we started editing it. In the
>> > >>>>> draft-08 version, the original meaning is lost and in your re-wording
>> > >>>>> it is entirely lost. The link with the draft-08 version is nearly
>> > >>>>> unrecognisable, but it's the argument I intended here.
>> > >>>>>
>> > >>>>>
>> > >>>>> ======================================================================
>> > >>>>> 3.4.  Permanent Partial Deployment of AQM
>> > >>>>>
>> > >>>>> In overview, the argument in this section runs as follows:
>> > >>>>> * Because the network will not and cannot always drop packets in
>> > >>>>>   proportion to their size, it shouldn't be given the task of making
>> > >>>>>   drop signals depend on packet size at all.
>> > >>>>> * Transports on the other hand don't always want to make their
>> > >>>>>   rate response proportional to the size of dropped packets, but if
>> > >>>>>   they want to, they always can.
>> > >>>>>
>> > >>>>> The argument is similar to the end-to-end argument that says
>> > >>>>> "Don't do X in the network if end-systems can do X by themselves, and
>> > >>>>> they want to be able to choose whether to do X anyway." Actually the
>> > >>>>> following argument is stronger; in addition it says "Don't give the
>> > >>>>> network task X that could be done by the end-systems, if X won't ever
>> > >>>>> be deployed on all network nodes, and end-systems won't be able to tell
>> > >>>>> whether their network is doing X, or whether they need to do X
>> > >>>>> themselves." In this case, the X in question is "making the response to
>> > >>>>> congestion depend on packet size".
>> > >>>>>
>> > >>>>> We will now re-run this argument taking each step in more depth.
>> > >>>>> The argument applies solely to drop, not ECN marking.
>> > >>>>>
>> > >>>>> A queue drops packets for either of two reasons: a) to signal to
>> > >>>>> host congestion controls that they should reduce the load and b)
>> > >>>>> because there is no buffer left to store the packets. Active queue
>> > >>>>> management tries to use drops as a signal for hosts to slow down (case
>> > >>>>> a) so that drop due to buffer exhaustion (case b) should not be
>> > >>>>> necessary.
>> > >>>>>
>> > >>>>> No-one expects AQM to ever be universally deployed in every queue
>> > >>>>> in the Internet; and, even if AQM were universal, it has to be able to
>> > >>>>> cope with buffer exhaustion (by switching to a behaviour like
>> > >>>>> tail-drop), in order to cope with unresponsive or excessive transports.
>> > >>>>> Therefore networks will often be dropping packets as a last resort
>> > >>>>> (case b) rather than under AQM control (case a).
>> > >>>>>
>> > >>>>> When buffers are exhausted (case b), they don't naturally drop
>> > >>>>> packets in proportion to their size. The network can only reduce the
>> > >>>>> probability of dropping smaller packets if it has enough space to store
>> > >>>>> them somewhere while it waits for a larger packet that it can drop. If
>> > >>>>> the buffer is exhausted, it does not have this choice. Admittedly
>> > >>>>> tail-drop does naturally drop somewhat fewer small packets, but exactly
>> > >>>>> how few depends more on the mix of sizes than the size of the packet in
>> > >>>>> question. Nonetheless, in general, if we wanted networks to do
>> > >>>>> size-dependent drop, we would need universal deployment of (packet-size
>> > >>>>> dependent) AQM code, which is unrealistic.
>> > >>>>>
>> > >>>>> A host transport cannot know whether any particular drop was a
>> > >>>>> deliberate signal from an AQM or a sign of a queue shedding packets due
>> > >>>>> to buffer exhaustion. Therefore, because the network cannot universally
>> > >>>>> do size-dependent drop, it should not do it at all.
>> > >>>>>
>> > >>>>> Whereas universality is desirable in the network, diversity is
>> > >>>>> desirable between different transport layer protocols - some, like
>> > >>>>> NewReno TCP [RFC5681], may not choose to make their rate response
>> > >>>>> proportionate to the size of each dropped packet, while others will
>> > >>>>> (e.g. TFRC-SP [RFC4828]).
>> > >>>>>
>> > >>>>> ======================================================================
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> Bob
>> > >>>>>
>> > >>>>> At 13:13 04/10/2012, gorry@erg.abdn.ac.uk wrote:
>> > >>>>>> That would be brilliant!
>> > >>>>>>
>> > >>>>>> Gorry
>> > >>>>>>
>> > >>>>>>> Will try to look at this later today.
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> Bob
>> > >>>>>>>
>> > >>>>>>> At 10:42 02/10/2012, philip.eardley@bt.com wrote:
>> > >>>>>>>> Think the token is with bob
>> > >>>>>>>>
>> > >>>>>>>> Re your comment - yes, I think you could move 3.4 to the end of S3 -
>> > >>>>>>>> or to the start - and do some re-phrasing, maybe something like I tried.
>> > >>>>>>>>
>> > >>>>>>>> -----Original Message-----
>> > >>>>>>>> From: Manner Jukka [mailto:jukka.manner@aalto.fi]
>> > >>>>>>>> Sent: 01 October 2012 20:18
>> > >>>>>>>> To: Eardley,PL,Philip,DUB8 R; Briscoe,RJ,Bob,DUB8 R
>> > >>>>>>>> Cc: tsvwg-chairs@ietf.org; Gorry Fairhurst
>> > >>>>>>>> Subject: Re: [REVISED I-D]: Document writeup for
>> > >>>>>>>> draft-ietf-tsvwg-byte-pkt-congest
>> > >>>>>>>>
>> > >>>>>>>> Phil, Bob, can we close this finally? It's three weeks now since
>> > >>>>>>>> this previous message.
>> > >>>>>>>>
>> > >>>>>>>> Jukka
>> > >>>>>>>>
>> > >>>>>>>> On Sep 10, 2012, at 4:42 PM, Jukka Manner wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Ack, I'm getting there...;)
>> > >>>>>>>>>
>> > >>>>>>>>> So, we could move 3.4 to the end of Section 3 and rephrase it to
>> > >>>>>>>>> not talk about scaling per se, but rather about how Req 2.2
>> > >>>>>>>>> eventually leads to Req 2.3?
>> > >>>>>>>>>
>> > >>>>>>>>> Jukka
>> > >>>>>>>>>
>> > >>>>>>>>> On Sep 7, 2012, at 4:07 PM, <philip.eardley@bt.com>
>> > >>>>>>>>> <philip.eardley@bt.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> 3.4 is, in my mind, completely different from 3.1, 3.2, 3.3 and 3.5.
>> > >>>>>>>>>>
>> > >>>>>>>>>> 3.1, 3.2, 3.3 and 3.5 present motivating arguments for Recommendation
>> > >>>>>>>>>> 2.2 & to a lesser extent 2.3.
>> > >>>>>>>>>> 3.4 doesn't make a motivating argument - it explains that Rec
>> > >>>>>>>>>> 2.2 & 2.3 come as a pair (if you recommend one, then inevitably you
>> > >>>>>>>>>> recommend the other) (delta some politeness about TCP)
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>> -----Original Message-----
>> > >>>>>>>>>> From: Manner Jukka [mailto:jukka.manner@aalto.fi]
>> > >>>>>>>>>> Sent: 07 September 2012 13:24
>> > >>>>>>>>>> To: Eardley,PL,Philip,DUB8 R
>> > >>>>>>>>>> Cc: Briscoe,RJ,Bob,DUB8 R; <tsvwg-chairs@ietf.org>;
>> > >>>>>>>>>> <gorry@erg.abdn.ac.uk>
>> > >>>>>>>>>> Subject: Re: [REVISED I-D]: Document writeup for
>> > >>>>>>>>>> draft-ietf-tsvwg-byte-pkt-congest
>> > >>>>>>>>>>
>> > >>>>>>>>>> Hi, I'm trying to follow your reasoning, with little luck. ;)
>> > >>>>>>>>>>
>> > >>>>>>>>>> After having read Section 3 again, I could think of a few
>> > >>>>>>>>>> updates, but I'm not fully following what you want to see:
>> > >>>>>>>>>>
>> > >>>>>>>>>> - S3.5 is kind of small and lonely by itself. It could be either
>> > >>>>>>>>>> longer or merged somewhere. Not a major issue though and could
>> > >>>>>>>>>> remain as is.
>> > >>>>>>>>>>
>> > >>>>>>>>>> - S3.3: Yes, probably could be turned upside down, good first,
>> > >>>>>>>>>> bad later. Don't have a strong opinion.
>> > >>>>>>>>>>
>> > >>>>>>>>>> - S3.4: Here the scaling is used many times and in the title,
>> > >>>>>>>>>> although we talk about the end host side. This could be rephrased
>> > >>>>>>>>>> and slightly reworded to talk about hosts and their operation
>> > >>>>>>>>>> altogether. Also the word scaling can turn the reader's expectation
>> > >>>>>>>>>> into the wrong direction.
>> > >>>>>>>>>>
>> > >>>>>>>>>> - S3 altogether: It would help to have introductory text about
>> > >>>>>>>>>> networks vs. end host side, but I fail to see why (& how) to
>> > >>>>>>>>>> restructure everything.
>> > >>>>>>>>>>
>> > >>>>>>>>>> cheers,
>> > >>>>>>>>>> Jukka
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Sep 6, 2012, at 6:18 PM, <philip.eardley@bt.com> <philip.eardley@bt.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> Ps otherwise Bob's comments are fine, though I suggest for S3.5 you
>> > >>>>>>>>>>> give more context (ie are you knocking down what you recommend
>> > >>>>>>>>>>> against, or building up what you recommend?)  and in S3.3 think
>> > >>>>>>>>>>> would be better if you re-structured slightly [first talk about what
>> > >>>>>>>>>>> you like - penult para + 1st sentence of last para - then talk about
>> > >>>>>>>>>>> what you don't like - rest of text]
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> From: Eardley,PL,Philip,DUB8 R
>> > >>>>>>>>>>> Sent: 06 September 2012 16:12
>> > >>>>>>>>>>> To: Briscoe,RJ,Bob,DUB8 R; Manner Jukka
>> > >>>>>>>>>>> Cc: tsvwg-chairs@ietf.org; Gorry Fairhurst
>> > >>>>>>>>>>> Subject: RE: [REVISED I-D]: Document writeup for
>> > >>>>>>>>>>> draft-ietf-tsvwg-byte-pkt-congest
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Thanks bob. Have now re-booted some state.
>> > >>>>>>>>>>> I don't like S3.4. My basic problem with it is that it isn't a
>> > >>>>>>>>>>> "motivating argument" (S3 title). It's an explanation about the
>> > >>>>>>>>>>> consequences of assuming either [option 1] do pkt-size biasing in
>> > >>>>>>>>>>> the transport; [option 2] do pkt-size biasing in the nw.
>> > >>>>>>>>>>> Perhaps your effort to buy peace with those who won't change
>> > >>>>>>>>>>> TCP has obfuscated the basic point?
>> > >>>>>>>>>>> The title of S3.4 (Scaling...) and the early text "scaling
>> > >>>>>>>>>>> argument" suggests that you're going to argue there is some reason
>> > >>>>>>>>>>> why X scales as order N rather than N^2 or something like that -
>> > >>>>>>>>>>> which isn't what S3.4 is about.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I actually think it would be better to put S3.4 at the start of
>> > >>>>>>>>>>> S3 - it's background where you say there are 2 options. After that,
>> > >>>>>>>>>>> you can explain the reasons why option 1 [pkt biasing in the
>> > >>>>>>>>>>> transport] is better - the reasons being:-  S3.1 (security etc
>> > >>>>>>>>>>> attacks if small pkts get preferential treatment), S3.2 (really a
>> > >>>>>>>>>>> side note to S3.1; ie doesn't really present a "motivating
>> > >>>>>>>>>>> argument"), S3.3 (nw guessing MTU vs transport knowing pkt size),
>> > >>>>>>>>>>> S3.5 (easier implementation).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Something like...
>> > >>>>>>>>>>> The behaviour of the network and transport together needs to
>> > >>>>>>>>>>> alleviate congestion. We assume the network is bit-congestible
>> > >>>>>>>>>>> (currently packet-congestible resources are rare). Therefore
>> > >>>>>>>>>>> overall the response of the {network + transport} should be the
>> > >>>>>>>>>>> same for the same congestion - whether the congestion is caused by
>> > >>>>>>>>>>> a lot of small packets or a smaller number of larger packets. The
>> > >>>>>>>>>>> implication is that the transport's behaviour should depend on
>> > >>>>>>>>>>> whether the network takes account (or not) of packet size when it
>> > >>>>>>>>>>> generates congestion indications:-
>> > >>>>>>>>>>> Case 1: If the network implements packet-mode drop, ie its
>> > >>>>>>>>>>> algorithm treats all packets equally regardless of their size,
>> > >>>>>>>>>>> then the transport should take account of the size of the packet -
>> > >>>>>>>>>>> in other words, the transport's congestion response should depend
>> > >>>>>>>>>>> on the number of dropped bytes.
>> > >>>>>>>>>>> Case 2: If the network implements byte-mode drop, ie its
>> > >>>>>>>>>>> algorithm treats packets differently depending on their size, then
>> > >>>>>>>>>>> the transport should not take account of the size of the packet -
>> > >>>>>>>>>>> in other words, the transport's congestion response should depend
>> > >>>>>>>>>>> on the number of dropped packets.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> These cases are explained in more detail in sub-section below.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> In both cases the same discussion applies if packets are
>> > >>>>>>>>>>> ECN-marked rather than dropped.
>> > >>>>>>>>>>> In the following sections we explain why Case 1 is preferable,
>> > >>>>>>>>>>> and hence why we make the Recommendations of S2.2 and S3.3
>> > >>>>>>>>>>> (respectively, drop/mark pkts equally regardless of their size, and
>> > >>>>>>>>>>> the strength of the transport's response should be proportionate to
>> > >>>>>>>>>>> the size of the dropped/marked pkt).
>> > >>>>>>>>>>> Note that TCP responds to dropped or marked packets, which differs
>> > >>>>>>>>>>> from our recommendation for transports. However, we are not
>> > >>>>>>>>>>> recommending that TCP should be changed; to date it hasn't been a
>> > >>>>>>>>>>> significant problem because most TCP implementations have been
>> > >>>>>>>>>>> used with similar packet sizes. But we do recommend that future
>> > >>>>>>>>>>> transport protocols should respond to dropped or marked bytes (as
>> > >>>>>>>>>>> for example TFRC-SP [RFC4828] effectively does).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Sub-section
>> > >>>>>>>>>>> You could use much of S3.3, I suggest supplement with numerical
>> > >>>>>>>>>>> examples continuing S1.2
>> > >>>>>>>>>>> --
>> > >>>>>>>>>>> << This gives a get-out clause to any transport that isn't
>> > >>>>>>>>>>> packet-size dependent (e.g. current TCP). Lacking packet-size
>> > >>>>>>>>>>> dependence just means they don't scale correctly with packet size -
>> > >>>>>>>>>>> but they are at liberty not to scale with packet size (particularly
>> > >>>>>>>>>>> given MTU growth isn't often being used to increase link rates at
>> > >>>>>>>>>>> this stage in history).>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> This is introducing a different point. it's saying that your
>> > >>>>>>>>>>> recommendation 2 [drop/mark pkts equally regardless of their size]
>> > >>>>>>>>>>> is a really strong reco whilst recommendation 3 [strength of
>> > >>>>>>>>>>> transport's response proportionate to the size of the
>> > >>>>>>>>>>> dropped/marked pkt] is a much less important reco.  I can see this
>> > >>>>>>>>>>> might be true, but I don't think the i-d talks about this.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> --
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> From: Briscoe,RJ,Bob,DUB8 R
>> > >>>>>>>>>>> Sent: 05 September 2012 16:41
>> > >>>>>>>>>>> To: Manner Jukka
>> > >>>>>>>>>>> Cc: tsvwg-chairs@ietf.org; Gorry Fairhurst;
>> > >> Eardley,PL,Philip,DUB8 R
>> > >>>>>>>>>>> Subject: Re: [REVISED I-D]: Document writeup for
>> > >>>>>>>>>>> draft-ietf-tsvwg-byte-pkt-congest
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Jukka,
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I've repeated Phil's suggestions as quoted email then made some
>> > >>>>>>>>>>> alterations.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Phil asked me to send this, then he will think about it.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I'll respond about John Leslie's argument next...
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> 2.2. Recommendation on Encoding Congestion Notification
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [...]
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> 1.  AQM algorithms such as RED SHOULD use packet-mode drop, ie they
>> > >>>>>>>>>>>     SHOULD NOT use byte-mode drop. The latter is more complex,
>> > >>>>>>>>>>>     it creates the perverse incentive to fragment segments into tiny
>> > >>>>>>>>>>>     pieces and it is vulnerable to floods of small packets.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> 2.  If a vendor has implemented byte-mode drop, and an operator has
>> > >>>>>>>>>>>     turned it on, it is RECOMMENDED to turn it off, after establishing
>> > >>>>>>>>>>>     if there are any implications on the relative performance of
>> > >>>>>>>>>>>     applications using different packet sizes.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>     RED as a whole SHOULD NOT be turned off. Without RED, a drop tail
>> > >>>>>>>>>>>     queue biases against large packets
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [BB adds]:                                 and it is vulnerable to floods
>> > >>>>>>>>>>>      of small-packets.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> (I suggest the last para just qualifies item 2. It doesn't warrant
>> > >>>>>>>>>>> another item number.)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> <Agree with Phil; Don't indent "Note well...">
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [PE:] S3.3
>> > >>>>>>>>>>> The section could be slightly clearer. Starting with the second
>> > >>>>>>>>>>> para, where you start talking about something you don't recommend
>> > >>>>>>>>>>> (ie you have to remember that you're talking about the consequences
>> > >>>>>>>>>>> of not following your recommendation).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> 3.3 Transport-Independent Network
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> TCP congestion control ensures that flows competing for
>> > >> the same
>> > >>>>>>>>>>> resource each maintain the same number of segments in
>> > >> flight,
>> > >>>>>>>>>>> irrespective of segment size.  So under similar
>> > >> conditions, flows
>> > >>>>>>>>>>> with different segment sizes will get different
>> > > bit-rates.
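
To put rough numbers on the paragraph above, here is a small sketch. It assumes two flows that each keep the same number of segments in flight over the same round-trip time. The window of 10 segments, the 100 ms RTT and the function name are illustrative assumptions, not values from the draft.

```python
def bit_rate(segments_in_flight: int, segment_bytes: int, rtt_seconds: float) -> float:
    """Approximate throughput in bits per second: one window per round trip."""
    return segments_in_flight * segment_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    window = 10   # same number of segments in flight for both flows (assumed)
    rtt = 0.1     # same round-trip time in seconds (assumed)
    for size in (60, 1500):
        print(f"{size}-byte segments: {bit_rate(window, size, rtt):.0f} bit/s")
```

Under these assumptions the flow with 1500-byte segments gets about 25 times the bit-rate of the flow with 60-byte segments, which is the effect the draft text describes.
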
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [OLD:]
>> > >>>>>>>>>>> One motivation for the network biasing congestion
>> > >> notification by
>> > >>>>>>>>>>> packet size is to counter this effect and try to equalise
>> > >> the bit-
>> > >>>>>>>>>>> rates of flows with different packet sizes.
>> > >>>>>>>>>>> [SUGGESTED:]
>> > >>>>>>>>>>> To counter this effect it seems tempting not to follow
>> > >> our
>> > >>>>>>>>>>> recommendation,  and instead for the network to bias
>> > >> congestion
>> > >>>>>>>>>>> notification by  packet size in order to equalise the
>> > > bit-
>> > >> rates of
>> > >>>>>>>>>>> flows with different  packet sizes.
>> > >>>>>>>>>>> [CONTINUE AS BEFORE:]
>> > >>>>>>>>>>>               However, in order to do  this, the queuing
>> > >> algorithm
>> > >>>>>>>>>>> has to make assumptions about the  transport, which
>> > > become
>> > >> embedded
>> > >>>>>>>>>>> in the network.  Specifically:
>> > >>>>>>>>>>> [...]
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> S3.3
>> > >>>>>>>>>>> o  The queuing algorithm has to assume how aggressively
>> > >> the
>> > >>>>>>>> transport
>> > >>>>>>>>>>>    will respond to congestion (see Section 4.2.4).  If
>> > >> the network
>> > >>>>>>>>>>>    assumes the transport responds as aggressively as TCP
>> > >> NewReno,
>> > >>>>>>>> it
>> > >>>>>>>>>>>    will be wrong for Compound TCP and differently wrong
>> > >> for Cubic
>> > >>>>>>>>>>>    TCP, etc.  To achieve equal bit-rates, each transport
>> > >> then has
>> > >>>>>>>> to
>> > >>>>>>>>>>>    guess what assumption the network made, and work out
>> > >> how to
>> > >>>>>>>>>>>    replace this assumed aggressiveness with its own
>> > >> aggressiveness.
>> > >>>>>>>>>>> o  Also, if the network biases congestion notification
>> > > by
>> > >> packet
>> > >>>>>>>> size
>> > >>>>>>>>>>>    it has to assume a baseline packet size--all proposed
>> > >> algorithms
>> > >>>>>>>>>>>    use the local MTU.  Then transports have to guess
>> > >> which link was
>> > >>>>>>>>>>>    congested and what its local MTU was, in order to
>> > > know
>> > >> how to
>> > >>>>>>>>>>>    tailor their congestion response to that link.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> So you're saying that for byte-mode drop, the network
>> > >> element
>> > >>>>>>>> has to compare the size of the current pkt with the MTU, and
>> > >> if
>> > >>>>>>>> it's 25* smaller then reduce the pkt loss probability by 25*?
>> > > And
>> > >> if
>> > >>>>>>>> the MTU was bigger, then the current pkt would be dropped
>> > > with
>> > >>>>>>>> greater probability? And both of these things are hard to do.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [BB]: SUGGESTED REPLACEMENT to 2nd bullet:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> o  Also, if the network biases congestion notification
>> > > by
>> > >> packet
>> > >>>>>>>> size
>> > >>>>>>>>>>>    it has to assume a baseline packet size--all proposed
>> > >> algorithms
>> > >>>>>>>>>>>    use the local MTU (for example see the byte-mode loss
>> > >>>>>>>> probability
>> > >>>>>>>>>>>    formula in Table 1).  Then if the non-Reno transports
>> > >> mentioned
>> > >>>>>>>> above
>> > >>>>>>>>>>>    are trying to reverse engineer what the network
>> > >> assumed, they
>> > >>>>>>>> also
>> > >>>>>>>>>>>    have to guess the MTU of the congested link.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [Aside for Phil: The byte-mode algo uses MTU as its
>> > >> baseline because
>> > >>>>>>>>>>> a link cannot send a packet bigger than its MTU so the
>> > >> algo won't
>> > >>>>>>>>>>> ever give a drop probability >1.]
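
The exchange above can be sketched as follows. This assumes the byte-mode formula simply scales the base drop probability by packet size over MTU, which is how the thread describes it. The function names and the example values (base probability 0.1, MTU 1500, 60-byte packet) are illustrative assumptions and are not taken from Table 1 of the draft.

```python
def packet_mode_drop(p_base: float, pkt_size: int, mtu: int) -> float:
    """Packet-mode drop: the probability does not depend on packet size."""
    return p_base


def byte_mode_drop(p_base: float, pkt_size: int, mtu: int) -> float:
    """Byte-mode drop (assumed form): scale the base probability by pkt_size / mtu.

    Since a link cannot send a packet larger than its MTU, the scale factor
    is at most 1, so the result never exceeds p_base.
    """
    if not 0 < pkt_size <= mtu:
        raise ValueError("packet size must be in (0, mtu]")
    return p_base * pkt_size / mtu


if __name__ == "__main__":
    mtu = 1500
    for size in (60, 1500):
        print(size, packet_mode_drop(0.1, size, mtu), byte_mode_drop(0.1, size, mtu))
```

With these numbers a 60-byte packet is 25 times smaller than the MTU, so its byte-mode drop probability is 25 times lower than that of a full-size packet, while packet-mode drop treats both the same.
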
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> S3.4 (& knock-on effect on 3.3)
>> > >>>>>>>>>>> I think this could be better.
>> > >>>>>>>>>>> I agree with your arguments in S3.1 & 3.2 that dropping
>> > > or
>> > >>>>>>>> marking should be independent of pkt size.
>> > >>>>>>>>>>> You now go into a couple of pages, which I think could be
>> > >> boiled
>> > >>>>>>>>>>> down to a paragraph, saying something like:-
>> > >>>>>>>>>>> * the overall system (network + transport) has to be
>> > >> packet-size
>> > >>>>>>>>>>> dependent in order to drive out the right amount of
>> > >> traffic so the
>> > >>>>>>>>>>> resource becomes uncongested (since we assume all
>> > >> resources are in
>> > >>>>>>>>>>> practice byte rather than packet limited)
>> > >>>>>>>>>>> * from S3.1 and S3.2 the IETF is convinced that the
>> > >> network
>> > >>>>>>>> should notify independent of packet-size.
>> > >>>>>>>>>>> * therefore the transport must act dependent on packet-
>> > >> size.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> It seems to me this is a necessary consequence of your
>> > >> earlier
>> > >>>>>>>> assumptions /arguments.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [BB:] This isn't at all obvious. Until I worked it
>> > >> through, it
>> > >>>>>>>> wasn't obvious to me. I think it needs to be spelled out.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [On the other hand, if someone disagrees with the
>> > >> arguments and
>> > >>>>>>>>>>> thinks that the network should notify dependent on
>> > > packet-
>> > >> size, then
>> > >>>>>>>>>>> it follows that the transport can act independent of
>> > >> packet-size.]
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I think in S3.3 you're trying to make another argument
>> > >> (?):-
>> > >>>>>>>>>>> * we can either do packet-size biasing in the network
>> > >> element or on
>> > >>>>>>>>>>> the end host transport (see argument above)
>> > >>>>>>>>>>> * packet-size biasing in the network is quite tricky,
>> > >> because
>> > >>>>>>>> then both the network element & the transport have to make
>> > >> some
>> > >>>>>>>> guesses about MTU - whereas with packet-size biasing in the
>> > >>>>>>>> transport, neither network element nor transport needs to
>> > > know
>> > >> what the
>> > >>>>>>>> MTU is.
>> > >>>>>>>>>>> * it is also easier to implement packet-size biasing in
>> > >> the
>> > >>>>>>>> transport than in the network.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [BB] ...and all the other arguments about vulnerability
>> > > to
>> > >>>>>>>> flooding etc, transport-independence etc.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Our argument is sort of like how Phil summarises it, but
>> > >> with
>> > >>>>>>>> an added nuance. It's also saying no-one has to do
>> > > packet-size
>> > >>>>>>>> biasing anyway. It's not saying "The transport should given
>> > >> the
>> > >>>>>>>> network shouldn't". It's saying "*If the transport wants to
>> > >> claim
>> > >>>>>>>> to be scalable wrt packet size* it should if the network
>> > >> shouldn't".
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> This gives a get-out clause to any transport that isn't
>> > >>>>>>>> packet-size dependent (e.g. current TCP). Lacking packet-size
>> > >>>>>>>> dependence just means they don't scale correctly with packet
>> > >> size -
>> > >>>>>>>> but they are at liberty not to scale with packet size
>> > >> (particularly
>> > >>>>>>>> given MTU growth isn't often being used to increase link
>> > > rates
>> > >> at
>> > >>>>>>>> this stage in history).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> None of what Phil says convinces me we could say 3.4 (or
>> > >> 3.3)
>> > >>>>>>>> any more briefly or differently. It's easier to summarise
>> > > what
>> > >> you
>> > >>>>>>>> understand having read something. But that's not the same as
>> > >>>>>>>> writing the thing that helped you understand. There doesn't
>> > >> seem to
>> > >>>>>>>> be any argument for why anything needs to be changed, so I'm
>> > >> going
>> > >>>>>>>> to leave it as is, unless pushed.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On the last bullet, I didn't really follow your argument
>> > >> in
>> > >>>>>>>> S3.5. Might be easier to follow if S3.5 also talked about how
>> > >> a
>> > >>>>>>>> network element would take account of pkt size?
>> > >>>>>>>>>>> Also, does the following argument also apply?:
>> > >>>>>>>>>>> * if pkt-size biasing is done in the transport, then the
>> > >> work (to do
>> > >>>>>>>>>>> the biasing) is spread over many end hosts,
>> > >>>>>>>>>>> * whereas if pkt-size biasing is done in the network,
>> > > then
>> > >> the
>> > >>>>>>>> work is concentrated in just the congested network element,
>> > >> which
>> > >>>>>>>> has to operate on all the traffic at line speed.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [BB]: 3.5. Implementation Efficiency
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [ADD:]
>> > >>>>>>>>>>> Biasing against large packets typically requires an
>> > > extra
>> > >> multiply
>> > >>>>>>>>>>> and  divide in the network (see the example byte-mode
>> > > drop
>> > >>>>>>>> formula in table 1).
>> > >>>>>>>>>>> [CONTINUE AS BEFORE:]
>> > >>>>>>>>>>> Allowing for packet size at the transport rather than in
>> > >> the
>> > >>>>>>>>>>> network  ensures that neither the network nor the
>> > >> transport needs to
>> > >>>>>>>>>>> do a  multiply operation--multiplication by packet size
>> > > is
>> > >>>>>>>>>>> effectively  achieved as a repeated add when the
>> > > transport
>> > >> adds to
>> > >>>>>>>>>>> its count of  marked bytes as each congestion event is
>> > > fed
>> > >> to it.
>> > >>>>>>>>>>> [ADD:]
>> > >>>>>>>>>>> Also the work to do the biasing is spread over many
>> > >> hosts, rather
>> > >>>>>>>>>>> than  concentrated in just the congested network element.
>> > >>>>>>>>>>> [CONTINUE AS BEFORE:]
>> > >>>>>>>>>>>
>> > > These
>> > >> aren't
>> > >>>>>>>>>>> principled reasons in themselves, but they are a happy
>> > >> consequence
>> > >>>>>>>>>>> of the  other principled reasons.
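
The "repeated add" point can be sketched like this. The class and method names are hypothetical, and the sketch only shows the bookkeeping of marked bytes, not a full congestion response.

```python
class CongestedByteCounter:
    """Hypothetical transport-side tally of bytes that were marked or dropped."""

    def __init__(self) -> None:
        self.congested_bytes = 0

    def on_congestion_event(self, packet_bytes: int) -> None:
        # One addition per notified packet; no multiply or divide is needed.
        self.congested_bytes += packet_bytes


if __name__ == "__main__":
    counter = CongestedByteCounter()
    for size in (1500, 1500, 60):
        counter.on_congestion_event(size)
    print(counter.congested_bytes)
```

Each end host does this work only for its own flows, so the cost is spread across hosts instead of landing on the congested network element.
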
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Bob
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> At 13:36 05/09/2012, Manner Jukka wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Hi Bob,
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Since Phil is your colleague at BT, even at Ipswich (?),
>> > >> can
>> > >>>>>>>> you check those and agree with Phil how to go forward? Some
>> > > of
>> > >>>>>>>> those are clear, others need discussion with Phil.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Gorry, on point 2) below, I don't follow Leslie's
>> > >> arguments. I
>> > >>>>>>>> asked him for detailed comments and resolutions months ago,
>> > >> and nothing
>> > >>>>>>>> came.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> cheers,
>> > >>>>>>>>>>> Jukka
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On Sep 5, 2012, at 3:26 PM, Gorry Fairhurst wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> Bob and Jukka,
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> I think I need your help to resolve this... Some of the
>> > >>>>>>>> comments raised by John Leslie appear to have substance, and
>> > >> I'm
>> > >>>>>>>> not sure what you plan to do to address these.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> It would be good to do something quick so that I can
>> > >> complete
>> > >>>>>>>> the submission to the IESG.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> The issues I particularly note are:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> 1) This contribution from Phil appears to be within
>> > > scope
>> > >> and
>> > >>>>>>>> it seems it may have been overlooked (sorry):
>> > >>>>>>>>>>>> http://www.ietf.org/mail-
>> > >> archive/web/tsvwg/current/msg11234.html
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> 2) Can you be clear here about what is actually being
>> > > said
>> > >> -
>> > >>>>>>>> because the point below seems to be a misunderstanding of
>> > >> language
>> > >>>>>>>> to me, and I hoped we did not claim this:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>>> ] Alas, when I read it carefully this week, I find
>> > > that
>> > >> Bob is
>> > >>>>>>>>>>>>>> actually ] saying that transport-layer should _only_
>> > >> consider
>> > >>>>>>>>>>>>>> "congested bytes", not ] "congested packets".
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> 3) You'll need to post the updated draft and send a note
>> > >> to
>> > >>>>>>>> the group to confirm that the new version indeed concludes
>> > > the
>> > >>>>>>>> WGLC. I'll add a note to the writeup to indicate the draft
>> > >> content
>> > >>>>>>>> has been stable since Oct 2011.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Gorry (with my TSVWG Document Shepherd hat)
>> > >>>>>>>>>>>> cc: Co-Chairs for info.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> -----Original Message-----
>> > >>>>>>>>>>>>>> From: Gorry Fairhurst [ mailto:gorry@erg.abdn.ac.uk]
>> > >>>>>>>>>>>>>> Sent: Mittwoch, 15. August 2012 21:41
>> > >>>>>>>>>>>>>> To: tsvwg chair
>> > >>>>>>>>>>>>>> Subject: Fwd: Re: [tsvwg] FYI: Document writeup for
>> > >>>>>>>>>>>>>> draft-ietf-tsvwg- byte-pkt-congest
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Hi guys,
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Apart from the usual rhetoric, there are some points
>> > >> here that we
>> > >>>>>>>>>>>>>> should note as chairs. What are your own thoughts? And
>> > >> was
>> > >>>>>>>>>>>>>> anything noted at the IETF in Vancouver concerning
>> > > this
>> > >> draft?
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Gorry
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> -------- Original Message --------
>> > >>>>>>>>>>>>>> Subject: Re: [tsvwg] FYI: Document writeup for
>> > >>>>>>>>>>>>>> draft-ietf-tsvwg-byte- pkt-congest
>> > >>>>>>>>>>>>>> Date: Wed, 15 Aug 2012 15:07:28 -0400
>> > >>>>>>>>>>>>>> From: John Leslie <john@jlc.net>
>> > >>>>>>>>>>>>>> To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
>> > >>>>>>>>>>>>>> CC: tsvwg@ietf.org
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Gorry Fairhurst <gorry@erg.abdn.ac.uk> wrote:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> The following document has been revised after
>> > >> receiving
>> > >>>>>>>> WGLC comments.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  The changes are mostly refreshing a document
>> > >> scheduled to
>> > >>>>>>>>>>>>>> expire this month.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> No issues were found:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  Is that with WGC-hat-on?
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  Neither my issues nor Philip Eardley's comments were
>> > >> addressed.
>> > >>>>>>>>>>>>>> In fact, not even the changes Bob Briscoe mentioned
>> > > on-
>> > >> list have
>> > >>>>>>>>>>>>>> been included.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> As required by RFC 4858, this is the current template
>> > >> for the
>> > >>>>>>>>>>>>>> Document
>> > >>>>>>>>>>>>>>> Shepherd Write-Up.
>> > >>>>>>>>>>>>>>> ...
>> > >>>>>>>>>>>>>>> (1) What type of RFC is being requested (BCP,
>> > > Proposed
>> > >> Standard,
>> > >>>>>>>>>>>>>>> Internet Standard, Informational, Experimental, or
>> > >> Historic)?
>> > >>>>>>>>>>>>>>> Why is this the proper type of RFC? Is this type of
>> > >> RFC
>> > >>>>>>>>>>>>>>> indicated in the title page header?
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> This document is intended as BCP. (This was discussed
>> > >> at IETF-81
>> > >>>>>>>>>>>>>>> and the status changed from Informational to BCP
>> > >> since it
>> > >>>>>>>>>>>>>>> provides guidance to implementors and people
>> > >> configuring
>> > >>>>>>>> routers and hosts).
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  Thank you for finally answering my question about
>> > > BCP
>> > >> status.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> ...
>> > >>>>>>>>>>>>>>> Working Group Summary
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> There was consensus to publish this as a WG document
>> > >> and
>> > >>>>>>>>>>>>>>> agreement at
>> > >>>>>>>>>>>>>>> IETF-82 that the document was now complete.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  Was that reviewed on-list?
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> ...
>> > >>>>>>>>>>>>>>> (3) Briefly describe the review of this document that
>> > >> was
>> > >>>>>>>>>>>>>>> performed
>> > >>>>>>>>>>>>>> by
>> > >>>>>>>>>>>>>>> the Document Shepherd. If this version of the
>> > > document
>> > >> is not
>> > >>>>>>>>>>>>>>> ready for publication, please explain why the
>> > > document
>> > >> is being
>> > >>>>>>>>>>>>>>> forwarded to the IESG.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> The document was presented at IETF-82 (Taipei), with
>> > >> a request
>> > >>>>>>>>>>>>>>> for
>> > >>>>>>>>>>>>>> WGLC.
>> > >>>>>>>>>>>>>>> WGLC concluded with some discussion but no
>> > > substantive
>> > >> changes
>> > >>>>>>>>>>>>>>> on Friday 30th March 2012.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  Yes, I guess that is true; but ignoring substantive
>> > >> comments
>> > >>>>>>>>>>>>>> isn't a good practice IMHO.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> (4) Does the document Shepherd have any concerns
>> > > about
>> > >> the depth
>> > >>>>>>>>>>>>>>> or breadth of the reviews that have been performed?
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> No - the original draft had a lot of background
>> > >> material, much
>> > >>>>>>>>>>>>>>> of
>> > >>>>>>>>>>>>>> this
>> > >>>>>>>>>>>>>>> has been condensed or removed, resulting in a smaller
>> > >> document.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  That is indeed what I basically asked for, but I
>> > >> don't see any
>> > >>>>>>>>>>>>>> evidence of it:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> - diff(07,08)  48 lines changed or deleted;  53 lines
>> > >> changed or
>> > >>>>>>>> added
>> > >>>>>>>>>>>>>> - diff(06,07)   8 lines changed or deleted;  13 lines
>> > >> changed or
>> > >>>>>>>> added
>> > >>>>>>>>>>>>>> - diff(05,06)  64 lines changed or deleted;  69 lines
>> > >> changed or
>> > >>>>>>>>>>>>>> added
>> > >>>>>>>>>>>>>> - diff(04,05) 823 lines changed or deleted; 864 lines
>> > >> changed or
>> > >>>>>>>>>>>>>> added
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> (Need I go farther? That gets us back to March
>> > > 2011...)
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> (9) How solid is the WG consensus behind this
>> > >> document? Does it
>> > >>>>>>>>>>>>>>> represent the strong concurrence of a few
>> > > individuals,
>> > >> with
>> > >>>>>>>>>>>>>>> others being silent, or does the WG as a whole
>> > >> understand
>> > >>>>>>>> and agree with it?
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> The document has WG support and there is consensus to
>> > >> publish.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  This, IMHO, doesn't answer the question. :^(
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  Does it represent the concurrence of a few
>> > >> individuals, or does
>> > >>>>>>>>>>>>>> the WG as a whole understand and agree with it?
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> For reference, my comments included:
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] When I previously skimmed this document, I believed
>> > >> it
>> > >>>>>>>>>>>>>> contained only ] common ground: that AQM should
>> > >> drop/mark packets
>> > >>>>>>>>>>>>>> without favoring small ] packets, and that to whatever
>> > >> extent
>> > >>>>>>>>>>>>>> packet size _is_ considered, that ] should be a
>> > >>>>>>>> transport-layer responsibility.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] Alas, when I read it carefully this week, I find
>> > > that
>> > >> Bob is
>> > >>>>>>>>>>>>>> actually ] saying that transport-layer should _only_
>> > >> consider
>> > >>>>>>>>>>>>>> "congested bytes", not ] "congested packets".
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] In fact, Bob openly endorses replacing TCP
>> > >> congestion-control
>> > >>>>>>>>>>>>>> with an ] algorithm which calculates "fair-share
>> > > bytes"
>> > >> and
>> > >>>>>>>>>>>>>> doesn't back off at all ] (even in the presence of 20%
>> > >> packet
>> > >>>>>>>>>>>>>> loss) unless you're already sending ] more than this
>> > >>>>>>>> "fair-share".
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ]    I do not believe that is the consensus of this
>> > > WG;
>> > >> and I
>> > >>>>>>>> believe
>> > >>>>>>>>>>>>>> if
>> > >>>>>>>>>>>>>> ] that were the consensus it would exceed our charter.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] The draft contains a number of things I _do_
>> > > support;
>> > >> and I'd
>> > >>>>>>>>>>>>>> be happy ] to support a considerably shorter draft
>> > >> which
>> > >>>>>>>>>>>>>> concentrates on the AQM ] recommendation, omitting any
>> > >>>>>>>>>>>>>> suggestions of modifying TCP congestion ] control.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> and Philip's comments included:
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] S2.2 I thought the numbered bullets could be better
>> > >> phrased,
>> > >>>>>>>>>>>>>> something like ] +this:- ]
>> > >>>>>>>>>>>>>> ]   1.  AQM algorithms such as RED SHOULD use packet-
>> > >> mode
>> > >>>>>>>> drop, ie they
>> > >>>>>>>>>>>>>> SHOULD NOT
>> > >>>>>>>>>>>>>> ] +use byte-mode drop. The latter ] is more complex,
>> > >>>>>>>>>>>>>> ]        it creates the perverse incentive to fragment
>> > >> segments
>> > >>>>>>>> into
>> > >>>>>>>>>>>>>> tiny
>> > >>>>>>>>>>>>>> ]        pieces and it is vulnerable to floods of
>> > >> small-
>> > >>>>>>>>>>>>>> ]        packets.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ]    2.  If a vendor has implemented byte-mode drop,
>> > >> and an
>> > >>>>>>>> operator
>> > >>>>>>>>>>>>>> has
>> > >>>>>>>>>>>>>> ]        turned it on, it is RECOMMENDED to turn it
>> > >> off, after
>> > >>>>>>>>>>>>>> establishing if
>> > >>>>>>>>>>>>>> ] +there are any implications on
>> > >>>>>>>>>>>>>> ] the relative performance of
>> > >>>>>>>>>>>>>> ]        applications using different packet sizes.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ]   3.  RED SHOULD NOT be turned off. Without RED, a
>> > >> drop tail
>> > >>>>>>>>>>>>>> ]        queue biases against large packets.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] Nit - the following paragraph shouldn't be indented
>> > >> (Note well
>> > >>>>>>>>>>>>>> that
>> > >>>>>>>>>>>>>> RED's...)
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] S3.3
>> > >>>>>>>>>>>>>> ] The section could be slightly clearer. Starting with
>> > >> the second
>> > >>>>>>>>>>>>>> para, where you ] +start talking about something you
>> > >> don't
>> > >>>>>>>>>>>>>> recommend (ie you have to remember that ] +you're
>> > >> talking about
>> > >>>>>>>>>>>>>> the consequences of not following your
>> > > recommendation).
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] S3.3
>> > >>>>>>>>>>>>>> ]    o  Also, if the network biases congestion
>> > >> notification by
>> > >>>>>>>> packet
>> > >>>>>>>>>>>>>> size
>> > >>>>>>>>>>>>>> ]       it has to assume a baseline packet size--all
>> > >> proposed
>> > >>>>>>>>>>>>>> algorithms
>> > >>>>>>>>>>>>>> ]       use the local MTU.  Then transports have to
>> > >> guess which
>> > >>>>>>>> link
>> > >>>>>>>>>>>>>> was
>> > >>>>>>>>>>>>>> ]       congested and what its local MTU was, in order
>> > >> to know
>> > >>>>>>>> how to
>> > >>>>>>>>>>>>>> ]       tailor their congestion response to that link.
>> > >>>>>>>>>>>>>> ] So you're saying that for byte-mode drop, the
>> > > network
>> > >> element
>> > >>>>>>>>>>>>>> has to compare the ] +size of the current pkt with the
>> > >> MTU, and
>> > >>>>>>>>>>>>>> if it's 25* smaller then reduce the ] +pkt loss
>> > >> probability by 25*?
>> > >>>>>>>>>>>>>> And if the MTU was bigger, then the current pkt would
>> > > ]
>> > >> +be
>> > >>>>>>>>>>>>>> dropped with greater probability? And both of these
>> > >> things
>> > >>>>>>>> are hard to do.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] S3.4 (& knock-on effect on 3.3) ] I think this could
>> > >> be better.
>> > >>>>>>>>>>>>>> ] I agree with your arguments in S3.1 & 3.2 that
>> > >> dropping or
>> > >>>>>>>>>>>>>> marking should be ] +independent of pkt size.
>> > >>>>>>>>>>>>>> ] You now go into a couple of pages, which I think
>> > >> could be
>> > >>>>>>>>>>>>>> boiled down to a ] +paragraph, saying something like:-
>> > >> ] * the
>> > >>>>>>>>>>>>>> overall system (network + transport) has to be packet-
>> > >> size
>> > >>>>>>>>>>>>>> dependent in ]
>> > >>>>>>>>>>>>>> +order to drive out the right amount of traffic so the
>> > >> resource
>> > >>>>>>>>>>>>>> becomes ]
>> > >>>>>>>>>>>>>> +uncongested (since we assume all resources are in
>> > >> practice byte
>> > >>>>>>>>>>>>>> +rather
>> > >>>>>>>>>>>>>> than ] +packet limited) ] * from S3.1 and S3.2 the
>> > > IETF
>> > >> is
>> > >>>>>>>>>>>>>> convinced that the network should notify ]
>> > > +independent
>> > >> of
>> > >>>>>>>> packet-size.
>> > >>>>>>>>>>>>>> ] * therefore the transport must act dependent on
>> > >> packet-size.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] It seems to me this is a necessary consequence of
>> > >> your earlier
>> > >>>>>>>>>>>>>> assumptions ] +/arguments.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] [On the other hand, if someone disagrees with the
>> > >> arguments and
>> > >>>>>>>>>>>>>> thinks that the ] +network should notify dependent on
>> > >>>>>>>>>>>>>> packet-size, then it follows that the ] +transport can
>> > >> act
>> > >>>>>>>>>>>>>> independent of
>> > >>>>>>>>>>>>>> packet- size.] ] ] I think in S3.3 you're trying to
>> > >> make another
>> > >>>>>>>>>>>>>> argument (?):-  ] * we can either do packet-size
>> > >> biasing in the
>> > >>>>>>>>>>>>>> network element or on the end host ] +transport (see
>> > >> argument
>> > >>>>>>>>>>>>>> above) ] * packet-size biasing in the network is quite
>> > >> tricky,
>> > >>>>>>>>>>>>>> because then both the ] +network element & the
>> > >> transport have to
>> > >>>>>>>>>>>>>> make some guesses about MTU - whereas ]
>> > >>>>>>>>>>>>>> +with packet-size biasing in the transport, neither
>> > >> network
>> > >>>>>>>>>>>>>> +element
>> > >>>>>>>>>>>>>> nor ] +transport needs to know what the MTU is.
>> > >>>>>>>>>>>>>> ] * it is also easier to implement packet-size biasing
>> > >> in the
>> > >>>>>>>>>>>>>> transport than in ] +the network.
>> > >>>>>>>>>>>>>> ]
>> > >>>>>>>>>>>>>> ] On the last bullet, I didn't really follow your
>> > >> argument in
>> > >>>>>>>> S3.5.
>> > >>>>>>>>>>>>>> Might be ] +easier to follow if S3.5 also talked about
>> > >> how a
>> > >>>>>>>>>>>>>> network element would take ] +account of pkt size?
>> > >> Also, does the
>> > >>>>>>>>>>>>>> following argument also apply?:
>> > >>>>>>>>>>>>>> ] * if pkt-size biasing is done in the transport, then
>> > >> the work
>> > >>>>>>>>>>>>>> (to do the ] +biasing) is spread over many end hosts,
>> > > ]
>> > >> * whereas
>> > >>>>>>>>>>>>>> if pkt-size biasing is done in the network, then the
>> > >> work is ]
>> > >>>>>>>>>>>>>> +concentrated in just the congested network element,
>> > >> which has to
>> > >>>>>>>>>>>>>> operate on all ] +the traffic at line speed.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>  (Has there perhaps been some confusion about what
>> > >> went into the
>> > >>>>>>>>>>>>>> -08
>> > >>>>>>>>>>>>>> version?)
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> --
>> > >>>>>>>>>>>>>> John Leslie <john@jlc.net>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >> ________________________________________________________________
>> > >>>>>>>>>>> Bob Briscoe,                                BT Innovate &
>> > >> Design