Re: [re-ECN] Call for Comments: draft-livingood-woundy-congestion-mgmt-03
Bob Briscoe <rbriscoe@jungle.bt.co.uk> Thu, 04 March 2010 23:52 UTC
Date: Thu, 04 Mar 2010 23:52:00 +0000
To: Jason Livingood <jason_livingood@cable.comcast.com>
From: Bob Briscoe <rbriscoe@jungle.bt.co.uk>
Cc: Matt Mathis <matt.mathis@gmail.com>, re-ecn@ietf.org
Subject: Re: [re-ECN] Call for Comments: draft-livingood-woundy-congestion-mgmt-03
Jason,

I had read the draft before, and it will be a useful ref. I would advise that you publish soon before things move on, rather than trying to add stuff on applicability to non-cable networks (even tho that would be interesting - perhaps best as a separate short draft).

I have some hopefully constructive comments below from my reading of the latest -03 rev.

1/ Average utilisation
It would be useful for context to state what proportion of their provisioned access capacity your customers utilise on average (The summary in S.4 would be a good place). If you gave the typical figures your customers demand it would allow readers to better understand why it doesn't make sense to provision as if everyone uses their max bit-rate all the time. It will also explain why a slight change in the behaviour of many customers all at once might easily congest provisioned capacity. And it would also give context to the stats in S.5.2, where customers using 70% of their provisioned capacity are considered excessive when aggregate utilisation is about 70%.

2/ Why not increase capacity instead?
Many readers will ask this question. I believe the answer is straightforward. It would be nice to be able to point to your document for that answer. We tried to give the answer in a draft we have allowed to expire (but we could rejuvenate it if there's demand): See S.3.1 of <draft-briscoe-tsvwg-relax-fairness-01.txt> available at:
<http://www.bobbriscoe.net/pubs.html#relax-fairness>

3/ Port utilisation thresholds & durations
It should be pointed out that inferring congestion from utilisation is only a heuristic, so the recommended figures are only applicable given current traffic patterns. For instance, port utilisation thresholds can increase as aggregate bit-rates increase over the years (more aggregation leads to more smoothing). Conversely, if the traffic mix became more bursty due to different popular applications, there would be more congestion at lower utilisation.
Also the 15-minute duration was presumably found to be useful because shorter episodes of high utilisation tend not to always become persistent. It would be worth saying what criteria you used to make these judgements, as I suspect figures like this will get quoted as gospel by people who don't consider what objective you were trying to achieve.

4/ What is "actually congested"?
At the end of S.5.1 it says, "the instances where the network actually becomes congested during the Port Utilization Duration are few," How is "actually congested" defined? Is just one loss considered "actual congestion"? Presumably not. This is important, as many consider mild levels of loss as perfectly healthy, so it's not easy to know the situation that Comcast is trying to avoid with this congestion management scheme. This is the objective of the whole congestion management scheme, so it's important to be precise here.

In S.5.3 there's the sentence "in those rare cases where packets may be dropped..." But a single TCP flow with RTT 80ms can attain 50Mb/s with a loss fraction of 0.0013% (1 in ~74000 packets) so there's no need to try to achieve loss figures much lower than this. And indeed, if flows aren't bottlenecked elsewhere, TCP will drive the system until it gets such loss levels. If, instead, a customer is downloading 5x 10Mb/s TCP flows still with 80ms RTT, TCP will drive losses up to 1 in ~3000 or 0.03% and any lower loss rates won't be able to improve performance.

5/ Straying into "justificatory advertising"?
Generally the doc keeps to technical facts, which is good. In some places I detected a little bit of business justification, which was OK, because it was mild. However, the para justifying User Consumption Thresholds based on usage of VoIP, web surfing, Hulu etc etc struck me as too much like justifying Comcast's choices - it felt like the doc was speaking to the FCC at this point. This may well be how Comcast judges where to pitch these thresholds.
But it doesn't sit well in a doc that says that this technique is app-neutral. The same info could be given but in a different way. For instance, you could say business success for companies like Comcast depends on judging what the maximum demand will be from the large majority of households for different applications like VoIP, web surfing, Hulu. So Comcast pitch capacity at a level that allows x, y & z simultaneously. But of course it is up to each household what mix of apps they actually use. Yada yada.

6/ Why hysteresis? Can it fail?
In the para starting "A user's traffic is released from a BE state..." in S.5.2, firstly, there's a nit in that 'hysteresis' is given as an alternative word for 'QoS oscillation', when hysteresis is actually the hi-lo watermark technique you use to prevent QoS oscillation.

Moving to my point: it's not clear why hysteresis is considered better than QoS oscillation. You say the approach worked well, which presumably means it prevented QoS oscillation. But where's the evidence that QoS oscillation is not actually preferred by the affected customers? To regain the right to priority BE, a customer has to drop consumption to <50% for ~15mins. Without the low watermark, they could regain PBE without having to reduce consumption so much. Wouldn't they prefer this? Or are you saying the watermark approach is beneficial to all the other customers, rather than to the heavy customers?

It's also not clear to me how the customer's software knows how to get out of jail. Does forcing them into BE always force their consumption to drop below ~50% for 15mins? So they will automatically get out of jail? Do they have to be using a TCP-like transport that responds to the drops from BE by taking them below ~50%? If their software doesn't automatically take them below 50%, can't they get trapped in jail for a long time without knowing the passkey to get out (ie. reduce consumption <50%)?

7/ What if no users stand out?
Wrt Fig 2: "What if everyone persistently arrives at Result #2 'No action taken'"? This could happen if the distribution of customers doesn't include any seriously heavy customers, just a lot of equally heavy customers. I guess you consider upgrading capacity at this juncture?

8/ Bytes transferred during congestion not necessarily the problem.
It should be stated somewhere that different transports can respond more or less aggressively to others sharing the same capacity, while still transferring as many bytes (e.g. LEDBAT-like transports). This is one limitation of the approach described: it doesn't reward yielding behaviour.

Elsewhere Rich Woundy has compiled a wider list of limitations of Comcast's approach. It might be worth stating these, either in this doc, or in a companion doc along with the generalisations to other technologies that you were discussing with Matt.

9/ All in all...
A very useful contribution to the IETF's knowledge. Thank you very much for all the work involved in having produced this.

Nits:
-----
491    Even without channel bonding,
492    multiple channels are usually configured to come out each physical
493    port.
s/out each/out of each/

Fig 1: Double vision nightmare!

631    this is usually done through a command sent from the network and
632    without any effect on the subscriber.
s/without any effect on the subscriber./without the subscriber noticing./
[If there's no effect at all, why do it!?]

742    (Note: Although each cable modem is typically
743    assigned to a particular household, the software does not and cannot
744    actually identify individual users, the number of users sharing a
745    cable modem, or analyze particular users' traffic.) For purposes of
746    this document, we use "cable modem", "user", and "subscriber"
747    interchangeably to mean a subscriber account or user account and not
748    an individual person.
This note would be better in the terminology section (S.2).

Question #3 under Fig 2: Add: "over a period of 15 minutes."
S.5.3 heading: s/Users&apos/Users'/

942    (It is important to note that
943    congestion can occur in any IP network, and, when it does, packets
944    can be delayed or dropped. As a result, applications and protocols
945    have been designed to deal with this reality. Our congestion
946    management system attempts to ensure that, in those rare cases where
947    packets may be dropped, BE packets are dropped before PBE packets are
956    dropped.)
This parenthesis should be upfront somewhere at the start of the doc, and not in parentheses.

1019   find a specific complaint that can be traced back to the effected of
1020   this congestion management system.
s/effected/effect/

Cheers

Bob

________________________________________________________________
Bob Briscoe, BT Innovate & Design
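
Note on the loss figures in point 4: the email does not say how they were derived. The sketch below is one way to reproduce them, assuming the widely used steady-state TCP approximation rate = (packet size / RTT) * sqrt(3/2) / sqrt(p) and 1500-byte packets. Both the formula choice and the packet size are assumptions made here for illustration; the function name loss_fraction is hypothetical.

```python
def loss_fraction(rate_bps, rtt_s, packet_bytes=1500):
    """Loss fraction p at which one TCP flow averages rate_bps.

    Assumption (not stated in the email): the square-root approximation
    rate = (packet_bits / rtt) * sqrt(3/2) / sqrt(p), solved for p.
    """
    # Average window in packets needed to sustain the rate over one RTT.
    window_packets = rate_bps * rtt_s / (packet_bytes * 8)
    return 1.5 / window_packets ** 2


if __name__ == "__main__":
    # The two cases from point 4: one 50 Mb/s flow, and each of five
    # 10 Mb/s flows, all with an 80 ms round-trip time.
    for rate_bps in (50e6, 10e6):
        p = loss_fraction(rate_bps, 0.080)
        print(f"{rate_bps / 1e6:.0f} Mb/s per flow: "
              f"p = {p:.2e}, about 1 in {1 / p:,.0f} packets")
```

Under these assumptions the results come out close to the figures quoted in point 4: roughly 1 in 74,000 packets for a single 50Mb/s flow, and roughly 1 in 3,000 for each 10Mb/s flow. A different packet size or constant would shift the numbers somewhat, so this is a plausibility check rather than a statement of how the original figures were computed.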