Re: [tsvwg] [iccrg] Policing and L4S WAS RE: BBR and other congestion control mechanisms
"Black, David" <David.Black@dell.com> Fri, 09 December 2016 02:32 UTC
From: "Black, David" <David.Black@dell.com>
To: Karen Elisabeth Egede Nielsen <karen.nielsen@tieto.com>, Bob Briscoe <in@bobbriscoe.net>
Date: Fri, 09 Dec 2016 02:32:07 +0000
Message-ID: <CE03DB3D7B45C245BCA0D243277949362F788B17@MX307CL04.corp.emc.com>
References: <73e2a785b099e897990704d3fdd8c078@mail.gmail.com> <6f068e27c5c0c61a87c5daed27cb0da3.squirrel@server.dnsblock1.com> <c97583d1d72545e5ab7455323afc115b@mail.gmail.com> <CE03DB3D7B45C245BCA0D243277949362F74E34B@MX307CL04.corp.emc.com> <7599733a-d728-0d7d-3615-e8d3b4ec70c6@bobbriscoe.net> <b01082ceac99a176e7e753ebf2694a3e@mail.gmail.com>
In-Reply-To: <b01082ceac99a176e7e753ebf2694a3e@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/wA7Pbxvi0vuidI-HkMUhomGUHm8>
Cc: "iccrg@irtf.org" <iccrg@irtf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Subject: Re: [tsvwg] [iccrg] Policing and L4S WAS RE: BBR and other congestion control mechanisms
A few follow-up comments as an individual:

> [Karen Elisabeth Egede Nielsen] And you’re doing a great deal to provide mechanisms
> that routers can deploy to support good interworking with ECN/L4S.
> We need the same for policers.
> It is not really enough that ECN/L4S does not do worse than classic where there
> is a drop policer, if this often can be the case.

+1 - I expect this to be more difficult to specify and deploy than was the case for ECN support in routers, as ECN-based AQM is not close to what policers currently do.

> So yes, we need to suggest which PHBs are appropriate for L4S. I'd be happy to work
> with someone doing that, but it's not my priority at the moment. I'll certainly add an
> open issues section in the architecture doc and include this so it doesn't get lost.

Good - IMHO, that’s important, as at the very least, guidance ought to be provided on which PHBs SHOULD vs. SHOULD NOT be used for (initial) experimental deployments of L4S. There may even be some prohibitions on L4S usage with certain PHBs - VOICE-ADMIT and the network control traffic (e.g., routing) that typically uses CS6 spring to mind as possibilities.

>> I have another concern with the text quoted from
>> the L4S architecture draft - it contains an assertion that networks ought to just
>> ignore the L4S identifier and pass that identifier through unchanged for traffic that
>> does not receive L4S service. The analogous exhortation for DSCPs has failed
>> miserably for operational reasons [... snip ...]
>> If the L4S identifier selects a distinct low latency network service, it will become a
>> good network operational practice to remove that identifier from traffic that is
>> not authorized to receive that low latency service.
> Bleaching ECT(1) cannot be ruled out as a future scenario. But we have tried to do as much as we can to head-off this possibility:
> 1) By explaining exactly how operators can offer exclusivity without needing to bleach
> 2) We are starting from a position where the ECN field is near-universally not bleached. So to bleach, an operator will
> have to actively destroy something they currently support. Operators know they would risk referral to their local regulator for that.
> 3) The ECN field is part of the Internet's congestion control system, so there is a higher bar to tampering with ECN than there was for Diffserv.

Once ECT(1) starts to participate in queue selection in routers, points 2) and 3) above become irrelevant - the network operator owns those router forwarding resources and will exercise complete control over what traffic is allowed to use them, period. Beyond that, I see some optimistic thinking in 1), and while I encourage optimism in the pursuit of innovation, I also suggest a look at the following message for a network operations perspective:

https://www.ietf.org/mail-archive/web/tsvwg/current/msg14839.html

Thanks, --David

From: Karen Elisabeth Egede Nielsen [mailto:karen.nielsen@tieto.com]
Sent: Sunday, December 04, 2016 8:26 AM
To: Bob Briscoe; Black, David
Cc: iccrg@irtf.org; tsvwg@ietf.org
Subject: RE: [iccrg] [tsvwg] Policing and L4S WAS RE: BBR and other congestion control mechanisms

Hi,

Just to be clear: I support the ECN work and the potential adoption of this work. My comments are not meant to disqualify the work, rather the opposite.

Inline below.

BR, Karen

From: iccrg [mailto:iccrg-bounces@irtf.org] On Behalf Of Bob Briscoe
Sent: 1 December 2016 02:23
To: Black, David <David.Black@dell.com>; Karen Elisabeth Egede Nielsen <karen.nielsen@tieto.com>
Cc: iccrg@irtf.org; tsvwg@ietf.org
Subject: Re: [iccrg] [tsvwg] Policing and L4S WAS RE: BBR and other congestion control mechanisms

David,

On 19/11/16 04:19, Black, David wrote:

Commenting as an individual (not WG chair), at least for now ...

If I read this correctly, that L4S is not compatible with Diffserv-style bandwidth policing and token buckets in particular,

Well, no. L4S is not incompatible with Diffserv-style bandwidth policing, because on loss the L4S requirements say a source must fall back to a Classic loss-response (e.g. Reno, Cubic, or something equivalent if it's a real-time transport). So a token-bucket (tb) bandwidth policer (2-colour, 3-colour, 1-rate, 2-rate, etc.) will work just fine to limit an L4S source.

I think what Karen meant was the other way round: that an L4S source will not get L4S service from a tb bandwidth policer,

[Karen Elisabeth Egede Nielsen] Yes, that was what I was addressing.

just as it won't get L4S service from a non-L4S bottleneck router or switch. Precisely because L4S falls back to non-L4S (Classic) behaviour in both cases; whenever it gets a loss.

[Karen Elisabeth Egede Nielsen] And you’re doing a great deal to provide mechanisms that routers can deploy to support good interworking with ECN/L4S. We need the same for policers. It is not really enough that ECN/L4S does not do worse than classic where there is a drop policer, if this often can be the case.

Interaction with Policing is in the L4S architecture doc (which used to be the L4S problem statement): See section 8.1 Traffic (Non-)Policing <https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-arch-00#section-8.1>, which is in section 8 'Security Considerations'.
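
To make the point about token-bucket policers concrete, here is a minimal sketch of a single-rate, two-colour policer that simply drops non-conforming packets. It is not taken from the L4S drafts or from any product; the class name, the parameters and the numbers are all hypothetical. The thing to notice is that the only signal it can give a sender is loss, which is why an L4S source behind it falls back to its Classic response.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketPolicer:
    """Hypothetical single-rate, two-colour policer (conform or drop)."""
    rate_bps: float          # policed rate in bits per second (assumed value)
    burst_bytes: int         # bucket depth in bytes (assumed value)
    tokens: float = 0.0      # current fill, in bytes
    last: float = 0.0        # time of the previous arrival, in seconds

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst is allowed.
        self.tokens = float(self.burst_bytes)

    def admit(self, now: float, size_bytes: int) -> bool:
        """Return True to forward the packet, False to drop it."""
        # Refill for the time elapsed since the last packet, capped at the depth.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bps / 8)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        # Non-conforming: dropped without any ECN mark, so the sender sees loss.
        return False
```

With `TokenBucketPolicer(rate_bps=8000, burst_bytes=3000)`, two back-to-back 1500-byte packets conform and a third is dropped. An ECN-capable variant would need some way to mark before the bucket empties, which is the missing piece the thread is discussing.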
Section 8.2 'Latency Friendliness' <https://tools.ietf.org/html/draft-briscoe-tsvwg-l4s-arch-00#section-8.2> explains that some operators might introduce burst policing for L4S. However, 8.2 recommends that we start without policing and instead we recommend ways to get a good service while minimizing burstiness. I call it 'latency friendliness', because it's based on "social pressure"; somewhat like the traditional approach to TCP friendliness, where we recommend good algorithms in the expectation they will avoid the need for policing. But it doesn't preclude operators deploying policers.

[Karen Elisabeth Egede Nielsen] I think that ECN would be much more powerful if one could devise how to make policing interwork with it. I would rather like to see ECN not as a replacement of Diffserv, but as a supplement, as also indicated in section 5.2 of the draft (the underlined parts below):

   Diffserv:  Diffserv addresses the problem of bandwidth apportionment
      for important traffic as well as queuing latency for delay-
      sensitive traffic.  L4S solely addresses the problem of queuing
      latency.  Diffserv will still be necessary where important traffic
      requires priority (e.g. for commercial reasons, or for protection
      of critical infrastructure traffic).  Nonetheless, if there are
                                            ^^^^^^^^^^^^^^^^^^^^^^
      Diffserv classes for important traffic, the L4S approach can
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      provide low latency for _all_ traffic within each Diffserv class
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      (including the case where there is only one Diffserv class).

Something else: suddenly in this paragraph it is written that “L4S solely addresses the problem of queuing latency.” As also indicated elsewhere in the draft, keeping loss down is also a primary part of L4S and a prerequisite for L4S to work well. I think that the draft could better motivate the issue with policers and the necessity of deploying L4S-friendly policers (as you discuss below).

This was my point.

BR, Karen

then the assertion that L4S is universally applicable to all Internet traffic just took a serious hit.

I guess you're referring to the opening sentence of the abstract of the L4S architecture: "...a new service that the Internet could provide to eventually replace best efforts for all traffic...". That doesn't mean that when a sender turns on the ECT(1) codepoint it magically creates an L4S implementation in every bottleneck and every policer that the sender traverses. Nonetheless, the point of L4S is for performance to be so good, that it magically makes operators want to deploy an L4S AQM at their bottlenecks (and to deploy more L4S-friendly policers).

I might suggest considering which PHBs are more vs. less appropriate for L4S, as token buckets are rather common out there and will have to be dealt with as part of incremental deployment of L4S. Karen - thank you for pointing this out.

I think you have got the wrong end of the stick. When we said "for all traffic", we meant the default L4S service could give the service that the traffic requires (that is, for example, the EF or AF /service/, not the EF or AF /per hop behaviour/). We didn't mean the traffic has to be marked EF or AF to get that service, we just meant that the default L4S service would be /roughly/ as good as EF or AF, without having to implement EF or AF.

The inclusion of EF is probably an overclaim - there's no explicit mechanism in L4S to give any guarantee of EF-like service. Nonetheless, I included EF in my list, because the Internet has always proved good at meeting performance goals without being able to guarantee that it will.

That said, L4S gives only the low queueing delay feature of Diffserv, but not the "priority over bandwidth during anomalies" feature that some Diffserv classes give over others. We will need to make that clear - it's not clear at the moment.

For bandwidth priority, it would be appropriate to use relevant non-default Diffserv PHBs, and it would be appropriate to use L4S within those classes. I have analysed this for the specific case of the classes used by my previous employer, but not for all the Diffserv classes that the IETF has ever dreamed of.

So yes, we need to suggest which PHBs are appropriate for L4S. I'd be happy to work with someone doing that, but it's not my priority at the moment. I'll certainly add an open issues section in the architecture doc and include this so it doesn't get lost.

In addition to Karen's concern, I have another concern with the text quoted from the L4S architecture draft - it contains an assertion that networks ought to just ignore the L4S identifier and pass that identifier through unchanged for traffic that does not receive L4S service. The analogous exhortation for DSCPs has failed miserably for operational reasons - bleaching to zero is entirely too common so that traffic is marked for the service it is supposed to (authorized to) receive from the network, or, as stated in Section 2 of draft-ietf-tsvwg-diffserv-intercon:

   RFC2474's recommendation to forward traffic with unrecognized DSCPs with Default (best effort) service without rewriting the DSCP has proven to be a poor operational practice. Network operation and management are simplified when there is a 1-1 match between the DSCP marked on the packet and the forwarding treatment (PHB) applied by network nodes. When this is done, CS0 (the all-zero DSCP) is the only DSCP used for Default forwarding of best effort traffic, and a common practice is to remark to CS0 any traffic received with unrecognized or unsupported DSCPs at network edges.

If the L4S identifier selects a distinct low latency network service, it will become a good network operational practice to remove that identifier from traffic that is not authorized to receive that low latency service.
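
The edge practice described in the diffserv-intercon text above (remark unrecognized DSCPs to CS0) can be sketched as follows. This is only an illustration: the function name and the set of supported DSCPs are hypothetical. It relies on the standard layout of the former TOS byte, with the DSCP in the upper six bits and the ECN field in the lower two, and it shows that DSCP bleaching by itself leaves the ECN field (and so ECT(1)) untouched.

```python
CS0 = 0  # the all-zero DSCP, used for Default (best effort) forwarding


def remark_at_edge(tos: int, supported_dscps: frozenset) -> int:
    """Return the TOS / Traffic Class byte after edge remarking.

    tos             -- the 8-bit field: DSCP in bits 7..2, ECN in bits 1..0
    supported_dscps -- DSCPs this (hypothetical) network recognizes
    """
    dscp = tos >> 2
    ecn = tos & 0b11
    if dscp not in supported_dscps:
        # Unrecognized or unsupported DSCP: remark to CS0.
        dscp = CS0
    # The ECN bits are carried through unchanged in this sketch.
    return (dscp << 2) | ecn
```

For example, a packet marked EF (DSCP 46) with ECT(1) has a byte value of `(46 << 2) | 0b01`. A network that supports only CS0 would rewrite it to `0b01`, keeping ECT(1); removing the L4S identifier as well would be a separate, additional rewrite of the two ECN bits.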
Bleaching ECT(1) cannot be ruled out as a future scenario. But we have tried to do as much as we can to head-off this possibility:
1) By explaining exactly how operators can offer exclusivity without needing to bleach
2) We are starting from a position where the ECN field is near-universally not bleached. So to bleach, an operator will have to actively destroy something they currently support. Operators know they would risk referral to their local regulator for that.
3) The ECN field is part of the Internet's congestion control system, so there is a higher bar to tampering with ECN than there was for Diffserv.

That last point is subtle. L4S is not a "network service" like Diffserv. L4S is a service largely induced by the sender behaviour, with the network playing a minor but important complementary role (by notifying congestion at a shallow queue threshold). So, unlike Diffserv, the ECN field is not just a request down the layers asking the network for a particular service. It is also the place where congestion notification is carried up the layers, from network to transport.

Sorry this reply took a while. When you sent it, I had no laptop (my charger failed on the Wed of the IETF). I got it fixed, but I'm only just getting back to those emails I missed.

Cheers

Bob

Thanks, --David

-----Original Message-----
From: tsvwg [mailto:tsvwg-bounces@ietf.org] On Behalf Of Karen Elisabeth Egede Nielsen
Sent: Friday, November 18, 2016 4:55 AM
To: in@bobbriscoe.net
Cc: iccrg@irtf.org; tsvwg@ietf.org; Michael Welzl
Subject: Re: [tsvwg] Policing and L4S WAS RE: [iccrg] BBR and other congestion control mechanisms

Hi Bob,

Yes, well aware that L4S will only be as good as the best ISP on the path.

[Karen Elisabeth Egede Nielsen] Yes.

One would hope that an operator deploying an L4S AQM would ensure its policers were compatible, but of course your point is that traffic can traverse other operators too.

[Karen Elisabeth Egede Nielsen] I wish :-). I am not close to having that grand overview at all; I am simply trying to understand the dependencies and how to make this work in a product. Using token bucket policing with drop to enforce SLAs then pops up as an obvious issue. This is perfectly fine and as it should be, BUT it is actually a very important point when looking at the QoS features being developed for products, e.g., smart NICs. If we have token buckets there, or don't get ECN into the shaper logic, then the interworking of Diffserv QoS and ECN will not really be possible in the way the draft indicates.

I am generally assuming that operators would initially want to provide exclusive access to L4S for their own (probably paying) customers. l4s-arch says:

   Certain network operators might choose to restrict access to the L4S class, perhaps only to customers who have paid a premium. In the packet classifier (item 2 in Figure 1), they could identify such customers using some other field than ECN (e.g. source address range), and just ignore the L4S identifier for non-paying customers. This would ensure that the L4S identifier survives end-to-end even though the service does not have to be supported at every hop. Such arrangements would only require simple registered/not-registered packet classification, rather than the managed application-specific traffic policing against customer-specific traffic contracts that Diffserv requires.

[Karen Elisabeth Egede Nielsen] Yes, I saw this text, but it doesn't really say that the ECN/L4S service requires that operators do not ingress-BW-police the traffic. Sure, this is clear from a technical point of view (finally even for me now ..), but as BW policing is really a basic part of the Diffserv and QoS architecture, I think it would be good to stress that point in Section 5.2.
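
The registered/not-registered classification in the l4s-arch text quoted above could look roughly like the sketch below. It is an illustration under stated assumptions, not the draft's mechanism: the registered prefix is a documentation address range chosen for the example, the queue names are made up, and treating only ECT(1) as the L4S identifier is a simplification. The ECN codepoint value follows RFC 3168.

```python
import ipaddress

ECT1 = 0b01  # ECN codepoint ECT(1), per the RFC 3168 encoding

# Hypothetical list of source prefixes belonging to registered customers.
REGISTERED = [ipaddress.ip_network("192.0.2.0/24")]


def select_queue(src: str, ecn: int) -> str:
    """Pick a queue for a packet; the ECN field itself is never rewritten."""
    addr = ipaddress.ip_address(src)
    registered = any(addr in net for net in REGISTERED)
    if registered and ecn == ECT1:
        return "l4s"
    # Non-registered sources, and non-ECT(1) traffic, use the Classic queue.
    # The L4S identifier is ignored here rather than bleached.
    return "classic"
```

A packet from `192.0.2.7` carrying ECT(1) would go to the `"l4s"` queue, while the same codepoint from an unregistered source would be queued as Classic with its ECN field left as it arrived.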
You may use L4S to regulate traffic within a Diffserv class, but the traffic must not be subject to non-ECN-regulated drop at any point, including any prior Diffserv classification steps.

As an example, if you like: in the OpenStack QoS API there is (so far only) a simple BW policer function. Products are being built on this. If we want to provide L4S, we had better be aware either to disable the function or that it needs to be done properly (..). Virtual services that are afforded L4S may not have a physical NIC at their sole disposal, and BW policing is being used to control the shared access to the NIC resources. Better make clear from the start that if one ever wants to be able to provide L4S, then these BW policing functions must eventually implement ECN-enabled AQM.

BR, Karen

Bob

On Thu, November 17, 2016 12:36 pm, Karen Elisabeth Egede Nielsen wrote:

Hi Bob, All,

Very interesting from my perspective. Thanks a lot for this presentation and mail discussion.

From the point of view of acc ECN/L4S, I wonder if it would be worth including some text about token bucket policing and L4S in the L4S Architecture draft, draft-briscoe-tsvwg-l4s-arch-00. There are some considerations there already in section 5.2 and section 8.1, but would it be relevant to make it more explicit that the following statement (section 5.2)

   Nonetheless, if there are Diffserv classes for important traffic, the L4S approach can provide low latency for _all_ traffic within each Diffserv class (including the case where there is only one Diffserv class).

is only really truly valid if the ingress Diffserv classification/policing/shaping itself - e.g., a BW policer - does not implement drop, and in the case of a shaper (and queuing), ECN must be implemented by the shaper as well.

I suppose the latter somehow falls into the ECN recommendations for AQM, so perhaps these considerations (and the differences from a policer that does drop) are already totally clear to everybody.

ECN is sort of the alternative to making the CC aware of latency increase, but as pointed out here, with bold drop token bucket policing we really have neither.

Thanks

BR, Karen

-----Original Message-----
From: iccrg [mailto:iccrg-bounces@irtf.org] On Behalf Of Bob Briscoe
Sent: 17 November 2016 08:35
To: Michael Welzl <michawe@ifi.uio.no>
Cc: iccrg@irtf.org; Yuchung Cheng <ycheng@google.com>
Subject: Re: [iccrg] BBR and other congestion control mechanisms

Michael,

On Thu, November 17, 2016 1:49 am, Michael Welzl wrote:

Hi,

An add-on to my previous email below: Mechanisms like Veno and CTCP, if I get this right, only decide that a loss is due to congestion when it also comes with delay growth.

I just witnessed your presentation in MAPRG about policing, and how BBR interacts with it:
https://www.ietf.org/proceedings/97/slides/slides-97-maprg-traffic-policing-in-the-internet-yuchung-cheng-and-neal-cardwell-00.pdf
( Thanks for giving this presentation, this was very interesting. )

So this would mean that Veno, CTCP etc. would get it wrong in the face of a token bucket - there would be loss without delay growth, yet it IS due to a rate limit.

Given the prevalence of policing that you have shown, and how BBR nicely interacts with it, I find this very interesting (and good); I’m not sure if this has been considered in the design of any congestion control before? Does anyone know of any other examples?

1/ I would be interested if there is enough data in the BBR dataset to zoom in on clients in specific ISPs. I know at least one traffic management vendor uses a congestion policer that limits the rate of heavy users, but the rate at which it limits varies all the time, because it is the outcome of a congestion limit, not a rate limit. More precisely, it enforces a "congestion rate token bucket" limit on the per-user contribution to shared congestion.

This would look similar to the varying capacity of a scheduler, but distinguished from 3G/LTE or WiFi by only tiny additional delay, because the measurement of shared congestion in the policer is based on an AQM on a very high bandwidth link (typically 10 Gb/s or 40 Gb/s), so queuing delay is very low before loss appears.

2/ HULL (High throughput, ultra-low latency) uses DCTCP at the sender. In the network it marks ECN on the real packets based on the length of a 'phantom' queue, which is a number incremented by the size of every arriving packet, and decremented slightly more slowly than the drain rate of the real queue. This keeps the real queue extremely short.

This is an example where there is zero delay build-up before the intended operating point. Because of the lack of delay, it also uses hardware pacing on the sender NIC, because there is no queuing delay to space out TCP's ACK clock otherwise. I don't know whether any DC operators are using HULL (does anyone?). You would not want to put Not-ECT packets (like current BBR) into such a queue structure.
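
The 'phantom' queue described in 2/ can be sketched as a counter rather than a real buffer. The following is a minimal illustration of that description only, not the HULL authors' implementation; the class name, the drain fraction and the marking threshold are hypothetical values chosen for the example.

```python
class PhantomQueue:
    """Counter that mimics a queue draining slightly slower than the real link."""

    def __init__(self, link_bps: float, drain_fraction: float = 0.95,
                 mark_threshold_bytes: int = 3000) -> None:
        # Drain rate in bytes/s, set a little below the real link rate (assumed 95%).
        self.drain_bytes_per_s = link_bps * drain_fraction / 8
        self.threshold = mark_threshold_bytes  # assumed marking threshold
        self.length = 0.0                      # phantom backlog in bytes
        self.last = 0.0                        # time of the previous arrival

    def on_arrival(self, now: float, size_bytes: int) -> bool:
        """Update the counter for one arriving packet; True means 'mark ECN'."""
        # Decrement for the time elapsed, never going below empty.
        drained = (now - self.last) * self.drain_bytes_per_s
        self.length = max(0.0, self.length - drained)
        self.last = now
        # Increment by the size of the arriving packet.
        self.length += size_bytes
        return self.length > self.threshold
```

Because the counter drains more slowly than the real queue, it crosses the threshold and triggers marking while the real queue is still close to empty. Packets that cannot carry a mark (Not-ECT) get no signal at all from this structure, which is the concern raised in the last sentence above.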
Bob

Cheers, Michael

On Nov 17, 2016, at 9:57 AM, Michael Welzl <michawe@ifi.uio.no> wrote:

Hi Yuchung,

On Nov 17, 2016, at 8:22 AM, Yuchung Cheng <ycheng@google.com> wrote:

On Wed, Nov 16, 2016 at 5:14 PM, Michael Welzl <michawe@ifi.uio.no> wrote:

Dear all,

Thanks very much to the authors of BBR for an interesting presentation yesterday! (see https://www.ietf.org/proceedings/97/slides/slides-97-iccrg-bbr-congestion-control-01.pdf if you missed it).

I’d be interested to know how BBR compares to some other TCP variants. For example, TIMELY:
http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p537.pdf

Any idea, anyone?

( I tried, honestly, I did - but I just can’t manage to hold back a tiny bit of sarcasm here: ...is BBR also 133 times faster than TIMELY? :-) )

I'd like to explain further about the 133x result of BBR vs Cubic first. This is a spot-test. It is not a general quantification of BBR's improvement on our B4 network (we'll publish that data in a separate report). This is an 8MB RPC every 30s on a warmed connection on east-US <-> west-EU using the lowest QoS (best effort). That particular path has quite a bit of burst-induced losses. In this case, Cubic just died due to 1/sqrt(p).
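
For readers unfamiliar with the 1/sqrt(p) remark: it refers to the way a loss-based sender's achievable rate falls with the inverse square root of the loss probability p. The sketch below uses the widely cited Mathis et al. approximation for a Reno-style response as a stand-in. That choice is an assumption made for illustration; Cubic's own response function differs in detail, and the MSS and RTT values in the usage note are hypothetical, not measurements from the path described above.

```python
import math


def reno_rate_bps(mss_bytes: int, rtt_s: float, p: float) -> float:
    """Approximate steady-state rate of a Reno-style flow (Mathis et al.).

    rate ~= (MSS / RTT) * sqrt(3/2) / sqrt(p), returned in bits per second.
    """
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / p)
```

With an assumed 1460-byte MSS and 100 ms RTT, `reno_rate_bps(1460, 0.1, 0.01)` evaluates to roughly 1.4 Mb/s, and quadrupling the loss rate halves it. On a long-RTT path with bursty loss that is unrelated to the flow's own queue, this relationship caps throughput far below the available capacity.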
Notice the network path is fairly different from typical Internet access: it's 10G+ from start to end, with small buffers (compared to BDP) on the way, too small to build persistent large queues. The losses are caused by burst traffic (of same or higher QoS). What the number 133x highlighted was that when loss does not indicate a persistent queue, using it as a sole congestion signal is brittle. It is not to say the loss signal is completely useless, but it does not always indicate self-induced queues.

Thanks for this information!

Comparing TIMELY and BBR is at one level comparing apples to oranges. TIMELY is designed for a very specific hw/transport (RDMA/kernel-bypass) and networks (intra-DC) with the goal to reduce tail latency. Hence it uses specific NIC features like HW timestamps to perform its control.

Well … I understand it was made for intra-DC communication, and it uses these features to get the necessary timer granularity. I can’t see why this wouldn’t make it applicable to networks with larger RTTs?

BBR is designed to be a generic good transport to achieve high tput with adequate queue. It faces an unknown network and the sole feedback is TCP ACKs, hence it must first perform an exponential search to get a basic sense of the network. It also needs to deal with all sorts of adverse issues in the wild Internet as we've described.

Right… E.g. I guess TIMELY wouldn’t work so well when facing losses.

A more interesting comparison would be vs other Internet-CC like Vegas (which we are working on).

No, I think that’s a terribly uninteresting comparison to do. I’d even call it ridiculous, as many many MANY mechanisms work better than Vegas, after all it's now 16 years old. May I propose comparing against the state of the art, not the state of 2000? Some of the direct follow-ups on Vegas for example, like FAST? Or delay-based algorithms that also use a bit more information than just delay? E.g. Westwood comes to mind, as a mechanism that also looks at the rate at which ACKs arrive. AFAIK this one is readily available in Linux for comparison.

Also, some TCP variants try to distinguish between random packet loss and congestion-induced packet loss - so similar to BBR, you wouldn’t overreact to random losses with them. Veno is an example, and CTCP (Coded TCP, not Compound TCP) too.

Cheers, Michael

_______________________________________________
iccrg mailing list
iccrg@irtf.org
https://www.irtf.org/mailman/listinfo/iccrg

--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/