Re: [OPSAWG] AD review of draft-ietf-opsawg-large-flow-load-balancing (draft response)
Benoit Claise <bclaise@cisco.com> Fri, 28 March 2014 11:24 UTC
Message-ID: <53355BD8.2030106@cisco.com>
Date: Fri, 28 Mar 2014 12:24:08 +0100
From: Benoit Claise <bclaise@cisco.com>
To: Anoop Ghanwani <anoop@alumni.duke.edu>
References: <CA+-tSzxDpD2V7Q15Jjgzz2A+d5Gn_92YQ-1_Zvx2AP=s5AWpxA@mail.gmail.com> <531AF602.5070400@cisco.com>
In-Reply-To: <531AF602.5070400@cisco.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/opsawg/uVvv0POCamKZ4RGbWXFrw-N-AnQ
Cc: "opsawg@ietf.org" <opsawg@ietf.org>
Subject: Re: [OPSAWG] AD review of draft-ietf-opsawg-large-flow-load-balancing (draft response)
Hi Anoop, Ramki,

A gentle reminder.

Regards, Benoit

> Hi Anoop,
>
> Please post a new draft version, and I'll review the diffs.
> Some more answers in-line.
>
> Regards, Benoit
>>
>> Hi Benoit,
>>
>> Thanks for the detailed and careful review. Comments inline.
>>
>> Anoop
>>
>> ====
>>
>> On Tue, Feb 18, 2014 at 7:55 AM, Benoit Claise <bclaise@cisco.com> wrote:
>>
>> Dear authors,
>>
>> Here is my AD review of draft-ietf-opsawg-large-flow-load-balancing
>>
>> - Section 1:
>> Networks extensively use link aggregation groups (LAG) [802.1AX] and
>> equal cost multi-paths (ECMP) [RFC 2991] as techniques for capacity
>> scaling. For the problems addressed by this document, network traffic
>> can be predominantly categorized into two traffic types: long-lived
>> large flows and other flows.
>>
>> ...
>>
>> This draft describes mechanisms for optimal LAG/ECMP component link
>> utilization while using hash-based techniques. The mechanisms
>> comprise the following steps -- recognizing _large flows_ in a router;
>> and assigning the large flows to specific LAG/ECMP component links or
>> redistributing the small flows when a component link on the router is
>> congested.
>>
>> It is useful to keep in mind that in typical use cases for this
>> mechanism the _large flows_ are those that consume a significant amount
>> of bandwidth on a link, e.g. greater than 5% of link bandwidth. The
>> number of such flows would necessarily be fairly small, e.g. on the
>> order of 10's or 100's per LAG/ECMP. In other words, the number of
>> _large flows_ is NOT expected to be on the order of millions of flows.
>> Examples of such large flows would be IPsec tunnels in service
>> provider backbone networks or storage backup traffic in data center
>> networks.
>>
>> 3 instances of "large flows": do you mean "long-lived large flows"?
>> If not, why do you make a distinction between long-lived large
>> flows and other flows in the first paragraph?
>> I eventually understood the source of confusion when I read the
>> terminology section:
>> Large flow(s): long-lived large flow(s)
>>
>> Either use the capitalized term in the Intro section (actually
>> throughout the doc.) so that we understand that the term is
>> defined somewhere, or make it clear in the intro that large
>> flow(s) = long-lived large flow(s)
>>
>> Yes, they are all referring to long-lived large flows. We will
>> change the early part of Section 1 to clarify that long-lived large
>> flows are, thereafter in the document, referred to as large flows.
> Or replace large flow by long-lived flow where it makes sense in the draft.
>>
>> -
>>
>> This document presents improved load distribution techniques based on
>> the large flow awareness.
>>
>> Improved compared to?
>>
>> Improved compared to static hash-based distribution techniques that
>> do not account for the bandwidth of the flows. Will reword as follows:
>>
>> "This document presents mechanisms for improving the load
>> distribution problem resulting from stateless hashing as seen in the
>> above example."
> ok
>>
>> -
>> In several places, starting with the title and abstract, you
>> speak about mechanisms (plural).
>> However, looking at section 4.2, it seems that you propose a
>> single mechanism? Or maybe you consider 4.1, 4.2, 4.3 as
>> different mechanisms?
>>
>> The title of 4.2 is perhaps misleading and should just be
>> "Operational Overview."
> ok
>> Otherwise the rest of the draft discusses several mechanisms
>> (multiple choices for large flow identification, and multiple choices
>> for rebalancing).
>>
>> -
>>
>> Step 3) On receiving the alert about the congested component link,
>> the operator, through a central management entity, finds the large
>> flows mapped to that component link and the LAG/ECMP group to which
>> the component link belongs.
>>
>> Step 4) The operator can choose to rebalance the large flows on
>> lightly loaded component links of the LAG/ECMP group or redistribute
>> the small flows on the congested link to other component links of the
>> group. The operator, through a central management entity, can choose
>> one of the following actions:
>>
>> 1) Indicate specific large flows to rebalance;
>>
>> 2) Have the router decide the best large flows to rebalance;
>>
>> 3) Have the router redistribute all the small flows on the
>> congested link to other component links in the group.
>>
>> "Indicate specific large flows to rebalance", "through a central
>> management entity", what you describe is basically traffic
>> engineering.
>> On the other hand, for 2) and 3), why do you need a central
>> management entity?
>>
>> The assumption was that the router is controlled by a central
>> management entity for the purpose of this function, but that is
>> clearly not a requirement. The text will be modified to mention that
>> a central management entity may be used (i.e. not required).
> Ok.
>>
>> -
>>
>> A number of routers support sampling techniques such as sFlow
>> [sFlow-v5, sFlow-LAG], PSAMP [RFC 5475] and NetFlow Sampling [RFC 3954].
>> For the purpose of large flow identification, sampling must be
>> enabled on all of the egress ports in the router where such
>> measurements are desired.
>>
>> I don't understand the second sentence.
>> One way to read this is: sampling must be _enabled_ on all of
>> the egress ports where such measurements are desired.
>> Ok, this is an obvious statement. If the measurements are
>> desired, enable them
>>
>> Yes,
> ok please clarify the text.
>>
>> Or maybe you want to say: _sampling_ must be enabled on all of
>> the egress ports where such measurements are desired.
>> This is a false statement: if you have the choice between
>> sampling and non sampling, use non sampling measurements.
>> Or maybe you want to say: sampling must be enabled on _all_ of
>> the egress ports where such measurements are desired.
>> This is a false statement: if I have ECMP on 2 links, and
>> only one of them can't do non sampling, then we should not force
>> sampling on both links.
>> You see, I'm confused.
>>
>> You miss a couple of key messages:
>> - if unsampled measurements are available, use those.
>> - egress means where LAG/ECMP are enabled (this is important for
>> the paragraph starting with "If egress sampling is not available,
>> ingress sampling can suffice since the central management entity
>> use")
>>
>> We were not intending to discuss a mix of sampling and non-sampling
>> interfaces in the same router, but this is a reasonable point and it
>> will be clarified (i.e. we will state that it's possible to mix
>> sampled and non sampled interfaces as long as the function of large
>> flow detection/identification can be performed).
>>
>> -
>>
>> If egress sampling is not available, ingress sampling can suffice
>> since the central management entity used by the sampling technique
>> typically has multi-node visibility and can use the samples from an
>> immediately downstream node to make measurements for egress traffic
>> at the local node.
>>
>> It's not clear if "ingress" means the ingress interface of the
>> router itself, or the ingress interface of the downstream router.
>> A drawing is required.
>> Both options are possible:
>> 1. ingress interfaces on the router where LAG/ECMP is initiated
>>    flow monitoring must be enabled on all ingress interfaces
>>    flow monitoring must have a way to know the egress interfaces
>> 2. ingress interfaces of the downstream router
>>    only work for LAG or ECMP single hop
>>    ingress interfaces = all components from LAG/ECMP
>>    (multiple ifIndex, typically)
>>
>> What we meant here was that ingress sampling would have to be enabled
>> on the downstream device (hence the central management entity must
>> come into play to identify large flows).
> I still believe that a drawing would clarify things.
>>
>> this entire section 4.3.3 needs some improvements
>>
>> -
>> On one side, you wrote "Specific algorithms for placement of
>> large flows are out of scope of this document.". On the other
>> side, "The following parameters are required for the configuration of
>> _this_ feature". It seems contradictory.
>> It's unclear why you need the following parameter:
>>
>> . Imbalance threshold: the difference between the utilization of
>> the least utilized and most utilized component links. Expressed
>> as a percentage of link speed.
>>
>> Also, does ECMP/LAG always require equivalent link speed for their components?
>>
>> The imbalance threshold is a measure of how much imbalance one is
>> willing to tolerate before taking the hit of potential packet
>> reordering in some flows. Will clarify.
>>
>> Thanks for catching the issue with link speed. While in most cases
>> speeds are consistent, there may be the case of composite links which
>> combine links of different speeds (actually permitted by 802.1AX), so
>> we will provide a generalized formula for the imbalance threshold
>> which takes into account the individual speeds of each of the
>> component links.
>>
>> -
>> 5.2. System Configuration and Identification Parameters
>>
>> . IP address: The IP address of a specific router that the
>> feature is being configured on, or that the large flow placement
>> is being applied to.
>>
>> . LAG ID: Identifies the LAG. The LAG ID may be required when
>> configuring this feature (to apply a specific set of large flow
>> identification parameters to the LAG) and will be required when
>> specifying flow placement to achieve the desired rebalancing.
>>
>> . Component Link ID: Identifies the component link within a LAG.
>> This is required when specifying flow placement to achieve the
>> desired rebalancing.
>>
>> Nothing regarding ECMP?
>>
>> Initially we were more focused on getting this done for LAG, but then
>> we completely overlooked ECMP. 5.2, 5.3, and 5.4 would probably
>> benefit from a bit of clean-up as follows:
>>
>> Add the following to 5.2:
>>
>> ECMP group: Identifies a particular ECMP group.
>>
>> ECMP nexthop: Identifies a particular nexthop within an ECMP group.
>>
>> Add the following line to the end of section 5.3.
>>
>> When using ECMP, the nexthop within an ECMP group is used to identify
>> the component link for placing the large flow.
>>
>> Add the following to the end of Section 5.4.
>>
>> When using ECMP, the ECMP group and the corresponding Nexthops along
>> with the percentage of traffic to be assigned to each Nexthop is
>> required. Finally it is also possible that an ECMP Nexthop itself
>> comprises a LAG in which case both the Nexthop and the LAG Component
>> ID would need to be specified, and the weights of both the Nexthops
>> within the ECMP Group and the Component Links within the LAG would
>> need to be adjusted.
> Ok.
>
> Regards, Benoit
>>
>> -
>>
>> For high speed links, the etherStatsHighCapacityTable MIB [RFC 3273]
>> can be used.
>>
>> Well, only for ethernet.
>>
>> Will clarify that.
>>
>> EDITORIAL:
>> -
>> figure 2
>>
>> OLD:
>>
>> +-----------+ -> +-----------+
>> | | -> | |
>> | | ===> | |
>> | (1)|--------|(1) |
>> | | -> | |
>> | | -> | |
>> | (R1) | -> | (R2) |
>>
>> NEW:
>>
>> +-----------+ -> +-----------+
>> | | -> | |
>> | | ===> | |
>> | (1)|--------|(1) |
>> | | -> | |
>> | | -> | |
>> | (R1) | -> | (R2) |
>>
>> Will fix.
>>
>> - The indentation in section 2 is not correct
>>
>> Will fix.
>>
>> - "For tunneling protocols like GRE, VXLAN, NVGRE, STT, etc.,"
>> You need to expand and provide references.
>>
>> Will provide references. What do you mean by expand -- just expand the
>> acronyms (already in the acronym section) or something else?
>>
>> - a PBR rule
>> Expand.
>>
>> OK
>>
>> -
>> OLD:
>>
>> +-----------+ -> +-----------+
>> | | -> | |
>> | | ===> | |
>> | (1)|--------|(1) |
>> | | | |
>> | | ===> | |
>> | | -> | |
>> | | -> | |
>> | (R1) | -> | (R2) |
>> | (2)|--------|(2) |
>>
>> NEW:
>> +-----------+ -> +-----------+
>> | | -> | |
>> | | ===> | |
>> | (1)|--------|(1) |
>> | | | |
>> | | ===> | |
>> | | -> | |
>> | | -> | |
>> | (R1) | -> | (R2) |
>> | (2)|--------|(2) |
>>
>> Will fix.
>>
>> -
>> OLD:
>> The IPFIX information model [RFC 7011]
>> NEW:
>> The IPFIX information model [RFC 7012]
>>
>> Will fix.
>>
>> Regards, Benoit
>>
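
For the imbalance threshold discussed above, one possible way to handle component links of different speeds is to express each link's load as a percentage of that link's own speed and compare the most and least utilized links. The sketch below illustrates that reading only; it is not the generalized formula the authors said they would add to the draft, and the names ComponentLink, imbalance_pct and needs_rebalancing, as well as the sample rates, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComponentLink:
    """One member link of a LAG/ECMP group (hypothetical model)."""
    name: str
    speed_bps: float  # nominal link speed, bits per second
    load_bps: float   # measured traffic rate, bits per second

    @property
    def utilization_pct(self) -> float:
        # Normalize by this link's own speed so that mixed-speed
        # groups can be compared on the same scale.
        return 100.0 * self.load_bps / self.speed_bps


def imbalance_pct(links: List[ComponentLink]) -> float:
    """Difference between the most and least utilized links, in percent."""
    utils = [link.utilization_pct for link in links]
    return max(utils) - min(utils)


def needs_rebalancing(links: List[ComponentLink], threshold_pct: float) -> bool:
    """True when the imbalance exceeds the configured threshold (assumed rule)."""
    return imbalance_pct(links) > threshold_pct


if __name__ == "__main__":
    group = [
        ComponentLink("c1", speed_bps=10e9, load_bps=8e9),   # 80% utilized
        ComponentLink("c2", speed_bps=40e9, load_bps=12e9),  # 30% utilized
    ]
    print(imbalance_pct(group), needs_rebalancing(group, threshold_pct=20.0))
```

With equal-speed links this reduces to the definition quoted from the draft; with mixed speeds, a link carrying more absolute traffic can still be the less utilized one, which is why the load is normalized per link here.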