Re: [OPSAWG] AD review of draft-ietf-opsawg-large-flow-load-balancing (draft response)
Benoit Claise <bclaise@cisco.com> Sat, 08 March 2014 11:29 UTC
Message-ID: <531AF602.5070400@cisco.com>
Date: Sat, 08 Mar 2014 10:50:42 +0000
From: Benoit Claise <bclaise@cisco.com>
To: Anoop Ghanwani <anoop@alumni.duke.edu>
References: <CA+-tSzxDpD2V7Q15Jjgzz2A+d5Gn_92YQ-1_Zvx2AP=s5AWpxA@mail.gmail.com>
In-Reply-To: <CA+-tSzxDpD2V7Q15Jjgzz2A+d5Gn_92YQ-1_Zvx2AP=s5AWpxA@mail.gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/opsawg/g8dDL5Wt8lkYU0DgeLl5GHw8qK8
Cc: "opsawg@ietf.org" <opsawg@ietf.org>
Subject: Re: [OPSAWG] AD review of draft-ietf-opsawg-large-flow-load-balancing (draft response)
Hi Anoop,

Please post a new draft version, and I'll review the diffs.
Some more answers in-line.

Regards, Benoit

> Hi Benoit,
>
> Thanks for the detailed and careful review. Comments inline.
>
> Anoop
>
> ====
>
> On Tue, Feb 18, 2014 at 7:55 AM, Benoit Claise <bclaise@cisco.com> wrote:
>
> > Dear authors,
> >
> > Here is my AD review of draft-ietf-opsawg-large-flow-load-balancing
> >
> > - Section 1:
> >
> >     Networks extensively use link aggregation groups (LAG) [802.1AX] and
> >     equal cost multi-paths (ECMP) [RFC 2991] as techniques for capacity
> >     scaling. For the problems addressed by this document, network traffic
> >     can be predominantly categorized into two traffic types: long-lived
> >     large flows and other flows.
> >
> >     ...
> >
> >     This draft describes mechanisms for optimal LAG/ECMP component link
> >     utilization while using hash-based techniques. The mechanisms
> >     comprise the following steps -- recognizing _large flows_ in a router;
> >     and assigning the large flows to specific LAG/ECMP component links or
> >     redistributing the small flows when a component link on the router is
> >     congested.
> >
> >     It is useful to keep in mind that in typical use cases for this
> >     mechanism the _large flows_ are those that consume a significant amount
> >     of bandwidth on a link, e.g. greater than 5% of link bandwidth. The
> >     number of such flows would necessarily be fairly small, e.g. on the
> >     order of 10's or 100's per LAG/ECMP. In other words, the number of
> >     _large flows_ is NOT expected to be on the order of millions of flows.
> >     Examples of such large flows would be IPsec tunnels in service
> >     provider backbone networks or storage backup traffic in data center
> >     networks.
> >
> > 3 instances of "large flows": do you mean "long-lived large flows"?
> > If not, why do you make a distinction between long-lived large
> > flows and other flows in the first paragraph?
> > I eventually understood the source of confusion when I read the
> > terminology section:
> >
> >     Large flow(s): long-lived large flow(s)
> >
> > Either use a capitalized term in the Intro section (actually
> > throughout the doc.) so that we understand that the term is
> > defined somewhere, or make it clear in the intro that large
> > flow(s) = long-lived large flow(s)
>
> Yes, they are all referring to long-lived large flows. We will change
> the early part of Section 1 to clarify that long-lived large flows
> are, thereafter in the document, referred to as large flows.

Or replace "large flow" by "long-lived flow" where it makes sense in the draft.

> > -
> >
> >     This document presents improved load distribution techniques based on
> >     the large flow awareness.
> >
> > Improved compared to?
>
> Improved compared to static hash-based distribution techniques that do
> not account for the bandwidth of the flows. Will reword as follows:
>
> "This document presents mechanisms for improving the load distribution
> problem resulting from stateless hashing as seen in the above example."

ok

> > -
> > In several places, starting with the title and abstract, you speak
> > about mechanisms (plural).
> > However, looking at section 4.2, it seems that you propose a
> > single mechanism? Or maybe you consider 4.1, 4.2, 4.3 as different
> > mechanisms?
>
> The title of 4.2 is perhaps misleading and should just be "Operational
> Overview."

ok

> Otherwise the rest of the draft discusses several mechanisms
> (multiple choices for large flow identification, and multiple choices
> for rebalancing).
>
> > -
> >
> >     Step 3) On receiving the alert about the congested component link,
> >     the operator, through a central management entity, finds the large
> >     flows mapped to that component link and the LAG/ECMP group to which
> >     the component link belongs.
> >
> >     Step 4) The operator can choose to rebalance the large flows on
> >     lightly loaded component links of the LAG/ECMP group or redistribute
> >     the small flows on the congested link to other component links of the
> >     group. The operator, through a central management entity, can choose
> >     one of the following actions:
> >
> >     1) Indicate specific large flows to rebalance;
> >
> >     2) Have the router decide the best large flows to rebalance;
> >
> >     3) Have the router redistribute all the small flows on the
> >     congested link to other component links in the group.
> >
> > "Indicate specific large flows to rebalance", "through a central
> > management entity": what you describe is basically traffic
> > engineering.
> > On the other hand, for 2) and 3), why do you need a central
> > management entity?
>
> The assumption was that the router is controlled by a central
> management entity for the purpose of this function, but that is
> clearly not a requirement. The text will be modified to mention that
> a central management entity may be used (i.e. not required).

Ok.

> > -
> >
> >     A number of routers support sampling techniques such as sFlow
> >     [sFlow-v5, sFlow-LAG], PSAMP [RFC 5475] and NetFlow Sampling
> >     [RFC 3954]. For the purpose of large flow identification, sampling
> >     must be enabled on all of the egress ports in the router where such
> >     measurements are desired.
> >
> > I don't understand the second sentence.
> > One way to read this is: sampling must be _enabled_ on all of the
> > egress ports where such measurements are desired.
> > Ok, this is an obvious statement. If the measurements are
> > desired, enable them.
>
> Yes, ok please clarify the text.
>
> > Or maybe you want to say: _sampling_ must be enabled on all of the
> > egress ports where such measurements are desired.
> > This is a false statement: if you have the choice between
> > sampling and non sampling, use non sampling measurements.
> > Or maybe you want to say: sampling must be enabled on _all_ of the
> > egress ports where such measurements are desired.
> > This is a false statement: if I have ECMP on 2 links, and only
> > one of them can't do non sampling, then we should not force
> > sampling on both links.
> > You see, I'm confused.
> >
> > You miss a couple of key messages:
> > - if unsampled measurements are available, use those.
> > - egress means where LAG/ECMP are enabled (this is important for
> >   the paragraph starting with "If egress sampling is not available,
> >   ingress sampling can suffice since the central management entity use")
>
> We were not intending to discuss a mix of sampling and non-sampling
> interfaces in the same router, but this is a reasonable point and it
> will be clarified (i.e. we will state that it's possible to mix
> sampled and non-sampled interfaces as long as the function of large
> flow detection/identification can be performed).
>
> > -
> >
> >     If egress sampling is not available, ingress sampling can suffice
> >     since the central management entity used by the sampling technique
> >     typically has multi-node visibility and can use the samples from an
> >     immediately downstream node to make measurements for egress traffic
> >     at the local node.
> >
> > It's not clear if "ingress" means the ingress interface of the
> > router itself, or the ingress interface of the downstream router.
> > A drawing is required.
> > Both options are possible:
> > 1. ingress interfaces on the router where LAG/ECMP is initiated
> >      flow monitoring must be enabled on all ingress interfaces
> >      flow monitoring must have a way to know the egress interfaces
> > 2. ingress interfaces of the downstream router
> >      only works for LAG or ECMP single hop
> >      ingress interfaces = all components from LAG/ECMP
> >      (multiple ifIndex, typically)
>
> What we meant here was that ingress sampling would have to be enabled
> on the downstream device (hence the central management entity must
> come into play to identify large flows).
I still believe that a drawing would clarify things.

> > this entire section 4.3.3 needs some improvements
> >
> > -
> > On one side, you wrote "Specific algorithms for placement of large
> > flows are out of scope of this document.". On the other side, "The
> > following parameters are required the configuration of _this_
> > feature". It seems contradictory.
> > It's unclear why you need the following parameter:
> >
> >     . Imbalance threshold: the difference between the utilization of
> >       the least utilized and most utilized component links. Expressed
> >       as a percentage of link speed.
> >
> > Also, does ECMP/LAG always require equivalent link speed for their components?
>
> The imbalance threshold is a measure of how much imbalance one is
> willing to tolerate before taking the hit of potential packet
> reordering in some flows. Will clarify.
>
> Thanks for catching the issue with link speed. While in most cases
> speeds are consistent, there may be the case of composite links which
> combine links of different speeds (actually permitted by 802.1AX), so
> we will provide a generalized formula for the imbalance threshold
> which takes into account the individual speeds of each of the
> component links.
>
> > -
> >     5.2. System Configuration and Identification Parameters
> >
> >     . IP address: The IP address of a specific router that the
> >       feature is being configured on, or that the large flow placement
> >       is being applied to.
> >
> >     . LAG ID: Identifies the LAG. The LAG ID may be required when
> >       configuring this feature (to apply a specific set of large flow
> >       identification parameters to the LAG) and will be required when
> >       specifying flow placement to achieve the desired rebalancing.
> >
> >     . Component Link ID: Identifies the component link within a LAG.
> >       This is required when specifying flow placement to achieve the
> >       desired rebalancing.
> >
> > Nothing regarding ECMP?
>
> Initially we were more focused on getting this done for LAG, but then
> we completely overlooked ECMP.
> 5.2, 5.3, and 5.4 would probably
> benefit from a bit of clean-up as follows:
>
> Add the following to 5.2:
>
>     ECMP group: Identifies a particular ECMP group.
>
>     ECMP nexthop: Identifies a particular nexthop within an ECMP group.
>
> Add the following line to the end of section 5.3.
>
>     When using ECMP, the nexthop within an ECMP group is used to identify
>     the component link for placing the large flow.
>
> Add the following to the end of Section 5.4.
>
>     When using ECMP, the ECMP group and the corresponding Nexthops along
>     with the percentage of traffic to be assigned to each Nexthop are
>     required. Finally, it is also possible that an ECMP Nexthop itself
>     comprises a LAG, in which case both the Nexthop and the LAG Component
>     ID would need to be specified, and the weights of both the Nexthops
>     within the ECMP Group and the Component Links within the LAG would
>     need to be adjusted.

Ok.

Regards, Benoit

> > -
> >
> >     For high speed links, the etherStatsHighCapacityTable MIB [RFC 3273]
> >     can be used.
> >
> > Well, only for Ethernet.
>
> Will clarify that.
>
> > EDITORIAL:
> > -
> > figure 2
> >
> > OLD:
> >
> >     +-----------+ ->     +-----------+
> >     |           | ->     |           |
> >     |           | ===>   |           |
> >     |        (1)|--------|(1)        |
> >     |           | ->     |           |
> >     |           | ->     |           |
> >     |   (R1)    | ->     |    (R2)   |
> >
> > NEW:
> >
> >     +-----------+ ->     +-----------+
> >     |           | ->     |           |
> >     |           | ===>   |           |
> >     |        (1)|--------|(1)        |
> >     |           | ->     |           |
> >     |           | ->     |           |
> >     |   (R1)    | ->     |    (R2)   |
>
> Will fix.
>
> > - The indentation in section 2 is not correct
>
> Will fix.
>
> > - "For tunneling protocols like GRE, VXLAN, NVGRE, STT, etc.,"
> > You need to expand and provide references.
>
> Will provide references. What do you mean by expand -- just expand the
> acronyms (already in the acronym section) or something else?
>
> > - a PBR rule
> > Expand.
>
> OK
>
> > -
> > OLD:
> >
> >     +-----------+ ->     +-----------+
> >     |           | ->     |           |
> >     |           | ===>   |           |
> >     |        (1)|--------|(1)        |
> >     |           |        |           |
> >     |           | ===>   |           |
> >     |           | ->     |           |
> >     |           | ->     |           |
> >     |   (R1)    | ->     |    (R2)   |
> >     |        (2)|--------|(2)        |
> >
> > NEW:
> >
> >     +-----------+ ->     +-----------+
> >     |           | ->     |           |
> >     |           | ===>   |           |
> >     |        (1)|--------|(1)        |
> >     |           |        |           |
> >     |           | ===>   |           |
> >     |           | ->     |           |
> >     |           | ->     |           |
> >     |   (R1)    | ->     |    (R2)   |
> >     |        (2)|--------|(2)        |
>
> Will fix.
>
> > -
> > OLD:
> >     The IPFIX information model [RFC 7011]
> > NEW:
> >     The IPFIX information model [RFC 7012]
>
> Will fix.
>
> > Regards, Benoit
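For illustration, here is a minimal Python sketch of two quantities discussed in this thread: flagging a large flow using the example threshold from Section 1 (more than 5% of link bandwidth), and checking the imbalance threshold across component links. The reply promises a generalized formula for component links of different speeds but does not spell it out, so the sketch assumes one plausible generalization: each component link's load is normalized by its own speed before taking the difference between the most and least utilized links. All names (ComponentLink, is_large_flow, imbalance_pct, needs_rebalancing) are hypothetical and do not come from the draft.

```python
from dataclasses import dataclass

# Hypothetical types and function names for illustration only;
# none of them are defined by draft-ietf-opsawg-large-flow-load-balancing.


@dataclass
class ComponentLink:
    name: str
    speed_bps: float  # nominal speed of this LAG/ECMP component link
    load_bps: float   # measured traffic currently mapped onto this link


def is_large_flow(rate_bps: float, link_speed_bps: float, fraction: float = 0.05) -> bool:
    """Flag a flow as large if it uses more than `fraction` of the link
    bandwidth (5% is the example value quoted from Section 1)."""
    return rate_bps > fraction * link_speed_bps


def utilization_pct(link: ComponentLink) -> float:
    """Utilization of one component link as a percentage of its OWN speed.
    This per-link normalization is the assumption that lets links of
    different speeds be compared."""
    return 100.0 * link.load_bps / link.speed_bps


def imbalance_pct(links: list[ComponentLink]) -> float:
    """Difference between the most and least utilized component links."""
    utils = [utilization_pct(link) for link in links]
    return max(utils) - min(utils)


def needs_rebalancing(links: list[ComponentLink], threshold_pct: float) -> bool:
    """True when the imbalance exceeds the configured imbalance threshold."""
    return imbalance_pct(links) > threshold_pct


if __name__ == "__main__":
    # A composite link mixing a 10G and a 1G component (assumed values).
    lag = [
        ComponentLink("link1", speed_bps=10e9, load_bps=6e9),    # 60% utilized
        ComponentLink("link2", speed_bps=1e9, load_bps=0.2e9),   # 20% utilized
    ]
    print(imbalance_pct(lag))
    print(needs_rebalancing(lag, threshold_pct=30.0))
    print(is_large_flow(9e8, link_speed_bps=10e9))
```

Normalizing per link lets a 10G and a 1G component link be compared directly; it is only one option, and the formula eventually written into the draft may differ.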