[bess] AD Review of draft-ietf-bess-multicast-damping-03

"Alvaro Retana (aretana)" <aretana@cisco.com> Tue, 23 February 2016 22:56 UTC

From: "Alvaro Retana (aretana)" <aretana@cisco.com>
To: "draft-ietf-bess-multicast-damping@ietf.org" <draft-ietf-bess-multicast-damping@ietf.org>
Date: Tue, 23 Feb 2016 22:55:41 +0000
Message-ID: <D2DBF5CB.10DEE7%aretana@cisco.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/bess/8Kcdz5_FAQn83ro7W6oaN_pibHg>
Cc: "bess-chairs@ietf.org" <bess-chairs@ietf.org>, "martin.vigoureux@alcatel-lucent.com" <martin.vigoureux@alcatel-lucent.com>, "bess@ietf.org" <bess@ietf.org>
Subject: [bess] AD Review of draft-ietf-bess-multicast-damping-03

Hi!

I think that the title of this document clearly reflects what you want to do — which may be one of the reasons there was virtually no discussion about it (or any of its predecessors) on the list.  However, I think the contents leave many open doors that need to be closed before this document can be published.

I put more detailed comments below, but my main concerns are here:

  1.  The Abstract says that the procedures are "inspired from BGP unicast route damping".  It seems to me that the intent is in fact to adopt the algorithm from RFC2439.  However, the text is not explicit/clear about that.
  2.  As you all know, BGP damping has a troubled history: it has at times been considered useless, and there are even recommendations (from RIPE, for example) not to use it.  How did you arrive at the default and maximum values?  It concerns me that there are no known implementations (according to the Shepherd's report).  Because of that, I think this document would be better suited as an Experimental RFC, with the explicit purpose of gaining experience with the values and determining the impact in live deployments (which could then support a standard version).  Please consider changing the intended Status.

According to the e-mail archive, it looks like an early presentation of this work happened in an mboned meeting, but I didn't find discussion on the pim or idr lists.  Once the comments below are addressed I will want to forward the document to pim/idr for their review.

Thanks!

Alvaro.


Major:

  1.  There are 6 authors listed on the front page.  According to RFC7322, the total number is generally limited to 5.  Please work among yourselves to cut the number of authors.  Alternatively, we can just list an Editor (there's one already identified), or you can produce a justification detailing the contributions of each author so that an exception can be considered.
  2.  Replace the reference to RFC4601 with a reference to draft-ietf-pim-rfc4601bis.  Note that the section numbers have changed slightly!
  3.  Are you adopting the exponential decay algorithm from RFC2439?  That seems to be what's happening because you are not explicitly defining a new algorithm, but some of the text leaves doubts.  For example:
     *   "inspired from BGP unicast route damping"  I know the application is different, but if the algorithm is the same then please say it.
     *   Section 5.1. (PIM procedures)
        *   "updating the *figure-of-merit* based on the decay algorithm must be done prior to this increment"  This statement seems to directly imply that the algorithm is used.  Please reorder the steps to explicitly call this one out, instead of plugging it in as an afterthought.  BTW, should the "must" be "MUST"?  Reordering the steps should spare you from having to deal with that last question.
        *   "Same techniques as the ones described in [RFC2439] can be applied…"   "Can be"?  This sentence seems to imply that what is described in RFC2439 is optional.  Are there other ways of determining the same thing?  What about the exponential decay algorithm?
        *   It would also help if the terminology were consistent.  For example, instead of "damping becomes active" use "suppressed".  I can see how "suppressed" may give the wrong impression, as only the propagation of state is affected.  Explaining how the terminology applies would then make it easier to reuse, avoid confusion, and be clear.  Note that there's no mention of RFC2439 in the terminology section.
  4.  Section 3. (Overview): "…it is expected that this technique will allow to meet the goals of protecting the multicast routing infrastructure control plane without a significant average increase of bandwidth".  In general, I want to make sure that the qualities of the solution and the expected results are properly reflected in the document. [I'm using the text above as the base for my comment, but the impact is larger.]  Some questions:
     *   "…it is expected that this technique will…"  I wonder why an assertion can't be made that this technique can (vs just expecting that it will) address specific problems.  Is it the case that experience is needed to make a stronger assertion?  Are the goals the same (or at least similar) in every network?  Are there implementations available?  If so, please consider an "Implementation Status" section (see rfc6982).  What has been the deployment experience?  This goes back to my comment above about the Intended Status of this document.
     *   What specifically are the goals?  In a couple of places the text points back at Section 1. (Introduction), but I'm not sure exactly what the goals are.  Of special interest for understanding the goals is the part in Section 4.2. (Existing PIM, IGMP and MLD timers) where other solutions are discarded for not meeting them.
        *   There is scattered text that talks about "…ensure that the load put on the BGP control plane, and on the P-tunnel setup control plane, remains under control…", "protecting these control planes…avoiding negative effects…although at the expense of a minimal increase in average of bandwidth use…".   However, the description is too vague to point at what can satisfy these goals and what can't.
     *   Section 4.1. (Rate-limiting of multicast control traffic) mentions the "risk described in Section 1", which does mention "risks of denial of service attacks".  Is that the risk you're referring to, or something else?
     *   Section 4.3. (BGP Route Damping) mentions "the principle described in this document", which I thought was related to the goals, but Section 1 says that  the "base principle is described in Section 3".  I'm assuming the "principle" in question is such that a "network operator…can delay the propagation of multicast state prune messages between PEs, when faced with a rate of multicast state dynamicity exceeding a certain configurable threshold".  That sounds like a potential goal to me.
  5.  Section 5.2. (Procedures for multicast VPN state damping)
     *   In the Introduction you write that "Section 16 of [RFC6514] specifically spells out the need for damping the activity…"  I think that RFC6514 does a lot more than that:  Section 16.1. (Dampening C-Multicast Routes) "proposes OPTIONAL route dampening procedures similar to what is described in [RFC2439]."   Those procedures look very similar to the ones in this document.  What is the difference?  Is the intent of this document to complement, replace or maybe update what is already specified in RFC6514?
     *   There's an rfc2119 conflict.   "…then the withdrawal of a C-multicast route…SHOULD NOT be damped.  An implementation of the specification in this document MUST whether, not damp these withdrawals by default, or alternatively provide a tuning knob to disable the damping of these withdrawals."  s/whether/either   The "MUST..not damp" and "SHOULD NOT be damped" are in conflict.    I think that eliminating the last sentence would fix the problem and still allow an implementation to put in any knobs that it wants.
  6.  Section 7.3. (Default and maximum values) lists values that are "RECOMMENDED to adopt as default conservative values".  Any guidance about when and/or how an operator should consider changing the recommended defaults?  What does "conservative" mean in this context?  What if the operator wants to be more aggressive?
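
As an illustration of the ordering point raised in Major item 3 (decay first, then increment), the following is a minimal sketch. It assumes that the RFC2439-style exponential decay is what the draft intends. The class name, parameter names, and the per-event increment of 1000 are hypothetical and are not taken from the draft.

```python
class DampedState:
    """Hypothetical damping record for one multicast state (illustrative names)."""

    def __init__(self, half_life, cutoff, reuse, increment=1000.0):
        self.half_life = half_life    # decay-half-life, in seconds
        self.cutoff = cutoff          # above this, the state is suppressed
        self.reuse = reuse            # below this, suppression ends
        self.increment = increment    # assumed penalty per state change
        self.merit = 0.0              # figure-of-merit
        self.last_update = 0.0        # time of the last decay computation
        self.suppressed = False

    def decay(self, now):
        # Exponential decay: the merit halves every half_life seconds.
        elapsed = now - self.last_update
        self.merit *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now
        if self.suppressed and self.merit < self.reuse:
            self.suppressed = False

    def on_event(self, now):
        # Step 1: bring the figure-of-merit up to date (decay).
        self.decay(now)
        # Step 2: only then add the penalty for this event.
        self.merit += self.increment
        # Step 3: compare against the cutoff threshold.
        if self.merit > self.cutoff:
            self.suppressed = True
        return self.suppressed
```

With a half-life of 10 seconds, events spaced 10 seconds apart never push the merit past a cutoff of 2500 in this sketch, while three events one second apart do. Writing the decay as an explicit first step removes any need for a separate "must be done prior" statement.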


Minor:

  1.  In 4.2
     *   Replace "prune override interval" with "J/P_Override_Interval".
     *   Is there a reference for explicit tracking?  BTW, how would the mechanism in this document interact with explicit tracking?
  2.  Section 5.1. (PIM procedures):
     *   "…a router implementing these procedures MUST…apply unchanged procedures for everything…".  I guess that these "unchanged procedures" are the ones in rfc4601bis, right?  In other words, what you seem to want is that, in addition to what rfc4601bis specifies, the other steps defined in this document also happen.  If that is correct, please reword the description to make it clear; at the very least, add a reference so that there is no question about which procedures are left unchanged.
     *   "…freeze the upstream state machine…and setup a trigger to update it…"  Maybe a word like "hold" or "maintain" might be better.  In fact, even better would be an explicit indication that "events that may result in the state changing [rfc4601bis] SHOULD be ignored until the reuse threshold is reached", or something along those lines.  What should the state be updated to when the reuse threshold is reached?
     *   I had some trouble parsing this text: "When the recomputation is done periodically, the period should be low enough to not significantly delay the inactivation of damping on a multicast state beyond what the operator wanted to configure (i.e. for a *decay-half-life* of 10s, recomputing the *figure-of-merit* each minute would result in a multicast state to remained damped for a much longer time than what the parameters are supposed to command)."    I think I got it, but what I don't get (based on my understanding of RFC2439) is this: the figure-of-merit should decay according to the half-life, so why would its value be adjusted at a period that is not related to the half-life?
  3.  Section 5.2. (Procedures for multicast VPN state damping)
     *   There are several places in this section where rfc2119 language is used to describe what an implementation should do, in a way that sounds to me like an attempt to define functionality that is mandatory to implement (MTI).  I find that hard/impossible to enforce and would like to see the rfc2119 language removed.  Please see below.
     *   The text says that an "implementation of [RFC6513] relying on the use of PIM to carry C-multicast routing information MUST support this technique."  That "MUST" is really strong and it makes me think that this document should then be marked as an update to RFC6513.  Is that the intent?  Reading through it again, is the intent MTI?
     *   "…the following procedure is proposed as an alternative to the procedures in Section 5.1…"  "proposed"??  Does this mean that 5.1 is also applicable in this case (when "BGP is used to distribute C-multicast routing information")?   It sounds like the operator would have an option — if so, when should each be considered?
        *   Later in the same section you wrote: "…choice to implement damping based on BGP routes or the procedures described in Section 5, is up to the implementor, but at least one of the two MUST be implemented."  I think it should be section 5.1.  Do you really mean the "implementor", or are you referring to the operator of the network?  The "MUST" sounds too strong for me because it is not needed for interoperability (rfc2119) -- or is this an attempt at MTI?
        *   Same question/observation for "In the perspective of allowing damping to be done on RRs and ASBRs, implementing the BGP approach is RECOMMENDED."  Maybe s/RECOMMENDED/recommended
     *   "…it can be considered useful to also be able to apply damping on RRs as well."  When is it considered useful?
        *   Note that later you also write: "…in such a context, it is RECOMMENDED to not enable any multicast VPN route damping on RRs…"   This partially answers the question.  It would be nice to put the guidance together.
     *   "…damping SHOULD NOT be applied to BGP routes of the following sub-types…"  Are there cases when it is ok?  In other words, why is the "SHOULD NOT" not a "MUST NOT"?
  4.  Section 6.1. (Damping mVPN P-tunnel change events) "Possible ways to do so depend on the type of P-tunnel, and local implementation details are left up to the implementor.     The following is proposed as example of how the above can be achieved."  Either you leave it as an implementation detail or you provide guidance.  If this document were Experimental, then providing guidance would be great!
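
As a rough illustration of the example quoted in Minor item 2 (a *decay-half-life* of 10s with the *figure-of-merit* recomputed each minute), the sketch below compares the time at which an exponentially decaying value reaches the reuse threshold with the time at which a periodic recomputation would first observe it. The function names and the numeric values are hypothetical. The assumption that damping is lifted only at a recomputation tick is one reading of the quoted text, not something the draft confirms.

```python
import math

def ideal_release_time(merit, reuse, half_life):
    """Seconds until an exponentially decaying merit reaches the reuse threshold."""
    if merit <= reuse:
        return 0.0
    return half_life * math.log2(merit / reuse)

def observed_release_time(merit, reuse, half_life, period):
    """Release time if the merit is only re-evaluated every `period` seconds."""
    ideal = ideal_release_time(merit, reuse, half_life)
    # Damping is assumed to be lifted at the first tick at or after the ideal time.
    return math.ceil(ideal / period) * period

# Hypothetical values: merit is twice the reuse threshold, half-life is 10s.
ideal = ideal_release_time(3000.0, 1500.0, 10.0)
every_second = observed_release_time(3000.0, 1500.0, 10.0, 1.0)
every_minute = observed_release_time(3000.0, 1500.0, 10.0, 60.0)
```

Under these assumptions the state should be released after 10 seconds, but a one-minute recomputation period holds it for 60 seconds. The decay itself still follows the half-life; only the moment at which the result is acted upon depends on the period.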


Nits:

  1.  Please put references on first appearance.  For example, IGMP and MLD are mentioned in the introduction, but no reference is made until the 5th mention.  The same for BGP route damping…
     *   You also need a reference to route reflectors.
  2.  "these control planes"  Which control planes?  Please be specific.   There are other places where "these" is used in a way that may not be completely clear; please take a look.
  3.  s/these specifications/this specification
  4.  s/when enabled /when enabled,
  5.  "PIM-SM specifications [RFC4609]"   RFC4609 is not the PIM spec.
  6.  Section 6.2. (Procedures for Ethernet VPNs)  "…an implementation of these procedures MUST follow the procedures described in Section 6.1."  It is not completely clear which "these procedures" are.  I'm guessing RFC7117.