[pim] comments for draft-ietf-pim-drlb

"Holland, Jake" <jholland@akamai.com> Tue, 17 July 2018 17:02 UTC

Return-Path: <jholland@akamai.com>
X-Original-To: pim@ietfa.amsl.com
Delivered-To: pim@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AF46C130F29 for <pim@ietfa.amsl.com>; Tue, 17 Jul 2018 10:02:51 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.721
X-Spam-Level:
X-Spam-Status: No, score=-0.721 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, HTTPS_HTTP_MISMATCH=1.989, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_DKIMWL_WL_HIGH=-0.01] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=akamai.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gLsIAGYnh857 for <pim@ietfa.amsl.com>; Tue, 17 Jul 2018 10:02:44 -0700 (PDT)
Received: from mx0b-00190b01.pphosted.com (mx0b-00190b01.pphosted.com [IPv6:2620:100:9005:57f::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 509C7130F73 for <pim@ietf.org>; Tue, 17 Jul 2018 10:02:44 -0700 (PDT)
Received: from pps.filterd (m0122331.ppops.net [127.0.0.1]) by mx0b-00190b01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w6HGukKl027708; Tue, 17 Jul 2018 17:57:42 +0100
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=akamai.com; h=from : to : cc : subject : date : message-id : content-type : mime-version; s=jan2016.eng; bh=Ni+8r6iP3h5TBLBiBJ0eCrH4VQXLPRpIrRMB2+EbTaE=; b=BvVahCLPOO/B4fCD7roCc6mMC0w7NIosw/do2+auaONnA71CSdmfiUlUpoW+hVKD7DTF Pr5n3wBicTf3xv84MbukYbvSlbod+k6D4E3d5suWR5U2/ZhXqUQ+kaqhENpBwfhI9sEQ RkMPb8GCSo8Sg/8UJ0AfwZEzgfs4upc4TltNZcJLKyrKxP28B8v6vcgJ0hPAi3GQBkrq yafj3L0M1Re0LiZ00FrRMcZjykltEp51Pd49RZQdtReynaNf97rPhdmBvIi8Q+OOfiA1 C3mbOT/H5hannJwJc8Gs9z+q/j4rZikPLew+a/LQ+raPbcChKRQS2838qjJxZZ/A/2YE Dg==
Received: from prod-mail-ppoint1 (prod-mail-ppoint1.akamai.com [184.51.33.18]) by mx0b-00190b01.pphosted.com with ESMTP id 2k9m2n8222-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 17 Jul 2018 17:57:41 +0100
Received: from pps.filterd (prod-mail-ppoint1.akamai.com [127.0.0.1]) by prod-mail-ppoint1.akamai.com (8.16.0.21/8.16.0.21) with SMTP id w6HGoWlT010821; Tue, 17 Jul 2018 12:57:41 -0400
Received: from email.msg.corp.akamai.com ([172.27.123.32]) by prod-mail-ppoint1.akamai.com with ESMTP id 2k7cguart0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Tue, 17 Jul 2018 12:57:40 -0400
Received: from USMA1EX-DAG1MB4.msg.corp.akamai.com (172.27.123.104) by usma1ex-dag1mb6.msg.corp.akamai.com (172.27.123.65) with Microsoft SMTP Server (TLS) id 15.0.1365.1; Tue, 17 Jul 2018 12:57:39 -0400
Received: from USMA1EX-DAG1MB4.msg.corp.akamai.com ([172.27.123.104]) by usma1ex-dag1mb4.msg.corp.akamai.com ([172.27.123.104]) with mapi id 15.00.1365.000; Tue, 17 Jul 2018 12:57:39 -0400
From: "Holland, Jake" <jholland@akamai.com>
To: "pim@ietf.org" <pim@ietf.org>
Thread-Topic: comments for draft-ietf-pim-drlb
Thread-Index: AQHUHe9FTO+nXkzXsk6a9WwgU3AdHg==
Date: Tue, 17 Jul 2018 16:57:39 +0000
Message-ID: <237C803D-B634-4A27-8F00-DF7EE67D5712@akamai.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
user-agent: Microsoft-MacOutlook/10.e.1.180613
x-ms-exchange-messagesentrepresentingtype: 1
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [172.19.114.156]
Content-Type: multipart/alternative; boundary="_000_237C803DB6344A278F00DF7EE67D5712akamaicom_"
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-07-17_04:, , signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807170176
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-07-17_04:, , signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1806210000 definitions=main-1807170177
Archived-At: <https://mailarchive.ietf.org/arch/msg/pim/k95ICJhPMvth8oQknovthh-9X38>
X-Mailman-Approved-At: Fri, 20 Jul 2018 08:19:14 -0700
Subject: [pim] comments for draft-ietf-pim-drlb
X-BeenThere: pim@ietf.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Protocol Independent Multicast <pim.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/pim>, <mailto:pim-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/pim/>
List-Post: <mailto:pim@ietf.org>
List-Help: <mailto:pim-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/pim>, <mailto:pim-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Jul 2018 17:02:56 -0000

Hi Mankamana,

Thanks for updating the draft and sending the clarifying comments, I think there were some major improvements from the previous version and I appreciate the work.

I have some more comments on the -08 version of the draft.

I can’t make it to today’s pimwg meeting, but I think some of the comments still require changes before I would support a wglc, I’m sorry to say.

The comments are inline below, marked with <jh2></jh2>.

Kind regards,
Jake

From: "Mankamana Mishra (mankamis)" <mankamis@cisco.com>
Date: 2018-06-07 at 13:24
To: "Holland, Jake" <jholland@akamai.com>, "draft-ietf-pim-drlb@ietf.org" <draft-ietf-pim-drlb@ietf.org>
Cc: "pim-chairs@ietf.org" <pim-chairs@ietf.org>
Subject: Re: [pim] please let me know any concern / comment for draft-ietf-pim-drlb-07 before we request for WGLC

Hi Jake,
Thanks for valuable comments.  Please find comment inline. I am addressing most of the comment. One which does not require any change, providing explanation.

Thanks
Mankamana

From: "Holland, Jake" <jholland@akamai.com>
Date: Friday, March 23, 2018 at 8:16 PM
To: "Mankamana Mishra (mankamis)" <mankamis@cisco.com>, "draft-ietf-pim-drlb@ietf.org" <draft-ietf-pim-drlb@ietf.org>
Cc: "pim-chairs@ietf.org" <pim-chairs@ietf.org>
Subject: Re: [pim] please let me know any concern / comment for draft-ietf-pim-drlb-07 before we request for WGLC

Hi,

This is my follow-up email with my comments after reviewing the draft in more detail.

I hope it’s helpful. I won’t object if some of the comments are not addressed, especially if you disagree, but I wanted to bring to your attention the issues that I noticed.

I said I’d send it by today, so I wanted to send what I had, but I haven’t quite finished getting past 6.2.2. However, I thought this might still have some useful points to consider, so I thought I’d send what I have.

Best regards,
Jake


Technical comments (all minor):
<technical>
1.
From section 1 (Introduction), second to last paragraph:

   Another important issue is related to failover.  If R1 is the only

   forwarder on the last hop router for shared LAN, in the event of a

   failure when R1 goes out of service, multicast forwarding for the

   entire LAN has to be rebuilt by the newly elected PIM DR.  However,

   if there was a way that allowed multiple routers to forward to the

   LAN for different groups, failure of one of the routers would only

   lead to disruption to a subset of the flows, therefore improving the

   overall resilience of the network.

I’m confused, because if I understand it correctly, the new GDR selection does not try to solve this problem, since it does not use a consistent hashing algorithm (one which provides a stable mapping of entries to the GDRs that remained alive, hence reducing the expected group-remapping count to G/N, for G joined groups and N remaining GDRs).

So I think the expected re-assignment of active groups to different GDRs is almost all of them, for high count of both GDR and group. I think the modulo hashing has the property that the more GDRs that are available, the more flows will have to be re-mapped (with a minimum failover remapping count when there are exactly 2 GDRs and one of them fails).

I suggest removing this paragraph, since this document only defines one hash algorithm with the potentially-surprising failover characteristic that adding more GDRs adds more failover instability instead of reducing it.

(Or changing the hash algorithm to a consistent hash, but I assume since this is describing an existing implementation, that’s less desirable?)
Mankamana: Added text. Would send to you for review.

<jh2>
I might be missing it if there was more, but what I see is this added paragraph in Section 1:

   There is limitation in the hash algorithm used in this document, but
   this draft provides the option to have different and more consistent
   hash algorithms in the future.

(Nits: limitation->“a limitation” and “this draft”->”this document”)

I think the limitation mentioned probably needs a description and a justification, probably as a subsection in section 4.3. Here’s a very rough suggestion for some sample text, but it may be worth expanding by walking through an example, for instance. As a minimum, something like:

4.3.1 Limitations

The Modulo Hash Algorithm has poor failover characteristics, because when one of N GDRs goes out of service in a LAN where M groups are forwarded, the expected number of remapped groups is approximately M – (M/N), for M >> N.

However, because this document describes an existing implementation, the Modulo Hash Algorithm is the only hash algorithm defined in this document, and a hash algorithm type field is defined in <section> so that future extensions can define a more stable hash algorithm.

(then add a reference to the limitations section in the new paragraph in section 1 that mentions the limitation)
</jh2>

<jh2>
I noticed another problem in section 1. This paragraph describes manifesting the issue in a different way:

   The problem may also manifest itself in a different way.  For
   example, R1 happens to forward 500 Mbps worth of unicast data to H1,
   and at the same time, H2 and H3 each request 300 Mbps of different
   multicast data.  R1 experiences packet drop once again. while, in the
   meantime, there is sufficient forwarding capacity left on R2 and R3
   and unused link capacity between the switch and R2/R3.

However, I don’t see how this problem is different from the prior example. In both cases, the DR bandwidth is exceeded and causes loss, even though capacity is available, the only difference being the amount by which it’s exceeded.

Should this one be removed, or is there a difference I didn’t understand?
</jh2>


2.
In 5.1 (... (DRLBC) Hello Option), there’s a SHOULD which seems like a MUST, in order to actually use this specification. Is that correct?

If not, can you explain the value in permitting a participating router not to include this field in its hello? (And what happens if the router includes it only sometimes, if it’s a GDR candidate? I think that would result in much remapping of groups, correct?)
Mankamana: Addressed.

<jh2>
I’m not sure, but I think the edit that addresses this was in section 6.2 where it now says:
   If a router supports this specification then each of the interfaces
   where multicast protocol is enabled, it MUST advertise DRLBC Hello
   Option in its PIM Hello.

However, section 5.1 still says:
   This DRLBC Hello Option SHOULD be advertised by last hop routers from
   interfaces with this specification enabled.

I think section 5.1 should also say MUST here, correct?
</jh2>

3.
I suggest defining an IANA registry for hash algorithm types, so that there’s a defined level of specification requirements for using a new hash algorithm value. (Possibly reserving some values for experimentation.)

If that’s not desired, I suggest instead, since this document defines only one algorithm with value 0, to define the Hash Algorithm Type field in the DRLBC as reserved bits which MUST be set to 0 and ignored when received. (This of course would not change the behavior of the current spec, and would be changeable by a backward compatible hash negotiation algorithm in a future specification, if necessary. However, I think it would require removing all the text that refers to alternate hash algorithms.)
Mankamana: Addressed.

<jh2>
I don’t see where this was addressed, can you please point me to the changes that covered this?
</jh2>

4.
Section 6.1, last paragraph:

   If a PIM DR receives a neighbor DRLBC Hello Option, which contains

   the same hash algorithm type as the DR, and the neighbor has the same

   DR priority as the DR, PIM DR SHOULD consider the neighbor as a GDR

   Candidate and insert the GDR Candidate's Address into the sorted list

   of DRLBGRD Option.
I think this “SHOULD” is a MUST? (Otherwise, what happens if a router does not do this behavior?)
Mankamana: There could be policy which restricts number of PIM DR / GDR in shared LAN. then “SHOULD” would be ok. With policy, there are chance that one PIM router is eligible to be GDR but it can not be.
 <jh2>
OK, I think I get it. Thanks for the explanation.
</jh2>

5.
Section 6.2.2, last paragraph:

   3.  If a router receives IGMP/MLD report for flow for which the

       router has been the GDR AND the DRLBGDR has changed since last

       report for this flow, then the router MUST determine if it is

       still the GDR. if it is, no action needed. if it is not, then the

Is there a time limit on this requirement for how long ago the router might have been the GDR?
</technical>
Mankamana: Got rid of it. As it is redundant. Correct text is already in previous section.

<jh2>
OK, thanks.
</jh2>

Editorial comments:
<editorial>
1.
Section 4.1, 2nd paragraph:
Current:

For example, assume there are 4 routers on the LAN: R1, R2, R3 and

   R4, which all support this specification on the LAN.
Suggestion (remove redundant LAN):

For example, assume there are 4 routers on the LAN: R1, R2, R3 and

   R4, which all support this specification.

Mankamana: Addressed

2.
Section 4.2:
The capitalization of Hash Mask is inconsistent.
Suggestion: lower-case “hash mask” (except at beginning of sentences, in section headers, etc.).

Mankamana: Addressed

3.
Section 4.3:

   Modulo hash algorithm is discussed here as an example, with detailed

   description on hashvalue_RP.

The hash algorithm is not an example, it is a normative description of the only defined hash algorithm. I suggest:


   The modulo hash algorithm is discussed here, with a detailed

   description for hashvalue_RP. The same algorithm is described in brief

   for hashvalue_Group using the group address instead of the RP address

   for an ASM group with RP_hashmask==0, and also with hashvalue_SG for a

   the source address of an (S,G), instead of the RP address,

Mankamana: Addressed

4.
Section 5.2:

      Length: 3 x (4 byte or 16 byte) + n x (4 byte or 16 byte) where n

      is number of GDR candidate


Missing a period and the plurality doesn’t match. Suggest:

      Length: 3 x (4 byte or 16 byte) + n x (4 byte or 16 byte) where n

      is the number of GDR candidates.

Mankamana: Addressed

5.
Section 6.1:

The terms “LBGRD” and “DRLBGRD” in this section (appearing 1 and 2 times, respectively) should be “DRLBGDR”, I think?

Mankamana: Addressed

6.
Section 6.2:
I suggest replacing “proposal” with “specification” both places it appears.
I also suggest cutting the first sentence from the first paragraph.

Mankamana: Addressed



<jh2>OK, thanks for addressing editorial comments 1-6.</jh2>

7.
Section 6.2, paragraph 2:

   A router which supports this specification, a interface where this

   protocol is enabled advertises DRLBC Hello Option in its PIM Hello,

   even if the router may not be considered as a GDR Candidate, for

   example, due to low DR priority. once DR election is done, DRLBGDR

   Hello option would be received from the current PIM DR on link.

I think I don’t understand what this paragraph is saying. If the router is not a GDR candidate, it won’t be in the list and won’t propagate the join for this group, same as not being DR, correct?

I guess I suggest either removing it or clarifying?
Mankamana: It means if this specification is configured on interface, router MUST send DRLBC with PIM Hello. Sending DRLBC does not guarantee to be GDR. Whether a router is GDR or not on interface would be decided once elected PIM DR announce the list of GDR. Please let me know if you want this line to be rephrased.

<jh2>
Thanks, I think the changes are an improvement, though I think the new version still has grammar issues in the new text. I am rushing a bit to finish this review, but I may be able to provide more details in a follow-up email if you have trouble finding them.

For one example, this is not a complete sentence:
Though DRLBC option in PIM hello does not
   guarantee that this router would be considered as a GDR candidate.
</jh2>


8.
Section 6.2.2:
This section has several grammar errors (listed in the Nits section), but I think it also has some problems with clarity. For example, certain portions of the current structure, such as paragraph 4, “If it was GDR for group earlier.  and even new hash choose it as GDR, no processing required.” might be interpreted to mean that the whole hello doesn’t need further processing, but I assume this means only the specific group doesn’t need further processing?

This is a key technical portion of the specification, so I think it might be helpful to structure this section a little differently to improve the clarity. For example, something similar to the pseudocode in https://tools.ietf.org/html/rfc4601#section-4.3.2<https://urldefense.proofpoint.com/v2/url?u=https-3A__tools.ietf.org_html_rfc4601-23section-2D4.3.2&d=DwMGaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=bqnFROivDo_4iF8Z3R4DyNWKbbMeXr0LOgLnElT1Ook&m=DiDH6xFdsoaygahCNTd1P4gvGjZrfDfj-IC36hSy0Bs&s=_3NOAUlP4Eom6N25nod5DXmtKQ590xm_pdAf9p-zk5I&e=> seems appropriate here (and possibly for the other parts of section 6.2), or perhaps tables that list all the possible state transitions for the receiving router and for each of the groups, and

<jh2>
Sorry, I see I didn’t finish my comment last time.

One of the clarity problems is that this refers to multicast groups often, but I think the same hash logic applies to individual (S,G)s as well, I assume?  So maybe the “(multicast flows)” language that appears late in the section could be defined up front and used consistently?

Another clarity problem is that the logic flow in section 6.2.2 reads like pseudocode, but it’s structured like prose. I think some of the grammar errors might be simpler to fix if the section were structured as pseudocode instead, but it also would be OK to fix the consistency and grammar issues. But I think one of those needs to happen for this section.

I haven’t listed all the consistency and grammar issues, but as an example: of the 5 “if” clauses in #1, to separate the predicate from the response, one clause uses “:”, the next 2 use “then”, the next uses a period that looks like an incomplete sentence ended (and also uses “than” as a misspelling, I think?), and the next uses a “,”.  I think at a minimum, for clarity, they all need to be consistent and grammatically correct.  (And the consistency issues continue in #2.)
</jh2>

</editorial>


Nits:
I noticed several grammar, spelling, and punctuation errors. I’ll list all the errors I noticed in this section.

In other sections, this review contains some suggestions which hopefully also would correct some of these issues, if accepted.

However, if some of the suggestions are not accepted, I also separately suggest that each of these below-listed errors that remain should also be fixed.

- Section 5.2, paragraph 3: “Candicate’s”
- Section 6.1 paragraph 1: “   DR that has this specification enabled on the interface, advertises”, comma is wrong here
- Section 6.2.2 paragraph 1: “   current PIM DR, no action needed.” -> no action is needed
- Section 6.2.2 paragraph 3: “algorithem” -> algorithm
- Section 6.2.2 paragraph 3: capitalized “It” following colon should be lower case
- Section 6.2.2 paragraph 4: “If it was GDR for group earlier.  and even new hash choose it”: this splits a sentence with a period, and is not grammatically correct
- Section 6.2.2 paragraph 5: “and now it is no more GDR” I think is not grammatically correct
- Section 6.2.2 paragraph 5: “Sec 6.3” I think is supposed to be spelled out “section 6.3”
- Section 6.2.2 paragraph 8: “2. If it was non GDR , and updated DRLBGDR” there’s an extra space before the comma
- Section 6.2.2 paragraph 10: “If it is hased as GDR , it need to build multicast”: extra space before the comma and “hashed” spelled wrong and “need” should be “needs”
- Section 6.2.2 paragraph 11: “no action needed. if it is not,” uncapitalized “If” and missing 2nd space after the period.
<jh2>
I happened to notice the extra space in 6.2.2 paragraph 8 is still there, as well as the “Sec 6.3” (which also is missing a period at end of sentence.) I didn’t check whether all the other nits were addressed.

Were you planning on another revision that checks through all the nits I mentioned?

I noticed a few more nits as well:
- Section 6.2.1: “then it need to process”, need->needs
- Section 6.2: “If it was not the GDR for a group earlier, than even the new hash does not make it GDR.”, I think “than even” -> “and”.  (However, I still suggest that refactoring section 6.2 as pseudocode will be more clear, which might remove this.)
</jh2>


<jh2>
Thanks for all the edits, and sorry for such a long list of issues.

I think the protocol extension sounds very useful, but I think the IESG and the RFC Editor will send it back if we try to move it forward without fixing the language problems in the document.

Once again I’m running short on time and stopping for now at section 6.2.2, but I encourage taking a closer look at the language in the following sections as well when posting the next draft, since I may make some similar comments when I get a chance to cover that part in closer detail.

Kind regards,
Jake
</jh2>

On 3/22/18, 9:04 AM, "Holland, Jake" <jholland@akamai.com<mailto:jholland@akamai.com>> wrote:


Hi Mankamana,



I apologize. On reviewing the video I see that I did comment in response to a question about draft-ietf-pim-drlb-07.



I had gotten up to comment on draft-ietf-pim-igmp-mld-yang (which has a security section that doesn't follow the guidelines from rfc 6087 section 3.4), and somehow didn't realize we were talking about draft-ietf-pim-drlb, even though I see now that the notes and the question were clear on that point. The discussion was a little rushed, and combining that with the jetlag, it looks like I just got confused which draft we were talking about.



I've now gone through pim-drlb again more carefully, and I agree with its Security Considerations section, that there are no new concerns not already described in RFC 4601's Security Considerations. (A forged Hello message can prevent local hosts from sending and receiving multicast traffic.)



I do have several editorial comments about the draft. I'll send a follow-up email today or tomorrow.



Sorry for the error,

Jake

On 3/21/18, 9:59 PM, "Mankamana Mishra (mankamis)" <mankamis@cisco.com<mailto:mankamis@cisco.com>> wrote:

Hi,
Yesterday we had requested for WGLC for this draft, as 6th version of draft presented in IETF-100 which had addressed all of the AD’s comment. This draft already has implementation. But there were some security concerns, can you please let us know comment on this draft. So that we can accommodate it and upload latest version soon.


Thanks
Mankamana