Re: [RTG-DIR] Rtgdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Adrian Farrel <adrian@olddog.co.uk> Wed, 28 October 2020 13:51 UTC

Reply-To: adrian@olddog.co.uk
From: Adrian Farrel <adrian@olddog.co.uk>
To: 'Greg Mirsky' <gregimirsky@gmail.com>
Cc: 'Routing Directorate' <rtg-dir@ietf.org>, draft-ietf-bess-mvpn-fast-failover.all@ietf.org, 'BESS' <bess@ietf.org>, last-call@ietf.org
References: <160313815345.29014.16143591054021036590@ietfa.amsl.com> <CA+RyBmVwRPkmmAKTtoXU8FOoBDOpmt8ZDQkjhbiikqX8xv+-cQ@mail.gmail.com>
In-Reply-To: <CA+RyBmVwRPkmmAKTtoXU8FOoBDOpmt8ZDQkjhbiikqX8xv+-cQ@mail.gmail.com>
Date: Wed, 28 Oct 2020 13:51:28 -0000
Organization: Old Dog Consulting
Message-ID: <05d801d6ad31$6f9c8620$4ed59260$@olddog.co.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/6L8OWFk0FVgx58eDqKv8o6abrxk>

Hello Greg,

 

Thanks for this. I’m cutting down to places where we still need to interact. Look for [af] and blue text.

 

Nothing alarming.

 

Best,

Adrian

 

Section 3 notes that the procedure (presumably the procedure defined
in this section) is OPTIONAL. I didn't see anything similar in sections
4 and 5 stating that those procedures are optional. Presumably, since
this document is not updating any other RFCs, all of these procedures
are optional.

Actually it would be good to clarify how all these procedures fit in
with "legacy" deployments, and how they are all optional procedures. I
think that needs a short statement in the Introduction and a small
section of its own (maybe between 6 and 7).

GIM>> Thank you for the suggestion. I've updated the Introduction in this way:

OLD TEXT:

   Section 4 describes protocol extensions that can speed up failover by
   not requiring any multicast VPN routing message exchange at recovery
   time.

   Moreover, section 5 describes a "hot leaf standby" mechanism, that
   uses a combination of these two mechanisms.  This approach has
   similarities with the solution described in [RFC7431] to improve
   failover times when PIM routing is used in a network given some
   topology and metric constraints.

NEW TEXT:

   Section 4 describes optional protocol extensions that can speed up
   failover by not requiring any multicast VPN routing message exchange
   at recovery time.

   Moreover, Section 5 describes a "hot leaf standby" mechanism that can
   be used to improve failover time in MVPN.  The approach combines
   mechanisms defined in Section 3 and Section 4 and has similarities
   with the solution described in [RFC7431] to improve failover times
   when PIM routing is used in a network given some topology and metric
   constraints.

 

I think that Section 5 is intended to explain how the BGP extensions introduced in Section 3 and Section 4, and their use, enable operators to provide protection for multicast services. Would you suggest adding new text to the section to highlight particular aspects of introducing protection in MVPN?

 

[af] OK I obviously wasn’t clear. What I’m looking for is something like…

 

The procedures described in this document are optional to enable an operator to provide protection for multicast services. An operator would enable these mechanisms using <foo> and it is assumed that these mechanisms would be supported by all <what?> in the network for the procedures to work. In the case that a BGP implementation does not recognise or is configured to not support the extensions defined in this document, it will respond <somehow> as described in <rfc????>. This would result in <something>.

 

It is curious (to me) that 3.1.1 describes a way to know that a P-tunnel
is up.  You don't say, however, if being unable to determine that the
P-tunnel is up using this method is equivalent to determining that the
P-tunnel is down. (Previously in 3.1 you have talked about the "tunnel's
state is not known to be down".)

GIM>> This method, as noted in the document, is similar to BGP next-hop tracking, may be computationally intensive, and cannot be run frequently. So, in the periods between checks of whether the root address in the x-PMSI Tunnel attribute is reachable, the state is "not known to be down".

 

[af] Well, OK. Can you add to say that, “If it is not possible to determine whether the state of a tunnel is ‘up’, the state shall be considered as ‘not known to be down’, and it may be treated as if it is ‘up’ so that attempts to use the tunnel are acceptable.” This is probably “obvious to one skilled in the art,” but would help this reader.

 

By the way, do you ever say that a P-tunnel has just these two statuses
(up and down) because that could make a big difference?

GIM>> I think that the document then needs to discuss what impact detection time has on MVPN. For example, if the detection time is in single-digit seconds, a two-state model can be used. But would it be a useful model if the detection time is in tens of seconds? Should a "not known to be down" state be introduced?

 

[af] Yes, that *seems* to be the implication. But is there any different action between “up” and “not known to be down”? If you have three states then there is (possibly) an implication that tunnels are prioritised by state. I think, however, that it is OK to use “not known to be down” as if it was “up”.
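As a rough illustration of the point above, here is a minimal Python sketch in which "not known to be down" is treated exactly as "up" when deciding whether a tunnel may be used. The names `TunnelStatus`, `status_from_check`, and `is_usable` are hypothetical and do not come from the draft; the sketch only assumes that a status check can return a positive result, a negative result, or no determination at all.

```python
from enum import Enum
from typing import Optional


class TunnelStatus(Enum):
    # Hypothetical names; the draft does not define an enumeration.
    UP = "up"
    DOWN = "down"
    NOT_KNOWN_TO_BE_DOWN = "not known to be down"


def status_from_check(reachable: Optional[bool]) -> TunnelStatus:
    """Map a check result to a status; None means no determination was possible."""
    if reachable is None:
        return TunnelStatus.NOT_KNOWN_TO_BE_DOWN
    return TunnelStatus.UP if reachable else TunnelStatus.DOWN


def is_usable(status: TunnelStatus) -> bool:
    """Only a positive determination of "down" rules the tunnel out."""
    return status is not TunnelStatus.DOWN
```

With this shape, no action differs between "up" and "not known to be down", so the third state does not imply any prioritisation of tunnels by state.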

 


Note that 3.1.2 etc also establish ways to know that the tunnel is up,
but not ways to determine whether the tunnel is down.

GIM>> In this section the state of a P-tunnel is equated with the state of the last link of that tunnel. The document notes that if the link is Up, then the P-tunnel is considered down. It is implied, that if it is determined that the link is Down, then the state of the P-tunnel is considered Down. Would you recommend adding an explanation to the document? 

 

To reiterate, "I don't know if it is up" is not the same as "I know it
is down."

GIM>> Indeed. It is analogous to "it was Up the last time I checked on it". It is meant to be used when the interval between checks is significant.

 

[af] Assuming there is a typo in what you just wrote – link not up ==> tunnel down.

Also assuming that we don’t go down the “three state” path, then “not checked for a while” is still “Up”.

I think it was the phrasing in 3.1.2. It sounded like “Here is a way to know that the tunnel is up” which is a good thing, but does not say that it is exclusive. So, avoiding the “implication” would be a good thing. Something like 

OLD

   A condition to consider a tunnel status as Up can be that the last-
   hop link of the P-tunnel is Up.

NEW

   A condition to consider a tunnel status as Up can be that the last-
   hop link of the P-tunnel is Up.  Conversely, if the last-hop link of
   the P-tunnel is Down then this can be taken as an indication that
   the P-tunnel is Down.

END


3.1.2

   Using this method when a fast restoration mechanism (such as MPLS FRR
   [RFC4090]) is in place for the link requires careful consideration
   and coordination of defect detection intervals for the link and the
   tunnel.  In many cases, it is not practical to use both protection
   methods at the same time.

OK, I considered them carefully. Now what? :-)

I think you have to give implementation guidance.

GIM>> I agree, an operational recommendation could be helpful. Usually, in the case of multi-layered protection, detection intervals on the higher layer are 10 times the guaranteed restoration time of the lower layer. Would you recommend adding this to the text as an example of a deployment?

 

[af] An example would be fine (and a forward reference from here). But it would be fine, maybe better, to offer half a sentence of guidance. So… “…not practical to use both protection methods at the same time because <adverse interactions?>…”
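To make the rule of thumb from the GIM>> reply concrete, here is a small Python sketch of the arithmetic. The function name and the 50 ms figure in the usage note are hypothetical; only the factor of 10 comes from the thread, and it is described there as usual practice rather than as a requirement.

```python
def upper_layer_detection_interval_ms(
    lower_layer_restoration_ms: float, factor: float = 10.0
) -> float:
    """Detection interval for the higher layer (here, the tunnel), derived
    from the guaranteed restoration time of the lower layer (here, the link)."""
    if lower_layer_restoration_ms <= 0 or factor <= 0:
        raise ValueError("restoration time and factor must be positive")
    return lower_layer_restoration_ms * factor
```

For example, if link protection were assumed to guarantee restoration within 50 ms, `upper_layer_detection_interval_ms(50)` gives 500.0 ms as the tunnel-level detection interval, so the tunnel-level mechanism does not react to a failure the link-level mechanism is already repairing.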

 

All of 3.1.x are timid about the use of the mechanisms they describe.

I think that the end of 3.1 should say that an implementation may choose
to use any of these mechanisms to determine the status of the P-tunnel.

GIM>> Will the following text reflect that:

NEW TEXT:

   An implementation may support any combination of the methods
   described in this section and provide a network operator with control
   to choose which one to use in the particular deployment.

 

[af] Good.

 

3.1.6

What should I do if I don't recognise or support the setting of the BFD
Mode field?

GIM>> I think that the same handling applies as for the malformed attribute:

   If malformed, the UPDATE
   message SHALL be handled using the approach of Attribute Discard per
   [RFC7606].

I propose to extend the applicability of the rule with the following update to the sentence:

NEW TEXT:

   The BFD Discriminator attribute MUST be considered malformed if its
   length is not a non-zero multiple of four.  If the setting of the BFD
   Mode field is not recognized or not supported, or the attribute is
   considered malformed, the UPDATE message SHALL be handled using the
   approach of Attribute Discard per [RFC7606].

 

[af] This is a bit subtle and refers also to my first point in this email. If the setting of the BFD Mode is not recognised or not supported, then it is likely because this specification is not supported. Therefore, this specification cannot mandate how the implementation will behave. I think you have to separate:

*	A malformed attribute SHALL be handled using Attribute Discard according to [RFC7606]
*	An unknown or unsupported attribute will be handled by implementations according to the procedures for unknown attributes described in <foo>
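The separation suggested in the two bullets can be sketched in Python as follows. This is only an illustration under stated assumptions: `Handling`, `is_malformed`, `classify`, and the `spec_supported` flag are hypothetical names, and the only rule taken from the proposed draft text is that the attribute is malformed when its length is not a non-zero multiple of four.

```python
from enum import Enum


class Handling(Enum):
    # Hypothetical labels for the two cases separated above, plus the normal case.
    ACCEPT = "process the attribute"
    ATTRIBUTE_DISCARD = "Attribute Discard per [RFC7606]"
    UNKNOWN_ATTRIBUTE = "generic unknown-attribute procedures"


def is_malformed(attr_length: int) -> bool:
    """Malformed if the length is not a non-zero multiple of four."""
    return attr_length <= 0 or attr_length % 4 != 0


def classify(attr_length: int, spec_supported: bool) -> Handling:
    """Choose the handling; spec_supported is an assumed local capability flag."""
    if not spec_supported:
        # This specification cannot mandate behaviour for implementations
        # that do not support it; existing BGP rules apply instead.
        return Handling.UNKNOWN_ATTRIBUTE
    if is_malformed(attr_length):
        return Handling.ATTRIBUTE_DISCARD
    return Handling.ACCEPT
```

The ordering of the checks reflects the point made above: the malformed test is only meaningful for an implementation that understands the attribute in the first place.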

 

4.1

   The normal and the standby C-multicast routes must have their Local
   Preference attribute adjusted

Should this be "MUST"?

GIM>> I think that is not an actionable 'must'. It could be expressed as

The Local Preference attribute of the normal and the standby C-multicast route needs to be adjusted.

Would you recommend using the re-worded passage?

 

[af] The alternative text is good.

 

==Nits:==



Section 3 has

   Because of that, procedures described in Section 9.1.1 of [RFC6513]
   MUST be used when using I-PMSI P-tunnels.

Aren't those procedures already mandatory? That section of 6513 already
uses "MUST" (although it does go on to say that it might not be possible
to apply the procedure and delegates processing to 9.1.2 and 9.1.3 -
peculiarly using lowercase must for that delegation). I wonder whether
you are saying "this case is covered by the procedures of Section 9.1.1
of [RFC6513]" or are you actually defining new normative behaviour?

GIM>> I think that the use of lower case 'must' is ambiguous and somewhat confusing. You are right, the intention is to refer to Section 9.1.1 as the mandatory behavior. But neither 9.1.2 nor 9.1.3 uses normative language. What would you recommend?

 

[af] Maybe…

“Because of that, the procedures of Section 9.1.1 of [RFC6513] are applicable. That document is a foundation for this document and its processes all apply here. Section 9.1.1 mandates the use of specific procedures for sending intra-AS I-PMSI A-D Routes.”

 

4.1

   As long as C-S is reachable via the Primary
   Upstream PE and the Upstream PE is the Primary Upstream PE.

This sentence doesn't seem to be complete. What is the consequence of
this condition?

GIM>> It is supposed to be

   As long as
   C-S is reachable via the Primary Upstream PE, the Upstream PE is the
   Primary Upstream PE.

Is it better?

 

[af] That makes sense