Re: [bess] Alvaro Retana's Discuss on draft-ietf-bess-mvpn-fast-failover-13: (with DISCUSS and COMMENT)

Alvaro Retana <aretana.ietf@gmail.com> Mon, 21 December 2020 20:01 UTC

From: Alvaro Retana <aretana.ietf@gmail.com>
Date: Mon, 21 Dec 2020 12:00:59 -0800
To: Greg Mirsky <gregimirsky@gmail.com>
Cc: "bfd-chairs@ietf.org" <bfd-chairs@ietf.org>, Stephane Litkowski <slitkows.ietf@gmail.com>, "draft-ietf-bess-mvpn-fast-failover@ietf.org" <draft-ietf-bess-mvpn-fast-failover@ietf.org>, "bess-chairs@ietf.org" <bess-chairs@ietf.org>, The IESG <iesg@ietf.org>, "bess@ietf.org" <bess@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/mi4J73wVA6l3Ilw2LxF4oH4Y-U8>

On December 20, 2020 at 7:24:34 PM, Greg Mirsky wrote:

Greg:

Hi!

I'm leaving in only the parts where we don't agree or where I have comments.

Thanks!

Alvaro.


...
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> >
> > (1) This document describes several methods to determine the status of a
> > tunnel (in §3), none of which "provide a "fast failover" solution when
> > used alone, but can be used together with the mechanism described in
> > Section 4" (§1). §3 also says this:
> >
> > An implementation may support any combination of the methods
> > described in this section and provide a network operator with control
> > to choose which one to use in the particular deployment.
> >
> > While §3.1 is clear in the fact that it is not a requirement for all
> > downstream PEs to use the same mechanism, there are no guidelines to aid
> > the operator to choose which mechanism to use. Some cases may be obvious
> > (e.g. §3.1.3 applies to tunnels of a specific type), but others are not.
> > I would like to see deployment considerations related to the advantages/
> > disadvantages that each method may have in specific situations (including
> > their possible combination).
>
> GIM>> I think it might be not that simple to compare deployment challenges
> and benefits resulting from the deployment of different P-tunnel monitoring
> methods because we have to somehow abstract from an impact of choices made
> by respective implementations. Also, there probably should be a reference
> environment, a use case we agree on. I agree that such a comparative
> analysis of P-tunnel monitoring methods is useful, but it doesn't seem to
> benefit this document, as neither a developer's choice of which ones to
> implement nor an operator's choice of which one to use affects the
> functionality discussed in the draft. Would you agree?

No, I don't.  That's the reason for this DISCUSS point. :-)

Why is this document specifying multiple mechanisms?  Why not specify
just one?  At a high level, I assume the answer is that one size
doesn't fit all, and that specific deployments might benefit from
specific mechanisms while others may not be as useful.   I don't think
that to provide guidance (I'm not asking for in-depth analysis) there
needs to be a priori agreement on use cases, etc...beyond what the WG
has already been considering for the development of this document.

Your initial statement that "it might be not that simple to compare"
is precisely the reason I think that operational guidance is needed.



> > (2) The BFD Discriminator Attribute has a very narrow application in this
> > document when compared to the potential other uses given the extensibility
> > possibilities related to bootstrapping BFD. I have serious concerns about
> > the attribute being defined in this document, amongst a series of other
> > mechanisms.
> >
> > (2a) The tunnel can be monitored without the new BGP Attribute (assuming
> > proper configuration of course). Why is that option not even mentioned
> > in the document?
>
> GIM>> You are right, there could be other methods to bootstrap a p2mp BFD
> session. But since RFC 8562 has no discussion of bootstrapping a session,
> having it in this draft seemed somewhat out of place. We can add an
> informational reference to Section 4 of draft-mirsky-mpls-p2mp-bfd
> (https://datatracker.ietf.org/doc/draft-mirsky-mpls-p2mp-bfd/), where several
> bootstrapping options are discussed. Would that be acceptable?

No, but thanks for pointing at that other draft which supports the
fact that the attribute is not the only way to bootstrap the session.

My point here is that you don't need the attribute for BFD to be a
useful tool.  However, the description of the tool (monitoring) is
tied to the attribute (including the point below about deleting the
BFD session).  It doesn't have to be: the use of BFD should be
independent of the bootstrapping mechanism.

Do you consider other possible ways of bootstrapping the session
valid?  Would it be ok to use them to enable the use of BFD?  Can a
BFD session that is set up using a different mechanism be used to
monitor the tunnel?



...
> > (2b) The fact that BFD monitoring can be achieved without the new
> > attribute makes me think that the bootstrapping of BFD using BGP would be
> > better served in a document produced by the BFD WG. One of the editors has
> > expressed the same opinion [1] [2]. Has a discussion taken place in the
> > BFD WG (or at least with the Chairs) about this work? Why was it not taken
> > up there?
> >
> GIM>> Work on draft-mirsky-mpls-p2mp-bfd is progressing at MPLS WG. AFAIK,
> it is in the state Candidate for WG Adoption. Moving the definition of the
> BFD Discriminator attribute to that draft, as I understand, would require
> using it as a normative reference. Given the current states of both
> documents, that would severely delay the publication of the MVPN Fast
> Failover specification and likely affect implementations of P-tunnel monitoring
> mechanisms. I hope you can agree with the current organization of the
> document.

That doesn't address my question about whether this work should have
been done in the BFD WG -- as you had suggested (links above).

I'm sure you saw the message I sent to Reshad/Jeff on this topic.
I'll rely on their and Martin's opinion.  BTW, I agree with Jeff in
that bfd/idr should be given the opportunity to review this document.





> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
...
> > (2) s/is an OPTIONAL procedure/is an optional procedure
> > This is not a normative statement to require capitalization.
>
> GIM>> I think that the use of the normative form is reasonable. The sentence
> can be re-worded using MAY. For example:
> OLD TEXT:
> The procedure described here is an OPTIONAL procedure that is based
> on a downstream PE taking into account the status of P-tunnels rooted
> at each possible Upstream PE, for including or not including each
> given PE in the list of candidate UMHs for a given (C-S, C-G) state.
> NEW TEXT:
> The procedure described here MAY be used in a BGP/MPLS MVPN [RFC6513]. It is
> based on a downstream PE taking into account the status of P-tunnels rooted
> at each possible Upstream PE, for including or not including each given PE
> in the list of candidate UMHs for a given (C-S, C-G) state.
>
> We wanted to stress that without this mechanism BGP/MPLS MVPN, as defined in
> RFCs 6513 and 6514, is fully functional and architecturally complete. This
> draft discusses mechanisms that support the protection and faster
> convergence in MVPN's control plane. And since these mechanisms are only an
> improvement compared to the BGP mechanisms, we wanted to emphasize that by
> using the normative form. Would you agree?

No, I don't.  Normative language was not meant for emphasis.  In any
case, this is not a blocking comment.




...
> > (4) §3.1.1: "similar to BGP next-hop tracking" Is this specified
> > somewhere? I don't remember seeing a specification for next-hop tracking,
> > but do know that implementations do it -- in an implementation-specific
> > way. Please add a little more text about what is meant/expected.
>
> GIM>> Indeed, BGP next-hop tracking is a system-internal behavior that
> has not, to the best of my knowledge, been documented at the IETF or any
> other SDO. Would the following text provide reasonable context information:
>
> That is similar to BGP next-hop tracking for VPN routes, except that
> the address considered is not the BGP next-hop address but the root
> address in the x-PMSI Tunnel attribute. BGP next-hop tracking is a
> feature that reduces the BGP convergence time comparing to the
> "regular" BGP by monitoring BGP next-hop address changes in the
> routing table. It's event-based because it detects changes in the
> routing table. When it detects a change, it performs a next-hop scan
> to find if any of the next hops in the BGP table is affected and updates
> it accordingly.

Suggestion:

OLD>
   BGP next-hop tracking is a feature that reduces the BGP convergence time
   comparing to the "regular" BGP by monitoring BGP next-hop address changes
   in the routing table. It's event-based because it detects changes in the
   routing table. When it detects a change, it performs a next-hop scan to
   find if any of the next hops in the BGP table is affected and updates it
   accordingly.

NEW>
   BGP next-hop tracking monitors BGP next-hop address changes in the routing
   table.  In general, when a change is detected, it performs a next-hop scan
   to find if any of the next hops in the BGP table is affected and updates it
   accordingly.
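
To illustrate the behavior the NEW text describes, here is a minimal Python
sketch.  It is not taken from any specification or implementation (next-hop
tracking is implementation-specific); the names `rib`, `bgp_table`, and
`on_rib_change` are hypothetical, and the data layout is an assumption made
only for the example.

```python
# Hypothetical sketch of event-driven BGP next-hop tracking.
# rib:       maps a next-hop address to True (reachable) or False (unreachable).
# bgp_table: maps a prefix to a dict holding its next hop and a "usable" flag.

def on_rib_change(rib, bgp_table, changed_next_hop):
    """Scan the BGP table for routes that use the changed next hop and
    update them; return the list of affected prefixes."""
    reachable = rib.get(changed_next_hop, False)
    affected = []
    for prefix, route in bgp_table.items():
        if route["next_hop"] == changed_next_hop and route["usable"] != reachable:
            route["usable"] = reachable
            affected.append(prefix)
    return affected


rib = {"192.0.2.1": True, "192.0.2.2": True}
bgp_table = {
    "10.0.0.0/24": {"next_hop": "192.0.2.1", "usable": True},
    "10.0.1.0/24": {"next_hop": "192.0.2.2", "usable": True},
}

# The routing table reports that 192.0.2.1 is no longer reachable;
# only the routes that use it as next hop are touched by the scan.
rib["192.0.2.1"] = False
affected = on_rib_change(rib, bgp_table, "192.0.2.1")
```

For root tracking as proposed for §3.1.1, the tracked address would be the
root address in the x-PMSI Tunnel attribute rather than the BGP next hop.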



...
> > (6) The "reachability condition" is mentioned in §3.1.1/§3.1.3/§3.1.4.
> > Does this mean that root tracking (§3.1.1) should be used with the
> > other mechanisms? The specific text says that "the downstream PE can
> > immediately update its UMH when the reachability condition changes",
> > giving the impression that the combination is possible but not required.
> >
> > Note that §4.3 is titled "Reachability Determination", which I hoped would
> > shed more light, but all it does is point back to §3.1.
>
> GIM>> I don't see any benefit in concurrently using more than one of the
> mechanisms described in the document that can monitor the state of a
> P-tunnel. That is certainly possible but, in my opinion, would add
> unnecessary complexity. The last paragraph in Section 3.1 is intended as the
> introduction to sub-sections that discuss different monitoring mechanisms:
>
> An implementation may support any combination of the methods
> described in this section and provide a network operator with control
> to choose which one to use in the particular deployment.
> Would you suggest an update to this paragraph to clarify the statement?

See the first DISCUSS point above.

You didn't explicitly answer the question:  is the text in
§3.1.1/§3.1.3/§3.1.4 an implication that root tracking should be used
with the other mechanisms?  If not, then what does it mean?



...
> > (8) §3.1.2 mentions that "careful consideration and coordination" is
> > needed when using other mechanisms such as rfc4090 "because uncorrelated
> > timers might cause unnecessary switchovers and destabilize the network."
> > What are the associated timers related to the mechanisms in this section?
>
> GIM>> This is in reference to the defect detection timers. When using
> multi-layer protection particular consideration must be given to the
> interaction of defect detections at different layers of a network. It is
> recommended to use longer detection intervals at the higher layers. Some
> recommendations suggest using a multiplier of 3 or larger, e.g., 10 msec
> detection for FRR and at least 100 msec for e2e detection.

Can you add something like that to the document?
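
As a rough idea of how that guidance could be expressed, here is a small
Python sketch of a coordination check.  It assumes only the figures quoted
above (a multiplier of 3 or larger; 10 msec for FRR and at least 100 msec for
e2e detection).  The function name and the `min_ratio` parameter are
hypothetical and do not come from the draft.

```python
def detection_timers_coordinated(lower_layer_ms, higher_layer_ms, min_ratio=3):
    """Return True if the higher-layer detection interval is at least
    min_ratio times the lower-layer detection interval."""
    if lower_layer_ms <= 0 or higher_layer_ms <= 0:
        raise ValueError("detection intervals must be positive")
    return higher_layer_ms >= min_ratio * lower_layer_ms


frr_ms = 10    # lower layer: FRR defect detection
e2e_ms = 100   # higher layer: end-to-end P-tunnel defect detection

ok = detection_timers_coordinated(frr_ms, e2e_ms)     # 100 >= 3 * 10
too_tight = detection_timers_coordinated(frr_ms, 20)  # 20 < 3 * 10
```

A check like this only catches uncorrelated intervals at configuration time;
it says nothing about which multiplier is right for a given network.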



...
> > (10) §3.1.4: "An Upstream PE SHOULD be removed from the UMH candidate
> > list...if...the upstream one-hop branch of the tunnel from P to PE cannot be
> > built." When is it ok to not remove the PE? IOW, why is this action not
> > required?
>
> GIM>> Thank you for bringing up this case. It was discussed during Shepherd's
> review and we've asked for the expert's opinion. Jeffrey Zhang kindly
> suggested the current text. (https://mailarchive.ietf.org/arch/browse/bess/?gbt=1&index=LxKYi9F6u1tl2qKtR2Q8wkQPMOA)
> I'll check with him and get back with the answer.

ok.




...
> > (12) §3.1.5 says that "where this mechanism is used in conjunction with
> > the method described in Section 5...downstream PEs can compare reception
> > on the two P-tunnels to determine when one of them is down", but §5 says
> > that "downstream PEs accept traffic from the primary or standby tunnel,
> > based on the status of the tunnel (based on Section 3)". IOW, §3.1.5
> > points at §5 as providing a way to determine if a tunnel is down, while §5
> > points back at §3 as the way to determine which tunnel to receive from.
> > This pointing back and forth is not a total contradiction, but it needs to
> > be clarified.
>
> GIM>> Though it might appear as a circular reference, it was not the
> intention. An implementation of the method described in the last paragraph
> of Section 3.1.5 can periodically accept traffic from primary and standby
> tunnels as the method of determining the state of the primary P-tunnel. I've
> updated that paragraph to note that the comparison can be done periodically:
>
> OLD TEXT:
> In cases where this mechanism is used in conjunction with the method
> described in Section 5, no prior knowledge of the rate or maximum
> inter-packet time on the multicast streams is required; downstream
> PEs can compare actual packet reception statistics on the two
> P-tunnels to determine when one of them is down. The detailed
> specification of this mechanism is outside the scope of this
> document.
>
> NEW TEXT:
> In cases where this mechanism is used in conjunction with the method
> described in Section 5, no prior knowledge of the rate or maximum
> inter-packet time on the multicast streams is required; downstream
> PEs can periodically compare actual packet reception statistics on
> the two P-tunnels to determine when one of them is down. The
> detailed specification of this mechanism is outside the scope of this
> document.

Let me try again.

§3.1.5 says that the "PEs can compare reception on the two P-tunnels
to determine when one of them is down".  OTOH, §5 says that the "PEs
accept traffic from the primary or standby tunnel, based on the status
of the tunnel".  The difference that I'm trying to point at is that §5
says that traffic is accepted from only *one* tunnel ("primary *or*
standby"), but §3.1.5 talks about receiving from *both*.

Adding "periodically" doesn't help because the basic contradiction
(receiving from one or both) is still there.




> > (13) §3.1.6: "An implementation that does not recognize or is configured
> > not to support this attribute MUST follow procedures defined for optional
> > transitive path attributes in Section 5 of [RFC4271]."
> >
> > There cannot be a Normative action specified for a node that "does not
> > recognize...this attribute" because, by definition, it can't be assumed
> > that it is aware of this specification. In this case, it is not necessary
> > to say anything about unrecognized attributes because that is already
> > specified in rfc4271.
> >
> > For the "configured not to support this attribute" case, it should be
> > pointed out that the node should operate as if the attribute was
> > unrecognized.
> >
> > Suggestion>
> > An implementation that is configured not to support this attribute MUST
> > follow the procedures defined in Section 5 of [RFC4271] as if the attribute
> > was unrecognized.
>
> GIM>> Thank you for pointing this out. In the course of addressing comments
> from other IESG reviewers, this text was updated to: This document defines
> the format and ways of using a new BGP attribute called the "BFD
> Discriminator". It is an optional transitive BGP attribute. Thus it is
> expected that an implementation that does not recognize or is configured not
> to support this attribute follows procedures defined for optional transitive
> path attributes in Section 5 of [RFC4271].
> Would the current text address your concern?

Mmmm...sure.

rfc4271 doesn't talk about the case where a node is "configured not to
support" an attribute.  The reason I suggested the specific link
("configured not to support this attribute...as if the attribute was
unrecognized.") is that the behavior of ignoring (which may be an
interpretation of "configured not to support") is different than if
the attribute was unrecognized.  The text above doesn't make that link
and, while some implementations may interpret it the way you meant it,
others may not.




> > (15) §3.1.6: "The BFD Discriminator attribute MUST be considered malformed
> > if its length is not a non-zero multiple of four." Ok, except that the
> > specification of the attribute doesn't mention the length (only the length
> > of the TLVs). Please specify the length and any considerations related to
> > the Extended Length bit. Also, given that this is a new attribute, with an
> > unspecified potential number of TLVs, and that the length is apparently
> > unbounded, all leading to the potential need for extended messages, please
> > specify how to handle peers that cannot accommodate more than 4k octet
> > messages (rfc8654).
>
> GIM>> I've noticed a note from Jeff Haas. Would you agree with his opinion?

Partially. ;-)

He's right about the Extended Length bit.


My reference to rfc8654 is because of this text from §5:

   It is RECOMMENDED that BGP protocol developers and implementers are
   conservative in their application and use of BGP Extended Messages.
   Future protocol specifications MUST describe how to handle peers that
   can only accommodate 4,096 octet messages.

As I mentioned before, it concerns me that the attribute (while having
a narrow use in this document) can have wider applicability.  The
extensibility is significant by allowing sequential or nested TLVs.
This combination may result in a large attribute, leading to large
Updates, especially when considering other MVPN-related
attributes/communities, etc.

Considering the text in rfc8654, the question is: is the attribute
necessary always, or can it be removed/omitted in some cases?  This is
not an attribute that is necessary for route selection, for example.
Given that rfc4271 says this:

   If, due to the limits on the maximum size of an UPDATE message (see
   Section 4), a single route doesn't fit into the message, the BGP
   speaker MUST NOT advertise the route to its peers and MAY choose to
   log an error locally.

...it may be possible that by not propagating the new attribute the
size of the update will be reduced so that the advertisement can be
made.  OTOH, if the attribute is removed then the tunnel can't be
monitored.

All I want is for you to consider any potential implications of the
new attribute.  The conclusion may be that no further change is
needed.
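
To make the size considerations concrete, here is a short Python sketch of
the checks involved.  It assumes the malformed-length rule quoted from §3.1.6
(a non-zero multiple of four), the rfc4271 rule that the Extended Length bit
is required once the attribute length no longer fits in one octet, and the
4,096-octet message limit that applies without rfc8654.  The function names
and the `other_octets` parameter are hypothetical.

```python
STANDARD_MAX_MSG = 4096  # maximum BGP message size without Extended Messages


def bfd_disc_attr_malformed(attr_len):
    """Malformed unless the length is a non-zero multiple of four."""
    return attr_len == 0 or attr_len % 4 != 0


def needs_extended_length(attr_len):
    """A one-octet length field holds at most 255, so longer attributes
    need the Extended Length bit (two-octet length field)."""
    return attr_len > 255


def fits_standard_update(other_octets, attr_len):
    """True if the UPDATE still fits in 4,096 octets once the attribute
    is added.  other_octets is everything else in the message; attr_len
    is assumed here to already include the attribute's own header."""
    return other_octets + attr_len <= STANDARD_MAX_MSG


checks = [
    bfd_disc_attr_malformed(8),        # well-formed length
    bfd_disc_attr_malformed(6),        # not a multiple of four
    needs_extended_length(256),        # does not fit in one octet
    fits_standard_update(4000, 100),   # 4100 octets: too large
]
```

Whether a speaker facing the last case should drop the attribute or withhold
the route is exactly the open question above; the sketch does not answer it.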




...
> > (17) §3.1.6.1: "MUST use its IP address as the source IP  address" Which
> > address? Please be specific.
>
> GIM>> Section 3.1.6.1 includes the list that an Upstream PE is required to
> follow: To enable downstream PEs to track the P-tunnel status using a point-
> to-multipoint (P2MP) BFD session the Upstream PE:
> ....
> o MUST use its IP address as the source IP address when transmitting
> BFD Control packets;
> Would adding the reference to the Upstream PE address your concern:
> o MUST use its, i.e., the Upstream PE's, IP address as the source IP address
> when transmitting BFD Control packets;

No -- the question is which address of the Upstream PE?  I'm assuming
that an upstream PE can have several addresses, which one is to be
used?  I can guess it is the address of the interface used to reach
the destination (or something like that)...but I would just want it to
be clear.




> > (18) §3.1.6.2: If the IP address doesn't map correctly at the downstream
> > PE (for example, a different local address is used that doesn't correspond
> > to the information in the PMSI attribute), what action should it take? Can
> > the tunnel still be monitored?
>
> GIM>> There's a possibility that the same downstream PE is monitoring more
> than one P-tunnel. Since each Upstream PE assigns its own BFD Discriminator,
> there's a chance that the same value is picked by more than one Upstream PE.
> According to Section 5.7 of the RFC 8562:
> IP and MPLS multipoint tails MUST demultiplex BFD packets based on a
> combination of the source address, My Discriminator, and the identity
> of the multipoint path that the multipoint BFD Control packet was
> received from. Together they uniquely identify the head of the
> multipoint path.
>
> We may consider adding the source address in the BFD Discriminator attribute
> as an optional TLV. I think that might be a good extension that can be
> introduced in a new document.

Why wait for a new document?  You made a pretty good case for
signaling the source address.
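
The demultiplexing rule quoted from Section 5.7 of RFC 8562 can be sketched
as a lookup keyed on all three values.  This is only an illustration in
Python: the table layout, function names, and the string path identifiers
are assumptions, not anything defined by RFC 8562 or by the draft.

```python
# Hypothetical session table on a downstream PE (multipoint tail).
# Key: (source address, My Discriminator, multipoint path identity).
sessions = {}


def register_session(source_addr, my_discriminator, path_id, session):
    """Record the session expected from one head of a multipoint path."""
    key = (source_addr, my_discriminator, path_id)
    if key in sessions:
        raise ValueError("session already registered for this key")
    sessions[key] = session


def demultiplex(source_addr, my_discriminator, path_id):
    """Return the session a received BFD Control packet belongs to,
    or None if no session matches all three values."""
    return sessions.get((source_addr, my_discriminator, path_id))


# Two Upstream PEs happen to pick the same discriminator value.
register_session("192.0.2.1", 1000, "p-tunnel-A", "session-from-PE1")
register_session("192.0.2.2", 1000, "p-tunnel-B", "session-from-PE2")

first = demultiplex("192.0.2.1", 1000, "p-tunnel-A")
second = demultiplex("192.0.2.2", 1000, "p-tunnel-B")
unknown = demultiplex("192.0.2.3", 1000, "p-tunnel-A")
```

The lookup only works if the downstream PE knows which source address to
expect, which is why signaling it explicitly matters.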



...
> > (22) §4: "Such behavior is referred to as "revertive" behavior and MUST be
> > supported." The text around this sentence seems to indicate that the
> > revertive behavior is the default; is that the intent? Or is the intent
> > for it just to be supported (as written)? Please be clear.
>
> GIM>> This part of the document has been updated in the course of addressing
> IESG comments:
> Such behavior is referred to as "revertive" behavior and MUST be supported.
> Non-revertive behavior refers to the behavior of continuing to select the
> backup PE as the UMH even after the Primary has come up. This non-revertive
> behavior MAY also be supported by an implementation and would be enabled
> through some configuration. Selection of the behavior, revertive or
> non-revertive, is an operational issue, but it MUST be consistent on all PEs
> in the given MVPN.
>
> Do you find the updated text clear enough?

Yes.  But it now brings up a new question: can you provide operational
guidance on the selection of this behavior?



> > (23) §4.1: "...routes that carry the "Standby PE" BGP Community MUST have
> > the LOCAL_PREF attribute set to zero." What should a receiver do if the
> > LOCAL_PREF is not zero?
>
> GIM>> I believe that the preceding text describes the situation when the
> LOCAL_PREF != 0:
> ... two different downstream PEs
> consider different Upstream PEs to be the primary one. In that case,
> without any precaution taken, both Upstream PEs would process a
> standby C-multicast route and possibly stop forwarding at the same
> time.

No, it doesn't.  The text specifies that the update "MUST have the
LOCAL_PREF attribute set to zero".  The question is not about the
potential effect, but about the value being anything else (> 0).  Note
that the value can be low enough (> 0) and no adverse effect may take
place, but the text doesn't specify a "low value", it explicitly
specifies 0.



> > (24) §4.1: In the last paragraph of this section, if I follow correctly,
> > the text talks about the case where the standby becomes the primary and
> > the updated advertisement doesn't have the Standby PE community. If that
> > is correct, then s/ presence/absence of the Standby PE BGP Community/
> > absence of the Standby PE BGP Community
> >
> > Also, the last sentence says that the "LOCAL_PREF attribute MUST be set to
> > zero". If the community is not present, how can a receiver enforce this?
> > What action should it take if the LOCAL_PREF has a different value?
>
> GIM>> Thank you for the suggestion, it clarifies the text. As I understand,
> the requirement is for the Standby Upstream PE, not for a downstream PE. Would
> the following update make that clearer:
> OLD TEXT:
> The LOCAL_PREF attribute MUST be set to zero.
> NEW TEXT:
> The new Upstream PE MUST set the LOCAL_PREF attribute to
> zero for that C-multicast route.

I don't know what the difference in the meaning is. :-(

There are three parts to my question:

(1) If the community is not present, then this would be a "normal"
Update.  How does the receiver know the difference so it can enforce
the "MUST"?

(2) What should the receiver do if the value is not as specified?

(3) Given that the LOCAL_PREF for Updates *with* the community MUST be
0, what's the use of sending Updates *without* the community with the
same value?
