Re: [RTG-DIR] Rtgdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Greg Mirsky <gregimirsky@gmail.com> Wed, 28 October 2020 21:34 UTC

From: Greg Mirsky <gregimirsky@gmail.com>
Date: Wed, 28 Oct 2020 14:34:12 -0700
Message-ID: <CA+RyBmXUtFuDs-m7cRRU8JoOhkwF1qi90neXXZgLAk=KLCn=8A@mail.gmail.com>
To: Adrian Farrel <adrian@olddog.co.uk>
Cc: Routing Directorate <rtg-dir@ietf.org>, draft-ietf-bess-mvpn-fast-failover.all@ietf.org, BESS <bess@ietf.org>, last-call@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/Jx1P11Y4qY0liKpRs_2kjCRhFLs>
Subject: Re: [RTG-DIR] Rtgdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Hi Adrian,
many thanks for your detailed clarification and helpful suggestions. Please
find my follow-up notes under the GIM2>> tag in green. Also, attached are the
new diff and the working version of the draft.

Regards,
Greg

On Wed, Oct 28, 2020 at 6:51 AM Adrian Farrel <adrian@olddog.co.uk> wrote:

> Hello Greg,
>
>
>
> Thanks for this. I’m cutting down to places where we still need to
> interact. Look for [af] and blue text.
>
>
>
> Nothing alarming.
>
>
>
> Best,
>
> Adrian
>
>
>
> Section 3 notes that the procedure (presumably the procedure defined
> in this section) is OPTIONAL. I didn't see anything similar in sections
> 4 and 5 stating that those procedures are optional. Presumably, since
> this document is not updating any other RFCs, all of these procedures
> are optional.
>
> Actually it would be good to clarify how all these procedures fit in
> with "legacy" deployments, and how they are all optional procedures. I
> think that needs a short statement in the Introduction and a small
> section of its own (maybe between 6 and 7).
>
> GIM>> Thank you for the suggestion. I've updated the Introduction in this
> way:
>
> OLD TEXT:
>
>    Section 4 describes protocol extensions that can speed up failover by
>    not requiring any multicast VPN routing message exchange at recovery
>    time.
>
>    Moreover, section 5 describes a "hot leaf standby" mechanism, that
>    uses a combination of these two mechanisms.  This approach has
>    similarities with the solution described in [RFC7431] to improve
>    failover times when PIM routing is used in a network given some
>    topology and metric constraints.
>
> NEW TEXT:
>
>    Section 4 describes optional protocol extensions that can speed up
>    failover by not requiring any multicast VPN routing message exchange
>    at recovery time.
>
>    Moreover, Section 5 describes a "hot leaf standby" mechanism that can
>    be used to improve failover time in MVPN.  The approach combines
>    mechanisms defined in Section 3 and Section 4 and has similarities
>    with the solution described in [RFC7431] to improve failover times
>    when PIM routing is used in a network given some topology and metric
>    constraints.
>
>
>
> I think that Section 5 is intended to explain how the BGP extensions
> introduced in this document, and their use as described in Section 3 and
> Section 4, enable operators to provide protection for multicast services.
> Would you suggest adding new text to the section to highlight particular
> aspects of introducing protection in MVPN?
>
>
>
> [af] OK I obviously wasn’t clear. What I’m looking for is something like…
>
>
>
> The procedures described in this document are optional to enable an
> operator to provide protection for multicast services. An operator would
> enable these mechanisms using <foo> and it is assumed that these mechanisms
> would be supported by all <what?> in the network for the procedures to
> work. In the case that a BGP implementation does not recognise or is
> configured to not support the extensions defined in this document, it will
> respond <somehow> as described in <rfc????>. This would result in
> <something>.
>
GIM2>> I think I've got the idea now. Would appending the following new
paragraph to the Introduction address your comment?
NEW TEXT:
   The procedures described in this document are optional to enable an
   operator to provide protection for multicast services in BGP/MPLS IP
   VPNs.  An operator would enable these mechanisms using a method
   discussed in Section 3 in combination with the redundancy provided by
   a standby PE connected to the source of the multicast flow, and it is
   assumed that all PEs in the network would support these mechanisms
   for the procedures to work.  In the case that a BGP implementation
   does not recognize or is configured to not support the extensions
   defined in this document, it will continue to provide the multicast
   service, as described in [RFC6513].

>
>
> It is curious (to me) that 3.1.1 describes a way to know that a P-tunnel
> is up.  You don't say, however, if being unable to determine that the
> P-tunnel is up using this method is equivalent to determining that the
> P-tunnel is down. (Previously in 3.1 you have talked about the "tunnel's
> state is not known to be down".)
>
> GIM>> This method, as noted in the document, is similar to BGP next-hop
> tracking, may be computationally intensive, and cannot be run frequently.
> So, in periods between checking whether the root address in the x-PMSI
> Tunnel attribute is reachable the state is "not known to be down".
>
>
>
> [af] Well, OK. Can you add to say that, “If it is not possible to
> determine whether the state of a tunnel is ‘up’, the state shall be
> considered as ‘not known to be down’, and it may be treated as if it is
> ‘up’ so that attempts to use the tunnel are acceptable.” This is probably
> “obvious to one skilled in the art,” but would help this reader.
>
GIM2>> Thank you for the contributed text. I've added it before "not known
to be Down" is used in the text (with the yellowish background):
NEW TEXT:
   The procedure described here is an OPTIONAL procedure that is based
   on a downstream PE taking into account the status of P-tunnels rooted
   at each possible Upstream PE, for including or not including each
   given PE in the list of candidate UMHs for a given (C-S, C-G) state.
   If it is not possible to determine whether a P-tunnel's current
   status is Up, the state shall be considered "not known to be Down",
   and it may be treated as if it is Up so that attempts to use the
   tunnel are acceptable.  The result is that, if a P-tunnel is Down
   (see Section 3.1), the PE that is the root of the P-tunnel will not
   be considered for UMH selection.  This will result in the downstream
   PE failing over to use the next Upstream PE in the list of
   candidates.  Some downstream PEs could arrive at a different
   conclusion regarding the tunnel's state because the failure impacts
   only a subset of branches.  Because of that, procedures described in
   Section 9.1.1 of [RFC6513] MUST be used when using I-PMSI P-tunnels.
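
As an illustration only (this is not text from the draft), the selection
rule in the paragraph above can be sketched in Python. The names
TunnelStatus, UpstreamPE, candidate_umhs, and select_umh are hypothetical.
The sketch assumes that each candidate Upstream PE roots exactly one
P-tunnel and that the candidate list is already ordered by preference.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TunnelStatus(Enum):
    UP = "Up"
    DOWN = "Down"
    NOT_KNOWN_TO_BE_DOWN = "not known to be Down"


@dataclass
class UpstreamPE:
    name: str
    # None models "it was not possible to determine the status".
    tunnel_status: Optional[TunnelStatus]


def effective_status(pe: UpstreamPE) -> TunnelStatus:
    """Map an undetermined status to "not known to be Down"."""
    if pe.tunnel_status is None:
        return TunnelStatus.NOT_KNOWN_TO_BE_DOWN
    return pe.tunnel_status


def candidate_umhs(pes: List[UpstreamPE]) -> List[UpstreamPE]:
    """Keep every PE whose P-tunnel is not Down.

    "Not known to be Down" is treated exactly like Up, so only a
    tunnel that is positively known to be Down excludes its root PE.
    """
    return [pe for pe in pes if effective_status(pe) is not TunnelStatus.DOWN]


def select_umh(pes: List[UpstreamPE]) -> Optional[UpstreamPE]:
    """Pick the first remaining candidate (assumed preference order)."""
    candidates = candidate_umhs(pes)
    return candidates[0] if candidates else None
```

With PE1 Down and PE2 undetermined, select_umh falls through to PE2, which
mirrors the "failing over to use the next Upstream PE" behavior described
above.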
>
>
>
> By the way, do you ever say that a P-tunnel has just these two statuses
> (up and down) because that could make a big difference?
>
> GIM>> I think that the document then needs to discuss what impact
> detection time has on MVPN. For example, if the detection time is in
> single-digit seconds, a two-state model can be used. But would it be a
> useful model if the detection time is in tens of seconds? Should a "not
> known to be down" state be introduced?
>
>
>
> [af] Yes, that **seems** to be the implication. But is there any
> different action between “up” and “not known to be down”? If you have three
> states then there is (possibly) an implication that tunnels are prioritised
> by state. I think, however, that it is OK to use “not known to be down” as
> if it was “up”.
>
GIM2>> Thank you.

>
>
>
> Note that 3.1.2 etc also establish ways to know that the tunnel is up,
> but not ways to determine whether the tunnel is down.
>
> GIM>> In this section the state of a P-tunnel is equated with the state of
> the last link of that tunnel. The document notes that if the link is Up,
> then the P-tunnel is considered down. It is implied, that if it is
> determined that the link is Down, then the state of the P-tunnel is
> considered Down. Would you recommend adding an explanation to the document?
>
>
>
> To reiterate, "I don't know if it is up" is not the same as "I know it
> is down."
>
> GIM>> Indeed. It is analogous to "it was Up the last time I checked on
> it". It is meant to be used when the interval between checks is significant.
>
>
>
> [af] Assuming there is a typo in what you just wrote: link not up → tunnel
> down.
>
> Also assuming that we don’t go down the “three state” path, then “not
> checked for a while” is still “Up”.
>
> I think it was the phrasing in 3.1.2. It sounded like “Here is a way to
> know that the tunnel is up” which is a good thing, but does not say that it
> is exclusive. So, avoiding the “implication” would be a good thing.
> Something like
>
> OLD
>
>    A condition to consider a tunnel status as Up can be that the last-
>    hop link of the P-tunnel is Up.
>
> NEW
>
>    A condition to consider a tunnel status as Up can be that the last-
>    hop link of the P-tunnel is Up.  Conversely, if the last-hop link of
>    the P-tunnel is Down then this can be taken as an indication that
>    the P-tunnel is Down.
>
> END
>
> GIM2>> Many thanks for the new text. Updated accordingly.

>
> 3.1.2
>
>    Using this method when a fast restoration mechanism (such as MPLS FRR
>    [RFC4090]) is in place for the link requires careful consideration
>    and coordination of defect detection intervals for the link and the
>    tunnel.  In many cases, it is not practical to use both protection
>    methods at the same time.
>
> OK, I considered them carefully. Now what? :-)
>
> I think you have to give implementation guidance.
>
> GIM>> I agree, an operational recommendation could be helpful. Usually, in
> the case of multi-layered protection, detection intervals on the higher
> layer are 10 times the guaranteed restoration time of the lower layer.
> Would you recommend adding this to the text as an example of a deployment?
>
>
>
> [af] An example would be fine (and a forward reference from here). But it
> would be fine, maybe better, to offer half a sentence of guidance. So… “not
> practical to use both protection methods at the same time because <adverse
> interactions?>…”
>
GIM2>> Thank you for the explanation. Extended the last sentence:
NEW TEXT:
   In many cases, it is not practical to use both protection
   methods at the same time because uncorrelated timers might cause
   unnecessary switchovers and destabilize the network.
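
A rough Python check of the timer relationship discussed above, offered
only as a sketch. The factor of 10 is the rule of thumb mentioned earlier
in this thread, not a requirement of the draft. The function names and the
example values are assumptions for illustration. The detection time is
computed as transmit interval times detect multiplier, as in BFD
asynchronous mode.

```python
def bfd_detection_time_ms(tx_interval_ms: float, detect_mult: int) -> float:
    """Detection time of a BFD session in asynchronous mode."""
    return tx_interval_ms * detect_mult


def timers_are_coordinated(
    tunnel_detection_ms: float,
    link_restoration_ms: float,
    factor: float = 10.0,
) -> bool:
    """True if the tunnel-level detection time leaves the lower layer
    enough room to restore the link before the tunnel is declared Down.

    `factor` is the rule-of-thumb ratio between the higher-layer
    detection interval and the lower-layer restoration time.
    """
    return tunnel_detection_ms >= factor * link_restoration_ms


# Hypothetical deployment: link protection restores within 50 ms.
aggressive = bfd_detection_time_ms(tx_interval_ms=100, detect_mult=3)
relaxed = bfd_detection_time_ms(tx_interval_ms=200, detect_mult=3)
print(timers_are_coordinated(aggressive, link_restoration_ms=50))
print(timers_are_coordinated(relaxed, link_restoration_ms=50))
```

When the check fails, both layers may react to the same failure, which is
the kind of unnecessary switchover the new sentence warns about.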

>
>
> All of 3.1.x are timid about the use of the mechanisms they describe.
>
> I think that the end of 3.1 should say that an implementation may choose
> to use any of these mechanisms to determine the status of the P-tunnel.
>
> GIM>> Will the following text reflect that:
>
> NEW TEXT:
>
>    An implementation may support any combination of the methods
>    described in this section and provide a network operator with control
>    to choose which one to use in the particular deployment.
>
>
>
> [af] Good.
>
>
>
> 3.1.6
>
> What should I do if I don't recognise or support the setting of the BFD
> Mode field?
>
> GIM>> I think that the same handling applies as for the malformed
> attribute:
>
>    If malformed, the UPDATE
>    message SHALL be handled using the approach of Attribute Discard per
>    [RFC7606].
>
> I propose to extend the applicability of the rule with the following
> update to the sentence:
>
> NEW TEXT:
>
>    The BFD Discriminator attribute MUST be considered malformed if its
>    length is not a non-zero multiple of four.  If the setting of the BFD
>    Mode field is not recognized or not supported, or the attribute
>    considered malformed, the UPDATE message SHALL be handled using the
>    approach of Attribute Discard per [RFC7606].
>
>
>
> [af] This is a bit subtle and refers also to my first point in this email.
> If the setting of the BFD Mode is not recognised or not supported, then it
> is likely because this specification is not supported. Therefore, this
> specification cannot mandate how the implementation will behave. I think
> you have to separate:
>
>    - The malformed SHALL be handled using Attribute Discard according to
>    [RFC7606]
>    - An unknown or unsupported attribute will be handled by
>    implementations according to the procedures for unknown attributes
>    described in <foo>
>
>
>
GIM2>> I thought of a different "unsupported" scenario: the BFD
Discriminator attribute is supported, but an implementation does not
recognize the value in the BFD Mode field. In the case where the BFD
Discriminator attribute itself is unknown or unsupported, the procedures
defined for optional transitive path attributes in Section 5 of RFC 4271
must be followed. I've removed the recent update and added the reference to
RFC 4271 (as a Normative reference) following the definition of the new
attribute:
NEW TEXT:
   This document defines the format and ways of using a new BGP
   attribute called the "BFD Discriminator".  It is an optional
   transitive BGP attribute.  An implementation that does not recognize
   or is configured not to support this attribute MUST follow procedures
   defined for optional transitive path attributes in Section 5 of
   [RFC4271].
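
The two cases separated in this exchange can be sketched in Python as
follows. This is an illustration under stated assumptions, not an
implementation of the draft: the Action enum and the function name are
hypothetical, and the value is assumed to be the raw attribute payload.
The sketch deliberately does not cover an unrecognized value in the BFD
Mode field.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process the attribute"
    ATTRIBUTE_DISCARD = "discard the attribute, keep the route (RFC 7606)"
    PROPAGATE_WITH_PARTIAL_BIT = (
        "treat as unrecognized optional transitive (RFC 4271, Section 5)"
    )


def handle_bfd_discriminator(value: bytes, attribute_supported: bool) -> Action:
    """Decide how to treat a received BFD Discriminator attribute."""
    if not attribute_supported:
        # An implementation that does not recognize (or is configured
        # not to support) the attribute cannot apply this document's
        # rules; it falls back to generic optional transitive handling.
        return Action.PROPAGATE_WITH_PARTIAL_BIT
    if len(value) == 0 or len(value) % 4 != 0:
        # Malformed: length is not a non-zero multiple of four.
        return Action.ATTRIBUTE_DISCARD
    return Action.PROCESS
```

The support check comes first because an implementation that does not
support the attribute never evaluates the malformed rule at all.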

> 4.1
>
>    The normal and the standby C-multicast routes must have their Local
>    Preference attribute adjusted
>
> Should this be "MUST"?
>
> GIM>> I think that is not an actionable 'must'. It could be expressed as
>
> The Local Preference attribute of the normal and the standby C-multicast
> route needs to be adjusted.
>
> Would you recommend using the re-worded passage?
>
>
>
> [af] The alternative text is good.
>
GIM2>> Done.

>
>
> ==Nits:==
>
> Section 3 has
>
>    Because of that, procedures described in Section 9.1.1 of [RFC6513]
>    MUST be used when using I-PMSI P-tunnels.
>
> Aren't those procedures already mandatory? That section of 6513 already
> uses "MUST" (although it does go on to say that it might not be possible
> to apply the procedure and delegates processing to 9.1.2 and 9.1.3 -
> peculiarly using lowercase must for that delegation). I wonder whether
> you are saying "this case is covered by the procedures of Section 9.1.1
> of [RFC6513]" or are you actually defining new normative behaviour?
>
> GIM>> I think that the use of lower case 'must' is ambiguous and somewhat
> confusing. You are right, the intention is to refer to Section 9.1.1 as the
> mandatory behavior. But neither 9.1.2 nor 9.1.3 uses normative
> language. What would you recommend?
>
>
>
> [af] Maybe…
>
> “Because of that, the procedures of Section 9.1.1 of [RFC6513] are
> applicable. That document is a foundation for this document and its
> processes all apply here. Section 9.1.1 mandates the use of specific
> procedures for sending intra-AS I-PMSI A-D Routes.”
>
GIM2>> Thank you for the text. Updated accordingly, noting "when using
I-PMSI P-tunnels":
NEW TEXT:
   Because of that, the procedures of Section 9.1.1 of [RFC6513] are
   applicable when using I-PMSI P-tunnels.  That document is a foundation
   for this document, and its processes all apply here.  Section 9.1.1
   mandates the use of specific procedures for sending intra-AS I-PMSI
   A-D Routes.

>
>
> 4.1
>
>    As long as C-S is reachable via the Primary
>    Upstream PE and the Upstream PE is the Primary Upstream PE.
>
> This sentence doesn't seem to be complete. What is the consequence of
> this condition?
>
> GIM>> It is supposed to be
>
>    As long as C-S is reachable via the Primary Upstream PE, the Upstream
>    PE is the Primary Upstream PE.
>
> Is it better?
>
>
>
> [af] That makes sense
>
>
>