Re: [bess] Alvaro Retana's Discuss on draft-ietf-bess-mvpn-fast-failover-13: (with DISCUSS and COMMENT)

Alvaro Retana <aretana.ietf@gmail.com> Thu, 17 December 2020 20:47 UTC

From: Alvaro Retana <aretana.ietf@gmail.com>
In-Reply-To: <1336556383.1214634.1608220368883@mail.yahoo.com>
References: <1336556383.1214634.1608220368883.ref@mail.yahoo.com> <1336556383.1214634.1608220368883@mail.yahoo.com>
MIME-Version: 1.0
Date: Thu, 17 Dec 2020 15:47:16 -0500
Message-ID: <CAMMESsxqkuSMkKRt-q=PagiF8dRGda-MBAvpKGRsEXWqgbaR7w@mail.gmail.com>
To: "draft-ietf-bess-mvpn-fast-failover@ietf.org" <draft-ietf-bess-mvpn-fast-failover@ietf.org>
Cc: Stephane Litkowski <slitkows.ietf@gmail.com>, "bfd-chairs@ietf.org" <bfd-chairs@ietf.org>, "bess-chairs@ietf.org" <bess-chairs@ietf.org>, The IESG <iesg@ietf.org>, "bess@ietf.org" <bess@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/9mt0biPZttulQxMSypNdjM456nQ>
Subject: Re: [bess] Alvaro Retana's Discuss on draft-ietf-bess-mvpn-fast-failover-13: (with DISCUSS and COMMENT)

Hi!

For some reason the original DISCUSS message didn’t make it into
everyone’s mailbox (including mine), so I’m “hijacking” this reply to
resend the comments.

Note that the archive has the messages:
https://mailarchive.ietf.org/arch/msg/iesg/PByc1h2E-xSBnqXUtFTcltutEkQ/

Alvaro.


===



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

(1) This document describes several methods to determine the status of a
tunnel (in §3), none of which "provide a "fast failover" solution when used
alone, but can be used together with the mechanism described in Section 4"
(§1).  §3 also says this:

   An implementation may support any combination of the methods
   described in this section and provide a network operator with control
   to choose which one to use in the particular deployment.

While §3.1 is clear in the fact that it is not a requirement for all
downstream PEs to use the same mechanism, there are no guidelines to aid the
operator to choose which mechanism to use.  Some cases may be obvious (e.g.
§3.1.3 applies to tunnels of a specific type), but others are not.  I would
like to see deployment considerations related to the advantages/disadvantages
that each method may have in specific situations (including their possible
combination).


(2) The BFD Discriminator Attribute has a very narrow application in this
document when compared to the potential other uses given the extensibility
possibilities related to bootstrapping BFD.  I have serious concerns about
the attribute being defined in this document, amongst a series of other
mechanisms.

(2a) The tunnel can be monitored without the new BGP Attribute (assuming
proper configuration of course).  Why is that option not even mentioned in
the document?

In fact, the document recommends deleting the BFD session if the Attribute is
not present.  Why is that recommendation in place, and what are the cases when
it cannot be followed?


(2b) The fact that BFD monitoring can be achieved without the new attribute
makes me think that the bootstrapping of BFD using BGP would be better served
in a document produced by the BFD WG.  One of the editors has expressed the
same opinion [1] [2].  Has a discussion taken place in the BFD WG (or at least
with the Chairs) about this work?  Why was it not taken up there?


[1] https://mailarchive.ietf.org/arch/msg/rtg-bfd/T1jVpgyXuPatTpuD_wA0JC3CT1c/
[2] https://tools.ietf.org/wg/bess/minutes?item=minutes-96-bess.html


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

(0) I support Ben's DISCUSS.


(1) This document (currently) specifies a new BGP attribute, but I couldn't
find any discussion about it in the idr mailing list.  Has idr been given the
opportunity to review?


(2) s/is an OPTIONAL procedure/is an optional procedure
This is not a normative statement to require capitalization.


(3) "VRF Route Import BGP attribute" is mentioned twice (§3 and §5), but it is
not an attribute; it is an extended community (rfc6514).


(4) §3.1.1: "similar to BGP next-hop tracking"  Is this specified somewhere?
I don't remember seeing a specification for next-hop tracking, but do know
that implementations do it -- in an implementation-specific way.  Please add a
little more text about what is meant/expected.


(5) §3.1.1: "If BGP next-hop tracking is done...and the root
address...[is]...the same as the next-hop address...then checking...whether
the tunnel root is reachable, will be unnecessary duplication...."  I guess
this means that the existing next-hop tracking functionality can be used and
that it doesn't have to be function-specific (root vs next-hop).  This
paragraph seems to assume a specific implementation -- but again, given that
next-hop tracking implementations are internal then this paragraph doesn't
make much sense to me.


(6) The "reachability condition" is mentioned in §3.1.1/§3.1.3/§3.1.4.  Does
this mean that root tracking (§3.1.1) should be used with the other
mechanisms?  The specific text says that "the downstream PE can immediately
update its UMH when the reachability condition changes", giving the impression
that the combination is possible but not required.

Note that §4.3 is titled "Reachability Determination", which I hoped would
shed more light, but all it does is point back to §3.1.


(7) §3.1.2: What is the "last-hop link" of the tunnel?  Is it the physical
link, or the virtual hop from the previous waypoint?


(8) §3.1.2 mentions that "careful consideration and coordination" is needed
when using other mechanisms such as rfc4090 "because uncorrelated timers might
cause unnecessary switchovers and destabilize the network."  What are the
associated timers related to the mechanisms in this section?


(9) §3.1.3:

   When using this method and if the signaling state for a P2MP TE LSP
   is removed (e.g., if the ingress of the P2MP TE LSP sends a PathTear
   message) or the P2MP TE LSP changes state from Up to Down as
   determined by procedures in [RFC4875], the status of the
   corresponding P-tunnel MUST be re-evaluated.  If the P-tunnel
   transitions from Up to Down state, the Upstream PE that is the
   ingress of the P-tunnel MUST NOT be considered a valid UMH.

These two sentences are redundant.  It seems to me that they could be
replaced by:

   When using this method and if the signaling state for a P2MP TE LSP
   is removed (e.g., if the ingress of the P2MP TE LSP sends a PathTear
   message) or the P2MP TE LSP changes state from Up to Down as
   determined by procedures in [RFC4875], the Upstream PE that is the
   ingress of the P-tunnel MUST NOT be considered a valid UMH.


(10) §3.1.4: "An Upstream PE SHOULD be removed from the UMH candidate
list...if...the upstream one-hop branch of the tunnel from P to PE cannot be
built."   When is it ok to not remove the PE?  IOW, why is this action not
required?


(11) [nit] s/mechanism to execute its actions/mechanism execute its actions


(12) §3.1.5 says that "where this mechanism is used in conjunction with the
method described in Section 5...downstream PEs can compare reception on the
two P-tunnels to determine when one of them is down", but §5 says that
"downstream PEs accept traffic from the primary or standby tunnel, based on
the status of the tunnel (based on Section 3)".  IOW, §3.1.5 points at §5 as
providing a way to determine if a tunnel is down, while §5 points back at §3
as the way to determine which tunnel to receive from.  This pointing back and
forth is not a total contradiction, but it needs to be clarified.


(13) §3.1.6: "An implementation that does not recognize or is configured not
to support this attribute MUST follow procedures defined for optional
transitive path attributes in Section 5 of [RFC4271]."

There cannot be a Normative action specified for a node that "does not
recognize...this attribute" because, by definition, it can't be assumed that
it is aware of this specification.  In this case, it is not necessary to say
anything about unrecognized attributes because that is already specified in
rfc4271.

For the "configured not to support this attribute" case, it should be pointed
out that the node should operate as if the attribute was unrecognized.

Suggestion>
   An implementation that is configured not to support this attribute MUST
   follow the procedures defined in Section 5 of [RFC4271] as if the attribute
   was unrecognized.
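
To illustrate what "as if the attribute was unrecognized" amounts to, here is
a minimal sketch of the rfc4271 (§5) handling of unrecognized path
attributes.  The flag bit values come from rfc4271 §4.3; the function name
and the returned action strings are hypothetical, chosen only for this
illustration.

```python
from typing import Tuple

# Attribute Flags octet bits (rfc4271, Section 4.3).
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20

def handle_unrecognized(flags: int) -> Tuple[str, int]:
    """Hypothetical helper: what a speaker does with an attribute it does
    not recognize (or is configured to treat as unrecognized).

    Returns (action, flags to use if the attribute is passed on).
    """
    if not flags & OPTIONAL:
        # An unrecognized well-known attribute is an error condition.
        return ("error", flags)
    if flags & TRANSITIVE:
        # Optional transitive: pass it along with the Partial bit set.
        return ("propagate", flags | PARTIAL)
    # Optional non-transitive: quietly ignore, do not pass along.
    return ("ignore", flags)
```

An optional transitive attribute (flags 0xC0) would be propagated with flags
0xE0, which is the behavior a node "configured not to support" the attribute
would be expected to show.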


(14) [nit] s/BFD Mode field is the one octet long./The BFD Mode field is one
octet long.


(15) §3.1.6: "The BFD Discriminator attribute MUST be considered malformed if
its length is not a non-zero multiple of four."  Ok, except that the
specification of the attribute doesn't mention the length (only the length of
the TLVs).  Please specify the length and any considerations related to the
Extended Length bit.  Also, given that this is a new attribute, with an
unspecified potential number of TLVs, and that the length is apparently
unbounded, all leading to the potential need for extended messages, please
specify how to handle peers that cannot accommodate more than 4k octet
messages (rfc8654).
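
As a rough sketch of the checks being asked about: the malformed test below
follows the quoted text ("non-zero multiple of four"), the 255-octet limit is
what a one-octet Attribute Length field can carry (rfc4271), and 4096 octets
is the maximum message size without rfc8654.  The function and constant names
are hypothetical and not taken from the draft.

```python
# Largest value a one-octet Attribute Length field can hold (rfc4271).
ONE_OCTET_MAX = 255
# Maximum BGP message size for peers without extended messages (rfc4271).
STANDARD_MAX_MESSAGE = 4096

def attr_length_malformed(length: int) -> bool:
    """True if the length is not a non-zero multiple of four."""
    return length == 0 or length % 4 != 0

def needs_extended_length(length: int) -> bool:
    """True if the length does not fit in a one-octet length field,
    so the Extended Length bit would have to be set."""
    return length > ONE_OCTET_MAX

def fits_standard_message(message_octets: int) -> bool:
    """True if the whole message fits a peer limited to 4096 octets."""
    return message_octets <= STANDARD_MAX_MESSAGE
```

For example, a length of 6 would be malformed, 8 would not, and 256 would be
well-formed but would require the Extended Length bit.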


(16) §3.1.6.1: "MUST set the IP destination address of the inner IP header to
one of the internal loopback addresses..."   Where do these addresses come
from?  How does the Upstream PE figure out which one to use?  At least please
include a reference to where that is explained.


(17) §3.1.6.1: "MUST use its IP address as the source IP address"  Which
address?  Please be specific.


(18) §3.1.6.2: If the IP address doesn't map correctly at the downstream PE
(for example, a different local address is used that doesn't correspond to the
information in the PMSI attribute), what action should it take?  Can the
tunnel still be monitored?


(19) §3.1.6.2: "SHOULD NOT switch the traffic to the Standby Upstream PE"
When is it ok to do it?  IOW, why is this action recommended, and not
required?


(20) §3.1.7: "set the bfd.LocalDiag of the P2MP BFD session to Concatenated
Path Down and/or Reverse Concatenated Path Down"    Which one?  bfd.LocalDiag
carries a single value.
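
To make the "single value" point concrete, here is a small sketch of how the
Diag field sits in the first octet of a BFD Control packet (3-bit version,
5-bit diagnostic code, per rfc5880).  The two code points are the ones
assigned in rfc5880; the helper names are hypothetical.

```python
from enum import IntEnum

class Diag(IntEnum):
    # Diagnostic codes from rfc5880, Section 4.1 (only the two at issue).
    CONCATENATED_PATH_DOWN = 6
    REVERSE_CONCATENATED_PATH_DOWN = 8

def first_octet(version: int, diag: Diag) -> int:
    """Pack Vers (3 bits) and Diag (5 bits) into the first octet."""
    return ((version & 0x07) << 5) | (int(diag) & 0x1F)

def parse_diag(octet: int) -> int:
    """Extract the 5-bit diagnostic code from the first octet."""
    return octet & 0x1F

# Diag is an enumerated code, not a bitmask: OR-ing the two values gives
# 14, which is neither of them, so "and/or" cannot be encoded.
combined = Diag.CONCATENATED_PATH_DOWN | Diag.REVERSE_CONCATENATED_PATH_DOWN
```

A sender therefore has to pick exactly one of the two codes for any given
packet.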


(21) §3.1.7: "...it is desired for the downstream PEs to switch to a backup
Upstream PE.  To achieve that...it SHOULD set the bfd.LocalDiag..."  If not
set, then the objective won't be achieved.  When is it ok to not set the
bfd.LocalDiag to indicate the concatenated failure?


(22) §4: "Such behavior is referred to as "revertive" behavior and MUST be
supported."  The text around this sentence seems to indicate that the
revertive behavior is the default; is that the intent?  Or is the intent for
it just to be supported (as written)?  Please be clear.


(23) §4.1: "...routes that carry the "Standby PE" BGP Community MUST have the
LOCAL_PREF attribute set to zero."  What should a receiver do if the
LOCAL_PREF is not zero?


(24) §4.1: In the last paragraph of this section, if I follow correctly, the
text talks about the case where the standby becomes the primary and the
updated advertisement doesn't have the Standby PE community.  If that is
correct, then s/ presence/absence of the Standby PE BGP Community/ absence of
the Standby PE BGP Community

Also, the last sentence says that the "LOCAL_PREF attribute MUST be set to
zero".  If the community is not present, how can a receiver enforce this?
What action should it take if the LOCAL_PREF has a different value?


(25) §4.3: "other mechanisms MAY be used"   s/MAY/may   This is just a
statement of fact.


(26) §4.4.2: "MUST try to locate"   To try is not an action that can be
normatively enforced.   Also, I don't think that using "MUST" here adds value
since the normative action is in the next sentence ("MUST perform as
follows"). s/MUST/must


(27) s/"hot root standby" mode is used (Section 4)/"hot root standby" mode is
used (Section 5)


(28) §7.3: s/sub-TLV/TLV/g   The attribute includes TLVs, not sub-TLVs.