Re: Conclusion of the discussion on draft-mirsky-bfd-mpls-demand?

Greg Mirsky <gregimirsky@gmail.com> Fri, 21 August 2020 02:01 UTC

References: <CA+RyBmXCffDUHfZiwPb_ODjiQTDpnQJs0uJb-5oZS8okSdV8Ew@mail.gmail.com> <20200804003817.GA15350@pfrc.org> <20200804193427.GE31729@pfrc.org> <20200818181430.GG1696@pfrc.org>
In-Reply-To: <20200818181430.GG1696@pfrc.org>
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Thu, 20 Aug 2020 19:00:44 -0700
Message-ID: <CA+RyBmULVE45m859czznxhPVWSogKvqr2D7_54T2MG94u+H_xg@mail.gmail.com>
Subject: Re: Conclusion of the discussion on draft-mirsky-bfd-mpls-demand?
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: rtg-bfd WG <rtg-bfd@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/P3p_5lzKRLsN4m548X-FGynn0Mc>

Hi Jeff, et al.,
Thank you for your thorough review and detailed comments; all are greatly
appreciated. Please find my notes inline, tagged GIM>>.
I've updated the draft; you can review the updates in the attached diff.
I would much appreciate comments on the updates.

Regards,
Greg

On Tue, Aug 18, 2020 at 11:03 AM Jeffrey Haas <jhaas@pfrc.org> wrote:

> Greg,
>
> Thank you for your patience.
>
> On Thu, Jul 30, 2020 at 09:00:38AM -0700, Greg Mirsky wrote:
> > Dear All,
> > I would much appreciate it if you could share the conclusion of the
> > discussion of draft-mirsky-bfd-mpls-demand.
> >
> > Regards,
> > Greg
>
> The BFD Working Group chairs and Area Director have reviewed
> draft-mirsky-bfd-mpls-demand.  The chairs had originally issued a working
> group adoption call without having read the document.  Upon review, it was
> determined that most of the text in the draft restated existing BFD
> Demand mode procedures.
>
> The chairs apologize for not having done sufficient vetting prior to
> starting the adoption process and causing the confusion that followed.
>
> The majority of the draft covers a restatement of existing BFD procedure
> and obscures the potential request for normative protocol changes.  This
> response is split into two sections: The first portion covers procedure
> that is a restatement of RFC 5880 Demand behaviors with a few possibly
> unintended variances.  The second portion covers a potential change to
> BFD Demand behavior and may be reason to continue working group discussion.
>
> It's noted that a likely motivation for this draft comes from the following
> statement in RFC 5884, §6 "Session Establishment":
>
>     #   A BFD session is bootstrapped using LSP Ping.  This specification
>     #   describes procedures only for BFD asynchronous mode.  BFD demand
>     #   mode is outside the scope of this specification.
>
> While "outside the scope", the procedures for exercising Demand mode are
> largely covered in detail in RFC 5880.
>
GIM>> I agree that the Demand mode is defined in RFC 5880.
draft-mirsky-bfd-mpls-demand is intended to discuss the applicability of
BFD in Demand mode over an MPLS LSP. The updates to the draft remove
unnecessary restatements of RFC 5880 and instead provide references,
primarily to Section 6.6 of RFC 5880.

>
> --------------------------------------------------------------------------
>
> In the following response, ':' blockquotes are from
> draft-mirsky-bfd-mpls-demand and '#' blockquotes are from the cited RFC.
>
> The text of the document and the matching procedures from RFC 5880 follow:
>
> : 3.  Use of the BFD Demand Mode
> :
> :    [RFC5880] defines that the Demand mode MAY be:
> :
> :    o  asymmetric, i.e. used in one direction of a BFD session;
> :
> :    o  switched to and from without bringing BFD session to Down state
> :       through using a Poll Sequence.
>
> RFC 5880 §6.6 "Demand Mode" reads:
>     #   Demand mode MAY be enabled or disabled at any time, independently in
>     #   each direction, by setting or clearing the Demand (D) bit in the BFD
>     #   Control packet, without affecting the BFD session state.  Note that
>     #   the Demand bit MUST NOT be set unless both systems perceive the
>     #   session to be Up (the local system thinks the session is Up, and the
>     #   remote system last reported Up state in the State (Sta) field of the
>     #   BFD Control packet).
>     #
>     #   When the transmitted value of the Demand (D) bit is to be changed,
>     #   the transmitting system MUST initiate a Poll Sequence in conjunction
>     #   with changing the bit in order to ensure that both systems are aware
>     #   of the change.
>
> The poll sequence is defined in RFC 5880 §6.5 "The Poll Sequence":
>     #   A Poll Sequence consists of a system sending periodic BFD Control
>     #   packets with the Poll (P) bit set.  When the other system receives a
>     #   Poll, it immediately transmits a BFD Control packet with the Final
>     #   (F) bit set, independent of any periodic BFD Control packets it may
>     #   be sending (see section 6.8.7).  When the system sending the Poll
>     #   sequence receives a packet with Final, the Poll Sequence is
>     #   terminated, and any subsequent BFD Control packets are sent with the
>     #   Poll bit cleared.  A BFD Control packet MUST NOT have both the Poll
>     #   (P) and Final (F) bits set.
>
> :    For the case of BFD over MPLS LSP, ingress Label switching Edge
> :    Router (LER) usually acts as Active BFD peer and egress LER acts as
> :    Passive BFD peer.  The Active peer bootstraps the BFD session by
> :    using LSP ping.  Once the BFD session is in Up state the ingress LER
> :    that supports this specification MUST switch to the Demand mode by
> :    setting Demand (D) bit in its Control packet and initiating a Poll
> :    Sequence.  If the egress LER supports this specification it MUST
> :    respond with the Final (F) bit set in its BFD Control packet sent to
> :    the ingress LER and ceases further transmission of periodic BFD
> :    control packets to the ingress LER.
>
> The procedure above is covered by the core RFC 5880 procedures quoted
> above.  The one item of interest here that is not part of the
> specification is "MUST switch".  Effectively, an optional procedure
> normally covered by configuration mode or application profile is mandated
> by this document.
>
GIM>> Yes, it is assumed that an implementation that supports this
specification will provide control to select between BFD Asynchronous (RFC
5884) and BFD Demand modes. If the latter mode is selected, the use of the
normative form appears appropriate. Below is the updated text:
   If the BFD session is configured to use the Demand
   mode, once the BFD session is in Up state the ingress LER MUST switch
   to the Demand mode as defined in Section 6.6 [RFC5880].  The egress
   LER also follows procedures defined in Section 6.6 [RFC5880] and
   ceases further transmission of periodic BFD control packets to the
   ingress LER.
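For readers mapping the updated text onto RFC 5880, the following Python sketch models only the flag handling of the Demand-mode switch (Section 6.6) and the Poll Sequence (Section 6.5). It assumes the second-byte layout of the BFD Control packet in RFC 5880 §4.1 (Sta in the top two bits, followed by P, F, C, A, D, M). All function names are hypothetical; timers and transmission are not modelled.

```python
from enum import IntEnum

# Second byte of a BFD Control packet (RFC 5880 §4.1): Sta(2 bits) P F C A D M
class State(IntEnum):
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3

POLL = 0x20    # P bit
FINAL = 0x10   # F bit
DEMAND = 0x02  # D bit


def flags_byte(state, poll=False, final=False, demand=False):
    """Encode the Sta field and the P, F and D bits of the second byte."""
    if poll and final:
        raise ValueError("a BFD Control packet MUST NOT have both P and F set")
    value = int(state) << 6
    if poll:
        value |= POLL
    if final:
        value |= FINAL
    if demand:
        value |= DEMAND
    return value


def ingress_enter_demand(local_state, remote_state):
    """Ingress sets D and starts a Poll Sequence; both ends must be Up."""
    if local_state != State.UP or remote_state != State.UP:
        raise RuntimeError("D MUST NOT be set unless both systems perceive Up")
    return flags_byte(State.UP, poll=True, demand=True)


def egress_answer_poll(received, local_state):
    """Egress answers a Poll immediately with F set and P clear."""
    if not received & POLL:
        raise ValueError("received packet carries no Poll")
    return flags_byte(local_state, final=True)


def egress_stops_periodic_tx(received, local_state):
    """Periodic transmission ceases once the peer sets D and both ends are Up."""
    remote_state = State(received >> 6)
    return (bool(received & DEMAND)
            and local_state == State.UP
            and remote_state == State.UP)


def ingress_after_final():
    """After the Final arrives the Poll Sequence ends: D stays set, P is cleared."""
    return flags_byte(State.UP, demand=True)
```

In this sketch the ingress probe encodes as 0xE2 (Sta=Up, P and D set), the egress reply as 0xD0 (Sta=Up, F set), and once the Final arrives the ingress continues with 0xC2 (D set, P cleared).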

>
> :    In this state BFD peers MAY remain as long as the egress LER is in Up
> :    state.  The ingress LER MAY check liveness of the egress LER by
> :    setting the Poll flag.  The egress LER will respond by transmitting
> :    BFD control packet with the Final flag set.  If the ingress LER
> :    doesn't receive BFD packet with the Final flag from its peer after
> :    the predetermined period of time, default wait time recommended 1
> :    second, the ingress MAY transmit another packet with the Poll flag
> :    set.  If ingress doesn't receive BFD control packet with the Final
> :    flag set in response to three consecutive packets with Poll flag, it
> :    MAY declare the BFD peer non-responsive and change state of the BFD
> :    session to Down state.
>
> RFC 5880 §6.6 "Demand Mode" further reads:
>     #   When a system in Demand mode wishes to verify bidirectional
>     #   connectivity, it initiates a Poll Sequence (see section 6.5).  If no
>     #   response is received to a Poll, the Poll is repeated until the
>     #   Detection Time expires, at which point the session is declared to be
>     #   Down.  Note that if Demand mode is operating only on the local
>     #   system, the Poll Sequence is performed by simply setting the Poll (P)
>     #   bit in regular periodic BFD Control packets, as required by section
>     #   6.5.
>     #
>     #   The Detection Time in Demand mode is calculated differently than in
>     #   Asynchronous mode; it is based on the transmit rate of the local
>     #   system, rather than the transmit rate of the remote system.  This
>     #   ensures that the Poll Sequence mechanism works properly.  See section
>     #   6.8.4 for more details.
>     #
>     #   [...]
>     #
>     #   When the transmitted value of the Demand (D) bit is to be changed,
>     #   the transmitting system MUST initiate a Poll Sequence in conjunction
>     #   with changing the bit in order to ensure that both systems are aware
>     #   of the change.
>
> The procedure above documents how to use a poll sequence to verify
> liveness while in Demand mode.  The calculation of the Detection Time for
> Demand mode while undergoing a Poll sequence is covered in §6.8.4.
>
> In the draft-mirsky-bfd-mpls-demand procedure cited above, the "1 second"
> wait time in particular is a variance from the Poll sequence procedure.
>
> The "three consecutive packets" is similarly a variance from the core BFD
> procedures, where this is covered by the Detect Multiplier.
>
GIM>> I've replaced the restatement with a reference to Section 6.6. The
updated text is below:
   In this state BFD peers MAY remain as long as the egress LER is in Up
   state.  The ingress LER SHOULD periodically check continuity of a
   bidirectional path between the ingress and egress LERs by using the
   Poll Sequence, as described in Section 6.6 [RFC5880].  An
   implementation that supports using the Poll Sequence as the mechanism
   for bidirectional path continuity check MUST be able to control the
   interval between consecutive Poll Sequences.  The RECOMMENDED default
   value is 1 second.
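To illustrate the chairs' point that the number of unanswered Polls is governed by the Detect Multiplier, here is a minimal Python sketch of the two Detection Time formulas in RFC 5880 §6.8.4 and of the Demand-mode rule that the session goes Down if no Final arrives within the Detection Time after the first Poll. Intervals are in microseconds, as in the BFD packet fields; the function names and the ("Up", 0) / ("Down", 1) return convention are hypothetical.

```python
# Intervals are in microseconds, as in the BFD Control packet fields.

def async_detection_time(remote_detect_mult, required_min_rx, remote_desired_min_tx):
    """Asynchronous mode: driven by the remote system's agreed transmit interval."""
    return remote_detect_mult * max(required_min_rx, remote_desired_min_tx)


def demand_detection_time(detect_mult, desired_min_tx, remote_min_rx):
    """Demand mode: driven by the local system's agreed transmit interval."""
    return detect_mult * max(desired_min_tx, remote_min_rx)


def poll_outcome(poll_start, final_rx, detection_time):
    """Return (session_state, local_diag) for a Poll Sequence in Demand mode.

    poll_start is when the first packet with P set was sent; final_rx is when
    a packet with F set arrived, or None.  Diag 1 is
    "Control Detection Time Expired".
    """
    if final_rx is not None and final_rx - poll_start < detection_time:
        return ("Up", 0)
    return ("Down", 1)
```

For example, with DetectMult 3 and a 1-second local transmit interval, `demand_detection_time(3, 1_000_000, 1_000_000)` yields 3 seconds, during which the Poll is repeated at the transmit interval; the draft's "three consecutive packets" is thus expressed through the Detect Multiplier rather than as a separate constant.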
>
>
> With respect to declaring the session Down as part of a Poll sequence in
> Demand Mode, §6.8.4 has the following text:
>
>     #   If Demand mode is active, and a period of time equal to the Detection
>     #   Time passes after the initiation of a Poll Sequence (the transmission
>     #   of the first BFD Control packet with the Poll bit set), the session
>     #   has gone down -- the local system MUST set bfd.SessionState to Down,
>     #   and bfd.LocalDiag to 1 (Control Detection Time Expired).
>
>
> :    If the Detection timer at the egress LER expires it MUST send BFD
> :    Control packet to the ingress LER with the Poll (P) bit set, Status
> :    (Sta) field set to Down value, and the Diagnostic (Diag) field set to
> :    Control Detection Time Expired value.  The egress LER sends these
> :    Control packets to the ingress LER at the rate of one per second
> :    until either it receives the valid for this BFD session control
> :    packet with the Final (F) bit set from the ingress LER or the defect
> :    condition clears and the BFD session state reaches Up state at the
> :    egress LER.
>
> From the perspective of the egress LER, standard Async BFD without Demand
> is still running.  The following text from §6.6, "Demand Mode", applies:
>
>     #   If Demand mode is active on either or both systems, a Poll Sequence
>     #   MUST be initiated whenever the contents of the next BFD Control
>     #   packet to be sent would be different than the contents of the
>     #   previous packet, with the exception of the Poll (P) and Final (F)
>     #   bits.  This ensures that parameter changes are transmitted to the
>     #   remote system and that the remote system acknowledges these changes.
>
> Again, the "1 second" time is a variance from RFC 5880.
>
> Prior list e-mail and IETF working group session discussion suggested that
> the last sentence above leads to the possible conclusion that this is
> only done for "parameter changes".  However, the leading sentence clearly
> covers "whenever the contents [...] would be different".  Discussion among
> the chairs and the AD suggests that the last sentence is not intended to
> specify normative behavior restricting this to "configuration".
>

> RFC 5880 is largely structured around the BFD PDU contents reflecting the
> "State Variables" documented in §6.8.1 and similar variables in the
> extension documents.  The "parameters" reference in §6.6 is intended to
> refer to such state variables as instantiated in the PDU.
>
GIM>> I agree that the text might be interpreted as explained in your
comment. I believe this indicates that the original text is not
definitive, and that clarifying the procedure in draft-mirsky-bfd-mpls-demand
would be helpful to implementors. I've updated this paragraph as follows:
   If the Detection timer at the egress LER expires, it MUST send a BFD
   Control packet to the ingress LER with the Poll (P) bit set, the State
   (Sta) field set to the Down value, and the Diagnostic (Diag) field set
   to the Control Detection Time Expired value.  The egress LER
   periodically transmits these Control packets to the ingress LER until
   either it receives a valid control packet for this BFD session with the
   Final (F) bit set from the ingress LER, or the defect condition clears
   and the BFD session state reaches Up state at the egress LER.  An
   implementation that supports this specification MUST provide control
   of the interval between consecutive Poll messages signaling the
   expiration of the Detection timer.  The RECOMMENDED default value of
   the interval is 1 second.
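The egress behavior in the updated paragraph can be sketched in Python as below. The sketch assumes the RFC 5880 §4.1 header encoding (version 1 in the top three bits of the first byte, Diag in the low five bits; Sta and flags in the second byte). The names `egress_down_poll_header` and `poll_schedule` are hypothetical, and `max_packets` exists only to bound the illustration; the retransmission interval is the configurable value the text describes, defaulting to 1 second.

```python
BFD_VERSION = 1
DIAG_CONTROL_DETECTION_TIME_EXPIRED = 1  # RFC 5880 §4.1
STATE_DOWN = 1
POLL = 0x20


def egress_down_poll_header():
    """First two bytes of the packet the egress sends on Detection timer expiry."""
    vers_diag = (BFD_VERSION << 5) | DIAG_CONTROL_DETECTION_TIME_EXPIRED
    sta_flags = (STATE_DOWN << 6) | POLL
    return bytes([vers_diag, sta_flags])


def poll_schedule(expiry, stop=None, interval=1.0, max_packets=10):
    """Transmission times (seconds) of the Down/Poll packet.

    stop is the earlier of: receipt of a valid packet with F set from the
    ingress, or the session returning to Up at the egress (None if neither
    has happened yet).  max_packets only bounds this illustration.
    """
    times = []
    t = expiry
    while len(times) < max_packets and (stop is None or t < stop):
        times.append(t)
        t += interval
    return times
```

With the default interval, a Final received 2.5 seconds after expiry means the egress sent the Down/Poll packet at 0, 1 and 2 seconds.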

>
> :    The ingress LER transmits BFD Control packets over the MPLS LSP with
> :    the Demand (D) flag set at negotiated interval per [RFC5880], the
> :    greater of bfd.DesiredMinTxInterval and bfd.RemoteMinRxInterval,
> :    until it receives the valid BFD packet from the egress LER with the
> :    Poll (P) bit and the Diagnostic (Diag) field value Control Detection
> :    Time Expired.  Reception of such BFD control packet by the ingress
> :    LER indicates that the monitored LSP has a failure and sending BFD
> :    control packet with the Final flag set to acknowledge failure
> :    indication is likely to fail.
>
> Here we're just using standard BFD procedure.  If the session is Up,
> transmit appropriately.  Don't restate protocol.
>
GIM>> I couldn't find explicit text in Section 6.5 of RFC 5880 that
suggests that a BFD system analyzes the Diag field of a received BFD
Control message with the Poll bit set. The very first sentence of the
section refers to "parameter changes":
   A Poll Sequence is an exchange of BFD Control packets that is used in
   some circumstances to ensure that the remote system is aware of
   parameter changes.
RFC 5880 does not explicitly define what "parameters" means and, as a
result, leaves the scope of the Poll Sequence somewhat ambiguous. I believe
that documenting the procedure is useful and ensures interoperability among
independent implementations.
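One way to read the §6.6 rule quoted by the chairs ("whenever the contents of the next BFD Control packet to be sent would be different than the contents of the previous packet, with the exception of the Poll (P) and Final (F) bits") is as a byte-wise comparison with those two bits masked out. Under that reading, a change in the Diag or Sta field alone triggers a Poll Sequence, whether or not it is considered a "parameter". The Python sketch below assumes the 24-byte mandatory section of RFC 5880 §4.1 and ignores authentication; `control_packet` and `needs_poll` are hypothetical names.

```python
import struct

P_AND_F = 0x30  # Poll (0x20) and Final (0x10) bits in the second byte


def control_packet(diag, state, flags, detect_mult, my_disc, your_disc,
                   desired_min_tx, required_min_rx, required_min_echo_rx=0):
    """Pack the 24-byte mandatory section of a BFD Control packet."""
    return struct.pack("!BBBBIIIII",
                       (1 << 5) | diag,        # Vers=1, Diag
                       (state << 6) | flags,   # Sta, then P F C A D M
                       detect_mult,
                       24,                     # Length
                       my_disc, your_disc,
                       desired_min_tx, required_min_rx, required_min_echo_rx)


def needs_poll(previous, nxt):
    """True if the packets differ in anything other than the P and F bits."""
    def masked(pkt):
        b = bytearray(pkt)
        b[1] &= ~P_AND_F & 0xFF   # clear P and F before comparing
        return bytes(b)
    return masked(previous) != masked(nxt)
```

For instance, setting only the P bit on an otherwise identical packet does not require a Poll, while changing Diag to 1 and Sta to Down does.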

>
> --------------------------------------------------------------------------
>
> This next section potentially proposes new behavior:
>
> :                                   Instead, the ingress LER transmits the
> :    BFD Control packet to the egress LER over the IP network with:
> :
> :    o  destination IP address MUST be set to the destination IP address
> :       of the LSP Ping Echo request message [RFC8029];
> :
> :    o  destination UDP port set to 4784 [RFC5883];
> :
> :    o  Final (F) flag in BFD control packet MUST be set;
> :
> :    o  Demand (D) flag in BFD control packet MUST be cleared.
> :
> :    The ingress LER changes the state of the BFD session to Down and
> :    changes rate of BFD Control packets transmission to one packet per
> :    second.  The ingress LER in Down mode changes to Asynchronous mode
> :    until the BFD session comes to Up state once again.  Then the ingress
> :    LER switches to the Demand mode.
>
> The behavior here covers the fact that the underlying MPLS LSP is no longer
> usable - the egress LSR detected a failure in the receipt of the BFD PDUs
> from the ingress LSR.
>
> Since Demand mode was enabled, how does the ingress LSR know that the
> session is Down from the perspective of the egress?  The procedures
> documented above per RFC 5880 will have the egress initiating the Poll
> sequence to do the state transition.
>
> But similarly, since Demand mode is enabled, the only way for the egress
> LSR to know that the session has gone Down from the perspective of the
> ingress LSR is for it to receive the response to the Poll.  Thus, the
> above procedure attempts to suggest reaching the egress LSR using BFD
> multi-hop procedures.
>
GIM>> I'd note that the egress LSR detects the failure when its Detection
timer expires. It then starts the Poll sequence and uses the Diag field to
inform the ingress LER of the failure using BFD multi-hop mode (as per RFC
5884). The ingress LER uses BFD multi-hop when it sends a BFD Control
message with the Final flag set to conclude the Poll sequence.
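The out-of-band reply described in the quoted draft text can be sketched in Python as follows: the ingress builds a BFD Control packet with F set and D clear, addressed to the destination IP of the LSP Ping Echo request on UDP port 4784 (the RFC 5883 multihop port). The sketch only builds the payload and destination tuple and opens no socket. `out_of_band_final` is a hypothetical name; the Sta value is passed in because the draft does not say which state the packet carries, Diag is left at 0 as an assumption, and the address used in the example is a documentation placeholder.

```python
import ipaddress
import struct

BFD_MULTIHOP_PORT = 4784  # RFC 5883
FINAL = 0x10


def out_of_band_final(lsp_ping_dst, state, my_disc, your_disc, detect_mult=3,
                      desired_min_tx=1_000_000, required_min_rx=1_000_000):
    """Return ((dst_ip, dst_port), payload) for the Final sent over IP.

    Only F is set in the flags: P and D stay clear, matching the draft's
    requirement that D be cleared.  Diag is left at 0 (an assumption).
    """
    dst = str(ipaddress.ip_address(lsp_ping_dst))  # raises ValueError if malformed
    payload = struct.pack(
        "!BBBBIIIII",
        1 << 5,                # Vers=1, Diag=0
        (state << 6) | FINAL,  # Sta and flags
        detect_mult,
        24,                    # Length of the mandatory section
        my_disc,
        your_disc,
        desired_min_tx,
        required_min_rx,
        0,                     # Required Min Echo RX Interval
    )
    return (dst, BFD_MULTIHOP_PORT), payload
```

Calling `out_of_band_final("192.0.2.1", 1, 0x11, 0x22)` returns the destination `("192.0.2.1", 4784)` and a 24-byte payload whose second byte is 0x50 (Sta=Down, F set).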

>
> Chairs commentary:
> ------------------
>
> This procedural point may be worth WG discussion and may be a motivation
> to advance this draft.  The discussion will largely involve what existing
> implementations of RFC 5884 already do in circumstances where the ingress
> path has gone down.  Even without Demand mode being active, the egress LSR
> will still transition to Down and signal toward the ingress LSR.  And
> similarly, the ingress no longer has a valid forward path to carry its
> acknowledgement that the session is in the Down state to the egress LSR.
>
GIM>> It is an interesting observation, thank you. I think that in the
context of RFC 5884, the egress LER uses periodic BFD control messages to
signal failure detection to the ingress system rather than a Poll sequence.
If this is correct, the ingress LER does not need to transmit a BFD control
packet out-of-band, i.e., using a BFD multi-hop path.
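For comparison with the Asynchronous-mode case described above, the ingress learns of the egress's failure detection simply by processing the State field of the next periodic packet, following the RFC 5880 §6.2 state machine and the reception rules in §6.8.6. A minimal Python sketch of those transitions, with timer expiry omitted and the function name `on_receive` hypothetical:

```python
from enum import IntEnum


class State(IntEnum):
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


DIAG_NEIGHBOR_SIGNALED_DOWN = 3


def on_receive(local, received, diag):
    """Return (new_state, new_diag) after a Control packet with State=received."""
    if local == State.ADMIN_DOWN:
        return local, diag                    # packets are discarded in AdminDown
    if received == State.ADMIN_DOWN:
        if local != State.DOWN:
            return State.DOWN, DIAG_NEIGHBOR_SIGNALED_DOWN
        return local, diag
    if local == State.DOWN:
        if received == State.DOWN:
            return State.INIT, diag
        if received == State.INIT:
            return State.UP, diag
        return local, diag                    # Up received while Down: no change
    if local == State.INIT:
        if received in (State.INIT, State.UP):
            return State.UP, diag
        return local, diag                    # Down received: stay in Init
    # local is Up
    if received == State.DOWN:
        return State.DOWN, DIAG_NEIGHBOR_SIGNALED_DOWN
    return local, diag
```

In this model, an ingress that is Up and receives a periodic packet with Sta=Down moves to Down with Diag 3 (Neighbor Signaled Session Down), without any Poll Sequence.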

>
> The fault observed here is really with RFC 5884 procedures rather than
> specifically the Demand mode.
>
> Perversely, it could be observed that even in the absence of MPLS
> encapsulation, with Demand mode in use, the above considerations still
> apply:  When a receiver in Demand mode needs to transition state and
> notifies the sender by changing its local state, it may not receive the
> acknowledgment that the sender has received and processed that state.
> This can lead to a similarly stale session.
>
GIM>> Thank you for pointing to the more general case.

>
> What we thus have are two conditions wherein it's not possible to fully
> clean up a BFD session that has locally determined it is Down.  Existing
> implementations must already deal with this in some fashion, and a
> discussion on the list of how they do so may be fruitful.
>
GIM>> Thank you for highlighting this. I'll look at what I can find and
share the results with the group.