Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Mon, 16 December 2019 23:43 UTC

Benjamin Kaduk has entered the following ballot position for
draft-ietf-bfd-vxlan-09: Discuss


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bfd-vxlan/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I have a few points that I think merit IESG discussion.

(1) I see that several directorate reviewers expressed unease at the
destination (IP and) MAC address assignment procedure for the inner
VXLAN headers, and appreciate that there was extensive on-list
discussion (more than I could follow).  That said, I failed to find a
clear statement of why the current text is believed to be safe, and in
fact my reading of the current text is that the described procedure is
*not* safe.  Pointers to key parts of the WG discussion would be more
than welcome!

To take something of a high-level view of my concerns, if we think of
the VXLAN as being a tunnel between VTEPs that carry encapsulated tenant
traffic, then what we're trying to do is roughly like BFD between VTEPs,
but we want to get fault-detection over as broad a coverage as we can
(the "outermost part of the tunnel"), so we want to have the option of
per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP).
However, we end up having to do this by trying to insert a thin filter
into the tenant's address space (i.e., the inner VXLAN header) and pick
out the specific stream of BFD traffic that we're introducing.  This is,
in some sense, a namespace grab in what is conceptually the tenant's
namespace, and we have to be careful that what we do is either
guaranteed to not impact the tenant or well-documented and
compartmentalized (akin to the "well-known URIs").

I've made comments at several places in the document that are more
directly tied to specific pieces of text, but in general, if we assume
that the tenant can add/remove new addresses at will within their VXLAN
abstraction, then any attempt to preconfigure by mutual agreement the BFD
addresses to use at the VTEPs or to use the VTEP's normal (outer)
address as the sentinel value seems subject to the tenant coming in and
subsequently trying to use that address, leading to (some of) the
tenant's traffic getting silently filtered and interpreted by the VTEP.
If we were using domain names as identifiers, we could allocate
something under .arpa or similar, but I think our options are more
limited when numerical addresses are used.

The option suggested by the rtg-dir reviewer of always using the
management VNI does not suffer from this namespacing issue, though I
recognize that it does reduce the scope over which fault-detection is
available, for the cases when different VNIs' traffic is routed or
handled differently.
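
As a rough illustration of the namespace concern in point (1), the
following Python sketch models a VTEP that claims one preconfigured
inner destination MAC for BFD.  It is only a toy model under stated
assumptions: the function name, the classification labels, and the
addresses (taken from the documentation MAC range) are hypothetical and
do not come from the draft.

```python
# Hypothetical model of the collision described in Discuss point (1).
# Addresses come from the documentation MAC range (RFC 7042); every name
# here is illustrative and not taken from the draft.

VTEP_BFD_MAC = "00:00:5e:00:53:01"  # assumed: preconfigured between the two VTEPs


def classify_inner_frame(dst_mac: str) -> str:
    """Decide where a decapsulated frame goes, using only its inner destination MAC."""
    if dst_mac.lower() == VTEP_BFD_MAC:
        return "vtep-local"  # consumed by the VTEP, never delivered to the tenant
    return "tenant"


# Initially the tenant does not use the claimed address, so nothing is lost.
tenant_macs = {"00:00:5e:00:53:aa"}
assert all(classify_inner_frame(m) == "tenant" for m in tenant_macs)

# Later the tenant brings up an interface that happens to use the same MAC.
tenant_macs.add(VTEP_BFD_MAC)
misdelivered = sorted(m for m in tenant_macs if classify_inner_frame(m) != "tenant")
print(misdelivered)  # tenant addresses whose frames the VTEP would now consume
```

The point of the sketch is that the filter has no way to notice the
second step: the tenant's new address is indistinguishable from the
VTEP's own at the moment of classification.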

(2) Section 6 says:

                                                         The selection
   of the VNI number of the Management VNI MUST be controlled through
   management plane.  An implementation MAY use VNI number 1 as the
   default value for the Management VNI.  All VXLAN packets received on
   the Management VNI MUST be processed locally and MUST NOT be
   forwarded to a tenant.

It seems like the management VNI concept is something that would apply
to the entire VXLAN deployment and not just to the BFD-using portions;
is this already defined somewhere (in which case we should reference
it), or is it new with this document?  In the latter case wouldn't it be
an update to the core VXLAN spec?  (I note that there are some
procedural hoops to jump through for an IETF-stream document to update
an ISE-stream document...)


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

   In the case where a Multicast Service Node (MSN) (as described in
   Section 3.3 of [RFC8293]) resides behind a Network Virtualization
   Endpoint (NVE), the mechanisms described in this document apply and
   can, therefore, be used to test the connectivity from the source NVE
   to the MSN.

I'm not sure that I'm parsing "resides behind" properly.  Is the idea
that the multicast traffic starts off at a tenant-system source, hits a
NVE gateway to enter the VXLAN, traverses the VXLAN a bit before getting
to the MSN, and is replicated from the MSN to various NVE termini?  I
think I'd be less confused if this was described as "participates in the
VXLAN" or "is part of the virtualized environment", as the current
"behind" wording makes me think of a firewall-like topology where the
NVE behind which the MSN resides will be decapsulating traffic.

   This document describes the use of Bidirectional Forwarding Detection
   (BFD) protocol to enable monitoring continuity of the path between
   VXLAN VTEPs, performing as Network Virtualization Endpoints, and/or
   availability of a replicator multicast service node.

All the commas here potentially make the parsing ambiguous; assuming
that the "performing as Network Virtualization Endpoints" is just
describing the VXLAN VTEPs, I'd suggest to drop the first comma and
instead join those clauses with "that are".

Section 3

   between the same pair of VTEPs.  BFD packets intended for a VTEP MUST
   NOT be forwarded to a VM as a VM may drop BFD packets leading to a
   false negative.  This method is applicable whether the VTEP is a

[This "MUST NOT" is a very strict requirement, so we have to be sure that
it's achievable without disruption to tenant traffic, per the Discuss
point]

   At the same time, a service layer BFD session may be used between the
   tenants of VTEPs IP1 and IP2 to provide end-to-end fault management.
   In such case, for VTEPs BFD Control packets of that session are
   indistinguishable from data packets.

nit(?): I suggest s/indistinguishable from/regular/ -- the tenants' BFD
sessions are just regular data to the VXLAN infrastructure, though IIUC
a VTEP could, if so inclined, peek inside and "distinguish" them from
non-BFD tenant data based on heuristics and packet format.

   0:0:0:0:0:FFFF:7F00:0/104 range for IPv6).  There could be a firewall
   configured on VTEP to block loopback addresses if set as the
   destination IP in the inner IP header.  It is RECOMMENDED to allow
   addresses from the loopback range through a firewall only if it is
   used as the destination IP address in the inner IP header, and the
   destination UDP port is set to 3784 [RFC5881].

I think we should reword this to make it clear that the default behavior
is still "block all incoming traffic with loopback destination" and that
the exception is tightly scoped to the encapsulated VXLAN traffic
discussed in this document and the specific destination port *and when
BFD has been configured for the VTEP*.  I note that well-known ports are
not reserved ports, and we have no guarantee that only a BFD
implementation would be listening on port 3784.
I think the rewording would include some phrasing like "RECOMMENDED that
the only firewall exception to allow incoming traffic with destination
address from the loopback range is when [...]", and of course, mention
the need to have BFD configured.
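
The tightly scoped exception described above could be expressed roughly
as the predicate below.  This is a minimal Python sketch, not a real
firewall configuration: the InnerPacket type, its fields, and the
loopback_rule_blocks function are hypothetical names, and treating ::1
as part of the blocked set is an assumption added for completeness.

```python
from dataclasses import dataclass
import ipaddress

BFD_SINGLE_HOP_PORT = 3784  # destination UDP port from RFC 5881

# Loopback destinations that are blocked by default.
LOOPBACK_NETS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),            # assumption: not named in the draft text
    ipaddress.ip_network("::ffff:7f00:0/104"),  # IPv4-mapped form of 127/8
]


@dataclass
class InnerPacket:
    """Hypothetical view of an inner IP/UDP packet seen by the VTEP firewall."""
    dst_ip: str
    udp_dst_port: int
    from_vxlan_decap: bool  # True only for packets taken out of a VXLAN encapsulation


def is_loopback_dst(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in LOOPBACK_NETS)


def loopback_rule_blocks(pkt: InnerPacket, bfd_configured: bool) -> bool:
    """Default is to block loopback destinations; allow only the narrow BFD case."""
    if not is_loopback_dst(pkt.dst_ip):
        return False  # this rule does not apply; other firewall rules decide
    exception = (
        pkt.from_vxlan_decap
        and bfd_configured
        and pkt.udp_dst_port == BFD_SINGLE_HOP_PORT
    )
    return not exception
```

Note that the exception requires all three conditions at once, so a
packet to port 3784 is still blocked when BFD has not been configured
for the VTEP.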

Section 4

   VXLAN packet.  The choice of Destination MAC and Destination IP
   addresses for the inner Ethernet frame MUST ensure that the BFD
   Control packet is not forwarded to a tenant but is processed locally
   at the remote VTEP.  [...]

This has to be 100% reliable, and I think we need to provide some
example mechanism that has that property even if we don't mandate that
it be the only allowed mechanism.

         Destination MAC: This MUST NOT be of one of tenant's MAC
         addresses.  The destination MAC address MAY be the address

But the tenant can start using new MAC addresses at any time!  How is
BFD-over-VXLAN going to dynamically detect and avoid that?

         associated with the destination VTEP.  The MAC address MAY be
         configured, or it MAY be learned via a control plane protocol.
         The details of how the MAC address is obtained are outside the
         scope of this document.

This all talks about the MAC address being relatively static
configuration, but per above, I don't think that's safe in the face of a
MUST-level requirement to avoid conflicting with tenant MAC addresses.

      IP header:

         Destination IP: IP address MUST NOT be of one of tenant's IP
         addresses.  The IP address SHOULD be selected from the range
         127/8 for IPv4, for IPv6 - from the range
         0:0:0:0:0:FFFF:7F00:0/104.  Alternatively, the destination IP
         address MAY be set to VTEP's IP address.

As for MAC addresses, can't the tenant start using new ones at any time?
Loopback is mostly safe in that the tenant generally shouldn't expect
incoming traffic to that destination address ... but what if the tenant
is also using a BFD scheme that expects incoming (single-hop) packets to
loopback as an exception to RFC 1122?
nit: please use a parallel grammatical construction for describing the
IPv4 and IPv6 recommended behavior.
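
For readers comparing the two ranges, the short Python check below (an
illustration using the standard ipaddress module, with variable names
chosen here) shows that the quoted IPv6 range is the IPv4-mapped form
of 127/8, which is why the two cases can be described in parallel.

```python
import ipaddress

v4_range = ipaddress.ip_network("127.0.0.0/8")
v6_range = ipaddress.ip_network("0:0:0:0:0:FFFF:7F00:0/104")

# 104 = 96 bits of the IPv4-mapped prefix (::ffff:0:0/96) + the 8-bit IPv4 prefix.
first = v6_range.network_address.ipv4_mapped
last = v6_range.broadcast_address.ipv4_mapped

print(first, last)  # 127.0.0.0 127.255.255.255
```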

         TTL or Hop Limit: MUST be set to 1 to ensure that the BFD
         packet is not routed within the Layer 3 underlay network.  This
         addresses the scenario when the inner IP destination address is
         of VXLAN gateway and there is a router in underlay which
         removes the VXLAN header, then it is possible to route the
         packet as VXLAN  gateway address is routable address.

nit: the grammar here is a bit wonky; I think the following preserves
the meaning with better grammar:

%        TTL or Hop Limit: MUST be set to 1 to ensure that the BFD
%        packet is not routed within the Layer 3 underlay network.  This
%        addresses the scenario where the inner IP destination address is
%        that of a VXLAN gateway and there is a router in the underlay
%        that removes the VXLAN header; in such cases it is possible for
%        the packet to be routed, as the VXLAN gateway's address is a
%        routable address.

Section 5

   Once a packet is received, VTEP MUST validate the packet.  If the
   Destination MAC of the inner Ethernet frame matches one of the MAC
   addresses associated with the VTEP the packet MUST be processed
   further.  If the Destination MAC of the inner Ethernet frame doesn't

What prevents the scenario where the MAC address associated with the
VTEP is also in use by the tenant?

   match any of VTEP's MAC addresses, then the processing of the
   received VXLAN packet MUST follow the procedures described in
   Section 4.1 [RFC7348].  If the BFD session is using the Management
   VNI (Section 6), BFD Control packets with unknown MAC address MUST
   NOT be forwarded to VMs.

nit: either "an unknown" or "MAC addresses"

   The UDP destination port and the TTL of the inner IP packet MUST be
   validated to determine if the received packet can be processed by
   BFD.

Can you give a pointer to or description of what this validation
consists of?
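
One plausible reading of the validation, which the quoted text does not
confirm, is a simple equality check on both fields.  The Python sketch
below assumes the expected TTL mirrors the sender-side rule quoted from
Section 4 (TTL or Hop Limit set to 1) and that the port is the
single-hop BFD port from RFC 5881; the function and constant names are
hypothetical.

```python
BFD_SINGLE_HOP_PORT = 3784  # RFC 5881
EXPECTED_INNER_TTL = 1      # assumption: mirrors the sender rule quoted from Section 4


def passes_bfd_validation(
    udp_dst_port: int,
    ttl: int,
    expected_ttl: int = EXPECTED_INNER_TTL,
) -> bool:
    """Return True if the inner packet may be handed to BFD under this reading."""
    return udp_dst_port == BFD_SINGLE_HOP_PORT and ttl == expected_ttl
```

If the intended check is different (for example a different TTL
value), only expected_ttl changes; the open question is what the draft
actually means.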

Section 5.1

   case of VXLAN, the VNI number identifies that logical link.  If BFD
   packet is received with non-zero Your Discriminator, then BFD session
   MUST be demultiplexed only with Your Discriminator as the key.

nits: "If a BFD packet", "then the BFD session"
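
The demultiplexing rule quoted above can be sketched as follows in
Python.  The non-zero branch follows the quoted text; the zero branch
is an assumption for illustration only (keying on the VNI together with
the source VTEP address), and the table and function names are
hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical session tables; the values are just session labels.
sessions_by_discr: Dict[int, str] = {}
sessions_by_link: Dict[Tuple[int, str], str] = {}  # (VNI, source VTEP IP), an assumed key


def demux(your_discr: int, vni: int, src_vtep_ip: str) -> Optional[str]:
    """Select the BFD session for a received BFD Control packet."""
    if your_discr != 0:
        # Non-zero Your Discriminator is the only key; VNI and addresses are ignored.
        return sessions_by_discr.get(your_discr)
    # Zero Your Discriminator: fall back to the logical link (assumed key shape).
    return sessions_by_link.get((vni, src_vtep_ip))


sessions_by_discr[42] = "session-A"
sessions_by_link[(100, "192.0.2.1")] = "session-B"
```

With these tables, demux(42, 999, "198.51.100.7") still selects
session-A, since the VNI plays no role once Your Discriminator is
non-zero.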

Section 6

   In most cases, a single BFD session is sufficient for the given VTEP
   to monitor the reachability of a remote VTEP, regardless of the
   number of VNIs.  When the single BFD session is used to monitor the
   reachability of the remote VTEP, an implementation SHOULD choose any
   of the VNIs.  An implementation MAY support the use of the Management

nit: I feel like this is trying to say that the choice is arbitrary and it
doesn't matter which one is picked, but "SHOULD choose any of" is more
of a recommendation to make a choice than guidance on how to make that
choice, as written.

Section 9

I think we need to discuss the risk/potential consequences of a VTEP
failing to properly filter BFD traffic and incorrectly passing it
through to the tenant.

Relatedly, I'd also consider discussing the case of a mixed deployment
where one peer attempts to speak BFD-VXLAN to a peer that does not
implement that mechanism.

   The document requires setting the inner IP TTL to 1, which could be
   used as a DDoS attack vector.  Thus the implementation MUST have

An attack vector on what part of the system?