Re: Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)

Jeffrey Haas <jhaas@pfrc.org> Wed, 18 December 2019 20:20 UTC

Date: Wed, 18 Dec 2019 15:24:48 -0500
From: Jeffrey Haas <jhaas@pfrc.org>
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, draft-ietf-bfd-vxlan@ietf.org, bfd-chairs@ietf.org, rtg-bfd@ietf.org
Subject: Re: Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)
Message-ID: <20191218202448.GC6488@pfrc.org>
References: <157653979360.24617.1864402887480503965.idtracker@ietfa.amsl.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <157653979360.24617.1864402887480503965.idtracker@ietfa.amsl.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/LWivXWVxnNIWDY1skHeY_0SbkBA>

Benjamin,

On Mon, Dec 16, 2019 at 03:43:13PM -0800, Benjamin Kaduk via Datatracker wrote:
> Benjamin Kaduk has entered the following ballot position for
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> I have a few points that I think merit IESG discussion.
> 
> (1) I see that several directorate reviewers expressed unease at the
> destination (IP and) MAC address assignment procedure for the inner
> VXLAN headers, and appreciate that there was extensive on-list
> discussion (more than I could follow).  That said, I failed to find a
> clear statement of why the current text is believed to be safe, and in
> fact my reading of the current text is that the described procedure is
> *not* safe.  Pointers to key parts of the WG discusison would be more
> than welcome!

One high-level point that likely didn't survive the rather verbose comment
chain is that there are two implementations of this draft.  Some of the
considerations covered in the guidance here amount to "please don't break
shipping code".

While this is the IETF, and shipping code isn't always a blocking point to
document changes, I'd suggest that as a consideration.

> To take something of a high-level view of my concerns, if we think of
> the VXLAN as being a tunnel between VTEPs that carry encapsulated tenant
> traffic, then what we're trying to do is roughly like BFD between VTEPs,
> but we want to get fault-detection over as broad a coverage as we can
> (the "outermost part of the tunnel"), so we want to have the option of
> per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP).

You've summarized this clearly.  Joel Halpern, in particular, raised this
point multiple times.  Effectively, "what are we testing?"  And the responses
did not clearly converge on exactly one of the two possibilities.

As noted in the various IESG discussions, each of the two test points
raises slightly different considerations.

> However, we end up having to do this by trying to insert a thin filter
> into the tenant's address space (i.e., the inner VXLAN header) and pick
> out the specific stream of BFD traffic that we're introducing.  This is,
> in some sense, a namespace grab in what is conceptually the tenant's
> namespace, and we have to be careful that what we do is either
> guaranteed to not impact the tenant or well-documented and
> compartmentalized (akin to the "well-known URIs").

Possibly, and it's certainly a consideration.  However, I think I'm less
convinced of it being quite the level of violation that seems to be
reflected in the rest of the IESG comments in the various other threads.
I'll respond to that detail a bit below.

> I've made comments at several places in the document that are more
> directly tied to specific pieces of text, but in general, if we assume
> that the tenant can add/remove new addresses at will within their VXLAN
> abstration, then any attempt to preconfigure by mutual agreement the BFD
> addresses to use at the VTEPs or to use the VTEP's normal (outer)
> address as the sentinel value seems subject to the tenant coming in and
> subsequently trying to use that address, leading to (some of) the
> tenant's traffic getting silently filtered and interpreted by the VTEP.
> If we were using domain names as identifiers, we could allocate
> something under .arpa or similar, but I think our options are more
> limited when numerical addresses are used.
> 
> The option suggested by the rtg-dir reviewer of always using the
> management VNI does not suffer from this namespacing issue, though I
> recognize that it does reduce the scope over which fault-detection is
> available, for the cases when different VNIs' traffic are routed or
> handled differently.

This is a clean summary of the considerations.  At least a portion of the WG
seems to be comfortable with "test to the management VNI".  However, another
(smaller, I believe) portion wanted to test one layer further in.

> (2) Section 6 says:
> 
>                                                          The selection
>    of the VNI number of the Management VNI MUST be controlled through
>    management plane.  An implementation MAY use VNI number 1 as the
>    default value for the Management VNI.  All VXLAN packets received on
>    the Management VNI MUST be processed locally and MUST NOT be
>    forwarded to a tenant.
> 
> It seems like the management VNI concept is something that would apply
> to the entire VXLAN deployment and not just to the BFD-using portions;
> is this already defined somewhere (in which case we should reference
> it), or is it new with this document?  In the latter case wouldn't it be
> an update to the core VXLAN spec?  (I note that there are some
> procedural hoops to jump through for an IETF-stream document to update
> an ISE-stream document...)

The relevant portion of the archive will have the Subject: line text
including:
"Trapping BFD Control packet at VTEP"

A portion of the discussion relating to the magic number of the management
VNI suggested '1', instead of '0'.

At least some implementations already use '0':
https://mailarchive.ietf.org/arch/msg/rtg-bfd/6WfSATmfoPv4AD6RmD-Xb7zz4CE

The argument to not use '0' starts roughly here:
https://mailarchive.ietf.org/arch/msg/rtg-bfd/z8E_a5k_r4pLLs5YfNsL_Xm9_Us

You're correct, IMO, that there's no standard practice and the above seems
to support this.  I believe this leaves the document authors in the position
of being requested to make a recommendation for the default value of this
field and knowing that the default would be invalid on some platforms.

The alternative is requiring implementations to always configure this value.

I suggest the IESG determine whether it wants a default value here or not.
If not, the text should be adjusted to require configuration.  If yes, the
IESG should consider whether the nvo3 group should produce some document
that covers current operational practices.
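
To illustrate the "no default, always configure" alternative, here is a
rough Python sketch.  It is not taken from the draft or from either shipping
implementation; the names VtepBfdConfig and classify_vxlan_packet are
hypothetical.  The only facts it relies on are that the VNI field is 24 bits
wide and that Management VNI traffic is processed locally and never
forwarded to a tenant.

```python
from dataclasses import dataclass
from typing import Optional

VNI_MAX = 0xFFFFFF  # the VXLAN VNI field is 24 bits wide


@dataclass
class VtepBfdConfig:
    """Hypothetical per-VTEP configuration for BFD over VXLAN."""
    # None means "not provisioned"; there is deliberately no built-in
    # default such as 0 or 1, since either may be invalid on some platform.
    management_vni: Optional[int] = None

    def validate(self) -> None:
        if self.management_vni is None:
            raise ValueError("Management VNI must be configured explicitly")
        if not 0 <= self.management_vni <= VNI_MAX:
            raise ValueError("Management VNI is outside the 24-bit range")


def classify_vxlan_packet(cfg: VtepBfdConfig, vni: int) -> str:
    """Return 'local' for Management VNI traffic, 'tenant' otherwise.

    'local' means the packet is processed at the VTEP and is never
    forwarded to a tenant.
    """
    cfg.validate()
    return "local" if vni == cfg.management_vni else "tenant"
```

With this shape, an unconfigured VTEP fails loudly at validation time rather
than silently picking a value the platform may not accept.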

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>    0:0:0:0:0:FFFF:7F00:0/104 range for IPv6).  There could be a firewall
>    configured on VTEP to block loopback addresses if set as the
>    destination IP in the inner IP header.  It is RECOMMENDED to allow
>    addresses from the loopback range through a firewall only if it is
>    used as the destination IP address in the inner IP header, and the
>    destination UDP port is set to 3784 [RFC5881].
> 
> I think we should reword this to make it clear that the default behavior
> is still "block all incoming traffic with loopback destination" and that
> the exception is tightly scoped to the encapsulated VXLAN traffic
> discussed in this document and the specific destination port *and when
> BFD has been configured for the VTEP*.  I note that well-known ports are
> not reserved ports, and we have no guarangee that only a BFD
> implementation would be listening on port 3784.

I don't think this consideration is necessarily critical.

BFD implementations residing in the related instance communicating to other
instances across the vxlan environment would be using RFC 5881 or RFC 5883
style BFD.  Since this isn't a tunneled BFD, the IP endpoints of the BFD
control traffic will be unicast addresses rather than the reserved
"loopback" ranges; i.e., 127/8 and ::FFFF:127.0.0.0/104.  In order for those
ranges to be problematic, it'd be necessary for the client to be able to
manually encapsulate a vxlan packet - a security issue of its own.
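
For reference, the tightly scoped exception being asked for could be
sketched roughly as below, using Python's standard ipaddress module.  The
function names and parameters are hypothetical and are not from the draft;
the default for a loopback-range destination remains "block", and the
exception applies only to VXLAN-encapsulated packets on a VTEP with BFD
configured and an inner UDP destination port of 3784.

```python
import ipaddress

# Loopback ranges named in the quoted draft text.
LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")
LOOPBACK_V4_MAPPED = ipaddress.ip_network("::ffff:127.0.0.0/104")
BFD_SINGLE_HOP_PORT = 3784  # RFC 5881


def is_loopback_range(addr: str) -> bool:
    """True if addr falls in 127/8 or the IPv4-mapped IPv6 equivalent."""
    ip = ipaddress.ip_address(addr)
    net = LOOPBACK_V4 if ip.version == 4 else LOOPBACK_V4_MAPPED
    return ip in net


def loopback_exception_applies(inner_dst: str, udp_dst_port: int,
                               vxlan_encapsulated: bool,
                               bfd_configured: bool) -> bool:
    """Hypothetical check: may this loopback-destined packet be let through?

    Returns False for anything outside the narrow exception, so the
    default "block loopback destinations" behavior is left unchanged.
    """
    return (is_loopback_range(inner_dst)
            and vxlan_encapsulated
            and bfd_configured
            and udp_dst_port == BFD_SINGLE_HOP_PORT)
```

Note that the port check alone does not prove the listener is a BFD
implementation; it only narrows what the filter admits.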

A related point in this discussion is "we're hijacking an address managed by
the local tenant".  While true, it's in the above ranges and thus somewhat
under the auspices of the host OS to assert control.  I'm aware of some
unusual applications that make use of configured addresses in those ranges
for on-box communications, but they're also on the unusual end of things.

What sort of text would you want to cover the case where BFD is run in
up-to-the-tenant mode: that an address MUST be reserved for the BFD over
vxlan application and that this address SHALL NOT be available to the
tenant for its own use?

>    VXLAN packet.  The choice of Destination MAC and Destination IP
>    addresses for the inner Ethernet frame MUST ensure that the BFD
>    Control packet is not forwarded to a tenant but is processed locally
>    at the remote VTEP.  [...]
> 
> This has to be 100% reliable, and I think we need to provide some
> example mechanism that has that property even if we don't mandate that
> it be the only allowed mechanism.

The consideration here, I believe, is that there's currently too much
latitude for implementations as to what MAC addresses they use here.
Restrict one case, and you may break some implementation.

The missing element is how a pair of implementations of BFD for vxlan
discovers the necessary information.  As far as BFD is concerned, "tell me!"
This seems like work that belongs in nvo3.

>          Destination MAC: This MUST NOT be of one of tenant's MAC
>          addresses.  The destination MAC address MAY be the address
> 
> But the tenant can start using new MAC addresses at any time!  How is
> BFD-over-VXLAN going to dynamically detect and avoid that?

See above.  Either it's coordinated with the ability to prevent the tenant
from using it or the underlying vxlan environment needs to provide some
mechanism to discover what's been provisioned.
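
As a rough illustration of those two options, the Python sketch below models
a per-VNI table that both reserves a MAC for BFD (refusing to let the tenant
use it afterwards) and lets a BFD implementation ask what has been
provisioned.  Everything here is hypothetical: the class VniMacRegistry and
its methods are assumptions for illustration, not something defined by the
draft, by VXLAN, or by nvo3.

```python
class VniMacRegistry:
    """Hypothetical per-VNI record of BFD-reserved and tenant MACs."""

    def __init__(self):
        self._reserved = {}  # vni -> set of MACs reserved for BFD
        self._tenant = {}    # vni -> set of MACs in use by the tenant

    @staticmethod
    def _norm(mac: str) -> str:
        # Normalize case and separator so comparisons are reliable.
        return mac.lower().replace("-", ":")

    def reserve_for_bfd(self, vni: int, mac: str) -> None:
        mac = self._norm(mac)
        if mac in self._tenant.get(vni, set()):
            raise ValueError("MAC already in use by the tenant on this VNI")
        self._reserved.setdefault(vni, set()).add(mac)

    def learn_tenant_mac(self, vni: int, mac: str) -> bool:
        """Return False (refuse) if the tenant tries to use a reserved MAC."""
        mac = self._norm(mac)
        if mac in self._reserved.get(vni, set()):
            return False
        self._tenant.setdefault(vni, set()).add(mac)
        return True

    def bfd_macs(self, vni: int) -> frozenset:
        """Discovery side: what has been provisioned for BFD on this VNI."""
        return frozenset(self._reserved.get(vni, set()))
```

Whether such a table lives in the VTEP, a controller, or a control plane
protocol is exactly the part that sits outside BFD.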

>          associated with the destination VTEP.  The MAC address MAY be
>          configured, or it MAY be learned via a control plane protocol.
>          The details of how the MAC address is obtained are outside the
>          scope of this document.
> 
> This all talks about the MAC address being relatively static
> configuration, but per above, I don't think that's safe in the face of a
> MUST-level requirement to avoid conflicting with tenant MAC addresses.

But is it BFD's responsibility to figure this out?  This is what the
document is suggesting - a higher level with access to the implementation
specifics should be supplying the BFD provisioning information.  Or manual
provisioning in the absence thereof.

-- Jeff