Re: Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)

Benjamin Kaduk, Wed, 18 December 2019 22:03 UTC

Date: Wed, 18 Dec 2019 14:02:46 -0800
From: Benjamin Kaduk
To: Jeffrey Haas
Cc: The IESG
Subject: Re: Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)

Hi Jeff,

I think I can only touch on a few points before telechat-time rolls around,
and will finish off afterwards.

On Wed, Dec 18, 2019 at 03:24:48PM -0500, Jeffrey Haas wrote:
> Benjamin,
> On Mon, Dec 16, 2019 at 03:43:13PM -0800, Benjamin Kaduk via Datatracker wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> > 
> > I have a few points that I think merit IESG discussion.
> > 
> > (1) I see that several directorate reviewers expressed unease at the
> > destination (IP and) MAC address assignment procedure for the inner
> > VXLAN headers, and appreciate that there was extensive on-list
> > discussion (more than I could follow).  That said, I failed to find a
> > clear statement of why the current text is believed to be safe, and in
> > fact my reading of the current text is that the described procedure is
> > *not* safe.  Pointers to key parts of the WG discussion would be more
> > than welcome!
> One high-level point that likely didn't survive the rather verbose comment
> chain is that there are two implementations of this draft.  Some of the
> considerations covered in the guidance here are "please don't break shipping
> code".
>
> While this is IETF, and shipping code isn't always a blocking point to
> document changes, I'd suggest that as a consideration.

It did indeed not survive (at least my pass through) the comment chain, so
thank you for calling it out.  It is indeed a consideration, and I expect
some actual discussion on the call tomorrow.

> > To take something of a high-level view of my concerns, if we think of
> > the VXLAN as being a tunnel between VTEPs that carry encapsulated tenant
> > traffic, then what we're trying to do is roughly like BFD between VTEPs,
> > but we want to get fault-detection over as broad a coverage as we can
> > (the "outermost part of the tunnel"), so we want to have the option of
> > per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP).
> You've summarized this clearly.  Joel Halpern, in particular, raised this
> point multiple times.  Effectively, "what are we testing?"  And the responses
> did not clearly converge on exactly one of the two possibilities.
>
> As is noted in the various IESG discussions, each of the two test points
> raises slightly different considerations.
> > However, we end up having to do this by trying to insert a thin filter
> > into the tenant's address space (i.e., the inner VXLAN header) and pick
> > out the specific stream of BFD traffic that we're introducing.  This is,
> > in some sense, a namespace grab in what is conceptually the tenant's
> > namespace, and we have to be careful that what we do is either
> > guaranteed to not impact the tenant or well-documented and
> > compartmentalized (akin to the "well-known URIs").
> Possibly, and it's certainly a consideration.  However, I think I'm less
> convinced of it being quite the level of violation that seems to be
> reflected in the rest of the IESG comments in the various other threads.
> I'll respond to that detail a bit below.
> > I've made comments at several places in the document that are more
> > directly tied to specific pieces of text, but in general, if we assume
> > that the tenant can add/remove new addresses at will within their VXLAN
> > abstraction, then any attempt to preconfigure by mutual agreement the BFD
> > addresses to use at the VTEPs or to use the VTEP's normal (outer)
> > address as the sentinel value seems subject to the tenant coming in and
> > subsequently trying to use that address, leading to (some of) the
> > tenant's traffic getting silently filtered and interpreted by the VTEP.
> > If we were using domain names as identifiers, we could allocate
> > something under .arpa or similar, but I think our options are more
> > limited when numerical addresses are used.
> > 
> > The option suggested by the rtg-dir reviewer of always using the
> > management VNI does not suffer from this namespacing issue, though I
> > recognize that it does reduce the scope over which fault-detection is
> > available, for the cases when different VNIs' traffic are routed or
> > handled differently.
> This is a clean summary of the considerations.  At least a portion of the WG
> seems to be comfortable with "test to the management VNI".  However, another
> (smaller, I believe) portion were wanting to test one layer further in.

It is reassuring that I at least managed to summarize the situation
tolerably.  Is it fair to say that testing "one layer further in" is a
superset of what "test to the management VNI" can do?

> > (2) Section 6 says:
> > 
> >                                                          The selection
> >    of the VNI number of the Management VNI MUST be controlled through
> >    management plane.  An implementation MAY use VNI number 1 as the
> >    default value for the Management VNI.  All VXLAN packets received on
> >    the Management VNI MUST be processed locally and MUST NOT be
> >    forwarded to a tenant.
> > 
> > It seems like the management VNI concept is something that would apply
> > to the entire VXLAN deployment and not just to the BFD-using portions;
> > is this already defined somewhere (in which case we should reference
> > it), or is it new with this document?  In the latter case wouldn't it be
> > an update to the core VXLAN spec?  (I note that there are some
> > procedural hoops to jump through for an IETF-stream document to update
> > an ISE-stream document...)
> The relevant portion of the archive will have the Subject: line text
> including:
> "Trapping BFD Control packet at VTEP"
> A portion of the discussion relating to the magic number of the management
> VNI suggested '1', instead of '0'.
> At least some implementations already use '0':
> The argument to not use '0' starts roughly here:
> You're correct, IMO, that there's no standard practice and the above seems
> to support this.  I believe this leaves the document authors in the position
> of being requested to make a recommendation for the default value of this
> field and knowing that the default would be invalid on some platforms.
> The alternative is requiring implementations to always configure this value.
> I suggest the IESG determine whether it wants a default value here or not.
> If not, the text should be adjusted to require configuration.  If yes, the
> IESG should consider whether the nvo3 group should produce some document
> that covers current operational practices.

That does sound like something we should try to talk about on the telechat
as well; thanks for raising it so clearly.
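
To make the two options concrete (a recommended default versus mandatory
configuration), here is a minimal sketch.  It is illustrative only: the
names VtepConfig, resolve_management_vni and disposition are hypothetical
and come from neither the draft nor any implementation.  The only facts
assumed are that VNIs are 24 bits wide and that the quoted Section 6 text
requires Management VNI traffic to be processed locally.

```python
from dataclasses import dataclass
from typing import Optional

VNI_MAX = (1 << 24) - 1  # the VXLAN VNI field is 24 bits wide


@dataclass(frozen=True)
class VtepConfig:
    """Hypothetical VTEP settings; None means the operator set nothing."""
    management_vni: Optional[int] = None


def resolve_management_vni(cfg: VtepConfig, default: Optional[int] = None) -> int:
    """Return the Management VNI, or fail if none is available.

    default=1 models the draft's "MAY use VNI number 1"; default=None
    models the alternative of always requiring configuration.
    """
    vni = cfg.management_vni if cfg.management_vni is not None else default
    if vni is None:
        raise ValueError("Management VNI must be configured")
    if not 0 <= vni <= VNI_MAX:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    return vni


def disposition(packet_vni: int, mgmt_vni: int) -> str:
    """Management VNI traffic is handled locally and never reaches a tenant."""
    return "process-locally" if packet_vni == mgmt_vni else "tenant-forwarding"
```

With default=None, a platform that already uses VNI 0 and one that cannot
use it both have to state their choice explicitly, which is the trade-off
under discussion.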

> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> >    0:0:0:0:0:FFFF:7F00:0/104 range for IPv6).  There could be a firewall
> >    configured on VTEP to block loopback addresses if set as the
> >    destination IP in the inner IP header.  It is RECOMMENDED to allow
> >    addresses from the loopback range through a firewall only if it is
> >    used as the destination IP address in the inner IP header, and the
> >    destination UDP port is set to 3784 [RFC5881].
> > 
> > I think we should reword this to make it clear that the default behavior
> > is still "block all incoming traffic with loopback destination" and that
> > the exception is tightly scoped to the encapsulated VXLAN traffic
> > discussed in this document and the specific destination port *and when
> > BFD has been configured for the VTEP*.  I note that well-known ports are
> > not reserved ports, and we have no guarantee that only a BFD
> > implementation would be listening on port 3784.
> I don't think this consideration is necessarily critical.

I think I'm in agreement about its criticality, and will see if I can come
up with some actual text ... later.
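
In the meantime, a minimal sketch of the scoping described in the quoted
comment: drop loopback-destined traffic by default, and make an exception
only for VXLAN-encapsulated packets to UDP port 3784 when BFD has been
configured for the VTEP.  The InnerPacket type and the function names are
hypothetical, and this is not proposed text or a real firewall API.

```python
import ipaddress
from dataclasses import dataclass

# The two loopback ranges named in the draft text quoted above.
LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")
LOOPBACK_V4_MAPPED = ipaddress.ip_network("::ffff:7f00:0/104")
BFD_SINGLE_HOP_PORT = 3784  # RFC 5881


@dataclass(frozen=True)
class InnerPacket:
    """Hypothetical summary of the fields the filter looks at."""
    dst_ip: str
    udp_dst_port: int
    vxlan_encapsulated: bool  # True if these are inner headers of a VXLAN packet


def is_loopback_dst(dst_ip: str) -> bool:
    """True if dst_ip falls in one of the two loopback ranges above."""
    addr = ipaddress.ip_address(dst_ip)
    net = LOOPBACK_V4 if addr.version == 4 else LOOPBACK_V4_MAPPED
    return addr in net


def should_drop(pkt: InnerPacket, bfd_configured: bool) -> bool:
    """Default-deny for loopback destinations, with one narrow exception."""
    if not is_loopback_dst(pkt.dst_ip):
        return False  # this rule does not apply to other destinations
    exception = (
        pkt.vxlan_encapsulated
        and bfd_configured
        and pkt.udp_dst_port == BFD_SINGLE_HOP_PORT
    )
    return not exception
```

Note that the port check alone does not prove the listener is a BFD
implementation, which is why the exception is also gated on BFD being
configured.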


> BFD implementations residing in the related instance communicating to other
> instances across the vxlan environment would be using RFC 5881 or RFC 5883
> style BFD.  Since this isn't a tunneled BFD, the IP endpoints of the BFD
> control traffic will be unicast addresses rather than the reserved
> "loopback" ranges; i.e. 127/8 ::FFFF:  In order for those
> ranges to be problematic, it'd be necessary for the client to be able to
> manually encapsulate a vxlan packet - a security issue of its own.
> A related point in this discussion is "we're hijacking an address managed by
> the local tenant".  While true, it's in the above ranges and thus somewhat
> under the auspices of the host OS to assert control.  I'm aware of some
> unusual applications that make use of configured addresses in those ranges
> for on-box communications, but they're also on the unusual end of things.
> What sort of text would you want to cover the case where BFD is run in
> up-to-the-tenant mode, stating that an address MUST be reserved for the
> BFD over vxlan application and that this address SHALL NOT be available
> to the tenant for its own use?
> >    VXLAN packet.  The choice of Destination MAC and Destination IP
> >    addresses for the inner Ethernet frame MUST ensure that the BFD
> >    Control packet is not forwarded to a tenant but is processed locally
> >    at the remote VTEP.  [...]
> > 
> > This has to be 100% reliable, and I think we need to provide some
> > example mechanism that has that property even if we don't mandate that
> > it be the only allowed mechanism.
> The consideration here, I believe, is that there's currently too much
> latitude by implementations as to what MAC addresses they use here.
> Restrict one case, you may break some implementation.
> The missing element is how a pair of implementations of BFD for vxlan
> discover the necessary information?  As far as BFD is concerned, "tell me!"
> This seems like work that belongs in nvo3.
> >          Destination MAC: This MUST NOT be of one of tenant's MAC
> >          addresses.  The destination MAC address MAY be the address
> > 
> > But the tenant can start using new MAC addresses at any time!  How is
> > BFD-over-VXLAN going to dynamically detect and avoid that?
> See above.  Either it's coordinated with the ability to prevent the tenant
> from using it or the underlying vxlan environment needs to provide some
> mechanism to discover what's been provisioned.
> >          associated with the destination VTEP.  The MAC address MAY be
> >          configured, or it MAY be learned via a control plane protocol.
> >          The details of how the MAC address is obtained are outside the
> >          scope of this document.
> > 
> > This all talks about the MAC address being relatively static
> > configuration, but per above, I don't think that's safe in the face of a
> > MUST-level requirement to avoid conflicting with tenant MAC addresses.
> But is it BFD's responsibility to figure this out?  This is what the
> document is suggesting - a higher level with access to the implementation
> specifics should be supplying the BFD provisioning information.  Or manual
> provisioning in the absence thereof.
> -- Jeff
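
For illustration only, the "reserve an address and keep the tenant off it"
approach raised above could be sketched as follows.  VniMacRegistry and
its methods are hypothetical names, and how a real VTEP would learn or
refuse tenant MAC addresses is exactly the part the thread leaves open.

```python
from typing import Iterable, Optional, Set


def normalize_mac(mac: str) -> str:
    """Canonicalize a MAC string to lowercase, colon-separated form."""
    return mac.strip().lower().replace("-", ":")


class VniMacRegistry:
    """Hypothetical per-VNI bookkeeping of tenant MACs and one BFD MAC."""

    def __init__(self, tenant_macs: Iterable[str] = ()) -> None:
        self.tenant_macs: Set[str] = {normalize_mac(m) for m in tenant_macs}
        self.reserved_bfd_mac: Optional[str] = None

    def reserve_bfd_mac(self, mac: str) -> None:
        """Reserve a MAC for BFD; fail if the tenant already uses it."""
        mac = normalize_mac(mac)
        if mac in self.tenant_macs:
            raise ValueError(f"{mac} is already in use by the tenant")
        self.reserved_bfd_mac = mac

    def learn_tenant_mac(self, mac: str) -> bool:
        """Record a tenant MAC; refuse (return False) if it is reserved."""
        mac = normalize_mac(mac)
        if mac == self.reserved_bfd_mac:
            return False
        self.tenant_macs.add(mac)
        return True
```

The sketch only works if something with authority over the tenant's
address space enforces the refusal, which is the coordination question
raised in the quoted exchange.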