Re: Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 20 December 2019 02:00 UTC

Date: Thu, 19 Dec 2019 18:00:44 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: rtg-bfd@ietf.org, draft-ietf-bfd-vxlan@ietf.org, The IESG <iesg@ietf.org>, bfd-chairs@ietf.org
Subject: Re: Benjamin Kaduk's Discuss on draft-ietf-bfd-vxlan-09: (with DISCUSS and COMMENT)
Message-ID: <20191220020044.GH35479@kduck.mit.edu>
References: <157653979360.24617.1864402887480503965.idtracker@ietfa.amsl.com> <20191218202448.GC6488@pfrc.org> <20191218220246.GK81833@kduck.mit.edu>
In-Reply-To: <20191218220246.GK81833@kduck.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/bCg4yQG3SEVWWCj0cbGsP5gylk0>

On Wed, Dec 18, 2019 at 02:02:46PM -0800, Benjamin Kaduk wrote:
> Hi Jeff,
> 
> I think I can only touch on a few points before telechat-time rolls around,
> and will finish off afterwards.

as promised...

> On Wed, Dec 18, 2019 at 03:24:48PM -0500, Jeffrey Haas wrote:
> > Benjamin,
> > 
> > On Mon, Dec 16, 2019 at 03:43:13PM -0800, Benjamin Kaduk via Datatracker wrote:
> > > Benjamin Kaduk has entered the following ballot position for
> > > ----------------------------------------------------------------------
> > > DISCUSS:
> > > ----------------------------------------------------------------------
> > > 
> > > I have a few points that I think merit IESG discussion.
> > > 
> > > (1) I see that several directorate reviewers expressed unease at the
> > > destination (IP and) MAC address assignment procedure for the inner
> > > VXLAN headers, and appreciate that there was extensive on-list
> > > discussion (more than I could follow).  That said, I failed to find a
> > > clear statement of why the current text is believed to be safe, and in
> > > fact my reading of the current text is that the described procedure is
> > > *not* safe.  Pointers to key parts of the WG discussion would be more
> > > than welcome!
> > 
> > One high-level point that likely didn't survive the rather verbose comment
> > chain is that there are two implementations of this draft.  Some of the
> > considerations covered in the guidance here are "please don't break shipping
> > code".
> > 
> > While this is IETF, and shipping code isn't always a blocking point to
> > document changes, I'd suggest that as a consideration.
> 
> It did indeed not survive (at least my pass through) the comment chain, so
> thank you for calling it out.  It is indeed a consideration, and I expect
> some actual discussion on the call tomorrow.
> 
> > > To take something of a high-level view of my concerns, if we think of
> > > the VXLAN as being a tunnel between VTEPs that carries encapsulated tenant
> > > traffic, then what we're trying to do is roughly like BFD between VTEPs,
> > > but we want to get fault-detection over as broad a coverage as we can
> > > (the "outermost part of the tunnel"), so we want to have the option of
> > > per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP).
> > 
> > You've summarized this clearly.  Joel Halpern, in particular, raised this
> > point multiple times.  Effectively, "what are we testing?"  And the responses
> > did not clearly converge on exactly one of the two possibilities.
> > 
> > As is noted in the various IESG discussions, each of the two test points
> > raises slightly different considerations.
> > 
> > > However, we end up having to do this by trying to insert a thin filter
> > > into the tenant's address space (i.e., the inner VXLAN header) and pick
> > > out the specific stream of BFD traffic that we're introducing.  This is,
> > > in some sense, a namespace grab in what is conceptually the tenant's
> > > namespace, and we have to be careful that what we do is either
> > > guaranteed to not impact the tenant or well-documented and
> > > compartmentalized (akin to the "well-known URIs").
> > 
> > Possibly, and it's certainly a consideration.  However, I think I'm less
> > convinced of it being quite the level of violation that seems to be
> > reflected in the rest of the IESG comments in the various other threads.
> > I'll respond to that detail a bit below.
> > 
> > > I've made comments at several places in the document that are more
> > > directly tied to specific pieces of text, but in general, if we assume
> > > that the tenant can add/remove new addresses at will within their VXLAN
> > > abstraction, then any attempt to preconfigure by mutual agreement the BFD
> > > addresses to use at the VTEPs or to use the VTEP's normal (outer)
> > > address as the sentinel value seems subject to the tenant coming in and
> > > subsequently trying to use that address, leading to (some of) the
> > > tenant's traffic getting silently filtered and interpreted by the VTEP.
> > > If we were using domain names as identifiers, we could allocate
> > > something under .arpa or similar, but I think our options are more
> > > limited when numerical addresses are used.
> > > 
> > > The option suggested by the rtg-dir reviewer of always using the
> > > management VNI does not suffer from this namespacing issue, though I
> > > recognize that it does reduce the scope over which fault-detection is
> > > available, for the cases when different VNIs' traffic are routed or
> > > handled differently.
> > 
> > This is a clean summary of the considerations.  At least a portion of the WG
> > seems to be comfortable with "test to the management VNI".  However, another
> > (smaller, I believe) portion were wanting to test one layer further in.
> 
> It is reassuring that I at least managed to summarize the situation
> tolerably.  Is it fair to say that testing "one layer further in" is a
> superset of what "test to the management VNI" can do?
> 
> > > (2) Section 6 says:
> > > 
> > >                                                          The selection
> > >    of the VNI number of the Management VNI MUST be controlled through
> > >    management plane.  An implementation MAY use VNI number 1 as the
> > >    default value for the Management VNI.  All VXLAN packets received on
> > >    the Management VNI MUST be processed locally and MUST NOT be
> > >    forwarded to a tenant.
> > > 
> > > It seems like the management VNI concept is something that would apply
> > > to the entire VXLAN deployment and not just to the BFD-using portions;
> > > is this already defined somewhere (in which case we should reference
> > > it), or is it new with this document?  In the latter case wouldn't it be
> > > an update to the core VXLAN spec?  (I note that there are some
> > > procedural hoops to jump through for an IETF-stream document to update
> > > an ISE-stream document...)
> > 
> > The relevant portion of the archive will have the Subject: line text
> > including:
> > "Trapping BFD Control packet at VTEP"
> > 
> > A portion of the discussion relating to the magic number of the management
> > VNI suggested '1', instead of '0'.
> > 
> > At least some implementations already use '0':
> > https://mailarchive.ietf.org/arch/msg/rtg-bfd/6WfSATmfoPv4AD6RmD-Xb7zz4CE
> > 
> > The argument to not use '0' starts roughly here:
> > https://mailarchive.ietf.org/arch/msg/rtg-bfd/z8E_a5k_r4pLLs5YfNsL_Xm9_Us
> > 
> > You're correct, IMO, that there's no standard practice and the above seems
> > to support this.  I believe this leaves the document authors in the position
> > of being requested to make a recommendation for the default value of this
> > field and knowing that the default would be invalid on some platforms.
> > 
> > The alternative is requiring implementations to always configure this value.
> > 
> > I suggest the IESG determine whether it wants a default value here or not.
> > If not, the text should be adjusted to require configuration.  If yes, the
> > IESG should consider whether the nvo3 group should produce some document
> > that covers current operational practices.
> 
> That does sound like something we should try to talk about on the telechat
> as well; thanks for raising it so clearly.

We did get to talk about it briefly, though without a clear conclusion.
Since there's not much of a clear preexisting description of the management
VNI that we can cite, I'm leaning towards just noting that (most?)
implementations offer a concept of "management VNI", that the management
VNI can be useful for giving confidence that BFD traffic does not interfere
with tenant traffic, and the details of how to use the management VNI are
implementation-specific.  But it would not be hard to convince me that
there is a better path to take!
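For concreteness, here is a rough, purely illustrative Python sketch of the receive-side rule quoted from Section 6 ("All VXLAN packets received on the Management VNI MUST be processed locally and MUST NOT be forwarded to a tenant").  It assumes the RFC 7348 header layout (flags byte with the I flag, three reserved bytes, 24-bit VNI, one reserved byte).  The names VtepConfig, build_vxlan_header, parse_vni, and dispatch are hypothetical, and it deliberately takes the "require configuration" option (no built-in default of 0 or 1), which is only one of the choices discussed above.

```python
from dataclasses import dataclass
from typing import Optional

# RFC 7348: the "I" flag (0x08 in the first header byte) marks the VNI as valid.
VXLAN_FLAG_I = 0x08


@dataclass(frozen=True)
class VtepConfig:
    # Hypothetical knob: the management VNI as set via the management plane.
    # None means "not configured"; no default value (0 or 1) is assumed.
    management_vni: Optional[int] = None


def build_vxlan_header(vni: int) -> bytes:
    """Build an 8-byte VXLAN header: flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return bytes([VXLAN_FLAG_I, 0, 0, 0]) + vni.to_bytes(3, "big") + b"\x00"


def parse_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from a VXLAN header."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    if not header[0] & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return int.from_bytes(header[4:7], "big")


def dispatch(cfg: VtepConfig, header: bytes) -> str:
    """Return "local" for management-VNI traffic (never forwarded to a tenant);
    otherwise "tenant", i.e. subject to the usual per-VNI forwarding."""
    vni = parse_vni(header)
    if cfg.management_vni is not None and vni == cfg.management_vni:
        return "local"
    return "tenant"
```

With VtepConfig(management_vni=1), a packet carrying VNI 1 is handled locally; with an unconfigured VtepConfig(), the same packet falls through to ordinary tenant handling, which is exactly the behavior difference the "default vs. required configuration" question turns on.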

> > > ----------------------------------------------------------------------
> > > COMMENT:
> > > ----------------------------------------------------------------------
> > >    0:0:0:0:0:FFFF:7F00:0/104 range for IPv6).  There could be a firewall
> > >    configured on VTEP to block loopback addresses if set as the
> > >    destination IP in the inner IP header.  It is RECOMMENDED to allow
> > >    addresses from the loopback range through a firewall only if it is
> > >    used as the destination IP address in the inner IP header, and the
> > >    destination UDP port is set to 3784 [RFC5881].
> > > 
> > > I think we should reword this to make it clear that the default behavior
> > > is still "block all incoming traffic with loopback destination" and that
> > > the exception is tightly scoped to the encapsulated VXLAN traffic
> > > discussed in this document and the specific destination port *and when
> > > BFD has been configured for the VTEP*.  I note that well-known ports are
> > > not reserved ports, and we have no guarantee that only a BFD
> > > implementation would be listening on port 3784.
> > 
> > I don't think this consideration is necessarily critical.
> 
> I think I'm in agreement about its criticality, and will see if I can come
> up with some actual text ... later.

I think the sense I have is something like:

% It is common to have a firewall configured on the VTEP (akin to general
% common practice for all machines) to drop incoming traffic where the
% inner IP header contains a loopback address as the destination address.
% In general, such traffic would be the result of misconfiguration, and
% such a policy improves network safety.  However, the procedures specified
% in this document can result in such traffic, so it is RECOMMENDED to only
% allow incoming traffic with loopback-range inner destination IP address
% when the destination UDP port is set to 3784 and BFD has been configured
% on the VTEP.

But if that's not correct, no need to spend more time on it.
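The policy in the proposed text can be sketched as a small predicate over the decapsulated inner packet.  This is a hypothetical Python illustration using the standard ipaddress module; the function names and the bfd_configured flag are assumptions, not anything the draft defines.  The IPv4-mapped range ::FFFF:127.0.0.0/104 is checked explicitly through ipv4_mapped rather than relying on how a given Python version classifies such addresses.

```python
import ipaddress
from typing import Optional

# RFC 5881: single-hop BFD Control packets use UDP destination port 3784.
BFD_CONTROL_PORT = 3784


def is_loopback_dest(addr: str) -> bool:
    """True for 127.0.0.0/8, ::1, and IPv4-mapped ::ffff:127.0.0.0/104."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        # Check the embedded IPv4 address explicitly.
        return ip.ipv4_mapped.is_loopback
    return ip.is_loopback


def allow_inner_packet(inner_dst: str, udp_dst_port: Optional[int],
                       bfd_configured: bool) -> bool:
    """Default: drop inner packets with a loopback destination.  Exception:
    UDP destination port 3784 when BFD has been configured on this VTEP.
    Non-loopback destinations are not decided by this rule (returns True);
    other filters may still apply to them."""
    if not is_loopback_dest(inner_dst):
        return True
    return bfd_configured and udp_dst_port == BFD_CONTROL_PORT
```

Note that the port check alone does not prove that a BFD implementation is the listener on 3784; the bfd_configured condition is what scopes the exception to VTEPs that actually run BFD over VXLAN.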

> > BFD implementations residing in the related instance communicating to other
> > instances across the vxlan environment would be using RFC 5881 or RFC 5883
> > style BFD.  Since this isn't a tunneled BFD, the IP endpoints of the BFD
> > control traffic will be unicast addresses rather than the reserved
> > "loopback" ranges; i.e., 127/8 and ::FFFF:127.0.0.0/104.  In order for those
> > ranges to be problematic, it'd be necessary for the client to be able to
> > manually encapsulate a vxlan packet - a security issue of its own.
> > 
> > A related point in this discussion is "we're hijacking an address managed by
> > the local tenant".  While true, it's in the above ranges and thus somewhat
> > under the auspices of the host OS to assert control.  I'm aware of some
> > unusual applications that make use of configured addresses in those ranges
> > for on-box communications, but they're also on the unusual end of things.
> > 
> > What sort of text would you want to cover the case where, when BFD is run
> > in up-to-the-tenant mode in this circumstance, an address MUST be reserved
> > for the BFD over vxlan application and SHALL NOT be available to the
> > tenant for its own use?

I'd like to hold off on proposing text until some of the other threads come
to a conclusion; what mechanism we want to use (and thus write about) may
depend on what scenarios we're trying to prevent.

> > >    VXLAN packet.  The choice of Destination MAC and Destination IP
> > >    addresses for the inner Ethernet frame MUST ensure that the BFD
> > >    Control packet is not forwarded to a tenant but is processed locally
> > >    at the remote VTEP.  [...]
> > > 
> > > This has to be 100% reliable, and I think we need to provide some
> > > example mechanism that has that property even if we don't mandate that
> > > it be the only allowed mechanism.
> > 
> > The consideration here, I believe, is that implementations currently have
> > too much latitude in which MAC addresses they use here.  Restrict one case,
> > and you may break some implementation.
> > 
> > The missing element is how a pair of BFD-for-vxlan implementations
> > discover the necessary information.  As far as BFD is concerned, "tell me!"
> > This seems like work that belongs in nvo3.

That seems likely.
I suspect that what we'll end up with here is something akin to "here's a
procedure you can use that will work, but if you want to use some other
procedure that works, we aren't stopping you".
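One way to picture "here's a procedure that works, but others are allowed" is a pluggable selection step with a single invariant enforced around it.  The Python sketch below is hypothetical throughout: ConfiguredVtepMac stands in for a baseline procedure (a per-VTEP MAC supplied by configuration, as the draft permits, plus an inner destination IP from the IPv4 loopback range), and select_inner_addrs only checks the choice against tenant MACs known at selection time.

```python
from typing import Dict, Iterable, NamedTuple, Protocol


class InnerAddrs(NamedTuple):
    dst_mac: str
    dst_ip: str


class AddressProcedure(Protocol):
    """Any procedure that picks inner destination addresses for a remote VTEP."""
    def choose(self, remote_vtep: str) -> InnerAddrs: ...


class ConfiguredVtepMac:
    """Hypothetical baseline procedure: per-VTEP MAC from configuration and
    an inner destination IP from the IPv4 loopback range."""

    def __init__(self, vtep_macs: Dict[str, str]) -> None:
        self.vtep_macs = vtep_macs

    def choose(self, remote_vtep: str) -> InnerAddrs:
        return InnerAddrs(self.vtep_macs[remote_vtep], "127.0.0.1")


def _norm(mac: str) -> str:
    # Compare MACs case-insensitively, treating '-' and ':' separators alike.
    return mac.lower().replace("-", ":")


def select_inner_addrs(proc: AddressProcedure, remote_vtep: str,
                       tenant_macs: Iterable[str]) -> InnerAddrs:
    """Accept whatever procedure is plugged in, but refuse a choice that
    collides with a currently known tenant MAC."""
    addrs = proc.choose(remote_vtep)
    if _norm(addrs.dst_mac) in {_norm(m) for m in tenant_macs}:
        raise ValueError(f"{addrs.dst_mac} is in use by a tenant")
    return addrs
```

The check can only see tenant MACs that exist when selection runs, so it says nothing about a tenant adopting that MAC later; that gap is the static-versus-dynamic question raised below.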

> > >          Destination MAC: This MUST NOT be of one of tenant's MAC
> > >          addresses.  The destination MAC address MAY be the address
> > > 
> > > But the tenant can start using new MAC addresses at any time!  How is
> > > BFD-over-VXLAN going to dynamically detect and avoid that?
> > 
> > See above.  Either it's coordinated with the ability to prevent the tenant
> > from using it or the underlying vxlan environment needs to provide some
> > mechanism to discover what's been provisioned.
> > 
> > >          associated with the destination VTEP.  The MAC address MAY be
> > >          configured, or it MAY be learned via a control plane protocol.
> > >          The details of how the MAC address is obtained are outside the
> > >          scope of this document.
> > > 
> > > This all talks about the MAC address being relatively static
> > > configuration, but per above, I don't think that's safe in the face of a
> > > MUST-level requirement to avoid conflicting with tenant MAC addresses.
> > 
> > But is it BFD's responsibility to figure this out?  This is what the
> > document is suggesting - a higher level with access to the implementation
> > specifics should be supplying the BFD provisioning information.  Or manual
> > provisioning in the absence thereof.

I don't really disagree; I'm more trying to get a sense of which parts are
dynamic and which parts (relatively) static -- my current sense is that
everything in the tenant space can be very dynamic, but that the BFD
provisioning is fairly static.  That would lead to something of an
impedance mismatch if the dynamic stuff the tenant is doing could generate
a conflict with the static configuration -- it would be a sign that we're
doing something at the wrong level.  But, as implied earlier, I'm still
trying to wrap my head around the system as a whole.
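The impedance mismatch can be illustrated with a hypothetical Python sketch: statically provisioned BFD MACs on one side, tenant MAC learning on the other.  ReservationMonitor and its hook are invented names for illustration only.  The best such a monitor can do is notice a conflict after the tenant has already started using the address.

```python
from typing import Iterable, List, Set


class ReservationMonitor:
    """Hypothetical: tracks statically provisioned BFD MACs and flags when
    the tenant later starts using one of them.  Detection is after the fact."""

    def __init__(self, reserved_macs: Iterable[str]) -> None:
        self.reserved: Set[str] = {m.lower() for m in reserved_macs}
        self.conflicts: List[str] = []

    def on_tenant_mac_learned(self, mac: str) -> bool:
        """Called whenever a new tenant MAC is learned; True on conflict."""
        mac = mac.lower()
        if mac in self.reserved:
            self.conflicts.append(mac)
            return True
        return False
```

By the time on_tenant_mac_learned reports a conflict, tenant frames to that MAC may already have been trapped by the VTEP, which is why a purely reactive check at this layer looks like a sign that the reservation belongs somewhere else.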

Thanks,

Ben