[Idr] Benjamin Kaduk's Discuss on draft-ietf-idr-tunnel-encaps-20: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 01 December 2020 18:18 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: idr@ietf.org
Delivered-To: idr@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id DF09C3A141E; Tue, 1 Dec 2020 10:18:14 -0800 (PST)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-idr-tunnel-encaps@ietf.org, idr-chairs@ietf.org, idr@ietf.org, John Scudder <jgs@juniper.net>, aretana.ietf@gmail.com, Susan Hares <shares@ndzh.com>, shares@ndzh.com
X-Test-IDTracker: no
X-IETF-IDTracker: 7.23.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <160684669488.21585.5180075052177708759@ietfa.amsl.com>
Date: Tue, 01 Dec 2020 10:18:14 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/yR0KfBKtw63gR5BPzOeApnFXfi8>
Subject: [Idr] Benjamin Kaduk's Discuss on draft-ietf-idr-tunnel-encaps-20: (with DISCUSS and COMMENT)
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.29
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Dec 2020 18:18:15 -0000

Benjamin Kaduk has entered the following ballot position for
draft-ietf-idr-tunnel-encaps-20: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:


I support Erik's discuss.

I see that Roman has already suggested adding normative language
regarding the limitation to a single administrative domain (in addition
to the "MUST filter by default for EBGP sessions"), which I agree with.
However, I think there is an additional consideration regarding the
limitation of use to a single administrative domain, wherein the domain
of use for the tunnel encapsulation attribute may diverge from the
domain of use of segment routing, that seems to place this document in
conflict with the requirements of RFC 8402.  In particular,
RFC 8402 says, for SR-MPLS and SRv6, that boundary routers "MUST filter
any external traffic", and additionally for SRv6 that "explicit routing
information MUST NOT be leaked through the boundaries of the
administrered domain".  In §3.7 of this document, though, we find that
for the Prefix-SID sub-TLV, "the receiving BGP speaker need not even be
in the same Segment Routing Domain as the tunnel's egress endpoint, and
there is no implication that the prefix-SID for the advertised prefix is
the same in the Segment Routing domains of the BGP speaker that
originated the sub-TLV and the BGP speaker that received it", which
seems to suggest violation of the RFC 8402 requirement.  I think we need
to have greater clarity on what relationship is actually intended
between the SR domain and the tunnel encapsulation usage domain, and if
they are to diverge, we need to both somehow rectify this behavior with
RFC 8402 and to very clearly document how the 8402-mandated filtering at
the SR domain boundary is supposed to happen when the boundary includes
tunneled traffic.

I also would like to ensure that we have had adequate discussion of the
relationship between this document and RFC 8365.  The IESG has received
comments recently (in the context of a different document) that it is
irresponsible to effectively obsolete or deprecate existing work without
identifying or explicitly updating such work, and without indicating
whose responsibility it is to find discrepancies.  That seems like it
might apply to what's currently in Appendix A, which on first reading
suggests "there might be a problem here, but we aren't saying exactly
what or how to fix it, or even who is going to do that work".


It's good to see that the shepherd writeup got updated as things
changed; thank you for keeping it up to date!

[I initially wrote some inline comments about handling internal
inconsistencies within a given tunnel specification as malformed and
ignoring the tunnel entirely vs specifying a precedence order (the
latter being what this document does).  I removed them, because this
seems to be a generic topic where security types tend to fail-closed and
routing types tend to aim to provide some kind of service when possible,
and I don't have anything new to add to the discussion for these
particular cases.]

I didn't see a response to the secdir review; it would be good to get a
response to that, in particular to hear what amount of consideration
has been given to what new ways this provides to attack BGP.


   of certain other SAFIs.  This document adds support for additional
   Tunnel Types, and allows a remote tunnel endpoint address to be
   specified for each tunnel.  This document also provides support for

The shepherd writeup suggests that the "remote tunnel endpoint"
terminology was switched to be "tunnel egress endpoint"; was this spot

Section 1.4

   o  Defining a new "Tunnel Egress Endpoint sub-TLV" (Section 3.1) that
      can be included in any of the TLVs contained in the Tunnel
      Encapsulation attribute.  This sub-TLV can be used to specify the
      remote endpoint address of a particular tunnel.

["remote endpoint" again]

Section 3.1

I agree with Martin V that there must be a story about this Reserved
field and why it's only SHOULD send as zero.  I don't know whether this
information needs to end up in the RFC but I think we should talk about
why it is this way.  In particular, the current requirements suggest
that it could be (mis?)used as an additional data channel by
collaborating implementations (that ignore the "MUST be disregarded"),
without actually writing up a specification for those semantics.

   If the Address Family subfield has any value other than IPv4 or IPv6,
   the Tunnel Egress Endpoint sub-TLV is considered "unrecognized" (see

We probably need to repeat the carve-out for the value 0 here, as well.
(I dithered about remarking about the earlier "assumes that the Address
Family is either IPv4 or IPv6" since the "one special case" is a few
paragraphs later.)

   o  It can be determined that the IP address in the sub-TLV's address
      subfield does not belong to the Autonomous System (AS) that
      originated the route that contains the attribute.  Section 3.1.1
      describes an optional procedure to make this determination.

This check seems highly important for the security of the system and
should get called out in the security considerations.

Section 3.1.1

   sub-TLV is considered not to be valid.  In some cases a network
   operator who controls a set of Autonomous Systems might wish to allow
   a Tunnel Egress Endpoint to reside in an AS other than Route_AS;
   configuration MAY allow for such a case, in which case the check
   becomes, if Egress_AS is not within the configured set of permitted
   AS numbers, then the Tunnel Egress Endpoint sub-TLV is considered to
   be "malformed".

(nit?) maybe "the configured set of permitted AS numbers that contains
Route_AS"?  The current wording implies that there can only be one such
configured set and that it is used regardless of Route_AS, which does
not seem right...

Section 3.2

   This section defines Encapsulation sub-TLVs for the following tunnel
   types: VXLAN ([RFC7348]), NVGRE ([RFC7637]), MPLS-in-GRE ([RFC4023]),
   L2TPv3 ([RFC3931]), and GRE ([RFC2784]).

Thanks for putting the links all in one place, here.  I, at least, would
have benefited from also putting the links/references in the
corresponding sections, but that is probably just a matter of style.

Section 3.2.1

      R: The remaining bits in the 8-bit flags field are reserved for
      further use.  They MUST always be set to 0 by the originator of
      the sub-TLV.  Intermediate routers MUST propagate them without
      modification.  Any receiving routers MUST ignore these bits upon a
      receipt of the sub-TLV.

nit: spurious "a" in "upon a receipt" (and diffing this section against
§3.2.2 it seems that maybe the "of the sub-TLV" is also superfluous?).

   o  If the V bit is not set, and the BGP UPDATE message has AFI/SAFI
      other than Ethernet VPNs (EVPN) then the VXLAN tunnel cannot be

If this is intended to refer to SAFI 70 (from RFC 7432), I note that the
IANA entry is named "BGP EVPNs".

[I also don't understand why it's okay for EVPN to not have a VN-ID when
using the VXLAN tunnel, but assume that's just my ignorance.]

Section 3.2.2

      Reserved (two fields): MUST be set to zero on transmission and
      disregarded on receipt.

(nit) I only see one field marked "Reserved" (this format is the same
layout as for VXLAN).

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the NVGRE header.  The flags field of the VXLAN header is
      set as per [RFC7637].

(nit) stray "VXLAN"?

Section 3.2.5

   While it is not really necessary to have both the GRE and MPLS-in-GRE
   tunnel types, both are included for reasons of backwards

It might be nice to have a few more words about which one is the
"backward compatible" option and what it's compatible with.

Section 3.3

   If an outer Encapsulation sub-TLV occurs in a TLV for a Tunnel Type
   that does not use the corresponding outer encapsulation, the sub-TLV
   MUST be treated as if it were an unknown type of sub-TLV.

nit: this is the only instance of "unknown" in the document; using
"unrecognized" seems to be the common case (and makes it easier to find
Section 13).

Section 3.3.1

I think we may need to discuss the semantics of the DS field here -- as
I understand it, the attribute advertised in this TLV is the DSCP value
that the BGP speaker would like to receive in traffic destined to the
tunnel egress endpoint (which may be a different node than the BGP
speaker itself, but is expected to be under the control of the same
administrative entity).  Additionally, the interpretation of DSCP values
is subject to local interpretation on a given network.  Since the tunnel
encapsulation attribute is transitive, it will be propagated potentially
across multiple BGP hops and multiple ASes, so that the tunnel ingress
endpoint is not necessarily adjacent to the tunnel egress endpoint.
Although we do say that we expect the tunnel encapsulation information
to only be propagated within an administrative boundary, there is no
guarantee that the administrative boundary in question uses a unified
DSCP handling procedure throughout it.  As such, it may be possible to
end up in a regime where the requested DSCP codepoint has a different,
and potentially hazardous, interpretation, at the ingress of the tunnel.
So, it seems that we need to say something about either local policy for
DSCP value filtering, or only using this value when "directly" connected
to the egress AS, or similar; we do have something like this already for
the TC portion of the MPLS label stack entries.

Section 3.5

   labeled address family, then the sub-TLV MUST be disregarded.  If the
   sub-TLV is contained in a TLV whose Tunnel Type does not have a
   virtual network identifier in its encapsulation header, the sub-TLV
   MUST be disregarded.  In those cases where the sub-TLV is ignored, it
   SHOULD NOT be stripped from the TLV before the route is propagated.

Why only SHOULD NOT here?  I thought we hat MUST-level requirements to
preserve things unchaged in similar situations.

Section 4.1

   In the remainder of this specification, when a route is referred to
   as containing a Tunnel Encapsulation attribute with a TLV identifying
   a particular Tunnel Type, it implicitly includes the case where the
   route contains a Tunnel Encapsulation Extended Community identifying
   that Tunnel Type.

I searched the entire document for the string "identifying" and did not
find any instances where a route was referred to as containing a Tunenl
Encapsulation attribute with a TLV identifying a particular tunnel type.
Perhaps I should be looking for the "attribute" keyword, but there are
over 200 instances of that string; could you confirm whether this
implicit inclusion is actually used anywhere (and if so, give an example
of such usage)?

Section 6

   [RFC5512] specifies the use of the Tunnel Encapsulation attribute in
   BGP UPDATE messages of AFI/SAFI 1/7 and 2/7.  That document restricts
   the use of this attribute to UPDATE messages of those SAFIs.  This
   document removes that restriction.

I believe another reviewer commented on the ambiguity of "that", which I
first thought referred to "this" vs "that"; I now see that there is
additional ambituity as to whether it is the SAFI restriction or the
UPDATE message restriction that is lifted, and suggest clarification
of that as well.

   Once it is determined to send a packet through the tunnel specified
   in a particular Tunnel TLV of a particular Tunnel Encapsulation
   attribute, then the tunnel's egress endpoint address is the IP
   address contained in the sub-TLV.  If the Tunnel TLV contains a

nit: I think we have to say "Tunnel Egress Endpoint sub-TLV"; the use of
the definite article is not justified by the preceding context.

Section 8

   However, suppose that one of the TLVs in U2's Tunnel Encapsulation
   attribute contains the Color Sub-TLV.  In that case, packet P MUST
   NOT be sent through the tunnel contained in that TLV, unless U1 is
   carrying the Color Extended Community that is identified in U2's
   Color Sub-TLV.

We should probably reword this in light of Section 13's discussion that
a given Tunnel TLV can have more than one Color sub-TLV present.

Section 9.2

   Three of the tunnel types that can be specified in a Tunnel
   Encapsulation TLV have virtual network identifier fields in their
   encapsulation headers.  In the VXLAN encapsulation, this field is
   called the VNI (Virtual Network Identifier) field; in the NVGRE
   encapsulation, this field is called the VSID (Virtual Subnet
   Identifier) field.

We start off by saying "three" types but list only two.  What's the
third type?


   If the TLV identifying the tunnel does not contain an Encapsulation
   sub-TLV whose V bit is set, the virtual network identifier field of
   the encapsulation header is set as follows:

This perhaps sets us up for a nasty surprise in light of the
error-handling behavior in §13, where we are supposed to disregard the
second and subsequent instances of the Encapsulation sub-TLV.  This
language is not particularly clear about whether it applies to only the
first sub-TLV or all instances.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is copied into the lower 3 octets of the
      virtual network identifier field of the encapsulation header.

nit: I think "lower" is unneeded, since all the VNI fields are exactly 3
octets now.

Section 10

   Note that if the Tunnel Encapsulation attribute is attached to a VPN-
   IP route [RFC4364], and if Inter-AS "option b" (see section 10 of
   [RFC4364]) is being used, and if the Tunnel Egress Endpoint sub-TLV
   contains an IP address that is not in same AS as the router receiving
   the route, it is very likely that the embedded label has been
   changed.  [...]

I'm not sure that I'm understanding this scenario properly.  The label
has been "changed" with respect to what baseline?  I'm also not sure why
the tunnel egress endpoint would need to be in the same AS as the router
receiving (vs originating) the route.

   Other documents may define other ways to signal tunnel information in
   BGP.  For example, [RFC6514] defines the "P-Multicast Service
   Interface Tunnel" (PMSI Tunnel) attribute.  In this specification, we
   do not consider the effects of advertising the Tunnel Encapsulation
   Attribute in conjunction with other forms of signaling tunnels.  Any
   document specifying such joint use should provide details as to how
   interactions should be handled.

It seems like we should perhaps go a step further and explicitly
recommend not advertising such combinations in the absence of a
specification for their combined use?  Otherwise implementations will
have to come up with their own interpretations, which could easily be

Section 13

   The following sub-TLVs defined in this document MUST NOT occur more
   than once in a given Tunnel TLV: Tunnel Egress Endpoint (discussed
   below), Encapsulation, DS, UDP Destination Port, Embedded Label
   Handling, MPLS Label Stack, Prefix-SID.  [...]

Ah, thanks for listing these out.  I had been wondering about this
situation earlier in the doc, and it would have helped if the "not more
than once" limitation was mentioned at each sub-TLV's definition (even
if the actual error handling stays here).

Section 14.3

Why do we only move "BGP Tunnel Encapsulation Attribute Sub-TLVs" (but
not "BGP Tunnel Encapsulation Attribute Tunnel Types") to the new "BGP
Tunnel Encapsulation Parameters" grouping?

Section 14.9

It might be useful to indicate in the registry metadata how many flags
are available (and, I suppose, in which order the bits are numbered).

Section 15

We briefly discuss in the main body text the possibility that a tunnel
will direct encapsulated traffic with (e.g) MPLS labels to a node that
will misinterpret those labels; it might be worth reiterating the risk
of such misinterpretation in the security considerations as well (or
just referencing the previous discussion as security-relevant).

I guess there's also a theoretical possibility that the flexibility in
tunnel specification (including the type of expected content) could
facilitate cross-protocol attacks, where the attacker causes the sender
and recipient of encapsulated traffic to think that they should
interpret the records in question as different protocols.  But this
seems so remote, and unlikely to succeed given different protocols'
message structure, that it is probably not worth mentioning.

Section 18.2

If we're using RFC 5462 as a reference for a field in an MPLS label,
that seems to make it normative.

We seem to depend on procedures from RFC 6811 in a few places, which
also seems to make it normative.