Re: [Idr] Benjamin Kaduk's Discuss on draft-ietf-idr-tunnel-encaps-20: (with DISCUSS and COMMENT)

Susan Hares <> Tue, 01 December 2020 19:44 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 6E32E3A149A; Tue, 1 Dec 2020 11:44:37 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: 0.948
X-Spam-Status: No, score=0.948 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DOS_OUTLOOK_TO_MX=2.845, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id V3hczVzdKJto; Tue, 1 Dec 2020 11:44:35 -0800 (PST)
Received: from ( []) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 8E0753A1499; Tue, 1 Dec 2020 11:44:34 -0800 (PST)
X-Default-Received-SPF: pass (skip=loggedin (res=PASS)) x-ip-name=;
From: Susan Hares <>
To: 'Benjamin Kaduk' <>, 'The IESG' <>
References: <>
In-Reply-To: <>
Date: Tue, 01 Dec 2020 14:44:26 -0500
Message-ID: <016401d6c81a$61b49da0$251dd8e0$>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 14.0
Thread-Index: AQFTx0iqBu+nLpz5RoQUm8fRb2mLNKropXXQ
Content-Language: en-us
X-Antivirus: AVG (VPS 201201-4, 12/01/2020), Outbound message
X-Antivirus-Status: Not-Tested
Archived-At: <>
Subject: Re: [Idr] Benjamin Kaduk's Discuss on draft-ietf-idr-tunnel-encaps-20: (with DISCUSS and COMMENT)
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Inter-Domain Routing <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 01 Dec 2020 19:44:38 -0000


Since the IESG has engaged in a previous discussion regarding two documents, would let me know what you think is an "adequate" discussion.  Are you looking for a joint discussion with the two working groups (IDR and BESS) on the mail list plus a written report?  Or something else? 

Thank you, Sue  

-----Original Message-----
From: Idr [] On Behalf Of Benjamin Kaduk via Datatracker
Sent: Tuesday, December 1, 2020 1:18 PM
To: The IESG
Subject: [Idr] Benjamin Kaduk's Discuss on draft-ietf-idr-tunnel-encaps-20: (with DISCUSS and COMMENT)

Benjamin Kaduk has entered the following ballot position for
draft-ietf-idr-tunnel-encaps-20: Discuss

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:


I support Erik's discuss.

I see that Roman has already suggested adding normative language regarding the limitation to a single administrative domain (in addition to the "MUST filter by default for EBGP sessions"), which I agree with.
However, I think there is an additional consideration regarding the limitation of use to a single administrative domain, wherein the domain of use for the tunnel encapsulation attribute may diverge from the domain of use of segment routing, that seems to place this document in conflict with the requirements of RFC 8402.  In particular, RFC 8402 says, for SR-MPLS and SRv6, that boundary routers "MUST filter any external traffic", and additionally for SRv6 that "explicit routing information MUST NOT be leaked through the boundaries of the administrered domain".  In §3.7 of this document, though, we find that for the Prefix-SID sub-TLV, "the receiving BGP speaker need not even be in the same Segment Routing Domain as the tunnel's egress endpoint, and there is no implication that the prefix-SID for the advertised prefix is the same in the Segment Routing domains of the BGP speaker that originated the sub-TLV and the BGP speaker that received it", which seems to suggest violation of the RFC 8402 requirement.  I think we need to have greater clarity on what relationship is actually intended between the SR domain and the tunnel encapsulation usage domain, and if they are to diverge, we need to both somehow rectify this behavior with RFC 8402 and to very clearly document how the 8402-mandated filtering at the SR domain boundary is supposed to happen when the boundary includes tunneled traffic.

I also would like to ensure that we have had adequate discussion of the relationship between this document and RFC 8365.  The IESG has received comments recently (in the context of a different document) that it is irresponsible to effectively obsolete or deprecate existing work without identifying or explicitly updating such work, and without indicating whose responsibility it is to find discrepancies.  That seems like it might apply to what's currently in Appendix A, which on first reading suggests "there might be a problem here, but we aren't saying exactly what or how to fix it, or even who is going to do that work".


It's good to see that the shepherd writeup got updated as things changed; thank you for keeping it up to date!

[I initially wrote some inline comments about handling internal inconsistencies within a given tunnel specification as malformed and ignoring the tunnel entirely vs specifying a precedence order (the latter being what this document does).  I removed them, because this seems to be a generic topic where security types tend to fail-closed and routing types tend to aim to provide some kind of service when possible, and I don't have anything new to add to the discussion for these particular cases.]

I didn't see a response to the secdir review; it would be good to get a response to that, in particular to hear what amount of consideration has been given to what new ways this provides to attack BGP.


   of certain other SAFIs.  This document adds support for additional
   Tunnel Types, and allows a remote tunnel endpoint address to be
   specified for each tunnel.  This document also provides support for

The shepherd writeup suggests that the "remote tunnel endpoint"
terminology was switched to be "tunnel egress endpoint"; was this spot missed?

Section 1.4

   o  Defining a new "Tunnel Egress Endpoint sub-TLV" (Section 3.1) that
      can be included in any of the TLVs contained in the Tunnel
      Encapsulation attribute.  This sub-TLV can be used to specify the
      remote endpoint address of a particular tunnel.

["remote endpoint" again]

Section 3.1

I agree with Martin V that there must be a story about this Reserved field and why it's only SHOULD send as zero.  I don't know whether this information needs to end up in the RFC but I think we should talk about why it is this way.  In particular, the current requirements suggest that it could be (mis?)used as an additional data channel by collaborating implementations (that ignore the "MUST be disregarded"), without actually writing up a specification for those semantics.

   If the Address Family subfield has any value other than IPv4 or IPv6,
   the Tunnel Egress Endpoint sub-TLV is considered "unrecognized" (see

We probably need to repeat the carve-out for the value 0 here, as well.
(I dithered about remarking about the earlier "assumes that the Address Family is either IPv4 or IPv6" since the "one special case" is a few paragraphs later.)

   o  It can be determined that the IP address in the sub-TLV's address
      subfield does not belong to the Autonomous System (AS) that
      originated the route that contains the attribute.  Section 3.1.1
      describes an optional procedure to make this determination.

This check seems highly important for the security of the system and should get called out in the security considerations.

Section 3.1.1

   sub-TLV is considered not to be valid.  In some cases a network
   operator who controls a set of Autonomous Systems might wish to allow
   a Tunnel Egress Endpoint to reside in an AS other than Route_AS;
   configuration MAY allow for such a case, in which case the check
   becomes, if Egress_AS is not within the configured set of permitted
   AS numbers, then the Tunnel Egress Endpoint sub-TLV is considered to
   be "malformed".

(nit?) maybe "the configured set of permitted AS numbers that contains Route_AS"?  The current wording implies that there can only be one such configured set and that it is used regardless of Route_AS, which does not seem right...

Section 3.2

   This section defines Encapsulation sub-TLVs for the following tunnel
   types: VXLAN ([RFC7348]), NVGRE ([RFC7637]), MPLS-in-GRE ([RFC4023]),
   L2TPv3 ([RFC3931]), and GRE ([RFC2784]).

Thanks for putting the links all in one place, here.  I, at least, would have benefited from also putting the links/references in the corresponding sections, but that is probably just a matter of style.

Section 3.2.1

      R: The remaining bits in the 8-bit flags field are reserved for
      further use.  They MUST always be set to 0 by the originator of
      the sub-TLV.  Intermediate routers MUST propagate them without
      modification.  Any receiving routers MUST ignore these bits upon a
      receipt of the sub-TLV.

nit: spurious "a" in "upon a receipt" (and diffing this section against
§3.2.2 it seems that maybe the "of the sub-TLV" is also superfluous?).

   o  If the V bit is not set, and the BGP UPDATE message has AFI/SAFI
      other than Ethernet VPNs (EVPN) then the VXLAN tunnel cannot be

If this is intended to refer to SAFI 70 (from RFC 7432), I note that the IANA entry is named "BGP EVPNs".

[I also don't understand why it's okay for EVPN to not have a VN-ID when using the VXLAN tunnel, but assume that's just my ignorance.]

Section 3.2.2

      Reserved (two fields): MUST be set to zero on transmission and
      disregarded on receipt.

(nit) I only see one field marked "Reserved" (this format is the same layout as for VXLAN).

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the NVGRE header.  The flags field of the VXLAN header is
      set as per [RFC7637].

(nit) stray "VXLAN"?

Section 3.2.5

   While it is not really necessary to have both the GRE and MPLS-in-GRE
   tunnel types, both are included for reasons of backwards

It might be nice to have a few more words about which one is the "backward compatible" option and what it's compatible with.

Section 3.3

   If an outer Encapsulation sub-TLV occurs in a TLV for a Tunnel Type
   that does not use the corresponding outer encapsulation, the sub-TLV
   MUST be treated as if it were an unknown type of sub-TLV.

nit: this is the only instance of "unknown" in the document; using "unrecognized" seems to be the common case (and makes it easier to find Section 13).

Section 3.3.1

I think we may need to discuss the semantics of the DS field here -- as I understand it, the attribute advertised in this TLV is the DSCP value that the BGP speaker would like to receive in traffic destined to the tunnel egress endpoint (which may be a different node than the BGP speaker itself, but is expected to be under the control of the same administrative entity).  Additionally, the interpretation of DSCP values is subject to local interpretation on a given network.  Since the tunnel encapsulation attribute is transitive, it will be propagated potentially across multiple BGP hops and multiple ASes, so that the tunnel ingress endpoint is not necessarily adjacent to the tunnel egress endpoint.
Although we do say that we expect the tunnel encapsulation information to only be propagated within an administrative boundary, there is no guarantee that the administrative boundary in question uses a unified DSCP handling procedure throughout it.  As such, it may be possible to end up in a regime where the requested DSCP codepoint has a different, and potentially hazardous, interpretation, at the ingress of the tunnel.
So, it seems that we need to say something about either local policy for DSCP value filtering, or only using this value when "directly" connected to the egress AS, or similar; we do have something like this already for the TC portion of the MPLS label stack entries.

Section 3.5

   labeled address family, then the sub-TLV MUST be disregarded.  If the
   sub-TLV is contained in a TLV whose Tunnel Type does not have a
   virtual network identifier in its encapsulation header, the sub-TLV
   MUST be disregarded.  In those cases where the sub-TLV is ignored, it
   SHOULD NOT be stripped from the TLV before the route is propagated.

Why only SHOULD NOT here?  I thought we hat MUST-level requirements to preserve things unchaged in similar situations.

Section 4.1

   In the remainder of this specification, when a route is referred to
   as containing a Tunnel Encapsulation attribute with a TLV identifying
   a particular Tunnel Type, it implicitly includes the case where the
   route contains a Tunnel Encapsulation Extended Community identifying
   that Tunnel Type.

I searched the entire document for the string "identifying" and did not find any instances where a route was referred to as containing a Tunenl Encapsulation attribute with a TLV identifying a particular tunnel type.
Perhaps I should be looking for the "attribute" keyword, but there are over 200 instances of that string; could you confirm whether this implicit inclusion is actually used anywhere (and if so, give an example of such usage)?

Section 6

   [RFC5512] specifies the use of the Tunnel Encapsulation attribute in
   BGP UPDATE messages of AFI/SAFI 1/7 and 2/7.  That document restricts
   the use of this attribute to UPDATE messages of those SAFIs.  This
   document removes that restriction.

I believe another reviewer commented on the ambiguity of "that", which I first thought referred to "this" vs "that"; I now see that there is additional ambituity as to whether it is the SAFI restriction or the UPDATE message restriction that is lifted, and suggest clarification of that as well.

   Once it is determined to send a packet through the tunnel specified
   in a particular Tunnel TLV of a particular Tunnel Encapsulation
   attribute, then the tunnel's egress endpoint address is the IP
   address contained in the sub-TLV.  If the Tunnel TLV contains a

nit: I think we have to say "Tunnel Egress Endpoint sub-TLV"; the use of the definite article is not justified by the preceding context.

Section 8

   However, suppose that one of the TLVs in U2's Tunnel Encapsulation
   attribute contains the Color Sub-TLV.  In that case, packet P MUST
   NOT be sent through the tunnel contained in that TLV, unless U1 is
   carrying the Color Extended Community that is identified in U2's
   Color Sub-TLV.

We should probably reword this in light of Section 13's discussion that a given Tunnel TLV can have more than one Color sub-TLV present.

Section 9.2

   Three of the tunnel types that can be specified in a Tunnel
   Encapsulation TLV have virtual network identifier fields in their
   encapsulation headers.  In the VXLAN encapsulation, this field is
   called the VNI (Virtual Network Identifier) field; in the NVGRE
   encapsulation, this field is called the VSID (Virtual Subnet
   Identifier) field.

We start off by saying "three" types but list only two.  What's the third type?


   If the TLV identifying the tunnel does not contain an Encapsulation
   sub-TLV whose V bit is set, the virtual network identifier field of
   the encapsulation header is set as follows:

This perhaps sets us up for a nasty surprise in light of the error-handling behavior in §13, where we are supposed to disregard the second and subsequent instances of the Encapsulation sub-TLV.  This language is not particularly clear about whether it applies to only the first sub-TLV or all instances.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is copied into the lower 3 octets of the
      virtual network identifier field of the encapsulation header.

nit: I think "lower" is unneeded, since all the VNI fields are exactly 3 octets now.

Section 10

   Note that if the Tunnel Encapsulation attribute is attached to a VPN-
   IP route [RFC4364], and if Inter-AS "option b" (see section 10 of
   [RFC4364]) is being used, and if the Tunnel Egress Endpoint sub-TLV
   contains an IP address that is not in same AS as the router receiving
   the route, it is very likely that the embedded label has been
   changed.  [...]

I'm not sure that I'm understanding this scenario properly.  The label has been "changed" with respect to what baseline?  I'm also not sure why the tunnel egress endpoint would need to be in the same AS as the router receiving (vs originating) the route.

   Other documents may define other ways to signal tunnel information in
   BGP.  For example, [RFC6514] defines the "P-Multicast Service
   Interface Tunnel" (PMSI Tunnel) attribute.  In this specification, we
   do not consider the effects of advertising the Tunnel Encapsulation
   Attribute in conjunction with other forms of signaling tunnels.  Any
   document specifying such joint use should provide details as to how
   interactions should be handled.

It seems like we should perhaps go a step further and explicitly recommend not advertising such combinations in the absence of a specification for their combined use?  Otherwise implementations will have to come up with their own interpretations, which could easily be uninteroperable.

Section 13

   The following sub-TLVs defined in this document MUST NOT occur more
   than once in a given Tunnel TLV: Tunnel Egress Endpoint (discussed
   below), Encapsulation, DS, UDP Destination Port, Embedded Label
   Handling, MPLS Label Stack, Prefix-SID.  [...]

Ah, thanks for listing these out.  I had been wondering about this situation earlier in the doc, and it would have helped if the "not more than once" limitation was mentioned at each sub-TLV's definition (even if the actual error handling stays here).

Section 14.3

Why do we only move "BGP Tunnel Encapsulation Attribute Sub-TLVs" (but not "BGP Tunnel Encapsulation Attribute Tunnel Types") to the new "BGP Tunnel Encapsulation Parameters" grouping?

Section 14.9

It might be useful to indicate in the registry metadata how many flags are available (and, I suppose, in which order the bits are numbered).

Section 15

We briefly discuss in the main body text the possibility that a tunnel will direct encapsulated traffic with (e.g) MPLS labels to a node that will misinterpret those labels; it might be worth reiterating the risk of such misinterpretation in the security considerations as well (or just referencing the previous discussion as security-relevant).

I guess there's also a theoretical possibility that the flexibility in tunnel specification (including the type of expected content) could facilitate cross-protocol attacks, where the attacker causes the sender and recipient of encapsulated traffic to think that they should interpret the records in question as different protocols.  But this seems so remote, and unlikely to succeed given different protocols'
message structure, that it is probably not worth mentioning.

Section 18.2

If we're using RFC 5462 as a reference for a field in an MPLS label, that seems to make it normative.

We seem to depend on procedures from RFC 6811 in a few places, which also seems to make it normative.

Idr mailing list