[bess] Benjamin Kaduk's Discuss on draft-ietf-bess-datacenter-gateway-11: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 25 May 2021 03:42 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-bess-datacenter-gateway@ietf.org, bess-chairs@ietf.org, bess@ietf.org, Matthew Bocci <matthew.bocci@nokia.com>, matthew.bocci@nokia.com
X-Test-IDTracker: no
X-IETF-IDTracker: 7.30.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <162191416295.8400.1863947061330586900@ietfa.amsl.com>
Date: Mon, 24 May 2021 20:42:43 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/2J6yq1uC_MC-jagtmQL35fooep4>
Subject: [bess] Benjamin Kaduk's Discuss on draft-ietf-bess-datacenter-gateway-11: (with DISCUSS and COMMENT)

Benjamin Kaduk has entered the following ballot position for
draft-ietf-bess-datacenter-gateway-11: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-datacenter-gateway/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Thanks for having the discussion with John and updating the document
already; I benefitted a lot from being able to read the -11 that has
started rolling in fixes from the prior discussion.  My one new discuss
point is relatively minor, all things considered, and is really just
trying to nail down an aspect of internal consistency.  (I also support
Roman's discuss, but we don't need to rehash that here.)

When we introduce the concept of gateways, we say that they can be
attached to the Internet or a backbone network.  We then go on to
provide a mechanism for gateways to advertise to some tunnel ingress
node the complete set of gateways for a given site.  It seems that we do
fairly consistently refer to this advertisement as being over "the
backbone network", but I'm not seeing anything that clearly disclaims
the applicability of this technique over the Internet itself.  However,
I think we need to have such a disclaimer, since we do have a clearly
stated assumption that "the connected set of DCs *and the backbone
network connecting them* are part of the same SR BGP Link State (LS)
instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe])"
(emphasis mine).  If the intent is to only use this mechanism over
"in-BGP-LS-instance" backbones and not over the Internet, we should
explicitly set the scope of applicability and contrast a gateway as a
generic concept and the gateway scenarios that this mechanism applies
to.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thanks to Daniel Migault for the secdir review, and to the authors for
the updates in response to it.

The Abstract is perhaps pushing the bounds of reasonable length for an
abstract.  Perhaps:

% This document defines a mechanism using the BGP Tunnel Encapsulation
% attribute to allow datacenter gateway routers to advertise routes to the
% prefixes reachable in the site, including advertising them on behalf of
% other gateways at the same site.  This allows for multiple paths across
% the Internet or backbone (terminating at the different gateways) to be
% used by segment routing to steer traffic for load-balancing and
% resiliency purposes.

Section 1

   The solution described in this document is agnostic as to whether the
   transit ASes do or do not have SR capabilities.  the solution uses SR
   to stitch together path segments between GWs and through the ASBRs.
   Thus, there is a requirement that the GWs and ASBRs are SR-capable.
   The solution supports the SR path being extended into the ingress and
   egress sites if they are SR-capable.

There seem to be some nodes marked "ASBR" that are at the boundary
between the two transit ASes, in Figure 1.  This text leaves me
uncertain whether they are expected to support SR (vs just the ASBRs
that are attachment points for the ingress/egress GWs).

Section 3

   o  Each GW is configured with an identifier for the site.  That
      identifier MUST be the same across all GWs to the site (i.e., the
      same identifier is used by all GWs to the same site), and MUST be
      unique across all sites that are connected (i.e., across all GWs
      to all sites that are interconnected).

The advice in draft-gont-numeric-ids-sec-considerations is probably
relevant here.  How should we pick these identifiers?  Which properties
are necessary and which are not needed?

   o  Each GW MUST construct an import filtering rule to import any
      route that carries a route target with the same site identifier
      that the GW itself uses.  This means that only these GWs will
      import those routes, and that all GWs to the same site will import
      each other's routes and will learn (auto-discover) the current set
      of active GWs for the site.

This seems pretty fragile in the face of identifier collisions; I hope
there is some good text in the security considerations that covers the
risks here.
[ed. it seems we cover other aspects relating to identifier selection
but not this one]
Is there any filtering that can be done other than by site identifier,
e.g., to know that a certain peer would never be able to advertise
something that validly has the same site identifier?
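
To make the collision concern concrete, here is a minimal sketch of the
import rule as I read it, next to a stricter variant that also checks the
advertising peer.  The Route fields, the peer names, and the
trusted_peers allowlist are all hypothetical and are not taken from the
draft; this illustrates the question rather than proposing a mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str           # advertised prefix (here, a GW auto-discovery address)
    peer: str             # hypothetical: the peer this route was learned from
    site_ids: frozenset   # hypothetical: site identifiers carried as route targets


def import_by_site_id(routes, my_site_id):
    """Import rule as quoted above: match on the site identifier only."""
    return [r for r in routes if my_site_id in r.site_ids]


def import_with_peer_check(routes, my_site_id, trusted_peers):
    """Hypothetical stricter rule: also require a peer known to serve this site."""
    return [r for r in import_by_site_id(routes, my_site_id)
            if r.peer in trusted_peers]


routes = [
    Route("192.0.2.1/32", "gw-a", frozenset({100})),
    Route("192.0.2.2/32", "gw-b", frozenset({100})),
    # A GW at a different site that happens to use the same identifier.
    Route("198.51.100.9/32", "other-site-gw", frozenset({100})),
    Route("203.0.113.5/32", "gw-c", frozenset({200})),
]

by_id = import_by_site_id(routes, 100)
strict = import_with_peer_check(routes, 100, {"gw-a", "gw-b"})

print([r.prefix for r in by_id])    # includes the colliding route
print([r.prefix for r in strict])   # colliding route filtered out
```

With the identifier-only rule, the colliding route is imported and its
originator would be treated as one of the site's GWs.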

   As described in Section 1, each GW will include a Tunnel
   Encapsulation attribute with the GW encapsulation information for
   each of the site's active GWs (including itself) in every route
   advertised externally to that site.  [...]

(I assume this is not intended to preclude the usual route
filtering/split-horizon type stuff.)

                                        As the current set of active GWs
   changes (due to the addition of a new GW or the failure/removal of an
   existing GW) each externally advertised route will be re-advertised
   with a new Tunnel Encapsulation attribute which reflects current set
   of active GWs.

The "everybody advertises the union of what they've seen" behavior seems
like it will latch NLRI in place as being a GW, but here we're saying
that removal will be propagated as well as addition.  What's the
mechanism for removing stale data (whether maliciously added or as part
of maintenance)?  If it's an explicit withdrawal, is that also propagated
by everybody?  How long does it have to stay around for?  (I recognize
that some of this is just stock BGP, but I am looking for more clarity
on how it interacts with the "advertise the union of what you saw"
behavior that is new to this document.  The text in the next paragraph
mentions that there can be situations with broken internal routing where
things land in a broken state -- how long do they stay broken and how
can they be fixed?)
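
The distinction I am asking about can be sketched as two toy models.
Both are my own assumptions about possible readings, not behavior stated
in the draft: in one, a GW derives the active set only from the
auto-discovery routes it currently holds; in the other, it accumulates
whatever it has ever seen.  The function and GW names are hypothetical.

```python
def active_gws(held_routes):
    """Reading 1: the active set is exactly the auto-discovery routes held now."""
    return set(held_routes)


def latched_gws(previous, seen_now):
    """Reading 2: 'union of what you saw'; entries are only ever added."""
    return previous | seen_now


# Three GWs are initially visible via auto-discovery routes.
held = {"gw1", "gw2", "gw3"}
latched = latched_gws(set(), active_gws(held))

# gw3 withdraws its auto-discovery route.
held.discard("gw3")
latched = latched_gws(latched, active_gws(held))

print(sorted(active_gws(held)))  # gw3 is gone under reading 1
print(sorted(latched))           # gw3 is still present under reading 2
```

Under the first reading a withdrawal is enough to clear stale entries;
under the second, some explicit aging or removal step would be needed.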

   If a gateway becomes disconnected from the backbone network, or if
   the site operator decides to terminate the gateway's activity, it
   MUST withdraw the advertisements described above.  This means that
   remote gateways at other sites will stop seeing advertisements from
   this gateway.  Note that if the routing within a site is broken (for
   example, such that there is a route from one GW to another, but not
   in the reverse direction), then it is possible that incoming traffic
   will be routed to the wrong GW to reach the destination prefix - in
   this degraded network situation, traffic may be dropped.

This is probably worth reiterating in the security considerations
section.

   Note that if a GW is (mis)configured with a different site identifier
   from the other GWs to the same site then it will not be auto-
   discovered by the other GWs (and will not auto-discover the other
   GWs).  This would result in a GW for another site receiving only the
   Tunnel Encapsulation attribute included in the BGP best route; i.e.,
   the Tunnel Encapsulation attribute of the (mis)configured GW or that
   of the other GWs.

Are there noteworthy operational considerations of this, e.g., if all
the traffic gets directed to a GW that lacks the bandwidth to handle it?
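
A small sketch of the misconfiguration case, under the simplifying
assumption that each GW advertises exactly the set of GWs sharing its
configured site identifier and that the remote GW uses only the
attribute on its BGP best route.  The names and identifier values are
hypothetical.

```python
from collections import defaultdict


def advertised_sets(gw_config):
    """Map each GW to the set of GWs it auto-discovers (same configured site ID)."""
    groups = defaultdict(set)
    for gw, site_id in gw_config.items():
        groups[site_id].add(gw)
    return {gw: frozenset(groups[site_id]) for gw, site_id in gw_config.items()}


def remote_view(best_route_from, sets):
    """GW set a remote GW learns when its best route came from the given GW."""
    return sets[best_route_from]


# gw3 serves the same site as gw1 and gw2 but is configured with the wrong ID.
config = {"gw1": 100, "gw2": 100, "gw3": 101}
sets = advertised_sets(config)

print(sorted(remote_view("gw1", sets)))  # the correctly configured GWs only
print(sorted(remote_view("gw3", sets)))  # only the misconfigured GW
```

If the best route happens to be the one from gw3, all traffic for the
prefix is steered to that single GW.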

Section 4

   attribute to identify the GWs through which X can be reached.  It
   uses this information to compute SR Traffic Engineering (SR TE) paths
   across the backbone network looking at the information advertised to
   it in SR BGP Link State (BGP-LS)

This seems to leave the reader wondering about the details of how those
SR TE paths are computed.  I understand that it's properly out of scope
for this document, but a reference would go a long way.

Section 5

   for a prefix X, then each GW computes an SR TE path through that site
   to X from each of the currently active GWs, and places each in an
   MPLS label stack sub-TLV [RFC9012] in the SR Tunnel TLV for that GW.

I don't think I understand why each (egress) GW has to (re)compute the
path through the site to X for each of the GWs at the site -- can't it
just take the sub-TLV it got from the peer and re-propagate it?

Section 6

[The topic of which sites are allowed to send in the site's native
encapsulation seems related to questions of what an "SR Domain" is and
what boundary security it has.  I think that the other ADs are basically
covering this topic, though, so am not sure there is much more to say
here.]

   If the GWs for a given site are configured to allow remote GWs to
   send them a packet in that site's native encapsulation, then each GW
   will also include multiple instances of a Tunnel TLV for that native
   encapsulation in externally advertised routes: one for each GW and
   each containing a Tunnel Egress Endpoint sub-TLV with that GW's
   address.  [...]

Does this implicitly require that all the GWs of the site have the same
configuration for whether or not to allow native encapsulation from
remote GWs?  How would things degrade if a mixed configuration did
happen to occur?

Section 8

   From a protocol point of view, the mechanisms described in this
   document can leverage the security mechanisms already defined for
   BGP.  Further discussion of security considerations for BGP may be
   found in the BGP specification itself [RFC4271] and in the security
   analysis for BGP [RFC4272].  The original discussion of the use of
   the TCP MD5 signature option to protect BGP sessions is found in
   [RFC5925], while [RFC6952] includes an analysis of BGP keying and
   authentication issues.

Such an elegant way of not mentioning TCP-AO :)
(I do see that it is actually referenced, just not mentioned by name.)

The whole section is quite nicely done, actually -- thank you!

Section 11

I don't really understand why draft-ietf-idr-bgpls-segment-routing-epe
is listed as normative but draft-ietf-idr-bgp-ls-segment-routing-ext is
listed as informative.  They seem to be used in the same place.

NITS/EDITORIAL

Section 1

   two DC sites.  In order for a source DC (also known as an ingress DC)
   that uses SR to load balance the flows it sends to a destination DC

I'd consider "also known as an ingress DC since it forms the ingress
endpoint of a tunnel".

   sites could each be constructed differently and use different
   technologies such as IP, MPLS with global table routing native BGP to
   the edge, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN.  That is, the

FWIW I don't think I figured out what "MPLS with global table routing
native BGP to the edge" means with any real confidence, and my attempts
to google it basically just found this document.  So please feel
encouraged to take another look at the phrasing.  My current most-likely
interpretation is that there is internally MPLS with global table, but
that what's presented to the outside is native BGP-based IP routing, so
there's some implied translation layer.

Section 3

   To avoid the side effect of applying the Tunnel Encapsulation
   attribute to any packet that is addressed to the GW itself, the GW
   MUST use a different loopback address for packets intended for it.

"different" is most clear when we list both things that differ.
So perhaps it's safer to say that the address advertised for
auto-discovery must be a different loopback address than the one used
for packets directed to the gateway itself.

Section 5

   achieve this, each Tunnel TLV in the Tunnel Encapsulation attribute
   contains a Prefix SID sub-TLV [RFC9012] for X.  As defined in
   [RFC9012], the Prefix SID sub-TLV is only for IPv4/IPV6 labelled

I wonder if this sentence break should really be a paragraph break,
since the following paragraph seems to cover MPLS in a way that roughly
parallels how we treat IP here.

   applies to routes of those types.  If the use of the Prefix SID sub-
   tlv for routes of other types is defined in the future, further
   documents will be needed to describe their use.

I think that we are missing a "for SR TE tunnel encapsulation" at the
end, as the current text is basically saying "if the use of X is defined
in the future, future documents will be needed to describe the use of
X", which is fairly devoid of content.

   Alternatively, if MPLS SR is in use and if the GWs for a given site
   are configured to allow remote GWs to perform SR TE through that site
   for a prefix X, then each GW computes an SR TE path through that site

We might benefit from sprinkling around some "ingress" and "egress"
here.