Re: [bess] Benjamin Kaduk's Discuss on draft-ietf-bess-datacenter-gateway-11: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Wed, 02 June 2021 05:22 UTC

Date: Tue, 01 Jun 2021 22:22:21 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: Adrian Farrel <adrian@olddog.co.uk>
Cc: 'The IESG' <iesg@ietf.org>, draft-ietf-bess-datacenter-gateway@ietf.org, bess-chairs@ietf.org, bess@ietf.org, 'Matthew Bocci' <matthew.bocci@nokia.com>
Message-ID: <20210602052221.GI32395@kduck.mit.edu>
References: <162191416295.8400.1863947061330586900@ietfa.amsl.com> <029e01d75404$df5dd570$9e198050$@olddog.co.uk>
In-Reply-To: <029e01d75404$df5dd570$9e198050$@olddog.co.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/9Zq8I8VVbUtpVh90F-uPLb1NOu4>
Subject: Re: [bess] Benjamin Kaduk's Discuss on draft-ietf-bess-datacenter-gateway-11: (with DISCUSS and COMMENT)

Hi Adrian,

It seems that I didn't manage to get a full reply written up today as
planned.  Let me send what I have now, and get back to the rest when I
have a bit more brainpower.

On Fri, May 28, 2021 at 10:03:12PM +0100, Adrian Farrel wrote:
> Hi Ben,
> 
> Thanks for the Discuss and detailed Comments.
> 
> Responses in line.
> 
> All changes are held in the -12 buffer as we close off issues with the other ADs.

Makes sense; there seems to be a fair bit that's interrelated across AD
ballots.

> 
> > --------------------------------------------------------------
> > DISCUSS:
> >
> > Thanks for having the discussion with John and updating the document
> > already; I benefitted a lot from being able to read the -11 that has
> > started rolling in fixes from the prior discussion.  My one new discuss
> > point is relatively minor, all things considered, and is really just
> > trying to nail down an aspect of internal consistency.  (I also support
> > Roman's discuss, but we don't need to rehash that here.)
> 
> But, FWIW, I think addressing your Discuss and one of your Comments, will further help Roman.

Peeking ahead, I agree :)

> > When we introduce the concept of gateways, we say that they can be
> > attached to the Internet or a backbone network.  We then go on to
> > provide a mechanism for gateways to advertise to some tunnel ingress
> > node the complete set of gateways for a given site.  It seems that we
> > do fairly consistently refer to this advertisement as being over "the
> > backbone network", but I'm not seeing anything that clearly disclaims
> > the applicability of this technique over the Internet itself.  However, I
> > think we need to have such a disclaimer, since we do have a clearly
> > stated assumption that "the connected set of DCs *and the backbone
> > network connecting them* are part of the same SR BGP Link State (LS)
> > instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe])"
> > (emphasis mine).  If the intent is to only use this mechanism over 
> > "in-BGP-LS-instance" backbones and not over the Internet, we should
> > explicitly set the scope of applicability and contrast a gateway as a
> > generic concept and the gateway scenarios that this mechanism
> > applies to.
> 
> It's actually very common to interconnect sites using a combination of private networks and the Internet (https://www.networkworld.com/article/3031279/sd-wan-what-it-is-and-why-you-ll-use-it-one-day.html). We use the term "backbone network" as a sort of shorthand and we don't mean to preclude use of private networks or of the Internet. We can clarify this.
> 
> We will change a sentence in the first paragraph as follows:
> OLD
>    DCs are attached to the Internet or a backbone network by gateway routers
> NEW
>   DCs (sites) are interconnected by a backbone network, which consists of any
>   number of private networks and/or the Internet, by gateway routers
> END
> 
> The tunnels that interconnect any pair of ASBRs or GWs appear in the BGP-LS instance as links. The transit nodes in the "backbone network" do not show up in that BGP-LS instance.

Ah, that makes a lot of sense.

> What is important (and we missed) is that the tunnels that run across "untrusted networks" should be secure tunnels. We'll make this clear with an additional paragraph in the Security considerations as follows:

That seems important, yes :)

>      <t>Given that the gateways and ASBRs are connected by tunnels that may run across parts of the network that are not trusted, 
>         data center operators using the approach set out in this document SHOULD consider using gateway-to-gateway encryption to

"SHOULD consider"?  We're not even bold enough to go with just SHOULD?

>         protect the data center traffic.  Additionally, due consideration SHOULD be given to encrypting end-to-end traffic as it
>         would be for any traffic that uses a public or untrusted network for transport.</t>

I am thinking that I might be in a similar situation to Warren: my
recollection from when we processed 8402 is that nodes remotely connected
to each other would be using secure tunnels that provide at least as much
protection as the "secure physical network" within the local domain.  With
that background, I would have gone with MUSTs here, but I am not at the
moment finding much in 8402 to strongly support that as a requirement.

> > COMMENT:
> > ----------------------------------------------------------------
> >
> > The Abstract is perhaps pushing the bounds of reasonable length for an abstract. 
> > Perhaps:
> >
> > % This document defines a mechanism using the BGP Tunnel Encapsulation
> > % attribute to allow datacenter gateway routers to advertise routes to the 
> > % prefixes reachable in the site, including advertising them on behalf of 
> > % other gateways at the same site.  This allows for multiple paths across 
> > % the Internet or backbone (terminating at the different gateways) to be
> > % used by segment routing to steer traffic for load-balancing and
> > % resiliency purposes.
> 
> The Style Guide recommends 25 lines or fewer. We have 18 lines of text.

Er, which style guide?  I'm not seeing much in RFC 7322 ... and
https://www.rfc-editor.org/policy.html#policy.abstract (which admittedly
does say it has been replaced by https://www.rfc-editor.org/styleguide/)
puts the cap at 20, with 5-10 being "typical".

> Perhaps we can compromise a bit. We need to mention the fact that a site may have more than one gateway because that is an important part of the problem that needs to be solved. That takes us to:

I take it you don't think "other gateways at the same site" is enough
emphasis, then :)

> OLD
>    Data centers are critical components of the infrastructure used by
>    network operators to provide services to their customers.  Data
>    centers are attached to the Internet or a backbone network by gateway
>    routers.  One data center typically has more than one gateway for
>    commercial, load balancing, and resiliency reasons.
> 
>    Segment Routing is a protocol mechanism that can be used within a
>    data center, and also for steering traffic that flows between two
>    data center sites.  In order that one data center site may load
>    balance the traffic it sends to another data center site, it needs to
>    know the complete set of gateway routers at the remote data center,
>    the points of connection from those gateways to the backbone network,
>    and the connectivity across the backbone network.
> 
>    Other sites, such as access networks, also need to be connected
>    across backbone networks through gateways.
> 
>    This document defines a mechanism using the BGP Tunnel Encapsulation
>    attribute to allow each gateway router to advertise the routes to the
>    prefixes reachable in the site to which it provides access, including
>    advertising them on behalf of each other gateway to the same site.
> NEW
>    Data centers are attached to the Internet or a backbone network by
>    gateway routers.  One data center typically has more than one gateway
>    for commercial, load balancing, and resiliency reasons.  Other sites,
>    such as access networks, also need to be connected across backbone
>    networks through gateways.
> 
>    This document defines a mechanism using the BGP Tunnel Encapsulation
>    attribute to allow data center gateway routers to advertise routes to the 
>    prefixes reachable in the site, including advertising them on behalf of 
>    other gateways at the same site.  This allows segment routing to be used
>    to identify multiple paths across the Internet or backbone network 
>    between different gateways.  The paths can be selected for load-balancing,
>    resilience, and quality purposes.   
> END

Thanks, this is an improvement over the OLD version.

> > Section 1
> >
> >   The solution described in this document is agnostic as to whether the
> >   transit ASes do or do not have SR capabilities.  the solution uses SR
> >   to stitch together path segments between GWs and through the ASBRs.
> >   Thus, there is a requirement that the GWs and ASBRs are SR-capable.
> >   The solution supports the SR path being extended into the ingress and
> >   egress sites if they are SR-capable.
> >
> > There seem to be some nodes marked "ASBR" that are at the boundary 
> > between the two transit ASes, in Figure 1.  This text leaves me uncertain
> > whether they are expected to support SR (vs just the ASBRs that are 
> > attachment points for the ingress/egress GWs).
> 
> Whether the interior nodes of the transit ASes support SR doesn't really matter to us. The GWs and ASBRs are interconnected by tunnels as represented by links in the BGP-LS instance. SR is used to stitch these tunnels together to create an end-to-end path between an ingress site and an egress site. Thus, the GWs and ASBRs must be SR-capable in order to participate.

I think we should say something about how the ASBRs that are "abstracted
away" by the tunnel are not relevant, since otherwise just saying "ASBR"
would seem to include them.  (They are listed in the figure as ASBRs, after
all.)
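To make the "abstracted away" point concrete, here is a rough Python sketch that is not taken from the draft. It assumes a BGP-LS view in which only the GWs and the tunnel-terminating ASBRs appear as nodes and each tunnel is a single link, so an ASBR traversed inside a tunnel is simply not visible. The node names and the fewest-tunnels (breadth-first) selection are illustrative assumptions; a real path computation would apply its own constraints.

```python
from collections import deque

# Hypothetical BGP-LS view: only GWs and tunnel-terminating ASBRs are nodes,
# and each tunnel is one link. ASBRs traversed inside a tunnel (e.g. between
# two transit ASes) do not appear here at all.
TUNNEL_LINKS = [
    ("GW1", "ASBR1"), ("ASBR1", "ASBR3"), ("ASBR3", "GW3"),
    ("GW2", "ASBR2"), ("ASBR2", "ASBR4"), ("ASBR4", "GW4"),
]


def adjacency(links):
    """Build an undirected adjacency map from tunnel links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def stitch_path(links, src, dst):
    """Return the SR-capable nodes whose tunnels are stitched from src to dst,
    using the fewest tunnels (breadth-first search), or None if unreachable."""
    adj = adjacency(links)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

In this toy topology, stitch_path(TUNNEL_LINKS, "GW1", "GW3") returns ["GW1", "ASBR1", "ASBR3", "GW3"], while GW1 to GW4 returns None because no tunnel joins the two halves.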

> > Section 3
> >
> >   o  Each GW is configured with an identifier for the site.  That
> >      identifier MUST be the same across all GWs to the site (i.e., the
> >      same identifier is used by all GWs to the same site), and MUST be
> >      unique across all sites that are connected (i.e., across all GWs
> >      to all sites that are interconnected).
> >
> > The advice in draft-gont-numeric-ids-sec-considerations is probably 
> > relevant here.  How should we pick these identifiers?  Which properties
> > are necessary and which are not needed? 
> 
> This bullet needs to be taken along with the bullet that follows, viz.
> 
>    o  A route target ([RFC4360]) MUST be attached to each GW's auto-
>       discovery route (defined below) and its value MUST be set to the
>       site identifier.
> 
> We will re-jig these two bullets to say...
> 
>    o  A route target ([RFC4360]) MUST be attached to each GW's auto-
>       discovery route (defined below) and its value MUST be set to a 
>       value that identifies the site identifier.  The rules for constructing

(nit) "identifies the site identifier" is probably one too many "identifie"s

>       a route target are detailed in [RFC4360].  It is RECOMMENDED that
>       a Type x00 or x02 route target be used.
> 
>    o Site identifiers are set through configuration.  The site identifiers 
>       MUST be the same across all GWs to the site (i.e., the same
>       identifier is used by all GWs to the same site), and MUST be
>       unique across all sites that are connected (i.e., across all GWs
>       to all sites that are interconnected).

This helps, but I still wonder if we can give some guidance to the operator
on how to choose the value of the (configured) site identifier.  If
everyone decides to label their sites as consecutive integers starting at
one, for example, that is likely to end in sadness later on.
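One possible line of guidance, sketched in Python under the assumption that the site identifier is carried in the Local Administrator field: the Global Administrator of a Type 0x00 route target (RFC 4360) or a Type 0x02 route target (four-octet AS specific, RFC 5668) is the operator's own AS number, so two operators who both number their sites from one would still produce distinct route targets. This does nothing about collisions within a single operator's numbering, and the AS and site values used below are illustrative only.

```python
import struct

RT_SUBTYPE = 0x02  # Route Target sub-type for both types below


def route_target_type0(asn: int, site_id: int) -> bytes:
    """Type 0x00 (RFC 4360): 2-octet AS as Global Administrator,
    site identifier in the 4-octet Local Administrator."""
    if not (0 <= asn <= 0xFFFF and 0 <= site_id <= 0xFFFFFFFF):
        raise ValueError("out of range for a Type 0x00 route target")
    return struct.pack("!BBHI", 0x00, RT_SUBTYPE, asn, site_id)


def route_target_type2(asn: int, site_id: int) -> bytes:
    """Type 0x02 (RFC 5668): 4-octet AS as Global Administrator,
    site identifier in the 2-octet Local Administrator."""
    if not (0 <= asn <= 0xFFFFFFFF and 0 <= site_id <= 0xFFFF):
        raise ValueError("out of range for a Type 0x02 route target")
    return struct.pack("!BBIH", 0x02, RT_SUBTYPE, asn, site_id)
```

For example, route_target_type0(64512, 1001).hex() gives "0002fc00000003e9" (64512 being a private-use AS number), and route_target_type0(64512, 1) differs from route_target_type0(64513, 1) even though both operators picked site 1.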

> > o  Each GW MUST construct an import filtering rule to import any
> >      route that carries a route target with the same site identifier
> >      that the GW itself uses.  This means that only these GWs will
> >      import those routes, and that all GWs to the same site will import
> >      each other's routes and will learn (auto-discover) the current set
> >      of active GWs for the site.
> >
> > This seems pretty fragile in the face of identifier collisions; I hope there
> > is some good text in the security considerations that covers the risks here.
> > [ed. it seems we cover other aspects relating to identifier selection but not
> > this one] Is there any filtering that can be done other than by site identifier,
> > e.g., to know that a certain peer would never be able to advertise something
> > that validly has the same site identifier?
> 
> This mechanism to control the importing of routes (using route targets) has been in use in BGP networks for a while (more than 15 years since RFC 4360).
> 
> One of the big uses for route targets is in identifying which sites belong to the same VPN (see RFC 4364). There is the potential to misconfigure the route target used to identify a set of VPN sites by giving one site the wrong value. That would mean that it:
> - fails to import the relevant routes
> - doesn't get its routes imported by its peers
> - may accidentally import false routes
> - may accidentally have its routes imported by other sites in different VPNs
> That is all "bad stuff" and 4364 contains no mechanisms to handle what happens if there is a misconfiguration (it's not an attack vector unless BGP or a router is compromised - in which case far worse stuff happens). Basically, if you mess up the configuration, expect the unexpected.

Agreed, this is all "bad stuff", and the latter two are more what I had in
mind when I wrote the comment.  It's part of my job to wonder about the
potential scope of fallout if BGP or a router is compromised, even if we
expect that to not actually happen much in practice :)

> For us, the biggest risk is that GWs fail to discover each other rather than GWs finding false partners. This is because the BGP peerings are explicitly configured, not discovered. And that is a big reason why this is not an attack vector for us. 
> 
> Note that Section 3 concludes with (as you comment upon, below)...
> 
>    Note that if a GW is (mis)configured with a different site identifier
>    from the other GWs to the same site then it will not be auto-
>    discovered by the other GWs (and will not auto-discover the other
>    GWs).  This would result in a GW for another site receiving only the
>    Tunnel Encapsulation attribute included in the BGP best route; i.e.,
>    the Tunnel Encapsulation attribute of the (mis)configured GW or that
>    of the other GWs.
> 
> That said, we will call this out in the Manageability Considerations section pointing out that getting this right is important.

Thanks.
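The import rule and the misconfiguration case in the Note can be modeled in a few lines of Python. This is only a toy model: it reduces an auto-discovery route to the advertising GW and its route target (written as a string), and the GW names and route-target values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ADRoute:
    """A GW auto-discovery route, reduced to what the import rule inspects."""
    gw: str            # advertising gateway (hypothetical identifier)
    route_target: str  # carries the site identifier


def import_filter(own_site, routes):
    """Import only routes whose route target equals our own site identifier."""
    return [r for r in routes if r.route_target == own_site]


def active_gws(own_gw, own_site, routes):
    """Auto-discover the active GWs for our site, including ourselves."""
    return {r.gw for r in import_filter(own_site, routes)} | {own_gw}


received = [
    ADRoute("GW2", "rt:64512:1001"),  # same site, correctly configured
    ADRoute("GW3", "rt:64512:1010"),  # same site, mis-typed identifier
    ADRoute("GW9", "rt:64512:2002"),  # GW for a different site
]
print(sorted(active_gws("GW1", "rt:64512:1001", received)))
```

This prints ['GW1', 'GW2']: GW3 belongs to the same site but was given the wrong identifier, so it is silently left out. Conversely, a GW at some other site that happened to be configured with "rt:64512:1001" would be imported, which is the collision case raised earlier.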

> >   As described in Section 1, each GW will include a Tunnel
> >   Encapsulation attribute with the GW encapsulation information for
> >   each of the site's active GWs (including itself) in every route
> >   advertised externally to that site.  [...]
> >
> > (I assume this is not intended to preclude the usual route filtering/split-horizon type stuff.)
> 
> Definitely doesn't preclude it. In fact, that rule remains key to the way GWs behave - they don't re-advertise routes they have learned from other GWs. But (of course) all the GWs to a site advertise the same routes. 

I had to read this twice just to make sure I'm picking up on the details:
the GWs of a site do *not* advertise the *routes* they learn from each
other; the only re-advertising is of GW encapsulation information.
I assume that John has covered the case of intra-site connectivity issues
whereby this could entail a site advertising a route with encapsulation
information that couldn't actually get to that prefix, and will not think
about it further.
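The reading above can be sketched roughly as follows. The (prefix, source) tagging and the opaque encapsulation strings are assumptions made for illustration; the draft does not define such a data model, and the real Tunnel Encapsulation attribute (RFC 9012) carries structured TLVs rather than strings.

```python
def external_advertisements(routes, active_gw_encaps):
    """routes: iterable of (prefix, source) pairs, where source is "site" for
    prefixes reachable in the site and "peer-gw" for routes learned from
    another GW of the same site (hypothetical tags).
    active_gw_encaps: {gw_name: encapsulation info} for every active GW of
    the site, including this one.
    Returns (prefix, tunnel_encaps) pairs to advertise outside the site."""
    # Every advertised route carries encapsulation info for all active GWs.
    encaps = [active_gw_encaps[gw] for gw in sorted(active_gw_encaps)]
    # Routes learned from peer GWs are not re-advertised; only the peer GWs'
    # encapsulation information travels with this GW's own routes.
    return [
        (prefix, list(encaps))
        for prefix, source in routes
        if source == "site"
    ]
```

With routes [("10.1.0.0/16", "site"), ("10.2.0.0/16", "peer-gw")] and encapsulations for GW1 and GW2, only 10.1.0.0/16 is advertised, carrying both GWs' encapsulation entries.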

-Ben