Re: [Idr] BGP Classful Transport Planes

Gyan Mishra <hayabusagsm@gmail.com> Thu, 22 October 2020 06:32 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Thu, 22 Oct 2020 02:32:15 -0400
Message-ID: <CABNhwV1OiEA=p2aAQz8mCgdTANdaNB6VV0UyYJA=siTkeg9OEw@mail.gmail.com>
To: Kaliraj Vairavakkalai <kaliraj@juniper.net>
Cc: Balaji Rajagopalan <balajir@juniper.net>, Natrajan Venkataraman <natv@juniper.net>, Robert Raszuk <robert@raszuk.net>, "idr@ietf. org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/03L2SntO437CxUsgMXG_zpg7o1M>

On Tue, Oct 20, 2020 at 4:56 PM Kaliraj Vairavakkalai <kaliraj@juniper.net>
wrote:

> Hi Gyan, please see inline..
>
>
>
> *From: *Gyan Mishra <hayabusagsm@gmail.com>
> *Date: *Tuesday, October 20, 2020 at 8:00 AM
> *To: *Kaliraj Vairavakkalai <kaliraj@juniper.net>
> *Cc: *Natrajan Venkataraman <natv@juniper.net>, Robert Raszuk <
> robert@raszuk.net>, "idr@ietf. org" <idr@ietf.org>
> *Subject: *Re: [Idr] BGP Classful Transport Planes
>
> Hi Kaliraj
>
>
>
> In-line
>
>
>
> Thanks
>
>
>
> Gyan
>
> On Mon, Oct 19, 2020 at 2:06 PM Kaliraj Vairavakkalai <kaliraj@juniper.net>
> wrote:
>
> Hi Gyan,
>
>
>
> No, there is no impact to existing CsC deployments.
>
>
>
>     Gyan> Understood, since a new CT RIB is created for the new SAFI 76.
> So this draft defines a new SAFI, with its own RD/RT, that could be used
> instead of the BGP-LU (Labeled Unicast) RIB in the global table that Opt C
> uses today; the transport prefixes would then be carried in a separate CT
> RIB within the domain.
>
>
>
> Today with service VPNs there is an automatic underpinning by the topmost
> label of the stack, so there is no recursion with VPN routes.
>
>
>
> I suppose you are referring to the option-B inter-ASBR link here.
> Otherwise, at an ingress-node, Service-VPN routes do use
> nexthop-resolution of their PNH, which can be resolved via either an
> intra-AS tunnel (RSVP, SRTE) or an inter-AS tunnel (LU/CT).
>
> Gyan> I was referring in general to intra-AS VPN (SAFI 128), where the
> service label is at the bottom of the stack and sits under the transport
> label (LDP or TE underpinning).  With VPN-mapped TE, per-VRF TE next-hop
> rewrite, or SR-TE, the bottom-of-stack service VPN label is placed under
> whichever transport tunnel label ends up as the preferred next hop toward
> the egress PE LSP endpoint.
>
> With BGP-LU, the transport prefixes injected between domains sit in a BGP
> RIB, not in a separate VRF-like RIB as is being proposed with the CT RIB
> and its new extended community,
>
>
>
> Please consider the BGP-LU global RIB as the best-effort Transport-rib.
> And with CT, we create per-class instances of Transport-RIBs (e.g. gold,
> silver) each having a Transport-class ID (e.g. 100, 200 resp). And yes, we
> use the Transport-Class RT carried on the BGP-CT routes to leak them into
> appropriate Transport-RIBs. It is really a transport layer BGP-VPN.
> Origination of BGP-CT route happens at a tunnel-ingress node, where
> protocols like RSVP/SRTE publish their tunnel ingress routes into these
> per-class Transport-RIBs, by virtue of local config.
>
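To make the description above concrete, here is a minimal Python sketch of how BGP-CT routes might be sorted into per-class Transport-RIBs by the Transport-Class ID carried in their RT. The function name, the route dictionary layout, and the tunnel names are illustrative assumptions, not anything defined by the draft; class IDs 100 and 200 follow the gold/silver example in the quoted text.

```python
from collections import defaultdict

# Assumption: each BGP-CT route is modelled as a dict carrying the
# Transport-Class ID taken from its Transport-Class RT.
def leak_into_transport_ribs(ct_routes, local_classes):
    """Place each route into the Transport-RIB named by its Transport-Class ID."""
    ribs = defaultdict(dict)
    for route in ct_routes:
        class_id = route["transport_target"]
        # Only classes configured locally get a Transport-RIB (assumption).
        if class_id in local_classes:
            ribs[class_id][route["prefix"]] = route["tunnel"]
    return dict(ribs)

routes = [
    {"prefix": "PE1", "transport_target": 100, "tunnel": "rsvp-gold-to-PE1"},
    {"prefix": "PE1", "transport_target": 200, "tunnel": "srte-silver-to-PE1"},
    {"prefix": "PE2", "transport_target": 300, "tunnel": "unconfigured-class"},
]

# 100 = gold, 200 = silver, as in the example in the quoted text.
ribs = leak_into_transport_ribs(routes, local_classes={100, 200})
print(ribs)
```

The same prefix (PE1) ends up in two different RIBs, one per class, which is the point of keeping the classes apart.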
> Gyan> Understood. I think I am getting a better picture of the solution.
> So basically you can use any flavor of transport tunnel for the topmost
> label (outer header), and the transport prefixes of each tunnel type can
> now be placed into a corresponding SAFI 76 CT RIB.
>

    Gyan> Have you looked at the BGP Tunnel Encapsulation attribute draft,
which revises RFC 5512? This draft has some similarities to it and in some
ways can provide some of the same functionality.

https://datatracker.ietf.org/doc/draft-ietf-idr-tunnel-encaps/


> Once the routes are collected in per-class Transport-ribs, they can now be
> used by other BGP-routes to resolve their nexthop. That is where the
> ‘mapping-community’ comes in.
>
>  Gyan> That’s a bit confusing.  So the CT RIB contains the transport
> prefixes, but then it also contains the next-hop recursion routes, which
> may be IGP routes.  How is that possible if the next hop is known in the
> IGP and not in BGP, which is almost always the case?
>
> The presence/absence of ‘mapping community’ on a service/transport plane
> route indicates which of these transport-ribs (best-effort, gold, silver)
> the route’s PNH should be resolved in.
>
>
>
>
>
> and so BGP-LU sits in the global table and therefore has next-hop recursion.
>
>
>
> BGP-LU has nexthop resolution at an ingress-node. But it doesn’t do
> nexthop-resolution at inter-ASBR link. Whether we do nexthop-resolution or
> not actually depends on whether the peer is directly connected or not. It
> is similar logic for all families: Inet-Uni, L3VPN, LU, CT. No changes
> there.
>
>  Gyan> Understood, standard next-hop recursion.
>
> In contrast, the next-hop underpinning to the topmost-label tunnel PNH
> would use a special mapping community which, it sounds like, maps to a
> primary and failover tunnel, or to a set of tunnels, in order to map the
> CT transport service to an underlay tunnel
>
>
>
> The mapping-community just indicates which Transport-RIB to resolve the
> route’s PNH on. And it is just a ‘role’ that any BGP-community can play.
> Local (auto)configuration at ingress/border node identifies
> mapping-communities, and resolution schemes they map to.
>
> Gyan> An example would be really helpful.  A picture is worth a thousand
> words. 😀
>
    Gyan> I think I finally got it.  So all next hops are sorted by
transport tunnel type (gold, silver, bronze) and placed into discrete
Transport RIBs based on their mapping community setting.  The transport
prefixes and the recursed next-hop prefixes are one and the same.
Basically, the same BGP-LU next-hop loopbacks from all PEs, advertised
inter-AS, now have a variety of CT RIBs they can be mapped into based on
the mapping community.

> Different routes can use different communities as mapping-communities.
> Because they have different requirements with respect to fallback. To
> elaborate:
>
>
>
>    - Service-routes (e.g. L3VPN, Inet-Uni) use Color extended community
>    as mapping community. These map to a resolution-scheme which includes
>    fallback to best-effort tunnels.
>
> E.g. routes received with Color:0:100 will resolve using the gold
> Transport-RIB (Transport-Class-ID:100), and fall back to the best-effort
> transport-rib, which may contain BGP-LU and other class-less tunnels (RSVP,
> LDP).
>
>
>
>   - Transport-routes (BGP-CT routes) use the Transport-Class RT as the
> mapping community. This maps to a resolution-scheme which resolves strictly
> over specified transport-class, and doesn’t use any fallback.
>
>     E.g. a BGP-CT route for RD1:PE1 received with Transport-Class RT
> = transport-target:0:100 will resolve using the “gold” Transport-RIB
> only (Transport-Class-ID:100). It will not fall back to the best-effort
> Transport-RIB if the gold transport-rib does not have any matching routes.
> This will result in sending a withdrawal for the “gold” BGP-CT route, thus
> providing a feedback mechanism to the ingress-node that an end-to-end gold
> path doesn’t exist now.
>
>
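The two resolution schemes described above (Color with fallback for service routes, strict Transport-Class RT for BGP-CT routes) can be sketched roughly as follows. This is only an illustration under assumptions: the RIB names, the ordered-list representation of a resolution scheme, and the function name are invented for this example.

```python
def resolve_next_hop(pnh, ribs, scheme):
    """Return (rib_name, tunnel) from the first RIB in the scheme that has the PNH."""
    for rib_name in scheme:
        tunnel = ribs.get(rib_name, {}).get(pnh)
        if tunnel is not None:
            return rib_name, tunnel
    return None  # PNH unresolved in every RIB of the scheme

# Assumed state: the gold tunnel to PE1 is down, best-effort LDP still exists.
ribs = {
    "gold": {},
    "best-effort": {"PE1": "ldp-to-PE1"},
}

# Service route with Color:0:100: gold first, then fall back to best-effort.
SERVICE_SCHEME = ["gold", "best-effort"]
# BGP-CT route with transport-target:0:100: gold only, no fallback.
TRANSPORT_SCHEME = ["gold"]

service = resolve_next_hop("PE1", ribs, SERVICE_SCHEME)
transport = resolve_next_hop("PE1", ribs, TRANSPORT_SCHEME)

# An unresolved BGP-CT route is withdrawn, which signals upstream that the
# end-to-end gold path no longer exists.
ct_action = "advertise" if transport else "withdraw"
print(service, transport, ct_action)
```

With the gold RIB empty, the service route still resolves over LDP, while the gold BGP-CT route is withdrawn.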
>
> As stated in the draft, this feature could also be used with CsC, with
> the customer-carrier VRF using the new CT SAFI for the topmost labeled
> transport prefixes.
>
>
>
> That is right. But it is not the main intent. The main intent is to
> provide an architecture for Transport-class aware nexthop-resolution of
> BGP nexthops.
>
>
>
> Section 4 details the new extended community.  I believe there is a typo,
> as the CT community would follow RFC 4360 and RFC 5668.  You mention an
> 8-byte RT and RD, but I believe you meant a 6-byte community with a new
> IANA codepoint for the new high-order type and low-order sub-type.  I
> think you would want to stay with the standard formats: type 0 (2-byte
> global, 4-byte local) for 2-byte AS, type 1 (4-byte global, 2-byte local)
> for IPv4 address, and type 2 (4-byte global, 2-byte local) for 4-byte AS.
>
>
>
> RFC 4360 extended communities are 8 bytes. I didn’t get why you think it is
> a typo. The Transport-Class identifier needed to be compatible with the
> Color value carried by SRTE (one of the classful transport protocols that
> exist today); that is the reason we defined the transport-class ID as
> 32 bits, in order to interoperate with existing technology. Hence we
> defined a new format of RT instead of overloading the 4byte-AS format.
> Gyan> The entire field is 8 bytes, including the 2 bytes of high-order
> type and low-order sub-type; however, what is administratively set is the
> global and local administrator, a total of 6 bytes, and that is what I
> meant by the RD/RT extended community.
>
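The 8-byte versus 6-byte point can be checked with a small Python sketch that packs the three standard Route Target layouts from RFC 4360 and RFC 5668. The helper names and sample values are made up for illustration, and the Transport-Class RT format from the draft is deliberately not encoded here.

```python
import ipaddress
import struct

ROUTE_TARGET = 0x02  # low-order sub-type for Route Target (RFC 4360)

def rt_two_octet_as(asn, local):
    # Type 0x00: 2-byte Global Administrator (AS), 4-byte Local Administrator
    return struct.pack("!BBHI", 0x00, ROUTE_TARGET, asn, local)

def rt_ipv4(addr, local):
    # Type 0x01: 4-byte Global Administrator (IPv4), 2-byte Local Administrator
    packed = ipaddress.IPv4Address(addr).packed
    return struct.pack("!BB4sH", 0x01, ROUTE_TARGET, packed, local)

def rt_four_octet_as(asn, local):
    # Type 0x02: 4-byte Global Administrator (AS), 2-byte Local Administrator
    return struct.pack("!BBIH", 0x02, ROUTE_TARGET, asn, local)

values = [
    rt_two_octet_as(65000, 100),
    rt_ipv4("192.0.2.1", 100),
    rt_four_octet_as(4200000000, 100),
]

total_sizes = [len(v) for v in values]            # type + sub-type + value
administered_sizes = [len(v) - 2 for v in values]  # global + local only
print(total_sizes, administered_sizes)
```

Every format is 8 bytes on the wire, of which 6 are the administered global and local fields. In types 1 and 2 the Local Administrator is only 2 bytes, so a 32-bit value such as an SRTE Color would not fit there.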
>
>
>
>
> This draft creates a new SAFI that parallels BGP-LU in option-C
> deployments. I will clarify this with an explicit mention to option-C in
> the introduction.
>
>
>
>    Gyan> Understood
>
> The deficiency being addressed is: In today’s option-C deployments, if
> multiple classes (gold, bronze, etc.) of tunnels exist to a destination PE1
> in one domain, they cannot be extended out to the adjacent domains by
> using BGP-LU, such that service-traffic in those domains is able to choose
> a certain class of tunnel.
>
>
>
>     Gyan> In general, Opt C, with its end-to-end LSP, is not used between
> ASes run by operators in separate administrative domains.  Another
> downside of Opt C is the RR-RR eBGP SAFI 128/129 peering with next hop
> unchanged, which makes it impossible to use between operators in separate
> admin domains.  The last major issue with Opt C, the one being addressed
> here, is that the loopbacks imported between domains have to be carried in
> the global table.  So Opt C is generally used only for M&A or internal
> inter-AS peering, where there is a trust relationship and the loopback
> FECs can be imported between domains.  SAFI 76 addresses the issue of
> importing loopbacks into the global table, but it does not address the
> eBGP SAFI 128/129 RR-RR peering for VPN and MVPN NLRI.  In Opt C the LSP
> is end to end, so the next-hop-self rewrite is not done at the ingress and
> egress NNI of the inter-AS peering; the LSP is built end to end from the
> ingress PE in domain A to the egress PE in domain B.  The major advantage
> of Opt B is that BGP-LU is needed only for the connected transport label
> on the inter-AS peering, no loopbacks are injected between domains, and
> next-hop-self is used to split the LSP into 3 segments: the 1st segment
> is ingress PE to egress PE in domain A, the 2nd is the inter-AS link, and
> the 3rd is ingress PE to egress PE in domain B.
>
>
>
> Option-B or intra-AS VPNs will also benefit from this architecture,
> because:
>
>    - Even without SAFI-76, intra-AS tunneling protocols (RSVP, SRTE)
>    publish their routes in Transport-RIBs created for the transport-classes
>    that exist in the network. And Service-routes can map to these
>    Transport-classes using appropriate mapping-community.
>
> When information in these transport-ribs needs to be exported out to
> BGP-speakers in other domains, SAFI-76 comes into play.
>
>
>
>    - When such an intra-AS service-route which has resolved over gold
>    transport-class, gets readvertised across inter-AS link for option-B, the
>    mpls swap-label will stitch to the gold transport-class tunnel. Thus the
>    option-B traffic can also get gold transport-class end-to-end, while
>    traversing any of the ASes participating in the option-B deployment.
>
>
>
>    - Also, when an AS is so big that a full mesh of tunnels between PEs is
>    hard to manage, the network can be segmented into regions. SAFI-76 can
>    then be used to carry Transport-class aware PE reachability across
>    regions (within one AS), to stitch together the gold-class tunnels in
>    each region and, e.g., provide an end-to-end gold tunnel in the AS.
>
>
>
>  Gyan> So in a way this applies option C to option B.  With Opt B you are
> stitching the topmost transport label (the intra-AS LDP LSP) to BGP-LU
> SAFI 4 on the inter-AS link to build the end-to-end LSP.  So on the
> intra-AS side with Opt B, instead of stitching LDP to BGP-LU, you could
> use SAFI 76 with an end-to-end CT RIB provisioned.
>
> does not address the eBGP SAFI 128 129 RR-RR peering for VPN and MVPN NLRI
>
>
>
> I agree the Multihop EBGP session between domains to exchange
> service-routes is required. This proposal doesn’t change that. I don’t see
> any way to avoid that in option-C. Yes, Option-B avoids that, but it pushes
> additional service-route state to the ASBRs and increases the
> convergence hops in the network. Both Option-B and Option-C have some
> benefits and disadvantages. This BGP-CT proposal doesn’t alter that. But
> there is a new proposal “MPLS-namespaces signaled by BGP” which can
> potentially bring benefits of both these options together. I may present it
> at the upcoming IETF meeting, in BESS WG. Though it doesn’t avoid the
> ebgp-multihop session, it brings the benefit of option-B to option-C
> deployments, without increasing service-route scale at the ASBRs. Details
> of that mechanism are out of scope for this discussion.
>
> Gyan> Understood
>
>
>
> So here I don’t see any benefit in using SAFI 76 for Opt B, which scales
> well. With both Opt B SAFI 128 and the new SAFI 76 you would have to do
> the RT rewrite as well, so there is no benefit there.
>
>
>
> There is no intent to mix SAFI-128(service-routes) and SAFI-76. The idea
> is to maintain clear separation between service-routes and
> transport-routes. SAFI-76 carries only transport-prefixes.
>
> Gyan> Understood
>
> RTC is an RT membership optimization that can be used between RR and PE:
> only routes carrying RTs that the PE explicitly imports are sent to the
> PE.  All PEs have RT filtering enabled by default, so if there is no
> explicit import for an RT, routes carrying it are dropped.  RTC is thus an
> optimization for incongruent VPNs on operator PEs, to cut down on flooding
> of all RTs.  So for CT, are we changing the behavior of RTC (RFC 4684)?
> Are we making RTC part of the CT architecture, and is there a requirement
> that it must be used, given that RTC is only an optimization?
>
>
>
> No, we are not changing the behavior. RTC is still optional with this
> family as well.
>
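A rough Python sketch of why RTC is only an optimization: the PE's own import filter decides what gets installed either way, and RTC merely reduces what the RR sends. The function names and the route representation below are assumptions made for illustration, not RFC 4684 procedures.

```python
def routes_sent_by_rr(routes, pe_import_rts, rtc_enabled):
    """Routes the RR advertises to one PE, with or without RT-Constrain."""
    if not rtc_enabled:
        return list(routes)  # everything is sent; the PE filters locally
    return [r for r in routes if r["rts"] & pe_import_rts]

def routes_installed_on_pe(received, pe_import_rts):
    """Default RT filtering on the PE: keep only routes with an imported RT."""
    return [r for r in received if r["rts"] & pe_import_rts]

routes = [
    {"prefix": "PE1", "rts": {"transport-target:0:100"}},
    {"prefix": "PE1", "rts": {"transport-target:0:200"}},
    {"prefix": "PE2", "rts": {"transport-target:0:100"}},
]
pe_imports = {"transport-target:0:100"}  # this PE only imports the gold class

without_rtc = routes_sent_by_rr(routes, pe_imports, rtc_enabled=False)
with_rtc = routes_sent_by_rr(routes, pe_imports, rtc_enabled=True)

same_result = (
    routes_installed_on_pe(without_rtc, pe_imports)
    == routes_installed_on_pe(with_rtc, pe_imports)
)
print(len(without_rtc), len(with_rtc), same_result)
```

The installed set is identical in both cases; only the number of routes crossing the RR-PE session changes.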
> Gyan> Good
>
> Also, for CsC we are here changing the VPN RIB for the customer carrier
> from SAFI 128 to 76.  With CsC hierarchical VRFs, the transport prefixes
> in the customer-carrier VRF sit in a separate SAFI 128 RIB and so do not
> have the issue that exists with Opt C; you are just shifting from one VPN
> SAFI to another, 128 to 76.  I still don’t see the benefit.
>
>
>
> There are a few benefits I can see:
>
>    - It makes it easier to give different treatment, like
>    per-prefix-label, to transport-prefixes carried in SAFI-76. It provides
>    stability by localizing churn. We don’t want to give this treatment to
>    routes advertised in SAFI-128, because SAFI-128 carries
>    service-prefixes as well, so doing that would be a scaling problem.
>    - The transport-RIBs proposed in this solution are recommended to be
>    implemented as Control-Plane only RIBs that don’t get pushed to
>    forwarding, unlike the CsC VRFs today, which by default do get pushed
>    to forwarding as ip-prefixes and consume forwarding resources.
>    - More importantly, debugging. Looking at a SAFI-128 route, it is
>    difficult to tell immediately whether it is a service-prefix or a CsC
>    transport-prefix. If we use SAFI-76, it gives better visibility to humans
>    and tools managing the network.
>
>  Gyan> Understood
>
> Overall, with this draft I do see the benefit for Opt C, where it helps
> with BGP-LU carrying imported loopbacks between domains in the global
> table, but I don’t see a benefit for any other inter-AS solution that
> exists today.
>
>
>
> Thank you for the note on Option-C. I have tried to clarify above the
> usefulness in the other use-cases.
>
>
>
> With SRv6 (LPM on an IPv6 data plane, no BGP-LU) and inter-domain SR-TE
> binding SID policy, I don’t see any benefit in using SAFI 76.
>
>
>
> With SR-MPLS, which can use the legacy inter-AS options, you would have
> the same Opt C benefit, but I think most customers using SR-MPLS would
> prefer SR-TE binding SID policy over the MPLS-based inter-AS options.
>
>
>
> IMO, SRTE will have the same problem as one of our previous proposals
> (BGP-LCU). Because it carries Color in the NLRI, co-ordinating between
> different domains will be difficult. It is not as easy as rewriting RT.
>
> Gyan> Understood
>
> Thanks for the engaging discussion.
>
> Gyan> Welcome!
>
> Kaliraj
>
-- 

*Gyan Mishra*

*Network Solutions Architect*



*M 301 502-1347*
*13101 Columbia Pike*, Silver Spring, MD