Re: [Idr] Regd. https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/

Igor Malyushkin <gmalyushkin@gmail.com> Thu, 17 August 2023 18:16 UTC

MIME-Version: 1.0
References: <40ad79902852443d8783a322dffbab8a@huawei.com> <CH2PR11MB4312EC318A3E8C1667C784ADD431A@CH2PR11MB4312.namprd11.prod.outlook.com> <BY5PR11MB43055C64B2497F586ACB64BED401A@BY5PR11MB4305.namprd11.prod.outlook.com> <CAOj+MMFP+u6UGpTAyvn7KhRww00mmd-iGmHxBnFg9OeGNF-X7Q@mail.gmail.com> <CAEfhRrx5oNeW2z4V9pDqs9nSgFzFH6oiK1CCEOf+FQj_DuimsQ@mail.gmail.com> <CAOj+MMEiVMxR3JKwXdT7=6ozmmZYJR95iQfqGOHU1Vm5XzXi7w@mail.gmail.com> <CAEfhRrxKfN+8bZnxSND4zo=8h_Y=q+rWsM6BEZf9FC3BcJe2Xw@mail.gmail.com> <BY5PR11MB4305392CAD13D631EC3C9CD5D411A@BY5PR11MB4305.namprd11.prod.outlook.com> <CAEfhRrwzP=uY5uhw5K8tSqprPn_4z6n_00CNqTQtyRHZK61sGQ@mail.gmail.com> <BY5PR11MB4305B6A5597F1724FCBF1149D414A@BY5PR11MB4305.namprd11.prod.outlook.com> <CAEfhRrxMGbv-wDbusLEXA9UjGaFRLL-YSGpXRtVP8miG98bqAQ@mail.gmail.com> <BY5PR11MB4305FC48F79D09A2996B6780D41AA@BY5PR11MB4305.namprd11.prod.outlook.com>
In-Reply-To: <BY5PR11MB4305FC48F79D09A2996B6780D41AA@BY5PR11MB4305.namprd11.prod.outlook.com>
From: Igor Malyushkin <gmalyushkin@gmail.com>
Date: Thu, 17 Aug 2023 22:15:22 +0400
Message-ID: <CAEfhRryHwBZe=5-NwjW9WBWsxqSNP7R_zMCiQH_pxXVstLhODQ@mail.gmail.com>
To: "Satya Mohanty (satyamoh)" <satyamoh@cisco.com>
Cc: Robert Raszuk <robert@raszuk.net>, "idr@ietf.org" <idr@ietf.org>, "RAMADENU, PRAVEEN" <pr9637@att.com>, "MEANS, ISRAEL L" <im8327@att.com>
Content-Type: multipart/alternative; boundary="00000000000000450d0603226609"
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/Y0jgnLROWQ3PtX1Ty2wh2lUicyU>
Subject: Re: [Idr] Regd. https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Aug 2023 18:16:35 -0000

Hi Satya,

Thanks for your response. Please see my comments inline.

On Thu, 17 Aug 2023 at 20:49, Satya Mohanty (satyamoh) <satyamoh@cisco.com> wrote:

> Hello Igor,
>
>
>
> Thanks for your email. Please see my comments inline [Satya].
>
>
>
> Thanks,
>
> --Satya
>
>
>
> *From: *Igor Malyushkin <gmalyushkin@gmail.com>
> *Date: *Tuesday, August 15, 2023 at 2:48 AM
> *To: *Satya Mohanty (satyamoh) <satyamoh@cisco.com>
> *Cc: *Robert Raszuk <robert@raszuk.net>, idr@ietf.org <idr@ietf.org>,
> RAMADENU, PRAVEEN <pr9637@att.com>
> *Subject: *Re: [Idr] Regd.
> https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
>
> Hi Satya,
>
> Thanks for your mail. My comments are inline, but I want to express one
> general point. To me, this solution's primary goal is too restrictive: "This
> draft utilizes the concept of a secondary label to solve few cases in L3VPN
> Deployments". Also, both examples (PD#1 and PD#2) pose more questions
> than they answer. The problems that these examples raise can be
> solved by altering the design.
>
> [Satya] The draft is addressing a problem that we came across in
> deployments. Respectfully disagree that it “poses more questions”. We have
> clearly mentioned and explained the cases where the problem
> statement/solution is applicable and where we cannot make idealized
> assumptions. You said earlier that PD#2 solution is a good way to do for
> BGP LU. PD#1 is an issue which is recognized by other vendors in a similar
> setting.
>
[IM2] Well, this whole thread stems from ambiguities in the draft's examples.
Can you explain, for example, why in PD#1 there is control-plane churn
after an RR receives a route from another RR? As I understand the
per-next-hop mode, it should not exhibit this issue. The draft mentions a
per-context mode, but if a label is allocated for a context rather than a
next-hop, it is unclear why there is any churn after a new NHLFE is
added to this context. Why does the RR allocate a new label again? We still
have the original best path in the context.
As for PD#2, from my POV this example would be clearer if you
removed ISP2 altogether. At the moment, I still don't understand why we
can't send traffic locally to an external peer instead of an internal backup.


>
> As we can see, these *few cases in L3VPN deployments* are linked to PIC's
> mechanics. If this draft were more general and tried to do with PIC itself
> for a broader scope of families, it would be much better from
> my perspective. This is where all the discussion below would be unnecessary.
>
> [Satya] We have started with L3VPN, but similar concept can be applicable
> to EVPN (type 2) and other VPN families. Why not?  But this is TBD.
>
> Also, PIC is a broad topic. We are not aiming to fix every issue there.
> (Some deficiencies in PIC may not even require IETF standardization). We
> can write a line to that effect also.
>
[IM2] And I have never asked you to solve all of them :) All I said is that
there is a specific problem with PIC that can be addressed for several
families with a terminating label concept. I'm a little concerned about your
"TBD": should we expect the same approach to be specified for every family
in a dedicated doc?

>
>
>
>
> On Tue, 15 Aug 2023 at 06:28, Satya Mohanty (satyamoh) <satyamoh@cisco.com> wrote:
>
> Hi Igor,
>
>
>
> Thanks for your mail.
>
> Replying to your observations on PD#1 as mentioned yesterday.
>
>
>
> Inline [Satya].
>
>
>
> At some point, we will summarize the discussions at one place, so as not
> to lose content.
>
> Emails on this split content at times.
>
> I also had quite a bit of unicast email exchange with Robert to clarify
> things.
>
>
>
> Thanks,
>
> --Satya
>
>
>
> *From: *Igor Malyushkin <gmalyushkin@gmail.com>
> *Date: *Sunday, August 13, 2023 at 9:34 AM
> *To: *Satya Mohanty (satyamoh) <satyamoh@cisco.com>
> *Cc: *Robert Raszuk <robert@raszuk.net>, idr@ietf.org <idr@ietf.org>,
> RAMADENU, PRAVEEN <pr9637@att.com>
> *Subject: *Re: [Idr] Regd.
> https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
>
> Hello Satya,
>
> From my understanding, in this solution, the number of next-hop and label
> pairs doubles at every ABR/ASBR. What if any PE allocates labels in a
> per-prefix fashion? We will spend twice as many resources at any
> intermediate NHS node.
>
> [Satya] Indeed, the per-nexthop-received-label mode is useless if the
> sourcing router is using per-prefix label. Let us be clear on that. That is
> not the target use-case. The use-case here is when the source is using
> either per-vrf or per-ce label allocation scheme in the L3VPN.
>
> [IM] Good, thank you for confirming. My point is that the draft should be
> clear on that. With a per-next-hop allocation mode at an intermediate
> point and a per-prefix mode at the originating router (say, by mistake),
> there is no multiplication of resource utilization at the former: it is the
> same as at the latter. It would be good to see a warning in the text that
> resources are doubled because an ASBR/ABR allocates an
> additional label per prefix in this case.
>
> [Satya] Wish to point out here that extra label allocation will be there
> only when the net has two paths with one of them being backup. Otherwise,
>  extra resources (labels) are not allocated. We can insert a sentence to
> that effect.
>
[IM2] Yes, but in the case of a single path, none of the problems
addressed by your solution arise. Or am I missing something?
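
To make the resource discussion above concrete, here is a rough sketch of how local label counts at an NHS node (ABR/ASBR/inline RR) compare across allocation modes. The function name, the mode strings, and the example numbers are illustrative assumptions and are not taken from the draft. The model simply assumes one local label per distinct (next-hop, received label) pair, and one extra label per primary label when a backup path exists.

```python
def local_labels(routes, mode, with_secondary=False):
    """Count local labels an NHS node would allocate (simplified model).

    routes: iterable of (prefix, next_hop, received_label) tuples.
    mode: "per-prefix" or "per-nexthop-received-label".
    with_secondary: assume one extra label per primary label
    (only meaningful when a backup path exists).
    """
    routes = list(routes)
    if mode == "per-prefix":
        primary = len({prefix for prefix, _, _ in routes})
    elif mode == "per-nexthop-received-label":
        primary = len({(nh, label) for _, nh, label in routes})
    else:
        raise ValueError(f"unknown mode: {mode}")
    return primary * 2 if with_secondary else primary


# Originator uses a per-VRF label: 1000 prefixes share one (next-hop, label) pair.
per_vrf = [(f"10.0.{i // 256}.{i % 256}/32", "PE1", 100) for i in range(1000)]
# Originator uses per-prefix labels: every prefix arrives with its own label.
per_prefix = [(p, nh, 1000 + i) for i, (p, nh, _) in enumerate(per_vrf)]

print(local_labels(per_vrf, "per-nexthop-received-label"))           # 1
print(local_labels(per_vrf, "per-nexthop-received-label", True))     # 2
print(local_labels(per_prefix, "per-nexthop-received-label"))        # 1000
print(local_labels(per_prefix, "per-nexthop-received-label", True))  # 2000
```

Under these assumptions the doubling is negligible with per-vrf or per-ce labels at the originator, but it scales with the prefix count when the originator allocates per prefix.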

> Also, I have a question w.r.t the optional transitive attribute. Don't we
> have the same problem as we have with the entropy label attribute here?
> What if we have a pair of ASBR/ABR that does not support this solution,
> make NHS, and propagate routes with this attribute? If we have any PE
> underneath, supporting this solution and doing PIC, in case of failure of
> one of these ASBR/ABR, will the traffic be blackholed at another with an
> unknown secondary label?
>
> [Satya] This is an issue with optional transitive attributes in general.
> We will need to limit propagation via filtering wherever applicable (likely
> using attribute discard semantics based on attribute code-point) or by
> future protocol feature scoping etc. (Jeff has given a very good
> description in his attribute escape draft).
>
> [IM] Here too, can we see some considerations on this point in the text?
>
> [Satya] Ack, we can do that.
>
[IM2] Thank you!
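
As a rough illustration of the "attribute discard based on code-point" idea mentioned above, the sketch below drops one path attribute type at a domain boundary. The flag bit values follow the BGP path attribute flags octet (Optional 0x80, Transitive 0x40, Partial 0x20). The type code 255 is only a placeholder chosen for this example, not a value assigned to the secondary label attribute, and the function names are hypothetical.

```python
OPTIONAL, TRANSITIVE, PARTIAL = 0x80, 0x40, 0x20
SECONDARY_LABEL_ATTR = 255  # placeholder type code, NOT an assigned value


def is_optional_transitive(flags):
    """True if the attribute would be propagated by speakers that do not recognize it."""
    return bool(flags & OPTIONAL) and bool(flags & TRANSITIVE)


def discard_attribute(path_attrs, type_code):
    """Return the path attributes with every instance of type_code removed."""
    return [(flags, code, value) for flags, code, value in path_attrs
            if code != type_code]


attrs = [
    (TRANSITIVE, 2, b"as-path"),  # AS_PATH, well-known (value is a dummy)
    (OPTIONAL | TRANSITIVE | PARTIAL, SECONDARY_LABEL_ATTR, b"\x00\x06\x41"),
]
filtered = discard_attribute(attrs, SECONDARY_LABEL_ATTR)
print([code for _, code, _ in filtered])  # [2]
```

The Partial bit on the second attribute models the case where an earlier speaker passed it along without recognizing it, which is exactly the situation where a downstream PE could end up trusting a label that the advertising ASBR never installed.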

>
>
> For PD#1, we first tried to solve the issue with per-prefix label
> allocation for VPN prefixes, then turned to the per-next-hop mode and got
> another issue. To cope with this, the draft proposes allocating some
> additional labels again. This is of course less demanding in terms of
> scale, but it is extra allocation nevertheless. An LSP hierarchy solves
> this problem better than a flat structure. In your example with different
> next-hops, I don't see a good reason not to have connectivity among all PEs
> and RRs. In this case, independent of the number of next-hops, the problem
> is solved.
> For Option B, the mentioned draft also proposes using an LSP hierarchy,
> which solves the issue too. As I understand, almost all machinery is already
> defined and standardized for that purpose.
>
> [Satya] As mentioned earlier, the different next-hops is an issue. If a
> service route has two different next-hops (and that too we do not know
> before-hand) at the RR1 and RR2 we can’t really do PIC. We are advocating a
> solution which is least restrictive with respect to assumptions.
>
> [IM] Sorry, I can't see any restrictions here. You are probably describing
> an anycast case. I agree that an RR lacks PIC for a service prefix here
> because it does not do NHS for it. But the requirement for having PIC at
> this point generally stems from the inability to propagate an NH failure
> between domains/areas/levels/etc. This is not the case when we signal LSPs
> to all or several next-hops in an end-to-end manner. So, here we can do PIC
> at ingress. This is just a matter of propagation of a single BGP route.
>
> [Satya] No, I am not talking about the anycast case. When next-hops are
> different, you cannot chain the service prefix to a single next-hop. I
> think that is clear. So, we cannot do PIC at the ingress. Also, in option-B
> as we have today and deployed for 15+ yrs., we never signal LSPs (to source
> next-hops) downstream.
>
[IM2] My point is that you don't have to chain the service prefixes. With an
LSP hierarchy it is not required at all, and the prefixes are propagated
E2E without NHS. So there is nowhere that different next-hops for a service
prefix can arise, except in the anycast case.

>
>
> I read https://datatracker.ietf.org/doc/draft-zzhang-bess-vpn-option-bc/
> that you referred earlier.
>
> It is well-written from the perspective of aiming at label conservation
> but is not addressing the label oscillation problem per-se.
>
>  [IM] Yes, this draft does not address PIC and the related problems. But I
> believe this is a point where it can be improved.
>
>
>
> Section 1.2.1 is a non-starter. We can’t have multiple labels in the NLRI
> in the L3VPN.  Not at this day.
>
>  [IM] I can't see why not. An ASBR allocates a new label to stitch it to a
> transport LSP towards an egress PE's NH. How many labels are in NLRI of
> L3VPN is not important because the ASBR does not manipulate these labels,
> they have to be added to a stack already by an ingress PE and stay
> unchanged till the very end. From my understanding, the ASBR's LFIB is
> unaware of any service labels at all.
>
> [Satya] Let me explain. The issue I am pointing out here is solely in the
> Control Plane. In VPNv4 and VPNv6 [RFC 4364] routes, today only one label
> goes with the NLRI. What is being proposed is to have another label
> accommodated in the NLRI. And I am not even going into EVPN (type-2 for
> instance)  ☹.
>
>
>
> To accommodate the second label in the encoding of the NLRI, every
> ASBR/RR/PE in deployment needs to be upgraded. I doubt this will happen.
> That is the reason I mentioned this approach seems to be a non-starter.
>
[IM2] If you are talking from an implementation perspective, I agree, but
with your solution, we also have to upgrade some routers in the network.
From the perspective of protocol machinery, VPN routes are based on the
logic of labeled unicast ones, and the latter already support multiple
labels. But yes, this feature is not widely deployed.
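
For readers unfamiliar with how more than one label can be carried in a labeled NLRI, here is a minimal sketch of the label portion only. It assumes the RFC 8277 field layout (a 20-bit label value, 3 reserved bits, and a 1-bit bottom-of-stack flag per 3-octet field, with the flag set only on the last label). The function name is hypothetical, and the length octet, RD, and prefix that complete a real NLRI are deliberately left out.

```python
def encode_label_stack(labels):
    """Encode labels as consecutive 3-octet fields; S bit set on the last one."""
    if not labels:
        raise ValueError("at least one label is required")
    out = bytearray()
    for i, label in enumerate(labels):
        if not 0 <= label < 2**20:
            raise ValueError(f"label out of range: {label}")
        bottom = 1 if i == len(labels) - 1 else 0
        field = (label << 4) | bottom  # 20-bit label, 3 reserved bits, 1-bit S
        out += field.to_bytes(3, "big")
    return bytes(out)


print(encode_label_stack([100]).hex())       # 000641
print(encode_label_stack([100, 200]).hex())  # 000640000c81
```

A receiver that only ever expects one label parses the first three octets and treats what follows as the RD or prefix, which is why carrying a second label this way requires the capability to be supported and negotiated on both sides.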

> Perhaps TEA approach is more feasible ?
>
> But I do see a big problem if one somehow fits in this IAS BC solution to
> the use-case we have here.
>
> The issue is that the draft keeps the *service label invariant and so it
> cannot achieve PIC*.
>
>  [IM] Well, the draft is silent about PIC, yes. But if my understanding of
> it is correct (see above), I don't think PIC is impossible.
>
> [Satya] In its current form I don’t see how one can do PIC. There is
> discussion on this in BESS itself.
>

>
> Besides, it does do extra label allocation for the transport end-point
> each time the next-hop changes. So,  there is also some extra label
> allocation.
>
> And it does require upgrades at each ASBR and source/sink PEs as well as
> the RRs :)
>
> There are other things like Label spoofing etc. but not relevant to this
> discussion.
>
>  [IM] Yes, these are its weak sides. Almost every new solution has its own
> :)
>
>
>
> Best,
>
> Satya
>
>
>
> Best,
>
> Satya
>
>
>
>
>
> For PD#2, I also have some questions. Please see inline.
>
>
>
>
>
> On Sun, 13 Aug 2023 at 18:39, Satya Mohanty (satyamoh) <satyamoh@cisco.com> wrote:
>
> Hi Robert and Igor,
>
>
>
> 1. The RRs are non-clients to each other. It is the PEs who are the RR
> clients. We have that in reverse in the draft. Thanks for pointing that out.
>
> We had this noted down before submission but unfortunately forgot, both
> during the draft submission and in the presentation.
>
> We will amend this in the next version and substitute “client” with
> “non-client” in the following text.
>
>
>
> “Both these RRs are also clients of each other and advertise VPN routes to
> each other with the next-hop set to the peering address.”
>
>
>
> Irrespective, the RR client/non-client discussion or an option-B IAS (in
> which case none of RFC4456, RR clients/non-clients/ cluster-id etc. apply)
> should not detract from the main topic.  BTW, a topology like Fig.1 (which
> is greatly simplified) is in production for more than 2 years now without
> any RR related issues.
>
>
>
> 2. Igor, we did consider about a year back one of your suggestions i.e.,
> keep VPN next-hops unchanged, leak the next-hop in the BGP LU and do the
> PIC in the BGP LU route (the Nexthop for the VPN). We had it verified in
> the lab too.
>
>
>
> However, there is one big issue. If the next-hop of the VPN route is *not
> the same*, this scheme fails. In the figure below, VPN route V is received
> at RR1 with next-hop PE1 and at RR2 with next-hop PE1’. Since the next-hops
> themselves are different (there are good reasons why they are different, but
> I cannot go into them here), we can’t do as you suggest.  Also, as you have
> mentioned, the solution with LU does not work in the case of option B.
>
>
>
> PE1          PE1’
>  |            |   V
>  |            |
> RR1 -------- RR2
>   \          /
>    \        /
>     \      /
>       PE2
>
>
>
> I will investigate the draft that you mentioned and get back. Thanks for
> the reference.
>
>
>
> Regarding PD#2, I will try to explain the issue with respect to a
> particular VPN prefix with regards to Figure 2 in the draft. Let’s say we
> are doing vanilla PIC.
>
> 1.  Local label at PE1 has primary path with next-hop ISP1 and backup
> PE2. Say this label is 100. At PE1, we cannot have the backup to ISP2
> because of the given objective constraint that traffic should be able to
> still reach ISP1 so long as there is a path from one of the PEs to ISP1. If
> we choose the backup as ISP2, and PE2-ISP1 was intact, then we would have
> defeated our purpose if we forwarded to ISP2 directly since the forwarding
> path PE1—PE2—ISP1 exists.
>
> [IM] My reading of the part "objective constraint that traffic should be
> able to still reach ISP1 so long as there is a path from one of the PEs to
> ISP1" raises a question. How can we know that there is such a path at all?
> We can't differentiate a node failure from a link failure. So, when the
> link at PE1 towards ISP1 fails, the draft assumes that it is a link
> failure and reroutes traffic to PE2. This works for a link failure but does
> not work for a node failure. According to this draft, traffic will be
> dropped at PE2 instead of being locally rerouted at PE1 to ISP2.
>
>
>
> 2.  Local label at PE2 has primary path with next-hop ISP1 and backup PE1.
> Say this label is 200. We cannot have the backup to ISP2 because of the
> same constraint that I mentioned in (1) above.
>
>
>
> Suppose traffic from PE0 ingresses at PE1 with label 100. If the PE1-ISP1
> link breaks, with vanilla PIC, traffic will be diverted to PE2 with the
> label swapped to 200. If PE2 then finds that PE2-ISP1 is broken, it will
> send it back to PE1 after swapping the label to 100, and the micro-loop
> persists until BGP convergence.
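
The micro-loop described above can be reproduced with a toy trace. This is only a sketch under simplifying assumptions: each PE uses its ISP1 link as primary and the other PE as backup, the labels 100 and 200 follow the example in the text, and the function name and the `max_hops` limit (standing in for "until BGP converges") are invented for illustration.

```python
def forward(ingress_pe, links_up, max_hops=8):
    """Trace a packet between PE1 and PE2 under 'vanilla' PIC.

    links_up maps a PE name to whether its link to ISP1 is up.
    Returns (trace, outcome), where trace lists (node, local_label) hops.
    """
    peer = {"PE1": "PE2", "PE2": "PE1"}
    local_label = {"PE1": 100, "PE2": 200}
    node, trace = ingress_pe, []
    for _ in range(max_hops):
        trace.append((node, local_label[node]))
        if links_up[node]:
            return trace, "delivered to ISP1"
        node = peer[node]  # backup path: swap to the peer's label and forward
    return trace, "micro-loop (hop limit reached)"


# Single failure: PE1 diverts to PE2, which delivers to ISP1.
print(forward("PE1", {"PE1": False, "PE2": True}))
# Double failure (FS#2): the packet bounces between PE1 and PE2.
print(forward("PE1", {"PE1": False, "PE2": False}))
```

In the double-failure case neither PE has a way to tell that the packet has already been repaired once, which is the information a secondary label is meant to convey.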
>
>
>
> I think it may be easier to describe this in more detail in the next
> version, so that similar questions do not crop up.
>
>
>
> Best Regards,
>
> --Satya
>
>
>
>
>
>
>
> *From: *Igor Malyushkin <gmalyushkin@gmail.com>
> *Date: *Saturday, August 12, 2023 at 6:10 AM
> *To: *Robert Raszuk <robert@raszuk.net>
> *Cc: *Satya Mohanty (satyamoh) <satyamoh@cisco.com>, idr@ietf.org <
> idr@ietf.org>, RAMADENU, PRAVEEN <pr9637@att.com>
> *Subject: *Re: [Idr] Regd.
> https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
>
> Hi Robert,
>
> Well, maybe RFC4456 indeed requires some clarification. From
> my experience, inline RRs are not the same as regular ones. Yes, they use
> the same mechanics but solve other tasks, and because they are LSRs for BGP
> LSPs or VPN LSPs some tricks with CLUSTER_IDs, peering, and label
> allocation modes are required here.
>
> I agree that the solution from PD#1 is a bad idea to solve the scaling
> issue. I don't think that there should be a new solution with another
> layer of labels and a new path attribute when BGP LU has been here for ages
> and solves this problem better.
>
>
>
> With Option B I would like to see which of the approaches is better (this
> one or B/C).
>
>
>
> On Sat, 12 Aug 2023 at 16:54, Robert Raszuk <robert@raszuk.net> wrote:
>
> Hi Igor,
>
>
>
> > Using different CLUSTER_IDs for inline RRs at the same hierarchy level
> > is common
>
>
>
> Even if you do set up different CLUSTER_IDs it should be fine ... as the
> other RR should not accept an UPDATE MSG when it sees its own CLUSTER_ID
> in the incoming update.
>
>
>
> Remember CLUSTER_ID should get prepended upon reflection not overwritten.
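
The CLUSTER_LIST behaviour referred to above can be sketched in a few lines. This is a simplified model of the RFC 4456 rule (prepend the local CLUSTER_ID on reflection, ignore a route whose CLUSTER_LIST already contains it). It folds the receive-side check and the reflect-side prepend into one hypothetical function, and the CLUSTER_ID values are made up.

```python
def reflect(cluster_list, own_cluster_id):
    """Return the CLUSTER_LIST to advertise, or None if the route must be ignored."""
    if own_cluster_id in cluster_list:
        return None  # own CLUSTER_ID seen: the route has looped back
    return [own_cluster_id] + list(cluster_list)  # prepend, never overwrite


rr1, rr2 = "1.1.1.1", "2.2.2.2"
via_rr1 = reflect([], rr1)       # ['1.1.1.1']
via_rr2 = reflect(via_rr1, rr2)  # ['2.2.2.2', '1.1.1.1']
print(via_rr2)
print(reflect(via_rr2, rr1))     # None: RR1 ignores its own reflected route
```

With distinct CLUSTER_IDs the route still makes one pass through the other RR before being ignored on its return; with a shared CLUSTER_ID, `reflect(via_rr1, rr1)` returns `None` immediately.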
>
>
>
> Label allocation has nothing to do with the loop. It is a broken reflection
> configuration that causes the described loops.
>
>
>
> Yes, between clusters you can set up non-client IBGP to fully mesh clusters,
> but within a cluster it is rather a poor idea to make RRs clients of each
> other.
>
>
>
> So PD#1 is simply a misconfiguration IMHO.
>
>
>
> If you think otherwise please update RFC4456 first. Only then we could
> consider solutions to the problem caused by such update.
>
>
>
> Regards,
>
> Robert
>
>
>
>
>
> On Sat, Aug 12, 2023 at 2:39 PM Igor Malyushkin <gmalyushkin@gmail.com>
> wrote:
>
> Hello, Robert, Satya,
>
> Using different CLUSTER_IDs for inline RRs at the same hierarchy level is
> common. Especially when there is a labeled unicast underneath. Although, I
> don't understand why two RRs should be clients to each other instead of
> regular non-client peers.
>
>
> For PD#1, it is possible to signal LU addresses of PE1, PE2, and both RRs
> and use them as NHs for VPN prefixes. In this case for labeled unicast
> prefixes a per-prefix label allocation mode completely solves the problem.
> For VPN sessions RRs do not apply next-hop-self but act as classical RRs
> (or even can be unaware of any VPN sessions at all). Classical seamless
> MPLS approach. With the different CLUSTER_IDs, PIC between the RRs can be
> maintained also.
>
>
> If we talk about Option B, the solution with LU obviously does not work,
> but there are several approaches to cope with scaling problems: Option A/B
> and Option B/C (draft-zzhang-bess-vpn-option-bc-00). The latter is a new
> draft that uses a two-label approach but does not require new path
> attributes.
>
> For PD#2, here I agree with Robert that it is strange to use internal BGP
> paths instead of external ones for PIC in that case. What if the ISP1 box
> goes down? All the traffic will go to the ISP2 box from both PEs anyway.
> Isn't it wiser not to use internal BGP paths for a link failure? Actually,
> we don't even differentiate a link failure from a node failure. But we
> are trying to apply different FRR techniques there.
>
> [Satya] Well, we do use internal paths in the best-external case. In case
> of box failure that you mention, if we can infer that, sure, there can be
> an optimization to directly send to ISP2.
> Also, for a possible loop, does not NFRR from the MNA framework solve this
> issue at the transport level?
> [Satya] Will look that up.
> My 2 cents.
>
>
>
>
>
> On Sat, 12 Aug 2023 at 15:45, Robert Raszuk <robert@raszuk.net> wrote:
>
> Satya,
>
>
>
> *Reg PD#1: *
>
>
>
> The problem described as PD#1 arises from a violation of RFC4456 rules.
> When your RRs are part of the same cluster (and here they clearly are) it is
> mandatory to use the same CLUSTER_ID on both route reflectors. That will
> prevent any reflected routes from being accepted by the other RR client.
>
>
>
>    Both these RRs are also clients of each other and advertise VPN routes to each other with the
>
>    next-hop set to the peering address.
>
>
>
> Please do not invent a bandage to heal wounds which should not be self
> made in the first place. PD#1 as described is a misconfiguration.
>
>
>
> *Reg PD#2:*
>
>
>
> You say:
>
>
>
> >  Failure scenario 2 (FS#2) The links from ISP1 to PE1 and PE2 are down
>
> >  at the same time;
>
>
>
> If those two links go down at the same time both PEs should notice it
> (optics or BFD) and apply PIC accordingly. PIC on PE1 should result in
> shifting traffic to ISP2. So should PIC action on PE2.
>
> [Satya] *PE1 cannot know that PE2-ISP1 link is also down, right*? If
> PE2-ISP1 is not down, then for the traffic to reach ISP1, the correct
> forwarding path is from PE1 to PE2 and then to ISP1. It should not send to
> ISP2, as I mentioned in the constraint earlier.
>
>
>
> As with PIC the FIB rewrite is prefix independent so no loop should form.
>
>
>
> As you said both ISPs advertise identical set of routes: "Both ISPs
> advertise the same 700k prefixes/"
>
>
>
> Only in a situation when you would apply eiBGP multipath there could be
> some micro-loop.
>
>
>
> PIC should be smart and ignore IBGP paths (if their local pref is
> preferred in steady state) if local EBGP paths exist to heal the data plane
> during the fast repair. Then BGP will converge to the policy
> aligned selection of exit.
>
> [Satya] As I mentioned this is PIC with an additional constraint.
>
>
>
> Kind regards,
>
> Robert
>
>
>
>
>
> On Thu, Jul 27, 2023 at 9:36 AM Satya Mohanty (satyamoh) <satyamoh=
> 40cisco.com@dmarc.ietf.org> wrote:
>
> Hi Keyur and the chairs,
>
>
>
> Towards the end of my IETF presentation, the audio was coming garbled at
> my end and not at all coherent.
>
> I went over the recording today. I am replying to the two
> questions/observations.
>
>
>
> 1)  The suggestion was to use another label mode, i.e., per-prefix
> (per-vrf does not apply here).  However, using per-prefix label allocation
> would result in the inline RRs/ASBRs exhausting their label threshold
> (a platform-dependent upper limit) very quickly as the route scale
> increases. Therefore, per-prefix label allocation was
> ruled out in this deployment after being given due consideration.
>
>
>
> Cisco IOS-XR has supported the per-nexthop-recvd-label mode for some time
> now in Option-B ASBR and RR with nh-self use-cases, precisely for this
> reason. I believe other vendors have an equivalent mode. The idea is to take
> advantage of the optimal label allocation by this mode and simultaneously
> ensure fast convergence via BGP PIC.
>
>
>
> 2) Regarding the suggestion of not using the proposed attribute, the
> original thought was to use tunnel-encaps attribute. The problem that I saw
> is that the tunnel-encaps can have many sub-tlvs for different purposes,
> and if we wanted to restrict the advertisement of the secondary label to
> routers that do not need it, it will not be that easy as those same routers
> may need some other TLVs present in that same tunnel-encaps attribute. But,
> we do look forward to getting your inputs/suggestions on this as you
> indicated.
>
>
>
> Thanks.
>
>
>
> Best Regards,
>
> --Satya
>
>
>
>
>
>
>
> *From: *Idr <idr-bounces@ietf.org> on behalf of Satya Mohanty (satyamoh)
> <satyamoh=40cisco.com@dmarc.ietf.org>
> *Date: *Tuesday, July 11, 2023 at 9:44 PM
> *To: *Dongjie (Jimmy) <jie.dong=40huawei.com@dmarc.ietf.org>, idr@ietf.org
> <idr@ietf.org>, MEANS, ISRAEL L <im8327@att.com>, RAMADENU, PRAVEEN <
> pr9637@att.com>
> *Cc: *idr-chairs@ietf.org <idr-chairs@ietf.org>
> *Subject: *Re: [Idr] Call for IETF 117 IDR agenda items
>
> Hi Jie,
>
>
>
> We would like to request a slot of 10 minutes to present the following
> draft. Tuesday slot is preferable.
>
> https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
>
>
>
> Thanks,
>
> --Satya
>
>
>
> *From: *Idr <idr-bounces@ietf.org> on behalf of Dongjie (Jimmy) <jie.dong=
> 40huawei.com@dmarc.ietf.org>
> *Date: *Tuesday, June 27, 2023 at 3:57 PM
> *To: *idr@ietf.org <idr@ietf.org>
> *Cc: *idr-chairs@ietf.org <idr-chairs@ietf.org>
> *Subject: *[Idr] Call for IETF 117 IDR agenda items
>
> Dear all,
>
>
>
> The draft agenda of IETF 117 is available at
> https://datatracker.ietf.org/meeting/117/agenda. The IDR sessions are
> scheduled as below:
>
>
>
> - Monday Session II  13:00 - 15:00 (local time)  Plaza B
>
>
>
> - Thursday Session IV 17:00 – 18:00 (local time)  Continental 4
>
>
>
> Please start to send any IDR agenda item request to me and CC the chairs (
> idr-chairs@ietf.org). Please include the name of the person who will be
> presenting, and the estimate time you'll need (including Q/A).
>
>
>
> If you plan to make a presentation, please keep in mind the IDR tradition,
> "no Internet Draft - no time slot". You should also plan to send your
> slides to me and CC the chairs no later than 24 hours prior to the IDR
> session, though earlier is better. Please number your slides for the
> benefit of remote attendees. By default your slides will be converted to
> PDF and presented from the PDF.
>
>
>
> Potential presenters may want to take a look at the checklist for
> presenting at IDR:
>
>
>
>
> https://trac.tools.ietf.org/wg/idr/trac/wiki/Checklist%20for%20presenting%20at%20an%20IDR%20meeting
>
>
>
> Best regards,
>
> Jie
>
> _______________________________________________
> Idr mailing list
> Idr@ietf.org
> https://www.ietf.org/mailman/listinfo/idr
>
>