Re: [Idr] Regd. https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/

Igor Malyushkin <gmalyushkin@gmail.com> Sun, 13 August 2023 16:34 UTC

From: Igor Malyushkin <gmalyushkin@gmail.com>
Date: Sun, 13 Aug 2023 20:33:47 +0400
Message-ID: <CAEfhRrwzP=uY5uhw5K8tSqprPn_4z6n_00CNqTQtyRHZK61sGQ@mail.gmail.com>
To: "Satya Mohanty (satyamoh)" <satyamoh@cisco.com>
Cc: Robert Raszuk <robert@raszuk.net>, "idr@ietf.org" <idr@ietf.org>, "RAMADENU, PRAVEEN" <pr9637@att.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/ArJcli0SBcOK_nBZr6X8uOiftE4>

Hello Satya,

From my understanding, in this solution the number of next-hop and label
pairs doubles at every ABR/ASBR. What if a PE allocates labels in a
per-prefix fashion? We would then spend twice the resources at every
intermediate NHS node. I also have a question regarding the optional
transitive attribute. Don't we have the same problem here as with the
entropy label attribute? What if a pair of ASBRs/ABRs that do not support
this solution set next-hop-self and propagate routes with this attribute?
If there is a PE underneath that supports this solution and does PIC, then
on failure of one of these ASBRs/ABRs, will traffic be blackholed at the
other one because of an unknown secondary label?
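To put rough numbers on the scaling concern, here is a back-of-envelope sketch in Python. The 700k figure is taken from later in this thread; the doubling factor is the primary-plus-secondary label pair. All of this is illustrative, not taken from the draft:

```python
# Illustrative label-state arithmetic for an NHS (next-hop-self) node.
# Numbers are hypothetical; the draft gives no concrete figures.

def label_state(prefixes: int, labels_per_route: int) -> int:
    """Local label entries an NHS node must hold for `prefixes` routes."""
    return prefixes * labels_per_route

vpn_prefixes = 700_000                         # "700k prefixes" quoted in this thread
primary_only = label_state(vpn_prefixes, 1)    # plain per-prefix allocation
with_secondary = label_state(vpn_prefixes, 2)  # primary + secondary label pair

print(primary_only, with_secondary)  # 700000 1400000
```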

For PD#1: at first we tried to solve the issue with per-prefix label
allocation for VPN prefixes, then switched to the per-next-hop mode and ran
into another issue. To cope with this, the draft proposes allocating yet
more labels. That is less demanding on scale, but still. An LSP hierarchy
solves this problem better than a flat structure. In your example with
different next-hops, I see no good reason not to have connectivity among
all PEs and RRs; in that case the problem is solved regardless of the
number of next-hops.
For Option B, the draft I mentioned also proposes using an LSP hierarchy,
which solves the issue too. As I understand it, almost all of the machinery
is already defined and standardized for that purpose.

For PD#2, I also have some questions. Please see inline.



вс, 13 авг. 2023 г. в 18:39, Satya Mohanty (satyamoh) <satyamoh@cisco.com>:

> Hi Robert and Igor,
>
>
>
> 1. The RRs are non-clients to each other. It is the PEs who are the RR
> clients. We have that in reverse in the draft. Thanks for pointing that out.
>
> We had this noted down before submission but unfortunately forgot, both
> during the draft submission and in the presentation.
>
> We will amend this in the next version and substitute “client” with
> “non-client” in the following text.
>
>
>
> “Both these RRs are also clients of each other and advertise VPN routes to
> each other with the next-hop set to the peering address.”
>
>
>
> Irrespective of that, the RR client/non-client discussion or an Option-B IAS (in
> which case none of RFC4456, RR clients/non-clients, cluster-id, etc. apply)
> should not detract from the main topic.  BTW, a topology like Fig. 1 (which
> is greatly simplified) has been in production for more than 2 years now without
> any RR-related issues.
>
>
>
> 2. Igor, we did consider, about a year back, one of your suggestions, i.e.,
> keep the VPN next-hops unchanged, leak the next-hop into BGP LU, and do the
> PIC on the BGP LU route (the next-hop for the VPN). We had it verified in
> the lab too.
>
>
>
> However, there is one big issue. If the next-hop of the VPN route is *not
> the same*, this scheme fails. In the figure below, VPN route V is received at
> RR1 with next-hop PE1 and at RR2 with next-hop PE1'. Since the next-hops
> themselves are different (there are good reasons why they are different, but
> we cannot go into them here), we can't do as you suggest.  Also, as you have
> mentioned, the solution with LU does not work in the case of Option B.
>
>
>
> PE1          PE1'
>  |             |   V
>  |             |
> RR1 -------- RR2
>   \            /
>    \          /
>     \        /
>       PE2
>
>
>
> I will investigate the draft that you mentioned and get back. Thanks for
> the reference.
>
>
>
> Regarding PD#2, I will try to explain the issue with respect to a
> particular VPN prefix with regards to Figure 2 in the draft. Let’s say we
> are doing vanilla PIC.
>
> 1.  Local label at PE1 has primary path with next-hop ISP1 and backup
> PE2. Say this label is 100. At PE1, we cannot have the backup to ISP2
> because of the given objective constraint that traffic should be able to
> still reach ISP1 so long as there is a path from one of the PEs to ISP1. If
> we choose the backup as ISP2, and PE2-ISP1 was intact, then we would have
> defeated our purpose if we forwarded to ISP2 directly since the forwarding
> path PE1—PE2—ISP1 exists.
>
[IM] My reading of the part "objective constraint that traffic should be
able to still reach ISP1 so long as there is a path from one of the PEs to
ISP1" raises a question. How can we know that such a path exists at all?
We can't differentiate a node failure from a link failure. So when the link
from PE1 towards ISP1 fails, the draft assumes it is a link failure and
reroutes traffic to PE2. That works for a link failure but not for a node
failure: in the latter case, traffic will be dropped at PE2 instead of
being locally rerouted at PE1 to ISP2.



> 2.  Local label at PE2 has primary path with next-hop ISP1 and backup PE1.
> Say this label is 200. We cannot have the backup to ISP2 because of the
> same constraint that I mentioned in (1) above.
>
>
>
> Say traffic from PE0 ingresses at PE1 with label 100. If the PE1-ISP1 link
> breaks, then with vanilla PIC, traffic will be diverted to PE2 with the
> label swapped to 200. At PE2, if it then finds that PE2-ISP1 is broken, it
> will send the traffic back to PE1 after swapping the label to 100, and the
> micro-loop ensues until BGP convergence.
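The micro-loop can be sketched in a few lines of Python. The FIB state here is a hypothetical reduction of the scenario (labels 100/200, both PE-ISP1 links down), not the draft's actual forwarding entries:

```python
# Simulate vanilla PIC backup forwarding between PE1 and PE2 when both
# PE-ISP1 links are down. Each PE's backup for the prefix points at the
# other PE, so the packet bounces until BGP converges (or TTL expires).

def forward(fib, start_pe, label, max_hops=6):
    """Follow primary/backup entries; return the path taken and the outcome."""
    pe, path = start_pe, []
    for _ in range(max_hops):
        primary_up, backup_pe, backup_label = fib[pe][label]
        path.append((pe, label))
        if primary_up:
            return path, "delivered to ISP1"
        pe, label = backup_pe, backup_label    # PIC: divert and swap the label
    return path, "micro-loop"

# label -> (is the local PE-ISP1 link up?, backup PE, label after swap)
fib = {
    "PE1": {100: (False, "PE2", 200)},   # PE1-ISP1 broken
    "PE2": {200: (False, "PE1", 100)},   # PE2-ISP1 broken as well
}
path, outcome = forward(fib, "PE1", 100)
print(outcome)   # micro-loop
```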
>
>
>
> I think it may be easier to describe this in more detail in the next
> version, so that similar questions do not crop up.
>
>
>
> Best Regards,
>
> --Satya
>
>
>
>
>
>
>
> *From: *Igor Malyushkin <gmalyushkin@gmail.com>
> *Date: *Saturday, August 12, 2023 at 6:10 AM
> *To: *Robert Raszuk <robert@raszuk.net>
> *Cc: *Satya Mohanty (satyamoh) <satyamoh@cisco.com>, idr@ietf.org <
> idr@ietf.org>, RAMADENU, PRAVEEN <pr9637@att.com>
> *Subject: *Re: [Idr] Regd.
> https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
>
> Hi Robert,
>
> Well, maybe RFC4456 indeed requires some clarification. From
> my experience, inline RRs are not the same as regular ones. Yes, they use
> the same mechanics but solve different tasks, and because they are LSRs for
> BGP LSPs or VPN LSPs, some tricks with CLUSTER_IDs, peering, and label
> allocation modes are required here.
>
> I agree that the solution for PD#1 is a bad way to solve the scaling
> issue. I don't think there should be a new solution with the next
> layer of labels and a new path attribute when BGP LU has been here for ages
> and solves this problem better.
>
>
>
> With Option B I would like to see which of the approaches is better (this
> one or B/C).
>
>
>
> сб, 12 авг. 2023 г. в 16:54, Robert Raszuk <robert@raszuk.net>:
>
> Hi Igor,
>
>
>
> > Using different CLUSTER_IDs for inline RRs at the same hierarchy level
> is common
>
>
>
> Even if you do set up different CLUSTER_IDs, it should be fine ... as the
> other RR should not accept an UPDATE message when it sees its own CLUSTER_ID
> in the incoming update.
>
>
> Remember, the CLUSTER_ID should get prepended upon reflection, not overwritten.
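The prepend-and-reject behaviour is easy to illustrate. A minimal sketch of the RFC 4456 CLUSTER_LIST check, simplified to just NLRI plus CLUSTER_LIST (ignoring ORIGINATOR_ID and all policy):

```python
# RFC 4456 loop prevention, reduced to its essence: on reflection the RR
# PREPENDS its CLUSTER_ID to CLUSTER_LIST; a reflector discards any route
# whose CLUSTER_LIST already contains its own CLUSTER_ID.

def reflect(route, cluster_id):
    """Return the route as re-advertised by the reflector, or None if looped."""
    if cluster_id in route["cluster_list"]:
        return None                                   # own CLUSTER_ID seen: reject
    return {"nlri": route["nlri"],
            "cluster_list": [cluster_id] + route["cluster_list"]}  # prepend

route = {"nlri": "V", "cluster_list": []}
route = reflect(route, "1.1.1.1")        # reflected by RR1
route = reflect(route, "2.2.2.2")        # reflected by RR2
assert route["cluster_list"] == ["2.2.2.2", "1.1.1.1"]
assert reflect(route, "1.1.1.1") is None  # back at RR1: discarded, no loop
```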
>
>
>
> Label allocation has nothing to do with the loop. It is the broken reflection
> configuration that causes the described loops.
>
>
>
> Yes, between clusters you can set up non-client IBGP sessions to fully mesh
> the clusters, but within a cluster it is rather a poor idea to make the RRs
> clients of each other.
>
>
>
> So PD#1 is simply a misconfiguration IMHO.
>
>
>
> If you think otherwise, please update RFC4456 first. Only then could we
> consider solutions to the problem caused by such an update.
>
>
>
> Regards,
>
> Robert
>
>
>
>
>
> On Sat, Aug 12, 2023 at 2:39 PM Igor Malyushkin <gmalyushkin@gmail.com>
> wrote:
>
> Hello, Robert, Satya,
>
> Using different CLUSTER_IDs for inline RRs at the same hierarchy level is
> common. Especially when there is a labeled unicast underneath. Although, I
> don't understand why two RRs should be clients to each other instead of
> regular non-client peers.
>
>
> For PD#1, it is possible to signal the LU addresses of PE1, PE2, and both RRs
> and use them as next-hops for VPN prefixes. In this case, for labeled unicast
> prefixes, a per-prefix label allocation mode completely solves the problem.
> For VPN sessions, the RRs do not apply next-hop-self but act as classical RRs
> (or can even be unaware of any VPN sessions at all): the classical seamless
> MPLS approach. With different CLUSTER_IDs, PIC between the RRs can also be
> maintained.
>
>
> If we talk about Option B, the solution with LU obviously does not work,
> but there are several approaches to cope with the scaling problems: Option
> A/B and Option B/C (draft-zzhang-bess-vpn-option-bc-00). The latter is a new
> draft that combines a two-label approach but does not require new path
> attributes.
>
> For PD#2, here I agree with Robert that it is strange to use internal BGP
> paths instead of external ones for PIC in that case. What if the ISP1 box
> goes down? All the traffic will go to the ISP2 box from both PEs anyway.
> Isn't it wise not to use internal BGP paths for a link failure? Actually,
> we can't even differentiate a link failure from a node failure, yet we
> are trying to apply different FRR techniques to them.
>
> [Satya] Well, we do use internal paths in the best-external case. In the
> case of the box failure that you mention, if we can infer that, then sure,
> there can be an optimization to send directly to ISP2.
> Also, for a possible loop, does not NFRR from the MNA framework solve this
> issue at the transport level?
> [Satya] Will look that up.
> My 2 cents.
>
>
>
>
>
> сб, 12 авг. 2023 г. в 15:45, Robert Raszuk <robert@raszuk.net>:
>
> Satya,
>
>
>
> *Reg PD#1: *
>
>
>
> Problem described as PD#1 arises by violation of RFC4456 rules. When your
> RRs are part of the same cluster (and here they clearly are) it is
> mandatory to use the same CLUSTER_ID on both route reflectors. That will
> prevent any reflected routes to get accepted by the other RR client.
>
>
>
>    Both these RRs are also clients of each other and advertise VPN routes to each other with the
>
>    next-hop set to the peering address.
>
>
>
> Please do not invent a bandage to heal wounds which should not be self
> made in the first place. PD#1 as described is a misconfiguration.
>
>
>
> *Reg PD#2:*
>
>
>
> You say:
>
>
>
> >  Failure scenario 2 (FS#2) The links from ISP1 to PE1 and PE2 are down
>
> >  at the same time;
>
>
>
> If those two links go down at the same time, both PEs should notice it
> (optics or BFD) and apply PIC accordingly. PIC on PE1 should result in
> shifting traffic to ISP2. So should the PIC action on PE2.
>
> [Satya] *PE1 cannot know that the PE2-ISP1 link is also down, right*? If
> PE2-ISP1 is not down, then for the traffic to reach ISP1, the correct
> forwarding path is from PE1 to PE2 and then to ISP1. PE1 should not send
> directly to ISP2, as I mentioned in the constraint earlier.
>
>
>
> As with PIC the FIB rewrite is prefix independent so no loop should form.
>
>
>
> As you said, both ISPs advertise an identical set of routes: "Both ISPs
> advertise the same 700k prefixes."
>
>
>
> Only in a situation where you would apply eiBGP multipath could there be
> some micro-loop.
>
>
>
> PIC should be smart and ignore IBGP paths (if their local-pref is
> preferred in steady state) when local EBGP paths exist, to heal the data
> plane during the fast repair. Then BGP will converge to the
> policy-aligned selection of exits.
>
> [Satya] As I mentioned this is PIC with an additional constraint.
>
>
>
> Kind regards,
>
> Robert
>
>
>
>
>
> On Thu, Jul 27, 2023 at 9:36 AM Satya Mohanty (satyamoh) <satyamoh=
> 40cisco.com@dmarc.ietf.org> wrote:
>
> Hi Keyur and the chairs,
>
>
>
> Towards the end of my IETF presentation, the audio came through garbled at
> my end and was not at all coherent.
>
> I went over the recording today. I am replying to the two
> questions/observations.
>
>
>
> 1)  A suggestion was given to use another label mode, i.e., per-prefix
> (per-vrf does not apply here).  However, using per-prefix label allocation
> would result in the inline RRs/ASBRs exhausting their label threshold very
> quickly as the route scale increases (the upper limit is platform
> dependent). Therefore, per-prefix label allocation was ruled out in this
> deployment after being given due consideration.
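A rough, hypothetical comparison of local-label consumption per allocation mode at an inline RR/ASBR. The counts are invented purely for scale, and per-nexthop-recvd-label really keys on (next-hop, received label) pairs, simplified here to one entry per next-hop:

```python
# Why per-prefix allocation exhausts label space at an inline RR/ASBR while
# per-nexthop-style modes do not. All numbers are illustrative.

def labels_needed(mode: str, prefixes: int, next_hops: int, vrfs: int) -> int:
    if mode == "per-prefix":
        return prefixes            # one local label per route
    if mode == "per-nexthop-recvd-label":
        return next_hops           # simplified: one per (next-hop, recvd label)
    if mode == "per-vrf":
        return vrfs                # not applicable to the Option-B/RR case here
    raise ValueError(f"unknown mode: {mode}")

prefixes, next_hops, vrfs = 700_000, 500, 200
for mode in ("per-prefix", "per-nexthop-recvd-label", "per-vrf"):
    print(f"{mode:>24}: {labels_needed(mode, prefixes, next_hops, vrfs):>7}")
```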
>
>
>
> Cisco IOS-XR has supported the per-nexthop-recvd-label mode for some time
> now in Option-B ASBR and RR-with-nh-self use cases, precisely for this
> reason. I believe other vendors have an equivalent mode. The idea is to take
> advantage of the optimal label allocation of this mode and simultaneously
> ensure fast convergence via BGP PIC.
>
>
>
> 2) Regarding the suggestion of not using the proposed attribute, the
> original thought was to use the tunnel-encaps attribute. The problem that I
> saw is that the tunnel-encaps attribute can carry many sub-TLVs for
> different purposes, and if we wanted to withhold the advertisement of the
> secondary label from routers that do not need it, that would not be easy,
> as those same routers may need other sub-TLVs present in that same
> tunnel-encaps attribute. But we do look forward to getting your
> inputs/suggestions on this, as you indicated.
>
>
>
> Thanks.
>
>
>
> Best Regards,
>
> --Satya
>
>
>
>
>
>
>
> *From: *Idr <idr-bounces@ietf.org> on behalf of Satya Mohanty (satyamoh)
> <satyamoh=40cisco.com@dmarc.ietf.org>
> *Date: *Tuesday, July 11, 2023 at 9:44 PM
> *To: *Dongjie (Jimmy) <jie.dong=40huawei.com@dmarc.ietf.org>, idr@ietf.org
> <idr@ietf.org>, MEANS, ISRAEL L <im8327@att.com>, RAMADENU, PRAVEEN <
> pr9637@att.com>
> *Cc: *idr-chairs@ietf.org <idr-chairs@ietf.org>
> *Subject: *Re: [Idr] Call for IETF 117 IDR agenda items
>
> Hi Jie,
>
>
>
> We would like to request a slot of 10 minutes to present the following
> draft. Tuesday slot is preferable.
>
> https://datatracker.ietf.org/doc/draft-mohanty-idr-secondary-label/
>
>
>
> Thanks,
>
> --Satya
>
>
>
> *From: *Idr <idr-bounces@ietf.org> on behalf of Dongjie (Jimmy) <jie.dong=
> 40huawei.com@dmarc.ietf.org>
> *Date: *Tuesday, June 27, 2023 at 3:57 PM
> *To: *idr@ietf.org <idr@ietf.org>
> *Cc: *idr-chairs@ietf.org <idr-chairs@ietf.org>
> *Subject: *[Idr] Call for IETF 117 IDR agenda items
>
> Dear all,
>
>
>
> The draft agenda of IETF 117 is available at
> https://datatracker.ietf.org/meeting/117/agenda. The IDR sessions are
> scheduled as below:
>
>
>
> - Monday Session II  13:00 - 15:00 (local time)  Plaza B
>
>
>
> - Thursday Session IV 17:00 – 18:00 (local time)  Continental 4
>
>
>
> Please start to send any IDR agenda item request to me and CC the chairs (
> idr-chairs@ietf.org). Please include the name of the person who will be
> presenting, and the estimate time you'll need (including Q/A).
>
>
>
> If you plan to make a presentation, please keep in mind the IDR tradition,
> "no Internet Draft - no time slot". You should also plan to send your
> slides to me and CC the chairs no later than 24 hours prior to the IDR
> session, though earlier is better. Please number your slides for the
> benefit of remote attendees. By default your slides will be converted to
> PDF and presented from the PDF.
>
>
>
> Potential presenters may want to take a look at the checklist for
> presenting at IDR:
>
>
>
>
> https://trac.tools.ietf.org/wg/idr/trac/wiki/Checklist%20for%20presenting%20at%20an%20IDR%20meeting
>
>
>
> Best regards,
>
> Jie
>
> _______________________________________________
> Idr mailing list
> Idr@ietf.org
> https://www.ietf.org/mailman/listinfo/idr
>