Re: [Lsr] New Version Notification for draft-wang-lsr-prefix-unreachable-annoucement-03.txt

Aijun Wang <wangaijun@tsinghua.org.cn> Mon, 07 September 2020 03:44 UTC

From: Aijun Wang <wangaijun@tsinghua.org.cn>
To: 'Gyan Mishra' <hayabusagsm@gmail.com>, "'Acee Lindem (acee)'" <acee@cisco.com>
Cc: "'Les Ginsberg (ginsberg)'" <ginsberg@cisco.com>, 'Tony Przygienda' <tonysietf@gmail.com>, 'Robert Raszuk' <robert@raszuk.net>, 'Huzhibo' <huzhibo@huawei.com>, 'Aijun Wang' <wangaj3@chinatelecom.cn>, 'Peter Psenak' <ppsenak=40cisco.com@dmarc.ietf.org>, 'lsr' <lsr@ietf.org>, 'Xiaoyaqun' <xiaoyaqun@huawei.com>
References: <CAOj+MMGgpcnRMnPxQqcZofgJNH67QYUQOxWsTU5Xp-Km0D2DDg@mail.gmail.com> <A202F6E1-AD83-46E8-A1D2-E156FB35DF57@chinatelecom.cn> <CAOj+MMHd1WZNCWr6KihxzDf=G53A8FBUBbqHpZGNwvF4hsuzMA@mail.gmail.com> <059e01d66ad7$ffda2e50$ff8e8af0$@tsinghua.org.cn> <CABNhwV2oXBBNKOdUA59sLF+b5srWHi3KF2Q6H1Tg-dK+gA9Lgw@mail.gmail.com> <013301d679f6$92da3ce0$b88eb6a0$@tsinghua.org.cn> <CA+wi2hM1H6Vr5U_1fTVkPi5aQLrFTeRhD5Q8Be2T+wf0e1h+QA@mail.gmail.com> <BY5PR11MB4337214640926A190A8D620FC12D0@BY5PR11MB4337.namprd11.prod.outlook.com> <8838525B-EB2D-4B40-956A-1F0D5EA56D32@cisco.com> <CABNhwV26_doVEPTexHUhgAJNjdAF1JwCAHxf4H1+QzKCGLyP9g@mail.gmail.com>
In-Reply-To: <CABNhwV26_doVEPTexHUhgAJNjdAF1JwCAHxf4H1+QzKCGLyP9g@mail.gmail.com>
Date: Mon, 07 Sep 2020 11:44:22 +0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/0j08BPxhKEr88afTLoyckGaCiPk>

Hi, Gyan, Acee, Les, Tony:

 

Thanks for your comments on this draft. 

Although the topology in RIFT is different from that of ISIS/OSPF, it would be good if some ideas could be borrowed from RIFT.

Anyway, let’s focus on the scenarios described in this draft.

 

This draft mainly discusses the influence of summary networks in intra-area and inter-area situations, and describes the PUA mechanism for the case where a node or link covered by the summary address fails. The main purpose of the PUA mechanism is to reduce traffic black-holing for the services that run on the failed node/link.

 

Based on the description in section 6 <https://datatracker.ietf.org/doc/html/draft-wang-lsr-prefix-unreachable-annoucement-03#section-6>, we think the behavior of a node under the PUA mechanism is deterministic. (Section 6.3 also states that all nodes within the area should support this mechanism to avoid possible traffic loops.) The PUA information needs to be kept only for a configurable time, long enough for the services that run on the failed node/link to converge or to bypass it.
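
To make the hold-time behavior concrete, here is a minimal sketch of a table that keeps unreachable-prefix entries for a configurable time and then drops them. It is only an illustration, not the procedure defined in the draft: the class `PuaTable`, its method names, and the 30-second default for `MAX_T_PUA` are all assumptions. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Dict, Union

Network = Union[IPv4Network, IPv6Network]

# Hypothetical default hold time in seconds; the thread only says it is configurable.
MAX_T_PUA = 30.0


@dataclass
class PuaTable:
    """Keeps unreachable-prefix entries until their hold time expires."""

    hold_time: float = MAX_T_PUA
    _expiry: Dict[Network, float] = field(default_factory=dict)

    def learn(self, prefix: str, now: float) -> None:
        # Record (or refresh) an entry and the time at which it is dropped.
        self._expiry[ip_network(prefix)] = now + self.hold_time

    def expire(self, now: float) -> None:
        # Keep only the entries whose hold time has not yet elapsed.
        self._expiry = {p: t for p, t in self._expiry.items() if t > now}

    def is_unreachable(self, prefix: str, now: float) -> bool:
        self.expire(now)
        return ip_network(prefix) in self._expiry


table = PuaTable(hold_time=10.0)
table.learn("192.0.2.1/32", now=0.0)
print(table.is_unreachable("192.0.2.1/32", now=5.0))   # still held
print(table.is_unreachable("192.0.2.1/32", now=10.0))  # hold time elapsed
```

A node that joins after the entry has expired never sees it, which is the case discussed later in this thread.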

 

If there are any issues with the current solution, would you describe them in more detail? You could base the description on the figure in this draft, or on other topologies that you think may exist.

 

More detailed responses are inline.

 

 

Best Regards

 

Aijun Wang

China Telecom

 

 

From: Gyan Mishra [mailto:hayabusagsm@gmail.com] 
Sent: Saturday, September 5, 2020 4:31 AM
To: Acee Lindem (acee) <acee@cisco.com>
Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; Tony Przygienda <tonysietf@gmail.com>; Aijun Wang <wangaijun@tsinghua.org.cn>; Robert Raszuk <robert@raszuk.net>; Huzhibo <huzhibo@huawei.com>; Aijun Wang <wangaj3@chinatelecom.cn>; Peter Psenak <ppsenak=40cisco.com@dmarc.ietf.org>; lsr <lsr@ietf.org>; Xiaoyaqun <xiaoyaqun@huawei.com>
Subject: Re: [Lsr] New Version Notification for draft-wang-lsr-prefix-unreachable-annoucement-03.txt

 

Agreed that RIFT negative disaggregation and the proposed PUA are in no way comparable. Sorry for making that analogy, but unfortunately it was the first thing that came to mind when reading the draft.

 

I will work with Aijun to help fill the gaps and address the points noted in this thread without adding more complexity.

From an operator's perspective I agree backwards compatibility is a requirement.

[WAJ] Gyan, you are welcome to join us, and other experts are welcome as well.

 

 

Kind Regards

 

Gyan

 

On Fri, Sep 4, 2020 at 4:03 PM Acee Lindem (acee) <acee@cisco.com> wrote:

Speaking as WG member…. 

 

This is in no way comparable. The solution presented in the draft is full of holes and is not backward compatible. The problem may be solvable, but the question is whether or not the required complexity is worse than a problem that could be solved with proper network design.

 

[WAJ] Hi, Acee, with PUA we just want to fill in the invisible hole that is hidden by the summary address. It should be supported by all the nodes within the area.

And whatever the network design, summary addresses will be used in the network, especially in the IPv6 era.

Would you point out the situations that the current solution can’t resolve? We can then refine and fix it.

 Thanks,
Acee

 

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
Date: Friday, September 4, 2020 at 3:00 PM
To: Tony Przygienda <tonysietf@gmail.com>, Aijun Wang <wangaijun@tsinghua.org.cn>
Cc: Gyan Mishra <hayabusagsm@gmail.com>, Robert Raszuk <robert@raszuk.net>, Huzhibo <huzhibo@huawei.com>, Aijun Wang <wangaj3@chinatelecom.cn>, Peter Psenak <ppsenak=40cisco.com@dmarc.ietf.org>, lsr <lsr@ietf.org>, Acee Lindem <acee@cisco.com>, Xiaoyaqun <xiaoyaqun@huawei.com>
Subject: RE: [Lsr] New Version Notification for draft-wang-lsr-prefix-unreachable-annoucement-03.txt

 

In support of what Tony has said, I think any comparison between what RIFT is doing and what is proposed in this draft is inappropriate.

 

RIFT is able to determine what destinations exist in the network but are not reachable via a certain subset of the topology – and then generate negative advertisements appropriately. There is also full determinism in knowing when the negative advertisement should be removed.

 

draft-wang-lsr-prefix-unreachable-announcement by contrast tries to provide an advertisement for a destination that no longer exists. This leads to the lack of determinism which necessitates arbitrary timers and creates problems for nodes who connect to the network after the disappearance of the destination.

[WAJ] If necessary, we can advertise MAX_T_PUA (the configurable time for which nodes hold the PUA information) within the area.

If a node connects to the network after the PUA destination has disappeared, no services can be established or run on the prefixes of the failed node/link.

That is the same as the initial state, since not all of the prefixes within the summary address are necessarily reachable.

 

Not comparable at all IMO…

 

   Les

 

From: Lsr <lsr-bounces@ietf.org> On Behalf Of Tony Przygienda
Sent: Friday, September 04, 2020 11:12 AM
To: Aijun Wang <wangaijun@tsinghua.org.cn>
Cc: Gyan Mishra <hayabusagsm@gmail.com>; Robert Raszuk <robert@raszuk.net>; Huzhibo <huzhibo@huawei.com>; Aijun Wang <wangaj3@chinatelecom.cn>; Peter Psenak <ppsenak=40cisco.com@dmarc.ietf.org>; lsr <lsr@ietf.org>; Acee Lindem (acee) <acee@cisco.com>; Xiaoyaqun <xiaoyaqun@huawei.com>
Subject: Re: [Lsr] New Version Notification for draft-wang-lsr-prefix-unreachable-annoucement-03.txt

 

I read the draft since the longish thread triggered my interest. As Peter said very thin ice walking with magic soft-state-timers for (to me) entirely unclear benefit and lots of interesting questions completely omitted like e.g. what will happen if a mix of old and new routers are in the network. 

 

RIFT works completely differently BTW (and I don't think we _also_ noticed the problem, AFAIK RIFT is the first protocol that defined the concept of at least negative disaggregation to deal with black-hole avoidance in presence of summaries). RIFT defines precisely how negative disaggregation state is transitively propagated (if necessary) and next-hop resolved via recursive inheritance to provide black-hole and loop free routing in case of links failures on IP fabrics. No soft-timers or undescribed magic there. However, to do what RIFT is doing a sense of direction on the graph is needed (this is relatively easy on semi-lattice RIFT supports but would precondition uniform topological sorting on generic graphs, probably ending up in RPL type of solutions [which still need a direction indicator on arc to work and would take out a lot of links out of the topology possibly {but Pascal is better to judge that}]). 

 

-- tony 

 

On Mon, Aug 24, 2020 at 11:11 AM Aijun Wang <wangaijun@tsinghua.org.cn> wrote:

Hi, Gyan:

 

Sorry for replying so late.

You are right about the summary address behavior, but the summary address is configured manually, and if one of the specific prefixes within the summary range goes down, a black hole (for the route to that specific prefix) will form. The PUA mechanism just aims to fix this.

Glad to know RIFT has also noticed such issues. In OSPF/ISIS, this problem also needs to be solved.
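
The black hole described above can be illustrated with a small longest-prefix-match lookup. This is a sketch under stated assumptions, not the draft's encoding or processing rules: the `lookup` helper, the next-hop name `ABR1`, the documentation prefixes, and the choice to model a PUA as a more-specific entry with no next hop are all hypothetical.

```python
from ipaddress import ip_address, ip_network


def lookup(rib, destination):
    """Return (prefix, next_hop) for the longest matching route, or None."""
    addr = ip_address(destination)
    matches = [(net, nh) for net, nh in rib.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)


# Hypothetical RIB on a router outside the area: only the summary is known.
rib = {ip_network("192.0.2.0/24"): "ABR1"}

# Host 192.0.2.1 has failed inside the area, but the summary still matches,
# so traffic keeps being forwarded towards ABR1 and is dropped there.
print(lookup(rib, "192.0.2.1"))

# Assumption for illustration: the unreachable prefix is installed as a
# more-specific entry without a next hop, so the lookup now sees the failure.
rib[ip_network("192.0.2.1/32")] = None
print(lookup(rib, "192.0.2.1"))

# Other destinations inside the summary are unaffected.
print(lookup(rib, "192.0.2.2"))
```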

 

If you are interested in this topic, you are welcome to join us in working on the solution.

 

 

Best Regards

 

Aijun Wang

China Telecom

 

From: lsr-bounces@ietf.org [mailto:lsr-bounces@ietf.org] On Behalf Of Gyan Mishra
Sent: Thursday, August 6, 2020 4:44 PM
To: Aijun Wang <wangaijun@tsinghua.org.cn>
Cc: Peter Psenak <ppsenak=40cisco.com@dmarc.ietf.org>; Robert Raszuk <robert@raszuk.net>; Huzhibo <huzhibo@huawei.com>; Aijun Wang <wangaj3@chinatelecom.cn>; lsr <lsr@ietf.org>; Acee Lindem (acee) <acee@cisco.com>; Xiaoyaqun <xiaoyaqun@huawei.com>
Subject: Re: [Lsr] New Version Notification for draft-wang-lsr-prefix-unreachable-annoucement-03.txt

 

Hi Aijun and authors

 

I am catching up with this thread after reading through this draft.

 

I understand the concept: we are trying to send a PUA advertisement, which sounds similar to a RIFT negative disaggregation prefix, with a null setting when a longer-match component prefix that is part of a summary range goes down, so that prefix-down conditions for component prefixes covered by a summary can be detected.

 

Basically, the way summarization works in both OSPF and ISIS is that at least one longer-match prefix within the defined IA summary range must exist for the IA summary to be sent. So the summary is sent conditionally, similar to a BGP conditional advertisement or even an OSPF default originate, which requires a default route to exist in the RIB before the advertisement is sent. A good non-conditional counterexample in BGP: if you have a null0 static route for a summary to back an exact-match BGP advertisement, the prefix is always advertised, even if no component prefixes exist. This can result in black hole routing. BGP has a conditional advertisement feature that advertises based on the existence of a route from a downstream neighbor in the BGP RIB. So with OSPF and ISIS the summary is in fact conditional, similar to a BGP conditional advertisement. In the example given in section 3.1 of the draft, when node T2 is down and pt2 becomes unreachable, let’s say that prefix is 1.1.1.1/32 and the summary is 1.1.1.0/30. If no other component prefix exists within the summary range, the summary prefix will not get advertised. So there will not be any black hole.

 

The summary represents all prefixes within the range that would be suppressed with the summary when advertised into the backbone area. However, at least one prefix must exist in the range for the summary to be generated. That is done by design, as the summary represents all prefixes within the range. So let’s say there are 100 prefixes and a few devices are down, so now only 5 prefixes exist within the range. By design it is OK for the router to generate the summary for the 5 prefixes it is representing, and that will not cause any routing loop or black hole.
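
As a rough sketch of the conditional origination rule described above (a summary is generated only while at least one component prefix exists in its range), the check could look like the following. The function name `should_advertise_summary` is hypothetical, and real OSPF/ISIS implementations apply further conditions that are not modelled here.

```python
from ipaddress import ip_network


def should_advertise_summary(summary, area_prefixes):
    """True if at least one component prefix falls inside the summary range."""
    summary_net = ip_network(summary)
    return any(ip_network(p).subnet_of(summary_net) for p in area_prefixes)


# The example from this message: 1.1.1.1/32 is the only component of 1.1.1.0/30.
print(should_advertise_summary("1.1.1.0/30", ["1.1.1.1/32"]))

# Once that prefix is gone and nothing else is in range, the summary is withdrawn.
print(should_advertise_summary("1.1.1.0/30", []))

# If another component remains, the summary is still originated even though
# 1.1.1.1/32 itself is no longer present.
print(should_advertise_summary("1.1.1.0/30", ["1.1.1.2/32"]))
```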

 

 

I am trying to understand what gap exists and what we are trying to solve related to summarization in the context of IPv6. Both IPv4 and IPv6 summarization operate similarly, in that at least one component route within the summary range must be present in the RIB before the summary can be advertised.

 

Kind Regards 

 

Gyan

 

On Tue, Aug 4, 2020 at 11:25 PM Aijun Wang <wangaijun@tsinghua.org.cn> wrote:

Hi, Robert:

 

From: lsr-bounces@ietf.org [mailto:lsr-bounces@ietf.org] On Behalf Of Robert Raszuk
Sent: Friday, July 31, 2020 6:21 PM
To: Aijun Wang <wangaj3@chinatelecom.cn>
Cc: Peter Psenak <ppsenak=40cisco.com@dmarc.ietf.org>; Huzhibo <huzhibo@huawei.com>; Aijun Wang <wangaijun@tsinghua.org.cn>; lsr <lsr@ietf.org>; Acee Lindem (acee) <acee@cisco.com>; Xiaoyaqun <xiaoyaqun@huawei.com>
Subject: Re: [Lsr] New Version Notification for draft-wang-lsr-prefix-unreachable-annoucement-03.txt

 

[WAJ] Such information concerns the underlay link state, so shouldn't it be flooded via the IGP? The ambiguity arises from the IGP summary behavior, so shouldn't it be solved by the IGP itself?

 

Well if we look at this problem from a distance while on surface it seems like an IGP issue (not to mention some which use BGP as IGP :) IMO it is only hurting when you have some service overlay on top utilizing the IGP. 

[WAJ] There are situations where the PUA mechanism applies even when no service overlay is running over the IGP. The scenarios described in draft-wang-lsr-prefix-unreachable-annoucement-03#section-3 <https://datatracker.ietf.org/doc/html/draft-wang-lsr-prefix-unreachable-annoucement-03#section-3> are not tightly coupled with the overlay service.

 

So typically today if I am running any service with BGP I do count on BGP to remove routes which are no longer reachable. IGP just tells me how to get to the next hop, which direction to go and not if the endpoint (service CPE or PE connected to given CE) is up or down. 

 

So today smart BGP implementations in good network design can use RD based withdraws to very fast (milliseconds) remove the affected service routes. When I said should we do it in BGP I meant to ask WG if this is good enough to quickly remove service routes. If not maybe we should send such affected next hop in BGP to even faster invalidate all routes with such next hop as failing PE. 

 

Bottom line if you think the problem is IGP then I think Acee's comments apply. 

[WAJ] Which comment is not addressed yet?

 

Last - See, today you are bringing up the case of signaling a transition to DOWN ... but for some people and applications that may not be enough. In fact, UP/DOWN they can get via BGP. But if you have two ABRs and one of them, due to topology changes in its area, is suddenly forced to reach an atomic destination covered by the summary over a much higher metric path, that may be a much more severe case for the applications running above, and not an acceptable one either.

[WAJ] Otherwise, the application traffic will be broken.

 

And BGP will not remove service routes nor modify best path in any way as summary is masking the real metric to some next hops. So while in the network you may have alternate better native transit paths with a lot of free capacity if you only switch to a different bgp next hop (not talking about any TE at all) you are stuck offering much worse service to your customers. 

[WAJ] If there are other links through which the ABR can reach the affected prefix, then this ABR will not send the PUA information.

 

Those cases are starting to be solved by performance routing both at the service itself or at BGP nh levels. Should IGP assist here ... I am not sure.

[WAJ] When a node goes down, it can only depend on other nodes within the same IGP to send such unreachability information. The IGP can certainly help here :)

 

 

Many thx,

R.

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr

-- 

 <http://www.verizon.com/> 

Gyan Mishra

Network Solutions Architect 

M 301 502-1347
13101 Columbia Pike 
Silver Spring, MD

 
