Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

<bruno.decraene@orange.com> Tue, 18 December 2018 23:10 UTC

From: bruno.decraene@orange.com
To: Alvaro Retana <aretana.ietf@gmail.com>
CC: "idr@ietf.org" <idr@ietf.org>, SPRING WG <spring@ietf.org>
Date: Tue, 18 Dec 2018 23:10:19 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/1V1n4Vg546IdIl2QpRcBkufNK4E>
Subject: Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

Alvaro,

Speaking as an individual IDR contributor,

Interesting discussion.
However, error handling in routing is a difficult and sometimes controversial topic.

> Thoughts/ideas/comments?


1) Shouldn't BGP-LS error handling also be discussed in the LSVR WG?
https://tools.ietf.org/html/draft-ietf-lsvr-bgp-spf-03#section-5.7 does not seem to cover this,
and this document was in WGLC until yesterday.


2) Regarding BGP-LS error handling, it's not clear to me that "treat-as-withdraw" would be "safer" than "attribute discard". "Session reset" is safer from a consistency standpoint, but it definitely also "has a direct effect on how traffic is forwarded in the network", and a severe one.
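To make the comparison concrete, here is a toy model (not any real implementation) of what a BGP-LS consumer's topology looks like after each of the three RFC7606 approaches, when an update for a previously learned link arrives with a malformed attribute. The classes, link names and SID values are hypothetical, the BGP-LS attribute is reduced to a single Adj-SID, and session reset is modeled only up to the point where the session comes back.

```python
from dataclasses import dataclass, field
from enum import Enum


class Handling(Enum):
    """The three RFC 7606 error-handling approaches compared in point 2."""
    SESSION_RESET = "session reset"
    TREAT_AS_WITHDRAW = "treat-as-withdraw"
    ATTRIBUTE_DISCARD = "attribute discard"


@dataclass
class Update:
    """Hypothetical, simplified BGP-LS UPDATE: one Link NLRI and its attribute."""
    nlri: str
    attribute: dict
    malformed: bool = False


@dataclass
class ConsumerView:
    """Topology held by a BGP-LS consumer (e.g. a PCE), keyed by NLRI."""
    links: dict = field(default_factory=dict)

    def receive(self, update: Update, handling: Handling) -> None:
        if not update.malformed:
            self.links[update.nlri] = dict(update.attribute)
        elif handling is Handling.ATTRIBUTE_DISCARD:
            # The new route replaces the old one without its attribute:
            # the link is kept, but its SR data (the Adj-SID) is lost.
            self.links[update.nlri] = {}
        elif handling is Handling.TREAT_AS_WITHDRAW:
            # The link disappears from the consumer's topology.
            self.links.pop(update.nlri, None)
        else:
            # Session reset: everything learned over the session is flushed
            # (until the session comes back and routes are re-advertised).
            self.links.clear()


def simulate(handling: Handling) -> dict:
    """Learn two links, then receive a malformed update for one of them."""
    view = ConsumerView()
    view.receive(Update("link A-B", {"adj_sid": 24001}), handling)
    view.receive(Update("link B-C", {"adj_sid": 24002}), handling)
    view.receive(Update("link A-B", {"adj_sid": 24003}, malformed=True), handling)
    return view.links


for handling in Handling:
    print(f"{handling.value}: {simulate(handling)}")
```

None of the three results matches the network: attribute discard keeps link A-B without any SID, treat-as-withdraw hides the link, and session reset hides everything until re-learning. Which one is "safer" depends on what the consumer does with that gap.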

3)
> The BGP-LS extensions for SR (e.g. draft-ietf-idr-bgp-ls-segment-routing-ext) are, as explained in that draft, used so that "an external component (e.g., a controller) then can collect SR information from across an SR domain and construct the end-to-end path (with its associated SIDs) that need to be applied to an incoming packet to achieve the desired end-to-end forwarding."

> To me, that obviously implies that use of BGP-LS for SR has a direct effect on how traffic is forwarded in the network.  Does anyone see it differently?

a) IMHO that implication would be the same without SR, e.g., with RSVP-TE. In fact, the effect on how traffic is forwarded comes from the PCE computation using partial or incorrect topology information, not from how the forwarding is enforced.

b) IMHO RFC7606 was more concerned about forwarding loops and black-holing (especially for iBGP) than about changing the path of the traffic, since "treat-as-withdraw" or "session reset" would also have "a direct effect on how traffic is forwarded in the network". Note that the latter quote is not from RFC7606, which uses the terms "no effect on route selection or installation"; that is a bit different.

c) Coming back to SR, and quickly looking at the ToC: discarding a SID simply means that the SID can't be used by the SR source/ingress node. Discarding the SR node attribute means that the node can't be used to forward a global segment. Flex-algo is a bit more delicate, as discarding the support for a flex-algo will change the routing for that flex-algo. But this happens only from the perspective of the BGP-LS consumer, so it would not create forwarding loops or black holes, only an unexpected routing path.
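Point c) can be sketched as a path computation on the consumer side. This is a toy model with a stated simplification: a node whose SR node attribute was discarded is treated as unusable on the path (a stronger restriction than "can't be used to forward a global segment"), and the function and topology names are hypothetical. Within this model, the discard changes the computed path, but the result is still a valid path over the known topology, or no path at all.

```python
import heapq


def sr_path(links, sr_capable, src, dst):
    """Lowest-metric path whose nodes all have usable SR node data.

    links: {node: {neighbor: igp_metric}}, as learned via BGP-LS
    sr_capable: nodes whose SR node attribute the consumer still holds
    Returns (cost, [src, ..., dst]) or None if no such path exists.
    """
    heap = [(0, src, [src])]
    best = {src: 0}
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if cost > best[node]:
            continue  # stale heap entry, a cheaper path was already found
        for nbr, metric in links.get(node, {}).items():
            if nbr not in sr_capable:
                continue  # SR node attribute discarded: skip this node
            new_cost = cost + metric
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr, path + [nbr]))
    return None


links = {"A": {"B": 10, "C": 10}, "B": {"D": 10}, "C": {"E": 10}, "E": {"D": 10}}
nodes = {"A", "B", "C", "D", "E"}
print(sr_path(links, nodes, "A", "D"))          # (20, ['A', 'B', 'D'])
print(sr_path(links, nodes - {"B"}, "A", "D"))  # (30, ['A', 'C', 'E', 'D'])
```

With B's node attribute discarded, the consumer picks the longer A-C-E-D path instead of A-B-D: a different route, not a loop.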

4) I haven't checked, but it's not clear to me that IS-IS has perfect (or even better) error handling.
e.g.,
> “the procedure used to choose which copy shall be used
> is undefined.”
> https://datatracker.ietf.org/doc/html/draft-ietf-isis-rfc4971bis-04

Les or Stefano could comment on IS-IS error handling, and there has been a recent related discussion: https://tools.ietf.org/html/draft-ginsberg-lsr-isis-invalid-tlv
In theory, IS-IS has a way to signal the error back to the sender and to the whole network, by purging the LSP, which BGP-LS does not. But the purge does not convey what the error is, so the ingress can't try removing it.

5) BGP-LS and IS-IS have chosen different granularities to advertise the LSDB (per link/node vs. per LSP), which will very likely result in different error handling, and hence a different view of the topology. This looks like a day-one design choice for BGP-LS, so it is difficult to address.
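A toy illustration of point 5: the unit of advertisement bounds how much information a single error can take with it. The "drop the whole unit that contains the malformed link" policy below is a hypothetical used only for illustration; it is not the error handling specified for IS-IS or by rfc7752, and a real LSP may be split into several fragments.

```python
def visible_links(units, malformed_link):
    """Links still visible after dropping each advertisement unit that
    carries `malformed_link` (a hypothetical, uniform error policy)."""
    return sorted(link for unit in units if malformed_link not in unit for link in unit)


# Hypothetical node A with three links.
links_of_a = ["A-B", "A-C", "A-D"]
per_lsp = [links_of_a]                      # IS-IS style: one LSP carries all of A's links
per_nlri = [[link] for link in links_of_a]  # BGP-LS style: one Link NLRI per link

print(visible_links(per_lsp, "A-C"))   # []
print(visible_links(per_nlri, "A-C"))  # ['A-B', 'A-D']
```

The same malformed link leaves two different topologies depending on the advertisement granularity, i.e. the two consumers end up with different views of the network.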


Thanks,
--Bruno


From: spring [mailto:spring-bounces@ietf.org] On Behalf Of Alvaro Retana
Sent: Tuesday, December 18, 2018 10:09 PM
To: idr@ietf.org; SPRING WG
Subject: [spring] Error Handling for BGP-LS with Segment Routing

Dear idr and spring WGs:

tl;dr  I don't think that BGP-LS, with error handling as specified ("attribute discard"), can provide the robustness that an application (like SR), with direct impact on the forwarding in the network, needs.  [Jump to the bottom for discussion.]


The BGP-LS extensions for SR (e.g. draft-ietf-idr-bgp-ls-segment-routing-ext) are, as explained in that draft, used so that "an external component (e.g., a controller) then can collect SR information from across an SR domain and construct the end-to-end path (with its associated SIDs) that need to be applied to an incoming packet to achieve the desired end-to-end forwarding."

To me, that obviously implies that use of BGP-LS for SR has a direct effect on how traffic is forwarded in the network.  Does anyone see it differently?


The error handling mechanism specified in rfc7752 is "attribute discard" [rfc7606].  If an error is detected, then the information in the controller may be, at best, incomplete, but it could also be out of date, resulting in "segment routes" that don't follow the best available path or that may even end in a black hole.

It seems clear to me that this is one of the cases that rfc7606 warned about:

   o  Attribute discard: In this approach, the malformed attribute MUST
      be discarded and the UPDATE message continues to be processed.
      This approach MUST NOT be used except in the case of an attribute
      that has no effect on route selection or installation.

      ....
   For any malformed attribute that is handled by the "attribute
   discard" instead of the "treat-as-withdraw" approach, it is critical
   to consider the potential impact of doing so.  In particular, if the
   attribute in question has or may have an effect on route selection or
   installation, the presumption is that discarding it is unsafe unless
   careful analysis proves otherwise.  The analysis should take into
   account the tradeoff between preserving connectivity and potential
   side effects.


There was a related discussion as a result of my AD review of draft-ietf-idr-ls-distribution (= rfc7752) [1][2].  At that time (2015), the consensus on the list was (paraphrasing): if there's a malformed attribute we won't be able to recover, but that's ok because BGP-LS is "purely application-level data that has no immediate corresponding forwarding state impact", and there won't be an impact on critical AFI/SAFI for network operations.  No one else argued against that...so I ended up in the rough...

I think the situation has now changed because BGP-LS is carrying SR information that is used to define paths in the network -- even if isolation exists, as described in rfc7752:

                 ...    Furthermore, it is anticipated that
   distribution of this NLRI will be handled by dedicated route
   reflectors providing a level of isolation and fault containment
   between different NLRI types.

...the BGP-LS information could still be incomplete, stale, etc.


After all that...  I don't think that BGP-LS, with error handling as specified ("attribute discard"), can provide the robustness that an application (like SR), with direct impact on the forwarding in the network, needs.

What now?  I see several potential paths forward (there are probably more):

(1) "fix" BGP-LS to mandate (MUST) isolation and change the error handling approach

(2) change the error handling approach...maybe just when used with SR

(3) the controller should only use the SR information received from routing protocols (IGP/BGP, e.g. draft-ietf-idr-bgp-prefix-sid)

(4) ..??


I didn't find a specific discussion of this topic in the archive...but I may have missed it among other related ones.  If I did, please point me to it.

Thoughts/ideas/comments?

Thanks!

Alvaro.

[1] https://mailarchive.ietf.org/arch/msg/idr/FomvQV2DqjaaRiAcLYLn3LcIdYM
[2] https://mailarchive.ietf.org/arch/msg/idr/wbPNQ-HM2NeR75gR2Or948J9o1I
