Re: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

"UTTARO, JAMES" <ju1738@att.com> Sun, 08 July 2012 13:36 UTC

From: "UTTARO, JAMES" <ju1738@att.com>
To: 'Rob Shakir' <rjs@rob.sh>
Date: Sun, 08 Jul 2012 13:35:51 +0000
Cc: 'idr wg' <idr@ietf.org>, "'grow@ietf.org'" <grow@ietf.org>
Subject: Re: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Rob,

	Sorry, I meant to respond to you sooner.. In re good/bad paths...

Thanks,
	Jim Uttaro

">>> [Jim U>] I guess what I meant was the other paths that are considered good would be treated differently.. So in an environment where only paths with the mal-formed attr are affected by this error condition as opposed to an environment where all paths are affected ( withdrawn ) would create a inconsistent view of the "good" paths across AS domains.. So not so much the "bad" paths but the "good" paths and how they may be treated differently..

[rjs]: I'm not sure I fully understand here:
	- Today: UPDATE is received from element A and found to be erroneous - session is reset, downstream do not see any paths where A was the best-path in the RIB. 
	- With this draft: UPDATE is received from element A, found to be erroneous, downstream still see all other paths where A is the best-path in the RIB.

[rjs]: I'm not sure that this is so much inconsistency of what the "good" paths look like - both the receiving and downstream elements still consider A's paths as valid, other than the ones that were included in the erroneous UPDATE. In both cases, the NLRI contained in the erroneous UPDATE is also not propagated downstream (session reset, or treat-as-withdraw stops the further propagation).
"

I do not know if my concern actually matters.. My thought was that if AS1 advertises P(1)...P(n) to AS2 and AS3 and AS2 has deployed error handling and AS3 has not, then in a mal-formed scenario AS3 would withdraw all paths and AS2 would only send a withdrawal for the bad paths.. There is no way of AS4 knowing that the "good" paths from AS2 are actually suspect..Not sure if it matters and if AS4 could ever actually know without AS2 sending the "good" paths and informing downstream peers..
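
The scenario above can be sketched as a toy model. This is only an illustration under stated assumptions: the function name `downstream_view` and the path labels are hypothetical, and it assumes a single malformed UPDATE from AS1 that covers a subset of the paths.

```python
def downstream_view(paths, bad_paths, error_handling):
    """Return the set of AS1 paths a transit AS still advertises to AS4.

    error_handling=True  -> treat-as-withdraw: only the bad paths are withdrawn.
    error_handling=False -> session reset: every path learned from AS1 is withdrawn.
    """
    if error_handling:
        return set(paths) - set(bad_paths)
    return set()


paths = {"P1", "P2", "P3", "P4"}  # P(1)...P(n) advertised by AS1 (hypothetical labels)
bad = {"P2"}                      # paths carried in the malformed UPDATE

via_as2 = downstream_view(paths, bad, error_handling=True)   # AS2 deployed error handling
via_as3 = downstream_view(paths, bad, error_handling=False)  # AS3 did not

print("AS4 via AS2:", sorted(via_as2))
print("AS4 via AS3:", sorted(via_as3))
```

In this model AS4 keeps the "good" paths via AS2 and loses everything via AS3, and nothing in what AS2 sends tells AS4 that AS3 reset its session with AS1.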

-----Original Message-----
From: Rob Shakir [mailto:rjs@rob.sh] 
Sent: Thursday, June 28, 2012 4:49 AM
To: UTTARO, JAMES
Cc: 'grow@ietf.org'; 'idr wg'
Subject: Re: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi Jim,

Apologies for the delay in replying to this message. Further discussion in-line marked [rjs].

On 25 Jun 2012, at 23:47, UTTARO, JAMES wrote:

> [rjs]: Absolutely, this is the current behaviour. The problem with taking a whole session down in this case is that you now take a risk of inconsistency for all NLRI across that session for the duration that you hold onto the learned NLRI. If one avoids being in the situation where the session is down (e.g., by applying treat-as-withdraw behaviour in cases where one can determine the NLRI) then all other NLRI on the session continue to be updated as they need to be. It is only the NLRI that were included in the erroneous UPDATE that may be affected for looping/black-holing.
> 
> 
>>> [Jim U>] The assumption being that the error was caused by an upstream speaker and is therefore not truly indicative of an issue over the session where the error manifests itself. This seems to make sense in the IPV4 case. I am still a bit concerned as I do not understand how the following is addressed.
> 

[rjs]: Actually, I think treat-as-withdraw applied more generally than optional-transitive only does not necessarily imply that the error was not the direct fault of the upstream speaker. However:

> - There is no way of knowing if the adjacent peer is the speaker that is actually responsible for the malformed attr or is coming from an upstream speaker. I can think of no way of knowing this.. Can it be inferred from the notion that an error is of the syntactic or semantic variety?

[rjs]: This is true; only where we have a mechanism such as the partial bit in the optional transitive attribute can we infer that the directly attached neighbour did not look at the attribute. What the semantic and critical errors that are called out in the draft relate to is the impact of the error on the resulting UPDATE message, rather than the direct neighbour being responsible for it.
[Jim U>] >>> Yup. I was hoping that it may be possible to infer..

> - There seems to be no threshold at which the session is actually taken out of service. It would seem that some number of these types of errors would indicate that a major issue is taking place, and that it should be addressed by severing the speaker that is advertising paths with the malformed attr into the topology. A large number of these errors will cause a large number of withdraw messages to be generated from many peers. What are your thoughts on how this should be addressed?

[rjs]: This was something that has been discussed on the list previously. There are two key questions in this space:

- Do you expect errors that are indicative of a whole box failure, that are not related to a large change on the device (e.g., code upgrade) that affect all prefixes in a manner that the UPDATE message is formed well enough to extract the NLRI?

- Is the state of reaching all prefixes withdrawn (with UPDATEs withdrawing all NLRI being sent to all neighbours) an acceptable state? I think there is (of course) a scaling impact of such UPDATEs being transmitted and parsed to all downstream neighbours, but the impact of such an event is really dependent on assuming that large proportions of the UPDATEs generated become erroneous.

I don't think that I can categorically state that the answer to the former is no, but I am not aware of such a case. In my view, larger scale issues on the BGP speaker (e.g., things that affect memory integrity etc.) result in failures that produce output that is not well enough formed to fall within the "semantic" errors described in the draft. I would (of course) welcome implementors' and testers' feedback on this point.
[Jim U>] >>> Not sure either.. But that being said it would be good to sever a speaker if the number of errors in some time t exceeds a threshold..
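
One way to picture the "errors in some time t exceed a threshold" idea is a per-peer sliding-window counter. This is a minimal sketch, not something either draft specifies: the class name `ErrorThreshold` and the parameters `max_errors` and `window_seconds` are hypothetical.

```python
from collections import deque


class ErrorThreshold:
    """Count erroneous UPDATEs from one peer in a sliding time window."""

    def __init__(self, max_errors, window_seconds):
        self.max_errors = max_errors          # hypothetical operator-set limit
        self.window_seconds = window_seconds  # the "time t" in the discussion
        self.timestamps = deque()

    def record_error(self, now):
        """Record one error at time `now`; return True if the peer should be severed."""
        self.timestamps.append(now)
        # Drop errors that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_errors


peer = ErrorThreshold(max_errors=3, window_seconds=60)
decisions = [peer.record_error(t) for t in (0, 10, 20, 30, 200)]
print(decisions)
```

Below the limit the speaker keeps applying treat-as-withdraw; once the count inside the window goes over it, the session would be taken down as it is today.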

>> I would expect all solutions implemented in response to these requirements to be optional. If the risk of incorrectness is unacceptable to you/an operator, then you should absolutely not enable any of these mechanisms. In a number of networks that I have operated, designed and architected, I am prepared to accept the risk of incorrectness, as I consider it acceptable when compared to the risk of complete service outages in terms of impact to my customers during such incidents. At the moment, without the work described through the requirements outlined in this draft I do not have the means to make that call...
>> [Jim U>] I do not understand how it is possible to make this configurable on a per session or AS basis..I would think all speakers participating in a routing context would have to adhere to the same rules for a consistent view across domains.. In my reading of the IDR draft it seems that it would be a MUST.. Maybe I should not be considering that IDR draft as the actual realization of the reqs..
> 
> [rjs]: The IDR draft is the solution for some of the requirements -- particularly those described in Section 3 of the GROW draft.
>>> [Jim U>] Got it..
> 
> [rjs]: I do not see why this behaviour needs to be consistent across domains?
>>> [Jim U>] Can you explain this

[rjs]: See later point about "good"/"bad" paths. 

> 
> [rjs]: Essentially, if I receive an invalid UPDATE message, and apply treat-as-withdraw, if the advertising speaker did not know that this was erroneous then I end up with a different view of what is in the RIB than the advertising speaker does. If this was a prefix I had no other route to, then I may black-hole, if it was one where it was a more-specific of some larger prefix, then we end up with the potential for loops.
>>> [Jim U>] Yes.. I am not sure I like the notion of forwarding loops especially for large flows.. 

[rjs]: The potential for loops exists in some specific scenarios I think -- especially those where there is a covering aggregate advertised to a speaker, and a more specific that is advertised within that aggregate. If this is the case, then in some cases, rather than forwarding to the device advertising the more specific (i.e., the one that was withdrawn), traffic follows the covering route instead. I think the below example shows something like this - if 10.0.0.0/24 is advertised from C to B, and A forwards packets destined to 10.0.0.0/24 during the time that this prefix is withdrawn, then this will loop. Now, I think that this is a feature of this topology anyway, since where B-C is down, there will be loops for 10/8 in B-D.

                         <-- 10.0.0.0/8 --
[ A ] --0.0.0.0/0--> [ B ] ---0.0.0.0/0---> [ D ]
                      |
                  10.0.0.0/24
                      |
                    [ C ]
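
The diagram can be replayed with a small longest-prefix-match model. This is a sketch under assumptions: the arrows are read as forwarding direction (A and B follow a default route, D sends 10.0.0.0/8 towards B, B sends 10.0.0.0/24 towards C), and the helper names `next_hop` and `trace` are hypothetical.

```python
import ipaddress


def next_hop(fib, dst):
    """Longest-prefix match: return the next hop for dst, or None if no route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, nh) for net, nh in fib.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def trace(fibs, start, dst, max_hops=8):
    """Follow next hops from `start`; stop at delivery, drop, or the hop limit."""
    path, node = [start], start
    for _ in range(max_hops):
        nh = next_hop(fibs[node], dst)
        if nh is None or nh == "local":
            break
        path.append(nh)
        node = nh
    return path


net = ipaddress.ip_network
fibs = {
    "A": {net("0.0.0.0/0"): "B"},
    "B": {net("0.0.0.0/0"): "D", net("10.0.0.0/24"): "C"},
    "C": {net("10.0.0.0/24"): "local"},
    "D": {net("10.0.0.0/8"): "B"},
}

before = trace(fibs, "A", "10.0.0.1")  # /24 still installed at B
del fibs["B"][net("10.0.0.0/24")]      # B treats the /24 as withdrawn
after = trace(fibs, "A", "10.0.0.1")   # B falls back to its default route

print(before)
print(after)
```

With the /24 installed, traffic goes A, B, C. Once B drops it, B follows its default to D, D follows 10.0.0.0/8 back to B, and the packet bounces until the hop limit.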

[rjs]: There is then a discussion as to whether one would actually expect such topologies to occur in practical terms. Really, I'd rather expect that there are blackholes (e.g., I only had one path to A, and it got withdrawn, if anyone forwards me packets destined for A, then I drop them) or (more likely in an Internet DFZ perspective) I converge to an alternate path I had to that NLRI.

[rjs]: The reason that this is highlighted in the text is that introducing behaviour into the protocol such that loops may occur is obviously a compromise to protocol correctness, that may be a compromise to overall network forwarding integrity. It's important that this risk is understood, and balanced against the wider impact of session tear down. 

> 
> [rjs]: If I am prepared to accept the black-holing or loops for the NLRI in the erroneous UPDATE as a risk, in favour of keeping the remaining NLRI working (and being updated/withdrawn if they change), then this is a local decision and I do not need to imply any behaviour of the neighbouring domains.
>>> [Jim U>] I guess what I meant was the other paths that are considered good would be treated differently.. So having one environment where only paths with the mal-formed attr are affected by this error condition, as opposed to an environment where all paths are affected (withdrawn), would create an inconsistent view of the "good" paths across AS domains.. So not so much the "bad" paths but the "good" paths and how they may be treated differently..

[rjs]: I'm not sure I fully understand here:
	- Today: UPDATE is received from element A and found to be erroneous - session is reset, downstream do not see any paths where A was the best-path in the RIB. 
	- With this draft: UPDATE is received from element A, found to be erroneous, downstream still see all other paths where A is the best-path in the RIB.

[rjs]: I'm not sure that this is so much inconsistency of what the "good" paths look like - both the receiving and downstream elements still consider A's paths as valid, other than the ones that were included in the erroneous UPDATE. In both cases, the NLRI contained in the erroneous UPDATE is also not propagated downstream (session reset, or treat-as-withdraw stops the further propagation).

> [rjs] I'd say that it's not just applicable to IPv[46] in the Internet - but to numerous AFIs (there is a definite use-case for these solutions in L3VPN environments for instance). I am not saying that this is applicable or desirable to be turned on for all AFIs -- but it seems to me that this is a per-operator, per-deployment decision, not a per-AFI one. For instance, if we get an RTC UPDATE that is malformed, an operator may not want to tear down a session if it also carries other AFIs (e.g., VPNv[46] also) - in that case, the operator may want to treat this UPDATE as withdrawing the {as, route-target} NLRI (consider that we have no *standardized* multi-session mechanism yet, and there are potential scaling impacts of multiple sessions).
> 
>>> [Jim U>] Quite honestly AFs such as RT-C, Flowspec, etc... where the info being propagated is more akin to "configuration" not path info should persist regardless of the session.. This goes to the heart of the discussion that BGP is used for many fields of use that require persistence. It is not only paths that use BGP for dissemination.. I would prefer that this solution is limited to AFs that disseminate reachability/path info not configuration info..

[rjs]: The persistence discussion is a further optimization over this work I feel, it addresses (as you correctly said before) more failure cases. In the case that one UPDATE containing modifications to this configuration information is invalid, is it worth making the rest of it "stale" (and not able to be updated)? I think that in the case, you also want to keep as much of the RIB/config info up-to-date as possible, therefore targeting the error handling mechanism to the contained NLRI still seems advantageous.

[rjs]: Now, the question may be whether treat-as-withdraw is suitable in these cases -- is it better to remove the flow specification, or RT from those installed, or keep it and know that it might be stale? I'd be interested to hear your thoughts here.

[rjs]: On the point of addressing this per-AF, perhaps the text to add to the draft is that behaviour such as treat-as-withdraw must (MUST?) be configurable on a per-AFI basis? The problem with stating something like this, is what does one do when there is no multi-session, and it is disabled for one AFI, yet enabled for another? 
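
The per-AFI question can be made concrete with a small sketch. Everything here is hypothetical (the `Action` enum, `handle_error`, the AFI labels and the policy table); it only assumes a single session carrying several AFIs with no multi-session support.

```python
from enum import Enum


class Action(Enum):
    TREAT_AS_WITHDRAW = "treat-as-withdraw"
    SESSION_RESET = "session-reset"


def handle_error(afi, session_afis, policy):
    """Return (action, AFIs affected) for an erroneous UPDATE received in `afi`.

    `policy` maps an AFI label to True when treat-as-withdraw is enabled for it.
    """
    if policy.get(afi, False):
        # Only NLRI of this AFI (those in the erroneous UPDATE) are touched.
        return Action.TREAT_AS_WITHDRAW, {afi}
    # Without multi-session, a reset takes down every AFI on the session,
    # including those that have treat-as-withdraw enabled.
    return Action.SESSION_RESET, set(session_afis)


policy = {"ipv4-unicast": True, "rt-constraint": False}
session = ["ipv4-unicast", "rt-constraint"]

enabled_case = handle_error("ipv4-unicast", session, policy)
disabled_case = handle_error("rt-constraint", session, policy)

print(enabled_case)
print(disabled_case)
```

The second call shows the problem raised above: an error in the AFI where the behaviour is disabled still resets the session, so the AFI where it is enabled loses its routes as well.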

Thanks,
r.