Re: [Idr] I-D Action: draft-ietf-idr-error-handling-03.txt
Jakob Heitz <jakob.heitz@ericsson.com> Mon, 10 December 2012 16:31 UTC
From: Jakob Heitz <jakob.heitz@ericsson.com>
To: Chris Hall <chris.hall@highwayman.com>
Date: Mon, 10 Dec 2012 16:30:58 +0000
Message-ID: <828AAFF5-0260-4AA6-BBDC-6C1F69919837@ericsson.com>
In-Reply-To: <005001cdd6da$099f1e90$1cdd5bb0$@highwayman.com>
Cc: "idr@ietf.org" <idr@ietf.org>
Subject: Re: [Idr] I-D Action: draft-ietf-idr-error-handling-03.txt
If, at the time a BGP speaker detects a malformation in a received UPDATE,
it has completely parsed at least one of:

  - Withdrawn routes section
  - NLRI
  - MP_REACH
  - MP_UNREACH

then it assumes that no more of these follow. A capability to say so adds
nothing. The router will behave exactly the same way. Either way, human
intervention is required to restore correct routing.

--
Jakob Heitz.

On Dec 10, 2012, at 5:27 AM, "Chris Hall" <chris.hall@highwayman.com> wrote:

> Rob Shakir wrote (on Sun 09-Dec-2012 at 23:47 +0000):
> ....
>> In my opinion (and as per Jakob's comment earlier in the thread) you
>> are looking at the treat-as-withdraw behaviour in the wrong way. I
>> have written multiple messages about this previously, but let me
>> reiterate some of the discussions.
>
> It seems to me that "treat-as-withdraw" is "safe" if, and only if, all
> NLRI in a broken UPDATE can be identified.
>
> I suggest that this can be achieved in two ways:
>
>   a) if the sender always sends the NLRI attributes
>      as the first attributes -- as per the draft,
>
>      AND
>
>      the receiver knows that is the case -- for which
>      a capability would serve.
>
>   or:
>
>   b) if the receiver requires (at a minimum) all
>      attributes to be correctly "framed".
>
> The advantage of (a) is that one no longer really cares how badly
> broken the attributes are. The disadvantage of (a) is that it
> requires a small change to the protocol (a much smaller change than
> the improved error handling, but nevertheless an extra change).
>
> The advantage of (b) is that it can be applied without any change at
> the sender end. The disadvantage of (b) is that it will not accept
> every conceivable form of broken attribute.
>
> The advantage of "safe" "treat-as-withdraw" is that it does not
> introduce any new inconsistency in the RIB -- "first do no harm".
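Jakob's list names four NLRI-bearing parts of an UPDATE. In the RFC 4271 message layout, two of them, the Withdrawn Routes field and the (IPv4) NLRI field, are delimited by the UPDATE's own length fields. They can therefore be located without parsing any individual path attribute. MP_REACH_NLRI and MP_UNREACH_NLRI, by contrast, sit inside the Path Attributes field. The Python sketch below shows only that top-level split. The function name `split_update` is hypothetical. The sketch also assumes it is given the UPDATE body with the 19-octet BGP header already removed.

```python
from typing import Optional, Tuple


def split_update(body: bytes) -> Optional[Tuple[bytes, bytes, bytes]]:
    """Split an UPDATE body (everything after the 19-octet BGP header)
    into its Withdrawn Routes, Path Attributes and NLRI fields.

    Only the two top-level length fields are checked; no individual path
    attribute is parsed. Returns None if those lengths overrun the body.
    """
    if len(body) < 4:  # two 2-octet length fields at minimum
        return None
    withdrawn_len = int.from_bytes(body[0:2], "big")
    pos = 2 + withdrawn_len
    if pos + 2 > len(body):  # room for Total Path Attribute Length?
        return None
    withdrawn = body[2:pos]
    attrs_len = int.from_bytes(body[pos:pos + 2], "big")
    pos += 2
    if pos + attrs_len > len(body):
        return None
    attrs = body[pos:pos + attrs_len]
    nlri = body[pos + attrs_len:]  # the NLRI field is whatever remains
    return withdrawn, attrs, nlri
```

Because the NLRI field is simply "whatever follows the attributes", a broken attribute inside `attrs` cannot hide it. Only the multiprotocol attributes depend on walking the attribute list.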
>
> I am not arguing that safety is essential -- I am trying to be precise
> about how safety may be achieved, and what the compromises are in
> doing so. If those compromises are unacceptable, then we need to be
> clear on the impact of removing the safety belt, so that it is clear
> whether things are better or worse: under some circumstances we may be
> thrown clear of the pile-up and avoid being burnt to a crisp, or we
> may sail through the windscreen and kiss our backsides goodbye, while
> on the other hand, an air-bag may be better all round; who can tell ?
>
>> The current BGP implementation of session reset optimises solely to
>> make sure it is maintaining single device RIB consistency, and
>> knowledge of correct routing information being used by that local
>> speaker. What this misses out is the other dimension to the
>> correctness, which is that of whether services within the network as
>> a system are functional. Where we pursue anything around the revised
>> error handling functionality that is being discussed in this draft,
>> we balance the correctness of the local device's RIB, against the
>> functionality of the overall network system.
>
> Sure... session-reset is an extreme measure, and can be positively
> destructive... cue sound of many babies being ejected with bathwater.
>
> I'm happy to be counted as a fan of "safe" "treat-as-withdraw". And
> yes, that would preserve the correctness of the RIB, or at least avoid
> incorrectness thereof.
>
> In case (b) the safety of "treat-as-withdraw" means that there is an
> error for which session-reset would continue to be the result --
> namely a "framing" error. For all other errors, case (b) avoids
> session-reset -- hurrah ! Over time case (b) would be overtaken by
> case (a), as more devices are upgraded to the new error handling. So,
> the residual session-reset cases would dwindle away.
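Case (b) depends on every attribute being correctly "framed". Chris defines that term elsewhere in the thread. This sketch assumes it means that each attribute's flags/type/length header, and the value length it announces, fit inside the Path Attributes field. The encoding follows RFC 4271, section 4.3, where the Extended Length flag (0x10) selects a two-octet length. The sketch checks framing only, not attribute contents. It reports which MP_REACH_NLRI (type 14) and MP_UNREACH_NLRI (type 15) attributes it could reach before any break. The function names are hypothetical.

```python
from typing import List, Tuple

MP_REACH_NLRI = 14      # RFC 4760 attribute type codes
MP_UNREACH_NLRI = 15
EXTENDED_LENGTH = 0x10  # flag bit: length field is two octets (RFC 4271)


def walk_attributes(attrs: bytes) -> Tuple[bool, List[int]]:
    """Walk the Path Attributes field checking framing only.

    Returns (framed_ok, types_seen). framed_ok becomes False as soon as
    an attribute header or value would run past the end of the field;
    types_seen lists the attribute types reached before that point.
    """
    types_seen: List[int] = []
    pos = 0
    while pos < len(attrs):
        if pos + 3 > len(attrs):  # flags, type and >= 1 length octet
            return False, types_seen
        flags, attr_type = attrs[pos], attrs[pos + 1]
        if flags & EXTENDED_LENGTH:
            if pos + 4 > len(attrs):
                return False, types_seen
            length = int.from_bytes(attrs[pos + 2:pos + 4], "big")
            pos += 4
        else:
            length = attrs[pos + 2]
            pos += 3
        if pos + length > len(attrs):  # value overruns the field
            return False, types_seen
        types_seen.append(attr_type)
        pos += length
    return True, types_seen


def visible_mp_attributes(attrs: bytes) -> Tuple[bool, List[int]]:
    """Return framing status plus the MP_* attribute types actually seen."""
    framed_ok, types_seen = walk_attributes(attrs)
    mp = [t for t in types_seen if t in (MP_REACH_NLRI, MP_UNREACH_NLRI)]
    return framed_ok, mp


# ORIGIN (type 1), then MP_UNREACH_NLRI for IPv4 unicast with no routes.
good = bytes([0x40, 1, 1, 0, 0x80, 15, 3, 0, 1, 1])
# Same bytes, but ORIGIN claims 200 octets and swallows the MP attribute.
bad = bytes([0x40, 1, 200, 0, 0x80, 15, 3, 0, 1, 1])
print(visible_mp_attributes(good))  # (True, [15])
print(visible_mp_attributes(bad))   # (False, [])
```

In the second case the MP_UNREACH_NLRI is still present on the wire but can no longer be found. That hidden attribute is what makes treat-as-withdraw "unsafe" once framing is lost.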
>
> If "framing errors" are determined to be a significant risk, then I
> guess that's an incentive for the deployment of case (a).
>
> But, "unsafe" "treat-as-withdraw" may still be better than
> session-reset, which we are agreed is simply ghastly. Inconsistencies
> in the RIB may, as you say, be tolerable in the larger context of the
> network, and treatable at an operational level -- getting away from
> the tedious bits and stuff that I keep droning on about.
>
> I note that the inconsistencies which may be introduced by "unsafe"
> "treat-as-withdraw" are perhaps different to other inconsistencies: in
> particular, the operator can no longer tell which routes are good and
> which are bad. In "unsafe" "treat-as-withdraw", each broken UPDATE
> may or may not have contained some NLRI which should have been
> withdrawn, or which are now out of date, but the receiver does not
> know which (if any) NLRI are in that (inconsistent) state !
>
> I note also that Appendix A of the draft waxes lyrical on the subject
> of "Why not Discard UPDATE Messages". With "unsafe"
> "treat-as-withdraw" the effect is to discard *part* of the UPDATE
> message -- the part which may or may not (and you cannot tell which)
> contain NLRI attribute(s) which have been obscured by earlier broken
> attribute(s).
>
> I do not know how to assess the possible impact of "unsafe"
> "treat-as-withdraw"... but Appendix A appears to argue against ?
>
> It is perfectly possible that I have my hands clenched firmly around
> the wrong end of this stick. If the risk of "unsafe"
> "treat-as-withdraw" is understood and it is determined that the cure
> is (generally) not worse than the disease, then I can let go (yay !).
>
>> As such, I think we have to accept that the current protocol
>> behaviour can be *very* damaging to network deployments and hence
>> operators of real networks are prepared to tweak this balance
>> somewhat. The requirements draft that I am continuing to edit tries
>> to lay out a framework whereby one can limit the amount of time over
>> which this inconsistency may affect the network through having means
>> by which RIB consistency may be recovered. For example, these are:
>>
>>  - A more selective means by which ROUTE REFRESH can be achieved
>>    (e.g., one-time ORF, using rt-constrain to refresh a subset of
>>    routes, or building upon the Enhanced GR UPDATE-VERSION message) -
>>    which allows the individual speaker to recover consistency of the
>>    RIB.
>
> That appears to require the receiver to know which NLRI are no longer
> consistent, which is not entirely possible with "unsafe"
> "treat-as-withdraw".
>
>>  - Better ways to be able to do session reset (the observation being
>>    that the session-level error handling causes most problems due to
>>    forwarding outages during it) - which is answered by GR based on
>>    NOTIFICATION, and Enhanced GR.
>
> This would not help with "unsafe" "treat-as-withdraw", since the
> session-reset has been avoided in any case.
>
> However, it would help in case (b). So, for "framing" errors a
> session-reset is required if "unsafe" "treat-as-withdraw" is to be
> avoided, but that session-reset would be mitigated along with all
> other (residual) session-resets.
>
> Mind you, changes in GR will require changes at both ends, and I
> suspect rather larger changes than those required for case (a) "safe"
> "treat-as-withdraw" -- but I guess those GR changes are more generally
> a Good Thing.
>
> ....
>> My view is that we should *not* have a capability to indicate this
>> behaviour. I would like a means by which I am not reliant on 3rd
>> party actions (be it my peers in the dfz, or l3vpn deployments, or
>> all device vendors) to begin to address a risk within my network
>> deployments.
>
> OK... to try to summarise succinctly, I think there are two levels at
> which "safe" "treat-as-withdraw" may be implemented, as above:
>
>   a) where the sender sends NLRI attributes as required by
>      section 3 of the draft...
>
>      ...PLUS a capability... without which the receiver
>      cannot *know* that the sender is being helpful, and
>      has to assume otherwise.
>
>      This can tolerate any (non-NLRI attribute related)
>      aberrations.
>
>   b) without any change at the sender end,
>
>      ...OR where the receiver does not *know* that the
>      sender is being helpful.
>
>      This can tolerate anything except "framing" errors (as
>      defined elsewhere).
>
> Ruling out (a) limits the choice to:
>
>   i) case (b) "safe" "treat-as-withdraw"
>
>  ii) "unsafe" "treat-as-withdraw"
>
> Since "unsafe" "treat-as-withdraw" gives me the screaming hab-dabs, my
> view would be that starting with case (b) is a reasonable compromise,
> as a first step towards case (a).
>
> There is obviously an incentive to deploy improved error handling.
> Let us assume (for a moment) that improved error handling includes the
> case (a) sender behaviour. Early adopters reap the benefit of case
> (b) improved error handling immediately on the devices where new
> software is deployed. And they reap the benefit of case (a) improved
> error handling for their iBGP just as quickly as new software is
> deployed across their network. For eBGP, availability of case (a)
> improved error handling depends on the strength of the incentive --
> but good coverage requires only that the relatively small number of
> Transit Providers adopt reasonably quickly.
>
> However, given some way of determining the (likely ?) impact of
> "unsafe" "treat-as-withdraw", then one could assess whether that is
> better or worse than session-reset (under some circumstances ?) -- in
> the (unlikely ?) event that some particularly dim BGP implementation
> fails to correctly frame a set of attributes. I wish I knew where to
> start to untangle this problem.
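Chris's summary can be read as a small decision table, sketched below under stated assumptions. The sketch assumes the malformed attribute is not itself MP_REACH_NLRI or MP_UNREACH_NLRI, and it leaves that case out. `peer_sends_nlri_first` stands in for the capability proposed in case (a); no such capability is defined in any specification. `allow_unsafe` is a hypothetical local policy knob that chooses between "unsafe" treat-as-withdraw and session reset. All names are illustrative.

```python
from enum import Enum


class Action(Enum):
    SAFE_TREAT_AS_WITHDRAW = "safe treat-as-withdraw"
    UNSAFE_TREAT_AS_WITHDRAW = "unsafe treat-as-withdraw"
    SESSION_RESET = "session reset"


def choose_action(framing_ok: bool,
                  peer_sends_nlri_first: bool,
                  allow_unsafe: bool) -> Action:
    """Pick an error-handling action for a malformed UPDATE.

    Assumes the broken attribute is not an NLRI-carrying attribute.
    peer_sends_nlri_first models the hypothetical case (a) capability.
    """
    if peer_sends_nlri_first:
        # Case (a): NLRI attributes came first and were parsed before
        # the broken attribute, so every NLRI in the UPDATE is known.
        return Action.SAFE_TREAT_AS_WITHDRAW
    if framing_ok:
        # Case (b): every attribute is framed, so MP_REACH_NLRI and
        # MP_UNREACH_NLRI can be found wherever they sit.
        return Action.SAFE_TREAT_AS_WITHDRAW
    # Framing error without case (a): some NLRI may be hidden.
    if allow_unsafe:
        return Action.UNSAFE_TREAT_AS_WITHDRAW
    return Action.SESSION_RESET
```

Ruling out (a) means `peer_sends_nlri_first` is always False. Errors with intact framing then get case (b) handling. A framing error leaves only the choice between "unsafe" treat-as-withdraw and session reset.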
>
> Chris
>
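Rob's first recovery mechanism asks for a more selective ROUTE REFRESH. For comparison, the basic ROUTE-REFRESH message defined in RFC 2918 carries only an AFI/SAFI pair. In response, the peer re-advertises its entire Adj-RIB-Out for that address family. The sketch below encodes that basic message. The function and constant names are illustrative and are not taken from any particular implementation.

```python
import struct

BGP_MARKER = b"\xff" * 16  # all-ones marker (RFC 4271)
BGP_HEADER_LEN = 19        # marker (16) + length (2) + type (1)
ROUTE_REFRESH = 5          # BGP message type code (RFC 2918)


def build_route_refresh(afi: int, safi: int) -> bytes:
    """Build a basic RFC 2918 ROUTE-REFRESH message.

    The only scoping available is the AFI/SAFI pair, so the peer is
    asked to re-send everything it advertises for that address family.
    """
    body = struct.pack("!HBB", afi, 0, safi)  # AFI, Reserved, SAFI
    length = BGP_HEADER_LEN + len(body)
    header = BGP_MARKER + struct.pack("!HB", length, ROUTE_REFRESH)
    return header + body
```

For example, `build_route_refresh(1, 1)` requests IPv4 unicast (AFI 1, SAFI 1). Finer-grained requests, such as the ORF and rt-constrain approaches Rob lists, rely on extensions beyond this 23-octet message.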