Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

<rob.shakir@bt.com> Mon, 23 July 2012 15:11 UTC

From: rob.shakir@bt.com
To: christopher.morrow@gmail.com, rjs@rob.sh
Date: Mon, 23 Jul 2012 16:08:53 +0100
Message-ID: <CC332685.44846%rob.shakir@bt.com>
In-Reply-To: <CAL9jLaYK6jCKVkWEK3JAyiD8hxZbL_QjT0XvSZe=UfCYycLWgw@mail.gmail.com>
Cc: grow@ietf.org
Subject: Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi Chris,

Sure - I will push an -05 version with the changes that were highlighted
this week (including integrating the changes that Shane and I discussed in
this thread). It would be great to progress this draft to the IESG
following this.

I don't believe that there are other comments to deal with. The discussion
between Russ and Robert did not relate to this draft (but was rather a
discussion of draft-white-grow-overlapping-routes-00 with the wrong
subject line).

Many thanks,
r.

On 23/07/2012 03:17, "Christopher Morrow" <christopher.morrow@gmail.com>
wrote:

>Rob,
>Did you want to spin a new version of the draft and get final comments
>from Shane? then move this along to IESG-land?
>
>Or are there still comments/issues to deal with from other folk? (the
>russ/robert discussion seemed to peter out as well)
>
>-chris
>
>On Wed, Jul 18, 2012 at 1:54 PM, Rob Shakir <rjs@rob.sh> wrote:
>> Hi Shane,
>>
>> Thanks for the comments again, and apologies (again!) for the delay in
>>responding.
>>
>> Please find my responses in-line as [rjs].
>>
>> On 11 Jul 2012, at 17:50, Shane Amante wrote:
>>
>>>>> [...snip...]
>>>>
>>>> [rjs]: I tried to add something to cover this that fits in with
>>>>Section 1.1:
>>>>
>>>>                       <t>
>>>>                           The combination of the increased number of
>>>>deployments of BGP-4 as an intra-AS routing protocol, its use for the
>>>>propagation of additional types of routing and service information,
>>>>and the growth of IP services has resulted in a substantial increase
>>>>in the volume of information carried within BGP-4. In numerous
>>>>networks, RIB sizes of the order of millions of entries exist, with
>>>>particular high-scale points existing at BGP speakers performing
>>>>aggregation or functionality designed to improve utilisation of
>>>>network resources (e.g., route reflector hierarchies). Whilst clearly
>>>>an increase in the amount of routing information carried in BGP
>>>>results in greater impact to services during failures, it is also
>>>>critical to their recovery time. The increased time to compute new
>>>>paths following a failure and subsequently re-learn them following
>>>>recoveries results in greater impact of failures within the protocol,
>>>>and hence adds further weight to the requirement to avoid failures
>>>>affecting all routing, or service, information carried via a
>>>>particular adjacency. Whilst an argument could be made that the
>>>>convergence time of BGP-4 can be reduced through additional
>>>>computational resource being deployed, it is notable that significant
>>>>challenges continue to exist for operators in scaling BGP-4, and hence
>>>>mechanisms which improve the scalability of the protocol are of
>>>>particular note.
>>>>                       </t>
>>>
>>>
>>> The above looks good, but I've made some minor modifications.  See
>>>below.
>>> ---snip---
>>> The combination of the increased number of deployments of BGP-4 as an
>>>intra-AS routing protocol, its use for the propagation of additional
>>>types of routing and service information, and the growth of IP services
>>>has resulted in a substantial increase in the volume of information
>>>carried within BGP-4. In numerous networks, RIB sizes of the order of
>>>millions of entries exist within individual BGP speakers, with
>>>particularly high-scale points exhibited at BGP speakers performing
>>>aggregation or functionality designed to improve utilisation of network
>>>resources (e.g., route reflector hierarchies). Clearly, an increase in
>>>the amount of routing information carried in BGP results in greater
>>>impact to services during failures, which is only amplified by a
>>>corresponding increase in recovery times. Following a failure, there
>>>is a substantial recovery time to learn, compute and distribute new
>>>paths, which results in a greater observed impact to services affected,
>>>and hence adds further weight to the requirement to avoid failures
>>>altogether or, at least, mitigate their impact to the narrowest scope
>>>possible (e.g., a specific NLRI). Whilst an argument could be made that
>>>convergence time of BGP-4 could potentially be reduced through
>>>deployment of additional computational resource, it is notable that
>>>this solution is not necessarily straightforward from an implementation
>>>or deployment point-of-view (e.g., scaling computation resources within
>>>a single address-family is difficult).  Thus, significant challenges
>>>continue to exist for operators when scaling BGP-4 deployments, and
>>>hence mechanisms which improve the scalability of BGP-4 are very
>>>important.
>>> ---snip---
>>
>> [rjs]: Thanks, other than some minor editorial changes I adopted this
>>paragraph -- it seems like a good hybrid.
>>
>>
>>>>> [...snip...]
>>>>
>>>> [rjs]: I'm not quite clear on whether this gets the point across
>>>>completely - do we think that it is just that things have become in
>>>>the realm of provisioning activities, or rather is it that there are
>>>>more and more functions that are overloading onto BGP. I agree that
>>>>this sentence doesn't necessarily capture that - but do you think that
>>>>it's the generic information transfer protocol between PEs, as well as
>>>>replacing provisioning mechanisms?
>>>
>>> I believe that you are correct, and better off, in stating "more and
>>>more functions that are overloaded (sic) onto BGP".  Although, I'm not
>>>sure that "overloaded" is an appropriate adjective.
>>
>> [rjs]: I guess there may be negative connotations of 'overloaded';
>>what I really mean is maybe "layered" onto BGP -- poor wording
>>perhaps.
>>
>>> The point I was trying to get at is as follows.  I think there's a
>>>continuum of information exchanged within BGP from real-time
>>>information (reachability) to less dynamic (perhaps, even static)
>>>information, with _examples_ of the latter being
>>>auto-discovery/provisioning use cases.  While traditional applications,
>>>such as vanilla Internet service for which BGP was originally designed,
>>>only fall into the "real-time information" category ... there are a lot
>>>of new(er) applications that do not fit "neatly" in a single category
>>>and, in fact, span the range of real-time to less dynamic categories
>>>depending on which facet of a particular protocol you look at,
>>>(examples being: IPVPN, MVPN, VPLS-BGP, etc.).  Regardless, I don't
>>>think it's prudent to make value judgements (particularly at this point
>>>in time when these protocols are already widely deployed and
>>>successful) as to the "correctness" of these functions/services being
>>>in BGP, since that's bound to be very subjective.  Rather, we need to
>>>recognize the world for what it is today, which is why I think use of
>>>the word "overloaded" may be inappropriate.  Furthermore, I think that
>>>talking about this in such a context is only recognizing a symptom
>>>(the more complex the system, the higher the probability is to
>>>introduce errors), when in reality we should be trying to focus in on
>>>the root problem: since we've put so many eggs in one basket, we need
>>>unnoticeable (or, faster) recovery from errors that affect real-time,
>>>reachability information.
>>
>> [rjs]: Completely agree with this. I think my poor choice of wording
>>perhaps portrayed my view as negative -- rather, the key point for me is
>>that the robustness and error handling that we are discussing here is
>>designed with the vanilla Internet service as the baseline - and as we
>>extend the protocol to different deployment cases (no judgement about
>>the value of which is made), then some of the initial assumptions
>>perhaps don't hold true. I think this is in agreement with yourself,
>>insofar as I think we would both assert that for the real-time
>>information, potentially the behaviour required in a number of areas of
>>the protocol is not the same as the behaviour required for relatively
>>static information.
>>
>>>>
>>>> [rjs]: Yes - the intention is to define this based on the narrowest
>>>>set possible, the reason that I used this wording is that (in my view)
>>>>this is defined by the NLRI actually in the message (if there were
>>>>differing path attributes for NLRI, then we expect that this is packed
>>>>into a second UPDATE message). Perhaps a hybrid of our wording would
>>>>clarify this (unless you think the assertion above is erroneous?).
>>>
>>> I see your point now.  How about the following hybrid text?
>>> ---snip---
>>> ... it is a requirement of any enhanced error handling mechanism to
>>>constrain the error handling so that it is narrowly focused on the NLRI
>>>contained within the bad UPDATE message.
>>> ---snip---
>>
>> [rjs]: Sure, this sounds good.
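
To make the agreed requirement concrete, here is a minimal sketch of error handling that is scoped to the NLRI carried in the bad UPDATE, rather than to the whole session. Everything in it is an assumption made for illustration and is not taken from the draft: the `Update` class, the dictionary-based RIB, the two toy semantic checks, and the choice to treat the affected NLRI as withdrawn. The check also reports every failing path attribute, not just the first, in line with the "one or more path attributes" wording discussed in this thread.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of an UPDATE that has already been parsed
# successfully (i.e. it is syntactically valid).
@dataclass
class Update:
    nlri: list    # prefixes announced in this UPDATE
    attrs: dict   # path attribute name -> value (illustrative representation)

def bad_attributes(update):
    """Return the names of all path attributes that fail a semantic check."""
    bad = []
    # Toy check: ORIGIN must be one of the three defined values.
    if update.attrs.get("ORIGIN") not in ("IGP", "EGP", "INCOMPLETE"):
        bad.append("ORIGIN")
    # Toy check: AS_PATH must be a list of positive AS numbers.
    as_path = update.attrs.get("AS_PATH")
    if not isinstance(as_path, list) or not all(
        isinstance(asn, int) and asn > 0 for asn in as_path
    ):
        bad.append("AS_PATH")
    return bad

def apply_update(rib, update):
    """Apply an UPDATE to the RIB; return the list of erroneous attributes."""
    bad = bad_attributes(update)
    if bad:
        # Assumed action: drop only the prefixes named in the bad UPDATE.
        # Routes learned from other UPDATEs on the same session are untouched.
        for prefix in update.nlri:
            rib.pop(prefix, None)
        return bad
    for prefix in update.nlri:
        rib[prefix] = dict(update.attrs)
    return []
```

With this shape, an UPDATE carrying two bad attributes for 192.0.2.0/24 removes only that prefix from the RIB; prefixes learned from earlier, valid UPDATEs on the same adjacency stay in place.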
>>
>>>>> 3)  Section 2:
>>>>> ---snip---
>>>>> contained within the message.  Since in this case, the message
>>>>> received from the remote peer is syntactically valid, it is
>>>>> considered that such an UPDATE is indicative of erroneous data within
>>>>> a path attribute.  [...]
>>>>> ---snip---
>>>>> s/path attribute/path attributes/
>>>>
>>>> [rjs]: Is the point here "one or more path attributes"? I'm not sure
>>>>I quite understand the nit? :-)
>>>
>>> Yes, sorry: "one or more path attributes".  (My point was you can't
>>>predict, here anyway, that it will only be a single path attribute
>>>that is a problem.  Ideally, a more robust error-handling solution
>>>would not make such assumptions :-).
>>
>> [rjs]: ACK, updated this to 'one or more' :-)
>>
>>>> Many thanks again for your comments - if you could cast your eyes
>>>>over the above corrections, and let me know if you feel they're
>>>>sufficient, that'd be fantastic.
>>>
>>> And, thank you Rob for your excellent work on this.
>>
>> [rjs]: No worries - I'll take a read through and submit an -05 of the
>>draft that merges the edits we've discussed in this thread.
>>
>> Thanks again for the comments,
>> r.
>>
>> _______________________________________________
>> GROW mailing list
>> GROW@ietf.org
>> https://www.ietf.org/mailman/listinfo/grow