Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

On Mon, Jul 23, 2012 at 11:08 AM,  <rob.shakir@bt.com> wrote:
> Hi Chris,
>
> Sure - I will push an -05 version with the changes that were highlighted
> this week (including integrating the changes that Shane and I discussed in
> this thread). It would be great to progress this draft to the IESG
> following this.

sounds good to me, I think submissions may be blocked for a bit while
ietf-meeting-start happens... but once the new version goes in I'll
send (please ping me to remind me when it submits) iesg publication
requests out.

> I don't believe that there are other comments to deal with. The discussion
> between Russ and Robert did not relate to this draft (but was rather a
> discussion of draft-white-grow-overlapping-routes-00 with the wrong
> subject line).

ah, that was my feeling as well... go thread overlap!

-chris

>
> Many thanks,
> r.
>
> On 23/07/2012 03:17, "Christopher Morrow" <christopher.morrow@gmail.com>
> wrote:
>
>>Rob,
>>Did you want to spin a new version of the draft and get final comments
>>from Shane? then move this along to IESG-land?
>>
>>Or are there still comments/issues to deal with from other folk? (the
>>russ/robert discussion seemed to peter out as well)
>>
>>-chris
>>
>>On Wed, Jul 18, 2012 at 1:54 PM, Rob Shakir <rjs@rob.sh> wrote:
>>> Hi Shane,
>>>
>>> Thanks for the comments again, and apologies (again!) for the delay in
>>>responding.
>>>
>>> Please find my responses in-line as [rjs].
>>>
>>> On 11 Jul 2012, at 17:50, Shane Amante wrote:
>>>
>>>>>> [...snip...]
>>>>>
>>>>> [rjs]: I tried to add something to cover this that fits in with
>>>>>Section 1.1:
>>>>>
>>>>>                       <t>
>>>>>                           The combination of the increased number of
>>>>>deployments of BGP-4 as an intra-AS routing protocol, its use for the
>>>>>propagation of additional types of routing and service information,
>>>>>and the growth of IP services has resulted in a substantial increase
>>>>>in the volume of information carried within BGP-4. In numerous
>>>>>networks, RIB sizes of the order of millions of entries exist, with
>>>>>particular high-scale points existing at BGP speakers performing
>>>>>aggregation or functionality designed to improve utilisation of network
>>>>>resources (e.g., route reflector hierarchies). Whilst clearly an
>>>>>increase in the amount of routing information carried in BGP results in
>>>>>greater impact to services during failures, it is also critical to
>>>>>their recovery time. The increased time to compute new paths following
>>>>>a failure and subsequently re-learn them following recoveries results
>>>>>in greater impact of failures within the protocol, and hence adds
>>>>>further weight to the requirement to avoid failures affecting all
>>>>>routing, or service, information carried via a particular adjacency.
>>>>>Whilst an argument could be made that the convergence time of BGP-4 can
>>>>>be reduced through additional computational resource being deployed, it
>>>>>is notable that significant challenges continue to exist for operators
>>>>>in scaling BGP-4, and hence mechanisms which improve the scalability of
>>>>>the protocol are of particular note.
>>>>>                       </t>
>>>>
>>>>
>>>> The above looks good, but I've made some minor modifications.  See
>>>>below.
>>>> ---snip---
>>>> The combination of the increased number of deployments of BGP-4 as an
>>>>intra-AS routing protocol, its use for the propagation of additional
>>>>types of routing and service information, and the growth of IP services
>>>>has resulted in a substantial increase in the volume of information
>>>>carried within BGP-4. In numerous networks, RIB sizes of the order of
>>>>millions of entries exist within individual BGP speakers, with
>>>>particularly high-scale points exhibited at BGP speakers performing
>>>>aggregation or functionality designed to improve utilisation of network
>>>>resources (e.g., route reflector hierarchies). Clearly, an increase in
>>>>the amount of routing information carried in BGP results in greater
>>>>impact to services during failures, which is only amplified by a
>>>>corresponding increase in recovery times. Following a failure, there
>>>>is a substantial recovery time to learn, compute and distribute new
>>>>paths, which results in a greater observed impact to services affected,
>>>>and hence adds further weight to the requirement to avoid failures
>>>>altogether or, at least, mitigate their impact to the narrowest scope
>>>>possible (e.g., a specific NLRI). Whilst an argument could be made that
>>>>convergence time of BGP-4 could potentially be reduced through
>>>>deployment of additional computational resource, it is notable that
>>>>this solution is not necessarily straightforward from an implementation
>>>>or deployment point of view (e.g., scaling computation resources within
>>>>a single address-family is difficult).  Thus, significant challenges
>>>>continue to exist for operators when scaling BGP-4 deployments, and
>>>>hence mechanisms which improve the scalability of BGP-4 are very
>>>>important.
>>>> ---snip---
>>>
>>> [rjs]: Thanks, other than some minor editorial changes I adopted this
>>>paragraph -- it seems like a good hybrid.
>>>
>>>
>>>>>> [...snip...]
>>>>>
>>>>> [rjs]: I'm not quite clear on whether this gets the point across
>>>>>completely - do we think that it is just that things have become in
>>>>>the realm of provisioning activities, or rather is it that there are
>>>>>more and more functions that are overloading onto BGP. I agree that
>>>>>this sentence doesn't necessarily capture that - but do you think that
>>>>>it's the generic information transfer protocol between PEs, as well as
>>>>>replacing provisioning mechanisms?
>>>>
>>>> I believe that you are correct, and better off, in stating "more and
>>>>more functions that are overloaded (sic) onto BGP".  Although, I'm not
>>>>sure that "overloaded" is an appropriate adjective.
>>>
>>> [rjs]: I guess there may be negative connotations of 'overloaded', I
>>>guess what I really mean is maybe "layered" onto BGP -- poor wording
>>>perhaps.
>>>
>>>> The point I was trying to get at is as follows.  I think there's a
>>>>continuum of information exchanged within BGP from real-time
>>>>information (reachability) to less dynamic (perhaps, even static)
>>>>information, with _examples_ of the latter being
>>>>auto-discovery/provisioning use cases.  While traditional applications,
>>>>such as vanilla Internet service for which BGP was originally designed,
>>>>only fall into the "real-time information" category ... there are a lot
>>>>of new(er) applications that do not fit "neatly" in a single category
>>>>and, in fact, span the range of real-time to less dynamic categories
>>>>depending on which facet of a particular protocol you look at,
>>>>(examples being: IPVPN, MVPN, VPLS-BGP, etc.).  Regardless, I don't
>>>>think it's prudent to make value judgements (particularly at this point
>>>>in time when these protocols are already widely deployed and
>>>>successful) as to the "correctness" of these functions/services being
>>>>in BGP, since that's bound to be very subjective.  Rather, we need to
>>>>recognize the world for what it is today, which is why I think use of
>>>>the word "overloaded" may be inappropriate.  Furthermore, I think that
>>>>talking about this in such a context is only recognizing a symptom (the
>>>>more complex the system, the higher the probability is to introduce
>>>>errors), when in reality we should be trying to focus in on the root
>>>>problem: since we've put so many eggs in one basket, we need
>>>>unnoticeable (or, faster) recovery from errors that affect real-time,
>>>>reachability information.
>>>
>>> [rjs]: Completely agree with this. I think my poor choice of wording
>>>perhaps portrayed my view as negative -- rather, the key point for me is
>>>that the robustness and error handling that we are discussing here is
>>>designed with the vanilla Internet service as the baseline - and as we
>>>extend the protocol to different deployment cases (no judgement about
>>>the value of which is made), then some of the initial assumptions
>>>perhaps don't hold true. I think this is in agreement with yourself,
>>>insofar as I think we would both assert that for the real-time
>>>information, potentially the behaviour required in a number of areas of
>>>the protocol is not the same as the behaviour required for relatively
>>>static information.
>>>
>>>>>
>>>>> [rjs]: Yes - the intention is to define this based on the narrowest
>>>>>set possible, the reason that I used this wording is that (in my view)
>>>>>this is defined by the NLRI actually in the message (if there were
>>>>>differing path attributes for NLRI, then we expect that this is packed
>>>>>into a second UPDATE message). Perhaps a hybrid of our wording would
>>>>>clarify this (unless you think the assertion above is erroneous?).
>>>>
>>>> I see your point now.  How about the following hybrid text?
>>>> ---snip---
>>>> ... it is a requirement of any enhanced error handling mechanism to
>>>>constrain the error handling so that it is narrowly focused on the NLRI
>>>>contained within the bad UPDATE message.
>>>> ---snip---
>>>
>>> [rjs]: Sure, this sounds good.
>>>
>>>>>> 3)  Section 2:
>>>>>> ---snip---
>>>>>> contained within the message.  Since in this case, the message
>>>>>> received from the remote peer is syntactically valid, it is
>>>>>> considered that such an UPDATE is indicative of erroneous data within
>>>>>> a path attribute.  [...]
>>>>>> ---snip---
>>>>>> s/path attribute/path attributes/
>>>>>
>>>>> [rjs]: Is the point here "one or more path attributes"? I'm not sure
>>>>>I quite understand the nit? :-)
>>>>
>>>> Yes, sorry: "one or more path attributes".  (My point was you can't
>>>>predict, here anyway, that it will only be a single path attribute that is
>>>>a problem.  Ideally, a more robust error-handling solution would not
>>>>make such assumptions :-).
>>>
>>> [rjs]: ACK, updated this to 'one or more' :-)
>>>
>>>>> Many thanks again for your comments - if you could cast your eyes
>>>>>over the above corrections, and let me know if you feel they're
>>>>>sufficient, that'd be fantastic.
>>>>
>>>> And, thank you Rob for your excellent work on this.
>>>
>>> [rjs]: No worries - I'll take a read through and submit an -05 of the
>>>draft that merges the edits we've discussed in this thread.
>>>
>>> Thanks again for the comments,
>>> r.
>>>
>>> _______________________________________________
>>> GROW mailing list
>>> GROW@ietf.org
>>> https://www.ietf.org/mailman/listinfo/grow
>