Re: [Idr] draft-uttaro-idr-bgp-persistence-00

Hi Robert,

>Hi Bruno,
>
>Great comparison between GR and persistence!
>
>
>However, there are two significant points which need to be
>highlighted ...
>
>1.
>
>- time t for GR is max expressed as 12 bits in seconds which is 68 min
>... if session does not come up for that long I think it is wise to
>clean up
>
>- time t for persistence is indicated in the draft to be in case of
>L2VPN: "The persist-timer should be set to a large value on the order
>of days to infinity." In case of L3VPN: "The persist-timer should be
>set to a large value on the order of hours to a few days."

If you believe 68 minutes is the right max value:
- IMHO the max timer duration should be AS- and application-specific.
There is no single right value. That's also the position taken by GR.
This matters especially because some routes may be rather static.
- Where does 68 minutes come from? Why would this be the right value?
Why are you more restrictive than the BGP RFC (Hold Time)?
- For consistency, I guess you will also argue against defining a long
timer value in the GR respin draft, as was proposed during the
meeting.
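
For reference, here is the arithmetic behind the two limits mentioned
above, as a small Python sketch. It assumes the field widths from the
base specs (a 12-bit Restart Time in the GR capability, a 16-bit Hold
Time in the OPEN message); the helper name max_seconds is mine.

```python
def max_seconds(bits: int) -> int:
    """Largest value an unsigned field of `bits` bits can carry, in seconds."""
    return (1 << bits) - 1

# Assumed field widths: 12 bits for GR Restart Time, 16 bits for Hold Time.
gr_restart_max = max_seconds(12)
hold_time_max = max_seconds(16)

print(gr_restart_max, round(gr_restart_max / 60, 2))   # 4095 68.25
print(hold_time_max, round(hold_time_max / 3600, 2))   # 65535 18.2
```

So the 68-minute ceiling is a consequence of the field size rather than
an operational recommendation, and the Hold Time field already allows
roughly 18 hours.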

>2.
>
>In both GR and persistence the key to using the path on BGP speakers in
>the data plane is in next hop liveness detection. Be it in scope of the
>draft or out of scope does not really matter. Ingress as you call it
>should not care if RR says this path is suspicious if he can reach just
>fine the next hop of the prefix.

No.
BGP Next-Hop liveness is important and needs to be checked. But this is
only one part of the route validity problem, as it only checks the BGP
Next-Hop/ASBR. Any downstream BGP router could withdraw the route, and
you (egress or ingress) would not know, because the BGP session is down.
Hence it's important to mark this path (as "stale" or "not refreshed",
if you prefer), and ingress should care.
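
To make the point concrete, here is a rough Python sketch of how an
ingress could combine both checks. It is only an illustration, not
what the draft specifies: the Path class, the stale flag and
select_path are hypothetical names, and real best-path selection has
many more steps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    next_hop: str
    stale: bool = False   # hypothetical "stale" / "not refreshed" marking

def select_path(paths, reachable_next_hops):
    """Drop paths whose next hop is unreachable, then prefer non-stale ones."""
    usable = [p for p in paths if p.next_hop in reachable_next_hops]
    if not usable:
        return None
    # False sorts before True, so a stale path is only a last resort.
    return min(usable, key=lambda p: p.stale)

paths = [Path("PE1", stale=True), Path("PE2")]
print(select_path(paths, {"PE1", "PE2"}).next_hop)  # PE2
print(select_path(paths, {"PE1"}).next_hop)          # PE1
```

Next-hop reachability alone would treat PE1 and PE2 as equals in the
first call; the stale marking is what lets ingress prefer the path
that is still being refreshed.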

>So this is not that we like GR and we do not like persistence. All we
>are saying is that the only real delta between GR and persistence is
>this new signalling informing that path is "perhaps broken". This is
>this "perhaps" which many folks have a problem with.
>
>
>Best regards,
>R.
>
>
>> Russ,
>>
>>> From: Russ White, Sent: Wednesday, November 16, 2011 4:21 AM
>>>
>>>> I am personally completely not convinced that there is any value in
>>>> informing my peers that one of my BGP sessions went down. You either
>>>> have reachability to next hop and can attract traffic or you do not
>>>> and if so you withdraw.
>>>>
>>>> Telling peers that "I may be perhaps used to reach prefix X as last
>>>> resort" is of highly questionable value.
>>>
>>> I would go farther -- this isn't questionable, it's really bad. If you
>>> have a route you know exists, but you don't want people to use, set
>>> things so it's a "last resort" (wait for BGP, wait for LDP, etc). If
>>> you don't know whether or not you really have reachability, don't
>>> advertise it.
>>
>> As you don't comment on Graceful Restart (GR), I assume that you are
>> fine with GR.
>> Are you?
>>
>> Then I think we need to make a distinction between the information we
>> have on a route, and the routing decision we make based on that
>> information.
>>
>> a) failure assumption:
>> let's assume a PE can lose both of its iBGP sessions toward its
>> redundant RRs.
>> Note: the assumption is the same for GR and persistence.
>>
>> b) information available:
>> Once the BGP session(s) are down, we have the following information
>> available on the BGP router: cause of failure, route tags, time (t)
>> elapsed since the session failure, local configuration/preference
>> Note: idem for GR and persistence.
>>
>> At that point, I would say that once t>0, we have no certainty on the
>> validity of the route (from the BGP peer/BGP Next-Hop and from
>> downstream BGP routers). And the longer the duration, the less
>> certainty.
>> Note: for a given "t", this is the same for GR and persistence. Hence
>> I don't get how you claim that "it's really bad" for persistence while
>> it's ok for GR.
>>
>> c) routing decision:
>> Given the above information and level of uncertainty, at a given "t"
>> time, currently 2 routing decisions are possible:
>> - withdraw the route.
>> 	- This is the regular BGP behavior.
>> 	- The reasoning is that the level of uncertainty is too high to
>> use that route.
>> - keep the route.
>> 	- This is the BGP Graceful Restart behavior.
>> 	- The reasoning is that the level of uncertainty is low enough
>> so that the route can still be considered perfectly valid.
>>
>>
>> But the router making the above decision is the advertising BGP
>> router, hence the egress router. It lacks some information that is
>> only available on the ingress router, namely the availability of
>> alternate paths/Next-Hops on the ingress. I would argue that this is
>> important information for making the routing decision, because in the
>> end each router/AS tries to take the _best_ decision among the
>> _available_ options. This is not a binary decision between
>> "right/good" and "bad". So compared to GR, BGP persistence proposes to
>> give some additional information, and the routing decision, to the
>> upstream ingress BGP routers (e.g. using a "STALE" community or a low
>> BGP local pref, though other vehicles could be considered).
>>
>> Could you elaborate on why this would be "really bad"?
>>
>> Thanks,
>> Regards,
>> Bruno
>>
>>> The possible problems involved in saying, "I might have a route to x,
>>> just in case you don't have any other path there," will end up being
>>> really, really ugly.
>>>
>>> :-)
>>>
>>> Russ
>>> _______________________________________________
>>> Idr mailing list
>>> Idr@ietf.org
>>> https://www.ietf.org/mailman/listinfo/idr