Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

Alvaro,

I think this is one of the difficulties of overloading a protocol like BGP
with different datasets -- it's not simple to say how particular attributes
are actually going to be used within a protocol deployment. This was one of
the things that was noted in RFC 7606 -- i.e., I can make *any* attribute
really affect forwarding if I write a policy that accepts/rejects some
UPDATE based on the presence of that attribute.

In general, any topology discovery mechanism (whether used in real-time or
not) needs to define how it handles cases where it might end up with
missing information. Let's consider the discovery mechanisms we have
today:

   - IGP listening -- in this case, if we have some malformed IS-IS TLV,
   then we might end up discarding this information (whether it be at the
   listening node, or a device that didn't flood it earlier in the chain) --
   meaning that we know that we have some potential gap in the topology.
   - Streaming telemetry -- speaking particularly to gNMI for LSDB
   streaming encoded using the OpenConfig model, here we tolerate errors by
   delivering as much information as can be parsed, and have a way to carry
   unknown TLVs (which might include those that cannot be successfully parsed)
   as binary data to the external consumer. This means that the approach is
   "as complete data as possible", but it shares the characteristic that we
   can still end up losing data.
   - BGP-LS with attribute discard -- this has some information loss, since
   we'll have some attributes that could be malformed in the input data, and
   we discard them at the receiver.

It doesn't seem to me that we can really guarantee strong consistency of an
off-box view of the network: the source of the data is the IGP, where
information might already have been discarded, and we can't guarantee
strong consistency across the IGP domain itself.

Thus, I'm not sure that the issue that is being highlighted here actually
makes a difference when we're considering the overall system design -- we
always need to deal with the fact that the view of the network at the path
computing node might not match exactly the network's current state in the
presence of malformed protocol messages. One motivation for having the LSDB
via streaming telemetry is the ability to provide such validation ("do all
nodes within my IGP domain, including listeners, have a consistent view of
the state of the network?").
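
As a rough illustration of that kind of validation (a sketch only: the
per-node LSDB summaries, the node names and the missing_entries helper are
all made up for the example, and are not taken from gNMI or the OpenConfig
model), one could compare what each node reports against the union of what
anyone reports:

```python
from typing import Dict, Set, Tuple

# Hypothetical summary of one node's LSDB: a set of (lsp_id, sequence_number).
LsdbView = Set[Tuple[str, int]]


def missing_entries(views: Dict[str, LsdbView]) -> Dict[str, LsdbView]:
    """For each node, list entries seen somewhere in the domain but absent locally."""
    everything: LsdbView = set().union(*views.values())
    gaps = {node: everything - view for node, view in views.items()}
    # Only report nodes that actually lack something.
    return {node: gap for node, gap in gaps.items() if gap}


# Illustrative data: the listener never received the second LSP.
views = {
    "r1": {("0000.0000.0001.00-00", 7), ("0000.0000.0002.00-00", 3)},
    "listener": {("0000.0000.0001.00-00", 7)},
}
gaps = missing_entries(views)
```

A non-empty result says only that the views disagree, not which node is
right.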

If the discussion is "should we adopt treat-as-withdraw vs. attribute
discard?" -- I don't think that from the system perspective there is really
any difference between the two in this situation. We still have the same
potentially inconsistent view of the network.
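
To make that concrete, here is a toy model (my own simplification, not the
RFC 7606 procedures: the topology database is a dict keyed by NLRI, and a
malformed BGP-LS attribute is represented as None) of what the receiver
ends up holding under each behaviour:

```python
from typing import Dict, Optional

# Toy topology database: NLRI -> properties carried in the BGP-LS attribute.
TopoDb = Dict[str, Dict[str, int]]


def attribute_discard(db: TopoDb, nlri: str, ls_attr: Optional[Dict[str, int]]) -> None:
    # Keep the NLRI; a malformed attribute (None) is dropped, leaving no properties.
    db[nlri] = dict(ls_attr) if ls_attr is not None else {}


def treat_as_withdraw(db: TopoDb, nlri: str, ls_attr: Optional[Dict[str, int]]) -> None:
    # A malformed attribute causes the whole NLRI to be removed from the view.
    if ls_attr is None:
        db.pop(nlri, None)
    else:
        db[nlri] = dict(ls_attr)


# What the network actually looks like (illustrative values).
truth: TopoDb = {"link-1": {"igp_metric": 10, "adj_sid": 24001}}

view_discard: TopoDb = {k: dict(v) for k, v in truth.items()}
view_withdraw: TopoDb = {k: dict(v) for k, v in truth.items()}

# The same update arrives with a malformed attribute in both cases.
attribute_discard(view_discard, "link-1", None)
treat_as_withdraw(view_withdraw, "link-1", None)

discard_matches = view_discard == truth    # link known, properties lost
withdraw_matches = view_withdraw == truth  # link missing entirely
```

Neither view matches the real state after the malformed update; they just
differ in what is missing.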

For these reasons, I'd err on the side of leaving this unchanged in the
current specification(s).

Cheers,
r.

On Wed, Dec 19, 2018 at 10:13 AM Alvaro Retana <aretana.ietf@gmail.com>
wrote:

> On December 18, 2018 at 6:23:19 PM, Robert Raszuk (rraszuk@gmail.com)
> wrote:
>
> Robert:
>
> Hi!
>
> What comes as #1 question to your points is a comparison of SR controller
> with regular BGP RR.
>
> I think it is safe to assume that error handling on SR controller would be
> no more aggressive than on RRs. So if there is an error the updates may be
> dropped on the RRs themselves, logged and proper NOC alarm generated.
>
> IMO this is no different regardless if you use SR with BGP-LS or just
> plain regular BGP routing.
>
> In general, I agree that error handling should be the same regardless of
> the type of BGP speaker (RR, controller, PE, whatever).
>
> So unless your goal here is to point out the deficiency of BGP error
> handling RFC I am not sure what is so specific to BGP-LS and SR.
>
> No, the goal is not to point at any deficiency in the error handling RFC.
> I just replied to Bruno saying: "I don’t want to rehash the discussion
> from rfc7606 about the types of approaches and whether there should be more
> or not (or what those could be)… I’m just pointing out that I think the
> current approach is not the right one for all applications."
>
> When BGP-LS was defined, it was noted that the "information present in
> this document carries purely application-level data that has no immediate
> corresponding forwarding state impact."  I think that SR has a direct
> impact on the forwarding state of the network.  That is what is specific
> about BGP-LS+SR.
>
>
> To be clear, this thread is about using BGP-LS with applications that have
> an impact on forwarding/route selection in the network, like SR (Bruno
> pointed at lsvr and there may be others).  It is not about the error
> handling approaches (rfc7606) or BGP sessions in general…just that specific
> application.
>
> Thanks for helping me clarify what I mean.  Hopefully this makes more
> sense. ;-)
>
> Alvaro.