Re: [Idr] Alvaro's AD Comments on draft-ietf-idr-ls-distribution

"Adrian Farrel" <adrian@olddog.co.uk> Sun, 04 October 2015 20:58 UTC

From: Adrian Farrel <adrian@olddog.co.uk>
To: "'Alvaro Retana (aretana)'" <aretana@cisco.com>
References: <022901d0fdc9$0ea4ce80$2bee6b80$@olddog.co.uk> <D2369A00.D559D%aretana@cisco.com>
In-Reply-To: <D2369A00.D559D%aretana@cisco.com>
Date: Sun, 04 Oct 2015 21:57:17 +0100
Message-ID: <02bd01d0fee7$5678c440$036a4cc0$@olddog.co.uk>
Archived-At: <http://mailarchive.ietf.org/arch/msg/idr/OWXRI9PT9AoLmpkUsoCkuMC-fvY>
Cc: idr@ietf.org
Subject: Re: [Idr] Alvaro's AD Comments on draft-ietf-idr-ls-distribution
Reply-To: adrian@olddog.co.uk

Hi Alvaro,

Here is a better reply snipped still further.

> >> Major:
> >> I would like to have Designated Experts in place soon.  It seems to me
> >> that having 2 of the authors volunteer (primary and backup) would be
> >> ideal.  Please let me know if you want to do it.
> >
> >I think the absence of an answer speaks for itself.
> 
> :-(
> 
> Given the potential for extensions, I'll be back knocking soon.

I'm not close enough to running code. I'm sure the WG can find stuckees.

> >> 6.1.2 (Installation and Initial Setup): [default] "maximum rate at which
> >> Link-State NLRIs will be advertised/withdrawn from neighbors is set to
> >> 200 updates per second."
> >>Where did the number come from?  It does look very big.  Are you
> >> assuming just changes or startup as well?
> >> One of my concerns is the interaction with other BGP stuff (like regular
> >>routing!).  In fact, in 6.1.5 (Impact on Network Operation) you wrote:
> >> "Frequency of Link-State NLRI updates could interfere with regular BGP
> >> prefix distribution.  A network operator MAY use a dedicated Route-
> >> Reflector infrastructure to distribute Link-State NLRIs."
> >> Why not use SHOULD?  If there may be significant operational impact I
> >>think you should be more direct with the guidance.
> >
> >There are several questions folded in here.
> >
> >Firstly, understand that the maximum rate described here is the default
> > rate if not overridden by an operator or by an implementation that
> > knows better.
> > So we shouldn't get too hung up on it. Although, obviously, there is a
> > reasonable expectation that if an implementation uses this default
> > value the function should work.
> >
> > Why 200? Not sure where that came from. I have no evidence that 200
> > is a problem and given that IDR has done quite a bit of implementation
> > and interop without anyone screaming, I assume this is OK.
> >https://www.ietf.org/id/draft-ietf-idr-ls-distribution-impl-04.txt doesn't
> >mention any problems. Are you aware of implementations where this is a
> >problem?
> 
> No, but implementation and interop (most likely in a controlled
> environment) is very different than operation in the wild.  But I am sure
> that you (and everyone else listed as an author) obviously know that.

I think it is tremendously difficult to work out / predict what a reasonable
default value is for operation in a real network. I'd be happy to not give any
guidance because of that, but history says that if we don't suggest a default we
will be caned by the IESG. You can't win :-)

We have picked a value we believe will not be a problem. But it is art not
science.
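
For illustration only: one way an implementation might enforce a default ceiling of 200 Link-State NLRI updates per second is a token bucket. The class name, the burst size (assumed here to equal one second's worth of updates, so a startup burst is allowed) and the injectable clock are all assumptions of this sketch; the draft specifies only the default rate, not the mechanism.

```python
import time


class UpdateRateLimiter:
    """Hypothetical token-bucket limiter for Link-State NLRI updates.

    Only the default rate (200 updates per second) comes from the draft;
    the mechanism, burst size and names are illustrative assumptions.
    """

    def __init__(self, rate_per_sec=200.0, burst=200, clock=time.monotonic):
        self.rate = float(rate_per_sec)  # tokens added per second
        self.capacity = float(burst)     # maximum tokens that can accumulate
        self.tokens = float(burst)       # start full, allowing an initial burst
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)

    def try_send(self):
        """Return True if one advertisement/withdrawal may be sent now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With a burst equal to the rate, a full dump at startup drains the bucket after 200 updates and then proceeds at the steady rate; an operator override would simply construct the limiter with different values.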

> > Could BGP-LS advertisements impact on "other BGP stuff"? Well, that
> > really is an implementation and deployment issue. Will the BGP-LS
> > speaker doing the advertisements also advertise "other BGP stuff" on
> > the same session? Is the advertisement higher cost than the consolidation
> > of IGP-TE information and the application of policy to determine what LS
> > information to advertise? Could the advertisements be prioritised? This
> > has been something of a recent thread on the IDR list and, although you
> > are raising interesting questions, I don't believe we have any way to draw
> > a line unless someone reports real problems that are not caused by
> > implementation choices, yet we are required (by the IESG) to supply
> > defaults for configurable parameters.
> 
> Section 6 is the "Operational Considerations" section, where I expect you
> to provide operational considerations.  It sounds like you're saying we
> need deployment (not just testing) experience so that clear guidance can
> be given and that you're otherwise putting numbers in there just because
> you have to, am I interpreting this correctly?

I think so.
We can only start to guess at this stage.
Certainly there are things you can do to be "safe". But they might not be
optimal.

This remains one of the problems with an Operational Considerations section for
a protocol extension where it is unclear what the impact will be.

> > Why "MAY" use a dedicated RR rather than "SHOULD"? Because there really
> > doesn't appear to be any indication that this is a problem in all networks
> > with all BGP implementations.

So the bottom line is that I haven't made any changes for this.
If you want to suggest some weasel words about how to monitor your deployment
and what to look for that might be a symptom of stuff going wrong, and then what
to do about it, we can surely cut and paste your text.

> >> 6.2.2 (Fault Management). "If an implementation of BGP-LS detects a
> >> malformed attribute, then it SHOULD use the 'Attribute Discard' action.."
> >> Doesn't this mean that the information may be useless, completely missing,
> >> or in the best case incomplete?  Aren't we better off just resetting the
> >> session or at least requesting a route refresh?
> >
> > I don't think we are assuming corruption. So the malformed attribute
> > comes as the result of an implementation difference or a bug at the
> > sender.
> 
> That would still result in the receiver not having the information.  The
> cause doesn't seem to matter in this case, the end result does.

Absolutely.
So you have to choose between having most of the info from the network minus
some pieces that you couldn't parse, or resetting the session and having no
information about the network.
Perhaps we should be "raising an alert to the operator" when this happens? (rate
limits may apply)
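
A minimal sketch of "Attribute Discard" combined with a rate-limited alert to the operator, assuming the BGP-LS attribute is a sequence of TLVs with 2-octet Type and 2-octet Length fields. The function and class names, the one-alert-per-minute limit and the alert sink are hypothetical; the point is that only the unparseable attribute is dropped, the NLRI is still processed and the session is not reset.

```python
import struct
import time


def parse_tlvs(data):
    """Split a BGP-LS attribute into (type, value) pairs.

    Assumes 2-octet Type and 2-octet Length fields; raises ValueError if
    the encoding is malformed (truncated header or overrunning length).
    """
    tlvs = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated TLV header")
        tlv_type, tlv_len = struct.unpack_from("!HH", data, offset)
        offset += 4
        if offset + tlv_len > len(data):
            raise ValueError(f"TLV {tlv_type} length {tlv_len} overruns attribute")
        tlvs.append((tlv_type, data[offset:offset + tlv_len]))
        offset += tlv_len
    return tlvs


class AlertLimiter:
    """Emit at most max_alerts alerts per interval seconds; count the rest."""

    def __init__(self, max_alerts=1, interval=60.0, clock=time.monotonic):
        self.max_alerts = max_alerts
        self.interval = interval
        self.clock = clock
        self.window_start = clock()
        self.sent = 0
        self.suppressed = 0

    def alert(self, message, sink):
        now = self.clock()
        if now - self.window_start >= self.interval:
            self.window_start = now  # start a new rate-limit window
            self.sent = 0
        if self.sent < self.max_alerts:
            self.sent += 1
            sink(message)
            return True
        self.suppressed += 1
        return False


def process_ls_attribute(raw, alerts, sink):
    """Hypothetical 'Attribute Discard' handler: return the parsed TLVs, or
    None if the attribute is malformed. The caller keeps the NLRI and does
    not reset the session; the operator is alerted, subject to rate limits."""
    try:
        return parse_tlvs(raw)
    except ValueError as exc:
        alerts.alert(f"BGP-LS attribute discarded: {exc}", sink)
        return None
```

Because the same malformed attribute will keep arriving, the limiter counts suppressed alerts rather than repeating them, which avoids turning a sender bug into an alert storm.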

> > Requesting a refresh runs a significant risk of simply thrashing the
> > advertisement since each time it is sent it will contain the same
> > malformed attribute. This problem is considerably worsened if the
> > session is reset because then we thrash the whole session.
> 
> Of course.
> 
> I was trying (and didn't do a good job) to suggest that if we had
> isolation (as in carrying the BGP-LS in a separate session/RR
> infrastructure), then a better job could be done at not only isolating
> from failure but also at recovering information so that both applications
> could work better.

So I maintain that you can't recover. All you can do is start again, which is
worse than helpful.

Certainly, if you choose to reset the session then using a separate session for
different things would be a wise idea. But if you do as we suggest then I don't
think it matters because you don't perturb the "normal" BGP work.

> >> 3.3.2.2 (MPLS Protocol Mask TLV)
[snip]
> Now that I revisit this point, I think that the new text should be
> something like this:
> 
>    The 'MPLS Protocol Mask' TLV SHOULD only be used with
>    originators that have local link insight, like for example
>    Protocol-IDs 'Static' or 'Direct' as per Table 2.
> 
> That way, if others come up in the future we're covered and not dependent
> on a Table.

OK. I reworked this text.
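
A hypothetical check for the reworked 'MPLS Protocol Mask' rule. Rather than hard-coding a list of allowed Protocol-IDs, each originator carries a "local link insight" property, so a future Protocol-ID is covered by setting that property. The enum names an illustrative subset of originating protocols only; numeric code points are deliberately omitted and the flag values are assumptions.

```python
from enum import Enum


class ProtocolId(Enum):
    """Illustrative subset of originating protocols (names only).

    The second tuple element is an assumed 'local link insight' flag.
    """
    ISIS_L1 = ("IS-IS Level 1", False)
    ISIS_L2 = ("IS-IS Level 2", False)
    OSPFV2 = ("OSPFv2", False)
    DIRECT = ("Direct", True)
    STATIC = ("Static configuration", True)

    def __init__(self, label, local_link_insight):
        self.label = label
        self.local_link_insight = local_link_insight


def mpls_protocol_mask_allowed(origin):
    """The MPLS Protocol Mask TLV SHOULD only be used when the originator
    has local link insight (for example 'Static' or 'Direct')."""
    return origin.local_link_insight
```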

> >> Section 8 (Security Considerations) "The principal attack a consumer
> >> may apply is to attempt to start multiple sessions either sequentially or
> >> simultaneously."
> >> Isn't this an attack that any other node can try?  Why limit the 
> >> discussion only to consumers?
> >
> > I think you have to be a recognised peer for this attack to have any
> > significant value that is specific to *this* document.
> 
> I meant that a BGP Speaker could also start multiple sessions with a
> consumer, for example.

Hmmm. OK.
I suspect that if a core router is compromised to the extent that it attacks the
BGP-LS consumers, we *have* worse problems to contemplate.

> >> Curious question:  When encoding a "normal" size node using BGP-LS,
> >> what is the resulting size of the UPDATE?
> >
> >I'll leave this as an exercise for the reader.
> 
> Thanks! :-(
> 
> I was mostly wondering if the information is anywhere close to 4k or not.
> The implementation report doesn't say anything either. :-(

OK. I see your point. The problem presumably exists for the IGPs as well (maybe
more for OSPF than for ISIS).

Anyone in the IDR WG want to do the math?
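
As a starting point for whoever does the math, here is a rough size calculator. It assumes BGP-LS TLVs use a 4-octet header (2-octet Type, 2-octet Length), the 4096-octet maximum BGP message size from RFC 4271, and a 4-octet path attribute header (extended length) for the BGP-LS attribute. Everything else (MP_REACH_NLRI framing, ORIGIN, AS_PATH and so on) is lumped into one caller-supplied figure, and the example inputs are hypothetical placeholders, not measurements of a real node.

```python
BGP_MAX_MESSAGE = 4096  # RFC 4271 maximum BGP message size (octets)
BGP_HEADER = 19         # Marker (16) + Length (2) + Type (1)
UPDATE_LENGTHS = 4      # Withdrawn Routes Length (2) + Total Path Attribute Length (2)
TLV_HEADER = 4          # BGP-LS TLV: 2-octet Type + 2-octet Length
ATTR_HEADER_EXT = 4     # Attr. Flags (1) + Type Code (1) + Extended Length (2)


def tlv_size(value_len):
    """Encoded size of one TLV whose value is value_len octets."""
    return TLV_HEADER + value_len


def ls_attribute_size(tlv_value_lens):
    """Size of the BGP-LS path attribute carrying the given TLV values."""
    return ATTR_HEADER_EXT + sum(tlv_size(n) for n in tlv_value_lens)


def update_size(nlri_octets, tlv_value_lens, other_attr_octets):
    """Estimated UPDATE size for one Link-State NLRI plus its BGP-LS attribute.

    other_attr_octets is a caller-supplied assumption covering MP_REACH_NLRI
    framing and the other path attributes.
    """
    return (BGP_HEADER + UPDATE_LENGTHS + other_attr_octets
            + nlri_octets + ls_attribute_size(tlv_value_lens))


def fits(size):
    """True if the estimate is within the BGP maximum message size."""
    return size <= BGP_MAX_MESSAGE


# Hypothetical inputs, NOT measured values: a 50-octet NLRI, five attribute
# TLVs with values of 32, 4, 4, 3 and 1 octets, and 40 octets of other attributes.
example = update_size(50, [32, 4, 4, 3, 1], 40)
```

Plugging in TLV lengths taken from a real implementation or a packet capture would give the answer Alvaro is after; the calculator itself says nothing about what a "normal" node looks like.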

Thanks,
Adrian
