Re: [sidr] WGLC: draft-ietf-sidr-origin-ops-

I've reviewed this draft and have a number of comments:

At a high level, I think this draft is a very important piece of the sidr landscape, so I certainly applaud Randy for writing it.

- The second sentence in the abstract is a fragment, without a direct object.

Section 1 Intro:
- 1st paragraph: if we are going to predicate advice on terms like "widespread deployment," I think we need to define them.  I know we all roughly know things like, "widespread means `a lot' of deployment," but the assertion that RPKI-based origin validation has a dependency on some level of penetration either needs to be qualified (or better yet, quantified) or removed, imho.

- 2nd para: "... the next year to five years." As a living document, this work might want to stay away from relative time references.  Will this statement need to be updated every year?
- 2nd para: "... eventually there will be a single root..." Is this assumed in order for this document's advice to matter?  It seems like operational advice ought to be more topical than this.  The advice in this draft is intended to be relevant before the single root, so this reference seems quite out of place.

- 3rd para: s/AS's/AS'/

Section 3:

- 2nd para: Some intuition behind _why_ hierarchy affects performance would be quite useful.

- 3rd para: As this is an operational document, and there is nothing else available, shouldn't this text explain that there is currently no choice of what to use?

- 4th para: The edict that operators should make use of the cache-chaining facility seems like it should have some operational explanation/justification/intuition/something.  It may be good advice, but shouldn't there be some rationale that allows for the evaluation of tradeoffs?
- 4th para: "Of course, the recipient relying parties SHOULD re-validate the data."  This makes it seem (imho) like the benefits of the immediately preceding advice might be lessened... Without an explanation of the rationale, the reader is forced to ask if doing this re-validation wouldn't cause the same scaling worries as before the chaining.  Without a more detailed explanation, the reader really can't tell.

- 5th para: This seems quite inappropriate to me.  We need to know how this design will scale and function in an operational setting.  If there's any place that this should be discussed, it would seem to be here.  The operational behavior, configuration, and dynamics of the system seem like they must be described in the operational guidance documents.  How else should an operator know what configurations result in which behaviors, and scaling properties, etc?  I don't think this draft can simply punt by saying these operational concerns are beyond the scope of its operational guidance.

- 6th para: What if the objects in a network are multi-mastered?  I wasn't able to tell what this paragraph's guidance was, but it seemed to be a little narrow in its application.  Maybe it is superfluous?

- 7th para: I think this advice seems a little dilute, and may not be as useful as it could be.  Clearly, network configurations can be quite varied, right?  Since this is the case, I think it would be much more helpful for the author to build a strawman here, and overlay advice on it.  In fact, perhaps a set of running strawmen throughout the document would allow various pieces of advice to be hung on specific examples that operators could then adapt to their own configurations.  At the very least, it would provide some context, and possibly some additional substance to the document.

- 8th para: This seems like good advice, but it could use an example (i.e., the above comment).

- 9th para: This paragraph seems to give direction without intuition of the cost/benefits tradeoffs of this decision or the deployment decisions (like how many).  More generally, it seems like there is important advice to give here, but maybe the experts should frame the advice in terms of something like: what (specifically) does an operator pay for $n$ peers and what (specifically) does she gain?

- 10th para: I don't think the text is clear: it seems like upstreams carrying traffic and the trust one has in attestation objects are quite different.  I don't think this is an apt analogy, and it really confuses me (as a reader).  As a result, it's hard for me to understand what the point of that paragraph is (given the inapt analogy).
- 10th para: With the above caveat that I might not be understanding what is being said, the final sentence raises additional concerns for me.  If we recommend that operators use each others' caches, and then force them to revalidate, we are either introducing a new attack vector (cache poisoning of non-authoritative caches), or we are increasing the attack surface of an existing attack vector (more caches must be validated because they can lie to me).  Either way, I don't see the benefit gained here, just the drawback.

- 11th para: How does trusting caches relate to mandatory revalidation?  As I read the text (at least, as written), it seems to me that this is a conflation of very different concepts.

- 12th para: Should we define the term ``super-block'' before using it here?  I'm more used to seeing it in the context of filesystems, but that doesn't mean we can't overload its definition here... I just think we need to do so before using it.

- 13th para: I think we need to add some context in this paragraph.  Specifically, I suggest adding the following text to the penultimate sentence, ``, but only for those external routers that have also deployed RPKI-based origin validation.''  And adding the following text to the final sentence, `` for just those RPs.''

- 15th para: I think it is important to be specific, and after claiming that something is ``more likely to be noticed,'' I think we ought to describe _how_ one might/should do so, in an operational setting.  As before, I'd suggest some advice be given through an example.

- 17th para: I think this paragraph makes good sense, but can we get a more quantitative discussion here?  I was just thinking that since this is an operational/engineering document, it might be good to shift this part from the qualitative end of the spectrum over more to the quantitative side.

- 20th para: This paragraph felt a little prescriptive from the policy/provisioning side, to me.  I caught myself wondering if this kind of advice really belongs here?  If it does, then maybe it would be more appropriate to just mention that proxy registration of this kind of data is an option, and cite how well that has worked elsewhere (like with IRRs and stuff)?

- 21st para: s/^While //
- 21st para: This paragraph suggests a period of ``four to six hours.''  I think we need some kind of explanation for these numbers.  As an operations document, it seems to me that we should be discussing tradeoffs and the relative value of different settings, etc.

Section 4:

- 2nd para: I worry that this advice is a little dilute, and (as a result) kind of falls a little limp (i.e., I was not able to clearly see what it was trying to explain).  I think if we had been carrying a strawman (or some strawmen) through the document, it would help bring the point of this paragraph into focus.

- 3rd para: In the same vein as the above, it seems like some examples/strawmen would be quite apropos.

Section 5: 

- 3rd para: ``10.0.666.0/24'' ?  maybe 10.0.6.0/24 ?
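To make the concern concrete: 666 does not fit in a single octet, so the string is not a parseable IPv4 prefix at all.  Below is a minimal sketch using Python's standard ipaddress module; the helper name is_valid_ipv4_prefix is just something I made up for illustration, not anything from the draft.

```python
import ipaddress


def is_valid_ipv4_prefix(text: str) -> bool:
    """Return True if text parses as an IPv4 network in CIDR notation.

    Hypothetical helper for illustration only.
    """
    try:
        # strict=True also rejects prefixes that have host bits set.
        ipaddress.IPv4Network(text, strict=True)
    except ValueError:
        # Covers malformed addresses (e.g. an octet above 255) and bad masks.
        return False
    return True


for candidate in ("10.0.666.0/24", "10.0.6.0/24"):
    print(candidate, is_valid_ipv4_prefix(candidate))
```

The first candidate is rejected because of the out-of-range octet, while the second parses cleanly, which is why I'd suggest something like 10.0.6.0/24 for the example.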

- 4th para: This paragraph seems to be offering tractable guidance.  Is there any thinking around the tradeoffs for when to change policy?

- 5th para: s/AS-path/AS_PATH/g

- 6th para: s/it's/its/

- 7th para: Should ``Local Pref'' be normalized to match earlier discussions of ``Local-Preference''?

- 10th para: I think we need to add a little bit of text.  Perhaps add to the last sentence, `` for the same prefix''?

Section 6:

- 1st para: The comment/implication that incoherency is a quality of all distributed caching systems is totally untrue.  In fact, there are many cache protocols with different specific consistency models that accomplish this.  To claim something is not being attempted with RPKI is one thing.  To claim that no system is able to accomplish this is quite different.  Moreover, why is this (clearly a design issue with RPKI) being discussed in this draft (an operational guidance draft)?  This seems like it is definitely the wrong place to talk about this, but regardless, the text is quite wrong.

- 2nd para: I think this paragraph brings up an important point, but doesn't mention a very important operational side effect of that point.  I suggest adding one more sentence to the end, ``Alternately, since no consistency model is attempted, it is possible that routing may not be able to converge in those networks deploying this approach without manual intervention.''

- 3rd para + 4th para: I don't understand how these paragraphs are conveying helpful operational advice?

- 10th para: Why was 1 hour chosen?  What are the tradeoffs, etc?

Section 7:

s/AS-Path/AS_PATH/g

Thanks,

Eric

On Aug 17, 2012, at 11:03 AM, Christopher Morrow wrote:

> Hello WG folk,
> This draft has undergone 9 revisions since the last WGLC, which seemed
> to end with requests for changes by the authors.
> Can we now have a final-final-please-let's-progress WGLC for this
> draft now? Let's end the call: 08/31/2012 (Aug 31 2012).
> 
> Htmlized version available at:
> http://tools.ietf.org/html/draft-ietf-sidr-origin-ops-19
> 
> Abstract:
> "Deployment of RPKI-based BGP origin validation has many operational
>   considerations.  This document attempts to collect and present the
>   most critical.  It is expected to evolve as RPKI-based origin
>   validation is deployed and the dynamics are better understood."
> 
> Thanks!
> -Chris
> <co-chair-2-of-3>