Re: Last Call: <draft-ietf-sidr-rpki-rtr-19.txt> (The RPKI/Router Protocol) to Proposed Standard

Terry Manderson <terry.manderson@icann.org> Wed, 01 February 2012 00:14 UTC

From: Terry Manderson <terry.manderson@icann.org>
To: Rob Austein <sra@hactrn.net>
Date: Tue, 31 Jan 2012 16:14:08 -0800
Subject: Re: Last Call: <draft-ietf-sidr-rpki-rtr-19.txt> (The RPKI/Router Protocol) to Proposed Standard
Message-ID: <CB4EC0F0.21105%terry.manderson@icann.org>
In-Reply-To: <20120128050229.98A9717711@thrintun.hactrn.net>
Cc: "ietf@ietf.org" <ietf@ietf.org>

On 28/01/12 3:02 PM, "Rob Austein" <sra@hactrn.net> wrote:

> At Wed, 21 Dec 2011 17:43:23 -0800, Terry Manderson wrote:
>>
>> Apologies for my lack of attention to date on this topic, so speaking only
>> for myself here.
>
> Similar apologies for not having answered this more promptly.  Somehow
> we missed seeing this until our AD asked us about it.
>
> Please see draft-ietf-sidr-rpki-rtr-25, just posted, which we hope
> addresses most of your concerns (there are a few points on which I
> think we're just going to have to agree to disagree).

I will read -25 soon and raise any concerns should they remain.

>
[..]

> RADIUS doesn't have a bulk transfer operation, and bulk transfer of
> data is the main task of this protocol, particularly at start-up.

Is that function of the protocol now highlighted in -25?

>
> You are certainly entitled to your opinion, but it comes a bit late.
> This work was done in the public view, with regular progress reports
> to the SIDR WG, and we have multiple interoperable implementations
> including several of the major router vendors.  So, with all due
> respect, I don't think the folks who have put work into this will be
> all that interested in abandoning running code at this point.

My example was meant to highlight that, without the rationale for why *this*
protocol was desired, any number of options could seem perfectly
reasonable and attractive.

>
>> Glossary:
>>
>> Global RPKI:
>> I disagree with this definition for two reasons. 1) I'm not aware of a
>> unified definition for 'distributed system' so this is all rather vague.
>
> The term has been used to describe DNS for decades.  Also see:
>
>   http://en.wikipedia.org/wiki/Distributed_computing

Citing Wikipedia: the end is nigh!

>
>> Perhaps you could say 'published at a disparate set of systems'.
>
> I don't find that any clearer.  Readers who can't understand the words
> "distributed set" aren't likely to understand "disparate set" either.
>

I guess we remain in disagreement :).

>> 2) Limiting
>> the servers to be "at" the "IANA, RIRs, NIRs, and ISPs" is also premature.
>> It's not clear to me that these entities will run their own repositories,
>> nor are they going to be the only repository operators in the lifecycle of
>> the RPKI.
>
> This is essentially the same list as appears in section 1.1 of
> draft-ietf-sidr-arch, with the term "LIR" replaced by "ISP".
>
> I suppose we could add "or other service providers".

I think that would satisfy me.

>
>> Cache:
>> The words surrounding the fetch/refresh mechanisms of the RPKI are limiting.
>> Both draft-ietf-sidr-repos-struct and draft-ietf-sidr-res-certs allow for
>> other (future) retrieval mechanisms as defined by the repository operator
>> beyond rsync (loosely documented in RFC 5781).
>
> Terry, you've made it quite clear that you disagree with the SIDR WG's
> decision to make rsync the mandatory-to-implement RPKI retrieval
> protocol, but you lost that argument a long time ago, and I fail to
> see the point of bringing it up here yet again.

That wasn't the intent, Rob. Please re-read the paragraph: my point is
that this document still needs to be flexible SHOULD a future
retrieval mechanism develop. If you still think that it shouldn't be
flexible, then we remain in disagreement.

>
>> Last sentence. "Trusting this cache further is a matter between the provider
>> of the cache and a relying party". In my mind the Relying Party was the one
>> that did the RPKI validation - would this not be better stated as "Trusting
>> this cache further is a matter between the provider of the cache and the
>> router operator".
>
> If a router is making decisions based on data given to it by a server,
> the router is the relying party in that relationship.  That the server
> in question was itself the relying party in another relationship does
> not change this.
>
> The picture here is not all that different from the way that some
> vendors have chosen to implement DNSSEC.  It's a two-tier security
> relationship: an end-to-end relationship between the publisher of
> signed objects and the validator of those signed objects, then a
> separate security relationship between the entity that validated the
> signed objects and the end entity that actually uses the data.

I think, then, that we remain in disagreement on the phrasing. Spelling out
precisely that the relying party identified here has a trust relationship
only with the cache, and not with the larger RPKI, is important.

>
>> Deployment Structure:
>>
>> Why repeat the definition of "Global RPKI"? It's superfluous.
>
> Because it's not a definition?
>
> I agree that the text here is similar to the definition, but this
> section is trying to describe the roles in the system.

Then I think the text needs work.

>
>> Local Cache: Again, 'Relying party' seems to be borrowed from the
>> CA/identity world. Unless you redefine that term here, it seems as if the
>> "router" is making RPKI validation decisions, which it is not. The router is
>> acting more like a NAS (see RADIUS, RFC 2865) when talking to a local cache.
>>
>> The definition of "routers" seems to get this right - eg "a client of the
>> cache".
>
> See above.  "Relying party" is a security relationship term, not just
> a PKI term.
>
>> Operational Overview
>>
>> When you first use "ROA", please expand the TLA and provide a reference.
>
> Done.
>

Thanks.

>> Serial Query
>>
>> I don't remember seeing a recommendation for how often a client (router)
>> sends a serial query. Is there a Min/Max? Surely doing it every second would
>> be excessive.
>
> Maximum is covered in section 6.2: the router must send a Serial or
> Reset Query no less frequently than once per hour.
>
> Minimum is a good question.  We had been assuming that, as this is an
> in-POP relationship with cache and router operated by the same party,
> there would likely be a knob in the router (router guys live for
> knobs) and setting it would be a matter of local policy.  If you want
> your router to beat up your cache server every minute, who am I to
> stop you?
>
> We needed to set a maximum because that affects the architecture of
> the cache (how long does it need to hold onto old data -- given the
> potential size of the data sets involved, one might implement the
> cache very differently if one needed to hold old data for a week
> rather than an hour).

Thus some recommendation text would be helpful.
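
As a minimal sketch of the kind of knob Rob describes, a router might clamp an operator-configured polling interval to the one-hour ceiling from section 6.2. The 60-second floor and the name `effective_poll_interval` are hypothetical choices for illustration, not values taken from the draft.

```python
# Hypothetical router-side polling knob.
MAX_POLL_SECONDS = 3600  # draft: Serial or Reset Query at least once per hour
MIN_POLL_SECONDS = 60    # assumed local-policy floor; NOT specified by the draft

def effective_poll_interval(configured: int) -> int:
    """Clamp an operator-configured Serial Query interval to [floor, ceiling]."""
    return max(MIN_POLL_SECONDS, min(configured, MAX_POLL_SECONDS))
```

With these assumptions, a setting of 86400 seconds would be pulled back to 3600, and a setting of 1 second raised to 60.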

>
>> IPv4 Prefix:
>>
>> "and nothing prohibits the existence of two identical
>>    route: or route6: objects in the IRR."
>>
>> Why even mention the IRR here? It just doesn't seem at all relevant. (and
>> isn't defined)
>
> Good catch.  Done.

Thanks

>
>> "IPvX PDUs": expand to IPv4 or IPv6. Globbing both into one is a misdirection
>> under a heading of 'IPv4 Prefix'.
>>
>> IPv6 Prefix
>>
>> Some text here to say that the IPv6 data structure follows the same
>> semantics as the IPv4 data structure would be good.. or alternatively
>> restructure the document to Semantics, then describe the IPv4 and IPv6 data
>> structures as subheadings to Prefix PDUs.
>
> Done.

Thanks

>
>> Error Report
>>
>> What is "excessive length" of a PDU? At what point do you say "OK, now I
>> can truncate"?
>
> Too long to be any valid PDU other than an Error Report.  Done.

Thanks
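
Rob's answer ("too long to be any valid PDU other than an Error Report") can be read as a simple length test deciding when an encapsulated erroneous PDU may be truncated. This sketch assumes the type code and maximum fixed PDU size from the layout later published in RFC 6810 (Error Report = type 10, IPv6 Prefix PDU = 32 octets); draft -25 may differ, and the function name is hypothetical.

```python
ERROR_REPORT_TYPE = 10      # assumed PDU type code for Error Report
MAX_NON_ERROR_PDU_LEN = 32  # assumed: IPv6 Prefix PDU, the largest fixed-size PDU

def is_excessive_length(pdu_type: int, declared_length: int) -> bool:
    """True if the declared length cannot belong to any valid non-Error-Report PDU."""
    if pdu_type == ERROR_REPORT_TYPE:
        # Error Reports carry variable-length text and an encapsulated PDU
        return False
    return declared_length > MAX_NON_ERROR_PDU_LEN
```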

>
>> Fields of a PDU
>>
>> For all types, instead of using "ordinal", can you use the exact description
>> of the number, e.g., unsigned integer? I always relate ordinals to set
>> theory.
>
> Done.

Thanks
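
To illustrate why "unsigned integer" is the more precise description, here is a sketch that decodes a common PDU header as fixed-width unsigned integers in network byte order. The 8-octet layout (version, type, session ID, length) is assumed from RFC 6810 and may not match every draft revision; `PduHeader` and `parse_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

class PduHeader(NamedTuple):
    version: int     # unsigned 8-bit integer
    pdu_type: int    # unsigned 8-bit integer
    session_id: int  # unsigned 16-bit integer (assumed position)
    length: int      # unsigned 32-bit integer, whole PDU in octets

# ">" = big-endian with no padding; B = uint8, H = uint16, I = uint32
_HEADER = struct.Struct(">BBHI")

def parse_header(data: bytes) -> PduHeader:
    """Decode the assumed 8-octet header at the front of a PDU."""
    if len(data) < _HEADER.size:
        raise ValueError("truncated PDU header")
    return PduHeader(*_HEADER.unpack_from(data))
```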

>
>> PDU Type: the e.g. is incomplete; shouldn't it be "IPv4 Prefix = 4", with a
>> forward reference to the IANA Considerations section?
>
> I think this is a matter of stylistic preference.

Yep. I can let that be.

>
>> Serial Number: "for example via rcynic" is not defined and is
>> implementation-specific!
>
> Please read the words "for example".
>
> I suppose we could add a reference, but the last time we did that
> somebody objected to having a reference pointing to the source code
> for a particular implementation.

Do you need the example? Perhaps just remove it. (I may have missed it, but
I don't recall seeing BIND, or any other reference code base, mentioned in
any of the DNS documents.)

>
>> and there is a typo: "completing an rigorously validated". While
>> there, consider why you use the term 'rigorously'.
>
> Sigh.  Next time, please be explicit about the typo you're seeing; our
> eyes repeatedly bounced off the "an" here until after we'd posted
> version -25.  It's not worth yet another rev just to fix that.

ok. Sorry I wasn't explicit at the time.

>
>> Are there situations when a validation is less rigorous? If so,
>> explain.
>
> I suspect that my co-author was trying to say that one can't just
> retrieve the data, pull the ASNs and prefixes out of the ROAs, and
> feed them into the router, one has to do the RPKI validation first.
>
> I guess we can remove the word if it offends you, but it seems
> harmless.

I just want it to be clear that there is only one level of validation as per
the various RPKI object validation rules.

>
>> Session ID
>>
>> What is the risk of a cache server starting/restarting with the same session
>> ID and serial number as before, but with different cache contents? Is this
>> an entropy concern? I'm thinking of a potential scenario where a router is
>> cache-wedged. Is this at all probable? If not, why not? Some words here to
>> cover this would be good.
>
> We added several paragraphs on exactly this topic sometime around IETF
> Last Call, I suspect the version you reviewed did not have that text.
> I think we've addressed this point, please check the current text and
> let us know if there's a further issue here.

I will read.
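
One way a cache could avoid the "same session ID, different contents" trap is to draw a fresh random session ID at every start and never reuse the previous one, so a router holding an old (session ID, serial) pair sees a mismatch and falls back to a Reset Query. This is a sketch only: the 16-bit width is assumed from RFC 6810, and `new_session_id` is a hypothetical name, not text from the draft.

```python
import secrets
from typing import Optional

def new_session_id(previous: Optional[int]) -> int:
    """Pick a random 16-bit session ID that differs from the previous one."""
    while True:
        candidate = secrets.randbits(16)  # assumed 16-bit Session ID field
        if candidate != previous:
            return candidate
```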

>
>> Flags
>>
>> Can you reword the binary choice here? Do you actually need to delve into
>> 'right to announce'? This is really about RIB entry behaviors, yeah?
>
> The semantics here are closely related to ROAs, which, as you no doubt
> recall, are Route Origin Authorizations, so the text here follows that
> model.
>
> With all due respect, I do not think that a discussion of RIB entry
> behavior here would be simpler.
>

fine.
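
Since the IPv4 and IPv6 Prefix PDUs share the same ROA-style semantics (prefix, maximum length, origin AS), a single record type can cover both address families. The sketch below applies the announce/withdraw flag to a router's table; the flag encoding (lowest-order bit 1 = announcement, 0 = withdrawal) is assumed from RFC 6810, and all class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

@dataclass(frozen=True)
class PrefixRecord:
    prefix: Network   # IPv4 or IPv6; the semantics are the same
    max_length: int
    origin_as: int

ANNOUNCE_FLAG = 0x01  # assumed: set = announcement, clear = withdrawal

class WithdrawalOfUnknownRecord(Exception):
    """Corresponds to error code 6, Withdrawal of Unknown Record."""

def apply_prefix_pdu(table: Set[PrefixRecord], flags: int, record: PrefixRecord) -> None:
    if flags & ANNOUNCE_FLAG:
        table.add(record)  # a duplicate announcement is simply absorbed here
    elif record in table:
        table.remove(record)
    else:
        raise WithdrawalOfUnknownRecord(record)
```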

>> Expand "IPvX".
>
> Done.
>

Thanks

>> Start or Restart:
>>
>> I think the terms for when a router needs to send a serial query or a reset
>> query need to be tighter. Saying MAY here is too loose. I would much prefer
>> to see a structure where, if the router does not have a recorded serial for a
>> cache from a previous session, the router MUST send a reset query. Logically,
>> you assume that to be the case, so be specific.
>
> I think this is a stylistic matter again.  The router MAY do two
> things here, one of which is only applicable if it has data from a
> previous broken session.
>
> The only real difference I see here between the current formulation
> and the MUST formulation you prefer is that, as currently written, the
> router could choose not to send anything at all initially; this option
> doesn't seem particularly useful, so I don't mind removing it, but
> neither do I see the difference between the current text and your
> suggested change as a big deal.

Perhaps choose whichever has the lower chance of confusion for the router.
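
Terry's proposed MUST formulation leaves the router exactly one choice in each case, which can be sketched as below. `CacheState` and `initial_query` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheState:
    session_id: Optional[int] = None  # recorded from a previous session, if any
    serial: Optional[int] = None

def initial_query(state: CacheState) -> str:
    """Return which query a router sends when (re)connecting to a cache."""
    if state.session_id is None or state.serial is None:
        return "Reset Query"   # nothing recorded: must start from scratch
    return "Serial Query"      # resume incrementally from the recorded serial
```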

>
>> Thereafter the router MAY send a reset query, and SHOULD send a serial
>> query. I suspect this is what the vendors (who have chimed in on the list)
>> have coded.
>>
>> This then corroborates section 4 where you suggest the router only send
>> serial queries for efficiency.
>
> Section 6.2 already says that the typical exchange is for the cache to
> send a Serial Notify, in the expectation that the router will schedule
> an immediate Serial Query.  We didn't make it any stronger than that
> because the folks implementing the router side of this expressed
> concern at the notion that the cache could tell them to do something
> (read: they understand that the notification mechanism will help speed
> convergence, but they're worried that the dinky CPUs they're stuck
> with in some of the relevant hardware will be swamped if they try too
> hard, which is why routers are allowed to ignore notifications and
> caches are rate-limited in sending them).

ok.
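
The cache-side rate limiting Rob mentions could be as simple as a minimum spacing between Serial Notify messages; a suppressed notify costs only latency, since the router still picks up changes on its next periodic poll. The 60-second interval and the class name are assumptions for illustration, not values from the draft. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable, Optional

class NotifyRateLimiter:
    """Allow at most one Serial Notify per min_interval seconds (hypothetical)."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.last_sent: Optional[float] = None

    def should_send(self) -> bool:
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False  # too soon; the router will catch up on its next poll
        self.last_sent = now
        return True
```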

>
>> Transport:
>>
>> MiTM is Man in the middle as I and many others know it. 'Monkey/piggy/pickle
>> in the middle' is a child's ball game.
>
> Monkey-in-the-middle is a common non-sexist variant of this term.
> Welcome to the 21st century.

Going back to the section on gender-neutral language in a professional
writing text from my MBA: it makes the point that arbitrarily changing terms
that are already gender-inclusive is poor form. If the term were
"Men-in-the-Middle" or "A-Man-in-the-Middle", then certainly change it.

Otherwise, "Man-in-the-middle" is perfectly gender-ambiguous. But that may
also be my style, and I will let the RFC Editor handle it as appropriate.

>
>> " Therefore, as of this document, there is no mandatory to
>>    implement transport which provides authentication and integrity
>>    protection."
>>
>> if this is the case.. then why? what is the gain?
>
> OK, this is the elephant in the living room.
>

[..]

>
> Nobody is happy with this, but it's the least bad compromise we could
> find between what the IETF would prefer and reality in the field.
>

O.K.

>> why not then make the router fetch the signed objects and do the
>> validation internal - this again seems to be the 'missing
>> requirements' problem.
>
> See "currently shipping routing hardware", above.
>
>> SSH Transport
>>
>> State up front that you MUST use SSHv2. (instead hinting in the third
>> paragraph)
>
> Done.
>

Thanks.

>> TLS Transport
>> "Man in The Middle (MiTM)" please.
>
> Above.
>
>> Router Cache setup
>>
>> "When a more preferred cache becomes available, if resources allow, it
>>    would be prudent for the client to start fetching from that cache."
>>
>> How does the client (I assume router) know when to do this, as caches are
>> not synchronized? How does a router tell if any particular cache has more
>> current data than another cache? What if two caches contradict each other?
>
> The document repeatedly states that the router has an ordered
> preference list of the caches it uses.  The text you quote here
> doesn't say "has more current data", it says "becomes available", ie,
> it stops rejecting connection attempts, signalling errors, or
> otherwise failing to be useful.

o.k.
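
Read the way Rob describes it, "becomes available" reduces to walking the router's ordered preference list and taking the first cache that is currently usable; no comparison of data freshness is involved. The function names and cache names below are hypothetical, and `is_available` stands in for whatever liveness test an implementation uses.

```python
from typing import Callable, Optional, Sequence

def preferred_cache(caches: Sequence[str],
                    is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the most-preferred usable cache; caches are listed best-first."""
    for cache in caches:
        if is_available(cache):
            return cache
    return None

def should_switch(current: Optional[str], caches: Sequence[str],
                  is_available: Callable[[str], bool]) -> bool:
    """True if a different (hence more-preferred or replacement) cache is usable."""
    best = preferred_cache(caches, is_available)
    return best is not None and best != current
```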

>
>> Error codes
>>
>> 6: Withdrawal of Unknown Record (fatal), why drop the session? (which
>> presumably causes a restart) to a cache, assuming the cache is corrupt,
>> which will then send another Unknown Record, which is fatal... (repeat)??
>>
>> Why not mark the cache as corrupt at the client?
>
> This is one of several loss-of-synchronization problems.  The
> assumption is that the router may have (somehow) lost synchronization
> with the cache.  We don't really know which party is confused at this
> point, all we know is that the session itself is no longer useful
> because the router and cache are not communicating clearly.  So the
> router's data isn't necessarily corrupt.
>
> The router won't necessarily restart with this cache right away
> either; it has several options: it might try another cache, it might
> switch to another set of data it has already loaded, or it might try a
> reset query to this cache.

o.k.
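
The three recovery options Rob lists can be expressed as one possible policy, applied after the session has already been dropped. The order of preference here is a hypothetical choice for illustration; the draft does not mandate one, and the names are invented for this sketch.

```python
from enum import Enum
from typing import Optional

class Recovery(Enum):
    OTHER_CACHE = "try another cache"
    LOADED_DATA = "switch to another data set already loaded"
    RESET_QUERY = "send a Reset Query to the same cache"

def recover_from_fatal_error(other_cache: Optional[str],
                             have_other_data: bool) -> Recovery:
    """Pick a recovery action after a fatal error such as Withdrawal of Unknown Record.

    Which side lost synchronization is unknown, so the router's own data is not
    assumed corrupt; the ordering below is an illustrative policy only.
    """
    if other_cache is not None:
        return Recovery.OTHER_CACHE
    if have_other_data:
        return Recovery.LOADED_DATA
    return Recovery.RESET_QUERY
```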

>
>> Security Considerations:
>>
>> Transport Security: There are multiple valid options for a root trust anchor,
>> including the structure from the IAB aligning it to the IANA. Perhaps,
>> instead of saying "the IANA root trust anchor", say "Global RPKI root trust
>> anchor". Otherwise you might accidentally find that your validated cache only
>> covers unallocated and reserved blocks.
>
> I think you're saying that using the term IANA here is politically
> incorrect.

No. I'm saying that while discussions are underway, precisely which trust
anchor covers what is still on the table. At this stage one option has
IANA's RPKI CA being authoritative for only unallocated and reserved INRs.
It may be that there is a unified trust anchor above that, known loosely as
the global trust anchor. However tying the document to one particular TA
might result in a gross inaccuracy. Ultimately, then, if the global trust
anchor is the IANA TA, you haven't lost anything.

Cheers
Terry