Re: [sidr] WGLC: draft-ietf-sidr-origin-ops

Shane Amante <shane@castlepoint.net> Mon, 31 October 2011 05:20 UTC

From: Shane Amante <shane@castlepoint.net>
Date: Sun, 30 Oct 2011 23:20:21 -0600
To: Randy Bush <randy@psg.com>
Cc: sidr wg list <sidr@ietf.org>
Subject: Re: [sidr] WGLC: draft-ietf-sidr-origin-ops

Hi Randy,

On Oct 30, 2011, at 4:57 AM, Randy Bush wrote:
[--snip--]
>> 1)  From Section 3:
>> ---snip---
>>   A local valid cache containing all RPKI data may be gathered from the
>>   global distributed database using the rsync protocol, [RFC5781], and
>>   a validation tool such as rcynic [rcynic].
>> ---snip---
>> 
>> Would it be possible to mention and/or point to how the above process is supposed to be bootstrapped?  IOW, is it expected that, eventually?, the RIR's are going to publish to their end-users and maintain URI's of RPKI publication points?  Since this is an Ops guidelines document, some guidance and/or pointers are likely to save [lots of] questions down the road.  I'm not expecting this to be a tutorial document, but some idea on the theory of how a new SP bootstraps their cache(s) would be helpful.
> 
> uh, i am not clear on what you actually want here.  in the minimal case,
> the op should just run rcynic or some equivalent relying party tool, as
> it says.  in the more complex/large case, good quality RP cache code
> should be able to feed from other RP caches.

Let me try again.  :-)  Assume that I'm Joe Random Operator and I'm new to this SIDR thing.  I've just installed my first RPKI cache and would like to bootstrap it with a full set of RPKI data from "the world".  To which RIRs or RPs (or both) do I go to bootstrap my RPKI cache for the very first time?
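
To make the question concrete, below is roughly the bootstrap I imagine Joe Random Operator scripting.  The publication-point URI is a placeholder (exactly the pointer I'm asking the draft to provide), and the validation step is deliberately left to an RP tool such as rcynic, since its invocation is tool-specific:

    #!/usr/bin/env python3
    """Sketch only: mirror each trust anchor's publication point with
    rsync into a local cache, then hand the mirror to an RP validator.
    The URI below is a placeholder, not a real RIR endpoint."""

    import subprocess
    from pathlib import Path

    # In practice these would come from the TALs the RIRs publish.
    PUBLICATION_POINTS = [
        "rsync://rpki.example-rir.net/repository/",
    ]

    CACHE_DIR = Path("/var/cache/rpki")

    def mirror(uri: str, dest: Path) -> None:
        """Pull one publication point into the local cache."""
        dest.mkdir(parents=True, exist_ok=True)
        subprocess.run(["rsync", "-rt", "--delete", uri, str(dest) + "/"],
                       check=True)

    if __name__ == "__main__":
        for uri in PUBLICATION_POINTS:
            host = uri.split("/")[2]        # one directory per repository host
            mirror(uri, CACHE_DIR / host)
        # Validation of the mirrored objects is then done by rcynic or
        # an equivalent relying-party tool.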


>> 2)  Given that, to my knowledge, the RPKI is [very] loosely synchronized in a "pull-only" fashion, shouldn't there be some text added below to that effect that:
>>    a)  It may not be best to go more than, say, 2 levels of RPKI caches deep inside a single organization/ASN to avoid RPKI caches from being out of sync with each other?  IOW, there are likely a small set of 1st/top-level RPKI caches that speak externally to fetch RPKI cache information, (similar to 'hidden' authoritative DNS servers), then a second tier of RPKI caches that synchronize (only) from the top-level RPKI caches, (similar to external, anycast authoritative DNS servers). 
>>    b)  Operators should look at running more aggressive synchronization intervals _internally_ within their organization/ASN, from "children" (2nd-level) RPKI caches to the 'parent' (top-level) RPKI cache in their organization/ASN, compared to more "relaxed" synchronization intervals to RPKI caches external to their organization (top-level RPKI caches in their ASN to RIR's)?
>> ---snip---
>>   Validated caches may also be created and maintained from other
>>   validated caches.  Network operators SHOULD take maximum advantage of
>>   this feature to minimize load on the global distributed RPKI
>>   database.  Of course, the recipient SHOULD re-validate the data.
>> ---snip---
> 
> does b not address a, for those who want very tight synch.

I don't believe so: (a) attempts to address the concern that you don't want, potentially, dozens of RPKI caches inside a single operator hammering the same set of _external_ RPKI caches "constantly"; (b) suggests that, since you've got a hierarchy of RPKI caches in your own ASN, you most likely _can_ and _should_ keep the 2nd-level RPKI caches in sync with each other as best as possible, so that when RPKI information is pushed/pulled into the routers via RPKI-RTR, you're affecting routing policy in your control plane and, hence, forwarding of traffic through your whole AS in a consistent way at nearly the same time.  I view them as two separate, but related, matters.

I would be fine if you suggested that a tiered approach to RPKI caches is envisioned to make sense for "large backbones", and that "small stub/enterprise/edge networks" (to quote the classification of networks already in Section 1) may be fine with a single tier of a handful of RPKI caches.


> note that the RIRs were talking 24 hour publication cycles, last i heard
> (long ago, i admit).  [ i thought this was nutso ]  so a lot of this has
> yet to play out.

Re: RIRs + 24 hours … IMO, that mainly affects the interval at which it is reasonable for an operator's "top-level" set of RPKI caches to go after a "fresh" set of RPKI data.  Again, once the 'top-level' RPKI caches get that data, I would strongly prefer that the "window of time" (IOW, the 'duration') it takes to cascade that fresh data out to my 2nd-level RPKI caches be as narrow as possible, so that when RPKI-RTR pushes, or routers pull, that data from their RPKI caches, they're doing so within a very short window.  That way, if/when BGP policy is affected by new information in the RPKI, it doesn't lead to [very] long-lived improper/inconsistent forwarding in the network.
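
Concretely, this is the sort of two-tier split I keep describing, expressed as a table; the host names and intervals are invented, and only the ratio between the tiers matters:

    """Illustrative only: two tiers of caches and how often each
    synchronizes.  Hosts and intervals are made up."""

    TIERS = {
        # Top-level caches: the only ones that talk to the global RPKI,
        # on a relaxed interval tied to how often the RIRs republish.
        "top": {
            "caches": ["cache-a.example.net", "cache-b.example.net"],
            "pull_from": "global RPKI publication points",
            "interval_minutes": 4 * 60,
        },
        # Second-level caches: feed the routers, and pull only from our
        # own top-level caches on a much tighter interval so all routers
        # see new data within a narrow window.
        "second": {
            "caches": ["cache-pop%d.example.net" % n for n in range(1, 9)],
            "pull_from": "top-level caches only",
            "interval_minutes": 10,
        },
    }

    if __name__ == "__main__":
        for tier, cfg in TIERS.items():
            print("%s: every %d min from %s (%d caches)"
                  % (tier, cfg["interval_minutes"], cfg["pull_from"],
                     len(cfg["caches"])))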


>> While I'm here, I don't think the text in Section 6, "Notes", addresses the above concerns, at all.  In fact, I find it extremely unhelpful to just dismiss this concern, out of hand, with the text: "There is no 'fix' for this, it is the nature of distributed data with distributed caches".  We know what the answer is here: you tune the synchronization intervals to strike the appropriate balance between [very] tight synchronization vs. increased load on the systems being synchronized.  I find it hard to believe a simple suggestion such as this is not proposed in the text, even including the phrase "the suggested values for such synchronization are outside the scope of this document, but will likely be subject to further studies to determine optimal values based on field experience".
> 
> sorry, dns taught us that the answer is not in just running it more
> frequently.  you can narrow the windows, but you can not eliminate them.
> i wish we could, but the protocols which could provide a globally
> synchronized database would be extremely complex and just do not seem
> worth the effort in this case.

To be clear, I recognize that perfect synchronization is hard; however, I'm asking you to acknowledge that one needs to attain much better than lackadaisical synchronization so that there isn't, potentially, massive sloshing of traffic around the network due to new RPKI data showing up in routers at vastly different times.  See just below where I hope you can find additional text that might satisfy my concern.


> your suggested text seems useful, and i will steal and modify if you do
> not mind.  but i suspect we would find tuning has topological and delay
> sensitivities which will prevent optimal recipies.
> 
>    <t>Timing of inter-cache synchronization is outside the scope of
>      this document, but depends on things such as how often routers
>      feed from the caches, how often the operator feels the global RPKI
>      changes significantly, etc.</t>

I would appreciate it if you could also acknowledge that cache synchronization intervals have a very important dependency: ensuring that all routers in an operator's ASN get updated in as short a time interval (duration) as possible, so that there is consistent application of BGP policy and, hence, consistent forwarding of traffic.


>> 3)  Granted, the following text is only a "SHOULD", but the text offers no reasoning as to why caches should be placed close to routers, i.e.: are there latency concerns (for the RPKI <-> cache protocol), or is it that a geographically distributed system is one way to avoid a single-point-of-failure, or something else entirely?  As a start, just defining "close" would help, e.g.: same POP, same (U.S.) state, same country, same timezone … but, then a statement as to any latency or resiliency requirement for geographic deployment of RPKI caches wold be useful.
> 
> we tried to go down this path and found it just got more and more
> complex with no real improvement.  you probably want them in some
> diameter of transport trust.  you probably want them in some diameter of
> routing bootstrap reach.  you probably want them with reasonable latency
> characteristics.  and there are probably more concerns.  that's why you
> get the big bucks. :)

I'm more concerned for the hundreds or thousands of engineers who were not participating in this initial development effort and will be left wondering what, at a high level, the thinking/background/concerns were that went into these guidelines.  With that information, future engineers can weigh those criteria for themselves and decide whether, and how, they apply to their own environments.

Remember, ultimately what you're laying out here is going to be read by engineers who are going to look at buying actual server HW, and network ports to attach it to, in order to set up an RPKI in their network.  The more background material you give them, the more confident they will be in putting together a budget estimate to start such a project …


>    <t>As RPKI-based origin validation relies on the availability of
>      RPKI data, operators SHOULD locate caches close to routers that
>      require these data and services.  'Close' is, of course, complex.
>      One should consider trust boundaries, routing bootstrap
>      reachability, latency, etc.</t>

While I appreciate the attempt, I don't think the above satisfies my concern.  What 'trust boundaries' are you referring to, e.g.: within your ASN; outside your ASN but within your organization; or neither/other?  With respect to latency, are you attempting to say that it's recommended to engineer for a high-bandwidth, low-delay path, in order to achieve fast transfers of data between RPKI caches and between the RPKI caches and the routers?  If so, then aren't you conceding that low latency is necessary in order to _strive_ to attain "good [enough]" synchronization between all RPKI caches in the network, so that 'new' RPKI data that affects BGP policy gets uniformly applied across all routers in the network at roughly the same time?

Also, you mention 'routing bootstrap reachability'.  That's actually a very good point, and I don't recall seeing anything about that in this document anywhere.  (If it's already in there, I apologize.)  Assuming there's nothing about it in here, wouldn't it be good to state some guidelines around this in Section 7, Security Considerations, like:
====
Operators SHOULD log _and_ exclude *new* RPKI data that downgrades the previous state of ROAs (e.g.: from Valid -> Invalid or Valid -> Not Found) associated with external RPKI caches, root DNS servers and ccTLD DNS servers, so as not to cause a DoS that would lead to an inability to gather fresh, accurate RPKI data.  This information should be evaluated by a human and manually pushed out to RPKI caches and routers in the network after it has been validated as correct.
====
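
As a rough illustration of the kind of check I mean (the prefixes, state names and ranking below are made up for the example), something like:

    """Sketch: log, and hold for human review, any update that
    downgrades a ROA covering infrastructure we depend on to reach the
    RPKI itself (external caches, root/ccTLD DNS, etc.)."""

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("roa-downgrade-guard")

    # Ordering used to decide what counts as a downgrade.
    RANK = {"valid": 2, "not-found": 1, "invalid": 0}

    # Placeholder prefixes covering critical infrastructure.
    CRITICAL_PREFIXES = {"192.0.2.0/24", "198.51.100.0/24"}

    def filter_updates(previous: dict, new: dict) -> dict:
        """Return updates safe to push automatically; downgrades of
        critical prefixes are logged and left for manual handling."""
        accepted = {}
        for prefix, new_state in new.items():
            old_state = previous.get(prefix, "not-found")
            downgrade = RANK[new_state] < RANK[old_state]
            if downgrade and prefix in CRITICAL_PREFIXES:
                log.warning("holding %s: %s -> %s (manual review)",
                            prefix, old_state, new_state)
                continue
            accepted[prefix] = new_state
        return accepted

    if __name__ == "__main__":
        before = {"192.0.2.0/24": "valid", "203.0.113.0/24": "valid"}
        after = {"192.0.2.0/24": "invalid", "203.0.113.0/24": "valid"}
        print(filter_updates(before, after))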

Speaking of which, is it possible to use anything in the Ghostbusters record to 'aid' in the recognition of those critical-infrastructure ROAs?

Anyway, the point is we are supposed to be avoiding (automated) circular dependencies here and it would be worth pointing those out to operators, when they read this, so they don't forget to take them into account.



>>    Furthermore, given the [very] loosely synchronized nature of the RPKI, should the text point out that the number of RPKI caches (internal to the organization) be balanced against the potential need of an organization to maintain a more tightly synchronized view, across their entire network, of validated routing information?  A concern might be that if routers in Continent A pull information from their RPKI caches that tell them that ROA is not "Invalid", but other routers in Continent B are still using 'older' information in RPKI caches in Continent B that says the same ROA is either "Not Found" or "Valid", then the result might be that BGP Path Selection swings all traffic from Continent A to Continent B.  At a minimum, this could lead to substantially increased latency or, at worst, congestion, packet-loss or a unintended DoS.  
>> ---snip---
>>   As RPKI-based origin validation relies on the availability of RPKI
>>   data, operators SHOULD locate caches close to routers that require
>>   these data and services.  A router can peer with one or more nearby
>>   caches.
>> ---snip---
> 
> see above

This didn't address my concern, which is primarily about consistent policy across the network at any given moment in time.


>> In Section 5, "Routing Policy":
>> 4)  From a practical standpoint, LOCAL_PREF is already widely used to influence Traffic Engineering, both by an SP as well as by the SP's customers (through the use of "TE communities" sent by a downstream customer to the SP) -- the latter of which is done in order so the customer can influence traffic from the SP toward themselves, (e.g.: one example where a customer prefers a circuit be 'backup' for another circuit only if their other SP is not announcing that same prefix).  In reality, I think that there will have to be significant re-work of an SP's existing BGP policies to encode dual-meanings inside a single LOCAL_PREF attribute, (route validity + TE preference).  It may be good to acknowledge this by recommending that in the text, above, something like:
>> ====
>>    In the short-term, the LOCAL_PREF Attribute may be used to carry both the validity state of a prefix along with it's Traffic Engineering characteristic(s).  It is likely that the SP will have to change their BGP policies such that they can encode these two, separate characteristics in the same BGP attribute without negatively impacting their existing use or leading to accidental privilege escalation attacks. 
>> ====
>> ---snip---
>> Some may choose to use the large Local-Preference hammer.
>> ---snip---
> 
> i would hesitate to tell you *how* to deal with local policy matters.
> the whole point of pfx-validate and this document is that you are free
> to do whatever is appropriate to your needs.  we definitely do not want
> to tell you if or how you should complicate your use of local-pref.
> we did our best to avoid assuming you will affect local-pref at all.

I understand, but since the document has already waved its hands at LOCAL_PREF as a potentially viable method with a very low bar to deployment, you could at least acknowledge, in a little more detail, that there are existing uses of LOCAL_PREF and how to accommodate those alongside this new validity data.
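
To show what I mean by encoding two separate characteristics in one attribute without breaking existing TE, here's a toy sketch; the preference bands and offsets are invented:

    """Toy sketch: keep the existing TE bands and add a validity offset
    on top, sized so validity never crosses a TE band boundary (or the
    reverse, if that is the policy you actually want)."""

    # Existing TE bands, e.g. customer-signalled primary vs backup.
    TE_BASE = {"primary": 300, "backup": 200, "last-resort": 100}

    # Validity offsets, kept smaller than the gap between TE bands.
    VALIDITY_OFFSET = {"valid": 20, "not-found": 10, "invalid": 0}

    def local_pref(te_class: str, validity: str) -> int:
        return TE_BASE[te_class] + VALIDITY_OFFSET[validity]

    if __name__ == "__main__":
        print(local_pref("primary", "valid"))    # 320
        print(local_pref("primary", "invalid"))  # 300
        print(local_pref("backup", "valid"))     # 220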


>> 5)  I have three comments on the below:
>>    a)  It's not clear, to me, what is meant by "internal metric" below.  Do you mean MED or IGP metric or something else?  I don't see IGP metric as being practical, so I'm assuming you mean additively altering MED (up|down) based on validity state.  Regardless, I would recommend you state more precisely which BGP Path Attribute you're referring to below.
> 
> we meant MED.  jay caught this the other day, and it is fixed in the
> draft in my edit buffer.
> 
>    <t>Some providers may choose to set Local-Preference based on the
>      RPKI validation result.  Other providers may not want the RPKI
>      validation result to be more important than AS-path length --
>      these providers would need to map RPKI validation result to some
>      BGP attribute that is evaluated in BGP's path selection process
>      after AS-path is evaluated.  Routers implementing RPKI-based
>      origin validation MUST provide such options to operators.</t>

That looks good.
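
For what it's worth, here's a toy version of the mapping that new text describes, for a provider who wants AS-path to stay decisive; the MED values are invented, and lower MED wins:

    """Toy sketch: map validation state onto a metric consulted after
    AS-path in best-path selection (MED here); values are invented."""

    MED_BY_VALIDITY = {"valid": 0, "not-found": 50, "invalid": 100}

    def med_for(validity: str, base_med: int = 0) -> int:
        # Preserve any TE-derived base MED and add the validity component.
        return base_med + MED_BY_VALIDITY[validity]

    if __name__ == "__main__":
        for state in ("valid", "not-found", "invalid"):
            print(state, med_for(state))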


>>    b)  Since MED is passed from one ASN to (only) a second, downstream ASN to influence ingress TE policy, is it "OK" from a security PoV that MED is a *trusted* means to convey ROA validity information from one ASN to a second?  Presumably, the answer should be "heck, no", right?  If that's the case, then wouldn't it be wise to state that:
>>        i)  MED's, encoded with any ROA validity information, should get reset on egress from an ASN to remove said validity information and only carry TE information, as appropriate; and,
>>        ii) MED's should not be trusted on ingress to convey any meaning with respect to validity information?
>>    c)  What is meant by the statement, "might choose to let AS-Path rule"?  Is your intent to state that an SP may choose to just use MED, which follows after LOCAL_PREF & AS_PATH in the BGP Path Selection Algorithm, as a means to determining validity of a particular prefix?  If so, then it would be much more clear if you just stated that, e.g.:
>> ====
>>    If LOCAL_PREF is not used to convey validity information, then MED is likely the next best candidate BGP Attribute that can be used to influence path selection based on the validity of a particular prefix.  As with LOCAL_PREF, care must be taken to avoid changing the MED attribute and creating privilege escalation attacks.
>> ====
>> ---snip---
>>   […]  Others
>>   might choose to let AS-Path rule and set their internal metric, which
>>   comes after AS-Path in the BGP decision process.
>> ---snip---
> 
> if you trust MEDs from a neighbor you are either a fool or have a,
> likely rather complex, contractual and technical agreement.  far be it
> from us to get into such matters.  we abjure general inter-provider
> hygenic practices.  this is not an inter-operator best practices
> document, we're just trying to inform you of where origin-validation
> may affect your design.

I believe you've acknowledged this concern with the text just below re: the more general statement about passing validity information on to third parties via BGP attributes.


>> Other Comments:
>> 6)  Related to #5, above, BGP Communities are another transitive attribute that /might/ be used to convey validity information of a prefix, or lack thereof, from one ASN to a second ASN (or, more).  However, as we know, there is no means to authenticate BGP Attributes, from one ASN to the next.  So, from a security hygiene perspective, would it be best to say something along the lines of:
>> ====
>> The validity state of routes MUST NOT be transmitted beyond the borders of an SP's ASN, since: a) there is no authenticity of BGP Attributes; and, b) this would place hidden dependencies on the ability of the upstream ASN to validate routes and pass them along to others, which would increase the fragility of the overall system.  Finally, ASN's MUST NOT rely on BGP Attributes received on an eBGP session, to convey any meaning with respect to validity of a particular prefix for the reasons just stated.
>> ====
> 
> ok, since you keep banging your head against this wall, it is clear that
> something saying "do not listen to validity information from another AS"
> is needed.
> 
>    <t>Validity state signialing SHOULD NOT be accepted from a neighbor
>      AS.  The validity state of a received announcement has only local
>      scope due to issues such as scope of trust, RPKI synchrony and
>      <xref target="I-D.ietf-sidr-ltamgmt"/>.</t>

That looks good, but I would suggest a minor change to the latter part of the first sentence:
s/from a neighbor AS/from a neighbor AS that is not under your organization's direct control/


>> 7)  Is this document only intended (scoped?) to cover PE's that can (or, eventually, will) speak the RPKI-RTR protocol for validation?  Or is this document intended to also cover PE's that do not speak RPKI-RTR, but those PE's would obviously need some other mechanism, (e.g.: periodically pushing an updated config to them based on RPKI validated data), in order that they could influence the policy applied to valid routes in such a way that is consistent with other more modern routers that do run RPKI-RTR protocol?  If so, wouldn't it be good to suggest this, even if only as a means to increase the deployment speed?  Or, to at least let readers know that this needs to be considered during their deployment so that they can factor in the load on their [existing] systems that might do this work as well as the effects of the 'loosely synchronized' aspects of the RPKI?
> 
> the former

OK.  So, then, in Section 1, would it be prudent to say something to the effect of:
====
The scope of this document is limited to the application of RPKI cache data to routers that speak the RPKI-RTR protocol.  Other uses, such as pushing RPKI data into routers through a Service Provider's existing management systems or software, are outside the scope of this document.
====

Thanks,

-shane