Re: [sidr] WGLC: draft-ietf-sidr-origin-ops

Randy Bush <> Sun, 30 October 2011 10:57 UTC

Date: Sun, 30 Oct 2011 11:57:44 +0100
From: Randy Bush <>
To: Shane Amante <>
Cc: sidr wg list <>
Subject: Re: [sidr] WGLC: draft-ietf-sidr-origin-ops

thanks for the review!

> - whether it's intended or 'safe' to use BGP Attributes, (MED, communities), to convey validity of prefixes from one ASN to another ASN

what is valid for you may not be valid for me, see draft-ietf-sidr-ltamgmt.

> - better guidance/recommendations around the number, placement and
>   synchronization characteristics of RPKI caches within a SP.

this is a complex net design issue and has way too much dependency on
your architecture.  if you have a simple bit of guidance you would
suggest, do send text.

in general, we tried to give high level guidance as opposed to delving
deeply into your net design.  aside from the latter being far too
prescriptive, the discussion would explode into thousands of
micro-considerations as net designs are quite diverse.  no proof of

luckily, as you demonstrate with your message, a good net eng will
perceive the issues that affect them.  we assume they will then design

> 1)  From Section 3:
> ---snip---
>    A local valid cache containing all RPKI data may be gathered from the
>    global distributed database using the rsync protocol, [RFC5781], and
>    a validation tool such as rcynic [rcynic].
> ---snip---
> Would it be possible to mention and/or point to how the above process is supposed to be bootstrapped?  IOW, is it expected that, eventually?, the RIR's are going to publish to their end-users and maintain URI's of RPKI publication points?  Since this is an Ops guidelines document, some guidance and/or pointers are likely to save [lots of] questions down the road.  I'm not expecting this to be a tutorial document, but some idea on the theory of how a new SP bootstraps their cache(s) would be helpful.

uh, i am not clear on what you actually want here.  in the minimal case,
the op should just run rcynic or some equivalent relying party tool, as
it says.  in the more complex/large case, good quality RP cache code
should be able to feed from other RP caches.

> 2)  Given that, to my knowledge, the RPKI is [very] loosely synchronized in a "pull-only" fashion, shouldn't there be some text added below to that effect that:
>     a)  It may not be best to go more than, say, 2 levels of RPKI caches deep inside a single organization/ASN to avoid RPKI caches from being out of sync with each other?  IOW, there are likely a small set of 1st/top-level RPKI caches that speak externally to fetch RPKI cache information, (similar to 'hidden' authoritative DNS servers), then a second tier of RPKI caches that synchronize (only) from the top-level RPKI caches, (similar to external, anycast authoritative DNS servers). 
>     b)  Operators should look at running more aggressive synchronization intervals _internally_ within their organization/ASN, from "children" (2nd-level) RPKI caches to the 'parent' (top-level) RPKI cache in their organization/ASN, compared to more "relaxed" synchronization intervals to RPKI caches external to their organization (top-level RPKI caches in their ASN to RIR's)?
> ---snip---
>    Validated caches may also be created and maintained from other
>    validated caches.  Network operators SHOULD take maximum advantage of
>    this feature to minimize load on the global distributed RPKI
>    database.  Of course, the recipient SHOULD re-validate the data.
> ---snip---

does b not address a, for those who want very tight synch?

note that the RIRs were talking 24 hour publication cycles, last i heard
(long ago, i admit).  [ i thought this was nutso ]  so a lot of this has
yet to play out.

> While I'm here, I don't think the text in Section 6, "Notes", addresses the above concerns, at all.  In fact, I find it extremely unhelpful to just dismiss this concern, out of hand, with the text: "There is no 'fix' for this, it is the nature of distributed data with distributed caches".  We know what the answer is here: you tune the synchronization intervals to strike the appropriate balance between [very] tight synchronization vs. increased load on the systems being synchronized.  I find it hard to believe a simple suggestion such as this is not proposed in the text, even including the phrase "the suggested values for such synchronization are outside the scope of this document, but will likely be subject to further studies to determine optimal values based on field experience".

sorry, dns taught us that the answer is not in just running it more
frequently.  you can narrow the windows, but you can not eliminate them.
i wish we could, but the protocols which could provide a globally
synchronized database would be extremely complex and just do not seem
worth the effort in this case.
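
a rough sketch of the arithmetic behind "narrow the windows, but not eliminate them", assuming a pure pull chain (global rpki -> top-level cache -> second-tier cache -> router). the function name and all interval values are made up for illustration, and transfer and validation time are ignored.

```python
def worst_case_staleness(poll_intervals_s):
    """Upper bound, in seconds, on how old data can be at the end of a pull chain.

    Each hop polls its upstream on its own timer, so in the worst case a
    change just misses every poll and waits one full interval per hop.
    """
    return sum(poll_intervals_s)


# hypothetical intervals: [top-level cache <- global RPKI,
#                          second-tier cache <- top-level cache,
#                          router <- second-tier cache]
relaxed = [3600, 3600, 600]
tight_internal = [3600, 300, 60]

print(worst_case_staleness(relaxed))         # 7800
print(worst_case_staleness(tight_internal))  # 3960
```

tightening the internal intervals shrinks the bound, but under these assumptions it can never drop below the external polling interval, and two routers fed by different chains can still disagree for up to that long.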

your suggested text seems useful, and i will steal and modify if you do
not mind.  but i suspect we would find tuning has topological and delay
sensitivities which will prevent optimal recipes.

    <t>Timing of inter-cache synchronization is outside the scope of
      this document, but depends on things such as how often routers
      feed from the caches, how often the operator feels the global RPKI
      changes significantly, etc.</t>

> 3)  Granted, the following text is only a "SHOULD", but the text offers no reasoning as to why caches should be placed close to routers, i.e.: are there latency concerns (for the RPKI <-> cache protocol), or is it that a geographically distributed system is one way to avoid a single-point-of-failure, or something else entirely?  As a start, just defining "close" would help, e.g.: same POP, same (U.S.) state, same country, same timezone … but, then a statement as to any latency or resiliency requirement for geographic deployment of RPKI caches would be useful.

we tried to go down this path and found it just got more and more
complex with no real improvement.  you probably want them in some
diameter of transport trust.  you probably want them in some diameter of
routing bootstrap reach.  you probably want them with reasonable latency
characteristics.  and there are probably more concerns.  that's why you
get the big bucks. :)

    <t>As RPKI-based origin validation relies on the availability of
      RPKI data, operators SHOULD locate caches close to routers that
      require these data and services.  'Close' is, of course, complex.
      One should consider trust boundaries, routing bootstrap
      reachability, latency, etc.</t>

>     Furthermore, given the [very] loosely synchronized nature of the RPKI, should the text point out that the number of RPKI caches (internal to the organization) be balanced against the potential need of an organization to maintain a more tightly synchronized view, across their entire network, of validated routing information?  A concern might be that if routers in Continent A pull information from their RPKI caches that tell them that ROA is now "Invalid", but other routers in Continent B are still using 'older' information in RPKI caches in Continent B that says the same ROA is either "Not Found" or "Valid", then the result might be that BGP Path Selection swings all traffic from Continent A to Continent B.  At a minimum, this could lead to substantially increased latency or, at worst, congestion, packet-loss or an unintended DoS.
> ---snip---
>    As RPKI-based origin validation relies on the availability of RPKI
>    data, operators SHOULD locate caches close to routers that require
>    these data and services.  A router can peer with one or more nearby
>    caches.
> ---snip---

see above

> In Section 5, "Routing Policy":
> 4)  From a practical standpoint, LOCAL_PREF is already widely used to influence Traffic Engineering, both by an SP as well as by the SP's customers (through the use of "TE communities" sent by a downstream customer to the SP) -- the latter of which is done in order so the customer can influence traffic from the SP toward themselves, (e.g.: one example where a customer prefers a circuit be 'backup' for another circuit only if their other SP is not announcing that same prefix).  In reality, I think that there will have to be significant re-work of an SP's existing BGP policies to encode dual-meanings inside a single LOCAL_PREF attribute, (route validity + TE preference).  It may be good to acknowledge this by recommending that in the text, above, something like:
> ====
>     In the short-term, the LOCAL_PREF Attribute may be used to carry both the validity state of a prefix along with its Traffic Engineering characteristic(s).  It is likely that the SP will have to change their BGP policies such that they can encode these two, separate characteristics in the same BGP attribute without negatively impacting their existing use or leading to accidental privilege escalation attacks. 
> ====
> ---snip---
> Some may choose to use the large Local-Preference hammer.
> ---snip---

i would hesitate to tell you *how* to deal with local policy matters.
the whole point of pfx-validate and this document is that you are free
to do whatever is appropriate to your needs.  we definitely do not want
to tell you if or how you should complicate your use of local-pref.
we did our best to avoid assuming you will affect local-pref at all.

> 5)  I have three comments on the below:
>     a)  It's not clear, to me, what is meant by "internal metric" below.  Do you mean MED or IGP metric or something else?  I don't see IGP metric as being practical, so I'm assuming you mean additively altering MED (up|down) based on validity state.  Regardless, I would recommend you state more precisely which BGP Path Attribute you're referring to below.

we meant MED.  jay caught this the other day, and it is fixed in the
draft in my edit buffer.

    <t>Some providers may choose to set Local-Preference based on the
      RPKI validation result.  Other providers may not want the RPKI
      validation result to be more important than AS-path length --
      these providers would need to map RPKI validation result to some
      BGP attribute that is evaluated in BGP's path selection process
      after AS-path is evaluated.  Routers implementing RPKI-based
      origin validation MUST provide such options to operators.</t>
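
to make the difference between the two choices concrete, here is a minimal sketch of a heavily simplified decision process (highest local-pref, then shortest as-path, then lowest med). the attribute values, route names, and helper functions are all hypothetical, and real bgp has more steps (origin, and med is normally only compared between routes from the same neighbor as).

```python
from dataclasses import dataclass, replace

# hypothetical local policy values, not recommendations
LOCAL_PREF_BY_STATE = {"valid": 110, "not-found": 100, "invalid": 90}
MED_BY_STATE = {"valid": 0, "not-found": 50, "invalid": 100}


@dataclass(frozen=True)
class Route:
    name: str
    as_path_len: int
    state: str          # locally computed RPKI validation state
    local_pref: int = 100
    med: int = 0


def best(routes):
    # simplified ordering: highest local-pref, shortest AS path, lowest MED
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len, r.med))


def apply_local_pref_policy(routes):
    # validation state outranks AS-path length
    return [replace(r, local_pref=LOCAL_PREF_BY_STATE[r.state]) for r in routes]


def apply_med_policy(routes):
    # validation state only breaks ties after AS-path length
    return [replace(r, med=MED_BY_STATE[r.state]) for r in routes]


routes = [
    Route("short-invalid", 2, "invalid"),
    Route("long-valid", 4, "valid"),
    Route("short-notfound", 2, "not-found"),
]

print(best(apply_local_pref_policy(routes)).name)  # long-valid
print(best(apply_med_policy(routes)).name)         # short-notfound
```

with the local-pref mapping the longer but valid path wins. with the med mapping as-path length still rules, and validity only decides between the two equally short paths.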

>     b)  Since MED is passed from one ASN to (only) a second, downstream ASN to influence ingress TE policy, is it "OK" from a security PoV that MED is a *trusted* means to convey ROA validity information from one ASN to a second?  Presumably, the answer should be "heck, no", right?  If that's the case, then wouldn't it be wise to state that:
>         i)  MED's, encoded with any ROA validity information, should get reset on egress from an ASN to remove said validity information and only carry TE information, as appropriate; and,
>         ii) MED's should not be trusted on ingress to convey any meaning with respect to validity information?
>     c)  What is meant by the statement, "might choose to let AS-Path rule"?  Is your intent to state that an SP may choose to just use MED, which follows after LOCAL_PREF & AS_PATH in the BGP Path Selection Algorithm, as a means to determining validity of a particular prefix?  If so, then it would be much more clear if you just stated that, e.g.:
> ====
>     If LOCAL_PREF is not used to convey validity information, then MED is likely the next best candidate BGP Attribute that can be used to influence path selection based on the validity of a particular prefix.  As with LOCAL_PREF, care must be taken to avoid changing the MED attribute and creating privilege escalation attacks.
> ====
> ---snip---
>    […]  Others
>    might choose to let AS-Path rule and set their internal metric, which
>    comes after AS-Path in the BGP decision process.
> ---snip---

if you trust MEDs from a neighbor you are either a fool or have a,
likely rather complex, contractual and technical agreement.  far be it
from us to get into such matters.  we abjure general inter-provider
hygienic practices.  this is not an inter-operator best practices
document, we're just trying to inform you of where origin-validation
may affect your design.

> Other Comments:
> 6)  Related to #5, above, BGP Communities are another transitive attribute that /might/ be used to convey validity information of a prefix, or lack thereof, from one ASN to a second ASN (or, more).  However, as we know, there is no means to authenticate BGP Attributes, from one ASN to the next.  So, from a security hygiene perspective, would it be best to say something along the lines of:
> ====
> The validity state of routes MUST NOT be transmitted beyond the borders of an SP's ASN, since: a) there is no authenticity of BGP Attributes; and, b) this would place hidden dependencies on the ability of the upstream ASN to validate routes and pass them along to others, which would increase the fragility of the overall system.  Finally, ASN's MUST NOT rely on BGP Attributes received on an eBGP session, to convey any meaning with respect to validity of a particular prefix for the reasons just stated.
> ====

ok, since you keep banging your head against this wall, it is clear that
something saying "do not listen to validity information from another AS"
is needed.

    <t>Validity state signaling SHOULD NOT be accepted from a neighbor
      AS.  The validity state of a received announcement has only local
      scope due to issues such as scope of trust, RPKI synchrony and
      <xref target="I-D.ietf-sidr-ltamgmt"/>.</t>
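
one way an operator might read this in policy terms, as a sketch only: if communities are used internally to tag validity, scrub them from anything learned over ebgp and re-tag from the local validation result. the community values and function names below are invented for illustration and are not defined by the draft or by any standard.

```python
# hypothetical communities an operator uses internally to mark validity
VALIDITY_COMMUNITY = {
    "valid": "65000:100",
    "not-found": "65000:101",
    "invalid": "65000:102",
}
ALL_VALIDITY_COMMUNITIES = set(VALIDITY_COMMUNITY.values())


def scrub_ebgp_ingress(communities):
    """Drop validity tags received from a neighbor AS; keep everything else."""
    return [c for c in communities if c not in ALL_VALIDITY_COMMUNITIES]


def tag_locally(communities, local_state):
    """Attach the tag for the validation state computed from our own cache."""
    return scrub_ebgp_ingress(communities) + [VALIDITY_COMMUNITY[local_state]]


# the neighbor claims "valid"; our own validation says "invalid"
received = ["65000:100", "64500:20"]
print(tag_locally(received, "invalid"))  # ['64500:20', '65000:102']
```

the neighbor's claim is discarded regardless of what it says, so the only validity state carried inside the as is the one derived locally.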

> 7)  Is this document only intended (scoped?) to cover PE's that can (or, eventually, will) speak the RPKI-RTR protocol for validation?  Or is this document intended to also cover PE's that do not speak RPKI-RTR, but those PE's would obviously need some other mechanism, (e.g.: periodically pushing an updated config to them based on RPKI validated data), in order that they could influence the policy applied to valid routes in such a way that is consistent with other more modern routers that do run RPKI-RTR protocol?  If so, wouldn't it be good to suggest this, even if only as a means to increase the deployment speed?  Or, to at least let readers know that this needs to be considered during their deployment so that they can factor in the load on their [existing] systems that might do this work as well as the effects of the 'loosely synchronized' aspects of the RPKI?

the former