Re: [sidr] come on people Re: The question about https certificates and frequency of mft/crl re-issuance

Hi,

> On 25 Jun 2016, at 09:32, Randy Bush <randy@psg.com> wrote:
> 
>> Look, if no one can summon the energy to respond, Tim has no way to
>> decide on a change.
> 
> i believe that rob laid this out clearly many months ago.  and no, i
> will not look it up for folk; the epicycles have become too painful.

I remember the comments, and I have been in contact with Rob and the other RRDP authors about this. I think we can move forward.

It is useful to separate the issues regarding 1) https certificate verification, and 2) mft/crl re-issuance.

On 1) HTTPS certificate verification

In short, we seem to converge on using HTTPS certificate validation to alert about possible problems, but since RPKI objects are signed and can be validated even if the source is untrusted, let's always try to get the latest regardless. We are still working on the text, but the gist of what we would like to put in a -05 doc before the cut-off date:

4.  HTTPS considerations

  It is RECOMMENDED that Relying Parties and Publication Servers follow
  the Best Current Practices outlined in [RFC7525] on the use of HTTP
  over TLS (https).

  Note that a Man-in-the-Middle (MITM) cannot produce validly signed
  RPKI data, but they can perform withhold or replay attacks targeting
  an RP, and keep the RP from learning about changes in the RPKI.
  Because of this, RPs SHOULD do TLS certificate and host name
  validation when they fetch from an RRDP Publication Server.

  However, such validation issues are often due to configuration
  errors, or a lack of a common TLS trust anchor.  In these cases it
  would be better that the RP retrieves the signed RPKI data
  regardless, and performs validation on it.

  Therefore RPs SHOULD log any TLS certificate or host name validation
  issues they find, so that an operator can investigate the cause.  But
  the RP SHOULD continue to retrieve the data.  The RP MAY choose to
  log this issue only when fetching the notification update file, but
  not when it subsequently fetches snapshot or delta files from the
  same host.  Furthermore the RP MAY provide a way for operators to
  accept untrusted connections for a given host, after the cause has
  been identified.
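
To make the intended RP behaviour concrete, here is a minimal sketch of "validate, log, but fetch regardless". It is not taken from rcynic or the RIPE NCC RPKI Validator; the names fetch_regardless, https_get and the log_issue flag are made up for illustration, and the signed RPKI data it returns still has to be validated separately.

```python
import logging
import ssl
import urllib.request

log = logging.getLogger("rrdp-fetch")  # hypothetical logger name


def https_get(url, context):
    """Plain HTTPS GET using the given TLS context."""
    with urllib.request.urlopen(url, context=context, timeout=30) as resp:
        return resp.read()


def is_tls_validation_error(exc):
    """True for certificate or host name validation failures.

    urllib wraps the underlying ssl error in URLError and keeps it in .reason.
    """
    if isinstance(exc, ssl.SSLCertVerificationError):
        return True
    return isinstance(getattr(exc, "reason", None), ssl.SSLCertVerificationError)


def fetch_regardless(url, get=https_get, log_issue=True):
    """Return (data, tls_ok). Validation issues are logged, not fatal."""
    try:
        return get(url, ssl.create_default_context()), True
    except OSError as exc:
        if not is_tls_validation_error(exc):
            raise  # not a validation issue (e.g. connection refused)
        if log_issue:
            log.warning("TLS validation issue for %s: %s", url, exc)
    # Retry without TLS validation; the RPKI objects are signed anyway.
    unverified = ssl.create_default_context()
    unverified.check_hostname = False  # must be disabled before CERT_NONE
    unverified.verify_mode = ssl.CERT_NONE
    return get(url, unverified), False
```

An RP could call this with log_issue=True for the notification file and log_issue=False for the snapshot and delta files it subsequently fetches from the same host, which matches the MAY in the text above.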

On 2) CRL/MFT re-issuance

First of all, the TL;DR of the below: there are operational considerations that I am happy to share with the group, but if they need to be documented, the RRDP document is not the place for it.

I brought it up because there is some mention of using nextUpdate in CRLs and MFTs as a protection against replays, and I believe the above (under 1) provides a better way to detect this. The 24 hours that is now frequently used is probably way too long anyway to be useful w.r.t. MITM. So I don't think that changing it to 7 days, or even 1 month makes much difference in this regard.

The discussion that we may want to have is whether nextUpdate should be used as an indication of when to fetch data again. This keeps coming up. Steve Kent also suggested having a long default nextUpdate time, and a shorter one when we know that there are changes.

The problem is that the common case for change is unpredictable: a change in routing requires ROAs or BGPSec certificates to be re-issued, and there is a desire that RPs learn about this fast. There was a lot of discussion a few years ago about how fast; Danny McPherson in particular was vocal on this. There is no clear indication of what is fast enough, though. My impression is that if changes in RPKI can propagate to RPs (and connected routers) in 10-30 minutes we are in a good spot.

Both rcynic and the RIPE NCC RPKI Validator* will re-fetch at regular intervals regardless of the nextUpdate time. With RRDP I believe we have the scaling to support refetching every 5-10 mins by any RP that wants it. But if we do, we need to lower the churn resulting from MFT/CRL updates.

So, I believe that it's safe to lengthen the nextUpdate interval to something on the order of a week, or even a month. Chris Morrow brought up a concern about keeping the cogs in the machine well greased. I hear you, but in our case we have over 3000 hosted CAs. If we re-issue MFTs/CRLs every month, we still grease the cogs with about 100 CAs every day.
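
For those who want to play with the numbers, a back-of-the-envelope sketch follows. It assumes re-issuance is spread evenly over the interval, uses the round figure of 3000 CAs, and takes a month as 30 days; daily_reissuance is just an illustrative helper.

```python
def daily_reissuance(num_cas: int, interval_days: float) -> float:
    """Average number of CAs whose MFT/CRL is re-issued per day,
    assuming re-issuance is spread evenly over the interval."""
    return num_cas / interval_days


for label, days in [("24 hours", 1), ("1 week", 7), ("1 month", 30)]:
    print(f"{label:>8}: ~{daily_reissuance(3000, days):.0f} CAs/day")
```

With a one-month interval this gives the 100 CAs per day mentioned above, against 3000 per day with the 24 hours that is common now.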

Finally, I also had another thought on how we can improve the signal-to-noise ratio of ROA changes versus MFT/CRL re-issuance in our operations. Currently we re-issue CRLs/MFTs for our 3000+ CAs every X hours, or whenever there is a change in ROAs (no BGPSec yet). We optimised this to spread the load on our CPU and HSM. But if we want to optimise for RPs fetching instead, we can change our implementation to do the background CRL/MFT updates in large batches (say once per X days), and only do the ROA-related updates as soon as they happen. This would allow RPs to aggressively do a cheap fetch of our update notification file every X minutes, and they would only find that they need to do an expensive fetch when there are important changes, plus once per X days because the nextUpdate times are renewed.
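
On the RP side, the cheap check could look roughly like the sketch below. It assumes the update notification file carries session_id and serial attributes on its root element, as in the current RRDP draft; the function names and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def notification_state(xml_bytes):
    """Extract (session_id, serial) from an update notification file."""
    root = ET.fromstring(xml_bytes)
    return root.attrib["session_id"], int(root.attrib["serial"])


def needs_expensive_fetch(xml_bytes, last_seen):
    """True if the session or serial changed since the last poll,
    i.e. the RP has to fetch deltas or a snapshot."""
    return notification_state(xml_bytes) != last_seen


# Illustrative notification file; all values are invented.
sample = b"""<notification xmlns="http://www.ripe.net/rpki/rrdp" version="1"
    session_id="9df4b597-af9e-4dca-bdda-719cce2c4e28" serial="42">
  <snapshot uri="https://rrdp.example.net/snapshot.xml" hash="00"/>
</notification>"""

last_seen = ("9df4b597-af9e-4dca-bdda-719cce2c4e28", 42)
print(needs_expensive_fetch(sample, last_seen))
```

As long as the server only bumps the serial for ROA changes and for the batched CRL/MFT runs, most polls end after this one small fetch.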

Anyway, I believe that all of the above is in the space of local operational considerations. There may be merit in discussing them, and we may find there is merit in documenting them as Informational or BCP, but in my opinion not in the current RRDP document.

Cheers
Tim

*: off-topic.. yes, we need a cool name, ideas welcome ;)

> 
> randy