Re: [Sidrops] [WGLC] draft-ietf-sidrops-roa-considerations-01

There is a typo in the last sentence of Section 1:
'Sprefixes' should be 'prefixes'.
Otherwise, this version looks good to me.

Ching-Heng Ku

-----Original message-----
From: YAN Zhiwei <yanzhiwei@cnnic.cn>
To: tim <tim@nlnetlabs.nl>
Cc: keyur <keyur@arrcus.com>, sidrops <sidrops@ietf.org>
Date: Fri, 18 Nov 2022 10:03:07
Subject: Re: [Sidrops] [WGLC] draft-ietf-sidrops-roa-considerations-01 - Ends 28/November/2022

Hello, Tim and all,
Thanks to Tim for providing the revision comments.
For your convenience, I have just updated the draft to address Tim's comments.
Please continue to send feedback and comments based on the -04 version.

Name: draft-ietf-sidrops-roa-considerations
Revision: 04
Title: Avoidance for ROA Containing Multiple IP Prefixes
Document date: 2022-11-17
Group: sidrops
Pages: 6
URL: https://www.ietf.org/archive/id/draft-ietf-sidrops-roa-considerations-04.txt

Thank you so much.
BR,

YAN Zhiwei

From: Tim Bruijnzeels
Date: 2022-11-16 18:15
To: YAN Zhiwei
CC: keyur; sidrops
Subject: Re: [Sidrops] [WGLC] draft-ietf-sidrops-roa-considerations-01 - Ends 28/November/2022
Hello Yan,

Responses in-line.

Best regards,

Tim

> On 16 Nov 2022, at 07:53, YAN Zhiwei <yanzhiwei@cnnic.cn> wrote:
> 
> Hello, Tim,
> Thank you very much for your comments and please see my in-line responses.
> BR,
> 
> YAN Zhiwei
>  
> From: Tim Bruijnzeels
> Date: 2022-11-15 04:23
> To: Keyur Patel
> CC: SIDR Operations WG
> Subject: Re: [Sidrops] [WGLC] draft-ietf-sidrops-roa-considerations-01 - Ends 28/November/2022
> Dear chairs and authors,
>  
> > On 13 Nov 2022, at 23:19, Keyur Patel <keyur@arrcus.com> wrote:
> > 
> > Hi Folks:
> >  
> > A working group last call has been re-issued for draft-ietf-sidrops-roa-considerations-03, “Avoidance for ROA containing Multiple Prefixes”. Please reply to the list with your comments. The WGLC will end on November 28th, 2022.
>  
> I agree with the idea behind this document. But, I do not think it's ready.
>  
> I believe that, as written, the document makes assumptions that are not true for all RPKI CA implementations. I also believe that the analysis regarding ROA validity times is not relevant to the issue that is being discussed. Furthermore, the trade-off should be discussed explicitly: accepting the space used by many ROA objects under the same CA while reducing the risk of over-claims, versus reducing that space by aggregating prefixes and accepting the risk of over-claims.
>  
> Detailed comments below. Comments are intended to help improve, from my perspective at least, the quality of the document and the suggestions. So please bear with me. I know it's long..
>  
>  
> Problem Statement 1/4
> =====================
>  
> The Problem Statement section starts with:
>  
>    For a Certification Authority (CA) issuing ROAs containing multiple
>    IP prefixes, adding or deleting one <AS, IP_Prefix> pair causes the
>    (single) ROA for an AS to be withdrawn and reissued.  All IP prefixes
>    for an AS share the same validation state and then this may affect
>    the stability and security of RPKI.
>  
> For the above to be an issue the following would need to happen:
> 1) a large ROA is withdrawn and a new manifest is published with the ROA removed
> 2) a new ROA is issued and published, together with an updated manifest
>  
> I.e. the operator or their software is using a break-before-make strategy. It would be good to warn against that specifically.
>  
> Yan: Yes, the draft will warn against this risk.
> 
> For some RPKI CA implementations this issue may be more likely to happen in case multiple prefixes are combined on a ROA, but for many other implementations this is not the case.
>  
> Many RPKI CA implementations do not let operators create ROA objects directly. Instead, they ask the user to configure which prefixes should be authorised, and often they will accept a delta of authorisations to be processed as a single update. The RPKI CA will then create/update/withdraw ROA objects as needed, and will publish them as a single multi-element publication (RFC 8181). RPs will accept this delta as a single update, or reject it if they only see a part (see failed fetch in RFC 9286). RPs will exclude duplicates and send a delta of IPv4 or IPv6 PDUs to routers (RFC 8210).
> 
> Yan: multi-element publication could guarantee the synchronization of a create/update/withdraw operation set, but sharing fates with unnecessary prefixes during these operations can also induce risks due to mis-operation or mis-configuration.  
> [Moreover, how an RP works may also be implementation-dependent]

CA software that is incapable of multi-element publication is fundamentally broken. If this is the case I would recommend that upgrading is even more important than using a single prefix per ROA.

As I said earlier, I do agree with the direction of one-prefix per ROA. But, I think that this particular paragraph in the problem statement is not generally true. If it would include an if-statement I would like it better, e.g. start it with:

    For a Certification Authority (CA) that is incapable of using multi-element
    publication queries [RFC 8181], issuing ROAs containing multiple IP prefixes,
    adding or deleting one <AS, IP_Prefix> pair can cause the (single) ROA for
    an AS to be withdrawn and reissued.  All IP prefixes for an AS share the same
    validation state and then this may affect the stability and security of RPKI.

A second reason why I think it's important to call this out explicitly is because it's at odds with your suggestion #2 if this were generally applicable. In fact, you could argue that suggestion #2 should not be used by CAs that do not support multi-element publication.

I still believe that it would be confusing to readers who know RFC 9286, because they will realise that this results in a failed fetch. The objects will be out of sync with the manifest content in this case. So, it would not affect any modern RP that implements this. But, it may help in cases where old RP software is used and as such it is still the better option for affected CAs.
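
To make the difference between the two update strategies concrete, here is a minimal sketch. It is only a toy model: the repository is a plain set of ROA tuples, the helper names states_seen and authorised are hypothetical, and nothing here reflects the actual RFC 8181 message format or any CA or RP implementation. It assumes an RP may fetch the repository after any individual publication.

```python
def states_seen(initial, publications):
    """Return each repository state an RP could fetch.

    Every publication is a (withdraw, publish) pair of sets that is
    applied as one step, loosely modelling a multi-element update.
    """
    state = set(initial)
    seen = []
    for withdraw, publish in publications:
        state = (state - withdraw) | publish
        seen.append(frozenset(state))
    return seen


def authorised(state, asn, prefix):
    """True if some ROA in this state authorises the <AS, IP_Prefix> pair."""
    return any(a == asn and prefix in prefixes for a, prefixes in state)


# Hypothetical ROAs: the operator wants to drop 198.51.100.0/24 only.
old_roa = ("AS64496", ("192.0.2.0/24", "198.51.100.0/24"))
new_roa = ("AS64496", ("192.0.2.0/24",))

# Strategy 1: withdraw first, re-issue in a later publication.
break_before_make = [({old_roa}, set()), (set(), {new_roa})]
# Strategy 2: withdraw and publish in a single update.
single_update = [({old_roa}, {new_roa})]

# States in which the prefix that should stay authorised is not covered.
gap_bbm = [s for s in states_seen({old_roa}, break_before_make)
           if not authorised(s, "AS64496", "192.0.2.0/24")]
gap_single = [s for s in states_seen({old_roa}, single_update)
              if not authorised(s, "AS64496", "192.0.2.0/24")]
```

In this model the break-before-make sequence exposes one intermediate state in which 192.0.2.0/24 has no covering ROA, while the single update exposes none.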

> In the latter case problems can still occur if operators choose to remove prefix authorisations first, let their CA update and publish ROAs, to only then re-add some of the same prefixes, and let the CA publish those ROAs.
>  
> So, generically speaking I believe the advice here should be to warn operators and CA implementations against break-before-make update strategies. The correlation with multi/single prefix ROAs in this context is specific to certain RPKI CA implementations only, and irrelevant to others.
> 
> Yan: It is worth warning against this operational strategy in the draft.
>  
> Problem Statement 2/4
> =====================
>  
> The second paragraph talks about the risk of over-claiming ROAs. I share the concern and I believe it is the most important reason why single prefix ROAs are preferred.
>  
> However, I believe the premise of this paragraph to be incorrect. Current text:
>  
>    By default, ROAs have an extended validity period.  Resource changes
>    can happen at any time during this validity period.  A certificate
>    change can affect all ROAs using IP prefixes from the issuing
>    certificate.  CAs should carefully coordinate ROA updates with
>    resource certificate updates.  A CA can automate this process if a
>    single entity manages both the parent CA and the CA issuing the ROAs
>    (scenario D [[RFC8211] section 3]).  However, in other deployment
>    scenarios, this coordination becomes more complex.  Furthermore, for
>    the ROA containing multiple IP prefixes, the IP prefixes share the
>    same expiry configuration.  If the ROA is not reissued timely, the
>    whole set of IP prefixes will be affected after expiry.
>  
> The problem is not the validity period of the ROA.
>  
> Yan: The reason to mention "validity period" here is mainly to stress that the resource is likely to change because the ROA has an extended validity period. If the ROA validity period is always shorter than the resource certificate, things may be easier.

The validity period is irrelevant. The CA certificate issued by the parent may be replaced by the parent at any time, well before it would expire. Even if in practice there are processes against unexpected loss of resources, this can fundamentally happen at any time. Furthermore, the problem also applies to resource extensions, which happen more frequently.

If you want to capture the problem as I see it in one sentence then I would suggest that you replace:

    By default, ROAs have an extended validity period.  Resource changes
    can happen at any time during this validity period.

With something like:

    The CA certificate issued by a parent may be replaced by the parent
    at any time resulting in changes in resources. Any ROA object that
    includes resources that are a) no longer contained in the new CA
    certificate, or b) contained in a new CA certificate that is not yet
    discovered by Relying Party software, will be rejected as invalid.

Using single prefix ROAs will help in both cases.
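
As a rough illustration of case a), the sketch below compares one multi-prefix ROA with two single-prefix ROAs after the parent shrinks the CA certificate. It is a deliberately simplified model, not real RPKI validation: it only checks that every ROA prefix is contained in the certificate's resources, and ignores ASNs, max length, signatures, manifests and the EE certificate. The function names roa_is_valid and still_authorised are hypothetical, and the prefixes are documentation ranges.

```python
import ipaddress


def roa_is_valid(roa_prefixes, cert_resources):
    """A ROA is rejected if any one of its prefixes is not covered."""
    resources = [ipaddress.ip_network(r) for r in cert_resources]
    for prefix in map(ipaddress.ip_network, roa_prefixes):
        covered = any(
            prefix.version == r.version and prefix.subnet_of(r)
            for r in resources
        )
        if not covered:
            return False
    return True


def still_authorised(roas, cert_resources):
    """Prefixes that remain authorised by at least one valid ROA."""
    return {p for roa in roas if roa_is_valid(roa, cert_resources) for p in roa}


# The new CA certificate no longer contains 198.51.100.0/24.
shrunk_cert = ["192.0.2.0/24"]

multi_prefix = [["192.0.2.0/24", "198.51.100.0/24"]]
single_prefix = [["192.0.2.0/24"], ["198.51.100.0/24"]]

lost_everything = still_authorised(multi_prefix, shrunk_cert)
kept_one = still_authorised(single_prefix, shrunk_cert)
```

Under these assumptions the multi-prefix ROA loses both prefixes, while with single prefix ROAs only the over-claiming object is rejected and 192.0.2.0/24 stays authorised.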

> The problem is that a CA issuing ROAs may not be aware that resources are no longer validly held in their CA certificate, as published by their parent.
>  
> This problem can arise in case the parent pro-actively shrinks the CA certificate of the child before the child has found out (by sending an RFC 6492 resource class list query), and before the child has had a chance to clean up its ROAs.
>  
> This problem can also arise in case the parent issues a new resource to the child, as requested through RFC 6492, and the child then issues a ROA including the new resource *before* the parent has published the new CA certificate and/or *before* RPs have discovered that new CA certificate.
>  
> Using a single prefix per ROA will make these events less impactful, as only actually over-claiming ROA objects would become invalid.
>  
> Note though, that this issue can also affect CA certificates in the delegation chain, and single prefix ROA objects would still ALL be considered invalid if the issuing CA certificate would be considered invalid because of an over-claim.
>  
> So, while I agree that the single prefix per ROA strategy is better, I believe that talking about the ROA validity time in this context is confusing and irrelevant. Furthermore, this over-claiming issue is not solved completely by this strategy.
>  
> To avoid this issue more generally we should also look at: a) validation-reconsidered - but I know that there is opposition to this approach, or b) better in-band signalling about resource changes and CA certificate publication - as I mentioned in my talk during the WG session at the IETF last week.
>  
> Yan: Agree with you, we are also discussing some in-band or out-of-band schemes to refine the coordination between parent and child nodes, as shown in the following figure.
> <Catch.jpg>
> [Just mention it, this is another issue to be discussed later if anyone has interest.]

I think we should take this to another thread. I plan to work on a "problem statement / requirements for RFC 6492/8181/8183" soon. (as I mentioned during the WG session: I don't think these protocols are fundamentally broken, but I believe there is room for improvement)

> Problem Statement 3/4
> =====================
>  
> This third paragraph:
>  
>    Using multiple ROA objects with single IP prefix also allows a CA
>    to affect routing over time based on certificate expiry.  For
>    example, a prefix could be allowed to be originated from an AS only
>    for a specific period of time, such as some IP prefix was leased out
>    temporarily.
>  
> This is not actually a problem, but an RPKI CA implementation specific advice on how ROA object validity times can be used to achieve a specific goal.
>  
> I am not -at all- against the use of this strategy in those CAs, but other implementations that want to achieve similar behaviour could achieve this in other ways. E.g. by setting an end time for a configured authorization and updating ROAs and CRLs accordingly when that time comes.
>  
> Yan: Although this is mainly an implementation-dependent issue, it's better to mention it to avoid the related risk.

I just don't think this is a problem statement. But, I think it's fine to include this in the "Suggestions" section as it may help certain implementations.

>  
> Problem Statement 4/4
> =====================
>  
> What I miss in the problem statement is a mention of the trade-off that exists between a) reducing the risk of over-claiming ROAs and accepting that there may be many objects amounting to significant overhead in size (roughly amounting to an EE cert per prefix) versus b) aggregating prefixes to reduce the number of objects and size, and accepting the risk of over-claims.
>  
> In some contexts the risk of over-claims is next to 0%, e.g. the AS0 ROA used by at least one RIR. But even in other contexts there may be something to be said for reducing space.
>  
> I understand that it would be hard to quantify this. The text under suggestion #2 alludes to it. But it's not specific enough for my taste.
>  
> To be very specific. In my CA implementation I currently have a default threshold of 100 prefixes - after which the system will start aggregating prefixes per ROA in order to save space. This number is user configurable.
>  
> I am looking at this draft for advice on whether that number is in the right ball-park, or that it should be higher, or even that the default should be to *never* aggregate *regardless* of the space that it will use unless the operator explicitly sets an aggregation threshold.
>  
> And, yes, I would be perfectly fine with the latter outcome - but the suggestion should be more explicit and the argumentation leading up to it more nuanced.
>  
> Yan: This is a problem we discussed a lot. It’s difficult to give a specific number, because of the complexity of different scenarios and situations (based on management, business requirements and implementation strategies), though it's a great idea. Giving a fixed "threshold" would be crude and rigid. Thus in the current version (03), we give the following suggestion: 2) In some special scenarios, where the resource is very stable or a CA has operational problems producing increased number of individual ROAs, multiple IP prefixes may be aggregated in one ROA. This may include the case of ROA0.

I understand that it's hard to quantify, and I agree that the document should not name a number. But as this is a 'considerations' document I do think it needs to be mentioned more explicitly.

Suggestion #2 is very broad. I don't think it actually applies to AS0 from a CA point of view. Based on experience, my CA implementation doesn't care - in fact adding code to support aggregating prefixes on a single ROA was extra work, as it only implemented single prefix per ROA at the time. The code could be much simpler (which would be good) if we never aggregated. The concern here is the impact on RPs that get to download and validate a huge number of objects. That is why the complexity was introduced.

In short: I think it's more about avoiding operational problems for RPs than CAs.

So, could you consider something like:

   A very large number of individual ROA objects may affect Relying Party software
   in terms of the total size of these objects as well as in processing time needed
   for validation. Therefore, CAs may choose to aggregate multiple IP prefixes in
   such cases.

That still doesn't answer my operational question about which default to choose for my CA software. But it would make the decision process more tangible. As mentioned, I have a default threshold of 100 prefixes now. I am more than happy to increase this number, or even make the aggregation an opt-in for special cases only. I understand that the draft may not be able to give a definitive answer, but I would appreciate feedback on this - offline is also fine..
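
To make the two candidate defaults easier to compare, here is a minimal sketch of a threshold-based policy. It is purely illustrative and not taken from any CA implementation: the function name plan_roas is hypothetical, and the assumption that all prefixes for an AS go onto a single ROA once the threshold is exceeded is a simplification. Passing threshold=None models the "never aggregate unless the operator opts in" alternative.

```python
from typing import List, Optional


def plan_roas(prefixes: List[str], threshold: Optional[int] = 100) -> List[List[str]]:
    """Return the prefix groups for one AS, one group per ROA object.

    Assumption: at or below the threshold every prefix gets its own ROA;
    above it, all prefixes are aggregated on a single ROA.
    """
    if threshold is not None and len(prefixes) > threshold:
        return [sorted(prefixes)]               # aggregate to save space
    return [[p] for p in sorted(prefixes)]      # one prefix per ROA


# Hypothetical inputs: 3 prefixes and 101 prefixes for the same AS.
few = ["10.0.%d.0/24" % i for i in range(3)]
many = ["10.1.%d.0/24" % i for i in range(101)]

roas_few = plan_roas(few)                       # stays below the threshold
roas_many = plan_roas(many)                     # crosses the default threshold
roas_many_opt_out = plan_roas(many, threshold=None)
```

With the default of 100, the 3-prefix case yields 3 ROA objects and the 101-prefix case yields 1. With threshold=None the 101-prefix case yields 101 objects, which is the RP-side cost the suggested text above is meant to call out.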

> Appendix A
> ==========
>  
> There is no reference to this appendix in the document. And the numbers listed here may actually not be that relevant to the discussion in this document simply because many, if not all, ROAs with multiple prefixes fall into the suggestion #2 category.
>  
> If there would be an analysis that could flag the number of occasions where a single prefix really should have been used, but wasn't, then it would serve better to illustrate the scope of the problem this document is trying to address.
>  
> As it is I think it could be omitted in the final document.
>  
> Yan: During the initial phase, we used this appendix to illustrate that aggregation of IP prefixes in one ROA is very common and that the ROA validity period is always very long. This was used to support the problem analysis. We think this appendix will be omitted in the final document, as you said.

Ok, sounds good.

I hope it was clear that I was not trying to disqualify this analysis as such. I can see how it was helpful in early stages. But given the suggestions it would be interesting to have some follow up analysis that could analyse to what extent we see aggregation per ROA where suggestion #2 is not expected to apply. A hunch I have here is if it can be detected that the parent and child CA are operated by different organisations - i.e. not a hosted platform, but remote delegated CA. This does not need to be part of the document of course.

> 
> Kind regards,
>  
> Tim
>  
> >  
> > The draft can be found at: https://datatracker.ietf.org/doc/draft-ietf-sidrops-roa-considerations/. Authors, please reply indicating whether you're aware of any relevant IPR that hasn't been disclosed.
> >  
> > Regards,
> > Nathalie, Chris & Keyur
> >  
> >  
> > _______________________________________________
> > Sidrops mailing list
> > Sidrops@ietf.org
> > https://www.ietf.org/mailman/listinfo/sidrops
