Re: [Sidrops] 6486bis: Failed Fetches

Tim Bruijnzeels <tim@nlnetlabs.nl> Mon, 31 August 2020 18:03 UTC

From: Tim Bruijnzeels <tim@nlnetlabs.nl>
In-Reply-To: <10b9622d-90b0-63e8-288e-858f88835284@verizon.net>
Date: Mon, 31 Aug 2020 20:03:40 +0200
Cc: sidrops@ietf.org
Message-Id: <291655EE-2255-441B-B425-59BEE6DBE39F@nlnetlabs.nl>
References: <20200817163134.29aa1a6b@glaurung.nlnetlabs.nl> <c1a8fffb-9106-d08e-4254-44ddf1a0115a@verizon.net> <20200818083659.1922a98c@grisu.home.partim.org> <6cebcc89-3e07-8a85-3813-b4ae9887d119@verizon.net> <20200826122539.52493813@glaurung.nlnetlabs.nl> <b71c9c88-fb10-b037-d06a-910711e51e04@verizon.net> <cf7030be-adc4-dde4-7eda-516339fd6c91@verizon.net> <E8A259E6-869D-4D2C-87F0-87462B9731E2@nlnetlabs.nl> <10b9622d-90b0-63e8-288e-858f88835284@verizon.net>
To: Stephen Kent <stkent@verizon.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/xLuwypejX7maSLhnGK20Ojpvmjo>
Subject: Re: [Sidrops] 6486bis: Failed Fetches

Hi,


> On 29 Aug 2020, at 20:58, Stephen Kent <stkent@verizon.net> wrote:
> 
> Tim,
>> Hi,
>> I agree that a correctly functioning CA will produce a sound set of objects. Also it is highly unlikely that a CA will just publish an expired ROA. The much more likely scenario is that a CA is unable to make or publish updates and the MFT and CRL go stale, and the MFT EE expires, way before any ROA or issued cert would.
> I agree that these seem to be the more likely error modes, but there also was a discussion of encountering more than one CRL, for example.
>> That being said there are scenarios beyond the control of a CA that can lead to invalid objects:
>> 
>> 1) rsync
>> 
>> If an RP does rsync only, it may get an inconsistent data set. Transactions are not supported here: it may get an old MFT and a new CRL or vice versa, or it may get a new MFT but not the new object, and so on. This is a race condition that happens when RPs fetch during publication. It is rare, but it does happen.
> I agree that this may happen, even if one publishes all of the objects in a new directory and then switches the pointer. But, in that case, the RP should consider this a failed fetch and retry. I believe that's what the RPSTIR software did (does?), so it's not a fatal error.
>> I remember that many years ago I wrote code in what was then the RIPE NCC validator 2 to catch this issue, and to only process publication points if the set was consistent. This was an allowed, but not mandated, interpretation of 6486.
>> 
>> Note that with RRDP we have deltas. Solving this issue was one of its design goals. So, phasing out rsync could mitigate this.
>> (yes, I realise now that the co-chairs asked me to re-submit the I-D for this, and I still need to do it)
> I'm still waiting to see a clear, and hopefully concise, description of how an RP verifies that the pub point data acquired via RRDP is constrained in the hierarchic fashion imposed by the rsync directory model, but that's a separate issue.

Coming up: I just submitted 'draft-ietf-sidrops-deprecate-rsync-00', as requested by the co-chairs on 16 July. Sorry for the delay. Let's have the discussion there.
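
For reference, the kind of publication point consistency check I described above is roughly this (a minimal sketch in Python; the helper names and the pre-parsed manifest entries are assumptions, not real RP code):

    import hashlib
    from pathlib import Path

    def publication_point_is_consistent(pp_dir: Path, mft_entries: dict) -> bool:
        # mft_entries maps file names listed on the manifest to their expected
        # SHA-256 hashes (raw bytes), as taken from the MFT fileList.
        for name, expected in mft_entries.items():
            path = pp_dir / name
            if not path.is_file():
                return False  # listed on the MFT but missing: likely fetched mid-publication
            if hashlib.sha256(path.read_bytes()).digest() != expected:
                return False  # stale or half-updated object
        return True

    # If this returns False, treat it as a failed fetch: keep the previously
    # validated data for this publication point and retry later, rather than
    # rejecting (or partially accepting) the new set.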

>> 2) resource changes
>> 
>> If the parent removes any resources from the child before the child has had a chance to clean up objects, then this will lead to invalid objects. Being strict that *all* objects must be valid will then lead to rejection of the CA's products, even if it is only for a few hours.
>> 
>> I believe that this issue could be greatly mitigated if RFC6492 were extended with a way to give children advance warning of planned resource removals. The length of this warning time would be a function of the contact frequency of child CAs and the depth of the CA hierarchy. If that frequency could also be stipulated - say, SHOULD be every 10 minutes - then a warning time of just a few hours would already be plenty.
>> 
>> However, this is a separate discussion. I do not suggest that we block progress on 6486bis for this. I am just saying that if strict checking for *all* objects is where the consensus goes then the WG must be aware of these consequences and accept them.
> yes, this is a separate discussion.

A possible extension of RFC6492 is indeed a separate discussion.
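
To make the "a few hours would be plenty" claim above concrete, a back-of-the-envelope sketch (the 10-minute interval, the tree depth, and the safety factor are illustrative assumptions, not proposals):

    def min_warning_minutes(contact_interval_min, ca_depth, safety_factor=2):
        # Each level of the CA hierarchy needs at least one child-to-parent
        # contact to learn about the planned shrink and republish, so the
        # required warning time scales with contact interval times depth.
        return contact_interval_min * ca_depth * safety_factor

    # e.g. a SHOULD of one contact every 10 minutes and a 5-level hierarchy:
    # 10 * 5 * 2 = 100 minutes, i.e. under two hours.
    print(min_warning_minutes(10, 5))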

For 6486bis the WG should be aware that intermittent failure cases can come up, and assume that they will. If this happens high enough in the tree - e.g. an RIR shrinks an NIR's resources while the NIR still issues the affected resources to its members - then this will invalidate a big branch, resulting in 'not found' outcomes. This may be a soft landing, but it is a landing.
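
For those less familiar with that consequence: once a branch's ROAs are rejected, its VRPs disappear, and routes that were Valid fall back to NotFound under RFC 6811 origin validation. A minimal sketch (documentation prefix and AS number chosen for the example):

    from ipaddress import ip_network

    def origin_validation(route_prefix, route_origin, vrps):
        # vrps: list of (prefix, max_length, asn) tuples from validated ROAs.
        prefix = ip_network(route_prefix)
        covered = False
        for vrp_prefix, max_len, asn in vrps:
            vrp = ip_network(vrp_prefix)
            if prefix.version == vrp.version and prefix.subnet_of(vrp):
                covered = True
                if asn == route_origin and prefix.prefixlen <= max_len:
                    return "Valid"
        return "Invalid" if covered else "NotFound"

    print(origin_validation("192.0.2.0/24", 64511,
                            [("192.0.2.0/24", 24, 64511)]))  # Valid
    print(origin_validation("192.0.2.0/24", 64511, []))      # NotFound: branch rejected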

In short: I expect that we will never come to a consensus other than total reject-all-if-one-fails. I can live with this if we must, but don't let it come as a surprise.

>> The current version of 6486 allows RPs to treat individual objects as invalid. -bis is trying to change that to say that the whole set of objects must be valid. This changes the consequences for the deployment of new object types, so it is right to consider this now. 
> 6486 allows an RP to behave either way, e.g., to accept some but not all inconsistencies, or to reject them. So, it's not quite accurate to suggest that RP software that operates in accordance with 6486 will treat individual objects as invalid - it might, or it might not.
>> The most likely next object type would be the ASPA objects. I don't think that RPs should reject all ROAs because they do not yet understand ASPA. Given that object types in the RPKI get distinct file names, and RPKI signed objects get distinct OIDs, I think it would be reasonable to say that RP software can ignore object *types* that it does not understand. I would still advocate that RPs check for the presence and hashes of these objects as stipulated by the signed MFT, but not consider their content.
> Allowing RPs to ignore object types they don't understand prevents a CA from being able to convey the notion that a new object type is important (to that CA). I don't think this is a good strategy. It means that RP behavior will be ambiguous relative to new object types.

So far none of the existing object types have seemed to need such a flag.
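
For clarity, what I mean by "check presence and hashes but not content" is roughly the following (the extension set and the validate_content callback are placeholders; a real RP would also key off the CMS content-type OID):

    import hashlib

    KNOWN_TYPES = {".cer", ".crl", ".mft", ".roa", ".gbr"}  # what this RP version understands

    def process_manifest_entry(name, data, expected_hash, validate_content):
        # Presence and hash are verified for *every* entry listed on the
        # signed manifest, whether or not the object type is understood.
        if hashlib.sha256(data).digest() != expected_hash:
            raise ValueError("hash mismatch for " + name)
        if name[name.rfind("."):] in KNOWN_TYPES:
            validate_content(name, data)  # full syntactic and semantic checks
        # else: an unknown type (e.g. a future ASPA object) is accepted as
        # present and intact, but its content is not interpreted.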

>> If we don't, then the consequence will be that new object types only become deployable after a significant percentage of RPs has been upgraded to versions that understand the new type.
>> 
>> Now, if all operators say that they are fine with this - let things go unknown for everyone who did not upgrade, and force them to upgrade - sure. But it is right to consider this now and make a conscious choice. Oh, and BTW: for RPKI use cases where we only have VALID/INVALID but no NOT FOUND, such as BGPsec, this would be problematic.
> 
> If we want to have a consistent and flexible approach to accommodating new objects I suggest the strategy I mentioned earlier. Define an additional SIA URI that points to a pub point (and manifest) where we can introduce the next version of the signed object format, one that includes a critical flag, analogous to X.509v3 extensions. This allows each CA to decide which object types have to be processed by an RP in order for the whole pub point to be accepted vs. rejected. Note that this will require modifying a lot of RFCs, but it is a flexible, extensible approach to this issue.

I agree that it's flexible and extensible. I had not thought of this approach.

But it is a lot of work, not just in RFCs but also in code. It also raises questions about how and when old PPs without the new objects can be deprecated. You can give operators more time to upgrade, but at some point the plug will probably be pulled. Maintaining multiple PPs indefinitely seems rather wasteful.

I would like to hear what others have to say. I have the feeling that ASPA is getting close, and I would really not like to see it delayed because of this.

If we do go down this road then I think we should also look at the manifest object itself, and let it convey which object types are critical (and while we are at it, we could specify types explicitly instead of relying on filename extensions). That way future object types could perhaps be introduced more easily. This obviously needs more discussion, but it could even allow for semantics like: 1) new object type, please test but don't use; 2) new object type, use if you can; 3) new object type, critical - fail if you don't understand it.
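
Purely to make those three levels concrete (names and values are invented for illustration, not a proposal for actual manifest syntax):

    from enum import Enum

    class Criticality(Enum):
        TEST = 1          # new object type: fetch and report, but do not use
        USE_IF_ABLE = 2   # use it if understood, otherwise ignore its content
        CRITICAL = 3      # must be understood, or the publication point fails

    def handle(entry_type, criticality, understood_types):
        if entry_type in understood_types:
            return "validate and use"
        if criticality is Criticality.CRITICAL:
            return "reject publication point (treat as failed fetch)"
        return "check presence and hash only, ignore content"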


Tim


> 
> I agree that even if we adopt the current 6486bis for now, a flag day is appropriate, and it should be part of the document.
> 
> Steve
> 
>