Re: [sidr] AD Review of draft-ietf-sidr-delta-protocol-04

Tim Bruijnzeels <tim@ripe.net> Wed, 28 December 2016 16:43 UTC

Return-Path: <tim@ripe.net>
X-Original-To: sidr@ietfa.amsl.com
Delivered-To: sidr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3668512961A; Wed, 28 Dec 2016 08:43:32 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10
X-Spam-Level:
X-Spam-Status: No, score=-10 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, RP_MATCHES_RCVD=-3.1] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pqFrlVTwutB8; Wed, 28 Dec 2016 08:43:29 -0800 (PST)
Received: from mahimahi.ripe.net (mahimahi.ripe.net [IPv6:2001:67c:2e8:11::c100:1372]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 2138512961C; Wed, 28 Dec 2016 08:43:25 -0800 (PST)
Received: from nene.ripe.net ([193.0.23.10]) by mahimahi.ripe.net with esmtps (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.84_2) (envelope-from <tim@ripe.net>) id 1cMHJt-000Ajy-FI; Wed, 28 Dec 2016 17:43:21 +0100
Received: from sslvpn.ripe.net ([193.0.20.230] helo=vpn-216.ripe.net) by nene.ripe.net with esmtps (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.84_2) (envelope-from <tim@ripe.net>) id 1cMHJr-0000uw-3x; Wed, 28 Dec 2016 17:43:17 +0100
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Content-Type: multipart/mixed; boundary="Apple-Mail=_7C926A38-13BD-4007-BE31-52E9B6CF54D2"
From: Tim Bruijnzeels <tim@ripe.net>
In-Reply-To: <7DE58D01-AC82-4099-9A46-99E625315637@cisco.com>
Date: Wed, 28 Dec 2016 17:43:13 +0100
Message-Id: <99DD12D4-6E13-40A4-95C7-17B12CB61F1C@ripe.net>
References: <7DE58D01-AC82-4099-9A46-99E625315637@cisco.com>
To: "Alvaro Retana (aretana)" <aretana@cisco.com>
X-Mailer: Apple Mail (2.3124)
X-ACL-Warn: Delaying message
X-RIPE-Spam-Level: ------------
X-RIPE-Spam-Report: Spam Total Points: -12.5 points pts rule name description ---- ---------------------- ------------------------------------ -7.5 ALL_TRUSTED Passed through trusted hosts only via SMTP -3.1 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000]
X-RIPE-Signature: 784d7acfe6559f2a0b602ec6519a071914229dff0e700e47d9387ab463138c83
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidr/_sfKNGfgJxbo0gXpfEhrxJvgfqQ>
Cc: "draft-ietf-sidr-delta-protocol@ietf.org" <draft-ietf-sidr-delta-protocol@ietf.org>, Chris Morrow <morrowc@ops-netman.net>, "sidr-chairs@ietf.org" <sidr-chairs@ietf.org>, "sidr@ietf.org" <sidr@ietf.org>
Subject: Re: [sidr] AD Review of draft-ietf-sidr-delta-protocol-04
X-BeenThere: sidr@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Secure Interdomain Routing <sidr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/sidr>, <mailto:sidr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/sidr/>
List-Post: <mailto:sidr@ietf.org>
List-Help: <mailto:sidr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sidr>, <mailto:sidr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Dec 2016 16:43:32 -0000

Dear Alvaro, all,

Thank you for your in-depth review :) It really helps to clarify the document. Replies in-line.

I attached an updated document, but.. I did not discuss this with co-authors yet. So, I invite any of them to disagree or suggest changes to the things below.


> On 20 Dec 2016, at 14:15, Alvaro Retana (aretana) <aretana@cisco.com> wrote:
> 
> Dear authors:
>  
> Hi!  I just finished reading this document.
>  
> I have several comments (please see below); I marked many of them as “Major”, some because of the use of Normative language, but my main concern is that I think error conditions in the protocol are underspecified (see M7, M8, M10, below).  Along the same lines, I think that an Operations Considerations section would be of benefit (see M8 below); you might also want to take a look at RFC5706 (Guidelines for Considering Operations and Management of New Protocols and Protocol Extensions).
>  
> I would like to see the error conditions comments addressed before moving this document forward to IETF Last Call.
>  
> Thanks!!
>  
> Alvaro.
>  
>  
> Major:
>  
> M1.  I assume that the id-ad-rpkiNotify value was assigned through the early allocation process.  Instead of pointing at the registry in Section 3.2, point at the IANA Consideration Section --- and there, please remind IANA of the early allocation and request the update.

ack

> M2. Section 3.2. (Certificate Authority Use): “Relying Parties that do not support this delta protocol MUST NOT reject a CA certificate merely because it has an SIA extension containing this new kind of AccessDescription.”  By definition, an RP that has never even considered this document will not support the delta protocol – IOW, trying to specify the behavior of RPs that may have never even seen this document makes no sense to me, and can’t be enforced.  What is the current behavior when an RP receives the extra SIA extension?

The point of documenting this is to give RP software the option of recognising that a new protocol exists, even if they do not fully support it yet. In practice all current RPs do this (see below), and future RPs should consider this document. We have in fact been publishing these additional SIAs in the RIPE NCC production RPKI for more than a year without problems. I changed it slightly to this:

"Relying Parties that choose not to support this delta protocol yet, MUST NOT reject a CA certificate merely because it has an SIA extension containing this new kind of AccessDescription."

But, I can also live with taking this sentence out altogether.

In time we will also seek to phase out rsync altogether. When we do we will probably need to define timelines for both CAs and RPs to support RRPD, make rsync SIAs optional (they are currently still mandatory), and ultimately phase out rsync altogether. But.. we believe that documenting this should be a separate effort.

RP behaviour in detail:

- RIPE NCC RPKI Validator

All versions will ignore the additional SIA as written. Recent versions provide full support of RRDP if the RRPD option is enabled in the configuration file. By default this option is still switched off, but we plan to change this as soon as RRDP is an accepted standard RFC.

- Rcynic

I know that current versions ignore the additional SIA. Recent versions also provide full support of RRDP. But I am not sure what users need to do in order to get a version (branch?) and/or enable support. Maybe Rob can elaborate.

- RPSTIR

Will also ignore the additional SIA. Declan Ma has indicated to the working group that building support for this new protocol is on the roadmap.


> M3. Section 3.3.2. (Publishing Updates): “Whenever the repository server receives updates from a CA it SHOULD generate new snapshot and delta files.  However, if a publication server services a large number of CAs it MAY choose to combine updates from multiple CAs.  If a publication server combines updates in this way, it MUST NOT postpone publishing for longer than one minute.”  The “MUST NOT” at the end (making it mandatory to publish) doesn’t work with the “SHOULD” at the beginning, which has no time constraint and opens to the door to a situation where no publishing is done.  Suggestion:
> NEW>
>   Whenever the repository server receives updates from a CA it MUST
>    generate new snapshot and delta.  If a publication
>    server services a large number of CAs it MAY choose to combine
>    updates from multiple CAs, but it MUST NOT postpone publishing for longer than one
>    minute.

I changed it to:

Whenever the repository server receives updates from a CA it MUST generate new snapshot and delta files within one minute. If a publication server services a large number of CAs it MAY choose to combine updates from multiple CAs. If a publication server combines updates in this way, it MUST ensure that publication never postponed for longer than one minute for any of the CAs involved.


> M4. From 3.3.2. (Publishing Updates): “The update notification file SHOULD be kept small…older delta files that…will result in total size of deltas exceeding the size of the snapshot, MUST be excluded.”  Here the “SHOULD” and the “MUST” are in contradiction: you either do it always (MUST) or there may be cases when you don’t (SHOULD).  s/SHOULD/should

Ok, I moved the file size concern to the next bullet point instead, so that we have:

 o Any older delta files that, when combined with all more recent delta files, will result
   in a total size of deltas exceeding the size of the snapshot, MUST be excluded to avoid
   that RPs download more data than necessary.

 o The server MAY also exclude more recent delta files if it finds that their usage by a
   small number of RPs that would be forced to perform a full synchronisation is outweighed
   performance benefits of having a smaller update notification file. However, the repository
   server MUST include all deltas that it has available for the last two hours.

I hope that explains it better. I changed the SHOULD in the second bullet point to a MUST. The unspoken reason for the SHOULD was that a publication server may not have deltas.

 
> M5. Section 3.4.1. (Processing the Update Notification File) says that “a Relying Party (RP)…SHOULD prefer to use this protocol as follows.”  I think you really want to be explicit: s/SHOULD prefer to use/SHOULD use

ok, what I meant to convey here was the RRPD SHOULD be preferred over rsync. But.. I am fine with your suggestion, and leaving any rsync phase-out concerns to a future document.
 
> M6. 3.4.1. (Processing the Update Notification File): “The RP SHOULD download the update notification file, unless an update notification file was already downloaded and processed from the same location in this validation run.”  Is there any other reason for the file not to be downloaded?  Why would the RP decide not to (“SHOULD”)?  If there are no other reasons, then why isn’t the “SHOULD” a “MUST”?

With polling one would download and process and only start validation in case there is a change, and then there is no need to this again, but to make this more clear I changed it to:

The RP MUST download the update notification file, unless an update notification file was already downloaded and processed from the same location in this validation run, or because a polling strategy was used (see Section 3.4.4).
 
> M7. 3.4.1. (Processing the Update Notification File): “MUST issue an operator error”  What is an “operator error” and how do you issue one?  I didn’t see an error definition in the schema.

I see your point, but.. afaik none of the other sidr RFCs and I-Ds define this, and just avoid talking about this altogether. This was undefined in over five years of deployment with rsync, and yet operational issues with rsync also happen and get reported and resolved somehow. In short I suggest that I just leave it out here as well.

But to continue: All RPs do have the common sense to inform their users about issues (e.g. also things like: hey, this certificate is invalid because..), but there is no standard defined, so they use various formats and ways. I do see some benefit in defining at least a common standard because it might help (1) monitoring, (2) normative documenting of error conditions and responses, and (3) interop testing between RPs (Rob, David and I have done this a few times and it's a bit of a hassle). Doing so is imo major work and something for a separate sidr-ops document.
 
> M8. 3.4.1. (Processing the Update Notification File): “If neither update notification file and one snapshot file or delta files could be processed…SHOULD use an alternate repository retrieval mechanism if it is available.”  This “SHOULD” doesn’t define anything normative with respect to the Delta Protocol.  I think that this document would benefit from a short Operation Considerations section; a place to indicate expectations of the operators, potential issues, the fact (as expressed in this piece of text) that an alternate mechanism should be enabled (or at least considered), other considerations related to logging errors for the operator (see above), etc.

I am adding a "considerations" section and moving things there.

However, we do not wish to say in the context of this protocol that other mechanisms should be enabled. In fact, I expect we will seek to phase out rsync in future and only have RRDP.

As below (M10) operational issues with retrieving RRDP data will mean that the the RP is witheld potentially important information and will be out-of-sync. As long as we do have rsync it would make sense for an RP to try to use that instead in such cases, but of course there is no guarantee that that will work either (e.g. a network outage may well affect both RRDP and rsync for a given publication server).

So.. first and foremost people should monitor their things and fix issues.
 
> M9. There are some places where non-normative and normative (RFC2119) language is used that may cause confusion.  For example, in 3.4.2: “should perform the following actions”, the actions contain a set of “MUSTs” and “SHOULDs”.  Please find an alternate way to write the lead-in header to avoid confusion.  Suggestion: s/it should perform the following actions/it performs the following actions.  [Note that the same construct if used in several different places…]
>  
> M10. In several places (related to verification, but in other places as well), “the file MUST be rejected” is used.  What does this mean exactly?

That it cannot be used. If it's the update notification file there is no recovery - RRDP cannot be used now. If it's a delta then deltas cannot be used -> a snapshot may still be used. If it's a snapshot there is no recovery (since they are only used if deltas are not useable).

I added clarifying text for all this and references to the "considerations" section.

>  Are there actions that should be taken as a result?  An example of why I’m asking: an RP will download and process delta files based on information in a notification file – if a delta file is not well-formed (for example), then the validation will fail and the RP MUST reject it – there are multiple reasons why the file may not be well-formed, but the bottom line is that whatever information was there that the repository pointed to in the notification message is lost (?) – does this mean that the RP is now out of sync?  Should the RP now try and re-sync using the snapshot file?

Yes, as was written: "..or if delta files could not be used, the RP MUST update its last known session_id to this session_id and download and process snapshot file on the update notification file". But made this more explicit in the "delta" section.

>  What if that is also rejected?

Then RRPD cannot be used. Added a reference to the "considerations" section.

> Maybe the notification file was somehow corrupted and pointed at the wrong file(s)…should the relationship with the repository be reset/restarted?   It is not clear at all what should be done, or what the implications are of these error conditions; it just looks like we reject and go on. [See comment M8 above: it looks like only when “neither update notification file and one snapshot file or delta files could be processed” that we would abandon using RRDP.]

This problem is not specific to just this protocol. What do you do when you have no network? What do you do when an rsync server is down, or starts speaking in a funny dialect? What if a CA starts publishing malformed CA certificates for its children, manifests etc?

See section 3.4.5 of attached file for clarifications that I think we can do in the context of this document. Following your previous remark the only normative addition I feel I can add is this: "In order to help prevent this Publication Servers MUST perform regular validation of their own RRDP repository."

>  
> M11. From 3.4.3. (Processing Delta Files): “it is RECOMMENDED that a RP uses additional strategies to determine if an object is still relevant for validation before removing it from its local storage”.  Ok, like what?

I think going into too much detail is out of scope because it cannot be explained without having a document formally describing validation strategies. But I added this:

In particular objects should not be removed if they are included in a current validated manifest.

 
> M12. 3.5.1.2. (Update Notification File): “…the repository server MUST ensure that this file is not cached for longer than 1 minute.  An exception to this rule…”  Using “MUST” implies no exceptions.  s/MUST/SHOULD    s/then no/than no

ack 

>  
> M13. Why isn’t an IETF namespace [RFC3688] used in the XML schema?  I would strongly suggest that you use one and request it in the IANA Considerations Section.  Unlike the publication protocol, this document specifies version 1 – which of course doesn’t mean there isn’t a longer history behind it, so I’m open to keeping a non-IETF namespace if that is the case.

To be honest, because I wasn't aware :)

We now have running code based on this. Speaking for our own implementation I do not think we care about the namespace when validating the XML - just that it conforms everything else defined here. So I have no problem with changing this. Rob, do you see an issue with deployed versions of rcynic if we do this?

> M14. 3.5.2.3: “Note that the publish element is defined in the publication protocol [I-D.ietf-sidr-publication]”  But the example doesn’t correspond to the definition in I-D.ietf-sidr-publication, even if it does comply with the schema in this document.  Should the publish element behave the same way as specified in I-D.ietf-sidr-publication?  It is good that the RRDP design goes back to the Publication Protocol, but if it doesn’t use the exact definition from there (including the schema), then the differences should be highlighted here.
>  
> M15. 3.5.3.3: “Note that the publish and withdraw elements are defined in the publication protocol [I-D.ietf-sidr-publication]”  Same comment as above.

Ok, updated text to explain that the "tag" attribute is not used here, and that the "hash" attribute is not used in snapshot files.


> M16. Section 5. (Security Considerations): “RRDP replaces the use of rsync by HTTPS…”. Given the statement in 3.4.1 about using “an alternate repository retrieval mechanism” and the rest of the text in this section, I would assume that the intent is not really to “replace the use of rsync” (with an update to RFC6480).   Maybe I’m reading too much into the phrase above, but please clarify.

As explained above it is the intent to replace rsync. But.. a migration document should be written as a separate effort.

That said I would be fine with just taking out: 

      The original RPKI transport mechanism is rsync, which offers no channel
      security mechanism. RRDP replaces the use of rsync by HTTPS;

Please let me know if you prefer this.

> Minor:
>  
> P1. In 3.3.1. (Initialisation): “Note that this snapshot file MAY contain zero publish elements at this point if no objects have been submitted for publication yet.”  This text is not describing an option: s/MAY/may

ack
 
> P2. Please expand RRDP in first use.

ack, added to introduction

>  
> P3. Section 3.4.4. (Polling the Update Notification File) says: “A detailed description of the validation process itself is out of scope of this document.”  Isn’t that what is described in 3.4.1 and 3.5.1.3??

No, but changed to this to clarify:

       ..and initiate a new RPKI object validation process. However, a detailed description
       of the RPKI object validation process itself is out of scope of this document.
 
> P4. “version 4 UUID”  Please provide a reference.

ack, I assumed informative.

>  
> P5. 3.5.1.3: “The serial attribute must be…”  This is the only place where RFC2119 language is not used in this section.  Any special reason?

changed to MUST

>  
> P6. 3.5.1.3: “If delta elements are included they MUST form a contiguous sequence of serial numbers…up to the serial number mentioned in the notification element.”  The example in this section lists the serial numbers in “reverse order” (3, 2) – is there an order requirement?  The text seems to imply it, but it could go either way.

No, order in the XML elements is not required. Although most applications will probably end up using some kind of order. So I added: 

Note that the elements may not be ordered.

>  
> P7. The following references should be Informative: IANA-AD-NUMBERS, RFC6481, RFC6486, RFC6488 should be made Informative.

ok

> Nits:
>  
> N1. In both 3.3.1/3.3.2, the notes about “format and caching concerns” would work better if they are part of the previous paragraph.

ok

>  
> N2. s/are no no longer/are no longer

ok



> _______________________________________________
> sidr mailing list
> sidr@ietf.org
> https://www.ietf.org/mailman/listinfo/sidr