Re: [sidr] AD Review of draft-ietf-sidr-delta-protocol-04

Tim Bruijnzeels <tim@ripe.net> Mon, 09 January 2017 11:55 UTC

Return-Path: <tim@ripe.net>
X-Original-To: sidr@ietfa.amsl.com
Delivered-To: sidr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3BCC4129530; Mon, 9 Jan 2017 03:55:28 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.098
X-Spam-Level:
X-Spam-Status: No, score=-5.098 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RP_MATCHES_RCVD=-3.199] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yeADCoPuQP1a; Mon, 9 Jan 2017 03:55:26 -0800 (PST)
Received: from molamola.ripe.net (molamola.ripe.net [IPv6:2001:67c:2e8:11::c100:1371]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C300D129C0F; Mon, 9 Jan 2017 03:55:25 -0800 (PST)
Received: from titi.ripe.net ([193.0.23.11]) by molamola.ripe.net with esmtps (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.84_2) (envelope-from <tim@ripe.net>) id 1cQYXn-0006DV-0J; Mon, 09 Jan 2017 12:55:21 +0100
Received: from sslvpn.ripe.net ([193.0.20.230] helo=vpn-187.ripe.net) by titi.ripe.net with esmtps (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.84_2) (envelope-from <tim@ripe.net>) id 1cQYXl-00056e-Tl; Mon, 09 Jan 2017 12:55:18 +0100
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Content-Type: multipart/alternative; boundary="Apple-Mail=_6FC83753-5FBF-4B11-A131-52AE7C864F90"
From: Tim Bruijnzeels <tim@ripe.net>
In-Reply-To: <EE84C61E-7730-4559-852A-4CFBF8543756@cisco.com>
Date: Mon, 09 Jan 2017 12:55:16 +0100
Message-Id: <EEDD968F-D454-4B10-AC0D-8C1B3CD89709@ripe.net>
References: <7DE58D01-AC82-4099-9A46-99E625315637@cisco.com> <99DD12D4-6E13-40A4-95C7-17B12CB61F1C@ripe.net> <EE84C61E-7730-4559-852A-4CFBF8543756@cisco.com>
To: "Alvaro Retana (aretana)" <aretana@cisco.com>
X-Mailer: Apple Mail (2.3124)
X-ACL-Warn: Delaying message
X-RIPE-Spam-Level: ------------
X-RIPE-Spam-Report: Spam Total Points: -12.6 points pts rule name description ---- ---------------------- ------------------------------------ -7.5 ALL_TRUSTED Passed through trusted hosts only via SMTP -3.2 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] 0.0 HTML_MESSAGE BODY: HTML included in message 0.0 MIME_QP_LONG_LINE RAW: Quoted-printable line longer than 76 chars
X-RIPE-Signature: 784d7acfe6559f2a0b602ec6519a07197a2d12bb4e942ed404ce4524b12cf972
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidr/eCPeZ1YqNDoF9fdMGINh3eutFYc>
Cc: "draft-ietf-sidr-delta-protocol@ietf.org" <draft-ietf-sidr-delta-protocol@ietf.org>, Chris Morrow <morrowc@ops-netman.net>, "sidr-chairs@ietf.org" <sidr-chairs@ietf.org>, "sidr@ietf.org" <sidr@ietf.org>
Subject: Re: [sidr] AD Review of draft-ietf-sidr-delta-protocol-04
X-BeenThere: sidr@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Secure Interdomain Routing <sidr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/sidr>, <mailto:sidr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/sidr/>
List-Post: <mailto:sidr@ietf.org>
List-Help: <mailto:sidr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sidr>, <mailto:sidr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 09 Jan 2017 11:55:28 -0000

Hi Alvaro,

in-line

> On 07 Jan 2017, at 00:04, Alvaro Retana (aretana) <aretana@cisco.com> wrote:
> 
> On 12/28/16, 11:43 AM, "Tim Bruijnzeels" <tim@ripe.net <mailto:tim@ripe.net>> wrote:
>  
> Tim:
>  
> Happy New Year!!

Thank you, you (all) too!!

>  
>  
> | Thank you for your in-depth review :) It really helps to clarify the document. Replies in-line.
> |
> | I attached an updated document, but.. I did not discuss this with co-authors yet. So, I invite any of them to
> | disagree or suggest changes to the things below.
>  
> Thanks for that.  I have some comments below – I want us to discuss the Update issue (M16) so we can start the IETF Last Call.
>  
> Thanks!
>  
> Alvaro.
>  
>  
> | On 20 Dec 2016, at 14:15, Alvaro Retana (aretana) <aretana@cisco.com <mailto:aretana@cisco.com>> wrote:
> …
> | | M2. Section 3.2. (Certificate Authority Use): “Relying Parties that do not support this delta protocol MUST
> | | NOT reject a CA certificate merely because it has an SIA extension containing this new kind of
> | | AccessDescription.”  By definition, an RP that has never even considered this document will not support
> | | the delta protocol – IOW, trying to specify the behavior of RPs that may have never even seen this
> | | document makes no sense to me, and can’t be enforced.  What is the current behavior when an RP receives
> | | the extra SIA extension?
> |
> | The point of documenting this is to give RP software the option of recognising that a new protocol exists,
> | even if they do not fully support it yet. In practice all current RPs do this (see below), and future RPs should
> | consider this document. We have in fact been publishing these additional SIAs in the RIPE NCC production
> | RPKI for more than a year without problems. I changed it slightly to this:
> |
> | "Relying Parties that choose not to support this delta protocol yet, MUST NOT reject a CA certificate merely
> | because it has an SIA extension containing this new kind of AccessDescription."
> |
> | But, I can also live with taking this sentence out altogether.
>  
> I would prefer it if we do that.
>  

ok

>  
> …
> | | M4. From 3.3.2. (Publishing Updates): “The update notification file SHOULD be kept small…older delta files
> | | that…will result in total size of deltas exceeding the size of the snapshot, MUST be excluded.”  Here the
> | | “SHOULD” and the “MUST” are in contradiction: you either do it always (MUST) or there may be cases when
> | | you don’t (SHOULD).  s/SHOULD/should
> |
> | Ok, I moved the file size concern to the next bullet point instead, so that we have:
> |
> | o Any older delta files that, when combined with all more recent delta files, will result
> |   in a total size of deltas exceeding the size of the snapshot, MUST be excluded to avoid
> |   that RPs download more data than necessary.
> |
> | o The server MAY also exclude more recent delta files if it finds that their usage by a
> |   small number of RPs that would be forced to perform a full synchronisation is outweighed
> |   performance benefits of having a smaller update notification file. However, the repository
> |   server MUST include all deltas that it has available for the last two hours.
> |
> | I hope that explains it better. I changed the SHOULD in the second bullet point to a MUST. The unspoken
> | reason for the SHOULD was that a publication server may not have deltas.
>  
> The problem that I think can still exist is time dependent:  “older delta files…MUST be excluded…MUST include all deltas that is has available for the last two hours”.  What happens if all “older delta files” are less than two hours old, but the size would be too big?  Is there the possibility of a case where there are a lot of changes that happen close together (but more than a minute apart) that would then result in many delta files?  Besides not postponing publication for more than a minute, I don’t remember other mechanisms that would avoid a large number of changes…  If this case is not possible, then we should be ok – I would still like to understand why.
>  

Let me clarify first. No, it should never exceed snapshot file. But I understand the wording is confusing, how about this instead then?

 o The server MAY also choose to exclude old delta files, even though their combined size
   does not exceed the size of the snapshot file, if they find that files of a certain age
   are only very infrequently requested. Since most Relying Parties will keep close track of
   new deltas, excluding such deltas will help to keep the size of the Notification File small
   and thus reduce the load on the Publication Server in particular. However, if a server
   chooses to use this strategy they MUST not exclude such deltas if they are less than two
   hours old, so that Relying Parties can safely use a one hour refresh interval and suffer
   short (maintenance) outages without being forced to do a full re-synchronisation.

And, yes, two hours is arbitrary. I just felt that there should be some reasonable limit to this. Another approach could be to say that the servers MUST(?) establish first where some threshold lies of the time window where e.g. on average 99.5% (again arbitrary of course) of all requests ever for deltas are typically done. This is something that a publication server can monitor over time. It might find that 99.5% of such requests happen within 1 hour, 2, 5.. we don't really know at this time - this needs deployment.

Long story short.. I am also fine with leaving this out of the specification altogether for now. But then I will probably seek to add some wording on this in a -bis or update sometime later once we have more operational experience (and data).



> …
> | | M7. 3.4.1. (Processing the Update Notification File): “MUST issue an operator error”  What is an “operator
> | | error” and how do you issue one?  I didn’t see an error definition in the schema.
> |
> | I see your point, but.. afaik none of the other sidr RFCs and I-Ds define this, and just avoid talking about this
> | altogether. This was undefined in over five years of deployment with rsync, and yet operational issues with
> | rsync also happen and get reported and resolved somehow. In short I suggest that I just leave it out here as
> | well.
>  
> No.  Sorry, but “Others didn’t do it” is not a valid reason for an incomplete specification.
>  
> | But to continue: All RPs do have the common sense to inform their users about issues (e.g. also things like:
> | hey, this certificate is invalid because..), but there is no standard defined, so they use various formats and
> | ways. I do see some benefit in defining at least a common standard because it might help (1) monitoring, (2)
> | normative documenting of error conditions and responses, and (3) interop testing between RPs (Rob, David
> | and I have done this a few times and it's a bit of a hassle). Doing so is imo major work and something for a
> | separate sidr-ops document.
>  
> One of the reasons I added this comment is because it uses a “MUST”: you’re mandating the behavior as part of the specification.  If it is a mandatory behavior, then there should be something that goes with is as to how it is done.  One way forward is to clarify what you mean by an “operator error”, which (interpreting the paragraph above) is really a log message.
>  
> I am fine (if you want to mandate the behavior) for you to say something like: “MUST log the condition for the operator to be aware of the problem”.  No need to specify the contents, a common format, etc.
>  
> I agree that if you did want to define common formats, contents, etc. then that would belong somewhere else.

Okay, so is your concern addressed if I change "MUST issue an operator error" to “MUST log the condition for the operator to be aware of the problem”?


>  
> …
> | See section 3.4.5 of attached file for clarifications that I think we can do in the context of this document.
> | Following your previous remark the only normative addition I feel I can add is this: "In order to help prevent
> | this Publication Servers MUST perform regular validation of their own RRDP repository."
>  
> The new section looks ok, except for the paragraph that says: “This protocol document cannot define normative text…”  No need to put that in here.

ok, will exclude

>  
>   
> | | M11. From 3.4.3. (Processing Delta Files): “it is RECOMMENDED that a RP uses additional strategies to
> | | determine if an object is still relevant for validation before removing it from its local storage”.  Ok, like
> | | what?
> |
> | I think going into too much detail is out of scope because it cannot be explained without having a document
> | formally describing validation strategies. But I added this:
> |
> | In particular objects should not be removed if they are included in a current validated manifest.
>  
> Ok.  As before, the reason for the comment is the use of “RECOMMENDED”.
>  
>  
> …
> | | M16. Section 5. (Security Considerations): “RRDP replaces the use of rsync by HTTPS…”. Given the
> | | statement in 3.4.1 about using “an alternate repository retrieval mechanism” and the rest of the text in this
> | | section, I would assume that the intent is not really to “replace the use of rsync” (with an update to
> | | RFC6480).   Maybe I’m reading too much into the phrase above, but please clarify.
> |
> | As explained above it is the intent to replace rsync. But.. a migration document should be written as a
> | separate effort.
> |
> | That said I would be fine with just taking out: 
> |
> |     The original RPKI transport mechanism is rsync, which offers no channel
> |     security mechanism. RRDP replaces the use of rsync by HTTPS;
> |
> | Please let me know if you prefer this.
>  
>  
> Let me try to make the point again – I didn’t do a good job above.  In fact, reading this over I want to change it…
>  
> In this document you’re saying: here’s a new protocol that SHOULD be used instead of rsync (3.4.1). 
>  
> And, as you mentioned above, the intent is to eventually phase out rsync in favor of RRDP.  Time lines to phase it out are really out of the scope of this (or I think any other RFC) document because it is something the RPKI community agrees on and migrates to, just like any other new protocol.
>  
> All that is good.
>  
> However, both RFC6480 and RFC6481 mandate the use of rsync: “all publication points MUST be accessible via rsync” and “publication repository MUST be available using rsync”.  Which means that to comply with the RPKI architecture rsync MUST be supported forever, regardless of whether RRDP is used, or a different protocol that may come along in the future…or even if it is not needed at all anymore (not even for eventual backup). L
>  
> The question about updating RFC6480 above should have been a statement: this document should update RFC6480/RFC6481 so that rsync is not required anymore.
>  
> Changing the text in RFC6480/RFC6481 to say “MUST use RRDP” would just result in another update later if another protocol ever comes along, so I think this document should be explicit in the change (to allow the move away from rsync), but still generic…for example, the text in RFC6480/RFC6481 can be replaced with something like: “The publication point MUST be accessible using a retrieval mechanism consistent with the accessMethod element value(s).  Multiple retrieval mechanisms can be supported at the repository operator’s discretion”.
>  
> A change like that would require that this document be tagged to Update RFC6480/RFC6481 (and would of course return RFC6481 to be Normative).

Ok, I understand. But I will need a bit of time for the text. (baby number two has his due date this Thursday so bear with me).

>  
> 
> …
> | | P4. “version 4 UUID”  Please provide a reference.
> |
> | ack, I assumed informative.
>  
> Normative, because the document says that it MUST be used.  BTW, please put the reference on the first mention of UUID.

ok, will do


Thanks
Tim