Re: [sidr] WGLC for draft-ietf-sidr-rpki-rtr-rfc6810-bis-03

Richard Hansen <rhansen@bbn.com> Thu, 16 April 2015 04:37 UTC

Message-ID: <552F3C79.8030809@bbn.com>
Date: Thu, 16 Apr 2015 00:37:13 -0400
From: Richard Hansen <rhansen@bbn.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: sidr@ietf.org
References: <A5144FF9-FD2A-4284-A8FE-E0CB89F1E00F@tislabs.com>
In-Reply-To: <A5144FF9-FD2A-4284-A8FE-E0CB89F1E00F@tislabs.com>
Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="TMjAX1IL9OnURtG9ILV0bsgcu6NU38sfe"
Archived-At: <http://mailarchive.ietf.org/arch/msg/sidr/kmD-LdDdRAWB35IKZpROqmpfLDY>
Subject: Re: [sidr] WGLC for draft-ietf-sidr-rpki-rtr-rfc6810-bis-03

Hi all,

Here are my comments, some of which overlap with what others have said:

  * The name of the draft says "rfc6810-bis", but the XML <rfc> tag
    doesn't have an obsoletes="6810" attribute.  And I don't think it
    should -- Section 7 has a normative reference to RFC6810 when
    discussing downgrades to version 0, which isn't specified in this
    document.  So perhaps the title and abstract should be worded to
    make it clear that this is not a replacement for RFC6810, but
    rather a new version of the protocol specified in RFC6810.  (Or
    maybe this document should be worded as an update to RFC6810?)
    (Also mentioned in <http://article.gmane.org/gmane.ietf.sidr/6871>.)

  * The protocol is mostly query-response lockstep, but there are no
    timeouts.  If the cache is taking unreasonably long to respond to a
    query, what should the router do?  How long is unreasonably long?
    If timeouts are added, should the router reset its timeout timer
    for each response PDU (Cache Response, payload, and End of Data),
    or only after it receives the End of Data PDU?
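
    To make the two timer policies concrete, here is a minimal sketch.
    It is only an illustration: the function name, the timeout value,
    and the arrival times are hypothetical and nothing here comes from
    the draft.

```python
from typing import Sequence


def times_out(arrivals: Sequence[float], timeout: float, per_pdu: bool) -> bool:
    """Report whether a query/response exchange would time out.

    arrivals: arrival time of each response PDU, in seconds after the
              query was sent; the last entry is the End of Data PDU.
    per_pdu:  True  -> the timer restarts whenever any response PDU arrives.
              False -> a single timer runs until End of Data arrives.
    """
    if not arrivals:
        # Nothing ever arrived, so either policy eventually fires.
        return True
    if per_pdu:
        last = 0.0
        for t in arrivals:
            if t - last > timeout:
                return True
            last = t
        return False
    # Whole-exchange policy: only the End of Data arrival time matters.
    return arrivals[-1] > timeout


# A slow but steady cache: one PDU every 5 seconds, 8 second timeout.
steady = [5.0, 10.0, 15.0, 20.0]
print(times_out(steady, 8.0, per_pdu=True))   # False
print(times_out(steady, 8.0, per_pdu=False))  # True
```

    The same slow-but-steady cache survives under the per-PDU policy
    and is dropped under the whole-exchange policy, which is why I
    think the document needs to pick one.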

  * Should the cache time out the router if the router doesn't send a
    Query soon after connecting?

  * Notify/Query race:  What is supposed to happen if the router sees a
    Serial Notify right after it sends a Serial Query or Reset Query?
    This could happen if the two are sent at the same time -- the
    messages will cross paths and the router might think that the
    Serial Notify is an erroneous response to the query, and that the
    subsequent Cache Response came out of the blue.

  * The name "Session ID" is misleading.  Section 2 clearly defines it,
    but unless you pay attention to the definition it's easy to assume
    that "session" refers to the transport session with the peer.  I
    would prefer a different name such as "Cache Instance ID", though
    that name may be insufficient when you consider the protocol
    upgrade problem brought up by David in
    <http://article.gmane.org/gmane.ietf.sidr/6896>.  Maybe something
    like "Data Series ID"?

  * In Section 5.1 (fields) under "Session ID", what is the definition
    of "completely drop the session"?  Do you mean send a fatal error
    PDU, do a transport-layer disconnect, and let the router reconnect
    (possibly to a more preferred cache)?  Or do you mean send a Cache
    Reset (cache->router) or Reset Query (router->cache) and continue
    the existing transport session?  Or is either reaction acceptable?

  * What is the definition of "payload PDU", mentioned in Sections 5.3,
    5.5, 8.1, 8.2, and 8.3?  (I assume it means IPv4 Prefix, IPv6
    Prefix, and Router Key, but it should be explicitly stated.)

  * Suppose an IPv4 Prefix was announced in serial 5 and withdrawn in
    serial 6, and a router does a Serial Query against serial 4.  Is
    it OK if the cache elides the announce/withdraw pair?  MUST it?  If
    it doesn't, it seems like the cache MUST send the payload PDUs in
    serial number order, and the router MUST process the payload PDUs in
    serial number order (which implies that the transport MUST provide
    in-order delivery of the PDUs because the router has no idea which
    PDUs correspond to which serial number).
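
    A rough sketch of what I mean by eliding, assuming the cache keeps
    a per-serial change log.  The log layout and the names are my own
    invention for illustration, and plain integer comparison stands in
    for the protocol's wrapping serial number arithmetic.

```python
from typing import Dict, Hashable, Iterable, Tuple

Change = Tuple[int, str, Hashable]  # (serial, "announce" | "withdraw", record)


def net_delta(log: Iterable[Change], since: int) -> Dict[Hashable, str]:
    """Collapse all changes after serial `since` into their net effect.

    An announce followed by a withdraw of the same record (or the
    reverse) cancels out and is elided from the result.
    """
    net: Dict[Hashable, str] = {}
    # sorted() is stable, so changes within one serial keep their order.
    for serial, action, record in sorted(log, key=lambda c: c[0]):
        if serial <= since:
            continue
        previous = net.get(record)
        if previous is not None and previous != action:
            del net[record]  # opposite actions cancel each other
        else:
            net[record] = action
    return net


roa = (65536, "192.0.2.0/24", 24)
log = [(5, "announce", roa), (6, "withdraw", roa)]
print(net_delta(log, since=4))  # {}
print(net_delta(log, since=5))  # only the withdrawal remains
```

    With elision, the answer to a Serial Query against serial 4 carries
    no payload PDUs at all, so ordering stops mattering.  Without it,
    the router sees both PDUs and gets the wrong result if it applies
    them in the wrong order.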

  * Section 5.1 (fields) says that the serial number is the serial
    number of the cache, but Section 5.3 (Serial Query) talks about
    serial numbers as if they are properties of a PDU.  Perhaps 5.3
    should be worded like:

        The router sends a Serial Query to ask the cache for the
        announcements and withdrawals that have occurred since the
        Serial Number in the Serial Query.

    Section 5.5 (Cache Response) has similarly problematic wording.

  * The two sentences in 5.3 (Serial Query) paragraph 2 seem to
    contradict each other in the case where there are no (net?)
    changes:  The first sentence suggests that the cache sends a Cache
    Response (maybe followed by something?), while the second suggests
    that it only sends an End of Data (no Cache Response).  I think the
    intention is for the cache to send a Cache Response immediately
    followed by an End of Data.  Is that correct?

  * I don't think the set of valid responses to a Query (Reset or
    Serial) is clearly specified.  I think the intention is for these
    to be the only valid responses:

      - Reset Query:
          * Cache Response followed by 0 or more payload PDUs followed
            by End of Data
          * Error Report
      - Serial Query:
          * Cache Response followed by 0 or more payload PDUs followed
            by End of Data
          * Error Report
          * Cache Reset

    Is this correct?
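
    Written as a checker, my reading of the valid responses looks like
    the sketch below.  It encodes only my interpretation above, not
    text from the draft, and it assumes that "payload PDU" means IPv4
    Prefix, IPv6 Prefix, and Router Key.

```python
from typing import Sequence

# Assumption: these three PDU types are the "payload PDUs".
PAYLOAD_PDUS = {"IPv4 Prefix", "IPv6 Prefix", "Router Key"}


def valid_response(query: str, pdus: Sequence[str]) -> bool:
    """Check a complete response against the lists given above."""
    if query not in ("Reset Query", "Serial Query"):
        raise ValueError(f"not a query PDU: {query!r}")
    pdus = list(pdus)
    if pdus == ["Error Report"]:
        return True
    if query == "Serial Query" and pdus == ["Cache Reset"]:
        return True
    # Cache Response, then 0 or more payload PDUs, then End of Data.
    return (
        len(pdus) >= 2
        and pdus[0] == "Cache Response"
        and pdus[-1] == "End of Data"
        and all(p in PAYLOAD_PDUS for p in pdus[1:-1])
    )


print(valid_response("Serial Query", ["Cache Response", "End of Data"]))  # True
print(valid_response("Serial Query", ["End of Data"]))                    # False
print(valid_response("Reset Query", ["Cache Reset"]))                     # False
```

    Note that under this reading a lone End of Data (the "no changes"
    case from Section 5.3) is rejected.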

  * Is there a particular reason for omitting a payload PDU count field
    from the Cache Response PDU?  If one were present, the router could
    pre-allocate an appropriate amount of memory to handle the payload
    PDUs (and perform additional sanity checks).

    I guess a PDU count field would prevent an implementation from
    opportunistically sending additional PDUs if there happened to be a
    serial number bump during the middle of a Cache Response.
    (Instead, the cache would have to follow the End of Data PDU with a
    Serial Notify, which is almost as good.)

  * Section 5.6 (IPv4 Prefix) mentions duplicates, but are redundant
    entries OK?  Examples:
      - {65536,192.0.2.0/24-26} and {65536,192.0.2.0/26-26} (the latter
        is redundant)
      - {65536,192.0.2.0/24-26} and {65536,192.0.2.0/24-25} (the latter
        is redundant)
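
    By "redundant" I mean the following, sketched with Python's
    standard ipaddress module.  The Entry type and the covers() helper
    are hypothetical names of mine, not anything defined in the draft.

```python
import ipaddress
from typing import NamedTuple


class Entry(NamedTuple):
    asn: int
    prefix: str   # e.g. "192.0.2.0/24"
    max_len: int


def covers(a: Entry, b: Entry) -> bool:
    """True if every (ASN, prefix) pair authorized by b is also authorized by a."""
    if a.asn != b.asn:
        return False
    net_a = ipaddress.ip_network(a.prefix)
    net_b = ipaddress.ip_network(b.prefix)
    if net_a.version != net_b.version:
        return False  # subnet_of() raises TypeError on mixed versions
    return net_b.subnet_of(net_a) and b.max_len <= a.max_len


wide = Entry(65536, "192.0.2.0/24", 26)
print(covers(wide, Entry(65536, "192.0.2.0/26", 26)))  # True
print(covers(wide, Entry(65536, "192.0.2.0/24", 25)))  # True
print(covers(Entry(65536, "192.0.2.0/26", 26), wide))  # False
```

    An entry b is redundant whenever some other entry a in the same
    data set satisfies covers(a, b).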

  * The fixed-length SKI field doesn't permit algorithm changes.  Note
    that there has been some discussion about using SHA-256 for the SKI
    and AKI fields for the RFC6487(bis) profile (I'm guessing that's
    probably not going to happen, but still...).
    (Also mentioned in <http://article.gmane.org/gmane.ietf.sidr/6869>.)

  * Section 5.11 (Error Report) says that Error Reports are only sent
    as responses to other PDUs.  Why the restriction?  This prevents a
    side from raising a timeout error, and it prevents the cache from
    raising an internal error if a problem is detected when it's time
    to send a Serial Notify.

  * If Error Reports are only sent as responses to other PDUs, how is
    it possible for an Error Report to not be associated with the PDU
    to which it is responding?  (Section 5.11 paragraph 4)

  * For version negotiation, what is supposed to happen if the router
    starts with a PDU with version > 1?  There is an Unsupported
    Protocol Version error type, but nothing requires that to be sent.

  * Suppose a router connects and issues a v0 Query.  If the cache
    doesn't support protocol v0, Section 7 says it MUST either
    downgrade or disconnect.  Can it issue an Error Report before
    disconnecting?  I would prefer it if the cache MUST issue an
    Unsupported Protocol Version Error Report before disconnecting.

  * The second-to-last paragraph of Section 10 talks about the router
    deleting data received from a cache when it has been unable to
    refresh from that cache for twice the polling period (by default).
    Why not have the time to delete equal the Expire Interval as
    specified in Section 6?

Thanks,
Richard