Re: [sidr] WGLC for draft-ietf-sidr-rpki-rtr-rfc6810-bis-03

Rob Austein <sra@hactrn.net> Fri, 12 June 2015 23:31 UTC

Date: Fri, 12 Jun 2015 19:31:30 -0400
From: Rob Austein <sra@hactrn.net>
To: sidr@ietf.org
In-Reply-To: <552F3C79.8030809@bbn.com>
References: <A5144FF9-FD2A-4284-A8FE-E0CB89F1E00F@tislabs.com> <552F3C79.8030809@bbn.com>
Message-Id: <20150612233130.9B96318BF283@minas-ithil.hactrn.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/sidr/V8ibUyBeUJza-IKpkGp2eSeKJPM>
Subject: Re: [sidr] WGLC for draft-ietf-sidr-rpki-rtr-rfc6810-bis-03

[Last one for today, I hope]

[Skipping points already covered in other messages]

At Thu, 16 Apr 2015 00:37:13 -0400, Richard Hansen wrote:
> 
>   * The protocol is mostly query-response lockstep, but there are no
>     timeouts.  If the cache is taking unreasonably long to respond to a
>     query, what should the router do?  How long is unreasonably long?
>     If timeouts are added, should the router reset its timeout timer
>     for each response PDU (Cache Response, payload, and End of Data),
>     or only after it receives the End of Data PDU?

Um, section 6, "Protocol Timing Parameters"?

>   * Should the cache time out the router if the router doesn't send a
>     Query soon after connecting?

Local decision, but section 9, "Transport", recommends keep-alives
(misnamed, as they always have been; they should be called
"make-deads").

>   * The name "Session ID" is misleading.  Section 2 clearly defines it,
>     but unless you pay attention to the definition it's easy to assume
>     that "session" refers to the transport session with the peer.  I
>     would prefer a different name such as "Cache Instance ID", though
>     that name may be insufficient when you consider the protocol
>     upgrade problem brought up by David in
>     <http://article.gmane.org/gmane.ietf.sidr/6896>.  Maybe something
>     like "Data Series ID"?

Sorry, update to an existing protocol, not changing names of protocol
elements now.

>   * In Section 5.1 (fields) under "Session ID", what is the definition
>     of "completely drop the session"?  Do you mean send a fatal error
>     PDU, do a transport-layer disconnect, and let the router reconnect
>     (possibly to a more preferred cache)?  Or do you mean send a Cache
>     Reset (cache->router) or Reset Query (router->cache) and continue
>     the existing transport session?  Or is either reaction acceptable?

Um, "drop" means "drop", not "send a reset PDU".

>   * What is the definition of "payload PDU", mentioned in Sections 5.3,
>     5.5, 8.1, 8.2, and 8.3?  (I assume it means IPv4 Prefix, IPv6
>     Prefix, and Router Key, but it should be explicitly stated.)

OK.

>   * Suppose an IPv4 Prefix was announced in serial 5 and withdrawn in
>     serial 6, and a router does a Serial Query against serial 4.  Is
>     it OK if the cache elides the announce/withdraw pair?

Yes.

>     MUST it?

Not specified.  The design assumption is that the cache will elide:
the point of the protocol is to offload work from the router to the
cache, so making the router follow a pointless series of actions
would be counterproductive, even if more convenient for the cache.

Speaking as an implementer, I can tell you that I just (pre)compute
the straight difference from one serial directly to the other, i.e.,
my implementation always elides.
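
A minimal sketch of what a "straight difference" between two serials
could look like, assuming the cache holds each serial's data as a set
of records.  The record shape and the name net_diff are hypothetical,
chosen for illustration; this is not taken from any real
implementation.

```python
# Hypothetical record shape: (asn, prefix, max_length).
def net_diff(old, new):
    """Return (announcements, withdrawals) needed to go from old to new."""
    announce = new - old   # present now, absent at the router's serial
    withdraw = old - new   # present at the router's serial, absent now
    return announce, withdraw

serial4 = {(65536, "192.0.2.0/24", 24)}
serial5 = serial4 | {(65537, "198.51.100.0/24", 24)}  # prefix announced
serial6 = set(serial4)                                # and withdrawn again

# A router at serial 4 asking for changes up to serial 6 sees nothing:
# the announce/withdraw pair cancels out of the set difference.
announce, withdraw = net_diff(serial4, serial6)
```

Because the difference is computed between the two end states, any
record that appeared and disappeared in between never shows up.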

>     If it doesn't, it seems like the cache MUST send the payload
>     PDUs in serial number order, and the router MUST process the
>     payload PDUs in serial number order (which implies that the
>     transport MUST provide in-order delivery of the PDUs because the
>     router has no idea which PDUs correspond to which serial
>     number).

All the specified transports are reliable in any case.

I would be OK with making this a MUST.

>   * Section 5.1 (fields) says that the serial number is the serial
>     number of the cache, but Section 5.3 (Serial Query) talks about
>     serial numbers as if they are properties of a PDU.  Perhaps 5.3
>     should be worded like:
> 
>         The router sends a Serial Query to ask the cache for the
>         announcements and withdrawals that have occurred since the
>         Serial Number in the Serial Query.
> 
>     Section 5.5 (Cache Response) has similarly problematic wording.

OK.  I think the odd wording in the Serial Query and Cache Response
sections was a holdover from some odd assumptions that somebody years
ago made about internal data structures on either the cache or the
router.  I agree that it was not very clear.

> 
>   * The two sentences in 5.3 (Serial Query) paragraph 2 seem to
>     contradict each other in the case where there are no (net?)
>     changes:  The first sentence suggests that the cache sends a Cache
>     Response (maybe followed by something?), while the second suggests
>     that it only sends an End of Data (no Cache Response).  I think the
>     intention is for the cache to send a Cache Response immediately
>     followed by an End of Data.  Is that correct?

Yes.  Tweaked wording, maybe it's better.

>   * I don't think the set of valid responses to a Query (Reset or
>     Serial) is clearly specified.  I think the intention is for these
>     to be the only valid responses:
> 
>       - Reset Query:
>           * Cache Response followed by 0 or more payload PDUs followed
>             by End of Data
>           * Error Report
>       - Serial Query:
>           * Cache Response followed by 0 or more payload PDUs followed
>             by End of Data
>           * Error Report
>           * Cache Reset
> 
>     Is this correct?

Yes.  Seems clear enough to me. :)
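
The list of valid responses quoted above can be sketched as a small
checker.  This assumes PDUs are represented by their names as plain
strings and that "payload PDU" means IPv4 Prefix, IPv6 Prefix, and
Router Key, as suggested earlier in this thread; valid_response is a
hypothetical helper, not part of any implementation.

```python
PAYLOAD = {"IPv4 Prefix", "IPv6 Prefix", "Router Key"}

def valid_response(query, pdus):
    """Check a cache's reply (a list of PDU names) against the query type."""
    if query not in ("Reset Query", "Serial Query"):
        raise ValueError("unknown query type: %r" % (query,))
    if pdus == ["Error Report"]:
        return True
    if pdus == ["Cache Reset"]:
        # Cache Reset is only a valid answer to a Serial Query.
        return query == "Serial Query"
    # Otherwise: Cache Response, zero or more payload PDUs, End of Data.
    return (len(pdus) >= 2
            and pdus[0] == "Cache Response"
            and pdus[-1] == "End of Data"
            and all(p in PAYLOAD for p in pdus[1:-1]))
```

Note that the "no changes" case is simply a Cache Response immediately
followed by an End of Data.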

>   * Is there a particular reason for omitting a payload PDU count field
>     from the Cache Response PDU?  If one was present, the router could
>     pre-allocate an appropriate amount of memory to handle the payload
>     PDUs (and perform additional sanity checks).
> 
>     I guess a PDU count field would prevent an implementation from
>     opportunistically sending additional PDUs if there happened to be a
>     serial number bump during the middle of a Cache Response.
>     (Instead, the cache would have to follow the End of Data PDU with a
>     Serial Notify, which is almost as good.)

Existing protocol, you're proposing a new change, so out of scope
after WGLC unless the WG really wants this.  Customers for this would
primarily be the router folks, and I haven't heard them asking for it.

>   * Section 5.6 (IPv4 Prefix) mentions duplicates, but are redundant
>     entries OK?  Examples:
>       - {65536,192.0.2.0/24-26} and {65536,192.0.2.0/26-26} (the latter
>         is redundant)
>       - {65536,192.0.2.0/24-26} and {65536,192.0.2.0/24-25} (the latter
>         is redundant)

We just present the data we find in the ROAs, sir.  We de-dup to keep
the protocol semantics simple, but otherwise we just send what we got.
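
A minimal sketch of the distinction, assuming entries are tuples of
(asn, prefix, prefix_length, max_length); the dedup helper is
hypothetical.  Exact duplicates are dropped, while entries that are
merely covered by another entry are passed through untouched.

```python
def dedup(records):
    """Drop exact duplicates, keep first-seen order, leave the rest alone."""
    seen = set()
    out = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out

from_roas = [
    (65536, "192.0.2.0", 24, 26),
    (65536, "192.0.2.0", 26, 26),  # redundant (covered by the first), kept
    (65536, "192.0.2.0", 24, 26),  # exact duplicate of the first, dropped
]
to_send = dedup(from_roas)
```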

>   * Section 5.11 (Error Report) says that Error Reports are only sent
>     as responses to other PDUs.  Why the restriction?  This prevents a
>     side from raising a timeout error, and it prevents the cache from
>     raising an internal error if a problem is detected when it's time
>     to send a Serial Notify.

From which we infer that you've not had the dubious pleasure of
operating a protocol that permits error message duels.

Don't go there.

>   * If error reports are only sent as responses to other PDUs, how is
>     it possible for an Error Report to not be associated with the PDU
>     to which it is responding?  (Section 5.11 paragraph 4)

You're not allowed to send unsolicited Error Report PDUs, but it is
not necessarily the case that the error you're reporting has much of
anything to do with the request.

Router: "Please tell me about new stuff since serial xyz."

Cache:  "I'm sorry, my CPU caught fire half an hour ago and this is
         the first chance I've had to tell you, bye."
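
As a rough sketch, such an Error Report would carry an empty Erroneous
PDU field.  The layout below (8-byte header with the error code in the
third and fourth octets, then length-prefixed erroneous PDU and error
text) and the numeric values for the PDU type and the Internal Error
code are my reading of the Error Report section, so treat them as
assumptions and check them against the draft; error_report is a
hypothetical helper.

```python
import struct

ERROR_REPORT = 10    # assumed PDU type for Error Report
INTERNAL_ERROR = 1   # assumed error code for "Internal Error"

def error_report(version, code, erroneous_pdu=b"", text=""):
    """Encode an Error Report; erroneous_pdu stays empty when the
    error is not associated with any particular PDU."""
    msg = text.encode("utf-8")
    # 8-byte header + two 4-byte length fields + the two variable parts.
    length = 8 + 4 + len(erroneous_pdu) + 4 + len(msg)
    return (struct.pack("!BBHI", version, ERROR_REPORT, code, length)
            + struct.pack("!I", len(erroneous_pdu)) + erroneous_pdu
            + struct.pack("!I", len(msg)) + msg)

pdu = error_report(1, INTERNAL_ERROR, text="CPU caught fire")
```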

>   * For version negotiation, what is supposed to happen if the router
>     starts with a PDU with version > 1?  There is an Unsupported
>     Protocol Version error type, but nothing requires that to be sent.

Good catch.

>   * Suppose a router connects and issues a v0 Query.  If the cache
>     doesn't support protocol v0, Section 7 says it MUST either
>     downgrade or disconnect.  Can it issue an Error Report before
>     disconnecting?  I would prefer it if the server MUST issue an
>     Unsupported Protocol Version Error Report before disconnecting.

And what protocol version should that Error Report PDU be, given that
the router asked for v0 and you just said the cache doesn't speak v0?

>   * The second-to-last paragraph of Section 10 talks about deleting
>     data from a cache when it has been unable to refresh from that
>     cache for twice the polling period (by default).  Why not have the
>     time to delete equal the Expire Interval as specified in Section 6?

Good catch.

Thanks!