Re: [sidr] WGLC for draft-ietf-sidr-rpki-rtr-rfc6810-bis-03

Rob Austein <sra@hactrn.net> Fri, 12 June 2015 20:03 UTC

Date: Fri, 12 Jun 2015 16:03:08 -0400
From: Rob Austein <sra@hactrn.net>
To: sidr@ietf.org
In-Reply-To: <D12DE2D7.49276%wesley.george@twcable.com>
References: <A5144FF9-FD2A-4284-A8FE-E0CB89F1E00F@tislabs.com> <9D70CAEF-22F9-44FC-A429-9CBEBA9EAE6C@tislabs.com> <D12DE2D7.49276%wesley.george@twcable.com>
User-Agent: Wanderlust/2.15.5 (Almost Unreal) Emacs/22.3 Mule/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset="US-ASCII"
Message-Id: <20150612200308.84FEB18BD360@minas-ithil.hactrn.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/sidr/sZ5aNkE-sR01pceWIPhctBzbUtU>
Subject: Re: [sidr] WGLC for draft-ietf-sidr-rpki-rtr-rfc6810-bis-03

[More old WGLC comments]

At Tue, 17 Mar 2015 14:29:34 -0400, George, Wes wrote:
> 
> Section 6 Expire interval - this seems out of step with the way that we've
> done most things in RPKI, i.e. what to do with the information provided is
> usually thought of as a matter of local policy. I realize that there is
> increasing risk to keeping the data beyond the expiry time as it grows
> more stale, but there isn't much justification for why this MUST NOT is
> present. I think that perhaps the tradeoff between staleness and
> everything failing back to unknown status is something that the operator
> needs to weigh themselves. We could provide guidance that [MUST/SHOULD]
> NOT keep data from a certain cache beyond expiry time *if* another cache
> is available, but acknowledge that in some cases stale info may be more
> desirable than nothing at all (unless that's not actually true and I'm
> missing something...) This is discussed in the failover scenarios in the
> bottom of section 10 (only keep data if you don't have a full set from
> another cache) but figured I'd mention it in case we think there's some
> tweaks necessary in the section 6 text.

The basic problem is that the router does not have the data necessary
to make an informed decision.  We've stripped all of the timestamps
off of the public data (certificate expiration, CRL and manifest
nextUpdate, etc.), and some of the information (cache's own interval
for polling the global RPKI) is not public.

I could (maybe, sort of) see an argument for downgrading MUST NOT here
to SHOULD NOT, but the intent here is to send a (very) strong signal
to the router that once it's past the expiration time, it would be
better (in the opinion of the cache operator) for the router to let
everything go to unknown than to keep using data this stale.

Keep in mind that the security model for this protocol pretty much
assumes that the router and cache are within the same organization.
So it's your cache telling your router what to do, and if you need a
local policy override, sure, go ahead, configure it -- on your cache.
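
To make the intended router behavior concrete, here is a minimal sketch
of the expiry rule, not text from the draft.  The class, field names,
and the example interval are all made up for illustration, and it
assumes the expire timer restarts on each successful refresh from the
cache.

```python
from dataclasses import dataclass, field


@dataclass
class CacheData:
    """Hypothetical router-side record of the data learned from one cache."""
    expire_interval: float          # seconds, as signaled by the cache
    last_refresh: float             # time of the last successful refresh
    vrps: set = field(default_factory=set)

    def usable_vrps(self, now: float) -> set:
        """Return the data the router may still use at time `now`.

        Once the expire interval has passed without a successful refresh,
        the data is dropped, so routes fall back to unknown rather than
        being validated against stale data.
        """
        if now - self.last_refresh > self.expire_interval:
            self.vrps.clear()
            return set()
        return set(self.vrps)
```

Note that the router makes no staleness judgment of its own here: the
only knob is the interval, and that is set on the cache.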

> Also -
> Since we say in section 10 that it is permissible to hold data from
> multiple caches, the doc appears to be missing guidance on what the router
> side of RPKI-router should do in the case where there are multiple caches
> that disagree with one another. It does say MUST NOT distinguish between
> data sources when validating, but that may not cover this scenario. This
> may be as simple as recommending that in the case where data from multiple
> caches is held and specific entries conflict with one another, there
> SHOULD be an odd number of caches so that there is basis for comparison to
> determine which cache is out of sync or providing incorrect info. (i.e.
> Have 3 so that you can go with the 2/3 that agree)

With respect, you may be over-thinking this.  The whole point of the
protocol is to make the router's job simpler, not harder.  Having the
router try to second-guess the caches defeats that purpose.  Pick a
cache and use it; if it breaks, pick another one.
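
As a rough sketch of "pick one, fail over if it breaks" (the function
name and the is_healthy callback are hypothetical; how a router decides
a cache is broken is left open here):

```python
from typing import Callable, Optional, Sequence


def select_cache(current: Optional[str],
                 candidates: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Stay on the current cache while it works; otherwise take the
    first healthy candidate in configured order.  No voting, no
    comparison of cache contents."""
    if current is not None and is_healthy(current):
        return current
    for cache in candidates:
        if cache != current and is_healthy(cache):
            return cache
    return None  # nothing usable right now
```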

There is a role for some kind of monitoring tool here, operating in
the router protocol role talking to all of your caches to see what's
up, and perhaps even whacking things (cache server, NOC monkey,
whatever works in your shop) as necessary to fix problems, but this
seems like a job for a NOC tool running on a cheap VM, not code in the
slow path on every one of your expensive routers.
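
For illustration only, the comparison step of such a tool might look
like the sketch below.  It assumes something else has already pulled a
full data set from each cache and represented it as a set of
(prefix, max length, ASN) tuples; the function name and that
representation are hypothetical.

```python
from typing import Dict, Set, Tuple

Entry = Tuple[str, int, int]  # (prefix, max length, origin ASN)


def missing_per_cache(snapshots: Dict[str, Set[Entry]]) -> Dict[str, Set[Entry]]:
    """For each cache, list the entries some other cache has but it lacks.

    Caches that hold everything seen anywhere are omitted from the
    result, so an empty result means all caches agree.
    """
    if not snapshots:
        return {}
    everything = set().union(*snapshots.values())
    report = {}
    for name, entries in snapshots.items():
        missing = everything - entries
        if missing:
            report[name] = missing
    return report
```

What to do with a non-empty report (page someone, restart a cache) is a
local operational decision and is deliberately not part of the sketch.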