Re: Last Call: <draft-ietf-sidr-rpki-rtr-19.txt> (The RPKI/Router Protocol) to Proposed Standard

Shane Amante <shane@castlepoint.net> Fri, 16 December 2011 00:12 UTC


On Dec 14, 2011, at 2:42 PM, Randy Bush wrote:
> I am not sure if this is an architectural misunderstanding or a red herring.

Let's call it an architectural misunderstanding.


> As you say, NetConf is for *configuring* routers.  RPKI-rtr is not used
> for router configuration, but rather dynamic data, a la IS-IS or BGP.
> In fact, the RPKI-rtr payload data go into the same data structure as
> the BGP data.

"Dynamic data"?  Isn't that a truism?  We're not carving RPKI data out of stone tablets are we?  :-)


> Of course, the configuration of the RPKI-rtr relationship to cache(s) is
> router configuration, similar to configuring BGP peers, and presumably
> can be done by NetConf on those platforms which support NetConf.
> 
> Bottom line: NetConf 'replaces' the CLI, not BGP.

Here's another bottom line: today, in my network, there is only one thing that can or does influence BGP policy on an individual router: policy configured locally on the router itself. There is a well-established operational practice that only policy on the router itself, consumed through either the CLI or NETCONF, will affect routing decisions made by that box.

RPKI-RTR introduces a *new* side channel (I'm being kind by not calling it a 'backdoor') through which "dynamic data" is injected directly into the BGP routing process, where it "dynamically impacts" BGP policy and, ultimately, route selection. My key concern is this: RPKI-RTR is another entry point into BGP on the router, one that I, and every other operator, must now regression-test for every code version, on every make & model of router(!), that I roll out to my network. Had RPKI-RTR used NETCONF instead, it would likely have reused an existing mechanism for loading and evaluating BGP policy in routers that exist *today* (and have existed since the dawn of time). This is a non-trivial operational expense, borne by the entire SP industry for the lifetime of RPKI-RTR.
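To make concrete how data arriving over RPKI-RTR can change route selection without any change to locally configured policy, here is a minimal Python sketch of prefix-origin validation, the valid / invalid / not-found classification from the SIDR origin-validation work. It assumes each cache entry is a (prefix, max length, origin AS) triple; the `Vrp` record and the function name are hypothetical, and what a router then *does* with each state remains a matter of local policy.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Iterable, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class Vrp:
    """A validated ROA payload: the (prefix, max length, origin AS) triple a
    cache hands to a router. The record and field names are hypothetical."""
    prefix: IPNetwork
    max_length: int
    origin_as: int


def origin_validation_state(route_prefix: str, origin_as: int, vrps: Iterable[Vrp]) -> str:
    """Classify a BGP route as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        # A VRP "covers" the route when the route lies inside the VRP's prefix
        # (the address-family check avoids comparing IPv4 with IPv6).
        if route.version == vrp.prefix.version and route.subnet_of(vrp.prefix):
            covered = True
            # It "matches" only if the origin AS agrees and the route is no
            # more specific than the VRP's max length allows.
            if vrp.origin_as == origin_as and route.prefixlen <= vrp.max_length:
                return "valid"
    # Covered but unmatched routes are invalid; uncovered routes are not-found.
    return "invalid" if covered else "not-found"
```

With a single VRP for 192.0.2.0/24 (max length 24, AS 64496), a /25 out of that block announced by AS 64496 comes out "invalid": a change in cache data alone can flip a route from usable to rejected, which is exactly the new input path into route selection described above.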


> FWIW, two or three years ago, not wanting to reinvent the wheel, we
> looked at NetConf-style payload packaging.  After all, Bert and I
> chartered NetConf back in the day.  I still owe a dinner to the two
> NetConf folk who helped try.  Unfortunately the mismatch was
> non-trivial, though nowhere near the mismatch of DNSsec, at which we
> also looked (as the Tonys and I had published in 1998, Lutz in 2006,
> etc., of which I presume you are unaware).
> 
> When we evaluated the data bloat for NetConf-style packaging we were not
> cheered.  While probably not important for a CLI replacement, for a
> continuous dynamic protocol the overhead of unpacking XML and decoding
> the contained ASCII payload drew unhappy whining from the router
> hackers.
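A rough sketch of the size comparison at issue, assuming the binary IPv4 Prefix PDU layout later published in RFC 6810 (20 octets; the -19 draft's layout may differ in detail). The XML element names are invented for illustration and omit any NETCONF `<rpc>`/`<edit-config>` envelope and SSH framing, so the XML figure understates real NETCONF overhead. This measures encoding size only, not parse cost, and says nothing about how it compares with the cache-side costs questioned below.

```python
import ipaddress
import struct


def rtr_ipv4_prefix_pdu(prefix: str, max_length: int, origin_as: int, announce: bool = True) -> bytes:
    """Encode an IPv4 Prefix PDU using the layout later published in RFC 6810."""
    net = ipaddress.IPv4Network(prefix)
    return struct.pack(
        "!BBHIBBBB4sI",                # network byte order, no padding
        0,                             # protocol version
        4,                             # PDU type 4 = IPv4 Prefix
        0,                             # reserved, zero
        20,                            # PDU length in octets (fixed for this type)
        1 if announce else 0,          # flags: 1 = announce, 0 = withdraw
        net.prefixlen,                 # prefix length
        max_length,                    # max length
        0,                             # reserved, zero
        net.network_address.packed,    # 4-octet IPv4 prefix
        origin_as,                     # 4-octet AS number
    )


def xml_prefix_entry(prefix: str, max_length: int, origin_as: int) -> bytes:
    """The same triple as a hypothetical XML element (tag names are invented)."""
    return (
        "<prefix-entry>"
        f"<prefix>{prefix}</prefix>"
        f"<max-length>{max_length}</max-length>"
        f"<origin-as>{origin_as}</origin-as>"
        "</prefix-entry>"
    ).encode("ascii")


binary = rtr_ipv4_prefix_pdu("192.0.2.0/24", 24, 64496)
xml = xml_prefix_entry("192.0.2.0/24", 24, 64496)
print(len(binary), len(xml))
```

Even this stripped-down XML is several times the size of the fixed-length PDU, and the router must also parse the ASCII prefix and numbers back into binary form.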

Was this discussed in the IETF?  If so, do you have a pointer to the evaluation or results of those findings?

Most importantly, why this overwhelming concern about "bloat" from NetConf-style packaging?  Surely there is substantial 'overhead' just in moving RPKI data among the RPKI caches (even within a single ISP), including having the caches verify the signatures on new RPKI updates?  NetConf would seem to be the least of the concerns here, but without an elaboration of the issues you cite, it's unclear whether those issues were valid, let alone whether they could have been mitigated.


> NetConf is not ideal for a long-session back-and-forth protocol, with
> RPKI-rtr's serial number exchange which leaves the router in control of
> the exchanges and enables incremental update of the data.  You *really*
> do not want the cache to send the full data set to the router every
> time.  And you definitely do not want a cache trying to keep track of
> the state of O(100) router clients which may or may not still think they
> are its friend.
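A minimal, hypothetical Python sketch of the serial-number pattern described in the quoted paragraph: the router remembers the last serial it saw and asks for changes since then, so the cache keeps no per-router state. A cache that has discarded the needed history cannot answer, and the router falls back to a full transfer. Class and method names are invented; session IDs, serial-number wraparound, timers, and the wire format are all omitted.

```python
from typing import Dict, Optional, Set, Tuple

Vrp = Tuple[str, int, int]          # (prefix, max_length, origin_as); hypothetical shape
Delta = Tuple[Set[Vrp], Set[Vrp]]   # (announced, withdrawn)


class Cache:
    """Hypothetical cache keeping only the last `history` per-serial deltas."""

    def __init__(self, history: int = 3) -> None:
        self.serial = 0
        self.vrps: Set[Vrp] = set()
        self.deltas: Dict[int, Delta] = {}
        self.history = history

    def update(self, new_vrps: Set[Vrp]) -> None:
        """Install a new data set and record its delta under a new serial."""
        self.serial += 1
        self.deltas[self.serial] = (new_vrps - self.vrps, self.vrps - new_vrps)
        self.deltas.pop(self.serial - self.history, None)  # discard old history
        self.vrps = set(new_vrps)

    def changes_since(self, serial: int) -> Optional[Tuple[int, Set[Vrp], Set[Vrp]]]:
        """Net changes since `serial`, or None if the needed history is gone."""
        needed = range(serial + 1, self.serial + 1)
        if any(s not in self.deltas for s in needed):
            return None  # comparable to a cache reset: the router must start over
        announced: Set[Vrp] = set()
        withdrawn: Set[Vrp] = set()
        for s in needed:  # fold the deltas together in serial order
            a, w = self.deltas[s]
            announced = (announced - w) | a
            withdrawn = (withdrawn - a) | w
        return self.serial, announced, withdrawn

    def full(self) -> Tuple[int, Set[Vrp]]:
        """The complete current data set, for a full transfer."""
        return self.serial, set(self.vrps)


class Router:
    """The router, not the cache, tracks how far along it is."""

    def __init__(self) -> None:
        self.serial: Optional[int] = None  # None means "never synchronized"
        self.vrps: Set[Vrp] = set()

    def sync(self, cache: Cache) -> str:
        if self.serial is not None:
            changes = cache.changes_since(self.serial)
            if changes is not None:
                self.serial, announced, withdrawn = changes
                self.vrps = (self.vrps - withdrawn) | announced
                return "incremental"
        # No usable history: fall back to a full transfer.
        self.serial, self.vrps = cache.full()
        return "full"
```

The design trades bounded history on the cache for the occasional full transfer, which is the fallback case addressed in the draft text quoted in point (a) below.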

Hrm, I think there is a grave misunderstanding indeed.  A few points:
a)  The current draft, draft-ietf-sidr-rpki-rtr-20, contradicts what you say above wrt not sending the full data set every time:
---snip---
   [...]  As with any update protocol based on
   incremental transfers, the router must be prepared to fall back to a
   full transfer if for any reason the cache is unable to provide the
   necessary incremental data.
---snip---
b)  I think there is a very bad assumption, above, that the router is a "database of record" and, therefore, should be the one in control of consuming configuration (excuse me: "dynamic") updates from RPKI caches.  That's not how networks are, or should be, run in my experience.  ISPs already have systems that maintain physical & logical inventory in offline DBs and, just as importantly, coordinate and synchronize changes from those offline systems to O(1,000s) of routers.  You need that for physical & logical inventory purposes alone, let alone for more sophisticated configuration of routing protocols on those same devices.
c)  In draft-ietf-sidr-origin-ops-13, there is a suggestion that if "RPKI data [is used] as an input to operational routing decisions, they SHOULD ensure local cache freshness at least every four to six hours."  However, I believe it's been suggested that (Global?) RPKI synchronization intervals may, in practice, be as large as 24 hours.  Regardless, in my experience, anything that's updated on the order of hours is surely the domain of "configuration" and isn't what I would consider "dynamic" like would be the case in a routing protocol such as IS-IS or BGP.
d)  Given the very lax synchronization intervals of the underlying RPKI data described above, I don't see the need to open a long-lived TCP connection (or, as you said a couple of paragraphs ago, a "continuous dynamic protocol") between an RPKI cache and the receiving router.  Instead, do what is done today: NETCONF/SSH into the box when needed, i.e., every 4-6 hours (or every 24 hours), when new RPKI data arrives at the cache(s).
e)  Given the success operators have today generating very, very large prefix-lists (currently based on IRR data) and periodically pushing them through the CLI into routers, it seems a misguided assumption that one is "required" to have incremental updates of data, particularly given the several-hour intervals at which new or updated RPKI data arrives.  Nonetheless, if one wanted to do this, then please at least give us operators some credit that we're smart enough to (help) create NETCONF schemas that would be amenable to incremental updates on board the router, using a variety of well-known CompSci techniques, and which could also take advantage of RFC 5717, NETCONF Partial Locking.
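Point (e) argues that incremental updates need not come from the router pulling serial-numbered deltas: the operator's offline system can compute the difference between what it last pushed and what it just generated. A minimal sketch of that diff, assuming each entry is a (prefix, max length, origin AS) tuple. The operation names mirror the NETCONF edit-config "delete" and "merge" operations, but the entry layout and any schema they would apply to are hypothetical.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[str, int, int]  # (prefix, max_length, origin_as); hypothetical layout


def prefix_list_delta(pushed: Iterable[Entry], generated: Iterable[Entry]) -> List[Tuple[str, Entry]]:
    """Edits that turn the previously pushed list into the newly generated one."""
    old, new = set(pushed), set(generated)
    # Sorted only so the output is deterministic from run to run.
    edits = [("delete", e) for e in sorted(old - new)]
    edits += [("merge", e) for e in sorted(new - old)]
    return edits


pushed = [("192.0.2.0/24", 24, 64496), ("198.51.100.0/24", 24, 64497)]
generated = [("192.0.2.0/24", 24, 64496), ("203.0.113.0/24", 24, 64498)]
print(prefix_list_delta(pushed, generated))
# [('delete', ('198.51.100.0/24', 24, 64497)), ('merge', ('203.0.113.0/24', 24, 64498))]
```

Unchanged entries generate no edits, so the size of each push tracks the amount of change rather than the size of the list. Applied within a single locked or committed configuration change, the order of deletions and additions should not matter; applied piecemeal, it might.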


> And, sadly, NetConf is not available on significant platforms where
> RPKI-rtr is already running today.

That's a bit of a circular argument, isn't it?  RPKI-RTR exists today because of pre-standards development, which is fine; everyone does it.  Surely an RPKI-RTR NETCONF schema would exist today had SIDR gone in that direction, no?


> So, all in all, being lazy, of course we tried.  But it was not a good
> fit.  Of course, if you want to have a go at it, I am sure we would be
> willing to at least kibitz.  But first you might want to talk to the
> vendors who have already implemented RPKI-rtr to see if they would be
> willing to re-code.
> 
> randy

-shane