[sidr] some comments and questions regarding rpki-rtr
timbru@ripe.net Sat, 01 October 2011 09:39 UTC
Message-ID: <49638.80.57.195.122.1317462144.squirrel@webmail.ripe.net>
Date: Sat, 01 Oct 2011 11:42:24 +0200
From: timbru@ripe.net
To: randy@psg.com, sra@hactrn.net
Cc: sidr@ietf.org
Subject: [sidr] some comments and questions regarding rpki-rtr
List-Id: Secure Interdomain Routing <sidr.ietf.org>
List-Archive: <http://www.ietf.org/mail-archive/web/sidr>
Hi Randy, Rob, wg,

We have been working on rpki-rtr support in our validator at RIPE NCC over the past few weeks. I found the 6.x sections describing typical exchange scenarios particularly useful. I have read the document in the past as well, but as we all know, with actual implementation come actual questions... so I have a few:

A = No changes
B = Nonce and cache reset
C = Duplicate announcements / withdrawals
D = Keep alive timeout
E = Cache shutdown

A = No changes
==============

If I read 5.2 correctly, the cache should respond with just one End of Data PDU if there are no updates since the serial included in a Serial Query. But if I read 5.4, I can also interpret it to mean that we should respond with two PDUs:

  1 Cache Response
  0 data records
  1 End of Data

Can you please tell me which is correct? Like this?

    Cache                        Router
      ~                             ~
      | <----- Serial Query ------- | R requests data
      |                             |
      | ----- Cache Response -----> | C confirms request
      | ------ End of Data -------> | C sends End of Data
      |                             |   and sends *same* serial
      ~                             ~

This is what we are doing now. In any case, I think it would be useful to have this described somewhere in 6.2, or in a separate 6.x section.

B = Nonce and cache reset
=========================

When I read 5.10, the nonce is generated when the cache starts. And reading 6.3, the cache may send a Cache Reset reply to the client when there are no incremental updates available. Does this imply that a new cache nonce should be generated?

I assumed that it did not. The nonce is made when the process starts, so when a client sends a Reset Query the same nonce may be kept. The client just gets a new, full data set, up to the current serial, for that same nonce. If not, then we would have to keep track of nonces and serials for each connected child, or reset them all... I am afraid that would not scale very well.

Part of the reason I am asking is that we are currently not yet able to send incremental updates, so our cache always replies as described in 6.3.
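
To make questions A and B concrete, here is a minimal sketch of the cache behaviour described so far: a nonce picked once at process start, a Serial Query with an up-to-date serial answered by Cache Response plus End of Data carrying the same serial, anything else answered with Cache Reset, and a Reset Query answered with the full set under the unchanged nonce. The PDU header layout and type numbers are assumed to follow what was later published as RFC 6810 (version 0); the `Cache` class and its method names are hypothetical, and this illustrates our reading, not what the draft requires.

```python
import secrets
import struct
from typing import List

# PDU type codes and version as numbered in RFC 6810 (assumed to match this draft).
PROTOCOL_VERSION = 0
CACHE_RESPONSE = 3
END_OF_DATA = 7
CACHE_RESET = 8


def cache_response(nonce: int) -> bytes:
    # Header only: version, type, nonce, total length (8 bytes).
    return struct.pack("!BBHI", PROTOCOL_VERSION, CACHE_RESPONSE, nonce, 8)


def end_of_data(nonce: int, serial: int) -> bytes:
    # Header plus a 4-byte serial number: 12 bytes in total.
    return struct.pack("!BBHII", PROTOCOL_VERSION, END_OF_DATA, nonce, 12, serial)


def cache_reset() -> bytes:
    # Cache Reset carries no nonce; that field is zero.
    return struct.pack("!BBHI", PROTOCOL_VERSION, CACHE_RESET, 0, 8)


class Cache:
    """Hypothetical cache that cannot (yet) produce incremental updates."""

    def __init__(self, serial: int, records: List[bytes]):
        # Nonce chosen once, when the process starts (our reading of 5.10).
        self.nonce = secrets.randbelow(1 << 16)
        self.serial = serial
        self.records = records  # pre-encoded IPvX Prefix PDUs

    def on_serial_query(self, nonce: int, serial: int) -> List[bytes]:
        if nonce == self.nonce and serial == self.serial:
            # Nothing changed: Cache Response, zero records, then End of Data
            # carrying the *same* serial the router sent.
            return [cache_response(self.nonce), end_of_data(self.nonce, serial)]
        # No incremental data available: ask the router to reset (6.3).
        return [cache_reset()]

    def on_reset_query(self) -> List[bytes]:
        # Full data set up to the current serial, under the *unchanged* nonce.
        return ([cache_response(self.nonce)] + list(self.records)
                + [end_of_data(self.nonce, self.serial)])
```

With `c = Cache(serial=5, records=[...])`, `c.on_serial_query(c.nonce, 5)` yields exactly two PDUs, while any other serial yields a single Cache Reset; `c.nonce` never changes after construction.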

We are not resetting the nonce, though, and we are seeing duplicate announcement errors from the routers. So: is our cache wrong not to reset the nonce? Can section 6.3 be amended to be explicit about this?

C = Duplicate announcements / withdrawals
=========================================

As described in 5.5:

  The cache server MUST ensure that it has told the router client to have one and only one IPvX PDU for a unique {prefix, len, max-len, asn} at any one point in time.

So this means that the cache should exclude duplicates in a full update, even if the same unique {prefix, len, max-len, asn} exists more than once (the same ROA with multiple prefixes, or different ROAs).

I probably missed the discussion on this, but can you explain why this is? I don't see a conflict. If I get the same announcement twice, isn't it still just an announcement?

I am also wondering what this means with regard to serial updates. Let me clarify by example: 10/16 is announced in serial 2, withdrawn in 3, and announced again in 4. The router has serial 1. Should the cache then work out the exact delta between 1 and 4, or can it send 1-2, followed by 2-3, followed by 3-4?

I can imagine that from the router's perspective it's very useful if the cache takes care of duplicates and sends just one big delta, rather than the full history since the router last asked. I am afraid, though, that this may cause scaling issues when a potentially large number of routers use the same cache (CPU), or when a large number of pre-computed deltas need to be kept (memory).

I think that if this responsibility were just handled by the routers, we would have much better scaling on the cache side, and it would be much easier for caches to keep incremental updates without having to fall back to the no-incremental-updates approach that 6.3 describes.
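
The two options in C can be compared with a small sketch. It treats a record as the {prefix, len, max-len, asn} tuple from 5.5, collapses repeats in a full set, and computes either the exact delta between two snapshots or the chained per-serial deltas. The function names, the string form of the prefix, and the max-len and ASN values in the example (64496 is a documentation ASN) are illustrative assumptions.

```python
from typing import List, Set, Tuple

# One rpki-rtr record: (prefix, prefix length, max length, origin ASN).
Record = Tuple[str, int, int, int]


def full_set(roa_entries: List[Record]) -> Set[Record]:
    # Collapses repeats of the same {prefix, len, max-len, asn}, whether they
    # come from one ROA or from several (the "one and only one" rule in 5.5).
    return set(roa_entries)


def net_delta(old: Set[Record], new: Set[Record]) -> Tuple[Set[Record], Set[Record]]:
    """Exact delta between two snapshots: (announcements, withdrawals)."""
    return new - old, old - new


def chained_deltas(snapshots: List[Set[Record]]) -> List[Tuple[str, Record]]:
    """Replays every intermediate step (1-2, 2-3, 3-4) instead of one net delta."""
    out: List[Tuple[str, Record]] = []
    for old, new in zip(snapshots, snapshots[1:]):
        announced, withdrawn = net_delta(old, new)
        out += [("announce", r) for r in sorted(announced)]
        out += [("withdraw", r) for r in sorted(withdrawn)]
    return out


# The 10/16 example: snapshots at serials 1, 2, 3 and 4.
p = ("10.0.0.0", 16, 16, 64496)
snapshots = [set(), {p}, set(), {p}]
print(net_delta(snapshots[0], snapshots[3]))  # one announcement, no withdrawals
print(chained_deltas(snapshots))              # announce, withdraw, announce
```

The net form needs the router's old snapshot (or every delta since it) at hand for each query; the chained form can reuse stored per-serial deltas shared by all routers, at the cost of sending the router the full history.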

D = Keep alive timeout
======================

As described in 6.1:

  To limit the length of time a cache must keep the data necessary to generate incremental updates, a router MUST send either a Serial Query or a Reset Query no less frequently than once an hour. This also acts as a keep alive at the application layer.

So, we have interpreted this to mean that it's probably good on our side to drop the connection after 1 hour without a query. It's most likely dead, and we want our resources back...

When we do this, do you think it would be good if we tried to send an Error Report PDU just before closing? With a new error code indicating session timeout?

E = Cache shutdown
==================

When the cache is stopped for whatever reason (server restart, the cache had an irrecoverable internal error, anything else...), should we send a new type of Notify / Error (with a new specific code) to all children, so that they can gracefully switch over to another cache, or wait until we are back? Or should we just close the connections?

Thanks,

Tim

PS: If you are an rpki-rtr router implementer and you want to do interop testing with us, please contact me.
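
For question D, the idle-timeout idea could look roughly like the sketch below. It tracks the time of the last Serial Query or Reset Query per connection, treats more than an hour of silence as a dead session, and builds a best-effort Error Report PDU to send before closing. The Error Report layout and type number are assumed from RFC 6810; `SESSION_TIMEOUT` is a HYPOTHETICAL placeholder for the new error code proposed in D, not an assigned value, and the `Session` class is likewise hypothetical.

```python
import struct

ERROR_REPORT = 10          # Error Report PDU type (RFC 6810 numbering, assumed)
SESSION_TIMEOUT = 0x100    # HYPOTHETICAL new error code proposed in D; not assigned
IDLE_LIMIT = 3600.0        # 6.1: routers must query at least once an hour


def error_report(code: int, text: str, bad_pdu: bytes = b"") -> bytes:
    # Header, encapsulated PDU (may be empty), then UTF-8 diagnostic text,
    # each variable part preceded by its 4-byte length.
    msg = text.encode("utf-8")
    length = 8 + 4 + len(bad_pdu) + 4 + len(msg)
    return (struct.pack("!BBHI", 0, ERROR_REPORT, code, length)
            + struct.pack("!I", len(bad_pdu)) + bad_pdu
            + struct.pack("!I", len(msg)) + msg)


class Session:
    """Hypothetical per-router connection state on the cache side."""

    def __init__(self, now: float):
        self.last_query = now

    def on_query(self, now: float) -> None:
        # Any Serial Query or Reset Query counts as an application-level keep-alive.
        self.last_query = now

    def expired(self, now: float) -> bool:
        return now - self.last_query > IDLE_LIMIT

    def goodbye(self) -> bytes:
        # Best-effort notice to send just before closing the connection.
        return error_report(SESSION_TIMEOUT, "no query received for over an hour")
```

Times are passed in explicitly (for example from `time.monotonic()`) so that the check is easy to test; a session created at `now=0.0` counts as expired at `3600.5` unless `on_query` was called in the meantime.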