Re: [Hipsec] fault-tolerance for base exchange and update

Tobias Heer <heer@cs.rwth-aachen.de> Thu, 07 January 2010 10:51 UTC


Hi, 

On 07.01.2010 at 08:22, Miika Komu wrote:

> Hi,
> 
> Baris Boyvat has implemented an experimental fault-tolerance extension for the HIP base exchange and UPDATE in the HIPL implementation. He will document it in his master's thesis during this year, but I would like to start discussing the topic now.
> 
Great. I think this extension is really worth some deeper investigation. In our own tests with HIP(L) we found that tuning the timing and aggressiveness of retransmissions, and sending opportunistic double transmissions, can greatly improve performance.

> At the protocol level, the extension allows sending multiple I1 or UPDATE-with-locator packets sequentially. The idea is to scan through all possible source and destination IP pairs at the HIP layer to improve the chances of successful initial contact (I1) and of re-establishing contact (UPDATE-with-locator), in a way similar to the NAT-ICE extensions. We have playfully called the extension "shotgun" mode in the implementation :)
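
A minimal sketch of the address-pair scan described above. The function name and the example addresses (taken from the documentation ranges) are hypothetical, and the rule of pairing only addresses of the same IP family is an assumption of this sketch, not something the shotgun implementation is stated to do.

```python
import ipaddress
from itertools import product


def candidate_pairs(local_addrs, peer_addrs):
    """Return every (source, destination) pair whose address families match."""
    pairs = []
    for src, dst in product(local_addrs, peer_addrs):
        # Assumption: an IPv4 source is only paired with an IPv4 destination,
        # and likewise for IPv6.
        if ipaddress.ip_address(src).version == ipaddress.ip_address(dst).version:
            pairs.append((src, dst))
    return pairs


local = ["192.0.2.10", "2001:db8::10"]
peer = ["198.51.100.7", "2001:db8:1::7", "203.0.113.9"]
for src, dst in candidate_pairs(local, peer):
    print(src, "->", dst)
```

Each resulting pair would be one candidate for an I1 or an UPDATE-with-locator packet.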
> 
> The obvious difference from ICE is that the shotgun mode works at the HIP protocol layer. A non-obvious difference is that the approach also supports fault-tolerance for a single relay/rendezvous (Responder's RVS has crashed) and it can make use of multiple relay/rendezvous servers for better redundancy. At the moment, neither of these is possible directly with the ICE-NAT extensions. I actually believe the shotgun approach can be applied even with the ICE-NAT extensions to improve fault-tolerance.
> 
> The shotgun approach seems useful for improving fault-tolerance with and without (single or multiple) rendezvous/relay middleboxes, but there is also another use case for it. The Initiator (or Mobile Node) can learn multiple mappings for the peer, some of which may have connectivity and some not. It is also possible that a malicious user intentionally sends invalid mappings for a well-known service in a multiuser system (this case also requires some rate control for mappings per user). In such scenarios, it is useful to try multiple peer addresses sequentially instead of just a single one.
> 
> Minimally, the approach requires a few considerations in an implementation:
> 
> i) Allow sending of multiple I1 and UPDATE-with-locator packets in a rate-controlled fashion
> ii) Filter redundant incoming packets.
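
Consideration (i) could be sketched roughly as follows. This is not the HIPL code: `send_paced`, the `answered` callback and the fixed pacing interval are all hypothetical names and choices made for illustration.

```python
import time


def send_paced(pairs, send, interval=0.05, answered=lambda: False, sleep=time.sleep):
    """Call send(src, dst) for each pair, pausing `interval` seconds between sends.

    Stops early once answered() reports that the peer has replied.
    Returns the number of packets handed to send().
    """
    sent = 0
    for src, dst in pairs:
        if sent > 0:
            sleep(interval)  # rate control: never send back-to-back
        if answered():
            break            # a reply arrived, no need to try more pairs
        send(src, dst)
        sent += 1
    return sent


log = []
count = send_paced(
    [("192.0.2.10", "198.51.100.7"), ("192.0.2.10", "203.0.113.9")],
    send=lambda s, d: log.append((s, d)),
    interval=0.01,
)
```

Passing `send` and `sleep` as parameters keeps the pacing logic separate from the actual packet output, so it can be exercised without a network.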
> 
> Case (ii) could be implemented as filtering of I1 packets or filtering of R1 packets. We chose filtering of redundant R1 packets in the implementation and it required a small change in the state machine. For the UPDATE filtering, filtering based on sequence numbers was sufficient.
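
The filtering in case (ii) might look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, peers are keyed by an opaque HIT string, the real state machine change is not reproduced, and sequence number wrap-around is ignored.

```python
class DuplicateFilter:
    """Drop redundant R1 packets and already-seen UPDATE packets per peer."""

    def __init__(self):
        self.r1_seen = set()        # peer HITs whose first R1 was accepted
        self.last_update_seq = {}   # peer HIT -> highest UPDATE SEQ accepted

    def accept_r1(self, peer_hit):
        """Accept only the first R1 from a peer; later copies are redundant."""
        if peer_hit in self.r1_seen:
            return False
        self.r1_seen.add(peer_hit)
        return True

    def accept_update(self, peer_hit, seq):
        """Accept an UPDATE only if its sequence number is new for this peer."""
        last = self.last_update_seq.get(peer_hit)
        if last is not None and seq <= last:
            return False
        self.last_update_seq[peer_hit] = seq
        return True


f = DuplicateFilter()
print(f.accept_r1("peer-A"), f.accept_r1("peer-A"))
print(f.accept_update("peer-A", 1), f.accept_update("peer-A", 1), f.accept_update("peer-A", 2))
```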
> 
> I would like the WG's feedback on whether we could include this approach in RFC5201-bis and RFC5206-bis (as MAY or SHOULD).

I would like to see this as a separate document that focuses solely on fault-tolerance and performance. I think the shotgun extension is a first step toward a comprehensive document. My two reasons for this are:
a) I think solving the problem goes beyond the scope of the base documents because this problem domain offers more possible solutions than the shotgun mode. A separate document could discuss use cases and solutions in more depth than is possible in the base documents.
b) Measures for improving fault tolerance may be quite specific to a scenario and may require assumptions that cannot be made in the general case.



Some more thoughts on fault tolerance:

As far as I understand, the shotgun extension only works with multiple interfaces. What about optimizations for single-homed hosts?

As far as I understand, the shotgun mode will make mobile devices switch interfaces quite aggressively. What happens if the primary interface (e.g., WiFi) is temporarily down (because of a recent L2/L3 handover)? Will the shotgun mode determine that the secondary interface (e.g., GPRS) is working and switch to it? Do we need a mechanism to switch back as soon as the primary interface is available again?

Variable timeouts and increased redundancy depending on the current situation (e.g., high packet loss -> more redundancy) might be an option, too. 
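
To make the "high packet loss -> more redundancy" idea concrete, here is a small sketch. The function name, the 95% delivery target and the cap on copies are hypothetical, and the calculation assumes that losses of the individual copies are independent, which may not hold on a real link.

```python
def copies_needed(loss_rate, target=0.95, max_copies=5):
    """Smallest number of copies so that at least one arrives with
    probability >= target, capped at max_copies.

    Assumes each copy is lost independently with probability loss_rate.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    copies = 1
    all_lost = loss_rate  # probability that every copy so far was lost
    while 1.0 - all_lost < target and copies < max_copies:
        copies += 1
        all_lost *= loss_rate
    return copies


for loss in (0.0, 0.1, 0.3, 0.5):
    print(loss, copies_needed(loss))
```

The same measured loss rate could also drive the retransmission timeout, but that is left out here.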

Thanks for the work you have already done in this problem domain.

Tobias

> 
> P.S. Maybe Baris has something to add or to explain some details better.




--  

Dipl.-Inform. Tobias Heer, Ph.D. Student
Distributed Systems Group 
RWTH Aachen University, Germany
tel: +49 241 80 207 76
web: http://ds.cs.rwth-aachen.de/members/heer