Re: [Hipsec] fault-tolerance for base exchange and update

Miika Komu <> Thu, 07 January 2010 15:14 UTC

Date: Thu, 07 Jan 2010 17:15:51 +0200
From: Miika Komu <>
To: Tobias Heer <>
Cc: hip WG <>

Tobias Heer wrote:


> Hi,
> Am 07.01.2010 um 08:22 schrieb Miika Komu:
>> Hi,
>> Baris Boyvat has implemented an experimental fault-tolerance
>> extension for the HIP base exchange and UPDATE in the HIPL
>> implementation. He will document it in his master thesis during
>> this year, but I would like to start discussion of the topic
>> already now.
> Great. I think this extension is really worth some deeper
> investigation. In our own tests with HIP(L) we found that timing and
> aggressiveness regarding retransmissions and opportunistic double
> transmissions can greatly improve the performance.
>> At the protocol level, the extension allows sending multiple I1 or
>> UPDATE-with-locator packets sequentially. The idea is to scan
>> through all possible source and destination IP pairs at the HIP
>> layer to improve the chances for successful initial contact (I1)
>> and to re-establish contact (UPDATE-with-locator) in a way similar to
>> the NAT-ICE extensions. We have playfully called the extension
>> "shotgun" mode in the implementation :)
>> The obvious difference to ICE is that the shotgun mode works at the
>> HIP protocol layer. A non-obvious difference is that the approach
>> supports also fault-tolerance for a single relay/rendezvous
>> (Responder's RVS has crashed) and it can make use of multiple
>> relay/rendezvous servers for better redundancy. At the moment,
>> neither of these is possible directly with the ICE-NAT extensions.
>> I actually believe the shotgun approach can be applied even with
>> the ICE-NAT extensions to improve fault-tolerance.
>> The shotgun approach seems useful to improve fault-tolerance with
>> and without (single or multiple) rendezvous/relay middleboxes, but
>> there is also another use case for this. The Initiator (or Mobile
>> Node) can learn multiple mappings for the peer, some of which may
>> have connectivity and some not. It is also possible that a malign
>> user intentionally sends invalid mappings for a well-known service
>> in a multiuser system (this case also requires some rate control
>> for mappings per user). In such scenarios, it is useful to try
>> multiple peer addresses sequentially instead of just a single one.
>> Minimally, the approach requires a few considerations in an
>> implementation:
>> i) Allow sending of multiple I1 and UPDATE-with-locator packets in
>> a rate-controlled fashion
>> ii) Filter redundant incoming packets.
>> Case (ii) could be implemented as filtering of I1 packets or
>> filtering of R1 packets. We chose filtering of redundant R1 packets
>> in the implementation and it required a small change in the state
>> machine. For the UPDATE filtering, filtering based on sequence
>> numbers was sufficient.
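A minimal sketch of the receiver-side filtering described above, written in Python for brevity rather than taken from HIPL. The class and method names are hypothetical, peers are keyed by an opaque identifier (for example a HIT string), and sequence-number wraparound is ignored; a real implementation would tie this to the per-association state machine.

```python
class DuplicateFilter:
    """Drops redundant R1 and UPDATE packets caused by shotgun sending."""

    def __init__(self):
        # Peers whose R1 has already been processed (hypothetical keying).
        self.r1_seen = set()
        # Peer -> highest UPDATE sequence number accepted so far.
        self.last_update_seq = {}

    def accept_r1(self, peer):
        """Accept only the first R1 from a peer; later copies are redundant."""
        if peer in self.r1_seen:
            return False
        self.r1_seen.add(peer)
        return True

    def accept_update(self, peer, seq):
        """Accept an UPDATE only if its sequence number is new for this peer."""
        last = self.last_update_seq.get(peer)
        if last is not None and seq <= last:
            return False
        self.last_update_seq[peer] = seq
        return True


flt = DuplicateFilter()
first = flt.accept_r1("peer-A")    # first R1 is processed
second = flt.accept_r1("peer-A")   # copy arriving via another path is dropped
```

Filtering on R1 rather than I1 keeps the Responder unchanged, at the cost of the Responder answering each redundant I1.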
>> I would like the WG's feedback on whether we could include this
>> approach in RFC5201-bis and RFC5206-bis (as MAY or SHOULD).
> I would like to see this as a separate document that solely focuses
> on fault-tolerance and performance. I think the shotgun extension is a
> first step to a comprehensive document. My two reasons for this are:
> a) I think solving the problem goes beyond the scope of the base
> documents because this problem domain offers more possible solutions
> than the shotgun mode. A separate document could discuss use cases
> and solutions in more depth than can be done in the base
> documents.
> b) Measures for improving fault tolerance may be quite
> specific to a scenario and may require making some assumptions that
> cannot be made in the general case.

Well, I am just a bit skeptical that this will ever be taken into use if
the state machine filtering parts are not part of RFC5201-bis and

> Some more thoughts on fault tolerance:
> As far as I understood, the shotgun extension only works with
> multiple interfaces. What about optimizations for single-homed hosts?

The shotgun mode does not "care" about interfaces. It pairs up
addresses, not interfaces. So if you have two addresses on a
single-interface machine and the peer has one, two redundant packets
will be sent.
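
To illustrate the pairing, here is a small Python sketch (not HIPL code). The function name `address_pairs` is hypothetical, the addresses are documentation examples, and skipping mixed IPv4/IPv6 pairs is my own assumption rather than something the extension specifies.

```python
import ipaddress
from itertools import product


def address_pairs(local_addrs, peer_addrs):
    """Return every (source, destination) address pair of matching IP version."""
    pairs = []
    for src, dst in product(local_addrs, peer_addrs):
        # Assumption: a pair is only usable if both ends share an IP version.
        if ipaddress.ip_address(src).version == ipaddress.ip_address(dst).version:
            pairs.append((src, dst))
    return pairs


# Two local addresses on one interface, one peer address: two pairs,
# hence two redundant packets.
pairs = address_pairs(["192.0.2.10", "192.0.2.11"], ["198.51.100.7"])
print(len(pairs), pairs)
```

Nothing in the sketch refers to interfaces; only the address lists matter.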

> As far as I understood, the shotgun mode will make the mobile devices
> switch interfaces quite aggressively. What happens if the primary
> interface (e.g., WiFi) is temporarily down (because of a recent L2/L3
> handover)? Will the shotgun mode determine that the secondary
> interface (e.g., GPRS) is working and switch to the secondary
> interface? Do we need a mechanism to switch back as soon as the
> primary interface is available again?

This is a matter of the UPDATE policy and has nothing to do with the
shotgun extension we're proposing. The shotgun mode just means that you
send every I1 and UPDATE-with-locator packet through all known source IP
and destination IP combinations. So the shotgun mode is quite dumb and simple.
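
As a rough sketch of how simple the sending side is (Python, not HIPL code): one copy of the packet goes out per address pair, with a fixed pause in between as a crude form of rate control. The function name `shotgun_send`, the `send` callback, and the 50 ms default interval are all assumptions for illustration.

```python
import time


def shotgun_send(packet, pairs, send, interval=0.05, sleep=time.sleep):
    """Send one copy of `packet` over every (src, dst) pair, paced by `interval`.

    `send` is a caller-supplied callback taking (packet, src, dst).
    Returns the number of copies sent.
    """
    sent = 0
    for index, (src, dst) in enumerate(pairs):
        if index > 0:
            sleep(interval)  # pause between copies, not before the first one
        send(packet, src, dst)
        sent += 1
    return sent


# Usage: record the copies instead of putting them on the wire.
log = []
count = shotgun_send(
    "I1",
    [("192.0.2.10", "198.51.100.7"), ("192.0.2.11", "198.51.100.7")],
    lambda pkt, src, dst: log.append((pkt, src, dst)),
    sleep=lambda seconds: None,  # skip real delays in this example
)
```

Which address ends up being used afterwards is decided elsewhere (the UPDATE policy), not by this loop.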

But perhaps I just misunderstood you. I haven't really thought about 
optimizing the shotgun - probably there's room for making it more clever.

> Variable timeouts and increased redundancy depending on the current
> situation (e.g., high packet loss -> more redundancy) might be an
> option, too.
> Thanks for the work you already did in this problem domain.

You're welcome.