Re: [Hipsec] fault-tolerance for base exchange and update

Miika Komu <miika.komu@hiit.fi> Thu, 07 January 2010 15:14 UTC

Message-ID: <4B45FAA7.9030609@hiit.fi>
Date: Thu, 07 Jan 2010 17:15:51 +0200
From: Miika Komu <miika.komu@hiit.fi>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)
MIME-Version: 1.0
To: Tobias Heer <heer@cs.rwth-aachen.de>
References: <4B458BB7.8090000@hiit.fi> <8651FB5B-E07F-4EDC-8A8D-434C44AE8E05@cs.rwth-aachen.de>
In-Reply-To: <8651FB5B-E07F-4EDC-8A8D-434C44AE8E05@cs.rwth-aachen.de>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: hip WG <hipsec@ietf.org>
Subject: Re: [Hipsec] fault-tolerance for base exchange and update
Precedence: list
Reply-To: miika.komu@hiit.fi

Tobias Heer wrote:

Hi,

> Hi,
> 
> Am 07.01.2010 um 08:22 schrieb Miika Komu:
> 
>> Hi,
>> 
>> Baris Boyvat has implemented an experimental fault-tolerance
>> extension for the HIP base exchange and UPDATE in the HIPL
>> implementation. He will document it in his master's thesis
>> this year, but I would like to start discussing the topic
>> already now.
>> 
> Great. I think this extension is really worth some deeper
> investigation. In our own tests with HIP(L) we found that timing and
> aggressiveness regarding retransmissions and opportunistic double
> transmissions can greatly improve the performance.
> 
>> At the protocol level, the extension allows sending multiple I1 or
>> UPDATE-with-locator packets sequentially. The idea is to scan
>> through all possible source and destination IP pairs at the HIP
>> layer to improve the chances for successful initial contact (I1)
>> and to re-establish contact (UPDATE-with-locator) in a way similar to
>> the NAT-ICE extensions. We have playfully called the extension
>> "shotgun" mode in the implementation :)
>> 
>> The obvious difference to ICE is that the shotgun mode works at the
>> HIP protocol layer. A non-obvious difference is that the approach
>> also supports fault-tolerance for a single relay/rendezvous
>> (Responder's RVS has crashed) and it can make use of multiple
>> relay/rendezvous servers for better redundancy. At the moment,
>> neither of these is possible directly with the ICE-NAT extensions.
>> I actually believe the shotgun approach can be applied even with
>> the ICE-NAT extensions to improve fault-tolerance.
>> 
>> The shotgun approach seems useful to improve fault-tolerance with
>> and without (single or multiple) rendezvous/relay middleboxes, but
>> there is also another use case for this. The Initiator (or Mobile
>> Node) can learn multiple mappings for the peer, some of which may
>> have connectivity and some not. It is also possible that a malicious
>> user intentionally sends invalid mappings for a well-known service
>> in a multiuser system (this case also requires some rate control
>> for mappings per user). In such scenarios, it is useful to try
>> multiple peer addresses sequentially instead of just a single one.
>> 
>> Minimally, the approach requires a few considerations in an
>> implementation:
>> 
>> i) Allow sending of multiple I1 and UPDATE-with-locator packets in
>>    a rate-controlled fashion.
>> ii) Filter redundant incoming packets.
>> 
>> Case (ii) could be implemented as filtering of I1 packets or
>> filtering of R1 packets. We chose filtering of redundant R1 packets
>> in the implementation and it required a small change in the state
>> machine. For the UPDATE filtering, filtering based on sequence
>> numbers was sufficient.
>> 
>> I would like WG feedback on whether we could include this
>> approach in RFC5201-bis and RFC5206-bis (as MAY or SHOULD).
> 
> I would like to see this as a separate document that solely focuses
> on fault-tolerance and performance. I think the shotgun extension is a
> first step to a comprehensive document. My two reasons for this are:
>  a) I think solving the problem goes beyond the scope of the base
> documents because this problem domain offers more possible solutions
> than the shotgun mode. A separate document could discuss use cases
> and solutions in more depth than can be done in the base
> documents.
>  b) Measures for improving fault tolerance may be quite
> specific to a scenario and may require making some assumptions that
> cannot be made in the general case.

Well, I am just a bit worried that this will never be taken into use if
the state machine filtering parts are not part of RFC5201-bis and
RFC5206-bis.
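
The filtering part is small. A rough Python sketch of the idea follows. It
is not the HIPL code: the class and method names are hypothetical, the
exact state handling is an assumption (the actual state machine change is
not spelled out here), and the UPDATE check is simplified to "drop anything
that is not newer than the last processed sequence number".

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    I1_SENT = auto()
    I2_SENT = auto()


@dataclass
class Association:
    """Hypothetical per-peer state, for illustration only."""
    state: State = State.I1_SENT
    last_update_id: int = -1  # highest UPDATE sequence number processed so far

    def accept_r1(self) -> bool:
        # Assumption: only the first R1 is processed; R1 copies triggered
        # by the other redundant I1 packets are silently dropped.
        if self.state is State.I1_SENT:
            self.state = State.I2_SENT  # a real implementation would send I2 here
            return True
        return False

    def accept_update(self, update_id: int) -> bool:
        # Simplified: an UPDATE is redundant if its sequence number
        # is not larger than the last one already processed.
        if update_id <= self.last_update_id:
            return False
        self.last_update_id = update_id
        return True
```

With this sketch, a second R1 arriving over another address pair is
rejected once the first one has moved the association out of I1-SENT, and
a duplicate UPDATE carrying an already-seen sequence number is rejected
as well.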

> Some more thoughts on fault tolerance:
> 
> As far as I understood, the shotgun extension only works with
> multiple interfaces. What about optimizations for single-homed hosts?

The shotgun mode does not "care" about interfaces. It pairs up
addresses, not interfaces. So if you've got two addresses on a
single-interface machine and the peer has got one, two redundant packets
will be sent.
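
As a rough illustration of the pairing (a Python sketch, not HIPL code;
the function name, the interface name, and the documentation-range
addresses are made up for the example):

```python
from itertools import product


def address_pairs(local_ifaces, peer_addrs):
    """Return every (source, destination) address pair.

    local_ifaces maps an interface name to its addresses. The interface
    names are ignored on purpose: only the addresses are paired up.
    """
    local_addrs = [addr for addrs in local_ifaces.values() for addr in addrs]
    return list(product(local_addrs, peer_addrs))


# One interface with two addresses, peer with one address.
pairs = address_pairs({"eth0": ["192.0.2.10", "192.0.2.11"]},
                      ["198.51.100.5"])
print(len(pairs))  # 2
```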

> As far as I understood, the shotgun mode will make the mobile devices
> switch interfaces quite aggressively. What happens if the primary
> interface (e.g. WiFi) is temporarily down (because of a recent L2/L3
> handover)? The shotgun mode will determine that the secondary
> interface (e.g., GPRS) is working and will switch to the secondary
> interface? Do we need a mechanism to switch back as soon as the
> primary interface is available again?


This is a matter of the UPDATE policy and has nothing to do with the
shotgun extension we're proposing. The shotgun mode just means that you
send every I1 and UPDATE-with-locator packet through all known source IP
and destination IP combinations. So the shotgun mode is quite dumb and
simple.
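
To show how dumb it is, here is a minimal Python sketch of such a send
loop with simple rate control. It is only an illustration under
assumptions: the function name, the `send` callback, and the fixed
`interval` between packets are hypothetical and do not describe how HIPL
actually does it.

```python
import time
from itertools import product


def shotgun_send(local_addrs, peer_addrs, send, interval=0.05, sleep=time.sleep):
    """Call send(src, dst) once for every source/destination combination.

    A fixed pause of `interval` seconds is inserted between consecutive
    packets as a crude form of rate control. Returns the number of
    packets sent.
    """
    sent = 0
    for src, dst in product(local_addrs, peer_addrs):
        if sent:
            sleep(interval)  # pause between packets, not before the first
        send(src, dst)
        sent += 1
    return sent
```

The `sleep` parameter exists only so that the pause can be replaced in
tests. There is no policy in the loop: it does not rank pairs or pick a
preferred interface.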

But perhaps I just misunderstood you. I haven't really thought about 
optimizing the shotgun - probably there's room for making it more clever.

> Variable timeouts and increased redundancy depending on the current
> situation (e.g., high packet loss -> more redundancy) might be an
> option, too.
> 
> Thanks for the work you already did in this problem domain.

You're welcome.
