Re: Question about REAP state transition (draft-ietf-shim6-failure-detection-09)

Iljitsch van Beijnum <iljitsch@muada.com> Wed, 06 February 2008 09:29 UTC

Cc: shim6@psg.com
Message-Id: <69689EBC-DE88-431F-B67D-86CD90BB0F26@muada.com>
From: Iljitsch van Beijnum <iljitsch@muada.com>
To: Alberto García <alberto@it.uc3m.es>
Subject: Re: Question about REAP state transition (draft-ietf-shim6-failure-detection-09)
Date: Wed, 06 Feb 2008 10:16:43 +0100

On 5 feb 2008, at 11:22, Alberto García wrote:

> It is nice to clarify that information related to the received probes
> should be included in the probes sent in the InboundOK state.

> However, I still think that this information MUST also be included, if
> available, in the Exploring state (and not optionally, in "MAY"-style).

Well, I clarified that the only probes we copy back are the ones
received since the last transition from Operational to Exploring,
because otherwise it's possible to copy back old probes that were
received before the failure happened. And since receiving an inbound
probe means going from Exploring to InboundOK, it's impossible to have
any probes to copy back in Exploring. Maybe it's useful to make copying
back one probe mandatory in Operational too, though.
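
To make the argument above concrete, here is a minimal sketch of the
copy-back rule. It is a simplification for illustration, not the state
machine from the draft: the class, method names and probe identifiers
are all made up, and only the Send Timeout and inbound-probe events are
modelled.

```python
from dataclasses import dataclass, field

OPERATIONAL, EXPLORING, INBOUND_OK = "Operational", "Exploring", "InboundOK"


@dataclass
class ReapContext:
    """Hypothetical, simplified per-context REAP state (illustration only)."""
    state: str = OPERATIONAL
    received: list = field(default_factory=list)  # ids of probes received

    def on_send_timeout(self):
        # Failure suspected: go from Operational to Exploring and forget
        # every probe received before the failure.
        if self.state == OPERATIONAL:
            self.state = EXPLORING
            self.received.clear()

    def on_probe_received(self, probe_id):
        # Receiving any probe while Exploring moves us to InboundOK.
        self.received.append(probe_id)
        if self.state == EXPLORING:
            self.state = INBOUND_OK

    def probes_to_copy_back(self):
        # Only probes received since the last Operational -> Exploring
        # transition are eligible; in Exploring the list is always empty.
        if self.state == OPERATIONAL:
            return []
        return list(self.received)


ctx = ReapContext()
ctx.on_probe_received("old")        # received before the failure
ctx.on_send_timeout()               # Operational -> Exploring
print(ctx.state, ctx.probes_to_copy_back())
ctx.on_probe_received("p1")         # Exploring -> InboundOK
print(ctx.state, ctx.probes_to_copy_back())
```

In this sketch there is nothing to copy back while in Exploring, because
the first inbound probe already causes the transition to InboundOK.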

> In B, the Retransmission Timer of B expires because a valid path from
> A to B was not found,

What do you mean by "retransmission timer"? There is no timer with  
that name.

Probes are sent at certain intervals without considering whether  
they're retransmissions.

> so B starts testing other paths that are not working.

B keeps testing paths until it sees A is in state InboundOK.

> Then, A stops receiving data from B, so the Send timer expires (I don't
> find any reason why all the possible paths should be explored in less
> than Send Timeout time, so A could not test all possible paths from A
> to B in this time).

The Send Timeout is for determining when the probing starts. The  
probing process does not depend on the Send Timeout.

Because probing backs off exponentially, a good number of probes are
sent in the first minute or so (I think 17, but it depends on the exact
values for the exponential backoff), but at some point the probing rate
is only one per minute. In theory, this means you will find any working
address pair if you wait long enough. In practice, you don't really
care anymore after 30 to 300 seconds, depending on transport timers and
user patience. This means the number of address pairs shouldn't be more
than 16 or so (4 addresses on each end).
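
As a rough way to see how the probe count depends on the backoff values,
here is a small sketch. The schedule (a few probes at a fixed initial
interval, then doubling up to a cap) and the default parameter values
are assumptions chosen for illustration, not numbers taken from the
draft, so the count it gives need not match my estimate above.

```python
def probe_times(horizon, initial_probes=4, initial_timeout=0.5,
                max_timeout=60.0):
    """Send times (in seconds) of all probes sent within `horizon` seconds.

    Assumed schedule: the first `initial_probes` probes are spaced
    `initial_timeout` apart; after that the interval doubles for each
    probe until it reaches `max_timeout`.
    """
    times = []
    t = 0.0
    interval = initial_timeout
    while t <= horizon:
        times.append(t)
        if len(times) >= initial_probes:
            interval = min(interval * 2, max_timeout)
        t += interval
    return times


# 4 addresses on each end give 4 * 4 = 16 address pairs to explore.
address_pairs = 4 * 4

for horizon in (30, 60, 300):
    sent = len(probe_times(horizon))
    print(f"{horizon:>3} s: {sent} probes for {address_pairs} address pairs")
```

Changing initial_probes or initial_timeout shifts the count in the
first minute considerably, which is why the figure depends so much on
the exact backoff values.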

> Then, A falls back to the Exploring state, and (under the assumption in
> the previous paragraph) forgets about the working path from B to A.
> Maybe now A sends a probe to B through a working path, but the same
> happens in B (it now tries different paths from B to A that are not
> valid, so A tries other paths from A to B, abandoning the good one...).

The inclusion of at least one probe that was received earlier in  
outgoing probes should fix this: when you get a packet from the other  
side, you know at least one working address pair from here to there.  
Obviously this won't work if reachability changes in the interim.
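
A minimal sketch of that idea, with made-up structures: the Probe and
ProbeReport classes, their field names and the nonce matching are
assumptions for illustration and do not reflect the message format in
the draft.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProbeReport:
    """Hypothetical record of one probe the sender received earlier."""
    src: str
    dst: str
    nonce: int


@dataclass
class Probe:
    """Hypothetical probe carrying reports of earlier received probes."""
    src: str
    dst: str
    nonce: int
    reports: list = field(default_factory=list)


def working_outbound_pairs(incoming, sent_nonces):
    """Address pairs (src, dst) the peer confirms it received from us."""
    return [(r.src, r.dst) for r in incoming.reports
            if r.nonce in sent_nonces]


# B probed A from B1 to A2 with nonce 7, and that probe arrived.
sent_by_b = {7}
# A copies the received probe back in its own probe to B.
from_a = Probe(src="A1", dst="B2", nonce=42,
               reports=[ProbeReport(src="B1", dst="A2", nonce=7)])
print(working_outbound_pairs(from_a, sent_by_b))
```

Here B learns from a single packet that the pair B1 to A2 worked at the
time the probe was received, so it has no reason to abandon that pair.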