Re: quick failover in SCTP

Michael Tüxen <Michael.Tuexen@lurchi.franken.de> Fri, 08 October 2010 10:15 UTC

Subject: Re: quick failover in SCTP
From: Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
In-Reply-To: <AANLkTi=2uGTkPjtcohABJ3vpy=5ryAMzXSwGiq3tXgi_@mail.gmail.com>
Date: Fri, 08 Oct 2010 12:16:12 +0200
Message-Id: <17E9584A-5AC1-4CC7-AC93-7207412DD05B@lurchi.franken.de>
References: <AANLkTi=07JfcQOKhfLouaU8N6=r57Koh9fKw+j3=O56R@mail.gmail.com> <8CF58AED-5D88-46A3-B873-26AAB8DAF9BD@lurchi.franken.de> <AANLkTi=2uGTkPjtcohABJ3vpy=5ryAMzXSwGiq3tXgi_@mail.gmail.com>
To: Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
Cc: tsvwg@ietf.org, Preethi Natarajan <prenatar@cisco.com>

Hello Nishida-san,

OK, I thought about this for some time and think it would be good
to specify a quick failover mechanism which can be implemented
at the sender only.

I would like to see some extensions to your suggestion:
* Introduce a threshold (call it PFMR for now).
  Then use
  if (error counter > PMR)
     the path is inactive.
  if ((error counter <= PMR) && (error counter > PFMR))
     the path is potentially failed (PF).
  Using PFMR = 0 is what you suggest; PFMR = PMR gives
  the old behavior.

* Specify that you start sending HBs when the path is
  PF. Explicitly allow PFHB.interval=0, which
  I think is a good choice. Maybe we can just remove PFHB.interval.

* Make sure that the following works: The application has disabled HBs.
  When a path enters PF (or becomes failed), HBs are sent to get the
  path active again. Once it is active, no HBs should be sent (since
  the application disabled them).

* Provide a way to ensure that applications are not bothered with
  state change notifications related to PF unless they have
  explicitly subscribed.

* Make clear what to do when all paths are PF.

* Make clear what to do when all paths are failed.
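
To make the first and third points above concrete, here is a rough
sketch of the threshold logic and the HB rule. It is an illustration
only, not text from any draft: the names PathState, classify_path and
should_send_hb are made up for this example, and the check that
0 <= PFMR <= PMR is an assumption.

```python
from enum import Enum


class PathState(Enum):
    ACTIVE = "active"
    PF = "potentially failed"
    INACTIVE = "inactive"


def classify_path(error_counter: int, pmr: int, pfmr: int) -> PathState:
    """Map a path's error counter to a state using the two thresholds."""
    # Assumption for this sketch: PFMR never exceeds PMR.
    if not 0 <= pfmr <= pmr:
        raise ValueError("expected 0 <= PFMR <= PMR")
    if error_counter > pmr:
        return PathState.INACTIVE
    if error_counter > pfmr:
        return PathState.PF
    return PathState.ACTIVE


def should_send_hb(state: PathState, app_hb_enabled: bool) -> bool:
    """Decide whether HBs are sent on a path.

    A PF or inactive path is always probed so it can become active
    again; an active path follows the application's HB setting.
    """
    if state is PathState.ACTIVE:
        return app_hb_enabled
    return True
```

With PMR = 5 and PFMR = 0, a single error moves the path to PF. With
PFMR = PMR the PF state is never entered, which is the old behavior.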

What do you think?

Best regards
Michael

On Aug 24, 2010, at 12:04 PM, Yoshifumi Nishida wrote:

> Hello Michael,
> 
> Thanks for your reply. In a nutshell, the differences between PF and PMR=0 are:
>   PF allows SCTP to use another path while SCTP is checking whether the
> primary path is active or inactive, but doesn't change the behavior of
> marking the path as inactive.
>   PMR=0 allows SCTP to mark the primary as inactive quickly.
> 
> I feel this will create several differences, although the two look similar.
> 
> For example, if we use PMR=0, we will need to modify at least the
> following points in RFC4960:
>   1) recommended value for PMR
>   2) behavior in dormant state
>   3) relationship between PMR and AMR
>        RFC4960 states users should avoid having the value of
> 'Association.Max.Retrans' larger than the
>        summation of the 'Path.Max.Retrans'. We'll need to change this part.
> 
> Also, I think we'll need to think about the following points:
>   a) Some current applications or OS local configurations might
> already have specified PMR on their own. If they're not using PMR=0,
> their benefit might be reduced.
>   b) When you have 100Mbps and 1Mbps links and you set 100Mbps as
> primary, every time a packet loss happens on the 100Mbps link, it will
> switch over to the 1Mbps link and have to wait for a HeartBeat, which is
> likely less frequent (30 secs or so). Or, you'll need to add special
> logic here in the spec.
> 
> If we use PF,
>   PF allows SCTP to keep PMR and AMR unchanged. Hence, we don't have
> to modify 1) and 3),
>       and issue 2) will be a minor point since we don't change PMR and AMR.
>   Also, PF already has a solution for b) as described in the draft.
> 
> In my view, PMR=0 will require several "modifications" to the spec,
> which might be a bit tricky for implementers to understand, while PF
> will require an explicit "addition".
> 
> Thanks,
> --
> Yoshifumi Nishida
> nishida@sfc.wide.ad.jp
> 
> 
> 2010/8/23 Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
>> 
>> On Aug 17, 2010, at 12:16 PM, Yoshifumi Nishida wrote:
>> 
>>> Hi folks,
>>> This is a follow-up from the Maastricht meeting.
>>> Preethi and I are proposing a quick failover algorithm in SCTP and gave a presentation about it.
>>> 
>>> I feel the community seems to be positive about enhancing the SCTP standard to some extent to address this issue.
>>> Also, in my understanding, Michael and Randy suggested that minor updates to the current spec can have effects similar to those of the PF approach.
>>> So, we're going to start by investigating the alternatives to the PF approach and would like to know the details of the suggestion.
>>> If Michael and Randy could give us some info about this, we would be very grateful.
>> Sure.
>> One point is that RFC 4960 does not specify what to do when all paths
>> are INACTIVE. If that happens, the association is called dormant.
>> I think it should be clarified that you still send HBs to get a path
>> to ACTIVE again until the association finally fails. This could be
>> handled as an erratum, I think.
>> 
>> Now assume that handling of the dormant state is clarified. If I understand PF correctly,
>> you could simply set Path.Max.Retrans to 0 to get the same behavior on the
>> wire as when using PF, or am I missing something? The only difference I see
>> is the path state change notifications sent locally to the user.
>> 
>> But maybe I have overlooked something...
>> 
>> Best regards
>> Michael
>>> Also, if anyone has comments or feedback on this, please let us know.
>>> 
>>> Thank you so much.
>>> --
>>> Yoshifumi Nishida
>>> nishida@sfc.wide.ad.jp
>>> 
>>> 
>>> 
>> 
>