Re: quick failover in SCTP

Michael Tüxen <Michael.Tuexen@lurchi.franken.de> Sat, 09 October 2010 12:33 UTC

Subject: Re: quick failover in SCTP
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset="iso-8859-1"
From: Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
In-Reply-To: <AANLkTi=4NDwVgkgsndZYy_t6OcA4b3fgOBC4w+o+HEhA@mail.gmail.com>
Date: Sat, 09 Oct 2010 14:34:31 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <EC763CF2-05B6-4A3F-A508-72E482A5E5BE@lurchi.franken.de>
References: <AANLkTi=07JfcQOKhfLouaU8N6=r57Koh9fKw+j3=O56R@mail.gmail.com> <8CF58AED-5D88-46A3-B873-26AAB8DAF9BD@lurchi.franken.de> <AANLkTi=2uGTkPjtcohABJ3vpy=5ryAMzXSwGiq3tXgi_@mail.gmail.com> <17E9584A-5AC1-4CC7-AC93-7207412DD05B@lurchi.franken.de> <AANLkTi=4NDwVgkgsndZYy_t6OcA4b3fgOBC4w+o+HEhA@mail.gmail.com>
To: Yoshifumi Nishida <nishida@sfc.wide.ad.jp>
X-Mailer: Apple Mail (2.1081)
Cc: tsvwg@ietf.org, Preethi Natarajan <prenatar@cisco.com>
List-Id: Transport Area Working Group <tsvwg.ietf.org>

On Oct 9, 2010, at 12:04 PM, Yoshifumi Nishida wrote:

> Hello Michael,
> 
> Thanks for your response.
> 
> 2010/10/8 Michael Tüxen <Michael.Tuexen@lurchi.franken.de>:
>> Hello Nishida-san,
>> 
>> OK, I thought about this for some time and think it would be good
>> to specify a way for a quick failover which can be implemented
>> at the sender only.
> 
> Great!
> Please check my comments below.
> 
>> I would like to see some extensions to your suggestion:
>> * Introduce a threshold (call it PFMR for now).
>>  Then use
>>  if (error counter > PMR)
>>     the path is inactive.
>>  if (error counter <= PMR) && (error counter > PFMR)
>>     the path is potentially failed.
>>  Using PFMR = 0 is what you suggest, PFMR=PMR gives
>>  the old behavior.
> 
> I thought of something similar, and I agree with providing a way to disable PF.
> I tend to agree with this idea, but one thing I'm not sure about is how
> PFMR != 0 && PFMR < PMR can be useful.
I could imagine someone wanting to call a path potentially failed
after two consecutive timer-based retransmissions instead of one,
just to be a bit more conservative. This might help deploying
such a feature in SIGTRAN networks where CMT is not deployed.
For CMT traffic, PFMR==0 is the right choice, I guess.
But I think PF is also very helpful in non-CMT scenarios.
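
To make the threshold rule concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the names
classify_path and PathState are made up here and do not come from any
draft or implementation, the error counter is taken to count consecutive
timer-based retransmissions on the path, and 0 <= PFMR <= PMR is assumed.

```python
from enum import Enum


class PathState(Enum):
    # Hypothetical state names, chosen for this illustration only.
    ACTIVE = "active"
    POTENTIALLY_FAILED = "potentially failed"
    INACTIVE = "inactive"


def classify_path(error_count: int, pmr: int, pfmr: int) -> PathState:
    """Map a path's error counter to a state using the two thresholds.

    error_count > PMR          -> inactive
    PFMR < error_count <= PMR  -> potentially failed
    otherwise                  -> active
    """
    if not 0 <= pfmr <= pmr:
        raise ValueError("assumed relation 0 <= PFMR <= PMR does not hold")
    if error_count > pmr:
        return PathState.INACTIVE
    if error_count > pfmr:
        return PathState.POTENTIALLY_FAILED
    return PathState.ACTIVE


if __name__ == "__main__":
    # PMR = 5 is the RFC 4960 default; PFMR = 1 is the "two consecutive
    # timeouts" variant mentioned above.
    for errors in range(7):
        print(errors, classify_path(errors, pmr=5, pfmr=1).value)
```

With PFMR = 0 a single timeout already yields "potentially failed", with
PFMR = 1 it takes two, and with PFMR = PMR the potentially failed state is
never reached, which matches the old behavior.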
> If we just want a switch to disable PF, might a USE_PF parameter be enough?
> Maybe Preethi has an opinion on this?
> 
>> * Specify that you start sending HBs when the path is
>>  PF. Explicitly allow PFHB.interval=0, which
>>  I think is a good choice. Maybe we can just remove PFHB.interval.
> 
> Hmm. Sorry. I might not quite follow this point.
> Does PFHB.interval=0 mean suppressing HBs during the PF state?
No, I mean just sending them every RTO.
Having a specific interval allows this by setting PFHB.interval=0.
I was just wondering whether one could remove the parameter and always
send the HB every RTO (doubling the RTO each time), which is the same
as using PFHB.interval=0.
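
As a rough illustration of the timing, the sketch below lists the gaps
between successive unanswered HBs on a PF path. It assumes the usual
RFC 4960 heartbeat rule (one HB per RTO plus the interval, with the RTO
doubled after each unanswered HB and capped at RTO.Max) and ignores
jitter. The function name pf_heartbeat_delays and its parameters are
invented for this example; the 60 s cap is the RFC 4960 default for
RTO.Max.

```python
def pf_heartbeat_delays(rto, pfhb_interval=0.0, count=4, rto_max=60.0):
    """Return the delays (in seconds) before each of `count` unanswered HBs.

    Each delay is the current RTO plus PFHB.interval; the RTO is doubled
    after every unanswered HB and never exceeds rto_max. Jitter is ignored.
    """
    delays = []
    current_rto = min(rto, rto_max)
    for _ in range(count):
        delays.append(current_rto + pfhb_interval)
        current_rto = min(current_rto * 2, rto_max)
    return delays


if __name__ == "__main__":
    # PFHB.interval = 0: HBs follow the RTO back-off only.
    print(pf_heartbeat_delays(1.0))  # [1.0, 2.0, 4.0, 8.0]
    # A 30 s interval, in contrast, dominates the spacing.
    print(pf_heartbeat_delays(1.0, pfhb_interval=30.0, count=3))
```

With PFHB.interval=0 the schedule depends only on the RTO, which is why
the separate parameter might be dispensable.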

Does this make things clearer?

Best regards
Michael
> 
> 
> I agree with all of the following points. I think these are very good points.
> 
>> * Make sure that the following works: The application has disabled HBs.
>>  When a path enters PF (or failed), HBs are sent to get the path
>>  active again. If it is active, no HB should be sent (since
>>  the application disabled them).
>> 
>> * Provide a way that applications are not bothered with
>>  state change notifications related to PF when not explicitly
>>  subscribed.
>> 
>> * Make clear what to do when all paths are PF.
>> 
>> * Make clear what to do when all paths are failed.
>> 
>> What do you think?
>> 
>> Best regards
>> Michael
>> 
>> On Aug 24, 2010, at 12:04 PM, Yoshifumi Nishida wrote:
>> 
>>> Hello Michael,
>>> 
>>> Thanks for your reply. In a nutshell, the differences between PF and PMR=0 are:
>>>   PF allows SCTP to use another path while SCTP is checking whether the
>>> primary path is active or inactive, but doesn't change the behavior of
>>> marking the path as inactive.
>>>   PMR=0 allows SCTP to mark the primary as inactive quickly.
>>> 
>>> In my view, this creates several differences although the two look similar.
>>> 
>>> For example, if we use PMR=0, we will need to modify at least the
>>> following points in RFC 4960:
>>>   1) recommended value for PMR
>>>   2) behavior in dormant state
>>>   3) relationship between PMR and AMR
>>>        RFC 4960 states that users should avoid having the value of
>>>        'Association.Max.Retrans' larger than the summation of the
>>>        'Path.Max.Retrans'. We'll need to change this part.
>>> 
>>> Also, I think we'll need to think about the following points:
>>>   a) Some current applications or OS local configurations might
>>> already have specified PMR on their own. If they're not using PMR=0,
>>> their benefit might be reduced.
>>>   b) When you have 100Mbps and 1Mbps links and you set the 100Mbps link
>>> as primary, every time a packet loss happens on the 100Mbps link, it
>>> will switch over to the 1Mbps link and have to wait for a HeartBeat,
>>> which is likely less frequent (30 secs or so). Or, you'll need to add
>>> special logic here in the spec.
>>> 
>>> If we use PF,
>>>   PF allows SCTP to keep PMR and AMR unchanged. Hence, we don't have
>>> to modify 1) and 3).
>>>       and issue 2) will be a minor point since we don't change PMR and AMR.
>>>   Also, PF already has a solution for b) as described in the draft.
>>> 
>>> In my view, PMR=0 will require several "modifications" to the spec,
>>> which might be a bit tricky for implementers to understand, while PF
>>> will require an explicit "addition".
>>> 
>>> Thanks,
>>> --
>>> Yoshifumi Nishida
>>> nishida@sfc.wide.ad.jp
>>> 
>>> 
>>> 2010/8/23 Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
>>>> 
>>>> On Aug 17, 2010, at 12:16 PM, Yoshifumi Nishida wrote:
>>>> 
>>>>> Hi folks,
>>>>> This is a follow-up from the Maastricht meeting.
>>>>> Preethi and I are proposing a quick failover algorithm in SCTP and gave a presentation about it.
>>>>> 
>>>>> My impression is that the community is positive about enhancing the SCTP standard to some extent to address this issue.
>>>>> Also, in my understanding, Michael and Randy suggested that minor updates to the current spec can have effects similar to those of the PF approach.
>>>>> So, we're going to start by investigating the alternatives to the PF approach and would like to know the details of the suggestion.
>>>>> If Michael and Randy could give us some info about this, we would be very grateful.
>>>> Sure.
>>>> One point is that RFC 4960 does not specify what to do when all paths
>>>> are INACTIVE. If that happens, the association is called dormant.
>>>> I think it should be clarified that you still send HB to get a path
>>>> to ACTIVE again until the association finally fails. This could be
>>>> handled as an erratum, I think.
>>>> 
>>>> Now assume that handling of the dormant state is clarified. If I understand PF correctly,
>>>> you could simply set Path.Max.Retrans to 0 to get the same behavior on the
>>>> wire as when using PF, or am I missing something? The only difference I see
>>>> is the path state change notifications sent locally to the user.
>>>> 
>>>> But maybe I have overlooked something...
>>>> 
>>>> Best regards
>>>> Michael
>>>>> Also, if anyone has comments or feedback on this, please let us know.
>>>>> 
>>>>> Thank you so much.
>>>>> --
>>>>> Yoshifumi Nishida
>>>>> nishida@sfc.wide.ad.jp
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>