Re: [clue] Question about timers (retry and timeouts) in draft-ietf-clue-protocol

Simon Pietro Romano <spromano@unina.it> Fri, 06 April 2018 18:27 UTC

From: Simon Pietro Romano <spromano@unina.it>
Message-Id: <A57BAD5A-E794-491B-914C-D3C7B78AD691@unina.it>
Date: Fri, 06 Apr 2018 20:27:43 +0200
In-Reply-To: <6E58094ECC8D8344914996DAD28F1CCD863A56@DGGEMM506-MBX.china.huawei.com>
Cc: "clue@ietf.org" <clue@ietf.org>, Adam Roach <adam@nostrum.com>, Roberta Presta <roberta.presta@unina.it>
To: "Roni Even (A)" <roni.even@huawei.com>
References: <6E58094ECC8D8344914996DAD28F1CCD863A56@DGGEMM506-MBX.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/clue/1zmS2Ge7bT9IhutM34SWKTdeqCk>
Subject: Re: [clue] Question about timers (retry and timeouts) in draft-ietf-clue-protocol

Hello again Roni,


> Hi,
> An important issue raised by Adam during his AD review concerns timeout and retry thresholds; please provide feedback.
>  
> This is the comment:
> BLOCKER: General: There are several mentions of timeouts and retry thresholds in the text and its corresponding state machines; however, the document neither defines nor cites a document as defining what these timeout and retry values are. These need to be defined and described. If the timer and retry scheme allows the two ends of the connection to have different values for timeouts and number of retries, then there need to be additional error procedures that allow the MC and MP state machines to stay in sync (if the timer/retry values can be different, it's possible for one state machine to transition to "terminated," while the other is still active, and you need messaging to clean this up). The remainder of this comment is non-blocking: Related to this, the document frequently refers to retries as "expiring" (e.g., "retry expired" on the state diagrams). That doesn't really make sense unless "retry" is the name of a timer rather than a counter; I think you mean to say "exhausted" or something similar.
> 
>  
> My view as individual:
>  
> CLUE protocol messages are carried over SCTP, and “CLUE entities are required to use ordered SCTP message delivery, with full reliability”, so retransmission and timeouts are already handled at the data channel level (https://tools.ietf.org/html/draft-ietf-clue-datachannel-14#section-3.3.2).
>  
> So the retry and timeout under discussion apply at the application level, not at the SCTP transport level.
>  
> From the discussion on the mailing list during the WGLC:
>  
> “The idea here is that the MC avoids entering a loop where the MP keeps on sending an erroneous ADV, hence forcing the MC to respond with a NACK. If this situation iterates for a while (# of retries), the MC terminates the ongoing CLUE ‘session’.”
> I noticed that the discussion started even earlier, and the conclusion was that retries and timeouts are needed, but we also need default values, which were never listed.
> My understanding is that when the receiver of a protocol message sends a negative acknowledgement in response, it allows the sender to retry with a corrected message, and will allow this n times, or give up after x time if no new message arrives.
> I think any number of retries would work, but 2 seems fine: if the message sender cannot send a valid message, we should abandon the call. As for the timeout, it can be short while still allowing for a round trip, so in my view 1 second is enough.
> Other thoughts?
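
To make the counter-versus-timer distinction in Adam's comment concrete, here is a minimal Python sketch of the receiver-side retry counter described above (the MC NACKing an erroneous ADV from the MP). Everything in it is hypothetical: the class and state names, the "ACK"/"NACK"/"TERMINATE" strings, and the default of 2 retries (Roni's suggestion, not a value defined in the draft). It deliberately does not model the proposed 1-second timeout.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    WAIT_FOR_ADV = auto()   # waiting for a (re)sent advertisement
    ADV_ACCEPTED = auto()   # a valid advertisement was acknowledged
    TERMINATED = auto()     # retries exhausted; CLUE "session" abandoned


@dataclass
class AdvReceiver:
    """Hypothetical MC-side retry counter; not defined by the draft.

    max_retries is how many corrected re-sends the MP is allowed after
    the first erroneous advertisement (2 is Roni's suggested default).
    """
    max_retries: int = 2
    nacks_sent: int = 0
    state: State = State.WAIT_FOR_ADV

    def on_advertisement(self, valid: bool) -> str:
        if self.state is State.TERMINATED:
            raise RuntimeError("session already terminated")
        if valid:
            # Accept and reset the counter for the next exchange.
            self.state = State.ADV_ACCEPTED
            self.nacks_sent = 0
            return "ACK"
        if self.nacks_sent >= self.max_retries:
            # The counter is exhausted (it is not a timer, so nothing "expires").
            self.state = State.TERMINATED
            return "TERMINATE"
        # Reject and allow one more retry.
        self.nacks_sent += 1
        return "NACK"


# Usage: with max_retries=2, a third consecutive invalid ADV ends the session.
mc = AdvReceiver()
print([mc.on_advertisement(False) for _ in range(3)])  # prints ['NACK', 'NACK', 'TERMINATE']
```

Because the retry limit is a counter, the natural name for its terminal condition is "retries exhausted", which is the wording Adam suggests for the state diagrams.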

Adam has in fact replied to my answer; I am copying and pasting his reply below for the benefit of readers:

> In thinking through what this scheme should look like, it occurs to me that the CLUE messages are defined to be sent over a reliable transport (SCTP), which has its own retransmission timers and eventual timeouts. Implementing a retransmission scheme on top of a reliable transport -- especially one as aggressive as you suggest above -- will put more traffic on the network when congestion occurs rather than less.
> 
> So, in the final analysis, I think the action here is to remove retransmission timers and retry counts altogether. If the underlying transport takes longer to detect a failure than is sensible for CLUE (and it likely does), then a supervisory timer that declares the session failed might make sense.
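
Adam's alternative can be sketched as a session-level watchdog that never retransmits and only declares the session failed when no progress is seen within a limit. This is a minimal Python sketch under stated assumptions: the class name, the notion of "progress" (receipt of any valid CLUE message), and the 30-second limit in the usage lines are all hypothetical, chosen only for illustration.

```python
import time
from typing import Callable


class SupervisoryTimer:
    """Hypothetical session watchdog for the CLUE layer (a sketch only).

    It never retransmits anything; SCTP already does that. It only records
    when the session last made progress and latches a failure once no
    progress has been seen for longer than `limit` seconds.
    """

    def __init__(self, limit: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.clock = clock          # injectable so the logic can be tested
        self.last_progress = clock()
        self.failed = False

    def progress(self) -> None:
        """Call whenever a valid CLUE message arrives."""
        if not self.failed:
            self.last_progress = self.clock()

    def check(self) -> bool:
        """Poll periodically; True means the session should be declared failed."""
        if not self.failed and self.clock() - self.last_progress > self.limit:
            self.failed = True
        return self.failed


# Usage with a fake clock (the 30 s limit is an arbitrary example value).
now = [0.0]
watchdog = SupervisoryTimer(limit=30.0, clock=lambda: now[0])
now[0] = 10.0
watchdog.progress()
now[0] = 35.0
print(watchdog.check())   # False: only 25 s since last progress
now[0] = 41.0
print(watchdog.check())   # True: 31 s without progress
```

Since the watchdog sends nothing on its own, it adds no traffic when the network is congested, which is the core of Adam's objection to retransmitting on top of SCTP.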


I would personally be inclined to take Adam’s suggestion and get rid of those timers and retry counts altogether. What is your feeling about that?

Thanks,

Simon


Simon Pietro Romano
Universita' di Napoli Federico II
Computer Engineering Department
Phone: +39 081 7683823 -- Fax: +39 081 7683816
e-mail: spromano@unina.it

<<Many people tell me that discouragement is the alibi of idiots. I think about it for a moment, and I get discouraged>>. Magritte.
>  
> One side comment: Section 6.2 contains two instances of “number of timeouts”; is that intended?
>  
>  
>  
> Roni Even
> Clue co-chair