Re: [dispatch] draft-shen-soc-avalanche-restart-overload

worley@ariadne.com (Dale R. Worley) Wed, 12 February 2014 21:12 UTC

Date: Wed, 12 Feb 2014 15:55:15 -0500
Message-Id: <201402122055.s1CKtFC64823393@shell01.TheWorld.com>
From: worley@ariadne.com
Sender: worley@ariadne.com
To: Charles Shen <charles@cs.columbia.edu>
In-reply-to: <CAPSQ9ZWxwomvKJBKbSTpO8B83wis=7+oqkfZYE7cg3RQ7wcf3g@mail.gmail.com> (charles@cs.columbia.edu)
References: <CAPSQ9ZWxwomvKJBKbSTpO8B83wis=7+oqkfZYE7cg3RQ7wcf3g@mail.gmail.com>
Cc: dispatch@ietf.org
Subject: Re: [dispatch] draft-shen-soc-avalanche-restart-overload

> From: Charles Shen <charles@cs.columbia.edu>

> I am looking for your kind opinion on what should be the appropriate next
> step for this document:
> 
> -- Should this draft be dispatched to SOC (and their charter amended)?
> -- Should this draft be processed as AD-sponsored?
> -- Should this draft be killed (if it is harmful)?

I do believe that this draft should be advanced, as this sort of
registration storm causes a problem in practice, though I don't have
any opinion on what would be the proper track for it.

It would be interesting to know if there have been any large-scale
implementations, and what the results have been in practice.

In regard to the draft itself, I think a few tweaks would improve it.

- Clarify that the Restart-Timer value is associated with the URI that
  is being registered (for REGISTER) or the URI/event that is being
  subscribed to (for SUBSCRIBE and PUBLISH).

- You may want a more clever way of handling multiple Restart-Timer
  values received from different servers during a boot sequence that
  sends requests to several servers (which may be incompletely
  coordinated with each other).  E.g., if the registration server has
  a Restart-Timer of 300 and the voicemail server also has a
  Restart-Timer of 300, it seems that the UA could safely wait
  rand(0:300) then register and subscribe.  If the VM server has a zero
  restart timer, the UA probably wants to wait until the registration
  is done anyway before subscribing to VM.  But if the VM server has a
  Restart-Timer of 600, there probably should be an additional delay
  between registration and subscribing.

  The trouble is that rand(0:300)+rand(0:300) doesn't have the same
  distribution as rand(0:600) (the sum is triangular, peaking at 300,
  rather than uniform), so you may not want to say "Wait a random
  fraction of the difference between the two Restart-Timers."

  Perhaps a workable algorithm is "Choose a random real number
  uniformly between zero and one.  Each bootup operation may be
  executed no earlier than (the random number) * (the specified
  Restart-Timer for that operation) seconds after power-up."  That
  causes each server to see the time-distribution of requests that it
  expects.

- The text suggests that the Restart-Timer value expires when the
  registration expires.  ("The validity duration of the Restart-Timer
  header is the same as that of the corresponding registration
  operation.")  That doesn't work at all, because the power failure
  may outlast every one of the registrations.  The Restart-Timer
  value has to be saved until another value is received for that same
  target.

- You may want to allow/require the UA to place an upper limit on the
  Restart-Timer value.  At the least, Restart-Timer should not exceed
  the maximum registration/subscription duration the UA requests and
  the server provides.

- I expect that you want to require that if a REGISTER response is
  received that does not contain Restart-Timer, then the saved
  Restart-Timer value is set to zero.  That causes the expected
  behavior when a registrar that supports Restart-Timer is replaced
  with one that does not.
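
To make the suggestions above concrete, here is a minimal sketch in
Python.  It is not taken from the draft; the class name, the method
names, and the max_timer cap are all hypothetical, and a real UA would
keep the saved values in nonvolatile storage so they survive the power
failure.  It shows the single-random-fraction schedule, the rule that a
saved value persists until a new response for the same target arrives,
the reset to zero when a response carries no Restart-Timer, and the
UA-side upper limit.

```python
import random


class RestartTimerStore:
    """Hypothetical UA-side store of Restart-Timer values, keyed by target."""

    def __init__(self, max_timer):
        # Assumed UA policy: an upper limit on any accepted value, e.g. the
        # maximum registration/subscription duration the UA requests.
        self.max_timer = max_timer
        # In-memory here; a real UA would persist this across power loss.
        self.saved = {}

    def on_response(self, target, restart_timer=None):
        """Record the value from a response for this target.

        A response without Restart-Timer resets the saved value to zero,
        so a replacement server that lacks support gets no delay.
        """
        value = 0 if restart_timer is None else restart_timer
        self.saved[target] = min(value, self.max_timer)

    def get(self, target):
        # A saved value never expires by itself; unknown targets get zero.
        return self.saved.get(target, 0)

    def earliest_start_times(self, targets, rng=random):
        """Earliest time (seconds after power-up) for each bootup operation.

        One fraction is drawn per boot and shared by all operations, so
        each server sees requests spread uniformly over its own timer.
        """
        fraction = rng.random()  # uniform in [0, 1)
        return {t: fraction * self.get(t) for t in targets}


store = RestartTimerStore(max_timer=3600)
store.on_response("registrar", 300)
store.on_response("voicemail", 600)
schedule = store.earliest_start_times(["registrar", "voicemail"])
```

With these values the voicemail subscription is never scheduled before
the registration, and the gap between the two grows with the drawn
fraction, which gives the additional delay between registering and
subscribing mentioned above.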

Dale