Re: [Atoca] Requirement D2: "Large Audience"

"James M. Polk" <jmpolk@cisco.com> Tue, 18 January 2011 22:48 UTC

Message-Id: <201101182251.p0IMp2OH004221@sj-core-3.cisco.com>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Tue, 18 Jan 2011 16:51:01 -0600
To: Brian Rosen <br@brianrosen.net>, "<mark.wood@engineer.com>" <mark.wood@engineer.com>
From: "James M. Polk" <jmpolk@cisco.com>
In-Reply-To: <682E8C23-7988-4A4D-A03F-B00AB05CABF4@brianrosen.net>
References: <FDFC6E6B2064844FBEB9045DF1E3FBBC024A1E59@BD01MSXMB016.US.Cingular.Net> <002201cbb636$27cdf790$7769e6b0$@engineer.com> <5A054107-A965-433E-AAB4-D0C79FAF843E@brianrosen.net> <002101cbb705$c9f3f900$5ddbeb00$@engineer.com> <682E8C23-7988-4A4D-A03F-B00AB05CABF4@brianrosen.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Cc: earlywarning@ietf.org
Subject: Re: [Atoca] Requirement D2: "Large Audience"
X-BeenThere: earlywarning@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: "Discussion list for the IETF Authority-to-Citizen Alert \(atoca\) working group." <earlywarning.ietf.org>

Brian

Are you taking into account the difference in the packet size of the 
NOTIFY vs. the 200 OK? I believe the burst to worry about is in the 
downstream capacity, and not the 200 OKs in the upstream.

That said, what events are we talking about accounting for?

- Local tornado/flooding alert? (affecting a creek or the eastern 
half of Australia?)
- metro amber alert? (backwoods sheriff's department or US Tri-state area?)
- hurricane alert? (Canary Islands or Katrina sized?)
- earthquake alert? (California desert or Japan?)
- limited scale tsunami alert? (only Sumatra or the Indian Ocean?)
- ocean scale tsunami alert?

I think scoping - with examples - could serve some use in these 
discussions, rather than just referring to events in the abstract.

Many of these will have vastly different scaling issues and design 
considerations - or will they?

We could go down the path of scaling regardless of the type of event 
that caused the alert. In other words:

    #1 - alerting <500 end-systems, do X

    #2 - alerting <10,000 end-systems, do Y

    #3 - alerting <500,000 end-systems, do Z

...all the way up to at least the ~140M that are in Japan because 
they have had events that would have alerted that many within the 
last 30 years.
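
The tiering above could be written down as a simple lookup. This is only a sketch in Python: X, Y and Z are the placeholders from the list, the function and table names are made up for illustration, and the tiers above 500,000 are deliberately left undefined because they are not enumerated here.

```python
# Upper bounds (exclusive) and the placeholder action for each tier.
# "X", "Y", "Z" are placeholders, not real delivery methods.
TIERS = [
    (500, "X"),
    (10_000, "Y"),
    (500_000, "Z"),
]


def pick_tier(audience_size):
    """Return the placeholder action for the smallest tier that fits."""
    for limit, action in TIERS:
        if audience_size < limit:
            return action
    # Larger audiences (up to the ~140M case) have no tier defined yet.
    return None
```

For example, pick_tier(9_999) selects "Y", while an audience of 500,000 or more falls through to None until further tiers are agreed.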

Obviously, as the number of end-systems subscribed to a certain 
notification grows, the delivery system has to become more 
distributed to handle the load of the alerts and confirmations.  Our 
documents really ought to recommend ways to aggregate large-scale 
confirmations too.
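
One possible shape for such aggregation, along the lines of the "subsystem reports up the line" idea from earlier in this thread, is sketched below in Python. Everything here is an assumption made for illustration: the names (SubsystemReport, summarize_subsystem, roll_up), the identifiers, and the max_missed_listed cut-off are hypothetical and not taken from any draft.

```python
from dataclasses import dataclass, field


@dataclass
class SubsystemReport:
    name: str
    expected: int  # subscribers the subsystem should have reached
    acked: int     # subscribers that confirmed
    missed: list = field(default_factory=list)  # listed only when short


def summarize_subsystem(name, subscribers, acked_ids, max_missed_listed=100):
    """Collapse per-device acks into one report for the next level up."""
    subscribers = set(subscribers)
    acked = subscribers & set(acked_ids)  # ignore acks from unknown senders
    missed = sorted(subscribers - acked)
    # Report individual misses upstream only if there are few of them;
    # otherwise they stay local for later analysis.
    listed = missed if len(missed) <= max_missed_listed else []
    return SubsystemReport(name, len(subscribers), len(acked), listed)


def roll_up(reports):
    """Combine subsystem reports into a single set of totals."""
    expected = sum(r.expected for r in reports)
    acked = sum(r.acked for r in reports)
    return {"expected": expected, "acked": acked,
            "unconfirmed": expected - acked}
```

With this shape the alert originator sees one small report per subsystem instead of one confirmation per end-system, which is the point of aggregating.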

just a couple of thoughts

James

At 08:02 AM 1/18/2011, Brian Rosen wrote:
>I understand your concern.  As I stated, in really, really large 
>events, it may not be practical to get confirmation.  On the other 
>hand, when you have burst events that cause overload, we have 
>developed techniques for dealing with them.  This specific problem 
>is probably best addressed by a random backoff delay.  The devices, 
>upon receiving a broadcast, wait a random time before attempting a 
>reply, and if they encounter busy conditions, increase the range of 
>the delay and delay another random time.
>
>Brian
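
The random backoff described in the quoted message above could look roughly like the following Python sketch. The initial window, growth factor, cap, attempt limit, and the try_send callback are all assumptions chosen for illustration; nothing in this thread fixes those values.

```python
import random
import time


def backoff_delays(max_attempts, base_window=2.0, growth=2.0,
                   max_window=300.0, rng=None):
    """Yield one random delay (seconds) per attempt, widening the range."""
    rng = rng or random.Random()
    window = base_window
    for _ in range(max_attempts):
        # Wait a random time somewhere inside the current window.
        yield rng.uniform(0.0, window)
        # The caller only asks for another delay after a busy condition,
        # so increase the range for the next attempt.
        window = min(window * growth, max_window)


def send_ack_with_backoff(try_send, max_attempts=6, sleep=time.sleep,
                          rng=None):
    """Delay randomly before every attempt, the first one included.

    try_send is a hypothetical callable returning True on success and
    False when the network reports busy.
    """
    for delay in backoff_delays(max_attempts, rng=rng):
        sleep(delay)
        if try_send():
            return True
    return False  # give up; the confirmation is simply not delivered
```

Delaying before the first attempt as well is what spreads out the initial burst; the widening window then handles devices that still hit busy conditions.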
>
>On Jan 18, 2011, at 6:49 AM, <mark.wood@engineer.com> 
><mark.wood@engineer.com> wrote:
>
> >
> > Thanks Brian,
> >
> > Brian's point is well made and correct, thanks Brian, but there are some
> > special issues that need to be borne in mind with some bearers.
> >
> > I first came into this project with the brief from the UN  to "protect the
> > (Mobile) mobile networks from catastrophic overload situations during
> > disasters".
> > When I "did the numbers" I discovered that the real problem is not what many
> > think. The bottleneck is in the mobility management system
> > (HLR/VLR/Paging/Access grant). EU sponsored Studies by Prof Sophocles
> > Kiriazakos  for the Greek government after the Athens earthquake and
> > subsequent crash of the Greek networks, confirmed this.  This is what led
> > me to work on cell broadcast (which does not use the mobility management
> > system at all).
> >
> > I agree that it is reasonable to allow that the acknowledgments may indeed
> > take much longer to 'traffic' than the outbound multicast message. Obviously
> > the scale is the same both ways, but while the latency is critical going
> > forward, the reverse path is not in the least bit time critical, so
> > relatively slower 'best effort'  bearers would be fine. The server which is
> > wishing to know its subscribers got a message may send the message both by
> > unicast and multicast (as mine do), but inevitably the acknowledgement will
> > have to be unicast. There is no specific problem in allowing the
> > acknowledgement of a multicast message by unicast means as long as we
> > understand that the latency is indeterminate. (However since it's not clear
> > when the ack may come, I send the message by both means simultaneously
> > without waiting for acks.)
> >
> > My concern is really for the Mobile Network, at layer 2.
> > For example if a large number of terminals all receive a multicast at the
> > same time, then they will all want to acknowledge at the same time. This
> > will result in a tsunami wave of random access bursts to the cell's uplink
> > timeslot, MSC call set up load, 'channel allocation algorithm' threads and
> > SDCCH allocation attempts. Then there will be huge load on the SMS gateways.
> > Mobiles that don't get an access grant message will obviously try again but
> > for a while the whole mobility management system will be significantly
> > loaded. This affects circuit switched voice just as much because the
> > mobility management system is common for voice and SMS, (but maybe not
> > GPRS?). Recall that in cellular network design, Erlang calculations are done
> > on the assumption that only a small fraction of terminals will
> > make random access burst attempts at any one time, so the mobility
> > management system is designed for this load only.
> >
> > In other words, consider that a public warning message (such as a USA  CMAS
> > presidential message) will reach 100% of terminals simultaneously, rather
> > than the small percentage that the signaling system can cope with. This is
> > why both the CMAS and ETSI standards intentionally disallow embedded numbers
> > or URLs for large scale (Public) warnings.
> >
> > So in fact the "scale" of the problem may not be as significant as the
> > impact on the local infrastructure (such as a cell). Maybe "scale" is a less
> > important factor than, let's say, penetration?
> >
> > On the other hand a smaller scale (of penetration)  message would not have
> > such a profound impact. So in some cases it may be reasonable to expect
> > acknowledgements in 'best effort'  time. Norway, for example,  likes this
> > approach.
> >
> > I am unclear as to whether IP systems have such problems because there is not a
> > 'stateful'  mobility management system in the core and though acks are on a
> > large scale, they represent very small packets of less than 1K each. Maybe
> > the problem will go away in the future? Any comments on that?
> >
> > Warm regards Mark Wood DRCF.
> >
> > -----Original Message-----
> > From: Brian Rosen [mailto:br@brianrosen.net]
> > Sent: Monday, January 17, 2011 2:44 PM
> > To: <mark.wood@engineer.com>
> > Cc: earlywarning@ietf.org
> > Subject: Re: [Atoca] Requirement D2: "Large Audience"
> >
> > It may be, but I'd like to explore this a bit anyway.
> >
> > Millions of messages (acknowledgements) is a scale we can deal with today.
> > Hundreds of millions is probably beyond what we can deal with in a response
> > to a very large alert.
> >
> > Most systems consist of several smaller subsystems.  The purpose of an
> > acknowledgement is to make sure everyone got the message.  If the subsystem
> > can determine that every one of its clients got it, it can report that up
> > the line.  It can save missed acks for later analysis, or if there are few
> > enough of them, report them up.
> >
> > This means messages at national scale which have short effectivity times can't
> > reasonably ask for message acknowledgement.  Anything smaller than that
> > probably can.
> >
> > Since most alerts really don't involve hundreds of millions of
> > notifications, most alerts probably can ask for them.
> >
> > If your delivery mechanism is multicast, the multicast mechanism itself
> > doesn't track who gets the alert in any way we can use.  That implies
> > something else is tracking who gets the alert, a complication that could
> > loom large.  Some systems do know who gets the alert (sometimes because it
> > knows who it is connected to, and all of them get the alert).  Certainly,
> > anything with a subscription has the characteristic that the sender knows
> > who all the recipients are.
> >
> > It's VERY valuable to know that every entity that should get the alert got
> > it.  The only other mechanism we have is some repeating of the sending in
> > the hopes that everyone got it.  In some cases you may have more than one
> > "path" to the same recipient.  That might be multiple devices, multiple
> > services, or multiple logical or physical connections.  You may try one
> > first, and if that doesn't get an ack, try another.  Although we often think
> > of this mechanism as needing no more than seconds to deploy, in fact many
> > alerts would be fine with a few minutes, and trying some things sequentially
> > may make sense.
> >
> > So, yes, probably a Tsunami alert to all of East Asia can't ask for
> > acknowledgements.  An "Amber Alert" (possibly abducted child) to a county
> > might very well.  Certainly, a snow emergency closing to the parents of an
> > elementary school could.
> >
> > Brian
> >
> >
> > _______________________________________________
> > earlywarning mailing list
> > earlywarning@ietf.org
> > https://www.ietf.org/mailman/listinfo/earlywarning