Re: [lmap] degrees of Measurement Suppression

Charles Cook <charles.cook2@centurylink.com> Wed, 11 December 2013 21:49 UTC

Message-ID: <52A8DDD2.1060606@centurylink.com>
Date: Wed, 11 Dec 2013 14:49:06 -0700
From: Charles Cook <charles.cook2@centurylink.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121005 Thunderbird/16.0
MIME-Version: 1.0
To: Sharam Hakimi <sharam.hakimi@exfo.com>
References: <2D09D61DDFA73D4C884805CC7865E611303993CD@GAALPA1MSGUSR9L.ITServices.sbc.com> <A68F3CAC468B2E48BB775ACE2DD99B5E04A0A8E0@podcwmbxex505.ctl.intranet> <20131210223947.GD39105@idrathernotsay.com> <A68F3CAC468B2E48BB775ACE2DD99B5E04A0AE24@podcwmbxex505.ctl.intranet> <084CDC75FEC1E640B60338273BEACDFA029B235C@spboexc01.exfo.com>
In-Reply-To: <084CDC75FEC1E640B60338273BEACDFA029B235C@spboexc01.exfo.com>
X-Enigmail-Version: 1.4.6
Content-Type: multipart/alternative; boundary="------------090602030408000308000508"
X-CFilter-Loop: Reflected
Cc: Steve Miller <steve@idrathernotsay.com>, "Bugenhagen, Michael K" <Michael.K.Bugenhagen@centurylink.com>, lmap@ietf.org, "STARK, BARBARA H" <bs7652@att.com>
Subject: Re: [lmap] degrees of Measurement Suppression
X-BeenThere: lmap@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
Reply-To: charles.cook2@centurylink.com
List-Id: Large Scale Measurement of Access network Performance <lmap.ietf.org>

Summarizing this email chain:

·         Although not prohibited, measurements without end times (ones
that could run forever) should be discouraged.  (I do think I saw a
sentence in the draft along these lines.)

·         It may be beneficial to place caps on end-user requested
measurements.

·         In the absence of a crystal ball, it is important to have a
measurement system that will scale, and be able to successfully turn off
measurement activities should there be a need to do so.

·         We don't want to literally do "anything possible" to improve
communications in an emergency.  But we do want to have
simple-to-implement mechanisms to control measurement traffic that will
help improve overall non-measurement communications.

·         We want the measurement system to have sufficient flexibility
that ISP Operations can use it to conduct various exercises in their own
network.

·         We don't want to make the measurement system complex.

·         Measurement traffic should be a small fraction, "way down in
the weeds", of the overall traffic in the network.

·         In an emergency, it may not be possible to reliably reach an MA.

 

Possible Suppression Solutions:

1)      Continual (periodic) 2-way communications between the
Measurement Controller and the MA.  This was given as an example, and it
was pointed out that in an emergency this approach may be unreliable.

2)      A two-state on/off mechanism that is controlled by the
Measurement Controller sending Instructions to the MA.  This has the
issue that communication between the Measurement Controller and MA may
not be reliable in an emergency.  Additionally, this requires the
Measurement Controller to send a message to each MA that has a schedule
that permits it to participate in measurements at the time of the emergency.

3)      A two-state on/off mechanism that is controlled by a timeout at
the MA if the MA has not heard anything from the Measurement
Controller.  This results in the MA eventually terminating measurements,
but it requires the Measurement Controller to periodically send
keep-alive messages when the network is not in an emergency state.
Also, the responsiveness of the network depends on both the Measurement
Controller's keep-alive period and the length of the timer in the MA.

4)      A two-state on/off mechanism where the MA periodically checks
the state of the network and conducts behavior based on the state of the
network.  If the MA is unable to retrieve network state information, it
assumes the network is in an emergency state and acts accordingly.  The
state of the network could be provided by the Measurement Controller, a
server in the Operational Domain that is responsible for the MAs
belonging to that Operational Domain, or both.  The Operational Domain
can provision the MA as appropriate (frequency of network state checks,
address of network state server(s)).
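
To make the timing trade-off in option 3 concrete, here is a minimal
Python sketch.  It is not taken from the draft: the class name, the
injectable clock, and the helper that bounds the shutdown delay are all
illustrative assumptions, and network delay is ignored.

```python
import time


class KeepAliveWatchdog:
    """Option 3 sketch: the MA stops measuring when the Controller goes quiet."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s    # hypothetical MA-side timer length
        self._clock = clock           # injectable so the logic can be tested
        self._last_heard = clock()    # assumption: start-up counts as a keep-alive

    def keep_alive_received(self):
        """Call whenever a keep-alive arrives from the Measurement Controller."""
        self._last_heard = self._clock()

    def measurements_allowed(self):
        """True while the last keep-alive is younger than the timeout."""
        return (self._clock() - self._last_heard) < self.timeout_s


def shutdown_delay_bounds(keepalive_period_s, timeout_s):
    """Bounds on how long the MA keeps measuring after keep-alives stop.

    Worst case the last keep-alive arrived just before they stopped, so the
    MA runs for the full timeout; best case it arrived almost one period
    earlier.
    """
    return (max(0, timeout_s - keepalive_period_s), timeout_s)
```

With a 10-second keep-alive period and a 30-second MA timer, for
example, shutdown_delay_bounds(10, 30) gives a window of 20 to 30
seconds, which is the dependency on both settings described in option 3.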

 

I don't think that any of these solutions would preclude a Measurement
Controller sending a blank Instruction to the MA to shut down
measurements (assuming the network is able to reliably deliver the
Instruction to the MA). 

 

I am sure there are other solutions.  Of the ones listed so far, I
prefer #4.
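
A rough Python sketch of the fail-safe behavior in #4 follows.  The
names are hypothetical, and the rule for combining several state
sources (any source reporting an emergency wins; one reachable source
reporting normal is enough to continue) is my assumption, not something
the list has agreed on.

```python
from enum import Enum


class NetworkState(Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"


def resolve_network_state(fetchers):
    """Ask each provisioned state source; fail safe to EMERGENCY.

    `fetchers` is a list of callables (for example one for the Measurement
    Controller and one for an Operational Domain server) that return a
    NetworkState or raise if the source cannot be reached.
    """
    reports = []
    for fetch in fetchers:
        try:
            reports.append(fetch())
        except Exception:
            # Sketch only: any failure is treated as "source unreachable".
            continue
    if not reports:
        # No state information could be retrieved: assume an emergency.
        return NetworkState.EMERGENCY
    if NetworkState.EMERGENCY in reports:
        # Assumption: the most restrictive report wins.
        return NetworkState.EMERGENCY
    return NetworkState.NORMAL
```

The MA would call this at the provisioned check frequency and run
measurements only while the result is NetworkState.NORMAL.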

 

There is also a question of whether two network states are sufficient. 
We don't want too many states because that results in additional
complexity.  Possible solutions include:

 

1)      Two states:  on (continue measurements), and off (cease
measurements immediately).

2)      Three states:  on (continue measurements), off (cease
measurements immediately), and limited (limit measurements per
parameters in the Instruction sent by the Measurement Controller). 

3)      N states:  ???

4)      Two states mandated, but additional states allowed.  Simple MAs
will support two states (on and off).  More advanced MAs can support
more elaborate responses to a network emergency.

 

Again, I am sure there are other solutions.  Of the ones listed so far,
I prefer #2 and #4.
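
As a sketch of how #2 and #4 could coexist, the Python below models the
three states as an ordered enumeration.  It is illustrative only: the
rate-based limit stands in for whatever parameters the Instruction
would carry, the fallback of a simple MA from "limited" to "off" is my
assumption, and the most-restrictive rule is borrowed from Barbara's
original proposal quoted below.

```python
from enum import IntEnum


class Suppression(IntEnum):
    # Ordered so that a larger value is more restrictive.
    ON = 0
    LIMITED = 1
    OFF = 2


def effective_state(*channel_states):
    """Combine the states from several channels (at least one is required,
    e.g. Controller and bootstrap) by taking the most restrictive."""
    return max(channel_states)


def may_run(state, task_rate_kbps, limit_kbps, supports_limited=True):
    """Decide whether a measurement task may start.

    `limit_kbps` is a hypothetical parameter from the Instruction.
    """
    if state == Suppression.LIMITED and not supports_limited:
        # Assumption: a simple two-state MA treats "limited" as "off".
        state = Suppression.OFF
    if state == Suppression.OFF:
        return False
    if state == Suppression.LIMITED:
        return task_rate_kbps <= limit_kbps
    return True
```

For example, effective_state(Suppression.ON, Suppression.LIMITED)
yields Suppression.LIMITED, and a simple MA created with
supports_limited=False would then refuse every task.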

 

Charles

On 12/11/2013 9:14 AM, Sharam Hakimi wrote:
> Mike,
> There are two ways of having this control which would work in the system and
> do what you would like. They are:
>
>
> 	1: The MA device MUST stop testing if it has not heard from the
> MC in a designated time (we had talked about putting this in
> the document).
>
> 	2: In an emergency the MC can issue a BLANK test schedule that
> would override whatever the MA is scheduled to do
> (and if the MA cannot receive this it will automatically stop).
>
>
> Thanks,
> Sharam 
>
>
> 		
>
> -----Original Message-----
> From: lmap [mailto:lmap-bounces@ietf.org] On Behalf Of Bugenhagen,
> Michael K
> Sent: Wednesday, December 11, 2013 9:21 AM
> To: 'Steve Miller'
> Cc: 'STARK, BARBARA H'; 'lmap@ietf.org'
> Subject: Re: [lmap] degrees of Measurement Suppression
>
> Steve -
>
>     Operational design groups have these frameworks historically - and
> they control the deployment of test systems so adoption issues can and
> do occur with these types of systems that can load the network.
> I think it's fairly simple to take away those concerns by addressing
> them vs. discarding what may be a legitimate issue for an ISP to adopt
> the framework.
>
>
> If we can't reach consensus on the hard requirement I recommend one of
> two paths.
> 1) Add those added safety measures but make them optional.
> And or 
> 2) Partition the tests (which I think we're doing) so that they can be
> adopted into existing test control schemas that fit the ISP's need.
>
>
>
> My concern on the way it is currently drafted -
>   I'm just afraid that the current control schema might miss the target
> for the #1 use case for all ISP's, which "large" somewhat infers all
> ISP's would find the control framework suitable.
> I understand the added complexity argument, but if it's required to meet
> hard operational criteria (and this has been discussed elsewhere) then
> it's probably worth the effort to ensure that this work is
> reference-able and relevant to all.  There are a few more key blind
> spots in this framework as well that make its current form and
> longevity suspect.
>
>
>
> To resolve the "is this a real op's issue"  -
> Let's see if we can't get some IETF ISP members to ping their OP's
> groups on if they restrict load testing the network for any reason - if
> yes.. then we need to look at phases.
> BTW - every Op's group I've been in has situations where they don't from
> the customer level on UP... aka you don't load a congested customer
> unless you're taking them out of service and they know it, and you don't
> stress a "trunk" of shared path that's severely congested and in trouble
> either through fault or due to unusual circumstances (an earthquake
> drops buildings in a city and you want every bit available to be there
> for use, not test traffic).
>
> Cheers - 
> Mike
>
> -----Original Message-----
> From: Steve Miller [mailto:steve@idrathernotsay.com] 
> Sent: Tuesday, December 10, 2013 4:40 PM
> To: Bugenhagen, Michael K
> Cc: 'STARK, BARBARA H'; 'lmap@ietf.org'
> Subject: Re: [lmap] degrees of Measurement Suppression
>
>    If we are testing at a rate where leaving the tests running during a
> national emergency will make a meaningful difference as to whether or
> not lives are saved, we shouldn't be testing at that rate even during
> normal operation.  Relying on the ability to turn anything off -- in an
> active manner at least -- during an emergency seems like something we
> should avoid: I'd think that anyone who's experienced making a phone
> call just after a natural disaster would be able to testify as to the
> flakiness of any communications medium under those circumstances.
>
>    A key (and maybe *the* key) element here is that in many cases the
> time you most want to hit the big red off switch is the time at which
> you are least able to communicate with the MAs.
>
>    The other autoshutdown stuff that's been discussed so far would give
> people the flexibility to implement the only policy that truly seems
> important: the one that disables measurements after N
> seconds/minutes/hours/days in which the MA can't reach the controller.
> It's been my experience that no matter how important that seems on
> paper, it ends up being much, much less critical than it seems in the
> Real World.  Still, leaving this binary on/off level of control in place
> makes some sense: it's easy to code, it's pretty innocuous, the
> additional load on the controller infrastructure shouldn't be that bad,
> and it avoids the "three million old devices hammering some poor
> destination that belongs to the people who bought some defunct ISP"
> problem.  It's also been my experience that even that level of control
> isn't needed and can in fact cause problems of its own, but I recognize
> that this scenario is a little different.  I still think we should not
> overcomplicate things.
>
> 	-Steve
> 	(who still needs to read the updated I-Ds, hopefully early next
> week, sorry)
>
> On Tue, Dec 10, 2013 at 10:16:58PM +0000, Bugenhagen, Michael K wrote:
>> I concur with Barbara -
>>
>>   Stated a bit differently - If LMAP says put this probe on everything
>> (Part of the introduction statement)...
>> Then
>>
>> 1)  They will want a suppression method that allows them to disable it
>> in case of national emergency - saving lives via emergency communication
>> takes priority in their minds.
>>     So they typically will ask for 2 each Safety breaks.
>> 		1 from the controller
>> 		A second from an element controller or whatever
>> configures the element in the first place.
>>    These "MA unavailable" states should also be recorded for
>> transparency's sake.
>>
>> 2)  Op's folks like to test - even when things are going bad..
>> otherwise fault isolation is hard to do.
>> 	So suppression should have 3 levels (again this goes back to
>> everyone has a probe) -
>> 	Green = do as you will (test anything)
>> 	Yellow = don't load anything - restrict testing that will load
>> the network, or kick up CPU cycles ...but allow fault testing (NO
>> Saturation tests, ....)
>> 	RED = Stop, don't pass go, ... ADHOC testing only
>>
>>
>> If USE case 1 is the ISP - not adopting those types of Op's
>> requirements would get a significant amount of pushback on the
>> implementation side.
>> Regards,
>> Mike
>>
>>
>>
>>
>> -----Original Message-----
>> From: lmap-bounces@ietf.org [mailto:lmap-bounces@ietf.org] On Behalf
>> Of STARK, BARBARA H
>> Sent: Tuesday, November 12, 2013 10:50 AM
>> To: lmap@ietf.org
>> Subject: [lmap] degrees of Measurement Suppression
>>
>> A number of providers have discussed a model of Measurement
>> Suppression that supports more granular degrees of suppression (other
>> than just suppress and don't suppress). Michael Bugenhagen presented the
>> original version of this model, and he said it would be ok if I brought
>> it up to IETF.
>> Following are highlights of the proposal (with some modifications I've
>> included as a result of additional discussion):
>> 1. Measurement Methods (or perhaps Tasks) have a configurable
>> parameter that indicates whether the Controller operator considers them
>> to be "critical for OAM", "not critical, but not resource intensive",
>> or "not critical and resource intensive".
>> 2. The Controller can specify degrees of Measurement Suppression,
>> which should include: halt all non-critical tasks immediately but allow
>> OAM tasks; halt resource-intensive tasks immediately but allow
>> non-resource-intensive tasks; finish resource-intensive tasks but do not
>> start new resource-intensive tasks and allow all non-resource-intensive
>> tasks; allow all tasks.
>> 3. It may also be possible for the unspecified bootstrap mechanism to
>> instruct the MA to suppress measurements. Where multiple channels exist
>> for Measurement Suppression (e.g., Controller and bootstrap), the MA is
>> to use the most restrictive setting.
>> Barbara
>>
>>
>> _______________________________________________
>> lmap mailing list
>> lmap@ietf.org
>> https://www.ietf.org/mailman/listinfo/lmap

-- 

Charles Cook 
Principal Architect
Network
5325 Zuni Street; Suite 224
Denver, CO  80221
Tel:  303.992.8952  Fax:  925.281.0662
charles.cook2@centurylink.com