Re: [bmwg] Software Update Doc

"MORTON JR., ALFRED (AL)" <acmorton@att.com> Tue, 05 March 2013 13:53 UTC

From: "MORTON JR., ALFRED (AL)" <acmorton@att.com>
To: Sarah Banks <sbanks@aerohive.com>
Date: Tue, 05 Mar 2013 08:53:23 -0500
Cc: "bmwg@ietf.org" <bmwg@ietf.org>
Subject: Re: [bmwg] Software Update Doc

Thanks for your reply, Sarah.

I have pulled out several questions for replies:

> 2nd paragraph
> This is the critical phase of the ISSU, where the control plane MUST
> not be impacted and any interruptions to the forwarding plane should
> be minimal to none.
> 
> This is a case where the ideal results have been expressed as a requirement
> for the DUT to meet, rather than an anticipated result.
> 
> I suggest NEW:
> There will be continuous monitoring of control-plane and forwarding-plane
> functions, anticipating that any interruptions SHOULD be characterized.
> 

> Hrmm. The difference between the two statements is that the suggested
> revised wording would not exclude interruption, and would require that it
> be captured; IMHO, an ISSU isn't an ISSU if there's a drop or break in
> control plane. Is there an amenable way to compromise here?

ACM - sure, I think the procedure needs some objective way to judge
that interruptions are "minimal to none", but we can't try to place 
a performance requirement on the DUT in a benchmarking RFC, and that's
how I read the "MUST" in the 2nd para...
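
One possible way to make "minimal to none" objective is to record the result of a periodic control-plane or forwarding-plane probe and report every interval in which probes failed. The sketch below is only an illustration under assumptions: the probe itself, the 1-second sample spacing, and the function name `find_interruptions` are hypothetical and do not come from the draft.

```python
from typing import List, Tuple


def find_interruptions(probe_ok: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (start, end) intervals during which consecutive probes failed.

    probe_ok is a time-ordered list of (timestamp_seconds, succeeded).
    An interval starts at the first failed probe and ends at the next
    successful one (or at the last sample if the failure never clears).
    """
    intervals = []
    start = None
    for ts, ok in probe_ok:
        if not ok and start is None:
            start = ts                      # first failed probe opens an interval
        elif ok and start is not None:
            intervals.append((start, ts))   # first good probe closes it
            start = None
    if start is not None:
        intervals.append((start, probe_ok[-1][0]))
    return intervals


# Hypothetical samples taken once per second during the Upgrade Run.
samples = [(0.0, True), (1.0, True), (2.0, False),
           (3.0, False), (4.0, True), (5.0, True)]
interruptions = find_interruptions(samples)
total_outage = sum(end - start for start, end in interruptions)
print(interruptions, total_outage)
```

Reporting the intervals themselves, rather than a pass/fail verdict, characterizes the DUT without placing a performance requirement on it. The resolution of the result is limited by the probe spacing.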

> > -=-=-=-=-=-
> > 4.2 Load Model
> > 2nd para
> > For mixed protocol environments, frames SHOULD be distributed
> > between all the different protocols. The distribution SHOULD
> > approximate the network conditions of deployment. In all cases, the
> > details of the mixed protocol distribution MUST be included in the
> > reporting.
> >
> > ?? what protocols are being mixed?  IPv4 and IPv6? please clarify
> >
> 
> It would seem that you're looking for us to denote specifically what
> protocols, whereas the section calls for what you have today; if you have
> an IPv4 only network today, then it'd be an IPv4 protocol being used. I
> think the spirit of the statement is very different; we want you to use
> what's representative of your network today, not protocol a, b and c that
> make vendor a, b and c look good on paper. Looking good on paper is great,
> but if it's not representative of your network, as an operator, what's the
> point?

ACM - I'm simply pointing out that the phrase 
"Mixed Protocol Environments" has a lot of latitude. 
Just give some examples, like you did in your reply...
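
As an illustration of the kind of example that could accompany "mixed protocol environments", the sketch below turns a declared protocol mix into per-protocol frame counts that can be copied into the report. The IPv4/IPv6 split and the 70/30 weights are hypothetical values chosen for the example, not figures from the draft.

```python
from typing import Dict


def frames_per_protocol(mix: Dict[str, int], total_frames: int) -> Dict[str, int]:
    """Split total_frames across protocols in proportion to integer weights.

    Any remainder left by integer division is assigned to the protocol
    with the largest weight, so the counts always sum to total_frames.
    """
    total_weight = sum(mix.values())
    if total_weight <= 0:
        raise ValueError("mix must contain at least one positive weight")
    counts = {proto: (total_frames * w) // total_weight for proto, w in mix.items()}
    remainder = total_frames - sum(counts.values())
    largest = max(mix, key=mix.get)
    counts[largest] += remainder
    return counts


# Hypothetical mix for a dual-stack deployment (weights are percentages).
mix = {"IPv4": 70, "IPv6": 30}
counts = frames_per_protocol(mix, 1000)
print(counts)
```

An IPv4-only network would simply use `{"IPv4": 100}`; the point is that the mix used in the test is stated explicitly in the reporting.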

> ...Overall, I think the agreement was to
> post THIS (the current version of the doc, the one you've been commenting
> on) on March 11th, and then revise thereafter. Is this still desirable?
> Let me reword; It's unlikely I'll be able to meet with my coauthors and
> work in changes, given our current geographic and timezone challenges. :)
> Can we still proceed as aforementioned?

ACM - yes, post this version on Monday.

thanks,
Al

> -----Original Message-----
> From: Sarah Banks [mailto:sbanks@aerohive.com]
> Sent: Monday, March 04, 2013 8:14 PM
> To: MORTON JR., ALFRED (AL)
> Cc: bmwg@ietf.org
> Subject: Re: [bmwg] Software Update Doc
> 
> Hi Al,
> 	Thanks for your feedback.
> Let me answer inline.
> On Mar 4, 2013, at 11:14 AM, "MORTON JR., ALFRED (AL)" <acmorton@att.com>
> wrote:
> 
> >> On 2/25/13 2:16 PM, Sarah Banks wrote:
> >> David Newman wrote:
> >> ...
> >>> Having said this, I think that if or when downgrades are covered, the
> >>> methodology would be exactly the same. Would you agree?
> >>
> >> Yes.
> >>
> >> dn
> >
> > I tend to agree, possibly without the download step in some DUTs
> > (though this seems to be included in section 6 on rollback).
> >
> 
> So let me point out - the document scopes ISSU, not ISSD; and a change in
> the next rev (after we post on March 11) will discuss WHAT makes an ISSU an
> ISSU; for example, it'll deal with questions like, "Can I upgrade just the
> Line Cards and not the Route Engines and still have it be an ISSU?" Rather
> than get into which vendors do what for downgrades, as I think there's a
> good amount of misunderstanding and confusion there, I thought we were
> clear on the scope. I am not opposed to ISSD; but I do have reservations.
> 
> > I'm happy to see this draft in text form, after much discussion
> > and support for the topic in general (when on the agenda in the past).
> >
> 
> +1. it's been a long road getting here. Thanks for your support!
> 
> > Editorial:  this paragraph indent is a little distracting,
> > I assume this is a MSWord source?  Let's talk doc formatting...
> 
> Yes sir, it is indeed an MSWord source, with two active editors. :) Shall
> we take the formatting discussion offline?
> 
> >
> > 1. Introduction
> >
> > ISSU is a technique implemented by forwarding devices to upgrade or
> >   downgrade from one software version to another as applicable. The
> >
> > -=-=-=-=-
> >
> > Editorial:  it's a good idea to avoid wording that might lead to
> > acceptance criteria in bmwg specs. I'll try to flag them all, but
> > here's an example:
> >
> > 3.1. Software Download
> > In this first phase, ...
> > ... Such separation
> > allows an administrator to download the new code inside or outside
> > of a maintenance window; it is expected that downloading new code
> > and saving it to disk on the router will not impact operations.
> >
> > s/expected/anticipated/
> >
> 
> OK
> 
> > -=-=-=-=-
> >
> > later in 3.1, Verification is purely functional testing, did it
> > happen as planned or not?
> >
> > ... Verification should include both positive verification (ensuring
> > that an ISSU action should be permitted) as well as negative tests
> > (creation of scenarios where the verification mechanisms would
> > report exceptions).
> >
> > It would be good to separate procedural verifications as pre-requisites
> > for the benchmarking tests that follow. All good benchmarks
> > assume/require proper functions are performed. I see that similar
> > verifications are included in section 3.2, Software Staging, because
> > at that stage they are an inextricable part of the process.
> >
> 
> ok
> 
> > -=-=-=-=-
> >
> > 3.3 Upgrade Run
> >
> > 2nd paragraph
> > This is the critical phase of the ISSU, where the control plane MUST
> > not be impacted and any interruptions to the forwarding plane should
> > be minimal to none.
> >
> > This is a case where the ideal results have been expressed as a
> > requirement for the DUT to meet, rather than an anticipated result.
> >
> > I suggest NEW:
> > There will be continuous monitoring of control-plane and
> > forwarding-plane functions, anticipating that any interruptions
> > SHOULD be characterized.
> >
> 
> Hrmm. The difference between the two statements is that the suggested
> revised wording would not exclude interruption, and would require that it
> be captured; IMHO, an ISSU isn't an ISSU if there's a drop or break in
> control plane. Is there an amenable way to compromise here?
> 
> > -=-=-=-=-=-
> > Editorial:
> > 4.1 Test Topology
> > The hardware configuration of the DUT (Device Under test) MUST be
> > identical to the one expected to be or currently deployed in
> > production.
> >
> > suggest inserting the phrase below at end of first sentence:
> > ... in order for the benchmark to have relevance in production.
> >
> 
> ok
> 
> > -=-=-=-=-=-
> > 4.2 Load Model
> > 2nd para
> > For mixed protocol environments, frames SHOULD be distributed
> > between all the different protocols. The distribution SHOULD
> > approximate the network conditions of deployment. In all cases, the
> > details of the mixed protocol distribution MUST be included in the
> > reporting.
> >
> > ?? what protocols are being mixed?  IPv4 and IPv6? please clarify
> >
> 
> It would seem that you're looking for us to denote specifically what
> protocols, whereas the section calls for what you have today; if you have
> an IPv4 only network today, then it'd be an IPv4 protocol being used. I
> think the spirit of the statement is very different; we want you to use
> what's representative of your network today, not protocol a, b and c that
> make vendor a, b and c look good on paper. Looking good on paper is great,
> but if it's not representative of your network, as an operator, what's the
> point?
> 
> > -=-=-=-=-=-
> >
> > later, at the end of 4.2:
> >
> > All in all, the load model should attempt to simulate the production
> > network expectations to the greatest extent possible in order to
> > maximize the applicability of the results generated.
> >
> > s/expectations/conditions/
> >
> 
> ok
> 
> > -=-=-=-=-
> >
> > 5.2 Software Staging
> >
> > General comment:  It is simpler in a procedure like this to ask the
> > tester to record the current time at different steps, then calculate
> > the durations later.
> >
> > Start time:                          T0
> > Secondary RP stabilized to new code: T1
> > Start of Upgrade Run:                T2
> > Completion of Upgrade:               T3
> >
> > Then when reporting (section 7), it looks like:
> >
> >                                    Duration
> > Software Download/Secondary Update: T1 - T0
> > Upgrade Run:                        T3 - T2
> > ...
> >
> > worth a try?
> >
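
The timestamp approach above can be sketched as follows: the tester records T0 through T3 during the procedure, and the durations for the report are computed afterwards. This is a minimal sketch; the function name, the dictionary keys, and the example clock times are hypothetical.

```python
from datetime import datetime
from typing import Dict


def phase_durations(t0: datetime, t1: datetime,
                    t2: datetime, t3: datetime) -> Dict[str, float]:
    """Compute the reported durations, in seconds, from recorded timestamps.

    t0: start time
    t1: secondary RP stabilized to new code
    t2: start of Upgrade Run
    t3: completion of upgrade
    """
    if t1 < t0 or t3 < t2:
        raise ValueError("each phase must end after it starts")
    return {
        "software_download_secondary_update": (t1 - t0).total_seconds(),  # T1 - T0
        "upgrade_run": (t3 - t2).total_seconds(),                         # T3 - T2
    }


# Hypothetical timestamps noted by the tester during one run.
t0 = datetime(2013, 3, 5, 10, 0, 0)
t1 = datetime(2013, 3, 5, 10, 12, 30)
t2 = datetime(2013, 3, 5, 10, 15, 0)
t3 = datetime(2013, 3, 5, 10, 21, 45)
report = phase_durations(t0, t1, t2, t3)
print(report)
```

Recording raw timestamps keeps the procedure steps simple and lets any additional interval (for example T2 - T1) be derived later without rerunning the test.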
> > -=-=-=-=-
> >
> > 5.3 Upgrade Run
> >
> > 4th para
> > ... Examine the traffic
> > generators for any indication of traffic loss over this interval. If
> > the Test Set reported any traffic loss interval, note the duration
> > of the outage as "TP".
> >
> > The above requires more detail, so that outage durations are
> > reported the same way.  I think the recently approved MPLS
> > protection methods can provide some help here:
> > http://tools.ietf.org/html/draft-ietf-bmwg-protection-meth-14#page-12
> >
> 
> OK, we'll take a look.
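
To illustrate why more detail is needed, here is one loss-derived way the outage duration "TP" could be computed: divide the number of lost frames by the constant offered rate. This is an assumption made for the example, not a method this thread settles on, and it only holds if the offered load is constant and all loss occurs in a single interval.

```python
def outage_seconds(frames_sent: int, frames_received: int,
                   offered_rate_fps: float) -> float:
    """Estimate outage duration from frame loss at a constant offered rate.

    Assumes all loss is attributable to the upgrade and that the traffic
    generator offered frames at a steady offered_rate_fps (frames/second).
    """
    if offered_rate_fps <= 0:
        raise ValueError("offered rate must be positive")
    lost = frames_sent - frames_received
    if lost < 0:
        raise ValueError("received more frames than were sent")
    return lost / offered_rate_fps


# Hypothetical counters read from the Test Set after the Upgrade Run.
tp = outage_seconds(frames_sent=1_000_000, frames_received=997_500,
                    offered_rate_fps=10_000.0)
print(f"TP = {tp:.3f} s")
```

If loss is spread over several short intervals, this calculation reports their sum rather than the longest gap, which is one reason the procedure should state exactly how TP is derived.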
> 
> > -=-=-=-=-
> >
> > 5.4 Post ISSU
> > ... last bullet
> >
> > * Document the hitless or outage dark windows detected based upon
> > the (TP) counter value (provided by the Test Set)
> >
> > update to reflect the metric chosen: although "outage dark windows"
> > sounds pretty cool, I don't know how to measure them.
> >
> 
> I agree. We'll work this out after next rev.
> 
> Al, thanks for the great feedback. Overall, I think the agreement was to
> post THIS (the current version of the doc, the one you've been commenting
> on) on March 11th, and then revise thereafter. Is this still desirable?
> Let me reword; It's unlikely I'll be able to meet with my coauthors and
> work in changes, given our current geographic and timezone challenges. :)
> Can we still proceed as aforementioned?
> 
> Thanks
> Sarah
>