[bmwg] Comments on draft-ietf-bmwg-protection-meth-04.txt

Al Morton <acmorton@att.com> Mon, 16 February 2009 21:19 UTC


draft-ietf-bmwg-protection-meth-04 Authors,

Since there have been some comments during WGLC,
I hope you can take these late comments into account.
I only just made time to do a review during a flight.

I've raised some issues below that the test equipment community
may be interested in commenting on.

Al (as a participant)

General:
--------

This memo is very nearly ready for publication request,
and closing the loop on several details will make it so.

There are lots of section references that need +2 (3.1 -> 5.1)
or other fix-up.

Employing the "standard" Intro/Security paragraphs would help.

This work effort uses the IGP-Dataplane convergence work as a
foundation, and might also benefit from adding a new section on:
>    3.2.9 Tester Capabilities
>    It is RECOMMENDED that the Tester used to execute each test case
>    have the following capabilities:
>       1. Ability to insert a timestamp in each data packet's IP
>          payload.
>       2. An internal time clock to control timestamping, time
>          measurements, and time calculations.
>       3. Ability to distinguish traffic load received on the Preferred
>          and Next-Best interfaces.
>       4. Ability to disable or tune specific Layer-2 and Layer-3
>          protocol functions on any interface(s).
and some of this is applicable to MPLS-FRR.  A section like this would
be a good place to expand on the capabilities needed to distinguish
between impaired and unimpaired packets (per-flow sequence numbers).
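
To make the sequence-number point concrete, here is a rough Python
sketch of the payload a Tester could insert (the struct layout and
field names are mine, purely illustrative, not any vendor's API):

    import struct, time

    PAYLOAD_FMT = "!IdI"  # flow ID, TX timestamp (s), per-flow sequence

    def build_payload(flow_id, seq):
        # Timestamp comes from the Tester's internal clock (item 2 above)
        return struct.pack(PAYLOAD_FMT, flow_id, time.time(), seq)

    def parse_payload(data):
        flow_id, tx_time, seq = struct.unpack(PAYLOAD_FMT, data[:16])
        return flow_id, tx_time, seq

With a per-flow sequence number in the payload, the analyzer can tell
a duplicate from a loss and can spot reordering (more on that below).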

I can make a lot of word-smithing suggestions,
but they would be too tedious to type up explicitly,
and then for someone else to locate and change.

One key set of related issues to reconcile in section 7:
--------------------------------------------------------
Definition of "maximum forwarding rate (pps)"
(currently assessed in section 7.5)
In my opinion,
  - this is a setup condition needed for all the other
procedures in section 7, so this section should come first.
  - there should be a reference to the existing definitions
of Forwarding Rate and Maximum Forwarding Rate
http://tools.ietf.org/html/rfc2285#section-3.6.3
and their methods of measurement
(or possibly use Throughput instead, and refer to RFC 2544 and
draft-ietf-bmwg-mpls-forwarding-meth-01.txt for the method).

Also, there is no test to verify that *all* the offered load
is switched to the backup path and that stability is achieved
before turning off the load and assessing switching time.
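
On the first point, a simple RFC 2544-style search would do as the
setup step; roughly like this (sketch only; offer_load() is a
hypothetical Tester hook that offers `rate` pps for `duration`
seconds and returns the number of packets lost):

    def max_zero_loss_rate(offer_load, lo=0, hi=1_000_000, duration=60):
        # Binary search for the highest zero-loss rate (Throughput)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if offer_load(rate=mid, duration=duration) == 0:
                lo = mid   # no loss: try a higher rate
            else:
                hi = mid   # loss observed: back off
        return lo

This is just to show the shape of the procedure; the real method
should come from RFC 2544 / the mpls-forwarding draft as noted above.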

Another key issue, assumption of Reversion performance:
-------------------------------------------------------
The draft says (in many places) that Reversion should not produce
packet loss, but this really depends on the implementation.
Also, our charter states:
>Because the demands of a particular technology may vary from deployment
>to deployment, a specific non-goal of the Working Group is to define
>acceptance criteria or performance requirements.
I think we've managed to convey a "desirable outcome" in the past
without wording that might be misinterpreted as a requirement.


Other specific comments:
------------------------

S1 Intro:
---------
The 3rd and 4th paragraphs describe correlated failures and
planned failures, but they aren't particularly useful in the
remainder of the memo:
  - The scenarios that test correlated failures aren't identified
as such.  Sec 5.1 has a few examples (Parent Interface shut,
Reload/Crash Protected Node), but most are single failures.
  - The notion of a "Planned Failure" that takes a node or interface
out-of-service involves other activities to minimize the customer
impact, such as pre-notification, time of day, and diverting traffic
before taking action (taking anything out of service). Since my day job
is with a service provider, I think that unplanned failures should
be the main emphasis here.

S2 Scope:
---------
I think that the last paragraph of Sec 1 should be part of the Scope
section.

The current 1st paragraph mentions "scenarios" twice, but section
5.1 lists and defines all the failure "events", and events are not
mentioned.  Maybe this would be better:
    This document provides detailed test cases along with different
    topologies and scenarios that should be considered to effectively
    benchmark MPLS protection mechanisms and failover times. Different
    failure *events* and scaling considerations are also provided in
    this document, in addition to reporting formats for the observed
    results.

The last sentence of the Scope:
    Benchmarking of unexpected correlated failures is currently out of
    scope of this document.
Wouldn't a Node Crash event be an example of an unexpected correlated failure?


S3 General ref and sample topology
----------------------------------

2nd sentence:
    ... TG & TA represents
    Traffic Generator & Analyzer respectively.
suggest change to:
    TG & TA stand for Traffic Generator & Traffic Analyzer respectively,
    and these components comprise the "tester".

Figure 1 is too wide!
I count 85 characters, including 5 leading spaces. The max is 72.
See 
http://tools.ietf.org/idnits?url=http://tools.ietf.org/id/draft-ietf-bmwg-protection-meth-04.txt
for more stuff like this...


Section 5.3 last paragraph reads:
---------------------------------
    In addition, this methodology considers the packets in error and
    duplicate packets that could have been generated during the failover
    process. In scenarios, where separate measurement of packets in
    error and duplicate packets is difficult to obtain, these packets
    should be attributed to lost packets.
A duplicated packet should *never* be considered the same as a lost packet.
I think you want to say that some of the methodologies consider lost,
out-of-order, and duplicate packets to be impaired packets that
all determine the failure recovery time.
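
Something along these lines (sketch only, using the per-flow sequence
numbers proposed for the Tester capabilities section above):

    def classify(num_sent, seqs_received):
        # seqs_received: per-flow sequence numbers in arrival order
        seen, dups, ooo, highest = set(), 0, 0, -1
        for seq in seqs_received:
            if seq in seen:
                dups += 1          # duplicate -- never count as a loss
                continue
            if seq < highest:
                ooo += 1           # arrived after a later-numbered packet
            highest = max(highest, seq)
            seen.add(seq)
        lost = num_sent - len(seen)
        return lost, dups, ooo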


Section 5.5, Selection of IGP
-----------------------------
Add a sentence/reference:
     See [IGP-DATAPLANE] for IGP options to consider and record.


Section 5.6, Reversion
----------------------
    ...In all test cases listed here
    Reversion should not produce any packet loss, out of order or
    duplicate packets. Each of the test cases in this methodology
    document provides a check to confirm that there is no packet loss.
This is not strictly true, especially since the presence of
out-of-order packets depends on the difference in delay between
backup and primary paths.

I suggest:
    ...In all test cases listed here,
    Reversion is actuated to observe any packet loss, out of order or
    duplicate packets. <delete last sentence>

Section 5.7, Traffic Generation
-------------------------------
The recommendation against round-robin traffic generation to prefixes
seems to be inconsistent with section 3.9 of [IGP-DATAPLANE]:
    ...The destination addresses
    for the offered load MUST be distributed such that all routes are
    matched and each route is offered an equal share of the total
    Offered Load.  This requirement for the Offered Load to be
    distributed to match all destinations in the route table creates
    separate flows that are offered to the DUT.  The capability of the
    Tester to measure packet loss for each individual flow (identified
    by the destination address matching a route entry) and the scale
    for the number of individual flows for which it can measure packet
    loss should be considered when benchmarking Route-Specific
    Convergence [Po07t].
This difference needs to be sorted out.
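
For reference, the [IGP-DATAPLANE] requirement amounts to something
like this (sketch; send() stands in for the Tester's per-packet hook):

    from itertools import cycle

    def offer_distributed_load(routes, total_packets, send):
        # Equal share of the Offered Load per route/destination,
        # with per-flow sent counts for per-flow loss measurement
        sent = {dst: 0 for dst in routes}
        stream = cycle(routes)
        for _ in range(total_packets):
            dst = next(stream)
            send(dst)
            sent[dst] += 1
        return sent   # compare against per-flow receive counts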

Section 6
---------
Some nouns may be missing:
    f) PRI is Primary *Node or Path*   ??

    g) BKP denotes Backup Node *or Path*  ??

Section 6.1 (and beyond)
------------------------
The Label stack only makes sense if PE and P are
indicated in (all) the Figure(s).

Section 8, Reporting Format
---------------------------

Suggest adding, as an item in the Benchmarks list:
     Note the presence of out-of-order and duplicate packets,
     when measured.


Also:
       3. Timestamp Based Method (TBM): This method of failover
         calculation is based on the timestamp that gets transmitted as
         payload in the packets originated by the generator. The Traffic
         Analyzer records the timestamp of the last packet received
         before the failover event and the first packet after the
         failover and derives the time based on the difference between
         these 2 timestamps. Note: The payload could also contain
         sequence numbers for out-of-order packet calculation and
         duplicate packets.
The capabilities required to do this should be identified in the
new section on Tester capabilities.
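
For what it's worth, the TBM arithmetic is just this (sketch; the
interface labels and record format are mine, and distinguishing the
receiving interface is Tester capability 3 above):

    def tbm_failover_time(rx_records):
        # rx_records: (payload TX timestamp, receiving interface)
        # pairs in arrival order
        last_before = max(t for t, ifc in rx_records if ifc == "primary")
        first_after = min(t for t, ifc in rx_records if ifc == "backup")
        return first_after - last_before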

(it seems I had more than just a few comments...)