Re: Genart last call review of draft-ietf-rtgwg-backoff-algo-07

"Acee Lindem (acee)" <acee@cisco.com> Fri, 16 February 2018 00:30 UTC

From: "Acee Lindem (acee)" <acee@cisco.com>
To: Elwyn Davies <elwynd@dial.pipex.com>, "gen-art@ietf.org" <gen-art@ietf.org>
CC: "draft-ietf-rtgwg-backoff-algo.all@ietf.org" <draft-ietf-rtgwg-backoff-algo.all@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
Subject: Re: Genart last call review of draft-ietf-rtgwg-backoff-algo-07
Date: Fri, 16 Feb 2018 00:30:42 +0000
Message-ID: <8C2D1776-C3F3-4C13-A403-3D4C112184C8@cisco.com>
References: <151872192828.7546.15103568221130514259@ietfa.amsl.com>
In-Reply-To: <151872192828.7546.15103568221130514259@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/bXzewunqlDgoK4c3pDlkpg5YKv4>

Hi Elwyn, 

On 2/15/18, 2:12 PM, "Elwyn Davies" <elwynd@dial.pipex.com> wrote:

    Reviewer: Elwyn Davies
    Review result: Ready with Nits
    
    I am the assigned Gen-ART reviewer for this draft. The General Area
    Review Team (Gen-ART) reviews all IETF documents being processed
    by the IESG for the IETF Chair.  Please treat these comments just
    like any other last call comments.
    
    For more information, please see the FAQ at
    
    <https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.
    
    Document: draft-ietf-rtgwg-backoff-algo-07.txt
    Reviewer: Elwyn Davies
    Review Date: 2018/02/15
    IETF LC End Date: 2018/02/14
    IESG Telechat date: 2016/02/22
    
    Summary: Ready with nits. The draft does not refer to OSPFv3 - I am not sure if
    this is an oversight or because OSPFv3 already has this mechanism - either way
    it should be mentioned.  One question that occurred to me is whether the draft
    could be considered as updating the OSPFv2/v3 and ISIS standards (not that IETF
    has any control over ISIS).
    
    Major issues:
    None
    
    Minor issues:
    (Non-)Relation between HOLDDOWN_INTERVAL and *_SPF_DELAY values:  I noticed that
    Benjamin Kaduk's SECDIR review of this document
    (https://datatracker.ietf.org/doc/review-ietf-rtgwg-backoff-algo-07-secdir-lc-kaduk-2018-02-14/)
    was concerned that certain state transitions would never occur.  I looked at
    this and realized that his assumption that LONG_SPF_DELAY < HOLDDOWN_INTERVAL
    is not required by the document and s6 explicitly resiles from offering
    suggested default values.  Without this assumption, the state machine appears
    to be correct. Not being familiar with the consequences of setting the
    HOLDDOWN_INTERVAL relative to the *_SPF_DELAY, I am not sure if anything could
    be said about such consequences, but I think it would avoid other people making
    the same assumption as the SECDIR reviewer if it was explicitly stated that
    HOLDDOWN_INTERVAL is not necessarily bigger than any of the *_SPF_DELAY values
    and adding any advice from experience about how to choose appropriate values. 
    This might also avoid naive implementers shortcutting the state machine
    implementation if they made the same assumption.

The definition of HOLDDOWN_INTERVAL explicitly states: 

HOLDDOWN_INTERVAL: The time required with no received IGP events
   before considering the IGP to be stable again and allowing the
   SPF_DELAY to be restored to INITIAL_SPF_DELAY. e.g., 3 seconds.  The
   HOLDDOWN_INTERVAL MUST be defaulted or configured to be longer than
   the TIME_TO_LEARN_INTERVAL.

Perhaps it should be restated in the third paragraph of section 6. 
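
To illustrate the one ordering the definition above does require, here is a minimal sketch of a configuration check. It is not taken from the draft: the class name, field names, and every value except the 3-second HOLDDOWN_INTERVAL example are assumptions made for illustration. It deliberately does not compare HOLDDOWN_INTERVAL against any of the *_SPF_DELAY values, since the quoted definition states no such relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpfBackoffConfig:
    """Hypothetical container for the timer parameters, in milliseconds."""
    initial_spf_delay: int
    short_spf_delay: int
    long_spf_delay: int
    time_to_learn_interval: int
    holddown_interval: int

    def validate(self) -> None:
        # The only ordering stated in the HOLDDOWN_INTERVAL definition:
        # it must be longer than TIME_TO_LEARN_INTERVAL.
        if self.holddown_interval <= self.time_to_learn_interval:
            raise ValueError(
                "HOLDDOWN_INTERVAL must be longer than TIME_TO_LEARN_INTERVAL"
            )
        # No check against the *_SPF_DELAY values: the definition
        # does not require HOLDDOWN_INTERVAL to exceed them.


# Illustrative values only; none of these are defaults from the draft.
cfg = SpfBackoffConfig(
    initial_spf_delay=0,
    short_spf_delay=100,
    long_spf_delay=5000,          # larger than holddown_interval, still accepted
    time_to_learn_interval=500,   # assumed value
    holddown_interval=3000,       # the "e.g., 3 seconds" example
)
cfg.validate()
```

In this sketch a LONG_SPF_DELAY larger than HOLDDOWN_INTERVAL passes validation, which mirrors the reviewer's observation that the document does not require LONG_SPF_DELAY < HOLDDOWN_INTERVAL.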

    
    Requirements Language: Suggest s/RFC2119/RFC8174/ as there are uses of lower
    case versions of the reserved words.

I believe this was brought up before and we will make this change. If not, we'll change to the RFC 8174 language. 
    
    Default values for parameters:  There is a possible conflict between s3, where
    example values for the various interval parameters are given and s6 which
    states that no default values are specified in the document.  The difference in
    terminology may be too subtle for some implementers.

I would expect those implementing IGPs to know the difference between an "example" and a "default". 
    
    Aborting or otherwise of SPF calculation if an IGP event occurs while an SPF
    calculation is in progress.  A note about whether this should happen (if it is
    possible) would be desirable.

This is certainly out of scope and I'm not sure why anyone would deduce from this draft that an implementation should or shouldn't do this. 

    
    OSPFv3: Does this (not) equally apply to OSPF v3 for IPv6?  If so it should be
    mentioned and RFC 5340 included in the references.

Yes. This will be added as a reference. 
    
    s12:  I suspect (although it could be arguable) that the ISIS definition, RFC
    2328 (OSPFv2) and (if added) RFC5340 are normative as you need to understand
    how they work.  This work could even be considered to update these documents.

While implemented from the beginning, SPF Backoff is not specified by the IGP protocol specifications. 

Bruno and I will look at the editorial comments and let you know which ones we do and don't incorporate. 

Thanks,
Acee 
    
    Nits/editorial comments:
    General: The term 'back-off' may not be familiar to non-English mother tongue
    speakers and on first occurrence needs a little explanation for naive readers
    to indicate what it means and to what the back-off is being applied.  I have
    suggested some additional text to this end for the abstract and s1.
    
    Abstract:
    OLD:
       This document defines a standard algorithm to back-off link-state IGP
       Shortest Path First (SPF) computations.
    NEW:
       This document defines a standard algorithm to temporarily postpone or
       'back-off' link-state IGP Shortest Path First (SPF) computations to reduce
       the computational load on IGP nodes if network events occurring at closely
       spaced times would otherwise lead to multiple, essentially redundant
       recalculations of the routing tables.
    ENDS
    
    s1, para 1: s/at the same time/essentially at the same time/
    
    s1, para 2: s/new Shortest Path First (SPF)/new Shortest Path First (SPF)
    routing table/
    
    s1, para 2:
    OLD:
       experiencing multiple temporally close failures over a short
       period of time
    NEW:
       experiencing multiple temporally close failures (that is, eventuating over a
       short period of time)
    ENDS
    
    s1, para 2: There is a right bracket missing in the following and starting a
    clause with 'such as' and ending it with an ellipsis ('...') is redundant.
    >    such as LDP [RFC5036], RSVP-TE [RFC3209],
    >    BGP [RFC4271], Fast ReRoute computations (e.g.  Loop Free Alternates
    >    (LFA) [RFC5286], FIB updates...
    It is unclear to me where the bracket should go: maybe after [RFC5286] or at
    the end. Please clarify.
    
    s1, para 2: the phrase
    > This also reduces the churn on
    >    routers and in the network and.
    is useless, vague jargon.  The previous sentence expresses what I suspect is
    meant by 'churn', so this is redundant and can be omitted.
    
    s1, para 3:
    OLD:
    To allow for this, IGPs implement an SPF back-off algorithm.
    NEW:
    To allow for this, IGPs usually implement an SPF back-off algorithm that
    postpones or backs-off the running of the SPF calculation when the algorithm
    predicts that a run would be essentially redundant or even counter-productive
    because it appears that multiple closely timed routing-affecting events can be
    expected.
    ENDS
    
    s1, para 3: s/choosen/chosen/
    
    s2, last bullet: SPF_DELAY is not defined at this point:
    s/SPF_DELAY timers values/values for any timers used to back-off SPF
    calculations/
    
    s2, last bullet:  s/Even though/This is important even though/
    
    s3, para 1: Undesirable ellipsis:
    s/a metric change on a link or prefix.../and a metric change on a link or
    prefix./
    
    s3: Need to expand SRLG on first use - it isn't deemed to be well-known.
    
    s3, INITIAL_SPF_DELAY bullet: s/A very small delay to quickly handle link
    failure/A very small delay to quickly handle a single isolated link failure/
    
    s3, SHORT_SPF_DELAY bullet:
    OLD:
        SHORT_SPF_DELAY: A small delay to have a fast convergence in case of
        a single failure (node, SRLG..), e.g., 50-100 milliseconds.
    NEW:
        SHORT_SPF_DELAY: A small delay to provide fast convergence in the case of
        a single component failure (node, SRLG..) that leads to multiple IGP events,
        e.g., 50-100 milliseconds.
    ENDS
    
    s5/s5.1: There is currently no text in s5: this is generally considered
    inappropriate.  Suggest removing the first sentence in s5.1 ("This section
    describes the state machine.") and adding to s5: NEW: This section describes
    the abstract finite state machine (FSM) intended to control the timing of the
    running of SPF calculations in response to IGP events.
    
    s5.1, QUIET bullet: s/occured/occurred/
    
    s5.2:  There is no need for 3 expansions of FSM - the expansion can be moved to
    s5 as suggested above.
    
    s5.3 title: s/States/State/
    
    s6, next to last para: s/it's RECOMMENDED to play it safe/it is recommended
    that timer intervals should be chosen conservatively/ (this is an operational
    recommendation).
    
    s6, last para: s/RECOMMENDED/recommended/ (ditto).
    
    s7, para 1: s/is based on/is dependent on/, s/RECOMMENDED/recommended/
    (operational again)
    
    s8: Other documents (e.g., from vendors) have used the terms SPF wait time and
    SPF hold time.  It might be useful to mention that this document essentially
    provides ways to implement these settings.