Re: [RTG-DIR] Rtgdir last call review of draft-ietf-spring-oam-usecase-06

<Ruediger.Geib@telekom.de> Thu, 29 June 2017 07:33 UTC

From: Ruediger.Geib@telekom.de
To: jmh@joelhalpern.com
CC: spring@ietf.org, ietf@ietf.org, draft-ietf-spring-oam-usecase.all@ietf.org, rtg-dir@ietf.org
Date: Thu, 29 Jun 2017 07:33:00 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/K8wR8OfuB9wk6JemioZS4MoST8s>
Subject: Re: [RTG-DIR] Rtgdir last call review of draft-ietf-spring-oam-usecase-06

Hi Joel,

Thanks for your review. The draft editors' comments are marked [ED] and inserted into your text below.

Regards,

Ruediger

######

Minor:
    [JH]
    The introduction treats having a single centralized monitoring system as an
    unalloyed positive.  To set context properly, it would seem more
    appropriate to note that many operators find such central systems useful,
    and the approach described here enables that when desired.

[ED] Is the following text for the bulleted list OK (if you have a better term than
"large", please let us know)?
     "- The system described here allows setting up SR domain-wide centralised
      connectivity validation, which is useful in large network operator domains."

#####
     [JH]
    The reference in the introduction to IGP topology discovery is very
    confusing. "Adding MPLS topology awareness to an IGP speaking device hence
    enables a simple and scalable data plane based monitoring mechanism."  As
    noted later in the document, link-state IGPs provide topology awareness. 
    So what is this part of the introduction trying to say?  (Side-note, not
    all IGPs are link state, although the applicability of Babel or RIP to MPLS
    Segment Routing is clearly outside the scope of this document.)

[ED] Thanks for pointing this out. We propose to change the text and limit its scope to link-state IGPs:
"Topology awareness is an important feature of link-state IGPs deployed by operators of large networks. MPLS topology awareness combined with IGP topology awareness enables a simple and scalable data plane based monitoring mechanism."

#####

    [JH]
    In section 5.1 in discussing path trace the reference is to RFC 4379 which
    is a clear source for path trace.  However, the text refers to "tree
    trace".  While that may have become a common phrase for the usage, it is
    not used in RFC 4379.  The term should either be explained, given a
    suitable reference, or not be used.

[ED] We will replace "tree trace" with "path trace".

#####
   [JH]
   In section 5.3 on fault isolation, the text notes that the only difference
   between the test which succeeds and that which fails is the difference in
   the adjacency SID.  The text then goes on to say "Assuming the second probe
   has been routed correctly, the fault must have been occurring in R2 which
   didn't forward the packet to the interface identified by its Adjacency SID
   663."  That does not follow.  If the link has failed in an undetected fashion
   (either in one direction or both), R2 would be functioning fine and the
   symptom would be the same.  Remotely detecting the difference between R2
   failing to forward and the link not working seems a much harder task.

[ED] Yes, either the link or the router can be faulty. How about:
"...the fault is that for some (possibly unknown) reason SR packets to be forwarded from R2 via the interface identified by Adjacency SID 663 are lost."
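
The reasoning behind this wording can be illustrated with a minimal Python sketch. It assumes the PMS records, per probe, the label stack it sent and whether the probe returned. The names ProbeResult and suspect_segment and all label values other than 663 are hypothetical and do not come from the draft. The sketch only reports the segment on which packets are lost. It deliberately does not attribute the loss to the router or to the link.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProbeResult:
    """One PMS probe: the label stack sent and whether it came back."""
    label_stack: Tuple[int, ...]
    returned: bool


def suspect_segment(ok: ProbeResult, failed: ProbeResult) -> Optional[int]:
    """Return the label of the failed probe that differs from the successful one.

    Returns None unless exactly one label differs, because only then can
    the loss be tied to a single segment.
    """
    if not ok.returned or failed.returned:
        return None  # need one successful and one failed probe
    if len(ok.label_stack) != len(failed.label_stack):
        return None  # stacks are not comparable position by position
    diffs = [f for o, f in zip(ok.label_stack, failed.label_stack) if o != f]
    return diffs[0] if len(diffs) == 1 else None


# Hypothetical label values; only Adjacency SID 663 appears in the text above.
ok = ProbeResult(label_stack=(72, 662, 210), returned=True)
failed = ProbeResult(label_stack=(72, 663, 210), returned=False)
print(suspect_segment(ok, failed))
```

The result names the segment (here the Adjacency SID) and nothing more, which matches the proposed text: the packets are lost, for a possibly unknown reason.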

#####
    [JH]
    The claim that the PMS can / should (intent is ambiguous) notify the router
    when it detects a path failure raises a number of issues.   It is not at
    all clear what the router would do with the notification.  (e.g. If it
    removed the link from service, then future monitoring would not be able to
    detect that the link was working.)  Either this needs to become a
    significantly larger section, or (more likely) the text needs to be removed.

[ED] Your comment is to the point. Before removing the text, I'd like to offer a change (but I may not word it well, and unfortunately I can't check the text with my operational department for the coming weeks, as vacation season has started).

[ED] Let's call the symptom "forwarding not working". We know that it occurs. If there is one counter-measure which often works and has properties allowing automated execution (i.e. the situation can't deteriorate if the solution is executed automatically), "automation" may also be proposed here (with no discussion of solution details, only a limited number of general requirements). If the text below is useful, we should keep it, but I will not fight for it in the draft:

[ED]
"Path Trace and Failure Notification

Sometimes forwarding along a single path indeed doesn't work, while the control plane information is healthy. Such a situation may occur after maintenance work within a domain.
An operator may perform on-demand tests, but automated PMS path trace checks may also be set up (their scope may be limited to a subset of important end-to-end paths crossing the router or network section once the maintenance work there has been completed). Upon detection of a path which can't be used, the operator needs to be notified. A check that distinguishes a re-routing event from a path whose forwarding behavior doesn't correspond to the control plane information is necessary (but out of scope of this document).

Adding an automated problem solution to the PMS features only makes sense if the root cause of the symptom appears often, can be assumed to be unambiguous from its symptoms, and can be solved by a pre-determined chain of commands, and if the automated PMS reaction does no collateral damage. A closer analysis is out of scope of this document.

The PMS is expected to check control plane liveness after a path repair effort has been executed. It doesn't matter whether the path repair was triggered manually or by an automated system."
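
As a rough illustration of the general requirements in the proposed text, the four conditions can be read as a gate that must be fully satisfied before any automated reaction runs. The following Python sketch is an assumption made for illustration only. RepairCandidate, may_automate, pms_actions and the action names are hypothetical and are not defined by the draft. Only the automated repair case is modelled, while the text also expects the liveness check after a manual repair.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepairCandidate:
    """Operator's assessment of one known symptom (hypothetical model)."""
    occurs_often: bool             # root cause appears often
    unambiguous_symptoms: bool     # symptoms identify the root cause
    predetermined_commands: bool   # fixable by a fixed chain of commands
    no_collateral_damage: bool     # automated reaction cannot make things worse


def may_automate(c: RepairCandidate) -> bool:
    """Automation is only considered if all four conditions hold."""
    return all((c.occurs_often, c.unambiguous_symptoms,
                c.predetermined_commands, c.no_collateral_damage))


def pms_actions(path_usable: bool, c: RepairCandidate) -> List[str]:
    """List the PMS reactions for one monitored path, in order."""
    if path_usable:
        return []
    actions = ["notify_operator"]  # the operator is always notified
    if may_automate(c):
        # The liveness check follows the repair effort.
        actions += ["run_repair", "check_control_plane_liveness"]
    return actions


print(pms_actions(False, RepairCandidate(True, True, True, False)))
```

With any single condition unmet, the sketch falls back to notifying the operator only.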

############

[JH] Editorial:
    Chapter 7 is titled dealing with non-SR environments.  Which makes sense. 
    The text then switches to using "pre-SR" instead of "non-SR".  I would
    recommend that all uses of "pre-SR" be changed to "non-SR".

[ED] OK, this will be done in the next published version.

-----Original Message-----
From: Joel Halpern [mailto:jmh@joelhalpern.com]
Sent: Thursday, 22 June 2017 15:30
To: rtg-dir@ietf.org
Cc: spring@ietf.org; ietf@ietf.org; draft-ietf-spring-oam-usecase.all@ietf.org
Subject: Rtgdir last call review of draft-ietf-spring-oam-usecase-06

Reviewer: Joel Halpern
Review result: Has Nits

This is a rtg-dir requested review.

Summary: Ready for publication as an Informational RFC with some minor items that should be considered.

Major: N/A

Minor:
    The introduction treats having a single centralized monitoring system as an
    unalloyed positive.  To set context properly, it would seem more
    appropriate to note that many operators find such central systems useful,
    and the approach described here enables that when desired.

    The reference in the introduction to IGP topology discovery is very
    confusing. "Adding MPLS topology awareness to an IGP speaking device hence
    enables a simple and scalable data plane based monitoring mechanism."  As
    noted later in the document, link-state IGPs provide topology awareness. 
    So what is this part of the introduction trying to say?  (Side-note, not
    all IGPs are link state, although the applicability of Babel or RIP to MPLS
    Segment Routing is clearly outside the scope of this document.)

    In section 5.1 in discussing path trace the reference is to RFC 4379 which
    is a clear source for path trace.  However, the text refers to "tree
    trace".  While that may have become a common phrase for the usage, it is
    not used in RFC 4379.  The term should either be explained, given a
    suitable reference, or not be used.

   In section 5.3 on fault isolation, the text notes that the only difference
   between the test which succeeds and that which fails is the difference in
   the adjacency SID.  The text then goes on to say "Assuming the second probe
   has been routed correctly, the fault must have been occurring in R2 which
   didn't forward the packet to the interface identified by its Adjacency SID
   663."  That does not follow.  If the link has failed in an undetected fashion
   (either in one direction or both), R2 would be functioning fine and the
   symptom would be the same.  Remotely detecting the difference between R2
   failing to forward and the link not working seems a much harder task.

    The claim that the PMS can / should (intent is ambiguous) notify the router
    when it detects a path failure raises a number of issues.   It is not at
    all clear what the router would do with the notification.  (e.g. If it
    removed the link from service, then future monitoring would not be able to
    detect that the link was working.)  Either this needs to become a
    significantly larger section, or (more likely) the text needs to be removed.

Editorial:
    Chapter 7 is titled dealing with non-SR environments.  Which makes sense. 
    The text then switches to using "pre-SR" instead of "non-SR".  I would
    recommend that all uses of "pre-SR" be changed to "non-SR".