[mpls] Spencer Dawkins' Discuss on draft-ietf-mpls-tp-shared-ring-protection-05: (with DISCUSS and COMMENT)

Spencer Dawkins <spencerdawkins.ietf@gmail.com> Wed, 24 May 2017 20:10 UTC

From: Spencer Dawkins <spencerdawkins.ietf@gmail.com>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-mpls-tp-shared-ring-protection@ietf.org, Eric Gray <Eric.Gray@Ericsson.com>, mpls-chairs@ietf.org, Eric.Gray@Ericsson.com, mpls@ietf.org
Date: Wed, 24 May 2017 13:10:09 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/1pYXjxUv9y7PA3Y6Yxu41EO-igE>
Subject: [mpls] Spencer Dawkins' Discuss on draft-ietf-mpls-tp-shared-ring-protection-05: (with DISCUSS and COMMENT)

Spencer Dawkins has entered the following ballot position for
draft-ietf-mpls-tp-shared-ring-protection-05: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-mpls-tp-shared-ring-protection/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I want to thank the authors for a very readable draft. It was a pleasure
to review, and that's a high bar for the subject.

I have loads of questions, but my first set of questions is an expansion
of Alvaro's comment that I think rises to the level of a Discuss. Please
note that I'm asking questions, not proposing text changes, so I really
do want to discuss it.

---------- my first set of questions

In this text,

   Three typical ring protection mechanisms are described in this
   section: wrapping, short wrapping and steering.  All nodes on the
   same ring MUST use the same protection mechanism.

I would like to understand what happens if they don't. I'm asking mostly
as a way of encouraging guidance for operators debugging cases where the
nodes are not all using the same mechanism. I'm not asking for a full
mesh of possible misconfigurations, only for a sentence or two ("If they
aren't all using the same protection mechanism, the following things may
happen").

More broadly, I'd like to understand why wrapping and short wrapping are
both defined. It seems like the only functional difference is that short
wrapping adds less latency. Is that right?

24 pages in, I see this:

   o  In rings utilizing the wrapping protection, each node detects the
      failure or receives the RPS request as the destination node MUST
      perform the switch from/to the working ring tunnels to/from the
      protection ring tunnels if it has no higher priority active RPS
      request.

   o  In rings utilizing the short wrapping protection, each node
      detects the failure or receives the RPS request as the destination
      node MUST perform the switch only from the working ring tunnels to
      the protection ring tunnels.

so I'm pretty sure there are differences beyond what I was seeing,
earlier in the document.

And, of course, I'm not sure what the effect of choosing steering over
wrapping/short wrapping would be, for my users, but that can wait until
we talk about wrapping and short wrapping ...

At a minimum, I'd like to see guidance for operators in choosing among
the three protection mechanisms. Why would they choose any one of the
three?

I also note that this MUST seems to be repeated using different words in
section 5.1, as

   All nodes in the same ring MUST use the same protection mechanism,
   Wrapping, steering or short-wrapping.

If that's saying the same thing, one MUST is all you need.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

---------- all the other questions 

In this text,

   When the service LSP passes through the interconnected rings, the
   direction of the working ring tunnels used on both rings SHOULD be
   the same.  For example, if the service LSP uses the clockwise working
   ring tunnel on Ring1, when the service LSP leaves Ring1 and enters
   Ring2, the working ring tunnel used on Ring2 SHOULD also follow the
   clockwise direction.

I'm not understanding why this is a SHOULD, and not a MUST. If the
direction of the working ring tunnels used on both rings is not the same,
does this still work? 

If it still works, why does this matter? But, either way, you might
usefully say something about why this isn't always the right thing to do,
even if you just give one example. The point of SHOULD is that
implementers make their own informed decisions, so providing information
that will inform those decisions seems important.

I wanted to call out 

   Ring switches MUST be preempted by higher priority RPS requests.  For
   example, consider a protection switch that is active due to a manual
   switch request on the given link, and another protection switch is
   required due to a failure on another link.  Then an RPS request MUST
   be generated, the former protection switch MUST be dropped, and the
   latter protection switch established.

   MSRP mechanism SHOULD support multiple protection switches in the
   ring, resulting in the ring being segmented into two or more separate
   segments.  This may happen when several RPS requests of the same
   priority exist in the ring due to multiple failures or external
   switch commands.

as really good examples of the kind of text I think would help the places
in this document ("For example", "This may happen when") where no
examples are given. Thanks for providing those examples!
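
The preemption example quoted above can be sketched as follows. This is
only an illustration under stated assumptions: the function name
resolve() is hypothetical, the numeric codes are the MS and SF values
from the RPS Request table quoted later in this message, and the sketch
assumes that a larger code means a higher priority. It also ignores the
special handling of multiple MS requests quoted further down.

```python
# RPS request codes as listed in the table quoted later in this message.
MS = 6   # Manual Switch
SF = 11  # Signal Fail

def resolve(active, incoming):
    """Return (kept, dropped) lists of request codes.

    Assumption: a numerically larger code has higher priority.
    """
    if incoming > active:
        # Higher-priority request preempts the active switch.
        return [incoming], [active]
    if incoming == active:
        # Same priority: both switches coexist and the ring is segmented.
        return [active, incoming], []
    # Lower-priority request does not displace the active switch.
    return [active], [incoming]

# A manual switch is active, then a failure occurs on another link.
kept, dropped = resolve(MS, SF)
print(kept, dropped)
```

Under these assumptions the manual switch is dropped in favour of the
signal-fail switch, and two SF requests are both kept.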

Ouch. Do I understand from 

   o  Protection Switching Mode (M): This 2-bit field indicates the
      protection switching mode used by the sending node of the RPS
      message.  This can be used to check that the ring nodes on the
      same ring use the same protection switching mechanism.  The
      defined values of the M field are listed as below:

             +------------------+-----------------------------+
             |  Bits (MSB-LSB)  |   Protecton Switching Mode  |
             +------------------+-----------------------------+
             |       0 0        |         Reserved            |
             |       0 1        |         Wrapping            |
             |       1 0        |       Short Wrapping        |
             |       1 1        |         Steering            |
             +------------------+-----------------------------+

that you already have three protection mechanisms, and have only one
possible codepoint to allocate for any future optimizations? Assuming
that "0 0" can be unReserved ...
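
As an illustration of how the M field could be used for the consistency
check the draft mentions, here is a minimal sketch. The enum values come
from the table quoted above; the names ProtectionMode, mismatched_nodes
and the dictionary of received values are hypothetical and are not taken
from the draft.

```python
from enum import IntEnum

class ProtectionMode(IntEnum):
    """Values of the 2-bit M field, per the table quoted above."""
    RESERVED = 0b00
    WRAPPING = 0b01
    SHORT_WRAPPING = 0b10
    STEERING = 0b11

def mismatched_nodes(local_mode, received):
    """Return sorted node IDs whose advertised mode differs from ours.

    `received` maps a node ID to the raw M value taken from that node's
    RPS message (hypothetical data structure).
    """
    return sorted(
        node for node, m in received.items()
        if ProtectionMode(m & 0b11) != local_mode
    )

# Codepoints not bound to a protection mechanism: only the Reserved one.
spare = [m for m in ProtectionMode if m is ProtectionMode.RESERVED]

print(mismatched_nodes(ProtectionMode.WRAPPING, {"A": 1, "B": 2, "C": 1}))
print(len(spare))
```

With a 2-bit field and three mechanisms already assigned, the sketch
shows exactly one spare value, and only if Reserved can be reassigned.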

Could you clarify what "anyway" means in this text?

   When multiple MS RPS requests exist at the same time addressing
   different links and there is no higher priority request on the ring,
   no switch SHOULD be executed and existing switches MUST be dropped.
   The nodes MUST signal, anyway, the MS RPS request code.

I'm seeing that commands like LP, described in section 5.2.1.1, are
used in the document before these (I'm serious) helpful and clear
explanations appear. If it's possible to move section 5.2.1.1 up in the
document, that would be great, but if it isn't possible, a forward
pointer would be helpful to readers who don't already know what the
command abbreviations mean.

I'm really confused by this SHOULD:

   The PSC protocol [RFC6378] is designed for point-to-point LSPs, on
   which the protection switching can only be performed on one or both
   of the end points of the LSP.  The RPS protocol is designed for ring
   tunnels, which consist of multiple ring nodes, and the failure could
   happen on any segment of the ring, thus RPS SHOULD be capable of
   identifying and handling the different failures on the ring, and
   coordinating the protection switching behavior of all the nodes on
   the ring.

I suspect that's because it's not an RFC 2119 SHOULD, but if people think
it is, I wouldn't mind understanding why.

Section 5.3, "RPS and PSC Comparison on Ring Topology" is really helpful,
but it appears 43 pages in. Given that I'd expect people to be asking why
they should implement a new protection switching protocol when they've
already implemented PSC, I'd think this would be much more useful, early
in the document.

I'm somewhat confused about the code point allocation strategy in this
text:

   The RPS Request Field is 8 bits, the allocated values are as follows:

       Value       Description               Reference
      -------  --------------------------- ---------------
         0     No Request (NR)             this document
         1     Reverse Request (RR)        this document
         2     unassigned
         3     Exercise (EXER)             this document
         4     unassigned
         5     Wait-To-Restore (WTR)       this document
         6     Manual Switch (MS)          this document
        7-10   unassigned
        11     Signal Fail (SF)            this document
        12     unassigned
        13     Forced Switch (FS)          this document
        14     unassigned
        15     Lockout of Protection (LP)  this document
      16-254   unassigned
        255    Reserved

My first question is why the highest-priority RPS value is 15, given
that the field is 8 bits wide. If anyone ever needs to add a code point
higher than the highest priority code point, will that work well? I can
imagine code that says "if operation_priority is greater than
highest_priority, it's an error", for example.
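
To make the concern concrete, here is a minimal sketch of the two kinds
of check an implementation might use. The assigned values come from the
table quoted above; the function names are hypothetical, and whether a
receiver should accept currently unassigned values is an assumption of
this sketch, not something the draft states.

```python
# Assigned RPS request codes, per the table quoted above.
RPS_REQUESTS = {
    0: "NR", 1: "RR", 3: "EXER", 5: "WTR",
    6: "MS", 11: "SF", 13: "FS", 15: "LP",
}
HIGHEST_ASSIGNED = max(RPS_REQUESTS)  # currently the LP code
RESERVED = 255

def brittle_is_valid(code):
    """Rejects anything above today's highest-priority code point."""
    return 0 <= code <= HIGHEST_ASSIGNED

def tolerant_is_valid(code):
    """Accepts any 8-bit value except the Reserved one (an assumption)."""
    return 0 <= code < RESERVED

# A future request allocated above LP, for example 16.
print(brittle_is_valid(16), tolerant_is_valid(16))
```

An implementation written like brittle_is_valid() would treat any new
code point above 15 as an error, which is the situation the question
above is asking about.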

I may have other questions depending on your answer, but let's start
there.