Re: [mpls] Spencer Dawkins' Discuss on draft-ietf-mpls-tp-shared-ring-protection-05: (with DISCUSS and COMMENT)

"Dongjie (Jimmy)" <jie.dong@huawei.com> Thu, 22 June 2017 02:27 UTC

From: "Dongjie (Jimmy)" <jie.dong@huawei.com>
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>, The IESG <iesg@ietf.org>
CC: "mpls@ietf.org" <mpls@ietf.org>, "draft-ietf-mpls-tp-shared-ring-protection@ietf.org" <draft-ietf-mpls-tp-shared-ring-protection@ietf.org>, "mpls-chairs@ietf.org" <mpls-chairs@ietf.org>, Eric Gray <Eric.Gray@ericsson.com>
Date: Thu, 22 Jun 2017 02:27:08 +0000
Message-ID: <76CD132C3ADEF848BD84D028D243C9279371154E@NKGEML515-MBX.china.huawei.com>
References: <149565660910.8641.739437988075507213.idtracker@ietfa.amsl.com> <CAKKJt-fsy96LXCUR9PCjxFAq64tJbqtVm_ewQpOJ61x0uZrSjQ@mail.gmail.com>
In-Reply-To: <CAKKJt-fsy96LXCUR9PCjxFAq64tJbqtVm_ewQpOJ61x0uZrSjQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/fniIF1Bv_k0tox2_wr6MYiN-rUo>

Hi Spencer,

Thanks a lot for your feedback on the updated version.

Best regards,
Jie

From: Spencer Dawkins at IETF [mailto:spencerdawkins.ietf@gmail.com]
Sent: Wednesday, June 21, 2017 5:12 AM
To: The IESG <iesg@ietf.org>
Cc: mpls@ietf.org; draft-ietf-mpls-tp-shared-ring-protection@ietf.org; mpls-chairs@ietf.org; Eric Gray <Eric.Gray@ericsson.com>
Subject: Re: Spencer Dawkins' Discuss on draft-ietf-mpls-tp-shared-ring-protection-05: (with DISCUSS and COMMENT)

Jimmy has asked whether -06 works for me. The easiest way for me to reply to that is from my original ballot, but I'm looking at Huub's e-mails in this thread. So, see below.

On Wed, May 24, 2017 at 3:10 PM, Spencer Dawkins <spencerdawkins.ietf@gmail.com> wrote:
Spencer Dawkins has entered the following ballot position for
draft-ietf-mpls-tp-shared-ring-protection-05: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-mpls-tp-shared-ring-protection/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I want to thank the authors for a very readable draft. It was a pleasure
to review, and that's a high bar for the subject.

I have loads of questions, but my first set of questions is an expansion
of Alvaro's comment that I think rises to the level of a Discuss. Please
note that I'm asking questions, not proposing text changes, so I really
do want to discuss it.

---------- my first set of questions

In this text,

   Three typical ring protection mechanisms are described in this
   section: wrapping, short wrapping and steering.  All nodes on the
   same ring MUST use the same protection mechanism.

I would like to understand what happens if they aren't - and I'm asking,
mostly as a way of encouraging guidance for operators in debugging cases
where they're not all using the same mechanism. I'm not asking for a full
mesh of possible misconfigurations, only for a sentence or two ("If they
aren't all using the same protection mechanism, the following things may
happen").

Huub's additional text worked for me here.

More broadly, I'd like to understand why wrapping and short wrapping are
both defined. It seems like the only functional difference is that short
wrapping doesn't give you as much latency. Is that right?

24 pages in, I see this:

   o  In rings utilizing the wrapping protection, each node detects the
      failure or receives the RPS request as the destination node MUST
      perform the switch from/to the working ring tunnels to/from the
      protection ring tunnels if it has no higher priority active RPS
      request.

   o  In rings utilizing the short wrapping protection, each node
      detects the failure or receives the RPS request as the destination
      node MUST perform the switch only from the working ring tunnels to
      the protection ring tunnels.

so I'm pretty sure there are differences beyond what I was seeing,
earlier in the document.

I think Huub's proposed text addressed this.


And, of course, I'm not sure what the effect of choosing steering over
wrapping/short wrapping would be, for my users, but that can wait until
we talk about wrapping and short wrapping ...

And this.


At a minimum, I'd like to see guidance for operators in choosing among
the three protection mechanisms. Why would they choose any one of the
three?

Section 7 is what I was hoping for. It's likely that a forward pointer to this section, early in the document, would be helpful for your readers. Do the right thing.


I also note that this MUST seems to be repeated using different words in
section 5.1, as

   All nodes in the same ring MUST use the same protection mechanism,
   Wrapping, steering or short-wrapping.

If that's saying the same thing, one MUST is all you need.

So, that went away.

SO I CAN CLEAR MY DISCUSS ... but just to check on my comments :-)



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

---------- all the other questions

In this text,

   When the service LSP passes through the interconnected rings, the
   direction of the working ring tunnels used on both rings SHOULD be
   the same.  For example, if the service LSP uses the clockwise working
   ring tunnel on Ring1, when the service LSP leaves Ring1 and enters
   Ring2, the working ring tunnel used on Ring2 SHOULD also follow the
   clockwise direction.

I'm not understanding why this is a SHOULD, and not a MUST. If the
direction of the working ring tunnels used on both rings is not the same,
does this still work?

If it still works, why does this matter? But, either way, you might
usefully say something about why this isn't always the right thing to do,
even if you just give one example. The point of SHOULD is that
implementers make their own informed decisions, so providing information
that will inform those decisions seems important.

Huub's additional text was very helpful in explaining what's going on here.


I wanted to call out

   Ring switches MUST be preempted by higher priority RPS requests.  For
   example, consider a protection switch that is active due to a manual
   switch request on the given link, and another protection switch is
   required due to a failure on another link.  Then an RPS request MUST
   be generated, the former protection switch MUST be dropped, and the
   latter protection switch established.

   MSRP mechanism SHOULD support multiple protection switches in the
   ring, resulting in the ring being segmented into two or more separate
   segments.  This may happen when several RPS requests of the same
   priority exist in the ring due to multiple failures or external
   switch commands.

as really good examples of the kind of text I think would help the places
in this document ("For example", "This may happen when") where no
examples are given. Thanks for providing those examples!

Ouch. Do I understand from

   o  Protection Switching Mode (M): This 2-bit field indicates the
      protection switching mode used by the sending node of the RPS
      message.  This can be used to check that the ring nodes on the
      same ring use the same protection switching mechanism.  The
      defined values of the M field are listed as below:

             +------------------+-----------------------------+
             |  Bits (MSB-LSB)  |  Protection Switching Mode  |
             +------------------+-----------------------------+
             |       0 0        |         Reserved            |
             |       0 1        |         Wrapping            |
             |       1 0        |       Short Wrapping        |
             |       1 1        |         Steering            |
             +------------------+-----------------------------+

that you already have three protection mechanisms, and have only one
possible codepoint to allocate for any future optimizations? Assuming
that "0 0" can be unReserved ...

Huub's explanation helped me with this one.
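
For readers following along, here is a rough sketch of how the M field values in the table above could be used to check that ring nodes agree on the mechanism. It assumes the caller has already extracted the 2-bit value from the RPS message (the extraction is not shown in the quoted text), and the names `Mode`, `decode_mode`, and `mode_mismatch` are hypothetical.

```python
from enum import IntEnum


class Mode(IntEnum):
    # Bit patterns (MSB-LSB) from the quoted M field table.
    RESERVED = 0b00
    WRAPPING = 0b01
    SHORT_WRAPPING = 0b10
    STEERING = 0b11


def decode_mode(m_bits: int) -> Mode:
    """Map an already-extracted 2-bit M field value to a mode."""
    if not 0 <= m_bits <= 0b11:
        raise ValueError("M field is 2 bits wide")
    return Mode(m_bits)


def mode_mismatch(local: Mode, received_m_bits: int) -> bool:
    """True if a received RPS message advertises a different mode than ours."""
    return decode_mode(received_m_bits) != local


print(decode_mode(0b10).name)
print(mode_mismatch(Mode.WRAPPING, 0b11))
```

What a node does on a mismatch (alarm, ignore, log) is left out, since the quoted text only says the field "can be used to check".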


Could you clarify what "anyway" means in this text?

   When multiple MS RPS requests exist at the same time addressing
   different links and there is no higher priority request on the ring,
   no switch SHOULD be executed and existing switches MUST be dropped.
   The nodes MUST signal, anyway, the MS RPS request code.

Thanks for this one.


I'm seeing that the commands like LP described in section 5.2.1.1 are
used in the document before these (I'm serious) helpful and clear
explanations appear. If it's possible to move section 5.2.1.1 up in the
document, that would be great, but if it isn't possible, a forward
pointer would be helpful to readers who don't already know what the
command abbreviations mean.

Thanks for the forward pointer.


I'm really confused by this SHOULD:

   The PSC protocol [RFC6378] is designed for point-to-point LSPs, on
   which the protection switching can only be performed on one or both
   of the end points of the LSP.  The RPS protocol is designed for ring
   tunnels, which consist of multiple ring nodes, and the failure could
   happen on any segment of the ring, thus RPS SHOULD be capable of
   identifying and handling the different failures on the ring, and
   coordinating the protection switching behavior of all the nodes on
   the ring.

I suspect that's because it's not a 2119 SHOULD, but if people think it
is, I wouldn't mind understanding why.

Thanks for fixing this.


Section 5.3, "RPS and PSC Comparison on Ring Topology" is really helpful,
but it appears 43 pages in. Given that I'd expect people to be asking why
they should implement a new protection switching protocol when they've
already implemented PSC, I'd think this would be much more useful, early
in the document.

There's more text, and it looks helpful, but I still think it would be more useful, much earlier in the document. Do the right thing :-) ...


I'm somewhat confused about the code point allocation strategy in this
text:

   The RPS Request Field is 8 bits, the allocated values are as follows:

       Value       Description               Reference
      -------  --------------------------- ---------------
         0     No Request (NR)             this document
         1     Reverse Request (RR)        this document
         2     unassigned
         3     Exercise (EXER)             this document
         4     unassigned
         5     Wait-To-Restore (WTR)       this document
         6     Manual Switch (MS)          this document
        7-10   unassigned
        11     Signal Fail (SF)            this document
        12     unassigned
        13     Forced Switch (FS)          this document
        14     unassigned
        15     Lockout of Protection (LP)  this document
      16-254   unassigned
        255    Reserved

My first question is, why the highest priority RPS value is 15, given
that the field is 8 bits wide. If anyone ever needs to add a code point
higher than the highest priority code point, will that work well? I can
imagine code that says "if operation_priority is greater than
highest_priority, it's an error", for example.

I may have other questions depending on your answer, but let's start
there.

Huub's explanation helped a lot.
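
As an illustration of the concern above, a receiver could check a request code against the set of assigned values rather than comparing it with a "highest priority" constant, so that a later assignment above 15 would not be rejected by a range check. This is only a sketch under that assumption: the names `RpsRequest` and `classify` are hypothetical, and how unassigned codes should actually be handled is not stated in the quoted text.

```python
from enum import IntEnum


class RpsRequest(IntEnum):
    # Assigned values from the quoted RPS Request Field table.
    NR = 0     # No Request
    RR = 1     # Reverse Request
    EXER = 3   # Exercise
    WTR = 5    # Wait-To-Restore
    MS = 6     # Manual Switch
    SF = 11    # Signal Fail
    FS = 13    # Forced Switch
    LP = 15    # Lockout of Protection


def classify(code: int) -> str:
    """Classify an 8-bit request field value against the quoted table."""
    if not 0 <= code <= 255:
        raise ValueError("request field is 8 bits wide")
    if code == 255:
        return "reserved"
    try:
        return RpsRequest(code).name
    except ValueError:
        # Not in the assigned set; this sketch only labels it.
        return "unassigned"


for code in (15, 16, 255):
    print(code, classify(code))
```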

So, I'll go clear now, and thanks for working through my Discuss.

Spencer