Re: [mpls] MPLS-RT review of draft-pdutta-mpls-multi-ldp-instance@tools.ietf.org

"Dutta, Pranjal K (Pranjal)" <pranjal.dutta@alcatel-lucent.com> Mon, 17 September 2012 21:46 UTC

From: "Dutta, Pranjal K (Pranjal)" <pranjal.dutta@alcatel-lucent.com>
To: Eric Gray <eric.gray@ericsson.com>, "draft-pdutta-mpls-multi-ldp-instance@tools.ietf.org" <draft-pdutta-mpls-multi-ldp-instance@tools.ietf.org>
Date: Tue, 18 Sep 2012 03:15:50 +0530
Cc: "mpls@ietf.org" <mpls@ietf.org>, "mpls-chairs@tools.ietf.org" <mpls-chairs@tools.ietf.org>
Subject: Re: [mpls] MPLS-RT review of draft-pdutta-mpls-multi-ldp-instance@tools.ietf.org

Hi Eric,
Please see my answers inline.

Thanks,
Pranjal

________________________________
From: mpls-bounces@ietf.org [mailto:mpls-bounces@ietf.org] On Behalf Of Eric Gray
Sent: Friday, September 14, 2012 7:30 AM
To: draft-pdutta-mpls-multi-ldp-instance@tools.ietf.org
Cc: mpls@ietf.org; mpls-chairs@tools.ietf.org
Subject: Re: [mpls] MPLS-RT review of draft-pdutta-mpls-multi-ldp-instance@tools.ietf.org

Dear Authors,

The MPLS Chair(s) have asked me (and others) to review this draft to
determine - possibly among other things - whether this draft is ready for
adoption as a working group draft.

[Pranjal] Thanks for the detailed review of the draft.

As an overview, I feel that this draft needs a few important "direction
changes" before it will be ready for adoption by the working group, assuming
the working group decides that this is work that we have both interest and
capacity to work on.

[Pranjal] The "interest" part is understood, but I have some concerns about the
"capacity" aspect, which deserves more qualification: are we compromising real
demands from today's networks just because the WG can't cater to growing demands? If so, I am not
sure I am the right person to answer that in the context of this specific draft. IMO, it is
a question to be asked and addressed in a more general thread.

I first tried to determine if this work is appropriate for the working group.
It appears that the MPLS charter is very much out of date, so this item -
along with the 25+ drafts already adopted as working group drafts (in
various degrees of completion) - does not appear as a charter item.

[Pranjal] I re-read the current WG charter, quoted below. Did you mean the highlighted text as the reason this work is out of
scope of the charter?
"
The working group is also responsible for specifying the necessary management objects (e.g. as part
of MIB modules) and OAM techniques for the functionality specified in the base MPLS technology.

The first generation of the MPLS standards are largely complete,
and the current WG work items are:

- Define requirements, mechanisms and protocol extensions for
point-to-multipoint (P2MP) MPLS

- Define requirements, mechanisms and protocol extensions for
traffic engineered point-to-multipoint (P2MP) MPLS, including
soft preemption

- Define requirements and mechanisms for MPLS OAM

- Define an overall OAM framework for MPLS applications

- MPLS-specific aspects of traffic engineering for multi-areas/multi-AS
in cooperation with the CCAMP WG

- Determine (with CCAMP) what procedures are appropriate for evaluating
proposals to extend the MPLS and GMPLS protocols, and document these

- Document current implementation practices for MPLS load sharing

- Include extensions to the MPLS WG protocols and RFCs necessary to
create an MPLS Transport Profile (MPLS TP). The work on the MPLS TP will
be coordinated between the working groups (eg, MPLS, CCAMP, PWE3, and
L2VPN) that are chartered to do MPLS TP work."

Although multiple LDP instances is a generic concept, it has also been discussed recently in the context
of ldp-ipv6, that is, for the fate separation aspects. We discussed IPv4 and IPv6 fate separation at length on the
mailing list and heard operators asking for this as a requirement. We see the ldp-ipv6 work as an enhancement to the "first generation"
of MPLS standards (e.g. LDP RFC 3036/5036), and I can't see an explicit ldp-ipv6 item in the existing WG charter. If that is the
case, then may I ask how the ldp-ipv6 draft passed WG LC in the MPLS WG? I am just trying to see the correct
rationale behind your verdict that this document is out of charter.

The simple reason we came to the MPLS WG is that we are defining an approach to an existing standard (RFC 5036),
which is a product of the MPLS WG. We could have gone to PWE3 or MPLS WG if the draft were geared towards a specific
application only (and thus not a generic infrastructure concept).

Which is a good segue to the topic - "can the working group actually do
this - along with the work they currently have already."  If every WG
participating member were to read all of the currently adopted drafts
in preparation for each IETF meeting, they could expect to spend 2 to
3 weeks out of every 4 months doing nothing else.

Possibly this is a bit much to expect.  I suspect this is a question for the
active participants in the MPLS working group to consider as a general
issue.

[Pranjal] I would probably agree, but again I can't answer this as a co-author of an individual draft, and I am not sure
this is the appropriate thread in which to raise it.

As I understand it, the questions you are asking are these:

1. Is the WG equipped with sufficient bandwidth to cater to today's needs? I am not sure that I can personally support the idea that
we stop making technical progress, or standardize drafts with fundamental flaws, because the WG doesn't have
sufficient bandwidth. In that case the WG needs to decide whether it needs a blackout period
of "no draft submissions" in order to clear the existing backlog.

2. Is the WG doing the right job in standardizing what is needed in today's networks? I can't answer
that question; perhaps a WG-wide introspection is needed on the quality of ongoing work.

As a second general issue, this draft describes what I feel are fairly well
known rules that an implementation may follow to ensure that their
"cheating ways" are not discovered in some pathological way by other
implementations.

[Pranjal] I believe that in this context you are referring to the introduction of the Node-ID TLV, which lets nodes be multi-instance
aware. IMO, cheating is always bad because you may have a corner case left out somewhere. I would appreciate it if you could describe
precisely a method by which we can do multiple instances without the Node-ID TLV and without getting caught in any of the existing LDP-based
"applications".

This seems to be the major contribution in section 2 of this draft.

In this sense, I'd be more comfortable if this draft were targeted to
become a BCP, rather than a Standard.

[Pranjal] I am perfectly OK with dropping the Node-ID TLV from the draft and letting an implementation figure out certain things on its own
(such as not getting cheated by peers). Thus we can certainly make the draft a BCP or Informational.

But, enough of the general issues; on to the specifics of this draft...

Major issues:

The third paragraph of the introduction is incorrect.  When peering
with an LSR that uses two LSR IDs, it is possible for that LSR to do this
in a way that makes it  difficult (if not impossible) for a peer to detect
that the two LSR IDs identify LSR functions of the same physical node.

In fact, to ensure the highest probability of compatibility (or the
lowest probability of compatibility issues), any LSR implementation
that uses more than one LSR ID should do this.  That means they need
to use separate IP addresses for discovery, session establishment, etc.

This is an example of a standards implementation axiom that amounts
to this: "if you're going to cheat, be careful not to get caught."

If implementers follow this rule through proper use of addressing, and
a few other things, then support for multiple LDP sessions just works,
today - without requiring further standardization work.

[Pranjal] Yep, you are right. An implementation of multiple LDP instances exists and has been deployed in major networks
for about 5 years now. So it's a proven, robust technique. There have been 3 major deployment reasons for multiple LDP
instances so far:


 1.  Fate separation, or avoidance of head-of-line blocking, between the PW (Pseudowire) overlay and
     LDP P2P/MP2P tunnels. I don't want to get into gory details, but the key reasons for such a separation are:
     demultiplexing between intensive PW status signaling, PW OAM etc. and transport signaling, plus separation of
     management entities between the PW overlay and Transport.

 2.  Fate separation between LDP Multicast and Unicast traffic. You can complement this further with IGP multi-topology.

 3.  Non-fate-separated use case: separation of the LDP stack into independent management domains inside a node (e.g. east, west, north, south).
     Only one LSR-based session would be established to each region, and thus there is no case of parallel sessions. In such use cases, each LDP LSR-ID is
     mapped to a single IPv4 address, which is also the transport address of the sessions, and only that address is routable to the specific region (other
     regions don't see it). In this regional separation, LDP FECs are not stitched by default between domains, and distribution between domains
     (inter-region LSPs) happens through well-defined local policies (it's a local, implementation-specific matter though). This specific use case of multiple
     instances interoperates with vendors that do not support, or are not aware of, multi-instance. So, in your words, this is the case of "cheating without
     getting caught".



For the case where the label space portion of the LSR ID is zero (the
so-called platform-wide label space), the implementation that has
multiple LSR IDs needs to be careful about allocating labels in different
LDP Sessions - but this is an implementation robustness issue, not a
standards issue.

[Pranjal] This is why this draft describes what to do when running parallel sessions between two nodes (the fate-separation case), LDP session
capabilities, loop detection etc. The way I see it, with the existing methodologies of RFC 5036, an LDP-based application mustn't get into a pathological
case at any cost.

It is my opinion that the draft authors have not given sufficient thought
to what can be done using the current standards and a few common
sense implementation rules.

[Pranjal] We are aware of the current standard rules. As I mentioned, we are not talking theory but running code.
But when you want to use multiple instances for fate separation, I am not quite sure you can convince an implementer
to ignore the Node-ID TLV and still keep the existing LDP-based applications sane. Currently, the fate separation cases
(1 and 2 as discussed earlier) have been working well with the Node-ID TLV encoded as a Vendor Private TLV (single-vendor
implementation). I am perfectly OK with keeping the Node-ID TLV as Vendor Private as long as the "vendor being cheated"
can guarantee that it handles any loop cases that may arise due to misconfiguration etc. We can make the draft BCP/
Informational.
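
As an illustration only of the multi-instance awareness discussed above, the sketch below groups peer LDP Identifiers by the Node-ID each peer advertises, so that an application could notice when two sessions terminate on the same physical node. The encoding of the Node-ID TLV is not given in this message, so the Node-ID is modelled as an opaque string; the function name, the input shape and the example addresses are assumptions made for this sketch and are not taken from the draft.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_nodes(peers: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return Node-IDs that are advertised by more than one LDP Identifier.

    `peers` is an iterable of (ldp_identifier, node_id) pairs. Both values
    are treated as opaque strings (hypothetical representation).
    """
    by_node = defaultdict(set)
    for ldp_id, node_id in peers:
        by_node[node_id].add(ldp_id)
    # Keep only nodes that are reached through several LDP instances.
    return {node: sorted(ids) for node, ids in by_node.items() if len(ids) > 1}


# Hypothetical peers: two LDP instances on node-A, one on node-B.
peers = [
    ("192.0.2.1:0", "node-A"),
    ("192.0.2.2:0", "node-A"),
    ("198.51.100.7:0", "node-B"),
]
shared = find_shared_nodes(peers)
```

An application such as H-VPLS could consult a result like `shared` before stitching two pseudowires, and treat two endpoints on the same Node-ID as a possible loop.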

__________________________________________________________

I believe the direction taken in section 3 - toward greater visibility of the
fact that an LSR node has multiple LDP instances - is a mistake.

[Pranjal] I don't think I understood your point well. Why do you say that it's a mistake, when a physical node has multiple
LSRs and each LSR represents an instance within a single VPRN or Base Routing domain?

The authors have provided no example use case to support the assertion
that a loop may be formed as a result for "some applications" with a
properly implemented LSR.

[Pranjal] I agree, we fell short of explaining the applications (e.g. H-VPLS) in adequate detail. We can discuss a few applications at length
to show how a loop can occur, and can remove the Node-ID TLV/loop detection (each implementation can find smarter ways of its own).

In my opinion, this draft needs to provide explicit examples of where a
properly implemented/compliant LDP implementation would cause a
pathological outcome if it is implemented in such a way that a peer is
not able to detect that multiple LDP instances are implemented by a
common physical node.

[Pranjal] You have certainly raised a good point and I agree with you. We will do so.

Minor issues and Comments/Questions:

For the paragraph that starts on page 3 and continues onto page 4, the
first sentence has a number of problems and does not parse without
making some assumptions as to what the author(s) mean to say.

In addition, it would be useful if you could expand slightly on what you
mean by "routable"  IP addresses.  To be more precise, the 32-bit part
of the LSR ID that is typically used corresponds to an IP address (one of
potentially very many) that is assigned to the LSR node.


[Pranjal] Sure. We will describe this in detail. "Routability" of the LSR-ID is a prevalent case, especially in use case 3
mentioned above (regional separation). You use one loopback IP address for binding to everything in your OSS:
for IGP, an LDP LSR, OAM, etc. If the local IP address to which the LSR is bound is also used as the transport address
(or otherwise requires reachability), then the loopback IP must be routable.

In fact, common implementation considerations frequently lead to a
preference to use the router ID (typically based on an internal assigned
"loop-back" IP address) - for much the same reasons that routers use
these addresses (i.e. they are less likely to be impacted by removal of
a physical interface).

[Pranjal] I agree completely. AFAIK, most of the implementations I have seen so far, with a few exceptions, map a local
loopback IP address to the 32-bit part of the LSR-ID.

If an IPv4 address assigned to the LSR is used for this purpose, it is
clear that the address is not only routable but assigned to a specific
network entity.

[Pranjal] Yes, that's correct.
________________________________________________________

In section 2, it looks as if we are conflating the meaning of "label space"
(as a number - zero, in this case), "label context" (not necessarily tied
to the label-space number, but definitely tied to an LDP session) and
data plane.

[Pranjal] The proposal in the draft applies to the global label space, and as per RFC 5036 the global
label space is always identified by '0'. We are not saying that it doesn't work with per-interface label space.
We can clarify that explicitly in the draft. Thanks for pointing this out.
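
For readers less familiar with the identifier being discussed, here is a minimal sketch of splitting the six-octet LDP Identifier of RFC 5036 into its 4-octet LSR ID and 2-octet label space, where a label space of zero means the platform-wide (global) label space. The class and function names, and the example address, are assumptions made for this illustration.

```python
import ipaddress
import struct
from typing import NamedTuple


class LdpIdentifier(NamedTuple):
    lsr_id: ipaddress.IPv4Address  # first 4 octets
    label_space: int               # last 2 octets

    @property
    def is_platform_wide(self) -> bool:
        # Label space 0 denotes the platform-wide (global) label space.
        return self.label_space == 0

    def __str__(self) -> str:
        return f"{self.lsr_id}:{self.label_space}"


def parse_ldp_identifier(raw: bytes) -> LdpIdentifier:
    """Split a 6-octet LDP Identifier into LSR ID and label space."""
    if len(raw) != 6:
        raise ValueError("an LDP Identifier is exactly 6 octets")
    # Network byte order: 4 raw octets, then one unsigned 16-bit integer.
    lsr, space = struct.unpack("!4sH", raw)
    return LdpIdentifier(ipaddress.IPv4Address(lsr), space)


# Hypothetical identifiers built from a documentation address.
global_space = parse_ldp_identifier(bytes([192, 0, 2, 1, 0, 0]))
per_interface = parse_ldp_identifier(bytes([192, 0, 2, 1, 0, 5]))
```

Here `global_space` uses the platform-wide label space to which the proposal applies, while `per_interface` carries a non-zero label space.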

As long as implementations observe certain rules, there is no problem
in using the existing protocol specifications when labels are assigned in
association with an active session and apply to a common subset of
interfaces (possibly including all interfaces).

[Pranjal] Yes, that's correct. But I am not quite sure we can claim that
"being cheated" would cater to all applications running on LDP as an
Infrastructure protocol.

For example, as long as the same label is not issued by two (or more)
LSR instances with a different semantic meaning, that label does not
present any problems when used (as intended) in the common data
plane.

[Pranjal] In theory yes, but when we are making a robust implementation of a protocol stack, every negative case
needs to be considered (not an IETF design rule but a software design rule). Ideally, a peer shouldn't distribute
duplicate labels, but what if it does? A receiver can get any message defined in RFC 5036. I gave H-VPLS as
another example, where we can stitch a FEC 128 spoke to a FEC 129 mesh PW. In that case, if both are wrongly terminated
on the same node, then existing LDP procedures can't determine the semantics of the applications (e.g. a loop in the MAC FIB). Can
we guarantee that a well-behaved receiving LSR would remain sane? From an implementation perspective
we can't treat RFC 5036 alone as LDP; an implementation needs to take care of the larger picture, where
LDP becomes the infrastructure provider on which many solutions have been built.

The obvious method for doing this is for all LSR instances to share a
common label management function - for at least the case where
labels allocated may have meaning across multiple interfaces.  This
has been done in LDP implementations since before RFC 3036 (the
predecessor to RFC 5036) was published.

[Pranjal] "for at least the case where labels allocated may have meaning across multiple interfaces."
Do you mean to say "same" meaning or "different" meaning across multiple interfaces?

Note that this is an implementation choice.
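
A minimal sketch of the "common label management function" idea, assuming a single in-process allocator from which every LDP instance draws its platform-wide labels, so that no label value can be handed out twice with different meanings. The class name, instance names and FEC strings are hypothetical; the range limits reflect the 20-bit MPLS label field with values 0-15 reserved.

```python
from typing import Dict, Tuple


class SharedLabelManager:
    """One allocator for the platform-wide label space, shared by all LDP instances."""

    def __init__(self, first: int = 16, last: int = 1048575) -> None:
        self._next = first   # labels 0-15 are reserved
        self._last = last    # largest value of a 20-bit label
        self._owner: Dict[int, Tuple[str, str]] = {}

    def allocate(self, instance: str, fec: str) -> int:
        """Hand out the next unused label and record which instance/FEC owns it."""
        if self._next > self._last:
            raise RuntimeError("label space exhausted")
        label = self._next
        self._next += 1
        self._owner[label] = (instance, fec)
        return label

    def owner(self, label: int) -> Tuple[str, str]:
        """Return the (instance, FEC) pair that a label was allocated to."""
        return self._owner[label]


# Two hypothetical LDP instances on one node share the same manager.
manager = SharedLabelManager()
transport_label = manager.allocate("ldp-transport", "10.0.0.1/32")
pw_label = manager.allocate("ldp-pw", "pw-100")
```

Because both instances draw from one counter, `transport_label` and `pw_label` can never collide. The sketch does not cover releasing labels or persisting the allocation state.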
_________________________________________________________

NITs:

In the Introduction, in (I believe) the 4th sentence of the 2nd paragraph,
"4 octets" should be "first 4 octets" (the preceding sentence talks about
the 6-octet LSR ID and the next sentence talks about the last 2 octets).

The sentence, in the penultimate paragraph of section 1 (Introduction),
that starts "Suc next-hop addresses" was probably meant to say "Such
next-hop addresses" - and there is probably supposed to be a space
after the period in that sentence and before the first word ("Thus") of
the next sentence.

[Pranjal] Thanks. We will rectify the errors.
--
Eric