Re: [mpls] Still open: working group lst call on draft-ietf-mpls-seamless-mpls

Curtis Villamizar <curtis@ipv6.occnc.com> Thu, 23 January 2014 23:12 UTC

To: "maciek@cisco.com" <maciek@cisco.com>
From: Curtis Villamizar <curtis@ipv6.occnc.com>
In-reply-to: Your message of "Mon, 13 Jan 2014 20:28:16 +0000." <C70CBD5B-1FBD-438E-BC0B-A2D75A89F986@cisco.com>
Date: Thu, 23 Jan 2014 18:12:00 -0500
Cc: "mpls@ietf.org" <mpls@ietf.org>, "draft-ietf-mpls-seamless-mpls.all@tools.ietf.org" <draft-ietf-mpls-seamless-mpls.all@tools.ietf.org>, "mpls-chairs@tools.ietf.org" <mpls-chairs@tools.ietf.org>
Subject: Re: [mpls] Still open: working group lst call on draft-ietf-mpls-seamless-mpls

In message <C70CBD5B-1FBD-438E-BC0B-A2D75A89F986@cisco.com>
"Maciek Konstantynowicz (mkonstan)" writes:
 
> Curtis,
>  
> Pls see our comments inline, and let us know your thoughts.
> We have also posted updated ID.

There was very little change between the 04 version and the 05
version.

> On 15 Oct 2013, at 18:09, Curtis Villamizar wrote:
>  
> > 
> > Loa,
> > 
> > No details are given in IPR #686 (applicable to RFC 5283) for the
> > "Reasonable and Non-Discriminatory License to All Implementers with
> > Possible Royalty/Fee."  At the very least, some terms should be given.
>  
> Update from Bruno in addition to the email he sent on
> Date: 16 October 2013 08:43:46 GMT+01:00
>  
> <...>
> Tried for more than 6 months, to have my IPR team to update the IPR 853
> to make it clear that it _replaces/supersede _ 686.
> But they are not moving. It's all the more incredible that they have
> dropped the patent. So no, it's not even "No licence Required" it's even
> no IPR at all...
>  
> That being said, the above IPR was on RFC 5283, not on seamless MPLS
> draft. I had a discussion with Loa & Adrian on this, and their opinion
> is that no IPR declaration would be required for draft seamless mpls. So
> there is no real problem in the first place.
> <...>
>  
> > 
> > IPR #1920 and #2212 do provide details.
> > 
> > An alternate to the use of a prefix based LDP LSP is to use a prefix
> > based RSVP-TE LSP and carry the individual end-to-end LSP within it.
> > This is not mentioned in the draft.  See below for details.
>  
> The use cases, proposed design and protocol choices have been driven by
> the actual SP deployments.
>  
> Design based on the hierarchy of RSVP-TE LSPs may address the listed use
> cases. However the labeled BGP design with LDP DoD has been chosen due
> to the higher degree of out-of-the-box automation and operational
> simplicity as well as compatibility with the existing backbone and
> backhaul designs & deployments which use LDP and not RSVP-TE.
>  
> It also assumes relatively simple MPLS implementations on access nodes -
> RFC 7032 goes into much more detail there.

Could you please simply mention that RSVP-TE might be an alternative
and then state the assumptions above in the document as reasons for
using this approach.


> > Scaling numbers are given as:
> > 
> >  Number of Aggregation Domains: 100
> >  Number of Backbone Nodes: 1.000
> >  Number of Aggregation Nodes: 10.000
> >  Number of Access Nodes: 100.000
> > 
> > This section should state that very sparse connectivity among the set
> > of access nodes is expected.  For example, you would not expect each
> > access node to have an LSP to every other access node.  If so, more
> > than 10 such access nodes aggregated prior to a prefix based LSP would
> > exceed the limits of the MPLS 20 bit label space.
>  
> The requirement was to cater for service connectivity over transport
> LSPs per scaling numbers provided for listed deployment use cases. The
> required design was not to restrict the LSP based connectivity between
> the access, aggregation and backbone nodes, catering equally well for
> sparse and dense connectivity, but not the full-mesh. Full-mesh
> connectivity between all ANs will require each AN maintaining
> LSP that they initiate and terminate, complicating the AN
> implementation. 
> Luckily this is not the case in the listed use cases and the actual
> access deployments.
>  
> The result is the seamless MPLS design specified in this draft. And it
> does work for dense transport LSP connectivity between the access nodes
> without exhausting the MPLS 20-bit label space on any of the nodes by
> relying on LSP hierarchy provided by labeled BGP for inter-domain
> connectivity.
>  
> See section 5.2 for scalability analysis, and numerical examples for
> access node connectivity in section 5.2.1.5.
>  
> > 
> > On the other hand it would not be unreasonable to expect each of the
> > 10,000 aggregation nodes to be fully meshed or at least very densely
> > meshed. This can also create problems without some form of
> > aggregation as a worst case graph cut set with 5000 nodes on each side
> > would have 25,000,000 LSP.  A cutset of 2-5 nodes (typical core) could
> > not support this (exceeds the 20 bit label space) without aggregation.
>  
> We read your comment "without some form of aggregation" as meaning
> "without some form of hierarchy".
> If so, indeed this is why specified design used LSP hierarchy per
> earlier comment.

OK.  I reread parts of your draft and I understand how you are using
recursive LDP LFIB lookup to create a hierarchical label stack at the
ABR.
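
As a rough illustration of the recursion described above (a sketch only:
the tables, labels, and addresses are invented for the example and are
not taken from the draft), an ABR holding a labeled BGP route can
resolve the BGP next hop through an intra-domain transport LSP and push
both labels:

```python
from ipaddress import ip_address, ip_network

# Hypothetical labeled BGP routes held at the ABR:
# remote prefix -> (BGP next hop, BGP label). All values are invented.
BGP_LABELED = {
    ip_network("10.1.0.0/16"): ("192.0.2.1", 3001),
    ip_network("10.1.2.0/24"): ("192.0.2.2", 3002),
}

# Hypothetical intra-domain transport LSPs (LDP or RSVP-TE):
# next-hop address -> transport label.
TRANSPORT = {
    "192.0.2.1": 1001,
    "192.0.2.2": 1002,
}


def resolve_label_stack(dest):
    """Return the label stack (outermost label first), or None if unresolved."""
    addr = ip_address(dest)
    # Step 1: longest-prefix match among the labeled BGP routes.
    matches = [net for net in BGP_LABELED if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    next_hop, bgp_label = BGP_LABELED[best]
    # Step 2: recursive lookup of the BGP next hop in the transport table.
    transport_label = TRANSPORT.get(next_hop)
    if transport_label is None:
        return None
    # The transport label is outermost; the BGP label rides underneath it.
    return [transport_label, bgp_label]


print(resolve_label_stack("10.1.2.3"))
print(resolve_label_stack("10.1.9.9"))
```

In this picture the core only ever switches on the outer transport
label, which is why the hierarchy keeps core label state small.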

> > Some indication of how dense or sparse the connectivity would be
> > useful in this section (2.1.  Why Seamless MPLS).  
>  
> Indicative access node level connectivity has been described in section
> 5.2 Scalability Analysis, but we agree that it makes sense to give an
> indication in section 2.1

Access node connectivity is not mentioned in 5.2 as far as I can tell.
What you do have in "5.2.1.4. Summary" is mention that "The main
limitation is the MPLS connectivity requirements on the AN,
i.e. mainly the number of LSP needed on the AN."  At no point in this
section is there mention that this limit is less than #AN because in
the typical deployment there is a sparse mesh of connectivity among
access nodes.

You would really solve this if in "5.2.1.1. Introduction" you added a
variable which is the number of access nodes that any one access node
is connected to.  Perhaps call it #AMD for access mesh density and
explain what it is.  You have a magical 1k and 100 under assumptions
that comes out of thin air.  You can replace it with #AMD (or whatever
name you pick).  Then when you do the worked examples in
"5.2.1.5. Numerical application for use case #1" and
"5.2.1.6. Numerical application for use case #2" you can include
values for #AMD.  I do think that right now the 1k and 100 numbers are
reasonable values but I think in the future these may increase.
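
To make the suggestion concrete, here is a minimal sketch of how an
explicit access mesh density parameter could be carried through a worked
example.  The function name and the returned quantities are assumptions
made for illustration; the draft does not define them.  The 100,000
access node count and the 1k and 100 figures are the ones discussed in
this thread.

```python
def access_mesh(num_an, amd):
    """Summarize access-node connectivity for a given access mesh density.

    num_an: total number of access nodes in the network.
    amd:    number of remote access nodes each access node connects to
            (the "#AMD" variable proposed above; hypothetical name).
    Returns (LSPs per AN, total unidirectional AN-to-AN LSPs,
             fraction of a full mesh).
    """
    if not 0 <= amd <= num_an - 1:
        raise ValueError("amd must be between 0 and num_an - 1")
    lsps_per_an = amd                 # one LSP per remote AN reached
    total_lsps = num_an * amd         # counted once per direction
    density = amd / (num_an - 1)      # 1.0 would be a full mesh
    return lsps_per_an, total_lsps, density


for amd in (100, 1_000):
    per_an, total, density = access_mesh(100_000, amd)
    print(f"#AMD={amd}: {per_an} LSPs per AN, {total} total, "
          f"{density:.4%} of a full mesh")
```

Stating the density this way shows directly how sparse the assumed mesh
is, and lets a reader substitute larger values if connectivity grows.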

The reason that the 1k number looks to be out of thin air is that
sections 2.2.2 and 2.3.2 contain nothing but a table with no
explanation, and there is no description of the assumption or
connection to the 1k number later on in section 5.2.  See below on
2.2.2 and 2.3.2.

[OT and IMO - the access node connectivity due to legacy circuits is
expected to continue to drop even though actual TDM infrastructure is
disappearing and being replaced with IP and PW for remaining TDM, FR,
and ATM.  OTOH - l3vpn connectivity is growing.  Many potential l3vpn
customers know they can use plain old Internet and tunnel traffic
themselves and encrypt it.  The value to them of l3vpn is mostly
preferred treatment of traffic.  As cost of l3vpn service drops, that
value will outweigh the cost of the service until penetration
increases (in which case cost may drop more).]

btw- Are case #1 and case #2 the numbers that specific customers asked
you to use?  A simple "yes/no" question - no need to name them.

> Following text added in section 2.1:
>  
> Multiple Service Providers plan to deploy networks with 10k to 100k MPLS
> nodes, with varying levels of MPLS LSP connectivity between those nodes
> - sparse-mesh in access, partial-mesh in aggregation and full-mesh in
> core. This is typically at least one order of magnitude higher than
> typical deployments and may require a new architecture.

OK.

> > It may be worth
> > creating a new subsection (2.2 Scaling Goals) and going into a little
> > more detail.
>  
> We believe this comment is addressed by above addition in section 2.1
> and section 5.2. Scalability Analysis. Let us know if this does address
> your comment.

Please consider the comments I've made above.

I'm also not sure what "IGP Control Plane = 2" and "IP FIB = 2" mean
in table 1 and table 2.  You also have "LDP Control Plane = 200" (or
1000) and "LDP FIB = 200" (or 1000).  It is not clear to me what it
means for an access node to have 200 control planes.  If the access
node has only default routes I can see where "IP FIB = 2" would come
about but that assumption is not stated.  It is unclear what "IGP
Control Plane = 2" means.  Please explain in the document.

It is also more common AFAIK to state how many VRFs the access node
needs and also how many routes per VRF on average.  Since the access
nodes have a high fanout to CPE nodes, the number of IP FIB entries
would be the two conditional default routes plus at least one route
per CPE attachment, plus VRF routes which should be counted
separately.  If it is assumed that the aggregation nodes hold the VRF,
that should be stated.

If you are just counting core facing IP routes, then say so in the
document.

BTW- the correct MPLS term is ILM.  IFIB is a vendor term and so far
does not appear in any RFCs.  s/IFIB/ILM/g please.

> > Then there is the question as to whether this draft should even go
> > forward at all.
>  
> Can you pls be more specific why you don't see this draft proceeding ?
>  
> There are production implementation by both vendors and providers of
> Seamless MPLS design as specified in this draft.

Now that I see how you are doing the core hierarchy I see how this
works.

> > It is worth noting that a full mesh of 10.000 RSVP-TE LSP is feasible
> > using hierarchy.  Within the core of 100 nodes, PSC-4 can be used.
> > Each of the 1,000 backbone nodes can create a PSC-3 LSP to each other
> > backbone node (using above terminology, "edge" nodes in more common
> > core-edge-access or core-edge-aggregation-access terminology).  If on
> > average there are 10 backbone nodes per core node pair, then each core
> > node has on the order of 10,000 ILM entries (about 20,000 if you
> > consider 50 pairs of core nodes, each serving 20 nodes).  There are
> > only 100 LSP from each core node facing the core side, so FRR can be
> > very effectively deployed.  The same holds for a full mesh of the
> > 10.000 aggregation nodes.  If each of the 1,000 backbone nodes are
> > deployed in pairs serving on average 20 nodes per pair, then an ILM
> > size on the order of 200,000 is needed.  The PSC-3 LSP used to reach
> > the far side backbone node can be used, yielding only 1,000 LSP facing
> > the core per backbone node.  Again, FRR can be used, with the protect
> > path using the alternate core node of the designated pair.
> > 
> > In this scenario the access nodes may or may not be full meshed.  If
> > they are full meshed, then they need to be able to support 100,000 DoD
> > mode LSP.  If they are more sparsely meshed, they can support a lower
> > number of DoD mode LSP.
>  
> We do not disagree that alternative design approach based on RFC 4206
> and related h-LSP RFCs may be applied to address the LSP connectivity in
> this environment.
>  
> However we believe the design specified in the Seamless MPLS draft
> better meets described requirements including better deployment
> flexibility, better scaling and easier troubleshooting.

As I said before, the draft would be greatly improved if you mentioned
that you have considered this alternative and why you took the
approach that you took.

> > If RSVP-TE is used in this way, there is no need for aggregating LDP
> > LSP.  The LDP LSP are aggregated into RSVP-TE LSP and further
> > aggregated in PSC-3 LSP and PSC-4 LSP in tiers closer to the core.
>  
> The draft does not propose aggregating LDP LSPs. In fact the design
> enables the operator to choose the transport LSP signalling protocol,
> LDP or RSVP-TE, per domain, per section 4.4. Intra-Domain Routing.

OK.  I missed that mention of RSVP.  We may have been arguing for
similar solutions, but your solution adds the recursive LDP route
lookup when only a prefix route is present, which results in a label
stack.

This is a very key aspect of your proposal and it is not well
highlighted in section 4.5 where there is only the one word mention of
hierarchy.  Perhaps you could add a forward reference from that point
such as a sentence saying "The mechanism for this hierarchy is defined
in Section 5.1.3" and then change the title of 5.1.3 from hierarchy to
"Hierarchy based on recursive LDP route lookup".  This is AFAIK the first
RFC mention of this technique and it is quite well buried.  Also in
this section please mention that router "I" could be an RSVP LSP or
LDP LSP that reaches a prefix at the other ABR.

> > There are ultimate scaling limits to either approach, imposed by the
> > 20 bit label space.  If a full mesh is needed at a given tier T with N
> > nodes, then the nodes in tier T-1 aggregating tier T needs an ILM size
> > of N times the number of T nodes the T-1 node serves.  So for example,
> > with a full mesh of 100,000 tier T nodes if each tier T-1 node could
> > aggregate 8 tier T nodes without exceeding the 20 bit label space size
> > (ILM=800,000 in this case), but could not aggregate 20 tier T nodes
> > (ILM=2,000,000).  If using RSVP-TE in the core, the core is limited by
> > the worst case cut set, but core sizes of on the order of hundreds are
> > OK (but smaller is better).
>  
> In line with your comments the scaling limits are imposed not only by
> MPLS 20-bit label space, but also by the number of supported LFIB
> entries on specific MPLS node.
>  
> Design in this draft relies on labeled BGP control plane to scale the
> MPLS label distribution and to optimize the MPLS data plane on ABR nodes
> by installing only the local labelled routes in its LFIB, as described
> in section 5.1.7.

Now that I see your LDP based hierarchy works I see how the scaling
works out.
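
The label space arithmetic from the quoted paragraph can be checked with
a few lines.  This is only a sketch of the numbers given in this thread;
the function names are made up for the example, and the small set of
reserved label values is ignored.

```python
# A 20-bit MPLS label allows at most 2**20 distinct values per label space.
# (Reserved label values are ignored here, so the real limit is slightly lower.)
LABEL_SPACE = 2 ** 20


def ilm_entries(n_tier_nodes, served):
    """ILM size at a tier T-1 node serving `served` tier T nodes when
    the N tier T nodes are fully meshed: N times the nodes served."""
    return n_tier_nodes * served


def fits_label_space(entries):
    """True if the entry count does not exceed the 20-bit label space."""
    return entries <= LABEL_SPACE


def max_served(n_tier_nodes):
    """Largest number of tier T nodes one tier T-1 node can aggregate."""
    return LABEL_SPACE // n_tier_nodes


for served in (8, 20):
    entries = ilm_entries(100_000, served)
    print(served, entries, fits_label_space(entries))

print("max tier T nodes per T-1 node:", max_served(100_000))

# Worst-case cut set with 5000 fully meshed nodes on each side.
print("cut set LSPs:", 5000 * 5000, fits_label_space(5000 * 5000))
```

With 100,000 fully meshed tier T nodes the limit works out to 10 nodes
per aggregating node, which matches the earlier remark that more than 10
would exceed the label space.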

> > Using either RSVP-TE or LDP for aggregation the outermost T-max tier
> > could have a million nodes or more as long as they were sufficiently
> > sparsely connected.  This limit is independent of whether aggregation
> > is via LDP or RSVP-TE.
>  
> Similarly, the scale of the design in this draft is restricted by the
> scale of AN at the outskirts of the MPLS network and their connectivity
> needs driving the scale of neighbouring AGNs.

We are in agreement.  Your text was simply not clear to me.  I'm not
usually known to be more clueless than the average reader (maybe
depends on who you ask, but OT) so it seems that some improvements in
document clarity wouldn't hurt.

I suspect that few people actually thoroughly read your document and
thought about where your numbers came from and that may have more to
do with the lack of anyone else questioning your numbers (or maybe I *am*
just really dense - I hope not).

> > With RSVP-TE used for aggregation, T-LDP is used among the sparse mesh
> > in the outermost T-max tier.  A T-LDP session is only needed if FEC
> > information needs to be exchanged via LDP to support an underlying
> > service, otherwise RSVP-TE alone could be used.  Using T-LDP the
> > number of T-LDP TCP sessions could be a limiting factor for very
> > highly connected tier T-max nodes.  Either TCB state or the 16 bit TCP
> > port limit could be the limit depending on how much RAM the T-max node
> > has.  OTOH if a tier T-max node out on the fringes is supporting on
> > the order of 50K T-LDP sessions, it can use multiple IP addresses to
> > get around the 16 bit TCP port number issue.
>  
> In the draft, labeled BGP is used for labeled routes, providing excellent
> scaling properties in the control plane.

Yes.  I agree.  It was not clear to me at first how your hierarchy was
working in the core with the labeled BGP routes exchanged but not
installed in the core.

At some point in the document you should mention that the labeled BGP
routes are not installed in the core.  I searched the document for the
word BGP (the word BGP is in a lot of places) and there does not seem
to be any mention that the BGP labeled routes are installed at the ABR
and carried across the core but are not installed in the core.

Again, this is a key aspect of the proposal and it has gone unstated.

> > LDP could be extended to avoid the need for T-LDP sessions using LDP
> > distribution of labels that will make use of RSVP-TE LSP.  A DoD
> > request would normally create a series of label bindings and swaps.
> > If a full mesh of RSVP-TE LSP is known to exist within a prefix (ie:
> > tier T), then the LSR can return a mapping of FEC,label,addr with its
> > address.  This mapping can be passed back and at each hop that
> > actually does have a direct path to that address the mapping can be
> > installed and the RSVP-TE LSP used as an outer label, even if there is
> > no T-LDP session to that address.  (btw- AFAIK no such extension
> > exists, I'd be happy to be wrong about that).
> > 
> > IMHO using RSVP-TE is a viable and IMHO better solution for the
> > problem posed in this draft.  
>  
> It is opinion of the authors of this draft and WG members supporting
> this draft that the design described in the draft is more optimal for
> scaling MPLS into large access and aggregation deployments compared to
> RSVP-TE with hierarchical LSPs. The draft also efficiently accommodates
> capabilities of access device and the operational aspects.

OK.  Now I see how your approach works and I can see the benefits.
Prior to this discussion it was not clear from reading the draft.

> > It is also not encumbered by IPR AFAIK.
>  
> IPRs are provided on non-discriminatory terms, per IETF standard.

Yes.  I've seen that.

> > I don't think the draft provides a good solution.
>  
> Per earlier note, the design specified by this draft has been accepted
> by a number of operators for production deployments.

Now that I better understand how your proposal works, I withdraw my
"not a good solution" comment and replace it with "your draft is not a
clear description of your solution, but that can be easily fixed".

Please consider making the changes that I have suggested to improve
the document clarity.

> /maciek

Regards,

Curtis


> > Curtis
> > 
> > 
> > 
> > In message <525CD992.2020800@pi.nu>
> > Loa Andersson writes:
> > 
> > Working Group,
> > 
> > I will keep this wglc open until October 21st, there are several
> > reasons.
> > 
> > - the subject line did not explicitly say that this was a working group
> >  last call
> > - we had a very late IPR disclosure, and we are still looking into that.
> >  We should like to draw the attention to the expectation that IPRs
> >  need to be disclosed in a timely fashion after your name appears on
> >  a document, we are talking days or weeks, rather than months; and
> >  certainly not years
> > - we have not seen any comments on the list, we are therefore now also
> >  asking if there is support to progress the draft.
> > 
> > /Loa
> > 
> > 
> > 
> > On 2013-09-26 13:37, Loa Andersson wrote:
> >> Working Group,
> >> 
> >> this is to start a two week+ working group last call on
> >> draft-ietf-mpls-seamless-mpls-05.
> >> 
> >> Please send your comment to working group mailing lists (mpls@ietf.org).
> >> 
> >> We did an IPR poll on this document prior to starting the wglc.
> >> All the authors responded to the IPR poll that they are not aware of
> >> any IPR's relating to this document other than the one already
> >> disclosed.
> >> 
> >> There are no IPRs disclosed directly against this document, disclosure
> >> #1920 was disclosed against an earlier individual version of the
> >> document.
> >> 
> >> It has also been pointed out that one of the components in the Seamless
> >> MPLS architecture is derived from RFC 5283 and that IPR disclosures
> >> # 686 and # 853 are applicable.
> >> 
> >> The working group last call will end Friday October 11, 2013.
> >> 
> >> /Loa
> > 
> > -- 
> > 
> > 
> > Loa Andersson                        email: loa@mail01.huawei.com
> > Senior MPLS Expert                          loa@pi.nu
> > Huawei Technologies (consultant)     phone: +46 739 81 21 64