Re: [mpls] AD review of draft-ietf-mpls-forwarding

Curtis Villamizar <curtis@ipv6.occnc.com> Mon, 27 January 2014 04:58 UTC

Return-Path: <curtis@ipv6.occnc.com>
X-Original-To: mpls@ietfa.amsl.com
Delivered-To: mpls@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id F13BE1A01AB for <mpls@ietfa.amsl.com>; Sun, 26 Jan 2014 20:58:57 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.437
X-Spam-Level:
X-Spam-Status: No, score=-2.437 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RP_MATCHES_RCVD=-0.535, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id AEdEi8WvrPum for <mpls@ietfa.amsl.com>; Sun, 26 Jan 2014 20:58:54 -0800 (PST)
Received: from maildrop2.v6ds.occnc.com (maildrop2.v6ds.occnc.com [IPv6:2001:470:88e6:3::232]) by ietfa.amsl.com (Postfix) with ESMTP id 129ED1A01A4 for <mpls@ietf.org>; Sun, 26 Jan 2014 20:58:53 -0800 (PST)
Received: from harbor3.ipv6.occnc.com (harbor3.v6ds.occnc.com [IPv6:2001:470:88e6:3::239]) (authenticated bits=128) by maildrop2.v6ds.occnc.com (8.14.7/8.14.7) with ESMTP id s0R4woWw074790; Sun, 26 Jan 2014 23:58:50 -0500 (EST) (envelope-from curtis@ipv6.occnc.com)
Message-Id: <201401270458.s0R4woWw074790@maildrop2.v6ds.occnc.com>
To: adrian@olddog.co.uk
From: Curtis Villamizar <curtis@ipv6.occnc.com>
In-reply-to: Your message of "Sun, 26 Jan 2014 18:02:08 +0000." <005601cf1ac0$bc2ea410$348bec30$@olddog.co.uk>
Date: Sun, 26 Jan 2014 23:58:50 -0500
Cc: mpls@ietf.org, draft-ietf-mpls-forwarding.all@tools.ietf.org
Subject: Re: [mpls] AD review of draft-ietf-mpls-forwarding
X-BeenThere: mpls@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
Reply-To: curtis@ipv6.occnc.com
List-Id: Multi-Protocol Label Switching WG <mpls.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/mpls>, <mailto:mpls-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/mpls/>
List-Post: <mailto:mpls@ietf.org>
List-Help: <mailto:mpls-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/mpls>, <mailto:mpls-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Jan 2014 04:58:58 -0000

In message <005601cf1ac0$bc2ea410$348bec30$@olddog.co.uk>
"Adrian Farrel" writes:
> 
> Hi,
>  
> Thank you for a really thorough and clear document.
>  
> I have done my usual AD review upon receiving the publication request 
> for this document. The purpose of the review is to catch any issues that
> might otherwise show up in IETF last call and IESG evaluation. In this
> case it is also an extra pair of eyes on what is a very detailed 
> document.
>  
> I have a number of comments below. A few are editorial nits (sorry) and
> some are questions for clarification of the text. There is very little 
> where I come even remotely close to saying "wrong" or missing".
>  
> It looks to me that a new revision would be useful just to mop up these
> comments, so I have put the document into "Revised I-D Needed" state and
> will wait to see the next revision before starting the IETF last call.
> Please debate any issues where you disagree with me or think that no
> document change is needed.
>  
> Thanks for the work,
> Adrian


Adrian,

Thanks for the review.  Better to get the changes in now than later.

There are a few dicussion points below before coming out with a new
revision of document.

> ===
>  
> idnits notes that...
>  
>   == Outdated reference: draft-ietf-pwe3-vccv-impl-survey-results has been
>      published as RFC 7079

Got it.

> ---
>  
> Andy may want to update his coordinates.

Approximate coordinates updated as per Andy's most recent request.

> ---
>  
> Your acronym list is commendably thorough, but a little enthusiastic.
>  
> AC is only used once and then only at the point of expansion.
> CE doesn't appear to be used at all
> FEC only seems to be used in this document in the context of 
>   Forwarding Equivalence Classes (LDP)

If I provide a section with a list of acronyms, do I still have to
expand on first use.  If so, AC, NSP, OAM, and a few others appear
before that section.

AC is used in "Specific pseudowire AC and NSP are out of scope."

I included CE since PE and P are included.

> The following are all well-known acronyms according to
>   http://www.rfc-editor.org/rfc-style-guide/abbrev.expansion.txt
>   so they don't need to be included
> BGP
> CPU
> DDoS
> DoS
> GMPLS
> IANA
> IP
> IPv4
> IPv6
> LDP
> MPLS
> NTP
> QoS
> RTP
> TCP
> UDP
> WG

Given that this takes up only 17 lines in a 50 page document I'd
rather be thorough.  The audience here is assumed to include but not
be limited to people who are not quite as familiar with MPLS, and are
somehow involved in chips, systems, or eval, and need guidance
regarding MPLS forwarding (though it might be a deep dive into the
guide book and I pitty the MPLS newbie starting here or anywhere).

> I think Curtis may have heard this before :-)
> The "preferred" (by the RFC editor) expansion of ECMP is
> "Equal-Cost Multipath"

The form without the hyphen is more common, even among recent
documents.  I prefer to keep it without the hyphen.

> Several acronyms are qualified with (LDP) (CE, FEC, P, PE). I haven't
> checked the usage in this document, but it seems to me that the terms
> might have wider applicability. Indeed, a quick search revealed 
> "RSVP-TE PE".

I will change this to "(LDP, RSVP-TE, other protocols)" in P, PE, CE.

> The third expansion of PSC could have a reference to RFC 6378. I think
> that to avoid confusion, you need to expand this in the text when you 
> use it. You do this well everywhere except in Section 4.7 T#32, and in
> this acronym list itself at E-LSP and L-LSP.

Reference to RFC 6378 added.

In T#32, the context MPLS-TP AIS/RDI and PSC should make the choice of
which PSC is meant sufficiently obvious.  I'll expand it anyway and in
the one other case where not expanded on first use in a section.

It should be obvious in E-LSP and L-LSP from the context plus the
EXP-Inferred-PSC is the official expansion of E-LSP.  If not obvious,
the reader can look at RFC3270 since that is in the same line of text.

> ---
>  
> Section 1.3 bullet 5
>  
>    5.  The implementer and system designer MUST support pseudowire
>        control word (CW) if MPLS-TP is supported or if ACH [RFC5586] is
>        being used on a pseudowire.
>  
> The wording is a bit odd. "The implementation and system design..."?
>  
> Ditto bullets 6 and 7

Target audience is explained in Section 1.4.  If you like I can flip
Section 1.3 and 1.4 so target audience is first, then use of the roles
called for in the target audience section won't seem quite so odd.

> ---
>  
> Section 1.3
>  
> While there is not wrong with the statements made in the bullets, some
> of the later ones refer to recent additions to the MPLS suite. Yet the
> list is presented as "there were some misconceptions." Clearly the
> early silicon did not have misconceptions about the inclusion of entropy
> labels. 
>  
> Just tweak the words at the top of the list?

I'd like to keep that as is and split into two lists.  The second list
would have the last two items (fat-pw and EL).  The first list would
end with "implement CW" (sic).

 OLD

   6.  The implementer and system designer SHOULD support adding a
       pseudowire Flow Label [RFC6391].  Deployments MAY enable this
       feature for appropriate pseudowire types.  See Section 2.4.3.

   7.  The implementer and system designer SHOULD support adding an MPLS
       entropy label [RFC6790].  Deployments MAY enable this feature.
       See Section 2.4.4.

 NEW

   The following statements provide clarification regarding more
   recent requirements that are often missed.

   1.  The implementer and system designer SHOULD support adding a
       pseudowire Flow Label [RFC6391].  Deployments MAY enable this
       feature for appropriate pseudowire types.  See Section 2.4.3.

   2.  The implementer and system designer SHOULD support adding an MPLS
       entropy label [RFC6790].  Deployments MAY enable this feature.
       See Section 2.4.4.

I've made this change.  Let me know if this is not OK.

> ---
>  
> Section 2.1
>  
>    Tunneling encapsulations which may carry MPLS, such as MPLS in GRE,
>    L2TP, or LDP, are out of scope.
>  
> I think s/LDP/UDP/

Yes.  Thanks.

> ---
>  
> 2.1.1
>  
> Maybe the first paragraph should clarify "special purpose labels at 
> the top of the label stack"

That phrase doesn't appear anywhere in the document.  Exactly what am
I clarifying?  What to do with unknown special purpose labels is in
the last paragraph of this subsection.

> ---
>  
> 2.1.1
>  
> I think that this section should note that labels 0-15 were commonly
> referred to as "reserved labels" and are only renamed to "special
> purpose labels" by [I-D.ietf-mpls-special-purpose-labels].

 OLD

   [RFC3032] specifies that label values 0-15 are special purpose
   labels with special meanings.

 NEW

   [RFC3032] specifies that label values 0-15 are special purpose
   labels with special meanings.
+   [I-D.ietf-mpls-special-purpose-labels] renamed these from the term
+   "reserved labels" used in [RFC3032] "special purpose labels".

BTW - draft-ietf-mpls-special-purpose-labels is also waiting for WG
chair go-ahead.  Hopefully since it is a short document (and I think
non-controversial) it will go through smoothly and quickly so we can
change this to an RFC reference.

> ---
>  
> 2.1.4
>  
>    MPLS hierarchy as described in [RFC4206] can in principle add up to
>    four additional labels.  MPLS hierarchy is discussed in
>    Section 2.1.6.
>  
> Similar text in 2.1.6
>  
> I wonder whether the choice of "four" here is because RFC 4203 defines
>       1     Packet-Switch Capable-1 (PSC-1)
>       2     Packet-Switch Capable-2 (PSC-2)
>       3     Packet-Switch Capable-3 (PSC-3)
>       4     Packet-Switch Capable-4 (PSC-4)
> If that is the case, we should note that RFC 7074 deprecates PSC-2 
> through 4 recognising that different switching levels within the PSC
> world were not applied and that LSP nesting was arbitrary and not 
> limited to four levels.
>  
> On the other hand, if "four" comes from somewhere else, perhaps you 
> could give a clue in the text.

OK.  Missed this in CCAMP otherwise I would have objected to making
this change without changing the SCSI for the PSC type to includes a
level of hierarchy.  Though perhaps a 4 byte drop in MTU is a hint.

I'll change this from "add up to four additional labels" to "at least
one additional label" and cite RFC 7074.

The PSC-2 and up were needed if you wanted more than one level of
nesting and needed to know whether your PSC-1 LSPs could make use of
other PSC FA.  Can be kludged with admin attributes though.

> ---
>  
> While per platform label space is mentioned in 2.1.7 I wonder whether
> more information on per platform and per interface label spaces is 
> needed. I recall early implementations that got very confused when
> parallel interfaces used the same label for different purposes.
>  
> I guess the point there is that you cannot assume that your neighbor
> is or is not using the per platform label space.
>  
> Upstream label allocation may also come into this.

The only mention of label allocation is that MPLS FRR bypass method
(more formally known as facilitles backup) uses platform label space.

The only reason platform label space is of any significance in a
document about forwarding is "The use of platform label space impacts
the size of the LSR ILM for LSR with a very large number of
interfaces."

Label allocation, per platform or per interface and upstream or
downstream, is not a forwarding issue.  It is a software issue
and a matter of getting the protocol bits right.  Therefore I think
expanding any further on label allocation should be out of scope.

But yes ... it is something that can trip up vendors.

> ---
>  
> 2.1.8.1
>  
>    1.  The most common case is where reordering occurs is rare,
>  
> One too many instances of "is"

yeah or something like that.

 NEW

   1.  The most common case is where reordering is rare,

> ---
>  
> 2.1.8.1
>  
>    3.  If the edge is not using pseudowire control word (CW) and the
>        core is using multipath, reordering will be far more common.  If
>        this is occurring, the best solution is to use CW on the edge,
>        rather than try to fix the reordering using resequencing.
>  
> Completely agree, but isn't the sequence number contained in a control
> word meaning that the resequencing could, in any case, not be done 
> without using a control word?

I suppose you can't fix reordering caused by not using CW without the
sequence number in the CW.  That is going to require fixing the text.

 OLD

   3.  If the edge is not using pseudowire control word (CW) and the
       core is using multipath, reordering will be far more common.
       If this is occurring, the best solution is to use CW on the
       edge, rather than try to fix the reordering using resequencing.

 NEW

   3.  If the edge is not using pseudowire control word (CW) and the
       core is using multipath, reordering will be far more common.
       If this is occurring, using CW on the edge will solve the
       problem.  Without CW, resequencing is not possible since the
       sequence number is contained in the CW.

That was a big oops on our part.

> ---
>  
> 2.1.8.1
>  
>    4.  Another avoidable case is where some core equipment has multipath
>        and for some reason insists on periodically installing a new
>        random number as the multipath hash seed.  If supporting MPLS-TP,
>        equipment MUST provide a means to disable periodic hash reseeding
>        and deployments MUST disable periodic hash reseeding.  Even if
>        not supporting MPLS-TP, equipment should provide a means to
>        disable periodic hash reseeding and deployments should disable
>        periodic hash reseeding.
>  
> Are those two "should" really "SHOULD"?

"No" at this point because of the way we stated we would use the RFC
2119 keywords in Section 1.2.

Since no existing RFC calls for this it has to be lower case.

I could (and did so far) change the last sentence:

 OLD

   Even if not supporting MPLS-TP, equipment should provide a means to
   disable periodic hash reseeding and deployments should disable
   periodic hash reseeding.

 NEW

   Operator experience dictates that even if not supporting MPLS-TP,
   equipment SHOULD provide a means to disable periodic hash reseeding
   and deployments SHOULD disable periodic hash reseeding.

This would be consistent with the use of RFC 2119 terms in this
document, explicitly calling out any requirement not already stated in
an existing RFC.

The prior MUSTs in the paragraph are upper case becausse you would
violate MPLS-TP requirements by not doing something.

> ---
>  
> Should 2.2 distinguish the order of magnitude of replication at branch
> nodes? This impacts the replication method used (some devices make a
> copy and cycle around, some devices can do multiple copies at once). On
> the whole is no different from IP multicast processing except (as you
> note) that each outgoing packet may be different by its label value.

Is it possible to quantify the fanout?  YMMV?

The only thing I could say is that an implementation may need to make
lots of copies in some roles (access routers for example).

Making a copy and cycling yields poor performance but for low
multicast traffic volumes might be OK.  But you are right - some
mostly low-end-ish chips to this.

I'm not sure I can describe how multicast with high fanout is done
without wading into implementation details of specific vendors.

Perhaps the best I can do is add this:

   Careful consideration should be given to the performance
   characteristics of high fanout multicast for equipment that is
   intended to be used in such a role.

I'll add this before the last paragraph in the section.

> ---
>  
> 2.4
>  
> So obvious you didn't say it?
>  
>    In order to support an adequately balanced load distribution across
>    multiple links, IP header information must be used.  Common practice
>    today is to reinspect the IP headers at each LSR and use the label
>    stack and IP header information in a hash performed at each LSR.
>    Further details are provided in Section 2.4.5.
>  
> Missing is the statement that a single "flow" must not be distributed 
> across multiple paths because of the implication for potentially
> significant packet misordering. And feeding that is a common requirement
> that such packet misordering must not occur because applications and
> transport protocol implementations cannot survive such misordering.

Yes.  That requirement was missed.  Add new second paragraph to this
subsection.

   The Differentiated Services requirements for good reasons dictate
   that packets within a common microflow SHOULD NOT be reordered
   [RFC2474].  Service providers generally impose stronger
   requirements, commonly requiring that packets within a microflow
   MUST NOT be reordered except in rare circumstances such as load
   balancing across multiple links or path change for load balancing
   or path change for other reason.

Another SP requirement is stated here and I'm quite sure this
requirement is well accepted.

> ---
>  
> 2.4.2 uses "composite link" and "component link". I suggest picking just
> one term.

They are two different things.  Two or more component links make up a
composite link.  Knowing that, give it another read please.

I'd rather not cite draft-ietf-rtgwg-cl-requirements as an
informational reference just for this one term.  In favor of citing
it, draft-ietf-rtgwg-cl-requirements is moving along.  Against citing
it is there is far less than a ground swell of providers calling for
the full set of things asked for in draft-ietf-rtgwg-cl-requirements.

> ---
>  
> 2.4.5.1 notes that special purpose and extended special purpose labels 
> need to be excluded from the hash. Good.
> But it seems that some special purpose labels will indicate that the 
> next label stack entry contains a label with special meaning. (ELI is
> an example that we specifically don't have to worry about.)
> How do we handle that?
> Should we be dividing up the extended special purpose label space to
> have one set of code points meaning "just this label is special" and
> another set meaning "this label is special and the next label stack
> entry is magic"?

I did list ELI (bullet 2) before the more general rule of not useing
special purpose labels.  The ELI is not used, just the EL, so the text
could be considered correct as-is.

So far the only special purpose label that is not just ignored and
skipped over is ELI.

Regarding this being magic -- All of this is somewhat programable
specialized silicon magic.  The silicon generally has some form of
very fast, very light weight parsing engine at the front of the
pipeline.  One thing it does is pick out fields for load balance.

The better silicon hashes as it goes rather that pick out a set of
fields and then hashes that set of fields when its done.  When it sees
13 it skips and hashes the next thing and stops hashing completely.
If it sees 0-12,14 it skips and continues.  If it sees 15 it skips two
labels and continues.  Its should be programable enough that if
someone defines a new ELI like label it is likely to be able to deal
with it.

The not so good silicon has this all so hard wired that it won't be
able to do ELI without at least a respin.

At most I could add "If a new special purpose label or extended
special purpose label is defined which requires special load balance
processing then, as is the case for the ELI label, a spacial action
may be needed rather than skipping the special purpose label or
extended special purpose label."  I really don't think this is needed.

> ---
>  
> An issue that arises from the multipath support (2.4.5.1) is that
> hardware assumes that after a label stack entry with the S-bit set,
> there are only three possible next bytes...
> - a control word (indicated by b0000 or b0001)
> - an IPv4 header
> - an IPv6 header
> This is the case regardless of how the LSP was set up, and the next
> bytes cannot ever be further MPLS stack entries.

Right.  Note that in (5) is says that some SP will require IP headers
and some will require an ability to disable IP headers.

The rule is really look for 4, 6, or anything else in the first
nibble.  If 4 or 6 assume IP.  If anything else stop.

And yes if the payload is MPLS after a S-bit you have a screwed up
MPLS implementation to start with and you won't get load balance on
any set of MPLS labels after the first S-bit.  This is a fact of life
in the field and is as it should be.

> While this comes up 2.4.5.1 it may merit further discussion in an
> earlier section of the document.

This text is part of 2.4. ("MPLS Multipath Techniques").  The third
paragraph contains "Further details are provided in Section 2.4.5."
Section 2.4.5. is "Fields Used for Multipath Load Balance".

> I note that discussion of support of PWs without the CW drives you
> to say that hashing beyond the S-bit should be a configurable option
> which would (of course) support any payload including MPLS in MPLS
> with repeated bottom of stack. However, you might want to specifically
> preclude that.

It says the same thing here in bullet 5 regarding being configurable.
The wording "ability to disable" is same as "configurable option".

At no point in this document do we imply that looking beyond the S-bit
means looking at anything beyond the S-bit other than looking for IP.
This is very clear in [RFC4385] and [RFC4928] which is cited in the
text about PW CW.

All it says in the places discusing PW is that without CW the traffic
might get reordered.

If you feel that we at any point imply that lack of PW CW allows
looking at anything past the S-bit rather than just looking for an IP
header please point to where and we will have to correct that.  I
looked at all occurances of CW and did not find anything.

Bullet 5 is very clear that a 4 or 6 has to be found in the first
nibble of payload.

> ---
>  
> Q#7 introduces the terms "short-pipe" and "uniform model". It provides
> a pointer to 2.1.6, but that section does not mention these terms (and
> doesn't refer to RFC 3270).

It really should say "See Section 2.1.6 regarding MPLS hierarchy.  See
[RFC3443] regarding PHP, UHP, and pipe, short-pipe, and uniform models."

It does now.

> ---
>  
> Section 7 could usefully point back at Section 2.6.1

I'll add:

   Some advice on hardware and other equipment hardenning against
   Denial-of-Service attack can be found in Section 2.6.1.

The security ADs will have fun with that.  They'll love the comments
about crypto auth being the last line of defense not the first and
only line of defense.  I'll plan on discussing this in greater detail
when SecDir review comes along.

I thik the security AD will want to reword the prior paragraph as well
as they seem to have some very precise preferred wording.

Curtis