Re: [mpls] AD review of draft-ietf-mpls-forwarding

<l.wood@surrey.ac.uk> Mon, 27 January 2014 23:09 UTC

From: l.wood@surrey.ac.uk
To: adrian@olddog.co.uk, curtis@ipv6.occnc.com
Date: Mon, 27 Jan 2014 23:07:17 +0000
Message-ID: <290E20B455C66743BE178C5C84F1240847E63346F5@EXMB01CMS.surrey.ac.uk>
References: Your message of "Sun, 26 Jan 2014 18:02:08 +0000." <005601cf1ac0$bc2ea410$348bec30$@olddog.co.uk> <201401270458.s0R4woWw074790@maildrop2.v6ds.occnc.com>, <022d01cf1bb4$1624fde0$426ef9a0$@olddog.co.uk>
In-Reply-To: <022d01cf1bb4$1624fde0$426ef9a0$@olddog.co.uk>
Cc: mpls@ietf.org, draft-ietf-mpls-forwarding.all@tools.ietf.org
Subject: Re: [mpls] AD review of draft-ietf-mpls-forwarding

>> > I think Curtis may have heard this before :-)
>> > The "preferred" (by the RFC editor) expansion of ECMP is
>> > "Equal-Cost Multipath"
>>
>> The form without the hyphen is more common, even among recent
>> documents.  I prefer to keep it without the hyphen.

>> A discussion for a rainy day with the RFC Editor.
>> Leave as is.

Basic English grammar. Hyphenate related adjectives.
http://www.grammar-monster.com/lessons/hyphens_in_compound_adjectives.htm

Lloyd Wood
http://about.me/lloydwood
________________________________________
From: mpls [mpls-bounces@ietf.org] On Behalf Of Adrian Farrel [adrian@olddog.co.uk]
Sent: 27 January 2014 23:04
To: curtis@ipv6.occnc.com
Cc: mpls@ietf.org; draft-ietf-mpls-forwarding.all@tools.ietf.org
Subject: Re: [mpls] AD review of draft-ietf-mpls-forwarding

Hello Curtis,

These replies modulo the conversation with Carlos. I find I can't read that
conversation and back apply the results here :-(

[snip]

> > Your acronym list is commendably thorough, but a little enthusiastic.
>
> If I provide a section with a list of acronyms, do I still have to
> expand on first use?  If so, AC, NSP, OAM, and a few others appear
> before that section.

Afraid so :-(

> Given that this takes up only 17 lines in a 50 page document I'd
> rather be thorough.

Yup. OK. Go for it.

[snip]

> > I think Curtis may have heard this before :-)
> > The "preferred" (by the RFC editor) expansion of ECMP is
> > "Equal-Cost Multipath"
>
> The form without the hyphen is more common, even among recent
> documents.  I prefer to keep it without the hyphen.

A discussion for a rainy day with the RFC Editor.
Leave as is.

> > Section 1.3 bullet 5
> >
> >    5.  The implementer and system designer MUST support pseudowire
> >        control word (CW) if MPLS-TP is supported or if ACH [RFC5586] is
> >        being used on a pseudowire.
> >
> > The wording is a bit odd. "The implementation and system design..."?
> >
> > Ditto bullets 6 and 7
>
> Target audience is explained in Section 1.4.  If you like I can flip
> Section 1.3 and 1.4 so target audience is first, then use of the roles
> called for in the target audience section won't seem quite so odd.

No issue with the section order.
Just puzzled by "the implementer MUST support" when I (pedantically) thought
that the implementer supporting something might not be the same as the
implementation supporting it.

Not a big deal.

> > Section 1.3
> >
> > While there is nothing wrong with the statements made in the bullets, some
> > of the later ones refer to recent additions to the MPLS suite. Yet the
> > list is presented as "there were some misconceptions." Clearly the
> > early silicon did not have misconceptions about the inclusion of entropy
> > labels.
> >
> > Just tweak the words at the top of the list?
>
> I'd like to keep that as is and split into two lists.  The second list
> would have the last two items (fat-pw and EL).  The first list would
> end with "implement CW" (sic).
>
>  OLD
>
>    6.  The implementer and system designer SHOULD support adding a
>        pseudowire Flow Label [RFC6391].  Deployments MAY enable this
>        feature for appropriate pseudowire types.  See Section 2.4.3.
>
>    7.  The implementer and system designer SHOULD support adding an MPLS
>        entropy label [RFC6790].  Deployments MAY enable this feature.
>        See Section 2.4.4.
>
>  NEW
>
>    The following statements provide clarification regarding more
>    recent requirements that are often missed.
>
>    1.  The implementer and system designer SHOULD support adding a
>        pseudowire Flow Label [RFC6391].  Deployments MAY enable this
>        feature for appropriate pseudowire types.  See Section 2.4.3.
>
>    2.  The implementer and system designer SHOULD support adding an MPLS
>        entropy label [RFC6790].  Deployments MAY enable this feature.
>        See Section 2.4.4.
>
> I've made this change.  Let me know if this is not OK.

That's fine.

> > 2.1.1
> >
> > Maybe the first paragraph should clarify "special purpose labels at
> > the top of the label stack"
>
> That phrase doesn't appear anywhere in the document.  Exactly what am
> I clarifying?  What to do with unknown special purpose labels is in
> the last paragraph of this subsection.

WTF?
You have to wonder where these senile ADs get their ideas.

Complete brain fart. Sorry.

[snip]

> > While per platform label space is mentioned in 2.1.7 I wonder whether
> > more information on per platform and per interface label spaces is
> > needed. I recall early implementations that got very confused when
> > parallel interfaces used the same label for different purposes.
> >
> > I guess the point there is that you cannot assume that your neighbor
> > is or is not using the per platform label space.
> >
> > Upstream label allocation may also come into this.
>
> The only mention of label allocation is that MPLS FRR bypass method
> (more formally known as facility backup) uses platform label space.
>
> The only reason platform label space is of any significance in a
> document about forwarding is "The use of platform label space impacts
> the size of the LSR ILM for LSR with a very large number of
> interfaces."
>
> Label allocation, per platform or per interface and upstream or
> downstream, is not a forwarding issue.  It is a software issue
> and a matter of getting the protocol bits right.  Therefore I think
> expanding any further on label allocation should be out of scope.

OK. I'm convinced.

> > 2.1.8.1
> >
> >    3.  If the edge is not using pseudowire control word (CW) and the
> >        core is using multipath, reordering will be far more common.  If
> >        this is occurring, the best solution is to use CW on the edge,
> >        rather than try to fix the reordering using resequencing.
> >
> > Completely agree, but isn't the sequence number contained in a control
> > word meaning that the resequencing could, in any case, not be done
> > without using a control word?
>
> I suppose you can't fix reordering caused by not using CW without the
> sequence number in the CW.  That is going to require fixing the text.
>
>  OLD
>
>    3.  If the edge is not using pseudowire control word (CW) and the
>        core is using multipath, reordering will be far more common.
>        If this is occurring, the best solution is to use CW on the
>        edge, rather than try to fix the reordering using resequencing.
>
>  NEW
>
>    3.  If the edge is not using pseudowire control word (CW) and the
>        core is using multipath, reordering will be far more common.
>        If this is occurring, using CW on the edge will solve the
>        problem.  Without CW, resequencing is not possible since the
>        sequence number is contained in the CW.
>
> That was a big oops on our part.

Well, on the scale of IETF oopsies, I don't think you score too high. Maybe
Narten could run a weekly script?
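
The point about resequencing depending on the CW can be illustrated with a minimal sketch. It assumes the generic pseudowire control word layout as I understand it from RFC 4385 (first nibble 0000, 4 flag bits, 2 fragmentation bits, 6 length bits, 16-bit sequence number). The function names are hypothetical, and sequence-number wrap is deliberately ignored.

```python
import struct

def parse_pw_control_word(frame: bytes):
    """Split the first 4 bytes of a frame into control word fields."""
    (value,) = struct.unpack("!I", frame[:4])
    first_nibble = value >> 28        # 0 for the generic PW control word
    flags = (value >> 24) & 0xF
    frag = (value >> 22) & 0x3
    length = (value >> 16) & 0x3F
    sequence = value & 0xFFFF         # 0 means sequencing is not in use
    return first_nibble, flags, frag, length, sequence

def resequence(frames):
    """Reorder a small batch of frames by CW sequence number.

    Assumption: the batch does not span a sequence-number wrap.
    """
    return sorted(frames, key=lambda f: parse_pw_control_word(f)[4])
```

Without a control word there is no sequence field to sort on, which is why the NEW text drops the resequencing alternative.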

[snip]

> > Should 2.2 distinguish the order of magnitude of replication at branch
> > nodes? This impacts the replication method used (some devices make a
> > copy and cycle around, some devices can do multiple copies at once). On
> > the whole this is no different from IP multicast processing except (as you
> > note) that each outgoing packet may be different by its label value.
>
> Is it possible to quantify the fanout?  YMMV?
>
> The only thing I could say is that an implementation may need to make
> lots of copies in some roles (access routers for example).
>
> Making a copy and cycling yields poor performance but for low
> multicast traffic volumes might be OK.  But you are right - some
> mostly low-end-ish chips do this.
>
> I'm not sure I can describe how multicast with high fanout is done
> without wading into implementation details of specific vendors.
>
> Perhaps the best I can do is add this:
>
>    Careful consideration should be given to the performance
>    characteristics of high fanout multicast for equipment that is
>    intended to be used in such a role.
>
> I'll add this before the last paragraph in the section.

That works.
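
A rough sketch of replication at a branch node, where each outgoing copy carries its own label, might look like the following. It assumes the standard 32-bit label stack entry layout (20-bit label, 3-bit TC, S-bit, 8-bit TTL). The function names and the (interface, label) branch list are hypothetical, and the loop only models the slow "copy and cycle" style, not how any particular chip does it.

```python
import struct

def encode_lse(label: int, tc: int = 0, s: int = 1, ttl: int = 64) -> bytes:
    """Encode one MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (s << 8) | ttl)

def replicate(payload: bytes, branches, ttl: int = 64):
    """Return one (interface, packet) pair per branch.

    branches is a list of (out_interface, out_label); work grows
    linearly with fanout, which is the performance concern raised above.
    """
    return [(ifc, encode_lse(label, ttl=ttl) + payload) for ifc, label in branches]
```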

> > 2.4
> >
> > So obvious you didn't say it?
> >
> >    In order to support an adequately balanced load distribution across
> >    multiple links, IP header information must be used.  Common practice
> >    today is to reinspect the IP headers at each LSR and use the label
> >    stack and IP header information in a hash performed at each LSR.
> >    Further details are provided in Section 2.4.5.
> >
> > Missing is the statement that a single "flow" must not be distributed
> > across multiple paths because of the implication for potentially
> > significant packet misordering. And feeding that is a common requirement
> > that such packet misordering must not occur because applications and
> > transport protocol implementations cannot survive such misordering.
>
> Yes.  That requirement was missed.  Add new second paragraph to this
> subsection.
>
>    The Differentiated Services requirements for good reasons dictate
>    that packets within a common microflow SHOULD NOT be reordered
>    [RFC2474].  Service providers generally impose stronger
>    requirements, commonly requiring that packets within a microflow
>    MUST NOT be reordered except in rare circumstances such as load
>    balancing across multiple links, path change for load balancing,
>    or path change for other reason.
>
> Another SP requirement is stated here and I'm quite sure this
> requirement is well accepted.

Looks good.
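
The usual way to meet the no-reordering requirement is to choose the outgoing link from a hash of fields that are constant for a microflow. A minimal sketch, assuming a 5-tuple flow key and using CRC32 purely as a stand-in hash (real hardware uses its own function, and pick_link is a hypothetical name):

```python
import zlib

def pick_link(flow, n_links: int) -> int:
    """Map a flow key to one of n_links component links.

    flow is any tuple that is constant for the microflow, for example
    (src_ip, dst_ip, protocol, src_port, dst_port). Every packet of
    the flow produces the same key, so it always takes the same link.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links
```

Packets of one flow stay in order because they share a link. Reordering can still occur when n_links or the path changes, which matches the "rare circumstances" exception in the proposed text.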

> > 2.4.2 uses "composite link" and "component link". I suggest picking just
> > one term.
>
> They are two different things.  Two or more component links make up a
> composite link.  Knowing that, give it another read please.
>
> I'd rather not cite draft-ietf-rtgwg-cl-requirements as an
> informational reference just for this one term.  In favor of citing
> it, draft-ietf-rtgwg-cl-requirements is moving along.  Against citing
> it, there is far less than a groundswell of providers calling for
> the full set of things asked for in draft-ietf-rtgwg-cl-requirements.

Yes. Sorry. It has been a looooooooong time since I had a pass on the CL
document. Atrophy.

> > 2.4.5.1 notes that special purpose and extended special purpose labels
> > need to be excluded from the hash. Good.
> > But it seems that some special purpose labels will indicate that the
> > next label stack entry contains a label with special meaning. (ELI is
> > an example that we specifically don't have to worry about.)
> > How do we handle that?
> > Should we be dividing up the extended special purpose label space to
> > have one set of code points meaning "just this label is special" and
> > another set meaning "this label is special and the next label stack
> > entry is magic"?
>
> I did list ELI (bullet 2) before the more general rule of not using
> special purpose labels.  The ELI is not used, just the EL, so the text
> could be considered correct as-is.
>
> So far the only special purpose label that is not just ignored and
> skipped over is ELI.
>
> Regarding this being magic -- All of this is somewhat programmable
> specialized silicon magic.  The silicon generally has some form of
> very fast, very lightweight parsing engine at the front of the
> pipeline.  One thing it does is pick out fields for load balance.
>
> The better silicon hashes as it goes rather than picking out a set of
> fields and then hashing that set of fields when it's done.  When it sees
> 13 it skips and hashes the next thing and stops hashing completely.
> If it sees 0-12,14 it skips and continues.  If it sees 15 it skips two
> labels and continues.  It should be programmable enough that if
> someone defines a new ELI-like label it is likely to be able to deal
> with it.
>
> The not so good silicon has this all so hard wired that it won't be
> able to do ELI without at least a respin.
>
> At most I could add "If a new special purpose label or extended
> special purpose label is defined which requires special load balance
> processing then, as is the case for the ELI label, a special action
> may be needed rather than skipping the special purpose label or
> extended special purpose label."  I really don't think this is needed.

You're right, and my worry is more about the special-purpose draft and the
consequences of possibly adding other special purpose labels that have child
labels associated. We certainly don't want to have to retrain the silicon at
transit LSRs to specially know what to do for each new special-purpose label.
Currently we propose that you don't hash on a special purpose label, but you can
carry on hashing immediately after.

If I introduce the foo-label, your silicon will recognise it as special purpose,
but if I say the label after the foo-label is magic you won't know that.

A way to fix this is to have (just punting here) the top bit of the extended
special purpose label range set mean "magic label follows".
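
The hash-as-you-go walk described above can be sketched as follows. This is a sketch under stated assumptions, not any vendor's parser. The label values follow the IANA special-purpose label registry as I understand it (ELI = 7 per RFC 6790, extension label = 15) rather than the "13" quoted in the message, which the registry assigns to GAL. CRC32 stands in for the real hash and hash_label_stack is a hypothetical name.

```python
import zlib

ELI = 7            # Entropy Label Indicator (assumed from RFC 6790)
EXTENSION = 15     # extension label: an extended special-purpose label follows
SPECIAL_MAX = 15   # labels 0..15 are special purpose

def hash_label_stack(labels):
    """Hash a label stack given as a list of label values, top first."""
    h = 0
    i = 0
    while i < len(labels):
        label = labels[i]
        if label == ELI:
            # Skip the ELI, hash the entropy label after it, then stop.
            if i + 1 < len(labels):
                h = zlib.crc32(labels[i + 1].to_bytes(3, "big"), h)
            return h
        if label == EXTENSION:
            # Skip the extension label and the extended label after it.
            i += 2
            continue
        if label > SPECIAL_MAX:
            h = zlib.crc32(label.to_bytes(3, "big"), h)
        # Any other special-purpose label is skipped without hashing.
        i += 1
    return h
```

The ELI branch is the only place where a special-purpose label changes how the following entry is treated. A new label with a "magic" child would need another hard-coded branch like it, which is the retraining concern raised in the reply.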

> > An issue that arises from the multipath support (2.4.5.1) is that
> > hardware assumes that after a label stack entry with the S-bit set,
> > there are only three possible next bytes...
> > - a control word (indicated by b0000 or b0001)
> > - an IPv4 header
> > - an IPv6 header
> > This is the case regardless of how the LSP was set up, and the next
> > bytes cannot ever be further MPLS stack entries.
>
> Right.  Note that in (5) it says that some SP will require IP headers
> and some will require an ability to disable IP headers.
>
> The rule is really look for 4, 6, or anything else in the first
> nibble.  If 4 or 6 assume IP.  If anything else stop.
>
> And yes, if the payload is MPLS after an S-bit you have a screwed up
> MPLS implementation to start with and you won't get load balance on
> any set of MPLS labels after the first S-bit.  This is a fact of life
> in the field and is as it should be.
>
> > While this comes up 2.4.5.1 it may merit further discussion in an
> > earlier section of the document.
>
> This text is part of 2.4. ("MPLS Multipath Techniques").  The third
> paragraph contains "Further details are provided in Section 2.4.5."
> Section 2.4.5. is "Fields Used for Multipath Load Balance".
>
> > I note that discussion of support of PWs without the CW drives you
> > to say that hashing beyond the S-bit should be a configurable option
> > which would (of course) support any payload including MPLS in MPLS
> > with repeated bottom of stack. However, you might want to specifically
> > preclude that.
>
> It says the same thing here in bullet 5 regarding being configurable.
> The wording "ability to disable" is same as "configurable option".
>
> At no point in this document do we imply that looking beyond the S-bit
> means looking at anything beyond the S-bit other than looking for IP.
> This is very clear in [RFC4385] and [RFC4928] which is cited in the
> text about PW CW.
>
> All it says in the places discussing PW is that without CW the traffic
> might get reordered.
>
> If you feel that we at any point imply that lack of PW CW allows
> looking at anything past the S-bit rather than just looking for an IP
> header please point to where and we will have to correct that.  I
> looked at all occurrences of CW and did not find anything.
>
> Bullet 5 is very clear that a 4 or 6 has to be found in the first
> nibble of payload.

OK. I misread bullet 5.
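
The first-nibble rule discussed above ("look for 4, 6, or anything else") is small enough to sketch directly. classify_payload is a hypothetical name, and the sketch only covers the decision, not the IP header hashing that would follow.

```python
def classify_payload(payload: bytes) -> str:
    """Decide what follows the bottom-of-stack label entry.

    Returns "ipv4" or "ipv6" when the first nibble is 4 or 6, and
    "stop" for anything else (for example a control word, whose
    first nibble is 0 or 1), meaning no further fields are hashed.
    """
    if not payload:
        return "stop"
    nibble = payload[0] >> 4
    if nibble == 4:
        return "ipv4"
    if nibble == 6:
        return "ipv6"
    return "stop"
```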

[snip]

Thanks.
Adrian
