Re: [mpls] MPLS-RT review of draft-bryant-mpls-sfl-framework

Curtis Villamizar <curtis@orleans.occnc.com> Sat, 06 May 2017 15:15 UTC

To: Stewart Bryant <stewart.bryant@gmail.com>
cc: Curtis Villamizar <curtis@ipv6.occnc.com>, mpls@ietf.org
Reply-To: Curtis Villamizar <curtis@orleans.occnc.com>
From: Curtis Villamizar <curtis@orleans.occnc.com>
In-reply-to: Your message of "Thu, 04 May 2017 14:19:07 +0100." <93e688f5-0072-0853-337b-df9364b56f68@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <11993.1494083705.1@harbor1-em2.v6only.occnc.com>
Date: Sat, 06 May 2017 11:15:08 -0400
Message-Id: <20170506151518.E494B129454@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/aW6E73Bm6nY1fABKAYas1cAn3N8>
Subject: Re: [mpls] MPLS-RT review of draft-bryant-mpls-sfl-framework
Precedence: list

In message <93e688f5-0072-0853-337b-df9364b56f68@gmail.com>
Stewart Bryant writes:
> 
> On 28/04/2017 20:26, Curtis Villamizar wrote:
> > Loa,
> >
> > The document is coherent and possibly useful.  The question of IPR
> > should be resolved before accepting this as a WG document.
> I will leave the IPR issue to the chairs.
>  
> >
> > Briefly:
> >
> >    1.  The document proposes a change to forwarding.  Whenever a change
> >        in forwarding is considered, there should be a very substantial
> >        benefit and interoperability with older hardware should be
> >        considered in the document.
> The important thing is that the change to forwarding is confined solely 
> to the
> PEs that are using this function. There is no change to any of the P 
> routers needed.

Forwarding at intermediate routers which do ECMP is affected.  If you
add a label to LM/DM measurement traffic but not the traffic intended
to be measured, then the measurement traffic is likely to take a
different path.  If you add a label to both the measurement traffic
and the non-measurement traffic, and the label value is different,
then the paths are likely to be different.  If the label value is
always the same, then the label is a NOOP.
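
A minimal sketch of that argument, assuming a mid-LSP LSR that hashes
the entire label stack to choose an ECMP member.  The SHA-256 hash, the
label values and the name ecmp_path are illustrative assumptions only;
real forwarding hardware uses its own hash.

```python
import hashlib
import struct

def ecmp_path(label_stack, n_paths):
    """Pick one of n_paths by hashing every label in the stack.

    Assumption: the LSR hashes the whole label stack.  SHA-256 is only
    a stand-in for a vendor-specific hash function.
    """
    key = b"".join(struct.pack("!I", label) for label in label_stack)
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

TRANSPORT, APP = 1000, 2000   # hypothetical label values

# Same extra label on data and measurement traffic: the hash input is
# identical, so the extra label is a NOOP as far as ECMP is concerned.
same = ecmp_path([TRANSPORT, 3001, APP], 4) == ecmp_path([TRANSPORT, 3001, APP], 4)

# Different extra label values: the hash input differs, so the chosen
# member can differ.  Over many values more than one member is used.
members = {ecmp_path([TRANSPORT, sfl, APP], 4) for sfl in range(3000, 3064)}
```

Nothing here depends on the particular hash; any function of the full
stack has the same property.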

For the purpose of discussing LM/DM traffic path, ECMP should be
interpreted as all forms of ECMP, including at L2 such as Ethernet LAG
or equivalent behaviour over other high capacity links.

> Furthermore, in the application that triggered this work, pro-active packet
> loss measurement, the action required is to count the number of packets
> received on a label, and that is something that a lot of chips already 
> do. In such
> a case the change is a technical change to the forwarding behaviour 
> specifying
> an action that is already deployed.

The LM measurement needs to be made on the same path.

> In other cases, where a net-processor is deployed, it is a case of 
> vectoring from
> the LFIB to a different  routine. Given that the additional 
> functionality is needed
> by the PEs such a change is not unreasonable.

What you are trying to accomplish is getting a classification label
into the label stack.  To support ECMP, that label cannot affect ECMP
if the classification label is not intended to affect the traffic
path.

> >    2.  The document leaves open the question of dealing with ECMP in
> >        section 4.  The first option in section 4 is not viable.
> I do not understand why this is not viable, other than perhaps due to 
> stack size limits.
> The EL solution requires support for EL along the LSP which may not be 
> available.
> Indeed I am not sure how widely available it is. Thus I see option 1 as 
> a method to be
> deployed in the absence of EL support.
>  
> We can discuss whether this is listed first or second.

At this point EL support may be at some nodes in the path but not
others, or at all nodes, or at none.

Adding a label to identify flows (such as ingress on a PTMP LSP or
measurement traffic on LM/DM traffic) can avoid affecting ECMP only in
the case where EL is supported on the entire path.

> >        The
> >        document should drop this and expand on the second "worthy of
> >        consideration" solution, explaining how it will work in the same
> >        level of detail as provided in 3.1, 3.2, and 3.3.
>  
> Sure, that can be added. Adding it is fairly simple, although I do not 
> see why
> that cannot wait until after adoption, since it is the duty of the editor to
> reflect the consensus of the WG.

Before adoption you need to ensure that a viable solution exists and
also that a reasonably efficient solution is proposed so I would like
to see you complete that exercise.  Ultimately the WG decides.

> >    3.  In the VPN example a perceived advantage of MP2P LSP is lost.
> >        With MP2P LSP, the number of application labels needed is the
> >        number of VPN IDs.  The number of labels needed becomes number
> >        of VPN IDs times the number in ingress.  If Section 3.3 is
> >        supposed to address this, then this is not clear and therefore
> >        could benefit by referring back to the example and stating that
> >        the SFL could indicate source only and the application label
> >        indicate VPN ID only.
> Sure we can add text to address the scaling concern, and expand the 
> processing text.
> Note that whether scaling is a concern or not will depend on how many VPNs
> need SFL actions, and on the chosen ECMP approach.

What I was getting at above is a reason to decouple the label
providing the action from the label identifying the flow.

> > More detail:
> >
> > Forwarding of labeled packets currently involves some special
> > processing of labels 0-15 and some form of lookup in a table or TCAM
> > or other structure with the lookup resolving to a set of operations.
> > At minimum these operations are SWAP and forward to an egress
> > optionally after considering ECMP (EL or rest of stack), POP and look
> > at next label (or assume IP payload if no next label).  Forwarding
> > hardware currently need not support a POP and look for IP payload.
> > That operation is only required of the IPv4 Explicit NULL Label and
> > IPv6 Explicit NULL Label.
>  
> I do not see what in the text causes you to raise this point. An SL has 
> the same
> semantics as the label it replaces plus a side-effect. If the original 
> label implied
> IP, so will the SL. As I said earlier, if the side-effect is other than 
> increment a
> counter (which is common anyway) then the side effect is new functionality
> which needs implementing anyway.

Some routers didn't do POP and look at payload on ordinary labels, or
did so at lower PPS rate.  They could make that special operation (at
all or at full speed) only for Explicit NULL.  Otherwise a label
lookup was required, followed by an IP lookup.  That is why PHP came
about.  Some routers assumed PHP was always available and could not do
any form of POP and look further except on Explicit NULL.  Since
MPLS-TP prohibited PHP, maybe those routers are all gone.  You asked
why I brought it up.

> > Second, both Explicit NULL labels in
> > RFC3032 are required to be at the bottom of stack in RFC3032 with this
> > restriction removed in RFC4182 for proper operation of the pipe model
> > when not using an ordinary label POP at egress.
> >
> > The changes to forwarding are:
> >
> >    1.  Processing of an ordinary label (label not in the 0-15 range)
> >        must be able to POP and look at the IPv4 or IPv6 payload to
> >        conform to section 3.2 of this draft.  Currently this processing
> >        can be fixed to label values 0 and 2 respectively.
> >
> >    2.  Processing of an ordinary label taking on the function of an
> >        explicit NULL and processing of an explicit NULL label must be
> >        capable of handling ELI and EL below the SFL or Explicit NULL
> >        and then take the same action as would be taken for Explicit
> >        NULL after the ELI and EL POP.  This would be needed to support
> >        pipe model as described in RFC4182 and not change the
> >        penultimate LSR to SWAP a label for SFL or Explicit NULL and
> >        remove ELI and EL from underneath.
> >
> > I'm assuming that SFL would not map to special labels other than
> > Explicit NULL but such a statement should be made explicit.  Although
> > if applied to RFC 6374 then SFL might also map to GAL.
> I had been assuming that if you wanted a GAL you would put it in there.
> I wonder if the EN equivalence is confusing and we should look at other
> equivalences.
>  
> There are a number of alternative equivalence models we can look at that
> have the same properties:
>  
> If you assume that the LSP label has already been popped (due to PHP)
> the SFL could be considered as equivalent to a non-popped LSP label.
> Alternatively I suppose you could model the SFL as a VRF label with
> the lookup performed in the base topology.

I think the problem is that the EN equivalence is problematic, not
that the EN equivalence is confusing.

It is true though that the EN is processed at the egress and that the
egress can decline to perform the SFL function if it is in any way
problematic, for example if it forces slow path as Andy pointed out.

> > If SFL can map
> > to other special labels, then those others from the set of 0-15
> > including mapping to ESPL should be mentioned.
>  
> We will put text in on the subject.

Thanks.

> > Each time a new
> > special label is defined there are complaints about changing
> > forwarding.  If now any ordinary label can map to a specific set of
> > special labels, or worse to any special label, then this is a
> > substantial change in the definition of MPLS forwarding.
> That is not where I think this should go either. I think that it needs to
> behave exactly as MPLS normally behaves except for the agreed side
> effect in PEs configured to have the required behaviour.

I think what you are ignoring in the "exactly as MPLS normally
behaves" is consideration of ECMP along the path for measurement
traffic.

> > btw- In practice the IPv6 Explicit NULL is rarely if ever used but
> > instead POP of an ordinary label at BOS, label zero (IPv4 Explicit
> > NULL) at BOS, or lack of a label stack after PHP looks for the IP
> > version in the payload.  Does any RFC say to do this (combine the
> > meaning of IPv4 Explicit NULL and IPv6 Explicit NULL and look at IP
> > version in the payload) or is this just common practice for almost 20
> > years that escaped getting documented?
>  
> I am not sure this impacts this draft.

It was just a comment on whether anyone actually uses IPv6 Explicit
NULL, which could also bring up the question of whether anyone uses
any Explicit NULL rather than Implicit NULL (where payload is exposed
at egress and therefore the IP version nibble is checked).

> > The second issue relates to ECMP.  Section 4.1 item 1 suggests that
> > "The operator can elect to always run with the SFL in place".  If the
> > value of SFL is different for data traffic and PM traffic, then the PM
> > would not take the same path.  This would put each "flow" in a
> > multipoint to point LSP (ie: LDP) on a separate path which for ECMP is
> > not an issue.
> I would expect that the instrumentation packet go with the same SFL it is
> collecting results for.

Are we talking about direct measurement LM or counting PM packets?  If
the former, then yes they have the same SFL.

Back when MPLS was having the TP PM discussions, the notion of sending
PM test traffic slow pathing simply to support those counters was
popular because most hardware could not capture the exact LSP count at
the moment a PM measurement packet arrived, only some time after the
packet arrived since the measurement packet itself was slow pathed, or
semi-slow pathed in some chip microcoded engine added for OAM.  As I
understood it then, very little hardware (if any) at the time could
support a slow path with LSP counter snapshot, that would be needed to
support the sort of highly accurate LM that direct measurement would
provide.

So are we talking about applicability to LM (RFC 6374) direct
measurement only?  If so, please say so.

For inferred measurement I don't see how using the LSP counters helps.

RFC 6374 section 3 describes LM using inferred measurement (stop
sending, wait, then send query).  If this is the case and you are
using LSP counters then you have an ECMP problem without EL.

> > If the only purpose of the SFL was maintain separately counters for
> > specific multipoint to point LSP ingress then lack of SFL for other
> > ingress has no effect.  In this case the "solution" described in
> > Section 4.1 item 1 is a NOOP.
> >
> > If the purpose is to also provide separate PM for specific (or all)
> > multipoint to point LSP ingress or provide separate counters for PM
> > for a point to point LSP, then the "solution" described in Section 4.1
> > item 1 fails.
> >
> > Section 4.1 item 1 therefore needs to be removed.
>  
> I don't think so.
>  
> If you instrument one flow on a long term basis and leave the SFL in place
> the behaviour is always the same.  How does that fail?

We are back to the question of whether PM traffic has a different SFL
value than the traffic it is measuring.  And then we are back to the
PM slow path counter snapshot time lag issue.  If chip vendors took
note and fixed this such that LM direct measurement now works
(accurate counter when sending and accurate counter read on
reception), that would be great news but for me it would be news.

Also I don't know of any hardware capable of direct measurement of DM
or any advantage to having LSP counters available for DM.

The one case where I can see a small advantage in inferred LM
measurement is where the LM test packets carried a unique SFL and
could therefore be counted and tossed rather than slow pathed.  This
would require a different SFL value than the flow being measured.  If
that case is being considered, then Section 4.1 would not work.

> > The implications of Section 4.1 item 2 need to be better explored.
> > The section as-is constitutes a "punt" on the problem of ECMP with SFL
> > used for PM.  A diagram of each case in Section 3 with ELI and EL can
> > be added in Section 4 or the optional placement of ELI and EL.  The
> > behaviour when ELI and EL are present can then be described in Section
> > 4 with penultimate LSR behaviour in addition to egress LSR behaviour
> > described for both PHP and non-PHP case.
> That text can be expanded in a future version.

Thanks.

> > Nits:
> >
> > In "Abstract" the phrase "on the on the" should be "on the".
> Ack
> >
> > In Section 2 "defined to be a label" is awkward.
> s/to be/as/
>  
> >
> > An alternate:
> >
> > An alternate to the proposal here would be to make something like what
> > is in Section 3.3 the only case, but where a ESPL range is used.  This
> > would add egress processing without stepping on ECMP since SPL and
> > ESPL are excluded from ECMP according to RFC 7274.
>  
> An interesting idea, but I am not sure how widely EL is supported, and 
> it has the
> disadvantage that it needs new dataplane changes to process the EL, whereas
> the advantage of the proposal in the text is that for the common VPN and PW
> case the existing h/w works out of the box in a lot of cases.

Everyone I talked to making a new chip in 2010 timeframe was aware of
EL and addressing it, though I only talked to a subset and only those
considering new product on the higher end.  Any multiple 100G
interface chip today is of that generation, though I can't be sure
roadmaps turned to real product.  Anyone using a high multiple of 10G
chip would probably use a newer chip due to space and power and is
likely to have EL.  It's lower end products that may very well be using
older chips just because they cost less and space and power is less of
an issue in that product space.  For RFC 6790, bets are off for chips
designed before about 2010 and I suspect lower end chips will lag.

If there is no EL support then nothing in Section 4 will work.

> > An ESPL with a few
> > top bits always set and less than 20 bits of range could be considered
> > (for example XL + 1111xxxxxxxxxxxxxxxx for 16 bits of range and 1/16
> > of the ESPL space in RFC 7274).  This range would be defined such that
> > it is application specific, is added at ingress, completely ignored
> > along the way and provides some egress specific functionality.
>  
> Not only is that new functionality in the core (ignoring a label 
> following and EL)
> but it then remains unclear how ECMP then works.

If a mid LSP LSR doesn't support EL, then it will do ECMP on all
labels.  Your entire Section 4 fails to get measurement and real
traffic on the same path (except direct measurement).  At the mid LSP
LSR this proposal misbehaves in the same way.

In both cases only the egress LSR is affected.  In this case, no SFL
functionality is needed (and no increase in number of ordinary labels
that need to be acted on).  The XL and next label have to be removed
to expose the payload and then a counter is based on the low 16 bits
of the label following XL.  The forwarding action (PW or VPN or
whatever) and the counting action are decoupled.
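
A rough sketch of that egress step, assuming the example range given
earlier (XL followed by a label with the top four bits set and the low
16 bits carrying the flow or source id).  The constant names, the
function egress_count_and_strip and the label values are hypothetical,
and placing the XL pair below the application label is also an
assumption.

```python
from collections import Counter

XL = 15                 # Extension Label (RFC 7274)
FLOW_BASE = 0xF0000     # assumed ESPL range: top 4 of 20 bits set
FLOW_MASK = 0x0FFFF     # low 16 bits carry the flow/source id

def egress_count_and_strip(label_stack, counters):
    """Remove XL + flow-ESPL pairs, counting on the low 16 bits.

    Returns the remaining labels, so the forwarding action (PW, VPN,
    or whatever) is resolved from them independently of the counting.
    """
    remaining = []
    i = 0
    while i < len(label_stack):
        label = label_stack[i]
        nxt = label_stack[i + 1] if i + 1 < len(label_stack) else None
        if label == XL and nxt is not None and (nxt & ~FLOW_MASK) == FLOW_BASE:
            counters[nxt & FLOW_MASK] += 1   # count per flow id
            i += 2                           # drop XL and the ESPL
        else:
            remaining.append(label)
            i += 1
    return remaining

counters = Counter()
rest = egress_count_and_strip([2000, XL, FLOW_BASE | 7], counters)
```

Here the application label 2000 alone selects the forwarding action
and flow id 7 alone selects the counter.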

> On the other hand we could (which is what I thought you meant) assign a 
> meaning
> to the EL value agreed between PEs and have the load balancing use the 
> EL value
> and have the SL behaviour be based on the SL value. However this is a 
> forwarding
> change and an ECMP path change, and can only be applied once whereas the
> proposed design could have multiple SFLs in the stack, say to monitor
> an LSP and a PW.

No ECMP change is needed.  RFC 7274 already prohibits using SPL, XL,
or ESPL label for ECMP.  There is no change at all to ingress except
push more labels, none at all for mid LSR including PHP LSR, and only
change at egress.
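
To illustrate, a simplified sketch of ECMP hash key selection at a mid
LSR that honours that prohibition.  The function name ecmp_hash_keys
and the label values are hypothetical, and the RFC 6790 option of
hashing on the EL alone is deliberately not modelled.

```python
def ecmp_hash_keys(label_stack):
    """Return the labels fed to the ECMP hash, skipping SPL, XL and ESPL.

    Simplification: every ordinary label (16 and above) is kept, which
    includes an EL that follows an ELI (label 7).
    """
    XL = 15
    keys = []
    i = 0
    while i < len(label_stack):
        label = label_stack[i]
        if label == XL:
            i += 2          # skip XL and the ESPL that follows it
        elif label < 16:
            i += 1          # skip other special-purpose labels
        else:
            keys.append(label)
            i += 1
    return keys

plain = ecmp_hash_keys([1000, 2000])
with_espl = ecmp_hash_keys([1000, 2000, 15, 0xF0007])
```

Both stacks yield the same hash keys, so pushing the XL and ESPL at
ingress does not move the traffic to a different ECMP member at such
an LSR.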

> > Most implementations that support RFC 6790 are likely to also support
> > the prohibition on using ESPL for ECMP since the documents were in the
> > MPLS WG at about the same time, though publication of the latter
> > lagged about a year.  This alternative also solves any scalability
> > problem associated with combining source and application label
> > functionality into one label.  It does add two labels to the stack
> > depth but newer hardware should no longer be constrained to very
> > shallow stack depth.  This alternate also might be an even more
> > attractive option if IPR becomes an issue with the current proposal.
>  
> I have no idea how to respond to this given the IETF rules that prohibit 
> discussion
> of IPR terms.

The WG choice may be no IPR vs IPR with currently unknown IPR terms
given lack of an IPR statement.

> > Summary:
> >
> > well written document.  not a great design.  multiple problems as
> > defined.  possible IPR issue.
> I agree with the first :)
>  
> I disagree with the second :(

It's really just the thorny ECMP questions.  ECMP is not to be ignored
IMHO since we have to consider non-TP.

> Working through problems is part of the WG phase of development, and as 
> far as
> I can see the alternative you propose has problems. What I do know is 
> that the
> underlying problem we seek to solve is important and as far as I can 
> tell this is
> the best proposal on the table.

I remain unconvinced.

Using SFL:

  no ECMP - works.

  no way to support mid LSR counting

  inferred measurement LM or DM:

    with ECMP and less than 100% EL support - doesn't work.

    with EL - works

  direct measurement LM or DM:

    works but apparently nothing today supports direct measurement.

Using a new ESPL range:

  egress changes to count on new ESPL before POP

  optional mid LSR change to count on new ESPL would be nice but not
  needed (enable per LSP).

  inferred measurement LM or DM:

    with ECMP and less than 100% EL support - doesn't work.

    with EL - works

  direct measurement LM or DM:

    works but apparently nothing today supports direct measurement.

So it looks like a wash except that there is no way for an ingress and
egress to create a label stack that doesn't yield bad measurements if
ECMP is present in the path but not known to the ingress or egress and
there is a way to support mid LSR counting down the stack a ways.

> You keep saying IPR issues, but it is a deficiency of the IETF rules of 
> conduct that
> we cannot have a detailed discussion. The chairs and or ADs need to provide
> guidance on how to respond to this.

It is perfectly valid to bring up the existence of IPR in a review.
AFAIK all you and I can say is "IPR exists and there are four IPR
disclosures so far" so the WG should read them and wait on completion
of the IPR poll for any additional disclosure.

> - Stewart

Curtis

> > consider alternative.  any alternative but I provided one.  feel free
> > to use the idea.  :-)
> >
> > Curtis
> >
> >
> > In message <5774d63c-6980-103e-0b3f-a277db822462@pi.nu>
> > Loa Andersson writes:
> >> Huub, Andy, Curtis and Kamran,
> >>   
> >> You have been selected as MPLS-RT reviewers for
> >> draft-bryant-mpls-sfl-framework-04
> >> .
> >>   
> > [ ...]
