Re: [mpls] [PWE3] WG Last Call for draft-ietf-pwe3-endpoint-fast-protection-01

Eric Rosen <erosen@cisco.com> Wed, 06 August 2014 15:54 UTC

From: Eric Rosen <erosen@cisco.com>
To: Alexander Vainshtein <Alexander.Vainshtein@ecitele.com>
In-reply-to: Your message of Tue, 05 Aug 2014 13:59:49 -0000. <9696d0db139d46ffaad7be11340215e8@AM3PR03MB612.eurprd03.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <16166.1407340459.1@erosen-lnx>
Date: Wed, 06 Aug 2014 11:54:19 -0400
Message-ID: <16167.1407340459@erosen-lnx>
Archived-At: http://mailarchive.ietf.org/arch/msg/mpls/lKDPNlJTOSMq25eNd-5y4H63nEo
Cc: "mpls@ietf.org" <mpls@ietf.org>, pwe3 <pwe3@ietf.org>, "pwe3-chairs@tools.ietf.org" <pwe3-chairs@tools.ietf.org>
Subject: Re: [mpls] [PWE3] WG Last Call for draft-ietf-pwe3-endpoint-fast-protection-01 - RFC4447
Precedence: list
Reply-To: erosen@cisco.com

I've looked at this draft, and it seems to me that it does work in LDP-based
networks (even with DU mode), and that it does not violate any architectural
principles.  However, I have made certain assumptions that aren't explicitly
stated in the draft.  (I am assuming that the draft is intended to be
patterned after the "anycast BGP" scheme that I've heard about, but
that is not written up anywhere as far as I know.)  Let me say how I think
it is intended to work, and the authors can correct me if I'm wrong.

For protection against the failure of the "primary" egress PE:

- A "primary egress PE" partitions its set of PWs into n sets, where each
  set is associated with a given "protector".  The "protector" is the backup
  egress PE for those PWs.

- Each such set is associated with an IP address.  This IP address must be
  known both to the primary PE and to the protector.

  * One can think of this address as an "anycast" address that, to routing,
    denotes both the primary PE and the protector.  At the primary PE and
    the protector, the address denotes a particular set of PWs.

  * These anycast addresses are distributed by routing, and good old
    fashioned MPLS labels get assigned to them by LDP.  The primary PE MAY
    bind implicit null to that address, but the protector MUST NOT bind
    implicit null to that address.

  * Routing is set up so that (a) if the primary PE is up, traffic to the
    anycast address always goes to the primary PE, but (b) if the primary PE
    is down, traffic to the anycast address goes to the protector instead.
    This can be achieved by, e.g., messing with the metrics.

- For a given PW, the primary PE informs both the protector and the ingress
  PE of (a) the label that it (the primary PE) has assigned to that PW, and
  (b) the "anycast address" that corresponds to that PW.

  This requires some new signaling (beyond that specified in RFC 4447), but
  the requisite signaling is specified in the draft.

- When sending a packet on the given PW, the ingress PE pushes on the label
  assigned to the PW by the egress PE.  Then the ingress PE pushes on
  whatever LDP-assigned label it needs to use to get the packet to the
  anycast address that is associated with the PW.

- When the protector learns that the egress PE has assigned a given label
  and anycast address to the PW, it stores that label in a context-specific
  label space associated with that anycast address.  Whenever the protector
  receives a packet whose top label is the label that the protector has
  bound to the given anycast address, the protector looks up the next label
  in the corresponding label space.  
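
To make the protector's side of this concrete, here is a minimal sketch in
Python of the state I am assuming the protector keeps.  The class name, the
method names, and the addresses and label values are all made up for
illustration; none of them come from the draft, and this is my reading of
the intended behavior rather than the draft's specified procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Protector:
    """Hypothetical model of a protector PE's label state."""

    # anycast address -> label the protector itself bound to it (via LDP)
    context_label_of: dict = field(default_factory=dict)
    # context label -> {PW label assigned by the primary PE -> PW name}
    context_tables: dict = field(default_factory=dict)

    def bind_anycast(self, anycast_addr, context_label):
        # The protector must bind a real label here, never implicit null,
        # otherwise it could not tell which context the packet belongs to.
        self.context_label_of[anycast_addr] = context_label
        self.context_tables.setdefault(context_label, {})

    def learn_pw(self, anycast_addr, pw_label, pw_name):
        # Called when the primary PE signals (PW label, anycast address).
        ctx = self.context_label_of[anycast_addr]
        self.context_tables[ctx][pw_label] = pw_name

    def receive(self, label_stack):
        # label_stack[0] is the top label.  Returns the PW name, or None
        # if the top label is not a context label or the PW is unknown.
        if len(label_stack) < 2:
            return None
        table = self.context_tables.get(label_stack[0])
        if table is None:
            return None
        return table.get(label_stack[1])


protector = Protector()
protector.bind_anycast("192.0.2.1", 300)        # documentation address
protector.learn_pw("192.0.2.1", 100, "PW1")
print(protector.receive([300, 100]))
```

The point of the per-context table is that label 100 is meaningful only
under context label 300; the same value could appear in another primary
PE's set without any conflict.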

Let's look at an example:

CE1---PE1----PLR----PE2----CE2
               |             |
               |-----PE3-----|

where PE3 is the protector for a PW connecting CE1 to CE2 (via PE1 and PE2
respectively).   Call this pseudowire PW1.

Thus there is some IP address, call it A, that PE2 and PE3 both advertise to
routing.  The metrics are set up so that as long as PE2 is up, it appears to
be a better path to A than does PE3.

So PE2 assigns label L1 to PW1, and uses T-LDP to inform both PE1 and PE3 of
this mapping.  To PE1, L1 is just an outgoing label.  To PE3, however, it is
an incoming label in a context-specific label space.

PE2 and PE3 both assign labels to A (let's call these labels L2 and L3
respectively), and distribute the corresponding label bindings to PLR.  PLR
also assigns a label to A, and distributes that label binding (L4) to PE1.

When PE1 has a packet to send on PW1, it pushes L1, then pushes L4, then
sends the packet to PLR.  PLR swaps L4 with L2 and sends the packet to PE2.
(Note that L2 could be implicit NULL.)  PE2 sees label L1, and sends the
packet to CE2.

Now if PE2 goes down, things happen a bit differently.  When PLR gets a
packet with L4 on top, it swaps L4 with L3, and sends the packet to PE3.  (In
this case, of course, L3 cannot be implicit NULL.)

PE3 gets the packet, sees L3 on top, looks it up, and then says "L3 is a
context label, so I have to pop it and then look up the following label in
the L3-specific label table."  This context-specific label table contains
L1, which has been bound to the PW that leads to CE2.  So PE3 pops the label
and sends the packet to CE2.
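
The walk-through above can be traced with a small Python sketch.  The
numeric label values and all function names are arbitrary choices of mine,
L2 is assumed to be a real label rather than implicit NULL, and "PE2 is up"
is modeled as a simple flag instead of an actual routing convergence.

```python
# Arbitrary illustrative values for the labels named in the example.
L1, L2, L3, L4 = 101, 202, 303, 404

PE2_LABEL_TABLE = {L1: "CE2"}            # PE2's own binding for PW1
PE3_CONTEXT_TABLES = {L3: {L1: "CE2"}}   # PE3: context label -> PE2's labels


def plr_forward(stack, pe2_up):
    # PLR swaps L4 for the label advertised by whichever PE is the
    # current best path to anycast address A.
    assert stack[0] == L4
    if pe2_up:
        return "PE2", [L2] + stack[1:]
    return "PE3", [L3] + stack[1:]


def pe2_receive(stack):
    # PE2 pops the transport label and looks up L1 in its own table.
    assert stack[0] == L2
    return PE2_LABEL_TABLE[stack[1]]


def pe3_receive(stack):
    # PE3 treats the top label as a context label, pops it, and looks up
    # the next label in the table associated with that context.
    context_table = PE3_CONTEXT_TABLES[stack[0]]
    return context_table[stack[1]]


def send_on_pw1(pe2_up):
    stack = [L4, L1]   # PE1 pushed L1, then L4; top of stack is first
    next_hop, stack = plr_forward(stack, pe2_up)
    if next_hop == "PE2":
        return next_hop, pe2_receive(stack)
    return next_hop, pe3_receive(stack)


print(send_on_pw1(pe2_up=True))
print(send_on_pw1(pe2_up=False))
```

In both cases PE1 builds exactly the same label stack; only the PLR's swap
and the table consulted at the egress differ.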

I think that's basically it, and the necessary mechanisms do seem to be
described in the draft.  The draft doesn't actually use the notion of
"anycast address" explicitly, but I think the intention is that the "context
labels" are definitely bound to anycast addresses.  I hope the authors will
correct me if I'm wrong.

Although the above example shows a PLR that is a neighbor of both PE2 and
PE3, this isn't really necessary; one could easily construct an example
where PE2 and PE3 have no neighbors in common.

The example above doesn't rely on LDP DoD at all, and doesn't make use of
any RSVP-TE backup tunnels.

However, when protecting against the failure of the egress AC, packets would
go first to PE2 then to PE3.  This scenario is a bit different, in that:

- I think this scenario may require an RSVP-TE backup tunnel (on which PHP
  is not used) from PE2 to PE3.

- PE2 would have to signal PE3 to associate the given PW with the given
  backup tunnel.

- The label that PE3 assigns to that backup tunnel would identify the
  context in which the PW label is looked up.

This is perhaps more detail than is actually in the draft ;-) but if it is
an accurate description of the authors' intentions, the scheme seems to
work.

I don't think this modifies the MPLS architecture at all.  It is true that
PE3 needs to maintain a context-specific label table that contains
labels assigned by PE2.  While one may say that that is a "coordinated label
space", it is no more than what is required by the nature of
"upstream-assigned labels".  I.e., if you can't support this level of
coordination, you just can't support upstream-assigned labels.

Also, I don't think this draft is an "update" to RFC 4447.  You don't need
to read this draft to implement RFC 4447, nor does it modify the procedures
of RFC 4447.  It just introduces some new optional signaling and procedures.
I don't think RFC 4447's requirement to use the "platform label space"
should be interpreted as prohibiting the use of upstream-assigned labels.
The intention of RFC 4447 was to prohibit the use of the interface-specific
label space for PW labels.  This prohibition was necessary because the
ingress PE doesn't know which of the egress PE's network-facing interfaces
will actually receive a given packet.  This draft doesn't have that problem,
as it provides clear procedures for determining the context in which the
upstream-assigned labels are looked up.
