Re: [PWE3] GAL above and below the PW label

"Carlos Pignataro (cpignata)" <cpignata@cisco.com> Tue, 26 April 2011 01:07 UTC

References: <4D92F5D3.6080609@pi.nu> <201103310703.p2V73Hrx045107@harbor.orleans.occnc.com> <AANLkTimwxBexevis2-5-vrGM-hA2APZeC0djwxUyV0jU@mail.gmail.com> <4DB108AD.2040809@cisco.com> <BANLkTi=hUyLRm0SLGfUhcgYfdyeox_d2ww@mail.gmail.com> <164FBBE6-5814-4C12-8CC1-C1A4D22D7BE6@cisco.com> <BANLkTim7jvo3c3VDuSB2-OPzj7hhM9xq7A@mail.gmail.com> <4DB5C9F6.60402@cisco.com> <BANLkTinPCw0AaOoyFj04mtVD=QxknWYw+w@mail.gmail.com>
Content-Transfer-Encoding: base64
From: "Carlos Pignataro (cpignata)" <cpignata@cisco.com>
Content-Type: text/plain; charset="utf-8"
In-Reply-To: <BANLkTinPCw0AaOoyFj04mtVD=QxknWYw+w@mail.gmail.com>
Message-ID: <E9EEA702-D8A6-4ECF-BFD8-65B6DB55632E@cisco.com>
Date: Mon, 25 Apr 2011 21:07:15 -0400
To: Sriganesh Kini <sriganesh.kini@ericsson.com>
thread-topic: [PWE3] GAL above and below the PW label
thread-index: AcwDrk6jq6rGPYrBR9O6HmqXA3BKuw==
MIME-Version: 1.0 (iPad Mail 8H7)
X-OriginalArrivalTime: 26 Apr 2011 01:07:26.0549 (UTC) FILETIME=[4EA46050:01CC03AE]
Cc: pwe3@ietf.org
Subject: Re: [PWE3] GAL above and below the PW label

Sriganesh,

Please see inline. 

Carlos P.

On Apr 25, 2011, at 7:29 PM, "Sriganesh Kini" <sriganesh.kini@ericsson.com> wrote:

> Hi Carlos,
> 
> Pls see inline.
> 
> On Mon, Apr 25, 2011 at 12:22 PM, Carlos Pignataro <cpignata@cisco.com> wrote:
>> Hi Sriganesh,
>> 
>> Please see inline.
>> 
>> On 4/25/2011 2:56 PM, Sriganesh Kini wrote:
>>> Hi Carlos,
>>> 
>>> Pls see inline.
>>> 
>>> On Fri, Apr 22, 2011 at 6:23 PM, Carlos Pignataro (cpignata)
>>> <cpignata@cisco.com> wrote:
>>>> Hi Sriganesh,
>>>> 
>>>> Thanks for the quick response and low RTT, please find some follow-ups inline.
>>>> 
>>>> Carlos P.
>>>> 
>>>> On Apr 22, 2011, at 2:20 PM, "Sriganesh Kini" <sriganesh.kini@ericsson.com> wrote:
>>>> 
>>>>> Hi Carlos,
>>>>> 
>>>>> The applicability of draft-kini-pwe3-inband-cc-offset is when CW is
>>>>> not present, but the intermediate LSR looks beyond the PW label to do
>>>>> multipath decisions. The draft is independent of any (even proprietary
>>>>> logic) for multipath decision taken by intermediate nodes looking
>>>>> beyond PW label and also independent of any heuristic the intermediate
>>>>> routers may use to guess the packet type (beyond label stack) being
>>>>> carried.
>>>> 
>>>> I think that this statement is a bit too optimistic and too generalized. It seems to me that, in addition, the PE needs to understand the flows carried inside the PW, have some clue as to the type of ECMP in the core, and there should be a correlation between these two. This dramatically limits the applicability, as you say below:
>>>> 
>>> 
>>> Correct that the PE needs to know the flow that the operator is trying
>>> to troubleshoot! However the PE does not need to know the type of
>>> ECMP that is implemented in the core (P router). That algorithm would
>>> be typically proprietary. Not sure if you are saying that the PE
>>> should know the correlation between the flow and the ECMP-type before
>>> the problem occurs. The draft is not a multi-path algorithm discovery
>>> mechanism. That is an important problem, but due to the proprietary
>>> nature of those algorithms a solution has limitations. The draft
>>> provides a PE a mechanism to troubleshoot a flow without having
>>> precise knowledge a priori of how the multi-path algorithm is behaving
>>> for a specific flow.
>> 
>> Apologies as I was not clear. I did not mean that the PE would need to
>> know the ECMP algorithm. I meant to say that the PE would need to know
>> what the ECMP algorithm is expecting and what fields are relevant for
>> the ECMP algorithm. This was a round-about way of saying that, assuming
>> that ECMP algorithms hash on certain fields of an IP+Transport/ULP
>> header after an MPLS header, then this I-D is applicable for an IP PW.
> 
> Maybe we are saying the same thing here but let me try. I think we
> both agree that ECMP algorithms at a P router typically look into the
> packet hdr to do multipath decisions. The caveat being that they may
> use a heuristic to guess the packet type (due to incomplete
> information). As long as they do not send a flow along multiple paths
> everything is ok.
> 
> Consider a network with P routers P1, P2, P3, .... Say that different
> P routers have different ECMP algorithms - So P1 hashes only on the
> label stack, P2 looks at the first nibble beyond label stack and if it
> is IPv4/v6, hashes on IP src/dest. P3 does the same as P2 except that
> it looks further and hashes on protocol type and certain TCP/UDP hdrs
> params.
> 
> PE1 does not know the details of the P routers algorithms or which
> parts of the header they look into. Say the PE wants to troubleshoot a
> TCP session between a pair of IP addresses. The packet that is used
> for VCCV purposes would look the same as a normal data packet for that
> TCP session with the exception that the TTL would be as specified for
> TTL Expiry VCCV type-3 and that beyond the pseudo-flow header, the
> actual VCCV message would be present instead of the data packet.

No, this is the key. VCCV exists in the context of a PW, which is more generic than IP directly over MPLS. So, potentially, no TCP session from the CEs would influence ECMP for an Ethernet PW, which is why FAT-PW is needed to summarize the flow into something meaningful in the core.
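
To make the point concrete, here is a minimal Python sketch of the kind of P-router heuristic described above: hash on the label stack, peek at the first nibble after it, and add the IPv4/IPv6 addresses if it looks like IP. Every function name, label value, MAC, and IP address below is made up for illustration, and real multipath implementations are proprietary and will differ. Under these assumptions, client IP flows change the hash key when IP directly follows the label stack, but not when they sit behind an Ethernet header in a PW without CW.

```python
import socket
import struct

def lse(label, s, ttl=64):
    """Encode one MPLS label stack entry (TC bits left at 0)."""
    return struct.pack("!I", (label << 12) | (s << 8) | ttl)

def split_label_stack(pkt):
    """Return (label_values, bytes_after_stack), stopping at the S=1 entry."""
    labels, off = [], 0
    while off + 4 <= len(pkt):
        (entry,) = struct.unpack_from("!I", pkt, off)
        labels.append(entry >> 12)
        off += 4
        if (entry >> 8) & 1:
            break
    return labels, pkt[off:]

def ecmp_key(pkt):
    """Hypothetical heuristic: labels, plus IP src/dst if the first nibble says IP."""
    labels, payload = split_label_stack(pkt)
    key = b"".join(struct.pack("!I", l) for l in labels)
    nibble = payload[0] >> 4 if payload else None
    if nibble == 4 and len(payload) >= 20:
        key += payload[12:20]   # IPv4 source + destination address
    elif nibble == 6 and len(payload) >= 40:
        key += payload[8:40]    # IPv6 source + destination address
    return key

def ipv4_header(src, dst, proto=6):
    """Minimal 20-byte IPv4 header; checksum left at zero for the sketch."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))

def eth_header(dst_mac, src_mac, ethertype=0x0800):
    return bytes.fromhex(dst_mac) + bytes.fromhex(src_mac) + struct.pack("!H", ethertype)

stack = lse(100, 0) + lse(200, 1)          # tunnel label + PW label, no CW
flow_a = ipv4_header("10.0.0.1", "10.0.0.2")
flow_b = ipv4_header("10.0.0.3", "10.0.0.4")
eth = eth_header("001122334455", "00aabbccddee")

ip_pw_a, ip_pw_b = stack + flow_a, stack + flow_b
eth_pw_a, eth_pw_b = stack + eth + flow_a, stack + eth + flow_b

print(ecmp_key(ip_pw_a) != ecmp_key(ip_pw_b))    # client flows are visible
print(ecmp_key(eth_pw_a) == ecmp_key(eth_pw_b))  # client flows are hidden
```

Note that with this kind of heuristic a destination MAC whose first nibble happens to be 4 or 6 would be misread as IP, which is a separate problem from the one discussed here.
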

> For
> the P routers, this VCCV message looks the same as a data packet since
> it is just looking at the header to do multipath decisions and the
> data and VCCV message would take the same path through the P routers.
> 
> In the case where the packet coming from the client has a MPLS label
> stack and that label stack is sent as a single label stack (i.e. PW
> label stack + client label stack), then again the client label stack
> would be part of the flow header (Note that PW label or the optional
> FAT-PW label will have S=0 in this case).
> 
>> Conversely, for an Ethernet PW, I do not see how you can do what the I-D
>> claims:
>> 
>>    This document
>>    defines a simple extension to the TTL expiry CC (Type 3) to do inband
>>    VCCV. This can be used even without a CW.
>> 
>> since a(n IP) flow (as received from a PE) is after an Ethernet header,
>> but there is no ECMP based on these flows (since the core routers do not
>> know what the PW is carrying).
>> 
>> My point was that unless some flow identifier from the client
>> perspective is what is used for ECMP (which is not the case if there is
>> some L2 or MPLS between IP and the transport), then this technique
>> cannot be used.
> 
> If any field from the client packet is not used by the transit LSR for
> ECMP then TTL Expiry VCCV type 3 is always inband. So in that sense I
> agree. One of the problems we have with VCCV is that multiple types
> have been deployed (some of which are inband) and also that CW has
> been used in some PW deployments and not others. This draft is
> addressing cases where CW is not used and intermediate LSRs look
> beyond the PW labels (including FAT-PW) into the packet to do
> multipath decisions. This could be an IP packet immediately following
> label stack, any client labels that form a single label stack with the
> PW labels, or (as a stretch) even a P router heuristic such as - a
> nibble beyond label stack that indicates a non IPv4/6 packet is an
> ethernet header. I am not advocating such an approach, just making a
> point that this draft provides a solution that is agnostic of such
> behavior.

My comment is that I am not convinced of the latter, not even as a stretch. That's why I was asking about applicability. 

> 
>> 
>>> 
>>> Regarding applicability, consider an encapsulation that has a single
>>> label stack in the core consisting of the adaptation label stack (PW
>>> in this case) and the label stack of a MPLS packet received from the
>>> client CE. This kind of encapsulation is in
>>> draft-kini-pwe3-pkt-encap-efficient-ip-mpls when CW is not used. I
>>> believe a similar encap is used in the network layer adaptation sec
>>> 3.4.5 RFC 5921 for a client layer that is MPLS enabled. In such a case
>>> too it is possible to troubleshoot a flow (that includes the client
>>> labels and the packet being carried below) of the combined label
>>> stack.
>> 
>> I do not see how this applies other than for IP PWs. Let me try to ask a
>> different way: The proposal in draft-kini-pwe3-inband-cc-offset uses a
>> "pseudo flow header" after the PW label. It is the expectation that core
>> routers would ECMP-distribute based on said "pseudo flow header". It is
>> also the expectation that when this happens, this OAM would fate-share
>> with actual user traffic. Does this imply that PW traffic needs to start
>> (after the PW Label) with the same values as in the "pseudo flow
>> header", and thus the flow identifiers for regular traffic follow the PW
>> Label directly?
>> 
> 
> The PW payload stays exactly the way it does according to the
> adaptation defined for that type of PW. This draft does not define any
> new adaptation, so it has no new implications on where the PW traffic
> needs to start. All it defines is a variant of the TTL expiry VCCV,
> where the VCCV message starts at an offset after the PW label. The
> packet encoding (after the PW label) till the offset is exactly the
> same as for a data packet (for which the inband OAM needs to be
> executed).
> 

Yes, and that helps for IP PWs, where existing methods can test all paths. 
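
For reference, my reading of the encoding described above can be sketched as follows in Python. This is only an illustration based on the summary in this thread, not the draft's exact packet format: the function names, label values, the 20-byte offset, and the placeholder VCCV bytes are all assumptions, and it assumes the PW label is the bottom-of-stack entry (no FAT label, no client labels).

```python
import struct

def lse(label, s, ttl=64):
    """Encode one MPLS label stack entry (TC bits left at 0)."""
    return struct.pack("!I", (label << 12) | (s << 8) | ttl)

def label_stack_len(pkt):
    """Length in bytes of the label stack, up to and including the S=1 entry."""
    off = 0
    while off + 4 <= len(pkt):
        (entry,) = struct.unpack_from("!I", pkt, off)
        off += 4
        if (entry >> 8) & 1:
            return off
    raise ValueError("no bottom-of-stack entry found")

def build_inband_vccv(data_pkt, offset, vccv_msg, pw_ttl=1):
    """Copy labels and the first `offset` payload bytes of a data packet,
    set the PW label TTL so it expires, then append the VCCV message."""
    n = label_stack_len(data_pkt)
    pseudo_flow = data_pkt[n:n + offset]
    if len(pseudo_flow) < offset:
        raise ValueError("data packet shorter than the negotiated offset")
    stack = bytearray(data_pkt[:n])
    stack[n - 1] = pw_ttl            # TTL is the last byte of the bottom entry
    return bytes(stack) + pseudo_flow + vccv_msg

# Hypothetical data packet: tunnel label, PW label, 20 header bytes, user data.
header = bytes([0x45]) + bytes(19)
data_pkt = lse(100, 0) + lse(200, 1) + header + b"user data"
oam_pkt = build_inband_vccv(data_pkt, offset=20, vccv_msg=b"VCCV-MSG")

n = label_stack_len(data_pkt)
print(oam_pkt[n:n + 20] == data_pkt[n:n + 20])  # same bytes up to the offset
print(oam_pkt[n - 1])                           # PW label TTL in the OAM packet
```

A transit node that hashes on label values and on bytes within the offset would see identical input for both packets, which is the case where this helps.
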

Thanks,

Carlos. 
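
P.S. Since the older part of the thread quoted below proposes that a midpoint LSR skip over the GAL when hashing, here is a rough Python sketch of that idea. The GAL value 13 is the reserved label from RFC 5586; the function name, the other label values, and the choice of hashing only on label values are assumptions for illustration.

```python
import struct

GAL = 13  # reserved label value for the G-ACh Label (RFC 5586)

def label_hash_key(labels, skip_gal):
    """Bytes fed to the multipath hash; optionally ignore the GAL."""
    used = [l for l in labels if not (skip_gal and l == GAL)]
    return b"".join(struct.pack("!I", l) for l in used)

# Hypothetical stacks: LSP label 1000, PW label 2000, FAT label 777.
data_stack = [1000, 2000, 777]
oam_stack = [1000, 2000, GAL, 777]   # GAL inserted just below the PW label

print(label_hash_key(oam_stack, skip_gal=True) == label_hash_key(data_stack, skip_gal=True))
print(label_hash_key(oam_stack, skip_gal=False) == label_hash_key(data_stack, skip_gal=False))
```

With the skip, OAM and data feed the same bytes into the hash and so stay on the same member link; without it, the inputs differ and fate-sharing is not guaranteed.
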

>>> 
>>> Also note that this mechanism has some similarity to
>>> draft-nadeau-pwe3-vccv-2. The primary difference being that in
>>> draft-nadeau the GAL is used to identify an OAM packet instead of TTL,
>>> hence the presence of GAL in the hash computation is an issue.
>>> Moreover due to the offset, the inband-vccv draft is also independent
>>> of the pkt (and multi-path decision based on that) being carried below
>>> the label stack.
>>> 
>>>>> The most common heuristic is of course looking at the first
>>>>> nibble beyond label stack to determine that it is an IP packet. So
>>>>> this would also apply for an IP PW, IPLS, etc in addition to
>>>>> draft-kini-pwe3-pkt-encap-efficient-ip-mpls.
>>>> 
>>>> Right. I think it would apply to an IP PW, where there are existing techniques to walk multipaths. But it would not apply to, say, an Ethernet PW (without CW): even if the PE sees the flows inside the Ethernet (e.g., TCP flows over IP over Ethernet), there is no ECMP in the core based on those.
>>>> 
>>> 
>>> I would like to believe that too. But I have heard that
>>> implementations do different things with some packet inspection.
>>> 
>>>>> LSP-Ping uses the 127
>>>>> address and it is not easy to correlate this to a problem reported for
>>>>> real (non 127) addresses.
>>>> 
>>>> I think LSP-Ping can do better: identify and walk all ECMPs.
>>> 
>>> If the ECMP algorithm is proprietary and info about that is not
>>> available to the PE, there will be issues.
>> 
>> <http://tools.ietf.org/html/rfc4379#section-3.3> does not need to know
>> the internals.
> 
> Yes, that can be used to identify LSP paths, find out which IP
> addresses (127 range) map to which LSP nexthop and so on. But VCCV is
> determining the health of the PW at the PW layer that may include
> multiple LSPs (especially MS-PW). When the PW can take multiple paths
> it is useful to quickly determine if a client flow that needs
> troubleshooting has a problem and which segment of the PW has that
> problem. That may be followed by a LSP tracing for the segment of the
> PW that has a problem.
> 
>> 
>> Thanks,
>> 
>> -- Carlos.
>> 
>>> 
>>>> I also think that the assumption of a problem being reported on a given flow, only for an IP PW Type is too expensive for the return.
>>>> 
>>>> Thanks,
>>>> 
>>>> -- Carlos.
>>>> 
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> 
>>>>> On Thu, Apr 21, 2011 at 9:48 PM, Carlos Pignataro <cpignata@cisco.com> wrote:
>>>>>> Hi Sriganesh,
>>>>>> 
>>>>>> Do you see the applicability of draft-kini-pwe3-inband-cc-offset
>>>>>> strictly limited to the encapsulation proposed in
>>>>>> draft-kini-pwe3-pkt-encap-efficient-ip-mpls?
>>>>>> 
>>>>>> Specifically, for a PW that carries say Ethernet without CW (or any of
>>>>>> the other encap modes with non-mandatory CW), there is no "flow" that is
>>>>>> being ECMP-ed in the core that you want to mimic/imitate in a
>>>>>> load-balancing pseudo-header. Conversely, for "IP PWs", LSP-Ping can
>>>>>> take different Equal-cost paths (all of them, one by one) based on
>>>>>> choosing the destination IP address. And there is no need for this when
>>>>>> there is a CW.
>>>>>> 
>>>>>> I am trying to understand how common is the use case for what this
>>>>>> optimizes for.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> -- Carlos.
>>>>>> 
>>>>>> On 3/31/2011 4:08 AM, Sriganesh Kini wrote:
>>>>>>> All,
>>>>>>> 
>>>>>>> Just an FYI that a proposal for fate-sharing PW OAM and data is given in
>>>>>>> draft-kini-pwe3-inband-cc-offset.
>>>>>>> 
>>>>>>> Just to summarize the proposal.
>>>>>>> 1. It does not require GAL in PW. TTL expiry is used to alert the S/T-PE.
>>>>>>> 2. It is mainly applicable when CW is not used in the PW.
>>>>>>> 3. It uses a fixed offset (negotiated between PW endpoints) after the
>>>>>>> label stack before starting the OAM msg.
>>>>>>> 4. The bytes between the label-stack and the fixed offset are referred
>>>>>>> to as a pseudoflow header and are filled with byte-values (by the PE)
>>>>>>> that represent the flow for which OAM is desired. This helps PW OAM
>>>>>>> and data to fate-share even when the intermediate node looks beyond
>>>>>>> label stack to do multipath forwarding decisions.
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Mar 31, 2011 at 12:03 AM, Curtis Villamizar <curtis@occnc.com> wrote:
>>>>>>>> 
>>>>>>>> Loa, Dave, Sasha,
>>>>>>>> 
>>>>>>>> I've snipped from various posts on the thread that Sasha started.  See
>>>>>>>> inline.
>>>>>>>> 
>>>>>>>> In message <4D92F5D3.6080609@pi.nu>
>>>>>>>> Loa Andersson writes:
>>>>>>>>> 
>>>>>>>>> All,
>>>>>>>>> 
>>>>>>>>> missed a nuance in Sasha's subject line.
>>>>>>>>> 
>>>>>>>>> We have two issues
>>>>>>>>> 
>>>>>>>>> - where the GAL is placed relative to the PW label, I believe it is
>>>>>>>>>   necessary to have the GAL below the PW label.
>>>>>>>>> 
>>>>>>>>> - whether the GAL label needs to be bottom of stack or not, I
>>>>>>>>>   figure that this is really a discussion of whether it is possible to
>>>>>>>>>   have a FAT label below the GAL or not. I'm not sure about my
>>>>>>>>>   preferences but I think it is possible.
>>>>>>>>> 
>>>>>>>>> /Loa
>>>>>>>> 
>>>>>>>> I agree with Loa's summary.  Not putting GAL at the bottom may confuse
>>>>>>>> some LSR, but putting it above the PW label is likely to be even more
>>>>>>>> problematic.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In message <60C093A41B5E45409A19D42CF7786DFD51D5310B53@EUSAACMS0703.eamcs.ericsson.se>
>>>>>>>> David Allan I writes:
>>>>>>>>> 
>>>>>>>>> It does become a new behavior, GAL with S=0. However the combination of
>>>>>>>>> GAL being top label at the S-PE and TTL being encoded in the PW label
>>>>>>>>> means that fate sharing is broken at every S-PE. Life only gets a little
>>>>>>>>> more strange if there is a FAT label as well...
>>>>>>>>> 
>>>>>>>>> That being said, GAL as bottom label is broken in any ECMP environment,
>>>>>>>>> which is why GAL is a TP construct.
>>>>>>>>> 
>>>>>>>>> my 2 cents
>>>>>>>>> D
>>>>>>>> 
>>>>>>>> Dave,
>>>>>>>> 
>>>>>>>> If GAL is broken for ECMP, which it is, then all TP OAM is broken.
>>>>>>>> If all it takes to fix it is simple then let's just fix it.
>>>>>>>> 
>>>>>>>> This is what we'd have to do.
>>>>>>>> 
>>>>>>>>  1) Relax the requirement that GAL be at the bottom
>>>>>>>> 
>>>>>>>>  2) Have the ingress insert GAL in the stack immediately below the
>>>>>>>>     label for which the measurement is made, keeping the rest of the
>>>>>>>>     label stack in place.
>>>>>>>> 
>>>>>>>> The only thing the midpoint LSR at a multipath (LAG, link bundle for
>>>>>>>> MPLS) has to do is skip over the GAL when hashing, as if the GAL
>>>>>>>> wasn't there.  That will yield the same hash value and it will
>>>>>>>> preserve fate-sharing across multipath.
>>>>>>>> 
>>>>>>>> I prefer that we fix things rather than complain that they are broken.
>>>>>>>> 
>>>>>>>> Curtis
>>>>>>>> _______________________________________________
>>>>>>>> pwe3 mailing list
>>>>>>>> pwe3@ietf.org
>>>>>>>> https://www.ietf.org/mailman/listinfo/pwe3
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> - Sri
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> - Sri
>