Re: [mpls] https://tools.ietf.org/html/draft-fang-mpls-hsdn-for-hsdc-01

Petr Lapukhov <petr@fb.com> Mon, 30 March 2015 03:21 UTC

From: Petr Lapukhov <petr@fb.com>
To: Robert Raszuk <robert@raszuk.net>
Date: Mon, 30 Mar 2015 03:21:23 +0000
Archived-At: <http://mailarchive.ietf.org/arch/msg/mpls/9ehfG87CzIuEOBhB5OzlfGpHkHo>
Cc: "mpls@ietf.org" <mpls@ietf.org>, "nvo3@ietf.org" <nvo3@ietf.org>, Luyuan Fang <lufang@microsoft.com>
Subject: Re: [mpls] https://tools.ietf.org/html/draft-fang-mpls-hsdn-for-hsdc-01

Robert,

On MPLS selling points:

Generally speaking, any tunneling technique can achieve FIB state compression in "transit" (but not all) devices, by virtue of core/edge asymmetry. MPLS is just one of these techniques, and it seems attractive due to its simplicity (flat label lookup) and protocol-agnostic operation. However, since IPv4/IPv6 is not going away from DC boxes, the advantage of simplicity is diminished: it's not as if we will see super-cheap, easy-to-operate MPLS-only DC switches on the market. Additionally, I personally don't see FIB state explosion as such a daunting problem in properly engineered and summarized DC/DCI networks, but that's a separate conversation.
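
To make the core/edge asymmetry concrete, here is a minimal Python sketch that counts FIB entries under a deliberately simplified model. The `Fabric` class, the host counts and the one-route-per-host assumption are hypothetical; real FIBs would also carry summarized and infrastructure routes.

```python
from dataclasses import dataclass


@dataclass
class Fabric:
    """Hypothetical fabric: edge (tunnel endpoint) name -> number of hosts behind it."""

    hosts_per_edge: dict[str, int]

    def transit_fib_flat(self) -> int:
        # No tunneling and no summarization: a transit node carries
        # one route per host.
        return sum(self.hosts_per_edge.values())

    def transit_fib_tunneled(self) -> int:
        # With tunneling, a transit node only needs routes to the tunnel
        # endpoints (edge nodes); per-host state stays at the edge.
        return len(self.hosts_per_edge)

    def edge_fib_tunneled(self) -> int:
        # Edge nodes still map every destination host to an egress
        # endpoint, so the state moves to the edge rather than disappearing.
        return sum(self.hosts_per_edge.values())


fabric = Fabric({f"tor{i}": 40 for i in range(100)})
print(fabric.transit_fib_flat(), fabric.transit_fib_tunneled(), fabric.edge_fib_tunneled())
# 4000 100 4000
```

The transit count drops from one entry per host to one per edge node while the edge count does not, which is why the compression applies only to transit (but not all) devices.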

Another plus for MPLS is that MPLS OAM techniques seem attractive, as they allow one to test the "circuit" independently of the payload. This is true in the "pure" MPLS case, but if the hardware parser looks beyond the MPLS stack (which it often does), this "protocol-independent" OAM becomes not-so-independent :). Another good reason for MPLS OAM is the ability to lock down specific paths for testing, though one may argue that any source-routed technique would do that. However, MPLS just seems to be the one with the least overhead, and it is supported across multiple DC hardware platforms.

Despite the advantages, there are some real technical challenges with MPLS in the data center. For example, some platforms limit L3 ECMP to the IP2MPLS function only (though L2 ECMP is still possible with an MPLS header). A similar issue arises when handling anycast destinations, which are critical to many load-balancing solutions. These issues could be addressed by performing IP lookups and building label stacks in the host, which is ultimately the segment/source routing approach to the problem. This also allows the network to be purely MPLS.
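
A minimal sketch of the host-side IP2MPLS edge described above: the host performs the longest-prefix match itself, picks one of several paths per flow, and returns the label stack to push. `HOST_FIB`, the prefixes and the label values are hypothetical, and the flow hash is just CRC-32 over a 4-tuple, not any particular platform's hash.

```python
import ipaddress
import zlib

# Hypothetical host-side table: destination prefix -> candidate label stacks,
# one per path, each listed top-of-stack first. Values are illustrative only.
HOST_FIB = {
    ipaddress.ip_network("10.1.0.0/16"): [[16001, 17001], [16002, 17001]],
    ipaddress.ip_network("10.1.2.0/24"): [[16003, 17002]],
}


def lookup(dst: str) -> list[list[int]]:
    """Longest-prefix match performed on the host (the IP2MPLS edge)."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in HOST_FIB if addr in net]
    if not matches:
        raise KeyError(f"no route to {dst}")
    return HOST_FIB[max(matches, key=lambda net: net.prefixlen)]


def label_stack_for_flow(src: str, dst: str, sport: int, dport: int) -> list[int]:
    """Pick one path per flow (host-side ECMP) and return a copy of its stack."""
    paths = lookup(dst)
    flow_key = f"{src}|{dst}|{sport}|{dport}".encode()
    return list(paths[zlib.crc32(flow_key) % len(paths)])


print(label_stack_for_flow("10.0.0.1", "10.1.2.5", 40000, 80))  # [16003, 17002]
```

Every entry in `HOST_FIB` is state that has to be kept current on each host, which is the cost discussed next.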

Pushing the IP2MPLS edge down to the host level seems like a great idea until you consider the amount of state that needs to be distributed and synchronized among potentially millions of devices. For example, if you want to ECMP from a host in a DC network with 128 paths, you need to inform every host of a link failure whenever any of these 128 paths becomes unavailable (assuming that any host may talk to any host over any path). Distributed state synchronization is not an easy problem, and it is a hell of a pain to troubleshoot, especially when many devices are involved.
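
The fan-out problem can be illustrated with a toy model. `Host`, `fail_path` and the 10,000-host population are hypothetical; the model assumes any-to-any traffic over any of the 128 paths, as in the example above, so every host holds every path.

```python
from dataclasses import dataclass, field

NUM_PATHS = 128  # ECMP width used in the example above


@dataclass
class Host:
    name: str
    # IDs of the paths this host may currently hash flows onto. With
    # any-to-any traffic over any path, every host starts with all of them.
    live_paths: set[int] = field(default_factory=lambda: set(range(NUM_PATHS)))


def fail_path(hosts: list[Host], path_id: int) -> int:
    """Withdraw a failed path from every host's ECMP set.

    Returns how many hosts had to be updated: the fan-out of a single
    failure once path selection lives on the hosts.
    """
    updated = 0
    for host in hosts:
        if path_id in host.live_paths:
            host.live_paths.discard(path_id)
            updated += 1
    return updated


hosts = [Host(f"h{i}") for i in range(10_000)]
print(fail_path(hosts, 42))      # 10000: every host touched by one failure
print(len(hosts[0].live_paths))  # 127
```

A single path failure touches every host in the model; each of those updates is a chance for a host to be briefly out of sync with the rest.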

This is why I'm personally in favor of hybrid approaches, where source routing (not necessarily via MPLS, though) is added on top of the traditional IP/shortest-path model, allowing extended OAM functionality while retaining full compatibility with the traditional approach. This does not achieve the goal of ultimate FIB compression, but as I mentioned, I do not see that as a serious problem with proper route summarization...

Regards,

Petr

________________________________
From: rraszuk@gmail.com [rraszuk@gmail.com] on behalf of Robert Raszuk [robert@raszuk.net]
Sent: Wednesday, March 25, 2015 8:34 PM
To: Petr Lapukhov
Cc: Luyuan Fang; mpls@ietf.org; nvo3@ietf.org; Pedro Marques
Subject: Re: https://tools.ietf.org/html/draft-fang-mpls-hsdn-for-hsdc-01

Hello Petr,

Thank you very much for your comment! One question:

> To me, the only good selling point for mpls in DC, in my opinion,
> is having a uniform end to end transport (with corresponding OAM etc).

Let me understand this. What is the definition of "uniform end to end transport"?

If you use IP for transport and, say, L3VPN Option C or LISP in the overlay, what is not uniform about compute-node-to-compute-node transport across any DC or any region?

As far as OAM goes, are there any tools missing for IP operation?

- - -

I am just trying to understand the technical rationale, if any exists, aside from sales, political, religious or fanatical reasons, for why anyone would propose MPLS transport for the DC underlay.

- - -

Just today I was actually arguing with one networking vendor that MPLS as a demux value (say, in RFC 4364) makes a lot of sense for multi-tenant DCs, rather than reinventing the wheel and using a different name for effectively the same function (GRE key or NVGRE VSID). But that is in the overlay space. Completely different topic.

Best regards,
r.


On Thu, Mar 26, 2015 at 1:56 AM, Petr Lapukhov <petr@fb.com> wrote:
AFK, so can't write a well-formed comment :(

But in short, my personal experience was that circuit-like transports play well as an *augmentation* to shortest-path / ECMP / longest-prefix-match techniques, not as a complete replacement (after all, IP already works). MPLS circuits are all right if you have network asymmetry and need to work around it, but in symmetric topologies they seem rather unnecessary, unless you really want a uniform end-to-end data plane, which has both downsides and benefits.

Robert and I had some discussions around pure MPLS / seamless MPLS DC + DCI networks a couple of years ago, but it was hard to find a strong selling point for MPLS (I was arguing for MPLS, btw). In general, MPLS offers a uniform, protocol-agnostic forwarding plane with simple lookup, but the latter is not such a big win with modern (and upcoming) silicon. Next, for entropy reasons it is often necessary to resort to leaky abstractions with MPLS (e.g., nibble guessing) or to add complications (entropy labels), which make the architecture more complicated.
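
For readers unfamiliar with "nibble guessing": an MPLS header does not say what the payload is, so a transit LSR that wants per-flow entropy often peeks at the first nibble after the bottom of stack and assumes 4 means IPv4 and 6 means IPv6. The Python sketch below shows that heuristic; the function names are hypothetical and CRC-32 stands in for a real hardware hash.

```python
import struct
import zlib


def parse_label_stack(packet: bytes) -> tuple[list[int], bytes]:
    """Split a packet that starts with an MPLS label stack into labels and payload."""
    labels = []
    offset = 0
    while True:
        (entry,) = struct.unpack_from("!I", packet, offset)  # one 32-bit stack entry
        offset += 4
        labels.append(entry >> 12)  # 20-bit label value
        if (entry >> 8) & 0x1:      # S (bottom-of-stack) bit
            return labels, packet[offset:]


def ecmp_hash(packet: bytes) -> int:
    """Hash on the labels, plus IP addresses if the payload *looks like* IP."""
    labels, payload = parse_label_stack(packet)
    key = b"".join(label.to_bytes(3, "big") for label in labels)
    # Nibble guessing: MPLS carries no payload type, so peek at the first
    # nibble after bottom-of-stack. This is only a heuristic; e.g. an
    # Ethernet pseudowire without a control word whose destination MAC
    # begins with 4 or 6 would be misread as IP here.
    version = payload[0] >> 4 if payload else None
    if version == 4 and len(payload) >= 20:
        key += payload[12:20]  # IPv4 source + destination address
    elif version == 6 and len(payload) >= 40:
        key += payload[8:40]   # IPv6 source + destination address
    return zlib.crc32(key)
```

The hash quality depends on a guess about bytes the MPLS layer is supposed to treat as opaque, which is the leaky abstraction mentioned above.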

Additionally, I feel that FIB compaction has more to do with network structure and careful control of state propagation than with the underlying forwarding mechanism. On this side, what can be achieved with IP via simple summarization requires rather sophisticated LSP hierarchies with MPLS.
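
As a small illustration of the IP side of this argument, Python's standard `ipaddress.collapse_addresses` can show how per-rack prefixes summarize at a higher tier. The addressing plan (one /24 per rack, 64 racks per pod, pod N under 10.N.0.0/18) is a hypothetical example.

```python
import ipaddress


def rack_prefixes(pod: int, racks_per_pod: int = 64) -> list[ipaddress.IPv4Network]:
    # Hypothetical addressing plan: pod N owns 10.N.0.0/18, one /24 per rack.
    return [ipaddress.ip_network(f"10.{pod}.{rack}.0/24") for rack in range(racks_per_pod)]


def summarize(prefixes: list[ipaddress.IPv4Network]) -> list[ipaddress.IPv4Network]:
    # Merge adjacent and contained prefixes into the smallest covering set.
    return list(ipaddress.collapse_addresses(prefixes))


rack_routes = [p for pod in range(4) for p in rack_prefixes(pod)]
spine_routes = summarize(rack_routes)
print(len(rack_routes), [str(p) for p in spine_routes])
# 256 ['10.0.0.0/18', '10.1.0.0/18', '10.2.0.0/18', '10.3.0.0/18']
```

In this plan, 256 rack routes become 4 routes at the next tier with no extra protocol machinery, provided addresses are allocated along the topology.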

To me, the only good selling point for mpls in DC, in my opinion, is having a uniform end to end transport (with corresponding OAM etc). It is not very clear whether this has more advantages than downsides; that requires a separate discussion :)

Petr

On Mar 25, 2015, at 5:00 PM, "Robert Raszuk" <robert@raszuk.net> wrote:

Hello Luyuan,

Quote:


"The HSDN forwarding architecture in the underlay network is based on four main concepts: 1. Dividing the DC and DCI in a hierarchically-partitioned structure; 2. Assigning groups of Underlay Border Nodes in charge of forwarding within each partition; 3. Constructing HSDN MPLS label stacks to identify the end points according to the HSDN structure; and 4. Forwarding using the HSDN MPLS labels."


Can you provide any reasoning for going to such complexity when trying to use MPLS as transport within and between DCs, as compared with using IP-based transport? Note that the native summarization of IP-based transport provides unquestionable FIB compression.
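
For comparison, the four HSDN concepts quoted above might be sketched as follows. This is only a loose illustration: the four-level `Location` hierarchy, the `LEVEL_BASE` label blocks and `hierarchical_stack` are hypothetical, and the draft's actual label allocation and forwarding rules are not reproduced here.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical 4-level hierarchy an endpoint lives in."""
    dc: int
    pod: int
    rack: int
    server: int


# Hypothetical label plan: a separate label block per hierarchy level.
LEVEL_BASE = (1000, 2000, 3000, 4000)


def hierarchical_stack(src: Location, dst: Location) -> list[int]:
    """Return a top-of-stack-first label list that reaches dst from src.

    Only levels at and below the first level where src and dst differ are
    encoded; each label would be acted on by the border nodes of the
    partition at that level.
    """
    for level, (s, d) in enumerate(zip(src, dst)):
        if s != d:
            return [LEVEL_BASE[i] + dst[i] for i in range(level, len(dst))]
    return []  # same endpoint: nothing to push


print(hierarchical_stack(Location(1, 2, 3, 4), Location(1, 2, 5, 6)))  # [3005, 4006]
print(hierarchical_stack(Location(1, 2, 3, 4), Location(2, 0, 0, 7)))  # [1002, 2000, 3000, 4007]
```

Even in this toy form, every sender must know the destination's full position in the hierarchy to build the stack.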


Quote:


"HSDN is designed to allow the physical decoupling of control and forwarding, and have the LFIBs configured by a controller according to a full SDN approach. The controller-centric approach is described in this document."


+

Quote:


"2) The network nodes MUST support MPLS forwarding."



Please note that, to the best of my knowledge, a number of the ODM routers used to construct IP Clos fabrics do not have a control plane that supports MPLS transport, either distributed or centrally managed (i.e., via a controller).


Quote:


"The key observation is that it is impractical, uneconomical, and
ultimately unnecessary to use a fully connected Clos-based topology in a large scale DC."


That is an interesting statement. However, I think one should distinguish interconnected regions built as proper Clos fabrics from topologies that merely want to be Clos fabrics. In any case, it has no bearing on the main points of the scalable interconnect discussion.


- - -


While we could go through a number of other comments, let's cut it short.


Your draft states that HSDN works with IPv4 transport, in the statement below:


Quote:


"Although HSDN can be used with any forwarding technology, including IPv4 and IPv6,"


1. Can you summarise the problems you see with an IPv4/IPv6-based underlay in the DC that drove you to base this document on MPLS?


(Note that tenant mobility is an overlay task and has nothing to do with the underlay.)


2. Can you describe how you are going to distribute the MPLS label stacks used for underlay forwarding to the servers?


3. How are you going to provide efficient intra-DC ECMP? I see no trace of entropy labels in your document.


4. For TE, is there anything missing in the document below?

https://tools.ietf.org/html/draft-lapukhov-bgp-sdn-00
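
On question 3: entropy labels (RFC 6790) let the ingress insert a per-flow value so that transit LSRs do not need to guess at the payload. A minimal sketch of building such a stack, assuming a CRC-32 flow hash; `push_with_entropy` and `entropy_label` are hypothetical helpers, and only the ELI value 7 and the reserved 0-15 range come from the RFC.

```python
import zlib

ELI = 7  # Entropy Label Indicator: reserved label value from RFC 6790


def entropy_label(flow_key: bytes) -> int:
    """Map a flow hash into the 20-bit label space, avoiding reserved values 0-15."""
    return 16 + zlib.crc32(flow_key) % ((1 << 20) - 16)


def push_with_entropy(transport: list[int], flow_key: bytes,
                      inner: tuple[int, ...] = ()) -> list[int]:
    """Return a top-of-stack-first label list carrying an <ELI, EL> pair.

    Transit LSRs can then load-balance on the label stack (which now
    includes the EL) without peeking past bottom-of-stack.
    """
    return [*transport, ELI, entropy_label(flow_key), *inner]


print(push_with_entropy([16001], b"10.0.0.1|10.1.2.5|6|40000|80"))
```

The cost is an extra two labels per packet and EL support at the egress, which is the added complication the thread weighs against plain IP hashing.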


Many thx,

r.