Re: [bess] draft-hao-bess-inter-nvo3-vpn-optionc

Diego Garcia del Rio <diego@nuagenetworks.net> Wed, 18 November 2015 06:25 UTC

Date: Tue, 17 Nov 2015 22:25:22 -0800
Message-ID: <CACS9xV+FhSwLHmhAO42PTE3iOWEk4YgYm7uDC0B-faH0dOewPQ@mail.gmail.com>
From: Diego Garcia del Rio <diego@nuagenetworks.net>
To: bess@ietf.org
Archived-At: <http://mailarchive.ietf.org/arch/msg/bess/_PhEez3fLphWXacjaZ_loeWtsOM>
Subject: Re: [bess] draft-hao-bess-inter-nvo3-vpn-optionc
List-Id: BGP-Enabled ServiceS working group discussion list <bess.ietf.org>

Some comments from my side,


I think the draft makes quite a few assumptions about specific
implementation capabilities, and those capabilities are not nearly
common enough to be considered widely available.

Most of the TOR devices today already pay a double-pass penalty when
routing traffic in or out of VXLAN-type tunnels. Only the newest
generation can route into tunnels without additional passes, and they
are definitely limited in their ability to arbitrarily select UDP
ports on a per-tunnel basis. In fact, most are even limited in using
more than one VNID per "service" of sorts.

The IP-address-based implementation, which would, I assume, imply
assigning one or more CIDRs to a loopback interface on the ASBR-d, is
also quite arbitrary and does not seem like a technically "clean"
solution (besides burning tons of IPs). As a side note, most PE-grade
routers I've worked with were quite limited in terms of the IP
addresses they could use for tunnel termination, and it was not the
case that a simple pool could just be used.

Wim's mention of cases where the application itself, hosted in a
datacenter, would be part of the option-C interconnect has been
dismissed without much discussion so far. Yet if we look in detail at
the type of users who would even consider a complex topology like
model C, they are most likely users and operators very familiar with
MPLS VPNs in the WAN. Those operators will most likely be very
interested in deploying MPLS or WAN-grade applications (i.e.,
virtual routers or other VNFs) in the DC, and thus it's highly likely
that the interconnect would not terminate at the NVE itself but
rather at the TS (the virtual machine).

Also, the use of UDP ports at random would imply quite complex logic
on the ASBR-d, IMHO. I'm not saying it's impossible, but the packet
now has to be received on a random (though locally chosen) UDP port,
and this information has to be preserved in the pipeline to be able
to do the double-tunnel stitching, so it sounds quite complex. I am
sure someone on the list will mention this has already been
implemented somewhere, and I won't argue with that. And let's not
even bring into account what this would do to any DC middlebox that
now has to look for VXLAN on almost any random port. We would have to
go back to the "is it a 4 or is it a 6 in byte x" heuristics to try to
guess whether the packet is VXLAN or just something entirely different
running over IP.
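
To make the middlebox concern concrete, here is a minimal Python
sketch of the kind of guess such a device would be reduced to once the
well-known port no longer identifies VXLAN. It assumes the 8-byte
VXLAN header layout from RFC 7348 (a flags byte with only the I bit
set, and reserved fields that are zero). The names looks_like_vxlan
and vni_of are hypothetical and not taken from any product.

```python
import struct

VXLAN_HEADER_LEN = 8
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def looks_like_vxlan(udp_payload: bytes) -> bool:
    """Guess whether a UDP payload begins with a VXLAN header.

    This is only a heuristic: it checks the flags byte and the two
    reserved fields, nothing more.
    """
    if len(udp_payload) < VXLAN_HEADER_LEN:
        return False
    word1, word2 = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])
    flags = word1 >> 24              # first byte
    reserved1 = word1 & 0x00FFFFFF   # next 24 bits
    reserved2 = word2 & 0xFF         # last byte of the header
    return flags == VXLAN_FLAG_VNI_VALID and reserved1 == 0 and reserved2 == 0


def vni_of(udp_payload: bytes) -> int:
    """Extract the 24-bit VNI, assuming the payload really is VXLAN."""
    _, word2 = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])
    return word2 >> 8


# A well-formed header for VNI 5000 followed by some inner bytes.
sample = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, 5000 << 8) + b"inner"
print(looks_like_vxlan(sample), vni_of(sample))
```

Any unrelated UDP payload that happens to start with 08 00 00 00 and
has a zero eighth byte passes the same check, which is why this can
only ever be a guess and not a classification.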

In general I feel the proposed solution is fitted to a specific
use case which is not really detailed in the draft, and that it does
not describe a solution that would "elegantly" solve the issues at
hand. It just feels like we're using any available bit-space to stuff
data into places where it does not necessarily belong.

Yes, MPLS encapsulations on virtual switches are not yet fully
available, and there can be some performance penalty on the TORs, but
the MPLS-based solutions are much cleaner from a control and data
plane point of view. Maybe I'm too utopian.


Best regards,

Diego




---------------------------------------------------------------------------------

Hi,

The problem we are trying to solve is to reduce the forwarding table
size of the data center GW/ASBR-d; the motivation is the same as for
traditional MPLS VPN option-C. Currently, the most common practice on
the ASBR-d is to terminate the VXLAN encapsulation, look up the local
routing table, and then perform MPLS encapsulation towards the WAN
network. The ASBR-d needs to maintain every VM's MAC/IP. In the
Option-C method, only transport layer information needs to be
maintained at the GW/ASBR-d, so scalability is greatly enhanced.
Traditional Option-C is only for MPLS VPN network interworking;
because VXLAN is becoming pervasive in the data center, the solution
in this draft was proposed for heterogeneous network interworking.
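
As a rough illustration of the scaling argument above, the sketch
below counts forwarding entries under the two models. It assumes that
"transport layer information" means one entry per distinct tunnel
endpoint, which is a reading of the paragraph and not something it
states. The route list, the addresses and the function names are all
hypothetical.

```python
from typing import Iterable, Tuple

# (VM prefix, tunnel endpoint behind which it lives); hypothetical data model
Route = Tuple[str, str]


def entries_when_terminating(routes: Iterable[Route]) -> int:
    """ASBR-d terminates VXLAN and looks up the inner packet:
    one forwarding entry per VM prefix."""
    return len({prefix for prefix, _endpoint in routes})


def entries_with_option_c(routes: Iterable[Route]) -> int:
    """ASBR-d only stitches tunnels:
    one forwarding entry per distinct tunnel endpoint (assumption)."""
    return len({endpoint for _prefix, endpoint in routes})


# Illustrative numbers only: four VMs spread over two endpoints.
routes = [
    ("10.0.0.1/32", "192.0.2.1"),
    ("10.0.0.2/32", "192.0.2.1"),
    ("10.0.0.3/32", "192.0.2.2"),
    ("10.0.0.4/32", "192.0.2.2"),
]
print(entries_when_terminating(routes), entries_with_option_c(routes))
```

The gap between the two counts grows with the number of VMs per
endpoint, which is the effect the Option-C argument relies on.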

The advantage of this solution is that only VXLAN encapsulation is
required for OVS/TOR. By contrast, in Wim's solution east-west
traffic uses VXLAN encap, while north-south traffic uses
MPLSoGRE/UDP encap.

There are two solutions in this draft:

1. Using VXLAN tunnel destination IP for stitching at ASBR-d.

No data plane modifications are required on OVS or TOR switches, only
hardware changes on the ASBR-d. The ASBR-d is normally a router, so it
has the capability to realize the hardware changes. This solution
consumes many IP addresses, and the IP pool for allocation needs to be
configured on the ASBR-d beforehand.

2. Using VXLAN destination UDP port for stitching at ASBR-d.

Compared with solution 1, fewer IP addresses are consumed for
allocation. If the UDP port range is too large, we can combine
solutions 1 and 2.

In this solution, data plane changes are needed at both the OVS/TOR
and the ASBR-d. The ASBR-d has the capability to realize the hardware
changes, and OVS can also realize the data plane changes. A TOR
switch normally can't realize this function. This solution mainly
focuses on purely software-based overlay networks, where it is more
scalable. In public cloud data centers, software-based overlay
networks are the majority case.
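
The two stitching options above can be sketched as one lookup table
keyed on the outer VXLAN destination. This is a minimal illustration
and not the procedure specified in the draft: it assumes the ASBR-d
hands out one key per WAN next hop, and WanNextHop, StitchingTable,
ip_keys and port_keys are hypothetical names. Port 4789 is the
IANA-assigned VXLAN port; all addresses and labels are made up.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple

VXLAN_IANA_PORT = 4789
Key = Tuple[str, int]  # (outer destination IP, outer destination UDP port)


@dataclass(frozen=True)
class WanNextHop:
    remote_pe: str  # hypothetical: loopback of the remote PE
    label: int      # hypothetical: MPLS label to push towards the WAN


def ip_keys(pool_cidr: str) -> Iterator[Key]:
    """Solution 1: one IP per next hop, always on the IANA port."""
    for addr in ipaddress.ip_network(pool_cidr).hosts():
        yield (str(addr), VXLAN_IANA_PORT)


def port_keys(asbr_ip: str, ports: range) -> Iterator[Key]:
    """Solution 2: a single ASBR-d IP, one UDP port per next hop."""
    for port in ports:
        yield (asbr_ip, port)


class StitchingTable:
    def __init__(self, free_keys: Iterator[Key]) -> None:
        self._free_keys = free_keys
        self._by_key: Dict[Key, WanNextHop] = {}
        self._by_next_hop: Dict[WanNextHop, Key] = {}

    def allocate(self, next_hop: WanNextHop) -> Key:
        """Return the key for this next hop, allocating one if needed."""
        if next_hop in self._by_next_hop:
            return self._by_next_hop[next_hop]
        key = next(self._free_keys, None)
        if key is None:
            raise RuntimeError("stitching key pool exhausted")
        self._by_key[key] = next_hop
        self._by_next_hop[next_hop] = key
        return key

    def lookup(self, dst_ip: str, dst_port: int) -> Optional[WanNextHop]:
        """Data-plane side: map a received outer header to a next hop."""
        return self._by_key.get((dst_ip, dst_port))


by_ip = StitchingTable(ip_keys("192.0.2.0/30"))
by_port = StitchingTable(port_keys("192.0.2.10", range(50000, 50002)))
pe1 = WanNextHop("198.51.100.1", 3001)
print(by_ip.allocate(pe1), by_port.allocate(pe1))
```

Combining solutions 1 and 2, as suggested above, would amount to
feeding the table keys drawn from several IPs with a port range on
each.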



Whether to use solution 1 or 2 depends on the operator's real
environment.



So I think our solution has no flaws; it works fine.

Thanks,

weiguo



________________________________
From: BESS [bess-bounces@ietf.org] on behalf of John E Drake
[jdrake@juniper.net]
Sent: Wednesday, November 18, 2015 2:49
To: Henderickx, Wim (Wim); EXT - thomas.morin@orange.com; BESS
Subject: Re: [bess] draft-hao-bess-inter-nvo3-vpn-optionc

Hi,

I think Wim has conclusively demonstrated that this draft has fatal
flaws and I don’t support it.  I also agree with his suggestion that
we first figure out what problem we are trying to solve before solving
it.

Yours Irrespectively,

John

From: BESS [mailto:bess-bounces@ietf.org] On Behalf Of Henderickx, Wim (Wim)
Sent: Tuesday, November 17, 2015 12:49 PM
To: EXT - thomas.morin@orange.com; BESS
Subject: Re: [bess] draft-hao-bess-inter-nvo3-vpn-optionc

— Snip —

No, the spec as it is can be implemented in its VXLAN variant with
existing vswitches (e.g. OVS allows to choose the VXLAN destination
port, ditto for the linux kernel stack).

(ToR is certainly another story, most of them not having a flexible
enough VXLAN dataplane nor support for any MPLS-over-IP.)

WH> and how many ports simultaneously would they support? For this to
work every tenant needs a different VXLAN UDP destination port/receive
port.
There might be SW elements that could do some of this, but IETF
defines solutions which should be implemented across the board
HW/SW/etc. Even if some SW switches can do this, the proposal will
impose so many issues in HW/data-plane engines that I cannot be behind
this solution.

To make this work generically we will have to make changes anyhow.
Given this, we better do it in the right way and guide the industry to
a solution which does not imply those complexities. Otherwise we will
stick with these specials forever with all consequences (bugs, etc).

- snip -

From: "thomas.morin@orange.com" <thomas.morin@orange.com>
Organization: Orange
Date: Tuesday 17 November 2015 at 01:37
To: Wim Henderickx <wim.henderickx@alcatel-lucent.com>,
BESS <bess@ietf.org>
Subject: Re: [bess] draft-hao-bess-inter-nvo3-vpn-optionc

Hi Wim, WG,

2015-11-16, Henderickx, Wim (Wim):

2015-11-13, Henderickx, Wim (Wim):
Thomas, we can discuss forever and someone needs to describe
requirements, but I cannot agree to the current proposal for the
reasons explained.

TM> Well, although discussing forever is certainly not the goal, the
reasons for rejecting a proposal need to be thoroughly understood.
WH> my point is what is the real driver for supporting a plain VXLAN
data-plane here. The use cases I have seen in this text are always
ones where an application behind an NVE/TOR is demanding option c,
never the NVE/TOR elements themselves.


My understanding is that the applications are contexts where overlays
are present and workloads (VMs or baremetal) need to be
interconnected with VPNs. In these contexts, there can be reasons to
want Option C to reduce the state on ASBRs.

In these contexts, it's not the workload (VM or baremetal) that would
typically handle VRFs, but really the vswitch or ToR.

WH2> can it not be all cases: TOR/vswitch/Application? I would make
the solution flexible enough to support all of these, no?




2015-11-13, Henderickx, Wim (Wim):

TM> The right trade-off to make may in fact depend on whether you prefer:
(a) a new dataplane stitching behavior on DC ASBRs (the behavior
specified in this draft)
or (b) an evolution of the encaps on the vswitches and ToRs to support
MPLS/MPLS/(UDP or GRE)

WH> b depends on the use case

I don't get what you mean by "b depends on the use case".
WH> see my above comment. If the real use case is an application
behind NVE/TOR requiring model C, then all the discussion on impact on
NVE/TOR is void. As such I want to have a discussion on the real
driver/requirement for option c interworking with an IP based Fabric.

Although I can agree that detailing requirements can always help, I
don't think one can assume a certain application to dismiss the
proposal.

WH> for me the proposal is not acceptable for the reasons explained:
too much impact on the data-planes


I wrote the above based on the idea that the encap used is
MPLS/MPLS/(UDP or GRE), which hence has to be supported on the ToRs
and vswitches.
Another possibility would be service-label/middle-label/Ethernet
assuming an L2 adjacency between vswitches/ToRs and ASBRs, but this
certainly does not match your typical DC architecture. Or perhaps you
had something else in mind?

WH> see above. The draft right now also requires changes in existing
TOR/NVE so for me all this discussion/debate is void.

No, the spec as it is can be implemented in its VXLAN variant with
existing vswitches (e.g. OVS allows to choose the VXLAN destination
port, ditto for the linux kernel stack).

(ToR is certainly another story, most of them not having a flexible
enough VXLAN dataplane nor support for any MPLS-over-IP.)

WH> and how many ports simultaneously would they support?

WH> and depending on implementation you don’t need to change any of
the TOR/vswitches.

Does this mean that for some implementations you may not need to
change any of the TOR/vswitches, but that for some others you may ?

WH> any proposal on the table requires changes, so for me this is not
a valid discussion

See above, the proposal in the draft does not necessarily need changes
in vswitches.



Let me take a practical example: while I can quite easily see how to
implement the procedures in draft-hao-bess-inter-nvo3-vpn-optionc
based on current vswitch implementations of VXLAN, the lack of
MPLS/MPLS/(UDP, GRE) support in commonplace vswitches seems to me to
make the alternate solution you suggest harder to implement.

WH> I would disagree with this. Tell me which switch/TOR handles
multiple UDP ports for VXLAN?

I mentioned _v_switches, and many do support a variable destination
port for VXLAN, which is sufficient to implement what the draft
proposes.

-Thomas













From: Thomas Morin <thomas.morin@orange.com>
Organization: Orange
Date: Friday 13 November 2015 at 09:57
To: Wim Henderickx <wim.henderickx@alcatel-lucent.com>
Cc: "bess@ietf.org" <bess@ietf.org>
Subject: Re: [bess] draft-hao-bess-inter-nvo3-vpn-optionc

Hi Wim,

I agree with the analysis that this proposal is restricted to
implementations that support the chosen encap with non-IANA ports
(which may be hard to achieve for instance on hardware
implementations, as you suggest), or to contexts where managing
multiple IPs would be operationally viable.

However, it does not seem obvious to me how the alternative you
propose [relying on 3-label option C with an MPLS/MPLS/(UDP|GRE)
encap] addresses the issue of whether the encap behavior is supported
or not (e.g. your typical ToR chipset possibly may not support this
kind of encap,  and even software-based switches may not be ready to
support that today).

My take is that having different options to adapt to the various
implementation constraints we may have would have value.

(+ one question below on VXLAN...)

-Thomas


2015-11-12, Henderickx, Wim (Wim):
On VXLAN/NVGRE, do you challenge the fact that they would be used with
a dummy MAC address that would be replaced by the right MAC by a
sender based on an ARP request when needed ?


Is the above the issue you had in mind about VXLAN and NVGRE ?

WH> yes

If you don't mind me asking: why do you challenge that?




