Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

Tom Herbert <therbert@google.com> Wed, 21 May 2014 02:42 UTC

In-Reply-To: <9F56174078B48B459268EFF1DAB66B1A109C36BC@DEN-EXDDA-S32.corp.ebay.com>
References: <20140502120923.9835.17537.idtracker@ietfa.amsl.com> <9F56174078B48B459268EFF1DAB66B1A109C2DD3@DEN-EXDDA-S32.corp.ebay.com> <CA+mtBx_CGvUb0jP724T-wBk=SJW3o1RjZQgTvcC+zVaFFK78mA@mail.gmail.com> <9F56174078B48B459268EFF1DAB66B1A109C36BC@DEN-EXDDA-S32.corp.ebay.com>
Date: Tue, 20 May 2014 19:42:28 -0700
Message-ID: <CA+mtBx9aKm2csAdFb=r2X_etLThDGw-J5SH74JpeOK8=OeXPKA@mail.gmail.com>
From: Tom Herbert <therbert@google.com>
To: "Zhou, Han" <hzhou8@ebay.com>
Content-Type: multipart/alternative; boundary="90e6ba5bc1b78ac96004f9dff26e"
Archived-At: http://mailarchive.ietf.org/arch/msg/tofoo/gciX9CSIvEuf4W-07j6-r0L9VhA
Cc: "draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org" <draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org>, "tofoo@ietf.org" <tofoo@ietf.org>, "nvo3@ietf.org" <nvo3@ietf.org>, "draft-zhou-li-vxlan-soe@tools.ietf.org" <draft-zhou-li-vxlan-soe@tools.ietf.org>, Erik Nordmark <nordmark@sonic.net>
Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

On Tue, May 20, 2014 at 5:28 PM, Zhou, Han <hzhou8@ebay.com> wrote:

>  Hi Tom,
>
>
>
> Thanks for your comments.
>
> Yes, TSO/LRO with VXLAN support should provide similar or even better
> performance gains, but the mechanism proposed by this draft decouples
> overlay and underlay, and it is hardware independent.
>
> Secondly, hardware offloading usually supports TCP only (TSO). The
> mechanism here can also help with large UDP packet performance, as
> verified by the prototype.
>
> BTW, how many types of off-the-shelf NICs support VXLAN offloading? Any
> performance data?
>
>
>
HW is not a requirement for this offloading. There has been a lot of work
recently to get software variants (GSO and GRO) working in Linux with
tunnels. These do show 2-3x performance improvements. HW support would be
mostly an incremental improvement to that, or becomes interesting for OS
bypass like in SR-IOV. GSO/GRO can be presented to the guest as TSO and LRO
so that it's possible to plumb use of large packets from the guest all the
way to the host driver. It should also be possible to plumb two VMs on the
same host to communicate without segmentation, i.e. output from TSO on one
VM becomes input for another.
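
To make the GSO/GRO idea above concrete, here is a toy sketch of the two
operations: splitting one large send into MSS-sized pieces as late as
possible, and merging in-order pieces back into one buffer on receive. It
is only an illustration, not the Linux implementation. The function names
and the MSS of 1448 bytes (a 1500-byte MTU minus IPv4, TCP and timestamp
option overhead) are assumptions chosen for the example.

```python
def segment(payload: bytes, mss: int) -> list[bytes]:
    """Split one large payload into MSS-sized chunks (GSO-like, toy model)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments: list[bytes]) -> bytes:
    """Merge in-order segments back into one buffer (GRO-like, toy model)."""
    return b"".join(segments)


# Assumed values for illustration only.
MSS = 1448                   # 1500 - 20 (IPv4) - 20 (TCP) - 12 (timestamps)
payload = bytes(64 * 1024)   # one 64 KiB send handed down by the guest

segments = segment(payload, MSS)
print(len(segments), "segments, last one", len(segments[-1]), "bytes")

# The receiver sees the original large buffer again after coalescing.
restored = coalesce(segments)
print("round trip ok:", restored == payload)
```

The point of the sketch is that the upper layers on both sides handle one
large unit, and only the lowest layer pays a per-segment cost.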

The advantage that I see in your draft is that it allows an intermediate
device to perform segmentation/reassembly instead of fragmentation.

> Likewise, setting a large MTU on overlay interfaces achieves a similar
> result, but the overlay/underlay decoupling issue remains. It is usually
> advised that the overlay MTU be slightly smaller than the underlay MTU to
> avoid inefficient fragmentation after adding the outer header, but to
> achieve really high performance between VMs, a large MTU is preferred.
>

Depends on the performance dimension to be optimized. Larger MTUs could
increase latency of high priority small packets for instance (HOL
blocking), or a UDP-based application might try to use the MTU to decide
how large it should make its packets to avoid fragmentation.
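
As a rough illustration of the HOL blocking point, the time a small
high-priority packet waits behind one frame already on the wire is that
frame's serialization delay: frame size in bits divided by link rate. The
sketch below assumes a 10 Gbit/s link and compares a 1500-byte frame with
a 9000-byte jumbo frame. The function name and the frame sizes are
assumptions for the example, and queueing in switches is ignored.

```python
def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Time in microseconds to put one frame on the wire at the given rate."""
    if link_gbps <= 0:
        raise ValueError("link rate must be positive")
    # bits / (Gbit/s * 1e9) seconds, converted to microseconds
    return frame_bytes * 8 / (link_gbps * 1e3)


LINK_GBPS = 10.0  # assumed link rate

for frame_bytes in (1500, 9000):
    delay = serialization_delay_us(frame_bytes, LINK_GBPS)
    print(f"{frame_bytes:5d} bytes: {delay:.1f} us behind one such frame")
```

The delay scales linearly with frame size, so each larger frame queued
ahead of a small packet adds proportionally more wait.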

> And considering overlay <-> physical connections, path MTU discovery does
> not always work.
>
> This kind of configuration complexity and pain point can be resolved
> simply by decoupling overlay and underlay MTU, as suggested by this draft.
> Here is an example of configuration confusion:
>
> http://openvswitch.org/pipermail/discuss/2014-May/013898.html
>
>
>
> Ideally, all NV tunnel protocols should support similar metadata, thus
> overlay segmentation can be offloaded hop-by-hop.
>

I think this would be applicable to just about all tunnel protocols. Then
you would also want to do reassembly at each hop? Sounds expensive. Once
you segment, I think you'd only want to reassemble at the end host.

One important thing to keep in mind, and a hard lesson in real deployment
:-), segmentation offload is opportunistic and very dependent on the
conditions of the traffic. Its value can be fleeting in real workloads.
For instance, imagine a host with a lot of active high-throughput
connections (like video serving) which hits a hiccup causing all
congestion windows to shrink. If the system is not provisioned for this
event it can be very hard to recover (a lot more CPU is required to achieve
the same throughput). This differentiates a larger MTU from relying on
segmentation: the former offers more predictable CPU savings.
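
A back-of-the-envelope sketch of why recovery is hard: if per-unit
processing cost dominates, CPU demand tracks the number of units (single
packets or aggregated super-packets) the stack handles per second. The
numbers below are assumptions for illustration: a 10 Gbit/s target, 64 KiB
aggregates when offload is working well, and 1448-byte units when
congestion windows have collapsed and nothing can be aggregated. Real
per-unit costs are not constant, so treat the ratio as an upper-bound
intuition rather than a measurement.

```python
def units_per_second(throughput_gbps: float, unit_bytes: int) -> float:
    """Units the stack must process per second to sustain the throughput."""
    if unit_bytes <= 0:
        raise ValueError("unit size must be positive")
    return throughput_gbps * 1e9 / (unit_bytes * 8)


# Assumed values for illustration only.
TARGET_GBPS = 10.0
AGGREGATE_BYTES = 64 * 1024   # best case: offload builds 64 KiB units
MSS_BYTES = 1448              # worst case: every unit is a single segment

healthy = units_per_second(TARGET_GBPS, AGGREGATE_BYTES)
degraded = units_per_second(TARGET_GBPS, MSS_BYTES)

print(f"{healthy:.0f} units/s with full aggregation")
print(f"{degraded:.0f} units/s with no aggregation")
print(f"{degraded / healthy:.1f}x more per-unit work for the same throughput")
```

With a larger MTU the unit size has a fixed floor, which is why its CPU
savings do not depend on traffic conditions in the same way.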

Tom


>
> Best regards,
>
> Han
>
>
>
> *From:* Tom Herbert [mailto:therbert@google.com]
> *Sent:* Wednesday, May 21, 2014 2:44 AM
> *To:* Zhou, Han
> *Cc:* nvo3@ietf.org; tofoo@ietf.org;
> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
> draft-zhou-li-vxlan-soe@tools.ietf.org; Erik Nordmark
> *Subject:* Re: FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>
>
>
> Hi Zhou, a couple of questions inline.
>
>
>
> On Mon, May 19, 2014 at 8:01 PM, Zhou, Han <hzhou8@ebay.com> wrote:
>
> Hi,
>
> We have updated the VXLAN-SOE draft according to earlier comments. It is
> now fully compatible with VXLAN-GPE, and some examples have been added
> for better understanding.
>
>
>
>  A prototype is also implemented here (patch based on Open vSwitch):
>
> https://github.com/hzhou8/openvswitch/commit/9a7deb8b432ce83a9c09d7d4ff85fa050f7dd2be
>
> netperf TCP_STREAM test result between a pair of VMs on hosts with 10G
> interfaces:
>
> Before the change: 2.62 Gbits/sec
> After the change: 6.68 Gbits/sec
> Speedup is ~2.5x (6.68 / 2.62).
>
>  Can you provide some more details on this benefit? It seems like plain
> TSO/LRO that understands encapsulation should provide similar benefits when
> going between hosts.
>
>
>
>
>
> The patch attracted some interest in the OVS community, but since this
> RFC draft is at a very early stage, Jesse regarded it as inappropriate to
> apply the change to the OVS tree.
> The discussion thread:
> http://openvswitch.org/pipermail/discuss/2014-May/013981.html
> http://openvswitch.org/pipermail/discuss/2014-May/013898.html
>
> So we would like to request a review here by the NVO3/TOFOO groups and
> VXLAN authors: is this VXLAN extension worth formally putting into VXLAN
> as a standard, so that more people can benefit from it?
>
>  Could you get the same effect by setting larger MTUs on the overlay
> network interface and relying on path MTU discovery when going over
> the physical network?
>
>
>
> Best regards,
> Han
>
> -----Original Message-----
> From: I-D-Announce [mailto:i-d-announce-bounces@ietf.org] On Behalf Of
> internet-drafts@ietf.org
> Sent: Friday, May 02, 2014 8:09 PM
> To: i-d-announce@ietf.org
> Subject: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>
>
> A New Internet-Draft is available from the on-line Internet-Drafts
> directories.
>
>
>         Title           : Segmentation Offloading Extension for VXLAN
>         Authors         : Han Zhou
>                           Chengyuan Li
>         Filename        : draft-zhou-li-vxlan-soe-01.txt
>         Pages           : 13
>         Date            : 2014-05-02
>
> Abstract:
>    Segmentation offloading is nowadays common in network stack
>    implementation and well supported by para-virtualized network device
>    drivers for virtual machine (VM)s. This draft describes an extension
>    to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
>    can be decoupled from physical/underlay networks and offloaded
>    further to the remote end-point thus improving data-plane performance
>    for VMs running on top of overlay networks.
>
>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
>
> There's also a htmlized version available at:
> http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-01
>
> A diff from the previous version is available at:
> http://www.ietf.org/rfcdiff?url2=draft-zhou-li-vxlan-soe-01
>
>
> Please note that it may take a couple of minutes from the time of
> submission
> until the htmlized version and diff are available at tools.ietf.org.
>
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
>
> _______________________________________________
> I-D-Announce mailing list
> I-D-Announce@ietf.org
> https://www.ietf.org/mailman/listinfo/i-d-announce
> Internet-Draft directories:
> http://www.ietf.org/shadow.html
> or ftp://ftp.ietf.org/ietf/1shadow-sites.txt
>
>
>