Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

Tom Herbert <therbert@google.com> Wed, 21 May 2014 18:29 UTC

In-Reply-To: <9F56174078B48B459268EFF1DAB66B1A109C7F16@DEN-EXDDA-S32.corp.ebay.com>
References: <20140502120923.9835.17537.idtracker@ietfa.amsl.com> <9F56174078B48B459268EFF1DAB66B1A109C2DD3@DEN-EXDDA-S32.corp.ebay.com> <CA+mtBx_CGvUb0jP724T-wBk=SJW3o1RjZQgTvcC+zVaFFK78mA@mail.gmail.com> <9F56174078B48B459268EFF1DAB66B1A109C36BC@DEN-EXDDA-S32.corp.ebay.com> <CA+mtBx9aKm2csAdFb=r2X_etLThDGw-J5SH74JpeOK8=OeXPKA@mail.gmail.com> <9F56174078B48B459268EFF1DAB66B1A109C7F16@DEN-EXDDA-S32.corp.ebay.com>
Date: Wed, 21 May 2014 11:29:30 -0700
Message-ID: <CA+mtBx9hZ-HBgdRG2pSfnma8gciBSa04b8-3Po3g0D-p3cTsdg@mail.gmail.com>
From: Tom Herbert <therbert@google.com>
To: "Zhou, Han" <hzhou8@ebay.com>
Content-Type: multipart/alternative; boundary=047d7bd76afe59453904f9ed2d3c
Archived-At: http://mailarchive.ietf.org/arch/msg/tofoo/6oJl1OkKbub6rkLP9BdzCdLTN7I
Cc: "draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org" <draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org>, "tofoo@ietf.org" <tofoo@ietf.org>, "nvo3@ietf.org" <nvo3@ietf.org>, "draft-zhou-li-vxlan-soe@tools.ietf.org" <draft-zhou-li-vxlan-soe@tools.ietf.org>, Erik Nordmark <nordmark@sonic.net>
Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

On Tue, May 20, 2014 at 8:07 PM, Zhou, Han <hzhou8@ebay.com> wrote:

>  Hi Tom,
>
>
>
> > There has been a lot of work recently to get software variants (GSO and
> GRO) working in Linux with tunnels.
>
> Are you referring to UDP GRO for VXLAN? I am aware of this implementation in
> kernel 3.14.
>

GRO and GSO for UDP tunnels; VXLAN is one use case.


> The performance gain should be similar, but still requires careful
> configuration of VM MTU to avoid IP fragmentation on physical MTU, rather
> than decoupling from underlay. I think each of these solutions has its own
> advantages.
>
>
In the case of TCP it's really the path MTU that is relevant. We really
want the MSS to be the largest value which avoids fragmentation in the path.
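
As a rough sketch of that arithmetic (not taken from the draft), the snippet
below derives the largest inner TCP MSS from a given path MTU under VXLAN.
The helper names are hypothetical, and it assumes IPv4 for both outer and
inner headers, no IP or TCP options, and no VLAN tag on the inner frame.

```python
# Assumed per-packet overheads in bytes (IPv4 everywhere, no options, no VLAN tag).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HDR = 8
INNER_ETH = 14
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH  # 50 bytes

INNER_IPV4 = 20
INNER_TCP = 20


def inner_mtu(path_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay packet."""
    return path_mtu - VXLAN_OVERHEAD


def tcp_mss(path_mtu: int) -> int:
    """Largest inner TCP payload that avoids fragmentation in the path."""
    return inner_mtu(path_mtu) - INNER_IPV4 - INNER_TCP


print(tcp_mss(1500))  # 1410
print(tcp_mss(9000))  # 8910
```

With other encapsulations or IPv6 the constants change, but the point is the
same: the MSS follows from the path MTU, not from the VM's configured MTU.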

For UDP it is less straightforward. If the VM MTU is greater than the
physical MTU or even the path MTU then fragmentation is possible. Most UDP
applications really want to avoid fragmentation (much worse than doing TCP
segmentation), so they need to ascertain what size to send packets-- either
they choose a fixed minimum value, use the MTU, or rely on some out-of-band
negotiation. With the emergence of transport protocols running over UDP
(such as QUIC), this becomes important. In this case, it's really hard to
meaningfully decouple the physical MTU (path MTU) from that of the overlay.
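
To illustrate why a UDP sender cares about this, here is a minimal sketch of
the sizing decision. The function names are hypothetical, and it assumes
IPv4 without options, with `mtu` being the smallest MTU the inner packet
sees along the path.

```python
import math

IPV4_HDR = 20  # assumed: no IP options
UDP_HDR = 8


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in a single unfragmented IPv4 packet."""
    return mtu - IPV4_HDR - UDP_HDR


def ipv4_fragments(udp_payload: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    ip_payload = UDP_HDR + udp_payload
    if ip_payload <= mtu - IPV4_HDR:
        return 1  # fits as-is, no fragmentation
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HDR) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(max_udp_payload(1450))       # 1422
print(ipv4_fragments(1422, 1450))  # 1
print(ipv4_fragments(8000, 1450))  # 6
```

An application that sizes its datagrams from a large VM MTU ends up in the
multi-fragment case as soon as the effective path MTU is smaller.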

>
>
> > I think this would be applicable to about all tunnel protocols. Then
> you would also want to do reassembly at each hop? Sounds expensive. Once
> you segment, I think you'd only only want to reassemble at the end host.
>
> There is no segmentation/reassembly needed at each hop, but instead
> “offload” hop by hop, just save the metadata to the next hop headers. And
> finally segmentation should be avoided if the destination is a VM on a
> hypervisor.
>
> This scenario is like in Example 2 (section 5.2), just replace the
> VXLAN-SOE between the 2 gateways by any other Tunnel protocol that can
> support similar offloading feature, such as STT.
>
>
>
This is where your proposal confuses me: we already have a lot of offloading
support for protocols without having to change the protocols to support
offloading. I don't know whether the concept of offloading is generic enough
to need intrinsic protocol support or, if it is, at what protocol level this
should be (for instance, I could conceive of this being an option in TCP to
allow mega-sized segments).

An example of a use case where SOE is usable but GSO/GRO wouldn't be might
be enlightening!

Thanks,
Tom

> Best regards,
>
> Han
>
>
>
> *From:* Tom Herbert [mailto:therbert@google.com]
> *Sent:* Wednesday, May 21, 2014 10:42 AM
>
> *To:* Zhou, Han
> *Cc:* nvo3@ietf.org; tofoo@ietf.org;
> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
> draft-zhou-li-vxlan-soe@tools.ietf.org; Erik Nordmark
> *Subject:* Re: FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>
>
>
>
>
>
>
> On Tue, May 20, 2014 at 5:28 PM, Zhou, Han <hzhou8@ebay.com> wrote:
>
> Hi Tom,
>
>
>
> Thanks for your comments.
>
> Yes TSO/LRO with VXLAN support should provide similar or even better
> performance gains, but the mechanism proposed by this draft decouples
> overlay and underlay, and it is hardware independent.
>
> Secondly, hardware offloading usually supports TCP only (TSO). The
> mechanism here can help with large UDP packet performance, as also verified
> by the prototype.
>
>  BTW, how many types of off-the-shelf NIC support VXLAN offloading? Any
> performance data?
>
>
>
>  HW is not a requirement for this offloading. There has been a lot of
> work recently to get software variants (GSO and GRO) working in Linux with
> tunnels. These do show 2-3x performance improvements. HW support would be
> mostly an incremental improvement to that, or becomes interesting for OS
> bypass like in SR-IOV. GSO/GRO can be presented to the guest as TSO and LRO
> so that it's possible to plumb use of large packets from the guest all the
> way to the host driver. It should also be possible to plumb two VMs on the
> same host to communicate without segmentation, i.e. output from TSO on one
> VM becomes input for another.
>
>
>
> The advantage that I see in your draft is that it allows an intermediate
> device to perform segmentation/reassembly instead of fragmentation.
>
>
>
>  Likewise, setting a large MTU on overlay interfaces achieves a similar
> result, but the overlay/underlay decoupling issue remains. It is usually
> advised that overlay MTU is slightly smaller than underlay to avoid
> inefficient fragmentation after adding the outer header, but to achieve
> really high performance between VMs, large MTU is preferred.
>
>
>
> Depends on the performance dimension to be optimized. Larger MTUs could
> increase latency of high priority small packets for instance (HOL
> blocking), or a UDP-based application might try to use the MTU to decide how
> large it should send its packets to avoid fragmentation.
>
>
>
>  And considering overlay <-> physical connection, path MTU discovery does
> not always work.
>
>   This kind of configuration complexity and pain-point can be resolved
> simply by decoupling overlay and underlay MTU, as suggested by this draft.
> Here is an example of configuration confusion:
>
> http://openvswitch.org/pipermail/discuss/2014-May/013898.html
>
>
>
> Ideally, all NV tunnel protocols should support similar metadata, thus
> overlay segmentation can be offloaded hop-by-hop.
>
>
>
> I think this would be applicable to about all tunnel protocols. Then you
> would also want to do reassembly at each hop? Sounds expensive. Once you
> segment, I think you'd only want to reassemble at the end host.
>
>
>
> One important thing to keep in mind, and a hard lesson in real deployment
> :-), segmentation offload is opportunistic and very dependent on the
> conditions of the traffic. Its value can be fleeting in real workloads.
> For instance, imagine a host with a lot of active high-throughput
> connections (like video serving) which hits a hiccup causing all
> congestion windows to shrink. If the system is not provisioned for this
> event it can be very hard to recover (a lot more CPU is required to achieve
> the same throughput). This differentiates a larger MTU from relying on
> segmentation: the former offers more predictable CPU savings.
>
>
>
> Tom
>
>
>
>
>
> Best regards,
>
> Han
>
>
>
> *From:* Tom Herbert [mailto:therbert@google.com]
> *Sent:* Wednesday, May 21, 2014 2:44 AM
> *To:* Zhou, Han
> *Cc:* nvo3@ietf.org; tofoo@ietf.org;
> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
> draft-zhou-li-vxlan-soe@tools.ietf.org; Erik Nordmark
> *Subject:* Re: FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>
>
>
> Hi Zhou, a couple of questions inline.
>
>
>
> On Mon, May 19, 2014 at 8:01 PM, Zhou, Han <hzhou8@ebay.com> wrote:
>
> Hi,
>
> We have updated the VXLAN-SOE draft according to earlier comments. Now it
> is fully compatible with VXLAN-GPE, and some examples are added for better
> understanding.
>
>
>
>  A prototype is also implemented here (patch based on Open vSwitch):
>
> https://github.com/hzhou8/openvswitch/commit/9a7deb8b432ce83a9c09d7d4ff85fa050f7dd2be
>
> netperf TCP_STREAM test result between a pair of VMs on hosts with 10G
> interfaces:
>
> Before the change: 2.62 Gbits/sec
> After the change: 6.68 Gbits/sec
> Speedup is ~250%.
>
>  Can you provide some more details on this benefit? It seems like plain
> TSO/LRO that understands encapsulation should provide similar benefits when
> going between hosts.
>
>
>
>
>
> The patch attracted some interest in the OVS community, but since this RFC
> draft is at a very early stage, Jesse regarded it as inappropriate to
> apply the change to the OVS tree.
> The discuss mail-thread:
> http://openvswitch.org/pipermail/discuss/2014-May/013981.html
> http://openvswitch.org/pipermail/discuss/2014-May/013898.html
>
> So we would like to request a review here by NVO3/TOFOO groups and VXLAN
> authors: is this VXLAN extension worth formally putting into VXLAN as a
> standard, so that more people can benefit from it?
>
>  Could you get the same effect by setting larger MTUs on the overlay
> network interface and relying on path MTU discovery when going over
> the physical network?
>
>
>
>  Best regards,
> Han
>
> -----Original Message-----
> From: I-D-Announce [mailto:i-d-announce-bounces@ietf.org] On Behalf Of
> internet-drafts@ietf.org
> Sent: Friday, May 02, 2014 8:09 PM
> To: i-d-announce@ietf.org
> Subject: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>
>
> A New Internet-Draft is available from the on-line Internet-Drafts
> directories.
>
>
>         Title           : Segmentation Offloading Extension for VXLAN
>         Authors         : Han Zhou
>                           Chengyuan Li
>         Filename        : draft-zhou-li-vxlan-soe-01.txt
>         Pages           : 13
>         Date            : 2014-05-02
>
> Abstract:
>    Segmentation offloading is nowadays common in network stack
>    implementation and well supported by para-virtualized network device
>    drivers for virtual machine (VM)s. This draft describes an extension
>    to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
>    can be decoupled from physical/underlay networks and offloaded
>    further to the remote end-point thus improving data-plane performance
>    for VMs running on top of overlay networks.
>
>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
>
> There's also a htmlized version available at:
> http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-01
>
> A diff from the previous version is available at:
> http://www.ietf.org/rfcdiff?url2=draft-zhou-li-vxlan-soe-01
>
>
> Please note that it may take a couple of minutes from the time of
> submission
> until the htmlized version and diff are available at tools.ietf.org.
>
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
>
> _______________________________________________
> I-D-Announce mailing list
> I-D-Announce@ietf.org
> https://www.ietf.org/mailman/listinfo/i-d-announce
> Internet-Draft directories:
> http://www.ietf.org/shadow.html
> or ftp://ftp.ietf.org/ietf/1shadow-sites.txt
>
>
>
>
>