Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

"Zhou, Han" <hzhou8@ebay.com> Wed, 21 May 2014 02:17 UTC

From: "Zhou, Han" <hzhou8@ebay.com>
To: Joe Touch <touch@isi.edu>, "nvo3@ietf.org" <nvo3@ietf.org>, "tofoo@ietf.org" <tofoo@ietf.org>, "draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org" <draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org>, "draft-zhou-li-vxlan-soe@tools.ietf.org" <draft-zhou-li-vxlan-soe@tools.ietf.org>
Date: Wed, 21 May 2014 02:17:24 +0000
Message-ID: <9F56174078B48B459268EFF1DAB66B1A109C5845@DEN-EXDDA-S32.corp.ebay.com>
References: <20140502120923.9835.17537.idtracker@ietfa.amsl.com> <9F56174078B48B459268EFF1DAB66B1A109C2DD3@DEN-EXDDA-S32.corp.ebay.com> <537B90C9.1090003@isi.edu> <9F56174078B48B459268EFF1DAB66B1A109C36FC@DEN-EXDDA-S32.corp.ebay.com> <537BFDF4.9030300@isi.edu>
In-Reply-To: <537BFDF4.9030300@isi.edu>
Archived-At: http://mailarchive.ietf.org/arch/msg/tofoo/JLMThYWGgZCMwdo5mqCM3SJSe9A
Cc: Erik Nordmark <nordmark@sonic.net>, Tom Herbert <therbert@google.com>
Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

Hi Joe,

This is an interesting topic.

> TCP offloading is fine when the OS hands off user data, and the offload
> engine creates the entire segment.
Existing TSO/GSO mechanisms deliver a full (large) TCP segment to the "offload engine", which then creates smaller segments according to the physical MTU and recalculates the checksums.
This is the case even when no overlay is involved. So I suppose the problem you pointed out is not specific to my change, but is a general limitation of TSO/GSO, right?
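
As a rough illustration of what such an engine does, here is a toy Python sketch. It is not the kernel's GSO code, and the names `Segment`, `segment_payload` and `internet_checksum` are made up for this example. It only models splitting the payload by MSS and the RFC 1071 checksum arithmetic, not the header fix-ups.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes  # at most `mss` bytes


def segment_payload(payload: bytes, start_seq: int, mss: int) -> List[Segment]:
    """Split one large TCP payload into MSS-sized segments (toy model).

    A real offload engine also replicates the TCP/IP headers into every
    segment, adjusts flags and lengths, and recomputes the checksums;
    only the payload split and sequence numbering are modelled here.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        # Sequence numbers wrap modulo 2**32.
        segments.append(Segment((start_seq + offset) % 2**32, chunk))
    return segments


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071) over `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold the carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For example, `segment_payload(b"x" * 4000, 1000, 1460)` yields three segments of 1460, 1460 and 1080 bytes, starting at sequence numbers 1000, 2460 and 3920.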

As I understand it, the TCP implementation should decide whether to use offloading according to the features/options required by a TCP connection. If a required option (such as MD5) is not supported by offloading, the TCP stack should do the segmentation itself instead of using offloading.
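
A minimal sketch of that decision, in Python. The option names and the set of options an offload engine can handle are assumptions made up for illustration; they are not taken from any real driver or kernel.

```python
# Hypothetical capability set: which TCP options the offload engine is
# assumed to handle correctly when it re-segments a large TCP segment.
OFFLOAD_SAFE_OPTIONS = frozenset(
    {"mss", "window_scale", "sack_permitted", "timestamps"}
)


def use_segmentation_offload(required_options,
                             offload_supported=OFFLOAD_SAFE_OPTIONS) -> bool:
    """Return True only if every option the connection needs is supported.

    If any required option (for example "md5") is outside the supported
    set, the TCP stack should segment in software instead.
    """
    return set(required_options) <= set(offload_supported)
```

With these assumed sets, a connection needing `{"mss", "timestamps"}` would use offloading, while one needing `{"mss", "md5"}` would fall back to segmentation in the TCP stack.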

In fact, the proposal in this draft should alleviate this limitation for TCP connections between VMs behind the same gateways, because in that case the "offload engine" performs no real TCP segmentation.
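
To make the packet-count argument concrete, here is a back-of-the-envelope Python sketch. It assumes a 20-byte IPv4 header without options, and the function names are hypothetical. It compares how many overlay TCP segments the receiving guest handles when the overlay segments by MSS with how many IPv4 fragments the host handles when one large packet is fragmented instead.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options


def overlay_tcp_segments(payload_len: int, mss: int) -> int:
    """TCP segments the receiving guest processes when the overlay segments."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return math.ceil(payload_len / mss)


def ipv4_fragments(ip_payload_len: int, mtu: int) -> int:
    """IPv4 fragments needed to carry `ip_payload_len` bytes over `mtu`."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("mtu too small to carry any payload")
    return max(1, math.ceil(ip_payload_len / per_fragment))
```

For a 32000-byte TCP payload with an MSS of 1448, the guest would otherwise process `overlay_tcp_segments(32000, 1448)` = 23 segments. If segmentation is skipped, it processes one large packet, and the fragmentation and reassembly work stays in the hosts.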

Let me know if you have more concerns, or perhaps an example of how an option is broken by TSO/GSO; then we can check what the current solution in the kernel is.

Best regards,
Han

> -----Original Message-----
> From: Joe Touch [mailto:touch@isi.edu]
> Sent: Wednesday, May 21, 2014 9:14 AM
> To: Zhou, Han; nvo3@ietf.org; tofoo@ietf.org;
> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
> draft-zhou-li-vxlan-soe@tools.ietf.org
> Cc: Erik Nordmark; Tom Herbert
> Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
> 
> Hi, Han,
> 
> This helps, but doesn't quite address my concern.
> 
> TCP offloading is fine when the OS hands off user data, and the offload
> engine creates the entire segment.
> 
> The situation you're describing seems to be a hybrid, where the guest OS
> makes a TCP segment, and then something lower (in the VM system) parses
> that segment to create one or more segments on the wire.
> 
> If that happens, it will be incompatible with a number of existing TCP
> options, and will also cause side-effects with ACK clocking, timeouts,
> and more than a few other TCP features.
> 
> Although I appreciate the goal of efficiency and speed, there is a
> severe compatibility issue that doesn't seem to be addressed.
> 
> We can discuss this off-list if useful, FWIW.
> 
> Joe
> 
> On 5/20/2014 6:01 PM, Zhou, Han wrote:
> > Hi Joe,
> >
> > Thanks for your comment.
> >
> > Yes, you are right that the "length" fields in packet headers can be regarded as a
> > hard limit on MTU, but in the draft we were referring to the interface MTU. We will
> > make this more precise in the next version.
> >
> > Regarding your question, there seems to be a misunderstanding. It is always the
> > guest OS (VM) that handles TCP, but segmentation is offloaded to the host. Let me
> > explain the code change:
> > - TX side:
> > -- before the change:
> >    TCP segmentation is offloaded from guest to host by TSO in the guest OS
> >    virt-driver, with the MSS carried in the GSO metadata of the skbuff.
> >    The VXLAN layer adds the encapsulation, and overlay TCP segmentation is carried
> >    out right before sending to the host IP layer, which is the idea of GSO: segment
> >    at the latest possible point.
> >    The host does IP fragmentation only if the overlay segment plus the outer header
> >    exceeds the physical interface MTU (e.g. when both guest MTU and host MTU are
> >    configured to 1500).
> > -- after the change:
> >    TCP segmentation is offloaded from guest to host by TSO in the guest OS
> >    virt-driver, with the MSS carried in the GSO metadata of the skbuff.
> >    The VXLAN layer removes the GSO metadata and stores it in the VXLAN-SOE header,
> >    so overlay TCP segmentation is skipped.
> >    The host does IP fragmentation according to the physical interface MTU.
> >
> > - RX side:
> > -- before the change:
> >    The host does IP reassembly if necessary (e.g. when both guest MTU and host
> >    MTU are configured to 1500).
> >    Overlay TCP segments are decapsulated and delivered all the way to the guest,
> >    and the guest OS does the TCP handling (high cost here).
> > -- after the change:
> >    The host does IP reassembly, large packets are decapsulated and delivered to
> >    the guest, and the guest OS does the TCP handling (cost reduced here because of
> >    the reduced number of packets).
> >
> > I hope this clarifies.
> >
> > Best regards,
> > Han
> >
> >> -----Original Message-----
> >> From: Joe Touch [mailto:touch@isi.edu]
> >> Sent: Wednesday, May 21, 2014 1:29 AM
> >> To: Zhou, Han; nvo3@ietf.org; tofoo@ietf.org;
> >> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
> >> draft-zhou-li-vxlan-soe@tools.ietf.org
> >> Cc: Erik Nordmark; Tom Herbert
> >> Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
> >>
> >> Hi, all,
> >>
> >> I had a comment and a question:
> >>
> >> Comment - (from the doc) overlays do have a hard MTU limit; it is the
> >> limit of the encapsulation mechanism. E.g., without additional layers,
> >> for UDP in IPv4 this would be at most 65507 bytes (i.e., IPv4 max -
> >> (min IP header + UDP header)). Using additional adaptation layers, this
> >> could be larger (e.g., see SEAL).
> >>
> >> Question - the code appears to have the VXLAN layer do the
> >> fragmentation, with the OS layer implementing the rest of TCP. There are
> >> a lot of interactions, notably:
> >>
> >> 	- any mechanism outside of the TCP source and TCP destination
> >> 	that interprets the TCP header will result in a decrease in
> >> 	functionality
> >> 		i.e., the TCP connection will support only the
> >> 		intersection of options and features supported
> >> 		by the source, dest, *and* VXLAN layers
> >>
> >> 		(rather than being limited only by the
> >> 		source-dest pair)
> >>
> >> 	- if passed a full TCP segment, this mechanism will be
> >> 	incompatible with TCP security (e.g., TCP MD5, TCP-AO, and
> >> 	the results of the TCPCRYPT WG).
> >>
> >> I'm not quite sure from your doc whether you're re-segmenting TCP
> >> segments, or merely collecting them for aggregate transit (e.g., as is
> >> done in burst-mode Ethernet).
> >>
> >> Can you please clarify?
> >>
> >> Joe
> >>
> >>
> >> On 5/19/2014 8:01 PM, Zhou, Han wrote:
> >>> Hi,
> >>>
> >>> We have updated the VXLAN-SOE draft according to earlier comments. It is now
> >>> fully compatible with VXLAN-GPE, and some examples have been added for better
> >>> understanding.
> >>>
> >>> A prototype is also implemented here (patch based on Open vSwitch):
> >>>
> >>> https://github.com/hzhou8/openvswitch/commit/9a7deb8b432ce83a9c09d7d4ff85fa050f7dd2be
> >>>
> >>> netperf TCP_STREAM test result between a pair of VMs on hosts with 10G
> >>> interfaces:
> >>>
> >>> Before the change: 2.62 Gbits/sec
> >>> After the change: 6.68 Gbits/sec
> >>> Speedup is ~2.5x (about 250% of the original throughput).
> >>>
> >>> The patch attracted some interest in the OVS community, but since this draft
> >>> is at a very early stage, Jesse considered it inappropriate to apply the
> >>> change to the OVS tree.
> >>> The discussion threads:
> >>> http://openvswitch.org/pipermail/discuss/2014-May/013981.html
> >>> http://openvswitch.org/pipermail/discuss/2014-May/013898.html
> >>>
> >>> So we would like to request a review here by the NVO3/TOFOO groups and the
> >>> VXLAN authors: is this VXLAN extension worth formally putting into VXLAN as a
> >>> standard, so that more people can benefit from it?
> >>>
> >>> Best regards,
> >>> Han
> >>>
> >>> -----Original Message-----
> >>> From: I-D-Announce [mailto:i-d-announce-bounces@ietf.org] On Behalf Of internet-drafts@ietf.org
> >>> Sent: Friday, May 02, 2014 8:09 PM
> >>> To: i-d-announce@ietf.org
> >>> Subject: I-D Action: draft-zhou-li-vxlan-soe-01.txt
> >>>
> >>>
> >>> A New Internet-Draft is available from the on-line Internet-Drafts directories.
> >>>
> >>>
> >>>           Title           : Segmentation Offloading Extension for VXLAN
> >>>           Authors         : Han Zhou
> >>>                             Chengyuan Li
> >>>           Filename        : draft-zhou-li-vxlan-soe-01.txt
> >>>           Pages           : 13
> >>>           Date            : 2014-05-02
> >>>
> >>> Abstract:
> >>>      Segmentation offloading is nowadays common in network stack
> >>>      implementation and well supported by para-virtualized network device
> >>>      drivers for virtual machine (VM)s. This draft describes an extension
> >>>      to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
> >>>      can be decoupled from physical/underlay networks and offloaded
> >>>      further to the remote end-point thus improving data-plane performance
> >>>      for VMs running on top of overlay networks.
> >>>
> >>>
> >>> The IETF datatracker status page for this draft is:
> >>> https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
> >>>
> >>> There's also a htmlized version available at:
> >>> http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-01
> >>>
> >>> A diff from the previous version is available at:
> >>> http://www.ietf.org/rfcdiff?url2=draft-zhou-li-vxlan-soe-01
> >>>
> >>>
> >>> Please note that it may take a couple of minutes from the time of submission
> >>> until the htmlized version and diff are available at tools.ietf.org.
> >>>
> >>> Internet-Drafts are also available by anonymous FTP at:
> >>> ftp://ftp.ietf.org/internet-drafts/
> >>>