Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

"Zhou, Han" <hzhou8@ebay.com> Wed, 21 May 2014 01:01 UTC

From: "Zhou, Han" <hzhou8@ebay.com>
To: Joe Touch <touch@isi.edu>, "nvo3@ietf.org" <nvo3@ietf.org>, "tofoo@ietf.org" <tofoo@ietf.org>, "draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org" <draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org>, "draft-zhou-li-vxlan-soe@tools.ietf.org" <draft-zhou-li-vxlan-soe@tools.ietf.org>
Date: Wed, 21 May 2014 01:01:27 +0000
Message-ID: <9F56174078B48B459268EFF1DAB66B1A109C36FC@DEN-EXDDA-S32.corp.ebay.com>
References: <20140502120923.9835.17537.idtracker@ietfa.amsl.com> <9F56174078B48B459268EFF1DAB66B1A109C2DD3@DEN-EXDDA-S32.corp.ebay.com> <537B90C9.1090003@isi.edu>
In-Reply-To: <537B90C9.1090003@isi.edu>
Archived-At: http://mailarchive.ietf.org/arch/msg/tofoo/PwHFazACW2DzW0FUSY_0qvJbe_w
Cc: Erik Nordmark <nordmark@sonic.net>, Tom Herbert <therbert@google.com>
Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

Hi Joe,

Thanks for your comment. 

Yes, you are right that the "length" fields in packet headers can be regarded as a hard limit on the MTU, but in the draft we were referring to the interface MTU. We will make this more precise in the next version.
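
To make the distinction concrete, here is a minimal sketch of the two limits. It assumes IPv4 without options as the outer header and the plain 8-byte VXLAN header; the function names are only illustrative, and the VXLAN-SOE header size is assumed to match plain VXLAN.

```python
# Header sizes in bytes. These are the standard minimum sizes; the
# VXLAN-SOE header is ASSUMED here to be the same size as plain VXLAN.
IPV4_MAX_TOTAL_LENGTH = 65535  # limit of the 16-bit IPv4 Total Length field
IPV4_MIN_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14


def max_udp_payload_ipv4():
    """Hard limit of the encapsulation: largest UDP payload in one IPv4 packet."""
    return IPV4_MAX_TOTAL_LENGTH - IPV4_MIN_HEADER - UDP_HEADER


def max_inner_ip_packet(interface_mtu):
    """Largest inner IP packet that fits in one unfragmented outer packet."""
    overhead = IPV4_MIN_HEADER + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER
    return interface_mtu - overhead


print(max_udp_payload_ipv4())     # hard limit from the length fields
print(max_inner_ip_packet(1500))  # limit implied by a 1500-byte interface MTU
```

The first value is the hard limit Joe describes; the second is what the draft means by the interface MTU constraint.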

As for your question, there seems to be a misunderstanding. It is always the guest OS (VM) that handles TCP; only segmentation is offloaded to the host. Let me explain the code change:
- TX side:
-- before the change:
  TCP segmentation is offloaded from guest to host by TSO in the guest OS virt-driver; the MSS is carried in the GSO metadata of the skbuff.
  The VXLAN layer adds the encapsulation, and overlay TCP segmentation is carried out right before the packet is sent to the host IP layer, which is the idea of GSO: segment at the latest possible point.
  The host does IP fragmentation only if an overlay segment plus the outer header exceeds the physical interface MTU (e.g. when both the guest MTU and the host MTU are configured to 1500).
-- after the change:
  TCP segmentation is offloaded from guest to host by TSO in the guest OS virt-driver; the MSS is carried in the GSO metadata of the skbuff.
  The VXLAN layer removes the GSO metadata and stores it in the VXLAN-SOE header, so overlay TCP segmentation is skipped.
  The host does IP fragmentation according to the physical interface MTU.

- RX side:
-- before the change:
  The host does IP reassembly if necessary (e.g. when both the guest MTU and the host MTU are configured to 1500).
  Overlay TCP segments are decapsulated and delivered all the way to the guest, and the guest OS does the TCP handling. (The high cost is here.)
-- after the change:
  The host does IP reassembly, the large packets are decapsulated and delivered to the guest, and the guest OS does the TCP handling. (The cost is reduced here because fewer packets reach the guest.)
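
The packet counts in the two cases can be illustrated with a rough model. This is only a sketch, not the patch itself: it assumes an IPv4 outer header without options, inner IPv4 and TCP headers without options, and a VXLAN-SOE header of the same size as plain VXLAN. The function names are hypothetical.

```python
import math

# Header sizes in bytes (all ASSUMPTIONS for this model, see text above).
OUTER_IP = 20
UDP = 8
VXLAN = 8           # VXLAN-SOE assumed to be the same size as plain VXLAN
INNER_ETH = 14
INNER_IP_TCP = 40   # inner IPv4 + TCP headers, no options
ENCAP = UDP + VXLAN + INNER_ETH + INNER_IP_TCP


def ip_fragments(ip_payload_len, mtu):
    """Number of outer IPv4 packets needed to carry ip_payload_len bytes."""
    if ip_payload_len <= mtu - OUTER_IP:
        return 1  # fits without fragmentation
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - OUTER_IP) // 8 * 8
    return math.ceil(ip_payload_len / per_fragment)


def before_change(tso_bytes, mss, host_mtu):
    """Overlay TCP segmentation first, then IP fragmentation per segment."""
    segments = math.ceil(tso_bytes / mss)
    wire_packets = 0
    remaining = tso_bytes
    for _ in range(segments):
        seg = min(mss, remaining)
        remaining -= seg
        wire_packets += ip_fragments(ENCAP + seg, host_mtu)
    return {"wire_packets": wire_packets, "packets_to_guest": segments}


def after_change(tso_bytes, host_mtu):
    """No overlay segmentation: one large packet, IP-fragmented by the host."""
    return {
        "wire_packets": ip_fragments(ENCAP + tso_bytes, host_mtu),
        "packets_to_guest": 1,  # reassembled by the host, delivered whole
    }


# Guest MTU and host MTU both 1500, so MSS is 1460; 40 full segments of data.
print(before_change(58400, 1460, 1500))
print(after_change(58400, 1500))
```

With both MTUs at 1500, each overlay segment in the "before" case no longer fits after encapsulation and is split in two, while the receiving guest still has to process every TCP segment. In the "after" case the guest on the RX side sees a single large packet.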

I hope this clarifies.

Best regards,
Han

> -----Original Message-----
> From: Joe Touch [mailto:touch@isi.edu]
> Sent: Wednesday, May 21, 2014 1:29 AM
> To: Zhou, Han; nvo3@ietf.org; tofoo@ietf.org;
> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
> draft-zhou-li-vxlan-soe@tools.ietf.org
> Cc: Erik Nordmark; Tom Herbert
> Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
> 
> Hi, all,
> 
> I had a comment and a question:
> 
> Comment - (from the doc) overlays do have a hard MTU limit; it is the
> limit of the encapsulation mechanism. E.g., without additional layers,
> for UDP in IPv4 this would be at most 65507 bytes (i.e., IPv4 max -
> (min IP header + UDP header)). Using additional adaptation layers, this
> could be larger (e.g., see SEAL).
> 
> Question - the code appears to have the VXLAN layer do the
> fragmentation, with the OS layer implementing the rest of TCP. There are
> a lot of interactions, notably:
> 
> 	- any mechanism outside of the TCP source and TCP destination
> 	that interprets the TCP header will result in a decrease in
> 	functionality
> 		i.e., the TCP connection will support only the
> 		intersection of options and features supported
> 		by the source, dest, *and* VXLAN layers
> 
> 		(rather than being limited only by the
> 		source-dest pair)
> 
> 	- if passed a full TCP segment, this mechanism will be
> 	incompatible with TCP security (e.g., TCP MD5, TCP-AO, and
> 	the results of the TCPCRYPT WG).
> 
> I'm not quite sure from your doc whether you're re-segmenting TCP
> segments, or merely collecting them for aggregate transit (e.g., as is
> done in burst-mode Ethernet).
> 
> Can you please clarify?
> 
> Joe
> 
> 
> On 5/19/2014 8:01 PM, Zhou, Han wrote:
> > Hi,
> >
> > We have updated the VXLAN-SOE draft according to earlier comments. It is now
> > fully compatible with VXLAN-GPE, and some examples have been added for better
> > understanding.
> >
> > A prototype is also implemented here (patch based on Open vSwitch):
> >
> > https://github.com/hzhou8/openvswitch/commit/9a7deb8b432ce83a9c09d7d4ff85fa050f7dd2be
> >
> > netperf TCP_STREAM test result between a pair of VMs on hosts with 10G
> > interfaces:
> >
> > Before the change: 2.62 Gbits/sec
> > After the change: 6.68 Gbits/sec
> > Throughput improves by a factor of ~2.5 (6.68 / 2.62).
> >
> > The patch attracted some interest in the OVS community, but since this draft
> > is at a very early stage, Jesse considered it inappropriate to apply the
> > change to the OVS tree.
> > The thread on the discuss mailing list:
> > http://openvswitch.org/pipermail/discuss/2014-May/013981.html
> > http://openvswitch.org/pipermail/discuss/2014-May/013898.html
> >
> > So we would like to request a review here by the NVO3/TOFOO groups and the VXLAN
> > authors: is this VXLAN extension worth formally putting into VXLAN as a standard,
> > so that more people can benefit from it?
> >
> > Best regards,
> > Han
> >
> > -----Original Message-----
> > From: I-D-Announce [mailto:i-d-announce-bounces@ietf.org] On Behalf Of
> > internet-drafts@ietf.org
> > Sent: Friday, May 02, 2014 8:09 PM
> > To: i-d-announce@ietf.org
> > Subject: I-D Action: draft-zhou-li-vxlan-soe-01.txt
> >
> >
> > A New Internet-Draft is available from the on-line Internet-Drafts directories.
> >
> >
> >          Title           : Segmentation Offloading Extension for VXLAN
> >          Authors         : Han Zhou
> >                            Chengyuan Li
> >          Filename        : draft-zhou-li-vxlan-soe-01.txt
> >          Pages           : 13
> >          Date            : 2014-05-02
> >
> > Abstract:
> >     Segmentation offloading is nowadays common in network stack
> >     implementation and well supported by para-virtualized network device
> >     drivers for virtual machine (VM)s. This draft describes an extension
> >     to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
> >     can be decoupled from physical/underlay networks and offloaded
> >     further to the remote end-point thus improving data-plane performance
> >     for VMs running on top of overlay networks.
> >
> >
> > The IETF datatracker status page for this draft is:
> > https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
> >
> > There's also a htmlized version available at:
> > http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-01
> >
> > A diff from the previous version is available at:
> > http://www.ietf.org/rfcdiff?url2=draft-zhou-li-vxlan-soe-01
> >
> >
> > Please note that it may take a couple of minutes from the time of submission
> > until the htmlized version and diff are available at tools.ietf.org.
> >
> > Internet-Drafts are also available by anonymous FTP at:
> > ftp://ftp.ietf.org/internet-drafts/
> >