Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

Tom Herbert <therbert@google.com> Thu, 22 May 2014 18:43 UTC

In-Reply-To: <537BFDF4.9030300@isi.edu>
References: <20140502120923.9835.17537.idtracker@ietfa.amsl.com> <9F56174078B48B459268EFF1DAB66B1A109C2DD3@DEN-EXDDA-S32.corp.ebay.com> <537B90C9.1090003@isi.edu> <9F56174078B48B459268EFF1DAB66B1A109C36FC@DEN-EXDDA-S32.corp.ebay.com> <537BFDF4.9030300@isi.edu>
Date: Thu, 22 May 2014 11:43:41 -0700
Message-ID: <CA+mtBx9+7hC5p+3djpLVQ2upx-rs7iE2ms0mET4AoC6uZ1LWnQ@mail.gmail.com>
From: Tom Herbert <therbert@google.com>
To: Joe Touch <touch@isi.edu>
Content-Type: multipart/alternative; boundary="089e0129555cf49d8204fa017dbc"
Archived-At: http://mailarchive.ietf.org/arch/msg/tofoo/DsM5Qbswabmhe7l8gr9BJE9Qz0w
Cc: "Zhou, Han" <hzhou8@ebay.com>, Erik Nordmark <nordmark@sonic.net>, "tofoo@ietf.org" <tofoo@ietf.org>, "nvo3@ietf.org" <nvo3@ietf.org>, "draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org" <draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org>, "draft-zhou-li-vxlan-soe@tools.ietf.org" <draft-zhou-li-vxlan-soe@tools.ietf.org>
Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt

On Tue, May 20, 2014 at 6:14 PM, Joe Touch <touch@isi.edu> wrote:

> Hi, Han,
>
> This helps, but doesn't quite address my concern.
>
> TCP offloading is fine when the OS hands off user data, and the offload
> engine creates the entire segment.

This would essentially be TOE.

> The situation you're describing seems to be a hybrid, where the guest OS
> makes a TCP segment, and then something lower (in the VM system) parses
> that segment to create one or more segments on the wire.

This is how Linux, and probably just about every NIC, implements TCP
segmentation offload. A large TCP segment is broken up into MSS-sized
segments (the MSS is provided in ancillary data). The process replicates the
IP/TCP headers for each packet, adjusts the length and sequence numbers, and
computes the checksum for each segment. There is no special handling of
options; it is assumed that they can be replicated in each segment.
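As a rough illustration of the segmentation steps just described, here is a minimal Python sketch. It is not Linux or NIC code: the names `inet_checksum` and `tso_segment` are hypothetical, it assumes IPv4, and two details are assumptions about typical implementations rather than anything stated in this thread (the IP ID is incremented per packet, and FIN/PSH are kept only on the last segment). TCP options are part of the copied header and are replicated untouched.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """One's-complement Internet checksum (RFC 1071) over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tso_segment(ip_hdr: bytes, tcp_hdr: bytes, payload: bytes, mss: int) -> list[bytes]:
    """Split one oversized TCP segment into MSS-sized IPv4 packets."""
    base_id = struct.unpack("!H", ip_hdr[4:6])[0]
    base_seq = struct.unpack("!I", tcp_hdr[4:8])[0]
    packets = []
    for i, off in enumerate(range(0, len(payload), mss)):
        chunk = payload[off:off + mss]
        last = off + mss >= len(payload)

        # Replicate both headers for every packet (options included).
        ip = bytearray(ip_hdr)
        tcp = bytearray(tcp_hdr)

        # Adjust IP total length and ID, then recompute the IP header checksum.
        struct.pack_into("!H", ip, 2, len(ip) + len(tcp) + len(chunk))
        struct.pack_into("!H", ip, 4, (base_id + i) & 0xFFFF)  # assumption
        struct.pack_into("!H", ip, 10, 0)
        struct.pack_into("!H", ip, 10, inet_checksum(bytes(ip)))

        # Adjust the sequence number by the payload offset.
        struct.pack_into("!I", tcp, 4, (base_seq + off) & 0xFFFFFFFF)
        if not last:
            tcp[13] &= 0xF6  # assumption: clear FIN (0x01) and PSH (0x08)

        # TCP checksum covers pseudo-header, TCP header and payload.
        struct.pack_into("!H", tcp, 16, 0)
        pseudo = bytes(ip[12:20]) + struct.pack("!BBH", 0, 6, len(tcp) + len(chunk))
        struct.pack_into("!H", tcp, 16, inet_checksum(pseudo + bytes(tcp) + chunk))

        packets.append(bytes(ip) + bytes(tcp) + chunk)
    return packets


# Example: a 3000-byte segment with MSS 1460 (all header values are made up).
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 0, 1000, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
tcp_hdr = struct.pack("!HHIIBBHHH", 12345, 80, 1000, 0, 0x50, 0x18, 65535, 0, 0)
pkts = tso_segment(ip_hdr, tcp_hdr, b"x" * 3000, 1460)
print([len(p) for p in pkts])                             # [1500, 1500, 120]
print([struct.unpack("!I", p[24:28])[0] for p in pkts])   # [1000, 2460, 3920]
```

Note that nothing here interprets the TCP options, so any option whose value depends on the segment contents is simply copied as-is.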


> If that happens, it will be incompatible with a number of existing TCP
> options, and will also cause side-effects with ACK clocking, timeouts, and
> more than a few other TCP features.


>
> Although I appreciate the goal of efficiency and speed, there is a severe
> compatibility issue that doesn't seem to be addressed.

TCP segmentation offload is already widely deployed. Since this draft would
use the same mechanism, it is unlikely to create new compatibility issues,
except that the segmentation might be deferred to an off-host entity, which
is a new concept that would probably have side effects.

Tom

> We can discuss this off-list if useful, FWIW.
>
> Joe
>
>
> On 5/20/2014 6:01 PM, Zhou, Han wrote:
>
>> Hi Joe,
>>
>> Thanks for your comment.
>>
>> Yes, you are right that the "length" fields in packet headers can be
>> regarded as a hard limit on the MTU, but in the draft we were referring to
>> the interface MTU. We will make this more precise in the next version.
>>
>> As for your question, there seems to be a misunderstanding. The guest OS
>> (VM) always handles TCP; only segmentation is offloaded to the host. Let
>> me explain the code change:
>> - TX side:
>> -- before the change:
>>    TCP segmentation is offloaded from guest to host by TSO in the guest
>> OS virt-driver, with the MSS carried in the GSO metadata of the skbuff.
>>    The VXLAN layer adds the encapsulation, and overlay TCP segmentation is
>> carried out right before sending to the host IP layer, which is the idea
>> of GSO: segment at the latest possible point.
>>    The host does IP fragmentation only if the overlay segment plus the
>> outer header exceeds the physical interface MTU (e.g. when both guest MTU
>> and host MTU are configured to 1500).
>> -- after the change:
>>    TCP segmentation is offloaded from guest to host by TSO in the guest
>> OS virt-driver, with the MSS carried in the GSO metadata of the skbuff.
>>    The VXLAN layer removes the GSO metadata and stores it in the VXLAN-SOE
>> header, so overlay TCP segmentation is skipped.
>>    The host does IP fragmentation according to the physical interface MTU.
>>
>> - RX side:
>> -- before the change:
>>    The host does IP reassembly if necessary (e.g. when both guest MTU and
>> host MTU are configured to 1500).
>>    Overlay TCP segments are decapsulated and delivered all the way to the
>> guest, and the guest OS does the TCP handling (high cost here).
>> -- after the change:
>>    The host does IP reassembly, and large packets are decapsulated and
>> delivered to the guest, where the guest OS does the TCP handling (cost is
>> reduced here because there are fewer packets).
>>
>> I hope this clarifies.
>>
>> Best regards,
>> Han
>>
>>  -----Original Message-----
>>> From: Joe Touch [mailto:touch@isi.edu]
>>> Sent: Wednesday, May 21, 2014 1:29 AM
>>> To: Zhou, Han; nvo3@ietf.org; tofoo@ietf.org;
>>> draft-mahalingam-dutt-dcops-vxlan@tools.ietf.org;
>>> draft-zhou-li-vxlan-soe@tools.ietf.org
>>> Cc: Erik Nordmark; Tom Herbert
>>> Subject: Re: [Tofoo] FW: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>>>
>>> Hi, all,
>>>
>>> I had a comment and a question:
>>>
>>> Comment - (from the doc) overlays do have a hard MTU limit; it is the
>>> limit of the encapsulation mechanism. E.g., without additional layers,
>>> for UDP in IPv4 this would be at most 65507 bytes (i.e., IPv4 max -
>>> (min IP header + UDP header)). Using additional adaptation layers, this
>>> could be larger (e.g., see SEAL).
>>>
>>> Question - the code appears to have the VXLAN layer do the
>>> fragmentation, with the OS layer implementing the rest of TCP. There are
>>> a lot of interactions, notably:
>>>
>>>         - any mechanism outside of the TCP source and TCP destination
>>>         that interprets the TCP header will result in a decrease in
>>>         functionality
>>>                 i.e., the TCP connection will support only the
>>>                 intersection of options and features supported
>>>                 by the source, dest, *and* VXLAN layers
>>>
>>>                 (rather than being limited only by the
>>>                 source-dest pair)
>>>
>>>         - if passed a full TCP segment, this mechanism will be
>>>         incompatible with TCP security (e.g., TCP MD5, TCP-AO, and
>>>         the results of the TCPCRYPT WG).
>>>
>>> I'm not quite sure from your doc whether you're re-segmenting TCP
>>> segments, or merely collecting them for aggregate transit (e.g., as is
>>> done in burst-mode Ethernet).
>>>
>>> Can you please clarify?
>>>
>>> Joe
>>>
>>>
>>> On 5/19/2014 8:01 PM, Zhou, Han wrote:
>>>
>>>> Hi,
>>>>
>>>> We have updated the VXLAN-SOE draft according to earlier comments. It is
>>>> now fully compatible with VXLAN-GPE, and some examples have been added
>>>> for better understanding.
>>>>
>>>> A prototype is also implemented here (patch based on Open vSwitch):
>>>>
>>>> https://github.com/hzhou8/openvswitch/commit/9a7deb8b432ce83a9c09d7d4ff85fa050f7dd2be
>>>>
>>>> netperf TCP_STREAM test results between a pair of VMs on hosts with 10G
>>>> interfaces:
>>>>
>>>> Before the change: 2.62 Gbits/sec
>>>> After the change: 6.68 Gbits/sec
>>>> Speedup is ~2.5x (6.68 / 2.62).
>>>>
>>>> The patch attracted some interest in the OVS community, but since this
>>>> RFC draft is at a very early stage, Jesse regarded it as inappropriate
>>>> to apply the change to the OVS tree.
>>>> The discussion threads:
>>>> http://openvswitch.org/pipermail/discuss/2014-May/013981.html
>>>> http://openvswitch.org/pipermail/discuss/2014-May/013898.html
>>>>
>>>> So we would like to request a review here by the NVO3/TOFOO groups and
>>>> the VXLAN authors: is this VXLAN extension worth formally putting into
>>>> VXLAN as a standard, so that more people can benefit from it?
>>>>
>>>> Best regards,
>>>> Han
>>>>
>>>> -----Original Message-----
>>>> From: I-D-Announce [mailto:i-d-announce-bounces@ietf.org] On Behalf Of
>>>> internet-drafts@ietf.org
>>>> Sent: Friday, May 02, 2014 8:09 PM
>>>> To: i-d-announce@ietf.org
>>>> Subject: I-D Action: draft-zhou-li-vxlan-soe-01.txt
>>>>
>>>>
>>>> A New Internet-Draft is available from the on-line Internet-Drafts
>>>> directories.
>>>>
>>>>
>>>>           Title           : Segmentation Offloading Extension for VXLAN
>>>>           Authors         : Han Zhou
>>>>                             Chengyuan Li
>>>>         Filename        : draft-zhou-li-vxlan-soe-01.txt
>>>>         Pages           : 13
>>>>         Date            : 2014-05-02
>>>>
>>>> Abstract:
>>>>      Segmentation offloading is nowadays common in network stack
>>>>      implementation and well supported by para-virtualized network device
>>>>      drivers for virtual machine (VM)s. This draft describes an extension
>>>>      to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
>>>>      can be decoupled from physical/underlay networks and offloaded
>>>>      further to the remote end-point thus improving data-plane performance
>>>>      for VMs running on top of overlay networks.
>>>>
>>>>
>>>> The IETF datatracker status page for this draft is:
>>>> https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
>>>>
>>>> There's also a htmlized version available at:
>>>> http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-01
>>>>
>>>> A diff from the previous version is available at:
>>>> http://www.ietf.org/rfcdiff?url2=draft-zhou-li-vxlan-soe-01
>>>>
>>>>
>>>> Please note that it may take a couple of minutes from the time of submission
>>>> until the htmlized version and diff are available at tools.ietf.org.
>>>>
>>>> Internet-Drafts are also available by anonymous FTP at:
>>>> ftp://ftp.ietf.org/internet-drafts/
>>>>