Re: [nvo3] Draft Geneve

Anton Ivanov <anton.ivanov@kot-begemot.co.uk> Sun, 02 March 2014 08:30 UTC

Message-ID: <5312EC16.2020007@kot-begemot.co.uk>
Date: Sun, 02 Mar 2014 08:30:14 +0000
From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130116 Icedove/10.0.12
MIME-Version: 1.0
To: nvo3@ietf.org
References: <53104916.5040606@cisco.com> <1278160553.35330292.1393596841752.JavaMail.root@vmware.com> <53109FF1.1020904@cisco.com> <1219445865.35385295.1393600136972.JavaMail.root@vmware.com> <CF362062.65F9%kegray@cisco.com> <278b5c26711e4ae3a2ddba4bdb4f190b@BY2PR03MB128.namprd03.prod.outlook.com> <5310BE26.7060004@cisco.com> <db29312dcbfa4cb881a458b5eca8fcfe@BY2PR03MB128.namprd03.prod.outlook.com> <5310CB8E.5010006@cisco.com> <e0661d11f2184bc08a9651da702a0b93@BY2PR03MB128.namprd03.prod.outlook.com> <5310DC89.1050508@cisco.com> <CA+C0YO2=N1LGVKwwTRXVBYZ6oy6b1AHw935uw-RQy8A2B4gz3Q@mail.gmail.com> <CA+mtBx_QQCNxOnJBZWFh8bSc3wHqiQ7uSUBQD1YQb4Ayr1XjpQ@mail.gmail.com> <53118586.2080306@cisco.com> <CA+mtBx8sCVvZwL2az7HKpXcVZdf2KSnibBXBtGsYHyfFhdJ_Hg@mail.gmail.com> <53124A4B.6070608@cisco.com> <CA+mtBx_fGH8SdLScfstpA7u9O39m17Jq0ovxs22tazr0-q9_yQ@mail.gmail.com>
In-Reply-To: <CA+mtBx_fGH8SdLScfstpA7u9O39m17Jq0ovxs22tazr0-q9_yQ@mail.gmail.com>
X-Enigmail-Version: 1.4.1
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/nvo3/E24ayEh73xAK84SFnLb9g1jT_jg
Subject: Re: [nvo3] Draft Geneve

On 01/03/14 22:28, Tom Herbert wrote:
> On Sat, Mar 1, 2014 at 1:00 PM, Anton Ivanov (antivano)
> <antivano@cisco.com> wrote:
>> Hi Tom,
>>
>> Based on your comments you have not followed the discussion. I think it will
>> be good if you go through the thread in the archive.
>>
> This discussion is about geneve which is what I was commenting on.

This discussion is about the fact that geneve is a reinvention of what
we have already. It is not "about geneve". It is about geneve as an
encapsulation not having sufficient technical merit to warrant its
perpetration.

>
>> First, we started the discussion by announcing the fact that we have open
>> sourced a working static tunnel L2TPv3 implementation as an overlay at vNIC
>> level (allowing for off-host switching, direct overlay to physical and
>> direct vm to vm overlay). Based on this discussion I will add back (I
>> actually removed it as I saw it as unnecessary) the "application-specific
>> data between header and payload" feature.
>>
>> Second, I have been very specific about using _STATIC_ tunnels. What you are
>> saying in your mail is mostly invalid in a static tunnel context.
>>
>> Static tunnels are just PWEs  - same as any other encaps.
>>
>> 1. There is no control plane. RFC 4719 does not apply.
>>
>> 2. For an example - see
>> http://tools.ietf.org/html/draft-mkonstan-l2tpext-keyed-ipv6-tunnel-00 .
>> This is one example use case, the limitations specified in it are not
>> necessary in most others.
>>
>> 3. L2TPv3 static tunnels do not specify what the payload is and have no
>> in-band information on this. You can have PPP, Ethernet, IP, ZigBee, RFC1149
>> or whatever else you may like. If you want a special packet type in the
>> static tunnel case it is up to you. L2TPv3 will carry it for you.
>>
> If you don't carry a protocol type in each packet doesn't this prevent
> network devices that don't have access to tunnel state from parsing
> the inner packet. How would you implement LRO in a NIC, deep packet
> inspection, or network flow monitoring in that case?

Depends on device capabilities.

For big boxes - just go and ask anyone who has built a BNG and its NPUs
- all of them have been implementing it and supporting it for nearly a
decade now. There will be plenty of people from that area at the IETF.
As I said - for a lot of us it is a case of been there, done that. There
is no need to invent a new encaps the world does not need.

As for offloads on a generic compute system and low(ish) end NICs:

    TX: Let's assume, for the sake of argument, that we have a NIC whose
TSO is generic enough to handle VXLAN, GRE or geneve. This means that
you can program the NIC with a reasonably arbitrary header and have it
put each segment after that header. So no difference there.
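
A rough sketch of that TX case, in Python for brevity. Everything here
is an assumption made for illustration: segment_with_template is a
hypothetical name, the header is treated as an opaque byte string, and
the per-segment fix-ups a real NIC has to do (lengths, IP ID, sequence
numbers, checksums) are left out on purpose.

```python
from typing import List


def segment_with_template(header: bytes, payload: bytes, mss: int) -> List[bytes]:
    """Split payload into chunks of at most mss bytes and prepend the
    same opaque header template to every chunk.

    Hypothetical helper: the header is never parsed, so it makes no
    difference whether it is VXLAN, GRE, L2TPv3 or anything else.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [header + payload[i:i + mss] for i in range(0, len(payload), mss)]
```

The segmentation step never looks inside the header, which is the
point being made above.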

    RX: This one is more interesting. For RX I can see the benefit of
having a header length field or a well defined header length, especially
in the case of a well defined protocol. However, from an implementation
standpoint I do not see a difference between that and programming the
NIC with a trivial "for this rxhash, 5-tuple, whatever - offload from
offset X" and communicating that offset via the control plane. That has
the limitation of keeping the offset constant per session. It is,
however, easier to implement than parsing variable length options - both
in software and hardware (by the way, so far nobody has specified a use
case which explicitly requires the length of options to vary within a
session).
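
To make the "offload from offset X" idea concrete, here is a minimal
sketch in Python. FlowKey and OffsetTable are hypothetical names, the
5-tuple key is only one of the possible match criteria mentioned above,
and a real NIC would hold this state in hardware rather than in a
dictionary.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical match key; an rxhash would do just as well."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


class OffsetTable:
    """Per-session table: flow -> fixed offset of the inner packet."""

    def __init__(self) -> None:
        self._offsets: Dict[FlowKey, int] = {}

    def program(self, key: FlowKey, offset: int) -> None:
        # Set out of band when the session is configured.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self._offsets[key] = offset

    def inner(self, key: FlowKey, frame: bytes) -> Optional[bytes]:
        # No header parsing: look up the constant offset and slice.
        offset = self._offsets.get(key)
        if offset is None or offset > len(frame):
            return None
        return frame[offset:]
```

The lookup is a single table hit per packet, at the price of the offset
being fixed for the lifetime of the session.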

In any case, for both RX/TX - if we want to go down the "options may
vary within session" route instead of "variable, but fixed per session",
I do not see why we should not specify the app extension header (the one
which presently needs to be set via offset) to have a format which
includes a header length field. This also decouples the encaps
sufficiently from the metadata, as preferred by most people who have
spoken so far. It should be a separate header, not something welded to
the encaps.
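
As an illustration only, a separate app extension header with a length
field could be as simple as the sketch below. The layout is invented
for this example (a 2-byte big-endian total length, which counts the
length field itself, followed by opaque metadata); it is not taken from
any spec, and split_app_header is a hypothetical name.

```python
import struct
from typing import Tuple


def split_app_header(buf: bytes) -> Tuple[bytes, bytes]:
    """Return (metadata, payload) for the invented layout:
    2-byte big-endian total header length, then opaque metadata."""
    if len(buf) < 2:
        raise ValueError("truncated extension header")
    (hdr_len,) = struct.unpack_from("!H", buf, 0)
    if hdr_len < 2 or hdr_len > len(buf):
        raise ValueError("bad extension header length")
    # The receiver skips hdr_len bytes without understanding the metadata.
    return buf[2:hdr_len], buf[hdr_len:]
```

A receiver that does not care about the metadata only needs the length
field to find the payload, and the encaps underneath stays untouched.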

By the way, in their current form the checksum section,
drive-by-shooting-by-UDP, etc in the geneve spec look like they have
been built with the assumption of all or nothing (full offload by some
single offload spec offered by vendor X or no offload). That is not the
way you build things - you have to handle the cases in-between in a
reasonable manner. For example, the spec should be friendly to partial
checksum re-use (you take the hardware checksum on RX and compute the
inner payload checksum as the difference between that and the checksum
over the outer header), etc. Also, the drive-by-shooting by UDP is
extremely unfriendly with respect to how you implement things on a
Type 2 hypervisor (kvm and Co).
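
The partial checksum re-use mentioned above can be sketched as follows.
The assumptions are mine: the hardware is taken to report a 16-bit
ones' complement sum over everything from the start of the outer header
onwards, the outer header has an even length (an odd length would need
a byte swap), and ones_sum / ones_sub are hypothetical helper names.

```python
def _fold(total: int) -> int:
    # Fold carries back into the low 16 bits (ones' complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """16-bit ones' complement sum, odd-length input padded with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return _fold(total)


def ones_sub(a: int, b: int) -> int:
    """a minus b in ones' complement: add the bitwise complement of b."""
    return _fold(a + (~b & 0xFFFF))


outer = bytes.fromhex("4500001c")      # stand-in outer header, even length
inner = b"inner payload"
hw_sum = ones_sum(outer + inner)       # what the NIC is assumed to report
inner_sum = ones_sub(hw_sum, ones_sum(outer))
# 0x0000 and 0xFFFF both represent zero, so compare modulo 0xFFFF.
print(inner_sum % 0xFFFF == ones_sum(inner) % 0xFFFF)
```

Only the outer header bytes are summed in software here; the inner
payload is never walked again.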

A.

>
>
[snip]

-- 
"If you think it's expensive to hire a professional to do the job,
    wait until you hire an amateur."
				    Paul Neal "Red" Adair 

A. R. Ivanov
E-mail:  anton.ivanov@kot-begemot.co.uk