Re: [nvo3] Draft Geneve

"Anton Ivanov (antivano)" <antivano@cisco.com> Sat, 01 March 2014 21:00 UTC

From: "Anton Ivanov (antivano)" <antivano@cisco.com>
To: "nvo3@ietf.org" <nvo3@ietf.org>
Date: Sat, 01 Mar 2014 21:00:46 +0000
Message-ID: <53124A4B.6070608@cisco.com>
References: <53104916.5040606@cisco.com> <1278160553.35330292.1393596841752.JavaMail.root@vmware.com> <53109FF1.1020904@cisco.com> <1219445865.35385295.1393600136972.JavaMail.root@vmware.com> <CF362062.65F9%kegray@cisco.com> <278b5c26711e4ae3a2ddba4bdb4f190b@BY2PR03MB128.namprd03.prod.outlook.com> <5310BE26.7060004@cisco.com> <db29312dcbfa4cb881a458b5eca8fcfe@BY2PR03MB128.namprd03.prod.outlook.com> <5310CB8E.5010006@cisco.com> <e0661d11f2184bc08a9651da702a0b93@BY2PR03MB128.namprd03.prod.outlook.com> <5310DC89.1050508@cisco.com> <CA+C0YO2=N1LGVKwwTRXVBYZ6oy6b1AHw935uw-RQy8A2B4gz3Q@mail.gmail.com> <CA+mtBx_QQCNxOnJBZWFh8bSc3wHqiQ7uSUBQD1YQb4Ayr1XjpQ@mail.gmail.com> <53118586.2080306@cisco.com> <CA+mtBx8sCVvZwL2az7HKpXcVZdf2KSnibBXBtGsYHyfFhdJ_Hg@mail.gmail.com>
In-Reply-To: <CA+mtBx8sCVvZwL2az7HKpXcVZdf2KSnibBXBtGsYHyfFhdJ_Hg@mail.gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/nvo3/uzWo6wJEs1jowEp4O-E_WUjZW1k
Subject: Re: [nvo3] Draft Geneve

Hi Tom,

Based on your comments, you have not followed the discussion. I think it would be good if you went through the thread in the archive.

First, we started the discussion by announcing that we have open-sourced a working static-tunnel L2TPv3 implementation as an overlay at the vNIC level (allowing off-host switching, direct overlay-to-physical and direct VM-to-VM overlay). Based on this discussion I will add back the "application-specific data between header and payload" feature, which I had removed because I saw it as unnecessary.

Second, I have been very specific about using _STATIC_ tunnels. What you are saying in your mail is mostly invalid in a static tunnel context.

Static tunnels are just PWEs, the same as any other encaps.

1. There is no control plane. RFC 4719 does not apply.

2. For an example, see http://tools.ietf.org/html/draft-mkonstan-l2tpext-keyed-ipv6-tunnel-00 . This is one use case; the limitations specified in it are not necessary in most others.

3. L2TPv3 static tunnels do not specify what the payload is and have no in-band information on this. You can have PPP, Ethernet, IP, ZigBee, RFC1149 or whatever else you may like. If you want a special packet type in the static tunnel case it is up to you. L2TPv3 will carry it for you.

4. There is no issue with offload in static tunnels, as there is nothing in-band in a static tunnel to signal the actual header length. In fact, even the cookie size is not carried in the packet. If you want offload, you have to specify where the offload should start looking. This is no different from any other arbitrary variable-header case. So if a NIC or NPU can offload starting from an arbitrary offset into the packet (e.g. Geneve), it should also be possible to make it offload L2TPv3.

5. I am not surprised by what you saw with GRE. However, it does not apply here (see 4). You saw that because GRE states in its header what the payload is, so an offload implementation is entitled to know where to find the payload packet and how to treat it. Protocols that do not provide this information in the header (L2TPv3) do not have this problem. With these you need to configure explicitly where to look for the packet and how to offload it.
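
To make points 3 to 5 concrete, here is a minimal sketch of static-tunnel decapsulation. It assumes the L2TPv3-over-IP data header from RFC 3931 section 4.1.1 (a 32-bit session ID followed by an optional 32-bit or 64-bit cookie). The names StaticSession, decap and payload_offset are hypothetical and are not taken from our implementation. The point is only that the cookie length and the payload position come from configuration, not from the packet.

```python
import hmac
import struct
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StaticSession:
    """Per-session settings agreed out of band (hypothetical structure)."""
    cookie: bytes        # 0, 4 or 8 bytes; the length is not carried in the packet
    payload_offset: int  # bytes from the start of the L2TPv3 header to the inner packet


def decap(packet: bytes, sessions: Dict[int, StaticSession]) -> Optional[bytes]:
    """Return the inner packet, or None if the packet must be dropped."""
    if len(packet) < 4:
        return None
    (session_id,) = struct.unpack_from("!I", packet, 0)  # 32-bit session ID
    cfg = sessions.get(session_id)
    if cfg is None:
        return None
    cookie_end = 4 + len(cfg.cookie)
    if cfg.payload_offset < cookie_end or len(packet) < cfg.payload_offset:
        return None
    # Constant-time comparison of the received cookie with the configured one.
    if not hmac.compare_digest(packet[4:cookie_end], cfg.cookie):
        return None
    # Anything between the cookie and payload_offset is skipped unparsed.
    return packet[cfg.payload_offset:]
```

Setting payload_offset beyond the end of the cookie leaves room for application-specific data between header and payload, and an offload engine would need the same two configured values to find the inner packet.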

I am not a VXLAN expert. However, I suspect that most of this is applicable there (or can be made to apply) too.

So I am going to repeat what has been said quite a few times - the world does not need another encapsulation, the existing one(s) can do the job. Please use them.

A

On 01/03/14 17:13, Tom Herbert wrote:

On Fri, Feb 28, 2014 at 11:01 PM, Anton Ivanov (antivano)
<antivano@cisco.com> wrote:


On 01/03/14 01:51, Tom Herbert wrote:


On Fri, Feb 28, 2014 at 3:24 PM, Sam Aldrin <aldrin.ietf@gmail.com> wrote:


Hi all,

I read the draft but have a few questions along the same lines as others have asked.

- Is this draft intended for standardization within the NVO3 WG? The status
indicates it as informational. Also it is good to have it as draft-nvo3....,
if it is meant for the NVO3 WG.
- I fail to find good reasoning, in the current version of the draft, for why
the design of the encap transport header should be closely associated with,
or tied to, metadata. Could you add more details to clarify?


The draft alludes to the general need for extensibility, but does not
provide any example uses, so maybe I can suggest one. We have a real
use case for an encapsulation protocol with security to allow
validation of the virtual network identifier. In their current form,
vxlan and nvgre have no provisions for authentication or integrity
checking of the vni, and existing mechanisms in the network were not
deemed robust enough to guarantee integrity of the vni and ensure
strict isolation between tenants. UDP checksum is not sufficient for
this. We need a mechanism to at least enforce an unpredictable security
token, or possibly stronger authentication using something like a
message digest. This is intrinsic to the encapsulation; we cannot
deploy network virtualization without this security, hence an
extensible protocol is desirable. Additionally, as the network scales
and new threats emerge, we may need further extensions to adapt.
All of this needs to be efficient and amenable to HW performance
optimizations.
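
As an illustration of the "message digest" option, here is a minimal sketch that binds the vni to the inner packet with a truncated HMAC. Everything in it is an assumption made for illustration: the function names, the 8-byte tag length, the choice of HMAC-SHA256 and the way the key is shared are not part of vxlan, nvgre or any draft discussed here.

```python
import hashlib
import hmac
import struct

TAG_LEN = 8  # illustrative truncation length, not taken from any specification


def vni_tag(key: bytes, vni: int, inner: bytes) -> bytes:
    """Compute a truncated HMAC-SHA256 over the 24-bit vni and the inner packet."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("vni must fit in 24 bits")
    message = struct.pack("!I", vni) + inner
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_LEN]


def verify_vni(key: bytes, vni: int, inner: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches this vni and this inner packet."""
    return hmac.compare_digest(vni_tag(key, vni, inner), tag)
```

A receiver that recomputes the tag would reject a packet whose vni was altered in flight, which a plain UDP checksum cannot guarantee against a deliberate attacker.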





Tom, you are describing the L2TPv3 cookie.
http://tools.ietf.org/html/rfc3931#section-4.1.1 That has already been
defined and standardized in 2005.



That's great, and I would certainly want to adapt that to a data
center encapsulation protocol, but L2TP is *not* an equivalent
protocol to encapsulations like GRE. It is a tunnel protocol, more
than an encapsulation. It's circuit-based, needing negotiation, and
there is no way to specify Ethertype or IP protocol. As I mentioned
before, I'm not going to artificially force IP packets into Ethernet
frames just to satisfy the needs of an encapsulation protocol-- this
needs to work the other way around: we need an encapsulation that is
generic enough to directly encapsulate IP packets and other protocols.



As quite a few people said - we do not need to invent a new
encapsulation for the goals of this draft or for the goals of NVO3 for
that matter. This just proves the point.



Saying we don't need a new encapsulation is not proof we don't need one.  :-)



We can copy that option to VXLAN or NVGRE as an extension if we wish too.



Unfortunately, that's not feasible. In the optional-fields model of
vxlan and nvgre, in order to compute the offset of the next header, an
implementation needs to know the lengths of all the optional fields
present. So if a new optional field is used, a device that doesn't know
about it won't be able to skip over it. This manifests itself when
hardware implementations parse the encapsulated headers.
We saw exactly this problem when trying to add the security token to
GRE: this broke ECMP in network switches as well as LRO on the NIC. So
once vxlan and nvgre are deployed in HW, there really is no way to
extend them without breaking compatibility-- for all intents and
purposes these protocols are not extensible. The solution, which we
advocate in GUE, is that protocols with variable length headers need
to have a header length field to allow devices to skip over unknown
fields.
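
The difference between the two models can be sketched as follows. Both header layouts are hypothetical and chosen only for illustration; they are not the wire formats of vxlan, nvgre, GRE or GUE. In the first, one flag bit per optional field implies a size the parser must already know. In the second, the first byte gives the total header length in 4-byte words.

```python
# Hypothetical flag-based header: byte 0 is a flags byte, and each set flag
# means one optional field of a size the parser has to know in advance.
KNOWN_FIELD_SIZES = {0x01: 4, 0x02: 4}  # flag bit -> field size in bytes


def payload_offset_flags(header: bytes) -> int:
    """Find the payload by summing the sizes of all optional fields present."""
    flags = header[0]
    offset = 1
    for bit in (1 << i for i in range(8)):
        if flags & bit:
            if bit not in KNOWN_FIELD_SIZES:
                # An older parser cannot skip a field whose size it does not know.
                raise ValueError(f"unknown optional field flag {bit:#04x}")
            offset += KNOWN_FIELD_SIZES[bit]
    return offset


def payload_offset_hlen(header: bytes) -> int:
    """Find the payload from an explicit header length (in 4-byte words)."""
    return header[0] * 4
```

With the flag-based layout, a sender that sets a new flag (say 0x04) makes an older parser fail, whereas with the length-based layout the same parser still finds the payload without understanding the new field.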

Another deficiency that I see in the current encapsulations that
really needs to be addressed is the interaction with IPsec.  Just
saying we can use IPsec with any of these encapsulations to provide
security is *not* sufficient! For instance, we can secure vxlan
packets with IPsec by encrypting the UDP packet. This provides packet
security, but now the network has no visibility into the encapsulation,
so we can't route or firewall based on vni, and we've lost the value.
For this reason we really want the encapsulation in the outside header
(which actually would be the same property if vlan were used). I don't
see a reasonable way to do this with protocols encapsulating by
Ethertype, which is a reason why GUE uses IP protocol.




As far as metadata extensions - I believe there is an agreement that we
should do it. Similarly there is a consensus that they should not be
welded into the network header. That particular aspect of the design has
no other function but to be a "mono-culture monopoly license".




Unless meta data extensions, or for that matter TLVs, are defined
which are intrinsic to the operation of the protocol, it's exceedingly
likely that hardware vendors will implement their fast path assuming
no extensions or options. This is precisely why IP options have been
rendered useless, and does not bode well for meta data extensions or
TLVs. Besides, what is important enough to be directly in the header
versus what should be in an extension seems arbitrary to me.

Any options we deploy associated with encapsulation will be important
and may very well appear in *all* packets sent, so they need to be
super efficient for processing. Neither do we want any baked-in
restrictions on what fields we might want to route or firewall on; for
instance, some day we might add a new QoS classification field for
special tenants or groups. The encapsulation protocol should support
this, this should not kill HW optimizations, and I would expect that
we can program switches to perform QoS routing based on the new field
without needing a HW change.

Tom



A.
_______________________________________________
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3


