Re: [nvo3] Review of draft-dt-nvo3-encap-01

Sami Boutros <sboutros@vmware.com> Fri, 28 April 2017 17:38 UTC

From: Sami Boutros <sboutros@vmware.com>
To: Tom Herbert <tom@herbertland.com>
CC: "nvo3@ietf.org" <nvo3@ietf.org>
Date: Fri, 28 Apr 2017 17:34:51 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/r7p-aGB501T1h51pp57inx52_AQ>

Hi Tom,

Please find some comments inline, marked "DT:".

Thanks,
Sami

FWIW, here is some feedback on that draft.

My comments on the sections I looked at are marked with '-'.

2. Design Team Goals

   As communicated by WG Chairs, the design team should take one of the
   proposed encapsulations and enhance it to address the technical
   concerns. Backwards compatibility with the chosen encapsulation and

- "Backwards compatibility" is at best a weak goal. As written:
"Internet-Drafts have no formal status, and are subject to change or
removal at any time; therefore they should not be cited or quoted in
any formal document." Maintaining compatibility with an Internet-Draft
cannot be a requirement of a standard protocol.

DT: Those points were also communicated by the WG chairs. We agree
that the backwards-compatibility goal can be removed.

   the simple evolution of deployed networks as well as applicability to
   all locations in the NVO3 architecture are goals. The DT should
   specifically avoid a design that is burdensome on hardware
   implementations, but should allow future extensibility. The chosen

- I still don't understand all this focus on hardware. An NVO3
protocol is neither a hardware nor a software protocol. Jumping through
hoops to make hardware implementation better at the expense of
software is not a reasonable trade-off.

DT: This sentiment has been expressed on the list and at multiple
IETF meetings. We are reflecting it as a requirement based on those
discussions, and also to encourage HW implementation. Having said that,
the DT has addressed the usage models while considering the requirements
and implementations in general, including both software and hardware.

   design should also operate well with ICMP and in ECMP environments.
   If further extensibility is required, then it should be done in such
   a manner that it does not require the consent of an entity outside of
   the IETF.

5.2 GUE

   - There were a significant number of objections related to the
   complexity of implementation in hardware, similar to those noted for
   Geneve above.

- The objections about hardware implementation complexity raised for
GUE are not remotely similar to those raised for Geneve; the mechanism
of extensibility is completely different. This objection was
addressed.

DT: This feedback *also* came from the WG chairs. Out of curiosity, can
you summarize the objections, and how they were addressed?

   - In addition, there were concerns raised that GUE does not support a
   sufficient number of extensions due to its reliance on a limited
   flags field, which is already almost 45% allocated.

- As I mentioned in a rebuttal to this objection, the flags field can
be extended with more flags. This has already been implemented, so the
objection is not valid.

DT: We are documenting the concerns based on what is already out there.
Pointing to a mailing list rebuttal, or saying "this has already been
implemented", doesn't sound very helpful. The GUE draft mentions that
"the set of available flags may be extended in the future by defining
a 'flag extensions bit' that refers to a field containing a new set of
flags." Is that what you are referring to as having addressed the
objection?
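The following is a minimal sketch of the two-step lookup that a "flag extensions bit" implies, for illustration only. The 16-bit primary flags word, the position of the extension bit, and the 32-bit size of the extension field are all hypothetical. They are not taken from the GUE draft.

```python
import struct
from typing import Tuple

# Hypothetical layout (NOT GUE's actual assignments): a 16-bit primary
# flags word whose least significant bit says a 32-bit flag-extension
# word follows immediately after it.
FLAG_EXTENSION_BIT = 0x0001


def read_flags(buf: bytes) -> Tuple[int, int, int]:
    """Return (primary_flags, extended_flags, bytes_consumed)."""
    (primary,) = struct.unpack_from("!H", buf, 0)
    if primary & FLAG_EXTENSION_BIT:
        # Second step: fetch the extended flags from their own field
        # before any data for an extended flag can be located.
        (extended,) = struct.unpack_from("!I", buf, 2)
        return primary, extended, 6
    return primary, 0, 2
```

Under this assumed layout, an extension whose flag lives in the extended word costs one more dependent read than an extension in the primary word. Whether that matters is an implementation question the sketch does not settle.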





6.5 Extension Ordering

   In order to support hardware nodes at the tunnel endpoint or in
   transit that can process one or a few extension TLVs in TCAM, a
   control plane in such a deployment can signal a capability to ensure
   a specific TLV will always appear in a specific order, for example as
   the first one in the packet.

- I do not believe this is at all plausible. 1) This could only help
the endpoints and not intermediate devices; as pointed out two
sentences below, transit nodes may need to process extensions. 2) This
creates a dependency between data plane and control plane that is at
odds with the requirement for control plane independence (section 2.1
of the Geneve draft). 3) This would entail a serious design and
implementation effort that would likely only be ready long after the
data plane has been deployed. 4) This creates new problems for
interoperability; for instance, two devices could support the same set
of options but be unable to interoperate because they need different
orderings.

- By the way, along these lines, I noticed this in the Geneve draft:

   "Transit devices MUST maintain consistent forwarding behavior
   irrespective of the value of 'Opt Len', including ECMP link
   selection.  These devices SHOULD be able to forward packets
   containing options without resorting to a slow path."

- Making this requirement a SHOULD opens the door for devices to
slow-path all packets with options, and hence ossify the protocol in
exactly the same way IP options were ossified. It would not surprise me
at all if Geneve is already ossified so that options will never be
deployed. This really needs to be a MUST, but even so that probably
won't prevent vendors from throwing packets with options onto the slow
path.

DT: You can't have it both ways. If you "don't understand all this
focus on hardware", then why is the slow path a big issue? That is a
function of the capability of the hardware; it can become better in a
few years and move the processing to the fast path.
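For context on why 'Opt Len' matters to transit devices: it lets a device skip the whole option block in one step without parsing any TLV. The sketch below assumes the Geneve base-header layout from the Geneve draft: an 8-byte fixed header, with a 2-bit version and a 6-bit Opt Len in the first byte, and Opt Len counted in 4-byte words. The function name is hypothetical.

```python
GENEVE_BASE_LEN = 8  # fixed part of the Geneve header, in bytes


def geneve_payload_offset(hdr: bytes) -> int:
    """Return the offset of the inner payload, skipping all options unparsed."""
    if len(hdr) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    version = hdr[0] >> 6
    if version != 0:
        raise ValueError("unsupported Geneve version")
    opt_len_words = hdr[0] & 0x3F  # Opt Len: option block size in 4-byte words
    offset = GENEVE_BASE_LEN + 4 * opt_len_words
    if len(hdr) < offset:
        raise ValueError("option block runs past end of buffer")
    return offset
```

A forwarding path built this way behaves identically whatever the value of Opt Len. The slow-path question only arises once a device has to inspect option contents.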



   The order of the TLVs should be HW friendly for both the sender and
   the receiver and possibly the transit node too.

- This is an exceedingly weak statement. If ordering is important, then
just define a global ordering and dispense with all this hand-waving
about a control plane solution and friendliness to everyone. Given
that the type space of Geneve TLVs is 24 bits, sparse assignment of
type values allows new options to be placed in an appropriate order
relative to existing options.

DT: Control-plane-negotiated options and ordering is the most flexible
way to do this. We should consider writing a working group draft to
provide examples/guidance on this. The range of usage models and
deployment scenarios drives the specific options and ordering that are
relevant for each deployment. This includes endpoints and middleboxes
using the options. So having the control plane negotiate the
constraints is the most appropriate and flexible way to address these
requirements. Hardcoding the options and/or ordering would limit the
applications.
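To make the two positions in this exchange concrete, here is a sketch of a receiver-side ordering check. It assumes that an option is identified by its (class, type) pair and that the receiver applies a strict profile, rejecting any option it has not been told to expect. The same function covers both proposals:

- With a global ordering, `order` is simply the sorted key space shared by everyone.
- With control-plane negotiation, each peer pair would hold its own `order`.

All names and values are hypothetical.

```python
from typing import Sequence, Tuple

OptKey = Tuple[int, int]  # (option class, type)


def follows_order(received: Sequence[OptKey], order: Sequence[OptKey]) -> bool:
    """True if `received` uses only keys from `order`, each at most once,
    in the same relative order as `order`."""
    rank = {key: i for i, key in enumerate(order)}
    last = -1
    for key in received:
        if key not in rank or rank[key] <= last:
            return False
        last = rank[key]
    return True


# Global ordering: sparse type values let a later-assigned 0x18 sit
# between the existing 0x10 and 0x20.
GLOBAL_ORDER = sorted([(0x0100, 0x10), (0x0100, 0x20), (0x0100, 0x18)])
print(follows_order([(0x0100, 0x10), (0x0100, 0x20)], GLOBAL_ORDER))  # True
```

With per-peer `order` lists, two NVEs that support the same option set can still reject each other's packets, which is the interoperability concern Tom raises.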





   A transit node may need to process some extensions like telemetry
   and/or OAM inband extensions.

- See my comment above on why this breaks TLV order negotiation.

6.6 TLV vs Bit Fields

- Up front, I will reiterate a point I have made several times now:
_NO_ Geneve TLVs have been proposed. No TLVs have been implemented and,
AFAIK, the required processing loop has not been implemented. There
are proposed bit-fields in GUE, at least one has been implemented and
deployed, and the core processing for bit-fields has been implemented
in software. All the discussion in these drafts pertaining to TLVs and
their benefits is completely academic!

DT: This is not true; there are Geneve options implemented by VMware in
production today. There is also new HW supporting Geneve TLV parsing.
In addition, the In-band Network Telemetry (INT) specification being
developed by P4.org illustrates the option of INT metadata carried
over Geneve. OVN/OVS have also defined some option TLVs for Geneve.
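For reference, the sequential processing loop under discussion can be sketched as follows. The sketch assumes the Geneve option header layout: a 16-bit Option Class, an 8-bit Type, 3 reserved bits, and a 5-bit Length counting the 4-byte words of data after the 4-byte header. It illustrates the structural point only and is not any shipping implementation: each option's position is known only after every preceding option's length has been read.

```python
import struct
from typing import Iterator, NamedTuple


class GeneveOption(NamedTuple):
    opt_class: int  # 16-bit Option Class
    opt_type: int   # 8-bit Type (high bit is the critical bit)
    data: bytes     # option data, a multiple of 4 bytes long


def iter_options(opts: bytes) -> Iterator[GeneveOption]:
    """Walk the option block in order; each TLV yields the next offset."""
    off = 0
    while off < len(opts):
        if len(opts) - off < 4:
            raise ValueError("truncated option header")
        opt_class, opt_type, flags_len = struct.unpack_from("!HBB", opts, off)
        data_len = 4 * (flags_len & 0x1F)  # low 5 bits: length in 4-byte words
        start = off + 4
        end = start + data_len
        if end > len(opts):
            raise ValueError("option data runs past option block")
        yield GeneveOption(opt_class, opt_type, opts[start:end])
        off = end  # the next option starts only where this one ends
```

Finding one particular option this way means walking every option in front of it. That is the random-access cost Tom lists among the issues below.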



   If there is a well-known initial set of options that are likely to be
   implemented in software and in hardware, it can be efficient to use

- There is such a set. They are described in draft-herbert-gue-extensions-01.

   the bit-field approach as in GUE. However, as described in section
   6.3, if options are added over time and different subsets of options
   are likely to be implemented in different pieces of hardware, then it
   would be hard for the IETF to specify which options should get the
   early bit fields.

   TLVs are a lot more flexible, which avoids the need

- Yes, they are more flexible. In fact, as currently defined we can
define up to 16M TLVs, each of which can be variable length. But _why_
do we need this? What are the requirements here? Most of the rest of
this section is trying to deal with the problems this "flexibility"
creates in the first place (limiting size, alignment, a new control
plane function to enforce order, etc.).
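The "16M" figure follows from the Geneve option header fields. The arithmetic below assumes a 16-bit Option Class, an 8-bit Type, and a 5-bit Length in 4-byte units, with the whole option block bounded by the 6-bit Opt Len in the base header:

```latex
\begin{aligned}
\text{option identifiers} &= 2^{16}\ (\text{class}) \times 2^{8}\ (\text{type}) = 2^{24} = 16{,}777{,}216 \\
\text{max data per option} &= (2^{5} - 1) \times 4 = 124\ \text{bytes} \\
\text{max option block} &= (2^{6} - 1) \times 4 = 252\ \text{bytes}
\end{aligned}
```

So the identifier space is very large, while the per-packet budget is small. However many types are defined, a single packet's options can never exceed 252 bytes.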





   to determine the relative importance of different options. However,
   general TLV of arbitrary order, size, and repetition of the same
   order is difficult to implement in hardware. A middle ground is to
   use TLV with restrictions on the size and alignment, observing that
   individual TLVs can have a fixed length, and support in the control
   plane such that an NVE will only receive options that it needs and
   implements. The control plane approach can potentially be used to
   control the order of the TLVs sent to a particular NVE. Note that
   transit devices are not likely to participate in the control plane
   hence to the extent that they need to participate in option
   processing they need more effort,

- Which is a major problem with the whole control plane idea.

DT: We are not saying we want to limit the flexibility because of the
transit node; a transit node will process a small subset of the options
that will be consumed by tunnel endpoints. The WG should consider
developing a separate draft on guidance for option processing and
control plane participation.
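Here is a sketch of the "middle ground" in the quoted draft text. Suppose a control plane fixes which options an NVE receives, their order, and their lengths. Then every option offset is known before a packet arrives, and the receiver only has to confirm that each header is where it should be. The profile format and function names are hypothetical. The header layout assumes the Geneve option format: a 4-byte header followed by up to 124 bytes of data.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical negotiated profile: options in a fixed order, each with a
# fixed data length in bytes (a multiple of 4, at most 124).
Profile = List[Tuple[int, int, int]]  # (class, type, data_len)


def static_offsets(profile: Profile) -> Dict[Tuple[int, int], int]:
    """Offset of each option's data, computable before any packet arrives."""
    offsets: Dict[Tuple[int, int], int] = {}
    off = 0
    for opt_class, opt_type, data_len in profile:
        if data_len % 4 or data_len > 124:
            raise ValueError("length must be a multiple of 4 and at most 124")
        offsets[(opt_class, opt_type)] = off + 4  # data follows 4-byte header
        off += 4 + data_len
    return offsets


def matches_profile(opts: bytes, profile: Profile) -> bool:
    """Check that each option header sits exactly where the profile says."""
    off = 0
    for opt_class, opt_type, data_len in profile:
        if off + 4 > len(opts):
            return False
        c, t, fl = struct.unpack_from("!HBB", opts, off)
        if (c, t, 4 * (fl & 0x1F)) != (opt_class, opt_type, data_len):
            return False
        off += 4 + data_len
    return off == len(opts)
```

The check only helps devices that know the profile. As the draft text notes, a transit device outside the control plane gets no such precomputed offsets.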



   But transit devices would have issues with future GUE bits being
   defined for future options as well.

- That is not true. New options are added to the end of the flags, so
this would not affect the transit device's processing of options it
knows about.

   A benefit of TLVs from a HW perspective is that they are
   self-describing, i.e., all the information is in the TLV. In a
   bit-fields approach the hardware needs to look up the bit to
   determine the length of the data associated with the bit through
   some separate table, which would add hardware complexity.

- Yes, looking up the length of a bit field does require some
complexity, but this is a simple table lookup with a small number as
the index. This pales in comparison to the lookup over a 24-bit type
value in the Geneve TLV. And there is an additional cost to verify
that the length in the TLV is appropriate for the type.

DT: The 24 bits consist of a 16-bit class and an 8-bit type, so it is
not a 24-bit type. The ordering of TLVs can benefit from the
class/type split.
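The bit-field side of this comparison can be sketched as a small length table indexed by bit position. The offset of any present field is the sum of the lengths of the present fields before it. That sum is computed from the flags word alone, so no option data has to be walked. The table values, the 16-bit flags width, and the bit numbering are hypothetical, not GUE's assignments.

```python
# Hypothetical lengths (bytes) of the field attached to each flag bit;
# bit 0 is the most significant bit of a 16-bit flags word.
FIELD_LEN = {0: 8, 1: 4, 2: 4, 3: 16}
FLAG_BITS = 16


def field_offset(flags: int, bit: int) -> int:
    """Byte offset of `bit`'s field in the extension area, or -1 if absent."""
    if not flags & (1 << (FLAG_BITS - 1 - bit)):
        return -1
    offset = 0
    for earlier in range(bit):
        if flags & (1 << (FLAG_BITS - 1 - earlier)):
            offset += FIELD_LEN[earlier]  # small-table lookup per set bit
    return offset
```

This depends on each bit's field having a fixed length. A field that had to vary in size would reintroduce a dependency on the data itself, which is the variable-length question raised later in the thread.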



   There are use cases where multiple modules of software are running on
   an NVE. These can be modules such as a diagnostic module by one vendor
   that does packet sampling and another module from a different vendor
   that does a firewall. Using a TLV format, it is easier to have
   different software modules process different TLVs, which could be
   standard extensions or vendor-specific extensions defined by the
   different vendors, without conflicting with each other. This can help
   with hardware modularity as well.

- This is weak; real implementation experience would be nice.

DT: There are some implementations with options that allow different
software modules, such as MAC learning and security, to handle
different options.
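One way independent modules can share the option space without coordinating bit positions is a dispatch table keyed by option class. The class values, module names, and return strings below are hypothetical placeholders. A real NVE would hand the handlers parsed option data rather than return text.

```python
from typing import Callable, Dict

Handler = Callable[[int, bytes], str]

# Hypothetical option classes owned by two independent modules.
SAMPLING_CLASS = 0xFF01
FIREWALL_CLASS = 0xFF02

handlers: Dict[int, Handler] = {}


def register(opt_class: int) -> Callable[[Handler], Handler]:
    """Let a module claim an option class; a second claim is an error."""
    def wrap(fn: Handler) -> Handler:
        if opt_class in handlers:
            raise ValueError(f"option class {opt_class:#06x} already claimed")
        handlers[opt_class] = fn
        return fn
    return wrap


@register(SAMPLING_CLASS)
def sampling(opt_type: int, data: bytes) -> str:
    return f"sampling type {opt_type} ({len(data)} bytes)"


@register(FIREWALL_CLASS)
def firewall(opt_type: int, data: bytes) -> str:
    return f"firewall type {opt_type} ({len(data)} bytes)"


def dispatch(opt_class: int, opt_type: int, data: bytes) -> str:
    """Route one option to the module that owns its class."""
    fn = handlers.get(opt_class)
    return fn(opt_type, data) if fn else "unhandled"
```

Each module only needs its own class value to be unique. Within a class, it can allocate types freely without touching anyone else's code.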



- Here are things that this section failed to address:

- The combinatorics of TLVs and sequential processing requirements are
hard to make efficient in both software and hardware implementations.
Bit-fields do not have this problem.

DT: This is not true, according to the HW vendors who presented to the
design team.

- Open-ended TLVs, especially with the possibility of receiving ones
that can be ignored, are a DoS vector.

DT: Controlling the ordering, which TLVs should be present, and which
can be processed by which node (transit or tunnel endpoint) can
address this.
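Below is a sketch of the kind of limits the DT reply alludes to. It caps the work per packet and drops packets carrying unknown critical options. It relies on the Geneve rule that the high bit of the option Type marks an option as critical. The budget value and the function name are hypothetical.

```python
from typing import Iterable, Set, Tuple

MAX_OPTIONS = 8  # hypothetical per-packet budget
CRITICAL = 0x80  # high bit of the Geneve option Type field


def admit(options: Iterable[Tuple[int, int]], known: Set[Tuple[int, int]]) -> bool:
    """Decide whether a packet's (class, type) option list may be processed."""
    for count, (opt_class, opt_type) in enumerate(options, start=1):
        if count > MAX_OPTIONS:
            return False  # budget exceeded: stop before doing more work
        if (opt_class, opt_type) not in known and opt_type & CRITICAL:
            return False  # unknown critical option: packet must be dropped
    return True  # unknown non-critical options are simply skipped
```

A fixed budget bounds the cost of any one packet. It does not answer Tom's broader point, because ignorable options still consume part of that budget.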



- A survey of actual implementations of the protocols. Remember it's
"rough consensus and running code"; Geneve is short on the running
code.

DT: Geneve is already running in shipping products.

- The rationale for a 24-bit type and the cost of processing 24-bit
type fields. Deriving an expected rate of adding new extensions is not
difficult based on experience with other extensible data plane
protocols. This will probably be at most one or two a year.

- Random access of options; for instance, consider a device trying to
find a specific option in a long list.

- A comparison of Geneve TLVs to IP options, IPv6 options, or some
other protocol. Specifically, I would like to know why we should
believe Geneve would not suffer the same fate of protocol ossification
that those did.

DT: The argument that because IPv4 failed to process options in the
fast path, Geneve will fail too, doesn't sound right. IPv6 extension
headers (EHs) for Segment Routing (SR) are being defined, and those
will be processed by hardware platforms. Also, there is hardware out
there that can actually process options up to a limited size.

1. We studied whether VNI should be in base header or in extensions
   and whether it should be 24-bit or 32-bit. The design team agreed
   that VNI is critical information for network virtualization and MUST
   be present in all packets. Design team also agreed that 24-bit VNI
   matches the existing widely used encapsulation format i.e. VxLAN and
   NVGRE and hence more suitable to use going forward.

- As I've stated before, there is simply no technical rationale behind
a 24-bit VNI. There is no reason to believe this is sufficient to
scale for large deployments over the lifetime of the protocol. Also,
as stated above, requiring compatibility of a standard protocol with a
draft is inappropriate. Just because this was 24 bits in VXLAN and
NVGRE, and there may have been some deployment, does not validate the
protocol element. The VNI should simply be extended to occupy 32 bits.
(Btw, if you don't do this, then it is likely that the eight spare bits
will either be commandeered to extend the VNI or used for some other
purpose; in either case these eight spare bits will be abused and
create non-interoperability.)

- The counterargument to this is that 32 bits is not enough; for
instance, we might want to merge two large cloud providers and not
force them to renumber. That's why the VNI should itself _be_ an
extension, so that the VNI is extensible. This has the huge advantage
that it would make at least one extension required for operation of
the protocol, such that intermediate devices cannot ossify it. I
suppose the counterargument is that it's somehow too important a value
and needs to be accessed quickly, so we can't entrust it to the
extension mechanism; but if we're not willing to commit the VNI to be
an extension, why would we be willing to put anything in an extension?

DT: We are not against having the VNI in an extension (option) if the
VNI value needs more bits beyond the 24 bits in the header.
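For the VNI-size discussion, here is a sketch of where the 24-bit VNI and the 8 reserved bits sit. It assumes the Geneve base-header layout: VNI in bytes 4 to 6, reserved bits in byte 7. It also assumes the Geneve convention that reserved bits are zero on transmission and ignored on receipt. Under that convention, a receiver that never looks at byte 7 would not notice it being repurposed. The function name is hypothetical.

```python
from typing import Tuple

VNI_BITS = 24


def vni_and_reserved(hdr: bytes) -> Tuple[int, int]:
    """Return (VNI, reserved byte) from an 8-byte Geneve base header."""
    if len(hdr) < 8:
        raise ValueError("truncated Geneve header")
    vni = int.from_bytes(hdr[4:7], "big")  # bytes 4..6: 24-bit VNI
    reserved = hdr[7]                      # byte 7: reserved, sent as zero
    return vni, reserved


print(2 ** VNI_BITS)  # 16777216 virtual networks with a 24-bit VNI
print(2 ** 32)        # 4294967296 if the VNI used all 32 bits
```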



4. We compared the TLV vs Bit-fields style extension and it was
   deemed that parsing both TLV and bit-fields is expensive and while
   bit-fields may be simpler to parse, it is also more restrictive and
   requires guessing which extensions will be widely implemented so they
   can get early bit assignments for efficiency, as well Bit-fields are

- I don't understand this "early bit assignments" problem. Bit-fields
allow easy random access to fields; there is no need for sequential
processing. Please clarify this.

DT: Given that half the bits are already assigned in GUE, you may need
to set the bit for a widely deployed extension in a flag extension,
and this will require extra processing: first dig the flag out of the
flag extension, and then look for the extension itself.

- Also, I would advise the design team to be careful with use of the
word "efficiency" as applied to protocols other than the one being
advocated. If you're going to claim someone else's protocol is less
efficient, then you need to be prepared with the data to back this up.

   not flexible enough to address the requirement of variable length and

- What precisely is the requirement for variable length?

DT: Requirements came from OAM, telemetry, and even security
extensions; all require variable-length options.

   different subtypes of the same option. While TLV are more flexible, a
   control plane can restrict the number of option TLVs as well as the
   order and size of the TLVs to make it simpler for a dataplane
   implementation to handle.

- If this control plane idea doesn't go away, I, for one, would really
like to see the draft that describes _precisely_ how this will work.

DT: Agreed. And this is why we need a control plane draft that
discusses this.