[nvo3] Review of draft-dt-nvo3-encap-01

Tom Herbert <tom@herbertland.com> Sat, 15 April 2017 19:19 UTC

FWIW here is some feedback on that draft.

Comments on sections I looked at indicated by '-'.

2. Design Team Goals

   As communicated by WG Chairs, the design team should take one of the
   proposed encapsulations and enhance it to address the technical
   concerns. Backwards compatibility with the chosen encapsulation and

- "Backwards compatibility" is at best a weak goal. As written:
"Internet-Drafts have no formal status, and are subject to change or
removal at any time; therefore they should not be cited or quoted in
any formal document." Maintaining compatibility with an Internet-Draft
cannot be a requirement of a standard protocol.

   the simple evolution of deployed networks as well as applicability to
   all locations in the NVO3 architecture are goals. The DT should
   specifically avoid a design that is burdensome on hardware
   implementations, but should allow future extensibility. The chosen

- I still don't understand all this focus on hardware. An nvo3
protocol is neither a hardware nor a software protocol. Jumping through
hoops to make hardware implementation better at the expense of
software is not a reasonable trade-off.

   design should also operate well with ICMP and in ECMP environments.
   If further extensibility is required, then it should be done in such
   a manner that it does not require the consent of an entity outside of
   the IETF.

5.2 GUE

   - There were a significant number of objections related to the
   complexity of implementation in hardware, similar to those noted for
   Geneve above.

- The objections for complexity of hardware implementation raised for
GUE are not remotely similar to those raised for Geneve; the mechanism
of extensibility is completely different. This objection was
addressed.

   - In addition, there were concerns raised that GUE does not support a
   sufficient number of extensions due to its reliance on a limited
   flags field, which is already almost 45% allocated.

- As I mentioned in a rebuttal to this objection, the flag-fields can
be extended with more flags. This has already been implemented; the
objection is not valid.

6.5 Extension Ordering

   In order to support hardware nodes at the tunnel endpoint or at the
   transit that can process one or few extensions TLVs in TCAM. A
   control plane in such a deployment can signal a capability to ensure
   a specific TLV will always appear in a specific order for example the
   first one in the packet.

- I do not believe this is at all plausible. 1) This could only help
the endpoints and not intermediate devices. As pointed out two sentences
below, transit nodes may need to process extensions. 2) This creates
a dependency between data plane and control plane that is at odds with
the requirement for control plane independence (section 2.1 of the Geneve
draft). 3) This would entail a serious design and implementation effort
that would likely only be ready long after the dataplane has been
deployed. 4) This creates new problems for interoperability; for
instance, two devices could support the same set of options but can't
interoperate because they need different orderings.

- Btw, along these lines, I noticed this in the Geneve draft:

"Transit devices MUST maintain consistent forwarding behavior
   irrespective of the value of 'Opt Len', including ECMP link
   selection.  These devices SHOULD be able to forward packets
   containing options without resorting to a slow path."

- Making this requirement a SHOULD opens the door for devices to slow
path all packets with options and hence ossify the protocol in exactly
the same way that IP options were. It would not surprise me at all if
Geneve is already ossified so that options will never be deployed.
This really needs to be a MUST, but even so that probably won't
prevent vendors from throwing packets with options in the slow path.

   The order of the TLVs should be HW friendly for both the sender and
   the receiver and possibly the transit node too.

- This is an exceedingly weak statement. If ordering is important then
just define a global ordering and dispense with all this hand waving
about a control plane solution and friendliness to everyone. Given
that the type space of Geneve TLVs is twenty-four bits, sparse assignment
of type values allows new options to be placed in an appropriate order
relative to existing options.
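
As a rough illustration of the sparse-assignment idea, here is a minimal
sketch. The type values and option names are invented for the example and
are not Geneve assignments; the only point is that a gap left between
assigned values lets a later option slot into a fixed global order.

```python
# Hypothetical 24-bit type values, spaced far apart on purpose.
# None of these are real Geneve assignments.
TYPE_SECURITY = 0x100000
TYPE_TELEMETRY = 0x200000
TYPE_OAM = 0x300000

# A later option that must be processed between security and telemetry
# takes a value from the gap; nothing already assigned has to move.
TYPE_NEW = 0x180000


def in_global_order(types):
    """Return True if the option types are strictly ascending."""
    return all(a < b for a, b in zip(types, types[1:]))
```

A receiver that requires the global order would only need the check in
in_global_order() and no per-peer negotiation.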

   A transit node may need to process some extensions like telemetry
   and/or OAM inband extensions.

- See the comment above on why this breaks TLV order negotiation.

6.6 TLV vs Bit Fields

- Up front I will reiterate my previous point that I have made several
times now: _NO_ Geneve TLVs have been proposed. No TLVs have been
implemented and AFAIK the required processing loop has not been
implemented. There are proposed bit-fields in GUE, at least one has
been implemented and deployed, and the core processing for bit-fields
has been implemented in software. All the discussion in these drafts
pertaining to TLVs and their benefits is completely academic!

   If there is a well-known initial set of options that are likely to be
   implemented in software and in hardware, it can be efficient to use

- There is such a set. They are described in draft-herbert-gue-extensions-01.

   the bit-field approach as in GUE. However, as described in section
   6.3, if options are added over time and different subsets of options
   are likely to be implemented in different pieces of hardware, then it


   would be hard for the IETF to specify which options should get the
   early bit fields.

   TLVs are a lot more flexible, which avoids the need

- Yes, they are more flexible. In fact, as currently defined we can
define up to 16M TLVs each of which can be variable length. But _why_
do we need this? What are the requirements here? Most of the rest of
this section is trying to deal with the problems this "flexibility"
creates in the first place (limiting size, alignment, a new control
plane function to enforce order, etc.)

   to determine the relative importance different options. However,
   general TLV of arbitrary order, size, and repetition of the same
   order is difficult to implement in hardware. A middle ground is to
   use TLV with restrictions on the size and alignment, observing that
   individual TLVs can have a fixed length, and support in the control
   plane such that an NVE will only receive options that to needs and
   implements. The control plane approach can potentially be used to
   control the order of the TLVs sent to a particular NVE. Note that
   transit devices are not likely to participate in the control plane
   hence to the extent that they need to participate in option
   processing they need more effort,

- Which is a major problem with the whole control plane idea.

   But transit devices would have
   issues with future GUE bits being defined for future options as well.

- That is not true. New options are added to the end of the flags, so
this would not affect the transit device's processing of the options it
knows about.
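
A minimal sketch of why that holds, assuming the fields follow the header
in flag order so that a field's offset depends only on the flags that
precede it. The masks and lengths below are invented for the example and
are not the GUE assignments.

```python
# Hypothetical flag table in field order: (flag mask, field length in bytes).
FLAG_FIELDS = [(0x8000, 4), (0x4000, 8), (0x2000, 4)]


def field_offset(flags, wanted_mask, table=FLAG_FIELDS):
    """Offset of the field for wanted_mask, or None if its flag is
    clear or the flag is not in the table."""
    offset = 0
    for mask, length in table:
        if mask == wanted_mask:
            return offset if flags & mask else None
        if flags & mask:
            offset += length
    return None


# The device only knows the three flags in FLAG_FIELDS.
old = field_offset(0x8000 | 0x2000, 0x2000)
# A sender additionally sets a later flag (0x1000) the device has never heard of.
new = field_offset(0x8000 | 0x2000 | 0x1000, 0x2000)
```

Both calls give the same offset, since the unknown flag's field sits after
every field the device looks at.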

   A benefit of TLVs  from a HW perspective is that they are self
   describing i.e., all the information is in the TLV. In a Bit fields
   approach the hardware needs to look up the bit to determine the
   length of the data associated with the bit through some separate
   table, which would add hardware complexity.

- Yes, looking up the length of a bit field does require some
complexity, but this is a simple table lookup with a small number as
the index. This pales in comparison to the lookup over a 24 bit type value
in the Geneve TLV.  And there is additional cost to verify that the
length in the TLV is appropriate for the type.
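
To make the comparison concrete, a small sketch under stated assumptions:
a 16-bit flags field for the bit-field case, and invented lengths and type
values throughout. It is not taken from either draft.

```python
# Bit-field style: with an assumed 16 flags, the length table is a
# 16-entry array indexed by bit position. Lengths are invented.
FLAG_FIELD_LEN = [4, 8, 4, 2] + [0] * 12


def flag_field_len(bit_position):
    """Length in bytes of the field tied to a flag bit."""
    return FLAG_FIELD_LEN[bit_position]


# TLV style: a 24-bit type has 2**24 possible values, so a direct array
# is impractical and some sparse lookup (a dict here) is needed. The
# receiver also has to check the length carried in the TLV against
# what the type allows. Type values and lengths are invented.
TLV_EXPECTED_LEN = {0x000101: 8, 0x000102: 4}


def tlv_length_ok(tlv_type, carried_len):
    """True only for a known type whose carried length matches."""
    expected = TLV_EXPECTED_LEN.get(tlv_type)
    return expected is not None and carried_len == expected
```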

   There are use cases where multiple modules of software are running on
   NVE. This can be modules such as a diagnostic module by one vendor
   that does packet sampling and another module from a different vendor
   that does a firewall. Using a TLV format, it is easier to have
   different software modules process different TLVs, which could be
   standard extensions or vendor specific extensions defined by the
   different vendors, without conflicting with each other. This can help
   with hardware modularity as well.

- This is weak; real implementation experience would be nice.

- Here are things that this section failed to address:

- The combinatorics of TLVs and sequential processing requirements are
hard to make efficient in both software and hardware implementations.
Bit-fields do not have this problem.
- Open ended TLVs, especially with the possibility of receiving ones
that can be ignored, are a DoS vector.
- A survey of actual implementation of the protocols. Remember it's
"rough consensus and running code"-- Geneve is short on the running
code.
- The rationale for a 24 bit type and cost of processing 24 bit type
fields. Deriving an expected rate of adding new extensions is not
difficult based on experience with other extensible dataplane
protocols. This will probably at most be one or two a year.
- Random access of options; for instance, consider a device trying
to find a specific option in a long list.
- A comparison of Geneve TLVs to IP options, IPv6 options, or some
other protocol. Specifically, I would like to know why we should
believe Geneve would not suffer the same fate of protocol ossification
that those did.
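
The sequential-scan and DoS points can be sketched as follows. The TLV
layout here (3-byte type, 1-byte payload length in bytes) is a simplified
assumption for the example, not the Geneve option format, and the cap of 8
is arbitrary.

```python
def make_tlv(tlv_type, payload):
    """Build one TLV in the assumed layout: 3-byte type, 1-byte length."""
    return tlv_type.to_bytes(3, "big") + bytes([len(payload)]) + payload


def find_tlv(buf, wanted_type, max_tlvs=8):
    """Walk the TLVs in order and return the payload of wanted_type.

    Returns None if the type is absent. Raises ValueError on a truncated
    TLV, or once max_tlvs TLVs have been skipped without a match; the cap
    bounds the work a sender can force with a long list of ignorable TLVs.
    Fewer than 4 trailing bytes are ignored in this sketch.
    """
    offset = 0
    skipped = 0
    while offset + 4 <= len(buf):
        if skipped == max_tlvs:
            raise ValueError("too many TLVs")
        tlv_type = int.from_bytes(buf[offset:offset + 3], "big")
        end = offset + 4 + buf[offset + 3]
        if end > len(buf):
            raise ValueError("truncated TLV")
        if tlv_type == wanted_type:
            return buf[offset + 4:end]
        offset = end
        skipped += 1
    return None
```

There is no way to jump straight to the wanted option: every TLV ahead of
it has to be parsed first, which is the random-access cost noted above.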

1. We studied whether VNI should be in base header or in extensions
   and whether it should be 24-bit or 32-bit. The design team agreed
   that VNI is critical information for network virtualization and MUST
   be present in all packets. Design team also agreed that 24-bit VNI
   matches the existing widely used encapsulation format i.e. VxLAN and
   NVGRE and hence more suitable to use going forward.

- As I've stated before, there is simply no technical rationale behind
a 24 bit VNI. There is no reason to believe this is sufficient to
scale for large deployments over the lifetime of the protocol. Also,
as stated above, requiring compatibility of a standard protocol with a
draft is inappropriate. Just because this was 24 bits in VXLAN and
NVGRE and there may have been some deployment does not validate the
protocol element. The VNI should simply be extended to occupy 32 bits.
(btw if you don't do this then it is likely that the eight spare bits
will either be commandeered to extend the VNI or used for some other
purpose; in either case these unreserved eight bits will be abused and
create non-interoperability).
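
A small sketch of how that non-interoperability could arise, assuming a
32-bit word with the 24-bit VNI in the upper bits and eight spare bits
below it. The function names are made up for the example.

```python
def pack_vni_word(vni24, spare8=0):
    """Pack a 24-bit VNI and 8 spare bits into one 32-bit word
    (assumed layout: VNI in the upper 24 bits)."""
    if not 0 <= vni24 < (1 << 24):
        raise ValueError("VNI does not fit in 24 bits")
    if not 0 <= spare8 < (1 << 8):
        raise ValueError("spare field does not fit in 8 bits")
    return (vni24 << 8) | spare8


def read_vni24(word):
    """Receiver that follows the 24-bit definition and drops the spare bits."""
    return word >> 8


def read_vni32(word):
    """Receiver that has commandeered the spare bits as part of the VNI."""
    return word


# A sender that uses the spare bits for something else.
word = pack_vni_word(5, spare8=1)
```

The two receivers read different virtual networks from the same word:
read_vni24(word) gives 5, while read_vni32(word) gives 1281.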

- The counter argument to this is that 32 bits is not enough; for
instance, we might want to merge two large cloud providers and not force
them to renumber. That's why the VNI should itself _be_ an extension
so that the VNI is extensible. This has a huge advantage that it would
make at least one extension required for operation of the protocol
such that intermediate devices cannot ossify it. I suppose the counter
argument is that it's somehow too important a value and needs to
be accessed quickly so that we can't entrust it to the extension
mechanism, but then if we're not willing to commit the VNI to be an
extension why would we be willing to put anything in an extension?

4. We compared the TLV vs Bit-fields style extension and it was
   deemed that parsing both TLV and bit-fields is expensive and while
   bit-fields may be simpler to parse, it is also more restrictive and
   requires guessing which extensions will be widely implemented so they
   can get early bit assignments for efficiency, as well Bit-fields are

- I don't understand this "early bit assignments" problem. Bit-fields
allow easy random access to fields; there is no need for sequential
processing. Please clarify this.

- Also, I would advise the design team to be careful with use of the
word "efficiency" as applied to protocols other than the one being
advocated. If you're going to claim someone else's protocol is
inefficient then you need to be prepared with the data to back this up.

   not flexible enough to address the requirement of variable length and

- What precisely is the requirement for variable length?

   different subtypes of the same option. While TLV are more flexible, a
   control plane can restrict the number of option TLVs as well the
   order and size of the TLVs to make it simpler for a dataplane
   implementation to handle.

- If this control plane idea doesn't go away I, for one, would really
like to see the draft that describes _precisely_ how this will work.