Re: [nvo3] RTG Dir QA review of draft-ietf-nvo3-gue

Tom Herbert <tom@herbertland.com> Mon, 27 June 2016 23:06 UTC

Return-Path: <tom@herbertland.com>
X-Original-To: nvo3@ietfa.amsl.com
Delivered-To: nvo3@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 00D8412DA6B for <nvo3@ietfa.amsl.com>; Mon, 27 Jun 2016 16:06:43 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.6
X-Spam-Level:
X-Spam-Status: No, score=-2.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_LOW=-0.7] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=herbertland-com.20150623.gappssmtp.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Aynq0kayO4p8 for <nvo3@ietfa.amsl.com>; Mon, 27 Jun 2016 16:06:40 -0700 (PDT)
Received: from mail-it0-x230.google.com (mail-it0-x230.google.com [IPv6:2607:f8b0:4001:c0b::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 2878112D8C6 for <nvo3@ietf.org>; Mon, 27 Jun 2016 16:06:40 -0700 (PDT)
Received: by mail-it0-x230.google.com with SMTP id f6so2032814ith.0 for <nvo3@ietf.org>; Mon, 27 Jun 2016 16:06:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=yh7syAYb8+tprbU9mAn4XAlbMuT0VH6ecpjg0EIVkOk=; b=T4U+IvTD28KeLeIjFYxStw0FFMuULgKud7tj2RwZDMCHlI5GQRwFW1iMlLMHu8jwFB J/+si9A9lGM/HYrjLSCTJ3ZMFKJ7gyPznQF4JO/7nyFmrdCj9jU8Jey5J0tMu1A/dwyU +Ixvb5FDEjsH5QDWp9pc4R/HV5keOxwkl4ARBrPYP082TDBIym+QY0LUX26h/IWxekRn Y+vcr4cwdTVOdonbwx86V8yxOdGcO60SaEnz8CFYoCRqudOHXNtRsHhoeUFLdQNUI6yN L6kzAkXe65p7kcSTui0sBDS0ibJK9EPJNJG/GOfYtZiP6YGI1cpNb874csnwQOjfpox7 95bw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=yh7syAYb8+tprbU9mAn4XAlbMuT0VH6ecpjg0EIVkOk=; b=HbXz+mggFhBlrLJTZDMEm2zcH//Z/IGhcX1gaVWDZWxDnz4c8s5k/E+PisPly619MY rd3zMWziAsVrQlMYN5PHsUvfoRYftfR1VTA+uqvyz7zz5+lt8mgqLSTQ3kWWdqlDvXL7 0l3CJq+ghVvHYPu5DaZShYYmLwWER2obsUGR5d3LYm/T95enllYRrrdae6cMTKEW9GfZ gr0R/LqR3fZTop4iiz6h3iP20YjJOYp1xPaT0Yft0RxEuHCHaa7RMGswQUprGtONEgoc cUm2k7oUiy3DaCcucGtjm7nCsF6xXptFtp+zHgEQ9LpVsYtY8vbd53d2JxcaIOTZtEel oBnQ==
X-Gm-Message-State: ALyK8tIdUGtaUE53WGb9GiWrn0aTv7mQoht8Bz6KTk/Ly6+1Oz4V+1ICRC2Pu/6JEFiOrA5nN4qX385foFgXGw==
X-Received: by 10.36.46.21 with SMTP id i21mr299556ita.91.1467068798799; Mon, 27 Jun 2016 16:06:38 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.31.134 with HTTP; Mon, 27 Jun 2016 16:06:37 -0700 (PDT)
In-Reply-To: <0e6201d1cb03$5f2f2280$1d8d6780$@olddog.co.uk>
References: <0e6201d1cb03$5f2f2280$1d8d6780$@olddog.co.uk>
From: Tom Herbert <tom@herbertland.com>
Date: Mon, 27 Jun 2016 16:06:37 -0700
Message-ID: <CALx6S379htYqZaWXqEXKFL92jrrz9e-WSS7F3RzmYC28NitRjQ@mail.gmail.com>
To: adrian@olddog.co.uk
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/x1hzHJUFidkI3mkJvJ88IvwU3N0>
Cc: rtg-dir@ietf.org, "nvo3@ietf.org" <nvo3@ietf.org>, draft-ietf-nvo3-gue.all@ietf.org, "int-area@ietf.org" <int-area@ietf.org>
Subject: Re: [nvo3] RTG Dir QA review of draft-ietf-nvo3-gue
X-BeenThere: nvo3@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "Network Virtualization Overlays \(NVO3\) Working Group" <nvo3.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nvo3>, <mailto:nvo3-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nvo3/>
List-Post: <mailto:nvo3@ietf.org>
List-Help: <mailto:nvo3-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nvo3>, <mailto:nvo3-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Jun 2016 23:06:43 -0000

Hi Adrian,

Thanks for the detailed and insightful comments! Some replies in line

On Mon, Jun 20, 2016 at 7:52 AM, Adrian Farrel <adrian@olddog.co.uk> wrote:
> Hi,
>
> I've been randomly selected from the Routing Directorate to perform a
> QA review of this document. The philosophy behind QA reviews can be
> found at https://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDirDocQa
>
> In short, the purpose of the review is to uncover issues or fixing or
> wider discussion earlier in the process than sometimes happens with
> RTG Dir reviews around or even after WG last call. It is not my
> intention to be overly critical or harsh n my review, but I have tried
> to raise everything I could think of - my intention is to allow you to
> be able to say "Yes, that was looked at, discussed, and agreed."
>
> Please do follow up with questions or discussion, but don't feel that
> you have to convince me of things - you need to convince the WG.
>
> Cheers,
> Adrian
>
> ---
>
> Being forced to read this document, I'm afraid I was required to enter a
> third-party IPR disclosure because of the IPR already disclosed against
> draft-ietf-mpls-in-udp that became RFC 7510. This should show up on the
> NVO3 mailing list.
>
We saw that. Note that this applies to several other UDP encapsulation
protocols.


> ---
>
> It seems to me that there is some careful coordination needed with other
> work on encapsulation of transport or network protocols in UDP. This
> idea clearly has value in NVO3, but I should have thought it sat better
> in the TSVWG. I hope the NVO3 chairs have discussed this with the TSVWG
> chairs to ensure that there is no friction. This is particularly
> important because it will be important to recognise that only one of
> this draft and draft-manner-tsvwg-gut is likely to make it to RFC.
>
Much of the content around TSV issues (e.g. zero UDP6 checksum and
congestion control) is based on that in MPLS/UDP and GRE/UDP.

> You seem to have correctly addressed the three issues that have most
> worried the TSVWG (checksum, congestion and security), so that is all
> good, but I would recommend getting the TSVWG involved for a full and
> detailed review now and for each future revision of the document. In
> fact, I would have tended towards making this a TSVWG document, but so
> long as the chairs, the ADs, and the WGs are happy, that should be fine.
>
> ---
>
> Overall, this work is a good idea and needed. When we did MPLS-in-UDP
> there was a background proposal to generalise and only burn one port
> number for al UDP encapsulations. This achieves that end.
>
> However, I think this proposal may be too general and too extensible.
> Future-proof is good, but there seem to be a lot of bells and whistles
> defined here that have no specific use proposed, and no indication that
> a future use might ever be defined. I think it is one thing that it
> should be possible to extend a protocol, and another that it defines
> multiple fields and extension mechanisms that might never be used.
>
Yes, two of the authors on this draft are also authors for GRE/UDP.
GRE/UDP has received excellent review in TSVWG and we would like to
have similar review of GUE.

> I comment on some of these mechanisms in my notes below.
>
> ---
>
> In section 3.1, please add a forward pointer to section 6 instead of
> "below"
>
>       o Source port (inner flow identifier): This should be set to a
>         value that represents the encapsulated flow. The properties of
>         the inner flow identifier are described below.
>
Okay.

> Probably add a forward pointer each time "inner flow identifier" is
> mentioned.
>
Okay.

> ---
>
> In 3.1 when describing the contents of the Proto/ctype field it would
> be helpful to b crystal clear of which set of IP protocol numbers you
> are using. Maybe a reference to
> http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
>
Okay. Offhand, all protocol numbers except for IPv6 extension headers
are normally allowed. Protocol #59 is allowed even with IPv4.

> ---
>
> Maybe add forward pointers to 3.2, 3.3, and 3.4 from 3.1.
>
> Similarly 3.5 from 3.2.
>
Okay.

> ---
>
> I wonder whether you have possibly overdone the future-proofing since
> you have defined no uses for flags nor any possible fields or extension
> fields. The mechanism you have defined could actually be added later if
> needed (although it is possible to believe this to not happen for a
> number of years) by assigning a flag (the E flag as now), defining a new
> field to contain extension flags, and proceeding as described.
>
Extensions have been defined in several other drafts. We are
consolidating a core set into one draft (fragmentation, security,
payload transform, checksum, and remote checksum offload). An
additional option for network virtualization is defined in
draft-hy-gue-4-secure-transport.

Agree that it is unnecessary to define E-bit now, with above
extensions we've currently proposed allocating seven bits so we still
have eight free (assuming E-bit). I think we should just mention this
as the mechanism when we need to extend the number of flag bits.

> There is nothing wrong with what you have written, but it does seem to
> complicate the base protocol with a very long-term extensibility
> horizon.
>
> ---
>
> 3.3 has
>
>    New flags should be
>    allocated from high to low order bit contiguously.
>
> I am pretty sure you want s/should be/are to be/
> Also, this text needs to be in the IANA section as well otherwise IANA
> will not know that they are constrained.
>
Okay.

> ---
>
> 3.3
>
>    Flags may be paired together to allow different lengths for an
>    optional field. For example, if two flag bits are paired, a field may
>    possibly be three different lengths. Regardless of how flag bits may
>    be paired, the lengths and offsets of optional fields corresponding
>    to a set of flags must be well defined.
>
> This works, of course, but aren't you again being too flexible and too
> clever? For a field that might have two lengths, you are not saving any
> bits. Why not simply allow a field with two possible lengths to simply
> be defined as two different fields?
>
Then we have to process it as two different fields. We currently have
defined security flags to be two bits allowing three possible sizes
(64, 128, or 256bits). Also this allows for creating multibit fields
in the flag bits themselves, similar to how Recur is defined in GRE.

> ---
>
> 3.3
>
>    Flags (or paired flags) are idempotent such that new flags should not
>
> Is that s/should not/do not/ ?

I think "must not" is appropriate.

>
> ---
>
> 3.4
>
>    An encapsulator and decapsulator MUST agree on the meaning
>    of private data before using it.
>
> How? Using an OID? Using a control message?
>
It's implementation specific. For many deployments, like ours at
Facebook, configuration is sufficient for such things. GUE does not
require control messages for normal protocol operation, they could
defined (some C-bit=0 messages) but that is out of scope for this
document. The key point is that agreement and meaning of private data
is matter between the two communicating endpoints. GUE does not
preclude the development of a commonly used definitions for private
data, say for instance a set of TLVs like Geneve might define, but we
shouldn't need to declare that in the base protocol.

> ---
>
> I am not enthusiastic about allowing "private data" in the packet
> header. I can see its use for specific functions that you have called
> out (security and identifiers), but even then I am not too comfortable.
> Actually, wouldn't security and identifiers by standard fields rather
> than private data?
>
Yes, security and identifiers are being defined as well known
extensions. The reason for having a private data section is to allow
implementations or sites to extend the protocol for their own
purposes. They may or may not intend standardize. What we don't want
to happen is that people randomly reserved bits for their own purposes
so that in the future we we need to define a standard use for them we
can't because there is some significant deployment already using them.
This is especially a concern for an nvo3 protocol which is more of DC
protocol then protocol used on the Internet so we anticipate more
"customization".

> Recall that these UDP packets will be exchanged by many implementations
> and the ideal is that every speaker should be able to understand every
> packet. Also recall that things that might be used as covert channels
> are best avoided.
>
> ---
>
> Don't you need an IANA registry to track control messages?
>
Yes.

> ---
>
> Version 0x01 of GUE is very "clever". I wonder whether it is really
> necessary.
>
We think it's clever too :-). We didn't initially design GUE with the intent
of doing this. This request came up on int-area list. Motivation is
that IPv4 and IPv6 are probably most common payloads, this save 4
bytes (as a form of GUE hdr compression), and this obviates the need
for define IP-in-UDP as a separate protocol with separate port
numbers.

> In any case, you should discuss it in the Introduction with an
> explanation of what it is, and some motivation.
>
Okay.

> ---
>
> 5.4 has
>
>    The decapsulator validates packets, including fields of the GUE
>    header. If a packet is acceptable, the UDP and GUE headers are
>    removed and the packet is resubmitted for IP protocol processing or
>    control message processing if it is a control message.
>
> ...but, of course, the contents of the GUE packet that is not a control
> packet may be any protocol as indicated by the version number and proto
> fields. Passing anything other than IP for IP protocol processing might
> be considered a mistake :-)
>
Okay.

> ---
>
> In 5.4
>
>    ...otherwise malformed
>    header, it must drop the packet and may log the event.
>
> That is better as MUST and MAY according to other usage in the draft.
> You might do well to check all uses of must/should/may to check whether
> they could/should be in upper case.
>
Okay.

> When saying "may log the event" it is traditional (and probably good) to
> also give advice about thresholds and risks of log-swamping when under
> attack.
>
Do you know if there is an RFC to reference?

> Although "otherwise malformed" might cover it, I think you should call
> out "unknown or unsupported payload protocol".
>
Okay.

> ---
>
> In 5.5
>
>    It
>    may encapsulate a GUE packet in another GUE packet, for instance to
>    implement a network tunnel.
>
> Doesn't that require a protocol number to be assigned for GUE?
>
Technically, no, This could look like IP|UDP|GUE|UDP|GUE. Will clarify.

> ---
>
> In 5.6
>
>    A middle box may interpret some flags and optional fields of the GUE
>    header for classification purposes, but is not required to understand
>    all flags and fields in GUE packets.
>
> I think you mean s/all/any of the/
>
Yes.

> ---
>
> 5.6.1 has
>
>    The source port set in the UDP
>    header must be the destination port the peer would set for replies.
>
> But 3.1 has
>
>       o Source port (inner flow identifier): This should be set to a
>         value that represents the encapsulated flow. The properties of
>         the inner flow identifier are described below.
>
>       o Destination port: The GUE assigned port number, 6080.
>
> You can't achieve both and it would seem that the only way GUE can be
> "symmetrical" is to use the same source port in both directions.
>
3.1 text is a "should" for this reason. Will clarify that symmetric
port number for NAT is an exception to source port assignment.

> ---
>
> The text in 5.6.2
>    This method
>    is problematic since ports numbers do not have global meaning
>    ([RFC7605]) and a packet which is not GUE but destined to the same
>    port number could be misinterpreted.
> ...sent me scurrying to 7605. I think the point is not that the port
>    number does not have global meaning, but that "It is important to
>    recognize that any interpretation of port numbers -- except at the
>    endpoints -- may be incorrect, because port numbers are meaningful
>    only at the endpoints," and "Ultimately, port numbers indicate
>    services only to the endpoints, and any intermediate device that
>    assigns meaning to a value can be incorrect."
>
> Maybe similar enough, but I think that the intent of 7605 is to say
> that a service may be run over many different port numbers so you can't
> guarantee to find all instances of the service by looking for the port
> number. I don't think the intent of 7605 is to say that something in the
> network seen using port 6080 might not be GUE.
>
Actually, I think that is exactly what it means.

> However, if you proceed with this you'll need to:
> - resurrect draft-herbert-udp-magic-numbers
> - make it  normative reference
> - explain where the GUE magic number comes from
>
Yes, there has been more discussion related to the UDP service
identification problem on the SPUD lists. We borrowed the idea of the
magic-number from draft-hildebrand-spud-prototype, but I have become
less enthralled with it over time. I'm inclined to remove the concept
of it from GUE draft.

> I think you would do well to reduce 5.6.2 to just an observation on
> middlebox behavior, and remove all reference to draft-herbert-udp-magic-
> numbers.
>
> ---
>
> I suspect that discussing NAT as you do in 5.7 will not make you very
> popular. It is true that NAT exists, and it is worth observing what
> would happen if a GUE packet went through a NAT. But I am not sure that
> this is a problem to be solved in this document.
>
We can take the section. We now have Transport-over-UDP (
draft-herbert-transports-over-udp) which would cover that in depth.

> Indeed, since you don't actually solve it but only make suggestions
> about how it might be solved, I suggest reducing the text and saying
> that another document could be written in the future to describe NAT-
> traversal for GUE packets.
>
> BTW, where you say...
>    connection semantics must be applied to a
>    GUE tunnel as described above
> ... I think you are probably referring to section 5.6.2. You should be
> explicit if that is the case, but consider my comments about 5.6.2.
>
> ---
>
> 5.8.2 has
>
>     The GUE header checksum (in version 0x0) provides a UDP-lite
>     [RFC3828] type of checksum capability as an optional field of the
>     GUE header.
>
> This is confusing! At first read we might assume you mean the checksum
> field in the GUE header as shown in 3.1, but I think you are actually
> calling that the UDP checksum. Reading between the lines, you are
> describing an optional field called the GUE Checksum that could be
> included in the GUE header (if the corresponding flag is set). You need
> to:
>
> - fix the broken reference ([GUECSUM] or [REMCSUM]?)
> - make the reference normative
> - consider simply moving the description to this document.
>
This is described in the core extensions draft so it can be taken out.

> ---
>
> 5.9
> Pay attention to the current discussion on the softwire and nvo3 lists.
>
Okay. We should also reference the GUE fragmentation option.

> ---
>
> 6.2
>       o An encapsulator may occasionally change the inner flow
>         identifier used for an inner flow per its discretion (for
>         security, route selection, etc). Changing the value should
>         happen no more than once every thirty seconds.
>
> I assume the limitation is because statistical load balancing will not
> work if there is too much variance in hashable fields. However, 30
> seconds may be a very large number of packets. Is there any science
> behind that value?
>
No :-) Just that thrashing does not seem to be a great idea. The Linux
implementation already does this, after N retransmissions of a TCP
connection we choose new flow identifiers (src port for UDP, IPv6 flow
label, etc.) in hopes of finding a better path.

> ---
>
> 6.2
>       o Decapsulators, or any networking devices, should not attempt any
>         interpretation of the inner flow identifier, nor should they
>         attempt to reproduce any hash calculation. They may use the
>         value to match further receive packets for steering decisions,
>         but cannot assume that the hash uniquely or permanently
>         identifies a flow.
>
> I agree with "should not attempt". But then you give an example of
> applying (limited) interpretation.
>
It's the semantics of "interpretation". Will try to clarify.

> If the source port can change (even only once every 30 seconds) then
> what does it mean to "match further packets"? After all, between 29.9
> and 30.1 seconds is only a short window, but during such time, any
> such matching for steering would be invalid.
>
No. If a devices choose to try to match groups of packets in some
stateful way it can only do this as soft state. It cannot assume any
persistence the tuple it matches. The 30 second rule is only a should,
a node can change the src port for packet if it desires (I suspect
there may be extreme case where security concerns might warrant that).

> ---
>
> Hooray for section 7!
> Could you please point to it from the Introduction because it will
> significantly help the reader.
>
Okay.

> Probably you should add 7510 to the long list of references.
>
Okay.

> ---
>
> Section 7 says
>       o GUE permits encapsulation of arbitrary IP protocols, which
>         includes layer 2 3, and 4 protocols. This potentially allows
>         nearly all traffic within a data center to be normalized to be
>         either TCP or UDP on the wire.
> How so "normalized to TCP"? That doesn't seem to be mentioned anywhere
> in this document, and so either needs a reference or to be dropped.
>
It's a reference the fact that network HW tend to optimize for UDP and
TCP, so the consequence is that networks would only use UDP or TCP.

> ---
>
> I'm not convinced that punting security to a separate document is the
> best idea, and it will make progressing this document hard unless you
> can get that other document adopted and well advanced.
>
The separate item is an option for stronger security. It is not
required for deployment. We can elaborate on security issues for the
base GUE draft (like use IPsec if you're paranoid, etc).

> ---
>
>
> Could you please get IANA to update the reference for port 6080 to
> point to this document. And could you please update section 9 so that
> when this document is published as an RFC IANA will update the registry
> to point to the RFC.
>
Okay.

> ---
>
> The IANA section will need some more work
>
>    IANA is requested to create a "GUE flag-fields" registry to allocate
>    flags and optional fields for the primary GUE header flags and
>    extension flags. This shall be a registry of bit assignments for
>    flags, length of optional fields for corresponding flags, and
>    descriptive strings. There are sixteen bits for primary GUE header
>    flags (bit number 0-15) where bit 15 is reserved as the extension
>    flag in this document. There are thirty-two bits for extension flags.
>
> I think you need to separate out the new registries rather than try to
> put all of the stuff into one registry which wold then have an
> ambiguous name.
>
> You should also;
> - set out what you want IANA to track in tabular form so that they can
>   reproduce it in the registry without any confusion
> - pre-populate the registries with any values you have defined (such as
>   the E flag)
> - describe the allocation policies for each registry
>
Okay.

> ---
>
> I think a good number of your references are normative. These include...
> [GUESEC]
> [UDPMAG]
> [REMCSUM] or [GUECSUM] if it exists
>
> ---
>
> It is always helpful to state at the top of an Appendix "This appendix
> is informational and does not constitute a normative part of this
> document."

Okay.

>
> Yeah! I'm always grumpy when I'm asked to do that, but it does actually
> help get it past the IESG, and it might even help the reader.
>

Awesome review and very helpful!

Thanks,
Tom