[nvo3] Review ptA: Technical draft-ietf-nvo3-gue-04

Bob Briscoe <ietf@bobbriscoe.net> Sat, 13 August 2016 13:25 UTC
To: Tom Herbert <tom@herbertland.com>, Lucy Yong <lucy.yong@huawei.com>, Osama Zia <osamaz@microsoft.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <67e76ab1-2f5b-4906-4cce-f7c176fd49a0@bobbriscoe.net>
Date: Sat, 13 Aug 2016 14:25:31 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/U6ETZ2Ohf0jLOB3o5wRdDKtAfqU>
Cc: "nvo3@ietf.org" <nvo3@ietf.org>

Tom, Lucy, Osama,

This draft looks like it could become important, so I wanted to review 
it comprehensively, particularly given my experience contributing to the 
design of Generic UDP Tunneling (GUT; draft-manner-tsvwg-gut-02, expired 
Jan 2011), which is very similar, as the GUE draft acknowledges.

In preparation for this review:
* I re-read all the (very useful) tsvwg mailing list comments about GUT
* I had to read a couple of dozen references (to catch up on the last 
few years in this area)
* it required some pretty deep thought, which led me to rewrite parts of 
it several times.

I'm afraid my review is about as long as the GUE draft itself. I should 
probably turn all this into an Internet-Draft, but it's email for now. So 
I've split it into 3 parts in separate emails:
A) Technical Review of GUE 'As-is'        <--- This email
B) Editorial Review
C) Redesign of parts of GUE

* Pls read the review of "GUE as-is" first (ptA), it hopefully gives 
solid arguments for why some parts of GUE's design are problematic.
* I'm afraid that is rather an understatement - I think I have 
undermined nearly every part of the wire protocol: the version field, 
the C flag, the Hlen field, and the flag-based options. And I believe 
the semantics of the one remaining part (the proto/ctype field) misses 
an opportunity to be a lot more powerful.
* Nonetheless, these are only my opinions at this stage. Therefore I 
have disciplined myself to refer out to PtC for all redesign ideas, so 
PtA remains solely about GUE "as-is".
* I hope you will accept the review in the spirit intended - 
constructive criticism to improve the final result, although I appreciate 
you were probably hoping GUE was nearly done.
* I'm not proprietorial about any of the ideas I give in the redesign - 
they are offered for the WG to use as it chooses. I don't really want to 
be working on encapsulation stuff myself, I just end up having to 
because a) encap is fundamental to real networking and it's never quite 
done right, which makes everything else hard; and b) encap is often 
the best way to get ahead of middlebox evolution.

I should add that:
* I don't generally follow nvo3 (or intarea) lists. So apologies if some 
of my points are duplicates.
* Nonetheless, this means my review is a good test of whether the draft 
is comprehensible to an outsider.
* After I wrote this, I read Adrian Farrel's RTG Dir QA review. I don't 
think I have directly duplicated any of his comments. We were uneasy 
about some of the same things, but I have tried to complement criticism 
with alternative design proposals (ptC).
* I noticed Adrian encouraged you to get review from the transport area. 
I'm on the transport area review team, but I haven't been asked to do an 
"official" transport area review of GUE. Either way, the problems I have 
uncovered are wider: they are best categorised as transport, protocol design 
(encapsulation and extensibility), ops and security.

*EXEC SUMMARY (of ptA Technical)*
I've split the tech review up into the following parts, and I've 
highlighted here where there are particularly serious problems:

1/ Addressing Architecture
   For IETF standardization, connection semantics will need to be the 
rule, not the exception. I know the exception applies where GUE came 
from: private DCs. However, nvo3 and the IETF more generally have to cope 
with multi-tenant and multi-admin networks, and therefore with firewalls 
(and other middlebox crud).

I also identify some cases where GUE cannot work that will need to be 
documented (not show-stoppers).

2/ Wire Protocol
   I'm afraid I have unearthed a number of apparently nitty, but 
actually serious show-stoppers (IMO). E.g. GUEv1 precludes future 
versions of IP and GUE extensibility only works while there are no 
extensions (!).

Also, the semantics of the ctype/proto field precludes some ideas we had 
in GUT, but without really giving a reason. Perhaps you just hadn't 
realised some potential uses of GUE that we had in mind.

This could be stated as: "Please don't unnecessarily constrain your 
protocol design solely to the use-case(s) you have in mind." This is as 
much a problem with the IETF process, which by default tries to 
constrain a new protocol to the scope of one WG, even when it could be 
more powerful. I've heard suggestions that GUE ought to move from nvo3 
to intarea or tsvwg, which may help, but I don't know which would be 
better. We should also bear in mind that a more powerful protocol can 
become a more powerful attack weapon in the wrong hands, so strong 
security review is also important.

3/ State
   Important, but absent from the draft.

4/ Operation
   Numerous, but mostly minor problems. The more serious ones are:
   * no way for tunnels in tunnels to know which options to copy to the 
outer, and which not.
   * The claim that "GUE permits encap of arbitrary IP protocols" is 
only true until it encounters a protocol it doesn't know (!).

   An improved checksum solution is also presented (in PtC), which can 
ensure checksum coverage of all non-mutable parts of a GUE packet and 
traverses middleboxes even if they do not support zero checksums, while 
at the same time minimising extra processing by generally avoiding 
duplicate coverage.

5/ Security
   I am worried about the new security options in GUE. Because they are 
introduced within a completely new extension framework they will 
introduce a whole set of new security vulnerabilities, flaws and bugs. 
The security community is stretched enough as it is having to cover what 
we already have. So it is important to justify why existing security 
building blocks are insufficient for GUE (IMO, the relevant motivation 
sections in the GUE extensions draft are insufficient).

   I also highlight some new points about firewall interactions.

6/ Implementation
   Just my little rant about LRO

Finally there's one endemic editorial problem that has led to a large 
number of technical flaws and oversights. Over and over, the differences 
between the two main modes of usage go unstated and unresolved. There 
are only two short sections that discuss the two modes separately:
* Section 5.1 Network tunnel encap (adds GUE+UDP+IP outside an existing 
IP header)
* Section 5.2 Transport layer encap (adds GUE+UDP between an existing 
transport and an existing IP header).

The majority of the draft is written in the mindset of network tunnel 
encap, but without saying so. If the reader is keeping both modes in 
mind, this makes the draft very hard to understand. But also, some 
fundamental problems (with one mode in some cases and the other mode in 
other cases) have been overlooked by not considering each mode 
separately at each stage of the discussion.

*TABLE OF CONTENTS*
Yes, a ToC for an email!

A/ TECHNICAL PROBLEMS/COMMENTS

1/ Addressing Architecture
1.1/ Inferring Connection Semantics: the rule not the exception
1.2/ A Firewall or NAT in front of both ends
1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT 
with one external IP
1.4/ Network decap and transport decap problematic on the same (IP) 
interface

2/ Wire Protocol
2.1/ HLEN too small
2.2/ GUE versions
2.3/ No need to interpret the protocol field relative to IPv4
2.4/ No need to restrict interpretation of the protocol field
2.5/ Missed opportunity to liberalise interpretation of the protocol field
2.6/ Positioning GUE with respect to existing IPv6 extension headers
2.7/ Reliable delivery of control messages
2.8/ Extensibility of the flags and optional fields scheme: doesn't work
2.9/ Hard-coded option lengths do not scale
2.10/ Random access to options needs motivating

3/ State
3.1/ Per-connection state vs. stateless connections but per-tunnel state
3.2/ Transport encap with Connection Semantics: Flow state management
3.3/ Keepalives for middlebox flow state

4/ Operation
4.1/ Transport encap: to GUE or not to GUE?
4.2/ Hop limit / TTL processing
4.3/ Error messages
4.4/ Tunnels in tunnels
4.5/ SHOULD adjust MTU?
4.6/ Is orig-proto field necessary in the fragmentation option?
4.7/ Congestion Control: reductio ad absurdum
4.8/ Multicast outer -> Implosion on inner destination
4.9/ Deriving flow entropy from the inner is contrary to "GUE permits 
encap of arbitrary IP protocols" claim
4.10/ Flow entropy from encrypted data could weaken the crypto?
4.11/ No need to constrain flow entropy distribution
4.12/ No need to constrain flow entropy interpretation

5/ Security
5.1/ Addresses that are both visible and hidden? Have your GUE and eat 
it too?
5.2/ How can the Security option protect a UDP/GUE header from being 
moved or removed?
5.3/ What happens when a port scan sends a datagram to port 6080?
5.4/ Firewalls will still block new/atypical protocols
5.5/ Transport Encap: Two Passes through a Local Firewall?

6/ Implementation
6.1/ Practical Large Receive Offload Requirements


*A/ TECHNICAL PROBLEMS/COMMENTS*

_*1/ ADDRESSING ARCHITECTURE*_

*1.1/ Inferring Connection Semantics: the rule not the exception*
The draft assumes that, as a general rule, the UDP dst. port of a GUE 
packet will be fixed (6080) and that flow entropy will come from the 
source port (see the two quoted sections below).

S. 5.11.1. Flow classification

    " ... When a packet is encapsulated with
     GUE, the source port in the outer UDP packet is set to a flow
     entropy value ...

S.5.11.2 Flow entropy properties

         The flow entropy is the value set in the UDP source port of a
         GUE packet. Flow entropy in the UDP source port should adhere to
         the following properties:


Nonetheless, the draft recognises there will be cases where "connection 
semantics" have to be applied in order to traverse middleboxes such as 
firewalls and NATs (but only mentioned in the relevant parts of 5.6.1 & 
5.6.2 quoted below).

Such middleboxes generally only allow "ingress" UDP datagrams if they 
look like responses to recent "egress" datagram(s). So there has to be a 
concept of an "initiator" end of the GUE tunnel. Only once the initiator 
end has sent an "egress" datagram with src:dst ports e:G (from ephemeral 
port e to the GUE port G), then the GUE encap at the remote "responder" 
end would be able to traverse the middlebox using "ingress" datagrams 
with src:dst ports reversed (G:e).
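To make the initiator/responder asymmetry concrete, here is a toy model of such a middlebox. It is only a sketch under simplifying assumptions: a pinhole is keyed on the reversed address/port 4-tuple of an egress datagram, pinholes never time out, and the names (StatefulFirewall, hosts "I" and "R", ephemeral port 49152) are made up for illustration.

```python
from dataclasses import dataclass, field

GUE_PORT = 6080  # well-known GUE port

@dataclass
class StatefulFirewall:
    """Toy middlebox: an ingress UDP datagram is admitted only if it looks
    like a reply to an earlier egress datagram (pinhole timeouts ignored)."""
    pinholes: set = field(default_factory=set)

    def egress(self, src, sport, dst, dport):
        # Remember what a reply to this datagram would look like.
        self.pinholes.add((dst, dport, src, sport))

    def ingress(self, src, sport, dst, dport):
        return (src, sport, dst, dport) in self.pinholes


fw = StatefulFirewall()
e = 49152  # initiator's ephemeral port (arbitrary example value)

print(fw.ingress("R", GUE_PORT, "I", e))      # False: responder cannot go first
fw.egress("I", e, "R", GUE_PORT)              # initiator sends e:G
print(fw.ingress("R", GUE_PORT, "I", e))      # True: the reply G:e passes
print(fw.ingress("R", 50000, "I", GUE_PORT))  # False: entropy in the source port (x:G) is blocked
```

The last line is the crux: a responder that puts flow entropy in its source port, as S.5.11 assumes, does not match any pinhole.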

S.5.6.1. Inferring connection semantics:

    A middlebox may infer bidirectional connection semantics
    [...] To operate in
    this environment, a GUE tunnel must assume connected semantics  [...]
    The source port set in the UDP
    header must be the destination port the peer would set for replies.
    In this case the UDP source port for a tunnel would be a fixed value
    and not set to be flow entropy as described in section 5.11
    <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>.

    The selection of whether to make the UDP source port fixed or set to
    a flow entropy value for each packet sent should be configurable for
    a tunnel.

S. 5.6.2. NAT

    In
    the case of stateful NAT, connection semantics must be applied to a
    GUE tunnel as described in section 5.6.1
    <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.6.1>.

[BTW, I suggest changing the final sentence of the first para in 
S.5.6.1. (quoted above) to:

    Therefore, in the ingress direction, the destination UDP port would
    provide flow entropy, while the source port would take the fixed
    value of 6080 (the converse of the case in section 5.11
    <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>).

]

The text quoted from both sections 5.6.1 & 5.6.2 above implies
a) that the operator of tunnel endpoint(s) can somehow know whether 
there are any middleboxes within the tunnel.
b) that applying connection semantics is feasible.

Connection semantics feasibility:
*  transport encap: relatively easy - it was simple to implement 
connection semantics in GUT (see code 
<http://www.netlab.tkk.fi/%7Ejmanner/gut.html> or example in Figure 4 in 
draft-manner-tsvwg-gut-02 
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>, or 
see description later under A3.2/ "Transport encap with Connection 
Semantics: Flow state management"). Nonetheless, without connection 
semantics, GUE/GUT is even simpler, because it can be stateless.
* network encap: harder (see separate email for my proposed design: C1/ 
"Stateless Connection Semantics", but until there's a working 
implementation we have to allow for the possibility that it's not feasible).

Regarding the first question - whether middleboxes (such as firewalls) 
exist on a path:
* most operators of tunnel endpoints don't know for sure, but they do 
know that firewalls, etc. are very likely, so they would have to turn on 
the "middleboxes exist" parameter.
* in one or two important (but private) data centres, the admin might 
know that there are no firewalls (and certainly no NATs), so she can 
turn off the "middleboxes exist" parameter. However, that is the 
exception not the rule.

In summary, connection semantics are essential wherever there might be 
middleboxes. This implies:
* transport encap: connection semantics are relatively simple, so why 
not solely standardize this case? The few cases where the operator knows 
for certain that there are no middleboxes don't need to use connection 
semantics, but they are in private networks, so they shouldn't be the 
primary use-case for standardization.
* network encap: Will connection semantics work? Two possibilities:
   a) if no, the GUE network encap will be pretty useless, given nearly 
all real networks contain firewalls, etc. There will be no point 
standardizing the network encap just for a few special private networks 
that have no middleboxes.
   b) if yes, they will be needed in most real networks, so it should be 
the default case that is standardized. Then the IETF has to ask, is 
there any point standardizing a GUE network encap without connection 
semantics, just for a few controlled environments where the operator 
knows for sure that there are no middleboxes?

Corollary of all this: A packet is a "GUE packet" if either src or dst 
port = 6080.
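A receiver following this corollary might classify datagrams as sketched below. Treating the non-GUE port as the flow entropy in both directions is an assumption that follows the rewording of S.5.6.1 suggested above; classify_udp is just an illustrative name.

```python
from typing import Optional, Tuple

GUE_PORT = 6080

def classify_udp(src_port: int, dst_port: int) -> Tuple[bool, Optional[int]]:
    """Return (is_gue, entropy_port) for a UDP datagram.

    With connection semantics the fixed GUE port may sit in either field,
    so both must be checked; the other port is taken to carry flow entropy.
    """
    if dst_port == GUE_PORT:
        return True, src_port   # initiator -> responder (e:G)
    if src_port == GUE_PORT:
        return True, dst_port   # responder -> initiator (G:e)
    return False, None

print(classify_udp(49152, 6080))  # (True, 49152)
print(classify_udp(6080, 49152))  # (True, 49152)
print(classify_udp(49152, 53))    # (False, None)
```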

*1.2/ A Firewall or NAT in front of both ends*
Most firewalls / NATs only allow an incoming UDP datagram in response to 
a recent outgoing datagram. If there are two such middleboxes each 
"protecting" a different endpoint of a GUE tunnel (network or transport 
encap), then neither end can send an initial GUE datagram.

To operate in such an environment, GUE endpoints will need to support 
STUN [RFC5389].

*1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT with one external IP*
Two cases:

* For transport encap: every GUE server has to have its own public IP 
address.
Reason: if a NAT-PT with one external IP address (A) sits in front of 
multiple GUE servers, only one can be reached on the well-known GUE port 
(6080), because there is only one address:port combination to 
address packets to (A:6080). (Dan Wing pointed out this same problem 
with GUT on the tsvwg ML 
<https://www.ietf.org/mail-archive/web/tsvwg/current/msg09851.html>). 
It's not a killer, but it is a limitation to applicability that has to 
be understood and documented.

* With network encap: Non-issue.
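The transport-encap limitation is easy to see if inbound reachability is modelled as a forwarding table keyed on the external port: with one external address there is only one key for 6080. A toy sketch (SingleAddressNat and the internal addresses are illustrative; the model ignores everything except static inbound forwarding):

```python
class SingleAddressNat:
    """Toy NAT-PT with one external IP address and static inbound forwards."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.forwards = {}  # external UDP port -> (internal IP, internal port)

    def add_forward(self, ext_port, internal):
        current = self.forwards.get(ext_port)
        if current is not None and current != internal:
            raise ValueError(f"{self.external_ip}:{ext_port} already forwards to {current}")
        self.forwards[ext_port] = internal


nat = SingleAddressNat("A")
nat.add_forward(6080, ("192.168.1.10", 6080))      # first GUE server: fine
try:
    nat.add_forward(6080, ("192.168.1.11", 6080))  # second GUE server: no room
except ValueError as err:
    print(err)  # A:6080 already forwards to ('192.168.1.10', 6080)
```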


*1.4/ Network decap and transport decap problematic on the same (IP) interface*
A consequence of using the same well-known port for GUE transport and 
network encap is that both decaps cannot be deployed at the same IP 
address.

Thought experiment: This might work by implementing a combined 
transport/network decap that checked whether there was another IP header 
in the header chain and:
* if there was, removed the outer IP and the outer UDP+GUE+option headers
* if not, removed solely the outer UDP+GUE+option headers, but not the 
outer IP.

However, there is nothing to say that a GUE transport encap should not 
encapsulate a packet that has already been tunnelled in an IP outer 
(e.g. IPsec AH or ESP). That is, the transport encap would insert a UDP 
and GUE header between the outer IP and the inner IP, without adding 
another IP outer.
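The ambiguity can be shown with packets reduced to lists of header names (outermost first). This is a sketch: the helper names are hypothetical, the network encap arbitrarily uses an IPv4 outer, and a plain IP-in-IP packet stands in for an already-tunnelled one. Network-encapping an ordinary packet and transport-encapping an IP-in-IP packet yield identical header chains, so the combined decap cannot tell them apart.

```python
from typing import List

IP_HEADERS = ("IPv4", "IPv6")

def network_encap(inner: List[str]) -> List[str]:
    # Adds a new outer IP + UDP + GUE in front of the whole packet.
    return ["IPv4", "UDP", "GUE"] + inner

def transport_encap(pkt: List[str]) -> List[str]:
    # Inserts UDP + GUE after the existing outer IP header; no new IP header.
    return pkt[:1] + ["UDP", "GUE"] + pkt[1:]

def combined_decap(pkt: List[str]) -> List[str]:
    """The thought-experiment decap: infer the mode from the next header."""
    if pkt[1:3] != ["UDP", "GUE"]:
        raise ValueError("not a GUE packet")
    if pkt[3] in IP_HEADERS:
        return pkt[3:]          # guess network encap: strip outer IP too
    return pkt[:1] + pkt[3:]    # guess transport encap: keep outer IP

original_a = ["IPv6", "TCP"]            # ordinary packet
original_b = ["IPv4", "IPv6", "TCP"]    # already IP-in-IP tunnelled

wire_a = network_encap(original_a)
wire_b = transport_encap(original_b)
print(wire_a == wire_b)                      # True: identical on the wire
print(combined_decap(wire_b) == original_b)  # False: outer IPv4 wrongly stripped
```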

It would be safer to use two different well-known ports for transport 
and network encap. However, I think deploying transport and network 
encap on the same IP is a corner case we just need to rule as 
inadmissible. Nonetheless, a sys-admin would get weird behaviour if this 
did happen, with lots of head-scratching before she realised what had 
happened. I'm not sure how to mitigate this.

_*2/ WIRE PROTOCOL*_

*2.1/ HLEN too small*
S3.1
The 5-bit Hlen field (counted in 4B units, giving a maximum header length 
of 128B) worries me a lot.
Let's not make a similar mistake to when we limited TCP option space to 
40B, which has caused enormous grief.
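For the record, the arithmetic behind both limits, as a small sketch. It assumes, per my reading of S3.1, that Hlen counts 32-bit words excluding the first 4 bytes of the GUE header, which is what yields the 128B maximum.

```python
def gue_header_len(hlen: int) -> int:
    """Total GUE header bytes, assuming Hlen counts 32-bit words of optional
    fields, excluding the 4-byte base header."""
    if not 0 <= hlen <= 0b11111:
        raise ValueError("Hlen is a 5-bit field")
    return 4 + 4 * hlen

def tcp_option_space(data_offset: int = 0b1111) -> int:
    """TCP: 4-bit Data Offset in 32-bit words, minus the 20-byte fixed header."""
    return 4 * data_offset - 20

print(gue_header_len(0b11111))      # 128 bytes in total
print(gue_header_len(0b11111) - 4)  # 124 bytes for all optional fields
print(tcp_option_space())           # 40 bytes of TCP options
```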

*2.2/ GUE versions*
S3.1
The hack in GUE v1 to compress out the GUE header for direct 
encapsulation of IP (v4 or v6) seems neat, but it is also /extremely 
dangerous/. If GUE becomes successful, it would prevent incremental 
deployment of any new version of IP starting 0b10, 0b11 or 0b00, because:
* S.5.4 says to drop a packet with an unknown version, so IP cannot be 
upgraded independently of GUE code.
* A version of IP starting 0b00 would be mistaken for GUEv0.

The latter might sound unlikely, but bear in mind that:
* you don't know what ideas might come up in future for using multiple 
versions of IP - the IP version field could become important.
* a future version of IP might wrap the version field, because 0x0-0x3 
are no longer used (a version only has to be a unique tag, it doesn't 
have to increase).
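To make the collision concrete, here is a sketch of the dispatch a decap would apply to the first byte after the UDP header, per my reading of S3.1 and S.5.4. The handling of IP versions 5 and 7 inside the 0b01 case is an assumption, and "IPv2"/"IPv8" are purely hypothetical future versions.

```python
def dispatch_first_byte(first_byte: int) -> str:
    """Classify the first byte after the UDP header as a GUE decap would."""
    ver = first_byte >> 6              # 2-bit GUE version field
    if ver == 0b00:
        return "GUEv0 header"          # IP versions 0-3 (0b00xx) would land here
    if ver == 0b01:
        ip_version = first_byte >> 4   # 4 = 0b0100, 6 = 0b0110
        if ip_version in (4, 6):
            return f"direct IPv{ip_version}"
        return "drop"                  # assumption: IP versions 5 and 7 unsupported
    return "drop"                      # 0b10, 0b11: unknown version, dropped (S.5.4)

for b in (0x45, 0x60, 0x20, 0x80):     # IPv4, IPv6, hypothetical "IPv2", "IPv8"
    print(hex(b), dispatch_first_byte(b))
```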

[Aside: If you prefer an equally dangerous hack (perhaps because you 
don't believe there will ever be a version of IP beyond v6), you could 
have reduced the Ver field to the first single bit by making GUEv0 the 
one without a GUE header, and GUEv1 the one with. This would have given 
more space for the Hlen field (see my concern in A2.1/ "HLEN too small" 
above and my idea in a separate email to remove the C flag).]

In the separate email about redesign, I'll describe an alternative 
approach that always fits the base GUE protocol into 4B, or even within 
the 8B UDP header (see C6/ Wire Protocol; it comes from an idea to 
develop GUT into what I called Gutless 
<https://www.ietf.org/mail-archive/web/tsvwg/current/msg09854.html>, 
back in Feb 2010).

*2.3/ No need to interpret the protocol field relative to IPv4*
S3.2.1:

    The protocol number in interpreted relative
    to the IP protocol that encapsulates the UDP packet (i.e. protocol of
    the outer IP header).

IPv6 [RFC2460] defines the Next Header field to use the same protocol 
identifier space as IPv4. There are no IPv4 protocol numbers that are 
inappropriate for IPv6 (see the IANA protocol number registry 
<http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml>). 
Therefore, this should simply say that the protocol number is 
interpreted as an IPv6 protocol number (and therefore the field would be 
more appropriately called "Next Header").

*2.4/ No need to restrict interpretation of the protocol field*
S3.2.1:
This draft should not state any restrictions (e.g. those in the second 
and third paragraphs quoted below) that preclude certain protocol 
numbers in combination with either an IPv4 or IPv6 outer.

    For an IPv4 header the protocol may be set to any number except for
    those that refer to IPv6 extension headers or ICMPv6 options (number
    58). [...]

    For an IPv6 header the protocol may be set to any defined protocol
    number except Hop-by-hop options (number 0). [...]

Various implementations are capable of understanding an IPv6 extension 
or v6-ICMP within an IPv4 header (e.g. [RFC6145 
<https://tools.ietf.org/html/rfc6145#section-5.2>]). And any list of 
restricted header combinations can never deal with newly defined 
headers. So the only test needed is "Does your code for this combination 
and order of headers have the logic for the next header?" GUE then only 
needs to refer to the appropriate action already specified in RFC2460 
(quoted below) rather than making up its own rules:

    The Option Type identifiers are internally encoded such that their
    highest-order two bits specify the action that must be taken if the
    processing IPv6 node does not recognize the Option Type:
    [...]

    If, as a result of processing a header, a node is required to proceed
    to the next header but the Next Header value in the current header is
    unrecognized by the node, it should discard the packet and send an
    ICMP Parameter Problem message to the source of the packet, with an
    ICMP Code value of 1 ("unrecognized Next Header type encountered")
    and the ICMP Pointer field containing the offset of the unrecognized
    value within the original packet.  The same action should be taken if
    a node encounters a Next Header value of zero in any header other
    than an IPv6 header.
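This logic already exists and is trivially table-driven. A sketch of the Option Type decoding from RFC2460 Section 4.2 (the function and table names are mine):

```python
# Action for an unrecognized option, encoded in the two highest-order bits
# of the IPv6 Option Type (RFC 2460, Section 4.2).
UNRECOGNIZED_OPTION_ACTION = {
    0b00: "skip the option and continue processing the header",
    0b01: "discard the packet",
    0b10: "discard the packet and send ICMP Parameter Problem, Code 2",
    0b11: ("discard the packet and, only if the destination was not "
           "multicast, send ICMP Parameter Problem, Code 2"),
}

def action_for(option_type: int) -> str:
    return UNRECOGNIZED_OPTION_ACTION[(option_type >> 6) & 0b11]

def may_change_en_route(option_type: int) -> bool:
    # The third-highest-order bit says whether the option data may change en route.
    return bool(option_type & 0x20)

print(action_for(0x04))           # Tunnel Encapsulation Limit: skip if unknown
print(action_for(0xC2))           # Jumbo Payload: discard, ICMP unless multicast
print(may_change_en_route(0xC2))  # False
```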

There is a sentence at the end of S.3.6 (quoted below) that repeats 
these unnecessary restrictions. If you agree with me, please also remove it.

    [...] In this case next
    header must refer to a valid IP protocol for IPv4. No other extension
    headers or destination options are permitted with IPv4.


*2.5/ Missed opportunity to liberalise interpretation of the protocol field*
I believe that GUE offers the opportunity to liberalise, rather than 
restrict, protocol field interpretation. In particular, GUE could allow 
encapsulation of hop-by-hop options (next header number 0). You might 
wonder what a HbH option could possibly mean within a GUE header - see 
C2.4/ "GUE: a potential solution to the IPv6 extension header discard 
problem" in my separate email about how to use GUE to solve the problem 
where IPv6 packets with header extensions are highly prone to discard 
[RFC7872 <https://tools.ietf.org/html/rfc7872>].

*2.6/ Positioning GUE with respect to existing IPv6 extension headers*
The draft needs to state rules for where GUE encapsulation fits in the 
order of a chain of any IPv6 extension headers already present in an 
arriving IPv6 packet. Below, this question is considered for both types 
of encapsulation, and in both cases it can be seen that the UDP/GUE 
header would not necessarily be the first header after an IPv6 outer.

* Network encap:
According to my reading of RFC2473, certain IPv6 extension headers in an 
arriving IPv6 packet should (theoretically) be copied as extension headers for 
the outer:
   a) a Hop-by-Hop Options header (depending on the encap configuration, 
but a jumbogram option would have to be copied)
   b) a Routing header (depending on the encap configuration)
   c) The Tunnel Encapsulation Limit Option (within a Destination 
Options Extension Header)

   - HbH options are pretty academic these days, given they cause about 
39-54% discard [RFC7872 <https://tools.ietf.org/html/rfc7872>]. However, 
if there is one on the inner, I guess we should still say that a GUE 
network encap should copy it to the outer before UDP/GUE is added.
   - I believe RFC2473 was wrong to say a routing header could be copied 
to the outer. Imagine a packet gets tunnelled that has a routing header 
listing addresses D2, D1 & D0 still left to visit. Although it is 
unclear what it means to copy a routing header to the outer, it must 
mean that these addresses would be visited by the tunnelled packet, then 
visited again after decapsulation.
   - I believe the Tunnel Encapsulation Limit Option is also pretty 
academic these days, but again, if one arrived, a GUE network encap 
ought to check the value, decrement it, and copy the header to the outer.

* Transport encap:
In this case, I have suggested where the UDP/GUE header should fit in 
the following order of extension headers (copied from RFC2460):
            IPv6 header
            Hop-by-Hop Options header
          +UDP
          +GUE
            Destination Options header (note 1)
            Routing header
            Fragment header
            Authentication header (note 2)
            Encapsulating Security Payload header (note 2)
            Destination Options header (note 3)
            upper-layer header
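The suggested placement amounts to a one-line rule. A minimal sketch, with the header chain as a list of names (insert_udp_gue is a hypothetical helper):

```python
from typing import List

def insert_udp_gue(chain: List[str]) -> List[str]:
    """Insert UDP and GUE into an outermost-first IPv6 header chain after the
    IPv6 header and any Hop-by-Hop Options header, before everything else."""
    if not chain or chain[0] != "IPv6":
        raise ValueError("chain must start with the IPv6 header")
    pos = 2 if len(chain) > 1 and chain[1] == "Hop-by-Hop Options" else 1
    return chain[:pos] + ["UDP", "GUE"] + chain[pos:]

print(insert_udp_gue(["IPv6", "Hop-by-Hop Options", "AH", "TCP"]))
# ['IPv6', 'Hop-by-Hop Options', 'UDP', 'GUE', 'AH', 'TCP']
```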

The draft ought to mention that if AH has been applied to a packet which 
is then encapsulated by GUE in transport mode, the AH header is not 
recalculated, so it does not cover the UDP/GUE headers. Decapsulation 
works because the UDP/GUE headers are inserted before the authentication 
header, so they will be removed (by a GUE decapsulator in transport 
mode) before AH is verified.

Personally I don't know enough about routing headers to make the 
decision on whether they should be above or below the GUE header in the 
transport encap. I believe they are only processed when a packet reaches 
the destination address in the main header, but I am not familiar with 
all the different routing types (I know some are deprecated, and frankly 
I couldn't be bothered to read the others).

*2.7/ Reliable delivery of control messages*
The examples of potential control messages (those with the 'C' flag) 
given in S.3.5.1. (echo request/reply for testing) aim to mimic the data 
channel, so unreliable delivery as a GUE datagram is appropriate.

The draft doesn't define any other tunnel control messages. However, if 
it did, many/most would need to be delivered reliably and in order (e.g. 
key agreement, any necessary configuration agreement, consistent 
application of connection semantics, etc).

Therefore, reliable ordered delivery for control messages will need to 
be defined (see C3.2/ "Reliable delivery of control messages" in 
separate email for a suggested design).

*2.8/ Extensibility of the flags and optional fields scheme: doesn't work*
S3.3:
This is meant to be "the primary mechanism of extensibility in GUE". 
However, for extensibility to work, GUE needs to distinguish between:
* options: the base set of flags+options defined from the start and 
required in all GUE code
* "extensions" (my term): future extensions to the flags and options.

The current GUE flags scheme only works for options, but it inherently 
puts extensions into a chicken-and-egg stand-off, because:
a) S5.4 says an implementation MUST drop a packet with an unknown flag. 
So, if the IETF later defines bit 7, until a very large proportion of 
GUE decap implementations have been upgraded with logic that understands 
bit 7, the packet is going to be dropped with high probability. So no 
encap is going to want to set bit 7 on a packet, so there is no 
motivation for a decap to implement the code for bit 7.
b) For such unknown flags, we cannot change "MUST drop" to "MUST 
ignore", because the lengths of the fields are not self-describing - 
they have to be hard-coded into an implementation. So if one GUE 
implementation only has logic about the flags up to bit 6, but a packet 
arrives with bit 8 set, the implementation doesn't know how large the 
"Fields" field is, so it doesn't know where the private data starts.
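Point b) can be seen in a few lines: with lengths hard-coded per flag, the offset of the private data is a sum over the set flags, and one unknown flag makes the sum uncomputable. The flag numbers and lengths below are invented for illustration, not taken from the GUE drafts.

```python
from typing import Iterable, Optional

# HYPOTHETICAL flag-bit -> optional-field length (bytes) table, standing in
# for the lengths hard-coded into GUE code; the values are illustrative only.
KNOWN_FIELD_LEN = {0: 4, 1: 8, 2: 4, 3: 16, 4: 4, 5: 8, 6: 4}

def fields_length(set_flags: Iterable[int], known=KNOWN_FIELD_LEN) -> Optional[int]:
    """Bytes occupied by the optional fields, or None if any set flag is
    unknown, in which case the start of the private data cannot be found."""
    total = 0
    for flag in set_flags:
        if flag not in known:
            return None  # length is not recoverable from the packet itself
        total += known[flag]
    return total

print(fields_length({0, 3}))  # 20
print(fields_length({0, 8}))  # None: bit 8 is unknown to this implementation
```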

For proper extensibility, each new GUE flagged option needs to be 
self-describing, i.e. with additional fields to say:
a) Whether nodes that do not have the logic to understand the option 
should drop or ignore the packet, separately for:
   - nodes on the path
   - nodes at the dest. (decap) of the GUE datagram.
b) Whether the option is intended to change on path (in which case it 
should not be covered by integrity or authentication codes).
c) Whether the option should be copied or not by a GUE-in-GUE tunnel 
encap (see A4.4/ "Tunnels in Tunnels" later).
d) The length of the option
e) Additionally you might want to borrow the IPv6 idea of controlling 
whether there needs to be an error message or not, but personally I 
believe that is overkill (the intention was for silent failure to be 
impossible for critical features, but it is very hard to deliver error 
messages reliably anyway).
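As a sketch of what a) to d) could look like in a TLV encoding: the bit assignments and the one-byte length below are hypothetical choices for illustration only, not a proposal for the wire format (a one-byte length would itself run into the scaling problem in A2.9).

```python
from typing import List, NamedTuple, Optional, Set

# HYPOTHETICAL encoding: each option is type (1 byte) | data length (1 byte) | data,
# and the top four bits of the type carry properties a) to c).
DROP_IF_UNKNOWN_AT_DECAP = 0x80   # a) at the dest. (decap)
DROP_IF_UNKNOWN_ON_PATH = 0x40    # a) at nodes on the path
MUTABLE_EN_ROUTE = 0x20           # b) exclude from integrity/auth coverage
COPY_ON_GUE_IN_GUE = 0x10         # c) copy to the outer on GUE-in-GUE encap

class Option(NamedTuple):
    otype: int
    data: bytes

def parse_options(buf: bytes, known: Set[int], at_decap: bool = True) -> Optional[List[Option]]:
    """Return the recognised options, or None if the packet must be dropped."""
    drop_bit = DROP_IF_UNKNOWN_AT_DECAP if at_decap else DROP_IF_UNKNOWN_ON_PATH
    opts: List[Option] = []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            return None                  # truncated type/length
        otype, olen = buf[i], buf[i + 1]
        end = i + 2 + olen               # d) the length is what makes skipping possible
        if end > len(buf):
            return None                  # length overruns the buffer
        if otype in known:
            opts.append(Option(otype, buf[i + 2:end]))
        elif otype & drop_bit:
            return None                  # unknown and marked critical at this node
        i = end                          # unknown but ignorable: skip over it
    return opts

def options_to_copy(opts: List[Option]) -> List[Option]:
    return [o for o in opts if o.otype & COPY_ON_GUE_IN_GUE]

def integrity_covered(opts: List[Option]) -> List[Option]:
    return [o for o in opts if not o.otype & MUTABLE_EN_ROUTE]

known = {0x11}
buf = bytes([0x11, 2, 0xAA, 0xBB,   # known option, copied on GUE-in-GUE
             0x07, 1, 0xCC,         # unknown, ignorable everywhere
             0x87, 0])              # unknown, critical at the decap only
print(parse_options(buf, known))                  # None at the decap
print(parse_options(buf, known, at_decap=False))  # [Option(otype=17, data=b'\xaa\xbb')]
```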

The above shows that attempting to invent a new extensibility scheme 
usually ends in tears. The IETF and others have developed 
tried-and-tested extensibility approaches like TLV, CBOR. Even then, 
they still have problems. The above points draw lessons from all this, 
particularly:
* action codes and change codes in the initial bits of IPv6 HbH & DO 
options [RFC2460]
* TRILL extension word flags: critical and non-critical separately for 
hop-by-hop and ingress-to-egress (see [RFC7179] updated by [RFC7780]).
* 'Self-describing objects', including type and size, is listed as 
'Architectural Principle of the Internet' number 3.12 in [RFC1958]

*2.9/ Hard-coded option lengths do not scale*
By hard-coding the length of each option in an RFC and in the GUE code 
(rather than self-describing in the packet), you are stuck with a 
certain size option for ever. Experience has proven that fields such as 
message authentication codes (MACs), fragment IDs, etc. have to scale. 
Admittedly, we could define flags for larger fields later, but I have 
shown above that new flags would be undeployable.

*2.10/ Random access to options needs motivating*
Quoting S3.3:

    Flags allow random access, for instance [...]

There might be a case for GUE to use a protocol heap rather than a stack 
[Braden03]. If so, please motivate it.

[Braden03] Braden, R., Faber, T. & Handley, M., "From Protocol Stack to 
Protocol Heap: Role-Based Architecture 
<http://doi.acm.org/10.1145/774763.774765>," ACM SIGCOMM Computer 
Communication Review 33:17--22 ACM (January 2003)


_*3/ STATE*_

*3.1/ Per-connection state vs. stateless connections but per-tunnel state*
The GUE draft does not suggest a mechanism for GUE endpoints to apply 
connection semantics.

* For transport encap the GUT draft suggests an approach that uses 
per-flow state (see the example given in Figure 4 in 
draft-manner-tsvwg-gut-02 
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>).
* For network encap a stateless approach is proposed in my separate 
email (see C1/ "Stateless Connection Semantics"). Statelessness is 
important to simplify migration during load-balancing, failures etc.

The 'shared fate' resilience principle [Clark88] maintains that a system 
should avoid reliance on flow-state held on the path, preferring to hold 
state solely at the endpoints. One could argue that, in transport encap 
mode, the GUE endpoints are on the end hosts, and therefore the 
communication path is resilient: if GUE flow state is lost 
because an end host fails, the communication will have failed anyway. 
However, strictly, a GUE endpoint process is likely to be separate 
(perhaps even in NIC hardware) so it could fail independently of the 
true endpoint process of the connection.

So it would be ideal to use a stateless approach for both network and 
transport encap. However, the best stateless approach I could come up 
with (if it works at all) requires some coordination and hence one-off 
set-up latency between the GUE endpoints. Therefore, stateless 
connections will be:
* more appropriate for network encap (usually long-lived tunnels); and
* less useful for transport encap (opportunistic per connection).

To summarize, it is likely that the stateful approach will be used, at 
least for some GUE encapsulators in transport mode. Therefore, for the 
transport encap mode at least, the draft needs to consider per-flow 
state and its management (see following section).

[Clark88] Clark, D.D., "The design philosophy of the DARPA internet 
protocols," Proc. ACM SIGCOMM'88, Computer Communication Review 
18(4):106--114 (August 1988)

*3.2/ Transport encap with Connection Semantics: Flow state management*
Hosts already maintain flow-state for each connection in progress. To 
support GUE in transport encap mode, it is trivial for the hosts at each 
end to associate a little extra state with the existing state of each 
inner flow:
* The initiator needs no flow-state to receive GUE packets, but in order 
to send them it associates the original (inner) flow's ID with the 
source port it will use in the UDP outer header of every GUE packet.
* The responder has to associate the inner flow ID with the source port 
in arriving GUE UDP outer headers. It needs this so that, when the inner 
flow sends out packets, the GUE encapsulator can intercept them and 
encapsulate them with a GUE header, using the stored source port as the 
destination port.
* Any error messages returned from the responder also need to be 
encapsulated in the same way.

Also, the draft needs to specify:
* that a GUE transport decap ought to protect itself against DDoS by not 
storing flow state if no associated socket is open;
* how long to time out unused flow state;
* what to do with a packet if the necessary flow state is not present.
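To make the responder-side state and the three points above concrete, here is a minimal Python sketch. It is illustrative only: the class and method names, the 5-tuple flow key and the 30s idle timeout are my assumptions, not anything the draft specifies.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical inner flow ID: (src IP, dst IP, IP protocol, src port, dst port)
FlowId = Tuple[str, str, int, int, int]


@dataclass
class GueFlowTable:
    """Responder-side map from inner flow ID to the outer UDP source port
    seen on arrival, with an idle timeout (all values assumed)."""
    idle_timeout: float = 30.0                   # assumed, not from any spec
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _entries: Dict[FlowId, Tuple[int, float]] = field(default_factory=dict)

    def learn(self, inner: FlowId, outer_src_port: int, socket_open: bool) -> bool:
        # Refuse to create state unless a local socket exists for the inner
        # flow, so spoofed GUE packets cannot fill the table (DDoS).
        if not socket_open:
            return False
        self._entries[inner] = (outer_src_port, self.clock())
        return True

    def reply_port(self, inner: FlowId) -> Optional[int]:
        # Outer UDP destination port for a reply on this inner flow, or None
        # if the state is missing or has been idle too long. The caller is
        # assumed to key replies by the arriving packet's flow ID.
        entry = self._entries.get(inner)
        if entry is None:
            return None
        port, last_used = entry
        now = self.clock()
        if now - last_used > self.idle_timeout:
            del self._entries[inner]             # time out unused flow state
            return None
        self._entries[inner] = (port, now)       # refresh on use
        return port
```

A None from reply_port() is exactly the case the third point asks the draft to specify.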

*3.3/ Keepalives for middlebox flow state*
Middleboxes, such as firewalls and NATs, time out the pin-hole associated 
with UDP flow-state fairly rapidly, but rarely less than 15s [RFC5405]. 
RFC5405 rightly says that an application that uses UDP should be 
responsible for recovering a timed out connection, rather than the stack 
sending keepalives to hold open a connection, when it doesn't actually 
know whether the application still wants the connection open.

Nonetheless, an inner flow will not be aware that it is being tunnelled 
using UDP/GUE. Therefore it seems less inappropriate for the GUE encap 
to keep state alive on behalf of the application, so it ought to send 
keepalive GUE datagrams to hold any pin-hole open. However, if the 
application has not sent anything for some time (whatever that means), 
the GUE encap should time out the connection, rather than holding 
middlebox flow-state (and its own flow-state) open for ever.
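A sketch of that policy, assuming a 15s keepalive interval (matching the lower bound quoted above) and an arbitrary 300s application idle limit; both values and all names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class KeepaliveTimer:
    keepalive_interval: float = 15.0  # assumed: one keepalive per 15 s at most
    app_idle_limit: float = 300.0     # assumed policy for "application gone quiet"
    last_app_send: float = 0.0        # last time the inner flow itself sent data
    last_tx: float = 0.0              # last time anything (data or keepalive) went out

    def on_app_send(self, now: float) -> None:
        # Real traffic from the inner flow also refreshes the pin-hole.
        self.last_app_send = now
        self.last_tx = now

    def tick(self, now: float) -> str:
        """Decide what the GUE encap should do at time `now`."""
        if now - self.last_app_send > self.app_idle_limit:
            return "expire"           # stop holding middlebox (and own) state open
        if now - self.last_tx >= self.keepalive_interval:
            self.last_tx = now
            return "send_keepalive"   # refresh any NAT/firewall pin-hole
        return "idle"
```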

If you agree, it might be necessary to specify a keepalive control 
message that a GUE encap can send to the remote end of the GUE tunnel 
(which would also keep any flow-state at the remote end alive). These 
would only be necessary in one direction, and would not need to be 
reliably delivered.

See Section 3.1 of draft-manner-tsvwg-gut-02 
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.1> for 
the keepalive control message defined for GUT.


_*4/ OPERATION*_

*4.1/ Transport encap: to GUE or not to GUE?*
For transport encap, the draft needs to say how the host decides when to 
use GUE and when not.
There's text on this in S.4 of the GUT draft 
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-4>, if 
you want to use it.

*4.2/ Hop limit / TTL processing*
I couldn't find any text about this. Perhaps you intended this sentence 
in S.5.3 to cover it:

    it should follow standard conventions for tunneling of
    one IP protocol over another

I think it would be best to spell out Hop limit processing. There's text 
on this in S.3.2 of the GUT draft 
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.2>, if 
you want to use it.
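For illustration, one convention the text could spell out is the single-hop treatment used for IP-in-IP tunnelling (RFC 2003): decrement the inner TTL/Hop Limit once on entry if the packet is being forwarded, set the outer TTL independently, and leave the inner value untouched on decap. A minimal sketch, with an assumed default outer TTL of 64 and hypothetical function names:

```python
from typing import Optional, Tuple

DEFAULT_OUTER_TTL = 64  # assumed; the outer TTL only needs to reach the decap


def encap_ttl(inner_ttl: int, forwarding: bool) -> Optional[Tuple[int, int]]:
    """Return (new_inner_ttl, outer_ttl), or None if the packet must be
    dropped (with an ICMP Time Exceeded) instead of being tunnelled."""
    if forwarding:
        if inner_ttl <= 1:
            return None
        inner_ttl -= 1           # the whole tunnel counts as one hop
    return inner_ttl, DEFAULT_OUTER_TTL


def decap_ttl(inner_ttl: int, outer_ttl: int) -> int:
    """The outer TTL is discarded with the outer header; the inner TTL
    is not modified on decapsulation."""
    return inner_ttl
```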


*4.3/ Error messages*
S.5.4

    No error message is returned
    back to the encapsulator.

Please go through every type of error and in each case justify why no 
error message to the encap is necessary.

*4.4/ Tunnels in tunnels*
S.5.5, 2nd para

    It
    may encapsulate a GUE packet in another GUE packet, for instance to
    implement a network tunnel (i.e. by encapsulating an IP packet with a
    GUE payload in another IP packet as a GUE payload).

A number of problems here:

1) A "GUE packet" has not been defined. I assume it means any UDP 
datagram with either its source or destination port = 6080 (see A1.1/ 
"Inferring Connection Semantics: the rule not the exception").

2) There is an incremental deployment problem here. Existing tunnels 
won't check within the outer IP for whether a UDP port is a GUE port. 
They will just add a new outer IP header without the UDP or GUE.

3) In any case, if a tunnel is GUE-aware, this para needs to be clear 
about exactly which headers it should copy along with the outer IP:
* Do you intend this to mean that all the following should be copied to 
the outer IP header:
   - the outer UDP,
   - any v0 GUE header
   - plus any GUE options or private data.
* Is it appropriate to copy all the options and private data? I think 
only some (e.g. perhaps the VNID in certain circumstances?). Others 
would not have the correct semantics if blindly copied (e.g. fragment 
options, coverage of MACs, etc).
* How does a GUE-in-GUE encapsulator know which to copy?

Also, should any extension headers on an arriving IPv6 outer also be 
copied to be associated with the new outer? If so, which ones, and how 
does the encapsulator know? Do the same rules apply whether using 
transport or network encapsulation?
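Written as a predicate, the definition I am assuming in point 1) would be as follows (IP protocol 17 is UDP; the function name is hypothetical):

```python
GUE_PORT = 6080  # GUE well-known UDP port, as referred to in this review


def is_gue_packet(ip_proto: int, udp_src: int, udp_dst: int) -> bool:
    """Assumed definition: any UDP datagram whose source OR destination
    port is the GUE port."""
    return ip_proto == 17 and GUE_PORT in (udp_src, udp_dst)
```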

I have been arguing since about 2009 that, when adding a new IP outer, 
each IP (at least IPv6) extension header should self-describe which 
headers should be copied to the outer on encap. At present RFC2473 lists 
some extension headers that might be copied and says it depends on the 
configuration of the encapsulator. But a hard-coded list precludes 
introduction of any new extension that needs to be copied. And certainly 
it doesn't work for extensions like GUE that don't fit into the original 
mould of what an IPv6 extension looks like. The behaviour needs to be 
somehow self-declared in each header, not in a standard.

It is tough to solve this problem in a way that will work with existing 
tunnels. It needs solving more generally, not just for GUE. However, as 
long as GUE encapsulators address this problem from day-1, GUE presents 
an opportunity to solve the general problem in environments where all 
encapsulations are GUE-based (see my proposed solution in C4.1/ 
"Ensuring certain GUE headers are copied when a GUE packet is tunnelled" 
within my separate email on redesign). Then other encapsulation 
approaches might follow.
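To illustrate the self-declared alternative to RFC2473's hard-coded list, here is a sketch in which every header carries a hypothetical copy_on_encap flag (no such flag exists in IPv6 or GUE today). Which headers set it, e.g. a VNID option but not a fragment option, is purely illustrative:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Header:
    name: str
    copy_on_encap: bool  # hypothetical self-declared flag carried in each header


def headers_for_new_outer(inner_headers: List[Header]) -> List[Header]:
    """No hard-coded list: copy exactly those headers that declare
    themselves copyable, preserving their original order."""
    return [h for h in inner_headers if h.copy_on_encap]
```

A new extension that needs copying simply sets its own flag; the encapsulator needs no update.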

*4.5/ SHOULD adjust MTU?*

     An operator may set MTU to account for encapsulation overhead
     and reduce the likelihood of fragmentation.

I would expect "SHOULD" here.

You might want to refer to draft-ietf-tram-stun-pmtud for a way to do 
PMTUD with UDP (for STUN, but I think it would be similar for GUE).
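As a worked example of the overhead an operator would subtract, assuming the draft-04 layout of a 4-byte GUE base header followed by Hlen 32-bit words of optional fields, and outer IP headers without options or extension headers:

```python
IPV4_HDR = 20     # outer IPv4 header without options
IPV6_HDR = 40     # outer IPv6 header without extension headers
UDP_HDR = 8
GUE_BASE_HDR = 4  # assumed draft-04 base header size


def inner_mtu(link_mtu: int, outer_ipv6: bool, hlen_words: int = 0) -> int:
    """Largest inner packet that still fits in link_mtu once encapsulated.
    hlen_words is the GUE Hlen field (optional fields, 32-bit words)."""
    outer = IPV6_HDR if outer_ipv6 else IPV4_HDR
    overhead = outer + UDP_HDR + GUE_BASE_HDR + 4 * hlen_words
    return link_mtu - overhead
```

E.g. over a 1500B link with an IPv6 outer and no GUE options, inner packets would need to be at most 1448B.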

*4.6/ Is the orig-proto field necessary in the fragmentation option?*
S.4.3 of draft-herbert-gue-extensions-00

Why does the original protocol of a fragmented packet need to be visible 
before reassembly by declaring it in the GUE fragmentation option of 
each fragment? The GUE protocol field will be available once the 
fragments are reassembled, and I can't see why it would be needed before 
that.

It is not good security practice to create multiple fields that are all 
intended to be set to the same value. Even if the implementation uses 
these orig-proto fields before reassembling the fragments, it will still 
have to check that they all match the GUE protocol field when the packet 
has been reassembled. And if any are not the same, it will raise 
security concerns about any action that had previously been taken based 
on an inconsistent value.
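The unavoidable post-reassembly check looks something like this (a sketch; the function name is hypothetical):

```python
from typing import List, Optional


def check_reassembled(frag_orig_protos: List[int], gue_proto: int) -> Optional[int]:
    """Every fragment's orig-proto must equal the GUE protocol field of the
    reassembled packet. Return that value, or None on any mismatch (drop,
    and distrust any action already taken on the basis of orig-proto)."""
    if all(p == gue_proto for p in frag_orig_protos):
        return gue_proto
    return None
```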

*4.7/ Congestion Control: reductio ad absurdum*
S.5.9
I suggest you remove the para about DCCP being appropriate for tunnel 
congestion control. I appreciate you are trying to comply with RFC5405, 
but it is impossible for tunnel specs to do so without looking absurd. 
The more you try, the more it will look like you are the ones that are 
absurd. RFC5405 gives no guidance on how to comply with its requirement 
about congestion control of non-IP traffic across a tunnel... because 
there is no running code for tunnel congestion control, or for a network 
circuit breaker.

It has been suggested in the past that DCCP should be used across 
tunnels. DCCP is intended for a single flow and all the DCCP profiles 
defined so far ensure a DCCP "flow" will consume about as much capacity 
as a TCP flow. If DCCP were to be applied across a GUE tunnel it would 
reduce the rate of the aggregate of all flows across the tunnel to 
roughly the same as a /single/ TCP flow (see the intro of RFC7893 
"Pseudowire Congestion Considerations").

One might imagine that RFC5405 means that a tunnel protocol designer 
would have to detect roughly how many flows a tunnel aggregate consisted 
of at any one time (say N flows) and attempt to design a congestion 
control (e.g. a DCCP profile) to consume roughly as much capacity as N 
TCP flows. However, this would probably cause horror for some in the 
transport area at the thought of the IETF endorsing a congestion control 
that can be N times as greedy as TCP.

To further reduce the idea of a tunnel encap applying congestion control 
to absurdity, it would need:
a) a huge buffer to absorb incoming packets whenever they arrived faster 
than the tunnel rate. All packets (in small and large flows) would back 
up behind this huge queue; such bufferbloat would cause horror for most 
people in the transport area.
b) ideally, a time machine (a negative buffer) to bring packets forward 
in time whenever the arrival rate of all the flows was insufficient to 
satisfy the desired aggregate rate of the tunnel.
c) the addition of feedback channel(s) and a huge amount of extra 
processing.

[As you can see, I don't support the idea in RFC5405 that a tunnel 
becomes responsible for congestion control of traffic that it 
encapsulates. Otherwise, to be consistent, an Ethernet link would become 
responsible for congestion control of traffic it encapsulates. However, 
I accept that consistency with RFC5405 is currently a hurdle your draft 
has to cross before it can be approved. If you feel you have to suggest 
a mechanism, IMO a policer makes sense - either a rate policer or a 
congestion-rate policer.]
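For example, a simple token-bucket rate policer (sketched below with illustrative parameters and hypothetical names) drops the excess instead of queueing it, so it needs neither a huge buffer, a time machine, nor a feedback channel:

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketPolicer:
    rate_bps: float                               # sustained rate allowed into the tunnel
    burst_bytes: float                            # bucket depth
    tokens: float = field(init=False)             # current fill, in bytes
    last: float = field(init=False, default=0.0)  # time of last update (s)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes            # start with a full bucket

    def admit(self, size_bytes: int, now: float) -> bool:
        # Accrue tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst_bytes,
                          self.tokens + (now - self.last) * self.rate_bps / 8)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False                              # excess is dropped, never queued
```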


*4.8/ Multicast outer -> Implosion on inner destination*
S.5.10
Consider an inner flow of unicast packets, src-IP A, dst-IP B. Suppose 
the encap adds an outer header addressed to multicast address M, and n 
decapsulators subscribe to group M. This will cause the network to 
duplicate each packet n times. As each decap forwards the inner, n 
duplicates of each packet will converge on B.

This might make sense with unicast inner packets for a small number of 
decaps (e.g. two for redundancy). And a multicast overlay could make 
sense for multicast inner packets as long as the multicast routing was 
aware of the P2MP tunnel (with suitable grouping of multicast groups).

I think the text should say that a multicast outer is not precluded, 
because it is a theoretical possibility, but it should not be attempted 
without a safety harness and an empty bladder.

*4.9/ Deriving flow entropy from the inner is contrary to "GUE permits 
encap of arbitrary IP protocols" claim*
S.5.11.1
The general idea for creating flow entropy seems to be for the GUE encap 
to map inner flows of possibly "atypical IP protocols" to individual UDP 
outer flows, on the assumption that switches or routers that implement 
ECMP etc. will understand UDP but not "atypical IP protocols". Let's 
examine this claim by taking network encap and transport encap separately.

1) Network encap
Imagine that a GUE encap has been implemented that understands TCP, UDP, 
SCTP, DCCP, ICMP, RSVP, IPsec and ESP.
Then researchers implement NewSexyTP, with a new IP protocol number. 
No GUE encap in the world has any logic to understand or locate the 
flow ID fields of NewSexyTP. So GUE does not "permit encap of arbitrary 
IP protocols" as claimed in the motivation section.

Further, why will GUE implementations be updated with logic to 
understand NewSexyTP any faster than the ECMP code in general-purpose 
switches and routers? One GUE implementation might be updated, but other 
developers might not so diligently track the latest transport protocols. 
One cannot even really argue that ECMP code in switches and routers will 
be harder to change than GUE code because it is implemented in hardware. 
The forwarding performance of a GUE tunnel encap will need to be no 
different from that of general switches and routers, so if hardware is 
necessary for one, it will be necessary for the other.

2) Transport encap.
If GUE encap is implemented as a centralized daemon process on a host or 
centralized in a NIC, it will suffer from the same lack of forward 
compatibility with new transport protocols as the network encap - 
particularly if it is implemented in NIC hardware. I.e., if an operator 
installs NewSexyTP in their OS, they will also have to wait for a GUE 
update that supports NewSexyTP. This is the case with or without 
connection semantics.

However, it might be possible to implement GUE transport encap 
(including with connection semantics) so that each instance of a 
protocol stack is associated with an instance of GUE (warning: I have no 
idea yet whether this will be possible). In this case, each GUE instance 
would consistently add the same outer port number to the inner protocol 
instance it was associated with, without needing to understand how to 
identify a flow ID in any particular protocol.

In summary, certainly for net encap, but possibly not for transport 
encap, GUE only helps "atypical IP protocols" that a particular GUE 
encap implementation already understands.
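A sketch of point 1) from the encapsulator's side. The hash (CRC-32), the mapping onto the ephemeral port range and the set of understood protocols are all assumptions; IP protocol 253 (reserved for experimentation) stands in for NewSexyTP:

```python
import zlib

# Protocols this hypothetical encap knows how to parse. TCP, UDP, DCCP and
# SCTP all begin with 16-bit source and destination ports.
KNOWN_PORT_PROTOCOLS = {6: "TCP", 17: "UDP", 33: "DCCP", 132: "SCTP"}


def flow_entropy(src: bytes, dst: bytes, proto: int, l4: bytes) -> int:
    """Derive a 16-bit outer UDP source port from an inner packet."""
    key = src + dst + bytes([proto])
    if proto in KNOWN_PORT_PROTOCOLS and len(l4) >= 4:
        key += l4[:4]                        # inner source and destination ports
    # For a protocol the encap does not know (e.g. NewSexyTP) it cannot
    # locate a flow ID, so every flow between the same host pair hashes
    # to the same value and lands on the same ECMP path.
    return 49152 + zlib.crc32(key) % 16384   # assumed: ephemeral port range
```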

*4.10/ Flow entropy from encrypted data could weaken the crypto?*
S.5.11.1

      o If a node is encrypting a packet using ESP tunnel mode and GUE
         encapsulation, the flow entropy could be based on the contents
         of clear-text packet. For instance, a canonical five-tuple hash
         for a TCP/IP packet could be used.

I'm not a crypto expert, but it sounds dangerous to take some clear-text 
from a known position in the data, hash it with a function that is not 
strongly one-way, then send this hash along with the cipher text.

I think the SPI can be used as a unique consistent per-flow value, can't 
it? The SPI has been suitably randomised so that it reveals nothing 
about the flow ID.
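A sketch of SPI-based entropy. Mixing in the outer addresses is my assumption (an SPI only identifies an SA in the context of its receiver), and note that entropy is then per SA rather than per inner flow; the function name is hypothetical:

```python
import hashlib


def entropy_from_spi(spi: int, outer_src: bytes, outer_dst: bytes) -> int:
    """Derive the outer UDP source port from the ESP SPI, which is already
    sent in the clear, instead of from inner clear-text."""
    digest = hashlib.sha256(spi.to_bytes(4, "big") + outer_src + outer_dst).digest()
    return 49152 + int.from_bytes(digest[:2], "big") % 16384
```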

*4.11/ No need to constrain flow entropy distribution*
S.5.11.2

       o The flow entropy should have a uniform distribution across
         encapsulated flows.

Equal distribution of flows is not necessarily appropriate for all 
scenarios. Flows have a distribution of sizes, and although ECMP is 
generally done randomly, an operator might want to (somehow) bias the 
hash algorithm to allow for the flows with the highest rate, which might 
otherwise unbalance the load. See for instance:
"Engineered Elephant Flows for Boosting Application Performance in 
Large-Scale CLOS Networks 
<https://www.broadcom.com/collateral/wp/OF-DPA-WP102-R.pdf>" Broadcom 
White Paper (March 2014)
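To illustrate why a uniform hash might not be wanted, here is a sketch of one hypothetical biased scheme: flows above an elephant threshold are placed greedily on the least-loaded path, and the rest are hashed. It is not the mechanism in the white paper, just an example of a non-uniform policy:

```python
import zlib
from typing import Dict, List


def assign_paths(flows: Dict[str, float], n_paths: int,
                 elephant_bps: float) -> Dict[str, int]:
    """Map flow name -> path index. Elephants (rate >= elephant_bps) are
    placed largest-first on the least-loaded path; mice are hashed."""
    load: List[float] = [0.0] * n_paths
    out: Dict[str, int] = {}
    elephants = sorted((f for f, r in flows.items() if r >= elephant_bps),
                       key=lambda f: flows[f], reverse=True)
    for f in elephants:
        p = min(range(n_paths), key=lambda i: load[i])
        load[p] += flows[f]
        out[f] = p
    for f, r in flows.items():
        if f not in out:
            p = zlib.crc32(f.encode()) % n_paths
            load[p] += r
            out[f] = p
    return out
```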

*4.12/ No need to constrain flow entropy interpretation*

         Decapsulators, or any networking devices, should not attempt to
         interpret flow entropy as anything more than an opaque value.

This seems unnecessarily constraining. This might not be a good idea, 
but if someone finds a use for it, there's no need to stop them - if 
it's useful they'll ignore you anyway, so why bother saying it? Perhaps 
you intended to explain why doing this could be problematic, rather than 
precluding it?

_*5/ SECURITY*_

*5.1/ Addresses that are both visible and hidden? Have your GUE and eat 
it too?*
S.7.  In the following sentence,

    Existing network security
    mechanisms, such as address spoofing detection, DDOS mitigation, and
    transparent encrypted tunnels can be applied to GUE packets.

This should point out that an existing set of address spoofing detection 
rules would not work with GUE. I think you meant that existing rules and 
mechanisms could be modified to check the packets encapsulated by GUE 
without using radically new techniques.

However, if GUE is in network encap mode and it encrypts the IP headers 
of the inner packets, address spoofing detection and DDoS mitigation 
will not be possible over the length of the GUE tunnel. You cannot both 
claim that GUE can hide information, and that GUE allows existing 
security techniques to work that rely on access to the hidden information.

*5.2/ How can the Security option protect a UDP/GUE header from being 
moved or removed?*
The Security option is "used to provide integrity and authentication of 
the GUE header."
I assume you envisage this would be complemented by other authentication 
techniques such as IPsec AH to provide integrity and authentication of 
the rest of the packet.

However, it occurs to me that the two together do not protect the 
integrity of the /structure/ of the packet as a whole (whether network 
or transport encap). An on-path attacker could still move the UDP/GUE 
header within the packet (it might be possible to construct a valid 
packet with altered semantics), or remove the UDP/GUE header completely. 
I can't immediately think whether any damage could be done with such an 
attack, or how to prevent it. However, I'm sure there will be a crypto 
expert for whom this is not a new problem.

Also, the 32B max length of the security option is insufficient. I 
looked for a MAC protocol that needs a larger field, and the first one 
I picked did: RFC4383 "TESLA in Secure RTP" 
requires 34B, and that's just for the default sizes, not even the 
maximum. I picked TESLA because I knew each datagram needs a lot of 
authentication space. TESLA provides multicast message authentication, 
so as well as a key index and a MAC, each packet reveals a continually 
changing key.

*5.3/ What happens when a port scan sends a datagram to port 6080?*
When a port scan (that doesn't necessarily know about GUE) sends a 
datagram to port 6080, if the datagram has a body, and the body starts 
with a zero bit, the GUE daemon will start processing it.
If the first 4 octets happen (randomly) to be set to values that would 
be a valid GUE header (see S.5.4), it will be decapsulated and forwarded 
to a protocol handler.

Not a show-stopper, but worth documenting?
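To show how little it takes, here is a simplified version of the check, assuming the draft-04 base header (Ver:2, C:1, Hlen:5, Proto/ctype:8, Flags:16); the function name is hypothetical. A real decap would also check the flags against Hlen, which would reject some probes, but an arbitrary 8-byte payload such as b"\x01\x11\x00\x00abcd" passes this check and would be handed to the UDP handler (protocol 17):

```python
from typing import Optional, Tuple


def parse_gue_base(payload: bytes) -> Optional[Tuple[int, int, int, int]]:
    """Simplified validity check on a UDP payload arriving at port 6080.
    Returns (c_bit, proto, flags, inner_offset), or None if rejected."""
    if len(payload) < 4:
        return None
    ver = payload[0] >> 6
    c_bit = (payload[0] >> 5) & 0x1
    hlen = payload[0] & 0x1F                 # optional fields, in 32-bit words
    proto = payload[1]
    flags = int.from_bytes(payload[2:4], "big")
    inner_offset = 4 + 4 * hlen
    if ver != 0 or inner_offset > len(payload):
        return None
    return c_bit, proto, flags, inner_offset
```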

*5.4/ Firewalls will still block new/atypical protocols*
Few firewalls allow incoming UDP. So GUE will not enable deployment of 
servers using atypical/new protocols, which will still face a deployment 
problem.

If a firewall opens a pin-hole to allow incoming UDP to access the 
well-known GUE port it would allow attackers to reach servers of any 
protocol while bypassing the firewall. E.g. an attacker could access a 
TCP-server by encapsulating TCP in GUE in order to bypass the firewall. 
Therefore, a firewall will only open a pin-hole to a GUE server, if it 
also inspects the packet encapsulated by GUE and applies all its normal 
rules to that as well.
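A sketch of the only safe pin-hole policy described above: anything arriving on the GUE port is judged by the firewall's normal rules applied to the inner packet. The types and rule signature are hypothetical:

```python
from typing import Callable, NamedTuple


class Pkt(NamedTuple):
    proto: int     # IP protocol number
    dst_port: int


Rule = Callable[[Pkt], bool]  # returns True if the packet is allowed


def admit_gue(outer: Pkt, inner: Pkt, rules: Rule, gue_port: int = 6080) -> bool:
    """A pin-hole for the GUE port is only safe if the normal rules are
    also applied to the encapsulated (inner) packet."""
    if outer.proto == 17 and outer.dst_port == gue_port:
        return rules(inner)
    return rules(outer)
```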

This is why I have said elsewhere that the draft should state that 
firewall bypass by new/atypical protocols is a non-goal of GUE.

*5.5/ Transport Encap: Two Passes through a Local Firewall?*
GUE in transport mode resubmits the encapsulated packet to the host's IP 
stack. But it needs to make sure it re-injects the packet at the correct 
point in relation to any local firewall.

* If the firewall includes rules to inspect the packet encapsulated with 
GUE (as discussed in the previous point), it would make sense to 
re-submit the packet above the local firewall.
* If not, GUE should resubmit the packet so that it passes through the 
local firewall again.

The latter mode would make more sense if GUE was also decrypting the 
inner packet. So, rather than have two options, a local firewall could 
work co-operatively with GUE in transport mode, so it doesn't have to 
inspect the inner in both passes.

_*6/ IMPLEMENTATION*_

*6.1/ Practical Large Receive Offload Requirements*
Appendix A.4 says:

    The conservative approach to supporting LRO for GUE would be to
    assign packets to the same flow only if they have identical five-
    tuple and were encapsulated the same way. That is the outer IP
    addresses, the outer UDP ports, GUE protocol, GUE flags and fields,
    and inner five tuple are all identical.

Rant: It is sad if such a conservative approach to LRO is still 
necessary. Any API to LRO hardware needs to be able to be given the 
locations of certain header fields that are deliberately intended to 
vary, so it can offer the facility to separately report these for each 
packet. A MAC of the encapsulating headers is a good case in point. ECN 
is an even better example of a varying field, because it has been a 
standard part of the IP header since 2001, long before LRO hardware was 
designed.
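A sketch of what such an API would enable: coalesce on the invariant key only, and report the deliberately varying fields (here ECN and a MAC) per constituent packet rather than refusing to merge. It ignores the sequence-number checks a real LRO engine needs, and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Seg:
    key: Tuple       # outer IPs, outer UDP ports, GUE proto/flags, inner 5-tuple
    ecn: int         # 2-bit ECN field: deliberately allowed to vary
    mac: bytes       # e.g. a per-packet MAC of the encapsulating headers
    payload: bytes


def coalesce(segs: List[Seg]) -> Dict[Tuple, Tuple[bytes, List[Tuple[int, bytes]]]]:
    """Merge payloads per invariant key; keep (ecn, mac) of every packet."""
    out: Dict[Tuple, Tuple[bytes, List[Tuple[int, bytes]]]] = {}
    for s in segs:
        data, per_pkt = out.get(s.key, (b"", []))
        out[s.key] = (data + s.payload, per_pkt + [(s.ecn, s.mac)])
    return out
```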


-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/