[nvo3] Review ptC redesign of parts: draft-ietf-nvo3-gue-04

Bob Briscoe <ietf@bobbriscoe.net> Sat, 13 August 2016 13:54 UTC

From: Bob Briscoe <ietf@bobbriscoe.net>
To: Tom Herbert <tom@herbertland.com>, Lucy Yong <lucy.yong@huawei.com>, Osama Zia <osamaz@microsoft.com>
Message-ID: <132469ad-2ea2-5592-cc9d-96fafeff7a7e@bobbriscoe.net>
Date: Sat, 13 Aug 2016 14:26:18 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/4pF2vgsspQdUlPDUy87xgeBI6SU>
Cc: "nvo3@ietf.org" <nvo3@ietf.org>

Tom, Lucy, Osama,

A) Technical Review of GUE 'As-is'
B) Editorial Review
C) Redesign of parts of GUE  <--- This email

Below I suggest a number of ideas for improving those aspects of the GUE
protocol that my review in PtA identified as problematic.

Each main section gives a partial redesign that I believe can be adopted
independently of the others. The exception is section 6/ "Wire Protocol",
which suggests a couple of wire protocol designs that neatly fold together
all the other ideas.

I guess you might be inclined to resist protocol changes if GUE is 
already deployed (e.g. in private networks). But we can just use a
different well-known port for a new version (as suggested in section 6).

*TABLE OF CONTENTS*

C/ REDESIGN OF PARTS OF GUE

1/ STATELESS CONNECTION SEMANTICS
1.1/ Suggested connection semantics solution to the middlebox traversal 
problem of GUE network encap

2/ PROTOCOL EXTENSIBILITY
2.1/ Reuse existing building blocks (extension headers); don't 'reinvent 
the wheel'
2.2/ Hybrid of GUE option flags and IPv6 extension headers?
2.3/ Why only IPv6 next-header space is needed
2.4/ GUE: a potential solution to the IPv6 extension header discard problem
2.5/ Handling Hop-by-Hop headers in the original packet
2.6/ Solving the problem of tunnel fragmentation in IPv4

3/ CONTROL MESSAGES
3.1/ Is the C flag necessary?
3.2/ Reliable delivery of control messages

4/ TUNNELS IN TUNNELS
4.1/ Ensuring certain GUE headers are copied when a GUE packet is tunnelled

5/ CHECKSUMS
5.1/ Avoiding double checksum coverage, except when essential for transition

6/ WIRE PROTOCOL
6.1/ A Proposed Redesign for the GUE Wire Protocol
6.2/ 'UDP Checksum Operative' flag
6.3/ Compressed UDP/GUE header: 'GUTless'


*C/ REDESIGN OF PARTS OF GUE*

_*1/ STATELESS CONNECTION SEMANTICS*_

*1.1/ Suggested connection semantics solution to the middlebox traversal
problem of GUE network encap*

Sections 5.6.1 & 5.6.2 of the GUE draft require that GUE endpoints
apply connection semantics if there is a middlebox between them (e.g. 
firewall or NAT). Given it is hard for tunnel endpoints to know whether 
they traverse a middlebox, but in many scenarios middleboxes are likely, 
this implies GUE will nearly always need to use connection semantics. 
However, the draft does not say how. A potential approach without
connection state is outlined below.

The proposed approach requires co-ordination between the tunnel 
endpoints. Therefore applicability will differ between encapsulation modes:
* Network encapsulation: The set-up latency for coordination will 
usually be acceptable for a pair of network encapsulation peers, which 
will often form a long-lived relationship.
* Transport encapsulation: This mode will tend to be used more 
opportunistically. That is, for a communication between hosts A and B, a 
GUE daemon on host A can derive the address of a potential GUE daemon on 
host B, by just sending to port 6080 at the same IP address (B). In such 
opportunistic cases, any set-up latency for co-ordination between the 
GUE daemons is likely to be unacceptable. Fortunately, in transport 
encap mode, the two ends can use per-flow state rather than coordination 
to maintain connection semantics (see A3.2/ "Transport encap with 
Connection Semantics: Flow state management" in my separate email review 
of GUE).

Therefore, the following approach is theoretically applicable for both 
network and transport encapsulation, but it is less likely to be used 
for transport encap.

* GUE network encap/decap endpoints agree between them:
   a) which end will use the fixed GUE port (port G)
   b) that the other end, E, will use an ephemeral port (port e) for 
flow entropy
   c) the hashing approach they will both use between inner and outer
* The encap (either end):
   - internally normalises the inner flow info (S.5.11.1) to the E-G 
direction, then
   - hashes it to port e.
* Then adds a UDP header with src:dest ports:
   - e:G at the 'E' end
   - G:e at the 'G' end.
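
To make the steps above concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions, not something the GUE
draft defines: the keyed BLAKE2b hash stands in for whatever hashing
approach the endpoints agree, the range 49152-65535 is assumed for port
e, and the names normalise, ephemeral_port and outer_ports are
hypothetical.

```python
import hashlib
import struct
from ipaddress import ip_address

GUE_PORT = 6080                            # fixed port G
EPHEMERAL_LO, EPHEMERAL_HI = 49152, 65535  # assumed range for port e


def normalise(src, sport, dst, dport, proto, at_e_end):
    """Normalise the inner flow to the E->G direction.

    A packet encapsulated at the E end already travels E->G. A packet
    encapsulated at the G end travels G->E, so its endpoints are swapped.
    """
    if at_e_end:
        return (src, sport, dst, dport, proto)
    return (dst, dport, src, sport, proto)


def ephemeral_port(flow, seed):
    """Hash the normalised inner flow to port e, keyed by the shared seed."""
    src, sport, dst, dport, proto = flow
    data = (ip_address(src).packed + ip_address(dst).packed
            + struct.pack("!HHB", sport, dport, proto))
    digest = hashlib.blake2b(data, key=seed, digest_size=2).digest()
    span = EPHEMERAL_HI - EPHEMERAL_LO + 1
    return EPHEMERAL_LO + int.from_bytes(digest, "big") % span


def outer_ports(src, sport, dst, dport, proto, at_e_end, seed):
    """Return the (source, destination) ports of the outer UDP header."""
    e = ephemeral_port(normalise(src, sport, dst, dport, proto, at_e_end), seed)
    return (e, GUE_PORT) if at_e_end else (GUE_PORT, e)
```

With the same seed at both ends, a packet of an inner flow encapsulated
at the E end gets e:G and the reverse packet encapsulated at the G end
gets G:e with the same e, which is what a middlebox pin-hole needs.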

Notes:
* It shouldn't matter which end uses the fixed port and which uses the 
ephemeral port. If there is a middlebox such as a firewall or NAT 
between the GUE endpoints, it shouldn't matter if the egress direction 
from private to public runs from G to e (rather than e:G). As long as 
the first datagram of a connection is in the egress direction allowed by 
the middlebox, it will then open the pin-hole required for the return 
datagram (e:G).
* Coordinating the hashing algorithms is the difficult part of this 
proposal. It is necessary so that both ends consistently use the same 
random ephemeral port for each inner connection. Co-ordinating hash 
algorithms is not really technically difficult; it is "merely" difficult 
in that there are many choices none of which have any particular merit 
relative to others, but no-one wants to use someone else's choice even 
though it doesn't really matter - a typical standardization problem! 
Standardization could be applied at different levels:
   - The simplest but naive approach would be for all GUE 
implementations to use one standard hashing approach for each inner 
protocol, defining which inner identifiers and which hashing algorithm 
to use.
   - A more likely approach would be similar to that used to negotiate 
consistent use of encryption and decryption between two endpoints: each 
hashing algorithm and each choice of inner protocol IDs would have to be 
given standardized names, then a negotiation protocol would be needed 
for each pair of tunnel endpoints to agree a common hashing approach to 
be used for packets passing between them.
   - In both cases, the endpoints would need to agree on a value to use 
to seed the hashing algorithm, and they would occasionally need to 
reseed it.

_*2/ PROTOCOL EXTENSIBILITY*_

*2.1/ Reuse existing building blocks (extension headers); don't
'reinvent the wheel'*

A whole new protocol framework is included in GUE:
a) using flags to locate some option fields that are associated with the 
data;
b) using the C flag and the ctype field to signal that the whole 
datagram is a tunnel control datagram.

The review of GUE "as-is" in my separate email has identified a number 
of problems with this attempt to "reinvent the wheel" under the 
following headings:
* A2.8/ "Extensibility of the flags and optional fields scheme: doesn't 
work"
* A2.9/ "Hard-coded option lengths do not scale"
* A2.3/ "No need to interpret the protocol field relative to IPv4"
* A2.4/ "No need to restrict interpretation of the protocol field"
* A2.5/ "Missed opportunity to liberalise interpretation of the protocol 
field"

Instead, I suggest that a GUE encapsulation could use the existing IPv6 
building blocks, rather than reinventing the wheel. The resulting 
structure of a packet encapsulated by GUE would be as follows:

Network encap:

+ IP (v4 or v6)
+ Copied IPv6 HbH Extension header [if present]
+ UDP
+ GUE
+ GUE IPv6 extension header(s) [optional]
   IP (v4 or v6)
   IPv6 Extension headers [if present]
   Transport Protocol header

Transport encap:

   IP (v4 or v6)
   IPv6 HbH Extension header [if present]
+ UDP
+ GUE
+ GUE IPv6 extension header(s) [optional]
   Remaining IPv6 Extension headers [if present]
   Transport Protocol header

The detailed behaviour is as follows:
* The '+' signs indicate headers added by a GUE encapsulator
* The GUE header includes a header length field to identify the dividing 
line between headers added during GUE encapsulation and those already in 
the native packet. This identifies the headers that:
   a) are addressed to the GUE decapsulator, rather than the destination 
of the encapsulated packet
   b) need to be covered by a GUE checksum; and
   c) need to be removed by a GUE decapsulator.
* In the following, we shall call any IPv6 extension headers that fall 
within the GUE header length "GUE extension headers".
* The initial GUE header includes a next header field to start the new 
header chain.
* The number-space for IPv6 next-headers inherits the IPv4 Protocol 
number allocation-space [RFC2460]
* The GUE spec should be highly judicious in specifying which extension 
headers cannot be combined with others. Some implementations can be more 
creative than others, and a spec ought not to prevent that. If an 
implementation does not have the logic to understand a particular 
extension header or option in a particular position, the implementation 
can tell:
   - whether to ignore the option or discard the packet;
   - how long the unknown option or header is, so it can move on to the 
next one if appropriate;
   - whether an unknown option or header is immutable in transit, so 
that it can be included in integrity/authentication coverage.
This is because every IPv6 extension header and option starts with a 
common structure that gives all this information.
* The chain of next header fields added with GUE ends by chaining into 
the outermost header encapsulated by the whole set of added GUE headers 
(true for both network and transport encap).
* For transport encapsulation, when the set of GUE headers (those marked 
'+') are decapsulated, the chain is stitched back together in the 
reverse of the encap process. I.e. the UDP Protocol number (17) in the 
outer IP header is replaced by the last next-header of the GUE
extension headers.
* Normally, an application identifies any headers encapsulated by a 
transport protocol implicitly from the server port number, not by a next 
header value in the transport protocol header. For instance, an 
application talking to or from port 80 uses HTTP, even though there is no
next-header pointing to HTTP. GUE is both the same and subtly different:
   - the GUE port number (6080) implies that the "application protocol" 
inside the UDP header is GUE - there is no next-header pointing to GUE.
   - However, a GUE header itself has a next header field - the 
ctype/proto field - that can start a new chain of extension headers. 
Currently this points to the next header beyond any GUE options. Here, I 
propose the semantics of this field should be redefined so that Hlen 
determines how much of the chain is within the GUE header. If GUE Hlen 
is zero, the GUE next-header will naturally still point to the first 
protocol or extension header after the GUE header.
* Details would have to be defined, e.g. alignment of Hlen (IPv6
extension headers use 8B alignment); error conditions, e.g. when Hlen
ends in the middle of a header, etc.
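
As a rough illustration of how a decapsulator might walk the chain
described above, here is a Python sketch. It assumes (hypothetically)
that the decapsulator has already sliced out exactly Hlen bytes after
the base GUE header, and that each GUE extension header starts with the
usual IPv6 Next Header and Hdr Ext Len octets. The function name
walk_gue_extensions is invented for this sketch, and headers with other
length encodings (e.g. AH) are not handled.

```python
FRAGMENT = 44  # IPv6 Fragment header: fixed 8 octets, no length field


def walk_gue_extensions(ext, first_next_header):
    """Split the Hlen bytes after the base GUE header into extension headers.

    Returns (headers, payload_next_header), where headers is a list of
    (header_type, raw_bytes) and payload_next_header identifies the first
    header beyond the GUE header (i.e. what the decapsulator stitches back
    in or forwards).
    """
    headers = []
    nh, off = first_next_header, 0
    while off < len(ext):
        if len(ext) - off < 8:
            raise ValueError("Hlen is not a multiple of 8 octets")
        following = ext[off]  # Next Header octet of this header
        # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
        length = 8 if nh == FRAGMENT else (ext[off + 1] + 1) * 8
        if off + length > len(ext):
            raise ValueError("Hlen ends in the middle of a header")
        headers.append((nh, ext[off:off + length]))
        nh, off = following, off + length
    return headers, nh
```

If Hlen is zero the loop never runs, so the GUE next-header field is
returned unchanged and still points at the first header after the GUE
header, as proposed above.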


IPv6 extension headers and options could be used:
* instead of options (e.g. those described in draft-herbert-gue-extensions )
* for private optional headers (using experimental IPv6 option types)
* for control messages, in place of using the 'C' flag.

An IPv6 extension header already exists for some of the desired GUE 
options (e.g. the IPv6 fragmentation extension header).
* Where GUE needs an option for which an IPv6 extension header does not 
already exist, a new option type can be defined, typically as a 
destination option, to be included within a Destination Options 
extension header (to be acted on by the destination of the tunnel 
outer). For example:
   - a new destination option would be appropriate for a virtual network 
ID (VNID).
* In some cases, an IPv6 extension header with a similar function to a 
GUE option exists (e.g. AH/ESP integrity and authentication and/or 
confidentiality), but its semantics might not meet the needs of GUE. It 
might be possible to define GUE-specific semantics for such a field when 
used as a "GUE extension header", i.e. within the range of the GUE 
header length.
* Tunnel control datagrams would solely consist of a GUE extension 
header(s) or a v6-ICMP message, without anything outside the GUE header, 
by ending the header chain with the No Next Header code.

Disadvantages:
* Using the type-length-value (TLV) scheme of IPv6 consumes 2B overhead 
for each option within DO (for the type and length), plus 2B for the DO 
extension header itself (and a DO extension header is padded to 
multiples of 8B). In contrast GUE optional fields use 1-2b overhead for 
the flag in the GUE header.
* You don't get random access (but do you need that, e.g. in any of the 
cases defined so far?)
* Raises the standardisation bar for new GUE options. For each new 
option for GUE, the standardization process is likely to be more onerous 
given the implications of creating a new IPv6 option are greater than 
the implications of just creating a new GUE option.

Advantages:
* Removes the need for the flags field and the C flag, making more space
(e.g. 8-bits) for the otherwise worryingly small (5-bit) Hlen field (see
section 6/ "Wire Protocol" for an alternative);
* What to do with an option that an implementation doesn't understand 
comes for free with IPv6 options
   - whether to drop, ignore or return an error;
   - whether the field is mutable; and
   - how much space to jump to parse the next option
are all determined by the 2-bit act code, the 1-bit chg code and the 
length field in common positions at the start of each option type;
* Avoids extra design effort, security analysis, implementation bugs;
* Code re-use.
* No change in the semantics of an IPv6 DO, because they are already 
intended to be acted on solely by a tunnel decap [RFC2473]
* Provides a principled way to implement new IPv6 extensions without the 
heavy discard problem [RFC7872 <https://tools.ietf.org/html/rfc7872>] 
(see section 2.4/ "GUE: a potential solution to the IPv6 extension 
header discard problem" below)
* Lowers the standardization bar for new IPv6 options. Given GUE is 
intended to be a generic facility, it could become the best way to 
extend IP (v4 & v6).
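
The "comes for free" point about unknown options can be sketched in a
few lines of Python. The bit positions are those of the IPv6 option
type octet (two highest-order bits for the action, third-highest for
"may change en route"); the function name decode_option_type and the
wording of the action strings are illustrative only.

```python
ACTIONS = {
    0: "skip over this option and continue",
    1: "discard the packet",
    2: "discard and send ICMP Parameter Problem",
    3: "discard and send ICMP Parameter Problem unless destination is multicast",
}


def decode_option_type(opt_type):
    """Return (action_if_unrecognised, mutable_en_route) for an option type."""
    act = (opt_type >> 6) & 0x3   # 2-bit act code
    chg = (opt_type >> 5) & 0x1   # 1-bit chg code
    return ACTIONS[act], bool(chg)
```

For instance, the Jumbo Payload option type 0xC2 decodes to the last
action above and "not mutable", without the implementation needing any
knowledge of the option itself.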

*2.2/ Hybrid of GUE option flags and IPv6 extension headers?*

A hybrid approach could be adopted to save header space but still allow 
extensibility:
* Extensions and options could be divided [Briscoe14] into:
   - those known at the time GUE is standardized (options) and
   - those introduced afterwards (extensions)
* Extensions would need to be self-describing, but options would not - 
all GUE code could be required to support the base set of options, so it 
could be hard-coded with their lengths and properties.
* For the base set of options, a single IPv6 destination option could be 
defined that looked very similar to the GUE option flags and option fields.

However, this still leaves one major disadvantage: a fixed length field
is inappropriate for many of the base options (packet IDs, the size of
crypto material), which need to be able to scale in the future (see
section A2.9/ "Hard-coded option lengths do not scale" in my separate
tech review email).

*2.3/ Why only IPv6 next-header space is needed*

As already pointed out, the IPv6 next-header number-space is a superset 
of the IPv4 protocol number-space. But why are IPv4 and IPv6 the only 
alternatives being considered anyway? What about other network 
protocols, or a potential future version of IP?

1) GUE transport encap: One could answer this by saying that a GUE 
header chain must break into a chain of IPv4 & IPv6 extensions and 
protocols because UDP/GUE is only applicable with an IPv4 or IPv6 outer. 
But is this latter point really true? The GUE draft uses IPv4 or v6 as 
the outer, but it doesn't actually say GUE MUST only be used with an 
IPv4 or v6 outer. Is it possible to encapsulate UDP with other network
protocols than IPv4 & v6?

Well, theoretically UDP is possible over other network protocols, e.g. 
UDP over IPX [RFC1791], as long as a new UDP checksum pseudoheader is 
defined wrt the outer network protocol. But other network protocols 
(e.g. MPLS) use IP to demux transport protocols, because they don't 
support this themselves.

2) GUE network encap: This does not break into an existing chain of 
headers. The GUE header is always last in the outer header, so it only 
has to point to the first encapsulated header (which the GUE draft 
already points out can be Ethernet, IPv4, IPv6, L2TP, etc). However, it 
makes sense to reuse the same extension headers as the transport mode.


Whatever, to use UDP/GUE with another network protocol (e.g. a future 
version of IP), it would probably be best to define a protocol similar 
to UDP/GUE but modified to integrate with the new network protocol and 
designated by a different well-known port (or equivalent concept).

*2.4/ GUE: a potential solution to the IPv6 extension header discard
problem*

It is well-known that the IPv6 extension header arrangement doesn't work 
because it is incompatible with the value-system of many of those who 
build middleboxes. Writing the rules more forcefully [RFC7045 
<https://tools.ietf.org/html/rfc7045>] is unlikely to help. Experiments 
in [RFC7872 <https://tools.ietf.org/html/rfc7872>] report 11%-21% of 
packets with DOs are discarded and 39%-54% of packets with HBH. 
Theoretically DOs are not meant to be looked at until the destination, 
but 2%-47% of the above DO discards were dropped in another AS. Less 
surprisingly, 31%-51% of the HBH drops were in another AS.

The insight of the proposed approach is that router code has to be 
updated anyway to understand a new extension header or option. So why 
not also include instructions on where to look for the updated header. 
Then new extensions and option headers can be located where legacy code 
never looks (e.g. encapsulated within a UDP/GUE header). So legacy 
routers won't have to use their slow path to parse extension headers, 
dropping many in the process, and then only to discover they don't have 
the code to handle the ones they do not drop anyway.

For GUE, the main benefit is to allow destination options to be used 
without risk of the high levels of packet discard. However, beyond the 
immediate concerns of GUE, this approach is likely to give a workable 
way for HbH options to be usable without the risk of discard.

Of course, any hop-by-hop option within the GUE header will not actually 
be processed by every IP hop:
* it will be seen only by those hops that implement code to look within 
the GUE header for HbH options, and
* it will be processed only by the subset of those nodes that have logic 
to act on it.
Nonetheless, that is the same applicability as IPv6 Hop-by-Hop options 
today (but without the discard problem) because it has now been 
recognised that:
* the HbH extension header is only accessed by nodes configured to do so 
[draft-ietf-6man-rfc2460bis-05];
* and then an option is only processed by those nodes that have the 
logic to understand it.

Does this just defer the problem? Yes and no:
* Yes: Firewalls will quickly start to inspect within GUE 
encapsulations. And they will probably still drop anything unknown. 
Further, developers of firewall rules will still take years before they 
get round to assessing each new header, and until then they will block 
it - they have little incentive to behave otherwise. However, as I have 
said elsewhere, firewall bypass ought to be a non-goal for GUE.
* No: the purpose of hiding (encapsulating) extension headers within GUE 
is solely to prevent unintentional discard (by those who place 
performance above function), but still allow intentional discard (by 
those who place precaution above function).


This approach was originally proposed for GUT (see S2.4 of [Briscoe14]) 
to provide a structured way to add options to GUT and to solve the 
extension header discard problem ([Briscoe14] acknowledges that it was 
first proposed by Rob Hancock during SHIM4 discussions in 2008).

[Briscoe14] Briscoe, B., "Tunnelling through Inner Space 
<https://www.iab.org/wp-content/IAB-uploads/2014/12/semi2015_briscoe.pdf>," 
In: Proc. Internet Architecture Board (IAB) Stack Evolution in a 
Middlebox Internet (SEMI) Workshop Position Paper (January 2015)

*2.5/ Handling Hop-by-Hop headers in the original packet*

In the proposal to use IPv6 extension headers within the GUE header 
(S2.1/), the proposed structure of a GUE packet was shown for the 
network and transport encap cases. In both cases, IPv6 HbH extension 
headers (if present in the original inner packet) were shown between the 
outer IP header and the UDP/GUE header.

Why don't we encapsulate arriving HbH headers within the GUE header, 
given it has just been said that this would protect them from the high 
levels of discard reported in [RFC7872 
<https://tools.ietf.org/html/rfc7872>]?

The answer is that, if a packet already contains a HbH option when it 
arrives at a GUE encapsulator, GUE has to assume it was intended to be 
where it has been placed. A sender that wants to protect a HbH option 
within a GUE header has to use transport encap mode to put it there. 
That would also be the only way to tell a GUE network encap to include 
the HbH option in the outer (see C4.1/ "Ensuring certain GUE headers are 
copied when a GUE packet is tunnelled" later).

*2.6/ Solving the problem of tunnel fragmentation in IPv4*

When the outer is IPv4, S.4.1 of draft-herbert-gue-extensions
<https://tools.ietf.org/html/draft-herbert-gue-extensions-00#section-4.1> 
motivates segmenting the inner packet across the tunnel, rather than 
using IPv4 fragmentation. The motivation makes sense, but why design a 
completely new fragmentation protocol just for GUE? Why not simply use 
an IPv6 fragment header after the GUE header? This is a good example of 
how GUE could allow encap/decap to re-use IPv6 code, even if the outer 
header is IPv4.
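
As an illustration of how little is involved, the following Python
sketch builds the standard 8-octet IPv6 Fragment extension header that
would sit after the GUE header. The field layout is that of the IPv6
Fragment header; the function name build_fragment_header and the idea
of calling it from a GUE encapsulator are assumptions of this sketch.

```python
import struct


def build_fragment_header(next_header, offset_bytes, more_fragments, ident):
    """Pack an IPv6 Fragment header.

    Layout: Next Header (8 bits), Reserved (8), Fragment Offset (13, in
    8-octet units), Res (2), M flag (1), Identification (32).
    """
    if offset_bytes % 8:
        raise ValueError("fragment offset must be a multiple of 8 octets")
    offset_units = offset_bytes // 8
    if offset_units >= 1 << 13:
        raise ValueError("fragment offset too large")
    off_and_m = (offset_units << 3) | (1 if more_fragments else 0)
    return struct.pack("!BBHI", next_header, 0, off_and_m, ident)
```

The same packing code works whether the outer header is IPv4 or IPv6,
because the Fragment header is carried inside the GUE header rather
than in the outer IP header.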

Also see section A4.6/ "Is orig-proto field necessary in the 
fragmentation option?" in my separate review of GUE 'as-is', where I 
question the need for an 'orig-proto' field in the GUE-specific
fragmentation option, which is the only difference from the IPv6 
Fragmentation extension header (other than the IPv6 TLV structure, 
except the L is inexplicably absent in the Frag case).

_*3/ CONTROL MESSAGES*_

*3.1/ Is the C flag necessary?*

The C flag essentially says "The encapsulated message is for the remote 
tunnel endpoint, not to be forwarded."
However, all the fields associated with the GUE header are already "for 
the remote tunnel endpoint". So the C flag really only says "Do not 
forward anything after decap." Surely that can be arranged by simply not 
providing anything to forward.

* In the network encap case, if there is no inner IP header, the GUE 
endpoint knows there is nothing to forward.
* In the transport encap case, if the GUE ctype/protocol field contains 
a valid Ctype, and there is no encapsulated payload beyond the GUE 
header, then surely the GUE endpoint knows there is nothing to forward.
Strictly there is still an IP header with no payload to forward. 
However, it would seem sensible for a transport decap to only forward an 
empty IP header when the Ctype/Proto field is "No next header" and not 
otherwise.

*3.2/ Reliable delivery of control messages*

Most tunnel control messages will need to be delivered reliably and 
in-order (e.g. key agreement, any configuration agreement, consistent 
application of connection semantics by both ends).

A couple of approaches for how to add reliable delivery:
a) Add reliability on top of UDP (complex): i.e. add a GUE control 
message type(s) for acknowledgements and some way to identify which 
message is being ack'd.
b) Get reliable delivery for free by using TCP for control messages that 
need it (by a GUE endpoint listening on TCP port 6080?).

In both cases, one would still need to tag the unordered flow of 
datagrams in the data plane to synchronize them with control commands 
(e.g. a key tag in each datagram to synch with a re-key arranged over 
the reliable control channel).

It is unlikely that many unreliable (UDP/GUE) control messages will be 
defined. None have been so far AFAICT, but I suspect the two we defined 
when designing GUT will be needed for GUE (S.3.5.1 of the GUE draft 
already mentions the latter as a possibility, but it is yet to be 
specified):
* keepalives (to hold open middlebox flow state)
* echo request/reply (testing the path, and testing responder support 
for GUE)

These examples illustrate that perhaps the only occasions when reliable 
control messages are not needed will be when trying to mimic a GUE data 
message. If reliable in-order delivery is needed, I suggest you use TCP 
- another example of not reinventing the wheel.

_*4/ TUNNELS IN TUNNELS*_

*4.1/ Ensuring certain GUE headers are copied when a GUE packet is
tunnelled*

Section A4.4/ "Tunnels in tunnels" in my review of GUE poses the problem 
of how a GUE-in-GUE encapsulator knows which GUE options to copy and 
which not. Similarly, which IPv6 extension headers to copy and which not?

[Caveat: I originally promised that the solutions in each section would 
be independent of the other sections. In the following I will partially 
break that promise, and assume my proposed solution where GUE options 
have the semantics of IPv6 extension headers (section C2). As already 
pointed out in my review of GUE "as-is" (section A4.4), each GUE option 
would need a self-describing way to say whether it should be copied to 
the outer when tunnelled. But I cannot see an easy way to do this 
without completely changing GUE option flags. Whereas IPv6 extensions 
already have a categorisation system that can be extended.

Nonetheless, if GUE does not adopt my proposal to use IPv6 extension 
header semantics, it could develop its own similar option categorisation 
system.
]

Consider a naive rule (that could at least be used when encapsulating 
IPv6 extensions): Copy HbH options to the outer, but not DOs.

Problem: Even if GUE options were categorised as HbH or Dest. this naive 
rule would not be sufficient. For instance I believe a VNID would 
sometimes need to be copied to the outer (is that true?), but it is 
intended for the inner destination, so it would not be categorised as HbH.

Solution: We need a new category of node that is more specific than 
per-hop but less specific than destination-only. Answer: an 
encapsulator. In the context of IPv6, we need to define a new 
encapsulator options (EO) extension header.

Then we can have an improved rule: An encapsulator copies all HBH 
options and all EOs, but not DOs, i.e. each category of node acts on 
each type of extension header as follows:

        HBHO  EO    DO
hop     Y     N     N
encap   Y     Y     N
dest    Y     Y     Y

Note that any node is free to categorise itself as it sees fit, and 
anyway any node can look in any header it chooses to - these categories 
are purely so that a node can choose to process headers efficiently.
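
The improved rule can be written down as a simple lookup, sketched
below in Python. The category names come from the table above; the
representation of headers as (kind, bytes) pairs and the function names
acts_on and copy_to_outer are assumptions made for this sketch only.

```python
# Which categories of extension header each category of node acts on.
ACTS_ON = {
    "hop":   {"HBHO"},
    "encap": {"HBHO", "EO"},
    "dest":  {"HBHO", "EO", "DO"},
}


def acts_on(node, header_kind):
    """True if a node of this category acts on this kind of header."""
    return header_kind in ACTS_ON[node]


def copy_to_outer(inner_headers):
    """Headers an encapsulator copies to the outer: HBHOs and EOs, not DOs."""
    return [h for h in inner_headers if acts_on("encap", h[0])]
```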

The EO would be appropriate for some other existing or potential 
extensions, E.g.:
* the tunnel encapsulation limit option (although everyone seems to have
coped without it). RFC2473 says it should be carried as a DO, but admits
that this is an awkward exception to the general rule (see Section 4.1.1
bullet (a));
* the experimental ConEx option, which had to be defined as a DO, even 
though the designers wanted it to be copied to the outer by 
encapsulators [RFC7837 <https://tools.ietf.org/html/rfc7837>];
* Path layer extensions, as proposed in the PLUS/SPUD discussions;
* Possibly others?

Note that standardization of a new Encapsulation Options (EO) extension 
header would face a high bar [RFC6564]. However the above gives a strong 
case, and GUE provides a deployment context that should avoid the 
discard problem that RFC6564 was trying to address.


_*5/ CHECKSUMS*_

*5.1/ Avoiding double checksum coverage, except when essential for
transition*

The GUE draft misses a trick that could offer a way through the checksum 
maze. Below, I give the solution first, then the rationale. The wire 
protocol proposal in the final section (C6/) gives concrete examples of 
how this could be achieved, but first I describe it in more abstract terms.

The proposed approach is designed for an Internet that should be 
gradually adopting zero UDP checksums, for IPv6 as well as v4 [RFC6935, 
RFC6936].

Initially (i.e. without the benefit of any cached path state about zero 
checksum support), a GUE encapsulator (network or transport):
a) MUST place a zero checksum in the outer UDP header, and
b) {MAY|SHOULD|MUST} (?) introduce a GUE-specific checksum header that 
solely covers the headers added by GUE encap.

For transport encap: (?) = MUST?
For network encap: (?) = MAY/SHOULD?

A GUE decapsulator (before decap):
a) MUST ignore the checksum in the UDP outer even if it is non-zero, on 
the basis that the GUE encap was required to set it to zero, so it could 
have only become non-zero at a middlebox that did not check whether it 
was zero before changing it.
b) MUST verify the GUE headers using the GUE-specific checksum header 
(if present).

If a GUE encapsulator discovers (e.g. using echo request/reply control 
message testing) that zero UDP checksums are being discarded on the path 
to the decapsulator, there could be two alternative approaches:

Alt.#1
On detecting a path that discards zero checksums, a GUE encap:
a) can place a full UDP checksum in the UDP outer just to traverse the 
path. Usually delta computation of the outer checksum from the 
encapsulated checksum is possible, at least for ones' complement 
checksums (as used by TCP and UDP), i.e. not for SCTP (which uses a 
CRC) or fragments.
b) still {MAY|SHOULD|MUST} (?) introduce a GUE-specific checksum header 
that solely covers the headers added by GUE encap.

A GUE decapsulator (before decap):
a) still MUST ignore the outer UDP checksum.
b) MUST verify the GUE headers using the GUE-specific checksum header 
(if present).
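
The delta computation mentioned in Alt.#1 (a) can be sketched as below. 
It relies on a property of ones' complement checksums: an inner segment 
that carries a valid checksum sums to the negation of its own 
pseudoheader sum, so the encap need not read the payload. This is a 
sketch under assumptions, not a definitive implementation: 'prefix' is 
taken to be every byte the outer checksum covers before the inner 
transport segment (outer pseudoheader, UDP header with a zeroed checksum 
field, GUE headers, inner IP header), and it must have even length. The 
function names are hypothetical.

```python
import struct

def ones_sum(data: bytes) -> int:
    """Folded 16-bit ones' complement sum (not complemented)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def outer_checksum_full(prefix: bytes, inner_segment: bytes) -> int:
    """Reference method: sum every covered byte, including the payload."""
    csum = ~ones_sum(prefix + inner_segment) & 0xFFFF
    return csum or 0xFFFF  # UDP transmits an all-zero result as 0xFFFF

def outer_checksum_delta(prefix: bytes, inner_pseudo: bytes) -> int:
    """Delta method: derive the segment's sum from its pseudoheader alone."""
    assert len(prefix) % 2 == 0, "inner segment must start on a 16-bit boundary"
    # Valid inner checksum => sum(segment) is the negation of sum(pseudoheader).
    segment_sum = ~ones_sum(inner_pseudo) & 0xFFFF
    total = ones_sum(struct.pack("!HH", ones_sum(prefix), segment_sum))
    csum = ~total & 0xFFFF
    return csum or 0xFFFF
```

Both functions return the same value whenever the inner checksum is 
valid, but the delta method touches only headers.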

Alt.#2
On detecting a path that discards zero checksums, a GUE encap:
a) MUST include a full checksum in the UDP outer over the whole packet; and
b) MUST include a GUE-specific 'UDP Checksum Operative' flag to indicate 
that the outer UDP checksum is operative; and
c) does not need a GUE-specific checksum header.

A GUE decapsulator (before decap):
a) MUST check for the 'UDP Checksum Operative' flag;
b) If present, it MUST verify the full checksum in the UDP outer over 
the whole packet;
c) Otherwise it solely verifies the GUE headers using the GUE-specific 
checksum header (if present).
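
For comparison, the Alt.#2 decap steps could look like the following 
sketch. It assumes both checksums are 16-bit Internet checksums and that 
the caller has already assembled the bytes each one covers; the function 
and parameter names are hypothetical.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Complemented 16-bit ones' complement sum, as used by UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def alt2_decap_accepts(udp_checksum_operative: bool, udp_covered: bytes,
                       gue_covered: bytes, gue_checksum_present: bool) -> bool:
    """Alt.#2 decap: the flag selects which checksum is verified.

    udp_covered: pseudoheader plus the whole UDP datagram (assumed).
    gue_covered: the bytes covered by the GUE-specific checksum (assumed).
    """
    if udp_checksum_operative:       # steps (a) and (b)
        return internet_checksum(udp_covered) == 0
    if gue_checksum_present:         # step (c)
        return internet_checksum(gue_covered) == 0
    return True                      # nothing to verify
```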

In checksum processing terms, the two alternatives are the same when the 
path supports zero checksums, but otherwise they compare as follows:

Alt.#1  Alt.#2
 YYY     YYY    encap calculates full UDP checksum
  Y       -     encap calculates GUE checksum (a subset of the full checksum)
  -      YYY    decap verifies full UDP checksum
  Y       -     decap verifies GUE checksum (a subset of the full checksum)

More Y's are shown against the full UDP checksum calculations to 
visualise that they require more processing. Thus, when both ends are 
considered, it can be seen that Alt.#1 requires less processing.

Alt.#2 requires one more bit of information than Alt.#1. This is not so 
significant for GUE 'as-is', which can just assign a spare flag.
However, it will be seen later (in section 6/ "Wire Protocol") that I 
have managed to squeeze all the commonly required fields, including the 
GUE header checksum, into the base GUE header, but only with Alt.#1. The 
extra bit for Alt.#2 makes my proposed new GUE base header either 
inefficient or clumsy.

Either way, whether using the GUE wire protocol 'as-is' or as I propose, 
I would recommend Alt.#1.


The RFCs assume (without citing any evidence) that middleboxes will drop 
an IPv6/UDP datagram with a zero checksum, because such a datagram is 
disallowed [RFC2460].  It would be useful to check how frequently 
middleboxes drop zero checksum IPv6/UDP datagrams.

Precise GUE checksum coverage would depend on the GUE mode:
* In transport encap mode, it would not include the outer IP header (the 
inner transport is responsible for covering that), because GUE didn't 
add it.
* In network encap mode, it would include a pseudoheader covering the 
important parts of the outer IPv6 header (added by GUE), but this would 
not be needed for IPv4 (which has its own header checksum, even though 
the coverage of inner TCP and UDP checksums normally provides double 
coverage of the important fields in an outer IPv4 header as a pseudoheader).
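
The mode-dependent coverage could be expressed as in the sketch below. 
Which fields count as the "important parts" of the outer IPv6 header is 
not pinned down above, so the sketch simply borrows the standard IPv6 
upper-layer pseudoheader layout (addresses, 32-bit length, three zero 
octets, next header) and uses the length of the added headers and 
protocol 17 (UDP). Those choices, and the function name, are 
assumptions.

```python
import ipaddress
import struct

def gue_checksum_input(mode: str, added_headers: bytes,
                       outer_src: str = "", outer_dst: str = "") -> bytes:
    """Return the bytes a GUE-specific checksum would cover (hypothetical)."""
    if mode == "transport":
        # GUE did not add the outer IP header, so it is not covered.
        return added_headers
    if mode != "network":
        raise ValueError("mode must be 'transport' or 'network'")
    src = ipaddress.ip_address(outer_src)
    dst = ipaddress.ip_address(outer_dst)
    if src.version == 4:
        # IPv4 has its own header checksum; no pseudoheader needed.
        return added_headers
    # Assumed IPv6 pseudoheader: src, dst, length, 3 zero octets, next header.
    pseudo = src.packed + dst.packed + struct.pack(
        "!I3xB", len(added_headers), 17)
    return pseudo + added_headers
```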



The full rationale for the above solution starts further back in the 
checksum maze, and proceeds as follows:

As section 2.3 of RFC6936 points out, normally it is less important if a 
network encapsulation gets corrupted (resulting in discard or circuitous 
delivery) than if a transport encap gets corrupted (which could crash an 
innocent application). So, with GUE, particularly with the transport 
encap, it is more important to checksum the added GUE headers.

As a thought experiment, let's set aside middleboxes for a moment, and 
consider what checksumming would be necessary and sufficient for GUE, 
taking each encap mode separately:
* net encap: GUE should not need to take responsibility for checksumming 
the encapsulated packet. It adds IP+UDP+GUE+options, so the outer UDP 
checksum field solely needs to cover the headers that have been added 
(including the outer IP as a pseudoheader);
* transport encap: Because the UDP+GUE+options are inserted within an 
existing IP header, again, GUE should not be responsible for 
checksumming any of the pre-existing headers or payload, only the 
headers it adds (in this case not including the outer IP via a 
pseudoheader).

Still setting aside middleboxes, let's assume we use something like 
UDP-Lite to put a partial checksum in the UDP-Lite outer that just 
covers the added headers.
* The only processes that need to set or check the outer UDP-Lite 
checksum are at the GUE endpoints, which know how many headers they are 
adding or removing;
* Therefore, as long as checksum coverage coincides with the fields that 
GUE adds/removes, GUE needs no additional protocol field to communicate 
checksum coverage.

GUE decap, in either mode, can verify the outer UDP-Lite checksum before 
removing the headers that it covers. Once they are removed, the GUE 
decap does not need to verify or check the checksum of the remaining 
packet. It just forwards the packet, because the inner checksum will be 
checked when it reaches its destination (in the case of transport encap, 
this will happen when it is re-submitted to the same machine's stack).

This approach ensures GUE checksumming is simple and effective:
* it always covers all the additional bits added by GUE tunnelling, and 
no more;
* there are no configuration choices, so no chance of bugs due to 
inconsistent understanding at the two ends of a tunnel.

Back to reality: middleboxes do exist. So some middleboxes will discard 
packets carrying the UDP-Lite protocol number as unrecognized.

Given a GUE endpoint only ever talks to another GUE endpoint, we could 
use the same header structure as UDP-Lite, but with the UDP protocol 
number (17). However, this would run into the same middlebox problems 
that forced UDP-Lite to use a different protocol number from UDP:
* a middlebox on the path might attempt to verify the outer checksum 
without knowing which fields GUE calculates this checksum from, because 
it will not be party to the same knowledge as GUE code. So it will 
discard all GUE packets thinking they are corrupt.
* other middleboxes (NATs in particular) might incrementally update the 
outer checksum to reflect any changes they make to other fields.

(I also couldn't find a reference to any study that measures the 
prevalence of these problems, which we ought to have before dancing 
around them.)

The insight from thinking about the maze of checksum problems in the 
above way is that the inner transport's checksum together with a 
checksum over the added GUE headers are always sufficient. The full UDP 
checksum is never necessary for data integrity, it is only needed to 
traverse middleboxes. So a GUE decap doesn't have to verify it. This was 
the train of thought that led to the solution proposed at the start of 
this section.


_*6/ WIRE PROTOCOL*_

*6.1/ A Proposed Redesign for the GUE Wire Protocol*

Taking all the above ideas together, the GUE wire protocol would be 
rather rudimentary (which is good):

                     0              15 16             31
                    +--------+--------+--------+--------+
                    |     Source      |   Destination   | \
                    |      Port       |      Port       |  |
                    +--------+--------+--------+--------+  > UDP header
                    |     Length      |  UDP Checksum   |  |
                    |                 | (default zero)  | /
                    +--------+--------+--------+--------+
                  / |  Next  |  GUE   |  GUE Checksum   | \ GUE base header
                 |  | Header |  HLen  |                 | /
                 |  +--------+--------+--------+--------+
                 |  |  Next  | Length |                 | \
    GUE Headers <   : header |        |                 :  |
                 |  +-----------------+                 +   > GUE extension headers
                 |  :                                   :  |  (optional)
                  \ |                                   | /
                    +-----------------------------------+


* There is deliberately no GUE version field; we can just use a 
different well-known port for a new version.
* UDP Length is as for standard UDP, i.e. it includes UDP header, GUE 
headers and the encapsulated payload.
* GUE HLen is the total length of all GUE extension headers, in 8 octet 
units, not including the GUE base header or the encapsulated payload.
* The length of the GUE base header is always 4 octets.
* Next Header has the same semantics as the IPv6 Next Header field.
* GUE would use the same identifier space as IPv6 for its extension headers
* The last Next Header field of the GUE Headers points to:
   - Network encap mode: the encapsulated network protocol (e.g. IPv4, 
IPv6, Ethernet)
   - Transport encap mode: the start of any chain of IPv6 extension 
headers in the original encapsulated packet, ending with the upper layer 
transport protocol
* If the GUE HLen=0, the last Next Header in the GUE Headers will be the 
one in the GUE base header.
* The GUE Checksum solely covers the UDP and GUE headers. Solely in the 
network encap case with an IPv6 outer, it also covers the outer IPv6 
pseudoheader.
* The sender can set the GUE checksum to zero to disable checksumming of 
the GUE headers.
* Any GUE decap ignores the UDP Checksum (whether or not it is zero) 
(assuming the Alt.#1 checksum solution)
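
To make the field layout concrete, here is a small Python sketch that 
packs and parses the proposed 4-octet GUE base header and splits off the 
extension headers using GUE HLen (in 8-octet units). The names GueBase, 
pack_base and parse_gue are hypothetical, and no checksum is computed 
here.

```python
import struct
from typing import NamedTuple, Tuple

GUE_BASE = struct.Struct("!BBH")  # Next Header, GUE HLen, GUE Checksum

class GueBase(NamedTuple):
    next_header: int
    hlen: int       # length of all extension headers, in 8-octet units
    checksum: int   # zero means the sender disabled the GUE checksum

def pack_base(next_header: int, ext_len_octets: int, checksum: int = 0) -> bytes:
    """Build the base header; extension headers must total a multiple of 8."""
    if ext_len_octets % 8:
        raise ValueError("extension headers must be a multiple of 8 octets")
    return GUE_BASE.pack(next_header, ext_len_octets // 8, checksum)

def parse_gue(gue: bytes) -> Tuple[GueBase, bytes, bytes]:
    """Split bytes after the UDP header into base, extensions and payload."""
    base = GueBase(*GUE_BASE.unpack_from(gue))
    end = GUE_BASE.size + 8 * base.hlen
    if len(gue) < end:
        raise ValueError("truncated GUE headers")
    return base, gue[GUE_BASE.size:end], gue[end:]
```

For example, pack_base(41, 8) followed by one 8-octet extension header 
parses back to HLen=1, and with HLen=0 the payload starts immediately 
after the 4-octet base header.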

*6.2/ 'UDP Checksum Operative' flag*

This section considers whether an efficient wire protocol could be 
designed for the Alt.#2 checksum solution (which is not recommended).

Even though the 'UDP Checksum Operative' flag is solely for transition, it 
would be nice to squeeze it into the 4 octet GUE base header. However, I 
cannot currently find a way that pleases me. At first I thought GUE 
Checksum = 0x0000 could flag this, but that would be better used to mean 
the GUE checksum is disabled, which has to be possible whether the UDP 
checksum is operative or not. AFAICT, that leaves 4 possibilities:
* Burn 8 octets of GUE extension header for one extra bit (inefficient, 
but for transition only);
* Cut GUE HLen to 7 bits, and redefine the most significant bit as a 
'UDP Checksum Operative' flag (inelegant and for ever, even though only 
needed for transition);
* Use the last 2 octets of the GUE base header for flags (with only the 
'UDP Checksum Operative' flag defined initially), and define a GUE 
extension header for the GUE checksum (inefficient for the common case);
* Register a special IPv6 Next Header value (S) solely for GUE, which 
redefines the GUE base header as:

                    +--------+--------+--------+--------+
                    |   S    |  GUE   |  Next  | Flags  | \ GUE base header
                    |        |  HLen  | Header |        | /
                    +--------+--------+--------+--------+


This last idea is a variant of the idea described earlier under C2.2/ 
"Hybrid of GUE option flags and IPv6 extension headers".

The recommended solution is to use none of these, because Alt.#1 is more 
efficient in checksum processing and requires no extra bit for a 'UDP 
Checksum Operative' flag.

*6.3/ Compressed UDP/GUE header: 'GUTless'*

Given the second 4 octets of the UDP header are redundant, it would be 
nice to be able to replace them with the proposed 4 octet GUE base 
header, as proposed for 'GUTless 
<https://www.ietf.org/mail-archive/web/tsvwg/current/msg09854.html>' 
back in Feb 2010, and illustrated below. This would also preserve 
8-octet alignment. We could allocate a new protocol number for GUTless 
and use that in middlebox-free environments, as a more efficient drop-in 
replacement for UDP/GUE, but with the same GUE extension headers.

                     0              15 16             31
                    +--------+--------+--------+--------+
                    |     Source      |   Destination   | \
                    |      Port       |      Port       |  | GUTless base header
                    +--------+--------+--------+--------+  > with new protocol number
                    |  Next  | GUTless|     Checksum    |  | as a UDP replacement
                    | Header |  HLen  |                 | /
                    +--------+--------+--------+--------+
                    |  Next  | Length |                 | \
                    : header |        |                 :  |
                    +-----------------+                 +   > GUE extension headers
                    :                                   :  |  (optional)
                    |                                   | /
                    +-----------------------------------+


GUTless on a non-GUE port with a Next Header value of "no next header" 
(59) also serves the purpose of UDP-Lite.
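
As a rough illustration of the saving, the sketch below packs the 
8-octet GUTless base header from the diagram above and compares its size 
with a UDP header plus the 4-octet GUE base header. The function name 
and the example port numbers are made up for illustration.

```python
import struct

GUTLESS = struct.Struct("!HHBBH")  # src port, dst port, Next Header, HLen, Checksum
UDP = struct.Struct("!HHHH")       # src port, dst port, Length, Checksum
GUE_BASE = struct.Struct("!BBH")   # Next Header, GUE HLen, GUE Checksum

def pack_gutless(src_port: int, dst_port: int, next_header: int,
                 hlen: int = 0, checksum: int = 0) -> bytes:
    """Build the GUTless base header (hypothetical helper)."""
    return GUTLESS.pack(src_port, dst_port, next_header, hlen, checksum)

# Next Header 59 is the IPv6 "No Next Header" value.
header = pack_gutless(5000, 6000, 59)
saving = UDP.size + GUE_BASE.size - GUTLESS.size  # octets saved per packet
```

Here the GUTless header occupies 8 octets against 12 for UDP plus the 
GUE base header, so 'saving' is 4 octets per packet.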

That's "all" Folks


Bob

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/