[nvo3] Review ptC redesign of parts: draft-ietf-nvo3-gue-04
Bob Briscoe <ietf@bobbriscoe.net> Sat, 13 August 2016 13:54 UTC
From: Bob Briscoe <ietf@bobbriscoe.net>
To: Tom Herbert <tom@herbertland.com>, Lucy Yong <lucy.yong@huawei.com>, Osama Zia <osamaz@microsoft.com>
Message-ID: <132469ad-2ea2-5592-cc9d-96fafeff7a7e@bobbriscoe.net>
Date: Sat, 13 Aug 2016 14:26:18 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/4pF2vgsspQdUlPDUy87xgeBI6SU>
Cc: "nvo3@ietf.org" <nvo3@ietf.org>
Subject: [nvo3] Review ptC redesign of parts: draft-ietf-nvo3-gue-04
Tom, Lucy, Osama,

  A) Technical Review of GUE 'As-is'
  B) Editorial Review
  C) Redesign of parts of GUE   <--- This email

Below I suggest a number of ideas for improving those aspects of the GUE protocol that my review in PtA identified as problematic. Each main section gives a partial redesign that I believe can be adopted independently of the others. The exception is section 6/ "Wire Protocol", which suggests a couple of wire protocol designs that neatly fold together all the other ideas.

I guess you might be inclined to resist protocol changes if GUE is already deployed (e.g. in private networks). But we can just use a different well-known port for a new version (as suggested in section 6).

TABLE OF CONTENTS

C/ REDESIGN OF PARTS OF GUE
  1/ STATELESS CONNECTION SEMANTICS
    1.1/ Suggested connection semantics solution to the middlebox traversal problem of GUE network encap
  2/ PROTOCOL EXTENSIBILITY
    2.1/ Reuse existing building blocks (extension headers); don't 'reinvent the wheel'
    2.2/ Hybrid of GUE option flags and IPv6 extension headers?
    2.3/ Why only IPv6 next-header space is needed
    2.4/ GUE: a potential solution to the IPv6 extension header discard problem
    2.5/ Handling Hop-by-Hop headers in the original packet
    2.6/ Solving the problem of tunnel fragmentation in IPv4
  3/ CONTROL MESSAGES
    3.1/ Is the C flag necessary?
    3.2/ Reliable delivery of control messages
  4/ TUNNELS IN TUNNELS
    4.1/ Ensuring certain GUE headers are copied when a GUE packet is tunnelled
  5/ CHECKSUMS
    5.1/ Avoiding double checksum coverage, except when essential for transition
  6/ WIRE PROTOCOL
    6.1/ A Proposed Redesign for the GUE Wire Protocol
    6.2/ 'UDP Checksum Operative' flag
    6.3/ Compressed UDP/GUE header: 'GUTless'

C/ REDESIGN OF PARTS OF GUE

1/ STATELESS CONNECTION SEMANTICS

1.1/ Suggested connection semantics solution to the middlebox traversal problem of GUE network encap

Sections 5.6.1 & 5.6.2 of the GUE draft require GUE endpoints to apply connection semantics if there is a middlebox between them (e.g. a firewall or NAT). Tunnel endpoints find it hard to know whether they traverse a middlebox, and middleboxes are likely in many scenarios, so GUE will nearly always need to use connection semantics. However, the draft does not say how. A potential approach without connection state is outlined below.

The proposed approach requires coordination between the tunnel endpoints. Therefore applicability will differ between encapsulation modes:
* Network encapsulation: The set-up latency for coordination will usually be acceptable for a pair of network encapsulation peers, which will often form a long-lived relationship.
* Transport encapsulation: This mode will tend to be used more opportunistically. For a communication between hosts A and B, a GUE daemon on host A can derive the address of a potential GUE daemon on host B by just sending to port 6080 at the same IP address (B). In such opportunistic cases, any set-up latency for coordination between the GUE daemons is likely to be unacceptable. Fortunately, in transport encap mode, the two ends can use per-flow state rather than coordination to maintain connection semantics (see A3.2/ "Transport encap with Connection Semantics: Flow state management" in my separate email review of GUE).
Therefore, the following approach is theoretically applicable to both network and transport encapsulation, but it is less likely to be used for transport encap.

* The GUE network encap/decap endpoints agree between them:
  a) which end will use the fixed GUE port (port G);
  b) that the other end, E, will use an ephemeral port (port e) for flow entropy;
  c) the hashing approach they will both use between inner and outer.
* The encapsulator (at either end):
  - internally normalises the inner flow info (S.5.11.1) to the E-G direction, then
  - hashes it to port e.
* It then adds a UDP header with src:dest ports:
  - e:G at the 'E' end
  - G:e at the 'G' end.

Notes:
* It shouldn't matter which end uses the fixed port and which uses the ephemeral port. If there is a middlebox such as a firewall or NAT between the GUE endpoints, it shouldn't matter if the egress direction from private to public runs from G to e (rather than from e to G). As long as the first datagram of a connection travels in the egress direction allowed by the middlebox, it will open the pin-hole required for the return datagram (e:G).
* Coordinating the hashing algorithms is the difficult part of this proposal. It is necessary so that both ends consistently use the same random ephemeral port for each inner connection. Coordinating hash algorithms is not really technically difficult. It is "merely" difficult in that there are many choices, none of which has any particular merit relative to the others, but no-one wants to use someone else's choice even though it doesn't really matter: a typical standardization problem! Standardization could be applied at different levels:
  - The simplest but naive approach would be for all GUE implementations to use one standard hashing approach for each inner protocol, defining which inner identifiers and which hashing algorithm to use.
  - A more likely approach would be similar to the one used to negotiate consistent use of encryption and decryption between two endpoints. Each hashing algorithm and each choice of inner protocol IDs would have to be given a standardized name. A negotiation protocol would then be needed for each pair of tunnel endpoints to agree a common hashing approach for packets passing between them.
  - In both cases, the endpoints would need to agree on a value to seed the hashing algorithm, and they would occasionally need to reseed it.

2/ PROTOCOL EXTENSIBILITY

2.1/ Reuse existing building blocks (extension headers); don't 'reinvent the wheel'

A whole new protocol framework is included in GUE:
a) using flags to locate some option fields that are associated with the data;
b) using the C flag and the ctype field to signal that the whole datagram is a tunnel control datagram.

My review of GUE "as-is" in a separate email identified a number of problems with this attempt to "reinvent the wheel", under the following headings:
* A2.8/ "Extensibility of the flags and optional fields scheme: doesn't work"
* A2.9/ "Hard-coded option lengths do not scale"
* A2.3/ "No need to interpret the protocol field relative to IPv4"
* A2.4/ "No need to restrict interpretation of the protocol field"
* A2.5/ "Missed opportunity to liberalise interpretation of the protocol field"

Instead, I suggest that a GUE encapsulation could use the existing IPv6 building blocks, rather than reinventing the wheel.
The resulting structure of a packet encapsulated by GUE would be as follows:

Network encap:
  + IP (v4 or v6)
  + Copied IPv6 HbH Extension header [if present]
  + UDP
  + GUE
  + GUE IPv6 extension header(s) [optional]
    IP (v4 or v6)
    IPv6 Extension headers [if present]
    Transport Protocol header

Transport encap:
    IP (v4 or v6)
    IPv6 HbH Extension header [if present]
  + UDP
  + GUE
  + GUE IPv6 extension header(s) [optional]
    Remaining IPv6 Extension headers [if present]
    Transport Protocol header

The detailed behaviour is as follows:
* The '+' signs indicate headers added by a GUE encapsulator.
* The GUE header includes a header length field to identify the dividing line between headers added during GUE encapsulation and those already in the native packet. This identifies the headers that:
  a) are addressed to the GUE decapsulator, rather than the destination of the encapsulated packet;
  b) need to be covered by a GUE checksum; and
  c) need to be removed by a GUE decapsulator.
* In the following, we shall call any IPv6 extension headers that fall within the GUE header length "GUE extension headers".
* The initial GUE header includes a next header field to start the new header chain.
* The number-space for IPv6 next-headers inherits the IPv4 Protocol number allocation-space [RFC2460].
* The GUE spec should be highly judicious in specifying which extension headers cannot be combined with others. Some implementations can be more creative than others, and a spec ought not to prevent that. If an implementation does not have the logic to understand a particular extension header or option in a particular position, the implementation can tell:
  - whether to ignore the option or discard the packet;
  - how long the unknown option or header is, so it can move on to the next one if appropriate;
  - whether an unknown option or header is immutable in transit, so that it can be included in integrity/authentication coverage.
  This is because every IPv6 extension header and option starts with a common structure that gives all this information.
* The chain of next header fields added with GUE ends by chaining into the outermost header encapsulated by the whole set of added GUE headers (true for both network and transport encap).
* For transport encapsulation, when the set of GUE headers (those marked '+') is decapsulated, the chain is stitched back together in the reverse of the encap process. That is, the UDP Protocol number (17) in the outer IP header is replaced by the last next-header of the GUE extension headers.
* Normally, an application identifies any headers encapsulated by a transport protocol implicitly from the server port number, not by a next header value in the transport protocol header. For instance, an application talking to or from port 80 uses HTTP, even though there is no next-header pointing to HTTP. GUE is both the same and subtly different:
  - the GUE port number (6080) implies that the "application protocol" inside the UDP header is GUE;
  - there is no next-header pointing to GUE;
  - however, a GUE header itself has a next header field (the ctype/proto field) that can start a new chain of extension headers. Currently this points to the next header beyond any GUE options. Here, I propose that the semantics of this field should be redefined so that Hlen determines how much of the chain is within the GUE header. If GUE Hlen is zero, the GUE next-header will naturally still point to the first protocol or extension header after the GUE header.
* Details would have to be defined, e.g. the alignment of Hlen (IPv6 extension headers use 8B alignment) and error conditions (e.g. when Hlen ends in the middle of a header).

IPv6 extension headers and options could be used:
* instead of options (e.g. those described in draft-herbert-gue-extensions);
* for private optional headers (using experimental IPv6 option types);
* for control messages, in place of using the 'C' flag.
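The header-chain rules above can be sketched as follows. They are: the GUE next-header starts a chain, Hlen marks which headers are GUE extension headers, and a transport decap stitches the chain back together. This is a minimal illustration under stated assumptions, not a specification. It assumes the bytes covered by Hlen after the base GUE header are passed in directly, and it only knows the IPv6 Hop-by-Hop (0), Routing (43), Fragment (44) and Destination Options (60) headers. The function names are hypothetical.

```python
# Sketch only: walk the IPv6 extension headers lying inside the GUE header
# length (Hlen), then stitch the chain back together on transport decap.
# Assumption: `ext` holds exactly the bytes covered by Hlen after the base GUE
# header, and `first_nh` is the GUE next-header (ctype/proto) field.

# Headers whose length is (Hdr Ext Len + 1) * 8 octets: HbH, Routing, DestOpts
GENERIC_EXT_HEADERS = {0, 43, 60}
FRAGMENT = 44        # the Fragment header is always 8 octets
UDP = 17


def walk_gue_extensions(first_nh: int, ext: bytes):
    """Return ([(type, offset, length), ...], next-header leaving the GUE headers)."""
    headers = []
    nh, off = first_nh, 0
    while off < len(ext):
        if nh in GENERIC_EXT_HEADERS:
            if off + 2 > len(ext):
                raise ValueError("Hlen ends in the middle of a header")
            length = (ext[off + 1] + 1) * 8
        elif nh == FRAGMENT:
            length = 8
        else:
            # Bytes remain within Hlen, but the chain has already left the
            # extension headers this sketch understands.
            raise ValueError(f"next header {nh} is not a known extension header")
        if off + length > len(ext):
            raise ValueError("Hlen ends in the middle of a header")
        headers.append((nh, off, length))
        nh = ext[off]          # first octet of every extension header is Next Header
        off += length
    return headers, nh


def stitch_transport_decap(outer_protocol: int, first_nh: int, ext: bytes) -> int:
    """New value for the outer IP Protocol/Next Header field after transport decap."""
    if outer_protocol != UDP:
        raise ValueError("outer header does not point at UDP")
    _, last_nh = walk_gue_extensions(first_nh, ext)
    return last_nh
```

With Hlen zero (empty `ext`), the walk returns the GUE next-header unchanged. This matches the point above that it then points straight at the first header after GUE.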
An IPv6 extension header already exists for some of the desired GUE options (e.g. the IPv6 fragmentation extension header).
* Where GUE needs an option for which no IPv6 extension header exists, a new option type can be defined. This would typically be a destination option, included within a Destination Options extension header (to be acted on by the destination of the tunnel outer). For example:
  - a new destination option would be appropriate for a virtual network ID (VNID).
* In some cases, an IPv6 extension header with a similar function to a GUE option exists (e.g. AH/ESP integrity and authentication and/or confidentiality), but its semantics might not meet the needs of GUE. It might be possible to define GUE-specific semantics for such a header when used as a "GUE extension header", i.e. within the range of the GUE header length.
* Tunnel control datagrams would consist solely of GUE extension header(s) or a v6-ICMP message, with nothing outside the GUE header, by ending the header chain with the No Next Header code.

Disadvantages:
* Using the type-length-value (TLV) scheme of IPv6 consumes 2B overhead for each option within a DO (for the type and length), plus 2B for the DO extension header itself (and a DO extension header is padded to a multiple of 8B). In contrast, GUE optional fields use 1-2 bits of overhead for the flag in the GUE header.
* You don't get random access (but do you need that, e.g. in any of the cases defined so far?).
* It raises the standardization bar for new GUE options. For each new GUE option, the standardization process is likely to be more onerous, because creating a new IPv6 option has greater implications than just creating a new GUE option.

Advantages:
* Removes the need for the flags field and the C flag, making more space (e.g. 8 bits) for the otherwise worryingly small (5-bit) Hlen field (see 6/ "Wire Protocol" for an alternative);
* What to do with an option that an implementation doesn't understand comes for free with IPv6 options. Three properties:
  - whether to drop, ignore or return an error;
  - whether the field is mutable; and
  - how much space to jump to parse the next option
  are all determined by the 2-bit act code, the 1-bit chg code and the length field, in common positions at the start of each option type;
* Avoids extra design effort, security analysis and implementation bugs;
* Code re-use;
* No change in the semantics of an IPv6 DO, because DOs are already intended to be acted on solely by a tunnel decap [RFC2473];
* Provides a principled way to implement new IPv6 extensions without the heavy discard problem [RFC7872 <https://tools.ietf.org/html/rfc7872>] (see section 2.4/ "GUE: a potential solution to the IPv6 extension header discard problem" below);
* Lowers the standardization bar for new IPv6 options. Given GUE is intended to be a generic facility, it could become the best way to extend IP (v4 & v6).

2.2/ Hybrid of GUE option flags and IPv6 extension headers?

A hybrid approach could be adopted to save header space but still allow extensibility:
* Extensions and options could be divided [Briscoe14] into:
  - those known at the time GUE is standardized (options) and
  - those introduced afterwards (extensions).
* Extensions would need to be self-describing, but options would not. All GUE code could be required to support the base set of options, so it could be hard-coded with their lengths and properties.
* For the base set of options, a single IPv6 destination option could be defined that looked very similar to the GUE option flags and option fields.
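The hybrid approach tries to avoid the TLV overhead listed under Disadvantages while keeping the self-describing option properties listed under Advantages. Both can be made concrete with a small sketch. It follows the IPv6 option-type encoding of RFC2460, Section 4.2: the two high-order bits select the action for an unrecognised option, and the third bit says whether the option data may change en route. It also applies the 8-octet padding of a Destination Options header. The helper names are hypothetical, and the sketch ignores any extra alignment padding that individual options may require.

```python
# Sketch only: the self-describing parts of an IPv6 option (RFC2460, Sec. 4.2),
# plus the padded size of a Destination Options header. Helper names are hypothetical.

UNKNOWN_OPTION_ACTION = {
    0b00: "skip the option and continue processing the header",
    0b01: "discard the packet",
    0b10: "discard the packet and send an ICMP Parameter Problem",
    0b11: "discard the packet; send ICMP Parameter Problem only if dst is not multicast",
}


def option_properties(option_type: int):
    """(action if unrecognised, True if option data may change en route)."""
    act = (option_type >> 6) & 0b11   # two high-order bits
    chg = (option_type >> 5) & 0b1    # third-highest bit
    return UNKNOWN_OPTION_ACTION[act], bool(chg)


def next_option_offset(options: bytes, offset: int) -> int:
    """Skip any option, known or not, using only its type and length octets."""
    if options[offset] == 0:          # Pad1 is the one option without a length octet
        return offset + 1
    return offset + 2 + options[offset + 1]


def dest_opts_size(option_data_lengths) -> int:
    """Octets for a DO header carrying options with these data lengths.

    2 octets of DO header + 2 octets of type/length per option, rounded up to a
    multiple of 8 (ignores extra alignment padding individual options may need).
    """
    raw = 2 + sum(2 + n for n in option_data_lengths)
    return -(-raw // 8) * 8
```

Take a single option carrying a 4-byte VNID (an illustrative size, not one taken from the GUE drafts). It fits exactly in one 8-byte DO header. Two such options need 16 bytes. The equivalent GUE optional field costs the 4 bytes plus a flag bit.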
However, this still leaves one major disadvantage. A fixed length field is inappropriate for many of the base options (packet IDs, the size of crypto material), which need to be able to scale in the future (see section A2.9/ "Hard-coded option lengths do not scale" in my separate tech review email).

2.3/ Why only IPv6 next-header space is needed

As already pointed out, the IPv6 next-header number-space is a superset of the IPv4 protocol number-space. But why are IPv4 and IPv6 the only alternatives being considered anyway? What about other network protocols, or a potential future version of IP?

1) GUE transport encap: One could answer that a GUE header chain must break into a chain of IPv4 & IPv6 extensions and protocols, because UDP/GUE is only applicable with an IPv4 or IPv6 outer. But is this latter point really true? The GUE draft uses IPv4 or v6 as the outer, but it doesn't actually say GUE MUST only be used with an IPv4 or v6 outer. Is it possible to encapsulate UDP with network protocols other than IPv4 & v6? Theoretically, UDP is possible over other network protocols, e.g. UDP over IPX [RFC1791], as long as a new UDP checksum pseudoheader is defined with respect to the outer network protocol. But other network protocols (e.g. MPLS) use IP to demux transport protocols, because they don't support this themselves.

2) GUE network encap: This does not break into an existing chain of headers. The GUE header is always last in the outer header, so it only has to point to the first encapsulated header (which the GUE draft already points out can be Ethernet, IPv4, IPv6, L2TP, etc). However, it makes sense to reuse the same extension headers as the transport mode.

Either way, to use UDP/GUE with another network protocol (e.g. a future version of IP), it would probably be best to define a protocol similar to UDP/GUE. It would be modified to integrate with the new network protocol and designated by a different well-known port (or equivalent concept).
2.4/ GUE: a potential solution to the IPv6 extension header discard problem

It is well-known that the IPv6 extension header arrangement doesn't work, because it is incompatible with the value-system of many of those who build middleboxes. Writing the rules more forcefully [RFC7045 <https://tools.ietf.org/html/rfc7045>] is unlikely to help. Experiments in [RFC7872 <https://tools.ietf.org/html/rfc7872>] report that 11%-21% of packets with DOs are discarded, as are 39%-54% of packets with HbH. Theoretically, DOs are not meant to be looked at until the destination, but 2%-47% of the above DO discards were dropped in another AS. Less surprisingly, 31%-51% of the HbH drops were in another AS.

The insight of the proposed approach is that router code has to be updated anyway to understand a new extension header or option. So why not also include instructions on where to look for the updated header? Then new extensions and option headers can be located where legacy code never looks (e.g. encapsulated within a UDP/GUE header). So legacy routers won't have to use their slow path to parse extension headers, dropping many in the process, only to discover that they don't have the code to handle the ones they do not drop anyway.

For GUE, the main benefit is to allow destination options to be used without risk of the high levels of packet discard. However, beyond the immediate concerns of GUE, this approach is likely to give a workable way for HbH options to be usable without the risk of discard. Of course, any hop-by-hop option within the GUE header will not actually be processed by every IP hop:
* it will be seen only by those hops that implement code to look within the GUE header for HbH options, and
* it will be processed only by the subset of those nodes that have logic to act on it.
Nonetheless, that is the same applicability as IPv6 Hop-by-Hop options today (but without the discard problem), because it has now been recognised that:
* the HbH extension header is only accessed by nodes configured to do so [draft-ietf-6man-rfc2460bis-05];
* and then an option is only processed by those nodes that have the logic to understand it.

Does this just defer the problem? Yes and no:
* Yes: Firewalls will quickly start to inspect within GUE encapsulations, and they will probably still drop anything unknown. Further, developers of firewall rules will still take years to get round to assessing each new header, and until then they will block it; they have little incentive to behave otherwise. However, as I have said elsewhere, firewall bypass ought to be a non-goal for GUE.
* No: the purpose of hiding (encapsulating) extension headers within GUE is solely to prevent unintentional discard (by those who place performance above function), while still allowing intentional discard (by those who place precaution above function).

This approach was originally proposed for GUT (see S2.4 of [Briscoe14]). There, it provided a structured way to add options to GUT and to solve the extension header discard problem. [Briscoe14] acknowledges that it was first proposed by Rob Hancock during SHIM4 discussions in 2008.

[Briscoe14] Briscoe, B., "Tunnelling through Inner Space", Position Paper, Proc. Internet Architecture Board (IAB) Stack Evolution in a Middlebox Internet (SEMI) Workshop, January 2015, <https://www.iab.org/wp-content/IAB-uploads/2014/12/semi2015_briscoe.pdf>.

2.5/ Handling Hop-by-Hop headers in the original packet

In the proposal to use IPv6 extension headers within the GUE header (S2.1/), the proposed structure of a GUE packet was shown for the network and transport encap cases. In both cases, IPv6 HbH extension headers (if present in the original inner packet) were shown between the outer IP header and the UDP/GUE header.
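Where an arriving Hop-by-Hop header lands in the two header layouts of 2.1/ can be sketched as a small function over lists of header names. This is only an illustration under several assumptions. Headers are plain strings, listed outermost first. A HbH header can only appear immediately after the IPv6 header, as IPv6 requires. The function name is hypothetical.

```python
# Sketch only: where an arriving HbH header ends up under each GUE encap mode,
# following the header layouts in 2.1/. The function name is hypothetical.

def gue_encap_layout(mode, packet, gue_exts=()):
    """Return the header sequence after GUE encapsulation, outermost first.

    packet: headers of the arriving packet, e.g. ["IPv6", "HbH", "DestOpts", "TCP"].
    gue_exts: optional IPv6 extension headers added inside the GUE header length.
    """
    added = ["UDP", "GUE", *gue_exts]
    # IPv6 only allows a HbH header directly after the IPv6 header
    has_hbh = len(packet) > 1 and packet[1] == "HbH"
    if mode == "network":
        # New outer IP header, plus a copy of any arriving HbH header, then UDP/GUE;
        # the whole original packet (including its own HbH) is carried untouched.
        outer = ["IP", "HbH (copy)"] if has_hbh else ["IP"]
        return outer + added + list(packet)
    if mode == "transport":
        # UDP/GUE is inserted after the existing IP header and any HbH header.
        split = 2 if has_hbh else 1
        return list(packet[:split]) + added + list(packet[split:])
    raise ValueError(f"unknown mode: {mode}")
```

In neither mode does the arriving HbH header move inside the GUE header length.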
Why don't we encapsulate arriving HbH headers within the GUE header, given it has just been said that this would protect them from the high levels of discard reported in [RFC7872 <https://tools.ietf.org/html/rfc7872>]? The answer is that, if a packet already contains a HbH option when it arrives at a GUE encapsulator, GUE has to assume it was intended to be where it has been placed. A sender that wants to protect a HbH option within a GUE header has to use transport encap mode to put it there. That would also be the only way to tell a GUE network encap to include the HbH option in the outer (see C4.1/ "Ensuring certain GUE headers are copied when a GUE packet is tunnelled" later).

2.6/ Solving the problem of tunnel fragmentation in IPv4

When the outer is IPv4, S.4.1 of draft-herbert-gue-extensions <https://tools.ietf.org/html/draft-herbert-gue-extensions-00#section-4.1> motivates segmenting the inner packet across the tunnel, rather than using IPv4 fragmentation. The motivation makes sense, but why design a completely new fragmentation protocol just for GUE? Why not simply use an IPv6 Fragment header after the GUE header? This is a good example of how GUE could allow encap/decap to re-use IPv6 code, even if the outer header is IPv4.

Also see section A4.6/ "Is orig-proto field necessary in the fragmentation option?" in my separate review of GUE 'as-is'. There I question the need for an 'orig-proto' field in the GUE-specific fragmentation option. That field is the only difference from the IPv6 Fragment extension header, other than the IPv6 TLV structure (and even there, the length field is inexplicably absent from the Fragment header).

3/ CONTROL MESSAGES

3.1/ Is the C flag necessary?

The C flag essentially says "The encapsulated message is for the remote tunnel endpoint, not to be forwarded." However, all the fields associated with the GUE header are already "for the remote tunnel endpoint". So the C flag really only says "Do not forward anything after decap."
Surely that can be arranged by simply not providing anything to forward:
* In the network encap case, if there is no inner IP header, the GUE endpoint knows there is nothing to forward.
* In the transport encap case, if the GUE ctype/protocol field contains a valid Ctype and there is no encapsulated payload beyond the GUE header, then surely the GUE endpoint knows there is nothing to forward. Strictly, there is still an IP header with no payload to forward. However, it would seem sensible for a transport decap to forward an empty IP header only when the Ctype/Proto field is "No next header", and not otherwise.

3.2/ Reliable delivery of control messages

Most tunnel control messages will need to be delivered reliably and in order (e.g. key agreement, any configuration agreement, consistent application of connection semantics by both ends). There are a couple of approaches for adding reliable delivery:
a) Add reliability on top of UDP (complex): i.e. add GUE control message type(s) for acknowledgements and some way to identify which message is being ack'd.
b) Get reliable delivery for free by using TCP for control messages that need it (by a GUE endpoint listening on TCP port 6080?).

In both cases, one would still need to tag the unordered flow of datagrams in the data plane to synchronize them with control commands. For example, each datagram could carry a key tag to sync with a re-key arranged over the reliable control channel.

It is unlikely that many unreliable (UDP/GUE) control messages will be defined.
None have been so far AFAICT. But I suspect the two we defined when designing GUT will be needed for GUE (S.3.5.1 of the GUE draft already mentions the latter as a possibility, but it is yet to be specified):
* keepalives (to hold open middlebox flow state);
* echo request/reply (testing the path, and testing responder support for GUE).
These examples suggest that the only occasions when reliable control messages are not needed may be when trying to mimic a GUE data message. If reliable in-order delivery is needed, I suggest you use TCP: another example of not reinventing the wheel.

4/ TUNNELS IN TUNNELS

4.1/ Ensuring certain GUE headers are copied when a GUE packet is tunnelled

Section A4.4/ "Tunnels in tunnels" in my review of GUE poses the problem of how a GUE-in-GUE encapsulator knows which GUE options to copy and which not. Similarly, which IPv6 extension headers should it copy and which not?

[Caveat: I originally promised that the solutions in each section would be independent of the other sections. In the following I will partially break that promise, and assume my proposed solution where GUE options have the semantics of IPv6 extension headers (section C2). As already pointed out in my review of GUE "as-is" (section A4.4), each GUE option would need a self-describing way to say whether it should be copied to the outer when tunnelled. But I cannot see an easy way to do this without completely changing the GUE option flags, whereas IPv6 extensions already have a categorisation system that can be extended. Nonetheless, if GUE does not adopt my proposal to use IPv6 extension header semantics, it could develop its own similar option categorisation system.]

Consider a naive rule (that could at least be used when encapsulating IPv6 extensions): copy HbH options to the outer, but not DOs.

Problem: Even if GUE options were categorised as HbH or Dest, this naive rule would not be sufficient.
For instance, I believe a VNID would sometimes need to be copied to the outer (is that true?), but it is intended for the inner destination, so it would not be categorised as HbH.

Solution: We need a new category of node that is more specific than per-hop but less specific than destination-only. Answer: an encapsulator. In the context of IPv6, we need to define a new Encapsulator Options (EO) extension header. Then we can have an improved rule: an encapsulator copies all HbH options and all EOs, but not DOs. That is, each category of node acts on each type of extension header as follows:

              HbHO   EO   DO
    hop        Y     N    N
    encap      Y     Y    N
    dest       Y     Y    Y

Note that any node is free to categorise itself as it sees fit, and anyway any node can look in any header it chooses to. These categories are purely so that a node can choose to process headers efficiently.

The EO would be appropriate for some other existing or potential extensions, e.g.:
* the tunnel encapsulation limit option (although everyone seems to have coped without it). RFC2460 says it should be carried as a DO, but admits that this is an awkward exception to the general rule (see Section 4.1.1 bullet (a));
* the experimental ConEx option, which had to be defined as a DO, even though the designers wanted it to be copied to the outer by encapsulators [RFC7837 <https://tools.ietf.org/html/rfc7837>];
* Path layer extensions, as proposed in the PLUS/SPUD discussions;
* Possibly others?

Note that standardization of a new Encapsulator Options (EO) extension header would face a high bar [RFC6564]. However, the above gives a strong case, and GUE provides a deployment context that should avoid the discard problem that RFC6564 was trying to address.

5/ CHECKSUMS

5.1/ Avoiding double checksum coverage, except when essential for transition

The GUE draft misses a trick that could offer a way through the checksum maze. Below, I give the solution first, then the rationale.
The wire protocol proposal in the final section (C6/) gives concrete examples of how this could be achieved, but first I describe it in more abstract terms.

The proposed approach is designed for an Internet that should be gradually adopting zero UDP checksums, for IPv6 as well as v4 [RFC6935, RFC6936].

Initially (i.e. without the benefit of any cached path state about zero checksum support), a GUE encapsulator (network or transport):
a) MUST place a zero checksum in the outer UDP header, and
b) {MAY|SHOULD|MUST} (?) introduce a GUE-specific checksum header that solely covers the headers added by GUE encap.
   For transport encap: (?) = MUST?
   For network encap: (?) = MAY/SHOULD?

A GUE decapsulator (before decap):
a) MUST ignore the checksum in the UDP outer even if it is non-zero. The GUE encap was required to set it to zero, so it can only have become non-zero at a middlebox that did not check whether it was zero before changing it.
b) MUST verify the GUE headers using the GUE-specific checksum header (if present).

A GUE encapsulator might discover (e.g. using echo request/reply control message testing) that zero UDP checksums are being discarded on the path to the decapsulator. In that case, there are two alternative approaches:

Alt.#1 On detecting a path that discards zero checksums, a GUE encap:
a) can place a full UDP checksum in the UDP outer just to traverse the path. Usually, delta computation of the outer checksum from the encapsulated checksum is possible, at least for ones'-complement (Internet) checksums, i.e. not for SCTP (which uses CRC32c) or fragments.
b) still {MAY|SHOULD|MUST} (?) introduce a GUE-specific checksum header that solely covers the headers added by GUE encap.
A GUE decapsulator (before decap):
a) still MUST ignore the outer UDP checksum.
b) MUST verify the GUE headers using the GUE-specific checksum header (if present).
Alt.#2
   On detecting a path that discards zero checksums, a GUE encap:
      a) MUST include a full checksum in the UDP outer over the whole packet; and
      b) MUST include a GUE-specific 'UDP Checksum Operative' flag to indicate that the outer UDP checksum is operative; and
      c) does not need a GUE-specific checksum header.
   A GUE decapsulator (before decap):
      a) MUST check for the 'UDP Checksum Operative' flag;
      b) if present, it MUST verify the full checksum in the UDP outer over the whole packet;
      c) otherwise it solely verifies the GUE headers using the GUE-specific checksum header (if present).

In checksum processing terms, the two alternatives are the same when the path supports zero checksums, but otherwise they compare as follows:

      Alt.#1  Alt.#2
       YYY     YYY    encap calculates full UDP checksum
        Y       -     encap calculates GUE checksum (a subset of the full checksum)
        -      YYY    decap verifies full UDP checksum
        Y       -     decap verifies GUE checksum (a subset of the full checksum)

More Y's are shown against the full UDP checksum calculations to visualise that they require more processing. Thus, when both ends are considered, it can be seen that Alt.#1 requires less processing.

Alt.#2 also requires one more bit of information than Alt.#1. This is not so significant for GUE 'as-is', which can just assign a spare flag. However, it will be seen later (in section 6/ "Wire Protocol") that I have managed to squeeze all the commonly required fields, including the GUE header checksum, into the base GUE header, but only with Alt.#1. The extra bit for Alt.#2 makes my proposed new GUE base header either inefficient or clumsy. Either way, whether using the GUE wire protocol 'as-is' or as I propose, I would recommend Alt.#1.

The RFCs assume (without citing any evidence) that middleboxes will drop an IPv6/UDP datagram with a zero checksum, because such a datagram is disallowed [RFC2460]. It would be useful to check how frequently middleboxes actually drop zero-checksum IPv6/UDP datagrams.
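Alt.#1 a) relies on delta-computing the outer UDP checksum. The Python sketch below illustrates why that is cheap for ones' complement checksums: because the encapsulated transport checksum is already valid, the ones' complement sum of the whole encapsulated segment (checksum field included) equals the complement of the inner pseudoheader sum, so the encapsulator only needs to sum the headers it adds. It assumes transport encap of a UDP segment over IPv4, a 4-octet GUE base header with no extension headers, and made-up addresses, ports and payload. It illustrates the arithmetic only; it is not taken from the GUE draft or any implementation.

```python
import struct


def ones_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"                          # pad an odd trailing octet
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:                           # fold end-around carries
        total = (total & 0xFFFF) + (total >> 16)
    return total


def add16(a: int, b: int) -> int:
    """Ones' complement addition of two folded 16-bit sums."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def pseudo4(src: bytes, dst: bytes, length: int) -> bytes:
    """IPv4 pseudoheader for UDP (protocol 17)."""
    return src + dst + struct.pack("!BBH", 0, 17, length)


SRC, DST = bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2])  # made-up hosts

# Inner UDP segment with a valid checksum (made-up ports and payload).
payload = b"hello!"
inner_len = 8 + len(payload)
inner_pseudo = pseudo4(SRC, DST, inner_len)
seg = bytearray(struct.pack("!HHHH", 5000, 80, inner_len, 0) + payload)
struct.pack_into("!H", seg, 6,
                 ~add16(ones_sum(inner_pseudo), ones_sum(bytes(seg))) & 0xFFFF)
inner_seg = bytes(seg)

# Headers added by transport-mode GUE; the outer UDP checksum field starts at 0.
gue_hdr = bytes([17, 0, 0, 0])         # hypothetical base header: NH=17 (UDP)
outer_len = 8 + len(gue_hdr) + len(inner_seg)
udp_hdr = struct.pack("!HHHH", 40000, 5555, outer_len, 0)
added = pseudo4(SRC, DST, outer_len) + udp_hdr + gue_hdr   # even length

# Direct: sum every octet, payload included.
direct = ~ones_sum(added + inner_seg) & 0xFFFF
# Delta: a segment with a valid checksum sums to ~(its pseudoheader sum),
# so only the added headers and the inner pseudoheader need summing.
delta = ~add16(ones_sum(added), ~ones_sum(inner_pseudo) & 0xFFFF) & 0xFFFF

print(direct == delta)  # True
```

The same shortcut does not work for SCTP's CRC32c. It also needs the encapsulated segment to start at an even offset within the checksummed data, which holds here because the UDP header (8 octets) and GUE base header (4 octets) both have even length.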
Precise GUE checksum coverage would depend on the GUE mode:
   * In transport encap mode, it would not include the outer IP header (the inner transport is responsible for covering that), because GUE didn't add it.
   * In network encap mode, it would include a pseudoheader covering the important parts of the outer IPv6 header (added by GUE), but this would not be needed for IPv4 (which has its own header checksum, even though the coverage of inner TCP and UDP checksums normally provides double coverage of the important fields in an outer IPv4 header as a pseudoheader).

The full rationale for the above solution starts further back in the checksum maze, and proceeds as follows.

As section 2.3 of RFC6936 points out, normally it is less important if a network encapsulation gets corrupted (resulting in discard or circuitous delivery) than if a transport encap gets corrupted (which could crash an innocent application). So, with GUE, particularly with the transport encap, it is more important to checksum the added GUE headers.

As a thought experiment, let's set aside middleboxes for a moment, and consider what checksumming would be necessary and sufficient for GUE, taking each encap mode separately:
   * net encap: GUE should not need to take responsibility for checksumming the encapsulated packet. It adds IP+UDP+GUE+options, so the outer UDP checksum field solely needs to cover the headers that have been added (including the outer IP as a pseudoheader);
   * transport encap: Because the UDP+GUE+options are inserted into an existing IP packet, after its IP header, again, GUE should not be responsible for checksumming any of the pre-existing headers or payload, only the headers it adds (in this case not including the outer IP via a pseudoheader).

Still setting aside middleboxes, let's assume we use something like UDP-Lite to put a partial checksum in the UDP-Lite outer that just covers the added headers. Then:
   * The only processes that need to set or check the outer UDP-Lite checksum are at the GUE endpoints, which know how many headers they are adding or removing;
   * Therefore, as long as checksum coverage coincides with the fields that GUE adds/removes, GUE needs no additional protocol field to communicate checksum coverage.

GUE decap, in either mode, can verify the outer UDP-Lite checksum before removing the headers that it covers. Once they are removed, the GUE decap does not need to verify the checksum of the remaining packet. It just forwards the packet, because the inner checksum will be checked when it reaches its destination (in the case of transport encap, this will happen when it is re-submitted to the same machine's stack).

This approach ensures GUE checksumming is simple and effective:
   * it always covers all the additional bits added by GUE tunnelling, and no more;
   * there are no configuration choices, so no chance of bugs due to inconsistent understanding at the two ends of a tunnel.

Back to reality: middleboxes do exist, and some will discard packets carrying the UDP-Lite protocol number as unrecognized. Given a GUE endpoint only ever talks to another GUE endpoint, we could use the same header structure as UDP-Lite, but with the UDP protocol number (17). However, this would run into the same middlebox problems that required UDP-Lite to use a different protocol number from UDP:
   * a middlebox on the path might attempt to verify the outer checksum without knowing which fields GUE calculates this checksum over, because it is not party to the same knowledge as the GUE code. So it would discard all GUE packets, thinking they are corrupt;
   * other middleboxes (NATs in particular) might incrementally update the outer checksum to reflect any changes they make to other fields.

(I also couldn't find a reference to any study that measures the prevalence of these problems, which we ought to have before dancing around them.)
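The per-mode coverage rules at the start of this rationale (restated for the proposed wire protocol in 6.1/) can be sketched in Python as a checksum over only the headers GUE adds, which the decap verifies while never consulting the outer UDP checksum, as in Alt.#1. Field offsets follow the proposed 6.1/ base header. Two details are assumptions of this sketch, not part of the proposal: the UDP Checksum field is read as zero (so an encap filling it in under Alt.#1, or a middlebox rewriting it, cannot upset the GUE check), and, by analogy with UDP, a computed value of 0 is sent as 0xFFFF because 0 means "disabled". Ports, addresses and lengths are hypothetical.

```python
import struct
from typing import Optional, Tuple

Addrs = Optional[Tuple[bytes, bytes]]  # (src, dst) of an outer IPv6 header


def ones_sum(data: bytes) -> int:
    """16-bit ones' complement sum, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def covered(mode: str, outer_ip: int, udp_hdr: bytes, gue_hdrs: bytes,
            v6_addrs: Addrs = None) -> bytes:
    """Octets covered by the GUE checksum: UDP header plus all GUE headers."""
    udp = udp_hdr[:6] + b"\x00\x00"                  # assumption: UDP Checksum read as 0
    gue = gue_hdrs[:2] + b"\x00\x00" + gue_hdrs[4:]  # GUE Checksum field zeroed
    if mode == "network" and outer_ip == 6:
        # IPv6 pseudoheader: src, dst, upper-layer length, 3 zero octets, NH=17.
        src, dst = v6_addrs
        udp_len = struct.unpack("!H", udp_hdr[4:6])[0]
        return src + dst + struct.pack("!I3xB", udp_len, 17) + udp + gue
    return udp + gue  # transport encap, or network encap with an IPv4 outer


def gue_checksum(mode: str, outer_ip: int, udp_hdr: bytes, gue_hdrs: bytes,
                 v6_addrs: Addrs = None) -> int:
    c = ~ones_sum(covered(mode, outer_ip, udp_hdr, gue_hdrs, v6_addrs)) & 0xFFFF
    return c or 0xFFFF  # assumption: 0 is reserved to mean "disabled"


def gue_verify(mode: str, outer_ip: int, udp_hdr: bytes, gue_hdrs: bytes,
               v6_addrs: Addrs = None) -> bool:
    """Decap check; the outer UDP checksum is never consulted (Alt.#1)."""
    stored = struct.unpack("!H", gue_hdrs[2:4])[0]
    if stored == 0:
        return True  # sender disabled checksumming of the GUE headers
    return stored == gue_checksum(mode, outer_ip, udp_hdr, gue_hdrs, v6_addrs)


# Hypothetical network encap of an IPv6 packet over an IPv6 outer.
src = bytes.fromhex("20010db8" + "00" * 11 + "01")      # 2001:db8::1
dst = bytes.fromhex("20010db8" + "00" * 11 + "02")      # 2001:db8::2
udp = struct.pack("!HHHH", 40000, 5555, 8 + 4 + 40, 0)  # made-up ports/length
base = bytes([41, 0, 0, 0])  # Next Header 41 (IPv6), HLen 0, checksum unset

c = gue_checksum("network", 6, udp, base, (src, dst))
gue = base[:2] + struct.pack("!H", c)
udp_rewritten = udp[:6] + b"\x12\x34"   # UDP checksum later filled in/rewritten
bad = bytes([gue[0] ^ 0x01]) + gue[1:]  # corrupted Next Header field

print(gue_verify("network", 6, udp_rewritten, gue, (src, dst)),  # True
      gue_verify("network", 6, udp, bad, (src, dst)))            # False
```

Transport encap and network encap with an IPv4 outer cover identical octets here, matching the rule that only network encap with an IPv6 outer adds a pseudoheader.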
The insight from thinking about the maze of checksum problems in the above way is that the inner transport's checksum together with a checksum over the added GUE headers is always sufficient. The full UDP checksum is never necessary for data integrity; it is only needed to traverse middleboxes. So a GUE decap doesn't have to verify it. This was the train of thought that led to the solution proposed at the start of this section.

*6/ WIRE PROTOCOL*

*6.1/ A Proposed Redesign for the GUE Wire Protocol*

Taking all the above ideas together, the GUE wire protocol would be rather rudimentary (which is good):

               0               15 16              31
               +--------+--------+--------+--------+
               |     Source      |   Destination   |  \
               |      Port       |      Port       |   |
               +--------+--------+--------+--------+    > UDP header
               |     Length      |  UDP Checksum   |   |
               |                 | (default zero)  |  /
               +--------+--------+--------+--------+
             / |  Next  |  GUE   |  GUE Checksum   |  \ GUE base header
            |  | Header |  HLen  |                 |  /
            |  +--------+--------+--------+--------+
            |  |  Next  | Length |                 |  \
GUE Headers <  : header |        |                 :   |
            |  +-----------------+                 +    > GUE extension headers
            |  :                                   :   |  (optional)
             \ |                                   |  /
               +-----------------------------------+

   * There is deliberately no GUE version field; we can just use a different well-known port for a new version.
   * UDP Length is as for standard UDP, i.e. it includes the UDP header, GUE headers and the encapsulated payload.
   * GUE HLen is the total length of all GUE extension headers, in 8 octet units, not including the GUE base header or the encapsulated payload.
   * The length of the GUE base header is always 4 octets.
   * Next Header has the same semantics as the IPv6 Next Header field.
   * GUE would use the same identifier space as IPv6 for its extension headers.
   * The last Next Header field of the GUE Headers points to:
      - Network encap mode: the encapsulated network protocol (e.g.
        IPv4, IPv6, Ethernet)
      - Transport encap mode: the start of any chain of IPv6 extension headers in the original encapsulated packet, ending with the upper layer transport protocol
   * If GUE HLen = 0, the last Next Header in the GUE Headers will be the one in the GUE base header.
   * The GUE Checksum solely covers the UDP and GUE headers. Solely in the network encap case with an IPv6 outer, it also covers the outer IPv6 pseudoheader.
   * The sender can set the GUE Checksum to zero to disable checksumming of the GUE headers.
   * Any GUE decap ignores the UDP Checksum (whether or not it is zero), assuming the Alt.#1 checksum solution.

*6.2/ 'UDP Checksum Operative' flag*

This section considers whether an efficient wire protocol could be designed for the Alt.#2 checksum solution (which is not recommended).

Even though the 'UDP Checksum Operative' flag is solely for transition, it would be nice to squeeze it into the 4 octet GUE base header. However, I cannot currently find a way that pleases me. At first I thought GUE Checksum = 0x0000 could flag this, but that value would be better used to mean the GUE checksum is disabled, which has to be possible whether the UDP checksum is operative or not. AFAICT, that leaves 4 possibilities:
   * burn 8 octets of GUE extension header for one extra bit (inefficient, but for transition only);
   * Cut GUE HLen to 7 bits, and redefine the most significant bit as a 'UDP Checksum Operative' flag (inelegant and permanent, even though only needed for transition);
   * Use the last 2 octets of the GUE base header for flags (with only the 'UDP Checksum Operative' flag defined initially), and define a GUE extension header for the GUE checksum (inefficient for the common case).
   * Register a special IPv6 Next Header value (S) solely for GUE, which redefines the GUE base header as:

        +--------+--------+--------+--------+
        |   S    |  GUE   |  Next  | Flags  |  \ GUE base header
        |        |  HLen  | Header |        |  /
        +--------+--------+--------+--------+

This last idea is a variant of the idea described earlier under C2.2/ "Hybrid of GUE option flags and IPv6 extension headers".

The recommended solution is to use none of these, because Alt.#1 is more efficient in checksum processing and requires no extra bit for a 'UDP Checksum Operative' flag.

*6.3/ Compressed UDP/GUE header: 'GUTless'*

Given the second 4 octets of the UDP header are redundant, it would be nice to be able to replace them with the proposed 4 octet GUE base header, as proposed for 'GUTless <https://www.ietf.org/mail-archive/web/tsvwg/current/msg09854.html>' back in Feb 2010, and illustrated below. This would also preserve 8-octet alignment. We could allocate a new protocol number for GUTless and use that in middlebox-free environments, as a more efficient drop-in replacement for UDP/GUE, but with the same GUE extension headers.

     0               15 16              31
     +--------+--------+--------+--------+
     |     Source      |   Destination   |  \
     |      Port       |      Port       |   |  GUTless base header
     +--------+--------+--------+--------+    > with new protocol number
     |  Next  |GUTless |    Checksum     |   |  as a UDP replacement
     | Header |  HLen  |                 |  /
     +--------+--------+--------+--------+
     |  Next  | Length |                 |  \
     : header |        |                 :   |
     +-----------------+                 +    > GUE extension headers
     :                                   :   |  (optional)
     |                                   |  /
     +-----------------------------------+

GUTless on a non-GUE port with a Next Header value of "no next header" (59) also serves the purpose of UDP-Lite.

That's "all" Folks

Bob

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/