[nvo3] Review ptA: Technical draft-ietf-nvo3-gue-04
Bob Briscoe <ietf@bobbriscoe.net> Sat, 13 August 2016 13:25 UTC
To: Tom Herbert <tom@herbertland.com>, Lucy Yong <lucy.yong@huawei.com>, Osama Zia <osamaz@microsoft.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <67e76ab1-2f5b-4906-4cce-f7c176fd49a0@bobbriscoe.net>
Date: Sat, 13 Aug 2016 14:25:31 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/U6ETZ2Ohf0jLOB3o5wRdDKtAfqU>
Cc: "nvo3@ietf.org" <nvo3@ietf.org>
Subject: [nvo3] Review ptA: Technical draft-ietf-nvo3-gue-04

Tom, Lucy, Osama,

This draft looks like it could become important, so I wanted to review it comprehensively, particularly given my experience contributing to the design of Generic UDP Tunneling (GUT; draft-manner-tsvwg-gut-02, expired Jan 2011), which is very similar - as the GUE draft acknowledges.

In preparation for this review:
* I re-read all the (very useful) tsvwg mailing list comments about GUT
* I had to read a couple of dozen references (to catch up on the last few years in this area)
* it has required some pretty deep thought, which has led me to rewrite parts of it multiple times.

I'm afraid my review is about as long as the GUE draft itself. I should probably turn all this into an Internet draft, but it's email for now. So I've split it into 3 parts in separate emails:
A) Technical Review of GUE 'As-is'   <--- This email
B) Editorial Review
C) Redesign of parts of GUE

* Please read the review of "GUE as-is" first (ptA); it hopefully gives solid arguments for why some parts of GUE's design are problematic.
* I'm afraid that is rather an understatement - I think I have undermined nearly every part of the wire protocol: the version field, the C flag, the Hlen field, and the flag-based options. And I believe the semantics of the one remaining part (the proto/ctype field) miss an opportunity to be a lot more powerful.
* Nonetheless, these are only my opinions at this stage. Therefore I have disciplined myself to refer out to ptC for all redesign ideas, so ptA remains solely about GUE "as-is".
* I hope you will accept the review in the spirit intended - constructive criticism to improve the final result, although I appreciate you were probably hoping GUE was nearly done.
* I'm not proprietorial about any of the ideas I give in the redesign - they are offered for the WG to use as it chooses.

I don't really want to be working on encapsulation stuff myself; I just end up having to because a) encap is fundamental to real networking and it's always not quite done right, which makes everything else hard; and b) encap is often the best way to get ahead of middlebox evolution.

I should add that:
* I don't generally follow the nvo3 (or intarea) lists, so apologies if some of my points are duplicates.
* Nonetheless, this means my review is a good test of whether the draft is comprehensible to an outsider.
* After I wrote this, I read Adrian Farrel's RTG Dir QA review. I don't think I have directly duplicated any of his comments. We were uneasy about some of the same things, but I have tried to complement criticism with alternative design proposals (ptC).
* I noticed Adrian encouraged you to get review from the transport area. I'm on the transport area review team, but I haven't been asked to do an "official" transport area review of GUE. Whatever, the problems I have uncovered are wider - best categorised as transport, protocol design (encapsulation and extensibility), ops and security.

EXEC SUMMARY (of ptA Technical)

I've split the tech review up into the following parts, and I've highlighted here where there are particularly serious problems:

1/ Addressing Architecture
For IETF standardization, connection semantics will need to be the rule, not the exception. I know the exception applies where GUE came from - private DCs. However, nvo3 and the IETF more generally have to cope with multi-tenant, multi-admin, and therefore firewalls (and other middlebox crud). I also identify some cases where GUE cannot work that will need to be documented (not show-stoppers).

2/ Wire Protocol
I'm afraid I have unearthed a number of apparently nitty, but actually serious show-stoppers (IMO). E.g. GUEv1 precludes future versions of IP, and GUE extensibility only works while there are no extensions (!).
Also, the semantics of the ctype/proto field preclude some ideas we had in GUT, but without really giving a reason. Perhaps you just hadn't realised some potential uses of GUE that we had in mind. This could be stated as: "Please don't unnecessarily constrain your protocol design solely to the use-case(s) you have in mind." This is as much a problem with the IETF process, which by default tries to constrain a new protocol to the scope of one WG, even when it could be more powerful. I've heard suggestions that GUE ought to move from nvo3 to intarea or tsvwg, which may help, but I don't know which would be better. We should also bear in mind that a more powerful protocol can become a more powerful attack weapon in the wrong hands, so strong security review is also important.

3/ State
Important, but absent from the draft.

4/ Operation
Numerous, but mostly minor problems. The more serious ones are:
* no way for tunnels in tunnels to know which options to copy to the outer, and which not.
* The claim that "GUE permits encap of arbitrary IP protocols" is only true until it encounters a protocol it doesn't know (!).
An improved checksum solution is also presented (in ptC), which can ensure checksum coverage of all non-mutable parts of a GUE packet and traverses middleboxes even if they do not support zero checksums, while at the same time minimising extra processing by generally avoiding duplicate coverage.

5/ Security
I am worried about the new security options in GUE. Because they are introduced within a completely new extension framework, they will introduce a whole set of new security vulnerabilities, flaws and bugs. The security community is stretched enough as it is having to cover what we already have. So it is important to justify why existing security building blocks are insufficient for GUE (IMO, the relevant motivation sections in the GUE extensions draft are insufficient). I also highlight some new points about firewall interactions.

6/ Implementation
Just my little rant about LRO.

Finally, there's one endemic editorial problem that has led to a large number of technical flaws and oversights. Over and over, the differences between the main two modes of usage go unstated and unresolved. There are only two short sections that discuss the two modes separately:
* Section 5.1 Network tunnel encap (adds GUE+UDP+IP outside an existing IP header)
* Section 5.2 Transport layer encap (adds GUE+UDP between an existing transport and an existing IP header).
The majority of the draft is written in the mindset of network tunnel encap, but without saying so. If the reader is keeping both modes in mind, this makes the draft very hard to understand. But also, some fundamental problems (with one mode in some cases and the other mode in other cases) have been overlooked by not considering each mode separately at each stage of the discussion.

TABLE OF CONTENTS

Yes, a ToC for an email!

A/ TECHNICAL PROBLEMS/COMMENTS
1/ Addressing Architecture
  1.1/ Inferring Connection Semantics: the rule not the exception
  1.2/ A Firewall or NAT in front of both ends
  1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT with one external IP
  1.4/ Network decap and transport decap problematic on the same (IP) interface
2/ Wire Protocol
  2.1/ HLEN too small
  2.2/ GUE versions
  2.3/ No need to interpret the protocol field relative to IPv4
  2.4/ No need to restrict interpretation of the protocol field
  2.5/ Missed opportunity to liberalise interpretation of the protocol field
  2.6/ Positioning GUE with respect to existing IPv6 extension headers
  2.7/ Reliable delivery of control messages
  2.8/ Extensibility of the flags and optional fields scheme: doesn't work
  2.9/ Hard-coded option lengths do not scale
  2.10/ Random access to options needs motivating
3/ State
  3.1/ Per-connection state vs. stateless connections but per-tunnel state
  3.2/ Transport encap with Connection Semantics: Flow state management
  3.3/ Keepalives for middlebox flow state
4/ Operation
  4.1/ Transport encap: to GUE or not to GUE?
  4.2/ Hop limit / TTL processing
  4.3/ Error messages
  4.4/ Tunnels in tunnels
  4.5/ SHOULD adjust MTU?
  4.6/ Is orig-proto field necessary in the fragmentation option?
  4.7/ Congestion Control: reductio ad absurdum
  4.8/ Multicast outer -> Implosion on inner destination
  4.9/ Deriving flow entropy from the inner is contrary to "GUE permits encap of arbitrary IP protocols" claim
  4.10/ Flow entropy from encrypted data could weaken the crypto?
  4.11/ No need to constrain flow entropy distribution
  4.12/ No need to constrain flow entropy interpretation
5/ Security
  5.1/ Addresses that are both visible and hidden? Have your GUE and eat it too?
  5.2/ How can the Security option protect a UDP/GUE header from being moved or removed?
  5.3/ What happens when a port scan sends a datagram to port 6080?
  5.4/ Firewalls will still block new/atypical protocols
  5.5/ Transport Encap: Two Passes through a Local Firewall?
6/ Implementation
  6.1/ Practical Large Receive Offload Requirements

A/ TECHNICAL PROBLEMS/COMMENTS

1/ ADDRESSING ARCHITECTURE

1.1/ Inferring Connection Semantics: the rule not the exception

The draft assumes that, as a general rule, the UDP dst. port of a GUE packet will be fixed (6080) and that flow entropy will come from the source port (see the two quoted sections below).

    S.5.11.1. Flow classification
    ... When a packet is encapsulated with GUE, the source port in the outer UDP packet is set to a flow entropy value ...

    S.5.11.2. Flow entropy properties
    The flow entropy is the value set in the UDP source port of a GUE packet. Flow entropy in the UDP source port should adhere to the following properties:

Nonetheless, the draft recognises there will be cases where "connection semantics" have to be applied in order to traverse middleboxes such as firewalls and NATs (but this is only mentioned in the relevant parts of 5.6.1 & 5.6.2 quoted below). Such middleboxes generally only allow "ingress" UDP datagrams if they look like responses to recent "egress" datagram(s). So there has to be a concept of an "initiator" end of the GUE tunnel. Only once the initiator end has sent an "egress" datagram with src:dst ports e:G (from ephemeral port e to the GUE port G) would the GUE encap at the remote "responder" end be able to traverse the middlebox using "ingress" datagrams with src:dst ports reversed (G:e).

    S.5.6.1. Inferring connection semantics
    A middlebox may infer bidirectional connection semantics [...] To operate in this environment, a GUE tunnel must assume connected semantics [...] The source port set in the UDP header must be the destination port the peer would set for replies. In this case the UDP source port for a tunnel would be a fixed value and not set to be flow entropy as described in section 5.11 <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>. The selection of whether to make the UDP source port fixed or set to a flow entropy value for each packet sent should be configurable for a tunnel.

    S.5.6.2. NAT
    In the case of stateful NAT, connection semantics must be applied to a GUE tunnel as described in section 5.6.1 <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.6.1>.

[BTW, I suggest changing the final sentence of the first para in S.5.6.1. (quoted above) to:
    Therefore, in the ingress direction, the destination UDP port would provide flow entropy, while the source port would take the fixed value of 6080 (the converse of the case in section 5.11 <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>).
]

The text quoted from both sections 5.6.1 & 5.6.2 above implies:
a) that the operator of tunnel endpoint(s) can somehow know whether there are any middleboxes within the tunnel;
b) that applying connection semantics is feasible.

Connection semantics feasibility:
* transport encap: relatively easy - it was simple to implement connection semantics in GUT (see code <http://www.netlab.tkk.fi/%7Ejmanner/gut.html> or example in Figure 4 in draft-manner-tsvwg-gut-02 <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>, or see description later under A3.2/ "Transport encap with Connection Semantics: Flow state management"). Nonetheless, without connection semantics, GUE/GUT is even simpler, because it can be stateless.
* network encap: harder (see separate email for my proposed design: C1/ "Stateless Connection Semantics", but until there's a working implementation we have to allow for the possibility that it's not feasible).

Regarding the first question - whether middleboxes (such as firewalls) exist on a path:
* most operators of tunnel endpoints don't know for sure, but they do know that firewalls, etc. are very likely, so they would have to turn on the "middleboxes exist" parameter.
* in one or two important (but private) data centres, the admin might know that there are no firewalls (and certainly no NATs), so she can turn off the "middleboxes exist" parameter. However, that is the exception not the rule.

In summary, connection semantics are essential wherever there might be middleboxes. This implies:
* transport encap: connection semantics are relatively simple, so why not solely standardize this case? The few cases where the operator knows for certain that there are no middleboxes don't need to use connection semantics, but they are in private networks, so they shouldn't be the primary use-case for standardization.
* network encap: Will connection semantics work?
Two possibilities:
  a) if no, the GUE network encap will be pretty useless, given nearly all real networks contain firewalls, etc. There will be no point standardizing the network encap just for a few special private networks that have no middleboxes.
  b) if yes, they will be needed in most real networks, so it should be the default case that is standardized. Then the IETF has to ask, is there any point standardizing a GUE network encap without connection semantics, just for a few controlled environments where the operator knows for sure that there are no middleboxes?

Corollary of all this: A packet is a "GUE packet" if either src or dst port = 6080.

1.2/ A Firewall or NAT in front of both ends

Most firewalls / NATs only allow an incoming UDP datagram in response to a recent outgoing datagram. If there are two such middleboxes, each "protecting" a different endpoint of a GUE tunnel (network or transport encap), then neither end can send an initial GUE datagram. To operate in such an environment, GUE endpoints will need to support STUN [RFC5389].

1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT with one external IP

Two cases:
* For transport encap: every GUE server has to have its own public IP address. Reason: if a NAT-PT with one external IP address (A) sits in front of multiple GUE servers, only one can be reached on the well-known GUE port (6080), because there will be only one address:port combination to address packets to (A:6080). (Dan Wing pointed out this same problem with GUT on the tsvwg ML <https://www.ietf.org/mail-archive/web/tsvwg/current/msg09851.html>). It's not a killer, but it is a limitation to applicability that has to be understood and documented.
* With network encap: Non-issue.
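
To make the addressing points in 1.1 and 1.3 concrete, here is a rough Python sketch. It is only a toy model under my own assumptions: the class and function names are hypothetical, the firewall never times out its pin-holes, and neither the GUE draft nor GUT specifies any of this code. It shows (i) the corollary that either port may be 6080, (ii) a pin-hole that only admits the reversed ports G:e after an egress e:G, and (iii) why only one server behind a single external address can be published on port 6080.

```python
GUE_PORT = 6080  # well-known GUE UDP port, as stated in the draft


def is_gue_packet(src_port, dst_port):
    """Corollary of 1.1: with connection semantics, either port may be 6080."""
    return GUE_PORT in (src_port, dst_port)


class StatefulFirewall:
    """Hypothetical middlebox: ingress UDP is admitted only if it reverses
    the addresses and ports of an earlier egress datagram (no timeouts)."""

    def __init__(self):
        self.pinholes = set()

    def egress(self, src_ip, src_port, dst_ip, dst_port):
        # An outgoing datagram opens a pin-hole for the reverse direction.
        self.pinholes.add((src_ip, src_port, dst_ip, dst_port))

    def ingress(self, src_ip, src_port, dst_ip, dst_port):
        # Admit only datagrams that look like replies to recent egress.
        return (dst_ip, dst_port, src_ip, src_port) in self.pinholes


class PortForwardingNat:
    """Hypothetical NAT with one external address and static port forwards."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.forwards = {}  # external port -> internal server address

    def publish(self, external_port, internal_ip):
        # Each external port can map to only one internal server.
        if external_port in self.forwards:
            return False
        self.forwards[external_port] = internal_ip
        return True


if __name__ == "__main__":
    fw = StatefulFirewall()
    initiator, responder, e = "10.0.0.1", "192.0.2.1", 49564
    print(fw.ingress(responder, GUE_PORT, initiator, e))  # responder first
    fw.egress(initiator, e, responder, GUE_PORT)          # initiator sends e:G
    print(fw.ingress(responder, GUE_PORT, initiator, e))  # reply G:e

    nat = PortForwardingNat("203.0.113.1")
    print(nat.publish(GUE_PORT, "10.0.0.2"))  # first GUE server
    print(nat.publish(GUE_PORT, "10.0.0.3"))  # second GUE server, same A:6080
```

The model deliberately ignores address translation and pin-hole expiry, both of which matter in practice (see 1.2 and the later discussion of keepalives).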

1.4/ Network decap and transport decap problematic on the same (IP) interface

A consequence of using the same well-known port for GUE transport and network encap is that both decaps cannot be deployed at the same IP address.

Thought experiment: This might work by implementing a combined transport/network decap that checked whether there was another IP header in the header chain and:
* if there was, removed the outer IP and the outer UDP+GUE+option headers
* if not, removed solely the outer UDP+GUE+option headers, but not the outer IP.
However, there is nothing to say that a GUE transport encap should not encapsulate a packet that has already been tunnelled in an IP outer (e.g. IPsec AH or ESP). That is, the transport encap would insert a UDP and GUE header between the outer IP and the inner IP, without adding another IP outer. The combined decap would then see a second IP header and wrongly strip the outer IP as well.

It would be safer to use two different well-known ports for transport and network encap. However, I think deploying transport and network encap on the same IP is a corner case we just need to rule as inadmissible. Nonetheless, a sys-admin would get weird behaviour if this did happen, with lots of head-scratching before she realised what had happened. I'm not sure how to mitigate this.

2/ WIRE PROTOCOL

2.1/ HLEN too small

S3.1: The 5-bit Hlen field (counted in 4B units, making the max header length 128B) worries me a lot. Let's not make a similar mistake to when we limited TCP option space to 40B, which has caused enormous grief.

2.2/ GUE versions

S3.1: The hack in GUE v1 to compress out the GUE header for direct encapsulation of IP (v4 or v6) seems neat, but it is also /extremely dangerous/. If GUE becomes successful, it would prevent incremental deployment of any new version of IP starting 0b10, 0b11 or 0b00, because:
* S.5.4 says to drop a packet with an unknown version field, so IP cannot be upgraded independently from GUE code.
* A version of IP starting 0b00 would be mistaken for GUE.
The latter might sound unlikely, but bear in mind that:
* you don't know what ideas might come up in future for using multiple versions of IP - the IP version field could become important.
* a future version of IP might wrap the version field, because 0x0-0x3 are no longer used (a version only has to be a unique tag, it doesn't have to increase).

[Aside: If you prefer an equally dangerous hack (perhaps because you don't believe there will ever be a version of IP beyond v6), you could have reduced the Ver field to just the first bit by making GUEv0 the one without a GUE header, and GUEv1 the one with. This would have given more space for the Hlen field (see my concern in A2.1/ "HLEN too small" above and my idea in a separate email to remove the C flag).]

In the separate email about redesign, I'll describe an alternative approach that always fits the base GUE protocol into 4B, or even within the 8B UDP header (see C6/ Wire Protocol; it comes from an idea to develop GUT into what I called Gutless <https://www.ietf.org/mail-archive/web/tsvwg/current/msg09854.html>, back in Feb 2010).

2.3/ No need to interpret the protocol field relative to IPv4

S3.2.1:
    The protocol number is interpreted relative to the IP protocol that encapsulates the UDP packet (i.e. protocol of the outer IP header).

IPv6 [RFC2460] defines the Next Header field to use the same protocol identifier space as IPv4. There are no IPv4 protocol numbers that are inappropriate for IPv6 (see the IANA protocol number registry <http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml>). Therefore, this should simply say that the protocol number is interpreted as an IPv6 protocol number (and therefore the field would be more appropriately called "Next Header").

2.4/ No need to restrict interpretation of the protocol field

S3.2.1: This draft should not state any restrictions (e.g. those in the second and third paragraphs quoted below) that preclude certain protocol numbers in combination with either an IPv4 or IPv6 outer.

    For an IPv4 header the protocol may be set to any number except for those that refer to IPv6 extension headers or ICMPv6 options (number 58). [...]
    For an IPv6 header the protocol may be set to any defined protocol number except Hop-by-hop options (number 0). [...]

Various implementations are capable of understanding an IPv6 extension or v6-ICMP within an IPv4 header (e.g. [RFC6145 <https://tools.ietf.org/html/rfc6145#section-5.2>]). And any list of restricted header combinations can never deal with newly defined headers. So the only test needed is "Does your code for this combination and order of headers have the logic for the next header?" GUE then only needs to refer to the appropriate action already specified in RFC2460 (quoted below) rather than making up its own rules:

    The Option Type identifiers are internally encoded such that their highest-order two bits specify the action that must be taken if the processing IPv6 node does not recognize the Option Type: [...]
    If, as a result of processing a header, a node is required to proceed to the next header but the Next Header value in the current header is unrecognized by the node, it should discard the packet and send an ICMP Parameter Problem message to the source of the packet, with an ICMP Code value of 1 ("unrecognized Next Header type encountered") and the ICMP Pointer field containing the offset of the unrecognized value within the original packet. The same action should be taken if a node encounters a Next Header value of zero in any header other than an IPv6 header.

There is a sentence at the end of S.3.6 (quoted below) that repeats these unnecessary restrictions. If you agree with me, please also remove it.

    [...] In this case next header must refer to a valid IP protocol for IPv4. No other extension headers or destination options are permitted with IPv4.
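
As a rough illustration of the RFC2460 behaviour quoted in 2.4 above, the following Python sketch decodes the two highest-order bits of an Option Type and applies the single test "does my code have logic for this next header?", independent of whether the outer is IPv4 or IPv6. The function and enum names are hypothetical and the handler table is an assumption for illustration; this is not code from GUE or any real stack.

```python
from enum import Enum


class UnrecognizedOptionAction(Enum):
    """Meaning of the two highest-order bits of an IPv6 Option Type
    when the node does not recognise the option (RFC2460, Section 4.2)."""
    SKIP = 0b00                               # skip over the option
    DISCARD = 0b01                            # silently discard the packet
    DISCARD_AND_ICMP = 0b10                   # discard, always send ICMP
    DISCARD_AND_ICMP_UNLESS_MULTICAST = 0b11  # discard, ICMP unless multicast


def option_action(option_type):
    """Return the action encoded in bits 7-6 of the Option Type octet."""
    return UnrecognizedOptionAction((option_type >> 6) & 0b11)


# ICMPv6 Parameter Problem code for "unrecognized Next Header type encountered".
UNRECOGNIZED_NEXT_HEADER = 1


def dispatch(next_header, handlers, after_ipv6_base_header=False):
    """Single test for any header combination: is there logic for this value?

    `handlers` is a hypothetical table mapping protocol numbers to whatever
    the implementation uses to process them.
    """
    # Next Header 0 is only valid immediately after the IPv6 base header.
    if next_header == 0 and not after_ipv6_base_header:
        return ("discard", UNRECOGNIZED_NEXT_HEADER)
    if next_header not in handlers:
        return ("discard", UNRECOGNIZED_NEXT_HEADER)
    return ("process", handlers[next_header])


if __name__ == "__main__":
    print(option_action(0xC2))  # Jumbo Payload option type
    print(option_action(0x01))  # PadN option type
    print(dispatch(17, {17: "udp", 6: "tcp"}))
    print(dispatch(253, {17: "udp", 6: "tcp"}))
```

Note that `dispatch` follows RFC2460 as quoted; 2.5 below argues that GUE could relax the rule about Next Header value 0.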

2.5/ Missed opportunity to liberalise interpretation of the protocol field

I believe that GUE offers the opportunity to liberalise, rather than restrict, protocol field interpretation. In particular, GUE could allow encapsulation of hop-by-hop options (next header number 0). You might wonder what a HbH option could possibly mean within a GUE header - see C2.4/ "GUE: a potential solution to the IPv6 extension header discard problem" in my separate email about how to use GUE to solve the problem where IPv6 packets with header extensions are highly prone to discard [RFC7872 <https://tools.ietf.org/html/rfc7872>].

2.6/ Positioning GUE with respect to existing IPv6 extension headers

The draft needs to state rules for where GUE encapsulation fits in the order of a chain of any IPv6 extension headers already present in an arriving IPv6 packet. Below, this question is considered for both types of encapsulation, and in both cases it can be seen that the UDP/GUE header would not necessarily be the first header after an IPv6 outer.

* Network encap: According to my reading of RFC2473, certain IPv6 extension headers in an arriving IPv6 packet should (theoretically) be copied as extension headers for the outer:
  a) a Hop-by-Hop Options header (depending on the encap configuration, but a jumbogram option would have to be copied)
  b) a Routing header (depending on the encap configuration)
  c) the Tunnel Encapsulation Limit Option (within a Destination Options Extension Header)
  - HbH options are pretty academic these days, given they cause about 39-54% discard [RFC7872 <https://tools.ietf.org/html/rfc7872>]. However, if there is one on the inner, I guess we should still say that a GUE network encap should copy it to the outer before UDP/GUE is added.
  - I believe RFC2473 was wrong to say a routing header could be copied to the outer. Imagine a packet gets tunnelled that has a routing header listing addresses D2, D1 & D0 still left to visit. Although it is unclear what it means to copy a routing header to the outer, it must mean that these addresses would be visited by the tunnelled packet, then visited again after decapsulation.
  - I believe the Tunnel Encapsulation Limit Option is also pretty academic these days, but again, if one arrived, a GUE network encap ought to check the value, decrement it, and copy the header to the outer.

* Transport encap: In this case, I have suggested where the UDP/GUE header should fit in the following order of extension headers (copied from RFC2460):

    IPv6 header
    Hop-by-Hop Options header
    +UDP
    +GUE
    Destination Options header (note 1)
    Routing header
    Fragment header
    Authentication header (note 2)
    Encapsulating Security Payload header (note 2)
    Destination Options header (note 3)
    upper-layer header

  The draft ought to mention that if AH has been applied to a packet which is then encapsulated by GUE in transport mode, the AH header is not recalculated, so it does not cover the UDP/GUE headers. Decapsulation works because the UDP/GUE headers are inserted before the authentication header, so they will be removed (by a GUE decapsulator in transport mode) before AH is verified.

  Personally I don't know enough about routing headers to make the decision on whether they should be above or below the GUE header in the transport encap. I believe they are only processed when a packet reaches the destination address in the main header, but I am not familiar with all the different routing types (I know some are deprecated, and frankly I couldn't be bothered to read the others).

2.7/ Reliable delivery of control messages

The examples of potential control messages (those with the 'C' flag) given in S.3.5.1. (echo request/reply for testing) aim to mimic the data channel, so unreliable delivery as a GUE datagram is appropriate. The draft doesn't define any other tunnel control messages. However, if it did, many/most would need to be delivered reliably and in order (e.g. key agreement, any necessary configuration agreement, consistent application of connection semantics, etc). Therefore, reliable ordered delivery for control messages will need to be defined (see C3.2/ "Reliable delivery of control messages" in separate email for a suggested design).

2.8/ Extensibility of the flags and optional fields scheme: doesn't work

S3.3: This is meant to be "the primary mechanism of extensibility in GUE". However, for extensibility to work, GUE needs to distinguish between:
* options: the base set of flags+options defined from the start and required in all GUE code
* "extensions" (my term): future extensions to the flags and options.

The current GUE flags scheme only works for options, but it inherently puts extensions into a chicken-and-egg stand-off, because:
a) S5.4 says an implementation MUST drop a packet with an unknown flag. So, if the IETF later defines bit 7, until a very large proportion of GUE decap implementations have been upgraded with logic that understands bit 7, the packet is going to be dropped with high probability. So no encap is going to want to set bit 7 on a packet, so there is no motivation for a decap to implement the code for bit 7.
b) For such unknown flags, we cannot change "MUST drop" to "MUST ignore", because the lengths of the fields are not self-describing - they have to be hard-coded into an implementation. So if one GUE implementation only has logic about the flags up to bit 6, but a packet arrives with bit 8 set, the implementation doesn't know how large the "Fields" field is, so it doesn't know where the private data starts.

For proper extensibility, each new GUE flagged option needs to be self-describing, i.e. with additional fields to say:
a) Whether nodes that do not have the logic to understand the option should drop or ignore the packet, separately for:
  - nodes on the path
  - nodes at the dest. (decap) of the GUE datagram.
b) Whether the option is intended to change on path (in which case it should not be covered by integrity or authentication codes).
c) Whether the option should be copied or not by a GUE-in-GUE tunnel encap (see A4.4/ "Tunnels in Tunnels" later).
d) The length of the option.
e) Additionally, you might want to borrow the IPv6 idea of controlling whether there needs to be an error message or not, but personally I believe that is overkill (the intention was for silent failure to be impossible for critical features, but it is very hard to deliver error messages reliably anyway).

The above shows that attempting to invent a new extensibility scheme usually ends in tears. The IETF and others have developed tried-and-tested extensibility approaches like TLV and CBOR. Even then, they still have problems. The above points draw lessons from all this, particularly:
* action codes and change codes in the initial bits of IPv6 HbH & DO options [RFC2460]
* TRILL extension word flags: critical and non-critical separately for hop-by-hop and ingress-to-egress (see [RFC7179] updated by [RFC7780]).
* 'Self-describing objects', including type and size, is listed as 'Architectural Principle of the Internet' number 3.12 in [RFC1958]

2.9/ Hard-coded option lengths do not scale

By hard-coding the length of each option in an RFC and in the GUE code (rather than self-describing in the packet), you are stuck with a certain size option for ever. Experience has proven that fields such as message authentication codes (MACs), fragment IDs, etc. have to scale. Admittedly, we could define flags for larger fields later, but I have shown above that new flags would be undeployable.

2.10/ Random access to options needs motivating

Quoting S3.3:
    Flags allow random access, for instance [...]

There might be a case for GUE to use a protocol heap rather than a stack [Braden03]. If so, please motivate it.

[Braden03] Braden, R., Faber, T. & Handley, M., "From Protocol Stack to Protocol Heap: Role-Based Architecture" <http://doi.acm.org/10.1145/774763.774765>, ACM SIGCOMM Computer Communication Review 33:17--22, ACM (January 2003)

3/ STATE

3.1/ Per-connection state vs. stateless connections but per-tunnel state

The GUE draft does not suggest a mechanism for GUE endpoints to apply connection semantics.
* For transport encap, the GUT draft suggests an approach that uses per-flow state (see the example given in Figure 4 in draft-manner-tsvwg-gut-02 <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>).
* For network encap, a stateless approach is proposed in my separate email (see C1/ "Stateless Connection Semantics").

Statelessness is important to simplify migration during load-balancing, failures etc. The 'shared fate' resilience principle [Clark88] maintains that a system should avoid reliance on flow-state held on the path, preferring to hold state solely at the endpoints. One could argue that, in transport encap mode, the GUE endpoints are on the end hosts, and therefore the communication path is resilient: if GUE flow state is lost because an end host fails, the communication will have failed anyway. However, strictly, a GUE endpoint process is likely to be separate (perhaps even in NIC hardware), so it could fail independently of the true endpoint process of the connection.

So it would be ideal to use a stateless approach for both network and transport encap. However, the best stateless approach I could come up with (if it works at all) requires some coordination, and hence one-off set-up latency, between the GUE endpoints. Therefore, stateless connections will be:
* more appropriate for network encap (usually long-lived tunnels); and
* less useful for transport encap (opportunistic per connection).

To summarize, it is likely that the stateful approach will be used, at least for some GUE encapsulators in transport mode.
Therefore, for the transport encap mode at least, the draft needs to consider per-flow state and its management (see following section).

[Clark88] Clark, D.D., "The design philosophy of the DARPA internet protocols," Proc. ACM SIGCOMM'88, Computer Communication Review 18(4):106--114 (August 1988)

*3.2/ Transport encap with Connection Semantics: Flow state management*

Hosts already maintain flow-state for each connection in progress. To support GUE in transport encap mode, it is trivial for the hosts at each end to associate a little extra state with the existing state of each inner flow:
  * At the initiator end, it needs no flow-state to receive GUE packets, but in order to send GUE packets, it associates the original (inner) flow's ID with the source port it will use in the UDP outer to send every GUE packet.
  * At the responder end, it has to associate the inner flow ID with the source port in arriving GUE UDP outer headers. It needs this so that, when the inner flow sends out packets, the GUE encapsulator can intercept them and encapsulate them with a GUE header, using the stored source port as the destination port.
  * Any error messages returned from the responder also need to be encapsulated in the same way.

Also, the draft needs to specify:
  * that a GUE transport decap ought to protect itself against DDoS by not storing flow state if no associated socket is open;
  * how long to wait before timing out unused flow state;
  * what to do with a packet if the necessary flow state is not present.

*3.3/ Keepalives for middlebox flow state*

Middleboxes, such as firewalls and NATs, time out the pin-hole associated with UDP flow-state fairly rapidly, but rarely in less than 15s [RFC5405]. RFC5405 rightly says that an application that uses UDP should be responsible for recovering a timed out connection, rather than the stack sending keepalives to hold open a connection, when it doesn't actually know whether the application still wants the connection open.
Nonetheless, an inner flow will not be aware that it is being tunnelled using UDP/GUE. Therefore it seems less inappropriate for the GUE encap to keep state alive on behalf of the application, so it ought to send keepalive GUE datagrams to hold any pin-hole open. However, if the application has not sent anything for some time (whatever that means), the GUE encap should time out the connection, rather than holding middlebox flow-state (and its own flow-state) open for ever.

If you agree, it might be necessary to specify a keepalive control message that a GUE encap can send to the remote end of the GUE tunnel (which would also keep any flow-state at the remote end alive). These would only be necessary in one direction, and would not need to be reliably delivered. See Section 3.1 of draft-manner-tsvwg-gut-02 <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.1> for the keepalive control message defined for GUT.

_*4/ OPERATION*_

*4.1/ Transport encap: to GUE or not to GUE?*

For transport encap, the draft needs to say how the host decides when to use GUE and when not. There's text on this in S.4 of the GUT draft <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-4>, if you want to use it.

*4.2/ Hop limit / TTL processing*

I couldn't find any text about this. Perhaps you intended this sentence in S.5.3 to cover it:
    it should follow standard conventions for tunneling of one IP protocol over another

I think it would be best to spell out Hop limit processing. There's text on this in S.3.2 of the GUT draft <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.2>, if you want to use it.

*4.3/ Error messages*

S5.4
    No error message is returned back to the encapsulator.

Please go through every type of error and in each case justify why no error message to the encap is necessary.

*4.4/ Tunnels in tunnels*

S5.5 2nd para
    It may encapsulate a GUE packet in another GUE packet, for instance to implement a network tunnel (i.e.
    by encapsulating an IP packet with a GUE payload in another IP packet as a GUE payload).

A number of problems here:

1) A "GUE packet" has not been defined. I assume any UDP header with either src or dst UDP port = 6080 (see A1.1/ "Inferring Connection Semantics: the rule not the exception").

2) There is an incremental deployment problem here. Existing tunnels won't check within the outer IP for whether a UDP port is a GUE port. They will just add a new outer IP header without the UDP or GUE.

3) Whatever, if a tunnel is GUE-aware, this para needs to be clear exactly which headers it should copy with the outer IP:
  * Do you intend this to mean that all the following should be copied to the outer IP header:
      - the outer UDP,
      - any v0 GUE header
      - plus any GUE options or private data.
  * Is it appropriate to copy all the options and private data? I think only some (e.g. perhaps the VNID in certain circumstances?). Others would not have the correct semantics if blindly copied (e.g. fragment options, coverage of MACs, etc).
  * How does a GUE-in-GUE encapsulator know which to copy?

Also, should any extension headers on an arriving IPv6 outer also be copied to be associated with the new outer? If so, which ones, and how does the encapsulator know? Do the same rules apply whether using transport or network encapsulation?

I have been arguing since about 2009 that, when adding a new IP outer, each IP (at least IPv6) extension header should self-describe which headers should be copied to the outer on encap. At present RFC2473 lists some extension headers that might be copied and says it depends on the configuration of the encapsulator. But a hard-coded list precludes introduction of any new extension that needs to be copied. And certainly it doesn't work for extensions like GUE that don't fit into the original mould of what an IPv6 extension looks like. The behaviour needs to be somehow self-declared in each header, not in a standard.
It is tough to solve this problem in a way that will work with existing tunnels. It needs solving more generally, not just for GUE. However, as long as GUE encapsulators address this problem from day-1, GUE presents an opportunity to solve the general problem in environments where all encapsulations are GUE-based (see my proposed solution in C4.1/ "Ensuring certain GUE headers are copied when a GUE packet is tunnelled" within my separate email on redesign). Then other encapsulation approaches might follow.

*4.5/ SHOULD adjust MTU?*

    An operator may set MTU to account for encapsulation overhead and reduce the likelihood of fragmentation.

I would expect "SHOULD" here. You might want to refer to draft-ietf-tram-stun-pmtud for a way to do PMTUD with UDP (for STUN, but I think it would be similar for GUE).

*4.6/ Is orig-proto field necessary in the fragmentation option?*

S4.3 of draft-herbert-gue-extensions-00

Why does the original protocol of a fragmented packet need to be visible before reassembly by declaring it in the GUE fragmentation option of each fragment? The GUE protocol field will be available once the fragments are reassembled, and I can't see why it would be needed before that.

It is not good security practice to create multiple fields that are all intended to be set to the same value. Even if the implementation uses these orig-proto fields before reassembling the fragments, it will still have to check that they all match the GUE protocol field when the packet has been reassembled. And if any are not the same, it will raise security concerns about any action that had previously been taken based on an inconsistent value.

*4.7/ Congestion Control: reductio ad absurdum*

S5.9

I suggest you remove the para about DCCP being appropriate for tunnel congestion control. I appreciate you are trying to comply with RFC5405, but it is impossible for tunnel specs to do so without looking absurd.
The more you try, the more it will look like you are the ones that are absurd.

RFC5405 gives no guidance on how to comply with its requirement about congestion control of non-IP traffic across a tunnel... because there is no running code for tunnel congestion control, or for a network circuit breaker.

It has been suggested in the past that DCCP should be used across tunnels. DCCP is intended for a single flow and all the DCCP profiles defined so far ensure a DCCP "flow" will consume about as much capacity as a TCP flow. If DCCP were to be applied across a GUE tunnel it would reduce the rate of the aggregate of all flows across the tunnel to roughly the same as a /single/ TCP flow (see the intro of RFC7893 "Pseudowire Congestion Considerations").

One might imagine that RFC5405 means that a tunnel protocol designer would have to detect roughly how many flows a tunnel aggregate consisted of at any one time (say N flows) and attempt to design a congestion control (e.g. a DCCP profile) to consume roughly as much capacity as N TCP flows. However, this would probably cause horror for some in the transport area at the thought of the IETF endorsing a congestion control that can be N times as greedy as TCP.

To further reduce the idea of a tunnel encap applying congestion control to absurdity, it would need:
a) a huge buffer to absorb incoming packets whenever they arrived faster than the tunnel rate. All packets (in small and large flows) would back up behind this huge queue, which would be called buffer bloat, which would cause horror for most people in the transport area.
b) ideally, a time machine (a negative buffer) to bring packets forward in time whenever the arrival rate of all the flows was insufficient to satisfy the desired aggregate rate of the tunnel.
c) the addition of feedback channel(s) and a huge amount of extra processing.
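To put rough numbers on point a) above, here is a minimal fluid-model sketch in Python. It is only an illustration: the function names and the traffic figures (100 inner flows of 10 Mb/s each, with the tunnel held to 10 Mb/s, i.e. roughly one flow's share) are assumptions chosen for the example, not anything taken from the GUE draft or RFC5405.

```python
def tunnel_backlog(flow_rates_bps, tunnel_rate_bps, seconds):
    """Fluid model: bits queued at the encap after each second.

    flow_rates_bps  -- arrival rate of each inner flow (illustrative values)
    tunnel_rate_bps -- rate the tunnel's congestion control currently allows
    """
    offered = sum(flow_rates_bps)
    backlog = 0
    history = []
    for _ in range(seconds):
        # The queue grows by the excess arrivals and can never go negative
        # (there is no "negative buffer" to borrow packets from the future).
        backlog = max(0, backlog + offered - tunnel_rate_bps)
        history.append(backlog)
    return history


def per_flow_share(tunnel_rate_bps, n_flows):
    """Average rate per inner flow if the aggregate is held to tunnel_rate_bps."""
    return tunnel_rate_bps / n_flows


# Assumed numbers: 100 inner flows of 10 Mb/s each, tunnel held to 10 Mb/s.
flows = [10_000_000] * 100
print(tunnel_backlog(flows, 10_000_000, 3))    # [990000000, 1980000000, 2970000000]
print(per_flow_share(10_000_000, len(flows)))  # 100000.0
```

With these assumed figures the queue grows by 990 Mb every second, and each inner flow averages 100 kb/s, which is the buffer-bloat outcome described in a).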
[As you can see, I don't support the idea in RFC5405 that a tunnel becomes responsible for congestion control of traffic that it encapsulates. Otherwise, to be consistent, an Ethernet link would become responsible for congestion control of traffic it encapsulates. However, I accept that consistency with RFC5405 is currently a hurdle your draft has to cross before it can be approved. If you feel you have to suggest a mechanism, IMO a policer makes sense - either a rate policer or a congestion-rate policer.]

*4.8/ Multicast outer -> Implosion on inner destination*

S.5.10

Consider an inner flow of unicast packets, src-IP A, dst-IP B. Consider the encap adds an outer addressed to multicast address M, and consider n decapsulators subscribe to group M. This will cause the network to duplicate each packet n times. As each decap forwards the inner, n duplicates of each packet will converge on B.

This might make sense with unicast inner packets for a small number of decaps (e.g. two for redundancy). And a multicast overlay could make sense for multicast inner packets as long as the multicast routing was aware of the P2MP tunnel (with suitable grouping of multicast groups).

I think the text should say that a multicast outer is not precluded, because it is a theoretical possibility, but it should not be attempted without a safety harness and an empty bladder.

*4.9/ Deriving flow entropy from the inner is contrary to "GUE permits encap of arbitrary IP protocols" claim*

S.5.11.1

The general idea for creating flow entropy seems to be for the GUE encap to map inner flows of possibly "atypical IP protocols" to individual UDP outer flows, on the assumption that switches or routers that implement ECMP etc. will understand UDP but not "atypical IP protocols".

Let's examine this claim by taking network encap and transport encap separately.

1) Network encap

Imagine that a GUE encap has been implemented that understands TCP, UDP, SCTP, DCCP, ICMP, RSVP, IPsec and ESP.
Then researchers implement NewSexyTP, with a new IP protocol number. No existing GUE encap in the world has any logic to understand or locate the flow ID fields of NewSexyTP. So GUE does not "permit encap of arbitrary IP protocols" as claimed in the motivation section.

Further, why will GUE implementations be updated with logic to understand NewSexyTP any faster than the ECMP code in general-purpose switches and routers? One GUE implementation might be updated, but other developers might not so diligently track the latest transport protocols.

One cannot even really argue that the ECMP code in switches and routers is implemented in hardware, so it will be harder to change than GUE code. The forwarding performance of GUE tunnel encap will need to be no different to the performance of forwarding in general switches and routers, so if hardware is necessary for one it will be necessary for the other.

2) Transport encap

If GUE encap is implemented as a centralized daemon process on a host or centralized in a NIC, it will suffer from the same lack of forward compatibility with new transport protocols as the network encap - particularly if it is implemented in NIC hardware. I.e., if an operator installs NewSexyTP in their OS, they will also have to wait for a GUE update that supports NewSexyTP. This is the case with or without connection semantics.

However, it might be possible to implement GUE transport encap (including with connection semantics) so that each instance of a protocol stack is associated with an instance of GUE (warning: I have no idea yet whether this will be possible). In this case, each GUE instance would consistently add the same outer port number to the inner protocol instance it was associated with, without needing to understand how to identify a flow ID in any particular protocol.
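The limitation in 1) can be made concrete with a small Python sketch of how a network encap might pick an outer UDP source port. Everything here is hypothetical and not taken from the draft: the `KNOWN_PORT_LAYOUT` table, the `flow_entropy` function, the use of SHA-256 as the hash, and the choice of the 49152-65535 port range are all assumptions for illustration. The only protocol facts relied on are that TCP, UDP, DCCP and SCTP each carry source and destination ports in the first four octets of their header.

```python
import hashlib

# Hypothetical table of protocols this particular encap "understands":
# IP protocol number -> (offset, length) of the port fields in the transport header.
KNOWN_PORT_LAYOUT = {
    6: (0, 4),    # TCP
    17: (0, 4),   # UDP
    33: (0, 4),   # DCCP
    132: (0, 4),  # SCTP
}


def flow_entropy(src_ip: bytes, dst_ip: bytes, proto: int, transport_hdr: bytes) -> int:
    """Return an outer UDP source port in 49152..65535 for an inner packet."""
    key = src_ip + dst_ip + bytes([proto])
    layout = KNOWN_PORT_LAYOUT.get(proto)
    if layout is not None:
        offset, length = layout
        key += transport_hdr[offset:offset + length]
    # For an unknown protocol, nothing beyond addresses and protocol number is
    # hashed, so every flow between the same two hosts gets the same port.
    digest = hashlib.sha256(key).digest()
    return 49152 + (int.from_bytes(digest[:2], "big") & 0x3FFF)


a, b = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
# Protocol number 253 is reserved for experimentation; it stands in for NewSexyTP.
flow1 = flow_entropy(a, b, 253, bytes.fromhex("aaaa0001"))
flow2 = flow_entropy(a, b, 253, bytes.fromhex("bbbb0002"))
print(flow1 == flow2)  # True: two distinct inner flows collapse onto one outer flow
```

Two TCP flows with different ports would usually map to different outer ports, whereas the two NewSexyTP flows above always share one, so ECMP cannot spread them.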
In summary, certainly for net encap, but possibly not for transport encap, GUE only helps "atypical IP protocols" that a particular GUE encap implementation already understands.

*4.10/ Flow entropy from encrypted data could weaken the crypto?*

S.5.11.1
    o If a node is encrypting a packet using ESP tunnel mode and GUE encapsulation, the flow entropy could be based on the contents of clear-text packet. For instance, a canonical five-tuple hash for a TCP/IP packet could be used.

I'm not a crypto expert, but it sounds dangerous to take some clear-text from a known position in the data, hash it with a function that is not strongly one-way, then send this hash along with the cipher text. I think the SPI can be used as a unique consistent per-flow value, can't it? The SPI has been suitably randomised so that it reveals nothing about the flow ID.

*4.11/ No need to constrain flow entropy distribution*

S.5.11.2
    o The flow entropy should have a uniform distribution across encapsulated flows.

Equal distribution of flows is not necessarily appropriate for all scenarios. Flows have a distribution of sizes, and although ECMP is generally done randomly, an operator might want to (somehow) bias the hash algorithm to allow for the flows with the highest rate, which might otherwise unbalance the load. See for instance: "Engineered Elephant Flows for Boosting Application Performance in Large-Scale CLOS Networks <https://www.broadcom.com/collateral/wp/OF-DPA-WP102-R.pdf>" Broadcom White Paper (March 2014)

*4.12/ No need to constrain flow entropy interpretation*

    Decapsulators, or any networking devices, should not attempt to interpret flow entropy as anything more than an opaque value.

This seems unnecessarily constraining. This might not be a good idea, but if someone finds a use for it, there's no need to stop them - if it's useful they'll ignore you anyway, so why bother saying it? Perhaps you intended to explain why doing this could be problematic, rather than precluding it?
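Returning to the SPI suggestion in 4.10, the following Python sketch shows one way an encap could derive the outer source port from the ESP SPI alone, without touching any clear-text. The function name, the XOR-folding and the 49152-65535 port range are assumptions for illustration only; nothing here is specified by the GUE draft.

```python
def entropy_port_from_spi(spi: int) -> int:
    """Fold a 32-bit ESP SPI into an outer UDP source port in 49152..65535."""
    if not 0 <= spi <= 0xFFFFFFFF:
        raise ValueError("SPI must fit in 32 bits")
    # XOR-fold the 32 bits down to 14 bits so every SPI bit influences the port.
    folded = (spi ^ (spi >> 14) ^ (spi >> 28)) & 0x3FFF
    return 49152 + folded


print(entropy_port_from_spi(0))  # 49152
```

The mapping is deterministic, so all packets carrying the same SPI take the same path. Note that an SPI identifies a security association, so this gives one value per SA rather than one per inner flow.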
_*5/ SECURITY*_

*5.1/ Addresses that are both visible and hidden? Have your GUE and eat it too?*

S.7. In the following sentence,
    Existing network security mechanisms, such as address spoofing detection, DDOS mitigation, and transparent encrypted tunnels can be applied to GUE packets.

This should point out that an existing set of address spoofing detection rules would not work with GUE. I think you meant that existing rules and mechanisms could be modified to check the packets encapsulated by GUE without using radically new techniques.

However, if GUE is in network encap mode and it encrypts the IP headers of the inner packets, address spoofing detection and DDoS mitigation will not be possible over the length of the GUE tunnel. You cannot both claim that GUE can hide information, and that GUE allows existing security techniques to work that rely on access to the hidden information.

*5.2/ How can the Security option protect a UDP/GUE header from being moved or removed?*

The Security option is "used to provide integrity and authentication of the GUE header." I assume you envisage this would be complemented by other authentication techniques such as IPsec AH to provide integrity and authentication of the rest of the packet.

However, it occurs to me that the two together do not protect the integrity of the /structure/ of the packet as a whole (whether network or transport encap). An on-path attacker could still move the UDP/GUE header within the packet (it might be possible to construct a valid packet with altered semantics), or remove the UDP/GUE header completely.

I can't immediately think whether any damage could be done with such an attack, or how to prevent it. However, I'm sure there will be a crypto expert for whom this is not a new problem.

Also, the 32B max length of the security option is insufficient.
I looked for a MAC protocol where a larger field is needed, and the first one I picked required a larger field: RFC4383 "TESLA in Secure RTP" requires 34B, and that's just for the default sizes, not even the maximum. I picked TESLA because I knew each datagram needs a lot of authentication space. TESLA provides multicast message authentication, so as well as a key index and a MAC, each packet reveals a continually changing key.

*5.3/ What happens when a port scan sends a datagram to port 6080?*

When a port scan (that doesn't necessarily know about GUE) sends a datagram to port 6080, if the datagram has a body, and the body starts with a zero bit, the GUE daemon will start processing it. If the first 4 octets happen (randomly) to be set to values that would be a valid GUE header (see S.5.4), it will be decapsulated and forwarded to a protocol handler.

Not a show-stopper, but worth documenting?

*5.4/ Firewalls will still block new/atypical protocols*

Few firewalls allow incoming UDP. So GUE will not enable deployment of servers using atypical/new protocols, which will still face a deployment problem.

If a firewall opens a pin-hole to allow incoming UDP to access the well-known GUE port it would allow attackers to reach servers of any protocol while bypassing the firewall. E.g. an attacker could access a TCP-server by encapsulating TCP in GUE in order to bypass the firewall. Therefore, a firewall will only open a pin-hole to a GUE server, if it also inspects the packet encapsulated by GUE and applies all its normal rules to that as well.

This is why I have said elsewhere that the draft should state that firewall bypass by new/atypical protocols is a non-goal of GUE.

*5.5/ Transport Encap: Two Passes through a Local Firewall?*

GUE in transport mode resubmits the encapsulated packet to the host's IP stack. But it needs to make sure it re-injects the packet at the correct point in relation to any local firewall.
  * If the firewall includes rules to inspect the packet encapsulated with GUE (as discussed in the previous point), it would make sense to re-submit the packet above the local firewall.
  * If not, GUE should resubmit the packet so that it passes through the local firewall again.

The latter mode would make more sense if GUE was also decrypting the inner packet. So, rather than have two options, a local firewall could work co-operatively with GUE in transport mode, so it doesn't have to inspect the inner in both passes.

*6/ Implementation*

*6.1/ Practical Large Receive Offload Requirements*

Appendix A.4 says:
    The conservative approach to supporting LRO for GUE would be to assign packets to the same flow only if they have identical five-tuple and were encapsulated the same way. That is the outer IP addresses, the outer UDP ports, GUE protocol, GUE flags and fields, and inner five tuple are all identical.

Rant: It is sad if such a conservative approach to LRO is still necessary. Any API to LRO hardware needs to be able to be given the locations of certain header fields that are deliberately intended to vary, so it can offer the facility to separately report these for each packet. A MAC of the encapsulating headers is a good case in point. ECN is an even better example of a varying field, because it has been a standard part of the IP header since 2001, long before LRO hardware was designed.

--
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/