--- draft-rtg-dt-encap-01.txt 2015-03-09 15:36:13.000000000 -0700 +++ draft-rtg-dt-encap-02.txt 2015-05-14 10:41:57.000000000 -0700 @@ -313,6 +313,11 @@ o Importance of being friendly to hardware and software implementations + The degree to which these common issues apply to a particular + encapsulation can differ based on the intended purpose of the + encapsulation. But it is useful to understand all of them before + determining which ones apply. + 4. Scope @@ -464,27 +470,45 @@ balancing, but the load balancing procedure MUST choose the same path for any two packets have the same entropy value.] + In summary: + o The entropy is associated with the transport, that is an outer IP + header or MPLS. + o In the case of IP transport use >=14 bits of UDP source port, plus + outer IPv6 flowid for entropy. + 8. Next-protocol indication - The transport delivery mechanism for the encapsulations we discuss in - this document need some way to indicate which encapsulation header - (or other payload) comes next in the packet. Some encapsulations - might be identified by a UDP port; others might be identified by an - Ethernet type or IP protocol number. Which approach is used is a - function of the preceding header the same was as IPv4 being - identified by both an Ethernet type and an IP protocol number (for - IP-in-IP). In some cases the header type is implicit in some session - (L2TP) or path (MPLS) setup. But this is largely beyond the control - of the encapsulation protocol. For instance, if there is a - requirement to carry the encapsulation after an Ethernet header, then - an Ethernet type is needed. If required to be carried after an IP/ - UDP header, then a UDP port number is needed. - - The encapsulation needs to indicate the type of its payload, which is - in scope for the design of the encapsulation. We have existing - protocols which use Ethernet types (such as GRE). 
Here each - encapsulation header can potentially makes its own choices between: + Next-protocol indications appear in three different contexts for + encapsulations. + + Firstly, the transport delivery mechanism for the encapsulations we + discuss in this document needs some way to indicate which + encapsulation header (or other payload) comes next in the packet. + Some encapsulations might be identified by a UDP port; others might + be identified by an Ethernet type or IP protocol number. Which + approach is used is a function of the preceding header the same way + as IPv4 being identified by both an Ethernet type and an IP protocol + number (for IP-in-IP). In some cases the header type is implicit in + some session (L2TP) or path (MPLS) setup. But this is largely beyond + the control of the encapsulation protocol. For instance, if there is + a requirement to carry the encapsulation after an Ethernet header, + then an Ethernet type is needed. If required to be carried after an + IP/UDP header, then a UDP port number is needed. + + Secondly, the encapsulation needs to indicate the type of its + + + +Nordmark (ed), et al. Expires September 2, 2015 [Page 9] + +Internet-Draft Encaps Considerations March 2015 + + + payload, which is in scope for the design of the encapsulation. We + have existing protocols which use Ethernet types (such as GRE). Here + each encapsulation header can potentially make its own choices + between: o Reuse Ethernet types - makes it easy to carry existing L2 and L3 protocols o Reuse IP protocol numbers - makes it easy to carry e.g., ESP but @@ -494,21 +518,28 @@ bits than an Ethernet type and give more flexibility, but at the cost of administering that numbering space. - If the IETF ends up defining multiple encapsulations at about the - same time, and there is some chance that multiple such encapsulations - can be combined in the same packet, there is a question whether it - - - -Nordmark (ed), et al.
Expires September 10, 2015 [Page 9] - -Internet-Draft Encaps Considerations March 2015 - - - makes sense to use a common approach and numbering space for the - encapsulation across the different protocols. A common approach - might not be beneficial as long as there is only one way to indicate - e.g., SFC inside NVO3. + Thirdly, if the IETF ends up defining multiple encapsulations at + about the same time, and there is some chance that multiple such + encapsulations can be combined in the same packet, there is a + question whether it makes sense to use a common approach and + numbering space for the encapsulation across the different protocols. + A common approach might not be beneficial as long as there is only + one way to indicate e.g., SFC inside NVO3. + + Many Internet protocols use fixed values (typically managed by the + IANA function) for their next-protocol field. That facilitates + interpretation of packets by middleboxes and e.g., for debugging + purposes, but might make the protocol evolution inflexible. Our + collective experience with MPLS shows an alternative where the label + can be viewed as an index to a table containing processing + instructions and the table content can be managed in different ways. + Encapsulations might want to consider the tradeoffs between such more + flexible and more fixed approaches. + + In summary: + o Would it be useful for the IETF to come up with a common scheme for + encapsulation protocols? If not, each encapsulation can define its + own scheme. 9. MTU and Fragmentation @@ -531,13 +570,14 @@ Encapsulations could also define an optional tunnel fragmentation and reassembly mechanism which would be useful in the case when the - operator doesn't have full control of the path. Such a mechanism - would be required if the underlay might have a path MTU which makes - it impossible to carry at least 1518 bytes (if offering Ethernet - service), or at least 1280 (if offering IPv6 service).
The use of - such a protocol mechanism could be triggered by receiving a PTB. But - such a mechanism might not be implemented by all encaps and decaps - nodes. [Aerolink is one example of such a protocol.] + operator doesn't have full control of the path, or when the protocol + gets deployed outside of its original intended context. Such a + mechanism would be required if the underlay might have a path MTU + which makes it impossible to carry at least 1518 bytes (if offering + Ethernet service), or at least 1280 (if offering IPv6 service). The + use of such a protocol mechanism could be triggered by receiving a + PTB. But such a mechanism might not be implemented by all encaps and + decaps nodes. [Aerolink is one example of such a protocol.] Depending on the payload carried by the encapsulation there are some additional possibilities: @@ -549,24 +589,34 @@ stations, but unmodified end stations would do nothing with that TLV since they assume that the MTU is at least 1518. + In summary: + o In some deployments an encapsulation can assume a well-managed MTU, + hence no need for fragmentation and reassembly related to the + encapsulation. + o Even so, it makes sense for ingress to track any ICMP packet too + big addressed to ingress to be able to log any MTU + misconfigurations. + o Should an encapsulation protocol be deployed outside of the + original context, it might very well need support for fragmentation + and reassembly. + 10. OAM The OAM area is seeing active development in the IETF with @@ -595,12 +645,18 @@ in any case be excluded from the entropy. There can be several ways to prevent OAM packets from accidentally - being forwarded to hosts using: + being forwarded to the end station using: o A bit in the frame (as in TRILL) indicating OAM - o A next protocol indication with a designated value for "none" or + o A next-protocol indication with a designated value for "none" or "oam".
This assumes that the bit or next protocol, respectively, would not - affect entropy/ECMP in the underlay. + affect entropy/ECMP in the underlay. However, the next-protocol + field might be used to provide differentiated treatment of packets + based on their payload; for instance a TCP vs. IPsec ESP payload + might be handled differently. Based on that observation, it might be + undesirable to overload the next protocol with the OAM drop behavior, + resulting in a preference for having a bit to indicate that the + packet should be forwarded to the end station after decapsulation. There has been suggestions that one (or more) marker bits in the encaps header would be useful in order to delineate measurement @@ -632,6 +688,17 @@ encapsulations and might want to carry OAM end-to-end across the different encapsulations. + In summary: + o It makes sense to reserve a bit for "drop after decaps" for OAM + out-of-band. + o An encapsulation needs sufficient extensibility for OAM (bits, + timestamps, sequence numbers). That might be motivated by in-band + OAM in which case it would make sense to leverage it also for + out-of-band OAM. + o OAM places some constraints on entropy use in forwarding devices. + o Should the IETF look into error reporting that is independent of the + specific encaps? + 11. Security Considerations @@ -823,9 +889,30 @@ be secured also). In this case of security applied to encap payload, this does present a bit of protocol layer inversion in the header (encapsulation refers to overlay, but ESP operates on underlay), but this should be okay as long as semantics are clear and processing is deterministic. +11.4. In summary: + + o Need extensibility to be able to add security features like + cookies and secure hashes protecting the encapsulation header.
+ o NVO3 probably has specific higher requirements relating to + isolation for network virtualization, which is in scope for the + NVO3 WG. + o Our collective IETF experience is that successful protocols get + deployed outside of the original intended context, hence the + initial assumptions about the threat model might become invalid. + That needs to be considered in the standardization of new + encapsulations. + 12. QoS @@ -852,12 +932,27 @@ headers. Thus the encapsulation considerations in this area are mainly about applying the framework in [RFC2983]. + Note that the DSCP and ECN bits are not the only part of an inner + packet that might potentially affect the outer packet. For example, + [RFC2473] specifies handling of inner IPv6 hop-by-hop options that + effectively result in copying some options to the outer header. It + is simpler to not have future encapsulations depend on such copying + behavior. + There are some other considerations specific to doing OAM for encapsulations. If OAM messages are used to measure latency, it would make sense to treat them the same as data payloads. Thus they need to have the same outer DSCP value as the data packets which they wish to measure. @@ -865,6 +960,9 @@ data packets. That issue is broader than just QoS - applies to firewall filters etc. + In summary: + o Leverage the existing approach in [RFC2983] for DSCP handling. + 13. Congestion Considerations @@ -962,6 +1052,21 @@ to different tenants. The fallback would be to rate limit different traffic. + In summary: + o Leverage the existing approach in [RFC6040] for ECN handling. + o If the encapsulation can carry non-IP, hence non-congestion- + controlled traffic, then leverage the approach in + [I-D.ietf-mpls-in-udp]. + o "Watch this space" for circuit breakers. + 14. Header Protection @@ -1034,6 +1139,13 @@ require additional checksum protection as the hash provides stronger assurance than a simple checksum.
+ In summary: + o Need extensibility to be able to add checksum/CRC for encaps + itself. + o When encaps has checksum/CRC, include IPv6 pseudo-header in it. + o Checksum/CRC can potentially be avoided when cryptographic + protection is applied to the encapsulation. + 15. Extensibility Considerations @@ -1053,18 +1165,31 @@ more associated with defining a protocol than extending it (IPv6 being a successor to IPv4 is an example of protocol versioning). - Many protocol definitions include some number of reserved fields or - bits which can be used for future extension. VXLAN is an example of - a protocol that includes reserved bits which are subsequently being - allocated for new purposes. Another technique employed is to + In some cases it might be more appropriate to define a new inner + protocol which can carry the new functionality instead of extending + the outer protocol. An example where this works well is the IP/ + transport split, where the earlier architecture had a single NCP -Nordmark (ed), et al. Expires September 10, 2015 [Page 19] +Nordmark (ed), et al. Expires September 2, 2015 [Page 21] Internet-Draft Encaps Considerations March 2015 + protocol which carried both the hop-by-hop semantics which are now in + IP, and the end-to-end semantics which are now in TCP. Such a split + is effective when different nodes need to act upon the different + information. Applying this for general protocol extensibility + through nesting is not well understood, and does result in longer + header chains. Furthermore, our experience with IPv6 extension + headers [RFC2460] in middleboxes indicates that the approach does not + help with middlebox traversal. + + Many protocol definitions include some number of reserved fields or + bits which can be used for future extension. VXLAN is an example of + a protocol that includes reserved bits which are subsequently being + allocated for new purposes.
Another technique employed is to repurpose existing header fields with new meanings. A classic example of this is the definition of DSCP code point which redefines the ToS field originally specified in IPv4. When a field is @@ -1212,8 +1337,28 @@ it might be easier to define protocols for encapsulation that are not defined in other number spaces (802.11 for instance). Disadvantage is that it represents yet another number space to be managed and doesn't leverage existing ones. +15.2. In summary: + + Extensibility needs to be considered for encapsulations. + o Encapsulations need the ability to be extended to handle e.g., the + OAM or security aspects discussed in this document. + o Practical experience seems to tell us that extensibility + mechanisms which are not in use on day one might result in + immediate ossification by lack of implementation support. In some + cases that has been in routers and in other cases in middleboxes. + Hence devising ways where the extensibility mechanisms are in use + seems important. + 16. Layering Considerations @@ -1312,14 +1449,20 @@ Architectural such services would make sense, but as a separate layer on top of an encapsulation protocol. They could be deployed between ingress and egress of a tunnel which uses some encaps. Potentially - the tunnel control points in the form of an ingress and egress will - become a platform for fixing suboptimal behavior elsewhere in the - network. For example, this document suggests that some congestion - handling might be needed to handle non-congestion controlled traffic - that gets tunneled, and also that fairness/QoS policing can be - deployed on those devices. Others have suggested that tunnels is one - way to deploy ECN without having to add ECN support in the endpoints - [I-D.briscoe-conex-data-centre]. + the tunnel control points at the ingress and egress could become a + platform for fixing suboptimal behavior elsewhere in the network. 
+ That would clearly be undesirable in the general case. However, + handling encapsulation of non-IP traffic hence non-congestion- + controlled traffic is likely to be required, which implies some + fairness and/or QoS policing on the ingress and egress devices. But the tunnels could potentially do more like increase reliability (retransmissions, FEC) or load spreading using e.g. MP-TCP between @@ -1370,11 +1505,15 @@ point. If important information are located at the beginning of the encapsulation header, the packet may be processed with smaller number of bytes to be read into the fast memory and improve performance. - o Separation of NVO3 header from SFC header such that an encap can - also be processed by forwarding hardware (who can only process - network virtualization and pass the service chaining function to - another device specialized in service offering) o Avoid full packet checksums in the encapsulation if possible. Most of the switch/router hardware make switching/forwarding decisions by reading and examining only the first certain number @@ -1389,19 +1528,12 @@ processing. If important information can be found at fixed offset, different part of the encapsulation header may be processed by different hardware units in parallel (for example - multiple table lookups may be launched in parallel). Hardware can - handle optional information as long as when the information is - present it is found in one and only one place in the header. - Typical TLV encoding of options does not have that property since - the order of TLVs is unconstrained. + multiple table lookups may be launched in parallel). It is easier + for hardware to handle optional information when the information, + if present, can be found in ideally one place, but in general, in + as few places as possible. That facilitates parallel processing. + TLV encoding with unconstrained order typically does not have that + property. o Limit the number of header combinations. 
In many cases the hardware can explore different combinations of headers in parallel, however there is some added cost for this. @@ -1591,6 +1720,22 @@ addresses, the outer UDP ports, encapsulated protocol, encapsulation headers, and inner five tuple are all identical. +18.1.3.3. In summary: + + In summary, for NIC offload: + o The considerations for using full UDP checksums are different for + NIC offload than for implementations in forwarding devices like + routers and switches. + o Be judicious about encapsulations which change fields on a per- + packet basis (to be able to use TSO). 19. Middlebox Considerations @@ -1617,12 +1762,11 @@ to document how encaps-aware middleboxes should avoid unintended consequences in those (and perhaps other) areas. + In summary: + o We are likely to see middleboxes that at least parse the headers + for successful new encapsulations. + o Should the IETF document considerations for what not to do in such + middleboxes? 20. Related Work @@ -1652,7 +1804,8 @@ layering additional services/characteristics such as ordered delivery or timely deliver on top of an encapsulation. That layering approach might be useful for the new encapsulations as well. For instance, - the control word [RFC4385]. + the control word [RFC4385]. There is also material on congestion + control for pseudo-wires in [I-D.ietf-pwe3-congcons]. Both MPLS and L2TP [RFC3931] rely on some control or signaling to establish state (for the path/labels in the case of MPLS, and for the @@ -1691,8 +1836,17 @@ 21. Acknowledgements - The authors acknowledge the comments from David Black, Andy Malis, - and Radia Perlman. + The authors acknowledge the comments from Alia Atlas, Fred Baker, + David Black, Bob Briscoe, Stewart Bryant, Mike Cox, Andy Malis, Radia + Perlman, and Michael Smith. 22. Open Issues @@ -1744,10 +1890,21 @@ [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998. + [RFC2473] Conta, A. and S.
Deering, "Generic Packet Tunneling in + IPv6 Specification", RFC 2473, December 1998. + [RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE", RFC 2890, September 2000. [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983, October 2000. [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., @@ -1802,6 +1952,18 @@ Operations, Administration, and Maintenance (OAM) Framework", RFC 7174, May 2014. + [RFC7325] Villamizar, C., Kompella, K., Amante, S., Malis, A., and + C. Pignataro, "MPLS Forwarding Compliance and Performance + Requirements", RFC 7325, August 2014. + [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for @@ -1841,32 +2003,37 @@ + + [I-D.ietf-pwe3-congcons] + Stein, Y., Black, D., and B. Briscoe, "Pseudowire + Congestion Considerations", draft-ietf-pwe3-congcons-02 + (work in progress), July 2014.