[tcpm] Rewriting MSS option for NAT64
Iljitsch van Beijnum <iljitsch@muada.com> Tue, 14 April 2009 19:26 UTC
Hi,

In the BEHAVE wg we're working on NAT64, a way to make IPv6 clients talk to IPv4 servers through a translator. It's a lot like NAT-PT but with many of the issues better addressed. See:

http://tools.ietf.org/html/draft-bagnulo-behave-nat64-03

I've been working on text about packet sizes and fragmentation (see the text at the end of the message for context), and Lars asked me to ask you guys for input on this part:

> The TCP MSS option [RFC 793] is used during the three-way handshake
> by the two hosts involved to inform each other about the maximum TCP
> segment size (assuming IP and TCP headers without options) that the
> host can receive.
>
> In practice, the MSS option is often used to make TCP work in the
> presence of broken path MTU discovery.
>
> To avoid unnecessary path MTU discovery cycles, a NAT64 SHOULD
> rewrite the MSS option in SYN packets to the minimum of the original
> MSS option, the NAT64's MTU on the IPv6 side - 60 and the NAT64's
> MTU on the IPv4 side - 40. This applies to SYNs in both the
> IPv4-to-IPv6 direction and the IPv6-to-IPv4 direction.

Since this is already very widely deployed in boxes that do stuff like PPPoE, which reduces the MTU on access networks, I'm assuming there is no problem with this, especially since we're putting this into a translator that breaks authentication etc. anyway.

Iljitsch


Packet sizes

It's the job of the network layer to adapt to different maximum packet sizes as packets move through the network. There are three mechanisms that handle this: transport layer negotiations such as the TCP MSS option, path MTU discovery, and fragmentation.

The difference between the IPv4 and IPv6 header sizes requires some handling in a NAT64 translator, and there are complications because of the differences between how IPv4 and IPv6 handle fragmentation, as well as the issue of how to demultiplex fragmented IPv4 packets.

There are two approaches to path MTU discovery and fragmentation when translating from IPv6 to IPv4:

1. Set DF to 0 in the translated packets. This avoids path MTU discovery issues but leads to significant numbers of fragments.

2. Set DF to 1 in the translated packets. This supports path MTU discovery on the IPv4 side so unnecessary fragments are avoided, but it doesn't address the issue that IPv6 hosts are not required to perform PMTUD when sending packets of 1280 bytes or smaller.

The choice made in this document is to use option 1 for packets up to 1280 bytes and option 2 for packets larger than 1280 bytes. A NAT64 translator MUST have an MTU of at least 1280 bytes on all of its interfaces, both IPv4 and IPv6.

TCP MSS option

The TCP MSS option [RFC 793] is used during the three-way handshake by the two hosts involved to inform each other about the maximum TCP segment size (assuming IP and TCP headers without options) that the host can receive.

In practice, the MSS option is often used to make TCP work in the presence of broken path MTU discovery.

To avoid unnecessary path MTU discovery cycles, a NAT64 SHOULD rewrite the MSS option in SYN packets to the minimum of the original MSS option, the NAT64's MTU on the IPv6 side - 60 and the NAT64's MTU on the IPv4 side - 40. This applies to SYNs in both the IPv4-to-IPv6 direction and the IPv6-to-IPv4 direction.

Path MTU discovery

The vast majority of both IPv4 and IPv6 hosts use path MTU discovery [RFC 1191] [RFC 1981]. With IPv4, PMTUD can be enabled on a per-packet basis by setting the DF bit to 1. With IPv6, there is no need for PMTUD for packets up to 1280 bytes, because every IPv6 link is required to carry 1280-byte packets without fragmentation. When sending larger packets, IPv6 hosts implicitly use PMTUD.
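The MSS rewriting rule from the TCP MSS option section can be sketched as follows. This is a minimal illustration, not code from the draft: the name `clamp_mss` is hypothetical, and the sketch assumes the 60 and 40 byte figures are the option-less header overhead on each side (a 40-byte IPv6 header or a 20-byte IPv4 header, plus a 20-byte TCP header). Locating and rewriting the option bytes inside the actual SYN is left out.

```python
IPV6_TCP_OVERHEAD = 60  # 40-byte IPv6 header + 20-byte TCP header, no options
IPV4_TCP_OVERHEAD = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def clamp_mss(original_mss: int, ipv6_mtu: int, ipv4_mtu: int) -> int:
    """Return the MSS value a NAT64 would write into a SYN it translates.

    The same rule applies to SYNs in both directions: take the smallest of
    the MSS the host advertised and the largest segment that fits in an
    option-less packet on each of the NAT64's two interfaces.
    """
    return min(original_mss,
               ipv6_mtu - IPV6_TCP_OVERHEAD,
               ipv4_mtu - IPV4_TCP_OVERHEAD)
```

For example, with a 1500-byte MTU on both sides, a SYN advertising an MSS of 1460 would be rewritten to 1440, the largest segment that fits in a 1500-byte IPv6 packet. Because the result is a minimum, the NAT64 only ever lowers the advertised MSS and never raises it.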
IPv6-to-IPv4

If the NAT64 has the same MTUs on its IPv6 and IPv4 interfaces, it will never have to generate "packet too big" messages for incoming IPv6 packets, because the translation from IPv6 to IPv4 reduces the packet size by 20 bytes, or more if the IPv6 packet has extension headers that are removed during the translation, such as the fragment header. If the MTU on the IPv6 side is larger than 1280 bytes and more than 20 bytes larger than the MTU on the IPv4 side, the NAT64 MUST generate the appropriate "packet too big" messages on the IPv6 side.

To support PMTUD, the DF bit is set to 1 in translated packets that are larger than 1260 bytes on the IPv4 side (a 1280-byte IPv6 packet shrinks to 1260 bytes through the translation).

IPv4 routers may generate "packet too big" messages indicating a supported MTU smaller than 1280 bytes. In those cases, the IPv6 hosts will continue to send packets larger than what the IPv4 path MTU can support, because IPv6 hosts never reduce their packet size below 1280 bytes. To allow packets to be delivered successfully in this case, the DF bit is set to 0 in all translated packets of 1260 bytes or smaller, so that these packets can be fragmented in the IPv4 network.

Note: it is highly recommended that IPv4 hosts running services that may be used by IPv6 clients through a NAT64 translator use an MTU of at least 1260 bytes and properly generate "packet too big" messages.

When a NAT64 translates "packet too big" messages from IPv6 to IPv4, it adjusts the advertised MTU to the minimum of the original advertised MTU - 20, the NAT64's MTU on the IPv6 side - 20 and the NAT64's MTU on the IPv4 side.

IPv4-to-IPv6

Because it may be necessary to include a fragment header or other extension header, the NAT64 MUST be prepared to generate "packet too big" messages for packets with the DF bit set to 1 received from the IPv4 side, regardless of the MTU sizes on the IPv4 and IPv6 interfaces.
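The DF rule and the "packet too big" condition for IPv6-to-IPv4 translation can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the MTU reported back to the IPv6 host is the IPv4-side MTU plus the 20-byte header difference; that value is conservative when extension headers were also stripped during translation.

```python
from typing import Optional

HEADER_DIFF = 20                           # 40-byte IPv6 header vs. 20-byte IPv4 header
IPV6_MIN_MTU = 1280
DF_THRESHOLD = IPV6_MIN_MTU - HEADER_DIFF  # 1260 bytes on the IPv4 side


def df_bit(ipv4_len: int) -> int:
    """DF value for an IPv4 packet produced by translating an IPv6 packet."""
    # Up to 1260 bytes: DF=0, so IPv4 routers may fragment even when the
    # IPv4 path MTU is below the 1280 bytes IPv6 hosts will go down to.
    # Larger packets: DF=1, so PMTUD works on the IPv4 side.
    return 1 if ipv4_len > DF_THRESHOLD else 0


def ptb_mtu_for(ipv6_len: int, ipv4_mtu: int,
                removed_ext_len: int = 0) -> Optional[int]:
    """MTU to advertise in a "packet too big" sent back to the IPv6 host.

    Returns None when the translated packet fits on the IPv4 interface,
    or when it has DF=0 and may simply be fragmented instead.
    """
    ipv4_len = ipv6_len - HEADER_DIFF - removed_ext_len
    if ipv4_len <= ipv4_mtu or df_bit(ipv4_len) == 0:
        return None
    # Assumed choice: the IPv6 size that translates to exactly ipv4_mtu
    # when no extension headers are stripped.
    return ipv4_mtu + HEADER_DIFF
```

With a 9000-byte MTU on the IPv6 side and 1500 bytes on the IPv4 side, a 9000-byte IPv6 packet becomes an 8980-byte IPv4 packet with DF=1, so `ptb_mtu_for(9000, 1500)` returns 1520. With equal 1500-byte MTUs, `ptb_mtu_for(1500, 1500)` returns None, matching the observation that equal MTUs never require a "packet too big" message.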
If the packet is larger than can be transmitted on the IPv6 side after translation, the NAT64 returns a "packet too big" message indicating the maximum IPv4 packet size that would be supported using the same translation as the current packet. This can be calculated as IPv4-packet-size - (IPv6-packet-size - IPv6-total-length) + 20.

When a NAT64 translates "packet too big" messages from IPv4 to IPv6, it adjusts the advertised MTU to the minimum of the original advertised MTU + 20, the NAT64's MTU on the IPv6 side and the NAT64's MTU on the IPv4 side + 20. However, if the advertised MTU in the "packet too big" message is smaller than 1260 bytes, the value put into the translated "packet too big" message is 1280. This makes sure that the IPv6 host will limit its packet sizes to 1280 bytes, so its packets are subsequently translated into IPv4 packets with DF set to 0. (This deviates from [RFC 2765].)

Fragmentation

Because a NAT64 deviates from normal router behavior, the rule that routers do not fragment IPv6 packets or IPv4 packets with DF set to 1 doesn't apply to a NAT64 translator. Where appropriate, these packets are fragmented after translation as described below.

Demultiplexing

Because NAT64 provides a stateful many-to-one (perhaps even many-to-many) translation, it is necessary to recognize which session a given packet belongs to. For this, the TCP or UDP port numbers must be known, but these only occur in the first fragment of a fragmented packet. There are two possible ways to deal with this:

1. Reassemble the packet before translating it.

2. Create translation state for the fragments belonging to the same packet so each fragment can be translated.

Strategy 2 is attractive in large installations because it requires less storage and processing. However, it may still be necessary to buffer fragments for some time, as the fragment containing the first part of the packet (and with that, the port numbers) may not be the first one to arrive.
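Strategy 2 can be sketched roughly as follows. The class and field names are hypothetical; the sketch keys per-packet state on source address, destination address, protocol and fragment ID, and assumes the port numbers have already been parsed out of the first fragment. Timeouts and removal of old state are omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Ports = Tuple[int, int]  # (source port, destination port)


@dataclass
class Fragment:
    src: str
    dst: str
    proto: int
    frag_id: int
    offset: int                    # fragment offset in bytes
    ports: Optional[Ports] = None  # only known in the first fragment (offset 0)


class FragmentDemux:
    """Keep per-packet state instead of reassembling (strategy 2)."""

    def __init__(self) -> None:
        self.ports_for: Dict[tuple, Ports] = {}
        self.pending: Dict[tuple, List[Fragment]] = defaultdict(list)

    def receive(self, frag: Fragment) -> List[Tuple[Fragment, Ports]]:
        """Return the (fragment, ports) pairs that can be translated now."""
        key = (frag.src, frag.dst, frag.proto, frag.frag_id)
        if frag.offset == 0:
            # First fragment: ports become known, release anything buffered.
            self.ports_for[key] = frag.ports
            ready = [frag] + self.pending.pop(key, [])
            return [(f, frag.ports) for f in ready]
        if key in self.ports_for:
            return [(frag, self.ports_for[key])]
        # Ports not known yet: buffer until the first fragment shows up.
        self.pending[key].append(frag)
        return []
```

Buffered fragments are released in one batch as soon as the first fragment arrives; later fragments of the same packet are translated immediately.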
Note: based on the assumptions that hosts generate fragments in order, that reordering only happens through parallel network links, and that the path between these parallel links and a NAT64 supports speeds of at least 10 Mbps, there is a very high probability that two out-of-order fragments making up a packet will arrive at the NAT64 within 50 to 100 milliseconds. Further assuming that fragmented traffic makes up less than 10% of all traffic, this only requires a buffer of 6 to 12,500 fragments (50 ms at 10 Mbps to 100 ms at 10 Gbps, with fragments of roughly 1000 bytes).

In some cases, especially in the IPv6-to-IPv4 direction, there may only be a single session matching the fragment's source and destination addresses and protocol number. In these cases, it would be possible to translate the fragments out of order. A NAT64 translator MAY do this for TCP; however, it MUST NOT translate UDP packets before the first fragment is available. The reason for this is that the fragment could be part of a packet setting up a new session. With TCP, session establishment packets don't carry data, so it's extremely unlikely that they are fragmented. This is not the case with UDP. In addition, in the IPv4-to-IPv6 direction a UDP packet may have a zero checksum, which must be recalculated when translating to IPv6, and for that the entire packet must be available.

IPv6-to-IPv4

For all IPv4 packets that the NAT64 creates through translation, the translator generates an ID value. This applies to all packets, regardless of their size or the value of the DF bit. A NAT64 translator MAY employ strategies to avoid reusing an ID value for a given source, destination and protocol tuple for as long as possible. If the IPv4 packets are fragments of an IPv6 packet, state is created that makes it possible for all the fragments to have the same ID value on the IPv4 side.
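One possible way to meet both ID requirements above (avoid reusing an ID per tuple for as long as possible, and give all fragments of one IPv6 packet the same IPv4 ID) is a per-tuple counter plus a small mapping table. This is a hypothetical sketch, not a mechanism from the draft; a real translator would probably start counters at random values and expire old mappings. The mapping key includes the IPv6 source address, so two IPv6 hosts that happen to use the same fragment ID still get different IPv4 IDs.

```python
from typing import Dict, Optional, Tuple

Tuple4 = Tuple[str, str, int]   # (IPv4 source, IPv4 destination, protocol)
FragKey = Tuple[str, str, int]  # (IPv6 source, IPv6 destination, IPv6 fragment ID)


class Ipv4IdAllocator:
    def __init__(self) -> None:
        self.next_id: Dict[Tuple4, int] = {}
        self.frag_ids: Dict[FragKey, int] = {}

    def _fresh(self, t4: Tuple4) -> int:
        # Cycling through all 65536 values per tuple maximises the time
        # before a given ID is reused for that tuple.
        ident = self.next_id.get(t4, 0)
        self.next_id[t4] = (ident + 1) % 65536
        return ident

    def id_for(self, t4: Tuple4, frag_key: Optional[FragKey] = None) -> int:
        """ID for a translated IPv4 packet; fragments share their packet's ID."""
        if frag_key is None:  # unfragmented IPv6 packet
            return self._fresh(t4)
        if frag_key not in self.frag_ids:
            self.frag_ids[frag_key] = self._fresh(t4)
        return self.frag_ids[frag_key]
```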
[RFC 2765] specifies copying the low-order 16 bits of the 32-bit Identification field in an IPv6 fragment header (if present) to the IPv4 ID field, but this runs the risk of two IPv6 hosts talking to the same IPv4 destination through the NAT64 using the same ID value. Otherwise, when translating IPv6 packets with a fragment header, the fragments are translated as per [RFC 2765].

IPv4-to-IPv6

Because packets coming in on the IPv4 side may be larger than 1280 bytes after translation, a NAT64 MUST implement PMTUD on the IPv6 side. In other words, it must react to "packet too big" messages for any IPv6 destination that it communicates with by limiting the size of the packets that it sends to the advertised maximum.

If, after translation from IPv4 to IPv6, a packet is larger than the destination's PMTU and the DF bit was set to 1 in the IPv4 packet, the NAT64 returns a "packet too big" message as outlined earlier. If the DF bit was set to 0, the translator first translates the IPv4 packet and then fragments the resulting IPv6 packet using normal IPv6 fragmentation rules. The value in the ID field is generated locally by the NAT64. If the IPv4 packet was a fragment, state is created that allows the same ID value to be used for all IPv6 packets or fragments that are part of the same original IPv4 packet.
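The fragmentation step for DF=0 packets in the IPv4-to-IPv6 direction can be illustrated with the standard IPv6 rule that every fragment except the last carries a multiple of 8 bytes of fragmentable data. The function below is a hypothetical sketch that assumes the translated packet has no extension headers other than the 8-byte fragment header, so each fragment is a 40-byte IPv6 header, the fragment header and a slice of the payload.

```python
from typing import List, Tuple

IPV6_HEADER = 40
FRAG_HEADER = 8


def ipv6_fragments(payload_len: int, pmtu: int) -> List[Tuple[int, int, bool]]:
    """Split an IPv6 payload into fragments that fit in pmtu.

    Returns (offset, length, more_fragments) tuples; offset and length are
    in bytes and describe the slice of the payload each fragment carries.
    """
    # Largest multiple of 8 that fits after the two headers.
    max_data = (pmtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    if max_data <= 0:
        raise ValueError("PMTU too small to carry any fragment data")
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len  # M flag: 0 only on the last one
        frags.append((offset, length, more))
        offset += length
    return frags
```

For a 3000-byte payload and a PMTU of 1280, each fragment can carry 1232 bytes, giving fragments at offsets 0, 1232 and 2464, the last one 536 bytes long. The Fragment Offset field itself counts 8-byte units, so these offsets would be written as 0, 154 and 308.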