[tcpm] Rewriting MSS option for NAT64

Iljitsch van Beijnum <iljitsch@muada.com> Tue, 14 April 2009 19:26 UTC

Hi,

In the BEHAVE wg we're working on NAT64, a way to make IPv6 clients  
talk to IPv4 servers through a translator. It's a lot like NAT-PT but  
with many of the issues better addressed. See:

http://tools.ietf.org/html/draft-bagnulo-behave-nat64-03

I've been working on text about packet sizes and fragmentation (see  
the text at the end of the message for context), and Lars asked me to  
ask for your input on this part:

> The TCP MSS option [RFC 793] is used during the three-way handshake  
> by the two hosts involved to inform each other about the maximum TCP  
> segment size (assuming IP and TCP headers without options) that the  
> host can receive.

> In practice, the MSS option is often used to make TCP work in the  
> presence of broken path MTU discovery.

> To avoid unnecessary path MTU discovery cycles, a NAT64 SHOULD  
> rewrite the MSS option in SYN packets to the minimum of the original  
> MSS option, the NAT64's MTU on the IPv6 side - 60 and the NAT64's  
> MTU on the IPv4 side - 40. This applies to SYNs in both the IPv4-to- 
> IPv6 direction and the IPv6-to-IPv4 direction.


Since this is already very widely deployed in boxes that do stuff like  
PPPoE that reduces the MTU on access networks, I'm assuming there is  
no problem with this, especially since we're putting this into a  
translator that breaks authentication etc anyway.

Iljitsch



Packet sizes

It's the job of the network layer to adapt to different maximum packet  
sizes as packets move through the network. There are three mechanisms  
that handle this: transport layer negotiations such as the TCP MSS  
option, path MTU discovery and fragmentation. The difference between  
the IPv4 and IPv6 header sizes requires some handling in a NAT64  
translator, and there are complications because of the differences  
between how IPv4 and IPv6 handle fragmentation, as well as the issue  
of how to demultiplex fragmented IPv4 packets.

There are two approaches to path MTU discovery and fragmentation when  
translating from IPv6 to IPv4:

1. Set DF to 0 in the translated packets. This avoids path MTU  
discovery issues but leads to significant numbers of fragments.

2. Set DF to 1 in the translated packets. This supports path MTU  
discovery on the IPv4 side so unnecessary fragments are avoided, but  
it doesn't address the issue that IPv6 hosts are not required to  
perform PMTUD when sending packets of 1280 bytes or smaller.

The choice made in this document is to support option 1 for packets  
up to 1280 bytes, and option 2 for packets larger than 1280 bytes.

A NAT64 translator MUST have an MTU of at least 1280 on all of its  
interfaces, both IPv4 and IPv6 interfaces.

TCP MSS option

The TCP MSS option [RFC 793] is used during the three-way handshake by  
the two hosts involved to inform each other about the maximum TCP  
segment size (assuming IP and TCP headers without options) that the  
host can receive.

In practice, the MSS option is often used to make TCP work in the  
presence of broken path MTU discovery.

To avoid unnecessary path MTU discovery cycles, a NAT64 SHOULD rewrite  
the MSS option in SYN packets to the minimum of the original MSS  
option, the NAT64's MTU on the IPv6 side - 60 and the NAT64's MTU on  
the IPv4 side - 40. This applies to SYNs in both the IPv4-to-IPv6  
direction and the IPv6-to-IPv4 direction.
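
As a rough illustration of the rule above, here is a minimal sketch in
Python. The function and parameter names are made up for this example
and do not come from the draft; the 60 and 40 byte figures are the
IPv6+TCP and IPv4+TCP header sizes without options.

```python
IPV6_TCP_OVERHEAD = 60  # 40-byte IPv6 header + 20-byte TCP header
IPV4_TCP_OVERHEAD = 40  # 20-byte IPv4 header + 20-byte TCP header


def clamp_mss(original_mss: int, ipv6_mtu: int, ipv4_mtu: int) -> int:
    """Return the MSS a NAT64 would write into a translated SYN.

    The same minimum applies in both translation directions.
    """
    return min(
        original_mss,
        ipv6_mtu - IPV6_TCP_OVERHEAD,
        ipv4_mtu - IPV4_TCP_OVERHEAD,
    )


if __name__ == "__main__":
    print(clamp_mss(1460, 1500, 1500))  # 1440: limited by the IPv6 side
    print(clamp_mss(1460, 1500, 1400))  # 1360: limited by the IPv4 side
    print(clamp_mss(1200, 1500, 1500))  # 1200: original MSS already smaller
```

With equal MTUs on both sides, the IPv6 term is always the limiting
one, because the IPv6 header is 20 bytes larger.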

Path MTU discovery

The vast majority of both IPv4 and IPv6 hosts use path MTU discovery  
[RFC 1191] [RFC 1981]. With IPv4, PMTUD can be enabled on a per-packet  
basis by setting the DF bit to 1. With IPv6, there is no need for  
PMTUD for packets up to 1280 bytes because all IPv6 hosts are required  
to be able to receive 1280-byte packets without fragmentation. When  
sending larger packets, IPv6 hosts implicitly use PMTUD.

IPv6-to-IPv4

If the NAT64 has the same MTUs on its IPv6 and IPv4 interfaces, it  
will never have to generate "packet too big" messages for incoming  
IPv6 packets because the translation from IPv6 to IPv4 reduces the  
packet size by 20 bytes, more if the IPv6 packet has extension headers  
that are removed during the translation, such as the fragment header.  
If the MTU on the IPv6 side is larger than 1280 bytes and more than 20
bytes larger than the MTU on the IPv4 side, the NAT64 MUST generate
the appropriate "packet too big" messages on the IPv6 side.

To support PMTUD, for translated packets that are larger than 1260
bytes on the IPv4 side (a 1280-byte IPv6 packet minus the 20-byte size
reduction through the translation), the DF bit is set to 1 in the
resulting IPv4 packet.

IPv4 routers may generate "packet too big" messages indicating a  
supported MTU size smaller than 1280 bytes. In those cases, the IPv6  
hosts will continue to send packets larger than what the IPv4 path MTU  
can support. To allow packets to be delivered successfully in this  
case, the DF bit is set to 0 in all translated packets smaller than or  
equal to 1260 bytes, to allow these packets to be fragmented in the  
IPv4 network.
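
The DF rule in the two paragraphs above boils down to a single
threshold. A minimal sketch, assuming no IPv6 extension headers so that
the size difference is exactly 20 bytes (the names are illustrative,
not from the draft):

```python
IPV6_MIN_MTU = 1280
HEADER_DIFF = 20  # 40-byte IPv6 header minus 20-byte IPv4 header
DF_THRESHOLD = IPV6_MIN_MTU - HEADER_DIFF  # 1260 bytes on the IPv4 side


def df_bit_for_translated_packet(ipv4_packet_size: int) -> int:
    """Return the DF bit for an IPv4 packet produced from an IPv6 packet.

    Packets larger than 1260 bytes get DF=1 so PMTUD works on the IPv4
    side; smaller packets get DF=0 so IPv4 routers may fragment them.
    """
    return 1 if ipv4_packet_size > DF_THRESHOLD else 0


if __name__ == "__main__":
    for size in (576, 1260, 1261, 1480):
        print(size, df_bit_for_translated_packet(size))
```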

Note: it is highly recommended for IPv4 hosts running services that  
may be used by IPv6 clients through a NAT64 translator to use an MTU  
size of at least 1260 bytes and to properly generate "packet too big"  
messages.

When a NAT64 translates "packet too big" messages from IPv6 to IPv4,
it adjusts the advertised MTU to the minimum of the original
advertised MTU - 20, the NAT64's MTU on the IPv6 side - 20 and the
NAT64's MTU on the IPv4 side.

IPv4-to-IPv6

Because it may be necessary to include a fragmentation header or other  
extension header, the NAT64 MUST be prepared to generate "packet too  
big" messages for packets with the DF bit set to 1 received from the  
IPv4 side, regardless of the MTU sizes on the IPv4 and IPv6  
interfaces. If the packet is larger than can be transmitted on the  
IPv6 side after translation, the NAT64 returns a "packet too big"  
message indicating the maximum IPv4 packet size that would be  
supported using the same translation as the current packet. This can  
be calculated as IPv4-packet-size - (IPv6-packet-size - IPv6-total- 
length) + 20.

When a NAT64 translates "packet too big" messages from IPv4 to IPv6,  
it adjusts the advertised MTU to the minimum of the original  
advertised MTU + 20, the NAT64's MTU on the IPv6 side and the NAT64's
MTU on the IPv4 side + 20. However, if the advertised MTU in "packet
too big" messages is smaller than 1260 bytes, the value put into the  
translated "packet too big" message is 1280. This makes sure that the  
IPv6 host will limit its packet sizes to 1280 bytes, so its packets  
are subsequently translated into IPv4 packets with DF set to 0. (This  
deviates from [RFC 2765].)

Fragmentation

Because NAT deviates from normal router behavior, the limitation that  
IPv6 packets or IPv4 packets with DF set to 1 are not fragmented by  
routers doesn't apply to a NAT64 translator. Where appropriate, these  
packets are fragmented after translation as described below.

Demultiplexing

Because NAT64 provides a stateful many-to-one (perhaps even many-to- 
many) translation, it is necessary to recognize which session a given  
packet belongs to. For this, the TCP or UDP port numbers must be  
known, but these only occur in the first fragment of a fragmented  
packet. There are two possible ways to deal with this:

1. Reassemble the packet before translating it.

2. Create translation state for the fragments belonging to the same  
packet so each packet can be translated.

Strategy 2 is attractive in large installations because it requires  
less storage and processing. However, it may still be necessary to  
buffer fragments for some time, as the fragment containing the first  
part of the packet (and with that, the port numbers) may not be the  
first one to arrive.

Note: based on the assumptions that hosts generate fragments in-order  
and that reordering must happen through parallel network links and  
that the path between these parallel links and a NAT64 supports speeds  
of at least 10 Mbps, there is a very high probability that two out-of- 
order fragments making up a packet will arrive at the NAT64 within 50  
to 100 milliseconds. Further assuming that fragmented traffic makes up  
less than 10% of all traffic, this only requires a buffer of 6 to  
12,500 fragments (50 ms at 10 Mbps to 100 ms at 10 Gbps).
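
The arithmetic behind those figures can be checked with a few lines of
Python. This is only a sketch: the 1000-byte average fragment size is
an assumption of this example (it is not stated above, but it
reproduces the 6 and 12,500 figures), and the function name is made up.

```python
def reorder_buffer_fragments(
    link_bps: int,
    hold_ms: int,
    fragmented_percent: int = 10,   # share of traffic that is fragmented
    fragment_bytes: int = 1000,     # assumed average fragment size
) -> int:
    """Estimate how many fragments must be buffered for hold_ms."""
    # bits/s * percent/100 * ms/1000 / 8 bits-per-byte, in integer math
    buffered_bytes = link_bps * fragmented_percent * hold_ms // (100 * 1000 * 8)
    return buffered_bytes // fragment_bytes


if __name__ == "__main__":
    print(reorder_buffer_fragments(10_000_000, 50))       # 6
    print(reorder_buffer_fragments(10_000_000_000, 100))  # 12500
```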

In some cases, especially in the IPv6-to-IPv4 direction, there may  
only be a single session matching the fragment's source and  
destination addresses and protocol number. In these cases, it would be  
possible to translate the fragments out-of-order. A NAT64 translator  
MAY do this for TCP; however, it MUST NOT translate UDP packets before
the first fragment is available. The reason for this is that the  
fragment could be part of a packet setting up a new session. However,  
with TCP, session establishment packets don't carry data, so it's
extremely unlikely that they are fragmented. This is not the case with  
UDP, and in the IPv4-to-IPv6 direction, a UDP packet may have a zero  
checksum, which must be recalculated when translating to IPv6, for  
which the entire packet must be available.
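
Put as a decision function, the rule in the paragraph above might look
like the sketch below. The function name and the way sessions are
counted are assumptions of this example, not something the draft
specifies.

```python
def may_translate_before_first_fragment(protocol: str, matching_sessions: int) -> bool:
    """Decide whether a non-first fragment may be translated right away.

    matching_sessions is the number of existing sessions that match the
    fragment's source address, destination address and protocol number.
    """
    if protocol == "udp":
        # MUST NOT: the fragment may belong to a packet that sets up a
        # new session, and a zero checksum needs the whole packet.
        return False
    if protocol == "tcp":
        # MAY: only unambiguous when exactly one session matches.
        return matching_sessions == 1
    return False


if __name__ == "__main__":
    print(may_translate_before_first_fragment("tcp", 1))  # True
    print(may_translate_before_first_fragment("tcp", 2))  # False
    print(may_translate_before_first_fragment("udp", 1))  # False
```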

IPv6-to-IPv4

For all IPv4 packets that the NAT64 creates through translation, the  
translator generates an ID value. This applies to all packets,  
regardless of their size or the value of the DF field. A NAT64  
translator MAY employ strategies to avoid reusing an ID value for a  
certain source, destination, protocol tuple as long as possible. If  
the IPv4 packets are fragments of an IPv6 packet, then state is  
created that makes it possible for all the fragments to have the same  
ID value on the IPv4 side.
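
One possible way to do this is sketched below. It is an illustration
only: the class, its method names and the per-tuple counter are
assumptions of this example, and expiry of the fragment state is left
out.

```python
class Ipv4IdAllocator:
    """Hands out IPv4 ID values for packets created by translation."""

    def __init__(self) -> None:
        self._next_id = {}       # (src, dst, proto) -> next ID to use
        self._fragment_ids = {}  # (ipv6_src, ipv6_frag_id, src, dst, proto) -> ID

    def new_id(self, src: str, dst: str, proto: int) -> int:
        """Return a fresh ID; a per-tuple counter delays reuse for 65536 packets."""
        key = (src, dst, proto)
        value = self._next_id.get(key, 0)
        self._next_id[key] = (value + 1) % 65536
        return value

    def id_for_fragment(self, ipv6_src: str, ipv6_frag_id: int,
                        src: str, dst: str, proto: int) -> int:
        """Return the same ID for every fragment of one IPv6 packet."""
        key = (ipv6_src, ipv6_frag_id, src, dst, proto)
        if key not in self._fragment_ids:
            self._fragment_ids[key] = self.new_id(src, dst, proto)
        return self._fragment_ids[key]


if __name__ == "__main__":
    alloc = Ipv4IdAllocator()
    a1 = alloc.id_for_fragment("2001:db8::1", 7, "192.0.2.1", "198.51.100.5", 17)
    a2 = alloc.id_for_fragment("2001:db8::1", 7, "192.0.2.1", "198.51.100.5", 17)
    b1 = alloc.id_for_fragment("2001:db8::2", 7, "192.0.2.1", "198.51.100.5", 17)
    print(a1 == a2, a1 != b1)  # True True
```

Two IPv6 hosts that happen to pick the same fragment ID get different
IPv4 ID values here, which is the collision that copying the IPv6 ID
bits would not prevent.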

[RFC 2765] specifies copying the lower bits from the IPv6 ID field in  
a fragment header (if present) to the IPv4 ID field, but this runs the  
risk of two IPv6 hosts talking to the same IPv4 destination through  
the NAT64 using the same ID value.

Otherwise, when translating IPv6 packets with a fragmentation header,  
the fragments are translated as per [RFC 2765].

IPv4-to-IPv6

Because packets coming in on the IPv4 side may be larger than 1280  
bytes after translation, a NAT64 MUST implement PMTUD on the IPv6  
side. In other words, it must react to "packet too big" messages for  
any IPv6 destination that it communicates with by limiting the size of  
the packets that it sends to the advertised maximum.

In the case where, after translation from IPv4 to IPv6, a packet is  
larger than a destination's PMTU, the NAT64 returns a "packet too big"  
as outlined earlier in the case that the DF bit was set to 1 in the  
IPv4 packet. If the DF bit was set to 0, the translator first  
translates the IPv4 packet, and then fragments the resulting IPv6  
packets using normal IPv6 fragmentation rules. The value in the ID  
field is generated locally by the NAT64. If the IPv4 packet was a  
fragment, state is created that allows the same ID value to be used  
for all IPv6 packets or fragments that are part of the same original  
IPv4 packet.
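
The choice described above can be summarized as follows. This is a
sketch only; the function name and the returned action strings are
made up for the example.

```python
def action_for_translated_packet(ipv6_size: int, path_mtu: int, df: int) -> str:
    """Pick what the NAT64 does with a packet translated from IPv4 to IPv6.

    ipv6_size is the packet size after translation, path_mtu is the PMTU
    the NAT64 has learned for the IPv6 destination, and df is the DF bit
    of the original IPv4 packet.
    """
    if ipv6_size <= path_mtu:
        return "forward"
    if df == 1:
        # The IPv4 sender asked for no fragmentation: report the limit.
        return "send_packet_too_big"
    # DF=0: translate first, then fragment using IPv6 fragmentation rules.
    return "fragment"


if __name__ == "__main__":
    print(action_for_translated_packet(1200, 1280, 1))  # forward
    print(action_for_translated_packet(1400, 1280, 1))  # send_packet_too_big
    print(action_for_translated_packet(1400, 1280, 0))  # fragment
```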
