Re: [RAM] Tunneling overheads and fragmentation

Iljitsch van Beijnum <iljitsch@muada.com> Mon, 10 September 2007 13:05 UTC

In-Reply-To: <46A21AD6.2060501@firstpr.com.au>
References: <469F7673.6070702@firstpr.com.au> <20070720140433.GA69215@Space.Net> <46A21AD6.2060501@firstpr.com.au>
From: Iljitsch van Beijnum <iljitsch@muada.com>
Subject: Re: [RAM] Tunneling overheads and fragmentation
Date: Mon, 10 Sep 2007 15:03:50 +0200
To: Robin Whittle <rw@firstpr.com.au>
Cc: ram@iab.org

[caught up, yay!]

On 21-jul-2007, at 16:40, Robin Whittle wrote:

> Even then, I fear that in order to preserve both reachability and
> efficiency (and any reachability problems which arise from
> fragmentation), that hosts in all networks, including non-upgraded
> networks, will need to adopt a somewhat lower MTU setting - for
> all the packets they send.

That would assume that something between the place where the packets
are encapsulated and the place where they are decapsulated doesn't
support the "regular" MTU (in theory there is no such thing; in
practice it is 1500 bytes).

I'm still operating under the assumption that both those places are  
in ISP networks. Now obviously there are lots of places in ISP  
networks that only support 1500-byte packets, but what would be the
better decision here: push out a reduced packet size EVERYWHERE, which
we probably won't be able to raise any time soon, or require ISPs to
either:

1. Implement path MTU discovery correctly, or
2. Make sure that all encapsulated packets travel over paths that
   support packets of at least 1500 bytes plus the encapsulation
   overhead?
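
As a rough illustration of the two choices, the sketch below computes
the path MTU an ISP would have to support for option 2, and the MTU
hosts would be pushed down to otherwise. The header sizes are the ones
quoted elsewhere in this message; the 4-byte "extra" field is only the
rough guess from the quoted text, and the scheme names are made up for
this example.

```python
# Header sizes in bytes. EXTRA is the "some number of bytes, such as 4"
# guess from the quoted text, not a protocol constant.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
EXTRA = 4

# Hypothetical scheme names, used only in this sketch.
OVERHEAD = {
    "ipv4-in-ipv4": IPV4_HEADER,
    "ipv4-udp": IPV4_HEADER + UDP_HEADER + EXTRA,
    "ipv6-in-ipv6": IPV6_HEADER,
    "ipv6-udp": IPV6_HEADER + UDP_HEADER + EXTRA,
}


def required_path_mtu(inner_mtu, scheme):
    """MTU the encapsulated path must support so hosts can keep inner_mtu."""
    return inner_mtu + OVERHEAD[scheme]


def reduced_host_mtu(path_mtu, scheme):
    """MTU hosts must drop to if the encapsulated path only does path_mtu."""
    return path_mtu - OVERHEAD[scheme]


for scheme in OVERHEAD:
    print(scheme, required_path_mtu(1500, scheme), reduced_host_mtu(1500, scheme))
```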

> I wonder to what extent every possible application respects the
> operating system's MTU.

The two are unrelated. Applications talk to transport protocols. The  
most popular one, TCP, breaks large chunks of data into smaller  
segments and coalesces multiple small chunks into larger segments.  
UDP simply adds its header and hands over the packet to the IP layer.  
The IP layer will fragment packets that are too large for the
outgoing interface or for the path to the packet's destination. The
only time there are problems (apart, of course, from firewalls that
don't like fragments) is when the transport protocol or application
doesn't want fragmentation (DF=1) but the packet is larger than the
interface MTU. I'm not sure what happens then, except that there is
no way that a packet larger than the interface MTU is sent in one
piece.
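
The fragmentation step can be sketched roughly as follows. This is a
simplified model of IPv4 fragmentation, not any particular stack's
code: it assumes a 20-byte header without options, and it models the
DF=1 case by raising an exception. Real stacks typically report an
error to the sender or drop the packet and return an ICMP
"fragmentation needed" message, but the exact behaviour depends on the
implementation.

```python
def fragment(total_length, mtu, header_len=20, df=False):
    """Split an IPv4 packet of total_length bytes for a link with this MTU.

    Returns a list of (offset, payload_bytes, more_fragments) tuples.
    Simplified model: fixed header length, no IP options.
    """
    if total_length <= mtu:
        return [(0, total_length - header_len, False)]
    if df:
        # Assumption for this sketch: signal the failure to the caller.
        raise ValueError("packet exceeds MTU and DF is set")

    # Every fragment except the last carries a multiple of 8 payload bytes,
    # because the fragment offset field counts in 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    remaining = total_length - header_len
    fragments = []
    offset = 0
    while remaining > 0:
        size = min(max_payload, remaining)
        remaining -= size
        fragments.append((offset, size, remaining > 0))
        offset += size
    return fragments


# A 1500-byte packet plus 32 bytes of encapsulation on a 1500-byte link:
print(fragment(1532, 1500))
```

Note how the second fragment is tiny: a few bytes of encapsulation
overhead are enough to double the packet count for full-sized packets.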

> I wonder if there are any widely
> used applications, such as games, P2P programs etc. which are
> hard-coded to assume a certain MTU which is close to, or right at,
> the limit of what can safely be sent across most of the Net.

Mostly video streaming, although that's all quickly moving to TCP
these days. A common packet size here is 1450 bytes, but I think
that excludes the 28 bytes of IPv4 + UDP headers, so the packets are
really 1478 bytes.
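
A quick check of what that leaves for encapsulation on a 1500-byte
path, taking the 1450-byte figure above at face value (it is a
recollection, not a measurement) and using the overheads mentioned in
the quoted text:

```python
IPV4_UDP_HEADERS = 20 + 8          # IPv4 header + UDP header
STREAM_PAYLOAD = 1450              # figure recalled above, not measured
stream_packet = STREAM_PAYLOAD + IPV4_UDP_HEADERS


def fits(packet_size, encapsulation_overhead, path_mtu=1500):
    """True if the encapsulated packet still fits in path_mtu unfragmented."""
    return packet_size + encapsulation_overhead <= path_mtu


print(stream_packet)               # size of the packet as sent by the host
print(fits(stream_packet, 20))     # plain IPv4-in-IPv4
print(fits(stream_packet, 32))     # IPv4 + UDP + 4 bytes of extra header
```

Under these assumptions such a stream would survive IP-in-IP
encapsulation but not the UDP-based variant.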

I also remember seeing fragmented TCP traffic coming from DSL users.
Obviously some link in the middle didn't do 1500 bytes, but rather
than use PMTUD the devices involved fragmented. I think DF wasn't set
in this case, but it could have been cleared by a router or other box
somewhere along the way. (I once implemented that myself when I got
dial-up service delivered over a tunnel that only supported a
576-byte MTU.)

> Perhaps, in an optimistic scenario, recognising that some packets
> are to be encapsulated by ITRs, an ISP network could set up its
> DHCP system or whatever it is which gives DSL modems their
> parameters, to reduce the MTU and MSS settings sufficiently that
> the ITR-applied encapsulation still results in packets which will
> not be fragmented in most transit and border routers.

There are several MTU-related DHCP options but I'm not sure how
widely they are supported. Note by the way that ATM, which is used
underneath ADSL, supports MTUs much larger than 1500 bytes.

By the way, I'm currently working on this:

http://www.ietf.org/internet-drafts/draft-van-beijnum-multi-mtu-01.txt

It doesn't directly address this issue, but it allows systems with
different MTUs to coexist on the same subnet, so it becomes a lot
easier to deploy jumbo frames.

> This really needs to be done for all hosts in all networks - not
> just hosts in networks which have been upgraded with ITRs and ETRs
> etc.

Oh joy.

Doing path MTU discovery is probably easier, and note that if one
host in a TCP session has a reduced MTU, it will let the other know
during the three-way handshake (through the MSS option) so the
unencumbered host won't send packets that are too large.
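
The handshake behaviour can be sketched like this, assuming IPv4 and
TCP headers without options (20 bytes each). The function names are
made up for the example; real stacks apply further adjustments.

```python
def advertised_mss(interface_mtu, ip_header=20, tcp_header=20):
    """MSS a host announces in its SYN, derived from its own interface MTU."""
    return interface_mtu - ip_header - tcp_header


def sending_mss(local_mtu, peer_advertised_mss):
    """Largest segment a host will send: bounded by what the peer announced
    and by what its own interface can carry."""
    return min(advertised_mss(local_mtu), peer_advertised_mss)


# Host A sits behind a reduced 1468-byte MTU, host B has a plain 1500.
mss_a = advertised_mss(1468)
mss_b = advertised_mss(1500)
print(sending_mss(1500, mss_a))    # what B sends towards A
print(sending_mss(1468, mss_b))    # what A sends towards B
```

This only covers MTUs known at the two end hosts. A smaller MTU
somewhere in the middle of the path is not visible in the handshake,
which is where path MTU discovery comes in.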

> Even if the host had its own ITFH
> (Ingress Tunnel Function in Host) function, I don't see how the
> operating system could tell application programs that there is one
> MTU and MSS setting for packets going to some addresses and
> another setting for packets going to other addresses.

Not a problem, path MTU discovery already does this today.
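
A minimal sketch of the idea: the stack keeps a per-destination MTU
and lowers it when an ICMP "fragmentation needed" message reports a
smaller next-hop MTU. The class and method names are invented for this
example, and it leaves out the timers that real implementations use to
probe for a larger MTU again later.

```python
class PathMtuCache:
    """Per-destination path MTU estimates, defaulting to the interface MTU."""

    def __init__(self, interface_mtu=1500):
        self.interface_mtu = interface_mtu
        self._per_destination = {}

    def mtu_for(self, destination):
        """MTU to use for packets to this destination."""
        return self._per_destination.get(destination, self.interface_mtu)

    def on_fragmentation_needed(self, destination, next_hop_mtu):
        """Handle an ICMP 'fragmentation needed' report for a destination.

        Only ever lowers the estimate; raising it again is out of scope here.
        """
        if next_hop_mtu < self.mtu_for(destination):
            self._per_destination[destination] = next_hop_mtu


cache = PathMtuCache()
cache.on_fragmentation_needed("192.0.2.1", 1468)   # path through a tunnel
print(cache.mtu_for("192.0.2.1"))                  # reduced for this address
print(cache.mtu_for("198.51.100.1"))               # unchanged for others
```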

However, I see issues with hosts doing their own en/decapsulation:
that way, the locators are exposed to hosts and they are at risk of
becoming just as unrenumberable as IP addresses are today.

The only way to make sure that you can renumber easily is if there  
are no firewall rules looking at locators. And the only way that will  
happen is if the relationship between location and identity is strong  
enough that spoofing it is not a viable attack vector. Concretely:  
today the routing system is unspoofable enough that people filter on  
IP addresses. The routing system is run by service providers who  
generally aren't in the business of attacking people. But if a  
locator mapping system is also open to end-users, attackers may be
able to use this path, and people won't be happy to filter on
just identifiers, just like they aren't happy to filter on just DNS  
names today. So either the ISPs must run it or there must be a heavy  
layer of magic security dust.

> UDP encapsulation, as used by LISP and I think eFIT-APT, involves
> 20 bytes for the IP header, 8 bytes for the UDP header and some
> number of bytes, such as 4, for extra stuff

What exactly was the reason for UDP encapsulation again? I think Dino  
said something about load balancing and firewalling. I'm not buying  
that: you can load balance on the destination IP address and anyone  
running firewalls in the middle of the routing system is best served  
with a single deny any any rule.

> This is where IPv6's long addresses and headers become really
> ugly. There would be 40 bytes for IP-in-IP and 52 for basic UDP
> encapsulation.

We really need bigger packets to offset the ever-increasing overhead.
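
To put a number on that, the sketch below compares the share of each
encapsulated packet taken up by the outer headers for a 1500-byte and
a 9000-byte inner packet. The 9000-byte size is just a common
jumbo-frame value picked for illustration; the 40- and 52-byte
overheads are the IPv6 figures from the quoted text.

```python
def overhead_fraction(inner_packet_size, encapsulation_overhead):
    """Fraction of the encapsulated packet that is encapsulation header."""
    return encapsulation_overhead / (inner_packet_size + encapsulation_overhead)


for inner in (1500, 9000):
    for overhead in (40, 52):
        pct = 100 * overhead_fraction(inner, overhead)
        print(f"inner={inner} overhead={overhead} -> {pct:.2f}% of each packet")
```

The absolute overhead stays the same, so the relative cost drops
roughly in proportion to the packet size.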

