Re: [BEHAVE] [Softwires] PMTU Discovery and ICMPv6 filtering

"Dave Dolson" <ddolson@sandvine.com> Mon, 08 February 2010 15:12 UTC

From: Dave Dolson <ddolson@sandvine.com>
To: "Templin, Fred L" <Fred.L.Templin@boeing.com>, Ed Jankiewicz <edward.jankiewicz@sri.com>, Behave WG <behave@ietf.org>, softwires@ietf.org

Fred,

Quoted from below (by Ed Jankiewicz):
> > (4) Provide the capability in the IPSec gateway to discover the MTU
> > on its WAN interface, subtract the maximum header size that this
> > gateway will add to packets presented on its LAN interface, which
> > the host can then discover through the PMTUD.
> >
> > Method (4) would be the best solution, but is not currently
> > available in the IPSec gateway products.

I believe this is referring to mapping of ICMP "too big" messages. When
a tunnel end-point receives a "too big" message, it would have to map
the message to an ICMP message for the original host. I believe this
could be done in a stateless manner using the packet quoted in the
message. (RFC2463 says the "too big" message will include "As much of
invoking packet as will fit without the ICMPv6 packet exceeding the
minimum IPv6 MTU")
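
A minimal sketch of such a stateless mapping, in Python. It assumes a
plain, unencrypted IPv6-in-IPv6 tunnel (outer Next Header 41, no
extension headers), so that the inner header is readable in the quoted
bytes. The name map_packet_too_big and the overhead parameter are
hypothetical and not taken from any gateway product.

```python
import ipaddress

IPV6_HEADER_LEN = 40   # fixed IPv6 header size in bytes
NEXT_HEADER_IPV6 = 41  # Next Header value for IPv6-in-IPv6 encapsulation
IPV6_MIN_MTU = 1280    # minimum link MTU that IPv6 requires


def map_packet_too_big(reported_mtu, quoted, overhead=IPV6_HEADER_LEN):
    """Derive the "too big" message for the original host from the quote.

    reported_mtu: MTU field of the "too big" received by the tunnel end-point.
    quoted:       bytes of the invoking (outer) packet carried in that message.
    overhead:     bytes the tunnel adds to each packet (assumed fixed).

    Returns (inner source address, MTU to report, inner bytes to quote),
    or None when the quote cannot be interpreted. No per-flow state is used.
    """
    if len(quoted) < 2 * IPV6_HEADER_LEN:
        return None  # quote too short to contain the inner IPv6 header
    if quoted[0] >> 4 != 6 or quoted[6] != NEXT_HEADER_IPV6:
        return None  # outer packet is not IPv6-in-IPv6
    inner = quoted[IPV6_HEADER_LEN:]
    if inner[0] >> 4 != 6:
        return None  # inner packet is not IPv6
    # Bytes 8..23 of an IPv6 header hold the source address.
    inner_src = ipaddress.IPv6Address(inner[8:24])
    # Never report less than the IPv6 minimum; below that the gateway
    # would have to fragment the tunnel packet itself (assumption).
    new_mtu = max(reported_mtu - overhead, IPV6_MIN_MTU)
    return inner_src, new_mtu, inner
```

The returned address is where the mapped message would be sent, and the
returned bytes are what it would quote.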

In my opinion, this suggestion is indeed the best solution, and should
have MAY or perhaps RECOMMENDED status despite lack of current
availability.

The network will operate most efficiently without fragmentation. When
1500-byte packets are *always* fragmented, twice as many packets cross
the tunnel, and the probability of losing an original packet roughly
doubles, which is especially problematic when the tunnel passes over a
network that can experience congestion. (Only one fragment has to be
lost for the entire packet to be lost.) This will impact the throughput
of bulk traffic the most. This is why TCP went to "don't fragment" mode
many years ago.
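
As a rough illustration of the loss argument, assuming each fragment is
lost independently with the same probability (a simplification; real
congestion loss is often bursty), the chance of losing a whole packet
can be sketched in Python as follows. The function name is hypothetical.

```python
def packet_loss_probability(fragment_loss, fragments):
    """Probability that a packet is lost when losing any one fragment loses it.

    fragment_loss: independent loss probability of each fragment (0..1).
    fragments:     number of fragments the packet is split into.
    """
    return 1.0 - (1.0 - fragment_loss) ** fragments


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):
        whole = packet_loss_probability(p, 1)
        split = packet_loss_probability(p, 2)
        print(f"fragment loss {p:.3f}: unfragmented {whole:.4f}, two fragments {split:.4f}")
```

For two fragments the result is 2p - p^2, which is close to 2p when p is
small.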

Therefore I respectfully disagree with your assertion that "whenever it
is practically possible, tunnel routers should use tunnel
fragmentation."

I suggest that tunnel gateways SHOULD send the ICMP "too big", but MAY
instead fragment. A vendor could provide fragmentation or ICMP
configuration options, either of which could be useful depending on the
reliability of the network that carries the tunnel. (Of course the
network carrying the tunnel could simply have a larger MTU.) The
receiving end of the tunnel would have to support reassembly of tunnel
fragments for multi-vendor interop.
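
A minimal sketch of such a configuration option, in Python. The policy
names, the function handle_oversize, and its parameters are hypothetical;
the default follows the SHOULD above and the alternative follows the MAY.

```python
from enum import Enum


class OversizePolicy(Enum):
    SEND_TOO_BIG = "too-big"   # default: report the usable MTU to the sender
    FRAGMENT = "fragment"      # alternative: fragment the tunnel packet


def handle_oversize(inner_len, tunnel_mtu, overhead,
                    policy=OversizePolicy.SEND_TOO_BIG):
    """Decide what the tunnel entry does with an inner packet.

    inner_len:  size of the packet arriving on the LAN side, in bytes.
    tunnel_mtu: MTU of the path that carries the tunnel.
    overhead:   bytes of encapsulation the gateway adds (assumed fixed).

    Returns (action, mtu_to_report); mtu_to_report is None unless the
    action is "too-big".
    """
    inner_mtu = tunnel_mtu - overhead
    if inner_len <= inner_mtu:
        return ("forward", None)       # fits after encapsulation
    if policy is OversizePolicy.SEND_TOO_BIG:
        return ("too-big", inner_mtu)  # drop and send ICMP "too big"
    return ("fragment", None)          # encapsulate, then fragment


if __name__ == "__main__":
    print(handle_oversize(1500, 1500, 60))
    print(handle_oversize(1500, 1500, 60, OversizePolicy.FRAGMENT))
```

With the FRAGMENT policy the far end of the tunnel has to reassemble, as
noted above.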


As for ICMP black-holes, is this likely to be standard practice with
ICMPv6 as well (vs. being done to protect old IPv4 routers from attacks
on their expensive control-path processing)?


David Dolson
Software Architect, Sandvine Incorporated.
http://www.sandvine.com



> -----Original Message-----
> From: softwires-bounces@ietf.org [mailto:softwires-bounces@ietf.org]
> On Behalf Of Templin, Fred L
> Sent: Friday, February 05, 2010 5:17 PM
> To: Ed Jankiewicz; Behave WG; softwires@ietf.org
> Subject: Re: [Softwires] [BEHAVE] PMTU Discovery and ICMPv6 filtering
> 
> Hi Ed,
> 
> > -----Original Message-----
> > From: behave-bounces@ietf.org [mailto:behave-bounces@ietf.org]
> > On Behalf Of Ed Jankiewicz
> > Sent: Wednesday, February 03, 2010 9:21 AM
> > To: Behave WG; softwires@ietf.org
> > Subject: [BEHAVE] PMTU Discovery and ICMPv6 filtering
> >
> > One of my colleagues received a long comment on Path MTU Discovery
> > recommendations his organization published and is seeking advice.  I
> > recall this has been discussed several times at IETF meetings, not
> > sure which WG, so this may be redundant.  I've tried to summarize the
> > salient points below, and have two broad questions on this:  Are
> > these points already covered in RFCs (other than 4459, 4890) or
> > current Internet-Drafts? If so, I would appreciate pointers.  If not
> > already covered by current publications, is there interest in
> > documenting the problem and comparing the solutions/drawbacks?
> >
> > The commenter basically wrote:
> >
> > IPv4 and IPv6 treat packets exceeding MTU differently - IPv4 will
> > fragment packets that are "too big" but IPv6 will drop the packet and
> > respond with an ICMPv6 "too-big" error message. [The subject
> > publication] recommends using the Path MTU Discovery Protocol to
> > discover the end-to-end PMTU, which relies on ICMPv6 error messages.
> > These may be blocked by various "filters" and IPsec gateways, which
> > is the case in many operational networks.
> >
> > However, even when ICMPv6 is not blocked, IPsec gateways (in tunnel
> > mode) add extra headers, and there can be more than one tunnel header
> > involved (routers also create tunnels). When a "too-big" message is
> > sent, the router will put in its ICMPv6 message the value of the MTU
> > on the next link at layer 2. The host receiving this MTU value in an
> > ICMP message as part of the Path MTU Discovery Protocol has no way of
> > knowing how many extra tunnel headers are added along the path, and
> > so if it just takes the reported MTU value without allowing for these
> > extra headers the process will keep on failing and will not recover.
> > We have seen this behavior in our experiments.
> >
> > This can be prevented by ensuring that the maximum packet size sent
> > by the host is smaller than the layer 2 limit: smaller by an amount
> > estimated to be sufficient to allow room for extra headers to be
> > added along the path. Several ways of achieving this are possible:
> >
> > (1) Set this reduced MTU value on the IPSec gateway LAN interface;
> > the host then discovers this MTU through the PMTUD.
> >
> > (2) Statically configure this reduced MTU value into the host and
> > switch off PMTUD.
> >
> > (3) Set a reduced MTU at the IPSec gateway WAN interface; the IPSec
> > gateway acts as a host on this interface and so can do packet
> > fragmentation.
> >
> > (4) Provide the capability in the IPSec gateway to discover the MTU
> > on its WAN interface, subtract the maximum header size that this
> > gateway will add to packets presented on its LAN interface, which
> > the host can then discover through the PMTUD.
> >
> > Method (4) would be the best solution, but is not currently
> > available in the IPSec gateway products.
> > The next best solution is (1), which has been used [in commenter
> > experiments]. This is not as good as (4) because it requires manual
> > intervention, and an understanding of how to calculate the
> > appropriate (reduced) MTU value.
> > The next best solution is (2); the only disadvantage of this
> > approach is that only one value can be set for all paths and so the
> > worst case (lowest) value has to be used. In a complex network it may
> > not always be obvious what the worst case path is, and so a
> > conservative estimate may be necessary. Even so, this could be
> > preferable in some deployment scenarios since the path-MTU discovery
> > protocol relies on the passage of ICMP messages which are sometimes
> > blocked by firewalls and other security devices.
> > Approach (3) is the worst solution since it will cause many IP
> > packets to be fragmented, which is inefficient (both because, unlike
> > IPv4, the IPv6 header has to be extended to include the
> > fragmentation offset field, and because it will result in the second
> > fragment being very small, i.e. the ratio of user-data to IP header
> > size will be poor).
> >
> > It is likely that for immediate use option (1) should be used,
> > although (4) would be better if it were supported in the relevant
> > products.
> 
> (1) seems like a safe option at face value, but can lead
> to undesirable inefficiencies. Consider for example the
> diagram below:
> 
>             L1    L2               L3
>             |     |                |
>         W --|--R--|--GW1<====>GW2--|--Z
>             |     |   (Internet)   |
>                X--|     L4, L5,
>                Y--|     L6, etc.
> 
> Here, we have a tunnel between 'GW1' and 'GW2' over the
> Internet to connect two networks. 'GW1' sets a reduced
> MTU 'M' on its LAN interface connected to link 'L2', and
> also advertises 'M' in the Router Advertisements it sends
> on 'L2'. Hosts 'X' and 'Y' pick up the reduced MTU from
> the RA and limit the size of the packets they send to at
> most 'M' bytes. But, host 'W' connected to link 'L1' does
> not see the MTU reduction, and hence will routinely send
> packets larger than 'M' to any hosts beyond router 'R'
> such as 'X', 'Y' and 'Z'. These packets will be dropped
> with an ICMPv6 PTB returned, then 'W' will be forced to
> reduce its packet size and retransmit. The only way to
> prevent this is to drive the reduced MTU 'M' deeply into
> the entire network stacked up behind 'GW1', which may
> contain arbitrarily many additional routers and links.
> 
> Now, even if the reduced MTU 'M' were propagated deeply
> throughout the 'GW1' network, if most communications
> remain localized, hosts would only be able to use a packet
> size of at most 'M' even if their links natively support
> a much larger MTU. Consider for example that link 'L2'
> in the diagram has a native MTU of 9KB, but the effective
> MTU across the 'GW1'<====>'GW2' tunnel is only 1400. 'GW1'
> will advertise 1400 on link 'L2', and communications
> between hosts 'X' and 'Y' will be restricted to using at
> most 1400 byte packets when they could have used 9KB.
> 
> There are a couple of factors to consider in terms of
> what might be a better solution. First, what are the
> expected data rates over the 'GW1'<====>'GW2' tunnel,
> and second what are the performance characteristics of
> those gateways? If the data rates are such that GW1 and
> GW2 are already operating at their peak performance even
> without taking on any additional processing overhead,
> then the best solution would be to make sure that all
> links over which the tunnel might travel (e.g., 'L4',
> 'L5', 'L6', etc.) are large enough to "hide" the tunnel
> encapsulation artifact. For example, if all links 'L4',
> 'L5', 'L6', etc. configure a native MTU of no smaller
> than 1600 and the encapsulation overhead for the tunnel
> is 100 bytes, then all routers and hosts on the LAN side
> of 'GW1' would be able to happily use a 1500 MTU. If the
> MTUs of 'L4', etc. cannot be controlled, however, then
> there is no recourse but to use option (1) and cope with
> the inefficiencies.
> 
> On the other hand, if the data rates across the tunnel
> are nominal and/or 'GW1' and 'GW2' have more than
> sufficient processing capability to take on a modest
> amount of additional overhead, the GWs can use tunnel
> fragmentation so that 'GW1' can present a solid MTU on
> its LAN side interface that does not reflect the size
> of the tunnel encapsulation headers. If the tunnel
> fragmentation could accommodate an MTU of at least 1500
> in this way, then 'GW1' would be able to observe the
> "de facto Internet cell size" of 1500. If we further
> assume that the vast majority of hosts in the world
> today either limit their packet sizes to no more than
> 1500 bytes or are willing to assume the risk of silent
> loss of packets larger than 1500 due to MTU restrictions,
> then there is no need to place artificial restrictions
> on the size of packets that can be used within the 'GW1'
> network. This latter class of hosts (those that send
> packets larger than 1500) would be best served to use
> their own host-based MTU probing mechanisms for sending
> packets larger than 1500 in case the network is somehow
> silently dropping PMTUD messages. RFC4821 was specifically
> designed for this purpose.
> 
> End result - whenever it is practically possible, tunnel
> routers should use tunnel fragmentation and hosts should
> use RFC4821.
> 
> Fred
> fred.l.templin@boeing.com
> 
> > --
> > Ed Jankiewicz - SRI International
> > Fort Monmouth Branch Office - IPv6 Research
> > Supporting DISA Standards Engineering Branch
> > 732-389-1003 or  ed.jankiewicz@sri.com
> >
> > _______________________________________________
> > Behave mailing list
> > Behave@ietf.org
> > https://www.ietf.org/mailman/listinfo/behave
> _______________________________________________
> Softwires mailing list
> Softwires@ietf.org
> https://www.ietf.org/mailman/listinfo/softwires