Re: [BEHAVE] [Softwires] PMTU Discovery and ICMPv6 filtering

"Templin, Fred L" <Fred.L.Templin@boeing.com> Mon, 08 February 2010 21:07 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Dave Dolson <ddolson@sandvine.com>, Ed Jankiewicz <edward.jankiewicz@sri.com>, Behave WG <behave@ietf.org>, "softwires@ietf.org" <softwires@ietf.org>
Date: Mon, 08 Feb 2010 13:08:01 -0800

Hi Dave,

> -----Original Message-----
> From: softwires-bounces@ietf.org [mailto:softwires-bounces@ietf.org] On Behalf Of Dave Dolson
> Sent: Monday, February 08, 2010 7:14 AM
> To: Templin, Fred L; Ed Jankiewicz; Behave WG; softwires@ietf.org
> Subject: Re: [Softwires] [BEHAVE] PMTU Discovery and ICMPv6 filtering
>
> Fred,
>
> Quoted from below (by Ed Jankiewicz):
> > > (4) Provide the capability in the IPSec gateway to discover the MTU
> > > on its WAN interface, subtract the maximum header size that this
> > > gateway will add to packets presented on its LAN interface, which
> > > the host can then discover through the PMTUD.
> > >
> > > Method (4) would be the best solution, but is not currently available
> > > in the IPSec gateway products.
>
> I believe this is referring to mapping of ICMP "too big" messages. When
> a tunnel end-point receives a "too big" message, it would have to map
> the message to an ICMP message for the original host. I believe this
> could be done in a stateless manner using the packet quoted in the
> message. (RFC2463 says the "too big" message will include "As much of
> invoking packet as will fit without the ICMPv6 packet exceeding the
> minimum IPv6 MTU")
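Here is a minimal sketch of the MTU arithmetic such a stateless mapping would need. It assumes a fixed per-packet tunnel overhead, and it assumes the ingress can still identify the original source from the quoted packet, which is the point questioned in the reply for encrypted tunnels. The helper name and the 73-byte overhead are hypothetical.

```python
# Hypothetical helper for a tunnel ingress doing stateless PTB translation:
# convert the MTU reported in an outer ICMPv6 Packet Too Big message into
# the MTU to report to the inner source host. Assumes a fixed overhead.
IPV6_MIN_MTU = 1280  # minimum IPv6 link MTU (RFC 2460)


def inner_ptb_mtu(outer_reported_mtu: int, tunnel_overhead: int) -> int:
    """Return the MTU to place in the translated (inner) PTB message."""
    if outer_reported_mtu <= 0 or tunnel_overhead < 0:
        raise ValueError("MTU must be positive and overhead non-negative")
    inner = outer_reported_mtu - tunnel_overhead
    # Never report less than the IPv6 minimum; below that the tunnel itself
    # has to fragment or segment rather than push the problem to the host.
    return max(inner, IPV6_MIN_MTU)


print(inner_ptb_mtu(1500, 73))   # assumed 73-byte tunnel overhead
```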

But, if the packet-in-error encapsulated in the ICMP
PTB message is encrypted and not complete (i.e., only
the leading 1200 or so bytes of the encrypted packet
are included), how will the tunnel ingress know how to
decrypt it in order to do the stateless translation?
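The "1200 or so bytes" follows from the rule Dave quotes: the whole ICMPv6 message must fit in the 1280-byte minimum IPv6 MTU. Here is a small sketch of that budget. It assumes the PTB message itself carries no IPv6 extension headers.

```python
# Budget for the quoted "invoking packet" in an ICMPv6 Packet Too Big
# message, assuming no extension headers in the ICMPv6 packet itself.
IPV6_MIN_MTU = 1280     # the whole ICMPv6 packet must fit in this
IPV6_HEADER = 40        # fixed IPv6 header
ICMPV6_PTB_HEADER = 8   # type, code, checksum (4 bytes) + MTU field (4 bytes)


def max_quoted_bytes() -> int:
    return IPV6_MIN_MTU - IPV6_HEADER - ICMPV6_PTB_HEADER


def quoted_fraction(invoking_packet_len: int) -> float:
    """Share of the (e.g., encrypted) invoking packet that reaches the ingress."""
    return min(invoking_packet_len, max_quoted_bytes()) / invoking_packet_len


print(max_quoted_bytes())   # 1232
```

For an ESP packet longer than 1232 bytes, the bytes cut off include the ESP trailer and integrity check value, because ESP places them at the end of the packet.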

Also, there seems to be some evidence that ICMPv6 PTB
messages are being dropped in the network. And, there is
ample evidence that ICMPv4 PTB messages are frequently
dropped in the network.

> In my opinion, this suggestion is indeed the best solution, and should
> have MAY or perhaps RECOMMENDED status despite lack of current
> availability.
>
> The network will operate most efficiently without fragmentation. When
> 1500-byte packets are *always* fragmented, there are twice as many of
> these packets, and the probability of packet loss is doubled, especially
> problematic when the tunnel passes over a network that can experience
> congestion. (Only one fragment has to be lost for the entire packet to
> be lost.) This will impact the throughput of bulk traffic the most. This
> is why TCP went to "don't fragment" mode many years ago.
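The "doubled" figure is a small-loss approximation. The sketch below assumes that each fragment is lost independently with the same probability and that losing any fragment loses the whole packet.

```python
# Simplifying assumption: each fragment is dropped independently with
# probability p, and losing any fragment loses the whole packet.
def packet_loss_probability(p: float, n_fragments: int) -> float:
    if not 0.0 <= p <= 1.0 or n_fragments < 1:
        raise ValueError("need 0 <= p <= 1 and n_fragments >= 1")
    return 1.0 - (1.0 - p) ** n_fragments


for p in (0.001, 0.01, 0.05):
    one = packet_loss_probability(p, 1)
    two = packet_loss_probability(p, 2)
    print(f"p={p}: whole packet {one:.4f}, two fragments {two:.4f}")
```

Since 1 - (1 - p)^2 = 2p - p^2, a fragment pair is lost slightly less than twice as often as a single fragment. The ratio is close to 2 when p is small.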

The optimum solution comes when, through good fortune,
the network happens to have link sizes that are large
enough to hide the tunnel overhead artifact. Some have
suggested that almost all links in the Internet core
configure an MTU of ~4KB or greater. That seems like
a fairly bold statement to me, but I have no way of
confirming or refuting it. If it is true, it is a
safe bet that packets of up to 1500 bytes will get
through without the need for fragmentation even after
the tunnel headers are added.

> Therefore I respectfully disagree with your assertion that "whenever it
> is practically possible, tunnel routers should use tunnel
> fragmentation."

OK, I can see how my statement may have been alarming
without sufficient supporting background (which I
failed to provide). First, I am not recommending
steady-state segmentation and reassembly; it is
greatly preferable that segmentation and reassembly
be avoided when it is not absolutely necessary. But,
the key is to be able to sense *when* segmentation
and reassembly is necessary, and then only use it
for those cases. That is what SEAL does - it uses
data packets as implicit probes to tell when the
path MTU is too small to pass original packets of
up to 1500 bytes (i.e., before the tunnel headers
are added). If the path MTU is large enough, SEAL
will pass the original packets in one piece w/o
further fragmentation. If the path MTU is too
small, SEAL uses segmentation to produce a "packet
pair" that gets pasted back together at the tunnel
far end. (SEAL will support longer trains of
segments as well, but that would be a deeper
discussion.)
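The "one piece if it fits, otherwise a packet pair" behavior can be illustrated with a short sketch. This is not the SEAL encapsulation format. It ignores SEAL's own headers and how the ingress learns the path MTU from its implicit probes. The even split, which avoids a tiny trailing segment, is a choice made for this example only.

```python
import math


# Simplified, hypothetical illustration (not the SEAL wire format): how many
# pieces of an inner packet a tunnel ingress sends, and how large each is.
def segment_sizes(inner_len: int, overhead: int, path_mtu: int) -> list[int]:
    room = path_mtu - overhead  # inner bytes that fit in one encapsulated packet
    if room <= 0:
        raise ValueError("tunnel overhead leaves no room for data")
    if inner_len <= room:
        return [inner_len]       # path MTU large enough: send in one piece
    count = math.ceil(inner_len / room)
    base, extra = divmod(inner_len, count)  # spread bytes evenly over segments
    return [base + 1] * extra + [base] * (count - extra)


print(segment_sizes(1500, 100, 1600))   # fits: one piece
print(segment_sizes(1500, 100, 1500))   # too small: a segment pair
```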

> I suggest that tunnel gateways SHOULD send the ICMP "too big", but MAY
> instead fragment.

I would amend this to say that for DF=1 packets the tunnel
ingress MUST send an ICMP PTB when the packet is larger
than a fixed minimum tunnel MTU and also too large to
traverse the tunnel in a single piece. For packets that
are no larger than the fixed minimum tunnel MTU but too
large to traverse the tunnel in a single piece, the
tunnel ingress SHOULD use tunnel fragmentation (i.e.,
SEAL segmentation and reassembly). Presumably, the tunnel
ingress would choose a fixed minimum tunnel MTU that is
large enough so that source hosts will not see excessive
data loss and retransmissions due to path MTU discovery
interactions.

For many/most deployments, a fixed minimum tunnel MTU of
1500 bytes would seem advisable.
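The amended rule can be read as a three-way decision at the tunnel ingress. The sketch below assumes the ingress already knows the path MTU and its own encapsulation overhead. The names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SEND_WHOLE = "encapsulate and send in one piece"
    SEGMENT = "tunnel fragmentation (e.g., SEAL segmentation)"
    SEND_PTB = "drop and return an ICMP Packet Too Big"


# Hypothetical ingress decision for a DF=1 packet. min_tunnel_mtu is the
# fixed minimum tunnel MTU (1500 suggested above); path_mtu and overhead
# are assumed to be known to the ingress already.
def ingress_action(pkt_len: int, overhead: int, path_mtu: int,
                   min_tunnel_mtu: int = 1500) -> Action:
    if pkt_len + overhead <= path_mtu:
        return Action.SEND_WHOLE   # fits through the tunnel unfragmented
    if pkt_len <= min_tunnel_mtu:
        return Action.SEGMENT      # too big for one piece, but within the promised MTU
    return Action.SEND_PTB         # too big for one piece and above the minimum


for size in (1400, 1500, 4000):
    print(size, ingress_action(size, overhead=100, path_mtu=1500).value)
```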

> A vendor could provide fragmentation or ICMP
> configuration options, either of which could be useful depending on the
> reliability of the network that carries the tunnel. (Of course the
> network carrying the tunnel could simply have a larger MTU.) The
> receiving end of the tunnel would have to support reassembly of tunnel
> fragments for multi-vendor interop.
>
>
> As for ICMP black-holes, is this likely to be standard practice with
> ICMPv6 as well (vs. being done to protect old IPv4 routers from attacks
> on their expensive control-path processing)?

Studies have been cited on the RRG list showing
non-negligible losses of ICMPv6 PTB messages as
well as significant losses of ICMPv4 PTB messages.
Trusting an untrustworthy network seems like at
best a tenuous approach.

Thanks - Fred
fred.l.templin@boeing.com

> David Dolson
> Software Architect, Sandvine Incorporated.
> http://www.sandvine.com
>
>
>
> > -----Original Message-----
> > From: softwires-bounces@ietf.org [mailto:softwires-bounces@ietf.org] On Behalf Of Templin, Fred L
> > Sent: Friday, February 05, 2010 5:17 PM
> > To: Ed Jankiewicz; Behave WG; softwires@ietf.org
> > Subject: Re: [Softwires] [BEHAVE] PMTU Discovery and ICMPv6 filtering
> >
> > Hi Ed,
> >
> > > -----Original Message-----
> > > From: behave-bounces@ietf.org [mailto:behave-bounces@ietf.org] On Behalf Of Ed Jankiewicz
> > > Sent: Wednesday, February 03, 2010 9:21 AM
> > > To: Behave WG; softwires@ietf.org
> > > Subject: [BEHAVE] PMTU Discovery and ICMPv6 filtering
> > >
> > > One of my colleagues received a long comment on Path MTU Discovery
> > > recommendations his organization published and is seeking advice.
> > > I recall this has been discussed several times at IETF meetings,
> > > not sure which WG, so this may be redundant. I've tried to summarize
> > > the salient points below, and have two broad questions on this: Are
> > > these points already covered in RFCs (other than 4459, 4890) or
> > > current Internet-Drafts? If so, I would appreciate pointers. If not
> > > already covered by current publications, is there interest in
> > > documenting the problem and comparing the solutions/drawbacks?
> > >
> > > The commenter basically wrote:
> > >
> > > IPv4 and IPv6 treat packets exceeding MTU differently - IPv4 will
> > > fragment packets that are "too big" but IPv6 will drop the packet
> > > and respond with an ICMPv6 "too-big" error message. [The subject
> > > publication] recommends using the Path MTU Discovery Protocol to
> > > discover the end-to-end PMTU, which relies on ICMPv6 error messages.
> > > These may be blocked by various "filters" and IPsec gateways, which
> > > is the case in many operational networks.
> > >
> > > However, even when ICMPv6 is not blocked, IPsec gateways (in tunnel
> > > mode) add extra headers, and there can be more than one tunnel header
> > > involved (routers also create tunnels). When a "too-big" message is
> > > sent, the router will put in its ICMPv6 message the value of the MTU
> > > on the next link at layer 2. The host receiving this MTU value in an
> > > ICMP message as part of the Path MTU Discovery Protocol has no way of
> > > knowing how many extra tunnel headers are added along the path, and
> > > so if it just takes the reported MTU value without allowing for these
> > > extra headers, the process will keep on failing and will not recover.
> > > We have seen this behavior in our experiments.
> > >
> > > This can be prevented by ensuring that the maximum packet size sent
> > > by the host is smaller than the layer 2 limit: smaller by an amount
> > > estimated to be sufficient to allow room for extra headers to be
> > > added along the path. Several ways of achieving this are possible:
> > >
> > > (1) Set this reduced MTU value on the IPSec gateway LAN interface;
> > > the host then discovers this MTU through the PMTUD.
> > >
> > > (2) Statically configure this reduced MTU value into the host and
> > > switch off PMTUD.
> > >
> > > (3) Set a reduced MTU at the IPSec gateway WAN interface; the IPSec
> > > gateway acts as a host on this interface and so can do packet
> > > fragmentation.
> > >
> > > (4) Provide the capability in the IPSec gateway to discover the MTU
> > > on its WAN interface, subtract the maximum header size that this
> > > gateway will add to packets presented on its LAN interface, which
> > > the host can then discover through the PMTUD.
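Options (1) and (4) both come down to subtracting the worst-case encapsulation overhead from the WAN-side MTU. Option (4) would do this automatically from the discovered WAN MTU. The sketch below uses hypothetical overhead figures. Real ESP overhead depends on the cipher, IV, padding and ICV length.

```python
# Hypothetical helper: the largest LAN-side MTU that still leaves room for
# every header added along the path. The overhead figures used below are
# assumptions; real values depend on tunnel type, cipher, ICV and padding.
IPV6_MIN_MTU = 1280


def reduced_lan_mtu(wan_mtu: int, overheads: dict[str, int]) -> int:
    reduced = wan_mtu - sum(overheads.values())   # subtract worst-case headers
    if reduced < IPV6_MIN_MTU:
        raise ValueError(f"reduced MTU {reduced} is below {IPV6_MIN_MTU}")
    return reduced


stack = {
    "outer IPv6 header": 40,
    "ESP tunnel-mode worst case (assumed)": 60,
    "second router-built tunnel (assumed)": 40,
}
print(reduced_lan_mtu(1500, stack))   # value to configure (1) or derive (4)
```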
> > >
> > > Method (4) would be the best solution, but is not currently
> available
> > in the IPSec gateway products.
> > > The next best solution is (1), which has been used [in commenter
> > experiments]. This is not as good as
> > > (4) because it requires manual intervention, and an understanding of
> > how to calculate the appropriate
> > > (reduced) MTU value.
> > > The next best solution is (2), the only disadvantage of this
> approach
> > is that only one value can be
> > > set for all paths and so the worst case (lowest) value has to be
> > used. In a complex network it may
> > > not always be obvious what the worst case path is, and so a
> > conservative estimate may be necessary.
> > > Even so this could be preferable in some deployment scenarios since
> > the path-MTU discovery protocol
> > > relies on the passage of ICMP messages which are sometimes blocked
> by
> > firewalls and other security
> > > devices.
> > > Approach (3) is the worst solution since it will cause many IP
> > packets to be fragmented which is
> > > inefficient (both because, unlike IPv4, the IPv6 header has to be
> > extended to include the
> > > fragmentation offset field, and because it will result the second
> > fragment being very small, i.e. the
> > > ratio of user-data to IP header size will be poor).
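The "very small second fragment" in approach (3) follows from IPv6's fragmentation rules: an 8-byte Fragment header is added, and every non-final fragment carries a multiple of 8 payload bytes. The sketch below assumes a 1500-byte packet with no extension headers and a hypothetical reduced WAN MTU of 1400.

```python
IPV6_HEADER = 40   # unfragmentable part, assuming no extension headers
FRAG_HEADER = 8    # IPv6 Fragment extension header


def ipv6_fragment_sizes(packet_len: int, link_mtu: int) -> list[int]:
    """On-the-wire sizes of the IPv6 fragments of a packet_len-byte packet."""
    if packet_len <= link_mtu:
        return [packet_len]
    payload = packet_len - IPV6_HEADER
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_frag = (link_mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    if per_frag <= 0:
        raise ValueError("link MTU too small to fragment into")
    sizes = []
    while payload > 0:
        chunk = min(per_frag, payload)
        sizes.append(IPV6_HEADER + FRAG_HEADER + chunk)
        payload -= chunk
    return sizes


print(ipv6_fragment_sizes(1500, 1400))   # [1400, 156]
```

Here the second fragment carries only 108 bytes of data behind 48 bytes of IPv6 and Fragment headers.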
> > >
> > > It is likely that for immediate use option (1) should be used
> > > although (4) would be better if it were supported in the relevant
> > > products.
> >
> > (1) seems like a safe option at face value, but can lead
> > to undesirable inefficiencies. Consider for example the
> > diagram below:
> >
> >             L1    L2               L3
> >             |     |                |
> >         W --|--R--|--GW1<====>GW2--|--Z
> >             |     |   (Internet)   |
> >                X--|     L4, L5,
> >                Y--|     L6, etc.
> >
> > Here, we have a tunnel between 'GW1' and 'GW2' over the
> > Internet to connect two networks. 'GW1' sets a reduced
> > MTU 'M' on its LAN interface connected to link 'L2', and
> > also advertises 'M' in the Router Advertisements it sends
> > on 'L2'. Hosts 'X' and 'Y' pick up the reduced MTU from
> > the RA and limit the size of the packets they send to at
> > most 'M' bytes. But, host 'W' connected to link 'L1' does
> > not see the MTU reduction, and hence will routinely send
> > packets larger than 'M' to any hosts beyond router 'R'
> > such as 'X', 'Y' and 'Z'. These packets will be dropped
> > with an ICMPv6 PTB returned, then 'W' will be forced to
> > reduce its packet size and retransmit. The only way to
> > prevent this is to drive the reduced MTU 'M' deeply into
> > the entire network stacked up behind 'GW1', which may
> > contain arbitrarily many additional routers and links.
> >
> > Now, even if the reduced MTU 'M' were propagated deeply
> > throughout the 'GW1' network, if most communications
> > remain localized, hosts would only be able to use a packet
> > size of at most 'M' even if their links natively support
> > a much larger MTU. Consider for example that link 'L2'
> > in the diagram has a native MTU of 9KB, but the effective
> > MTU across the 'GW1'<====>'GW2' tunnel is only 1400. 'GW1'
> > will advertise 1400 on link 'L2', and communications
> > between hosts 'X' and 'Y' will be restricted to using at
> > most 1400 byte packets when they could have used 9KB.
> >
> > There are a couple of factors to consider in terms of
> > what might be a better solution. First, what are the
> > expected data rates over the 'GW1'<====>'GW2' tunnel,
> > and second what are the performance characteristics of
> > those gateways? If the data rates are such that GW1 and
> > GW2 are already operating at their peak performance even
> > without taking on any additional processing overhead,
> > then the best solution would be to make sure that all
> > links over which the tunnel might travel (e.g., 'L4',
> > 'L5', 'L6', etc.) are large enough to "hide" the tunnel
> > encapsulation artifact. For example, if all links 'L4',
> > 'L5', 'L6', etc. configure a native MTU of no smaller
> > than 1600 and the encapsulation overhead for the tunnel
> > is 100 bytes, then all routers and hosts on the LAN side
> > of 'GW1' would be able to happily use a 1500 MTU. If the
> > MTUs of 'L4', etc. cannot be controlled, however, then
> > there is no recourse but to use option (1) and cope with
> > the inefficiencies.
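The 1600/100/1500 example reduces to one subtraction against the smallest transit link. The sketch below treats the transit link MTUs as hypothetical inputs.

```python
# Hypothetical helper: the MTU 'GW1' can offer on its LAN side, given the
# MTUs of the transit links the tunnel may cross and the encapsulation
# overhead. 'target' is the MTU we would like to present (1500 here).
def lan_side_mtu(transit_link_mtus: list[int], overhead: int,
                 target: int = 1500) -> int:
    usable = min(transit_link_mtus) - overhead
    # Every transit link can carry target + overhead: the tunnel is hidden.
    # Otherwise fall back to a reduced MTU 'M', i.e. option (1).
    return target if usable >= target else usable


print(lan_side_mtu([1600, 4470, 9000], overhead=100))
print(lan_side_mtu([1500, 4470, 9000], overhead=100))
```

The first call returns 1500, so the tunnel is hidden. The second falls back to a reduced MTU of 1400, as in option (1).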
> >
> > On the other hand, if the data rates across the tunnel
> > are nominal and/or 'GW1' and 'GW2' have more than
> > sufficient processing capability to take on a modest
> > amount of additional overhead, the GWs can use tunnel
> > fragmentation so that 'GW1' can present a solid MTU on
> > its LAN side interface that does not reflect the size
> > of the tunnel encapsulation headers. If the tunnel
> > fragmentation could accommodate an MTU of at least 1500
> > in this way, then 'GW1' would be able to observe the
> > "de facto Internet cell size" of 1500. If we further
> > assume that the vast majority of hosts in the world
> > today either limit their packet sizes to no more than
> > 1500 bytes or are willing to assume the risk of silent
> > loss of packets larger than 1500 due to MTU restrictions,
> > then there is no need to place artificial restrictions
> > on the size of packets that can be used within the 'GW1'
> > network. This latter class of hosts (those that send
> > packets larger than 1500) would be best served by using
> > their own host-based MTU probing mechanisms for sending
> > packets larger than 1500 in case the network is somehow
> > silently dropping PMTUD messages. RFC4821 was specifically
> > designed for this purpose.
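RFC 4821 avoids relying on ICMP by having the packetization layer send probes of increasing size. It treats an acknowledged probe as evidence that the size works. The sketch below simplifies this to a binary search between the RFC's search_low and search_high bounds. The probe callback and the 1280/9000 defaults are assumptions. A real implementation must also distinguish probe loss from congestion loss.

```python
from typing import Callable


# Simplified illustration in the spirit of RFC 4821 (not its exact
# algorithm). probe_succeeds is a hypothetical stand-in for "send a probe
# of this size and see whether it is acknowledged".
def search_pmtu(probe_succeeds: Callable[[int], bool],
                search_low: int = 1280, search_high: int = 9000) -> int:
    """Largest size in [search_low, search_high] that gets through."""
    best = search_low            # assumed to work (IPv6 minimum MTU)
    low, high = search_low + 1, search_high
    while low <= high:
        mid = (low + high) // 2
        if probe_succeeds(mid):
            best, low = mid, mid + 1   # mid works: search larger sizes
        else:
            high = mid - 1             # mid silently dropped: search smaller
    return best


# Hypothetical path that silently drops anything above 4352 bytes.
print(search_pmtu(lambda size: size <= 4352))
```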
> >
> > End result - whenever it is practically possible, tunnel
> > routers should use tunnel fragmentation and hosts should
> > use RFC4821.
> >
> > Fred
> > fred.l.templin@boeing.com
> >
> > > --
> > > Ed Jankiewicz - SRI International
> > > Fort Monmouth Branch Office - IPv6 Research
> > > Supporting DISA Standards Engineering Branch
> > > 732-389-1003 or  ed.jankiewicz@sri.com
> > >
> > > _______________________________________________
> > > Behave mailing list
> > > Behave@ietf.org
> > > https://www.ietf.org/mailman/listinfo/behave
> > _______________________________________________
> > Softwires mailing list
> > Softwires@ietf.org
> > https://www.ietf.org/mailman/listinfo/softwires