Re: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets

Joel Jaeggli <joelja@bogus.com> Tue, 09 February 2010 22:21 UTC

Message-ID: <4B71E002.9090502@bogus.com>
Date: Tue, 09 Feb 2010 14:21:54 -0800
From: Joel Jaeggli <joelja@bogus.com>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)
MIME-Version: 1.0
To: "Templin, Fred L" <Fred.L.Templin@boeing.com>
References: <4B6F08CC.2070900@wand.net.nz> <063A973F-EBC3-4CD0-B5B6-B0FB42A8593D@muada.com> <00f201caa8da$b78e3e90$c4f0200a@cisco.com> <4B704153.2020007@it.uc3m.es> <015801caa8e6$9b72fff0$c4f0200a@cisco.com> <75A95C0D-E2CC-4FD6-B11A-5C772FCD0F5C@muada.com> <02cd01caa90e$dde921c0$c4f0200a@cisco.com> <B50C7F0A-19DB-4C63-9F72-867B5C2D4841@muada.com> <02f101caa916$712d0170$c4f0200a@cisco.com> <E1829B60731D1740BB7A0626B4FAF0A64951038053@XCH-NW-01V.nw.nos.boeing.com> <032d01caa91e$343c71d0$c4f0200a@cisco.com> <E1829B60731D1740BB7A0626B4FAF0A64951038079@XCH-NW-01V.nw.nos.boeing.com> <035601caa925$800f2240$c4f0200a@cisco.com> <E1829B60731D1740BB7A0626B4FAF0A649510381E7@XCH-NW-01V.nw.nos.boeing.com>
In-Reply-To: <E1829B60731D1740BB7A0626B4FAF0A649510381E7@XCH-NW-01V.nw.nos.boeing.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.2 (nagasaki.bogus.com [147.28.0.81]); Tue, 09 Feb 2010 22:22:39 +0000 (UTC)
Cc: "behave@ietf.org" <behave@ietf.org>, Dan Wing <dwing@cisco.com>
Subject: Re: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets

Templin, Fred L wrote:
> Dan,
> 
> About the placement of the translator, in the IPv6->IPv4
> direction the packet incurs a size reduction due to the
> substitution of an IPv4 header in place of the IPv6 header.
> In the IPv4->IPv6 direction, however, the size of the packet
> is inflated. So, e.g., if the translator receives a 1500-byte
> IPv4 packet and has to translate it into a 1520-byte IPv6
> packet, there is a chance that it may be dropped due to a
> link MTU restriction somewhere in the IPv6 network and an
> ICMPv6 PTB sent back toward the translator. Presumably, the
> translator would translate that into an ICMPv4 PTB message
> that it forwards on to the IPv4 host.
> 
> If I have that right, there may be a concern for whether
> the ICMPv4 PTB would be dropped in the IPv4 network before
> it arrives at the IPv4 host. I ran some tests and found
> that path MTU discovery in IPv4 seems to be problematic
> for a substantial number of the top 1000 websites:
> 
> http://www.ietf.org/mail-archive/web/rrg/current/msg05907.html
> 
> This suggests to me that path MTU discovery in the IPv4
> network "works" simply because it is rarely invoked. In
> other words, the IPv4 network seems to be getting by
> based on the good fortune that most links in the Internet
> core configure an MTU of 1500 or larger. But, this state
> of affairs may be disrupted if more and more translators
> are used that cause IPv4 packets to be inflated in size
> after they are translated into IPv6 packets.
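
To make the size arithmetic above concrete, here is a rough sketch. It
assumes an option-free 20-byte IPv4 header and a bare 40-byte IPv6 header
with no extension headers; the function names are made up for this example
and are not from any draft.

```python
IPV4_HEADER_LEN = 20  # assumption: IPv4 header without options
IPV6_HEADER_LEN = 40  # fixed IPv6 header, no extension headers


def ipv4_to_ipv6_size(ipv4_total_len, ipv4_header_len=IPV4_HEADER_LEN):
    """Size of the packet after the IPv4 header is swapped for an IPv6 one."""
    payload_len = ipv4_total_len - ipv4_header_len
    return IPV6_HEADER_LEN + payload_len


def exceeds_link_mtu(packet_len, link_mtu):
    """True if a router on this link would drop the packet and send a PTB."""
    return packet_len > link_mtu


translated = ipv4_to_ipv6_size(1500)
print(translated, exceeds_link_mtu(translated, 1500))
```

With these assumptions a full-size 1500-byte IPv4 packet no longer fits a
1500-byte IPv6 link after translation, which is the case that triggers the
ICMPv6 PTB.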

Well, note that RFC 2923 observes, and some platforms implement, PMTU
black-hole avoidance, which involves dropping back to 576 (RFC 791).
Apart from being kind of suboptimal, that will work in places where
PMTUD won't, such as when you have a 9k MTU on your NIC but need to talk
to a printer on the same subnet.
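
A minimal sketch of that fallback, under loud assumptions: delivered(size)
is a hypothetical callback reporting whether a packet of that size got
through, and the retry count is arbitrary. Real stacks infer the black hole
from retransmission timeouts rather than from a callback like this.

```python
FALLBACK_MTU = 576  # the conservative size mentioned above


def choose_mtu(link_mtu, delivered, max_retries=3):
    """Try the full link MTU; if nothing gets through, fall back to 576.

    `delivered` is a hypothetical callable taking a packet size and
    returning True when a packet of that size reached the peer.
    """
    for _ in range(max_retries):
        if delivered(link_mtu):
            return link_mtu
    # No delivery and (by assumption) no usable ICMP feedback: black hole.
    return FALLBACK_MTU


# The printer case: a 9k MTU on the NIC, a peer that only takes 1500.
printer_accepts = lambda size: size <= 1500
print(choose_mtu(9000, printer_accepts))
```

The point of the sketch is only that the fallback needs no ICMP at all,
which is why it works on a single subnet where no router exists to send a
PTB.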

> So, if the translator were located at the site border of
> the IPv4 host (i.e., rather than at the site border of
> the IPv6 host) the bulk of the path would be over IPv6
> instead of IPv4, hence the incidence of ICMPv4 filtering
> would be isolated to the end site of the IPv4 host.
> 
> Has the translator deployment analysis taken IPv4 path
> MTU-related black holing into consideration?
> 
> Fred
> fred.l.templin@boeing.com  
> 
>> -----Original Message-----
>> From: Dan Wing [mailto:dwing@cisco.com]
>> Sent: Monday, February 08, 2010 5:16 PM
>> To: Templin, Fred L; 'Iljitsch van Beijnum'
>> Cc: behave@ietf.org
>> Subject: RE: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets
>>
>>
>>
>>> -----Original Message-----
>>> From: behave-bounces@ietf.org
>>> [mailto:behave-bounces@ietf.org] On Behalf Of Templin, Fred L
>>> Sent: Monday, February 08, 2010 5:11 PM
>>> To: Dan Wing; 'Iljitsch van Beijnum'
>>> Cc: behave@ietf.org
>>> Subject: Re: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets
>>>
>>> Dan,
>>>
>>>> -----Original Message-----
>>>> From: behave-bounces@ietf.org [mailto:behave-bounces@ietf.org] On Behalf Of Dan Wing
>>>> Sent: Monday, February 08, 2010 4:24 PM
>>>> To: Templin, Fred L; 'Iljitsch van Beijnum'
>>>> Cc: behave@ietf.org
>>>> Subject: Re: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Templin, Fred L [mailto:Fred.L.Templin@boeing.com]
>>>>> Sent: Monday, February 08, 2010 4:02 PM
>>>>> To: Dan Wing; 'Iljitsch van Beijnum'
>>>>> Cc: behave@ietf.org
>>>>> Subject: RE: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets
>>>>>
>>>>> The concern with setting DF=0 on the IPv4 side for packets
>>>>> that are 1280 or smaller on the IPv6 side is the possibility
>>>>> for RFC4963 data corruption if we let it run at line rate.
>>>> At high speed with the same source/destination addresses, and when
>>>> fragmentation on the IPv4 network occurs.  That is, sending lots of packets
>>>> with DF=0 does not, by itself, cause a problem even if they are fragmented on
>>>> the same link -- so long as they have different source or different
>>>> destination addresses.  I agree we should make a note of that.
>>> Yes, the concern is for same src,dst operating at line
>>> rate. IPv6 apps assume that there is no possibility for
>>> fragmentation in the network, hence they may not be as
>>> shy as IPv4 apps when it comes to sending large packets
>>> at high data rates. Having the translator set DF=0 can
>>> break that assumption.
>> There are three approaches discussed in new text in
>> http://tools.ietf.org/html/draft-ietf-behave-v6v4-xlate-08#page-18,
>> which only set DF=0 for "small packets" (packets that were
>> less than 1280 on the IPv6 side).
>>
>> I don't think that eliminates your concern, though.
>>
>>>> Is there another approach that should be considered to
>>>> avoid that concern?
>>> I still believe placement of the translator matters.
>>> Placing the translator closer to the IPv4 host
>>> reduces the portion of the path that is exposed to
>>> IPv4 hops. This is true whether the DF=0 or DF=1
>>> approach is used.
>> Agreed.
>>
>> However, that isn't always possible.  For example,
>> someone operating an IPv6-only network will have to
>> place their IPv6/IPv4 translator at their network
>> border.
>>
>> But perhaps some text could be added to
>> draft-ietf-behave-v6v4-xlate which highlighted
>> this recommendation to place the translator as
>> close to the IPv4 host as possible.  Or, similarly,
>> to ensure the IPv4 network has an MTU greater than
>> 1280 (minus whatever the IPv6/IPv4 header differences
>> are).
>>
>> -d
>>
>>
>>> Fred
>>> fred.l.templin@boeing.com
>>>
>>>>> There are certainly plenty of websites on the IPv4 Internet
>>>>> that set DF=0 on every packet they send, and many even use
>>>>> packet sizes larger than 1280. But, how many of them send
>>>>> large packets at line rate?
>>>> Content delivery networks?
>>>>
>>>> -d
>>>>
>>>>
>>>>> Fred
>>>>> fred.l.templin@boeing.com
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: behave-bounces@ietf.org [mailto:behave-bounces@ietf.org] On Behalf Of Dan Wing
>>>>>> Sent: Monday, February 08, 2010 3:29 PM
>>>>>> To: 'Iljitsch van Beijnum'
>>>>>> Cc: behave@ietf.org
>>>>>> Subject: Re: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Iljitsch van Beijnum [mailto:iljitsch@muada.com]
>>>>>>> Sent: Monday, February 08, 2010 3:02 PM
>>>>>>> To: Dan Wing
>>>>>>> Cc: 'marcelo bagnulo braun'; behave@ietf.org
>>>>>>> Subject: Re: [BEHAVE] Fwd: IPv6 hosts sending <1280 byte packets
>>>>>>>
>>>>>>> On 8 feb 2010, at 23:34, Dan Wing wrote:
>>>>>>>
>>>>>>>>> So essentially this means doing more work (keeping PMTUD
>>>>>>>>> state for the IPv4 path) in order to do even more work later
>>>>>>>>> (fragment). Not sure how that makes sense...
>>>>>>>> Yes, it means the translator is doing the effort of fragmenting
>>>>>>>> rather than the router immediately in front of the small-MTU
>>>>>>>> link.  However, by doing it this way, we can maintain end-to-end
>>>>>>>> PMTUD for those IPv6 hosts that receive ICMPv6 PTB and send
>>>>>>>> packets with the fragmentation header.
>>>>>>> No, the two issues are orthogonal.
>>>>>>>
>>>>>>> At some point, an IPv6 host sends a packet that is larger
>>>>>>> than the IPv4 PMTU. A router sends back a too big message.
>>>>>>> Issue A is how to handle this too big message. Issue B is
>>>>>>> what happens to packets that the IPv6 host subsequently sends.
>>>>>> A stateless translator cannot identify 'subsequent' packets,
>>>>>> because it doesn't remember previous state.
>>>>>>
>>>>>>> I think there is agreement that A should be handled by simply
>>>>>>> translating the IPv4 too big to IPv6 and adjusting it as per
>>>>>>> the header size differences, but otherwise let it through
>>>>>>> transparently. (I once argued for rewriting it to 1280, though.)
>>>>>>> Nothing has changed here.
>>>>>>>
>>>>>>> But now for part B. A stateless translator can't monitor the
>>>>>>> IPv4 PMTU. Not because it's stateful (we're talking per
>>>>>>> destination state, not per session state, it would be doable
>>>>>>> to track this) but because there may be more than one
>>>>>>> translator so any given translator may not see all packets.
>>>>>>> So for stateless translation when the translator has a < 1280
>>>>>>> packet with no fragment header we don't know whether this is
>>>>>>> because PMTUD was successful and the packet size fits in the
>>>>>>> PMTU, or the IPv6 host ignored the < 1280 too big message and
>>>>>>> is omitting the fragment header in violation of the relevant
>>>>>>> RFCs. In order to be compatible with the latter the only
>>>>>>> thing we can do is set DF to 0.
>>>>>>>
>>>>>>> (I just realized that we don't know what happens when for
>>>>>>> instance the PMTU is 1000 and the IPv6 host wants to send a
>>>>>>> 1100-byte packet: does it include a fragment header or not?)
>>>>>>>
>>>>>>> Note that for part B, if there _is_ a fragment header, that
>>>>>>> will also trigger the translator to set DF to 0. So in the
>>>>>>> case where the IPv6 hosts are well behaved there is no change
>>>>>>> to either parts A or B.
>>>>>> So, I believe we are in agreement:  IPv6 packets less than
>>>>>> 1280 need to be translated to IPv4 and sent with DF=0.
>>>>>>
>>>>>>> In the stateful case we know all packets flow through the
>>>>>>> same translator so we get to track the IPv4 PMTU so when
>>>>>>> there is an IPv6 packet that needs to be translated we know
>>>>>>> whether it's bigger or smaller than the PMTU so we can send
>>>>>>> packets that are bigger with DF=1 and packets that are
>>>>>>> smaller with DF=0 and if we are to, also immediately fragment
>>>>>>> them. However, there is no advantage to doing this extra work
>>>>>>> because even if we get to set DF=1 on some packets that are
>>>>>>> smaller than the PMTU that only lets us discover a reduction
>>>>>>> in the PMTU, but that discovery doesn't buy us anything
>>>>>>> because the IPv6 host isn't going to reduce its packet size.
>>>>>> I agree the IPv6 host won't reduce its packet size below
>>>>>> 1280.  But the IPv6 host becomes aware that the path MTU
>>>>>> is smaller than 1280.  I don't know if the IPv6 host finds
>>>>>> that useful/valuable to know the path MTU is less than the
>>>>>> IPv6 minimum MTU.
>>>>>>
>>>>>>> Fragmenting at the translator also doesn't buy the translator
>>>>>>> anything and it may even get the packet blocked or create
>>>>>>> more work if there's a firewall with a > 1280 MTU before the
>>>>>>> path hits the < 1280 MTU which would need to reassemble the
>>>>>>> packet to observe the port numbers. And it could be an attack
>>>>>>> vector: attackers could send too bigs with tiny MTUs to make
>>>>>>> the translator work harder.
>>>>>>>
>>>>>>>> Certainly there was some
>>>>>>>> perceived value in the IPv6 knowing its packets are being
>>>>>>>> fragmented.
>>>>>>> I don't think so. Even if the application somehow learns this
>>>>>>> information, what is it supposed to do then?
>>>>>> Don't know.  It is obviously easier for the 6/4 translator to
>>>>>> simply fragment the packet (or translate and send with DF=0) without
>>>>>> informing the IPv6 host, but there is long-standing text in both
>>>>>> RFC2460 and its predecessor (RFC1883) that says to send back
>>>>>> ICMPv6 PTB and the IPv6 host is then supposed to send packets
>>>>>> with the fragmentation header.
>>>>>>
>>>>>>>> I agree that nearly 100% of PMTUD on IPv4 is with TCP.
>>>>>>>> But there isn't anything prohibiting PMTUD for UDP.
>>>>>>> With sufficient thrust pigs fly just fine...
>>>>>>>
>>>>>>> Generally, it's not easy for UDP applications to limit their
>>>>>>> packet sizes. So the protocol would have to support changing
>>>>>>> the packet size on the fly and then the application would
>>>>>>> have to react to too big messages.
>>>>>> draft-petithuguenin-behave-stun-pmtud does not rely on
>>>>>> ICMP packet too big messages.
>>>>>>
>>>>>>> Not impossible, but I've never seen it happen.
>>>>>> Me, neither.  I'm just trying to temper the "only TCP does
>>>>>> PMTUD", because UDP can do PMTUD.  And of course there is value
>>>>>> in sending the largest packets possible.
>>>>>>
>>>>>>> But if BitTorrent goes to UDP it's
>>>>>>> likely they'll put this in in some way.
>>>>>> As of version 2.x, the most popular BitTorrent client on
>>>>>> Windows (uTorrent) has implemented uTP, for whatever that's
>>>>>> worth.  I don't know if they're doing PMTUD, though.  Beta
>>>>>> versions of uTorrent are available for OSX, too, and I expect
>>>>>> they will soon support uTP (if they aren't already).
>>>>>>
>>>>>>>> I tend to think, though, we don't want to figure this out
>>>>>>>> for stateful translators at this point in time.  If it is
>>>>>>>> found useful/necessary, it can be specified later.
>>>>>>> Right.
>>>>>> -d
>>>>>>
>>>>>> _______________________________________________
>>>>>> Behave mailing list
>>>>>> Behave@ietf.org
>>>>>> https://www.ietf.org/mailman/listinfo/behave
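
For reference, the DF rule the quoted thread converges on for a stateless
translator can be sketched as below. This is only an illustration of the
rule as stated in the thread (packets under 1280 bytes, or carrying a
fragment header, go out with DF=0); the function name and the strict
"less than" comparison are assumptions, not text from the draft.

```python
IPV6_MIN_MTU = 1280  # IPv6 minimum link MTU


def ipv4_df_bit(ipv6_packet_len, has_fragment_header):
    """DF bit a stateless translator would set on the outgoing IPv4 packet.

    Hypothetical helper: returns 0 when the IPv6 packet is smaller than
    1280 bytes or carries a fragment header, and 1 otherwise.
    """
    if has_fragment_header or ipv6_packet_len < IPV6_MIN_MTU:
        return 0
    return 1


for length, frag in [(1100, False), (1400, True), (1400, False)]:
    print(length, frag, ipv4_df_bit(length, frag))
```

Note that Fred's RFC4963 concern applies exactly to the DF=0 branch when
the same source and destination run at line rate.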