Re: New Version Notification for draft-bonica-6man-frag-deprecate-00.txt

Geoff Huston <gih@apnic.net> Tue, 25 June 2013 07:16 UTC

Subject: Re: New Version Notification for draft-bonica-6man-frag-deprecate-00.txt
From: Geoff Huston <gih@apnic.net>
In-Reply-To: <2CF4CB03E2AA464BA0982EC92A02CE2509F878B0@BY2PRD0512MB653.namprd05.prod.outlook.com>
Date: Tue, 25 Jun 2013 17:15:49 +1000
Message-ID: <EE995320-48D6-4D97-888A-0C2AD5024743@apnic.net>
References: <2CF4CB03E2AA464BA0982EC92A02CE2509F85151@BY2PRD0512MB653.namprd05.prod.outlook.com> <51C56E60.5040009@fud.no> <8C48B86A895913448548E6D15DA7553B9237F3@xmb-rcd-x09.cisco.com> <CAKr6gn17O+B78HJofr-z7Nsgv-y8+w4hgKy+YPicgNS126qwXA@mail.gmail.com> <2CF4CB03E2AA464BA0982EC92A02CE2509F870FC@BY2PRD0512MB653.namprd05.prod.outlook.com> <CAKr6gn2zu2n-pJMirG-seN5WX=Evyquu9EqqLOV-zf-RKQ9eYg@mail.gmail.com> <20130625015317.6B256363BD8F@drugs.dv.isc.org> <2CF4CB03E2AA464BA0982EC92A02CE2509F878B0@BY2PRD0512MB653.namprd05.prod.outlook.com>
To: Ronald Bonica <rbonica@juniper.net>
Cc: "ipv6@ietf.org 6man-wg" <ipv6@ietf.org>, Mark Andrews <marka@isc.org>

Hi,

We _are_ seeing fragmentation of TCP over IPv6, and the cause is systematic rather than random chance.

What is happening is that the client is, we _assume_, behind some form of tunnel (MPLS). The server (a recent version of FreeBSD) has an MTU of 1500, as does the client.

The client opens 5 ports in parallel (possibly Firefox?). The client makes a request down port 1, then, a little later, makes another request down port 2. The server sends large packets down the TCP session open to port 1. A gateway on the path sends back an ICMPv6 Packet Too Big (PTB) message to the server for this TCP session, with a new MTU of 1492. The TCP session on port 1 now adjusts its MSS to 1432 (= 1492 - 60, i.e. 40 octets of IPv6 header plus 20 octets of TCP header) and the server resends its set of data as a sequence of packets, each of which is 1492 octets in size: all good, no fragmentation of TCP packets so far.
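The MSS figures above come from subtracting the fixed 40-octet IPv6 header and a 20-octet TCP header from the MTU. A minimal Python sketch of that arithmetic, assuming no IPv6 extension headers and no TCP options (with options such as timestamps in use, the usable payload per segment would be smaller):

```python
IPV6_HEADER = 40  # fixed IPv6 header; no extension headers assumed
TCP_HEADER = 20   # TCP header without options (assumption)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv6 packet."""
    return mtu - IPV6_HEADER - TCP_HEADER


def packet_size(mss: int) -> int:
    """On-the-wire IPv6 packet size of a full-sized TCP segment."""
    return mss + TCP_HEADER + IPV6_HEADER


for mtu in (1500, 1492):
    mss = mss_for_mtu(mtu)
    print(f"MTU {mtu}: MSS {mss}, full packet {packet_size(mss)} octets")
# MTU 1500: MSS 1440, full packet 1500 octets
# MTU 1492: MSS 1432, full packet 1492 octets
```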

But then, on the session open to the client's port 2, the server assembles its set of packets for the second request. The TCP session on the server that is bound to port 2 of the client is unaware of the MTU change (after all, the ICMPv6 PTB message was directed to the TCP session bound to client port 1, not port 2), so the server's IPv6 driver gets a set of packets, each carrying 1440 octets of TCP payload, to send to the client. But there is a data structure in the server (I think FreeBSD uses the routing table, but the FreeBSD folk probably know better than I do which internal table is used to cache path MTU data derived from ICMPv6 PTB messages) which says that for this IPv6 destination address the MTU is now 1492, not 1500. The IPv6 output driver therefore fragments the queued TCP packets for the session connected to client port 2, sending 2 packets for each single packet passed to it from the upper-level TCP driver: what heads onto the wire is a 1420 octet payload and a 20 octet payload, with an IPv6 Fragment header binding them together.
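The 1420/20 split can be reproduced from the IPv6 fragmentation rules: each fragment carries the 40-octet IPv6 header plus an 8-octet Fragment header, and every fragment except the last must carry a multiple of 8 octets of data. A minimal Python sketch, assuming no other extension headers and reading the 1420 octet figure as TCP data that follows the 20-octet TCP header in the first fragment (that reading is an assumption, not stated in the message):

```python
IPV6_HEADER = 40  # unfragmentable part: just the fixed header (assumption)
FRAG_HEADER = 8   # IPv6 Fragment extension header
TCP_HEADER = 20   # TCP header without options (assumption)


def fragment_sizes(fragmentable_len: int, pmtu: int) -> list[int]:
    """Split `fragmentable_len` octets into per-fragment data lengths.

    Every fragment except the last carries a multiple of 8 octets, and
    each fragment (headers included) must fit within `pmtu`.
    """
    max_data = (pmtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    sizes = []
    remaining = fragmentable_len
    while remaining > max_data:
        sizes.append(max_data)
        remaining -= max_data
    sizes.append(remaining)
    return sizes


tcp_segment = TCP_HEADER + 1440  # segment sized for the original 1500 MTU
frags = fragment_sizes(tcp_segment, 1492)
print(frags)                                      # [1440, 20]
print(frags[0] - TCP_HEADER, frags[1])            # 1420 20  (TCP data per fragment)
print([IPV6_HEADER + FRAG_HEADER + f for f in frags])  # [1488, 68]
```

Under these assumptions the first fragment is 1488 octets on the wire, just under the 1492 octet path MTU, and the second is a 68 octet runt.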

As far as I can see this is not "bad" behaviour. The second session has not received its own ICMPv6 Packet Too Big, so this TCP session (like all the other parallel TCP sessions) is unaware of the need to drop its MSS, and because of the cached per-host entry with the new MTU, the IP layer knows that it cannot send these packets without fragmentation, so it correctly fragments each TCP packet to ensure delivery. I don't believe this is a bug per se. I think it's an unintended side effect of the way in which ICMPv6 PTB messages are processed by the recipient.
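A toy model may make the asymmetry clearer. This is a minimal Python sketch, not FreeBSD's code: it assumes a shared per-destination PMTU cache (`Host.pmtu_cache`, a hypothetical stand-in for whatever table the real stack uses) and a per-session MSS (`TcpSession.mss`) that only the session whose packet the PTB quoted gets to update.

```python
from dataclasses import dataclass, field

IPV6_HEADER, TCP_HEADER = 40, 20
LINK_MTU = 1500  # server and client interface MTU, per the description


@dataclass
class Host:
    # Hypothetical per-destination path MTU cache shared by all sessions.
    pmtu_cache: dict = field(default_factory=dict)

    def pmtu(self, dst: str) -> int:
        return self.pmtu_cache.get(dst, LINK_MTU)

    def learn_pmtu(self, dst: str, mtu: int) -> None:
        self.pmtu_cache[dst] = min(mtu, self.pmtu(dst))


@dataclass
class TcpSession:
    host: Host
    dst: str
    mss: int = LINK_MTU - IPV6_HEADER - TCP_HEADER  # 1440 at setup

    def on_packet_too_big(self, mtu: int) -> None:
        # The PTB updates the shared cache, but only this session's MSS.
        self.host.learn_pmtu(self.dst, mtu)
        self.mss = min(self.mss, mtu - IPV6_HEADER - TCP_HEADER)

    def send_full_segment(self) -> str:
        size = IPV6_HEADER + TCP_HEADER + self.mss
        # The IP layer fragments anything larger than the cached PMTU.
        return "fragmented" if size > self.host.pmtu(self.dst) else "whole"


server = Host()
port1 = TcpSession(server, "client")
port2 = TcpSession(server, "client")

port1.on_packet_too_big(1492)     # the PTB quoted a port 1 packet
print(port1.send_full_segment())  # whole
print(port2.send_full_segment())  # fragmented
```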

How prevalent is this behaviour?

Well, how prevalent are browsers that open up parallel ports to the same destination?

Gee, with today's browsers just about everyone does this!

If you want to deprecate IPv6 fragmentation and still allow this form of parallel session behaviour to work rather than wedge, then the internal handling of ICMPv6 PTB messages in the host needs to be reworked as far as I can tell.
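One possible shape for such a rework, sketched in Python under loudly stated assumptions: the host keeps a hypothetical index of open sessions keyed by destination (`sessions_by_dst`) and, on a PTB, lowers the MSS of every session to that destination instead of only the one whose packet was quoted. This illustrates the idea only; it is not a mechanism proposed in this thread or taken from any real stack.

```python
from collections import defaultdict

IPV6_HEADER, TCP_HEADER = 40, 20
LINK_MTU = 1500


class Session:
    def __init__(self, dst: str, mss: int):
        self.dst = dst
        self.mss = mss


class Host:
    """Hypothetical host that fans each PTB out to all sessions to a destination."""

    def __init__(self):
        self.pmtu_cache = {}                      # dst -> path MTU
        self.sessions_by_dst = defaultdict(list)  # dst -> open sessions

    def mss_for(self, dst: str) -> int:
        return self.pmtu_cache.get(dst, LINK_MTU) - IPV6_HEADER - TCP_HEADER

    def open_session(self, dst: str) -> Session:
        # New sessions start from the cached PMTU, not the link MTU.
        session = Session(dst, self.mss_for(dst))
        self.sessions_by_dst[dst].append(session)
        return session

    def on_packet_too_big(self, dst: str, mtu: int) -> None:
        self.pmtu_cache[dst] = min(mtu, self.pmtu_cache.get(dst, LINK_MTU))
        new_mss = self.mss_for(dst)
        for session in self.sessions_by_dst[dst]:
            session.mss = min(session.mss, new_mss)


host = Host()
sessions = [host.open_session("client") for _ in range(5)]
host.on_packet_too_big("client", 1492)
print([s.mss for s in sessions])  # [1432, 1432, 1432, 1432, 1432]
```

Keying the fan-out on destination address alone mirrors the per-destination cache described above; a real stack might also need to take the source address or route into account.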

thanks,

   Geoff




On 25/06/2013, at 12:22 PM, Ronald Bonica <rbonica@juniper.net> wrote:

> Hi Mark,
> 
> Thanks for this good empirical data!
> 
> I would like to verify your assertion that most of the IPv6 fragments carry UDP. Do you have any way to be sure?
> 
>                                      Ron
> 
> 
>> -----Original Message-----
>> From: Mark Andrews [mailto:marka@isc.org]
>> Sent: Monday, June 24, 2013 9:53 PM
>> To: George Michaelson
>> Cc: Ronald Bonica; ipv6@ietf.org 6man-wg
>> Subject: Re: New Version Notification for draft-bonica-6man-frag-deprecate-00.txt
>> 
>> 
>> In message <CAKr6gn2zu2n-pJMirG-seN5WX=Evyquu9EqqLOV-zf-RKQ9eYg@mail.gmail.com>,
>> George Michaelson writes:
>>> 
>>> On Tue, Jun 25, 2013 at 2:38 AM, Ronald Bonica <rbonica@juniper.net> wrote:
>>> 
>>>> I'd like to understand the basis of these assertions. I believe what
>>>> I am seeing, on the edge, suggests there is in fact V6 fragmentation
>>>> in both TCP and UDP.
>>>>
>>>> Hi George,
>>>>
>>>> It would be helpful if you could describe:
>>>>
>>>> - Where your observations are being made
>>>> 
>>> 
>>> On our own web services (www.apnic.net, and an associated whois
>>> service which attracts more wide ranging traffic)
>>> 
>>> On 'high in the tree' DNS servers for reverse DNS, including an NS of
>>> in-addr.arpa and ip6.arpa (note: dns transport is disjoint from the
>>> namespace being searched: we see queries over v6 transport to v4
>>> domains, and to ccTLD we secondary)
>>> 
>>> In a packet capture of 2400::/12 run in conjunction with Merit, as
>>> research into darknets.
>>> 
>>> 
>>>> - What percentage of traffic is fragmented
>>>> 
>>> 
>>> our own web: practically none.
>>> 
>>> our own DNS: 0.01%, in a sequence of 10 minute samples. Consistently,
>>> I might add.
>>> 
>>> the 2400::/12 capture: around 0.25% to 1%. So more variable, but higher.
>>> 
>>> 
>>>> - What kinds of packets are being fragmented
>>>> 
>>> 
>>> our own DNS: port 53. little TCP.
>>> 
>>> 2400::/12 capture: mostly port 53. TCP doesn't get captured in the
>>> darknet research. It's impossible to establish the end-to-end
>>> relationship.
>>> 
>>> I am not sure I would call up to 1% of something 'rare'. I'm not even
>>> sure I would call 0.1% or 0.01% of something 'rare'. Otherwise, since
>>> IPv6 adoption rates by end users are at this level, perhaps IPv6 itself
>>> should also be considered for deprecation.
>>> 
>>> It really would be helpful to understand your assertion about the
>>> rarity of IPv6 fragmentation. I want to understand how you got to this
>>> point of view on IPv6 frags.
>>> 
>>> -George
>> 
>> 0.58% of my IPv6 traffic is fragmented.  Assuming that it is mostly UDP,
>> that means 14% of my IPv6 UDP traffic is fragmented.  Most of that
>> traffic is non-local.  I would assume most of the drops are due to PMTUD
>> blocking the initial fragment but letting the tail fragment through, as
>> this machine is behind a tunnel.
>> 
>> Mark
>> 
>> ip6:
>> 	381915 total packets received
>> 	0 with size smaller than minimum
>> 	0 with data size < data length
>> 	0 with bad options
>> 	0 with incorrect version number
>> 	2213 fragments received
>> 	0 fragments dropped (dup or out of space)
>> 	48 fragments dropped after timeout
>> 	0 fragments that exceeded limit
>> 	1077 packets reassembled ok
>> 	217810 packets for this host
>> 	0 packets forwarded
>> 	93958 packets not forwardable
>> 	0 redirects sent
>> 	297719 packets sent from this host
>> 	0 packets sent with fabricated ip header
>> 	0 output packets dropped due to no bufs, etc.
>> 	5031 output packets discarded due to no route
>> 	33 output datagrams fragmented
>> 	66 fragments created
>> 	0 datagrams that can't be fragmented
>> 	0 packets that violated scope rules
>> 	93924 multicast packets which we don't join
>> 	Input histogram:
>> 		hop by hop: 132
>> 		TCP: 202894
>> 		UDP: 15103
>> 		fragment: 2213
>> 		ICMP6: 161573
>> 
>> --
>> Mark Andrews, ISC
>> 1 Seymour St., Dundas Valley, NSW 2117, Australia
>> PHONE: +61 2 9871 4742                 INTERNET: marka@isc.org
>> 
> 
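Mark's percentages can be rechecked from the `ip6` counters in his quoted message. A minimal Python sketch; which denominators he used is an assumption (fragments received over total packets received for the 0.58% figure, and fragments received over the UDP input count for the 14% figure, which this sketch computes as roughly 14.7%):

```python
# Counters copied from the quoted `ip6` statistics.
total_received = 381915
fragments_received = 2213
udp_input = 15103
reassembled_ok = 1077
timed_out = 48

share_of_all = fragments_received / total_received
share_vs_udp = fragments_received / udp_input
print(f"fragments / all received packets: {share_of_all:.2%}")  # 0.58%
print(f"fragments / UDP input count: {share_vs_udp:.1%}")       # 14.7%

# Assumption: every reassembled datagram consumed at least two fragments,
# so at most this many fragments never became part of a reassembled packet.
leftover = fragments_received - 2 * reassembled_ok
print(leftover, "leftover fragments, of which", timed_out, "timed out")
# 59 leftover fragments, of which 48 timed out
```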