fragmentation and tunnels (was: RE: [Int-area] Re: [tsv-area] Fwd: I-DACTION:draft-heffner-frag-harmful-03.txt)

"Templin, Fred L" <Fred.L.Templin@boeing.com> Thu, 04 January 2007 13:23 UTC

Subject: fragmentation and tunnels (was: RE: [Int-area] Re: [tsv-area] Fwd: I-DACTION:draft-heffner-frag-harmful-03.txt)
Date: Wed, 03 Jan 2007 11:46:16 -0800
Message-ID: <39C363776A4E8C4A94691D2BD9D1C9A10177455F@XCH-NW-7V2.nw.nos.boeing.com>
In-Reply-To: <5.2.1.1.2.20070103113937.046cf228@pop3.jungle.bt.co.uk>
From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Bob Briscoe <rbriscoe@jungle.bt.co.uk>, Matt Mathis <mathis@psc.edu>, John Heffner <jheffner@psc.edu>
Cc: tsv-area@ietf.org, int-area@ietf.org

Hello Bob,

Your message is very interesting and, I believe, a fresh way
of looking at the situation, especially for fragmentation
that occurs within tunnels. I appreciate what you are saying,
and look forward to further dialogue on it.

I have been looking at fragmentation over IP*-in-IPv4 tunnels
for some time, and 'draft-templin-linkadapt-03.txt' proposes
an Alternate Fragmentation (AF) scheme that occurs below the
transport layer segmentation (e.g., Packetization Layer Path
MTU Discovery) but above IPv4 fragmentation. It supports
segmentation/reassembly at the tunnel endpoints and uses a
dynamic segment size probing mechanism within the tunnel to
avoid in-the-network IPv4 fragmentation, yet remains compatible
with in-the-network fragmentation should it occur. Use of the
scheme is indicated by setting the reserved bit in the IPv4
header 'Flags' field (to be renamed as the 'AF' bit), thus
it obsoletes RFC3514 (an "April Fools Day" RFC).

This AF scheme codes the 'ip_id' field in the IPv4 header and
thus presents a shorter-than-16b ID (the current draft version
specifies only a 6b ID). The assumption (which I think is also
an assumption you are making) is that successful reassembly
will occur within a very short time window if it will occur
at all. This assumes that reassembly failure will normally
occur due to packet loss rather than gross reordering of
packets within the same flow.
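
As a rough illustration of why a short ID forces a short reassembly
window, the sketch below computes how long a sequentially assigned ID
takes to repeat. The function name and the packet rates are
assumptions chosen for the example; they are not taken from the draft.

```python
def id_wrap_seconds(id_bits: int, packets_per_second: float) -> float:
    """Seconds until a sequentially assigned ID of id_bits bits repeats.

    Assumes one ID is consumed per packet at a steady rate; the rates
    used below are illustrative only.
    """
    if id_bits <= 0 or packets_per_second <= 0:
        raise ValueError("id_bits and packets_per_second must be positive")
    return (2 ** id_bits) / packets_per_second


if __name__ == "__main__":
    for bits in (6, 16):
        seconds = id_wrap_seconds(bits, 10_000)
        print(f"{bits}-bit ID at 10000 packets/s wraps after {seconds} s")
```

At an assumed 10,000 packets per second, a 6b ID repeats after 6.4 ms
and a 16b ID after roughly 6.55 s, so fragments of one packet must
all arrive well inside that interval to be matched unambiguously.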

There have been many studies on packet reordering within the
Internet, but I have not found one yet that can completely
characterize the *degree* of reordering, i.e., the expected
number of places by which a reordered packet is out-of-order.
Most of the studies I have seen seem to suggest that the
expected degree of reordering of packets within a short chain
of packets sent in rapid succession within the same flow is
typically very small, e.g., a reordering event such as
(1,2,4,5,3,6,...) may occur occasionally, while one such
as (1,2,4,5,...,64k,3,64k+1,...) most likely will not. Other
factors to consider are: 1) as you observe, lengthening the
ID field may be insufficient in the presence of gross
reordering, and 2) transports such as TCP are likely to
treat grossly reordered packets as loss anyway.
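
One way to make "degree of reordering" concrete is to count, for each
late packet, how many packets with a higher sequence number arrived
ahead of it. This is only one possible metric, and the function name
is hypothetical; the studies mentioned above may define it differently.

```python
def reorder_extents(arrivals):
    """Map each late packet to the number of higher-numbered packets
    that arrived before it. In-order packets are omitted."""
    extents = {}
    for i, seq in enumerate(arrivals):
        late_by = sum(1 for earlier in arrivals[:i] if earlier > seq)
        if late_by:
            extents[seq] = late_by
    return extents


if __name__ == "__main__":
    # The mild reordering event from the text: packet 3 is two places late.
    print(reorder_extents([1, 2, 4, 5, 3, 6]))  # → {3: 2}
```

Under this metric the (1,2,4,5,3,6,...) event has degree 2, while the
gross case would have a degree on the order of 64k.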

Benefits of this AF scheme in relation to IPv4 fragmentation
are:

  - avoidance of in-the-network fragmentation by dynamic
    segment size probing at the tunnel endpoints
  - authenticated probe response from the decapsulating
    tunnel endpoint to avoid false-positives from on- and
    off-path attackers
  - flow-control to avoid reassembly buffer overruns
  - reduced reassembly buffer memory requirements
  - avoidance of undetected reassembly errors via a
    trailing checksum that is dissimilar from the
    Internet checksum
  - larger MTUs for tunnels

Finally, this work has been around for some time now. I believe
many have reviewed it, but few have commented. Perhaps now is
the time for discussion on a wider basis.

Thanks - Fred
fred.l.templin@boeing.com

PS: Errata in the current draft version:

   - the current draft only speaks to IPv6-in-IPv4 tunnels
   - the ICMP messaging in the current draft may be too
     IPv6-centric
   - network byte ordering issue in the ip_id coding (the 'A'
     and 'P' bits should be MSBs, not LSBs)
   - network byte order coding of trailing checksum needs
     to be clarified
   - draft title should be changed to something like
     "Alternate Fragmentation for IP-in-IP Tunnels"

   The intention is to fix these in a -04 version, along
   with any other change suggestions that may come. 

> -----Original Message-----
> From: Bob Briscoe [mailto:rbriscoe@jungle.bt.co.uk] 
> Sent: Wednesday, January 03, 2007 5:46 AM
> To: Matt Mathis; John Heffner
> Cc: int-area@ietf.org; tsv-area@ietf.org
> Subject: [Int-area] Re: [tsv-area] Fwd: I-DACTION:draft-heffner-frag-harmful-03.txt
> 
> Matt, John,
> 
> 1/ During the lifetime of this draft its scope has become restricted
> to solely describing the problem. That's good (and it's a good solid
> draft), but the abstract or intro needs to explicitly say what it is
> deliberately not setting out to say (not describing partial
> solutions, not proposing solutions).
> 
> 2/ Given the new restricted scope, the title is wrong - it's snappy,
> but not appropriate for this draft any more. It should be "Field
> Wrapping Problems with IP Fragmentation" or some such.
> 
> "IPv4 Fragmentation Considered Very Harmful" attributes blame
> squarely on the fragmentation protocol (as opposed to re-assembly
> implementations - see later). This title effectively says "You
> SHOULD NOT fragment with a 16b ID field", which is beyond what the
> text dares to say, and it's beyond what an informational draft
> should say.
> 
> Even if this were a BCP rather than informational, I would say it
> would be wrong to deprecate 16b fragmentation anyway. There are good
> reasons why fragmentation is useful (e.g. in tunnels), so we need to
> try really hard to find robust ways to do it before writing it off
> as deprecated. Saying it's very harmful should only be a last resort
> if we /prove/ it cannot ever be done robustly.
> 
> As we know, one way to fragment with improved robustness is to use
> more bits for the ID field. But if 16b fragmentation is a problem
> now, 32b fragmentation will be a problem in the future (not so
> distant future as Matt pointed out on int-area
> <http://osdir.com/ml/int-area@lists.ietf.org/msg00545.html>). IP is
> sufficiently pivotal at the neck of the hour glass that anything we
> say about it should endure for decades. I would contend that the
> IETF shouldn't condone putting off a problem to a later date when we
> know it will return. The present title leads us towards that sort of
> solution, implying "32b ID fields aren't [ever] very harmful" - on a
> draft that isn't even meant to be discussing solutions.
> 
> Given hierarchical layering is here to stay (and always has been),
> it would be more fruitful to admit that we need to be able to do
> fragmentation robustly and so we cannot avoid choosing an ID field
> width that will
> - either not be wide enough at some future time
> - or will be overly wasteful today.
> 
> In this vein, it would be useful to focus everyone on designing
> better re-assembly /implementations/ around a 16b fragmentation
> /protocol/ (see a possible idea below). There is no proof yet that
> we have reached the end of our innovation potential on this.
> 
> A sketch idea for a more robust re-assembly implementation:
> On receipt of each fragment, within the re-assembly implementation
> increase the precision of the ID field by adding a "received
> timestamp" of sufficient precision. Then on a first pass, match
> fragments only if the fragment IDs match AND the timestamps are
> within a certain narrow range of each other. Otherwise hold the
> fragment and, as a last resort later, widen the timestamp range that
> will cause a match - perhaps when the fragment is about to be
> expired from the buffer (...rest of implementation left as an
> exercise for the reader).
> 
> In summary, a 16b fragment ID field should be innocent until proven
> guilty. As long as the culprit might be /implementations/, the title
> shouldn't presume the IPv4 fragmentation /protocol/ is guilty.
> 
> 
> 3/ The draft should say something about how the problem gets worse
> if the sender uses a pseudo-random number generator for the IPid
> field (as recent versions of OpenBSD and some versions of FreeBSD
> do). Then there is no longer a deterministic wrapping problem, but
> there is /always/ some small probability of a clash within the max
> packet lifetime. A good ref for this is:
> 
> S. Bellovin, ``A Technique for Counting NATted Hosts,''
> Proceedings of the Second Internet Measurement Workshop,
> November 2002. http://www.cs.columbia.edu/~smb/papers/fnat.pdf
> 
> 
> 
> Bob
> 
> At 07:06 06/12/2006, Lars Eggert wrote:
> >Hi,
> >
> >could the people who had commented on the -02 revision during IETF LC
> >please take a look whether the latest revision addresses their issues?
> >
> >Begin forwarded message:
> >>A New Internet-Draft is available from the on-line Internet-Drafts
> >>directories.
> >>
> >>         Title           : IPv4 Fragmentation Considered Very Harmful
> >>         Author(s)       : J. Heffner, et al.
> >>         Filename        : draft-heffner-frag-harmful-03.txt
> >>         Pages           : 9
> >>         Date            : 2006-12-5
> >
> >Thanks,
> >Lars
> >
> >
> >
> 
> ____________________________________________________________________
> Notice: This contribution is the personal view of the author and
> does not necessarily reflect the technical nor commercial direction
> of BT plc.
> ____________________________________________________________________
> Bob Briscoe,                    Networks Research Centre, BT Research
> B54/77 Adastral Park,Martlesham Heath,Ipswich,IP5 3RE,UK.
> +44 1473 645196
> 
> 
> 
>
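
The timestamp-assisted re-assembly idea sketched in the quoted message
could look something like the following. This is a minimal sketch
under stated assumptions, not anyone's actual implementation: the
class names, the 50 ms window, and the byte-offset fragment model are
all hypothetical, and the "last resort" step of widening the window
before a fragment expires is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Partial:
    """One datagram under re-assembly (hypothetical helper type)."""
    last_rx: float
    pieces: Dict[int, bytes] = field(default_factory=dict)  # offset -> payload
    total_len: Optional[int] = None  # known once the final fragment arrives

    def add(self, offset: int, payload: bytes, more: bool, rx_time: float) -> None:
        self.pieces[offset] = payload
        self.last_rx = rx_time
        if not more:
            self.total_len = offset + len(payload)

    def assemble(self) -> Optional[bytes]:
        """Return the datagram if every byte is present, else None."""
        if self.total_len is None:
            return None
        data, pos = b"", 0
        for offset in sorted(self.pieces):
            if offset != pos:
                return None  # gap (or overlap) in the byte range
            data += self.pieces[offset]
            pos += len(self.pieces[offset])
        return data if pos == self.total_len else None


class Reassembler:
    """First-pass matching only: same ID AND arrival times close together."""

    def __init__(self, window: float = 0.05):
        self.window = window  # assumed "narrow range", in seconds
        self.partials: Dict[int, List[Partial]] = {}

    def receive(self, frag_id: int, offset: int, payload: bytes,
                more: bool, rx_time: float) -> Optional[bytes]:
        candidates = self.partials.setdefault(frag_id, [])
        for partial in candidates:
            close = abs(rx_time - partial.last_rx) <= self.window
            if close and offset not in partial.pieces:
                break
        else:
            # No partial is close enough in time: treat the ID as reused.
            partial = Partial(last_rx=rx_time)
            candidates.append(partial)
        partial.add(offset, payload, more, rx_time)
        done = partial.assemble()
        if done is not None:
            candidates.remove(partial)
        return done
```

In this sketch, a stale first fragment with ID 7 received at t=0.0 is
not combined with fragments of a later packet that reuses ID 7 at
t=1.0, because their receive times are more than the window apart.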