Re: [Int-area] Discussion about Section 6.1 in draft-ietf-intarea-frag-fragile

Brian E Carpenter <> Wed, 11 September 2019 23:14 UTC

To: Bob Hinden <>, "Templin (US), Fred L" <>
Cc: "" <>, Suresh Krishnan <>

On 12-Sep-19 10:59, Bob Hinden wrote:
> Fred,
>> On Sep 11, 2019, at 7:48 AM, Templin (US), Fred L <> wrote:
>> Geoff, the 1280 MTU came from Steve Deering's November 13, 1997 proposal to
>> the ipngwg. The exact message from the ipng archives is reproduced below.
>> 1280 isn't just a recommendation - it's *the law*. Any link that cannot do 1280
>> (tunnels included) is not an IPv6 link.
> Yes, from IPv6’s view, but you can make a link that can’t do 1280 work if it has its own local L2 fragmentation / reassembly, as noted in Steve’s email.  ATM with its 53-byte cells comes to mind.

IPv4 with a small PMTU also comes to mind, as discussed in Section 3.2.2 of RFC 4213:

   In this case, the IPv6 layer has to "see" a link
   layer with an MTU of 1280 bytes and the encapsulator has to use IPv4
   fragmentation in order to forward the 1280 byte IPv6 packets.
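
A minimal sketch of that rule, for illustration only. The function name is made up, and it assumes a 20-byte encapsulating IPv4 header without options:

```python
IPV6_MIN_MTU = 1280    # minimum link MTU that IPv6 requires
IPV4_HEADER_LEN = 20   # assumption: encapsulating IPv4 header carries no options


def tunnel_view(ipv4_path_mtu):
    """Return (MTU the IPv6 layer sees, whether IPv4 fragmentation is needed).

    The second value says whether an IPv6 packet of exactly that MTU
    must be fragmented at the IPv4 layer by the encapsulator.
    """
    ipv6_mtu = ipv4_path_mtu - IPV4_HEADER_LEN
    if ipv6_mtu < IPV6_MIN_MTU:
        # The IPv4 path is too small: still present 1280 to IPv6 and
        # let the encapsulator fragment the resulting IPv4 packet.
        return IPV6_MIN_MTU, True
    return ipv6_mtu, False


for pmtu in (1500, 1300, 576):
    print(pmtu, tunnel_view(pmtu))
```

With a 576-byte IPv4 path, each 1280-byte IPv6 packet becomes a 1300-byte IPv4 packet that has to be fragmented on ingress and reassembled on egress.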


> Bob
>> Fred
>> ---
>> Date: Thu, 13 Nov 1997 16:37:00 -0800
>> To: IPng Working Group <>
>> From: Steve Deering <>
>> Subject: (IPng 4802) increasing the IPv6 minimum MTU
>> In the ipngwg meeting in Munich, I proposed increasing the IPv6 minimum MTU
>> from 576 bytes to something closer to the Ethernet MTU of 1500 bytes, (i.e.,
>> 1500 minus room for a couple layers of encapsulating headers, so that min-
>> MTU-size packets that are tunneled across 1500-byte-MTU paths won't be
>> subject to fragmentation/reassembly on ingress/egress from the tunnels,
>> in most cases).
>> After the short discussion in the Munich meeting, I called for a show of
>> hands, and of those who raised their hands (about half the attendees, if
>> I recall correctly), the vast majority were in favor of this change --
>> there were only two or three people opposed.  However, we recognized that
>> a fundamental change of this nature requires thoughtful discussion and
>> analysis on the mailing list, to allow those who were not at the meeting
>> and those who were there but who have since had second thoughts, to express
>> their opinions.  A couple of people have already, in private conversation,
>> raised some concerns that were not identified in the discussion at the
>> meeting, which I report below.  We would like to get this issue settled as
>> soon as possible, since this is the only thing holding up the publication
>> of the updated Proposed Standard IPv6 spec (the version we expect to advance
>> to Draft Standard), so let's see if we can come to a decision before the ID
>> deadline at the end of next week (hoping there isn't any conflict between
>> "thoughtful analysis" and "let's decide quickly" :-).
>> The reason I would like to increase the minimum MTU is that there are some
>> applications for which Path MTU Discovery just won't work very well, and
>> which will therefore limit themselves to sending packets no larger than
>> the minimum MTU.  Increasing the minimum MTU would improve the bandwidth
>> efficiency, i.e., reduce the header overhead (ratio of header bytes to
>> payload bytes), for those applications.  Some examples of such applications
>> are:
>>    (1) Large-fanout, high-volume multicast apps, such as multicast video
>> 	("Internet TV"), multicast netnews, and multicast software
>> 	distribution.  I believe these applications will end up limiting
>> 	themselves to packets no larger than the min MTU in order to avoid
>> 	the danger of incurring an "implosion" of ICMP Packet-Too-Big
>> 	messages in response.  Even though we have specified that router
>> 	implementations must carefully rate-limit the emission of ICMP
>> 	error messages, I am nervous about how well this will work in
>> 	practice, especially once there is a lot of high-speed, bulk
>> 	multicasting happening.  An appropriate choice of rate or
>> 	probability of emission of Packet-Too-Big responses to multicasts
>> 	really depends on the fan-out of the multicast trees and the MTUs of
>> 	all the branches in that tree, which is unknown and unknowable to
>> 	the routers.  Being sensibly conservative by choosing a very low
>> 	rate could, in many cases, significantly increase the delay before
>> 	the multicast source learns the right MTU for the tree and, hence,
>> 	before receivers on smaller-MTU branches can start receiving the
>> 	data.
>>    (2) DNS servers, or other similar apps that have the requirement of
>> 	sending a small amount of data (a few packets at most) to a very
>> 	large and transient set of clients.  Such servers often reside on
>> 	links, such as Ethernet, that have an MTU bigger than the links on
>> 	which many of their clients may reside, such as dial-up links.  If
>> 	those servers were to send many reply messages of the size of their
>> 	own links (as required by PMTU Discovery), they could incur very
>> 	many ICMP packet-too-big messages and consequent retransmissions of
>> 	the replies -- in the worst case, multiplying the total bandwidth
>> 	consumption (and delivery delay) by 2 or 3 times that of the
>> 	alternative approach of just using the min MTU always.  Furthermore,
>> 	the use of PMTU Discovery could result in such servers filling up
>> 	lots of memory with cached PMTU information that will never be
>> 	used again (at least, not before it gets garbage-collected).
>> The number I propose for the new minimum MTU is 1280 bytes (1024 + 256,
>> as compared to the classic 576 value which is 512 + 64).  That would
>> leave generous room for encapsulating/tunnel headers within the Ethernet
>> MTU of 1500, e.g., enough for two layers of secure tunneling including
>> both ESP and AUTH headers.
>> For medium-to-high speed links, this change would reduce the IPv6 header
>> overhead for min MTU packets from 7% to 3% (a little less than the IPv4
>> header overhead for 576-byte IPv4 packets).  For low-speed links such as
>> analog dial-up or low-speed wireless, I assume that header compression will
>> be employed, which compresses out the IPv6 header completely, so the IPv6
>> header overhead on such links is effectively zero in any case.
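
The percentages quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the fixed 40-byte IPv6 header, a 20-byte IPv4 header without options, and the "header bytes to payload bytes" definition of overhead given earlier in the message.

```python
IPV6_HEADER = 40  # fixed IPv6 header, bytes
IPV4_HEADER = 20  # assumption: IPv4 header without options, bytes


def overhead(header_len, packet_len):
    """Header bytes divided by payload bytes for a single packet."""
    return header_len / (packet_len - header_len)


cases = [
    ("IPv6, 576-byte packets", IPV6_HEADER, 576),
    ("IPv6, 1280-byte packets", IPV6_HEADER, 1280),
    ("IPv4, 576-byte packets", IPV4_HEADER, 576),
]
for label, header_len, packet_len in cases:
    print(f"{label}: {overhead(header_len, packet_len):.1%}")
```

Under those assumptions the IPv6 figure drops from roughly 7% to roughly 3%, and the 1280-byte IPv6 case comes out slightly below the 576-byte IPv4 case, as stated.
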
>> Here is a list of *disadvantages* to increasing the IPv6 minimum MTU that
>> have been raised, either publicly or privately:
>>    (1) This change would require the specification of link-specific
>> 	fragmentation and reassembly protocols for those link-layers
>> 	that can support 576-byte packets but not 1280-byte packets,
>> 	e.g., AppleTalk.  I think such a protocol could be very simple,
>> 	and I briefly sketch such a protocol in Appendix I of this
>> 	message, as an example.
>> 	Often, those links that have a small native MTU are also the ones
>> 	that have low bandwidth.  On low-bandwidth links, it is often
>> 	desirable to locally fragment and reassemble IPv6 packets anyway
>> 	(even 576-byte ones) in order to avoid having small, interactive
>> 	packets (e.g., keystrokes, character echoes, or voice samples)
>> 	be delayed excessively behind bigger packets (e.g., file transfers);
>> 	the small packets can be interleaved with the fragments of the
>> 	big packets.  Someone mentioned in the meeting in Munich that the
>> 	ISSLL WG was working on a PPP-specific fragmentation and
>> 	reassembly protocol for precisely this reason, so maybe the job
>> 	of specifying such a protocol is already being taken care of.
>>    (2) Someone raised the concern that, if we make the minimum MTU close
>> 	to Ethernet size, implementors might never bother to implement PMTU
>> 	Discovery.  That would be regrettable, especially if the Internet
>> 	evolves to much more widespread use of links with MTUs bigger
>> 	than Ethernet's, since IPv6 would then fail to take advantage of
>> 	the bandwidth efficiencies possible on larger MTU paths.
>>    (3) Peter Curran pointed out to me that using a larger minimum MTU for
>> 	IPv6 may result in much greater reliance on *IPv4* fragmentation and
>> 	reassembly during the transition phase while much of the IPv6
>> 	traffic is being tunneled over IPv4.  This could incur unfortunate
>> 	performance penalties for tunneled IPv6 traffic (disastrous
>> 	penalties if there is non-negligible loss of IPv4 fragments).
>> 	I have included Peter's message, describing his concern in more
>> 	detail, in Appendix II of this message.
>>    (4) Someone expressed the opinion that the requirement for link-layer
>> 	fragmentation and reassembly of IPv6 over low-cost, low-MTU links
>> 	like Firewire, would doom the potential use of IPv6 in cheap
>> 	consumer devices in which minimizing code size is important --
>> 	implementors of cheap Firewire devices would choose IPv4 instead,
>> 	since it would not need a fragmenting "shim" layer.  This may well
>> 	be true, though I suspect the code required for local frag/reasm
>> 	would be negligible compared to the code required for Neighbor
>> 	Discovery.
>> Personally, I am not convinced by the above concerns that increasing the
>> minimum MTU would be a mistake, but I'd like to hear what the rest of the
>> WG thinks.  Are there other problems that anyone can think of?  As I
>> mentioned earlier, the clear consensus of the Munich attendees was to
>> increase the minimum MTU, so we need to find out if these newly-identified
>> problems are enough to swing the consensus in the other direction.  Your
>> feedback is heartily requested.
>> Steve
>> ----------
>> Appendix I
>> Here is a sketch of a fragmentation and reassembly protocol (call it FRP)
>> to be employed between the IP layer and the link layer of a link with native
>> (or configured) MTU less than 1280 bytes.
>> Identify a Block Size, B, which is the lesser of (a) the native MTU of the
>> link or (b) a value related to the bandwidth of the link, chosen to bound
>> the latency that one block can impose on a subsequent block.  For example,
>> to stay within a latency of 200 ms on a 9600 bps link, choose a block size
>> of 0.2 * 9600 = 1920 bits = 240 bytes.
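
The block-size rule can be sketched as follows. The function and parameter names are hypothetical, and the calculation ignores any per-byte framing overhead on the link (start/stop bits and the like):

```python
def block_size(native_mtu, link_bps, max_latency_ms):
    """Pick B as the lesser of the native MTU and the latency bound.

    The latency bound is the number of whole bytes the link can send
    within max_latency_ms. Integer arithmetic avoids rounding surprises.
    """
    latency_bound_bytes = link_bps * max_latency_ms // 1000 // 8
    return min(native_mtu, latency_bound_bytes)


# 9600 bps serial link with a 200 ms bound: the latency bound wins.
print(block_size(native_mtu=1500, link_bps=9600, max_latency_ms=200))

# Faster link with a small native MTU: the native MTU wins.
print(block_size(native_mtu=580, link_bps=256_000, max_latency_ms=200))
```
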
>> IPv6 packets of length <= B are transmitted directly on the link.
>> IPv6 packets of length > B are fragmented into blocks of size B
>> (the last block possibly being shorter than B), and those fragments
>> are transmitted on the link with an FRP header containing the following
>> fields:
>> 	[packet ID, block number, end flag]
>> where:
>> 	packet ID is the same for all fragments of the same packet,
>> 	and is incremented for each new fragmented packet.  The size of
>> 	the packet ID field limits how many packets can be in flight or
>> 	interleaved on the link at any one time.
>> 	block number identifies the blocks within a packet, starting at
>> 	block zero.  The block number field must be large enough to
>> 	identify 1280/B blocks.
>> 	end flag is a one-bit flag which is used to mark the last block
>> 	of a packet.
>> For example, on a 9600 bps serial link, one might use a block size of
>> 240 bytes and an 8-bit FRP header of the following format:
>> 	4-bit packet ID, which allows interleaving of up to 16 packets.
>> 	3-bit block number, to identify blocks numbered 0 through 5.
>> 	1-bit end flag.
>> On a 256 kbps AppleTalk link, one might use the AppleTalk-imposed block
>> size of ~580 bytes and an 8-bit FRP header of the following format:
>> 	5-bit packet ID, which allows for up to 32 fragmented packets in
>> 		   flight from each source across the AppleTalk internet.
>> 	2-bit block number, to identify blocks numbered 0 through 2.
>> 	1-bit end flag.
>> On a multi-access link, like AppleTalk, the receiver uses the link-level
>> source address as well as the packet ID to identify blocks belonging to
>> the same packet.
>> If a receiver fails to receive all of the blocks of a packet by the time
>> the packet ID wraps around, it discards the incompletely-reassembled
>> packet.  Taking this approach, no timers should be needed at the receiver
>> to detect fragment loss.  We expect the transport layer (e.g., TCP) checksum
>> at the final IPv6 destination to detect mis-assembly that might be caused by
>> extreme misordering/delay during transit across the link.
>> On links on which IPv6 header compression is being used, compression is
>> performed before fragmentation, and reassembly is done before decompression.
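
To make the FRP sketch concrete, here is one possible reading of it for the 9600 bps example (240-byte blocks, 8-bit header). Several details are assumptions that the message does not specify: the order of the fields within the header byte, the class and function names, and the use of a repeated block number as the signal that a packet ID has wrapped around. How a receiver tells unfragmented packets apart from fragments is also left open, so the reassembler here only accepts fragments.

```python
BLOCK_SIZE = 240   # bytes, from the 9600 bps example
MAX_PACKET_ID = 16  # 4-bit packet ID
MAX_BLOCKS = 8      # 3-bit block number


def pack_header(packet_id, block_num, end):
    # Assumed layout: [4-bit packet ID | 3-bit block number | 1-bit end flag]
    return (packet_id << 4) | (block_num << 1) | int(end)


def unpack_header(byte):
    return byte >> 4, (byte >> 1) & 0x7, bool(byte & 1)


def fragment(packet, packet_id):
    """Return the frames to send for one IPv6 packet."""
    if len(packet) <= BLOCK_SIZE:
        return [packet]  # small packets go on the link directly
    blocks = [packet[i:i + BLOCK_SIZE] for i in range(0, len(packet), BLOCK_SIZE)]
    if len(blocks) > MAX_BLOCKS:
        raise ValueError("packet needs more blocks than the header can number")
    last = len(blocks) - 1
    return [
        bytes([pack_header(packet_id % MAX_PACKET_ID, n, n == last)]) + block
        for n, block in enumerate(blocks)
    ]


class Reassembler:
    """Collects fragments per packet ID; uses no timers."""

    def __init__(self):
        self.partial = {}  # packet ID -> {"blocks": {num: bytes}, "last": int or None}

    def receive(self, frame):
        packet_id, block_num, end = unpack_header(frame[0])
        state = self.partial.get(packet_id)
        if state is None or block_num in state["blocks"]:
            # First fragment for this ID, or the ID has wrapped around onto
            # an incomplete packet: discard any stale blocks and start over.
            state = {"blocks": {}, "last": None}
            self.partial[packet_id] = state
        state["blocks"][block_num] = frame[1:]
        if end:
            state["last"] = block_num
        last = state["last"]
        if last is not None and all(n in state["blocks"] for n in range(last + 1)):
            del self.partial[packet_id]
            return b"".join(state["blocks"][n] for n in range(last + 1))
        return None  # packet not complete yet


packet = bytes(range(256)) * 5          # a 1280-byte packet
frames = fragment(packet, packet_id=3)  # six blocks, numbered 0 through 5
rx = Reassembler()
results = [rx.receive(f) for f in reversed(frames)]  # deliver out of order
print(len(frames), results[-1] == packet)
```

On a multi-access link the dictionary key would also need to include the link-level source address, as the message notes.
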
>> ----------
>> Appendix II
>> From: Peter Curran <>
>> Subject: Re: IPv6 MTU issue
>> To: (Steve Deering)
>> Date: Mon, 22 Sep 1997 11:50:34 +0100 (BST)
>> Steve
>> My problem was that moving the MTU close to 1500 would have an adverse
>> effect on the transition strategy.  The current strategy assumes that the
>> typical Internet MTU is >576, and that sending an IPv6 packet close to the
>> minimum MTU will not require any IPv4 fragmentation to support the tunnel
>> transparently.  The PMTU discovery mechanism will 'tune' IPv6 to use a
>> suitable MTU.
>> If the IPv4 MTU is <= 576 then IPv4 fragmentation will be required to
>> provide a tunnel with a minimum MTU of 576 for IPv6.  This clearly places
>> a significant strain on the tunnelling nodes - as these will normally be
>> routers then there will be a demand for memory (for reassembly buffers)
>> as well as CPU (for the frag/reassembly process) that will have an overall
>> impact on performance.
>> This is an acceptable risk, as Internet MTU's of <= 576 are not too common.
>> However, if the minimum MTU of IPv6 is increased to something of the order
>> of 1200-1500 octets then the likelihood of finding an IPv4 path with an
>> MTU lower than this value increases (I think significantly) and this will
>> have a performance impact on these devices.
>> During the brief discussion of this matter in the IPNG session at Munich
>> you stated that MTU's less than 1500 were rare.  I don't agree with this
>> completely - it seems to be pretty common practise for smaller 2nd and 3rd
>> tier ISP's in the UK to use an MTU of 576 for connection to their transit
>> provider.  Their objective, I believe, is to 'normalize' the packet sizes
>> on relatively low bandwidth circuits (typically <1Mbps) to provide better
>> performance for interactive sessions compared to bulk-file transfer users.
>> I think that before we go ahead and make a decision on an increased minimum
>> MTU for IPv6 then we should discuss the issues a little more.
>> Incidentally, I am not convinced of the benefits of doing this anyway
>> (ignoring the issue raised above).  With a properly setup stack the PMTU
>> discovery mechanism seems to be able to select a good MTU for use on the
>> path - at least that is my experience on our test network and the 6Bone.
>> I appreciate that you are trying to address the issues of PMTU for
>> multicasting but I don't see how raising the minimum MTU is going to help
>> much.  PMTU discovery will still be required irrespective of the minimum
>> MTU adopted, unless we adopt a value that can be used on all link-layer
>> technologies.
>> I would welcome wider discussion of these issues before pressing ahead
>> with a change.
>> Best regards
>> Peter Curran