Re: [Int-area] Discussion about Section 6.1 in draft-ietf-intarea-frag-fragile

"Templin (US), Fred L" <Fred.L.Templin@boeing.com> Wed, 11 September 2019 14:48 UTC

From: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
To: Geoff Huston <gih@apnic.net>
CC: Fernando Gont <fgont@si6networks.com>, Joe Touch <touch@strayalpha.com>, Bob Hinden <bob.hinden@gmail.com>, "draft-ietf-intarea-frag-fragile@ietf.org" <draft-ietf-intarea-frag-fragile@ietf.org>, "int-area@ietf.org" <int-area@ietf.org>, IESG <iesg@ietf.org>, Suresh Krishnan <suresh@kaloom.com>
Date: Wed, 11 Sep 2019 14:48:08 +0000
Message-ID: <2f6ad3ad143d44588059f083a9e1835c@boeing.com>
References: <efabc7c9f72c4cd9a31f56de24669640@boeing.com> <2EB90A57-9BBD-417C-AEDB-AFBFBB906956@gmail.com> <CAHw9_iKozCAC+8TGS0fSxVZ_3pJW7rnhoKy=Y3AxLqWEXvemcA@mail.gmail.com> <4C8FE1C4-0054-4DA1-BC6E-EBBE78695F1B@gmail.com> <BYAPR05MB5463F112A3FFA8CE6378F3D3AEBB0@BYAPR05MB5463.namprd05.prod.outlook.com> <ab0d5600-d71c-9f0b-2955-64074e040bc6@strayalpha.com> <E770BEF0-D901-4CD0-96E6-C626B560DCD6@gmail.com> <163CD364-2975-467A-8925-F114FFD9C422@employees.org> <E00B6159-2771-42D8-B5E8-7750E0B828DE@strayalpha.com> <3764D860-BC6F-441A-86EF-59E1742D7654@employees.org> <939AFA6F-4C75-4532-82DE-77D14ABC41ED@strayalpha.com> <5C51DCDC-4031-47D9-A28E-812D0E66EE35@employees.org> <5DAA16CC-791E-4042-95F6-65DA58D23EB8@gmail.com> <EA3B45A1-FFD2-49A5-B577-602065632F41@strayalpha.com> <5d22dd34-3972-060e-ddc1-b7f27a110a69@si6networks.com> <14f06217149d40ba8a41865ebb08ee08@boeing.com> <91894E0E-09D3-42E4-B6C4-88AE4493D796@apnic.net>
In-Reply-To: <91894E0E-09D3-42E4-B6C4-88AE4493D796@apnic.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/F0TcDdXkNZgxkgAwU1yqocdmkiA>

Geoff, the 1280 MTU came from Steve Deering's November 13, 1997 proposal to
the ipngwg. The exact message from the ipng archives is reproduced below.

1280 isn't just a recommendation - it's *the law*. Any link that cannot do 1280
(tunnels included) is not an IPv6 link.

Fred

---
From owner-ipng@sunroof.eng.sun.com  Thu Nov 13 16:41:01 1997
Date: Thu, 13 Nov 1997 16:37:00 -0800
To: IPng Working Group <ipng@sunroof.eng.sun.com>
From: Steve Deering <deering@cisco.com>
Subject: (IPng 4802) increasing the IPv6 minimum MTU
Cc: hinden@ipsilon.com

In the ipngwg meeting in Munich, I proposed increasing the IPv6 minimum MTU
from 576 bytes to something closer to the Ethernet MTU of 1500 bytes, (i.e.,
1500 minus room for a couple layers of encapsulating headers, so that min-
MTU-size packets that are tunneled across 1500-byte-MTU paths won't be
subject to fragmentation/reassembly on ingress/egress from the tunnels,
in most cases).

After the short discussion in the Munich meeting, I called for a show of
hands, and of those who raised their hands (about half the attendees, if
I recall correctly), the vast majority were in favor of this change --
there were only two or three people opposed.  However, we recognized that
a fundamental change of this nature requires thoughtful discussion and
analysis on the mailing list, to allow those who were not at the meeting
and those who were there but who have since had second thoughts, to express
their opinions.  A couple of people have already, in private conversation,
raised some concerns that were not identified in the discussion at the
meeting, which I report below.  We would like to get this issue settled as
soon as possible, since this is the only thing holding up the publication
of the updated Proposed Standard IPv6 spec (the version we expect to advance
to Draft Standard), so let's see if we can come to a decision before the ID
deadline at the end of next week (hoping there isn't any conflict between
"thoughtful analysis" and "let's decide quickly" :-).

The reason I would like to increase the minimum MTU is that there are some
applications for which Path MTU Discovery just won't work very well, and
which will therefore limit themselves to sending packets no larger than
the minimum MTU.  Increasing the minimum MTU would improve the bandwidth
efficiency, i.e., reduce the header overhead (ratio of header bytes to
payload bytes), for those applications.  Some examples of such applications
are:

    (1) Large-fanout, high-volume multicast apps, such as multicast video
	("Internet TV"), multicast netnews, and multicast software
	distribution.  I believe these applications will end up limiting
	themselves to packets no larger than the min MTU in order to avoid
	the danger of incurring an "implosion" of ICMP Packet-Too-Big
	messages in response.  Even though we have specified that router
	implementations must carefully rate-limit the emission of ICMP
	error messages, I am nervous about how well this will work in
	practice, especially once there is a lot of high-speed, bulk
	multicasting happening.  An appropriate choice of rate or
	probability of emission of Packet-Too-Big responses to multicasts
	really depends on the fan-out of the multicast trees and the MTUs of
	all the branches in that tree, which is unknown and unknowable to
	the routers.  Being sensibly conservative by choosing a very low
	rate could, in many cases, significantly increase the delay before
	the multicast source learns the right MTU for the tree and, hence,
	before receivers on smaller-MTU branches can start receiving the
	data.

    (2) DNS servers, or other similar apps that have the requirement of
	sending a small amount of data (a few packets at most) to a very
	large and transient set of clients.  Such servers often reside on
	links, such as Ethernet, that have an MTU bigger than the links on
	which many of their clients may reside, such as dial-up links.  If
	those servers were to send many reply messages of the size of their
	own links (as required by PMTU Discovery), they could incur very
	many ICMP packet-too-big messages and consequent retransmissions of
	the replies -- in the worst case, multiplying the total bandwidth
	consumption (and delivery delay) by 2 or 3 times that of the
	alternative approach of just using the min MTU always.  Furthermore,
	the use of PMTU Discovery could result in such servers filling up
	lots of memory with cached PMTU information that will never be
	used again (at least, not before it gets garbage-collected).

The number I propose for the new minimum MTU is 1280 bytes (1024 + 256,
as compared to the classic 576 value which is 512 + 64).  That would
leave generous room for encapsulating/tunnel headers within the Ethernet
MTU of 1500, e.g., enough for two layers of secure tunneling including
both ESP and AUTH headers.
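
The headroom arithmetic behind that claim can be sketched as follows.  This
is only an illustration: the function names are made up for this example,
and only the fixed 40-byte IPv6 header is modelled, because the sizes of ESP
and AUTH headers depend on the algorithms in use.

```python
ETHERNET_MTU = 1500      # bytes, from the text
PROPOSED_MIN_MTU = 1280  # bytes, the proposed IPv6 minimum MTU
IPV6_HEADER = 40         # bytes, fixed IPv6 header (no extension headers)


def tunnel_headroom(link_mtu=ETHERNET_MTU, min_mtu=PROPOSED_MIN_MTU):
    """Bytes left for encapsulating headers around a min-MTU packet."""
    return link_mtu - min_mtu


def max_plain_ipv6_layers(headroom):
    """How many bare IPv6 headers (hypothetical nested tunnels) fit."""
    return headroom // IPV6_HEADER


if __name__ == "__main__":
    room = tunnel_headroom()
    print(room, max_plain_ipv6_layers(room))
```

Any security headers would have to fit in whatever is left of that headroom
after the encapsulating IPv6 headers are accounted for.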

For medium-to-high speed links, this change would reduce the IPv6 header
overhead for min MTU packets from 7% to 3% (a little less than the IPv4
header overhead for 576-byte IPv4 packets).  For low-speed links such as
analog dial-up or low-speed wireless, I assume that header compression will
be employed, which compresses out the IPv6 header completely, so the IPv6
header overhead on such links is effectively zero in any case.
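
As a rough check of those percentages, here is a small sketch that computes
header overhead as the ratio of header bytes to payload bytes, the
definition used earlier in this message.  It assumes a 40-byte IPv6 header
and a 20-byte IPv4 header without options; the function name is
illustrative only.

```python
IPV6_HEADER = 40  # bytes, fixed IPv6 header
IPV4_HEADER = 20  # bytes, IPv4 header without options


def header_overhead(header_bytes, packet_bytes):
    """Ratio of header bytes to payload bytes for one packet."""
    payload = packet_bytes - header_bytes
    return header_bytes / payload


if __name__ == "__main__":
    cases = [
        ("IPv6, 576-byte packets", IPV6_HEADER, 576),
        ("IPv6, 1280-byte packets", IPV6_HEADER, 1280),
        ("IPv4, 576-byte packets", IPV4_HEADER, 576),
    ]
    for label, header, size in cases:
        print(f"{label}: {header_overhead(header, size):.1%}")
```

Rounded to whole percentages, this is consistent with the 7% and 3% figures
above, and with the IPv6 figure at 1280 bytes falling slightly below the
IPv4 figure at 576 bytes.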

Here is a list of *disadvantages* to increasing the IPv6 minimum MTU that
have been raised, either publicly or privately:

    (1) This change would require the specification of link-specific
	fragmentation and reassembly protocols for those link-layers
	that can support 576-byte packets but not 1280-byte packets,
	e.g., AppleTalk.  I think such a protocol could be very simple,
	and I briefly sketch such a protocol in Appendix I of this
	message, as an example.

	Often, those links that have a small native MTU are also the ones
	that have low bandwidth.  On low-bandwidth links, it is often
	desirable to locally fragment and reassemble IPv6 packets anyway
	(even 576-byte ones) in order to avoid having small, interactive
	packets (e.g., keystrokes, character echoes, or voice samples)
	be delayed excessively behind bigger packets (e.g., file transfers);
	the small packets can be interleaved with the fragments of the
	big packets.  Someone mentioned in the meeting in Munich that the
	ISSLL WG was working on a PPP-specific fragmentation and
	reassembly protocol for precisely this reason, so maybe the job
	of specifying such a protocol is already being taken care of.

    (2) Someone raised the concern that, if we make the minimum MTU close
	to Ethernet size, implementors might never bother to implement PMTU
	Discovery.  That would be regrettable, especially if the Internet
	evolves to much more widespread use of links with MTUs bigger
	than Ethernet's, since IPv6 would then fail to take advantage of
	the bandwidth efficiencies possible on larger MTU paths.

    (3) Peter Curran pointed out to me that using a larger minimum MTU for
	IPv6 may result in much greater reliance on *IPv4* fragmentation and
	reassembly during the transition phase while much of the IPv6
	traffic is being tunneled over IPv4.  This could incur unfortunate
	performance penalties for tunneled IPv6 traffic (disastrous
	penalties if there is non-negligible loss of IPv4 fragments).
	I have included Peter's message, describing his concern in more
	detail, in Appendix II of this message.

    (4) Someone expressed the opinion that the requirement for link-layer
	fragmentation and reassembly of IPv6 over low-cost, low-MTU links
	like Firewire, would doom the potential use of IPv6 in cheap
	consumer devices in which minimizing code size is important --
	implementors of cheap Firewire devices would choose IPv4 instead,
	since it would not need a fragmenting "shim" layer.  This may well
	be true, though I suspect the code required for local frag/reasm
	would be negligible compared to the code required for Neighbor
	Discovery.

Personally, I am not convinced by the above concerns that increasing the
minimum MTU would be a mistake, but I'd like to hear what the rest of the
WG thinks.  Are there other problems that anyone can think of?  As I
mentioned earlier, the clear consensus of the Munich attendees was to
increase the minimum MTU, so we need to find out if these newly-identified
problems are enough to swing the consensus in the other direction.  Your
feedback is heartily requested.

Steve

----------

Appendix I

Here is a sketch of a fragmentation and reassembly protocol (call it FRP)
to be employed between the IP layer and the link layer of a link with native
(or configured) MTU less than 1280 bytes.

Identify a Block Size, B, which is the lesser of (a) the native MTU of the
link or (b) a value related to the bandwidth of the link, chosen to bound
the latency that one block can impose on a subsequent block.  For example,
to stay within a latency of 200 ms on a 9600 bps link, choose a block size
of .2 * 9600 = 1920 bits = 240 bytes.
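
The block-size rule can be sketched like this.  It is a minimal
illustration, not part of the proposal: the function names are invented
here, the latency is passed in whole milliseconds to keep the arithmetic in
integers, and 8 bits per byte on the wire is assumed (no start/stop bits).

```python
def latency_bound_bytes(link_bps, max_latency_ms):
    """Largest block, in bytes, that one block time of max_latency_ms allows."""
    bits = link_bps * max_latency_ms // 1000
    return bits // 8  # assumption: 8 bits per byte on the wire


def block_size(native_mtu, link_bps, max_latency_ms):
    """Block Size B: the lesser of the native MTU and the latency bound."""
    return min(native_mtu, latency_bound_bytes(link_bps, max_latency_ms))


if __name__ == "__main__":
    # 9600 bps link, 200 ms latency budget, native MTU larger than the bound.
    print(block_size(1500, 9600, 200))
```

On a faster link the latency bound quickly exceeds the native MTU, and B is
then simply the native MTU.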

IPv6 packets of length <= B are transmitted directly on the link.
IPv6 packets of length > B are fragmented into blocks of size B
(the last block possibly being shorter than B), and those fragments
are transmitted on the link with an FRP header containing the following
fields:

	[packet ID, block number, end flag]

where:

	packet ID is the same for all fragments of the same packet,
	and is incremented for each new fragmented packet.  The size of
	the packet ID field limits how many packets can be in flight or
	interleaved on the link at any one time.

	block number identifies the blocks within a packet, starting at
	block zero.  The block number field must be large enough to
	identify 1280/B blocks.

	end flag is a one-bit flag which is used to mark the last block
	of a packet.

For example, on a 9600 bps serial link, one might use a block size of
240 bytes and an 8-bit FRP header of the following format:

	4-bit packet ID, which allows interleaving of up to 16 packets.
	3-bit block number, to identify blocks numbered 0 through 5.
	1-bit end flag.

On a 256 kbps AppleTalk link, one might use the AppleTalk-imposed block
size of ~580 bytes and an 8-bit FRP header of the following format:

	5-bit packet ID, which allows for up to 32 fragmented packets in
		   flight from each source across the AppleTalk internet.
	2-bit block number, to identify blocks numbered 0 through 2.
	1-bit end flag.
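
The two header layouts above, and the fragmentation step, might look
something like the following sketch.  Several things here are assumptions
made only for illustration: the bit order (packet ID in the high bits, end
flag in the lowest bit), the names FrpFormat, block_bits_needed and
fragment, and returning the header as an integer next to each block rather
than as encoded bytes.

```python
from dataclasses import dataclass

MIN_MTU = 1280  # largest IPv6 packet the link must be able to carry, bytes


def block_bits_needed(block_size, max_packet=MIN_MTU):
    """Bits needed to number every block of a max_packet-byte packet."""
    blocks = -(-max_packet // block_size)  # ceiling division
    return max(1, (blocks - 1).bit_length())


@dataclass(frozen=True)
class FrpFormat:
    id_bits: int      # width of the packet ID field
    block_bits: int   # width of the block number field

    def pack(self, packet_id, block_no, end):
        """Build the header value: [packet ID | block number | end flag]."""
        if not 0 <= packet_id < (1 << self.id_bits):
            raise ValueError("packet ID does not fit in the header")
        if not 0 <= block_no < (1 << self.block_bits):
            raise ValueError("block number does not fit in the header")
        return (packet_id << (self.block_bits + 1)) | (block_no << 1) | int(end)

    def unpack(self, header):
        """Recover (packet ID, block number, end flag) from a header value."""
        end = bool(header & 1)
        block_no = (header >> 1) & ((1 << self.block_bits) - 1)
        packet_id = header >> (self.block_bits + 1)
        return packet_id, block_no, end


def fragment(packet, packet_id, block_size, fmt):
    """Split an oversize packet into (header, block) pairs."""
    if len(packet) <= block_size:
        raise ValueError("packets of length <= B are sent directly")
    chunks = [packet[i:i + block_size]
              for i in range(0, len(packet), block_size)]
    last = len(chunks) - 1
    return [(fmt.pack(packet_id, n, n == last), chunk)
            for n, chunk in enumerate(chunks)]


if __name__ == "__main__":
    serial = FrpFormat(id_bits=4, block_bits=block_bits_needed(240))
    for header, block in fragment(bytes(1280), 7, 240, serial):
        print(serial.unpack(header), len(block))
```

The sketch does not address how a receiver tells an unfragmented packet
from a fragment; that would need some link-level demultiplexing that is not
described here.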

On a multi-access link, like AppleTalk, the receiver uses the link-level
source address as well as the packet ID to identify blocks belonging to
the same packet.

If a receiver fails to receive all of the blocks of a packet by the time
the packet ID wraps around, it discards the incompletely-reassembled
packet.  Taking this approach, no timers should be needed at the receiver
to detect fragment loss.  We expect the transport layer (e.g., TCP) checksum
at the final IPv6 destination to detect mis-assembly that might be caused by
extreme misordering/delay during transit across the link.
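
A timer-free receiver along those lines could be sketched as below.  This
is one possible reading, not a specification: the class and method names
are invented, and reuse of a packet ID is detected here by the arrival of a
block number that the pending packet already holds, which is only one of
several ways to notice wrap-around.

```python
class FrpReassembler:
    """Reassembles blocks per (link-level source, packet ID), without timers."""

    def __init__(self):
        # (source, packet_id) -> {"blocks": {block_no: bytes}, "last": int or None}
        self._partial = {}

    def receive(self, source, packet_id, block_no, end, payload):
        """Accept one block; return the whole packet once complete, else None."""
        key = (source, packet_id)
        state = self._partial.get(key)
        if state is not None and block_no in state["blocks"]:
            # The ID has wrapped and been reused: drop the stale partial packet.
            state = None
        if state is None:
            state = {"blocks": {}, "last": None}
            self._partial[key] = state
        state["blocks"][block_no] = payload
        if end:
            state["last"] = block_no
        last = state["last"]
        if last is None or any(n not in state["blocks"] for n in range(last + 1)):
            return None  # still waiting for blocks
        del self._partial[key]
        return b"".join(state["blocks"][n] for n in range(last + 1))


if __name__ == "__main__":
    rx = FrpReassembler()
    print(rx.receive("node-a", 3, 1, True, b"world"))
    print(rx.receive("node-a", 3, 0, False, b"hello "))
```

As noted above, a sketch like this can still mis-assemble under extreme
reordering, and relies on the transport checksum to catch that.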

On links on which IPv6 header compression is being used, compression is
performed before fragmentation, and reassembly is done before decompression.

----------

Appendix II

From: Peter Curran <peter@gate.ticl.co.uk>
Subject: Re: IPv6 MTU issue
To: deering@cisco.com (Steve Deering)
Date: Mon, 22 Sep 1997 11:50:34 +0100 (BST)

Steve

My problem was that moving the MTU close to 1500 would have an adverse
effect on the transition strategy.  The current strategy assumes that the
typical Internet MTU is >576, and that sending an IPv6 packet close to the
minimum MTU will not require any IPv4 fragmentation to support the tunnel
transparently.  The PMTU discovery mechanism will 'tune' IPv6 to use a
suitable MTU.

If the IPv4 MTU is <= 576 then IPv4 fragmentation will be required to
provide a tunnel with a minimum MTU of 576 for IPv6.  This clearly places
a significant strain on the tunnelling nodes - as these will normally be
routers then there will be a demand for memory (for reassembly buffers)
as well as CPU (for the frag/reassembly process) that will have an overall
impact on performance.

This is an acceptable risk, as Internet MTUs of <= 576 are not too common.

However, if the minimum MTU of IPv6 is increased to something of the order
of 1200-1500 octets then the likelihood of finding an IPv4 path with an
MTU lower than this value increases (I think significantly) and this will
have a performance impact on these devices.
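
To make the concern concrete, here is a small sketch of how many IPv4
fragments a tunneled IPv6 packet would need.  It assumes plain IPv6-in-IPv4
encapsulation with a 20-byte IPv4 header and no options, and uses the IPv4
rule that every fragment except the last carries a multiple of 8 payload
bytes.  The function names are illustrative.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options


def tunnel_packet_size(ipv6_packet_bytes):
    """Size of the IPv4 packet that encapsulates one IPv6 packet."""
    return ipv6_packet_bytes + IPV4_HEADER


def ipv4_fragments_needed(ipv6_packet_bytes, ipv4_path_mtu):
    """Number of IPv4 packets needed to carry one tunneled IPv6 packet."""
    if tunnel_packet_size(ipv6_packet_bytes) <= ipv4_path_mtu:
        return 1
    # Payload per fragment, rounded down to a multiple of 8 bytes.
    per_fragment = (ipv4_path_mtu - IPV4_HEADER) // 8 * 8
    return -(-ipv6_packet_bytes // per_fragment)  # ceiling division


if __name__ == "__main__":
    for ipv6_size in (576, 1280):
        for path_mtu in (576, 1500):
            print(ipv6_size, path_mtu, ipv4_fragments_needed(ipv6_size, path_mtu))
```

With these assumptions, any IPv4 path MTU below 1300 bytes forces
fragmentation of a 1280-byte IPv6 packet, whereas a 576-byte IPv6 packet
needs only a 596-byte path.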

During the brief discussion of this matter in the IPNG session at Munich
you stated that MTUs less than 1500 were rare.  I don't agree with this
completely - it seems to be pretty common practice for smaller 2nd and 3rd
tier ISPs in the UK to use an MTU of 576 for connection to their transit
provider.  Their objective, I believe, is to 'normalize' the packet sizes
on relatively low bandwidth circuits (typically <1Mbps) to provide better
performance for interactive sessions compared to bulk-file transfer users.

I think that before we go ahead and make a decision on an increased minimum
MTU for IPv6 then we should discuss the issues a little more.

Incidentally, I am not convinced of the benefits of doing this anyway
(ignoring the issue raised above).  With a properly setup stack the PMTU
discovery mechanism seems to be able to select a good MTU for use on the
path - at least that is my experience on our test network and the 6Bone.

I appreciate that you are trying to address the issues of PMTU for
multicasting but I don't see how raising the minimum MTU is going to help
much.  PMTU discovery will still be required irrespective of the minimum
MTU adopted, unless we adopt a value that can be used on all link-layer
technologies.

I would welcome wider discussion of these issues before pressing ahead
with a change.

Best regards

Peter Curran
TICL

