Re: [Int-area] Discussion about Section 6.1 in draft-ietf-intarea-frag-fragile

Bob Hinden <bob.hinden@gmail.com> Wed, 11 September 2019 22:59 UTC

From: Bob Hinden <bob.hinden@gmail.com>
Message-Id: <C7AE8A6E-2451-4D08-9D77-6E69DECA4165@gmail.com>
Content-Type: multipart/signed; boundary="Apple-Mail=_36E85ED1-F93F-4BB4-803E-F6FD9624C36C"; protocol="application/pgp-signature"; micalg="pgp-sha512"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Wed, 11 Sep 2019 15:59:24 -0700
In-Reply-To: <2f6ad3ad143d44588059f083a9e1835c@boeing.com>
Cc: Bob Hinden <bob.hinden@gmail.com>, Geoff Huston <gih@apnic.net>, Joe Touch <touch@strayalpha.com>, "int-area@ietf.org" <int-area@ietf.org>, Suresh Krishnan <suresh@kaloom.com>
To: "Templin (US), Fred L" <Fred.L.Templin@boeing.com>
References: <efabc7c9f72c4cd9a31f56de24669640@boeing.com> <2EB90A57-9BBD-417C-AEDB-AFBFBB906956@gmail.com> <CAHw9_iKozCAC+8TGS0fSxVZ_3pJW7rnhoKy=Y3AxLqWEXvemcA@mail.gmail.com> <4C8FE1C4-0054-4DA1-BC6E-EBBE78695F1B@gmail.com> <BYAPR05MB5463F112A3FFA8CE6378F3D3AEBB0@BYAPR05MB5463.namprd05.prod.outlook.com> <ab0d5600-d71c-9f0b-2955-64074e040bc6@strayalpha.com> <E770BEF0-D901-4CD0-96E6-C626B560DCD6@gmail.com> <163CD364-2975-467A-8925-F114FFD9C422@employees.org> <E00B6159-2771-42D8-B5E8-7750E0B828DE@strayalpha.com> <3764D860-BC6F-441A-86EF-59E1742D7654@employees.org> <939AFA6F-4C75-4532-82DE-77D14ABC41ED@strayalpha.com> <5C51DCDC-4031-47D9-A28E-812D0E66EE35@employees.org> <5DAA16CC-791E-4042-95F6-65DA58D23EB8@gmail.com> <EA3B45A1-FFD2-49A5-B577-602065632F41@strayalpha.com> <5d22dd34-3972-060e-ddc1-b7f27a110a69@si6networks.com> <14f06217149d40ba8a41865ebb08ee08@boeing.com> <91894E0E-09D3-42E4-B6C4-88AE4493D796@apnic.net> <2f6ad3ad143d44588059f083a9e1835c@boeing.com>
X-Mailer: Apple Mail (2.3445.104.11)
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/ehuSajqDsNTxjx-CDmeWIdHikZ0>
Subject: Re: [Int-area] Discussion about Section 6.1 in draft-ietf-intarea-frag-fragile

Fred,

> On Sep 11, 2019, at 7:48 AM, Templin (US), Fred L <Fred.L.Templin@boeing.com> wrote:
> 
> Geoff, the 1280 MTU came from Steve Deering's November 13, 1997 proposal to
> the ipngwg. The exact message from the ipng archives is reproduced below.
> 
> 1280 isn't just a recommendation - it's *the law*. Any link that cannot do 1280
> (tunnels included) is not an IPv6 link.

Yes, from IPv6’s view, but you can make a link that can’t do 1280 work if it has its own local L2 fragmentation / reassembly, as noted in Steve’s email.  ATM with its 53-byte cells comes to mind.

Bob


> 
> Fred
> 
> ---
> From owner-ipng@sunroof.eng.sun.com  Thu Nov 13 16:41:01 1997
> X-Sender: deering@postoffice.cisco.com
> Message-Id: <v03110702b0598e80008d@[171.69.199.124]>
> Mime-Version: 1.0
> Content-Type: text/plain; charset="us-ascii"
> Date: Thu, 13 Nov 1997 16:37:00 -0800
> To: IPng Working Group <ipng@sunroof.eng.sun.com>
> From: Steve Deering <deering@cisco.com>
> Subject: (IPng 4802) increasing the IPv6 minimum MTU
> Cc: hinden@ipsilon.com
> Sender: owner-ipng@eng.sun.com
> Precedence: bulk
> 
> In the ipngwg meeting in Munich, I proposed increasing the IPv6 minimum MTU
> from 576 bytes to something closer to the Ethernet MTU of 1500 bytes, (i.e.,
> 1500 minus room for a couple layers of encapsulating headers, so that min-
> MTU-size packets that are tunneled across 1500-byte-MTU paths won't be
> subject to fragmentation/reassembly on ingress/egress from the tunnels,
> in most cases).
> 
> After the short discussion in the Munich meeting, I called for a show of
> hands, and of those who raised their hands (about half the attendees, if
> I recall correctly), the vast majority were in favor of this change --
> there were only two or three people opposed.  However, we recognized that
> a fundamental change of this nature requires thoughtful discussion and
> analysis on the mailing list, to allow those who were not at the meeting
> and those who were there but who have since had second thoughts, to express
> their opinions.  A couple of people have already, in private conversation,
> raised some concerns that were not identified in the discussion at the
> meeting, which I report below.  We would like to get this issue settled as
> soon as possible, since this is the only thing holding up the publication
> of the updated Proposed Standard IPv6 spec (the version we expect to advance
> to Draft Standard), so let's see if we can come to a decision before the ID
> deadline at the end of next week (hoping there isn't any conflict between
> "thoughtful analysis" and "let's decide quickly" :-).
> 
> The reason I would like to increase the minimum MTU is that there are some
> applications for which Path MTU Discovery just won't work very well, and
> which will therefore limit themselves to sending packets no larger than
> the minimum MTU.  Increasing the minimum MTU would improve the bandwidth
> efficiency, i.e., reduce the header overhead (ratio of header bytes to
> payload bytes), for those applications.  Some examples of such applications
> are:
> 
>    (1) Large-fanout, high-volume multicast apps, such as multicast video
> 	("Internet TV"), multicast netnews, and multicast software
> 	distribution.  I believe these applications will end up limiting
> 	themselves to packets no larger than the min MTU in order to avoid
> 	the danger of incurring an "implosion" of ICMP Packet-Too-Big
> 	messages in response.  Even though we have specified that router
> 	implementations must carefully rate-limit the emission of ICMP
> 	error messages, I am nervous about how well this will work in
> 	practice, especially once there is a lot of high-speed, bulk
> 	multicasting happening.  An appropriate choice of rate or
> 	probability of emission of Packet-Too-Big responses to multicasts
> 	really depends on the fan-out of the multicast trees and the MTUs of
> 	all the branches in that tree, which is unknown and unknowable to
> 	the routers.  Being sensibly conservative by choosing a very low
> 	rate could, in many cases, significantly increase the delay before
> 	the multicast source learns the right MTU for the tree and, hence,
> 	before receivers on smaller-MTU branches can start receiving the
> 	data.
> 
>    (2) DNS servers, or other similar apps that have the requirement of
> 	sending a small amount of data (a few packets at most) to a very
> 	large and transient set of clients.  Such servers often reside on
> 	links, such as Ethernet, that have an MTU bigger than the links on
> 	which many of their clients may reside, such as dial-up links.  If
> 	those servers were to send many reply messages of the size of their
> 	own links (as required by PMTU Discovery), they could incur very
> 	many ICMP packet-too-big messages and consequent retransmissions of
> 	the replies -- in the worst case, multiplying the total bandwidth
> 	consumption (and delivery delay) by 2 or 3 times that of the
> 	alternative approach of just using the min MTU always.  Furthermore,
> 	the use of PMTU Discovery could result in such servers filling up
> 	lots of memory with cached PMTU information that will never be
> 	used again (at least, not before it gets garbage-collected).
> 
> The number I propose for the new minimum MTU is 1280 bytes (1024 + 256,
> as compared to the classic 576 value which is 512 + 64).  That would
> leave generous room for encapsulating/tunnel headers within the Ethernet
> MTU of 1500, e.g., enough for two layers of secure tunneling including
> both ESP and AUTH headers.
> 
> For medium-to-high speed links, this change would reduce the IPv6 header
> overhead for min MTU packets from 7% to 3% (a little less than the IPv4
> header overhead for 576-byte IPv4 packets).  For low-speed links such as
> analog dial-up or low-speed wireless, I assume that header compression will
> be employed, which compresses out the IPv6 header completely, so the IPv6
> header overhead on such links is effectively zero in any case.
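
A quick arithmetic check of the figures above, as a minimal Python sketch.
It assumes a 40-byte fixed IPv6 header with no extension headers and a
20-byte IPv4 header with no options, and measures overhead as header bytes
over total packet bytes. The function and variable names are illustrative
only and do not come from the thread.

```python
IPV6_HEADER = 40  # bytes; fixed IPv6 header, no extension headers (assumption)
IPV4_HEADER = 20  # bytes; IPv4 header without options (assumption)
ETHERNET_MTU = 1500


def header_overhead(header_bytes, packet_bytes):
    """Fraction of a packet of the given total size taken up by the IP header."""
    return header_bytes / packet_bytes


old_v6 = header_overhead(IPV6_HEADER, 576)    # old minimum MTU
new_v6 = header_overhead(IPV6_HEADER, 1280)   # proposed minimum MTU
v4 = header_overhead(IPV4_HEADER, 576)        # IPv4 at 576 bytes, for comparison

# Room left for encapsulating/tunnel headers inside an Ethernet-sized path.
tunnel_headroom = ETHERNET_MTU - 1280

print(f"IPv6 header overhead at 576 bytes:  {old_v6:.2%}")
print(f"IPv6 header overhead at 1280 bytes: {new_v6:.2%}")
print(f"IPv4 header overhead at 576 bytes:  {v4:.2%}")
print(f"Headroom for tunnel headers:        {tunnel_headroom} bytes")
```

Under these assumptions the IPv6 figures round to 7% and 3%, the 1280-byte
IPv6 overhead comes out slightly below the IPv4 figure, and 220 bytes remain
for encapsulation.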
> 
> Here is a list of *disadvantages* to increasing the IPv6 minimum MTU that
> have been raised, either publicly or privately:
> 
>    (1) This change would require the specification of link-specific
> 	fragmentation and reassembly protocols for those link-layers
> 	that can support 576-byte packets but not 1280-byte packets,
> 	e.g., AppleTalk.  I think such a protocol could be very simple,
> 	and I briefly sketch such a protocol in Appendix I of this
> 	message, as an example.
> 
> 	Often, those links that have a small native MTU are also the ones
> 	that have low bandwidth.  On low-bandwidth links, it is often
> 	desirable to locally fragment and reassemble IPv6 packets anyway
> 	(even 576-byte ones) in order to avoid having small, interactive
> 	packets (e.g., keystrokes, character echoes, or voice samples)
> 	be delayed excessively behind bigger packets (e.g., file transfers);
> 	the small packets can be interleaved with the fragments of the
> 	big packets.  Someone mentioned in the meeting in Munich that the
> 	ISSLL WG was working on a PPP-specific fragmentation and
> 	reassembly protocol for precisely this reason, so maybe the job
> 	of specifying such a protocol is already being taken care of.
> 
>    (2) Someone raised the concern that, if we make the minimum MTU close
> 	to Ethernet size, implementors might never bother to implement PMTU
> 	Discovery.  That would be regrettable, especially if the Internet
> 	evolves to much more widespread use of links with MTUs bigger
> 	than Ethernet's, since IPv6 would then fail to take advantage of
> 	the bandwidth efficiencies possible on larger MTU paths.
> 
>    (3) Peter Curran pointed out to me that using a larger minimum MTU for
> 	IPv6 may result in much greater reliance on *IPv4* fragmentation and
> 	reassembly during the transition phase while much of the IPv6
> 	traffic is being tunneled over IPv4.  This could incur unfortunate
> 	performance penalties for tunneled IPv6 traffic (disastrous
> 	penalties if there is non-negligible loss of IPv4 fragments).
> 	I have included Peter's message, describing his concern in more
> 	detail, in Appendix II of this message.
> 
>    (4) Someone expressed the opinion that the requirement for link-layer
> 	fragmentation and reassembly of IPv6 over low-cost, low-MTU links
> 	like Firewire, would doom the potential use of IPv6 in cheap
> 	consumer devices in which minimizing code size is important --
> 	implementors of cheap Firewire devices would choose IPv4 instead,
> 	since it would not need a fragmenting "shim" layer.  This may well
> 	be true, though I suspect the code required for local frag/reasm
> 	would be negligible compared to the code required for Neighbor
> 	Discovery.
> 
> Personally, I am not convinced by the above concerns that increasing the
> minimum MTU would be a mistake, but I'd like to hear what the rest of the
> WG thinks.  Are there other problems that anyone can think of?  As I
> mentioned earlier, the clear consensus of the Munich attendees was to
> increase the minimum MTU, so we need to find out if these newly-identified
> problems are enough to swing the consensus in the other direction.  Your
> feedback is heartily requested.
> 
> Steve
> 
> ----------
> 
> Appendix I
> 
> Here is a sketch of a fragmentation and reassembly protocol (call it FRP)
> to be employed between the IP layer and the link layer of a link with native
> (or configured) MTU less than 1280 bytes.
> 
> Identify a Block Size, B, which is the lesser of (a) the native MTU of the
> link or (b) a value related to the bandwidth of the link, chosen to bound
> the latency that one block can impose on a subsequent block.  For example,
> to stay within a latency of 200 ms on a 9600 bps link, choose a block size
> of 0.2 * 9600 = 1920 bits = 240 bytes.
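
The block-size rule can be sketched in a few lines of Python. This is only
an illustration: the function name is hypothetical, the 1500-byte native MTU
for the serial link is an assumed placeholder, and applying the same 200 ms
bound to the AppleTalk case is an assumption not stated in the message.
Integer milliseconds are used to avoid floating-point rounding.

```python
def block_size_bytes(native_mtu_bytes, link_bps, max_latency_ms):
    """Lesser of (a) the native MTU and (b) the largest block that can be
    transmitted within max_latency_ms at the given link speed."""
    # bits/s * ms = millibits; divide by 1000 for bits and by 8 for bytes.
    latency_bound_bytes = (link_bps * max_latency_ms) // 8000
    return min(native_mtu_bytes, latency_bound_bytes)


# 9600 bps serial link with a 200 ms bound: 1920 bits, i.e. 240 bytes.
serial = block_size_bytes(native_mtu_bytes=1500, link_bps=9600, max_latency_ms=200)

# 256 kbps AppleTalk link: the ~580-byte native limit is the smaller value.
appletalk = block_size_bytes(native_mtu_bytes=580, link_bps=256_000, max_latency_ms=200)

print(serial, appletalk)
```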
> 
> IPv6 packets of length <= B are transmitted directly on the link.
> IPv6 packets of length > B are fragmented into blocks of size B
> (the last block possibly being shorter than B), and those fragments
> are transmitted on the link with an FRP header containing the following
> fields:
> 
> 	[packet ID, block number, end flag]
> 
> where:
> 
> 	packet ID is the same for all fragments of the same packet,
> 	and is incremented for each new fragmented packet.  The size of
> 	the packet ID field limits how many packets can be in flight or
> 	interleaved on the link at any one time.
> 
> 	block number identifies the blocks within a packet, starting at
> 	block zero.  The block number field must be large enough to
> 	identify 1280/B blocks.
> 
> 	end flag is a one-bit flag which is used to mark the last block
> 	of a packet.
> 
> For example, on a 9600 bps serial link, one might use a block size of
> 240 bytes and an 8-bit FRP header of the following format:
> 
> 	4-bit packet ID, which allows interleaving of up to 16 packets.
> 	3-bit block number, to identify blocks numbered 0 through 5.
> 	1-bit end flag.
> 
> On a 256 kbps AppleTalk link, one might use the AppleTalk-imposed block
> size of ~580 bytes and an 8-bit FRP header of the following format:
> 
> 	5-bit packet ID, which allows for up to 32 fragmented packets in
> 		   flight from each source across the AppleTalk internet.
> 	2-bit block number, to identify blocks numbered 0 through 2.
> 	1-bit end flag.
> 
> On a multi-access link, like AppleTalk, the receiver uses the link-level
> source address as well as the packet ID to identify blocks belonging to
> the same packet.
> 
> If a receiver fails to receive all of the blocks of a packet by the time
> the packet number wraps around, it discards the incompletely-reassembled
> packet.  Taking this approach, no timers should be needed at the receiver
> to detect fragment loss.  We expect the transport layer (e.g., TCP) checksum
> at the final IPv6 destination to detect mis-assembly that might be caused by
> extreme misordering/delay during transit across the link.
> 
> On links on which IPv6 header compression is being used, compression is
> performed before fragmentation, and reassembly is done before decompression.
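
The FRP sketch in Appendix I can be modelled roughly as follows, using the
8-bit header from the 9600 bps example (4-bit packet ID, 3-bit block number,
1-bit end flag). Several details are assumptions made here, not part of the
message: the bit order within the header byte, prepending the header to each
block of size B, all class and function names, and detecting packet-ID
wrap-around by seeing a block number repeat for the same (source, packet ID)
pair. Packets no longer than B, which are sent without an FRP header, are
left out of the sketch.

```python
BLOCK_SIZE = 240  # bytes, block size B from the 9600 bps example


def pack_header(packet_id, block_no, end):
    """Build the 1-byte FRP header. Assumed layout: [ID:4][block:3][end:1]."""
    return bytes([((packet_id & 0xF) << 4) | ((block_no & 0x7) << 1) | int(end)])


def unpack_header(byte):
    """Return (packet_id, block_no, end_flag) from one header byte."""
    return byte >> 4, (byte >> 1) & 0x7, bool(byte & 1)


def fragment(packet, packet_id):
    """Split a packet longer than B into FRP frames (header + block)."""
    blocks = [packet[i:i + BLOCK_SIZE] for i in range(0, len(packet), BLOCK_SIZE)]
    if len(blocks) > 8:
        raise ValueError("packet needs more blocks than a 3-bit field can number")
    last = len(blocks) - 1
    return [pack_header(packet_id, n, n == last) + blk for n, blk in enumerate(blocks)]


class Reassembler:
    """Timer-free reassembly keyed on (link-level source, packet ID)."""

    def __init__(self):
        self.partial = {}  # (source, packet_id) -> {"blocks": {...}, "last": int or None}

    def receive(self, source, frame):
        """Accept one frame; return the whole packet once complete, else None."""
        packet_id, block_no, end = unpack_header(frame[0])
        key = (source, packet_id)
        state = self.partial.get(key)
        if state is None or block_no in state["blocks"]:
            # First block seen for this ID, or the ID has wrapped around before
            # the old packet completed: discard any stale partial state.
            state = self.partial[key] = {"blocks": {}, "last": None}
        state["blocks"][block_no] = frame[1:]
        if end:
            state["last"] = block_no
        last = state["last"]
        if last is not None and all(n in state["blocks"] for n in range(last + 1)):
            del self.partial[key]
            return b"".join(state["blocks"][n] for n in range(last + 1))
        return None


# Usage: a 1280-byte packet becomes six frames (blocks 0 through 5).
packet = bytes(range(256)) * 5
frames = fragment(packet, packet_id=3)
rx = Reassembler()
results = [rx.receive("node-A", f) for f in frames]
print(len(frames), results[-1] == packet)
```

As in the message, the sketch relies on the transport checksum at the final
destination to catch any mis-assembly; it performs no integrity check itself.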
> 
> ----------
> 
> Appendix II
> 
> From: Peter Curran <peter@gate.ticl.co.uk>
> Subject: Re: IPv6 MTU issue
> To: deering@cisco.com (Steve Deering)
> Date: Mon, 22 Sep 1997 11:50:34 +0100 (BST)
> 
> Steve
> 
> My problem was that moving the MTU close to 1500 would have an adverse
> effect on the transition strategy.  The current strategy assumes that the
> typical Internet MTU is >576, and that sending an IPv6 packet close to the
> minimum MTU will not require any IPv4 fragmentation to support the tunnel
> transparently.  The PMTU discovery mechanism will 'tune' IPv6 to use a
> suitable MTU.
> 
> If the IPv4 MTU is <= 576 then IPv4 fragmentation will be required to
> provide a tunnel with a minimum MTU of 576 for IPv6.  This clearly places
> a significant strain on the tunnelling nodes - as these will normally be
> routers then there will be a demand for memory (for reassembly buffers)
> as well as CPU (for the frag/reassembly process) that will have an overall
> impact on performance.
> 
> This is an acceptable risk, as Internet MTU's of <= 576 are not too common.
> 
> However, if the minimum MTU of IPv6 is increased to something of the order
> of 1200-1500 octets then the likelihood of finding an IPv4 path with an
> MTU lower than this value increases (I think significantly) and this will
> have a performance impact on these devices.
> 
> During the brief discussion of this matter in the IPNG session at Munich
> you stated that MTU's less than 1500 were rare.  I don't agree with this
> completely - it seems to be pretty common practise for smaller 2nd and 3rd
> tier ISP's in the UK to use an MTU of 576 for connection to their transit
> provider.  Their objective, I believe, is to 'normalize' the packet sizes
> on relatively low bandwidth circuits (typically <1Mbps) to provide better
> performance for interactive sessions compared to bulk-file transfer users.
> 
> I think that before we go ahead and make a decision on an increased minimum
> MTU for IPv6 then we should discuss the issues a little more.
> 
> Incidentally, I am not convinced of the benefits of doing this anyway
> (ignoring the issue raised above).  With a properly setup stack the PMTU
> discovery mechanism seems to be able to select a good MTU for use on the
> path - at least that is my experience on our test network and the 6Bone.
> 
> I appreciate that you are trying to address the issues of PMTU for
> multicasting but I don't see how raising the minimum MTU is going to help
> much.  PMTU discovery will still be required irrespective of the minimum
> MTU adopted, unless we adopt a value that can be used on all link-layer
> technologies.
> 
> I would welcome wider discussion of these issues before pressing ahead
> with a change.
> 
> Best regards
> 
> Peter Curran
> TICL
> 
> 
> --------------------------------------------------------------------
> IETF IPng Working Group Mailing List
> IPng Home Page:                      http://playground.sun.com/ipng
> FTP archive:                      ftp://playground.sun.com/pub/ipng
> Direct all administrative requests to majordomo@sunroof.eng.sun.com
> --------------------------------------------------------------------