Re: [tsvwg] [Int-area] Fragmentation and Path MTU text in nvo3 dataplane reqts draft

"Templin, Fred L" <Fred.L.Templin@boeing.com> Fri, 16 May 2014 18:28 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: "Black, David" <david.black@emc.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>, "tsv-area@ietf.org" <tsv-area@ietf.org>
Date: Fri, 16 May 2014 18:27:44 +0000
Archived-At: http://mailarchive.ietf.org/arch/msg/tsvwg/xM4Y1bQ_uvY1n4oaQC6JLnUy7os
Cc: Mark Townsley <townsley@cisco.com>, "int-area@ietf.org" <int-area@ietf.org>
Subject: Re: [tsvwg] [Int-area] Fragmentation and Path MTU text in nvo3 dataplane reqts draft

> That document should be the place to put generic recommendations for
> tunnel MTU handling that apply to all tunnel types.

In case you are wondering what "generic recommendations for tunnel MTU
handling" should look like, here is my proposal:

   The tunnel link Maximum Transmission Unit (MTU) is 64KB minus the
   encapsulation overhead for IPv4 [RFC0791] and 4GB minus the
   encapsulation overhead for IPv6 [RFC2675].  This is the most that
   IPv4 and IPv6 (respectively) can convey within the constraints of
   protocol constants, but actual sizes available for tunneling will
   frequently be much smaller.
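
The upper bounds above can be written out as arithmetic. This is a minimal
sketch in Python; the function name and the idea of passing the overhead as a
single number are illustrative assumptions, and exactly which headers count
toward the overhead is left to the caller.

```python
# Protocol constants: the IPv4 Total Length field is 16 bits [RFC0791];
# the IPv6 Jumbo Payload option carries a 32-bit length [RFC2675].
IPV4_LENGTH_LIMIT = 2**16 - 1   # 65535 bytes ("64KB")
IPV6_LENGTH_LIMIT = 2**32 - 1   # 4294967295 bytes ("4GB")


def max_tunnel_link_mtu(outer_ip_version: int, encap_overhead: int) -> int:
    """Largest inner packet the outer protocol could carry in principle.

    encap_overhead is the number of bytes the tunnel adds (hypothetical
    parameter; the caller decides which headers it includes).
    """
    if outer_ip_version == 4:
        limit = IPV4_LENGTH_LIMIT
    elif outer_ip_version == 6:
        limit = IPV6_LENGTH_LIMIT
    else:
        raise ValueError("outer_ip_version must be 4 or 6")
    return limit - encap_overhead


# Example: 20 bytes of overhead on an IPv4 tunnel.
ipv4_bound = max_tunnel_link_mtu(4, 20)
```

These are ceilings set by field widths only; real paths will rarely come close
to them.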

   The base tunneling specifications for IPv4 and IPv6 typically set a
   static MTU on the tunnel ingress to 1500 bytes minus the
   encapsulation overhead or smaller still if the tunnel is likely to
   incur additional encapsulations on the path.  This can result in path
   MTU related black holes when packets that are too large to be
   accommodated over the tunnel are dropped, but the resulting ICMP
   Packet Too Big (PTB) messages are lost on the return path.  As a
   result, tunnels use the following MTU mitigations to accommodate
   larger packets.

   Tunnels set their ingress MTU to the larger of 1500 bytes and the
   underlying interface MTU minus the encapsulation overhead.  Tunnels
   optionally cache per-egress MTU values in the underlying IP path
   MTU discovery cache, with each entry initialized to the underlying
   interface MTU.
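
A minimal sketch of the ingress MTU rule and the optional per-egress cache,
in Python. The names are hypothetical, and the plain dictionary only stands in
for the underlying IP path MTU discovery cache, which a real stack would
provide.

```python
MIN_INGRESS_MTU = 1500  # floor for the tunnel ingress MTU, per the text


def tunnel_ingress_mtu(underlying_mtu: int, encap_overhead: int) -> int:
    """Larger of (underlying interface MTU - overhead) and 1500 bytes."""
    return max(underlying_mtu - encap_overhead, MIN_INGRESS_MTU)


class EgressMtuCache:
    """Illustrative per-egress MTU cache (assumption: a dict keyed by egress)."""

    def __init__(self, underlying_mtu: int) -> None:
        self._default = underlying_mtu   # initial value for every egress
        self._entries = {}

    def lookup(self, egress: str) -> int:
        # Unknown egresses start at the underlying interface MTU.
        return self._entries.get(egress, self._default)

    def update(self, egress: str, mtu: int) -> None:
        # For example, after accepting the MTU reported in a PTB message.
        self._entries[egress] = mtu


# A 1500-byte underlying link still yields a 1500-byte ingress MTU;
# a 9000-byte link with 40 bytes of overhead yields 8960.
small_link = tunnel_ingress_mtu(1500, 40)
jumbo_link = tunnel_ingress_mtu(9000, 40)
```

Note that on a 1500-byte underlying link the ingress MTU exceeds what fits
after encapsulation, which is why the fragmentation and probing steps are
needed.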

   Tunnels admit packets that are no larger than 1280 bytes minus the
   encapsulation overhead (*) as well as packets that are larger than
   1500 bytes into the tunnel without fragmentation, i.e., as long as
   they are no larger than the tunnel ingress MTU before encapsulation
   and also no larger than the cached per-egress MTU following
   encapsulation.  For IPv4, the ingress sets the "Don't Fragment" (DF)
   bit to 0 for packets no larger than 1280 bytes minus the encapsulation
   overhead (*) and sets the DF bit to 1 for packets larger than 1500
   bytes.  If a large packet is lost in the path, the ingress may
   optionally cache the MTU reported in the resulting PTB message or may
   ignore the message, e.g., if there is a possibility that the message
   is spurious.
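
The admission rules can be summarized as a small decision function. This is a
sketch under stated assumptions: the enum and parameter names are invented for
illustration, the per-egress check is applied only to packets larger than 1500
bytes (one reading of the text), and packets in the middle range are simply
flagged for the fragmentation handling that the text covers separately.

```python
from enum import Enum


class Action(Enum):
    SEND_WHOLE_DF0 = "send whole, DF=0 on an IPv4 outer header"
    FRAGMENT = "fragment after encapsulation"
    SEND_WHOLE_DF1 = "send whole, DF=1 on an IPv4 outer header"
    REJECT = "too large for the tunnel"


def classify(inner_len: int, encap_overhead: int, ingress_mtu: int,
             egress_mtu: int, min_mtu: int = 1280) -> Action:
    """Decide how the ingress handles an inner packet of inner_len bytes.

    min_mtu defaults to 1280; a larger known minimum path MTU may be
    substituted, as the (*) note allows.
    """
    if inner_len > ingress_mtu:
        return Action.REJECT                 # exceeds the tunnel ingress MTU
    if inner_len <= min_mtu - encap_overhead:
        return Action.SEND_WHOLE_DF0         # fits even the minimum path MTU
    if inner_len <= 1500:
        return Action.FRAGMENT               # middle range
    if inner_len + encap_overhead <= egress_mtu:
        return Action.SEND_WHOLE_DF1         # large packet, relies on PTB
    return Action.REJECT                     # exceeds the cached egress MTU


# Example: 40 bytes of overhead over a 9000-byte underlying interface.
decisions = [classify(n, 40, 8960, 9000) for n in (1200, 1400, 4000)]
```

With these inputs the three packets fall into the small, middle, and large
classes respectively.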

   For packets admitted into the tunnel that are larger than 1280 bytes
   minus the encapsulation overhead (*) but no larger than 1500 bytes,
   the ingress uses IP fragmentation to fragment the encapsulated packet
   into two pieces (where the first fragment contains 1024 bytes of the
   fragmented inner packet) then sends the fragments to the egress.
   If the outer protocol is IPv4, the ingress sends the fragments with
   DF set to 0 and subject to rate limiting to avoid
   reassembly errors [RFC4963][RFC6864].  For both IPv4 and IPv6, the
   ingress also sends a 1500 byte probe message (**) to the egress,
   subject to rate limiting. To construct a probe, the ingress prepares
   an ICMPv6 Neighbor Solicitation (NS) message with trailing padding
   octets added to a length of 1500 bytes but does not include the
   length of the padding in the IPv6 Payload Length field.  The ingress
   then encapsulates the NS in the outer encapsulation headers (while
   including the length of the padding in the outer length fields), sets
   DF to 1 (for IPv4) and sends the padded NS message to the neighbor.
   If the egress returns an NA message, the ingress may then send whole
   packets within this size range and (for IPv4) relax the rate limiting
   requirement. (Note that for tunnels that do not perform IPv6 neighbor
   discovery, an ICMP echo request message can be used instead of NS.) 
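
To make the probe layout concrete, here is a sketch in Python that builds the
padded inner NS packet. It rests on several assumptions not spelled out in the
text: the 1500 bytes are measured over the inner IPv6 header, the NS, and the
padding together; the NS carries no options; and the ICMPv6 checksum covers
only the unpadded NS. The outer encapsulation (whose length fields would
include the padding) is not modeled, and the function names are hypothetical.

```python
import ipaddress
import struct

PROBE_LEN = 1500   # total length of the padded inner packet (assumption)
NS_LEN = 24        # ICMPv6 NS without options: 8-byte header + 16-byte target
ICMPV6 = 58        # IPv6 Next Header value for ICMPv6
NS_TYPE = 135      # ICMPv6 type for Neighbor Solicitation


def icmpv6_checksum(src: bytes, dst: bytes, icmp: bytes) -> int:
    """Internet checksum over the IPv6 pseudo-header plus the ICMPv6 message."""
    data = src + dst + struct.pack("!I3xB", len(icmp), ICMPV6) + icmp
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_padded_ns(src: str, dst: str, target: str) -> bytes:
    """Return an inner IPv6 NS padded with zero octets to PROBE_LEN bytes."""
    src_b = ipaddress.IPv6Address(src).packed
    dst_b = ipaddress.IPv6Address(dst).packed
    tgt_b = ipaddress.IPv6Address(target).packed

    ns = struct.pack("!BBHI", NS_TYPE, 0, 0, 0) + tgt_b
    csum = icmpv6_checksum(src_b, dst_b, ns)
    ns = ns[:2] + struct.pack("!H", csum) + ns[4:]

    # Payload Length counts the NS only, not the trailing padding.
    header = struct.pack("!IHBB", 6 << 28, NS_LEN, ICMPV6, 255) + src_b + dst_b
    packet = header + ns
    return packet + bytes(PROBE_LEN - len(packet))


probe = build_padded_ns("fe80::1", "fe80::2", "fe80::2")
```

Because the Payload Length field stops at the end of the NS, a receiver parses
a normal solicitation and ignores the trailing octets, while the path still has
to carry the full-size packet.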

   The egress MUST be capable of reassembling packets up to 1500 bytes
   plus the encapsulation overhead length.  It is therefore RECOMMENDED
   that the egress be capable of reassembling at least 2KB.

   (*) Note that if it is known without probing that the minimum Path
   MTU to a tunnel egress is MINMTU bytes (where 1280 < MINMTU < 1500)
   then MINMTU can be used instead of 1280 in the fragmentation threshold
   considerations listed above.

   (**) It is RECOMMENDED that no probes smaller than 1500 bytes be used
   for MTU probing purposes, since smaller probes may be fragmented if
   there is a nested tunnel somewhere on the path to the egress.
   Probe sizes larger than 1500 bytes MAY be used, but may be
   unnecessary since original sources are expected to use [RFC4821]
   when sending large packets.

I think this applies to all IP-in-(foo)-in-IP tunnel types, and could go
as a set of generic recommendations to be cited by other documents.

Comments?

Thanks - Fred
fred.l.templin@boeing.com

> -----Original Message-----
> From: Int-area [mailto:int-area-bounces@ietf.org] On Behalf Of Templin, Fred L
> Sent: Thursday, May 15, 2014 3:41 PM
> To: Black, David; tsvwg@ietf.org; tsv-area@ietf.org
> Cc: Mark Townsley; int-area@ietf.org
> Subject: Re: [Int-area] Fragmentation and Path MTU text in nvo3 dataplane reqts draft
> 
> Hi,
> 
> > -----Original Message-----
> > From: tsv-area [mailto:tsv-area-bounces@ietf.org] On Behalf Of Black, David
> > Sent: Wednesday, May 14, 2014 1:53 PM
> > To: tsvwg@ietf.org; tsv-area@ietf.org
> > Subject: Fragmentation and Path MTU text in nvo3 dataplane reqts draft
> >
> > <WG chair hat off>
> >
> > Over in the nvo3 WG, draft-ietf-nvo3-dataplane-requirements-03 contains
> > some text on dealing with the fragmentation and MTU effects of tunnels.
> > I thought I'd ask for some early review of this text, given recent IESG
> > excitement around fragmentation and Path MTU topics in another draft:
> 
> All tunnels have trouble with path MTU, and in some cases have no choice
> but to fragment. However, they should strive to tune out fragmentation
> and forward whole packets whenever possible.
> 
> Over in the intarea, there have been sporadic ongoing discussions about
> how to recommend generic MTU mitigations for tunnels. Joe Touch and Mark
> Townsley have been working for a long time on a document titled
> "Tunnels in the Internet Architecture":
> 
> http://tools.ietf.org/id/draft-ietf-intarea-tunnels-00.txt
> 
> That document should be the place to put generic recommendations for
> tunnel MTU handling that apply to all tunnel types.
> 
> Tunnel MTU issues keep popping up in all places, and this is just
> another example. Is it time to revive Joe and Mark's document?
> 
> Thanks - Fred
> fred.l.templin@boeing.
> 
> > http://datatracker.ietf.org/doc/draft-ietf-ipsecme-ikev2-fragmentation/ballot/
> >
> > I believe that the nvo3 draft is in better shape in these areas.  Nonetheless,
> > I've included its current text on fragmentation and path MTU below, and (on
> > behalf of the draft authors and nvo3 WG chairs) I'm looking for input on
> > what that text should say and why.
> >
> > In nvo3 terminology, an overlay network is an inner network that is tunneled
> > over an outer underlay network.  The nvo3 WG also uses "Tenant System" as
> > the term for a sender/receiver of network traffic because multi-tenancy is
> > an important motivation for the WG's activities in network virtualization.
> >
> > --------------------------------------
> >
> > 3.5. Path MTU
> >
> >        The tunnel overlay header can cause the MTU of the path to the
> >        egress tunnel endpoint to be exceeded.
> >
> >        IP fragmentation SHOULD be avoided for performance reasons.
> >
> >        The interface MTU as seen by a Tenant System SHOULD be adjusted such
> >        that no fragmentation is needed. This can be achieved by
> >        configuration or be discovered dynamically.
> >
> >        Either of the following options MUST be supported:
> >
> >           o Classical ICMP-based MTU Path Discovery [RFC1191] [RFC1981] or
> >             Extended MTU Path Discovery techniques such as defined in
> >             [RFC4821]
> >
> >           o Segmentation and reassembly support from the overlay layer
> >             operations without relying on the Tenant Systems to know about
> >             the end-to-end MTU
> >
> >           o The underlay network MAY be designed in such a way that the MTU
> >             can accommodate the extra tunnel overhead.
> >
> > --------------------------------------
> >
> > </WG chair hat off>
> >
> > Thanks,
> > --David
> > ----------------------------------------------------
> > David L. Black, Distinguished Engineer
> > EMC Corporation, 176 South St., Hopkinton, MA  01748
> > +1 (508) 293-7953             FAX: +1 (508) 293-7786
> > david.black@emc.com        Mobile: +1 (978) 394-7754
> > ----------------------------------------------------