Re: [rrg] IRON: SEAL summary

"Templin, Fred L" <Fred.L.Templin@boeing.com> Mon, 08 February 2010 20:24 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>, RRG <rrg@irtf.org>
Date: Mon, 08 Feb 2010 12:25:41 -0800

Hi Robin,

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Monday, February 08, 2010 1:37 AM
> To: RRG
> Cc: Templin, Fred L
> Subject: IRON: SEAL summary
>
> This is a summary of my current probably imperfect understanding of
> the parts of Fred Templin's SEAL tunneling protocol which are
> important for his scalable routing proposal: RANGER (now IRON).
>
> Please consult any discussion which follows - I wrote this without
> checking it first with Fred, so he will no doubt provide corrections.
>
> At the end I compare this understanding of SEAL with my IPTM
> arrangement for Ivip, and with the non-stateful and stateful
> approaches of LISP.
>
>
>   - Robin
>
>
>
> Fred's "RANGER" scalable routing proposal is now known as IRON:
>
>    The Internet Routing Overlay Network (IRON)
>    http://tools.ietf.org/html/draft-templin-iron-00

No - RANGER is still RANGER and always will be. IRON describes
how the routing system used by RANGER works, with an initial
emphasis on the Internet core as the domain of application. But
the same IRON principles apply to any RANGER level of
recursion. So, it is "IRON-plus-RANGER", and not
"IRON-obsoletes-RANGER". IRON-RANGER might be a good
name for it.

> The proposal as currently described:
>    http://tools.ietf.org/html/draft-irtf-rrg-recommendation-04#section-16
>
> is a particular application of Fred's more general-purpose RANGER:
>
>    http://tools.ietf.org/html/draft-templin-ranger-09
>
> which itself is based on ISATAP (RFC 5214).  RANGER and therefore
> IRON uses SEAL for tunneling with Path MTU Discovery management
> functions.
>
> Next I will write up my understanding of IRON.
>
>
> The latest SEAL ID is:
>
>   http://tools.ietf.org/html/draft-templin-intarea-seal-08
>
> Perhaps SEAL is capable of tunneling IPv4 packets in IPv6 tunnels and
> vice-versa, but I am only interested in SEAL for pure IPv4 and pure
> IPv6, since this is how I am trying to understand IRON.
>
> SEAL is intended to be suitable for the extremely ad-hoc tunneling
> arrangements found in Core-Edge Separation solutions to the routing
> scaling problem, where an ITR may have a sudden need to tunnel one or
> more packets to an ETR it has never had anything to do with.
>
> In SEAL, the router (or some function programmed into a server,
> rather than a conventional router) which accepts packets and tunnels
> them is known as the ITE (Ingress Tunnel Endpoint).  The router or
> other device which the tunnel reaches to, and which decapsulates the
> inner packet, is the ETE (Egress Tunnel Endpoint).
>
> Please see the discussions between Fred and me in the thread: "SEAL
> critique, PMTUD, RFC4821 = vapourware".
>
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05816.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05834.html   FT
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05843.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05902.html   FT
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05924.html RW
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05927.html   FT
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05976.html RW
>
>
> SEAL for IPv4 and IPv6
> ----------------------
>
> SEAL is capable of "segmenting" (fragmenting, but within the SEAL
> protocol, rather than by using IPv4 or IPv6 fragmentation mechanisms)
> packets which are known to be too long for the PMTU to a given ETE.
> However, this is not intended to be used with IRON.
>
> SEAL is intended to tunnel packets to an ETE, without any need to
> establish tunneling arrangements, without expecting acknowledgements
> of successful receipt by the ETE and without resending any packets.
> The tunnel is a one-way (ITE to ETE), ad-hoc, arrangement - but the
> ITE stores some state for each particular ETE it is tunneling to.
>
> SEAL ITEs do not cache any part of the packets they send.  So in
> order to generate a PTB to the sending host (which may itself be
> another router, if this SEAL tunnel is already within an outer tunnel
> - which I think is not the case with IRON) the ITE relies on
> receiving enough of the packet from (IPv6-only) a PTB from a limiting
> router on the path to the ETE, or (IPv4-only) a "Fragmentation
> Experienced" message from the ETE itself.
>
> I assume that for IRON, there will be no need for "mid-level headers"
> such as UDP between the outer header and the SEAL (or IPv6
> Fragmentation) header - but see msg05927 and msg05976 and search for
> "ECMP".
>
>
> There is some pretty confusing text about the Tunnel Interface MTU -
> 4.3.1.  One part discusses setting this to 1500 bytes or more.  Later
> in that paragraph there is discussion of setting it to smaller values
> than this. The next paragraph discusses setting it to an infinite
> value.  I hope Fred will provide some guidance on how this would be
> done for IRON.
>
> Each ITE maintains some state for each ETE it tunnels packets to (by
> each ETE IP address).  I guess there would be a process for deleting
> this after a while if no packets are sent to that ETE, since over
> time, the state could grow to a considerable size, depending on how
> many ETEs there are in the world.
>
>     SEAL-ID most recently used.  A 32 bit value which is initialised
>     to some random value, and then incremented modulo 2^32 every time
>     a packet is tunneled to this particular ETE.
>
>     Further state as required to implement a window function or
>     some other arrangement by which this ITE can test a SEAL-ID
>     in an incoming message including at least these:
>
>        PTB from a router in the tunnel path (IPv6 only).
>        SEAL Packet Too Big message from ETE (Not needed in IRON?).
>        SEAL "Fragmentation Experienced" (IPv4 only).
>
>     Please see the last two messages listed above for Fred's approach
>     to doing this and my critique and timer-based suggestion.
>
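As a minimal sketch of the per-ETE SEAL-ID state summarized above (this is not taken from the SEAL draft; the class name `EteState`, the `window` size and the `is_recent` check are assumptions used only for illustration), an ITE could keep a randomly initialised 32 bit counter per ETE and accept an echoed SEAL-ID only if it was issued recently:

```python
import secrets

MOD = 2 ** 32  # SEAL-ID is a 32 bit value, incremented modulo 2^32


class EteState:
    """Hypothetical per-ETE state kept by an ITE."""

    def __init__(self, window=1024, initial=None):
        # Start from a random value unless one is supplied (useful for testing).
        self.seal_id = secrets.randbelow(MOD) if initial is None else initial % MOD
        self.window = window  # assumed size of the acceptance window
        self.sent = 0         # number of packets tunneled to this ETE so far

    def next_seal_id(self):
        """Increment modulo 2^32 and return the ID for the next tunneled packet."""
        self.seal_id = (self.seal_id + 1) % MOD
        self.sent += 1
        return self.seal_id

    def is_recent(self, echoed_id):
        """True if echoed_id is among the last `window` IDs actually issued."""
        age = (self.seal_id - echoed_id) % MOD
        return age < min(self.window, self.sent)
```

The modular subtraction keeps the window check correct when the counter wraps from 2^32 - 1 back to 0. A timer-based scheme, as suggested in the thread referenced above, would replace `is_recent` with a different test.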
> Other items of state are listed in 4.3.3:
>
>     MHLEN   Constant mid-level header length.  In this attempt to
>             describe SEAL, I will assume there is no need for these
>             mid-level headers - between the outer IP header and the
>             SEAL (or IPv6 fragmentation) header which precedes the
>             encapsulated packet.  So I will assume this = 0.

Actually, the mid-level headers occur between the *inner*
IP header and the SEAL header.

>     HLEN    Constant outer header length: 20 for IPv4 and 40 for
>             IPv6 plus the length of the SEAL header (IPv4 only -
>             8 bytes) or IPv6 Fragment Header (8 bytes).  So these
>             are constants:
>
>                IPv4 28           IPv6 48

I am going to work on this. I think what I will do is
eliminate the need for the IPv6 header to include a
fragment header and instead handle IPv6 segmentation
and reassembly using SEAL the same as is currently
documented for IPv4.

>     S_MSS   Variable.  SEAL Maximum Segment Size.  I think this is
>             initialised to the value of (the ID says "no larger
>             than") the MTU of the "underlying IP interface".  I guess
>             this means that the ITE has a single interface for
>             sending out encapsulated packets on.  (To be pernickety,
>             it could be pointed out that the ITE might be a multiple
>             interface router, with different MTUs on each, and that
>             the best path to a given ETE might change from one
>             interface to another.)
>
>             According to the ID, this value may be adjusted upwards
>             and downwards based on received SEAL Reassembly Report
>             messages.
>
>             However, I think these are not of interest in IRON - and
>             that it is other messages which alter this value which
>             we need to consider:
>
>               IPv4: SEAL "Fragmentation Experienced" from the ETE.
>
>               IPv6: PTB from a router in the tunnel path.

Right. SEAL-FS (and not SEAL-SR) is used for the use
case of IRON in the core.

>     S_MRU   Variable.  SEAL Maximum Reassembly Unit.  Initialised to
>             "infinity", but the effective value of S_MRU is never
>             more than 256 * S_MSS.  (Since S_MSS can rise and fall,
>             this means there are really two items of state: one the
>             limit and the other the effective value, which may be
>             increased or decreased according to S_MSS * 256, as long
>             as it remains less than or equal to the limit value.)
>
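The relationship between the S_MRU limit and its effective value, as summarized above, can be written as a one-line calculation. This is a sketch of that summary only; the function name `effective_s_mru` is an assumption and does not come from the draft:

```python
import math


def effective_s_mru(s_mru_limit, s_mss):
    """Effective S_MRU: never more than 256 * S_MSS, never above the limit."""
    return min(s_mru_limit, 256 * s_mss)


# With the limit left at its initial "infinity", S_MSS alone sets the value.
unlimited = effective_s_mru(math.inf, 1500)
# With a finite limit, the smaller of the two applies.
capped = effective_s_mru(65535, 1500)
```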
>
> SEAL for IPv4
> -------------
>
> DF=0 packets will be fragmented with standard IPv4 techniques before
> other processing, if they are longer than:
>
>    (MIN(S_MRU, S_MSS) - 28)
>
> The fragments will be of this length and will be tunneled as
> described below.
>
> DF=1 packets which are longer than:
>
>    (S_MRU - 28)
>
> will result in a PTB being sent to the Sending Host (SH).  Apart from
> being used to generate that PTB, the packet will be dropped.
>
> Encapsulation involves:
>
>   IPv4 outer header   With the 16 bit Identification field set to
>                       the 16 most significant bits of SEAL-ID.
>                       DF = 0, so the packet is fragmentable by
>                       any router which finds the packet is too long
>                       for its next-hop MTU.
>
>   SEAL header (32 bits) as described below.
>
>   The original traffic packet.
>
> The SEAL header (only for IPv4) has a 16 bit field "ID-Extension"
> which is set to the least significant 16 bits of SEAL-ID.  Section
> 4.2 discusses this.  I haven't figured out exactly how the other bits
> would be set.
>
> If the packet arrives intact, then the ETE decapsulates it and
> forwards the traffic packet to wherever it needs to go.  In IRON,
> most of the time, there is a single tunnel to the IRON router which
> directly connects to the end-user network.  However initial packets
> go to an IRON router (the VP router) which is typically not the
> router which connects to the end-user network - so then the packet
> would be decapsulated and tunneled in the same manner to the IRON
> router which connects to the destination network.
>
> If the packet is too long for one router in the ITE -> ETE path, then
> that router will fragment the whole packet and the ETE will (usually)
> receive the first and other fragments.  The first fragment should be
> as long as the limit imposed at that router by the next-hop MTU.  If
> there were two or more MTU limits, such as 1400 and then 1300, the
> fragment generated at the first limiting router would be 1400 bytes
> long, and this would be fragmented at the second router, so the first
> fragment would be 1300 bytes.
>
> In this way, the ETE discovers the PMTU of the ITE -> ETE tunnel path.
>
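The 1400-then-1300 example above can be sketched as a small simulation. It assumes standard IPv4 fragmentation (RFC 791) with a 20 byte header and no options, where every non-final fragment carries a payload that is a multiple of 8 bytes. Under that assumption the first fragment can be up to 7 bytes shorter than the limiting MTU, so the length the ETE reports is a close lower bound rather than always the exact PMTU. The function name is hypothetical:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def first_fragment_length(packet_len, path_mtus):
    """Length of the first fragment reaching the ETE.

    path_mtus lists the next-hop MTUs along the ITE -> ETE path, in order.
    """
    length = packet_len
    for mtu in path_mtus:
        if length > mtu:
            # Non-final fragments carry a payload that is a multiple of 8 bytes.
            payload = (mtu - IPV4_HEADER) // 8 * 8
            length = IPV4_HEADER + payload
    return length
```

For a 1500 byte tunnel packet, `first_fragment_length(1500, [1400, 1300])` gives 1300, matching the example, while a single 1400 byte limit gives 1396 because 1380 is not a multiple of 8.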
> The ETE does not attempt to deliver the packet, but sends a
> "Fragmentation Experienced" message to the ITE.  See 4.3.9.1.2 and
> 4.4.5.1.2.  This message is defined in Figure 6 and contains as much
> of the first fragment as would make the total message 576 bytes in
> length.  There are 20 bytes for the IPv4 header and 16 for this
> particular SEAL header, so up to 540 bytes of the first fragment will
> be sent back to the ITE.  There is no acknowledgement of reception by
> the ITE.
>
> The ITE can authenticate this message by looking at the 16 bit ID in
> the IPv4 header of the enclosed portion of the first fragment - and
> the 16 bits of "ID-Extension" in the SEAL header in that fragment.
>
> The S_MSS field of this message is set to the length of the first
> fragment, which is (or should be, and is assumed to be) the PMTU of
> the ITE -> ETE path.
>
> The S_MRU field may be set to zero.  As far as I know S_MRU is how
> big a packet the ETE is prepared to reconstruct from packets which
> are fragmented with IPv4's native fragmentation - but it may also
> apply to the use of SEAL's internal segmentation system, which I
> understand is not generally used for IRON.

I will clarify the cases in which S_MRU may be
set to zero. I think now that they apply only to
SEAL-FS and not to SEAL-SR.
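To illustrate the SEAL-ID handling summarized in the quoted IPv4 text, here is a minimal sketch of how the 32 bit SEAL-ID maps onto the 16 bit IPv4 Identification field and the 16 bit ID-Extension field, and how the ITE could recover it from an echoed fragment. The function names are assumptions for illustration, and the other SEAL header bits are not modelled:

```python
def split_seal_id(seal_id):
    """Return (IPv4 Identification, SEAL ID-Extension) for a 32 bit SEAL-ID."""
    ip_identification = (seal_id >> 16) & 0xFFFF  # 16 most significant bits
    id_extension = seal_id & 0xFFFF               # 16 least significant bits
    return ip_identification, id_extension


def join_seal_id(ip_identification, id_extension):
    """Rebuild the 32 bit SEAL-ID from the two 16 bit fields."""
    return ((ip_identification & 0xFFFF) << 16) | (id_extension & 0xFFFF)
```

The ITE would then compare the rebuilt value against the IDs it has recently sent to that ETE before trusting the message.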

> There is a separate SEAL PTB message (4.4.5.1.3) which as far as I
> know, is not relevant to IRON - since I think it concerns SEAL's
> internal "segmentation" system.

Yes, that is correct for the case of IRON used in
the core. For IRON applied to other levels of the
RANGER recursion, SEAL-SR may be used instead of
SEAL-FS.

> I understand that the reason Fred uses DF=0 packets for IPv4 tunnels
> is that IPv4 routers, when sending back a PTB, are required (by RFC
> 1191) to only return the IPv4 header and the next 8 bytes.  This is
> enough for the ITE to authenticate the PTB as genuine, since these 8
> bytes are the SEAL header. With the IPv4 header, the ITE can
> therefore see the full 32 bit SEAL-ID.  However, there is nothing of
> the original traffic packet in the PTB, so the only way the ITE could
> generate a PTB to the SH would be to have cached the first 28 bytes
> of the original traffic packet.  To avoid the need for caching, and
> so as not to rely on PTBs from routers in the ITE -> ETE tunnel, SEAL
> ITEs tunnel IPv4 packets using DF=0 packets.
>
> The SEAL header used in this IPv4 process contains a flag which, when
> set, signals that the ETE should report the successful reception of
> the packet to the ITE.  I am not sure to what extent this would be
> used for IRON.
>
>
> SEAL for IPv6
> -------------
>
> DF=1 packets which are longer than:
>
>    (S_MRU - 48)
>
> will result in a PTB being sent to the Sending Host (SH).  Apart from
> being used to generate that PTB, the packet will be dropped.
>
> When encapsulating IPv6 packets, the SEAL ITE does not use a SEAL
> header.  It uses the IPv6 Fragment Header:
>
>   http://tools.ietf.org/html/rfc2460#section-4.5
>
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |  Next Header  |   Reserved    |      Fragment Offset    |Res|M|
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                         Identification                        |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> The single packet does not contain a "fragment" - this is just using
> the header for another purpose.  This header contains a 32 bit
> "Identification" field, which is set to the SEAL-ID value.

I am going to change this so that IPv6 does not use the
IPv6 fragment header but instead uses the SEAL header
(plus SEAL segmentation and reassembly when needed).

> If the packet does not encounter any MTU limits and arrives at the
> ETE, then the ETE decapsulates it and does whatever it needs to do
> with it - deliver it directly to the end-user network or perhaps (if
> the ETE is the VP IRON router) tunnel it again to an IRON router
> which will deliver it to the end-user network.
>
> If the packet hits an MTU limit, then the limiting router will send
> back a PTB to the ITE, and since this is IPv6, the PTB will contain
> enough of the packet for the ITE to construct a valid PTB for the SH.
>
> There is no facility, as there was with IPv4, for the ITE to request
> the ETE to confirm that the packet arrived.

I'll fix this.

> Rejecting larger packets directly
> ---------------------------------
>
> Once the ITE has established a value for S_MSS for a given ETE, then
> any packet it needs to tunnel to that ETE will be rejected with a PTB
> to the SH if, once it was encapsulated, it would be too long for the
> S_MSS of this ETE.   So if a DF=1 IPv4 packet arrives with a length
> longer than (S_MSS - 28), it will be rejected with a PTB.  Likewise
> an IPv6 packet longer than (S_MSS - 48).
>
> As far as I know, all IPv4 DF=0 packets longer than (S_MSS - 28) will
> be fragmented into IPv4 fragments of this length (with the final one
> potentially - typically - being shorter) and then the fragments will
> be tunnelled.  At the ETE, each fragment will be forwarded to the
> destination network and then host (or tunnelled to another IRON
> router which will forward them to the destination network and then
> host - and the destination host will reassemble the fragments using
> the standard IPv4 system.)
>
> The ITE can't reject any DF=0 packets.  It can however learn the PMTU
> to the ETE by the ETE reporting the size of any fragments which
> result from the tunneled versions of the IPv4 native fragments.  That
> will enable the ITE to use IPv4 native fragmentation on subsequent
> DF=0 packets to this ETE in order that they will fit within the newly
> discovered PMTU limitation, which is stored in this ETE's S_MSS variable.

This is entering a very sensitive area - when the ITE
gets a DF=0 packet, it needs to know when to use inner
fragmentation and what size it can use for inner
fragmentation. If the ETE will not be doing
reassembly of any kind, the ITE has no choice but
to perform inner fragmentation using a safe initial
fragment size (e.g., 1280). But, it could always send
a "dummy packet" to probe the true inner packet size
so that future DF=0 packets might be able to avoid the
inner fragmentation. The draft doesn't currently say
this, but there is nothing to stop an implementation
from doing it.
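The IPv4 size checks discussed in this part of the thread can be sketched as a single decision function. This is an illustration only, not the draft's algorithm: the names `Action`, `classify_ipv4` and `path_probed` are hypothetical, the 28 byte overhead is the IPv4 HLEN value quoted earlier, and treating 1280 as a cap on the inner fragment size before any probing is one reading of the "safe initial fragment size" mentioned above.

```python
from enum import Enum

HLEN_IPV4 = 28             # outer IPv4 header plus SEAL header, as quoted above
SAFE_FRAGMENT_SIZE = 1280  # assumed safe inner fragment size before probing


class Action(Enum):
    ENCAPSULATE = "encapsulate"
    FRAGMENT = "fragment inner packet, then encapsulate each fragment"
    SEND_PTB = "drop and send PTB to the sending host"


def classify_ipv4(packet_len, df, s_mss, s_mru, path_probed=True):
    """Return (action, size limit) for an inner IPv4 packet at the ITE."""
    limit = min(s_mru, s_mss) - HLEN_IPV4
    if df:
        # DF=1 packets cannot be fragmented, so oversized ones are rejected.
        action = Action.SEND_PTB if packet_len > limit else Action.ENCAPSULATE
        return action, limit
    if not path_probed:
        # Without a probe result, fall back to the conservative size.
        limit = min(limit, SAFE_FRAGMENT_SIZE)
    action = Action.FRAGMENT if packet_len > limit else Action.ENCAPSULATE
    return action, limit
```

With S_MSS = 1400 and S_MRU left at infinity, a 1500 byte DF=1 packet is rejected with a PTB quoting 1372, while a 1500 byte DF=0 packet is fragmented to 1372 byte pieces before encapsulation.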

> Adapting to changed PMTUs
> -------------------------
>
> I am not sure if or where it is specified, but I understand that SEAL
> will allow exploration of increased PMTUs.  According to RFC 1191 and
> therefore RFC 1981, the SH can try sending a longer packet than it
> has been told to send by a previous PTB, after 10 minutes has elapsed.
>
> I assume that if the ITE is only handling DF=0 packets, or at least
> that if there are no DF=1 packets which are longer than the current
> limitation (S_MSS - 28) and that if DF=0 packets keep arriving,
> requiring native IPv4 fragmentation before encapsulation (that is,
> the DF=0 packets are longer than (S_MSS - 28)) then after 10 minutes
> or so, the ITE should explore sending larger packets into the tunnel.
>
> Except when lost packets prevent it, the SEAL ITE will instantly
> discover a reduced PMTU to a given ETE and so reduce that ETE's S_MSS
> value, due to:
>
>   IPv4:    Limiting router fragments the DF=0 tunnel packet and
>            the ETE reports to the ITE the length of the first
>            fragment, which is the new PMTU limit of this path.
>
>            If the original packet was DF=1, the ITE will generate
>            a suitable PTB to the SH.
>
>   IPv6:    A PTB arrives at the ITE, and the ITE sends a PTB to the
>            SH.
>
>
>
> Jumboframes
> -----------
>
> ("Jumbograms" refer to IPv6 packets with special formats so they can
> be longer than 2^16 bytes, and can be as long as 4 gigabytes.
> Neither SEAL, Ivip's IPTM, nor any CES proposal attempts to deal with
> these.)

That is not correct; SEAL can accommodate Jumbograms
using SEAL segmentation and reassembly if necessary.
I think this is made clear in the case of IPv4 as the
outer protocol, but I need to make sure to make it more
clear in the IPv6 case.

> As far as I understand how SEAL would work, SEAL will be able to
> smoothly adapt to the appearance of jumboframe ~9k byte paths between
>  ITEs and ETEs.
>
>
> Coping with blocked or missing PTBs
> -----------------------------------
>
> I understand that for IPv4, between the ITE and ETE, SEAL does not
> use DF=1 packets and so doesn't rely at all on PTBs.   Sometimes, I
> think, the DF=0 arrangement would produce faster results than by
> using PTBs as is done with IPv6. Other times it might be slower - it
> depends on the location and nature of the MTU limits along the tunnel.
>
> It does rely on the limiting router(s) producing a first fragment
> which is the same length as the limiting PMTU of the ITE -> ETE path
> - which seems reasonable.
>
> For IPv6, I understand that SEAL relies on PTBs being sent by routers
> in the ITE -> ETE tunnel path - and these being received by the ITE.
>
> My impression is that there are problems today with some tunnels or
> combinations of tunnels not generating PTBs.  Also, some networks
> apparently stop the reception of PTBs from the DFZ, so this would
> probably prevent SEAL's PMTUD from working.  As far as I know, the
> SEAL ITE doesn't have a way with its handling of ordinary IPv6
> traffic packets to request that the ETE acknowledge their successful
> reception.

Actually, I will be re-working the IPv6 case so that IPv6
encapsulation also uses the SEAL header which has the
explicit acknowledgement feature.

> For both IPv4 and IPv6, if the SH ignores PTBs, or if there is some
> filtering between the ITE and the SH which drops PTBs, then there's
> no way the SH is going to adapt its packets to the lengths required
> by SEAL to send them without fragmentation, segmentation or whatever.

It is true that SEAL does not attempt to fix MTU
problems that occur in the end site in front of
the ITE.

> As far as I know, SEAL does have "segmentation" capabilities - but
> these are not intended to be used within the IRON CES architecture.
> So SEAL's segmentation is no solution to this kind of failure of
> PMTUD.  There's nothing to be done about this - the fault is in the
> SH and/or the filtering, so these need to be fixed.
>
>
>
>
> Comparison with Ivip's IPTM
> ---------------------------
>
> Here is a rough comparison between my understanding of SEAL and my
> IPTM approach to PMTUD tunneling when Ivip uses encapsulation:
>
>     http://www.firstpr.com.au/ip/ivip/pmtud-frag/
>
>   IPTM involves the ITRs caching enough of the IPv4 traffic packet
>   to generate a valid PTB to the SH.  This is more expensive than
>   SEAL.  However, this is only done when sending a B and A pair of
>   packets, which is only when the length (after encapsulation) is
>   in the Zone of Uncertainty.  Once this Zone is reduced to zero,
>   there's no need to use the IPTM protocol, since traffic packets
>   are either encapsulated or rejected with a PTB.

What happens if a routing change occurs after the zone
of uncertainty has been reduced to zero?

>   Maybe IPTM doesn't require caching of part of the IPv6 traffic
>   packet, since the PTBs should contain enough of it to generate
>   a PTB.  However, IPTM can work even if no PTBs are received.
>   It is probably best for the ITR to cache the first 540 bytes or
>   so of those IPv6 traffic packets which are used in this
>   probing of PMTU.
>
>   IPTM uses the same protocol for both IPv6 and IPv4: a dual
>   packet arrangement in which, if both are received, the
>   traffic packet will be delivered and the ITR will learn of this
>   and so be able to raise the lower limit of the range of possible
>   PMTU values to this ETR, so reducing the Zone of Uncertainty.
>   Specifically, IPTM does not use DF=0 packets in the tunnel, like
>   SEAL does.
>
>   IPTM makes use of PTBs from within the tunnel, but if none
>   arrive at the ITR, then the ITR can still (usually) determine
>   whether the long B packet arrived or not at the ETR.  When
>   longer packets do not arrive and shorter ones do, in the
>   absence of PTBs, the ITR can try sending shorter packets until
>   a size is found which is reliably delivered.  (This is one
>   of the many things I need to work on when developing IPTM.)
>
>   If the ITR steps downwards by 20 bytes per attempt, it may
>   overshoot the actual PMTU limit somewhat, but it will find
>   a value which works, and is usually not too far below the
>   real PMTU limit - all without any reliance on PTBs within
>   the tunnel.  As far as I know SEAL needs PTBs from the tunnel
>   routers to work with IPv6 - but perhaps I don't understand
>   this part of SEAL correctly.

SEAL for IPv6 will be changed to use the SEAL header
instead of the IPv6 fragmentation header. Then, it
can use either data packets or dummy packets as
probes the same as for IPv4 - with explicit probe
responses from the ETE even if PTBs are being
dropped in the network. There might be something
in IPTM for SEAL to glean here.
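The step-down probing described in the quoted IPTM text, which needs no PTBs from the tunnel, can be sketched as follows. This is a hypothetical illustration: `delivered` stands in for whatever mechanism tells the sender that a probe of a given size reached the far end (for example an explicit probe response), and the `floor` value is an assumption, not something either proposal specifies.

```python
def probe_down(start_size, delivered, step=20, floor=576):
    """Step the probe size down until one is delivered.

    delivered(size) must return True if a packet of that size reached the
    far tunnel endpoint. Returns the first size that worked, or None if
    nothing at or above `floor` got through.
    """
    size = start_size
    while size >= floor:
        if delivered(size):
            return size
        size -= step
    return None
```

With a real PMTU of 1450 and probes starting at 1500, the sizes tried are 1500, 1480, 1460 and 1440, so the result undershoots the true limit by less than one step.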

>   So for both IPv4 and IPv6, IPTM should be able to estimate
>   the PMTU even if no PTBs are received.
>
>
>   IPTM will probably involve some limit on the maximum size of
>   IPv4 DF=0 packet which will be allowed.  I suppose the ITR could
>   fragment really large ones - if a host sent ~9k byte DF=0
>   packets - but I think that sort of packet should not be
>   tolerated or encouraged.
>
>   IPTM will adapt to larger PMTUs by trying larger packets
>   - assuming a SH sends them - after 10 minutes.
>
>   IPTM differs very significantly from SEAL in that all SEAL's
>   packets are capable of reporting a lower PMTU to the ITE.  This
>   is not the case with Ivip and IPTM.  IPTM is only used when
>   the ITR is unsure whether the traffic packet will run into an
>   MTU limitation.  This would typically lead to the reduction and
>   soon the elimination of the Zone of Uncertainty as successive
>   attempts using IPTM either succeed or fail.
>
>   Whenever an Ivip ITR tunnels a packet to an ETR and its
>   length, once encapsulated (just 20 or 40 bytes extra, for IP in
>   IP encapsulation) is no greater than the current minimum estimate
>   for the PMTU to this ETR, then the ITR encapsulates the packet
>   normally.  This means the outer header source address is that of
>   the SH.  So the ITR would not get any resulting PTB.  The SH would
>   get a PTB - but would not recognise it.
>
>   This means that the ITR may not notice a drop in PMTU until after
>   10 minutes, when it would retry sending longer packets and so
>   discover the new lower value.
>
>   This is something I need to work on.  One approach is to have
>   the ITR perform IPTM on these traffic packets every few minutes.
>
>   IPTM as currently described uses 32 bit nonces.  Perhaps I
>   will change this to something like Fred's arrangement of
>   a sequentially increasing value for each ETR, but starting
>   from a randomly initialised value.
>
>   IPTM's encapsulation overhead is 20 and 40 bytes for IPv4 and
>   IPv6 respectively.  SEAL's is 28 and 48 - which is less than
>   LISP's, since LISP has an 8 byte UDP header plus the 8 byte
>   LISP header: 36 and 56 bytes for IPv4 and IPv6 respectively.
>
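The overhead figures quoted above can be checked with a short calculation. The table names and the helper `max_inner_packet` are assumptions for illustration; the byte counts are the ones given in the text (IP-in-IP for IPTM, an 8 byte SEAL or IPv6 Fragment header for SEAL, and 8 byte UDP plus 8 byte LISP headers for LISP):

```python
IP_HEADER = {"IPv4": 20, "IPv6": 40}               # outer IP header, bytes
EXTRA = {"IPTM": 0, "SEAL": 8, "LISP": 8 + 8}      # bytes after the outer header


def overhead(scheme, family):
    """Total encapsulation overhead in bytes."""
    return IP_HEADER[family] + EXTRA[scheme]


def max_inner_packet(pmtu, scheme, family):
    """Largest inner packet that fits a tunnel path with the given PMTU."""
    return pmtu - overhead(scheme, family)
```

For a 1500 byte tunnel path, this leaves 1472 bytes for the inner IPv4 packet with SEAL and 1444 bytes for the inner IPv6 packet with LISP.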
>
> Comparison with LISP
> ---------------------
>
> LISP has a non-stateful and a stateful approach to handling PMTUD
> problems, neither of which is mandatory.
>
> The non-stateful approach might work if the constant was set well
> below 1500, such as 1400 or so (depending on the lowest PMTU from any
> ITR to any ETR) but it will lock the whole system into this size
> without any possibility of using higher values closer to 1550 bytes,
> or jumboframe ~9k byte paths in the DFZ as these become available.

IMHO, this is a significant limitation.

> I think the stateful approach, from the OPENLISP project in Belgium:
>
>   http://tools.ietf.org/html/draft-ietf-lisp-06#section-5.4.2
>
> needs more work.
>
> It doesn't mention how DF=0 packets would be handled, how the system
> could work if PTBs were not received from the limiting tunnel router,
> or how the ITR would explore larger values of PMTU after 10 minutes.

This too.

>


Thanks for your comments,

Fred
fred.l.templin@boeing.com