Re: [rrg] IRON: SEAL summary

"Templin, Fred L" <Fred.L.Templin@boeing.com> Tue, 09 February 2010 22:19 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>
Date: Tue, 09 Feb 2010 14:20:39 -0800
Thread-Topic: IRON: SEAL summary
Thread-Index: AcqpMUQiQtGalUhsQ0ixQOFO4S3O3wAoULOw
Message-ID: <E1829B60731D1740BB7A0626B4FAF0A649510383B7@XCH-NW-01V.nw.nos.boeing.com>
References: <4B6FDB2F.5090203@firstpr.com.au> <E1829B60731D1740BB7A0626B4FAF0A64951037F38@XCH-NW-01V.nw.nos.boeing.com> <4B70CB22.1020104@firstpr.com.au>
In-Reply-To: <4B70CB22.1020104@firstpr.com.au>
Cc: RRG <rrg@irtf.org>
Subject: Re: [rrg] IRON: SEAL summary
Precedence: list

Hi Robin,

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Monday, February 08, 2010 6:41 PM
> To: Templin, Fred L
> Cc: RRG
> Subject: Re: IRON: SEAL summary
>
> Short version:   Fred's CES proposal is now called IRON-RANGER.
>
>                  But in protecting DF=0 fragmentable IPv4 packets
>                  and suggesting that we live with, and laboriously
>                  adapt to, the growing practices of PTB filtering and
>                  tunnels which don't support RFC 1191 and RFC 1981
>                  Path MTU Discovery, is IRON-RANGER upholding
>                  righteousness or .....
>
>
> Hi Fred,
>
> I plan to write an "IRON: SEAL summary V2" based on what I learnt from
> your two recent on-list messages and one off-list message.  Here is
> my response to your second on-list message.
>
> You wrote:
>
>
> >> Fred's "RANGER" scalable routing proposal is now known as IRON:
> >>
> >>    The Internet Routing Overlay Network (IRON)
> >>    http://tools.ietf.org/html/draft-templin-iron-00
> >
> > No - RANGER is still RANGER and always will be. IRON tells
> > how the routing system used by RANGER works, with a first
> > emphasis of the Internet core domain of application. But,
> > the same IRON principles apply to any RANGER level of
> > recursion. So, it is "IRON-plus-RANGER", and not
> > "IRON-obsoletes-RANGER". IRON-RANGER might be a good
> > way to call it.
>
> OK . . .
>
> I thought "LISP" and "NERD" were weedy names and so I was quite happy
> with "Ivip" - energetically reminiscent of a 1950s abrasive sink
> cleaner.
>
> APT was brief and apt.
>
> RANGER and (Navy, I assume) SEAL were always lurking in the distance.
>
> Now it seems we have a full-tilt Acronym Arms Race with the formidable:
>
>     >>>> IRON-RANGER <<<<
>
> on the scene.  Other proposals would be well advised to mind their
> manners . . .
>
> The moral questions are not so clear-cut.  Is IRON-RANGER really
> defending against the Forces of Evil?
>
> Are DF=0 packets really the endangered species and unfortunate
> refugees of past eras, which we should dutifully slice up, cart
> around in multiple fragments, and painstakingly reassemble as if
> nothing had happened?
>
> Should stack and application programmers be required, by the value
> judgements inherent in IRON-RANGER's PMTUD Modernization Program, to
> update everything to use complex and messy RFC 4821 PMTUD?
>
> Is IRON-RANGER's SEAL Highway Planning Department doing the right
> thing by routing our communication pathways further and further
> around a growing swamp caused by lousy tunnel implementations and
> ill-advised filtering of PTBs?
>
> Or is plucky little Ivip pursuing a more efficient and enlightened
> future by recognising the incompatibility between this swamp and a
> healthy future, by proposing it is best and easiest to Just Say No to
> such swampificacious practices and by casting DF=0 packets to the
> outer, along the lines of (msg05816)?
>
>    DF=0 is an indulgence still granted to lazy bourgeois hosts
>    in their dotage, enabling them to continue their regressive
>    habits of unfairly burdening the proletariat of Worker
>    routers with the responsibility for slicing up and
>    individually carrying and reassembling their overweight,
>    carelessly emitted, packets.  Come the Revolution . . . .
>
> or:
>
>    Citizens and businesses shouldn't have to pay extra for their
>    Internet services just because a few applications and big
>    corporations (Google) are burdening the Internet's routing
>    system with over-long packets which require special handling.
>
>
>
> >> Other items of state are listed in 4.3.3:
> >>
> >>     MHLEN   Constant mid-level header length.  In this attempt to
> >>             describe SEAL, I will assume there is no need for these
> >>             mid-level headers - between the outer IP header and the
> >>             SEAL (or IPv6 fragmentation) header which precedes the
> >>             encapsulated packet.  So I will assume this = 0.
> >
> > Actually, the mid-level headers occur between the *inner*
> > IP header and the SEAL header.
>
> Something may need to be rewritten, since Figures 1 and 2 show
> "mid-layer headers" after the SEAL header.

I think the figures are OK. The top of the figure is
the outermost header, and descending downwards through
the figure shows successively more inner layers of
encapsulation. Is it somehow confusing?

> >>     HLEN    Constant outer header length: 20 for IPv4 and 40 for
> >>             IPv6 plus the length of the SEAL header (IPv4 only -
> >>             8 bytes) or IPv6 Fragment Header (8 bytes).  So these
> >>             are constants:
> >>
> >>                IPv4 28           IPv6 48
> >
> > I am going to work on this. I think what I will do is
> > eliminate the need for the IPv6 header to include a
> > fragment header and instead handle IPv6 segmentation
> > and reassembly using SEAL the same as is currently
> > documented for IPv4.
>
> OK - but as far as I know, in an IRON-RANGER scenario, the ITEs are
> not intended to be continually sending a traffic packet as two or
> more tunnel packets, whether by IPv4 fragmentation, IPv6
> fragmentation or inbuilt SEAL segmentation - with the probable
> exception of DF=0 IPv4 packets.

In the core, we will not require core routers to
reassemble and those routers will use SEAL-FS.
Toward the edges, there may be places that would
benefit from using segmentation and reassembly,
e.g., to hide the encapsulation artifact for
tunnels. Those routers would use SEAL-SR.

> >>     S_MSS   Variable.  SEAL Maximum Segment Size.  I think this is
> >>             initialised to the value of (the ID says "no larger
> >>             than") the MTU of the "underlying IP interface".  I guess
> >>             this means that the ITE has a single interface for
> >>             sending out encapsulated packets on.  (To be pernickety,
> >>             it could be pointed out that the ITE might be a multiple
> >>             interface router, with different MTUs on each, and that
> >>             the best path to a given ETE might change from one
> >>             interface to another.)
> >>
> >>             According to the ID, this value may be adjusted upwards
> >>             and downwards based on received SEAL Reassembly Report
> >>             messages.
> >>
> >>             However, I think these are not of interest in IRON - and
> >>             that it is other messages which alter this value which
> >>             we need to consider:
> >>
> >>               IPv4: SEAL "Fragmentation Experienced" from the ETE.
> >>
> >>               IPv6: PTB from a router in the tunnel path.
> >
> > Right. SEAL-FS (and not SEAL-SR) is used for the use
> > case of IRON in the core.
>
> I think that the IRON ID should attempt a summary of which parts of
> SEAL are used for this CES system.  Likewise, it should contain a
> reasonably self-contained description of what parts of RANGER and VET
> are used.

OK.

> >> The S_MRU field may be set to zero.  As far as I know S_MRU is how
> >> big a packet the ETE is prepared to reconstruct from packets which
> >> are fragmented with IPv4's native fragmentation - but it may also
> >> apply to the use of SEAL's internal segmentation system, which I
> >> understand is not generally used for IRON.
> >
> > I will clarify the cases in which S_MRU may be
> > set to zero. I think now that they apply only to
> > SEAL-FS and not to SEAL-SR.
>
> OK.
>
>
> >> There is a separate SEAL PTB message (4.4.5.1.3) which as far as I
> >> know, is not relevant to IRON - since I think it concerns SEAL's
> >> internal "segmentation" system.
> >
> > Yes, that is correct for the case of IRON used in
> > the core. For IRON applied to other levels of the
> > RANGER recursion, SEAL-SR may be used instead of
> > SEAL-FS.
>
> OK.
>
>
> >> When encapsulating IPv6 packets, the SEAL ITE does not use a SEAL
> >> header.  It uses the IPv6 Fragment Header:
> >>
> >>   http://tools.ietf.org/html/rfc2460#section-4.5
> >>
> >>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>    |  Next Header  |   Reserved    |      Fragment Offset    |Res|M|
> >>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>    |                         Identification                        |
> >>   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> The single packet does not contain a "fragment" - this is just using
> >> the header for another purpose.  This header contains a 32 bit
> >> "Identification" field, which is set to the SEAL-ID value.
> >
> > I am going to change this so that IPv6 does not use the
> > IPv6 fragment header but instead uses the SEAL header
> > (plus SEAL segmentation and reassembly when needed).
>
> OK.
>
> >> There is no facility, as there was with IPv4, for the ITE to request
> >> the ETE to confirm that the packet arrived.
> >
> > I'll fix this.
>
> OK - the SEAL header for IPv6 will have a flag to tell the ETE to
> confirm reception.
>
>
> >> Rejecting larger packets directly
> >> ---------------------------------
> >>
> >> Once the ITE has established a value for S_MSS for a given ETE, then
> >> any packet it needs to tunnel to that ETE will be rejected with a PTB
> >> to the SH if, once it was encapsulated, it would be too long for the
> >> S_MSS of this ETE.   So if a DF=1 IPv4 packet arrives with a length
> >> longer than (S_MSS - 28), it will be rejected with a PTB.  Likewise
> >> an IPv6 packet longer than (S_MSS - 48).
> >>
> >> As far as I know, all IPv4 DF=0 packets longer than (S_MSS - 28) will
> >> be fragmented into IPv4 fragments of this length (with the final one
> >> potentially - typically - being shorter) and then the fragments will
> >> be tunnelled.  At the ETE, each fragment will be forwarded to the
> >> destination network and then host (or tunnelled to another IRON
> >> router which will forward them to the destination network and then
> >> host - and the destination host will reassemble the fragments using
> >> the standard IPv4 system.)
> >>
> >> The ITE can't reject any DF=0 packets.  It can however learn the PMTU
> >> to the ETE by the ETE reporting the size of any fragments which
> >> result from the tunneled versions of the IPv4 native fragments.  That
> >> will enable the ITE to use IPv4 native fragmentation on subsequent
> >> DF=0 packets to this ETE in order that they will fit within the newly
> >> discovered PMTU limitation, which is stored in this ETE's S_MSS variable.
> >
> > This is entering a very sensitive area - when the ITE
> > gets a DF=0 packet, it needs to know when to use inner
> > fragmentation and what is the size that can be used for
> > inner fragmentation? If the ETE will not be doing
> > reassembly of any kind, the ITE has no choice but
> > to perform inner fragmentation using a safe initial
> > fragment size (e.g., 1280). But, it could always send
> > a "dummy packet" to probe the true inner packet size
> > so that future DF=0 packets might be able to avoid the
> > inner fragmentation. The draft doesn't currently say
> > this, but there is nothing to stop an implementation
> > from doing it.
>
> We may have to resort to this sort of dual packet tunneling for some
> DF=0 packets, but I think the sooner DF=0 packets are deprecated, the
> better.
>
>
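The ITE-side rule discussed above can be sketched roughly as follows. This is only a minimal illustration, assuming the HLEN constants quoted in this thread (28 for IPv4, 48 for IPv6); the function name `ite_action` and its return values are hypothetical and are not taken from the SEAL draft.

```python
# Hypothetical sketch of the ITE's per-packet decision for one ETE.
# HLEN values are the constants quoted in this thread, not normative.
HLEN_IPV4 = 28  # 20-byte outer IPv4 header + 8-byte SEAL header
HLEN_IPV6 = 48  # 40-byte outer IPv6 header + 8-byte header


def ite_action(length, s_mss, ipv6=False, df=True):
    """Return (action, limit) for an inner packet of `length` bytes.

    `limit` is the largest inner packet that fits in S_MSS after
    encapsulation; it is None when the packet fits as-is.
    """
    hlen = HLEN_IPV6 if ipv6 else HLEN_IPV4
    limit = s_mss - hlen
    if length <= limit:
        return ("encapsulate", None)
    if ipv6 or df:
        # IPv6 and DF=1 IPv4 packets cannot be fragmented in the
        # network, so the sending host is told the usable size.
        return ("reject_ptb", limit)
    # DF=0 IPv4: the ITE cannot reject, so it fragments the inner
    # packet. Real fragments also need 8-byte-aligned offsets, which
    # this sketch does not model.
    return ("fragment", limit)
```

With S_MSS at 1500, a 1473-byte DF=1 IPv4 packet would be rejected with a PTB advertising 1472, while the same packet with DF=0 would be fragmented.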
>
> >> Jumboframes
> >> -----------
> >>
> >> ("Jumbograms" refer to IPv6 packets with special formats so they can
> >> be longer than 2^16 bytes, and can be as long as 4 gigabytes.
> >> Neither SEAL, Ivip's IPTM, nor any CES proposal attempts to deal with
> >> these.)
> >
> > That is not correct; SEAL can accommodate Jumbograms
> > using SEAL segmentation and reassembly if necessary.
> > I think this is made clear in the case of IPv4 as the
> > outer protocol, but I need to make sure to make it more
> > clear in the IPv6 case.
>
> I think you could make SEAL do almost anything, but why would you
> want any router to accept 64k and longer packets, up to 4 gigabytes
> long, and fragment or segment them to be sent and later reassembled?
>
>   http://tools.ietf.org/html/rfc2675

High-speed data centers that have 9KB MTU links and
very low packet loss due to congestion might want to
present a large MTU to TCP (e.g., 64KB) and then carry
the segments as multiple 9KB packets using SEAL. This
also raises the question of how effective the
link-level CRC is at detecting corrupted data as a
function of the packet size. (Earlier works seemed to
show that CRC-32 performance deteriorates for packet
sizes larger than ~9KB.)

So, high speed data centers might want to use packet
sizes that are larger than the underlying links can
support natively.
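
As a rough worked example of the arithmetic involved, the sketch below counts how many tunnel packets one large inner packet would need. It assumes "9KB" means a 9000-byte link MTU, "64KB" means 65536 bytes, and a fixed 28-byte per-segment overhead (the IPv4 HLEN quoted earlier in this thread). The function name is hypothetical, and real SEAL segmentation has details that are not modeled here.

```python
def segment_sizes(payload_len, link_mtu, hlen=28):
    """Split payload_len bytes into per-packet payload sizes.

    Each tunnel packet is assumed to carry at most link_mtu - hlen
    bytes of the inner packet (hlen is an assumed fixed overhead).
    """
    per_segment = link_mtu - hlen
    if payload_len <= 0 or per_segment <= 0:
        raise ValueError("payload_len and link_mtu - hlen must be positive")
    full, rest = divmod(payload_len, per_segment)
    sizes = [per_segment] * full
    if rest:
        sizes.append(rest)  # final, shorter segment
    return sizes


sizes = segment_sizes(65536, 9000)
print(len(sizes), sizes[-1])  # 8 segments, the last carrying 2732 bytes
```

Under these assumptions a 64KB segment handed down by TCP would travel as eight packets on 9KB links.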

> >> For both IPv4 and IPv6, if the SH ignores PTBs, or if there is some
> >> filtering between the ITE and the SH which drops PTBs, then there's
> >> no way the SH is going to adapt its packets to the lengths required
> >> by SEAL to send them without fragmentation, segmentation or whatever.
> >
> > It is true that SEAL does not attempt to fix MTU
> > problems that occur in the end site in front of
> > the ITE.
>
> Nor any other CES.  Only RFC 4821 in the TCP stack or the application
> packetization layers can cope with this.  I think it should be fixed,
> rather than coped with.

This can easily morph into a discussion of the
end-to-end principle and how it would view
MTU determination. I still think that the end
systems would be better served by taking MTU
assurance into their own hands rather than
relying on an untrustworthy network.

> >> Comparison with Ivip's IPTM
> >> ---------------------------
> >>
> >> Here is a rough comparison between my understanding of SEAL and my
> >> IPTM approach to PMTUD tunneling when Ivip uses encapsulation:
> >>
> >>     http://www.firstpr.com.au/ip/ivip/pmtud-frag/
> >>
> >>   IPTM involves the ITRs caching enough of the IPv4 traffic packet
> >>   to generate a valid PTB to the SH.  This is more expensive than
> >>   SEAL.  However, this is only done when sending a B and A pair of
> >>   packets, which is only when the length (after encapsulation) is
> >>   in the Zone of Uncertainty.  Once this Zone is reduced to zero,
> >>   there's no need to use the IPTM protocol, since traffic packets
> >>   are either encapsulated or rejected with a PTB.
> >
> > What happens if a routing change occurs after the zone
> > of uncertainty has been reduced to zero?
>
> If the routing change lowers the PMTU, then there would be lost
> packets.
>
> As I wrote further down:
>
> >>   This means that the ITR may not notice a drop in PMTU until
> >>   after 10 minutes, when it would retry sending longer packets and
> >>   so discover the new lower value.
> >>
> >>   This is something I need to work on.  One approach is to have
> >>   the ITR perform IPTM on these traffic packets every few minutes.
>
> Maybe the ITR needs to do this at more frequent intervals.  If such
> changes were frequently encountered, it could also deliberately tell
> the SH to send somewhat shorter packets than could be handled at
> present, so that if a routing change dropped the PMTU by 30 bytes or
> so, the resulting packets would still get through the new lower limit.
>
> If the ITR function was in the sending host, it could easily detect
> the PTBs which should arise from the new MTU limit and use them to
> send a PTB to the SH's IP stack to lower the packet length accordingly.
>
>
> >>   Maybe IPTM doesn't require caching of part of the IPv6 traffic
> >>   packet, since the PTBs should contain enough of it to generate
> >>   a PTB.  However, IPTM can work even if no PTBs are received.
> >>   It is probably best for the ITR to cache the first 540 bytes or
> >>   so of those IPv6 traffic packets which are used in this
> >>   probing of PMTU.
> >>
> >>   IPTM uses the same protocol for both IPv6 and IPv4: a dual
> >>   packet arrangement in which, if both are received, the
> >>   traffic packet will be delivered and the ITR will learn of this
> >>   and so be able to raise the lower limit of the range of possible
> >>   PMTU values to this ETR, so reducing the Zone of Uncertainty.
> >>   Specifically, IPTM does not use DF=0 packets in the tunnel, like
> >>   SEAL does.
> >>
> >>   IPTM makes use of PTBs from within the tunnel, but if none
> >>   arrive at the ITR, then the ITR can still (usually) determine
> >>   whether the long B packet arrived or not at the ETR.  When
> >>   longer packets do not arrive and shorter ones do, in the
> >>   absence of PTBs, the ITR can try sending shorter packets until
> >>   a size is found which is reliably delivered.  (This is one
> >>   of the many things I need to work on when developing IPTM.)
> >>
> >>   If the ITR steps downwards by 20 bytes per attempt, it may
> >>   overshoot the actual PMTU limit somewhat, but it will find
> >>   a value which works, and is usually not too far below the
> >>   real PMTU limit - all without any reliance on PTBs within
> >>   the tunnel.  As far as I know SEAL needs PTBs from the tunnel
> >>   routers to work with IPv6 - but perhaps I don't understand
> >>   this part of SEAL correctly.
> >
> > SEAL for IPv6 will be changed to use the SEAL header
> > instead of the IPv6 fragmentation header. Then, it
> > can use either data packets or dummy packets as
> > probes the same as for IPv4 - with explicit probe
> > responses from the ETE even if PTBs are being
> > dropped in the network. There might be something
> > in IPTM for SEAL to glean here.
>
> IPTM already uses a probing strategy which uses PTBs if they arrive
> and otherwise relies on the ETR explicitly acknowledging the receipt
> or non-receipt of the B packet, which is the same length as would
> result from normal IP-in-IP encapsulation of the original traffic packet.
>
>
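The step-down search described above, which works without any PTBs from the tunnel, might look something like the sketch below. The `probe` callback is a hypothetical stand-in for "send a probe of this length and wait for the far end to acknowledge it"; the 20-byte step comes from the text above, and the 68-byte floor is simply the minimum IPv4 MTU, chosen here as an assumed stopping point.

```python
def probe_down(start_len, probe, step=20, floor=68):
    """Step down from start_len until probe(length) reports delivery.

    probe is a caller-supplied function returning True when a packet
    of that length was acknowledged by the far end. Returns the first
    working length, or None if nothing at or above floor gets through.
    """
    length = start_len
    while length >= floor:
        if probe(length):
            return length
        length -= step
    return None


# Simulated path whose real limit is 1452 bytes and which drops
# longer packets silently (no PTBs).
found = probe_down(1500, lambda n: n <= 1452)
print(found)  # 1440: a working size, 12 bytes below the real limit
```

The simulated run shows the trade-off noted above: the result is a size that works, but it can land somewhat below the true PMTU, by up to one step less a byte.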
>
> >> Comparison with LISP
> >> ---------------------
> >>
> >> LISP has a non-stateful and a stateful approach to handling PMTUD
> >> problems, neither of which is mandatory.
> >>
> >> The non-stateful approach might work if the constant was set well
> >> below 1500, such as 1400 or so (depending on the lowest PMTU from any
> >> ITR to any ETR) but it will lock the whole system into this size
> >> without any possibility of using higher values closer to 1550 bytes,
> >> or jumboframe ~9k byte paths in the DFZ as these become available.
> >
> > IMHO, this is a significant limitation.
> >
> >> I think the stateful approach, from the OPENLISP project in Belgium:
> >>
> >>   http://tools.ietf.org/html/draft-ietf-lisp-06#section-5.4.2
> >>
> >> needs more work.
> >>
> >> It doesn't mention how DF=0 packets would be handled, how the system
> >> could work if PTBs were not received from the limiting tunnel router,
> >> or how the ITR would explore larger values of PMTU after 10 minutes.
> >
> > This too.
>
> I agree.
>
>
> > Thanks for your comments,
>
> Thanks for discussing your proposal and PMTUD in detail.

Fred
fred.l.templin@boeing.com

>   - Robin
