Re: [rrg] IRON: SEAL summary V2

Robin,

> -----Original Message-----
> From: rrg-bounces@irtf.org [mailto:rrg-bounces@irtf.org] On Behalf Of Robin Whittle
> Sent: Monday, February 08, 2010 7:34 PM
> To: RRG
> Subject: Re: [rrg] IRON: SEAL summary V2
>
> Based on discussions with Fred (my recent messages), here is my
> revised attempt at describing SEAL, at least as far as it is used in
> the IRON-RANGER Core Edge Separation scalable routing proposal.
>
> Please see the initial attempt:
>
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05977.html
>
> for references to the source documents and older messages and for the
> comparisons with Ivip's IPTM and LISP, which Fred and I discussed in
> (msg05979) and (msg05981).
>
>  - Robin
>
>
> ITEs and ETEs
> -------------
>
> An Ingress Tunnel Endpoint (ITE) is part of the VET interface in an
> IRON router.  It uses SEAL to encapsulate and tunnel packets to a
> distant IRON router, with SEAL's approach to PMTUD which enables the
> IRON router which is the ITE to send Packet Too Big (PTB) messages to
> the Sending Host (SH).   The ETE (Egress Tunnel Endpoint) is
> implemented in the VET interface section of the distant IRON router.
>
> When SEAL and RANGER are updated as Fred plans, the ETE router will
> be able to send a SEAL message to the ITE router telling it to
> redirect packets to a given prefix, to another IRON router.  This
> will involve a caching time and the SEAL_ID from the SEAL header of
> the tunnel packet which gave rise to the redirection.  This is an
> important part of IRON - which I will attempt to describe in a
> separate message.
>
>
> SEAL for IPv4 and IPv6
> ----------------------
>
> SEAL is capable of "segmenting" (fragmenting, but within the SEAL
> protocol, rather than by using IPv4 or IPv6 fragmentation mechanisms)
> packets which are known to be too long for the PMTU to a given ETE.
> However, this is not intended to be used with IRON.
>
> SEAL is intended to tunnel packets to an ETE, without any need to
> establish tunneling arrangements, without expecting acknowledgements
> of successful receipt by the ETE and without resending any packets.
> The tunnel is a one-way (ITE to ETE), ad-hoc, arrangement - but the
> ITE stores some state for each particular ETE it is tunneling to.
>
> SEAL ITEs do not cache any part of the packets they send.  So, in
> order to generate a PTB to the sending host (which may itself be
> another router if this SEAL tunnel is already within an outer tunnel,
> which I think is not the case with IRON), the ITE relies on getting
> enough of the original packet back in one of two messages: for IPv6,
> a PTB from a limiting router on the path to the ETE; for IPv4, a
> "Fragmentation Experienced" message from the ETE itself.
>
> I assume that for IRON, there will be no need for "mid-level headers"
> such as UDP between the outer header and the SEAL header - but see
> msg05927 and msg05976 and search for "ECMP".  (msg05980 and msg05981
> indicate there is some confusion about UDP headers and where they
> would go.)
>
>
> The text about the Tunnel Interface MTU (4.3.1) is rather confusing.
> One part discusses setting this to 1500 bytes or more.  Later
> in that paragraph there is discussion of setting it to smaller values
> than this. The next paragraph discusses setting it to an infinite
> value.  I hope Fred will provide some guidance on how this would be
> done for IRON.
>
> Each ITE maintains some state for each ETE it tunnels packets to (by
> each ETE IP address).  I guess there would be a process for deleting
> this after a while if no packets are sent to that ETE, since over
> time, the state could grow to a considerable size, depending on how
> many ETEs there are in the world.
>
>     SEAL-ID most recently used.  A 32 bit value which is initialised
>     to some random value, and then incremented modulo 2^32 every time
>     a packet is tunneled to this particular ETE.
>
>     Further state as required to implement a window function or
>     some other arrangement by which this ITE can test a SEAL-ID
>     in an incoming message including at least these:
>
>        PTB from a router in the tunnel path (IPv6 only).
>        SEAL Packet Too Big message from ETE (Not needed in IRON?).
>        SEAL "Fragmentation Experienced" (IPv4 only).
>
>     Please see previous messages for Fred's approach to doing this
>     and my critique and timer-based suggestion.
>
> Other items of state are listed in 4.3.3:
>
>     MHLEN   Constant mid-level header length.  In this attempt to
>             describe SEAL, I will assume there is no need for these
>             "mid-level" (?) headers - between the outer IP header and
>             the SEAL header which precedes the encapsulated packet.
>             So I will assume this = 0.
>
>     HLEN    Constant outer header length: 20 for IPv4 and 40 for
>             IPv6 plus the length of the SEAL header - 8 bytes I guess
>             but Fred hasn't yet defined it.  It needs 4 bytes just
>             for the SEAL_ID.
>
>                IPv4 28           IPv6 48
>
>     S_MSS   Variable.  SEAL Maximum Segment Size.  I think this is
>             initialised to the value of (the ID says "no larger
>             than") the MTU of the "underlying IP interface".  I guess
>             this means that the ITE has a single interface for
>             sending out encapsulated packets on.  (To be pernickety,
>             it could be pointed out that the ITE might be a multiple
>             interface router, with different MTUs on each, and that
>             the best path to a given ETE might change from one
>             interface to another.)
>
>             According to the ID, this value may be adjusted upwards
>             and downwards based on received SEAL Reassembly Report
>             messages.
>
>             However, I think these are not of interest in IRON - and
>             that it is other messages which alter this value which
>             we need to consider:
>
>               IPv4: SEAL "Fragmentation Experienced" from the ETE.
>
>               IPv6: PTB from a router in the tunnel path.
>
>     S_MRU   Variable.  SEAL Maximum Reassembly Unit.  Initialised to
>             "infinity", but the effective value of S_MRU is never
>             more than 256 * S_MSS.  (Since S_MSS can rise and fall,
>             this means there are really two items of state: one the
>             limit and the other the effective value, which may be
>             increased or decreased according to S_MSS * 256, as long
>             as it remains less than or equal to the limit value.)
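
The per-ETE state listed above could be sketched roughly as follows,
in Python.  This is only an illustration of the summary, not code from
the SEAL draft: the class name, field names and the use of a random
32 bit starting value from `secrets` are assumptions, and the window
state for checking incoming SEAL-IDs is left out.

```python
from dataclasses import dataclass, field
import secrets


@dataclass
class EteState:
    """Hypothetical state an ITE keeps for one ETE (names are illustrative)."""
    s_mss: int                                  # SEAL Maximum Segment Size, bytes
    s_mru_limit: float = float("inf")           # limit value, initially "infinity"
    seal_id: int = field(default_factory=lambda: secrets.randbits(32))

    def next_seal_id(self) -> int:
        """Increment SEAL-ID modulo 2^32 for each packet tunneled to this ETE."""
        self.seal_id = (self.seal_id + 1) % 2**32
        return self.seal_id

    @property
    def s_mru(self):
        """Effective S_MRU: never more than 256 * S_MSS, nor more than the limit."""
        return min(self.s_mru_limit, 256 * self.s_mss)
```

With S_MSS = 1500 and the limit still at "infinity", the effective
S_MRU is 256 * 1500 = 384000 bytes; it follows S_MSS up and down
until the limit value is reached.
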
>
>
> SEAL for IPv4
> -------------
>
> DF=0 packets will be fragmented with standard IPv4 techniques before
> other processing, if they are longer than:
>
>    (MIN(S_MRU, S_MSS) - 28)
>
> The fragments will be of this length and will be tunneled as
> described below.
>
> DF=1 packets which are longer than:
>
>    (S_MRU - 28)
>
> will result in a PTB being sent to the Sending Host (SH).  Apart from
> being used to generate that PTB, the packet will be dropped.
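
The two IPv4 tests above can be written as one small decision
function.  This is a sketch under the assumptions of this summary
only: the 28 byte overhead assumes an 8 byte SEAL header (not yet
defined), and the function name and return strings are made up for
illustration.  The later section of this message on rejecting larger
packets, which applies the S_MSS test to DF=1 packets once S_MSS is
known, is not modeled here.

```python
IPV4_OVERHEAD = 28  # 20 byte outer IPv4 header + assumed 8 byte SEAL header


def classify_ipv4(packet_len, df, s_mss, s_mru):
    """Decide what the ITE does with an IPv4 packet before encapsulation."""
    if df:
        # DF=1: too long for S_MRU means PTB to the sending host, then drop.
        if packet_len > s_mru - IPV4_OVERHEAD:
            return "send_ptb_and_drop"
        return "encapsulate"
    # DF=0: fragment with standard IPv4 fragmentation first if too long.
    if packet_len > min(s_mru, s_mss) - IPV4_OVERHEAD:
        return "fragment_then_encapsulate"
    return "encapsulate"
```

For example, with S_MSS = 1500 a DF=0 packet of 1500 bytes is
fragmented before tunneling, while one of 1472 bytes is encapsulated
as it is.
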
>
> Encapsulation involves:
>
>   IPv4 outer header   With the 16-bit Identification field set to
>                       the 16 most significant bits of SEAL-ID.
>                       DF = 0, so the packet is fragmentable by
>                       any router which finds the packet is too long
>                       for its next-hop MTU.
>
>   SEAL header (32 bits) as described below.
>
>   The original traffic packet.
>
> The IPv4 SEAL header has a 16 bit field "ID-Extension"  which is set
> to the least significant 16 bits of SEAL-ID.  Section 4.2 discusses
> this.  I haven't figured out exactly how the other bits would be set.
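
As a small illustration of how the 32 bit SEAL-ID is spread over the
two 16 bit fields described above, and how a receiver of an echoed
packet could put it back together: the function names are
hypothetical and the other SEAL header bits are ignored.

```python
def split_seal_id(seal_id):
    """Return (IPv4 Identification, SEAL ID-Extension) for a 32 bit SEAL-ID."""
    identification = (seal_id >> 16) & 0xFFFF   # 16 most significant bits
    id_extension = seal_id & 0xFFFF             # 16 least significant bits
    return identification, id_extension


def join_seal_id(identification, id_extension):
    """Rebuild the 32 bit SEAL-ID from the two 16 bit header fields."""
    return ((identification & 0xFFFF) << 16) | (id_extension & 0xFFFF)
```

So a SEAL-ID of 0x12345678 would travel as Identification 0x1234 and
ID-Extension 0x5678.
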
>
> If the packet arrives intact, then the ETE decapsulates it and
> forwards the traffic packet to wherever it needs to go.  In IRON,
> most of the time, there is a single tunnel to the IRON router which
> directly connects to the end-user network.  However initial packets
> go to an IRON router (the VP router) which is typically not the
> router which connects to the end-user network - so then the packet
> would be decapsulated and tunneled in the same manner to the IRON
> router which connects to the destination network.
>
> If the packet is too long for one router in the ITE -> ETE path, then
> that router will fragment the whole packet and the ETE will (usually)
> receive the first and other fragments.  The first fragment should be
> as long as the limit imposed at that router by the next-hop MTU.  If
> there were two or more MTU limits, such as 1400 and then 1300, the
> fragment generated at the first limiting router would be 1400 bytes
> long, and this would be fragmented at the second router, so the first
> fragment would be 1300 bytes.
>
> In this way, the ETE discovers the PMTU of the ITE -> ETE tunnel path.
>
> The ETE does not attempt to deliver the packet, but sends a
> "Fragmentation Experienced" message to the ITE.  See 4.3.9.1.2 and
> 4.4.5.1.2.  This message is defined in Figure 6 and contains as much
> of the first fragment as would make the total message 576 bytes in
> length.  There are 20 bytes for the IPv4 header and 16 for this
> particular SEAL header, so up to 540 bytes of the first fragment will
> be sent back to the ITE.  There is no acknowledgement of reception by
> the ITE.
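
The size arithmetic in that paragraph, as a short sketch.  The
constants are the figures quoted above (576 byte message, 20 byte
IPv4 header, 16 byte SEAL header for this message type); the function
name is an assumption.

```python
MAX_MESSAGE_LEN = 576   # total length of the "Fragmentation Experienced" message
IPV4_HEADER_LEN = 20    # IPv4 header of the message itself
FE_SEAL_HEADER_LEN = 16 # SEAL header used for this particular message


def echoed_fragment_bytes(first_fragment_len):
    """How many bytes of the first fragment the ETE returns to the ITE."""
    room = MAX_MESSAGE_LEN - IPV4_HEADER_LEN - FE_SEAL_HEADER_LEN
    return min(first_fragment_len, room)
```

For a 1300 byte first fragment this gives 540 bytes, which is far
more than the header plus 8 bytes a router's PTB would carry.
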
>
> The ITE can authenticate this message by looking at the 16 bit ID in
> the IPv4 header of the enclosed portion of the first fragment - and
> the 16 bits of "ID-Extension" in the SEAL header in that fragment.
>
> The S_MSS field of this message is set to the length of the first
> fragment, which is (or should be, and is assumed to be) the PMTU of
> the ITE -> ETE path.
>
> The S_MRU field may be set to zero.  As far as I know S_MRU is how
> big a packet the ETE is prepared to reconstruct from packets which
> are fragmented with IPv4's native fragmentation - but it may also
> apply to the use of SEAL's internal segmentation system, which I
> understand is not generally used for IRON.
>
> There is a separate SEAL PTB message (4.4.5.1.3) which is not
> relevant to IRON - since it concerns SEAL's internal "segmentation"
> system.
>
>
> I understand that the reason Fred uses DF=0 packets for IPv4 tunnels
> is that IPv4 routers, when sending back a PTB, are required (by RFC
> 1191) to only return the IPv4 header and the next 8 bytes.  This is
> enough for the ITE to authenticate the PTB as genuine, since these 8
> bytes are the SEAL header. With the IPv4 header, the ITE can
> therefore see the full 32 bit SEAL-ID.  However, there is nothing of
> the original traffic packet in the PTB, so the only way the ITE could
> generate a PTB to the SH would be to have cached the first 28 bytes
> of the original traffic packet.  To avoid the need for caching, and
> so as not to rely on PTBs from routers in the ITE -> ETE tunnel, SEAL
> ITEs tunnel IPv4 packets using DF=0 packets.
>
> The SEAL header used in this IPv4 process contains a flag which, when
> set, signals that the ETE should report the successful reception of
> the packet to the ITE.  I am not sure to what extent this would be
> used for IRON.
>
>
> SEAL for IPv6
> -------------
>
> IPv6 has no DF bit, so every packet is handled like an IPv4 DF=1
> packet.  Packets which are longer than:
>
>    (S_MRU - 48)
>
> will result in a PTB being sent to the Sending Host (SH).  Apart from
> being used to generate that PTB, the packet will be dropped.
>
> When encapsulating IPv6 packets, there will be a new SEAL header
> which Fred is yet to define.
>
> I guess it will be 8 bytes.  It will contain, at least, a 4 byte
> SEAL_ID field and a flag to ask the ETE to tell the ITE the packet
> was received correctly.
>
> If the packet does not encounter any MTU limits and arrives at the
> ETE, then the ETE decapsulates it and does whatever it needs to do
> with it - deliver it directly to the end-user network or perhaps (if
> the ETE is the VP IRON router) tunnel it again to an IRON router
> which will deliver it to the end-user network.
>
> If the packet hits an MTU limit, then the limiting router will send
> back a PTB to the ITE, and since this is IPv6, the PTB will contain
> enough of the packet for the ITE to construct a valid PTB for the SH.
>
> If the ITE sets the "Acknowledgement Requested" bit then the ETE will
> tell the ITE that the packet has been received.  The ITE can therefore try
> different length packets to determine the PMTU to this ETE, even if
> it receives no PTBs from whichever router is causing the limitation.
>
>
> Rejecting larger packets directly
> ---------------------------------
>
> Once the ITE has established a value for S_MSS for a given ETE, then
> any packet it needs to tunnel to that ETE will be rejected with a PTB
> to the SH if, once it was encapsulated, it would be too long for the
> S_MSS of this ETE.   So if a DF=1 IPv4 packet arrives with a length
> longer than (S_MSS - 28), it will be rejected with a PTB.  Likewise
> an IPv6 packet longer than (S_MSS - 48).
>
> As far as I know, all IPv4 DF=0 packets longer than (S_MSS - 28) will
> be fragmented into IPv4 fragments of this length (with the final one
> potentially - typically - being shorter) and then the fragments will
> be tunnelled.  At the ETE, each fragment will be forwarded to the
> destination network and then host (or tunnelled to another IRON
> router which will forward them to the destination network and then
> host - and the destination host will reassemble the fragments using
> the standard IPv4 system.)
>
> The ITE can't reject any DF=0 packets.  It can however learn the PMTU
> to the ETE by the ETE reporting the size of any fragments which
> result from the tunneled versions of the IPv4 native fragments.  That
> will enable the ITE to use IPv4 native fragmentation on subsequent
> DF=0 packets to this ETE in order that they will fit within the newly
> discovered PMTU limitation, which is stored in this ETE's S_MSS variable.
>
> (See discussion in msg05979 and msg05981.)
>
>
>
> Adapting to changed PMTUs
> -------------------------
>
> I am not sure if or where it is specified, but I understand that SEAL
> will allow exploration of increased PMTUs.  According to RFC 1191 and
> therefore RFC 1981, the SH can try sending a longer packet than it
> has been told to send by a previous PTB, after 10 minutes have elapsed.

This is discussed in SEAL, Section 4.3.5.

> I assume that if the ITE is only handling DF=0 packets, or at least
> that if there are no DF=1 packets which are longer than the current
> limitation (S_MSS - 28) and that if DF=0 packets keep arriving,
> requiring native IPv4 fragmentation before encapsulation (that is,
> the DF=0 packets are longer than (S_MSS - 28)) then after 10 minutes
> or so, the ITE should explore sending larger packets into the tunnel.
>
> Except when lost packets prevent it, the SEAL ITE will instantly
> discover a reduced PMTU to a given ETE and so reduce that ETE's S_MSS
> value, due to:
>
>   IPv4:    Limiting router fragments the DF=0 tunnel packet and
>            the ETE reports to the ITE the length of the first
>            fragment, which is the new PMTU limit of this path.
>
>            If the original packet was DF=1, the ITE will generate
>            a suitable PTB to the SH.
>
>   IPv6:    A PTB arrives at the ITE, and the ITE sends a PTB to the
>            SH.
>
>
>
> Jumbograms and jumboframes
> --------------------------
>
> "Jumbograms" refer to IPv6 packets with special formats so they can
> be longer than 2^16 bytes, and can be as long as 4 gigabytes.
> Neither SEAL, Ivip's IPTM, nor any CES proposal attempts to deal with
> these.
>
> See discussion in msg05979 and msg05981 - Fred writes that SEAL can
> accommodate jumbograms - but I can't see how.

The draft talks about jumbograms and cites RFC 2675, but it
could probably benefit from a sentence or two explaining
how the Jumbo Payload Option is processed.

> As far as I understand how SEAL would work, SEAL will be able to
> smoothly adapt to the appearance of jumboframe ~9k byte paths between
> ITEs and ETEs.

Yes.

> Coping with blocked or missing PTBs
> -----------------------------------
>
> For IPv4, between the ITE and ETE, SEAL does not use DF=1 packets and
> so doesn't rely at all on PTBs.   Sometimes, I think, this DF=0
> arrangement would produce faster results than by using PTBs as is
> done with IPv6. Other times it might be slower - it depends on the
> location and nature of the MTU limits along the tunnel.
>
> It does rely on the limiting router(s) producing a first fragment
> which is the same length as the limiting PMTU of the ITE -> ETE path
> - which seems reasonable.
>
> For IPv6, SEAL relies on PTBs being sent by routers in the ITE -> ETE
> tunnel path - and these being received by the ITE.
>
> My impression is that there are problems today with some tunnels or
> combinations of tunnels not generating PTBs.  Also, some networks
> apparently stop the reception of PTBs from the DFZ, so this would
> probably prevent SEAL's PMTUD from working.  When Fred revises SEAL,
> the ITE will be able to tunnel IPv6 traffic packets and request that
> the ETE acknowledge their successful reception.
>
>
> For both IPv4 and IPv6, if the SH ignores PTBs, or if there is some
> filtering between the ITE and the SH which drops PTBs, then there's
> no way the SH is going to adapt its packets to the lengths required
> by SEAL to send them without fragmentation, segmentation or whatever.
>
> No CES architecture tries to cope with these problems - they must be
> fixed directly.
>
> As far as I know, SEAL does have "segmentation" capabilities - but
> these are not intended to be used within the IRON CES architecture.
> So SEAL's segmentation is no solution to this kind of failure of
> PMTUD.  There's nothing to be done about this - the fault is in the
> SH and/or the filtering, so these need to be fixed.

This discussion is not specific to SEAL nor even to
tunneling in general. If the SH is not accepting PTBs
from the gateway that connects the source's site to
the Internet, then there is a problem regardless of
the way in which the gateway connects.

Thanks - Fred
fred.l.templin@boeing.com
