Re: [Softwires] [tsvwg] Is it feasible to perform fragmentation on UDP encapsulated packets.

Tom Herbert <> Mon, 13 June 2016 16:29 UTC

From: Tom Herbert <>
To: Lloyd Wood <>
Cc: "" <>, "" <>, "" <>, Softwires WG <>, "" <>

On Sat, Jun 11, 2016 at 10:11 PM, Lloyd Wood <> wrote:
> Fragmentation should be strongly discouraged.
> if you've designed a tunnelling solution and you have fragmentation
> happening as a matter of course, you've designed it wrong.

As mentioned before, we do not always have control of the underlay MTUs,
so we cannot always enable jumbo frames to raise the MTU above 1500 and
avoid tunnel fragmentation. At Facebook, for instance, we run IP tunnels
between L4 load balancers and backend servers. In certain POPs we
cannot get more than a 1500-byte MTU in the path. The ingress MTU into the
load balancer is also 1500, so it is possible to receive packets from the
Internet that are larger than the tunnel MTU. In this case we
fragment. Since the tunnel is behind a firewall, and our devices and
the path from load balancers to backends are low loss, there is little
concern about being DoSed.
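
To make the arithmetic concrete, here is a minimal sketch of what happens when a full-size packet from the Internet enters such a tunnel. The header sizes are assumptions for illustration only (outer IPv4 without options, UDP, and a hypothetical 4-byte encapsulation header); they are not taken from any particular deployment.

```python
OUTER_IPV4_HEADER = 20  # bytes, assumes no IPv4 options
UDP_HEADER = 8          # bytes
ENCAP_HEADER = 4        # bytes, hypothetical encapsulation header size


def tunnel_mtu(underlay_mtu, encap_header=ENCAP_HEADER):
    """Largest inner packet that fits in one underlay packet."""
    return underlay_mtu - OUTER_IPV4_HEADER - UDP_HEADER - encap_header


def outer_fragments(inner_len, underlay_mtu, encap_header=ENCAP_HEADER):
    """Sizes of the outer IPv4 packets after outer fragmentation."""
    # Everything after the outer IP header is fragmentable payload.
    payload = UDP_HEADER + encap_header + inner_len
    if payload <= underlay_mtu - OUTER_IPV4_HEADER:
        return [OUTER_IPV4_HEADER + payload]
    # Non-final fragments must carry a multiple of 8 payload bytes.
    max_chunk = (underlay_mtu - OUTER_IPV4_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, max_chunk)
        sizes.append(OUTER_IPV4_HEADER + chunk)
        payload -= chunk
    return sizes


print(tunnel_mtu(1500))             # tunnel MTU under these assumptions
print(outer_fragments(1500, 1500))  # a 1500-byte inner packet needs 2 fragments
```

Under these assumptions the tunnel MTU is 1468 bytes, so any inner packet between 1469 and 1500 bytes produces one full-size outer fragment plus one small trailing fragment.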


> Lloyd Wood
> On Friday, May 27, 2016, 8:10 PM, Xuxiaohu <> wrote:
> <Note that I have changed the subject of the email, so it has nothing to
> do with the WG adoption call now. It is just a discussion of a particular
> issue related to those WGs that are working on UDP tunnels. The
> old email is included as background that may
> be useful for better understanding this particular issue.>
> The possible side-effect of performing fragmentation on UDP encapsulated
> packets is to worsen the reassembly burden on tunnel egress since fragments
> of UDP encapsulated packets are more likely to be forwarded across different
> paths towards the tunnel egress than those of IP or GRE encapsulated
> packets.
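
One plausible mechanism behind this concern, sketched here as an assumption rather than something stated in the thread: a router that feeds UDP ports into its ECMP hash sees the ports only in the first fragment, so later fragments of the same datagram hash on fewer fields and may pick a different path. The router behaviour, the `ecmp_key` and `ecmp_path` helpers, and the CRC32 hash are all hypothetical.

```python
import zlib

UDP, GRE = 17, 47  # IP protocol numbers


def ecmp_key(src, dst, proto, ports=None, fragment_offset=0):
    """Fields a hypothetical router feeds into its ECMP hash."""
    # The UDP header, and so the ports, is present only in the first fragment.
    if proto == UDP and ports is not None and fragment_offset == 0:
        return (src, dst, proto) + tuple(ports)
    return (src, dst, proto)


def ecmp_path(key, n_paths):
    """Pick one of n_paths next hops from the hash key."""
    return zlib.crc32(repr(key).encode()) % n_paths


first = ecmp_key("192.0.2.1", "192.0.2.2", UDP, (51000, 6080), 0)
later = ecmp_key("192.0.2.1", "192.0.2.2", UDP, None, 185)
print(first != later)  # UDP fragments hash on different keys

gre_first = ecmp_key("192.0.2.1", "192.0.2.2", GRE, None, 0)
gre_later = ecmp_key("192.0.2.1", "192.0.2.2", GRE, None, 185)
print(gre_first == gre_later)  # GRE fragments share one key, hence one path
```

Whether the two UDP keys actually land on different paths depends on the hash and the number of paths; the point is only that nothing forces them onto the same one.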
> It seems that most X-over-UDP proposals choose to prohibit the tunnel
> ingress from performing fragmentation on UDP encapsulated packets. See the
> following quoted text regarding fragmentation from those X-over-UDP drafts:
> LISP:
>   When an ITR receives a packet from a site-facing interface and adds H
>   octets worth of encapsulation to yield a packet size greater than L
>   octets, it resolves the MTU issue by first splitting the original
>   packet into 2 equal-sized fragments.  A LISP header is then prepended
>   to each fragment.
> VXLAN:
>   VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
>   fragment encapsulated VXLAN packets due to the larger frame size.
>   The destination VTEP MAY silently discard such VXLAN fragments.
> VXLAN GPE:
>   VTEPs MUST never fragment an encapsulated VXLAN GPE packet, and when
>   the outer IP header is IPv4, VTEPs MUST set the DF bit in the outer
>   IPv4 header.
> Geneve:
>   To prevent fragmentation and maximize performance, the best practice
>   when using Geneve is to ensure that the MTU of the physical network
>   is greater than or equal to the MTU of the encapsulated network plus
>   tunnel headers.
> GUE:
>     If a packet is fragmented before encapsulation in GUE, all the
>     related fragments must be encapsulated using the same source port
>     (inner flow identifier). An operator may set MTU to account for
>     encapsulation overhead and reduce the likelihood of fragmentation.
> GRE-in-UDP:
>   Regarding packet fragmentation, an encapsulator/decapsulator SHOULD
>   be compliant with [RFC7588] and perform fragmentation before the
>   encapsulation.
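
The GUE requirement quoted above (all fragments of an inner packet use the same source port) can be sketched as follows. This is an illustration under assumptions, not the method of any draft: the `gue_source_port` helper, the CRC32 hash, and the 49152-65535 port range are all hypothetical choices.

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535  # assumed range for the outer source port


def gue_source_port(src, dst, proto, sport=None, dport=None, is_fragment=False):
    """Derive the outer UDP source port from the inner flow."""
    # Only the first inner fragment carries L4 ports, so every fragment
    # (including the first) is hashed on addresses and protocol alone.
    if is_fragment or sport is None:
        key = (src, dst, proto)
    else:
        key = (src, dst, proto, sport, dport)
    digest = zlib.crc32(repr(key).encode())
    return PORT_MIN + digest % (PORT_MAX - PORT_MIN + 1)


# First inner fragment (ports visible) and a later one (no ports).
first = gue_source_port("10.0.0.1", "10.0.0.2", 6, 1234, 80, is_fragment=True)
later = gue_source_port("10.0.0.1", "10.0.0.2", 6, is_fragment=True)
print(first == later)  # both fragments get the same outer source port
```

The cost of this choice is that fragmented flows between the same pair of hosts share one outer source port, and so lose per-flow load balancing in the underlay.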
> However, the above choice seems to conflict with the requirements as described
> in
> I wonder whether the IETF should reach a consensus on whether or not
> fragmentation of UDP encapsulated packets should be allowed.
> Best regards,
> Xiaohu
>> -----Original Message-----
>> From: Xuxiaohu
>> Sent: Thursday, May 26, 2016 4:35 PM
>> To: 'Joe Touch'; Fred Baker (fred); Wassim Haddad
>> Cc:
>> Subject: RE: [Int-area] Call for adoption of draft-xu-intarea-ip-in-udp-03
>> > -----Original Message-----
>> > From: Joe Touch []
>> > Sent: Thursday, May 26, 2016 2:11 AM
>> > To: Xuxiaohu; Fred Baker (fred); Wassim Haddad
>> > Cc:
>> > Subject: Re: [Int-area] Call for adoption of
>> > draft-xu-intarea-ip-in-udp-03
>> >
>> >
>> >
>> > On 5/24/2016 7:24 PM, Xuxiaohu wrote:
>> > > Hi Joe,
>> > >
>> > > I wonder whether you want to tell me the following truth by the
>> > > example that you gave: no matter whatever improvements we had done
>> > > with this draft, those persons who dislike it by the light of nature
>> > > would dislike it in the end.
>> >
>> > The only improvements that would make this doc useful would be to add
>> > capabilities already in GRE/UDP or GUE/UDP, which we already have.
>> Let's go over the four things you mentioned earlier in GRE/UDP and GUE:
>>     - stronger checksums
>> In GRE/UDP, in order to use UDP-zero-checksum, it gave the following
>> restrictions:
>> " 6. UDP Checksum Handling
>>    6.1. UDP Checksum with IPv4
>>    For UDP in IPv4, the UDP checksum MUST be processed as specified in
>>    [RFC768] and [RFC1122] for both transmit and receive. The IPv4
>> Yong, Crabber, Xu, Herbert                                    [Page 12]
>> --------------------------------------------------------------------------------
>> Internet-Draft          GRE-in-UDP Encapsulation            March 2016
>>    header includes a checksum which protects against mis-delivery of
>>    the packet due to corruption of IP addresses. The UDP checksum
>>    potentially provides protection against corruption of the UDP header,
>>    GRE header, and GRE payload. Disabling the use of checksums is a
>>    deployment consideration that should take into account the risk and
>>    effects of packet corruption.
>>    When a decapsulator receives a packet, the UDP checksum field MUST
>>    be processed. If the UDP checksum is non-zero, the decapsulator MUST
>>    verify the checksum before accepting the packet. By default a
>>    decapsulator SHOULD accept UDP packets with a zero checksum. A node
>>    MAY be configured to disallow zero checksums per [RFC1122]; this may
>>    be done selectively, for instance disallowing zero checksums from
>>    certain hosts that are known to be sending over paths subject to
>>    packet corruption. If verification of a non-zero checksum fails, a
>>    decapsulator lacks the capability to verify a non-zero checksum, or
>>    a packet with a zero-checksum was received and the decapsulator is
>>    configured to disallow, the packet MUST be dropped and an event MAY
>>    be logged.
>>    6.2. UDP Checksum with IPv6
>>    For UDP in IPv6, the UDP checksum MUST be processed as specified in
>>    [RFC768] and [RFC2460] for both transmit and receive.
>>    When UDP is used over IPv6, the UDP checksum is relied upon to
>>    protect both the IPv6 and UDP headers from corruption. As such, A
>>    default GRE-in-UDP Tunnel MUST perform UDP checksum; A TMCE GRE-in-
>>    UDP Tunnel MAY be configured with the UDP zero-checksum mode if the
>>    traffic-managed controlled environment or a set of closely
>>    cooperating traffic-managed controlled environments (such as by
>>    network operators who have agreed to work together in order to
>>    jointly provide specific services) meet at least one of following
>>    conditions:
>>    a. It is known (perhaps through knowledge of equipment types and
>>      lower layer checks) that packet corruption is exceptionally
>>      unlikely and where the operator is willing to take the risk of
>>      undetected packet corruption.
>>    b. It is judged through observational measurements (perhaps of
>>      historic or current traffic flows that use a non-zero checksum)
>>      that the level of packet corruption is tolerably low and where
>>      the operator is willing to take the risk of undetected packet
>>      corruption.
>>    c. Carrying applications that are tolerant of mis-delivered or
>>      corrupted packets (perhaps through higher layer checksum,
>>      validation, and retransmission or transmission redundancy) where
>>      the operator is willing to rely on the applications using the
>>      tunnel to survive any corrupt packets.
>>    The following requirements apply to a TMCE GRE-in-UDP tunnel that
>>    use UDP zero-checksum mode:
>>      a. Use of the UDP checksum with IPv6 MUST be the default
>>        configuration of all GRE-in-UDP tunnels.
>>      b. The GRE-in-UDP tunnel implementation MUST comply with all
>>        requirements specified in Section 4 of [RFC6936] and with
>>        requirement 1 specified in Section 5 of [RFC6936].
>>      c. The tunnel decapsulator SHOULD only allow the use of UDP zero-
>>        checksum mode for IPv6 on a single received UDP Destination
>>        Port regardless of the encapsulator. The motivation for this
>>        requirement is possible corruption of the UDP Destination Port,
>>        which may cause packet delivery to the wrong UDP port. If that
>>        other UDP port requires the UDP checksum, the mis-delivered
>>        packet will be discarded.
>>      d. It is RECOMMENDED that the UDP zero-checksum mode for IPv6 is
>>        only enabled for certain selected source addresses. The tunnel
>>        decapsulator MUST check that the source and destination IPv6
>>        addresses are valid for the GRE-in-UDP tunnel on which the
>>        packet was received if that tunnel uses UDP zero-checksum mode
>>        and discard any packet for which this check fails.
>>      e. The tunnel encapsulator SHOULD use different IPv6 addresses for
>>        each GRE-in-UDP tunnel that uses UDP zero-checksum mode
>>        regardless of the decapsulator in order to strengthen the
>>        decapsulator's check of the IPv6 source address (i.e., the same
>>        IPv6 source address SHOULD NOT be used with more than one IPv6
>>        destination address, independent of whether that destination
>>        address is a unicast or multicast address). When this is not
>>        possible, it is RECOMMENDED to use each source IPv6 address for
>>        as few UDP zero-checksum mode GRE-in-UDP tunnels as is feasible.
>>      f. When any middlebox exists on the path of a GRE-in-UDP tunnel,
>>        it is RECOMMENDED to use the default mode, i.e. use UDP
>>        checksum, to reduce the chance that the encapsulated packets to
>>        be dropped.
>>      g. Any middlebox that allows the UDP zero-checksum mode for IPv6
>>        MUST comply with requirement 1 and 8-10 in Section 5 of
>>        [RFC6936].
>>      h. Measures SHOULD be taken to prevent IPv6 traffic with zero UDP
>>        checksums from "escaping" to the general Internet; see Section
>>        8 for examples of such measures.
>>      i. IPv6 traffic with zero UDP checksums MUST be actively monitored
>>        for errors by the network operator. For example, the operator
>>        may monitor Ethernet layer packet error rates.
>>      j. If a packet with a non-zero checksum is received, the checksum
>>        MUST be verified before accepting the packet. This is
>>        regardless of whether the tunnel encapsulator and decapsulator
>>        have been configured with UDP zero-checksum mode.
>>    The above requirements do not change either the requirements
>>    specified in [RFC2460] as modified by [RFC6935] or the requirements
>>    specified in [RFC6936].
>>    The requirement to check the source IPv6 address in addition to the
>>    destination IPv6 address, plus the strong recommendation against
>>    reuse of source IPv6 addresses among GRE-in-UDP tunnels collectively
>>    provide some mitigation for the absence of UDP checksum coverage of
>>    the IPv6 header. A traffic-managed controlled environment that
>>    satisfies at least one of three conditions listed above in this
>>    section provides additional assurance.
>>    A GRE-in-UDP tunnel is suitable for transmission over lower layers
>>    in the traffic-managed controlled environments that are allowed by
>>    the exceptions stated above and the rate of corruption of the inner
>>    IP packet on such networks is not expected to increase by comparison
>>    to GRE traffic that is not encapsulated in UDP.  For these reasons,
>>    GRE-in-UDP does not provide an additional integrity check except
>>    when GRE checksum is used when UDP zero-checksum mode is used with
>>    IPv6, and this design is in accordance with requirements 2, 3 and 5
>>    specified in Section 5 of [RFC6936].
>>    Generic Router Encapsulation (GRE) does not accumulate incorrect
>>    state as a consequence of GRE header corruption. A corrupt GRE
>>    packet may result in either packet discard or forwarding of the
>>    packet without accumulation of GRE state. Active monitoring of GRE-
>>    in-UDP traffic for errors is REQUIRED as occurrence of errors will
>>    result in some accumulation of error information outside the
>>    protocol for operational and management purposes. This design is in
>>    accordance with requirement 4 specified in Section 5 of [RFC6936].
>>    The remaining requirements specified in Section 5 of [RFC6936] are
>>    not applicable to GRE-in-UDP.  Requirements 6 and 7 do not apply
>>    because GRE does not include a control feedback mechanism.
>>    Requirements 8-10 are middlebox requirements that do not apply to
>>    GRE-in-UDP tunnel endpoints (see Section 7.1 for further middlebox
>>    discussion).
>>    It is worth mentioning that the use of a zero UDP checksum should
>>    present the equivalent risk of undetected packet corruption when
>>    sending similar packet using GRE-in-IPv6 without UDP [RFC7676] and
>>    without GRE checksums.
>>    In summary, a TMCE GRE-in-UDP Tunnel is allowed to use UDP-zero-
>>    checksum mode for IPv6 when the conditions and requirements stated
>>    above are met. Otherwise the UDP checksum need to be used for IPv6
>>    as specified in [RFC768] and [RFC2460]. Use of GRE checksum is
>>    RECOMMENDED when the UDP checksum is not used.
>> "
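
The IPv4 receive rules in Section 6.1 quoted above can be sketched as a small decision function. This is a simplified illustration: `accept_datagram` and its `allow_zero` flag are hypothetical names, and `covered` is assumed to hold the pseudo-header, UDP header (including the checksum field), and payload already laid out as bytes.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (as used by UDP)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def accept_datagram(checksum_field: int, covered: bytes, allow_zero: bool = True) -> bool:
    """Decide whether a decapsulator accepts a received UDP datagram."""
    if checksum_field == 0:
        # Zero means "no checksum"; accepted by default, MAY be disallowed.
        return allow_zero
    # A non-zero checksum MUST verify: summing the covered bytes together
    # with the checksum field yields 0 after the final complement.
    return internet_checksum(covered) == 0


data = b"\x12\x34\x56\x78"
csum = internet_checksum(data)
good = data + csum.to_bytes(2, "big")
bad = b"\x12\x35\x56\x78" + csum.to_bytes(2, "big")
print(accept_datagram(csum, good), accept_datagram(csum, bad))
```

The IPv6 rules in Section 6.2 add further conditions (source address checks, a single destination port, monitoring) that this sketch does not attempt to model.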
>> In GUE, to support UDP-checksum-zero, it said
>> " Therefore, when GUE is used over
>>    IPv6, either the UDP checksum must be enabled or the GUE header
>>    checksum must be used.  An encapsulator MAY set a zero UDP checksum
>>    for performance or implementation reasons, in which case the GUE
>>    header checksum MUST be used or applicable requirements for using
>>    zero UDP checksums in [GREUDP] MUST be met. If the UDP checksum is
>>    enabled, then the GUE header checksum should not be used since it is
>>    mostly redundant."
>> It's easy for me to add similar words to the IP-in-UDP draft, such as "the
>> applicable requirements for using zero UDP checksum in [GREUDP] MUST be
>> met when zero UDP checksum is used by the tunnel ingress". However, the
>> major goal of disabling the UDP checksum is to improve performance.
>> When the GUE header checksum is used and/or the set of applicable
>> requirements described in GRE/UDP has to be verified, is the goal of
>> improving performance still achievable? If not, why not simply enable the
>> UDP checksum instead?
>>     - fragmentation support
>> In GRE/UDP, it said
>> " 4.1. MTU and Fragmentation
>>    Regarding packet fragmentation, an encapsulator/decapsulator SHOULD
>>    be compliant with [RFC7588] and perform fragmentation before the
>>    encapsulation. The size of fragments SHOULD be less or equal to the
>>    PMTU associated with the path between the GRE ingress and the GRE
>>    egress tunnel endpoints minus the GRE and UDP overhead ..."
>> in GUE, it said
>> " 4.9. MTU and fragmentation
>>    Standard conventions for handling of MTU (Maximum Transmission Unit)
>>    and fragmentation in conjunction with networking tunnels
>>    (encapsulation of layer 2 or layer 3 packets) should be followed.
>>    Details are described in MTU and Fragmentation Issues with In-the-
>>    Network Tunneling [RFC4459]... "
>> It seems that the only thing missing from the IP-in-UDP draft is to allow
>> outer fragmentation. However, as it said in
>> (, "...IPsec performs only Outer Fragmentation; this distinguishes it from
>> IP-in-IP, which performs only Inner Fragmentation." Note that IP-in-IP is
>> the dominant encapsulation choice within Softwires networks. In other words,
>> performing only inner fragmentation works very well in practice. Furthermore,
>> the outer fragmentation issue (e.g., reassembly cost for the egress) would
>> become even worse, since the fragments of X-in-UDP packets are more likely
>> to be forwarded across different paths than those of X-in-IP and X-in-GRE
>> packets. Hence, I'm wondering whether it's worthwhile to support outer
>> fragmentation on UDP encapsulated packets, which seems useless in practice.
>>     - signalling support (e.g., to test whether a tunnel is up or
>>     to measure MTUs)
>> I haven't found any description of this in either GRE/UDP or GUE. Did you?
>>     - support for robust ID fields (related to fragmentation,
>>     e.g., to overcome the limits of IPv4 ID as per RFC 6864)
>> I haven't found any description of this in either GRE/UDP or GUE. Did you?
>> Xiaohu
>> > It is not our obligation to find a way for your document to proceed -
>> > that onus is on you.
>> >
>> > Joe