Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Ole Troan, Sat, 28 July 2018 21:09 UTC

From: Ole Troan <>
Date: Sat, 28 Jul 2018 23:08:59 +0200
Cc: Joe Touch <>, "" <>, "" <>
To: Tom Herbert <>
Subject: Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile


> On 28 Jul 2018, at 20:48, Tom Herbert <> wrote:
> On Sat, Jul 28, 2018 at 11:24 AM, Ole Troan <> wrote:
>>> Here’s the thing about fragmentation:
>>>      1. all links have a maximum packet size
>>>      2. all tunneling/encapsulation/layering increases payload size
>>> 1+2 implies there is always the need for fragmentation at some layer:
>> 1 implies that.
>> There is enough headroom designed into 1 to accommodate 2.
> Ole,
> I'm not sure I follow what you're saying here. The Ethernet MTU, the most
> common value, is 1500 bytes, and nothing in that value is reserved as
> headroom. If you're referring to the idea of artificially lowering MTUs
> to account for potential overhead introduced by encapsulation, that can be
> done. However, to avoid fragmentation _entirely_ one would need to
> determine the maximum possible overhead ever added by encapsulation(s)
> (plural in case of nested encapsulations). In a sprawling and dynamic
> network that has different sub-domains and simultaneously uses
> different encapsulation protocols, determining that specific magic
> number might be infeasible. There is also the problem that some 0.01%
> corner case of encapsulation might need hundreds of bytes of extra
> overhead. Lowering the MTU for everyone just to avoid fragmentation
> for that case is a poor tradeoff; it's better to fragment for that
> case.

I am talking about the headroom designed into IPv6: 1500 - 1280 = 220 bytes between the common 1500-byte Ethernet MTU and the 1280-byte minimum link MTU that IPv6 requires.

For restricted domains you can increase the headroom by increasing the link MTU.
And I am not saying there aren’t corner cases where fragmentation could be useful for tunnels.
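To make the headroom arithmetic concrete, here is a small Python sketch that checks whether a stack of nested encapsulations still lets a full 1280-byte IPv6 packet through a given link MTU. The per-layer overheads (base headers only: 40-byte IPv6, 4-byte base GRE, 8-byte UDP, 8-byte VXLAN plus a 14-byte inner Ethernet header) and the function names are illustrative assumptions, not taken from any particular implementation.

```python
from typing import Sequence

IPV6_MIN_MTU = 1280  # minimum link MTU required by IPv6

# Illustrative per-layer overheads in bytes (assumed values): base headers
# only, no options, no extension headers.
OVERHEAD = {
    "ipv6-in-ipv6": 40,                  # outer IPv6 header
    "gre-over-ipv6": 40 + 4,             # outer IPv6 + base GRE header
    "udp-over-ipv6": 40 + 8,             # outer IPv6 + UDP header
    "vxlan-over-ipv6": 40 + 8 + 8 + 14,  # outer IPv6 + UDP + VXLAN + inner Ethernet
}


def headroom(link_mtu: int) -> int:
    """Bytes available for encapsulation before a 1280-byte packet no longer fits."""
    return link_mtu - IPV6_MIN_MTU


def inner_mtu(link_mtu: int, layers: Sequence[str]) -> int:
    """MTU left for the innermost packet after every encapsulation layer."""
    return link_mtu - sum(OVERHEAD[layer] for layer in layers)


def fits_without_fragmentation(link_mtu: int, layers: Sequence[str]) -> bool:
    """True if a full 1280-byte IPv6 packet still fits after encapsulation."""
    return inner_mtu(link_mtu, layers) >= IPV6_MIN_MTU


if __name__ == "__main__":
    print(headroom(1500))                                          # 220
    print(inner_mtu(1500, ["vxlan-over-ipv6", "gre-over-ipv6"]))   # 1386
    print(fits_without_fragmentation(1500, ["ipv6-in-ipv6"] * 6))  # False
    print(fits_without_fragmentation(9000, ["ipv6-in-ipv6"] * 6))  # True
```

With a 1500-byte link, five IPv6-in-IPv6 layers (200 bytes) fit in the 220-byte headroom but six (240 bytes) do not; a larger link MTU in a restricted domain, e.g. 9000 bytes, restores the room, which is the point about restricted domains.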

>>>      3. fragmentation always splits info across packets
>>> And there’s something important about layering:
>>>      4. layering intends to isolate the behavior of one layer from another, such that
>>>      it will always be impossible for an upper layer to know exactly what is going on below,
>>>      i.e., to determine the limiting size across an entire path of possibly virtual tunnels
>>> The next two are where we get into trouble:
>>>      5. network devices increasingly WANT to inspect contents beyond the layer at which they are intended to operate
>> not that network devices have intent of their own, but yes, it seems network operators want to inspect content, or are forced into it by the necessity of IPv4 address sharing.
>>>      6. inspecting contents ultimately means reassembly, at some level
>> _some_ content inspection would require that, but I don't think you can make that the general rule.
>> e.g. a NAT or an L4 ACL only needs access to the L4 header.
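On the NAT / L4 ACL point: the transport header travels only in the first fragment, so a box that classifies on ports can read them directly from offset-0 fragments but not from the rest. A minimal Python sketch for IPv4, assuming TCP or UDP; `make_ipv4_fragment` is a hypothetical test helper (no IP options, header checksum left at zero), not a real library call.

```python
import struct

TCP, UDP = 6, 17


def make_ipv4_fragment(payload: bytes, ident: int, offset_units: int,
                       more_fragments: bool, proto: int = UDP) -> bytes:
    """Hypothetical helper: build an IPv4 packet carrying one fragment."""
    flags_frag = (0x2000 if more_fragments else 0) | offset_units  # MF bit + 13-bit offset
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + len(payload), ident, flags_frag,
                         64, proto, 0,  # TTL, protocol, checksum (not computed)
                         bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
    return header + payload


def l4_ports(packet: bytes):
    """Return (src_port, dst_port) for TCP/UDP over IPv4, or None when this
    packet does not carry the transport header."""
    ihl = (packet[0] & 0x0F) * 4                        # header length in bytes
    (flags_frag,) = struct.unpack_from("!H", packet, 6)
    frag_offset = flags_frag & 0x1FFF                   # in 8-byte units
    proto = packet[9]
    if proto not in (TCP, UDP) or frag_offset != 0:
        return None                                     # non-first fragment: no ports
    if len(packet) < ihl + 4:
        return None                                     # too short to hold both ports
    return struct.unpack_from("!HH", packet, ihl)


if __name__ == "__main__":
    data = bytes(24)
    udp_header = struct.pack("!HHHH", 5353, 53, 8 + len(data), 0)
    first = make_ipv4_fragment(udp_header + data[:8], 0x1234, 0, True)  # 16 bytes at offset 0
    second = make_ipv4_fragment(data[8:], 0x1234, 2, False)            # starts at byte 16
    print(l4_ports(first))   # (5353, 53)
    print(l4_ports(second))  # None
```

The non-first fragment yields no ports, which is why such boxes need either reassembly or per-datagram state to handle the remaining fragments.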
>>> Which brings us to the punchline:
>>>      7. but network device vendors want to save money, so they don’t want to reassemble at any layer
>> We'd all wish it were that simple. Can you substantiate that claim?
>> You could just as easily speculate that customers don't want to pay what it costs to do reassembly at terabit speeds...
>> Or accept that it's technically hard.
>> The NAT and IPv4 address sharing implementations I'm aware of do some flavour of network-layer reassembly.
>> However much money you throw at it, you can't reassemble fragments travelling on different paths, nor can you easily keep network-layer reassembly from being an attack vector on those boxes.
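One flavour of this, sketched here under stated assumptions and sometimes described as virtual reassembly, avoids rebuilding the datagram by keeping per-datagram state: the box records the ports from each first fragment under the key (source, destination, protocol, IP identification) and reuses them for later fragments. `FragmentTracker`, its parameters, and the eviction policy are hypothetical, not any vendor's design. The sketch also shows both limits just mentioned: a fragment whose first fragment took another path, or has not arrived yet, cannot be classified, and the state table is something an attacker can try to fill, hence the entry cap and timeout.

```python
from collections import OrderedDict


class FragmentTracker:
    """Hypothetical per-box state for classifying IPv4 fragments without full
    reassembly, keyed on (src, dst, proto, ident)."""

    def __init__(self, max_entries: int = 1024, timeout: float = 30.0):
        self.max_entries = max_entries  # cap on state so floods cannot grow it without bound
        self.timeout = timeout          # seconds an entry is kept
        self.flows = OrderedDict()      # key -> (ports, expiry), oldest first

    def _expire(self, now: float) -> None:
        # Every entry gets the same timeout and is moved to the end when
        # written, so the first entry is always the next one to expire.
        while self.flows:
            _, expiry = next(iter(self.flows.values()))
            if expiry > now:
                break
            self.flows.popitem(last=False)

    def classify(self, key, frag_offset: int, ports, now: float):
        """Call for fragments only. `ports` is the (sport, dport) parsed from a
        first fragment (offset 0), else None. Returns the ports to use for the
        NAT/ACL lookup, or None when they are unknown."""
        self._expire(now)
        if frag_offset == 0:
            if key not in self.flows and len(self.flows) >= self.max_entries:
                self.flows.popitem(last=False)  # evict the oldest entry
            self.flows[key] = (ports, now + self.timeout)
            self.flows.move_to_end(key)
            return ports
        entry = self.flows.get(key)
        return entry[0] if entry is not None else None
```

A caller would typically drop, or briefly buffer, fragments for which `classify` returns `None`; either choice is a policy decision the sketch leaves open.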
>>> So I agree, IP fragmentation has its flaws, but those flaws arise not only because it leaves the transport port numbers out of all but the first fragment, but also because DPI and NAT devices don’t reassemble. And they don’t because it’s cheaper to sell devices that claim to run at (e.g.) 1 Gbps but don’t bother to reassemble.
>> I don't agree with your conclusion.
>> NATs extend the network layer to include the L4 ports. NAT implementations of course do reassemble.
>>> I.e., it will never matter what layering we add to fix this (GRE, GUE, Aero, etc.); ultimately, we’re doomed to need fragmentation support down to IP exactly because:
>>>      a. #1-4 mean we need frag/reassembly at any tunnel ingress
>>>      b. vendors want to sell #5 at a price that is too low for them to support #6 (i.e., point #7)
>>> So pushing this to another layer will never solve it. The only thing that will solve it is a compliance requirement for #6, which could be imposed right now and has to be in place for ANY solution to work.
>> For IPv4 address sharing specifically, removing network-layer fragmentation would be a solution.
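Point (a) says a tunnel ingress needs fragmentation support. As a hedged sketch, assuming the ingress fragments the outer IPv6 packet and treats the whole encapsulated inner packet as fragmentable data, the fragment boundaries follow from IPv6's rules: each fragment carries a 40-byte IPv6 header plus an 8-byte Fragment header, and every fragment except the last must carry a multiple of 8 bytes because offsets are counted in 8-octet units. `fragment_plan` and its parameters are illustrative names.

```python
def fragment_plan(payload_len: int, mtu: int, header_len: int = 40,
                  frag_header_len: int = 8):
    """Split payload_len bytes into IPv6 fragments that fit within mtu.

    Returns a list of (offset_in_8_byte_units, length, more_fragments).
    Assumes the whole payload is fragmentable (no per-fragment extension
    headers) and that the caller only calls this when the packet is too big.
    """
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_len - frag_header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    plan = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len  # M flag: set on all but the last
        plan.append((offset // 8, length, more))
        offset += length
    return plan


if __name__ == "__main__":
    print(fragment_plan(3000, 1280))
    # [(0, 1232, True), (154, 1232, True), (308, 536, False)]
```

Note that the receiving end (the tunnel egress) must then reassemble, which is where the per-box state and attack-surface concerns discussed in this thread reappear.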
>>> NOTE: even rewriting EVERY application won’t fix this, nor will deploying a new layer at any level.
>> For some types of content inspection, that would require reassembling the whole application context.
>> But that's quite different from IPv4 address sharing, which we have unfortunately made an integral part of the Internet architecture.
>>> And yes, I do intend to add this to draft-ietf-tunnels, so it can be referred to elsewhere.
>> Ole