Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Ole Troan <> Sat, 28 July 2018 18:24 UTC

Cc: Mikael Abrahamsson <>, "" <>, "" <>
To: Joe Touch <>

> Here’s the thing about fragmentation:
> 	1. all links have a maximum packet size
> 	2. all tunneling/encapsulation/layering increases payload size
> 1+2 implies there is always the need for fragmentation at some layer:

Point 1 alone implies that.
There is enough headroom designed into 1 to accommodate 2.
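
One possible reading of the headroom remark, sketched as arithmetic. This is an assumption rather than something stated in this message: it takes the IPv6 minimum link MTU of 1280 bytes against a common 1500-byte Ethernet MTU, and uses plain IPv6-in-IPv6 (40 bytes per layer) as the illustrative encapsulation.

```python
# Assumed figures for illustration; the message itself names no numbers.
IPV6_MIN_MTU = 1280   # minimum link MTU every IPv6 link must support
ETHERNET_MTU = 1500   # common Ethernet payload size
IPV6_HEADER = 40      # fixed IPv6 header, i.e. the cost of one IPv6-in-IPv6 layer

# Bytes left for encapsulation if the inner packet is held to the minimum MTU.
headroom = ETHERNET_MTU - IPV6_MIN_MTU

# How many nested IPv6-in-IPv6 layers fit without fragmenting.
nested_tunnels = headroom // IPV6_HEADER

print(headroom, nested_tunnels)
```

Under those assumptions, an inner packet sized to the minimum MTU survives several layers of encapsulation on a 1500-byte link before any fragmentation is needed.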

> 	3. fragmentation always splits info across packets
> And there’s something important about layering:
> 	4. layering intends to isolate the behavior of one layer from another, such that
> 	it will always be impossible for an upper layer to know exactly what is going on below,
> 	i.e., to determine that limiting size across an entire path of possibly virtual tunnels
> The next two are where we get into trouble:
> 	5. network devices increasingly WANT to inspect contents beyond the layer at which they are intended to operate

Not that network devices have intent of their own, but yes, it seems network operators want to inspect content, or are forced into it by the necessity of IPv4 address sharing.

> 	6. inspecting contents ultimately means reassembly, at some level

_some_ content inspection would require that, but I don't think you can make that the general rule.
For example, a NAT or an L4 ACL only needs access to the L4 header.
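
A minimal sketch of why fragmentation still matters to such a device, even though it only wants the L4 header: the ports are present in the first fragment only. The helper names (`l4_ports`, `ipv4_header`), the addresses, and the port numbers are hypothetical and exist only for this illustration; the header checksum is left at zero.

```python
import struct

def l4_ports(packet: bytes):
    """Return (src_port, dst_port) for a TCP/UDP IPv4 packet, or None when
    the packet is a non-first fragment and so carries no L4 header."""
    ihl = (packet[0] & 0x0F) * 4                  # IPv4 header length in bytes
    flags_frag, = struct.unpack_from("!H", packet, 6)
    frag_offset = flags_frag & 0x1FFF             # offset in 8-byte units
    proto = packet[9]
    if proto not in (6, 17):                      # TCP or UDP only
        return None
    if frag_offset != 0:                          # ports live in the first fragment only
        return None
    return struct.unpack_from("!HH", packet, ihl)

def ipv4_header(proto, ident, frag_offset, more_fragments, payload_len):
    """Build a 20-byte IPv4 header (checksum not computed; illustration only)."""
    flags_frag = (0x2000 if more_fragments else 0) | frag_offset
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, ident,
                       flags_frag, 64, proto, 0,
                       bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]))

# A 32-byte UDP datagram split into two 16-byte fragments.
udp = struct.pack("!HHHH", 5353, 53, 32, 0) + b"x" * 24
first = ipv4_header(17, 1, 0, True, 16) + udp[:16]
second = ipv4_header(17, 1, 2, False, 16) + udp[16:]

print(l4_ports(first))    # ports are readable
print(l4_ports(second))   # no L4 header here
```

The second fragment can only be matched to a NAT binding or ACL entry by remembering state from the first one, which is a lighter form of reassembly than rebuilding the whole datagram.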

> Which brings us to the punchline:
> 	7. but network device vendors want to save money, so they don’t want to reassemble at any layer

We all wish it were that simple. Can you substantiate that claim?
You could just as easily speculate that customers don't want to pay what it costs to do reassembly at terabit speeds...
Or accept that it's technically hard.

The implementations I'm aware of, e.g. NATs and other IPv4 address sharing mechanisms, do some flavour of network-layer reassembly.
However much money you throw at it, you can't reassemble fragments travelling on different paths, nor can you trivially make network layer reassembly not be an attack vector on those boxes.
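
A rough sketch of both problems, assuming a toy reassembly table with a fixed number of entries and no timers. `ReassemblyTable` and its eviction policy are hypothetical, not taken from any real implementation, and offsets are in bytes for simplicity.

```python
from collections import OrderedDict

class ReassemblyTable:
    """Toy network-layer reassembly state, keyed by (src, dst, proto, id)."""

    def __init__(self, max_flows):
        self.max_flows = max_flows
        self.pending = OrderedDict()   # key -> {byte offset: (payload, more_fragments)}

    def add(self, key, offset, payload, more_fragments):
        """Store one fragment; return the full payload once complete, else None."""
        if key not in self.pending:
            if len(self.pending) >= self.max_flows:
                self.pending.popitem(last=False)   # evict the oldest incomplete datagram
            self.pending[key] = {}
        frags = self.pending[key]
        frags[offset] = (payload, more_fragments)
        # Walk contiguously from offset 0 until the last fragment is reached.
        data, pos = b"", 0
        while pos in frags:
            chunk, more = frags[pos]
            data += chunk
            if not more:
                del self.pending[key]
                return data
            pos += len(chunk)
        return None

victim = ("192.0.2.1", "198.51.100.1", 17, 7)

# Both fragments pass through the same box: reassembly succeeds.
quiet = ReassemblyTable(max_flows=2)
quiet.add(victim, 0, b"A" * 8, True)
complete = quiet.add(victim, 8, b"B" * 8, False)

# An attacker fills the table with first fragments that never complete.
busy = ReassemblyTable(max_flows=2)
busy.add(victim, 0, b"A" * 8, True)
for ident in range(2):
    busy.add(("203.0.113.9", "198.51.100.1", 17, ident), 0, b"X" * 8, True)
starved = busy.add(victim, 8, b"B" * 8, False)   # first fragment was evicted

print(complete, starved)
```

A box that only ever sees the second fragment, because the first took another path, ends up in the same position as the starved case: state that can never complete, whatever the table size.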

> So I agree, IP fragmentation has its flaws - but those flaws are created not only because it leaves out the transport port numbers, but also because DPI and NAT devices don’t reassemble. And they don’t because it’s cheaper to sell devices that say they run at 1 Gbps (e.g.) that don’t bother to reassemble.

I don't agree with your conclusion.
NATs extend the network layer to include the L4 ports. NAT implementations of course do reassemble.

> I.e., it will never matter what layering we add to fix this - GRE, GUE, Aero, etc. - ultimately, we’re doomed to need fragmentation support down to IP exactly because:
> 	a. #1-4 mean we need frag/reassembly at any tunnel ingress
> 	b. vendors want to sell #5 at a price that is too low for them to support #6 (i.e., point #7)

> So pushing this to another layer will never solve it. What will solve it will only be a compliance requirement for #6 - which could be done right now, and has to be done for ANY solution to work.

For IPv4 address sharing specifically, removing network-layer fragmentation would be a solution.

> NOTE: even rewriting EVERY application won’t fix this, nor will deploying a new layer at any level.

Some types of content inspection would indeed require reassembling the whole application context.
But that's quite different from IPv4 address sharing, which we have unfortunately made an integral part of the Internet architecture.

> And yes, I do intend to add this to draft-ietf-tunnels, so it can be referred to elsewhere.