Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Joe Touch <> Sun, 29 July 2018 15:39 UTC

From: Joe Touch <>
Date: Sun, 29 Jul 2018 08:38:56 -0700
Cc: Mikael Abrahamsson <>, "" <>, "" <>
To: Ole Troan <>
List-Id: IETF Internet Area Mailing List <>

> On Jul 28, 2018, at 11:24 AM, Ole Troan <> wrote:
>> Here’s the thing about fragmentation:
>> 	1. all links have a maximum packet size
>> 	2. all tunneling/encapsulation/layering increases payload size
>> 1+2 implies there is always the need for fragmentation at some layer:
> 1 implies that.
> There is enough head room designed in 1 to accommodate 2.

As was noted, there’s never enough headroom because you can’t control (or even know) how many layers of tunnels your traffic experiences.
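
To make the headroom arithmetic concrete, here is a minimal sketch. The 1500-byte link MTU, the choice of GRE-over-IPv4 as the tunnel type, and the names `inner_mtu` and `needs_fragmentation` are assumptions for illustration only; the header sizes are the minimum ones (no options, no optional GRE fields).

```python
LINK_MTU = 1500  # assumed Ethernet-style link MTU, in bytes

# Minimum per-header encapsulation overhead, in bytes.
OVERHEAD = {
    "ipv4": 20,  # IPv4 header without options
    "ipv6": 40,  # fixed IPv6 header
    "gre": 4,    # base GRE header, no optional fields
    "udp": 8,    # UDP header
}

def inner_mtu(link_mtu, layers):
    """Largest inner packet that still fits in one link packet
    after all the listed encapsulation headers are added."""
    return link_mtu - sum(OVERHEAD[name] for name in layers)

def needs_fragmentation(packet_size, link_mtu, layers):
    """True if the packet no longer fits once the layers are added."""
    return packet_size > inner_mtu(link_mtu, layers)

one_tunnel = ["ipv4", "gre"]      # a single GRE-over-IPv4 tunnel
three_tunnels = one_tunnel * 3    # the same tunnel type nested three deep

print(inner_mtu(LINK_MTU, one_tunnel))                     # 1476
print(inner_mtu(LINK_MTU, three_tunnels))                  # 1428
print(needs_fragmentation(1450, LINK_MTU, one_tunnel))     # False
print(needs_fragmentation(1450, LINK_MTU, three_tunnels))  # True
```

A sender that left 50 bytes of headroom for one tunnel is fine in the first case and over the limit in the second, and it has no way of knowing which case its traffic will hit.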

>> 	3. fragmentation always splits info across packets
>> And there’s something important about layering:
>> 	4. layering intends to isolate the behavior of one layer from another, such that
>> 	it will always be impossible for an upper layer to know exactly what is going on below,
>> 	i.e., to determine that limiting size across an entire path of possibly virtual tunnels
>> The next two are where we get into trouble:
>> 	5. network devices increasingly WANT to inspect contents beyond the layer at which they are intended to operate
> not that network devices have an intent in themselves, but yes, it seems like network operators want to inspect content or are forced into it because of the necessity of IPv4 address sharing.

They were “forced” into differentiating commercial from home customer pricing. NATs didn’t need to exist when they were first deployed.

That said, there’s no real problem with a NAT *IF* it acts as a host on the Internet
(see Touch, J: Middlebox Models Compatible with the Internet <>. USC/ISI (ISI-TR-711), 2016.)

>> 	6. inspecting contents ultimately means reassembly, at some level
> _some_ content inspection would require that, but I don't think you can make that the general rule.
> e.g. a NAT or an L4 ACL only needs access to the L4 header.

It’s a general rule; you merely refer to an instance that is relevant ONLY when the L3 header at which a router operates is followed (or expected to be followed) by an L4 header.

That may have been the Internet of 10 years ago, but it is less and less the Internet moving forward.
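
As a rough illustration of why even the "only needs the L4 header" case runs into reassembly, here is a simplified sketch. It is a toy model and not a real IP stack: the fragments are plain tuples rather than packets with IP headers, and the names `udp_datagram`, `fragment`, and `visible_ports` are hypothetical.

```python
import struct

def udp_datagram(src_port, dst_port, data):
    """Build a UDP header plus payload (checksum left as 0, i.e. unused over IPv4)."""
    length = 8 + len(data)
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + data

def fragment(l3_payload, max_payload):
    """Split an IP payload into (offset, more_fragments, chunk) tuples.
    Non-final fragments carry a multiple of 8 bytes, as IP fragment
    offsets are expressed in 8-byte units."""
    step = max_payload - (max_payload % 8)
    frags = []
    for off in range(0, len(l3_payload), step):
        chunk = l3_payload[off:off + step]
        more = off + step < len(l3_payload)
        frags.append((off, more, chunk))
    return frags

def visible_ports(frag):
    """What a device looking at one fragment in isolation can learn about ports.
    Only the fragment at offset 0 contains the UDP header."""
    off, _more, chunk = frag
    if off != 0 or len(chunk) < 4:
        return None
    return struct.unpack("!HH", chunk[:4])

dgram = udp_datagram(5000, 53, b"x" * 3000)
frags = fragment(dgram, 1480)
for f in frags:
    print(f[0], f[1], visible_ports(f))
```

Only the first fragment yields a port pair; for the others the device has to either keep per-datagram state (a form of reassembly) or give up on classifying them. The same thing happens one level up whenever the header of interest sits inside a tunnel.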

>> Which brings us to the punchline:
>> 	7. but network device vendors want to save money, so they don’t want to reassemble at any layer
> We'd all wish it to be that simple. Can you substantiate that claim?
> You can easily make the speculation that customers don't want to pay what it costs to be able to do reassembly at terabit speeds...
> Or accept that it's technically hard.

I’m not claiming it isn’t hard; I’m claiming that it’s always cheaper to *not* do something. 

My concern isn’t that reassembly isn’t being done; it’s that vendors sell devices that don’t make this clear - AND that they don’t pass on packets they don’t/can’t reassemble.

> The implementations of e.g. NATs, IPv4 address sharing implementations I'm aware of do flavours of network layer reassembly.

If they did so, we would not be here talking about considering IP fragmentation fragile.

> However much money you throw at it, you can't reassemble fragments travelling on different paths, nor can you trivially make network layer reassembly not be an attack vector on those boxes.

Agreed, but here’s the other point:

	Any device that inspects L4 content can do so ONLY as a proxy for the destination endpoint.

I.e., I know vendors WANT to sell devices they say can be deployed anywhere in the network, and operators believe that, but it’s wrong.

Basically, if you’re not at a place in the network where you represent that endpoint, you have no business acting as that endpoint - “full stop”.

>> So I agree, IP fragmentation has its flaws - but those flaws are created not only because it leaves out the transport port numbers, but also because DPI and NAT devices don’t reassemble. And they don’t because it’s cheaper to sell devices that say they run at 1 Gbps (e.g.) that don’t bother to reassemble.
> I don't agree with your conclusion.
> NATs extend the network layer to include the L4 ports. NAT implementations of course do reassemble.

See Touch, J: Middlebox Models Compatible with the Internet <>. USC/ISI (ISI-TR-711), 2016. 

NATs do not extend the network layer to include L4; they act as endpoints in the public net and routers in the private net - for the reason above (multipath), among others.

>> I.e., it will never matter what layering we add to fix this - GRE, GUE, Aero, etc. - ultimately, we’re doomed to need fragmentation support down to IP exactly because:
>> 	a. #1-4 mean we need frag/reassembly at any tunnel ingress
>> 	b. vendors want to sell #5 at a price that is too low for them to support #6 (i.e., point #7)
>> So pushing this to another layer will never solve it. What will solve it will only be a compliance requirement for #6 - which could be done right now, and has to be done for ANY solution to work.
> For IPv4 address sharing specifically removing network layer fragmentation would be a solution.

For ONE problem there is ONE patch until a new device looks further (to do port filtering on UDP tunneled traffic, e.g.). Then we’re back here talking about fragmentation over GUE considered fragile.

>> NOTE: even rewriting EVERY application won’t fix this, nor will deploying a new layer at any level.
> For some type of content inspection that would require reassembling the whole application context.
> But that's quite different from IPv4 address sharing, which we have unfortunately made an integral part of the Internet architecture.

The only difference is that the L4 problem is the one whose impact is being discussed right now. If you consider why it’s happening, it might be clearer that this problem will cycle around again and cannot be solved by patching this way.