Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Tom Herbert <> Sat, 28 July 2018 18:48 UTC

From: Tom Herbert <>
Date: Sat, 28 Jul 2018 11:48:21 -0700
Message-ID: <>
To: Ole Troan <>
Cc: Joe Touch <>, "" <>, "" <>

On Sat, Jul 28, 2018 at 11:24 AM, Ole Troan <> wrote:
>> Here’s the thing about fragmentation:
>>       1. all links have a maximum packet size
>>       2. all tunneling/encapsulation/layering increases payload size
>> 1+2 implies there is always the need for fragmentation at some layer:
> 1 implies that.
> There is enough head room designed in 1 to accommodate 2.

I'm not sure I follow what you're saying here. Ethernet MTU, the most
common value, is 1500 bytes. There's no reference to headroom for
that. If you're referring to the idea of artificially lowering MTUs to
account for potential overhead introduced in encapsulation, that can
be done. However, to avoid fragmentation _entirely_ one would need to
determine the maximum possible overhead ever added in encapsulation(s)
(plural in case of nested encapsulations). In a sprawling and dynamic
network that has different sub-domains and simultaneously uses
different encapsulation protocols, determining that specific magic
number might be infeasible. There is also the problem that some 0.01%
corner case of encapsulation might need hundreds of bytes of
overhead. Lowering the MTU for everyone just to avoid fragmentation
for that case is a poor tradeoff; it's better to fragment for that
case.
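
To make the arithmetic concrete, here is a rough sketch in Python. The
per-layer overheads and the function names are assumptions chosen for
illustration (24 bytes for GRE over IPv4, 40 bytes for an extra IPv6
header, and a made-up 200-byte rare case); only the 1500-byte Ethernet
MTU comes from the text above.

```python
LINK_MTU = 1500  # Ethernet MTU, as cited above

# Hypothetical encapsulation stacks: each list holds per-layer overhead in bytes.
COMMON_STACK = [24]           # assumed: IPv4 (20) + GRE (4)
RARE_STACK = [24, 40, 200]    # assumed: GRE/IPv4, plus IPv6, plus a made-up 200-byte header


def inner_mtu(link_mtu, overheads):
    """Largest inner packet that still fits after every layer adds its header."""
    return link_mtu - sum(overheads)


def needs_fragmentation(inner_len, link_mtu, overheads):
    """True if the encapsulated packet would exceed the link MTU."""
    return inner_len + sum(overheads) > link_mtu


def mtu_for_everyone(link_mtu, stacks):
    """MTU to advertise if fragmentation must be avoided for every stack."""
    return min(inner_mtu(link_mtu, stack) for stack in stacks)


common = inner_mtu(LINK_MTU, COMMON_STACK)
worst = mtu_for_everyone(LINK_MTU, [COMMON_STACK, RARE_STACK])
print(common, worst, common - worst)
```

Under these assumed numbers, sizing the MTU for the rare stack costs
every packet the difference between the two values, whereas fragmenting
only affects the packets that actually take the rare path.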


>>       3. fragmentation always splits info across packets
>> And there’s something important about layering:
>>       4. layering intends to isolate the behavior of one layer from another, such that
>>       it will always be impossible for an upper layer to know exactly what is going on below,
>>       i.e., to determine that limiting size across an entire path of possibly virtual tunnels
>> The next two are where we get into trouble:
>>       5. network devices increasingly WANT to inspect contents beyond the layer at which they are intended to operate
> not that network devices have an intent in themselves, but yes, it seems like network operators want to inspect content or are forced into it because of the necessity of IPv4 address sharing.
>>       6. inspecting contents ultimately means reassembly, at some level
> _some_ content inspection would require that, but I don't think you can make that the general rule.
> e.g. a NAT or an L4 ACL only needs access to the L4 header.
>> Which brings us to the punchline:
>>       7. but network device vendors want to save money, so they don’t want to reassemble at any layer
> We'd all wish it to be that simple. Can you substantiate that claim?
> You can easily make the speculation that customers don't want to pay what it costs to be able to do reassembly at terabit speeds...
> Or accept that it's technically hard.
> The implementations of e.g. NATs, IPv4 address sharing implementations I'm aware of do flavours of network layer reassembly.
> However much money you throw at it, you can't reassemble fragments travelling on different paths, nor can you trivially make network layer reassembly not be an attack vector on those boxes.
>> So I agree, IP fragmentation has its flaws - but those flaws are created not only because it leaves out the transport port numbers, but also because DPI and NAT devices don’t reassemble. And they don’t because it’s cheaper to sell devices that say they run at 1 Gbps (e.g.) that don’t bother to reassemble.
> I don't agree with your conclusion.
> NATs extend the network layer to include the L4 ports. NAT implementations of course do reassemble.
>> I.e., it will never matter what layering we add to fix this - GRE, GUE, Aero, etc. - ultimately, we’re doomed to need fragmentation support down to IP exactly because:
>>       a. #1-4 mean we need frag/reassembly at any tunnel ingress
>>       b. vendors want to sell #5 at a price that is too low for them to support #6 (i.e., point #7)
>> So pushing this to another layer will never solve it. What will solve it will only be a compliance requirement for #6 - which could be done right now, and has to be done for ANY solution to work.
> For IPv4 address sharing specifically removing network layer fragmentation would be a solution.
>> NOTE: even rewriting EVERY application won’t fix this, nor will deploying a new layer at any level.
> For some type of content inspection that would require reassembling the whole application context.
> But that's quite different from IPv4 address sharing, which we have unfortunately made an integral part of the Internet architecture.
>> And yes, I do intend to add this to draft-ietf-tunnels, so it can be referred to elsewhere.
> Ole
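
The quoted points about L4 headers and fragments (a NAT or L4 ACL only
needs the L4 header, but IP fragmentation leaves the transport port
numbers out of later fragments) can be sketched as follows. This is a
minimal illustration, not any vendor's implementation: the helper names,
addresses, and port numbers are all made up, and only plain IPv4 with
UDP or TCP is handled.

```python
import struct


def ipv4_packet(frag_offset_units, more_fragments, payload):
    """Build a bare IPv4/UDP-protocol packet (checksum left at 0, illustrative only)."""
    flags_frag = (0x2000 if more_fragments else 0) | frag_offset_units
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),   # version/IHL, TOS, total length
        1, flags_frag,                # identification, flags + fragment offset
        64, 17, 0,                    # TTL, protocol (UDP), checksum
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    return header + payload


def l4_ports(packet):
    """Return (src_port, dst_port) if this packet carries the L4 header, else None."""
    ihl = (packet[0] & 0x0F) * 4
    (flags_frag,) = struct.unpack_from("!H", packet, 6)
    frag_offset = flags_frag & 0x1FFF
    if packet[9] not in (6, 17):      # only TCP and UDP in this sketch
        return None
    if frag_offset != 0:              # non-first fragment: no ports present
        return None
    if len(packet) < ihl + 4:
        return None
    return struct.unpack_from("!HH", packet, ihl)


udp_header = struct.pack("!HHHH", 5000, 53, 16, 0)
first = ipv4_packet(0, True, udp_header + b"x" * 8)   # carries the UDP header
second = ipv4_packet(2, False, b"y" * 8)              # offset 2 * 8 = 16 bytes

print(l4_ports(first), l4_ports(second))
```

A device that keys on ports can classify the first fragment directly,
but for the second it has to either reassemble or keep per-datagram
state keyed on the IP identification field, which is where the cost and
the attack surface discussed above come from.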