Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Toerless Eckert <> Tue, 28 August 2018 22:09 UTC

Date: Wed, 29 Aug 2018 00:09:15 +0200
From: Toerless Eckert <>
To: Joe Touch <>
Cc: Christian Huitema <>, Tom Herbert <>, int-area <>,

Thanks, Joe

This has gotten pretty long. Let me summarize my suggestions up front:

For the draft itself, how about it also considers recommendations not only
for IPv6 but also for IPv4? For example, simply do for IPv4 only what we have
accepted to be feasible for IPv6: never rely on in-network fragmentation,
and only send DF packets.

The remaining considerations are more generic, and i wonder if this draft
wants to have the guts to even mention them. Probably not. But IMHO
we will continue to be architecturally stuck in the hole we are in if we do
not tackle them for poor middleboxes:

The IETF may think they are bad and not needed, but reality does need
middleboxes that function at line rate with only per-packet inspection, and not
at the lower speed achievable with "virtual reassembly based inspection".
Therefore we need options to have in every fragment all the same context that
middleboxes reasonably should be able to inspect.
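
To illustrate the problem, here is a minimal Python sketch (not taken from the
draft) of what a stateless per-packet inspector can see after IP fragmentation.
The function names and the 48-byte fragment size are assumptions chosen for the
example; the only facts relied on are that the transport header sits at offset 0
of the IP payload and that fragment offsets count in 8-byte units.

```python
import struct


def fragment_payload(payload: bytes, frag_size: int):
    """Split an IP payload into (offset, more_fragments, chunk) tuples.

    frag_size must be a multiple of 8 because IP fragment offsets are
    expressed in 8-byte units.
    """
    if frag_size <= 0 or frag_size % 8 != 0:
        raise ValueError("frag_size must be a positive multiple of 8")
    frags = []
    for off in range(0, len(payload), frag_size):
        more = off + frag_size < len(payload)
        frags.append((off, more, payload[off:off + frag_size]))
    return frags


def ports_visible(offset: int, chunk: bytes):
    """Return (src_port, dst_port) if this single packet exposes them, else None."""
    if offset != 0 or len(chunk) < 4:
        return None  # non-first fragment: the transport header is not here
    return struct.unpack("!HH", chunk[:4])


# A UDP datagram: 8-byte header (ports 5000 -> 53) plus 100 bytes of data.
udp = struct.pack("!HHHH", 5000, 53, 8 + 100, 0) + b"x" * 100
frags = fragment_payload(udp, 48)
visibility = [ports_visible(off, chunk) for off, _, chunk in frags]
```

Only the first entry of `visibility` carries port numbers; every later fragment
yields `None`, which is why a middlebox that filters on ports has to fall back
to virtual reassembly.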

It's IMHO a clean pragmatic solution to say that this additional context can
be in the higher-layer protocol and that layer willingly exposes it to
middleboxes, and encrypts everything else. This view then requires that
higher-layer protocol to perform packet layer fragmentation, like we can do
with TCP. That's why that approach is IMHO a good recommendation to use. If
other transport protocols want to support the same degree of interaction
with middleboxes, fine. If not, fine too.

Given that we do have TCP PLPMTUD (RFC 4821), i think it would be nice for this
draft to suggest using it, instead of relying on IP layer fragmentation, to
make TCP flows more middlebox friendly. AFAIK, it's the only option that does
not require new spec work; that's why it could make the cut for this draft.
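
As a rough illustration of the idea behind packetization-layer path MTU
discovery, the sketch below searches for the largest segment size the path
accepts by probing. It is a deliberate simplification, not the RFC 4821 state
machine: `probe` is a hypothetical callback standing in for "send a DF segment
of this size and see whether it is acknowledged", and the bounds 1280 and 9000
are assumptions for the example.

```python
def plpmtud_search(probe, low: int = 1280, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes `low` is already known to work and that probe() is monotonic:
    if a size is delivered, every smaller size is delivered too.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # probe delivered: raise the known-good size
        else:
            high = mid - 1   # probe lost: the path MTU is below mid
    return low


# Simulated path that silently drops anything larger than 1500 bytes.
path_mtu = 1500
found = plpmtud_search(lambda size: size <= path_mtu)
```

Because the sender then never emits a segment larger than `found`, every packet
of the flow carries a full TCP header and no in-network fragmentation is needed.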

For longer term architectural evolution of middlebox friendly IP fragmentation,
we could indeed have a new IP option carrying that context, and fragmentation
would put this option into every fragment. The definition of this context could
be per-transport proto.
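
A minimal sketch of what that could look like, under loud assumptions: no such
IP option is specified anywhere, the `Fragment` type and its `context` field are
purely hypothetical, and the context is taken to be the 32 bits of TCP/UDP port
numbers. The point is only that a fragmenter copying the context into every
fragment lets each fragment be classified on its own.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    offset: int
    more: bool
    context: bytes  # hypothetical per-fragment context option
    data: bytes


def fragment_with_context(payload: bytes, frag_size: int, context: bytes):
    """Fragment a payload and attach the same context to every fragment."""
    if frag_size <= 0 or frag_size % 8 != 0:
        raise ValueError("frag_size must be a positive multiple of 8")
    return [
        Fragment(off, off + frag_size < len(payload), context,
                 payload[off:off + frag_size])
        for off in range(0, len(payload), frag_size)
    ]


def classify(frag: Fragment):
    """Stateless per-packet classification using only the context option."""
    return struct.unpack("!HH", frag.context)


# Transport payload whose first 4 bytes are source and destination port.
segment = struct.pack("!HH", 443, 51000) + b"\x00" * 196
frags = fragment_with_context(segment, 64, context=segment[:4])
flows = {classify(f) for f in frags}
```

All fragments map to the same flow without any reassembly state, so the
classification also works when fragments take different paths.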

I think there could be middlebox contexts better than port numbers,
but to make fragments work better for existing TCP/UDP middlebox
functions, those 32 bits are it. But given how we can expect exposure of
information only from willing higher layers, they will have a much easier
way to get what they want to support by packet layer fragmentation. A
simple generic packet layer fragmentation for UDP would therefore be nice IMHO
so that UDP applications wanting to be friendly would not have to
reinvent that wheel.
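
To make the idea concrete, here is a small sketch of packet layer fragmentation
done above UDP. The 8-byte header layout (message id, fragment index, fragment
count) is invented for this example and is not taken from any specification;
`reassemble` also assumes it is handed datagrams of a single message.

```python
import struct

# Hypothetical per-datagram header: message id, fragment index, fragment count.
HEADER = struct.Struct("!IHH")


def split_message(msg_id: int, message: bytes, max_datagram: int):
    """Split a message into UDP payloads of at most max_datagram bytes each."""
    chunk_size = max_datagram - HEADER.size
    if chunk_size <= 0:
        raise ValueError("max_datagram too small for the header")
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("message needs too many fragments")
    return [HEADER.pack(msg_id, idx, len(chunks)) + chunk
            for idx, chunk in enumerate(chunks)]


def reassemble(datagrams):
    """Rebuild one message from its datagrams, in any order; None if incomplete."""
    parts, count = {}, None
    for dgram in datagrams:
        _msg_id, idx, count = HEADER.unpack_from(dgram)
        parts[idx] = dgram[HEADER.size:]
    if count is None or len(parts) != count:
        return None
    return b"".join(parts[i] for i in range(count))


message = bytes(range(256)) * 10
datagrams = split_message(7, message, max_datagram=1200)
rebuilt = reassemble(reversed(datagrams))
```

Each element of `datagrams` would be sent as its own UDP packet, so every packet
on the wire carries the UDP ports and a middlebox never sees an IP fragment.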

If we actually ever do such an IP option, it MUST be a destination option,
because the insufficient RFCs defining the treatment of hop-by-hop options
burned any ability to deploy those.

For IPsec, IP in IP or similar higher layer protocols, i would either
use them as the key beneficiary of generic UDP fragmentation (IPsec/IP-UDP-IP)
for pragmatic short term solutions, or else the IP option would equally
be applicable to them (interesting discussion what the best context for
them would be, but the two port numbers would make them most compatible
with those typically very TCP/UDP centric middlebox functions).

Specific answers to your points below


On Sun, Aug 26, 2018 at 08:01:07PM -0700, Joe Touch wrote:
> IPv6. IP options.

IMHO hop-by-hop options got burned and are undeployable because the
RFCs never made it mandatory enough that they never impact forwarding
performance randomly and badly. We had to abandon perfectly good
hop-by-hop inspection solutions because there were so many stupid router
implementations out there that didn't even have the feature but would still
punt those packets to the slow path; then the operators saw those packets as
DoS packets and filtered them.

> And (perhaps) any new proposed solution.

The main issue with everything written above is that the main
business interests the IETF works against are those of non-middlebox-friendly
participants.

> We have to aim at what network components *need* to do to participate. IP fragmentation is exactly that.

See above. The technical question is how to enable sufficient per-packet context.

> That's easy to say, but since any host might be an IPsec tunnel endpoint, you're back where we are now - needing IP fragmentation support everywhere, ultimately.

Asked and answered.

> My view is simple - if you fix what we KNOW is wrong with NATs and firewalls - in KNOWN ways - then not only don't we have to solve this problem, we don't have a lot of other problems either (e.g., lack of state when a flow takes a different path into an enterprise).

That's an orthogonal discussion. The goal in this fragmentation
thread is purely to eliminate virtual-reassembly complexity on
middleboxes, however good or bad what they do is.

Actually, it's not fully orthogonal. By eliminating virtual
reassembly needs, we make the middlebox also work for
cases where fragments use different paths.

> > And yes, that would enable
> > me to make NAT and firewalls (for the firewall functions i think make sense)
> > for host stack traffic something that does not require to bother about
> > fragmentation and could therefore be done easier at higher speed
> > and architecturally as something only in the network layer. 
> You're optimizing a long-term impact solution for a short-term limitation. That's a bad idea; protocols last a very long time.

The difference between per-packet operation and virtual-fragment-reassembly
is orders of magnitude in complexity. It's the same order of magnitude
of business problem for network device development as those unnecessary RTTs
in pre-QUIC transport are for companies trying to make money
from ADD millennials on web pages.

> > The draft in question argues to limit what future work should do
> > within the existing requirements, which is fine. I was merely
> > pointing out that we could move more into what i think would be 
> > a useful evolution if we also went beyond our current arch
> > and evolved it. 
> I am fine with encouraging the *search* for new solutions, as long as *in the meantime* we also call out firewalls and NATs for how they are already broken. Until IP fragmentation is deprecated, that has to be our position as a community.

Probably easier to try to figure out what subset of NAT/FW/middleboxes
we'd find to be worthwhile enough to support explicitly. Especially
for firewalls, many business interests say NONE, and they will only
reverse that position when e.g. they fail to get enough market share
with it.

> > I think fragmentation is best pushed up on the stack.
> Again, please tell me how to do that for IPsec tunnels - which, again, can start/end anywhere in the network.

See above. I was talking about the pragmatic approach for consenting hosts and
middleboxes to get away without having to enhance IP.

> >>> If i wouldn't have to worry about such proxy forwarding plane capabilities,
> >>> i definitely would prefer models like SOCKS. If i have to think about them
> >>> it becomes certainly difficult to even model this well.
> >> When you find a complete model better than the Internet, propose it.
> >> Until then...
> > 
> > HTTPs over DWDM with application layer proxies on every hop.
> > You didn't define how to measure "better" ;-)
> Agreed, but that's exactly where you're headed when you say "kick the can down the road to the upper layer protocol".

I am still not sure about the meaning of your answers to Ole, but i
may have tried to argue on this point ("i like SOCKS") maybe in
the direction of how you may view middleboxes in your paper, e.g. doing a lot
more at higher layers. But that's a much larger story independent of the
problem of virtual fragment reassembly, which i think should be the
only relevant issue in this fragmentation specific thread (and
not the general bitching about or improvements on firewall semantics).