Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Toerless Eckert <> Wed, 29 August 2018 00:24 UTC

Date: Wed, 29 Aug 2018 02:24:30 +0200
From: Toerless Eckert
To: Tom Herbert
Cc: Joe Touch, Christian Huitema, int-area

On Tue, Aug 28, 2018 at 03:51:58PM -0700, Tom Herbert wrote:
> I think it's the opposite-- the definition of the context should be
> protocol agnostic. We need to get middleboxes out of doing DPI and to
> stop worrying only about select transport protocols. So we need a
> mechanism that works equally well with TCP, UDP, SCTP, ICMP,
> IPsec, fragments, etc. It definitely needs to be secure though.

Sure, i meant to imply that port numbers are useful pragmatically,
but other context identifiers would be better long term.
Demux identifiers at the granularity of a subscriber or
application would be a lot more scalable than flow identifiers.

Security is a wide topic. The firewall function of permitting return
traffic for internally initiated flows, for example, is a
wonderfully simple function that in most deployments does a fine job
without additional security. And in a lot of embedded/walled-garden
networks, additional security through e.g. ACLs (like through MUD)
is a more appropriate solution than cryptographic security. So
a bit more exploration of viable options would be useful. The last
thing i want to do is to force Internet PKI and complex out-of-band
middlebox discovery on all solutions where it's not needed.
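
As a rough illustration of how simple that return-traffic function is,
here is a minimal sketch. It is not taken from any real firewall: the
names `Packet` and `ReturnTrafficFilter` are hypothetical, and it
deliberately ignores timeouts, TCP state tracking and fragments.

```python
from typing import NamedTuple, Set, Tuple


class Packet(NamedTuple):
    """Hypothetical 5-tuple view of a packet (illustration only)."""
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


class ReturnTrafficFilter:
    """Permit inbound packets only if they reverse a flow seen outbound."""

    def __init__(self) -> None:
        self._flows: Set[Tuple[str, str, int, str, int]] = set()

    def outbound(self, pkt: Packet) -> bool:
        # Internally initiated traffic is always allowed and remembered.
        self._flows.add((pkt.proto, pkt.src, pkt.sport, pkt.dst, pkt.dport))
        return True

    def inbound(self, pkt: Packet) -> bool:
        # Allowed only when source and destination mirror a remembered flow.
        key = (pkt.proto, pkt.dst, pkt.dport, pkt.src, pkt.sport)
        return key in self._flows
```

Note that the lookup key is exactly addresses plus the two port numbers,
which is the per-packet context a non-first fragment does not carry.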

> > I think there could be middlebox contexts better than port numbers,
> > but to make fragments work better for existing TCP/UDP middlebox
> > functions, those 32 bits are it. But given how we can expect exposure of
> > information only from willing higher layers, they will have a much easier
> > way to get what they want to support by packet layer fragmentation. A
> > simple generic packet layer fragmentation for UDP would therefore be nice IMHO
> > so that UDP applications wanting to be friendly would not have to
> > reinvent that wheel.
> That's already in UDP options and some UDP encapsulations like GUE.

Pointers ?

> It's a good idea, but doesn't completely obsolete the use of IP
> fragmentation.

Nothing pragmatic will. Just a possible part of recommendations.

> > If we actually ever do such an IP option, it MUST be a destination option,
> > because the insufficient RFCs defining the treatment of hop-by-hop options
> > burned any ability to deploy those.
> >
> In PANRG when I suggested that FAST could be done in a Destination
> Option there was a lot of push back. I think it was for good reason.

What was for good reason? Your proposal or the pushback?

> Hop-by-Hop were designed precisely for inspection and potential
> modification at intermediate nodes, and the requirement that all nodes
> in the path process HBH has been relaxed in RFC8200.

Hop-by-hop options have been burned, as i said, through bad
implementations that for example punt them. That's why operators often
configure their networks to drop packets with those options, to keep
them from burning their bad routers.

Let's say we come up with some good new solution that depends on
new code being written. I would hope we can document/standardize this
in such a way that new code would not be subject to this
legacy problem. But we cannot get the benefit of that new code when
we use the existing burned codepoint for hop-by-hop, because
we cannot expect to get EXISTING code fixed and we will always
have paths with such old code.

I.e.: if the religious architecture faction makes a fuss about
not using destination options for onpath functions, then let's
consider that we could simply rev the codepoint for hop-by-hop
options. But to make that solution stick, we would have to
show that we can write up correct processing RFCs for that
gen2 codepoint such that it will not again get burned like
the first codepoint through bad new code.


Sorry for the soapbox, it's just been a frustration of
mine since the early 2000s, when the IPv6 specs did not
improve enough on this point vs. IPv4 and i had
to rant about a lack of focus on the reality of implementation
details to the IPv6 architects.


> Options (as well as Fragment EH) aren't supposed to even be inspected
> at intermediate nodes. The rationale for using DestOpts is of course
> that they're less likely to be dropped by intermediate nodes. That's
> true, they are less likely to be dropped; per RFC7872 it's about
> 15-17% drop rate for DestOpts and 40-43% for HBH. However, given the
> update in RFC8200 and if some useful HBH options are defined, I would
> expect that new deployments and replacements might start to lower the
> HBH drop rate. In any case, the drop rates for DestOpts are still
> nowhere close to zero, so regardless of which option is used, some
> backoff is needed when the options are dropped to continue to work but
> in a potentially degraded service mode relative to what a working
> option could provide.

Yes, it's a darn thankless job trying to come up with good
architectures for middleboxes when there is so much bad code and
bad operations out there.
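
The backoff Tom describes above could look roughly like this on the
sending side. This is only a sketch under assumptions: `send` is a
hypothetical callable that transmits the payload with or without the
option and returns the reply (or None if nothing came back), and
`attempts_with_option` is an invented knob, not something from any spec.

```python
from typing import Callable, Optional, Tuple


def send_with_fallback(
    send: Callable[[bytes, bool], Optional[bytes]],
    payload: bytes,
    attempts_with_option: int = 2,
) -> Tuple[Optional[bytes], bool]:
    """Try with the option first; fall back to plain packets if it
    appears to be dropped on the path.

    Returns (reply, option_used). option_used is False when the sender
    had to degrade to the mode without the option.
    """
    for _ in range(attempts_with_option):
        reply = send(payload, True)
        if reply is not None:
            return reply, True
    # The option seems to be filtered somewhere; continue in degraded mode.
    return send(payload, False), False
```

A real sender would also have to cache the outcome per destination and
retry the option occasionally, since paths change.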

Hence the thought in another venue to have those middlebox
instructions encrypted in such a way that any onpath node not
explicitly trusted to process them would have no way to even
determine whether those options are there, and hence no way to
figure out what packets to drop. Of course this approach is a lot
more expensive for forwarding planes and does require a lot more
out-of-band synchronization.

See above: short of something that extreme, we should focus
on doing the right thing for new code but build it in
such a way that it is not blocked by bad existing old onpath code.


> Tom
> > For IPsec, IP in IP or similar higher layer protocols, i would either
> > use them as the key beneficiary of generic UDP fragmentation (IPsec/IP-UDP-IP)
> > for pragmatic short term solutions, or else the IP option would equally
> > be applicable to them (interesting discussion what the best context for
> > them would be, but the two port numbers would make them most compatible
> > with those typically very TCP/UDP centric middlebox functions).
> >
> > Specific answers to your points below
> >
> > Cheers
> >     Toerless
> >
> > On Sun, Aug 26, 2018 at 08:01:07PM -0700, Joe Touch wrote:
> >> IPv6. IP options.
> >
> > IMHO hop-by-hop options got burned and are undeployable because the
> > RFCs never made it mandatory enough to have them never impact forwarding
> > performance randomly and badly. We had to abandon perfectly good
> > hop-by-hop inspection solutions because there were so many stupid router
> > implementations out there that didn't even have the feature but would still
> > punt those packets to slow-path, then the operators saw those packets as
> > DoS packets and filtered them.
> >
> >> And (perhaps) any new proposed solution.
> >
> > The main issue with everything written on top is that the main
> > business interest the IETF works against is that of non-middlebox
> > friendly participants.
> >
> >> We have to aim at what network components *need* to do to participate. IP fragmentation is exactly that.
> >
> > See above. The technical question is how to enable sufficient per-packet context.
> >
> >> That's easy to say, but since any host might be an IPsec tunnel endpoint, you're back where we are now - needing IP fragmentation support everywhere, ultimately.
> >
> > asked and answered.
> >
> >> My view is simple - if you fix what we KNOW is wrong with NATs and firewalls - in KNOWN ways - then not only don't we have to solve this problem, we don't have a lot of other problems either (e.g., lack of state when a flow takes a different path into an enterprise).
> >
> > That's an orthogonal discussion. The goal in this fragmentation
> > thread is purely to eliminate virtual-reassembly complexity on
> > middleboxes, however good or bad what they do is.
> >
> > Actually, it's not fully orthogonal. By eliminating virtual
> > reassembly needs, we make the middlebox also work for
> > cases where fragments use different paths.
> >
> >> > And yes, that would enable
> >> > me to make NAT and firewalls (for the firewall functions i think make sense)
> >> > for host stack traffic something that does not require to bother about
> >> > fragmentation and could therefore be done easier at higher speed
> >> > and architecturally as something only in the network layer.
> >>
> >> You're optimizing a long-term impact solution for a short-term limitation. That's a bad idea; protocols last a very long time.
> >
> > The difference between per-packet operation and virtual-fragment-reassembly
> > is orders of magnitude of complexity. It's the same order of magnitude
> > of business problem for network device development as those unnecessary RTTs
> > in pre-QUIC transport are for companies trying to make money
> > from ADD millennials on web pages.
> >
> >> > The draft in question argues to limit what future work should do
> >> > within the existing requirements, which is fine. I was merely
> >> > pointing out that we could move more into what i think would be
> >> > a useful evolution if we also went beyond our current arch
> >> > and evolved it.
> >>
> >> I am fine with encouraging the *search* for new solutions, as long as *in the meantime* we also call out firewalls and NATs for how they are already broken. Until IP fragmentation is deprecated, that has to be our position as a community.
> >
> > Probably easier trying to figure out what subset of NAT/FW/middleboxes
> > we'd find to be worthwhile enough to support explicitly. Especially
> > for firewalls, many business interests say NONE and they will only
> > reverse that position when e.g. they fail to get enough market share
> > with it.
> >
> >> > I think fragmentation is best pushed up on the stack.
> >>
> >> Again, please tell me how to do that for IPsec tunnels - which, again, can start/end anywhere in the network.
> >
> > See above. I was talking about the pragmatic approach for consenting hosts and
> > middleboxes to get away without having to enhance IP.
> >
> >> >>> If i wouldn't have to worry about such proxy forwarding plane capabilities,
> >> >>> i definitely would prefer models like SOCKS. If i have to think about them
> >> >>> it becomes certainly difficult to even model this well.
> >> >> When you find a complete model better than the Internet, propose it.
> >> >> Until then...
> >> >
> >> > HTTPs over DWDM with application layer proxies on every hop.
> >> > You didn't define how to measure "better" ;-)
> >>
> >> Agreed, but that's exactly where you're headed when you say "kick the can down the road to the upper layer protocol".
> >
> > I am still not sure about the meaning of your answers to Ole, but i
> > may have tried to argue on this point ("i like SOCKS") maybe in
> > the direction of how you may view middleboxes in your paper, e.g. doing a lot
> > more at higher layers. But that's a much larger story independent of the
> > problem of virtual fragment reassembly, which i think should be the
> > only relevant issue in this fragmentation specific thread (and
> > not the general bitching about or improvements on firewall semantics).
> >