Re: [Int-area] WG Adoption Call: IP Fragmentation Considered Fragile

Tom Herbert <> Tue, 28 August 2018 22:52 UTC

From: Tom Herbert <>
Date: Tue, 28 Aug 2018 15:51:58 -0700
Message-ID: <>
To: Toerless Eckert <>
Cc: Joe Touch <>, Christian Huitema <>, int-area <>,

On Tue, Aug 28, 2018 at 3:09 PM, Toerless Eckert <> wrote:
> Thanks, Joe
> This has gotten pretty long. Let me summarize my suggestions upfront:
> For the draft itself, how about it also considers recommendations not only
> for IPv6 but IPv4. Such as simply also only do what we've accepted to be
> feasible for IPv6. Like: do never rely on in-network fragmentation but
> only use DF packets.
> The remaining considerations are more generic, and i wonder if this draft
> wants to have the guts to even mention them. Probably not. But IMHO
> we will continue to be architecturally stuck in the hole we are if we do not
> tackle them for poor middleboxes:
> The IETF may think they are bad and not needed, but reality does need middleboxes
> that function at linerate of only per-packet inspection and not at the lower
> speed achievable with "virtual reassembly based inspection". Therefore we
> need options to have in every fragment all the same context that middleboxes
> reasonably should be able to inspect.
> It's IMHO a clean pragmatic solution to say that this additional context can
> be in the higher-layer protocol and that layer willingly exposes it to middleboxes -
> and encrypts everything else. This view then requires that higher-layer
> protocol to perform packet layer fragmentation. Like we can do with
> TCP. That's why that approach is IMHO a good recommendation to use. If
> other transport protocols want to support the same degree of interaction
> with middleboxes. Fine. If not, fine too.
> Given how we do have TCP PLPMTUD, i think it would be nice to suggest the use
> of it instead of relying on IP layer fragmentation to make TCP flows more
> middlebox friendly. In this draft. AFAIK, it's the only option that does not
> require new spec work, that's why it could make the cut for this draft.
> For longer term architectural evolution of middlebox friendly IP fragmentation,
> we could indeed have a new IP option carrying that context, and fragmentation
> would put this option into every fragment. The definition of this context could
> be per-transport proto.

I think it's the opposite: the definition of the context should be
protocol agnostic. We need to get middleboxes out of doing DPI and to
stop worrying only about select transport protocols. So we need a
mechanism that works equally well with TCP, UDP, SCTP, ICMP,
IPsec, fragments, etc. It definitely needs to be secure though.
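
To make "protocol agnostic" concrete, here is a minimal sketch of a middlebox
lookup that keys only on fields present in every packet and every fragment.
Everything in it is an assumption for illustration: the `context` field is a
hypothetical opaque value carried at the IP layer, not an existing option, and
the sketch does nothing about authenticating it, which a real mechanism would
have to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    next_header: int  # 6 = TCP, 17 = UDP, 44 = IPv6 Fragment header, 50 = ESP
    context: bytes    # hypothetical opaque context repeated in every fragment


def flow_key(pkt: Packet) -> tuple:
    """Classify per packet, without reassembly state and without DPI.

    Only addresses and the opaque context are used; the function never
    looks at next_header or at the transport payload.
    """
    return (pkt.src, pkt.dst, pkt.context)


# A first TCP segment and a later fragment of the same flow map to one key,
# even though the fragment exposes no transport header at all.
first = Packet("2001:db8::1", "2001:db8::2", 6, b"\x01\x02\x03\x04")
frag = Packet("2001:db8::1", "2001:db8::2", 44, b"\x01\x02\x03\x04")
esp = Packet("2001:db8::1", "2001:db8::2", 50, b"\xaa\xbb\xcc\xdd")

same_flow = flow_key(first) == flow_key(frag)
other_flow = flow_key(first) != flow_key(esp)
```

The point of the sketch is only that the lookup is identical for TCP, UDP,
IPsec and fragments, so no virtual reassembly is needed to find the flow.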

> I think there could be middlebox contexts better than port numbers,
> but to make fragments work better for existing TCP/UDP middlebox
> functions, those 32 bits are it. But given how we can expect exposure of
> information only from willing higher layers, they will have a much easier
> way to get what they want to support by packet layer fragmentation. A
> simple generic packet layer fragmentation for UDP would therefore be nice IMHO
> so that UDP applications wanting to be friendly would not have to
> reinvent that wheel.

That's already in UDP options and some UDP encapsulations like GUE.
It's a good idea, but doesn't completely obsolete the use of IP

> If we actually ever do such an IP option, it MUST be a destination option,
> because the insufficient RFCs defining the treatment of hop-by-hop options
> burned any ability to deploy those.

In PANRG when I suggested that FAST could be done in a Destination
Option there was a lot of push back. I think it was for good reason.
Hop-by-Hop options were designed precisely for inspection and potential
modification at intermediate nodes, and the requirement that all nodes
in the path process HBH has been relaxed in RFC8200. Destination
Options (as well as Fragment EH) aren't supposed to even be inspected
at intermediate nodes. The rationale for using DestOpts is of course
that they're less likely to be dropped by intermediate nodes. That's
true, they are less likely to be dropped; per RFC7872 it's about a
15-17% drop rate for DestOpts and 40-43% for HBH. However, given the
update in RFC8200 and if some useful HBH options are defined, I would
expect that new deployments and replacements might start to lower the
HBH drop rate. In any case, the drop rates for DestOpts are still
nowhere close to zero, so regardless of which option is used, some
backoff is needed when the options are dropped, so that communication
continues to work, but in a potentially degraded service mode relative
to what a working option could provide.
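
A rough sketch of that kind of backoff, under stated assumptions: the class
name, the `path_delivers` callback (standing in for "did we get an
acknowledgement") and the threshold of three consecutive losses are all
hypothetical, and none of this comes from FAST or RFC8200. The sender starts
with the option attached and, after repeated losses, continues without it in
degraded mode.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OptionFallbackSender:
    # Hypothetical callback: returns True if a packet sent with/without
    # the option was delivered (e.g. an acknowledgement came back).
    path_delivers: Callable[[bool], bool]
    max_losses: int = 3       # assumed threshold, not from any spec
    with_option: bool = True  # start by attaching the option
    losses: int = 0           # consecutive losses in the current mode

    def send(self) -> bool:
        delivered = self.path_delivers(self.with_option)
        if delivered:
            self.losses = 0
            return True
        self.losses += 1
        # Too many consecutive losses with the option attached:
        # back off and keep working without it (degraded service).
        if self.with_option and self.losses >= self.max_losses:
            self.with_option = False
            self.losses = 0
        return False


def dropping_path(has_option: bool) -> bool:
    """Simulated path that drops every packet carrying the option."""
    return not has_option


sender = OptionFallbackSender(path_delivers=dropping_path)
results = [sender.send() for _ in range(5)]
```

In this simulation the first three sends are lost, the sender stops attaching
the option, and the remaining sends get through. A real implementation would
also need to decide when to probe again with the option, which the sketch
leaves out.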


> For IPsec, IP in IP or similar higher layer protocols, i would either
> use them as the key beneficiary of generic UDP fragmentation (IPsec/IP-UDP-IP)
> for pragmatic short term solutions, or else the IP option would equally
> be applicable to them (interesting discussion what the best context for
> them would be, but the two port numbers would make them be most compatible
> with those typically very TCP/UDP centric middlebox functions).
> Specific answers to your points below
> Cheers
>     Toerless
> On Sun, Aug 26, 2018 at 08:01:07PM -0700, Joe Touch wrote:
>> IPv6. IP options.
> IMHO hop-by-hop options got burned and are undeployable because the
> RFCs never made it mandatory enough to have them never impact forwarding
> performance randomly and badly. We had to abandon perfectly good
> hop-by-hop inspection solutions because there were so many stupid router
> implementations out there that didn't even have the feature but would still
> punt those packets to slow-path, then the operators saw those packets as
> DoS packets and filtered them.
>> And (perhaps) any new proposed solution.
> The main issue of everything written on top is that the main
> business interests the IETF works against is that of non-middlebox
> friendly participants.
>> We have to aim at what network components *need* to do to participate. IP fragmentation is exactly that.
> See above. The technical question is how to enable sufficient per-packet context.
>> That's easy to say, but since any host might be an IPsec tunnel endpoint, you're back where we are now - needing IP fragmentation support everywhere, ultimately.
> asked and answered.
>> My view is simple - if you fix what we KNOW is wrong with NATs and firewalls - in KNOWN ways - then not only don't we have to solve this problem, we don't have a lot of other problems either (e.g., lack of state when a flow takes a different path into an enterprise).
> That's an orthogonal discussion. The goal in this fragmentation
> thread is purely to eliminate virtual-reassembly complexity on
> middleboxes. However good or bad it is what they do.
> Actually, it's not fully orthogonal. By eliminating virtual
> reassembly needs, we make the middlebox also work for
> cases where fragments use different paths.
>> > And yes, that would enable
>> > me to make NAT and firewalls (for the firewall functions i think make sense)
>> > for host stack traffic something that does not require to bother about
>> > fragmentation and could therefore be done easier at higher speed
>> > and architecturally as something only in the network layer.
>> You're optimizing a long-term impact solution for a short-term limitation. That's a bad idea; protocols last a very long time.
> The difference between per-packet operation and virtual-fragment-reassembly
> is orders of magnitude of complexity. It's the same order of magnitude
> a business problem for network device development as those unnecessary RTTs
> in pre-QUIC transport are for companies trying to make money
> from ADD millennials on web pages.
>> > The draft in question argues to limit what future work should do
>> > within the existing requirements, which is fine. I was merely
>> > pointing out that we could move more into what i think would be
>> > a useful evolution if we also went beyond our current arch
>> > and evolved it.
>> I am fine with encouraging the *search* for new solutions, as long as *in the meantime* we also call out firewalls and NATs for how they are already broken. Until IP fragmentation is deprecated, that has to be our position as a community.
> Probably easier trying to figure out what subset of NAT/FW/middleboxes
> we'd find to be worthwhile enough to support explicitly. Especially
> for firewalls, many business interests say NONE and they will only
> revert position when e.g.: they fail to get enough market share
> with that position.
>> > I think fragmentation is best pushed up on the stack.
>> Again, please tell me how to do that for IPsec tunnels - which, again, can start/end anywhere in the network.
> See above. I was talking about the pragmatic approach for consenting hosts and
> middleboxes to get away without having to enhance IP.
>> >>> If i wouldn't have to worry about such proxy forwarding plane capabilities,
>> >>> i definitely would prefer models like SOCKS. If i have to think about them
>> >>> it becomes certainly difficult to even model this well.
>> >> When you find a complete model better than the Internet, propose it.
>> >> Until then...
>> >
>> > HTTPs over DWDM with application layer proxies on every hop.
>> > You didn't define how to measure "better" ;-)
>> Agreed, but that's exactly where you're headed when you say "kick the can down the road to the upper layer protocol".
> I am still not sure about the meaning of your answers to Ole, but i
> may have tried to argue on this point ("i like SOCKS") maybe in
> the direction of how you may view middleboxes in your paper, e.g.: doing a lot
> more higher layer. But that's a much larger story independent of the
> problem of virtual fragment reassembly, which i think should be the
> only relevant issue in this fragmentation specific thread (and
> not the general bitching or improvements on firewall semantics).