Re: Fun and surprises with IPv6 fragmentation

Patrick McManus <pmcmanus@mozilla.com> Sat, 03 March 2018 15:23 UTC

In-Reply-To: <CAJ_4DfS=6h9qEQ+uwntLtDZNSODhqc_0pww7c2gK50XKna0BCw@mail.gmail.com>
References: <681fcc96-4cf9-100d-9ad6-b3c7be9189a5@huitema.net> <CAJ_4DfS=6h9qEQ+uwntLtDZNSODhqc_0pww7c2gK50XKna0BCw@mail.gmail.com>
From: Patrick McManus <pmcmanus@mozilla.com>
Date: Sat, 03 Mar 2018 10:23:49 -0500
Message-ID: <CAOdDvNqRD=NqbmDaTDi5t-iPy_sB-bjHcpeVPgEXZnN04DnRSQ@mail.gmail.com>
Subject: Re: Fun and surprises with IPv6 fragmentation
To: Ryan Hamilton <rch=40google.com@dmarc.ietf.org>
Cc: Christian Huitema <huitema@huitema.net>, "quic@ietf.org" <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/t-i0fyq0u6GEtOa-obmqsD8EAE0>

On Sat, Mar 3, 2018 at 12:09 AM, Ryan Hamilton <
rch=40google.com@dmarc.ietf.org> wrote:

> I'm sorry if this is a dumb question, but I understood that in IPv6,
> routers could not fragment IPv6 packets; only endpoints could.
>
>
I know! Fun. Honestly, the Internet is just so interesting; you wonder how
it works at all.

The pcap is in the #picoquic Slack channel (as always, anyone reading this
can just ask anyone on the Slack, like me, for an invite). Search for
octopus.pcap.


> Unlike in IPv4, IPv6 routers never fragment IPv6 packets. Packets
> exceeding the size of the maximum transmission unit of the destination link
> are dropped and this condition is signaled by a Packet Too Big ICMPv6 type
> 2 message to the originating node, similarly to the IPv4 method when the
> Don't Fragment bit is set.[1]
>
> End nodes in IPv6 are expected to perform path MTU discovery to determine
> the maximum size of packets to send, and the upper-layer protocol is
> expected to limit the payload size. However, if the upper-layer protocol is
> unable to do so, the sending host may use the Fragment extension header in
> order to perform end-to-end fragmentation of IPv6 packets.
>
> https://en.wikipedia.org/wiki/IPv6_packet#Fragmentation
>
>
> How sure are you that it's a router and not the sending host that's doing
> the fragmentation?
>
>
It seems unlikely to be in the core. The receiving host does have an MTU of
1500, but it's hard to imagine a receive stack fragmenting (and then
reassembling and reordering!) things.
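For scale, the per-packet budget on a 1500-byte link is simple arithmetic. A minimal sketch, assuming a bare 40-byte IPv6 header with no extension headers and an 8-byte UDP header; the function name is hypothetical:

```python
IPV6_HEADER = 40  # fixed IPv6 header; no extension headers assumed
UDP_HEADER = 8    # UDP header


def max_udp_payload(link_mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv6 packet."""
    return link_mtu - IPV6_HEADER - UDP_HEADER


print(max_udp_payload(1500))  # 1452
print(max_udp_payload(1280))  # 1232, at the IPv6 minimum link MTU
```

Whether the 1518 bytes in the quoted message below count the UDP payload or the whole IP packet, the probe exceeds a 1500-byte MTU and cannot cross such a link in one piece. If the 1440-byte cap Christian mentions refers to the UDP payload, it leaves 12 bytes of headroom under 1452.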

One of the interesting tidbits here is that it isn't just the small
fragment that moves ahead in the queue; it's both fragments of the big
packet.
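Why that matters can be seen from how reassembly behaves: the receiver holds fragments until every piece of a datagram is present and only then hands the whole datagram up, so a fragmented packet is delivered when its last fragment lands. The toy model below is a sketch under simplifying assumptions (hypothetical names; each fragment is tagged with a datagram id and a fragment count instead of real Fragment header offsets and the M flag):

```python
from collections import defaultdict


def delivery_order(arrivals):
    """Return datagram ids in the order a reassembling receiver delivers them.

    Each arrival is (datagram_id, fragment_index, fragment_count);
    an unfragmented packet uses fragment_count == 1.
    """
    pending = defaultdict(set)  # datagram_id -> fragment indexes seen so far
    delivered = []
    for dgram, index, count in arrivals:
        pending[dgram].add(index)
        if len(pending[dgram]) == count:  # all pieces present: deliver it
            delivered.append(dgram)
            del pending[dgram]
    return delivered


# Both fragments of the probe (N) overtake the handshake (N-1).
print(delivery_order([("N", 1, 2), ("N", 0, 2), ("N-1", 0, 1)]))  # ['N', 'N-1']

# Only the small fragment jumps ahead: the handshake is still delivered first.
print(delivery_order([("N", 1, 2), ("N-1", 0, 1), ("N", 0, 2)]))  # ['N-1', 'N']
```

In this model, a lone small fragment arriving early is not enough to invert delivery; the inversion seen in the capture needs both fragments to arrive before the handshake packet.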



> Cheers,
>
> Ryan
>
> On Fri, Mar 2, 2018 at 9:02 PM, Christian Huitema <huitema@huitema.net>
> wrote:
>
>> Yesterday, I was mentioning bugs found during the interop. This morning, I woke up
>> to find an interesting message from Patrick McManus. Something is weird, he
>> said. The first data message that your server sends, with sequence number
>> N, always arrives before the final handshake message, with sequence number
>> N-1. That inversion appears to happen systematically.
>>
>> It took us the best part of a day to explore blind alleys and finally
>> understand what was happening. The exchange was over IPv6. Upon receiving a
>> connection request from Patrick’s implementation, Picoquic was sending back
>> a handshake packet. Immediately after that, Picoquic was sending its first
>> data packet, which happens to be an MTU probe. And it turns out that the
>> probe was 1518 bytes, a bit longer than what the AWS routers could accept.
>> So some router inserted an IPv6 fragmentation header and split the packet
>> in two: a large initial fragment, 1496 bytes long, and a small second
>> fragment, 78 bytes long. You could think that this is no big deal, since
>> fragments would just be reassembled at the destination, but you would be
>> wrong.
>>
>> Some routers on the path try to be helpful. They have learned from past
>> experience that short packets often carry important data, and so they try
>> to route them faster than long data packets. And here is what happens in
>> our case:
>>
>> * The server prepares and sends a Handshake packet, 590 bytes long.
>> * The server then prepares the MTU probe, 1518 bytes long.
>> * The MTU probe is split into fragment 1, 1496 bytes, and fragment 2,
>>   78 bytes.
>> * The handshake and the long fragment are routed on the normal path,
>>   but the small fragment is routed at a higher priority level.
>> * The Linux driver at the destination receives the small fragment
>>   first. It queues everything behind that until it receives the long
>>   fragment.
>> * The Linux driver passes the reassembled packet to the application,
>>   which cannot do anything with it because the encryption keys can
>>   only be obtained from the handshake packet.
>> * The Linux driver then passes the handshake packet to the
>>   application.
>>
>> This confirms an old opinion. When routers try to be smart and helpful,
>> they end up being dumb and harmful. Please just send the packets in the
>> order you get them!
>>
>> I tried to work around the issue by setting the "don't fragment" bit on
>> the socket, but somehow that doesn't work. So I simply programmed the
>> server to not use payloads larger than 1440 bytes. Still, I can see that
>> pattern happening in other circumstances, such as a long Connection Initial
>> message followed by a short 0-RTT packet. Isn't networking fun?
>>
>> -- Christian Huitema
>>
>>
>
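The split described in the quoted message can be modeled with the RFC 8200 fragmentation rules, whichever node performed it: the unfragmentable part (here just the 40-byte IPv6 header) is repeated in every fragment, each fragment gains an 8-byte Fragment header, and the fragmentable part is cut into pieces whose lengths, except the last, are multiples of 8 octets. This is a sketch under stated assumptions (no extension headers, a 1500-byte MTU, 1518 taken as the UDP payload size; the function name is hypothetical), not a reconstruction of the capture:

```python
IPV6_HEADER = 40  # unfragmentable part: bare IPv6 header assumed
FRAG_HEADER = 8   # IPv6 Fragment extension header


def fragment_sizes(fragmentable_len: int, mtu: int) -> list[int]:
    """On-the-wire sizes of the IPv6 packets produced for one datagram.

    fragmentable_len counts everything after the IPv6 header
    (for UDP: 8-byte UDP header + payload).
    """
    if IPV6_HEADER + fragmentable_len <= mtu:
        return [IPV6_HEADER + fragmentable_len]  # fits: no fragmentation
    # Data carried by each non-final fragment, rounded down to 8 octets.
    chunk = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    if chunk <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    sizes = []
    remaining = fragmentable_len
    while remaining > 0:
        data = min(chunk, remaining)
        sizes.append(IPV6_HEADER + FRAG_HEADER + data)
        remaining -= data
    return sizes


print(fragment_sizes(8 + 1518, 1500))  # [1496, 126]
print(fragment_sizes(8 + 1440, 1500))  # [1488]
```

Under these assumptions the first fragment comes out at 1496 bytes and the second carries 78 bytes of fragment data (126 bytes on the wire); that this matches the figures above is a possible reading, not something the capture confirms. The second call shows why capping payloads at 1440 bytes avoids fragmentation on a 1500-byte path.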
