Re: Fun and surprises with IPv6 fragmentation

Ryan Hamilton <rch@google.com> Sat, 03 March 2018 05:10 UTC

In-Reply-To: <681fcc96-4cf9-100d-9ad6-b3c7be9189a5@huitema.net>
References: <681fcc96-4cf9-100d-9ad6-b3c7be9189a5@huitema.net>
From: Ryan Hamilton <rch@google.com>
Date: Fri, 02 Mar 2018 21:09:57 -0800
Message-ID: <CAJ_4DfS=6h9qEQ+uwntLtDZNSODhqc_0pww7c2gK50XKna0BCw@mail.gmail.com>
Subject: Re: Fun and surprises with IPv6 fragmentation
To: Christian Huitema <huitema@huitema.net>
Cc: "quic@ietf.org" <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/JFCM2soqPesfDV3tLP9Tp3vZMJ0>
List-Id: Main mailing list of the IETF QUIC working group <quic.ietf.org>

I'm sorry if this is a dumb question, but my understanding was that IPv6
routers cannot fragment IPv6 packets; only endpoints can.

Unlike in IPv4, IPv6 routers never fragment IPv6 packets. Packets exceeding
the size of the maximum transmission unit of the destination link are
dropped and this condition is signaled by a Packet too Big ICMPv6 type 2
message to the originating node, similarly to the IPv4 method when the
Don't Fragment bit is set.[1]

End nodes in IPv6 are expected to perform path MTU discovery to determine
the maximum size of packets to send, and the upper-layer protocol is
expected to limit the payload size. However, if the upper-layer protocol is
unable to do so, the sending host may use the Fragment extension header in
order to perform end-to-end fragmentation of IPv6 packets.

https://en.wikipedia.org/wiki/IPv6_packet#Fragmentation


How sure are you that it's a router and not the sending host that's doing
the fragmentation?
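
For what it's worth, the fragment sizes in the quoted message can be checked with a little arithmetic. This is only a rough sketch, and it rests on assumptions of mine that are not in the original report: that the 1518 bytes is the full IPv6 packet length, that there are no extension headers other than the Fragment header, and that the limiting link MTU is 1500 bytes. The function name `fragment_sizes` is made up for illustration.

```python
IPV6_HEADER = 40       # fixed IPv6 header, bytes
FRAGMENT_HEADER = 8    # IPv6 Fragment extension header, bytes


def fragment_sizes(packet_len, mtu):
    """Return the on-wire size of each fragment of an IPv6 packet.

    Assumes no other extension headers, so the unfragmentable part
    is just the 40-byte IPv6 header.
    """
    if packet_len <= mtu:
        return [packet_len]              # fits as is, no fragmentation
    payload = packet_len - IPV6_HEADER   # the fragmentable part
    # Every fragment except the last carries a multiple of 8 bytes.
    chunk = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    if chunk <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    sizes = []
    while payload > 0:
        part = min(chunk, payload)
        sizes.append(IPV6_HEADER + FRAGMENT_HEADER + part)
        payload -= part
    return sizes


print(fragment_sizes(1518, 1500))  # [1496, 78]
```

Under those assumptions the result matches the 1496-byte and 78-byte fragments reported, which is what ordinary IPv6 fragmentation at a 1500-byte MTU would give. The arithmetic alone does not say which node did the splitting.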

Cheers,

Ryan

On Fri, Mar 2, 2018 at 9:02 PM, Christian Huitema <huitema@huitema.net>
wrote:

> Yesterday, I was mentioning bugs of the interop. This morning, I woke up
> to find an interesting message from Patrick McManus. Something is weird, he
> said. The first data message that your server sends, with sequence number
> N, always arrives before the final handshake message, with sequence number
> N-1. That inversion appears to happen systematically.
>
> It took us the best part of a day to explore blind alleys and finally
> understand what was happening. The exchange was over IPv6. Upon receiving a
> connection request from Patrick’s implementation, Picoquic was sending back
> a handshake packet. Immediately after that, Picoquic was sending its first
> data packet, which happens to be an MTU probe. And it turns out that the
> probe was 1518 bytes, a bit longer than what the AWS routers could accept.
> So some router inserted an IPv6 fragmentation header and split the packet
> in two: a large initial fragment, 1496 bytes long, and a small second
> fragment, 78 bytes long. You could think that this is no big deal, since
> fragments would just be reassembled at the destination, but you would be
> wrong.
>
> Some routers on the path try to be helpful. They have learned from past
> experience that short packets often carry important data, and so they try
> to route them faster than long data packets. And here is what happens in
> our case:
>
> * The server prepares and sends a Handshake packet, 590 bytes long.
>
> * The server then prepares the MTU probe, 1518 bytes long.
>
> * The MTU probe is split into fragment 1, 1496 bytes, and fragment 2,
>   78 bytes.
>
> * The handshake and the long fragment are routed on the normal path,
>   but the small fragment is routed at a higher priority level.
>
> * The Linux driver at the destination receives the small fragment
>   first. It queues everything behind that until it receives the long
>   fragment.
>
> * The Linux driver passes the reassembled packet to the application,
>   which cannot do anything with it because the encryption keys can
>   only be obtained from the handshake packet.
>
> * The Linux driver then passes the handshake packet to the application.
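
The sequence in the list above can be reproduced with a toy model. This is a sketch of the behaviour as described, not of how the Linux receive path is actually implemented; the function `deliver` and the tuple layout are hypothetical and exist only for this illustration.

```python
def deliver(arrivals):
    """Toy model of the receive path described in the list above.

    `arrivals` is a list of (packet, fragment_index, fragment_count)
    tuples in arrival order; unfragmented packets use (name, 0, 1).
    Once a fragment arrives, later packets are held until that packet
    is complete; the reassembled packet is then delivered first.
    """
    delivered, held, seen = [], [], {}
    pending = None                       # packet being reassembled, if any
    for name, index, count in arrivals:
        if count > 1:
            seen.setdefault(name, set()).add(index)
            if len(seen[name]) < count:
                pending = name           # start (or keep) blocking the queue
                continue
            delivered.append(name)       # all fragments are present
            if pending == name:
                delivered.extend(held)   # release what queued behind it
                held, pending = [], None
        elif pending is not None:
            held.append(name)            # stuck behind the partial packet
        else:
            delivered.append(name)
    # Anything still held (reassembly never completed) goes last.
    return delivered + held


# Sent order: handshake, then the probe as fragments 0 and 1.
# Arrival order if the short second fragment is expedited:
arrivals = [("probe", 1, 2), ("handshake", 0, 1), ("probe", 0, 2)]
print(deliver(arrivals))  # ['probe', 'handshake']
```

With in-order arrival the same function returns the handshake first, so in this model the inversion comes entirely from the short fragment overtaking the other two packets.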
>
> Which confirms an old opinion. When routers try to be smart and helpful,
> they end up being dumb and harmful. Please just send the packets in the
> order you get them!
>
> I tried to work around the issue by setting the "don't fragment" bit on
> the socket, but somehow that doesn't work. So I simply programmed the
> server to not use payloads larger than 1440 bytes. Still, I can see that
> pattern happening in other circumstances, such as a long Connection Initial
> message followed by a short 0-RTT packet. Isn't networking fun?
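
A quick check of why a 1440-byte cap avoids fragmentation, as a sketch only. It assumes that "payload" here means the UDP payload, that the path MTU is 1500 bytes, and that no IPv6 extension headers are present; none of that is stated in the message. `max_udp_payload` is a hypothetical helper.

```python
IPV6_HEADER = 40   # fixed IPv6 header, bytes
UDP_HEADER = 8     # UDP header, bytes


def max_udp_payload(path_mtu):
    """Largest UDP payload that fits in one unfragmented IPv6 packet,
    assuming no IPv6 extension headers."""
    return path_mtu - IPV6_HEADER - UDP_HEADER


print(max_udp_payload(1500))          # 1452
print(1440 <= max_udp_payload(1500))  # True
print(max_udp_payload(1280))          # 1232
```

The last line uses 1280 bytes, the minimum link MTU that IPv6 requires, so 1232 bytes is the budget that should fit on any IPv6 path without relying on MTU discovery.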
>
> -- Christian Huitema
>
>