Re: Never fragment: getting PMTU info transmitted reliably

Tom Herbert <tom@herbertland.com> Thu, 17 January 2019 00:39 UTC

From: Tom Herbert <tom@herbertland.com>
Date: Wed, 16 Jan 2019 16:39:19 -0800
Message-ID: <CALx6S36OT-SapX66G_Yw6y_5Tkh2Nk6Abkjd+B_iwdiZqeCF+A@mail.gmail.com>
Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Cc: "Joel M. Halpern" <jmh@joelhalpern.com>, Michael Richardson <mcr+ietf@sandelman.ca>, IPv6 List <ipv6@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/tC3foGOQJZfjkdEBg87C0cm5xlI>

On Wed, Jan 16, 2019 at 4:30 PM Brian E Carpenter
<brian.e.carpenter@gmail.com> wrote:
>
> On 2019-01-17 13:12, Joel M. Halpern wrote:
> > Just to clarify one aspect of the way entropy works in path selection,
> > I want to point out a complication.
> >
> > It is not anywhere near enough to have as much entropy data as the
> > number of choices.  The problem is that you need enough randomness so
> > that you can expect a good distribution of flows, and so that even the
> > smaller number of larger flows will likely get distributed across the
> > choices.  Reducing the amount of available entropy can be quite
> > problematic.
>
> Right. And for the server farm case, I don't think it's science fiction
> these days to think about hundreds or thousands of servers. Also, if the
> load sharing algorithm attempts to ensure that a given server has only
> one big job at a time, then a high collision rate in the hash can
> defeat it. A form of the birthday paradox applies: not "what is the
> chance of a clash per flow" but "what is the chance that out of a
> thousand servers, one of them gets two big jobs at the same time"?
>
> I am strongly against breaking the flow label just at the time when
> the major o/s are starting to set it correctly.
>
> I'm all for fixing the fragmentation problem; draft-ietf-intarea-frag-fragile
> exists for a reason. But not by breaking something else.
>
Agreed. Turning up flow labels was one of the few things we were able to
do immediately at scale, and it doesn't seem to have caused problems.
Unfortunately, there aren't any available reserved bits in the IP
header and no bits that could easily be repurposed (even if there were,
it would be a major discussion about what use cases warrant allocation of
bits in the primary IP header). IPv6 is extensible by virtue of
extension headers; that is where things like this belong.
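
To put rough numbers on the entropy arguments quoted in this thread, here
is a minimal Python sketch. It assumes flow labels are drawn uniformly at
random, and the function names (bits_for_mtu, p_any_collision) are made up
for illustration; nothing here comes from a draft or an implementation.

```python
import math

FLOW_LABEL_BITS = 20  # width of the IPv6 flow label field
MIN_MTU = 1280        # IPv6 minimum link MTU


def bits_for_mtu(max_mtu: int, min_mtu: int = MIN_MTU) -> int:
    """Bits needed to encode any MTU offset in [0, max_mtu - min_mtu]."""
    return math.ceil(math.log2(max_mtu - min_mtu + 1))


def p_any_collision(flows: int, entropy_bits: int) -> float:
    """Birthday-style probability that at least two of `flows` labels,
    each drawn uniformly at random (an assumption), have the same value."""
    space = 2 ** entropy_bits
    if flows > space:
        return 1.0  # pigeonhole: more flows than distinct label values
    p_unique = 1.0
    for i in range(flows):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


if __name__ == "__main__":
    mtu_bits = bits_for_mtu(9000)
    left_over = FLOW_LABEL_BITS - mtu_bits
    print("bits for MTU:", mtu_bits, "bits left for entropy:", left_over)
    for flows in (14, 128, 1000):
        print(flows, "flows:",
              "reduced label", round(p_any_collision(flows, left_over), 3),
              "full label", round(p_any_collision(flows, FLOW_LABEL_BITS), 3))
```

The per-flow clash odds and the "any clash among N flows" odds are very
different quantities, which is the birthday paradox point made in the
quoted text. This only models label collisions, not how a particular load
balancer maps labels onto servers.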

Tom

>    Brian
>
> >
> > Yours,
> > Joel
> >
> > On 1/16/19 6:36 PM, Michael Richardson wrote:
> >>
> >> Warren writes about putting MTU info into flow Label:
> >>      >> to signal up to a 9K MTU, we would need 13bits (LN(9K-1280)/LN(2) =
> >>      >> 12.9).
> >>      >> 20 - 13 gives 7 bits (128) for the hash entropy. "7 bits of entropy,
> >>      >> evenly distributed, should be enough for anyone", he said, hoping
> >>      >> no-one points at
> >>      >> the obvious correlation to 640K of RAM...
> >>
> >> Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
> >>      > I guess it all depends what you expect from the entropy. 20 bits gives
> >>      > you a 1-in-a-million chance of a clash. 7 bits gives you a 1-in-128 chance
> >>      > of a clash. This probably doesn't matter for a simple ECMP or LAG kind
> >>      > of load sharing, but who's to say it doesn't matter for some more
> >>      > fancy kind of sharing across a large array of servers?
> >>
> >> You'd have to have more than 128 servers and/or paths.
> >> Maybe one of our SPRING people in this WG can tell us if that's a real
> >> problem today.  Obviously, we can't guess if it will be bad in the future,
> >> but if it's a problem today, then we can know that immediately.
> >>
> >> Now when I pushed for draft-ietf-6man-rfc6434-bis-09 and RFC8200 to say
> >> that PLPMTUD be made a MUST, I got various pushback that amounted to:
> >>    1) we don't have enough evidence yet.
> >>    2) it doesn't work for UDP and other traffic.
> >>
> >> (1) turned out to be a real issue. I thought that some of the big players
> >>      could easily get, or already would have, that kind of evidence.
> >>      (Linux does not ship with PLPMTUD on by default.  If you want it, btw,
> >>      sysctl -w net.ipv4.tcp_mtu_probing=2.  Yes. ipv4. It affects both)
> >>      Turns out I was told that they always set their TCP segment size such that
> >>      they likely will never fragment for v4 or v6, because due to hardware
> >>      Transit Offload, the cost of missing a tx-op exceeds the benefit of
> >>      making the packet slightly bigger.  I suspect that this is true in
> >>      general to UDP and QUIC traffic too.
> >>      I imagine next-generation 10G NICs might offer QUIC offload, including
> >>      doing the crypto.  That's what I'd be coding if I worked in that space.
> >>
> >> 2) I will admit that I personally don't care that much about UDP traffic,
> >>     except in that it lets me run IPsec through NAT44s.   I know that I use
> >>     QUIC and WebRTC regularly, and that corporate enterprise users get
> >>     screwed by lack of UDP regularly.
> >>
> >> So I think the question is: is there really a problem that needs to be
> >> solved? Maybe I will have to go back and read the beginning of this
> >> thread again to recall what the issue was.   Does this belong in SPUD?
> >>
> >> I wish that SCTP had flown higher...
> >>
> >> --
> >> Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
> >>   -= IPv6 IoT consulting =-
> >>
> >>
> >> --------------------------------------------------------------------
> >> IETF IPv6 working group mailing list
> >> ipv6@ietf.org
> >> Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
> >> --------------------------------------------------------------------
> >>