Re: Never fragment: getting PMTU info transmitted reliably

Mark Smith <markzzzsmith@gmail.com> Thu, 17 January 2019 03:28 UTC

From: Mark Smith <markzzzsmith@gmail.com>
Date: Thu, 17 Jan 2019 14:27:32 +1100
Message-ID: <CAO42Z2wsK+e3p25ZVnRfYXqmATLoEj+-1uTx8QVuEZEHqcXj0w@mail.gmail.com>
Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, IPv6 List <ipv6@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/1zFwfSIiSBASiBsySDRG_X_lyh8>

On Thu, 17 Jan 2019 at 13:57, Brian E Carpenter
<brian.e.carpenter@gmail.com> wrote:
>
> On 2019-01-17 15:39, Michael Richardson wrote:
> >
> > Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
> >     > On 2019-01-17 13:12, Joel M. Halpern wrote:
> >     >> Just to clarify one aspect of the way entropy is used in path
> >     >> selection, I want to point out a complication.
> >     >>
> >     >> It is not anywhere near enough to have as much entropy data as the
> >     >> number of choices.  The problem is that you need enough randomness so
> >     >> that you can expect a good distribution of flows.  And that even the
> >     >> smaller number of larger flows will likely get distributed across the
> >     >> choices.    Reducing the amount of available entropy can be quite
> >     >> problematic.
> >
> >     > Right. And for the server farm case, I don't think it's science fiction
> >     > these days to think about hundreds or thousands of servers. Also, if the
> >     > load sharing algorithm attempts to ensure that a given server has only
> >     > one big job at a time, then a high collision rate in the hash can
> >     > defeat it. A form of the birthday paradox applies: not "what is the
> >     > chance of a clash per flow" but "what is the chance that out of a
> >     > thousand servers, one of them gets two big jobs at the same time"?
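
To put a rough number on the birthday-paradox point, here is a minimal
sketch. It assumes each big job is hashed to a server uniformly and
independently, which is an idealisation and not a model of any real load
balancer; the function name is made up for illustration.

```python
def collision_probability(servers: int, big_jobs: int) -> float:
    """Chance that at least one server receives two or more big jobs.

    Assumes every job lands on a server chosen uniformly at random,
    independently of the others (the classic birthday-problem model).
    """
    if big_jobs > servers:
        # Pigeonhole: more jobs than servers guarantees a clash.
        return 1.0
    p_no_clash = 1.0
    for already_placed in range(big_jobs):
        # The next job must avoid every server that already holds one.
        p_no_clash *= (servers - already_placed) / servers
    return 1.0 - p_no_clash
```

Under that assumption, a farm of a thousand servers reaches about even
odds of a clash with somewhere around 40 concurrent big jobs, far fewer
than the number of servers.
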
> >
> > Based upon my reading of the netflix blogs, they have experimented
> > extensively with the load sharing, and they really don't care about
> > flow-labels in their decision process. (Of course, because IPv4 has
> > no such thing)
>
> Indeed, but that's exactly why we brought in a load sharing expert
> to help us with RFC7098. And there are residual problems even in the
> ideal world where the flow label is perfect. We played with some ideas
> in https://tools.ietf.org/html/draft-tarreau-extend-flow-label-balancing
> but it didn't really go anywhere. In a nutshell, what's really needed
> is a bidirectional session ID, not a unidirectional flow ID. And
> that's not a layer 3 concept.
>

I think what you really want is an anycast IPv6 service address in DNS
for the load-balanced service, which the client uses to establish the
initial transport layer connection, plus a method to announce to the
client the unicast address of the server actually handling the session
and then hand the session off to that address. That would make the load
balancer with the anycast service address more of a session broker
than something that is inline with all the sessions' traffic.

Multipath TCP would fit the bill, and I assume the multipath
extensions for QUIC will too.
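
As a toy illustration of the session broker idea, here is a minimal
sketch. It only models the bookkeeping (which unicast address gets
announced to which new session) and is not Multipath TCP or QUIC code.
The class, its method names, the least-loaded selection policy and the
2001:db8:: documentation addresses are all assumptions made for the
example.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class SessionBroker:
    """Hypothetical broker reachable at an anycast service address."""

    anycast: ipaddress.IPv6Address
    servers: list  # unicast addresses of the servers behind the broker
    active: dict = field(default_factory=dict)  # server -> open sessions

    def open_session(self) -> ipaddress.IPv6Address:
        """Pick a server and return the unicast address to announce.

        The client reached the broker via the anycast address; from here
        on it is expected to talk to the returned address directly, so
        the broker is not inline with the session's traffic.
        """
        chosen = min(self.servers, key=lambda s: self.active.get(s, 0))
        self.active[chosen] = self.active.get(chosen, 0) + 1
        return chosen

    def close_session(self, server: ipaddress.IPv6Address) -> None:
        """Record that a session on the given server has ended."""
        self.active[server] -= 1
```

The point of the sketch is that the broker only holds per-session
bookkeeping at setup and teardown, while the data path goes straight to
the chosen server's unicast address.
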

>    Brian
>
> >
> > It's about how fresh the (disk read) caches on the servers are, what content
> > is being streamed, and other things that have nothing to do with the
> > packets themselves.
> >
> > Architecturally with IPv6, if you have an entire /64 (or more) to play with
> > and you can statelessly forward packets at wire speed,  then there are
> > other interesting off-path choices one can do.  (For instance, assign
> > new server/128 for each client connection, and then when the connection
> > arrives, dynamically map it to a particular server.  This pushes the state
> > storage from layer-4 to the neighbour cache, which might not be a win)
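
A rough sketch of that per-connection /128 idea, to show where the state
ends up. Everything here is hypothetical: the class, the round-robin
choice of server, and the 2001:db8:0:1::/64 documentation prefix. The
dictionary stands in for whatever table (for instance the neighbour
cache) would hold the address-to-server binding.

```python
import ipaddress
import secrets

# Documentation prefix used purely as an example service /64.
SERVICE_PREFIX = ipaddress.IPv6Network("2001:db8:0:1::/64")


class PerConnectionAddresses:
    """Hands out one fresh /128 per client connection (illustrative only)."""

    def __init__(self, prefix, servers):
        self.prefix = prefix
        self.servers = list(servers)
        self.binding = {}  # address -> server, or None until first use
        self._next = 0     # round-robin cursor

    def allocate(self):
        """Return an unused address from the prefix to give to one client."""
        while True:
            addr = self.prefix[secrets.randbits(64)]
            if addr not in self.binding:
                self.binding[addr] = None
                return addr

    def resolve(self, addr):
        """Bind the address to a server on first use, then keep it stable."""
        if self.binding[addr] is None:
            self.binding[addr] = self.servers[self._next % len(self.servers)]
            self._next += 1
        return self.binding[addr]
```

One entry per live connection has to be kept somewhere, which is the
cost noted above.
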
> >
> > So I seriously question whether any of this matters to server farms.
> >
> >     > I am strongly against breaking the flow label just at the time when
> >     > the major o/s are starting to set it correctly.
> >
> > :-)
> >
> >     > I'm all for fixing the fragmentation problem;
> >     > draft-ietf-intarea-frag-fragile
> >     > exists for a reason. But not by breaking something else.
> >
> > My quick read says that it looks great to me.
> >
> > Again, I don't really think that using the flow label to seed PLPMTUD
> > is much of a win, but if it did provide something useful, I think it could be
> > done without too much harm.
> >
> > To reiterate: I don't think the benefit is high enough to warrant the
> >    risk, despite the fact that I don't think the risk is as high as you
> >    are suggesting.
> >
> > --
> > ]               Never tell me the odds!                 | ipv6 mesh networks [
> > ]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
> > ]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [
> >
> >
> > --
> > Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
> >  -= IPv6 IoT consulting =-
> >
> >
> >