Re: Never fragment: getting PMTU info transmitted reliably

Tom Herbert <tom@herbertland.com> Thu, 17 January 2019 04:35 UTC

In-Reply-To: <CAO42Z2wsK+e3p25ZVnRfYXqmATLoEj+-1uTx8QVuEZEHqcXj0w@mail.gmail.com>
From: Tom Herbert <tom@herbertland.com>
Date: Wed, 16 Jan 2019 20:35:30 -0800
Message-ID: <CALx6S35=AhF=5WdQNymTNu+Xtd3zV2KVWyHdwJzw2XNejns77g@mail.gmail.com>
Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: Mark Smith <markzzzsmith@gmail.com>
Cc: Brian E Carpenter <brian.e.carpenter@gmail.com>, Michael Richardson <mcr+ietf@sandelman.ca>, IPv6 List <ipv6@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/Nu3hnc49ndnTxXpwjwYAdXlgnx0>

On Wed, Jan 16, 2019 at 7:28 PM Mark Smith <markzzzsmith@gmail.com> wrote:
>
> On Thu, 17 Jan 2019 at 13:57, Brian E Carpenter
> <brian.e.carpenter@gmail.com> wrote:
> >
> > On 2019-01-17 15:39, Michael Richardson wrote:
> > >
> > > Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
> > >     > On 2019-01-17 13:12, Joel M. Halpern wrote:
> > >     >> Just to clarify one aspect of the way entropy works in path selection, I want
> > >     >> to point out a complication.
> > >     >>
> > >     >> It is not anywhere near enough to have as much entropy data as the
> > >     >> number of choices.  The problem is that you need enough randomness so
> > >     >> that you can expect a good distribution of flows.  And that even the
> > >     >> smaller number of larger flows will likely get distributed across the
> > >     >> choices.    Reducing the amount of available entropy can be quite
> > >     >> problematic.
> > >
> > >     > Right. And for the server farm case, I don't think it's science fiction
> > >     > these days to think about hundreds or thousands of servers. Also, if the
> > >     > load sharing algorithm attempts to ensure that a given server has only
> > >     > one big job at a time, then a high collision rate in the hash can
> > >     > defeat it. A form of the birthday paradox applies: not "what is the
> > >     > chance of a clash per flow" but "what is the chance that out of a
> > >     > thousand servers, one of them gets two big jobs at the same time"?
> > >
> > > Based upon my reading of the Netflix blogs, they have experimented
> > > extensively with the load sharing, and they really don't care about
> > > flow-labels in their decision process. (Of course, because IPv4 has
> > > no such things)
> >
> > Indeed, but that's exactly why we brought in a load sharing expert
> > to help us with RFC7098. And there are residual problems even in the
> > ideal world where the flow label is perfect. We played with some ideas
> > in https://tools.ietf.org/html/draft-tarreau-extend-flow-label-balancing
> > but it didn't really go anywhere. In a nutshell, what's really needed
> > is a bidirectional session ID, not a unidirectional flow ID. And
> > that's not a layer 3 concept.
> >
>
> I think really what you want is an anycast IPv6 service address in DNS
> for the load balanced service that the client uses to establish the
> initial transport layer connection, and then a method to announce to
> the client and then hand off that session to the unicast address of
> the server actually handling the session. That would make the load
> balancer with the anycast service address more of a session broker
> rather than something that is inline with all the sessions' traffic.
>
Mark,

Alternatively, a new record type could be added to DNS that returns
blocks of IPv6 addresses (it could be called a BBBB record). A BBBB
record would be something like a base IPv6 address and an extent. Given
the enormous size of the IPv6 address space, a single record could
contain billions of addresses for a service. The client just needs to
pick an address in the block at random, and load balancing is
accomplished by routing to backend servers solely based on destination
address (each backend server serves some portion of the address
block). No need for VIPs, anycast, DPI into the transport layer, or
stateful load balancing. This also has the advantage of introducing a
lot more bits of entropy for other load balancing techniques like ECMP
(using a different IPv6 source address for every connection would have
a similar effect).
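
As a rough illustration only, here is a minimal Python sketch of the
idea. No such record type exists today, so everything named here is
hypothetical: the BBBBRecord class, its base/extent fields, the
pick_address() and backend_for() helpers, and the choice to split the
block into contiguous, near-equal slices per backend. The 2001:db8::
prefix in the usage lines is the documentation prefix, used as a
placeholder.

```python
import ipaddress
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class BBBBRecord:
    """Hypothetical record: a base IPv6 address plus an extent (address count)."""
    base: ipaddress.IPv6Address
    extent: int

    def pick_address(self) -> ipaddress.IPv6Address:
        # Client side: choose uniformly at random within [base, base + extent).
        return self.base + secrets.randbelow(self.extent)


def backend_for(record: BBBBRecord, dst: ipaddress.IPv6Address,
                n_backends: int) -> int:
    """Map a destination address to a backend index without per-flow state.

    Assumption: the block is divided into n_backends contiguous slices,
    so the mapping depends only on the destination address.
    """
    offset = int(dst) - int(record.base)
    if not 0 <= offset < record.extent:
        raise ValueError("destination is outside the advertised block")
    return offset * n_backends // record.extent


if __name__ == "__main__":
    # A block of 2**32 addresses (about 4.3 billion) served by 4 backends.
    record = BBBBRecord(ipaddress.IPv6Address("2001:db8::"), 2**32)
    addr = record.pick_address()
    print(addr, "-> backend", backend_for(record, addr, 4))
```

If the extent and the number of backends are both powers of two, each
slice is an ordinary prefix, so the mapping could in principle be
expressed as plain routes rather than as code in a load balancer.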

Tom


> Multipath TCP would fit the bill, and I assume the multipath
> extensions for QUIC will too.
>
> >    Brian
> >
> > >
> > > It's about how fresh the (disk read) caches on the servers are, what content
> > > is being streamed, and other things that have nothing to do with the
> > > packets themselves.
> > >
> > > Architecturally with IPv6, if you have an entire /64 (or more) to play with
> > > and you can statelessly forward packets at wire speed,  then there are
> > > other interesting off-path choices one can do.  (For instance, assign
> > > new server/128 for each client connection, and then when the connection
> > > arrives, dynamically map it to a particular server.  This pushes the state
> > > storage from layer-4 to the neighbour cache, which might not be a win)
> > >
> > > So I seriously question whether any of this matters to server farms.
> > >
> > >     > I am strongly against breaking the flow label just at the time when
> > >     > the major o/s are starting to set it correctly.
> > >
> > > :-)
> > >
> > >     > I'm all for fixing the fragmentation problem;
> > >     > draft-ietf-intarea-frag-fragile
> > >     > exists for a reason. But not by breaking something else.
> > >
> > > My quick read says that it looks great to me.
> > >
> > > Again, I don't really think that using the flow label to seed PLPMTUD
> > > is much of a win, but if it did provide something useful, I think it could be
> > > done without too much harm.
> > >
> > > To reiterate: I don't think the benefit is high enough to warrant the
> > >    risk, despite the fact that I don't think the risk is as high as you
> > >    are suggesting.
> > >
> > > --
> > > ]               Never tell me the odds!                 | ipv6 mesh networks [
> > > ]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
> > > ]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [
> > >
> > >
> > > --
> > > Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
> > >  -= IPv6 IoT consulting =-
> > >
> > >
> > >
> >
> > --------------------------------------------------------------------
> > IETF IPv6 working group mailing list
> > ipv6@ietf.org
> > Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
> > --------------------------------------------------------------------