Re: Never fragment: getting PMTU info transmitted reliably

Tom Herbert <tom@herbertland.com> Thu, 17 January 2019 15:12 UTC

MIME-Version: 1.0
References: <CAOSSMjV0Vazum5OKztWhAhJrjLjXc5w5YGxdzHgbzi7YVSk7rg@mail.gmail.com> <6aae7888-46a4-342d-1d76-10f8b50cebc4@gmail.com> <EC9CC5FE-5215-4105-8A34-B3F123D574B9@employees.org> <4c56f504-7cd7-6323-b14a-d34050d13f4e@foobar.org> <9E6D4A6E-8ABA-4BAB-BEC5-969078323C96@employees.org> <CAAedzxpdF+yhBXfnwUcaQb-HkgdaqXRU3L+S7v8sS1F0OkwM9A@mail.gmail.com> <78a8a0e0-8808-364c-41f7-f81f90362432@gont.com.ar> <CAAedzxpjxhP0nOZVU0CTwA1u3fsPFthrJASjDEfnLcRNvr2gBQ@mail.gmail.com> <c9be798e-5a32-7c3e-a948-9ca2fab30411@si6networks.com> <CAHw9_i+M2-420pykp99LcgMNSG=eeDqsZK8+hN20t_uUdANHfA@mail.gmail.com> <d6e52c30-bbd1-1ee7-144c-fa13a9df5f38@gmail.com> <0f4a6c88-1def-6766-235b-1bcd2cc5e33b@si6networks.com> <CAHw9_i+FB-tb8c+G22FCUxNg9BDpMfwqur8gSn5QaXteBcABZA@mail.gmail.com> <14135.1547681760@localhost> <a044c327-d9ce-573e-a158-6c4b157f2d6c@joelhalpern.com> <24583.1547692781@localhost> <116fbbeb-c191-cd57-5998-1d80db1c9917@gmail.com> <CAO42Z2wsK+e3p25ZVnRfYXqmATLoEj+-1uTx8QVuEZEHqcXj0w@mail.gmail.com> <CALx6S35=AhF=5WdQNymTNu+Xtd3zV2KVWyHdwJzw2XNejns77g@mail.gmail.com> <cf56fa2230a14e358b297561a32bcf5b@ustx2ex-dag1mb5.msg.corp.akamai.com>
In-Reply-To: <cf56fa2230a14e358b297561a32bcf5b@ustx2ex-dag1mb5.msg.corp.akamai.com>
From: Tom Herbert <tom@herbertland.com>
Date: Thu, 17 Jan 2019 07:12:09 -0800
Message-ID: <CALx6S37oBojME2B53KGmurGgHVx+T_RtWv=NORk4ouBCVPApww@mail.gmail.com>
Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: "Lubashev, Igor" <ilubashe@akamai.com>
Cc: Mark Smith <markzzzsmith@gmail.com>, 6man <ipv6@ietf.org>, "mcr+ietf@sandelman.ca" <mcr+ietf@sandelman.ca>
Content-Type: multipart/alternative; boundary="000000000000297ff7057fa8d2bc"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/yHwL_s8COCYlTeE-TGqVjY6Q1Cw>

On Wed, Jan 16, 2019, 9:48 PM Lubashev, Igor <ilubashe@akamai.com> wrote:

> There is a lot of complexity in injecting entropy into IP addresses. As
> long as using a 4-tuple for TCP and UDP traffic, 2-tuple+SPI for IPsec,
> 2-tuple+Key for GRE, etc. "just works", there is little incentive to
> deploy complex solutions that involve DNS, neighbour caches, etc. The hope
> for new protocols on top of IPv6 is the flow label, because it is so simple
> to use right by everyone: the sender, the receiver, and the middleboxes.
>

Igor,

I'm not sure I'd say that it "just works". Take a look at the Maglev load
balancer (
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf)
and all the complexity needed to ensure consistent routing to backends.
Even with all that, it will still drop some number of connections when
backends change.
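
To make the backend-change problem concrete, here is a minimal sketch. It is
not Maglev's actual algorithm, just plain modulo hashing compared with
rendezvous hashing, and it counts how many flows are remapped when one backend
is removed. The flow strings, backend names, and helper functions are all made
up for illustration.

```python
import hashlib


def stable_hash(*parts):
    """Stable 64-bit hash of the given strings (illustrative choice only)."""
    data = "|".join(parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def pick_modulo(flow, backends):
    """Naive selection: hash of the flow tuple modulo the number of backends."""
    return backends[stable_hash(flow) % len(backends)]


def pick_rendezvous(flow, backends):
    """Rendezvous hashing: the backend with the highest per-flow score wins."""
    return max(backends, key=lambda b: stable_hash(flow, b))


def moved(pick, flows, before, after):
    """Count flows whose chosen backend differs between two backend sets."""
    return sum(1 for f in flows if pick(f, before) != pick(f, after))


# Hypothetical flows: "src addr|src port|dst addr|dst port".
flows = [f"2001:db8::{i:x}|40000|2001:db8:ffff::1|443" for i in range(1, 10001)]
before = [f"backend-{n}" for n in range(10)]
after = before[:-1]  # backend-9 goes away

moved_modulo = moved(pick_modulo, flows, before, after)
moved_rendezvous = moved(pick_rendezvous, flows, before, after)
print("remapped with modulo hashing:    ", moved_modulo)
print("remapped with rendezvous hashing:", moved_rendezvous)
```

Rendezvous hashing only moves the flows that were on the removed backend, but
those connections still break unless per-connection state is kept somewhere.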

Using the flow label or transport layer information gives only best-effort
persistence. Intermediate devices are allowed to change flow labels, and
NAT evictions of UDP mappings may change the client-side port and address
mid-flow. QUIC is interesting since it doesn't use the IP addresses or
ports for identifying connections. That conceptually allows a client to
purposely randomize both the source address and the port on every packet to
make it hard for intermediate devices to correlate packets as belonging to
the same flow (for security).
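
A toy illustration of the QUIC point, assuming a hypothetical Packet record
that already carries a parsed conn_id field (real QUIC header parsing is not
shown, and the addresses and IDs are invented): a device keying on the 4-tuple
sees three flows where the endpoint sees a single connection.

```python
from collections import namedtuple

# Hypothetical, simplified packet record; not a real QUIC header layout.
Packet = namedtuple("Packet", "src_addr src_port dst_addr dst_port conn_id")


def flows_by_tuple(packets):
    """What a tuple-based device would treat as distinct flows."""
    return {(p.src_addr, p.src_port, p.dst_addr, p.dst_port) for p in packets}


def flows_by_conn_id(packets):
    """What an endpoint demultiplexing on connection ID would see."""
    return {p.conn_id for p in packets}


SERVER = "2001:db8:ffff::1"
packets = [
    Packet("2001:db8:1::a", 50001, SERVER, 443, "c1"),
    Packet("2001:db8:1::b", 50002, SERVER, 443, "c1"),  # new source address
    Packet("2001:db8:1::c", 61000, SERVER, 443, "c1"),  # new address and port
]

print("flows seen by 4-tuple:      ", len(flows_by_tuple(packets)))
print("connections seen by conn ID:", len(flows_by_conn_id(packets)))
```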

Tom

> - Igor
>
> -----Original Message-----
> *From:* Tom Herbert [tom@herbertland.com]
> *Received:* Wednesday, 16 Jan 2019, 11:36PM
> *To:* Mark Smith [markzzzsmith@gmail.com]
> *CC:* Michael Richardson [mcr+ietf@sandelman.ca]; IPv6 List [ipv6@ietf.org]
> *Subject:* Re: Never fragment: getting PMTU info transmitted reliably
>
> On Wed, Jan 16, 2019 at 7:28 PM Mark Smith <markzzzsmith@gmail.com> wrote:
> >
> > On Thu, 17 Jan 2019 at 13:57, Brian E Carpenter
> > <brian.e.carpenter@gmail.com> wrote:
> > >
> > > On 2019-01-17 15:39, Michael Richardson wrote:
> > > >
> > > > Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
> > > >     > On 2019-01-17 13:12, Joel M. Halpern wrote:
> > > >     >> Just to clarify one aspect of the way entropy is used in path
> > > >     >> selection, I want to point out a complication.
> > > >     >>
> > > >     >> It is not anywhere near enough to have as much entropy data
> > > >     >> as the number of choices. The problem is that you need enough
> > > >     >> randomness that you can expect a good distribution of flows,
> > > >     >> and that even the smaller number of larger flows will likely
> > > >     >> get distributed across the choices. Reducing the amount of
> > > >     >> available entropy can be quite problematic.
> > > >
> > > >     > Right. And for the server farm case, I don't think it's science
> > > >     > fiction these days to think about hundreds or thousands of
> > > >     > servers. Also, if the load sharing algorithm attempts to ensure
> > > >     > that a given server has only one big job at a time, then a high
> > > >     > collision rate in the hash can defeat it. A form of the birthday
> > > >     > paradox applies: not "what is the chance of a clash per flow"
> > > >     > but "what is the chance that out of a thousand servers, one of
> > > >     > them gets two big jobs at the same time"?
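
A quick way to put numbers on the birthday-paradox question above, assuming
each big flow lands on a server uniformly and independently at random (a
simplification, since real hashes and real traffic are not that tidy). The
function name is made up for this sketch.

```python
def collision_probability(servers, big_flows):
    """Chance that at least one server receives two or more big flows,
    when each flow is placed uniformly and independently at random."""
    if big_flows > servers:
        return 1.0  # pigeonhole: some server must get two
    p_all_distinct = 1.0
    for i in range(big_flows):
        # The (i+1)-th flow must avoid the i servers already taken.
        p_all_distinct *= (servers - i) / servers
    return 1.0 - p_all_distinct


for flows in (10, 20, 40, 80):
    p = collision_probability(1000, flows)
    print(f"{flows:3d} big flows on 1000 servers: P(clash) = {p:.3f}")
```

Under these assumptions the odds of some server getting two big jobs are
roughly even at only a few dozen concurrent big flows, far fewer than the
number of servers.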
> > > >
> > > > Based upon my reading of the Netflix blogs, they have experimented
> > > > extensively with load sharing, and they really don't care about
> > > > flow labels in their decision process. (Of course, because IPv4 has
> > > > no such thing.)
> > >
> > > Indeed, but that's exactly why we brought in a load sharing expert
> > > to help us with RFC7098. And there are residual problems even in the
> > > ideal world where the flow label is perfect. We played with some ideas
> > > in https://tools.ietf.org/html/draft-tarreau-extend-flow-label-balancing
> > > but it didn't really go anywhere. In a nutshell, what's really needed
> > > is a bidirectional session ID, not a unidirectional flow ID. And
> > > that's not a layer 3 concept.
> > >
> >
> > I think really what you want is an anycast IPv6 service address in DNS
> > for the load balanced service that the client uses to establish the
> > initial transport layer connection, and then a method to announce to
> > the client and then hand off that session to the unicast address of
> > the server actually handling the session. That would make the load
> > balancer with the anycast service address more of a session broker
> > rather than something that is inline with all the sessions' traffic.
> >
> Mark,
>
> Alternatively, a new record type could be added to DNS that returns
> blocks of IPv6 addresses (they could be called BBBB records). A BBBB
> record would be something like a base IPv6 address and an extent. Given
> the enormous size of the IPv6 address space, a single record could
> contain billions of addresses for a service. The client just needs to
> pick an address in the block at random, and load balancing is
> accomplished by routing to backend servers solely based on the
> destination address (each backend server serves some portion of the
> address block). No need for VIPs, anycast, DPI into the transport
> layer, or stateful load balancing. This also has the advantage of
> introducing a lot more bits of entropy for other load balancing
> techniques like ECMP (using a different IPv6 source address for
> every connection would have a similar effect).
>
> Tom
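
A minimal sketch of how the BBBB idea might work, assuming a record is just a
(base address, extent) pair and that backends own equal contiguous slices of
the block. The record format, the addresses, and both function names are
hypothetical; no such DNS record type exists.

```python
import ipaddress
import secrets


def pick_address(base, extent):
    """Client side: choose a uniformly random address in [base, base + extent)."""
    start = int(ipaddress.IPv6Address(base))
    return ipaddress.IPv6Address(start + secrets.randbelow(extent))


def backend_for(addr, base, extent, backends):
    """Network side: stateless mapping from destination address to backend,
    with each backend serving an equal contiguous slice of the block."""
    offset = int(ipaddress.IPv6Address(addr)) - int(ipaddress.IPv6Address(base))
    if not 0 <= offset < extent:
        raise ValueError("address is outside the advertised block")
    return backends[offset * len(backends) // extent]


# Hypothetical BBBB record contents: base address plus extent (2**32 addresses).
BASE = "2001:db8:0:1::"
EXTENT = 2 ** 32
BACKENDS = ["server-a", "server-b", "server-c", "server-d"]

dst = pick_address(BASE, EXTENT)
print(dst, "->", backend_for(dst, BASE, EXTENT, BACKENDS))
```

Because the mapping depends only on the destination address, any router or
load balancer applying it reaches the same answer without per-flow state.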
>
>
> > Multipath TCP would fit the bill, and I assume the multipath
> > extensions for QUIC will too.
> >
> > >    Brian
> > >
> > > >
> > > > It's about how fresh the (disk read) caches on the servers are, what
> > > > content is being streamed, and other things that have nothing to do
> > > > with the packets themselves.
> > > >
> > > > Architecturally with IPv6, if you have an entire /64 (or more) to
> > > > play with and you can statelessly forward packets at wire speed, then
> > > > there are other interesting off-path choices one can make. (For
> > > > instance, assign a new server /128 for each client connection, and
> > > > then when the connection arrives, dynamically map it to a particular
> > > > server. This pushes the state storage from layer 4 to the neighbour
> > > > cache, which might not be a win.)
> > > >
> > > > So I seriously question whether any of this matters to server farms.
> > > >
> > > >     > I am strongly against breaking the flow label just at the time
> > > >     > when the major o/s are starting to set it correctly.
> > > >
> > > > :-)
> > > >
> > > >     > I'm all for fixing the fragmentation problem;
> > > >     > draft-ietf-intarea-frag-fragile
> > > >     > exists for a reason. But not by breaking something else.
> > > >
> > > > My quick read says that it looks great to me.
> > > >
> > > > Again, I don't really think that using the flow label to seed PLPMTUD
> > > > is much of a win, but if it did provide something useful, I think it
> > > > could be done without too much harm.
> > > >
> > > > To reiterate: I don't think the benefit is high enough to warrant the
> > > >    risk, despite the fact that I don't think the risk is as high as
> > > >    you are suggesting.
> > > >
> > > > --
> > > > ]               Never tell me the odds!                 | ipv6 mesh networks [
> > > > ]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
> > > > ]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [
> > > >
> > > >
> > > > --
> > > > Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
> > > >  -= IPv6 IoT consulting =-
> > > >
> > > >
> > > >
> > >
> > > --------------------------------------------------------------------
> > > IETF IPv6 working group mailing list
> > > ipv6@ietf.org
> > > Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
> > > --------------------------------------------------------------------