RE: Never fragment: getting PMTU info transmitted reliably

"Lubashev, Igor" <ilubashe@akamai.com> Thu, 17 January 2019 05:48 UTC

From: "Lubashev, Igor" <ilubashe@akamai.com>
To: "tom@herbertland.com" <tom@herbertland.com>, "markzzzsmith@gmail.com" <markzzzsmith@gmail.com>
CC: "ipv6@ietf.org" <ipv6@ietf.org>, "mcr+ietf@sandelman.ca" <mcr+ietf@sandelman.ca>
Subject: RE: Never fragment: getting PMTU info transmitted reliably
Date: Thu, 17 Jan 2019 05:48:47 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/vhluFUUHMdIn7nhKjxsO9kfOrzA>

There is a lot of complexity in injecting entropy into IP addresses. As long as hashing on the 4-tuple for TCP and UDP traffic, the 2-tuple plus SPI for IPsec, the 2-tuple plus key for GRE, etc. "just works", there is little incentive to deploy complex solutions that involve DNS, neighbour caches, etc. The hope for new protocols on top of IPv6 is the flow label, because it is simple for everyone to use correctly: the sender, the receiver, and the middleboxes.
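To illustrate the contrast, here is a minimal sketch of how a middlebox might pick one of several equal-cost paths, either from addresses plus transport ports or from addresses plus the IPv6 flow label (the 3-tuple described in RFC 6437). The function names, the use of CRC-32 as the hash, and the example addresses are illustrative assumptions, not what any particular router implements.

```python
import ipaddress
import struct
import zlib


def pick_path(key: bytes, num_paths: int) -> int:
    """Map a flow key to one of num_paths equal-cost paths.

    CRC-32 is an illustrative choice; real routers use their own hashes.
    Any deterministic hash keeps all packets of one flow on one path.
    """
    return zlib.crc32(key) % num_paths


def four_tuple_key(src: str, dst: str, sport: int, dport: int) -> bytes:
    """TCP/UDP key: addresses plus ports.

    The middlebox must locate and parse the transport header to get the
    ports, which fails for non-first fragments and for transports whose
    header layout it does not know.
    """
    return (ipaddress.ip_address(src).packed
            + ipaddress.ip_address(dst).packed
            + struct.pack("!HH", sport, dport))


def flow_label_key(src: str, dst: str, flow_label: int) -> bytes:
    """IPv6 key in the spirit of RFC 6437: addresses plus the 20-bit
    flow label, all taken from the fixed IPv6 header."""
    if not 0 <= flow_label < 2**20:
        raise ValueError("the flow label is a 20-bit field")
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I", flow_label))


if __name__ == "__main__":
    src, dst = "2001:db8::1", "2001:db8::2"
    print(pick_path(four_tuple_key(src, dst, 50000, 443), 8))
    print(pick_path(flow_label_key(src, dst, 0x12345), 8))
```

The second key needs nothing beyond the fixed IPv6 header, which is why it works the same way for any transport, provided the sender sets the label.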

- Igor

-----Original Message-----
From: Tom Herbert [tom@herbertland.com]
Received: Wednesday, 16 Jan 2019, 11:36PM
To: Mark Smith [markzzzsmith@gmail.com]
CC: Michael Richardson [mcr+ietf@sandelman.ca]; IPv6 List [ipv6@ietf.org]
Subject: Re: Never fragment: getting PMTU info transmitted reliably

On Wed, Jan 16, 2019 at 7:28 PM Mark Smith <markzzzsmith@gmail.com> wrote:
>
> On Thu, 17 Jan 2019 at 13:57, Brian E Carpenter
> <brian.e.carpenter@gmail.com> wrote:
> >
> > On 2019-01-17 15:39, Michael Richardson wrote:
> > >
> > > Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
> > >     > On 2019-01-17 13:12, Joel M. Halpern wrote:
> > >     >> Just to clarify one aspect of the way entropy is used in path selection, I want
> > >     >> to point out a complication.
> > >     >>
> > >     >> It is not anywhere near enough to have as much entropy data as the
> > >     >> number of choices.  The problem is that you need enough randomness so
> > >     >> that you can expect a good distribution of flows, and so that even the
> > >     >> smaller number of larger flows will likely get distributed across the
> > >     >> choices.    Reducing the amount of available entropy can be quite
> > >     >> problematic.
> > >
> > >     > Right. And for the server farm case, I don't think it's science fiction
> > >     > these days to think about hundreds or thousands of servers. Also, if the
> > >     > load sharing algorithm attempts to ensure that a given server has only
> > >     > one big job at a time, then a high collision rate in the hash can
> > >     > defeat it. A form of the birthday paradox applies: not "what is the
> > >     > chance of a clash per flow" but "what is the chance that out of a
> > >     > thousand servers, one of them gets two big jobs at the same time"?
> > >
> > > Based upon my reading of the Netflix blogs, they have experimented
> > > extensively with load sharing, and they really don't care about
> > > flow labels in their decision process. (Of course, because IPv4 has
> > > no such thing.)
> >
> > Indeed, but that's exactly why we brought in a load sharing expert
> > to help us with RFC7098. And there are residual problems even in the
> > ideal world where the flow label is perfect. We played with some ideas
> > in https://tools.ietf.org/html/draft-tarreau-extend-flow-label-balancing
> > but it didn't really go anywhere. In a nutshell, what's really needed
> > is a bidirectional session ID, not a unidirectional flow ID. And
> > that's not a layer 3 concept.
> >
>
> I think really what you want is an anycast IPv6 service address in DNS
> for the load balanced service that the client uses to establish the
> initial transport layer connection, and then a method to announce to
> the client and then hand off that session to the unicast address of
> the server actually handling the session. That would make the load
> balancer with the anycast service address more of a session broker
> rather than something that is inline with all the sessions' traffic.
>
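Brian's birthday-paradox argument quoted above can be made concrete with a short calculation. This is a minimal sketch assuming the idealised case in which each big job is hashed independently and uniformly onto one of the servers; the function names are hypothetical.

```python
import math


def collision_probability(big_jobs: int, servers: int) -> float:
    """Exact probability that at least one server receives two or more
    of `big_jobs` jobs, each placed uniformly at random (birthday problem)."""
    if big_jobs > servers:
        return 1.0  # pigeonhole: a collision is certain
    p_no_collision = 1.0
    for i in range(big_jobs):
        # the i-th job must land on one of the (servers - i) still-free servers
        p_no_collision *= (servers - i) / servers
    return 1.0 - p_no_collision


def approx_collision_probability(big_jobs: int, servers: int) -> float:
    """Standard approximation 1 - exp(-n(n-1) / (2m))."""
    return 1.0 - math.exp(-big_jobs * (big_jobs - 1) / (2 * servers))


if __name__ == "__main__":
    for n in (10, 38, 100):
        print(n, round(collision_probability(n, 1000), 3),
              round(approx_collision_probability(n, 1000), 3))
```

With a thousand servers, only about 38 concurrent big jobs are enough for roughly even odds that some server ends up with two of them, even with a perfect hash.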
Mark,

Alternatively, a new record type could be added to DNS that returns
blocks of IPv6 addresses (could be called BBBB records). The BBBB record
would be something like a base IPv6 address and an extent. Given the
enormous size of the IPv6 address space, a single record could contain
billions of addresses for a service. The client just needs to pick an
address in the block at random, and load balancing is accomplished by
routing to backend servers solely based on the destination address
(each backend server serves some portion of the address block). No need
for VIPs, anycast, DPI into the transport layer, or stateful load
balancing. This also has the advantage of introducing many more bits of
entropy for other load balancing techniques like ECMP (using a different
IPv6 source address for every connection would have a similar effect).
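The BBBB idea can be sketched in a few lines. The record type is hypothetical, so this minimal sketch assumes the "base address plus extent" is a power-of-two block that can be written as an IPv6 prefix; the block value (from the 2001:db8::/32 documentation range) and the function names are illustrative only.

```python
import ipaddress
import random

# Hypothetical "BBBB" answer: a base IPv6 address plus an extent. Modelled
# here as a prefix, which assumes the extent is a power of two.
BBBB_BLOCK = ipaddress.IPv6Network("2001:db8:1234::/96")  # 2**32 addresses


def client_pick_address(block: ipaddress.IPv6Network,
                        rng: random.Random) -> ipaddress.IPv6Address:
    """Client side: pick a destination uniformly at random in the block."""
    host_bits = block.max_prefixlen - block.prefixlen
    return block.network_address + rng.getrandbits(host_bits)


def backend_for(dst: ipaddress.IPv6Address,
                block: ipaddress.IPv6Network, num_backends: int) -> int:
    """Routing side: each backend owns one contiguous slice of the block,
    so the choice depends on the destination address alone (no per-flow
    state, no transport-layer inspection)."""
    if dst not in block:
        raise ValueError(f"{dst} is not in {block}")
    offset = int(dst) - int(block.network_address)
    return offset * num_backends // block.num_addresses


if __name__ == "__main__":
    rng = random.Random(2019)
    for _ in range(3):
        dst = client_pick_address(BBBB_BLOCK, rng)
        print(dst, "-> backend", backend_for(dst, BBBB_BLOCK, 8))
```

With a power-of-two number of backends, each slice is itself a prefix (here eight /99s), so the same split could be expressed as ordinary routes and forwarded by longest-prefix match.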

Tom


> Multipath TCP would fit the bill, and I assume the multipath
> extensions for QUIC will too.
>
> >    Brian
> >
> > >
> > > It's about how fresh the (disk read) caches on the servers are, what content
> > > is being streamed, and other things that have nothing to do with the
> > > packets themselves.
> > >
> > > Architecturally with IPv6, if you have an entire /64 (or more) to play with
> > > and you can statelessly forward packets at wire speed, then there are
> > > other interesting off-path choices one can make. (For instance, assign a
> > > new server /128 for each client connection, and then when the connection
> > > arrives, dynamically map it to a particular server. This pushes the state
> > > storage from layer 4 to the neighbour cache, which might not be a win.)
> > >
> > > So I seriously question whether any of this matters to server farms.
> > >
> > >     > I am strongly against breaking the flow label just at the time when
> > >     > the major o/s are starting to set it correctly.
> > >
> > > :-)
> > >
> > >     > I'm all for fixing the fragmentation problem;
> > >     > draft-ietf-intarea-frag-fragile
> > >     > exists for a reason. But not by breaking something else.
> > >
> > > My quick read says that it looks great to me.
> > >
> > > Again, I don't really think that using the flow label to seed PLPMTUD
> > > is much of a win, but if it did provide something useful, I think it could be
> > > done without too much harm.
> > >
> > > To reiterate: I don't think the benefit is high enough to warrant the
> > >    risk, despite the fact that I don't think the risk is as high as you
> > >    are suggesting.
> > >
> > > --
> > > ]               Never tell me the odds!                 | ipv6 mesh networks [
> > > ]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
> > > ]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [
> > >
> > >
> > > --
> > > Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
> > >  -= IPv6 IoT consulting =-
> > >
> > >
> > >
> >
> > --------------------------------------------------------------------
> > IETF IPv6 working group mailing list
> > ipv6@ietf.org
> > Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
> > --------------------------------------------------------------------