Re: Never fragment: getting PMTU info transmitted reliably

Erik Kline <ek@loon.co> Thu, 17 January 2019 03:18 UTC

MIME-Version: 1.0
References: <CAOSSMjV0Vazum5OKztWhAhJrjLjXc5w5YGxdzHgbzi7YVSk7rg@mail.gmail.com> <6aae7888-46a4-342d-1d76-10f8b50cebc4@gmail.com> <EC9CC5FE-5215-4105-8A34-B3F123D574B9@employees.org> <4c56f504-7cd7-6323-b14a-d34050d13f4e@foobar.org> <9E6D4A6E-8ABA-4BAB-BEC5-969078323C96@employees.org> <CAAedzxpdF+yhBXfnwUcaQb-HkgdaqXRU3L+S7v8sS1F0OkwM9A@mail.gmail.com> <78a8a0e0-8808-364c-41f7-f81f90362432@gont.com.ar> <CAAedzxpjxhP0nOZVU0CTwA1u3fsPFthrJASjDEfnLcRNvr2gBQ@mail.gmail.com> <c9be798e-5a32-7c3e-a948-9ca2fab30411@si6networks.com> <CAHw9_i+M2-420pykp99LcgMNSG=eeDqsZK8+hN20t_uUdANHfA@mail.gmail.com> <d6e52c30-bbd1-1ee7-144c-fa13a9df5f38@gmail.com> <0f4a6c88-1def-6766-235b-1bcd2cc5e33b@si6networks.com> <CAHw9_i+FB-tb8c+G22FCUxNg9BDpMfwqur8gSn5QaXteBcABZA@mail.gmail.com> <14135.1547681760@localhost> <a044c327-d9ce-573e-a158-6c4b157f2d6c@joelhalpern.com> <24583.1547692781@localhost> <116fbbeb-c191-cd57-5998-1d80db1c9917@gmail.com>
In-Reply-To: <116fbbeb-c191-cd57-5998-1d80db1c9917@gmail.com>
Reply-To: ek@loon.co
From: Erik Kline <ek@loon.co>
Date: Wed, 16 Jan 2019 19:17:51 -0800
Message-ID: <CAAedzxrg590zfzvr+RJzBV7LAFzpADeY4c8iGxAgT25muGs63g@mail.gmail.com>
Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, IPv6 List <ipv6@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/lR9FGZKMjq-B3stDjHBr3VnMnwY>

On Wed, 16 Jan 2019 at 18:56, Brian E Carpenter
<brian.e.carpenter@gmail.com> wrote:
>
> On 2019-01-17 15:39, Michael Richardson wrote:
> >
> > Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
> >     > On 2019-01-17 13:12, Joel M. Halpern wrote:
> >     >> Just to clarify one aspect of the way entropy works in path selection, I want
> >     >> to point out a complication.
> >     >>
> >     >> It is not anywhere near enough to have as much entropy data as the
> >     >> number of choices.  The problem is that you need enough randomness so
> >     >> that you can expect a good distribution of flows.  And that even the
> >     >> smaller number of larger flows will likely get distributed across the
> >     >> choices.    Reducing the amount of available entropy can be quite
> >     >> problematic.
> >
> >     > Right. And for the server farm case, I don't think it's science fiction
> >     > these days to think about hundreds or thousands of servers. Also, if the
> >     > load sharing algorithm attempts to ensure that a given server has only
> >     > one big job at a time, then a high collision rate in the hash can
> >     > defeat it. A form of the birthday paradox applies: not "what is the
> >     > chance of a clash per flow" but "what is the chance that out of a
> >     > thousand servers, one of them gets two big jobs at the same time"?
> >
> > Based upon my reading of the Netflix blogs, they have experimented
> > extensively with the load sharing, and they really don't care about
> > flow-labels in their decision process. (Of course, because IPv4 has
> > no such things)
>
> Indeed, but that's exactly why we brought in a load sharing expert
> to help us with RFC7098. And there are residual problems even in the
> ideal world where the flow label is perfect. We played with some ideas
> in https://tools.ietf.org/html/draft-tarreau-extend-flow-label-balancing
> but it didn't really go anywhere. In a nutshell, what's really needed
> is a bidirectional session ID, not a unidirectional flow ID. And
> that's not a layer 3 concept.
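
For a rough feel of the birthday-paradox point quoted above, here is a
minimal sketch. It assumes an idealized uniform hash and that each big
flow lands on a server independently; the function name and the example
numbers are illustrative only, not taken from any real load balancer.

```python
def collision_probability(big_flows: int, servers: int) -> float:
    """Chance that at least one server receives two or more big flows,
    assuming every flow is hashed uniformly and independently."""
    if big_flows > servers:
        return 1.0  # pigeonhole: some server must get two flows
    p_no_collision = 1.0
    for already_placed in range(big_flows):
        # The next flow must land on a server that is still free.
        p_no_collision *= (servers - already_placed) / servers
    return 1.0 - p_no_collision


if __name__ == "__main__":
    # Hypothetical sizes: 1000 servers and a handful of big flows.
    for flows in (10, 40, 100):
        print(flows, round(collision_probability(flows, 1000), 3))
```

Under these assumptions the collision chance grows roughly with the
square of the number of big flows, which is why having only about as
much entropy as there are choices is not enough.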

An operational convention that treats the flow label like the TCP MSS
option seems to me potentially far more useful than trying to help
Internet hypergiants with their load balancing. All the more so if load
balancing is better solved with information from layers above L3 (they
will need to load balance on QUIC connection IDs now, too).

(Writing a patch to the Linux kernel to stamp the MTU of the outgoing
interface into the flow label doesn't look like it's too hard.)
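
To make the idea concrete, here is a userspace sketch of the encoding
only, not a kernel patch. It assumes the MTU is written verbatim into
the 20-bit flow label; that encoding and all function names are my own
illustration, not an agreed convention. The layout of the first 32 bits
of the IPv6 header (4-bit version, 8-bit traffic class, 20-bit flow
label) is from RFC 8200.

```python
import struct

FLOW_LABEL_BITS = 20
FLOW_LABEL_MAX = (1 << FLOW_LABEL_BITS) - 1  # 0xFFFFF
IPV6_MIN_MTU = 1280  # minimum link MTU required by IPv6


def mtu_to_flow_label(mtu: int) -> int:
    """Assumed encoding: the flow label carries the MTU value as-is."""
    if not IPV6_MIN_MTU <= mtu <= FLOW_LABEL_MAX:
        raise ValueError(f"MTU {mtu} does not fit this encoding")
    return mtu


def pack_first_word(traffic_class: int, flow_label: int) -> bytes:
    """Build the first 4 bytes of an IPv6 header in network byte order."""
    word = (6 << 28) | ((traffic_class & 0xFF) << 20) | (flow_label & FLOW_LABEL_MAX)
    return struct.pack("!I", word)


def unpack_flow_label(header: bytes) -> int:
    """Read the flow label back out of the first 4 header bytes."""
    (word,) = struct.unpack("!I", header[:4])
    return word & FLOW_LABEL_MAX


if __name__ == "__main__":
    first_word = pack_first_word(0, mtu_to_flow_label(1500))
    print(first_word.hex(), unpack_flow_label(first_word))
```

Any realistic interface MTU fits in 20 bits with room to spare, so a
real convention could reserve a few bits for other purposes.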