Re: Never fragment: getting PMTU info transmitted reliably

Brian E Carpenter <brian.e.carpenter@gmail.com> Thu, 17 January 2019 04:36 UTC

Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: ek@loon.co
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, IPv6 List <ipv6@ietf.org>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Date: Thu, 17 Jan 2019 17:36:14 +1300
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/v20qVYxHYUzR5uMGm88z2HHrPUs>
List-Id: "IPv6 Maintenance Working Group \(6man\)" <ipv6.ietf.org>

Hi Erik,

On 2019-01-17 16:17, Erik Kline wrote:
> On Wed, 16 Jan 2019 at 18:56, Brian E Carpenter
> <brian.e.carpenter@gmail.com> wrote:
>>
>> On 2019-01-17 15:39, Michael Richardson wrote:
>>>
>>> Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
>>>     > On 2019-01-17 13:12, Joel M. Halpern wrote:
>>>     >> Just to clarify one aspect of the way entropy works in path selection, I want
>>>     >> to point out a complication.
>>>     >>
>>>     >> It is not anywhere near enough to have as much entropy data as the
>>>     >> number of choices.  The problem is that you need enough randomness so
>>>     >> that you can expect a good distribution of flows.  And that even the
>>>     >> smaller number of larger flows will likely get distributed across the
>>>     >> choices.  Reducing the amount of available entropy can be quite
>>>     >> problematic.
>>>
>>>     > Right. And for the server farm case, I don't think it's science fiction
>>>     > these days to think about hundreds or thousands of servers. Also, if the
>>>     > load sharing algorithm attempts to ensure that a given server has only
>>>     > one big job at a time, then a high collision rate in the hash can
>>>     > defeat it. A form of the birthday paradox applies: not "what is the
>>>     > chance of a clash per flow" but "what is the chance that out of a
>>>     > thousand servers, one of them gets two big jobs at the same time"?
>>>
>>> Based upon my reading of the netflix blogs, they have experimented
>>> extensively with the load sharing, and they really don't care about
>>> flow-labels in their decision process. (Of course, because IPv4 has
>>> no such thing.)
>>
>> Indeed, but that's exactly why we brought in a load sharing expert
>> to help us with RFC7098. And there are residual problems even in the
>> ideal world where the flow label is perfect. We played with some ideas
>> in https://tools.ietf.org/html/draft-tarreau-extend-flow-label-balancing
>> but it didn't really go anywhere. In a nutshell, what's really needed
>> is a bidirectional session ID, not a unidirectional flow ID. And
>> that's not a layer 3 concept.
> 
> The potential utility of an operational convention that treats the
> flowlabel akin to the TCP MSS field would seem to me to be much higher
> than trying to help internet hyper giants with their load balancing.
> All the more so if the load balancing is better solved with info from
> layers above L3 (going to need to load balance on QUIC connection IDs
> now, too).

The flow label is a proxy for the information above layer 3. As we
know, that information is unavailable in many cases, because of
encryption, intervening extension headers, or fragmentation.
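
As a rough illustration of what "proxy" means here, the sketch below
classifies a packet using only the IPv6 fixed header: source address,
destination address and the 20-bit flow label. It never looks past
layer 3, so it still works when the transport header is encrypted,
fragmented away, or hidden behind extension headers. The function
names, the use of SHA-256 and the modulo server selection are
assumptions made for this example, not taken from RFC6437 or from any
real load balancer.

```python
import hashlib
import ipaddress
import struct


def parse_flow_tuple(packet: bytes):
    """Return (src, dst, flow_label) from the 40-byte IPv6 fixed header."""
    if len(packet) < 40:
        raise ValueError("truncated IPv6 header")
    first_word = struct.unpack("!I", packet[:4])[0]
    if first_word >> 28 != 6:
        raise ValueError("not an IPv6 packet")
    flow_label = first_word & 0xFFFFF  # low 20 bits of the first 32-bit word
    src = ipaddress.IPv6Address(packet[8:24])
    dst = ipaddress.IPv6Address(packet[24:40])
    return src, dst, flow_label


def pick_server(packet: bytes, n_servers: int) -> int:
    """Hypothetical stateless choice: hash the 3-tuple, reduce modulo n_servers."""
    src, dst, flow_label = parse_flow_tuple(packet)
    key = src.packed + dst.packed + flow_label.to_bytes(3, "big")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def build_header(src: str, dst: str, flow_label: int) -> bytes:
    """Build a bare IPv6 fixed header for testing (no payload)."""
    first_word = (6 << 28) | (flow_label & 0xFFFFF)
    # Payload length 0, Next Header 59 (No Next Header), Hop Limit 64.
    return (struct.pack("!IHBB", first_word, 0, 59, 64)
            + ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed)
```

Every packet of a flow carries the same three values, so pick_server()
returns the same index for all of them. Its usefulness depends entirely
on the sender having set a well-distributed flow label.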

Certainly a QUIC sender must set the flow label; RFC6437 is agnostic
about the transport layer. It says so explicitly, in text written years
before QUIC was invented:

"  Additionally, if classifiers depend only on IP-layer headers, later
   introduction of alternative transport-layer protocols will be easier."

> (Writing a patch to the Linux kernel to stamp the MTU of the outgoing
> interface into the flow label doesn't look like it's too hard.)

No, but it would violate the relevant IETF standards.

   Brian