Re: Never fragment: getting PMTU info transmitted reliably

Brian E Carpenter <brian.e.carpenter@gmail.com> Thu, 17 January 2019 00:30 UTC

Subject: Re: Never fragment: getting PMTU info transmitted reliably
To: "Joel M. Halpern" <jmh@joelhalpern.com>, Michael Richardson <mcr+ietf@sandelman.ca>, IPv6 List <ipv6@ietf.org>
References: <CAOSSMjV0Vazum5OKztWhAhJrjLjXc5w5YGxdzHgbzi7YVSk7rg@mail.gmail.com> <AEA47E27-C0CB-4ABE-8ADE-51E9D599EF8F@gmail.com> <6aae7888-46a4-342d-1d76-10f8b50cebc4@gmail.com> <EC9CC5FE-5215-4105-8A34-B3F123D574B9@employees.org> <4c56f504-7cd7-6323-b14a-d34050d13f4e@foobar.org> <9E6D4A6E-8ABA-4BAB-BEC5-969078323C96@employees.org> <CAAedzxpdF+yhBXfnwUcaQb-HkgdaqXRU3L+S7v8sS1F0OkwM9A@mail.gmail.com> <78a8a0e0-8808-364c-41f7-f81f90362432@gont.com.ar> <CAAedzxpjxhP0nOZVU0CTwA1u3fsPFthrJASjDEfnLcRNvr2gBQ@mail.gmail.com> <c9be798e-5a32-7c3e-a948-9ca2fab30411@si6networks.com> <CAHw9_i+M2-420pykp99LcgMNSG=eeDqsZK8+hN20t_uUdANHfA@mail.gmail.com> <d6e52c30-bbd1-1ee7-144c-fa13a9df5f38@gmail.com> <0f4a6c88-1def-6766-235b-1bcd2cc5e33b@si6networks.com> <CAHw9_i+FB-tb8c+G22FCUxNg9BDpMfwqur8gSn5QaXteBcABZA@mail.gmail.com> <3eead7ba-dcb4-ed52-05bb-a41a5602f251@gmail.com> <14135.1547681760@localhost> <a044c327-d9ce-573e-a158-6c4b157f2d6c@joelhalpern.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Message-ID: <d3ee03ad-bd24-f353-ddc9-c3cf8a4eb89b@gmail.com>
Date: Thu, 17 Jan 2019 13:30:23 +1300
In-Reply-To: <a044c327-d9ce-573e-a158-6c4b157f2d6c@joelhalpern.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/hZZPBYrSfm3HJRwgd9TlZvmFCf4>

On 2019-01-17 13:12, Joel M. Halpern wrote:
> Just to clarify one aspect of the way entropy works in path selection, I want 
> to point out a complication.
> 
> It is not anywhere near enough to have as much entropy data as the 
> number of choices.  The problem is that you need enough randomness so 
> that you can expect a good distribution of flows.  And that even the 
> smaller number of larger flows will likely get distributed across the 
> choices.    Reducing the amount of available entropy can be quite 
> problematic.

Right. And for the server farm case, I don't think it's science fiction
these days to think about hundreds or thousands of servers. Also, if the
load sharing algorithm attempts to ensure that a given server has only
one big job at a time, then a high collision rate in the hash can 
defeat it. A form of the birthday paradox applies: not "what is the
chance of a clash per flow" but "what is the chance that out of a
thousand servers, one of them gets two big jobs at the same time"?
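
As a rough sketch of that birthday-style calculation (my assumptions, not
anything measured: labels are uniformly random, and two big flows that share
a label value land on the same server), the chance of at least one clash
among a handful of concurrent big flows can be compared for 7 and 20 bits.
The function name and the flow counts are purely illustrative.

```python
def clash_probability(flows: int, entropy_bits: int) -> float:
    """Chance that at least two of `flows` uniformly random labels coincide.

    Assumption: each flow draws its label independently and uniformly from
    2**entropy_bits possible values (an idealised hash).
    """
    values = 2 ** entropy_bits
    if flows > values:
        return 1.0  # pigeonhole: more flows than label values
    no_clash = 1.0
    for i in range(flows):
        # The (i+1)-th flow must avoid the i label values already in use.
        no_clash *= (values - i) / values
    return 1.0 - no_clash


if __name__ == "__main__":
    for bits in (7, 20):
        for flows in (2, 10, 50):
            p = clash_probability(flows, bits)
            print(f"{bits:2d} bits, {flows:2d} big flows: P(clash) = {p:.6f}")
```

With two flows this reduces to the per-pair figures quoted below (1 in 128
versus roughly 1 in a million); the gap widens quickly as the number of
concurrent big flows grows.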

I am strongly against breaking the flow label just at the time when
the major o/s are starting to set it correctly.

I'm all for fixing the fragmentation problem; draft-ietf-intarea-frag-fragile
exists for a reason. But not by breaking something else.

   Brian

> 
> Yours,
> Joel
> 
> On 1/16/19 6:36 PM, Michael Richardson wrote:
>>
>> Warren writes about putting MTU info into flow Label:
>>      >> to signal up to a 9K MTU, we would need 13bits (LN(9K-1280)/LN(2) =
>>      >> 12.9).
>>      >> 20 - 13 gives 7 bits (128) for the hash entropy. "7 bits of entropy,
>>      >> evenly distributed, should be enough for anyone", he said, hoping
>>      >> no-one points at
>>      >> the obvious correlation to 640K of RAM...
>>
>> Brian E Carpenter <brian.e.carpenter@gmail.com> wrote:
>>      > I guess it all depends what you expect from the entropy. 20 bits gives
>>      > you a 1-in-a-million chance of a clash. 7 bits gives you a 1-in-128 chance
>>      > of a clash. This probably doesn't matter for a simple ECMP or LAG kind
>>      > of load sharing, but who's to say it doesn't matter for some more
>>      > fancy kind of sharing across a large array of servers?
>>
>> You'd have to have more than 128 servers and/or paths.
>> Maybe one of our SPRING people in this WG can tell us if that's a real
>> problem today.  Obviously, we can't guess if it will be bad in the future,
>> but if it's a problem today, then we can know that immediately.
>>
>> Now when I pushed for draft-ietf-6man-rfc6434-bis-09 and RFC8200 to say
>> that PLPMTUD be made a MUST, I got various push backs that amounted to:
>>    1) we don't have enough evidence yet.
>>    2) it doesn't work for UDP and other traffic.
>>
>> (1) turned out to be a real issue. I thought that some of the big players
>>      could easily get, or already would have, that kind of evidence.
>>      (Linux does not ship with PLPMTUD on by default.  If you want it, btw,
>>      sysctl -w net.ipv4.tcp_mtu_probing=2.  Yes. ipv4. It affects both)
>>      Turns out I was told that they always set their TCP segment size such that
>>      they likely will never fragment for v4 or v6, because due to hardware
>>      Transmit Offload, the cost of missing a tx-op exceeds the benefit of
>>      making the packet slightly bigger.  I suspect that this is true in
>>      general for UDP and QUIC traffic too.
>>      I imagine next-generation 10G NICs might offer QUIC offload, including
>>      doing the crypto.  That's what I'd be coding if I worked in that space.
>>
>> (2) I will admit that I personally don't care that much about UDP traffic,
>>      except in that it lets me run IPsec through NAT44s.   I know that I use
>>      QUIC and WebRTC regularly, and that corporate enterprise users get
>>      screwed by lack of UDP regularly.
>>
>> So I think the question is: is there really a problem that needs to be solved?
>> Maybe I will have to go back and read the beginning of this thread again to
>> recall what the issue was.   Does this belong in SPUD?
>>
>> I wish that SCTP had flown higher...
>>
>> --
>> Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
>>   -= IPv6 IoT consulting =-
>>
>>
>> --------------------------------------------------------------------
>> IETF IPv6 working group mailing list
>> ipv6@ietf.org
>> Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
>> --------------------------------------------------------------------
>>
> 