Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-datagram-plpmtud-02 to end 10th January 2018

"Philipp S. Tiesel" <phils@in-panik.de> Thu, 25 January 2018 13:02 UTC

Return-Path: <phils@in-panik.de>
X-Original-To: quic@ietfa.amsl.com
Delivered-To: quic@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 576CE129966; Thu, 25 Jan 2018 05:02:02 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.598
X-Spam-Level:
X-Spam-Status: No, score=-2.598 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xUApPTFCHVMX; Thu, 25 Jan 2018 05:02:00 -0800 (PST)
Received: from einhorn-mail.in-berlin.de (einhorn.in-berlin.de [192.109.42.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id B157012751F; Thu, 25 Jan 2018 05:01:59 -0800 (PST)
X-Envelope-From: phils@in-panik.de
Received: from x-berg.in-berlin.de (x-change.in-berlin.de [217.197.86.40]) by einhorn.in-berlin.de (8.14.4/8.14.4/Debian-8+deb8u2) with ESMTP id w0PD0GUk021077 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 25 Jan 2018 14:00:16 +0100
Received: from [2001:638:809:ff1f::8295:dc66] by x-berg.in-berlin.de with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from <phils@in-panik.de>) id 1eeh80-0003b3-TL; Thu, 25 Jan 2018 13:59:40 +0100
From: "Philipp S. Tiesel" <phils@in-panik.de>
Message-Id: <FAF66B62-B39F-47E6-AE11-9380F45EE722@in-panik.de>
Content-Type: multipart/alternative; boundary="Apple-Mail=_EBD182D4-1C4F-4445-BBBC-A549E4DDC1AD"
Mime-Version: 1.0 (Mac OS X Mail 11.2 \(3445.5.20\))
Subject: Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-datagram-plpmtud-02 to end 10th January 2018
Date: Thu, 25 Jan 2018 14:00:14 +0100
In-Reply-To: <20180125104057.GA9797@tom-desk.erg.abdn.ac.uk>
Cc: Christian Huitema <huitema@huitema.net>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Michael Tuexen <michael.tuexen@lurchi.franken.de>, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>, QUIC WG <quic@ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Tom Jones <tom@erg.abdn.ac.uk>
References: <CE03DB3D7B45C245BCA0D243277949362FE164EB@MX307CL04.corp.emc.com> <9837331A-76DF-4137-9612-CC653E869553@netapp.com> <5A563390.8050403@erg.abdn.ac.uk> <4123465B-CFE2-410E-BE1D-E09DC189F280@huitema.net> <9493C5B9-A79E-4311-8C07-67E14564B1ED@lurchi.franken.de> <838694fc-bfb6-9804-b8fa-ad7aa47e5472@huitema.net> <CAN1APdfiUTbnW2SrE_DXB2bOWqjd5GKjmZwEF11pBsHh9HHxZQ@mail.gmail.com> <CAN1APddumrZPELjeFR5hnwP3CuGUNyzZ_rZy=UJHqEZh0iZcWA@mail.gmail.com> <150452ec-ba36-f88a-4c29-013fea99c9c2@huitema.net> <20180125104057.GA9797@tom-desk.erg.abdn.ac.uk>
X-Mailer: Apple Mail (2.3445.5.20)
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/SBEMXKx_CBBdJrxEgvps5ywu0Is>
X-BeenThere: quic@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Main mailing list of the IETF QUIC working group <quic.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/quic>, <mailto:quic-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/quic/>
List-Post: <mailto:quic@ietf.org>
List-Help: <mailto:quic-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/quic>, <mailto:quic-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 25 Jan 2018 13:02:02 -0000


> On 25. Jan 2018, at 11:40, Tom Jones <tom@erg.abdn.ac.uk> wrote:
> 
> On Sun, Jan 21, 2018 at 06:42:27PM -1000, Christian Huitema wrote:
>> 
>> 
>> On 1/20/2018 11:50 PM, Mikkel Fahnøe Jørgensen wrote:
>>> Here is a Fibonacci variant the grows slower.
>>> Not sure it is any better, but the intention is to avoid probing very
>>> large packets too early.
>>> It could probably be applied recursively to avoid bin search altogether.
>>> The same idea might be applicable to reducing the congestion window as
>>> opposed to doubling or halving.
>>> 
>>> /* Fibonacci variant */
>>> /* roughly like this - untestet */
>>> unit = 10 /* min probe increment */
>>> a = minPMTU / unit
>>> b = maxPMTU / unit
>>> k1 = initDelta /* 1 or larger, e.g. a / 4 */
>>> k0 = 0
>>> while a + k0 <= b
>>>  k2 = k0 + k1
>>>  n = binsearch(a + k0, min(a + k1 - 1, b), unit)
>>>  if n return n
>>>  k0 = k1
>>>  k1 = k2
>>> end
>>> /* binsearch probes multiples of unit and calls
>>>   updatePMTUEstimate(n) whenever n is a larger
>>>   valid probe than previously reported */
>>> 
>> Yes, there are multiple ways to go about sending probes. In the QUIC
>> case, the peer sends its own version of MAX MTU during the handshake. So
>> what I did was simply try the peer's MTU, and if that fail do a binary
>> search between that and the initial value. The peer MTU is typically
>> between 1470 and 1500, so the binary search converges very quickly.
> 
> Thanks for the comment, this is an interesting approach we probably need
> some discussion of the probe size search algorithm in the draft. 

Ack; – I think the main question is whether you want a generic
algorithm or something that is optimised for today’s Internet,
but also somehow works otherwise.

> 
> Is the ~1500 value something you have manually configured or have you
> found it from the host? maxdgram on my machine is 9216.

1500 is default for Ethernet without Jumbo Frames.

>> There are certainly ways to do better, especially if the peer sets its
>> own MTU to some large value, e.g. in a data center environment. For
>> example, one could have a table of probability of finding some
>> particular MTU. And then apply logic based on how much the connection
>> wants to send, what it costs to send a probe, and what is the likely
>> gain if the probe succeeds. You could use that to choose the next probed
>> value. Or to decide to stop probing if no plausible next value can bring
>> a potential gain. Or to send several probes in parallel...
>> 
> 
> RFC1191 suggested a table with some common values. I would try:
> 
> 	- validating the base ~1280
> 	- probing for ~1500
> 	- if that works probing for the Max MTU from the peer
> 
> then falling back to a search algorithm to close the gap if it is a
> sensible size.


I think the table from RFC1191 will need updating, but I think this
approach works best:
 - validating the base ~1280
 - probe table of common MTUs (linear or binary search) 
   including Max MTU 
 - do some search between the largest working / smallest non-working MTU

Whereby I would use some exponential search variant (not tested):
 
 MTUsearch (a,b,step):
   if probeMTU(b-step):
      MTU := b-step
      a   := b-step
      step = step/2
   else
      b = b-step
      step = step*2
   if b > a
      MTUsearch(a,b,step)    
 
Rational behind that is that MTUs will usually be LL-MTU (hopefully in the table) minus some encapsulation.

AVE!
  Philipp S. Tiesel / phils…