Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-datagram-plpmtud-02 to end 10th January 2018

"Philipp S. Tiesel" <phils@in-panik.de> Thu, 25 January 2018 13:02 UTC

From: "Philipp S. Tiesel" <phils@in-panik.de>
Message-Id: <FAF66B62-B39F-47E6-AE11-9380F45EE722@in-panik.de>
Content-Type: multipart/alternative; boundary="Apple-Mail=_EBD182D4-1C4F-4445-BBBC-A549E4DDC1AD"
Mime-Version: 1.0 (Mac OS X Mail 11.2 \(3445.5.20\))
Subject: Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-datagram-plpmtud-02 to end 10th January 2018
Date: Thu, 25 Jan 2018 14:00:14 +0100
In-Reply-To: <20180125104057.GA9797@tom-desk.erg.abdn.ac.uk>
Cc: Christian Huitema <huitema@huitema.net>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Michael Tuexen <michael.tuexen@lurchi.franken.de>, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>, QUIC WG <quic@ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
To: Tom Jones <tom@erg.abdn.ac.uk>
References: <CE03DB3D7B45C245BCA0D243277949362FE164EB@MX307CL04.corp.emc.com> <9837331A-76DF-4137-9612-CC653E869553@netapp.com> <5A563390.8050403@erg.abdn.ac.uk> <4123465B-CFE2-410E-BE1D-E09DC189F280@huitema.net> <9493C5B9-A79E-4311-8C07-67E14564B1ED@lurchi.franken.de> <838694fc-bfb6-9804-b8fa-ad7aa47e5472@huitema.net> <CAN1APdfiUTbnW2SrE_DXB2bOWqjd5GKjmZwEF11pBsHh9HHxZQ@mail.gmail.com> <CAN1APddumrZPELjeFR5hnwP3CuGUNyzZ_rZy=UJHqEZh0iZcWA@mail.gmail.com> <150452ec-ba36-f88a-4c29-013fea99c9c2@huitema.net> <20180125104057.GA9797@tom-desk.erg.abdn.ac.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/SBEMXKx_CBBdJrxEgvps5ywu0Is>
Precedence: list


> On 25. Jan 2018, at 11:40, Tom Jones <tom@erg.abdn.ac.uk> wrote:
> 
> On Sun, Jan 21, 2018 at 06:42:27PM -1000, Christian Huitema wrote:
>> 
>> 
>> On 1/20/2018 11:50 PM, Mikkel Fahnøe Jørgensen wrote:
>>> Here is a Fibonacci variant the grows slower.
>>> Not sure it is any better, but the intention is to avoid probing very
>>> large packets too early.
>>> It could probably be applied recursively to avoid bin search altogether.
>>> The same idea might be applicable to reducing the congestion window as
>>> opposed to doubling or halving.
>>> 
>>> /* Fibonacci variant */
>>> /* roughly like this - untestet */
>>> unit = 10 /* min probe increment */
>>> a = minPMTU / unit
>>> b = maxPMTU / unit
>>> k1 = initDelta /* 1 or larger, e.g. a / 4 */
>>> k0 = 0
>>> while a + k0 <= b
>>>  k2 = k0 + k1
>>>  n = binsearch(a + k0, min(a + k1 - 1, b), unit)
>>>  if n return n
>>>  k0 = k1
>>>  k1 = k2
>>> end
>>> /* binsearch probes multiples of unit and calls
>>>   updatePMTUEstimate(n) whenever n is a larger
>>>   valid probe than previously reported */
>>> 
>> Yes, there are multiple ways to go about sending probes. In the QUIC
>> case, the peer sends its own version of MAX MTU during the handshake. So
>> what I did was simply try the peer's MTU, and if that fail do a binary
>> search between that and the initial value. The peer MTU is typically
>> between 1470 and 1500, so the binary search converges very quickly.
> 
> Thanks for the comment, this is an interesting approach we probably need
> some discussion of the probe size search algorithm in the draft. 

Ack; – I think the main question is whether you want a generic
algorithm or something that is optimised for today’s Internet,
but also somehow works otherwise.

> 
> Is the ~1500 value something you have manually configured or have you
> found it from the host? maxdgram on my machine is 9216.

1500 is default for Ethernet without Jumbo Frames.

>> There are certainly ways to do better, especially if the peer sets its
>> own MTU to some large value, e.g. in a data center environment. For
>> example, one could have a table of probability of finding some
>> particular MTU. And then apply logic based on how much the connection
>> wants to send, what it costs to send a probe, and what is the likely
>> gain if the probe succeeds. You could use that to choose the next probed
>> value. Or to decide to stop probing if no plausible next value can bring
>> a potential gain. Or to send several probes in parallel...
>> 
> 
> RFC1191 suggested a table with some common values. I would try:
> 
> 	- validating the base ~1280
> 	- probing for ~1500
> 	- if that works probing for the Max MTU from the peer
> 
> then falling back to a search algorithm to close the gap if it is a
> sensible size.


I think the table from RFC1191 will need updating, but I think this
approach works best:
 - validating the base ~1280
 - probe table of common MTUs (linear or binary search) 
   including Max MTU 
 - do some search between the largest working / smallest non-working MTU

Whereby I would use some exponential search variant (not tested):
 
 MTUsearch (a,b,step):
   if probeMTU(b-step):
      MTU := b-step
      a   := b-step
      step = step/2
   else
      b = b-step
      step = step*2
   if b > a
      MTUsearch(a,b,step)    
 
Rational behind that is that MTUs will usually be LL-MTU (hopefully in the table) minus some encapsulation.

AVE!
  Philipp S. Tiesel / phils…

Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Eggert, Lars
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Gorry Fairhurst
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Christian Huitema
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Michael Tuexen
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Christian Huitema
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Mikkel Fahnøe Jørgensen
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Mikkel Fahnøe Jørgensen
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Christian Huitema
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Tom Jones
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Philipp S. Tiesel
Re: [tsvwg] Adoption call: draft-fairhurst-tsvwg-… Joe Touch