Re: [trill] Tsvart telechat review of draft-ietf-trill-mtu-negotiation-06

Magnus Westerlund <magnus.westerlund@ericsson.com> Mon, 10 July 2017 12:13 UTC

To: "Zhangmingui (Martin)" <zhangmingui@huawei.com>, "tsv-art@ietf.org" <tsv-art@ietf.org>
CC: "draft-ietf-trill-mtu-negotiation.all@ietf.org" <draft-ietf-trill-mtu-negotiation.all@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "trill@ietf.org" <trill@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/trill/R5UqFKhxt2F_VMECAsuSJ46NJng>

On 2017-07-08 at 07:57, Zhangmingui (Martin) wrote:
> Hi Magnus,
>
> Thanks for your careful review. Please see the responses inline below.
>
> ________________________________________
> From: Magnus Westerlund [magnus.westerlund@ericsson.com]
> Sent: Wednesday, July 05, 2017 21:02
> To: tsv-art@ietf.org
> Cc: draft-ietf-trill-mtu-negotiation.all@ietf.org; ietf@ietf.org; trill@ietf.org
> Subject: Tsvart telechat review of draft-ietf-trill-mtu-negotiation-06
>
> Reviewer: Magnus Westerlund
> Review result: Not Ready
>
> This TSV-ART review is influenced by the fact that I also reviewed
> draft-ietf-trill-over-ip.
>
> 1. So draft-ietf-trill-over-ip-10 needs MTU discovery to
> determine whether the UDP encapsulation will work or not. In
> Section 8.4 it references the old RFC, i.e. RFC 6325, which is updated by
> draft-ietf-trill-mtu-negotiation.
>
>      TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
>      [RFC7177], can be used to obtain added assurance of the MTU of a
>      link.
>
> However, this is not quite true: if the IP path MTU is below 1470
> bytes, which is not unheard of, the algorithm in the MTU negotiation
> draft can't determine it. It will only report the IP path as having an
> MTU that is too small when the 1470-byte probe fails.
>
> [Mingui]   I copied the relevant text from RFC 6325.
>      "The desired minimum acceptable inter-RBridge link MTU for the
>        campus, that is, originatingLSPBufferSize.  This is a 16-bit
>        unsigned integer number of octets that defaults to 1470 bytes,
>        which is the minimum valid value.  Any lower value being
>        advertised by an RBridge is ignored."
> So the minimum value of Sz would be 1470. IOW, an IP path with an MTU below 1470 will not qualify as an adjacency in the TRILL network topology.

So issue (1) mostly raises what I saw as a discrepancy between
trill-over-ip and this document. I think it is quite reasonable to push
the extensions needed for more generalized IP path MTU discovery in the
TRILL context onto the trill-over-ip document. However, I want to note
that, from my perspective, if one thinks one can run TRILL over IP, then
one had better have a mechanism that can do fragmentation and reassembly
to handle IP MTUs that will not let TRILL packets of 1470 bytes through,
as that is quite common. In fact, IPv6/UDP/TRILL over regular Ethernet
with a 1500-byte IP MTU is sufficient to fail that criterion.

>
> So, if the trill-over-ip authors want to use this as a mechanism, then the MTU
> negotiation draft needs to be expanded to allow more flexible lower
> bounds. However, that appears to affect MTU negotiation quite
> significantly, as it needs to separate the algorithm for finding the MTU from the
> different usages of the algorithm with different starting points. The
> normal usage will have a lower bound of 1470 and be more tightly coupled
> to Sz when finding Lz, while TRILL over IP has a different usage.
>
> I think the TRILL WG needs to decide how to slice this: either the
> MTU-negotiation draft only addresses the explicit targets in the current draft and goes
> forward now, or it is to meet trill-over-ip's goals, which will require
> restructuring.
>
> 2. Another issue, is that I think the algorithm is a bit short on
> transmission scheduling recommendations:
>
>      1) If RB1 successfully receives the MTU-ack from RB2 to the probe of
>         the value of link-wide Lz within k tries (where k is a
>         configurable parameter whose default is 3), link MTU size is set
>         to the size of link-wide Lz and stop.
>
> If I do this test with all three packets sent back to back at line rate, I could
> potentially lose all probes in the same burst loss in a router queue or
> switch fabric. What I think is needed here is a specification of how these
> probes are transmitted: are they spaced in a particular way, or at least by a
> minimum distance, and are additional probes only sent after the previous one has
> been judged lost? The latter makes this interact with the next issue.
>
> [Mingui] This seems to be an implementation matter. However, the document may offer recommendations. The recommended minimum interval between two successive probes would affect the boot-up speed of a TRILL campus. One RTT is a reasonable value.

Okay, for this there might be some implementation variations. However,
it can clearly affect the performance of the implementation, so some
recommendations are likely good. I also think a spacing of one probe per
RTT is quite reasonable. However, that leads to the question: how does
the sender know what the RTT is? Is there something in the TRILL
protocol that determines the RTT and maintains a current value? Sorry, I
don't have time to digest the whole TRILL suite of specifications.


>
> 3. It is also unclear what the criterion is for determining that something
> is lost:
>
>        a) If RB1 fails to receive an MTU-ack from RB2 after k tries, RB1
>            sets the "failed minimum MTU test" flag for RB2 in RB1's Hello
>            and stop.
>
> I fail to see any specification of the criterion for when an MTU-ack should be
> considered to have failed to reach the probing entity. So this appears to
> require a timeout, and thus a timeout interval. Is the RTT known so that one
> can define something as lost after N*RTT? Are there possible delays in sending
> the MTU-ack that are considered okay and that can affect this?
>
> [Mingui] Yes, this makes sense. An MTU-ack should be considered to have failed two RTTs after the probe is sent out.

So two questions on this. First, can the receiver of an MTU-ack know
which MTU probe it is a response to, i.e. is there some token or
sequence number being acked? Second, 2*RTT appears to be too short an
interval in which to draw that conclusion. The reason I say so is that
there will be networks where the jitter of the path is larger than the
RTT sample, which leads to misinterpretation. Thus, I would recommend a
bit more robustness against misinterpretation, and likely some minimum
value so that low-latency paths are not wrongly classified. Note, I
think it is necessary to define the criterion for when an MTU probe is
considered lost, because it appears to affect both the startup time and
the robustness of the probing. It also becomes a question of which tools
in the probes are used for this, and whether you actually need some
additional ones.
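
To make the robustness point concrete, here is a minimal sketch of a
loss timeout with a jitter margin and a floor. It borrows the smoothing
used for TCP's retransmission timer (RFC 6298); it is not taken from the
draft. The function name probe_timeout, the 100 ms floor and the use of
a plain list of RTT samples are all assumptions made for illustration.

```python
def probe_timeout(rtt_samples, floor=0.1, k=4.0, alpha=0.125, beta=0.25):
    """Return a loss timeout in seconds from RTT samples (seconds).

    Smoothing follows the RFC 6298 style: SRTT is a smoothed RTT and
    RTTVAR a smoothed deviation. `floor` is a hypothetical minimum that
    keeps very low-latency paths from being misclassified.
    """
    if not rtt_samples:
        raise ValueError("need at least one RTT sample")
    srtt = rtt_samples[0]          # first sample initialises SRTT
    rttvar = rtt_samples[0] / 2    # and RTTVAR = R/2
    for r in rtt_samples[1:]:
        # RTTVAR is updated before SRTT, using the old SRTT.
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
        srtt = (1 - alpha) * srtt + alpha * r
    # Margin of k deviations on top of the smoothed RTT, never below floor.
    return max(floor, srtt + k * rttvar)
```

With a single 1 ms sample the floor dominates, whereas a plain 2*RTT
rule would declare loss after only 2 ms.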

>
> 4. Section 3, the algorithm in Step 1 is unable to reach the first termination
> condition (3) "If lowerBound >= upperBound" in some cases.
>
> [Mingui] This algorithm has been updated through a few rounds of revisions. Let me insert a few minor updates to the cited text as below.
>
>    Step 1: RB1 tries to send an MTU-probe padded to the size x.
>
>     1) If RB1 fails to receive an MTU-ack from RB2 after k tries:
>
>           upperBound is set to x and x is set to [(lowerBound +
>           upperBound)/2], rounded up to the nearest integer.
>
> [Mingui] s/upperBound is set to x/upperBound is set to x-1/
> [Mingui] s/rounded up to the nearest integer./rounded down to the nearest integer./
>
>
>     2) If RB1 receives an MTU-ack to a probe of size x from RB2:
>
>           link MTU size is set to x, lowerBound is set to x and x is set
>           to [(lowerBound + upperBound)/2], rounded up to the nearest
>           integer.
>
> [Mingui] s/rounded up to the nearest integer./rounded down to the nearest integer./
> [Mingui] Append one condition to this step 2): If lowerBound equals upperBound-1 then x is set to upperBound.
>
>     3) If lowerBound >= upperBound or Step 1 has been repeated n times
>        (where n is a configurable parameter whose default value is 5),
>        stop.
>
>     4) Repeat Step 1.
>
> I ran this on the input data: lower bound = 1470, upper bound = 9216, with
> an MTU of 7935, and got the following sequence:
>
> Lower   Upper   X
> 1470    9216    5343
> 5343    9216    7280
> 7280    9216    8248
> 7280    8248    7764
> 7764    8248    8006
> 7764    8006    7885
> 7885    8006    7946
> 7885    7946    7916
> 7916    7946    7931
> 7931    7946    7939
> 7931    7939    7935
> 7935    7939    7937
> 7935    7937    7936
> 7935    7936    7936
> 7935    7936    7936
>
> Thus, the termination condition needs to change.
>
> [Mingui] After the update of the text, the sequence would become:
> Lower   Upper   X
> 1470    9216    5343
> 5343    9216    7279
> 7279    9216    8247
> 7279    8246    7762
> 7762    8246    8004
> 7762    8003    7882
> 7882    8003    7942
> 7882    7941    7911
> 7911    7941    7926
> 7926    7941    7933
> 7933    7941    7937
> 7933    7936    7934
> 7934    7936    7935
> 7935    7936    7935
Shouldn't the above line have X = 7936? The probe before it (7935)
succeeds, so lowerBound becomes 7935 = upperBound - 1, and by the new
additional rule in step (2) X is then set to upperBound.

> 7935    7936    7936
> 7935    7935    7935
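
The two rule sets quoted in this thread can be checked mechanically. The
following is a minimal sketch that traces (lower, upper, x) for both the
original Step 1 text and the text with the proposed substitutions. The
function name search and the max_steps cap are assumptions for
illustration; the link is modelled as acking a probe if and only if
x <= mtu, the k retries are collapsed into that single test, and the
n-step limit from the draft is deliberately ignored so the bounds can be
followed to the end.

```python
def search(lower, upper, mtu, revised, max_steps=40):
    """Return the list of (lower, upper, x) states, one per probe.

    revised=False: original text (upperBound = x on failure, round up).
    revised=True:  proposed text (upperBound = x-1 on failure, round
                   down, and x = upperBound when lower == upper-1 after
                   a successful probe).
    """
    def midpoint(lo, hi):
        total = lo + hi
        return total // 2 if revised else (total + 1) // 2

    x = midpoint(lower, upper)
    trace = []
    for _ in range(max_steps):          # cap guards against looping forever
        trace.append((lower, upper, x))
        if lower >= upper:              # termination condition 3)
            break
        acked = x <= mtu                # stand-in for "MTU-ack received"
        if acked:
            lower = x
        else:
            upper = x - 1 if revised else x
        x = midpoint(lower, upper)
        if revised and acked and lower == upper - 1:
            x = upper                   # appended condition in step 2)
    return trace


if __name__ == "__main__":
    for revised in (False, True):
        rows = search(1470, 9216, 7935, revised)
        print("revised" if revised else "original", len(rows), rows[-1])
```

Under these assumptions the original rules never leave the state
(7935, 7936, 7936), while the revised rules pass through
(7935, 7936, 7936) and then stop at (7935, 7935, 7935).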
>
> The second thing I notice is that limiting the number of steps to 5,
>
> [Mingui] Since the testing might be too resource-consuming, implementors suggested this limitation. After all, the purpose of testing an Lz value is to improve the efficiency (if Lz > Sz) rather than to reach the optimal efficiency.

Ok.

>
> results in quite a large gap
> between the upper and lower bounds within which the MTU lies.
>
> 5. I frankly get confused by the application of the binary search. First, it
> will in many cases not be run to termination, where the actual MTU is determined.
> Then the resulting upper and lower bounds are just used to confirm the Sz
> value. There is no discussion about using the MTU search to determine a new
> possible value for Sz.
>
> [Mingui] Because the MTU search will NOT be used to determine a new possible value for Sz. It is only applicable to Lz.
>
>   The text is not even explicit that the lower bound is the
> largest transmission unit size known to work at the time of testing. I think
> Section 3 should conclude by determining some TU value, and whether that is Sz or
> something else appears quite relevant for what to do in the later sections.
>
> [Mingui] As specified in Section 3, “link MTU size” is already set to the lower bound. This tested “link MTU size” is the “TU” value. This value is potentially larger than Sz as explained in the introduction and Section 2.
>

Ok, I think I am getting what the different values are for.

Cheers

Magnus Westerlund

----------------------------------------------------------------------
Media Technologies, Ericsson Research
----------------------------------------------------------------------
Ericsson AB                 | Phone  +46 10 7148287
Torshamnsgatan 23           | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
----------------------------------------------------------------------