Re: [ippm] AD review: draft-ietf-ippm-tcp-throughput-tm-07

Barry Constantine <Barry.Constantine@jdsu.com> Mon, 11 October 2010 12:12 UTC

From: Barry Constantine <Barry.Constantine@jdsu.com>
To: Lars Eggert <lars.eggert@nokia.com>, "ippm@ietf.org WG" <ippm@ietf.org>
Date: Mon, 11 Oct 2010 05:11:18 -0700
Subject: Re: [ippm] AD review: draft-ietf-ippm-tcp-throughput-tm-07

Hi Lars,

Thank you for the thorough review. We will address all technical comments and revise or clarify accordingly.  I think we can get a draft-08 version out by the end of this week.

Regarding the use of RFC2119 language in this document: we received comments suggesting that we add it, and we did so in draft-07.

These comments may have been off-list and I'd have to dig around, but I seem to recall the intent was to distinguish what was essential (must do) from what was optional (should do), and that is how the RFC2119 language came into play.

Just to be clear, do Informational documents generally (or always) refrain from using RFC2119 language?

Thanks,

Barry


-----Original Message-----
From: ippm-bounces@ietf.org [mailto:ippm-bounces@ietf.org] On Behalf Of Lars Eggert
Sent: Monday, October 11, 2010 7:18 AM
To: ippm@ietf.org WG
Subject: [ippm] AD review: draft-ietf-ippm-tcp-throughput-tm-07

Hi,

summary: Not quite ready. My largest issues are:

        (1) questionable use of RFC2119 language throughout the
            document (suggestion: remove it)

        (2) big question marks around "traffic management tests"
            (the ID acquired this material between -02 and -03)
            unclear if this is even in scope of the chartered work
            and if the approach can evaluate such schemes to any
            useful degree

        (3) inconsistencies and inaccuracies (see below)

I'm also going to assign a transport directorate reviewer to this document (or its next revision).

Lars



INTRODUCTION, paragraph 6:
>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
>    document are to be interpreted as described in RFC 2119 [RFC2119].

  There are many instances below where RFC2119 terms are IMO used
  inappropriately. Do you *really* need RFC2119 language in this
  Informational document?


Section 1., paragraph 1:
>    Measuring TCP throughput provides a
>    meaningful measure with respect to the end user experience (and
>    ultimately reach some level of TCP testing interoperability which
>    does not exist today).

  In the last sentence, the parenthesized clause seems to have no
  relation to the rest of the sentence?


Section 1., paragraph 2:
>    Additionally, end-users (business enterprises) seek to conduct
>    repeatable TCP throughput tests between enterprise locations.  Since
>    these enterprises rely on the networks of the providers, a common
>    test methodology (and metrics) would be equally beneficial to both
>    parties.

  The abstract says the goal of this draft is measuring "an end-to-end
  managed network environment". When end-users start to measure paths
  between locations, their measurements are not normally taken from an
  end-to-end *managed* network environment. Are we focusing this ID on
  an end-to-end managed network environment, or is the intent to support
  more general end-user measurements? (Whatever it is, the ID needs to
  be consistent.)


Section 1., paragraph 6:
>    This methodology proposes a test which SHOULD be performed in
>    addition to traditional Layer 2/3 type tests, which are conducted to
>    verify the integrity of the network before conducting TCP tests.

  I don't think it's appropriate to use an RFC2119 term here.


Section 1.1, paragraph 3:
>    - Customer Provided Equipment (CPE), refers to customer owned

  "customer owned" what?


Section 1.1, paragraph 8:
> *  Bottleneck Bandwidth and Bandwidth are used synonomously in this
>    document.
> ** Most of the time the Bottleneck Bandwidth is in the access portion
>    of the wide area network (CE - PE)

  Footnotes in RFCs don't work well. Suggest including the comment in
  the list above. Also: Nit: s/synonomously/synonymously/


Section 2., paragraph 0:
>    Note that the NUT may consist of a variety of devices including (and
>    NOT limited to): load balancers, proxy servers, WAN acceleration
>    devices.  The detailed topology of the NUT MUST be considered when
>    conducting the TCP throughput tests, but this methodology makes no
>    attempt to characterize TCP performance related to specific network
>    architectures.

  I don't understand what "MUST be considered" means - the paragraph
  leaves it completely open in which way this consideration is supposed
  to happen. (Esp. problematic because you use an RFC2119 term.)


Section 2., paragraph 3:
>    - The methodology is not intended to definitively benchmark TCP
>    implementations of one OS to another, although some users MAY find
>    some value in conducting qualitative experiments.

  I don't think it's appropriate to use an RFC2119 term here.


Section 2., paragraph 4:
>    - The methodology is not intended to provide detailed diagnosis
>    of problems within end-points or the network itself as related to
>    non-optimal TCP performance, although a results interpretation
>    section for each test step MAY provide insight into potential
>    issues within the network.

  I don't think it's appropriate to use an RFC2119 term here.


Section 2., paragraph 5:
>    - The methodology does not propose a method to operate permanently
>    with high measurement loads. TCP performance and optimization data of
>    operational networks MAY be captured and evaluated by using data of
>    the "TCP Extended Statistics MIB" [RFC4898].

  I don't think it's appropriate to use an RFC2119 term here.


Section 2., paragraph 9:
>    - Provide a practical test approach that specifies well understood,
>    end-user configurable TCP parameters such as TCP Window size, MSS
>    (Maximum Segment Size), number of connections, and how these affect
>    the outcome of TCP performance over a network.

  The window size for a flow is not typically end-user configurable.
  Neither is the MSS (because PMTUD/Nagle is in use.)
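
  (For what it's worth, on most stacks these are per-socket options set
  by the test application, not end-user knobs. A rough Python sketch,
  assuming a Linux host, of what actually exists; the address and port
  are placeholders:)

      import socket

      # Rough sketch of the per-socket knobs that approximate "window
      # size" and "MSS".  The kernel's autotuning may still override or
      # clamp the buffer, and the effective MSS is bounded by the path
      # MTU the stack discovers.
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
      s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1460)
      s.connect(("192.0.2.1", 5001))   # placeholder test endpoint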


Section 2., paragraph 10:
>    - Provide specific test conditions (link speed, RTT, TCP Window size,
>    etc.) and maximum achievable TCP throughput under TCP Equilibrium
>    conditions.

  Again, window size is not typically a configurable OS parameter.


Section 2., paragraph 12:
>    - In test situations where the RECOMMENDED procedure does not yield
>    the maximum achievable TCP throughput result, this methodology
>    provides some possible areas within the end host or network that
>    SHOULD be considered for investigation (although again, this
>    methodology is not intended to provide a detailed diagnosis of these
>    issues).

  I don't think it's appropriate to use an RFC2119 term here.


Section 2.1, paragraph 5:
>    The following diagram depicts these phases.

  The diagram does not show a "retransmission phase".


Section 2.1, paragraph 7:
>    This TCP methodology provides guidelines to measure the equilibrium
>    throughput which refers to the maximum sustained rate obtained by
>    congestion avoidance before packet loss conditions occur (which MAY
>    cause the state change from congestion avoidance to a retransmission
>    phase). All maximum achievable throughputs specified in Section 3 are
>    with respect to this equilibrium state.

  I don't think it's appropriate to use an RFC2119 term here.


Section 2.2, paragraph 2:
>    The first metric is the TCP Transfer Time, which is simply the
>    measured time it takes to transfer a block of data across
>    simultaneous TCP connections.  The concept is useful when
>    benchmarking traffic management techniques, where multiple
>    connections MAY be REQUIRED.

  I don't think it's appropriate to use RFC2119 terms here. (And I have
  no idea what "MAY be REQUIRED" is supposed to indicate.)


Section 2.2, paragraph 7:
>    Table 2.2: Link Speed, RTT, TCP Throughput, Ideal TCP Transfer time

  Please give all link speeds in Mb/s, otherwise folks can't check the
  math. Please also give a formula that shows how exactly you computed
  these numbers (esp. since there are errors or inaccuracies, see
  below.) You really need to double-check the calculations here.
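
  (For illustration, the kind of formula I'd expect -- a rough Python
  sketch; the window size, file size, and the absence of any framing
  overhead here are my assumptions, not the draft's:)

      # Rough sketch of the check a reader would want to do: throughput
      # is bounded by the link rate and by window/RTT, and the ideal
      # transfer time then follows from the file size.
      def max_tcp_throughput_bps(link_bps, rtt_s, window_bytes):
          return min(link_bps, window_bytes * 8 / rtt_s)

      def ideal_transfer_time_s(file_bytes, throughput_bps):
          return file_bytes * 8 / throughput_bps

      # e.g. a T3 (44.736 Mb/s) at 25 ms RTT with a 256 KB window and a
      # hypothetical 100 MB test block:
      tput = max_tcp_throughput_bps(44.736e6, 0.025, 256 * 1024)
      print(tput / 1e6, ideal_transfer_time_s(100e6, tput))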


Section 2.2, paragraph 8:
>    Link                 Maximum Achievable     Ideal TCP Transfer time
>    Speed     RTT (ms)   TCP Throughput(Mbps)   Time in seconds
>    --------------------------------------------------------------------
>     T1          20              1.17                  684.93
>     T1          50              1.40                  570.61
>     T1         100              1.40                  570.61

  Why is the performance the same for 50 and 100 ms RTT?


Section 2.2, paragraph 9:
>     T3          10             42.05                   19.03
>     T3          15             42.05                   19.03
>     T3          25             41.52                   18.82

  Why is the transfer time faster for 25ms than for 15ms?


Section 2.2, paragraph 10:
>     T3(ATM)     10             36.50                   21.92
>     T3(ATM)     15             36.23                   22.14
>     T3(ATM)     25             36.27                   22.05

  Why is the transfer time faster for 25ms than for 15ms?


Section 2.2, paragraph 12:
>     *   Calculation is based on File Size in Bytes X 8 / TCP Throughput.
>     **  TCP Throughput is derived from Table 3.3.

  I see no asterisks in Table 2.2 - where do these remarks belong?


Section 2.2, paragraph 20:
>    And the third metric is the Buffer Delay Percentage, which represents
>    the increase in RTT during a TCP throughput test from the inherent
>    network RTT (baseline RTT).  The baseline RTT is the round-trip time
>    inherent to the network path under non-congested conditions.

  Note that the RTT across some paths is a function of the load, esp.
  on paths that include wide-area radio links. Also, is the baseline
  RTT the shortest measured TCP RTT sample?


Section 2.2, paragraph 21:
>    The Buffer Delay Percentage is defined as:
>               Average RTT during Transfer - Baseline RTT
>               ------------------------------------------ x 100
>                              Baseline RTT

  How do you measure the average RTT during the transfer? Do you want to
  average over all TCP RTT samples? If so, how do you get at these? Do
  you do out-of-band measurements?
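
  (For what it's worth, if the intent is to average per-sample RTTs,
  I would picture something like the rough Python sketch below; how the
  rtt_samples_ms list is obtained -- in-band vs. out-of-band -- is
  exactly the open question, and the numbers are made up.)

      # Rough sketch: average the RTT samples collected during the
      # transfer and compare against the non-congested baseline.
      def buffer_delay_percentage(rtt_samples_ms, baseline_rtt_ms):
          avg_rtt = sum(rtt_samples_ms) / len(rtt_samples_ms)
          return (avg_rtt - baseline_rtt_ms) / baseline_rtt_ms * 100.0

      # e.g. a 25 ms baseline and samples averaging 32 ms -> 28.0 (%)
      print(buffer_delay_percentage([30.0, 32.0, 34.0], 25.0))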


Section 2.2, paragraph 22:
>    As an example, the baseline RTT for the network path is 25 msec.
>    During the course of a TCP transfer, the average RTT across the
>    entire transfer increased to 32 msec.  In this example, the Buffer
>    Delay Percentage WOULD be calculated as:

  OK, so by now I am convinced that this ID should simply refrain from
  using any RFC2119 terms. (Which, by the way, do not include "WOULD".)


Section 1., paragraph 0:
>    1. Identify the Path MTU.  Packetization Layer Path MTU Discovery
>    or PLPMTUD, [RFC4821], MUST be conducted to verify the maximum
>    network path MTU.  Conducting PLPMTUD establishes the upper limit for
>    the MSS to be used in subsequent steps.

  You cannot run RFC4821 in isolation - it is a technique to be used
  when implementing a transport protocol. The Linux stack AFAIK does
  implement it, but many other stacks do not. The best you can do is
  identify the PMTU for a flow while or after that flow is being
  transmitted. (Because due to ECMP it may be five-tuple dependent.)
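
  (If the draft's intent is a standalone pre-test probe, that is classic
  DF-based probing rather than RFC4821 -- roughly along the lines of the
  sketch below. The ping flags are the Linux iputils ones, and ICMP being
  blocked would of course break it.)

      import subprocess

      # Rough sketch of a *standalone* DF-based path MTU probe (classic
      # ICMP probing, not RFC4821, which only exists inside a transport
      # implementation).  "-M do" sets DF / forbids fragmentation, "-s"
      # sets the payload; payload + 28 bytes of IP/ICMP header is the
      # candidate MTU.
      def probe_path_mtu(host, low=1200, high=1500):
          while low < high:
              mid = (low + high + 1) // 2
              ok = subprocess.run(
                  ["ping", "-c", "1", "-W", "1", "-M", "do",
                   "-s", str(mid - 28), host],
                  stdout=subprocess.DEVNULL,
                  stderr=subprocess.DEVNULL).returncode == 0
              low, high = (mid, high) if ok else (low, mid - 1)
          return low

      # probe_path_mtu("192.0.2.1")   # hypothetical test endpoint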


Section 3., paragraph 0:
>    3. TCP Connection Throughput Tests.  With baseline measurements
>    of Round Trip Time and bottleneck bandwidth, a series of single and
>    multiple TCP connection throughput tests SHOULD be conducted to
>    baseline the network performance expectations.

  Which exact series should be tested here? (Even unclear after reading
  Section 3.3).


Section 4., paragraph 0:

>    4. Traffic Management Tests.  Various traffic management and queuing
>    techniques SHOULD be tested in this step, using multiple TCP
>    connections.  Multiple connection testing SHOULD verify that the
>    network is configured properly for traffic shaping versus policing,
>    various queuing implementations, and RED.

  Even less precise than the previous paragraph. You really can't say
  "SHOULD" and then wave your hands on what exactly is to be done.
  (Even unclear after reading Section 3.4.)

Section 4., paragraph 4:
>    - Most importantly, the TCP test host must be capable of generating
>    and receiving stateful TCP test traffic at the full link speed of the

  What do you mean by "stateful"?


Section 3.1., paragraph 0:
> 3.1. Determine Network Path MTU

  It seems that you intend to describe a mechanism that runs RFC4821
  before you run the actual throughput test. That's not really how
  stacks work; RFC4821 (if it's implemented at all) is integrated into
  the stack, because the PMTU is flow-specific thanks to ECMP. *If* you
  want to describe a protocol wrapper for running RFC4821 in a
  standalone manner, you need to fully specify that wrapper (packet
  formats, state machine, etc.) instead of just outlining its operation.


Section 3.2.1, paragraph 3:
>    During the actual sustained TCP throughput tests, RTT MUST be
>    measured along with TCP throughput. Buffer delay effects can be
>    isolated if RTT is concurrently measured.

  Out-of-band RTT measurements won't necessarily measure the RTT
  increase seen by the TCP stream, although they will measure the RTT
  increase seen by competing traffic (that happens to have chosen the
  same five-tuple as the OOB RTT measurement). What is the intent here?
  (In-band measurements are difficult on uninstrumented TCP stacks.)
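
  (On Linux one can at least sample the stack's own smoothed RTT for the
  test connection via TCP_INFO, e.g. through iproute2's "ss -ti" -- a
  rough sketch, assuming ss is available and the test traffic runs to a
  known port; but that is exactly the kind of stack instrumentation that
  cannot be assumed everywhere.)

      import re
      import subprocess

      # Rough sketch of an in-band RTT sample on Linux, read from the
      # kernel's TCP_INFO state via "ss -ti"; returns the smoothed RTT
      # in ms for a connection to the given destination port, or None.
      def sample_srtt_ms(dst_port):
          out = subprocess.run(
              ["ss", "-ti", "dport", "=", f":{dst_port}"],
              capture_output=True, text=True).stdout
          m = re.search(r"\brtt:([\d.]+)/[\d.]+", out)
          return float(m.group(1)) if m else None

      # sample_srtt_ms(5001)   # e.g. an iperf-style test port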


Section 3.2.1, paragraph 7:
>   - ICMP Pings MAY also be adequate to provide round trip time
>    estimations.  Some limitations of ICMP Ping MAY include msec
>    resolution and whether the network elements respond to pings (or
>    block them).

  ICMP is often rate-limited and segregated into different queues, so
  it's not as reliable and accurate as in-band measurements.


Section 3.3.1, paragraph 0:
> 3.3.1 Calculate Ideal TCP Window Size

  Why does the "ideal" TCP window size even matter? First, most stacks
  do auto-tuning these days. Second, as long as the window is larger
  than what you need for the BDP, you should be fine. Isn't it enough
  to require that windows be large enough?
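
  (I.e., the only check that seems needed is the one in the sketch
  below: the window -- auto-tuned or configured -- just has to cover the
  bandwidth-delay product of the path.)

      # Rough sketch: a window is "large enough" once it covers the
      # bandwidth-delay product; anything beyond that buys nothing.
      def window_is_sufficient(window_bytes, link_bps, rtt_s):
          bdp_bytes = link_bps * rtt_s / 8
          return window_bytes >= bdp_bytes

      # e.g. 45 Mb/s at 25 ms needs ~141 KB: 45e6 * 0.025 / 8 = 140625
      print(window_is_sufficient(256 * 1024, 45e6, 0.025))   # -> True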


Section 3.3.1, paragraph 6:
>    Table 3.3: Link Speed, RTT and calculated BDP, TCP Throughput

  See my comments about Table 2.2.


Section 3.3.4, paragraph 2:
>    - Network congestion causing packet loss which MAY be inferred from
>    a poor TCP Efficiency metric (100% = no loss)

  "No loss" for TCP means that you're receive window limited. A TCP flow
  will always show some losses; otherwise it isn't fully using the
  path.


Section 3.3.4, paragraph 3:
>    - Network congestion causing an increase in RTT which MAY be inferred
>    from the Buffer Delay metric (0% = no increase in RTT over baseline)

  Unless you're running a delay-based congestion controller, it will be
  really rare to see delay go up but no loss occurring.


Section 3.4.1, paragraph 3:
>    Traffic shaping is generally configured for TCP data services and
>    can provide improved TCP performance since the retransmissions are
>    reduced, which in turn optimizes TCP throughput for the given
>    available bandwidth.

  It's not clear at all to me that traffic shaping will improve
  performance. If you simply buffer data past the allocated rate, TCP
  will not slow down until you drop, plus you may then have introduced
  a long delay spike into the payload stream.


Section 3.4.1.1, paragraph 1:
>    By plotting the throughput achieved by each TCP connection, the fair
>    sharing of the bandwidth is generally very obvious when traffic
>    shaping is properly configured for the bottleneck interface.  For the
>    previous example of 5 connections sharing 500 Mbps, each connection
>    would consume ~100 Mbps with a smooth variation.  If traffic policing
>    was present on the bottleneck interface, the bandwidth sharing MAY
>    not be fair and the resulting throughput plot MAY reveal "spikey"
>    throughput consumption of the competing TCP connections (due to the
>    retransmissions).

  The spikes aren't due to retransmissions, they're due to loss
  probabilities not being fully equal, so that some flows end up with a
  larger capacity share than others for a while.


Section 3.4.2, paragraph 1:
>    Random Early Discard techniques are specifically targeted to provide
>    congestion avoidance for TCP traffic.  Before the network element
>    queue "fills" and enters the tail drop state, RED drops packets at
>    configurable queue depth thresholds.  This action causes TCP
>    connections to back-off which helps to prevent tail drop, which in
>    turn helps to prevent global TCP synchronization.

  But mostly, it keeps the queues (= delay) short compared to FIFO.


Section 3.4.2, paragraph 2:

>    Again, rate limited interfaces can benefit greatly from RED based
>    techniques.  Without RED, TCP is generally not able to achieve the
>    full bandwidth of the bottleneck interface.  With RED enabled, TCP
>    congestion avoidance throttles the connections on the higher speed
>    interface (i.e. LAN) and can reach equilibrium with the bottleneck
>    bandwidth (achieving closer to full throughput).

  I think this paragraph is overstating things. Yes, RED reduces the
  see-saw and can improve capacity utilization. But TCP can get within a
  few percent of link utilization even with FIFO (depends on the path
  parameters).


Section 3.4.2, paragraph 3:
>    The ability to detect proper RED configuration is more easily
>    diagnosed when conducting a multiple TCP connection test.  Multiple
>    TCP connections provide the multiple bursty sources that emulate the
>    real-world conditions for which RED was intended.

  Multiple parallel bulk transfers do not provide multiple bursty
  sources. If you want burstiness, you need to start and stop many
  shorter flows according to some distribution.
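
  (E.g., a traffic model along the lines of the sketch below -- Poisson
  flow arrivals with heavy-tailed sizes -- rather than N long-lived bulk
  transfers; the rate and size parameters are made up.)

      import random

      # Rough sketch of "multiple bursty sources": flows arrive as a
      # Poisson process and carry heavy-tailed (Pareto) sizes, instead
      # of N long-lived bulk transfers.
      def bursty_flow_schedule(duration_s, flows_per_s=5.0, scale_kb=300.0):
          t, flows = 0.0, []
          while t < duration_s:
              t += random.expovariate(flows_per_s)            # arrival gaps
              size_kb = random.paretovariate(1.2) * scale_kb  # flow size
              flows.append((t, int(size_kb * 1024)))          # (start_s, bytes)
          return flows

      # for start, nbytes in bursty_flow_schedule(60.0):
      #     schedule a transfer of nbytes starting at time start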


Section 3.4.2.1, paragraph 1:
>    The default queuing technique for most network devices is FIFO based.
>    Without RED, the FIFO based queue will cause excessive loss to all of
>    the TCP connections and in the worst case global TCP synchronization.

  But only in the very artificial case of a few perfectly synchronized
  bulk transfers. With more statmux, there is little issue.