Re: [ippm] How should capacity measurement interact with shaping?

"MORTON, ALFRED C (AL)" <acm@research.att.com> Thu, 19 September 2019 22:35 UTC

From: "MORTON, ALFRED C (AL)" <acm@research.att.com>
To: Matt Mathis <mattmathis@google.com>, "Ruediger.Geib@telekom.de" <Ruediger.Geib@telekom.de>
CC: "ippm@ietf.org" <ippm@ietf.org>, "CIAVATTONE, LEN" <lc9892@att.com>
Date: Thu, 19 Sep 2019 22:34:52 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/C9x-pRWh7viWdRS3YrvpZZVJ3vI>
Subject: Re: [ippm] How should capacity measurement interact with shaping?

Thanks Matt!  This is an interesting trace to consider,
and an important discussion to share with the group.

When I look at the equation for BBR:
https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext

both BBR and the Maximum IP-Layer Capacity Metric seek the
max over some time interval. The window seems smaller for
BBR (6 to 10 RTTs), whereas we’ve been using parameters that
produce a rate measurement once a second and take the max
of the 10 one-second measurements.
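
The max-of-ten-one-second-measurements parameterization above can be
sketched in a few lines. This is a hypothetical illustration, not code
from the draft:

```python
# Hypothetical sketch (not the draft's reference code): bin per-packet
# (arrival_time, bytes) samples into 1-second intervals and report the
# max of the ten 1-second IP-layer rates, as parameterized above.

def max_ip_capacity_mbps(samples, test_duration_s=10):
    """samples: iterable of (arrival_time_s, ip_bytes); times in [0, duration)."""
    bins = [0] * test_duration_s
    for t, nbytes in samples:
        idx = int(t)
        if 0 <= idx < test_duration_s:
            bins[idx] += nbytes
    # bytes delivered per one-second bin -> Mbit/s; the metric is the max bin
    return max(b * 8 / 1e6 for b in bins)
```

With a trace shaped like the one Matt describes (94.5 Mb/s for 4 s,
~75 Mb/s for 1 s, 83 Mb/s after), this reports 94.5 Mb/s: the max
simply lands on the burst.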

We also evaluate several performance metrics when
adjusting load, and those determine how high the sending
rate will go (based on feedback from the receiver).
https://tools.ietf.org/html/draft-morton-ippm-capcity-metric-method-00#section-4.3

So, the MAX delivered rate for the 10-second test, as we
can all see, is 94.5 Mbps. This rate was sustained for more
than a trivial amount of time, too. But if you are concerned that this
rate was somehow inflated by a large buffer and a large
burst tolerance in the shaper – that’s where the additional
metrics and the slightly different sending-rate control
that we described in the draft (and the slides) might help.
https://datatracker.ietf.org/meeting/105/materials/slides-105-ippm-metrics-and-methods-for-ip-capacity-00

IOW, it might well be that the Max IP Capacity, measured as we designed
and parameterized it, is 83 Mbps for this path
(assuming the 94.5 Mbps results from a big overshoot at the sender;
the fluctuating performance afterward seems to support that).

When I was looking for background on BBR, I saw a paper comparing
BBR and CUBIC during drive tests.
http://web.cs.wpi.edu/~claypool/papers/driving-bbr/
One pair of plots seemed to indicate that BBR sent lots of bytes
early on and grew the RTT pretty high before settling down
(Figure 5, a & b).
This looks a bit like the case you described below,
except that your 94.5 Mbps is a received rate – in the drive
test we don’t know what came out of the network, just what
went in and filled a buffer before crashing down.

So, I think I did more investigation than justification
for my answers, but I conclude that parameters like the
individual measurement intervals and the overall time interval
from which the max is drawn, plus the rate-control algorithm
itself, play a big role here.

regards,
Al


From: Matt Mathis [mailto:mattmathis@google.com]
Sent: Thursday, September 19, 2019 5:18 PM
To: MORTON, ALFRED C (AL) <acm@research.att.com>; Ruediger.Geib@telekom.de
Cc: ippm@ietf.org
Subject: Fwd: How should capacity measurement interact with shaping?

Ok, moving the thread to IPPM

Some background, we (Measurement Lab) are testing a new transport (TCP) performance measurement tool, based on BBR-TCP.   I'm not ready to talk about results yet (well ok, it looks pretty good).    (BTW the BBR algorithm just happens to resemble the algorithm described in draft-morton-ippm-capcity-metric-method-00.)

Anyhow, we noticed some interesting performance features for a number of ISPs in the US and Europe, and I wanted to get some input on how these cases should be treated.

One data point: a single trace saw ~94.5 Mbit/s for ~4 seconds, fluctuating performance around ~75 Mb/s for ~1 second, and then stable performance at ~83 Mb/s for the rest of the 10-second test.    If I were to guess, this is probably a policer (shaper?) with a 1 MB token bucket and an ~83 Mb/s token rate (these numbers are not corrected for header overheads, which actually matter with this tool).  What is weird about it is that different ingress interfaces to the ISP (peers or serving locations) exhibit different parameters.
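
As a rough check on the token-bucket guess, the burst-sustain time is
just bucket depth divided by the excess rate. The toy model below uses
the numbers from the trace; all of them are assumptions, uncorrected
for header overheads:

```python
# Toy token-bucket arithmetic for the policer guess above; every number
# here is an assumption taken from the trace, uncorrected for overheads.

def sustain_time_s(bucket_bits, token_rate_bps, offered_rate_bps):
    """How long a full bucket sustains an offered rate above the token rate."""
    excess_bps = offered_rate_bps - token_rate_bps
    return float('inf') if excess_bps <= 0 else bucket_bits / excess_bps

def implied_bucket_bits(burst_duration_s, token_rate_bps, offered_rate_bps):
    """Bucket depth implied by sustaining the offered rate for a given time."""
    return (offered_rate_bps - token_rate_bps) * burst_duration_s

# A 1 MB (8 Mbit) bucket at an 83 Mb/s token rate admits 94.5 Mb/s for
# only about 0.7 s; the observed ~4 s burst would imply a deeper bucket,
# roughly 11.5 Mb/s * 4 s = 46 Mbit (~5.75 MB), before any overhead
# corrections -- which may be part of why the headers matter here.
```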

Now the IPPM measurement question:   Is the bulk transport capacity of this link ~94.5 Mbit/s or ~83 Mb/s?   Justify your answer...

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of control;
            too weak risks being mistaken for tacit approval.

Forwarded Conversation
Subject: How should capacity measurement interact with shaping?
------------------------

From: Matt Mathis <mattmathis@google.com>
Date: Thu, Aug 15, 2019 at 8:55 AM
To: MORTON, ALFRED C (AL) <acm@research.att.com>

We are seeing shapers with huge bucket sizes, perhaps as large as or larger than 100 MB.

These are prohibitive to test by default, but can have a huge impact in some common situations.  E.g. downloading software updates.

An unconditional pass is not good, because some buckets are small.  What counts as large enough to be ok, and what "derating" is ok?

Thanks,
--MM--

----------
From: MORTON, ALFRED C (AL) <acm@research.att.com>
Date: Mon, Aug 19, 2019 at 5:08 AM
To: Matt Mathis <mattmathis@google.com>
Cc: CIAVATTONE, LEN <lc9892@att.com>, Ruediger.Geib@telekom.de

Hi Matt, currently cruising between Crete and Malta,
with about 7 days of vacation remaining – adding my friend Len.
You know Rüdiger. It appears I’ve forgotten how to type in 2 weeks,
given the number of typos I’ve fixed so far...

We’ve seen big buffers on a basic DOCSIS cable service (downlink >2 sec),
but:

  *   we have 1-way delay variation or RTT variation limits when
      searching for the max rate, so that not many packets queue
      in the buffer;

  *   we want the status messages that result in rate adjustment to
      return in a reasonable amount of time (50 ms + RTT);

  *   we usually search for 10 seconds, but if we go back and test with
      a fixed rate, we can see the buffer growing if the rate is too high.

There will eventually be a discussion of the thresholds we use
in the search / load-rate-control algorithm. The copy of
Y.1540 I sent you has a simple one; we have moved beyond that now
(see the slides I didn’t get to present at IETF).
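
A minimal sketch of the delay-variation-limited search idea described
above; the 30 ms limit and the step multipliers are invented for
illustration and are not the draft's actual values:

```python
# Hypothetical sketch of a delay-variation-limited rate search; the
# threshold (30 ms) and step multipliers are assumed, not from the draft.

def next_rate_mbps(current_mbps, rtt_samples_ms,
                   delay_var_limit_ms=30.0, step_up=1.1, step_down=0.7):
    """One search step: back off if delay variation signals a growing queue."""
    delay_var_ms = max(rtt_samples_ms) - min(rtt_samples_ms)
    if delay_var_ms > delay_var_limit_ms:
        return current_mbps * step_down   # packets are queuing: reduce load
    return current_mbps * step_up         # no queue signal: probe higher
```

The point of the limit is exactly the behavior described above: the
search never lets many packets sit in the buffer, so even a multi-second
DOCSIS buffer does not inflate the result.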

  There is value in having some of this discussion on IPPM-list,
  so we get some *agenda time at IETF-106*

We measure rate and performance, with some performance limits
built in.  Pass/Fail is another step, and de-rating too (it made
sense with MBM “target_rate”).

Al

----------
From: <Ruediger.Geib@telekom.de>
Date: Mon, Aug 26, 2019 at 12:05 AM
To: <acm@research.att.com>
Cc: <lc9892@att.com>, <mattmathis@google.com>

Hi Al,

thanks for keeping me involved. I don’t have a precise answer, and I doubt there will be a single universal truth.

If the aim is only to determine the IP bandwidth of an access, then we aren’t interested in filling a buffer. Buffering events may occur; some are useful and to be expected, whereas others are not desired:


  *   Sender shaping behavior may matter (is traffic at the source CBR, or is it bursty?)
  *   Random collisions should be tolerated at the access whose bandwidth is to be measured.
  *   Limiting packet drop due to buffer overflow is a design aim, or an important part of the algorithm, I think.
  *   Shared media might create bursts. I’m not an expert in the area, but in some cases there is an “is bandwidth available” check between a central sender using a shared medium and the connected receivers. WiFi, and maybe other wireless equipment, also buffers packets to optimize the use of wireless resources.
  *   It might be an idea to mark some flows with ECN once there is a guess at a sending bitrate at which to expect no or very little packet drop. Today, this is experimental. CE marks from an ECN-capable device should be expected roughly once queuing starts.

Practically, the set-up should be configurable with commodity hard- and software, and all metrics should be measurable at the receiver. Burstiness of the traffic, and a distinction between queuing events which are to be expected and (undesired) queue build-up, are what must be told apart. I hope that can be done with commodity hard- and software. I, at least, am not able to write down a simple metric distinguishing queues that are to be expected from (undesired) queue build-up causing congestion. The hard- and software to be used should be part of the solution, not part of the problem (bursty source traffic, and timestamps with insufficient accuracy to detect queues, are what I’d like to avoid).
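
One possible heuristic for the distinction Rüdiger asks about (an
assumed sketch, not something proposed in the thread) is to look at the
trend of one-way delay within a window: a transient burst moves delay
up and back down, while undesired queue build-up shows a persistent
upward slope:

```python
# Assumed heuristic (not from the thread): classify a window of one-way-
# delay samples by the slope of a least-squares fit. A transient burst
# raises delay briefly; sustained build-up shows a lasting upward trend.

def owd_slope_ms_per_s(times_s, owd_ms):
    """Least-squares slope of one-way delay versus time, in ms per second."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_d = sum(owd_ms) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in zip(times_s, owd_ms))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den

def queue_building(times_s, owd_ms, slope_limit_ms_per_s=5.0):
    """True when delay grows persistently faster than the (assumed) limit."""
    return owd_slope_ms_per_s(times_s, owd_ms) > slope_limit_ms_per_s
```

This still depends on timestamps accurate enough to see the queue, which
is exactly the commodity hard- and software caveat above.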

I’d suggest moving the discussion to the list.

Regards,

Rüdiger

----------
From: MORTON, ALFRED C (AL) <acm@research.att.com>
Date: Thu, Sep 19, 2019 at 7:01 AM
To: Ruediger.Geib@telekom.de
Cc: CIAVATTONE, LEN <lc9892@att.com>, mattmathis@google.com

I’m catching-up with this thread again, but before I reply:

*** Any objection to moving this discussion to IPPM-list ?? ***

@Matt – this is a question to you at this point...

thanks,
Al
