Re: [ippm] How should capacity measurement interact with shaping?

Matt Mathis <mattmathis@google.com> Fri, 20 September 2019 01:43 UTC

From: Matt Mathis <mattmathis@google.com>
Date: Thu, 19 Sep 2019 18:43:19 -0700
To: "MORTON, ALFRED C (AL)" <acm@research.att.com>
Cc: "Ruediger.Geib@telekom.de" <Ruediger.Geib@telekom.de>, "ippm@ietf.org" <ippm@ietf.org>, "CIAVATTONE, LEN" <lc9892@att.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/gw1ymv97FRVbu9cjcnTAQqEitrE>
Subject: Re: [ippm] How should capacity measurement interact with shaping?

I am actually more interested in the philosophical questions about how this
should be reported, and what the language should be for non-stationary
available capacity.   One intersecting issue: BBR converges on both the
initial and the final rate in under 2 seconds (this was a long path, so startup
took more than a second).   Do users want a quick (and relatively cheap)
test that takes 2 seconds, or a longer test that is more likely to discover
the token bucket?  How long?  If we want to call them different names, what
should they be?

On the pure technical issues: BBR is still quite a moving target.   I have
a paper in draft that will shed some light on this; it is due to be
unembargoed sometime in October.
BBRv1 (released slightly after the CACM paper you mention) measures
max_BW every 8 RTTs.  BBRv2 measures max_BW on a sliding schedule that
loosely matches CUBIC.  (In both, min_RTT is measured every 10 seconds, in
the absence of organic low-RTT samples.)   BBRv2 uses additional signals
and does a much better job of avoiding overshoot at startup.
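
(Purely as an illustration of the windowed max/min filtering involved, here is
a minimal Python sketch; it is not BBR's implementation, and the window lengths
and the assumed 50 ms RTT are just example values.)

    from collections import deque

    class WindowedFilter:
        """Track (time, value) samples; report the best value within a window."""
        def __init__(self, window_s, best=max):
            self.window_s = window_s     # window length in seconds
            self.best = best             # max for bandwidth, min for RTT
            self.samples = deque()       # (timestamp, value) pairs

        def update(self, now, value):
            self.samples.append((now, value))
            # Expire samples older than the window.
            while now - self.samples[0][0] > self.window_s:
                self.samples.popleft()
            return self.best(v for _, v in self.samples)

    # Hypothetical usage: with a 50 ms RTT, 8 RTTs is a 0.4 s window for max_BW;
    # min_RTT is filtered over 10 seconds.
    max_bw = WindowedFilter(window_s=8 * 0.050, best=max)
    min_rtt = WindowedFilter(window_s=10.0, best=min)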

In any case, the best (most stable) BBR-based metric seems to be
delta(snd.una)/elapsed_time, which is the progress as seen by the upper
layers.  If you look at short time slices (we happen to be using 0.25
seconds) you see a mostly crisp square wave.  If you average from the
beginning of the connection to now, the peak rate occurs at the moment the
bucket runs out of tokens, and the average falls towards the token rate after that.
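
(A minimal sketch of that metric, assuming snd.una can be sampled on a fixed
tick, e.g. via TCP_INFO; the 0.25-second slice is the value mentioned above,
everything else is illustrative.)

    def goodput_series(samples):
        """samples: list of (time_in_seconds, snd_una_in_bytes), in order,
        sampled roughly every 0.25 s.

        Returns two lists of rates in bits/s: the per-slice rate (the mostly
        crisp square wave) and the average from the start of the connection
        to each sample (progress as seen by the upper layers).
        """
        t0, una0 = samples[0]
        per_slice, from_start = [], []
        for (tp, up), (t, u) in zip(samples, samples[1:]):
            per_slice.append(8 * (u - up) / (t - tp))
            from_start.append(8 * (u - una0) / (t - t0))
        return per_slice, from_start

On a token-bucket-limited path the from_start series peaks at the instant the
bucket empties and then decays toward the token rate, which is the behavior
described above.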

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of
control;
            too weak risks being mistaken for tacit approval.


On Thu, Sep 19, 2019 at 3:35 PM MORTON, ALFRED C (AL) <acm@research.att.com>
wrote:

> Thanks Matt!  This is an interesting trace to consider,
> and an important discussion to share with the group.
>
> When I look at the equation for BBR:
> https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext
>
> both BBR and the Maximum IP-layer Capacity metric seek the
> max over some time interval. The window seems smaller for
> BBR: 6 to 10 RTTs, whereas we’ve been using parameters that
> result in a rate measurement once a second and take the max
> of the 10 one-second measurements.
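
(For concreteness, a minimal sketch of that parameterization, assuming the
receiver can count delivered IP-layer bytes per one-second bin; the function
and the example byte counts are invented for illustration, not taken from the
draft.)

    def max_ip_layer_capacity(bytes_per_second):
        """bytes_per_second: delivered IP-layer bytes in each 1-second bin of a
        10-second test.  Returns the largest one-second rate, in bits/s."""
        return max(8 * b for b in bytes_per_second)

    # Invented one-second bins loosely mimicking the trace discussed below
    # (~94.5 Mb/s, then ~75, then ~83): the metric reports the peak, 94.5 Mb/s.
    bins = [11_812_500] * 4 + [9_375_000] + [10_375_000] * 5
    print(max_ip_layer_capacity(bins) / 1e6, "Mb/s")   # 94.5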
>
> We’ve also evaluated several performance metrics when
> adjusting load, and that determines how high the sending
> rate will go (based on feedback from the receiver).
> https://tools.ietf.org/html/draft-morton-ippm-capcity-metric-method-00#section-4.3
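
(Not the draft's algorithm, just a rough sketch of the idea of adjusting the
offered load from receiver feedback; the thresholds, step size, and field names
here are invented for illustration, see Section 4.3 of the draft for the real
rate-control details.)

    def next_rate(rate_bps, feedback, step_bps=1_000_000,
                  max_loss=0.005, max_delay_var_s=0.030):
        """One illustrative load-adjustment step.

        feedback: dict built from the receiver's status message, e.g.
          {"loss_ratio": 0.0, "delay_var_s": 0.004}
        Back off when loss or delay variation exceeds a limit,
        otherwise keep probing upward.
        """
        if (feedback["loss_ratio"] > max_loss
                or feedback["delay_var_s"] > max_delay_var_s):
            return max(step_bps, rate_bps - 2 * step_bps)   # back off
        return rate_bps + step_bps                          # search upward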
>
> So, the max delivered rate for the 10-second test, as we can all see,
> is 94.5 Mbps. This rate was sustained for more than a trivial amount
> of time, too. But if you are concerned that this rate was somehow
> inflated by a large buffer and a large burst tolerance in the shaper –
> that’s where the additional metrics and slightly different sending
> rate control that we described in the draft (and the slides) might help.
> https://datatracker.ietf.org/meeting/105/materials/slides-105-ippm-metrics-and-methods-for-ip-capacity-00
>
> IOW, it might well be that Max IP-layer Capacity, measured as we designed
> and parameterized it, would measure 83 Mbps for this path
> (assuming the 94.5 is the result of a big overshoot at the sender;
> the fluctuating performance afterward seems to support that).
>
> When I was looking for background on BBR, I saw a paper comparing
> BBR and CUBIC during drive tests:
> http://web.cs.wpi.edu/~claypool/papers/driving-bbr/
> One pair of plots seemed to indicate that BBR sent lots of bytes
> early on, and grew the RTT pretty high before settling down
> (Figure 5, a & b).
> This looks a bit like the case you described below, except that
> 94.5 Mbps is a received rate – in the drive test we don’t know
> what came out of the network, just what went in and filled a buffer
> before crashing down.
>
> So, I think I did more investigation than justification for my answers,
> but I conclude that parameters like the individual measurement intervals
> and the overall time interval from which the max is drawn, plus the rate
> control algorithm itself, play a big role here.
>
> regards,
>
> Al
>
>
>
>
>
> *From:* Matt Mathis [mailto:mattmathis@google.com]
> *Sent:* Thursday, September 19, 2019 5:18 PM
> *To:* MORTON, ALFRED C (AL) <acm@research.att.com>;
> Ruediger.Geib@telekom.de
> *Cc:* ippm@ietf.org
> *Subject:* Fwd: How should capacity measurement interact with shaping?
>
>
>
> Ok, moving the thread to IPPM
>
>
>
> Some background: we (Measurement Lab) are testing a new transport (TCP)
> performance measurement tool, based on BBR-TCP.   I'm not ready to talk
> about results yet (well, ok, it looks pretty good).    (BTW, the BBR
> algorithm just happens to resemble the algorithm described
> in draft-morton-ippm-capcity-metric-method-00.)
>
>
>
> Anyhow, we noticed some interesting performance features for a number of
> ISPs in the US and Europe, and I wanted to get some input on how these
> cases should be treated.
>
>
>
> One data point: a single trace saw ~94.5 Mbit/s for ~4 seconds, then
> fluctuating performance around ~75 Mbit/s for ~1 second, and then stable
> performance at ~83 Mbit/s for the rest of the 10-second test.    If I were to
> guess, this is probably a policer (shaper?) with a 1 MB token bucket and
> a ~83 Mbit/s token rate (these numbers are not corrected for header overheads,
> which actually matter with this tool).  What is weird is that
> different ingress interfaces to the ISP (peers or serving locations)
> exhibit different parameters.
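
(For a rough consistency check of a token-bucket guess like this one, the
standard relations are sketched below; they ignore header overheads and the
fluctuating period, so the numbers are order-of-magnitude only.)

    def burst_seconds(bucket_bytes, peak_bps, token_bps):
        """Time a shaper can sustain peak_bps before dropping to token_bps:
        bucket depth divided by the surplus arrival rate."""
        return 8 * bucket_bytes / (peak_bps - token_bps)

    def implied_bucket_bytes(burst_s, peak_bps, token_bps):
        """Bucket depth implied by a burst observed for burst_s at peak_bps."""
        return burst_s * (peak_bps - token_bps) / 8

    # e.g. burst_seconds(1_000_000, 94.5e6, 83e6) is roughly 0.7 s, and
    # implied_bucket_bytes(4.0, 94.5e6, 83e6) is roughly 5.8 MB.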
>
>
>
> Now the IPPM measurement question:   Is the bulk transport capacity of
> this link ~94.5 Mbit/s or ~83 Mbit/s?   Justify your answer...
>
>
>
> Thanks,
>
> --MM--
>
>
>
> *Forwarded Conversation*
> *Subject: How should capacity measurement interact with shaping?*
> ------------------------
>
>
>
> From: *Matt Mathis* <mattmathis@google.com>
> Date: Thu, Aug 15, 2019 at 8:55 AM
> To: MORTON, ALFRED C (AL) <acm@research.att.com>
>
>
>
> We are seeing shapers with huge bucket sizes, perhaps as large as or larger
> than 100 MB.
>
>
>
> These are prohibitive to test by default, but can have a huge impact in
> some common situations.  E.g. downloading software updates.
>
>
>
> An unconditional pass is not good, because some buckets are small.  What
> counts as large enough to be ok, and what "derating" is ok?
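
(One rough way to put numbers on "large enough" and "derating": require that
the extra bytes a full bucket admits inflate the measured average rate by at
most some fraction of the long-run token rate. The 5% figure and the 83 Mb/s
token rate below are assumptions for illustration only.)

    def min_test_seconds(bucket_bytes, token_bps, derate=0.05):
        """How long a test must run so that the extra bytes a full bucket
        admits inflate the measured average rate by at most `derate`
        (e.g. 0.05 = 5%) over the long-run token rate."""
        return 8 * bucket_bytes / (derate * token_bps)

    print(min_test_seconds(100e6, 83e6))   # ~193 s for a 100 MB bucket
    print(min_test_seconds(1e6, 83e6))     # ~1.9 s for a 1 MB bucket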
>
>
> Thanks,
>
> --MM--
>
>
>
> ----------
> From: *MORTON, ALFRED C (AL)* <acm@research.att.com>
> Date: Mon, Aug 19, 2019 at 5:08 AM
> To: Matt Mathis <mattmathis@google.com>
> Cc: CIAVATTONE, LEN <lc9892@att.com>, Ruediger.Geib@telekom.de <
> Ruediger.Geib@telekom.de>
>
>
>
> Hi Matt, currently cruising between Crete and Malta,
> with about 7 days of vacation remaining – adding my friend Len.
> You know Rüdiger. It appears I’ve forgotten how to type in 2 weeks,
> given the number of typos I’ve fixed so far...
>
> We’ve seen big buffers on a basic DOCSIS cable service (downlink >2 sec),
> but:
>
>   we have 1-way delay variation or RTT variation limits when searching
>   for the max rate, so that not many packets queue in the buffer;
>
>   we want the status messages that result in rate adjustment to return
>   in a reasonable amount of time (50 ms + RTT);
>
>   we usually search for 10 seconds, but if we go back and test with
>   a fixed rate, we can see the buffer growing if the rate is too high.
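
(A minimal sketch of the "buffer growing" check in that last point, assuming
per-interval one-way-delay or RTT samples from a fixed-rate run; the window
and slope limit are invented for illustration.)

    def buffer_growing(delay_samples_s, window=10, slope_limit=0.010):
        """delay_samples_s: minimum one-way delay (or RTT) per 1-second
        measurement interval, in seconds.  Returns True when the last
        `window` samples show delay rising faster than slope_limit seconds
        per interval, i.e. the fixed sending rate is filling a queue."""
        recent = delay_samples_s[-window:]
        if len(recent) < 2:
            return False
        slope = (recent[-1] - recent[0]) / (len(recent) - 1)
        return slope > slope_limit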
>
>
>
>   There will eventually be a discussion on the thresholds we use
>   in the search // load rate control algorithm. The copy of
>   Y.1540 I sent you has a simple one; we have moved beyond that now
>   (see the slides I didn’t get to present at IETF).
>
>   There is value in having some of this discussion on the IPPM list,
>   so we get some **agenda time at IETF-106**.
>
> We measure rate and performance, with some performance limits
> built-in.  Pass/Fail is another step, and de-rating too (it made sense
> with MBM “target_rate”).
>
> Al
>
>
>
> ----------
> From: <Ruediger.Geib@telekom.de>
> Date: Mon, Aug 26, 2019 at 12:05 AM
> To: <acm@research.att.com>
> Cc: <lc9892@att.com>, <mattmathis@google.com>
>
>
>
> Hi Al,
>
>
>
> thanks for keeping me involved. I don’t have a precise answer, and I doubt
> that there will be a single universal truth.
>
>
>
> If the aim is only to determine the IP bandwidth of an access, then we
> aren’t interested in filling a buffer. Buffering events may occur, some of
> which are useful and to be expected, whereas others are not desired:
>
>
>
>    - Sender shaping behavior may matter (is traffic at the source CBR, or
>    is it bursty?).
>    - Random collisions should be tolerated at the access whose bandwidth
>    is to be measured.
>    - Limiting packet drop due to buffer overflow is a design aim or an
>    important part of the algorithm, I think.
>    - Shared media might create bursts. I’m not an expert in the area, but
>    there’s an “is bandwidth available” check in some cases between a central
>    sender using a shared medium and the receivers connected to it. WiFi and
>    maybe other wireless equipment also buffers packets to optimize use of
>    the wireless resource.
>    - It might be an idea to mark some flows with ECN, once there’s a guess
>    at a sending bitrate at which to expect no or very little packet drop.
>    Today, this is experimental. CE marks by an ECN-capable device should be
>    expected roughly once queuing starts.
>
>
>
> Practically, the set-up should be configurable with commodity hard- and
> software, and all metrics should be measurable at the receiver. Burstiness
> of source traffic must be dealt with, and queuing events which are to be
> expected must be distinguished from (undesired) queue build-up.
> I hope that can be done with commodity hard- and software. I at least am
> not able to write down a simple metric distinguishing queues to be expected
> from (undesired) queue build-up causing congestion. The hard- and software
> to be used should be part of the solution, not part of the problem (bursty
> source traffic and timestamps with insufficient accuracy to detect queues
> are what I’d like to avoid).
>
>
>
> I’d suggest moving the discussion to the list.
>
>
>
> Regards,
>
>
>
> Rüdiger
>
>
>
> ----------
> From: *MORTON, ALFRED C (AL)* <acm@research.att.com>
> Date: Thu, Sep 19, 2019 at 7:01 AM
> To: Ruediger.Geib@telekom.de <Ruediger.Geib@telekom.de>
> Cc: CIAVATTONE, LEN <lc9892@att.com>, mattmathis@google.com <
> mattmathis@google.com>
>
>
>
> I’m catching up with this thread again, but before I reply:
>
>
>
> *** Any objection to moving this discussion to IPPM-list ?? ***
>
>
>
> @Matt – this is a question to you at this point...
>
>
>
> thanks,
>
> Al