Re: [ippm] September Summary on Max IP-Layer Capacity Metric

There is another consideration for any short-duration IP capacity
measurement: several mechanisms cause data batching. It is fairly normal
to see packets arrive in back-to-back packet trains, separated by periods
of silence. The most common causes have to do with channel arbitration in
half-duplex links, but there are others, such as compressors that
aggregate packets to improve encoding. (I have been told that a simplistic
measurement of LTE receive rates often sees modes at 1 Gb/s.)

Measuring the average rate is very tricky: if your measurement window
happens to span two silences and one packet train, you might measure half
the actual average rate. If it spans two trains and one silence, you might
measure twice the actual average rate.
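A toy model makes the windowing effect concrete. This is a minimal sketch, not a measurement method: the train length, packet spacing, silence length and window sizes are hypothetical numbers chosen for illustration, and `arrival_times_us` and `rate_pps` are made-up helpers.

```python
# Illustrative model only: packets arrive in back-to-back trains separated by
# silences (e.g. from half-duplex channel arbitration or aggregation).
# All parameters are hypothetical; integer microseconds avoid float drift.

def arrival_times_us(cycles, train_pkts, gap_us, silence_us):
    """Arrival timestamps (microseconds) for a repeating train/silence pattern."""
    cycle_us = train_pkts * gap_us + silence_us
    return [k * cycle_us + i * gap_us
            for k in range(cycles) for i in range(train_pkts)]


def rate_pps(times_us, start_us, length_us):
    """Packets per second seen in the window [start_us, start_us + length_us)."""
    n = sum(1 for t in times_us if start_us <= t < start_us + length_us)
    return n * 1_000_000 / length_us


# 10-packet trains at 1 ms spacing, then 10 ms of silence: 20 ms per cycle.
times = arrival_times_us(cycles=100, train_pkts=10, gap_us=1_000, silence_us=10_000)

long_run = rate_pps(times, 0, 2_000_000)          # 100 full cycles
two_trains = rate_pps(times, 0, 30_000)           # train, silence, train
two_silences = rate_pps(times, 10_000, 30_000)    # silence, train, silence

print(f"long-run {long_run:.1f} pps, short windows "
      f"{two_trains:.1f} and {two_silences:.1f} pps")
```

With these assumed numbers the long-run rate is 500 packets/s, while the two 30 ms windows report roughly 667 and 333 packets/s, depending only on where the window edges fall relative to the trains.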

I recall people trying to robustly extract an accurate rate perhaps two
decades ago. My fuzzy recollection is that the algorithms were only good
enough for things like hinting congestion control, but never made the
grade as a deployable metric. Unfortunately, I don't remember who did the
work. I do think it was presented in IPPM.

BBR solves this problem in a different way: it tracks round trips. For
every packet transmission, record a timestamp and the total data ACKed by
the receiver up to that point (generally equal to total_sent -
current_inflight). When you receive an ACK, capture a timestamp and the
total data ACKed up to that point. Then pair each ACK with the data
captured when the corresponding segment was sent, and compute:

    rtt_sample = delta(timestamp)                       # 1 RTT
    rate_sample = delta(total data ACKed) / rtt_sample  # one RTT's worth of data

The stream of ACKs generates a stream of singletons: nearly every ACK
generates both measurements (there are sometimes complications having to
do with application pauses and such).
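The bookkeeping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BBR's actual implementation: the class and method names are hypothetical, segments are identified by a simple id, every ACK acknowledges exactly one segment, and the application-pause and retransmission cases mentioned above are ignored.

```python
from dataclasses import dataclass


@dataclass
class SendRecord:
    sent_time: float        # when the segment was transmitted (seconds)
    delivered_at_send: int  # total bytes ACKed by the receiver at that moment


class RateSampler:
    """Hypothetical helper following the simplified description above."""

    def __init__(self):
        self.delivered = 0   # total bytes ACKed so far
        self.records = {}    # segment id -> SendRecord

    def on_send(self, seg_id, now):
        self.records[seg_id] = SendRecord(now, self.delivered)

    def on_ack(self, seg_id, acked_bytes, now):
        """Return (rtt_sample, rate_sample) for the ACKed segment."""
        self.delivered += acked_bytes
        rec = self.records.pop(seg_id)
        rtt_sample = now - rec.sent_time                                      # 1 RTT
        rate_sample = (self.delivered - rec.delivered_at_send) / rtt_sample  # bytes/s
        return rtt_sample, rate_sample


# Hypothetical path: one 1000-byte segment sent per millisecond,
# each ACKed exactly 50 ms later.
RTT_MS, SEG_BYTES = 50, 1000
sampler = RateSampler()
samples = []
for t_ms in range(200):
    now = t_ms / 1000
    if t_ms >= RTT_MS:
        samples.append(sampler.on_ack(t_ms - RTT_MS, SEG_BYTES, now))
    sampler.on_send(t_ms, now)

rtt, rate = samples[-1]
print(f"rtt {rtt * 1000:.1f} ms, rate {rate / 1e6:.2f} MB/s")
```

The earliest samples are low because the sender had only just started, so less than one RTT's worth of data was delivered in those intervals; later samples settle at the assumed 1 MB/s.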

min_rtt and max_rate (used by BBR congestion control) are the windowed min
and max, respectively, of the above singleton streams.
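One common way to keep a windowed max or min over a singleton stream is a monotonic deque. This is a sketch under assumptions, not necessarily how BBR implements its filters; the class name, the window of 3 time units and the sample values are all hypothetical.

```python
from collections import deque


class WindowedExtreme:
    """Sliding-window max (or min) over timestamped samples.

    Keeps a monotonic deque, so each update is O(1) amortized.
    The window length is a free parameter in this sketch.
    """

    def __init__(self, window, is_max=True):
        self.window = window
        self.is_max = is_max
        self.q = deque()  # (time, value); the front holds the current extreme

    def _beats(self, a, b):
        return a >= b if self.is_max else a <= b

    def update(self, t, value):
        # Older samples that the new one beats can never be the extreme again.
        while self.q and self._beats(value, self.q[-1][1]):
            self.q.pop()
        self.q.append((t, value))
        # Expire samples outside the window (t - window, t].
        while self.q[0][0] <= t - self.window:
            self.q.popleft()
        return self.q[0][1]


max_rate = WindowedExtreme(window=3, is_max=True)
min_rtt = WindowedExtreme(window=3, is_max=False)
rates = [5, 9, 7, 3, 4, 2]       # hypothetical rate_sample values
rtts = [50, 48, 52, 55, 53, 51]  # hypothetical rtt_sample values (ms)
max_series = [max_rate.update(t, r) for t, r in enumerate(rates)]
min_series = [min_rtt.update(t, r) for t, r in enumerate(rtts)]
print(max_series)  # [5, 9, 9, 9, 7, 4]
print(min_series)  # [50, 48, 48, 48, 52, 51]
```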

I predict that the max of BBR's max_rate will be a more robust and more
accurate measure of the short-duration maximum rate than anything you can
do with UDP (except perhaps with QUIC, which implements the same algorithm
over UDP).

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of control;
            too weak risks being mistaken for tacit approval.

On Wed, Oct 9, 2019 at 4:19 PM MORTON, ALFRED C (AL) <acm@research.att.com>
wrote:

> Hi Joachim,
>
> Thanks for replying on the issue of sender and receiver
> measurements.
>
> Len Ciavattone and I discussed this topic further today,
> and have some thoughts to share, below.
>
> > -----Original Message-----
> > From: Joachim Fabini [mailto:joachim.fabini@tuwien.ac.at]
> > Sent: Wednesday, October 9, 2019 5:43 AM
> > To: MORTON, ALFRED C (AL) <acm@research.att.com>; ippm@ietf.org
> > Subject: Re: [ippm] September Summary on Max IP-Layer Capacity Metric
> >
> ... discussion leading to the conclusion, measure both sender and receiver
> ...
> >
> > Al wrote:
> > >
> > > We have concluded that *both* are needed, but we omitted the
> > > Sender Rate Metric from the draft.  It's actually very useful
> > > to check that the Sender achieved the desired bit rate, and to
> > > know when it doesn't in practice!
>
> Joachim wrote:
> >
> > I agree with your conclusion: having both is useful. Buffers in the
> > network may influence either the sender or the receiver results. If
> > (a) the subpath sender->buffer has higher capacity than the subpath
> > buffer->receiver, the sender-side measurement may yield artificial
> > (optimistic) values until the buffer is filled.
> >
> > The same is true at the receiver end: if (b) the subpath
> > buffer->receiver has higher capacity than the sender-receiver subpath
> > and the buffer (for whatever reason) fills first before forwarding
> > packets to the receiver, the receiver may receive packets at a rate that
> > the network path cannot sustain for an extended period. So the results
> > will be optimistic until the buffer is empty (I admit it's an
> > artificially constructed example).
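A simple fluid model, which ignores packetization, cross traffic and shaper behavior, gives a rough sense of how long such a buffer can mask the narrower subpath: the transient lasts about the buffer size divided by the rate mismatch. The function name and all numbers below are hypothetical, and case (b) additionally assumes the buffer starts full.

```python
def transient_duration_s(buffer_bits, rate_in_bps, rate_out_bps):
    """Fluid-model estimate of how long a buffer hides a rate mismatch.

    Case (a): rate_in > rate_out and the buffer starts empty, so it fills.
    Case (b): rate_out > rate_in and the buffer starts full, so it drains.
    Returns 0.0 when the rates match, since there is no transient.
    """
    mismatch_bps = abs(rate_in_bps - rate_out_bps)
    if mismatch_bps == 0:
        return 0.0
    return buffer_bits / mismatch_bps


# Hypothetical: a 9 Mbit buffer between a 1 Gb/s and a 100 Mb/s subpath.
case_a = transient_duration_s(9e6, rate_in_bps=1e9, rate_out_bps=1e8)
# Hypothetical: a full 2.5 Mbit buffer draining at 1 Gb/s, refilled at 500 Mb/s.
case_b = transient_duration_s(2.5e6, rate_in_bps=5e8, rate_out_bps=1e9)
print(f"case (a): {case_a * 1000:.1f} ms, case (b): {case_b * 1000:.1f} ms")
```

Under these assumptions the transient lasts at most about 10 ms, so a very short trial could be dominated by it while a multi-second trial would not.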
> [acm]
>
> When assessing a Maximum rate as the metric specifies,
> the "artificial (optimistic) values until the buffer is filled"
> may well be the Maximum rate observed while the method of measurement
> is searching for that Maximum, and that would not do.
> This is different from the bi-modal service rates we've discussed already,
> which are characterized by a multi-second duration (much longer than the
> measured RTT) and repeatable behavior.
>
> There are many ways that the Method of Measurement could handle this
> issue, and the simplest seems to come from RFC 2544 and its discussion
> of Trial duration, where relatively short trials conducted as part of the
> search are followed by longer trials to make the final determination [3].
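A sketch of that two-stage idea, loosely in the spirit of the RFC 2544 trial-duration discussion: a binary search using short trials, followed by longer trials at the candidate rate, backing off until one passes. The fluid path model, the binary search, the 1% back-off step and every number below are assumptions made for illustration; they are not taken from RFC 2544 or the draft.

```python
def make_fluid_path(capacity_bps, buffer_bits):
    """Hypothetical path model: a trial is loss-free if the queue never overflows."""
    def trial(rate_bps, duration_s):
        excess_bits = max(0.0, rate_bps - capacity_bps) * duration_s
        return excess_bits <= buffer_bits  # True means "no loss observed"
    return trial


def search_then_confirm(trial, lo_bps, hi_bps, short_s, long_s,
                        resolution_bps, backoff=0.99):
    """Binary search with short trials, then back off until a long trial passes."""
    # Invariant: lo_bps passed a short trial, hi_bps failed one.
    while hi_bps - lo_bps > resolution_bps:
        mid = (lo_bps + hi_bps) / 2
        if trial(mid, short_s):
            lo_bps = mid
        else:
            hi_bps = mid
    confirmed = lo_bps
    while not trial(confirmed, long_s):
        confirmed *= backoff
    return lo_bps, confirmed


path = make_fluid_path(capacity_bps=94e6, buffer_bits=1e6)
short_result, confirmed = search_then_confirm(
    path, lo_bps=0.0, hi_bps=200e6, short_s=1.0, long_s=10.0,
    resolution_bps=1e5)
print(f"short trials: {short_result / 1e6:.2f} Mb/s, "
      f"confirmed: {confirmed / 1e6:.2f} Mb/s")
```

With the assumed 1 Mbit buffer, 1-second trials let the search settle near 95 Mb/s, almost 1 Mb/s above the 94 Mb/s the path can sustain, and the 10-second confirmation pulls the result back below 94 Mb/s.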
>
> In the production network, measurements of singletons and samples
> (the terms for trials and tests of Lab Benchmarking) must be limited
> in duration because they may be service-affecting.
> But there is sufficient value in repeating a sample with a
> fixed sending rate determined by the previous search for
> the Max IP-layer Capacity, to qualify the result in terms of
> the other performance metrics measured at the same time.
>
> @@@@ So:
> A qualification measurement for the search result is a subsequent
> measurement, sending at a fixed 99.x % of the Max IP-layer Capacity
> for the measurement interval I, or for an indefinite period. The same
> Max Capacity Metric is applied, and the result qualifies if the sample
> shows neither packet loss nor a growing minimum delay trend in
> subsequent singletons (or in each dt of the measurement interval, I).
> Samples exhibiting losses or increasing queue occupation require a
> repeated search and/or a test at a reduced fixed sender rate for
> qualification.
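A minimal sketch of the qualification check, assuming each dt of the fixed-rate sample is summarized by a loss count and a minimum one-way delay. The record layout, the half-versus-half comparison used to detect a growing trend and the 1 ms tolerance are hypothetical choices; the message above does not define them.

```python
from dataclasses import dataclass


@dataclass
class Singleton:
    """Hypothetical per-dt record from the fixed-rate qualification sample."""
    lost: int            # packets lost during this dt
    min_delay_ms: float  # minimum one-way delay observed during this dt


def qualifies(singletons, tolerance_ms=1.0):
    """True if the sample has no loss and no growing minimum-delay trend.

    The "growing trend" test is approximated by comparing the lowest delay
    in the second half of the sample with the lowest delay in the first half.
    """
    if len(singletons) < 2:
        raise ValueError("need at least two singletons")
    if any(s.lost > 0 for s in singletons):
        return False
    half = len(singletons) // 2
    early = min(s.min_delay_ms for s in singletons[:half])
    late = min(s.min_delay_ms for s in singletons[half:])
    return late - early <= tolerance_ms


steady = [Singleton(0, 20.0 + 0.1 * (i % 2)) for i in range(10)]
queue_building = [Singleton(0, 20.0 + i) for i in range(10)]
lossy = steady[:9] + [Singleton(1, 20.0)]
print(qualifies(steady), qualifies(queue_building), qualifies(lossy))  # True False False
```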
>
> Here, as with any Active Capacity test, the test duration must be kept
> short. 10-second tests for each direction of transmission are common today.
> In combination with a fast search method and user-network coordination,
> the concerns raised in [4] are alleviated.
>
> >
> > As a side-note, in both cases the ability to timestamp packets at
> > ingress/egress and have accurate global (or relative) time
> > synchronization at sender and receiver may help in identifying the
> > buffering. The measured end-to-end delay will increase in case (a) and
> > decrease in case (b).
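As a rough illustration of this side-note, assuming per-packet send and receive timestamps are available, the sign of a least-squares fit of one-way delay against send time separates the two cases. The function names, the threshold `eps` and the synthetic timestamps are hypothetical.

```python
def delay_slope(tx_times, rx_times):
    """Least-squares slope of one-way delay versus send time (seconds per second).

    A constant clock offset between sender and receiver shifts every delay
    by the same amount and cancels out of the slope; clock drift does not.
    """
    delays = [rx - tx for tx, rx in zip(tx_times, rx_times)]
    n = len(delays)
    mean_t = sum(tx_times) / n
    mean_d = sum(delays) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in zip(tx_times, delays))
    den = sum((t - mean_t) ** 2 for t in tx_times)
    return num / den


def classify(slope, eps=1e-3):
    if slope > eps:
        return "delay increasing: buffer filling, as in case (a)"
    if slope < -eps:
        return "delay decreasing: buffer draining, as in case (b)"
    return "no clear delay trend"


tx = [i * 0.01 for i in range(10)]               # hypothetical send times (s)
# Receiver clock assumed 0.5 s ahead of the sender in both traces.
rx_a = [t + 0.5 + 0.020 + 0.05 * t for t in tx]  # delay grows over the trace
rx_b = [t + 0.5 + 0.030 - 0.02 * t for t in tx]  # delay shrinks over the trace
print(classify(delay_slope(tx, rx_a)))
print(classify(delay_slope(tx, rx_b)))
```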
> [acm]
>
> We don't want to put too much pressure on the simple equipment that
> may be making this measurement, but time sync and relative accuracy
> over the test intervals will help, of course.
>
> >
> > > So, we add one more item to address in the draft:
> > >
> > > @@@@ Add a metric on Sender Rate, as both a
> > >   + Parameter to the IP-layer Capacity Metric Definition
> > >   + A Metric at the Src, partly as a check that the desired
> > >     Parameter was achieved, or was capable of being achieved.
> > >
> > > Thanks for this point, Joachim & Rüdiger.
> > > It was a clear omission in the draft,
> > > and should be an easy fix because we have
> > > provided the definition in other work/SDOs.
> >
> > You're welcome, I'm glad it helped.
> >
> > regards
> > Joachim
> >
> >
> > > PS: We have both in Lab Benchmarking, where RFC 2544 Throughput is
> > > based on Offered Load, and RFC 2889 Max Frame Rate is defined
> > > at the receiver. The useful cross-over between BMWG & IPPM continues.
> [acm]
>
> [3] https://tools.ietf.org/html/rfc2544#section-24
>
> [4] https://tools.ietf.org/html/rfc6815
>    - Max IP Capacity is a different method:
>    it uses short-term load adjustment and is sensitive to loss and delay,
>    like other congestion control algorithms in use every day!!!
>
> >
> >
> >
> > >> -----Original Message-----
> > >> From: ippm [mailto:ippm-bounces@ietf.org] On Behalf Of MORTON, ALFRED C (AL)
> > >> Sent: Sunday, September 29, 2019 5:41 PM
> > >> To: ippm@ietf.org
> > >> Subject: [ippm] September Summary on Max IP-Layer Capacity Metric
> > >>
> > >>
> > >> IPPM List September Summary on Max IP-Layer Capacity Metric
> > >> (Re: [ippm] How should capacity measurement interact with shaping?)
> > >> currently draft-morton-ippm-capcity-metric-measurement-00
> > >>
> > >> We've had a very good discussion of many important
> > >> aspects of IP layer Capacity Metric/Measurements, including:
> > >>
> > >> + Recognizing how an alternative congestion control for TCP (BBR)
> > >>   uses a similar metric
> > >> + Reporting the results under unusual circumstances
> > >> + Bringing IPPM's documented experience and literature to the problem
> > >> + Gaining experience from each other's measurements/research
> > >> + Suggestion of related work areas
> > >>
> > >> It's useful to summarize many pages of discussion from time to
> > >> time: we can capture (what the summarizer thinks) we learned,
> > >> and new readers can join the discussion more easily.
> > >> With those goals in mind, a humble attempt to summarize follows.
> > >> Feel free to set me straight in a concise way, of course.
> > >>
> > >> @@@@ is a flag for take-aways; items to address in the draft.
> > >>
> > >> Matt Mathis engaged the "capcity" draft authors shortly
> > >> after IETF-105, and kindly agreed to foster wider review
> > >> on the ippm-list. There's a whole lot of *shaping* going on [0].
> > >> Matt's M-Lab measurements revealed a clear case of bi-modal
> > >> maximum rates (94 & 83 Mbps), consistent with a service feature
> > >> in the context of Shaping, and Rüdiger shared his experiences
> > >> with fixed access shaper design.
> > >> @@@@ A clear take-away is that reporting must account for such a
> > >> bimodal feature, if/when measured.
> > >> @@@@ Also, that widespread measurements will encounter widespread
> > >> behaviors - testing should continue + expect some evolution.
> > >>
> > >> Joachim and Rüdiger discussed the situation further, confirming
> > >> how buffers play a big part in the assessment and performance.
> > >> When answering the reporting question, the measurement time interval
> > >> plays a key role (long-term? Many different shapers and on-demand
> > >> technologies may be encountered, as anticipated in RFC 7312).
> > >> Joachim also provided two key points of reasoning for BTC (RFC 3148):
> > >> categorize the influencing factors and refine the 3148 definition.
> > >> The discussion covered LTE public networks with on-demand access
> > >> and shared resources.
> > >>
> > >> @@@@ IMO, many of the above challenges fall on the measurement
> > >> methodology: allow for traffic & time to initiate an on-demand access.
> > >> @@@@ Also, results depend on the sending stream characteristics;
> > >> we've known this for a long time, still need to keep it front of mind.
> > >> @@@@ Max IP-Layer Capacity and RFC 3148 BTC (goodput) are different
> > >> metrics. Max IP-layer Capacity is like the theoretical goal for goodput.
> > >>
> > >> @@@@ This is a big one: when the path we measure is stateful based on
> > >> many factors, the Parameter "Time of day" when a test starts is not
> > >> enough info. We need to know the time from the beginning of a
> > >> measured flow, and how the flow is constructed, including how much
> > >> traffic has already been sent on that flow, because state changes
> > >> may be based on time or bytes sent or both. Re-read RFC 7312.
> > >>
> > >> @@@@ The Singleton and Statistic formulations of IPPM's framework
> > >> RFC 2330 are still valuable in this context, possibly combined with
> > >> results criteria ("stable" for X singletons, non-arbitrary threshold
> > >> needed to define "stable").
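One possible form of such a results criterion, sketched under assumptions: declare the result stable when the last X singletons all fall within a relative band around their mean. Both X and the band width are hypothetical parameters here, and the sample values are invented.

```python
def is_stable(values, x, rel_threshold):
    """True if the last x singletons all lie within rel_threshold of their mean.

    x and rel_threshold are placeholders: the thread notes that a
    non-arbitrary threshold for "stable" still has to be defined.
    """
    if x < 1 or len(values) < x:
        return False
    window = values[-x:]
    mean = sum(window) / x
    return all(abs(v - mean) <= rel_threshold * mean for v in window)


rates_mbps = [80.0, 90.0, 93.8, 94.1, 93.9, 94.0]  # hypothetical singleton results
print(is_stable(rates_mbps, x=4, rel_threshold=0.01))  # True
print(is_stable(rates_mbps, x=5, rel_threshold=0.01))  # False
```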
> > >>
> > >> Rüdiger proposed a back-to-back stream for BTC characterization.
> > >> Joachim felt this b2b test might be a prerequisite to measure a
> > >> BTC singleton.
> > >> [acm] it's a tricky test in production networks, see [1]
> > >>
> > >> @@@@ Measurements depend on the access network and the use case.
> > >> Here, the use case is to assess the maximum capacity of the
> > >> access network, with specific performance criteria used in the
> > >> measurement.
> > >>
> > >> Finally, an exchange between Ignacio and Rüdiger brings us
> > >> back to first-principles: What are you trying to measure, and
> > >> what does it mean? What does it matter to demonstrate that
> > >> a portion of the network can reach a published value?
> > >> What capacity is available 100% of the time, given that you cannot
> > >> make measurements that saturate the network 100% of the time?
> > >> Rüdiger responded that this effort has very specific goals,
> > >> to demonstrate that the performance promised is present when
> > >> requested to do so, consistent with the metric proposed.
> > >> There are *many* other metrics, such as available BW.
> > >> Ignacio had some measurement proposals for what may be a
> > >> different network performance metric (IMO).
> > >>
> > >> @@@@ Goals made clearer in the next draft, if possible.
> > >>
> > >> Well, that's a long summary, and we have identified many work
> > >> items for the draft. We also have more measurements (and
> > >> therefore, more useful experiences) coming.
> > >>
> > >> Thanks to all who commented so far, very helpful stuff.
> > >> We look forward to additional discussion and suggestions! [2]
> > >>
> > >> regards,
> > >> Al
> > >>
> > >> [0] apologies to Jerry Lee Lewis:
> > >> https://www.youtube.com/watch?v=1dC0DseCyYE
> > >>
> > >> [1] https://tools.ietf.org/html/draft-ietf-bmwg-b2b-frame-00
> > >>
> > >> [2] It would be good to create threads on specific topics in future,
> > >> but keep those cards and letters coming in, folks!
> > >>
> > >> _______________________________________________
> > >> ippm mailing list
> > >> ippm@ietf.org
> > >> https://www.ietf.org/mailman/listinfo/ippm