Re: [ippm] Magnus Westerlund's Discuss on draft-ietf-ippm-capacity-metric-method-06: (with DISCUSS)

"MORTON, ALFRED C (AL)" <acm@research.att.com> Tue, 16 March 2021 23:43 UTC

From: "MORTON, ALFRED C (AL)" <acm@research.att.com>
To: Magnus Westerlund <magnus.westerlund@ericsson.com>, "Ruediger.Geib@telekom.de" <Ruediger.Geib@telekom.de>
CC: "tpauly@apple.com" <tpauly@apple.com>, "ianswett@google.com" <ianswett@google.com>, "draft-ietf-ippm-capacity-metric-method@ietf.org" <draft-ietf-ippm-capacity-metric-method@ietf.org>, "ippm-chairs@ietf.org" <ippm-chairs@ietf.org>, "ippm@ietf.org" <ippm@ietf.org>, "iesg@ietf.org" <iesg@ietf.org>
Date: Tue, 16 Mar 2021 23:43:42 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/r9sUDcByRibHc-nKy1N5wkt3k5Q>

Hi Magnus,

We are continuing discussion of some points among the co-authors, 
but here are some replies for this particular thread.

Al 

Open Issues:  

Wording of Applicability in Section 2 (rely on the reference
to Section 2 of [RFC7497])

Feedback message timeout (stop test) 
  - MW: Default 500ms too high (L=10, FT=50ms), 1sec upper limit (30sec too long)

Len proposes a safety measure to back off the rate when there is no
feedback on a time scale of 0.1 seconds or so.
This provides a clear separation between this safety measure, normal
operation, and a timeout to terminate the test. This process is
untested/not implemented.

?? Remove table entry for cc, burst count ??



> -----Original Message-----
> From: Magnus Westerlund [mailto:magnus.westerlund@ericsson.com]
> Sent: Friday, March 12, 2021 12:36 PM
> To: MORTON, ALFRED C (AL) <acm@research.att.com>; Ruediger.Geib@telekom.de
> Cc: tpauly@apple.com; ianswett@google.com; draft-ietf-ippm-capacity-
> metric-method@ietf.org; ippm-chairs@ietf.org; ippm@ietf.org; iesg@ietf.org
> Subject: RE: Magnus Westerlund's Discuss on draft-ietf-ippm-capacity-
> metric-method-06: (with DISCUSS)
> 
> Hi,
> 
> I provide some comments inline.
> 
> As I am no longer a member of the IESG list, you will need to ensure that
> I am in the To or Cc field so that I receive future emails on this topic
> that you want my feedback on.
[acm] 
Yes, of course.
> 
> > >
> > > Section 2.
> > >
> > > I think the scope text is still fairly open. Yes, it is clear that the
> > > load algorithm is only intended for measurements. However, the usage
> > > is not particularly limited, especially with regard to my main concern:
> > > measurements from the edge to central nodes across multiple ASes in the
> > > Internet, the way the various TCP speed tests are often deployed. The
> > > security considerations requirements are good for a number of reasons
> > > but put no limitations on that aspect. So I would prefer a more explicit
> > > statement here. I think an important aspect here is that any ISP
> > > seeing issues from these measurements should know whom to talk to. I
> > > don't know how best to formulate that.
> 
> Okay, I will try again to reword the scope next week.
[acm] 
I think the last paragraphs in Section 2 on Applicability are where you
want to concentrate. As I mentioned during the IPPM meeting, the reference
to Section 2 of [RFC7497] provides an explicit, non-Internet-wide
applicability, and has already been agreed to by the IPPM WG for the problem
statement to which our memo provides a solution.

> 
> > >
> > >
> > > Section 8.1:
> > >
> > > So I am trying to understand the implication of the load algorithm at
> > > higher rates and how the recommendations work out in relation to the
> > > definition of the rate.
> > >
> > > So the document says:
> > >
> > >     Each rate is defined as
> > >    datagrams of size ss, sent as a burst of count cc, each time interval
> > >    tt (default for tt is 1ms, a likely system tick-interval)
> > >
> > > So I think this definition is fine for lower rates, as the number of
> > > packets in each 1 ms burst is fairly small and the buffer it hits will
> > > likely be relatively large compared to the increase in load. However,
> > > consider higher rates, such as beyond 10 Gbps, where 1 Gbps steps are
> > > recommended. Transmitting bursts at 1 ms intervals means that one is
> > > transmitting 833 packets per burst at a 10 Gbps rate with 1500-byte
> > > packets, and even more with more moderate 1200-byte packets. That is
> > > roughly 1.25 MB of data per burst. So while pacing may be quite good at
> > > lower bit rates (< 1 Gbps), I wonder if it starts breaking down at
> > > higher rates, which appear to be in the region where buffers become
> > > shallower due to the cost of large buffers, and where good pacing
> > > reduces the need for buffering. I would also note that the reaction
> > > time for the control can be 1 RTT + 50 ms, so the increase in offered
> > > load from a single step can exceed 10 MB during a regulation period.
> > >
> > > As the load algorithm hasn't been tested beyond 10 Gbps, and it appears
> > > that the numbers can start to become more problematic at these speeds,
> > > wouldn't it be better to say that it is not intended for use beyond
> > > 10 Gbps?
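The burst-size arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are hypothetical, packet sizes are treated as IP-layer sizes, and the 50 ms RTT used for the reaction time is an assumed example value, not a figure from the draft.

```python
# Back-of-the-envelope arithmetic for pacing in bursts every tt seconds.
# Assumptions (illustrative only): sizes are IP-layer packet sizes in bytes,
# and the control-loop reaction time is roughly RTT + FT.

def packets_per_burst(rate_bps: float, packet_bytes: int, tt_s: float = 0.001) -> float:
    """Packets that must be sent back-to-back in each interval tt_s."""
    return rate_bps * tt_s / (8 * packet_bytes)


def bytes_per_burst(rate_bps: float, tt_s: float = 0.001) -> float:
    """Bytes sent in each burst interval tt_s at the given rate."""
    return rate_bps * tt_s / 8


def step_excess_bytes(step_bps: float, reaction_s: float) -> float:
    """Extra bytes offered by one rate-table step before feedback can
    cause a reaction (reaction_s is assumed to be about RTT + FT)."""
    return step_bps * reaction_s / 8


if __name__ == "__main__":
    print(round(packets_per_burst(10e9, 1500)))   # 833 packets per 1 ms burst
    print(round(packets_per_burst(10e9, 1200)))   # 1042 packets per 1 ms burst
    print(round(bytes_per_burst(10e9)))           # 1250000 bytes (~1.25 MB)
    # One 1 Gbps step, assumed RTT of 50 ms plus the default FT of 50 ms:
    print(round(step_excess_bytes(1e9, 0.050 + 0.050)))  # 12500000 bytes
```

For comparison, at 1 Gbps the same 1 ms burst is only about 83 packets of 1500 bytes, which is why the concern grows with rate.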
> > >
> > > On the table I have the following comments:
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | Parameter    | Default     | Tested Range | Expected Safe Range   |
> > >    |              |             | or values    | (not entirely tested, |
> > >    |              |             |              | other values NOT      |
> > >    |              |             |              | RECOMMENDED)          |
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | FT, feedback | 50ms        | 20ms, 100ms  | 5ms <= FT <= 250ms    |
> > >    | time         |             |              | Larger values may     |
> > >    | interval     |             |              | slow the rate         |
> > >    |              |             |              | increase and fail to  |
> > >    |              |             |              | find the max          |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > I would note that an FT of 5 ms has the potential to cause
> > > significant fluctuations in some systems, such as mobile systems, where
> > > the scheduler interval is actually likely to be longer than 5 ms.
> > [acm]
> > Then we can increase the low end of the range. What value would you
> > prefer?
> 
> So I would prefer to raise this to at least 20 ms. However, I think that
> is for general robustness without knowledge of the network one is
> measuring.
[acm] 
OK, 20ms <= FT <= 250ms; it's in the working text.


> 
> >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | Feedback     | L*FT, L=10  | L=100 with   | 0.5sec <= L*FT <=     |
> > >    | message      | (500ms)     | FT=50ms      | 30sec Upper limit for |
> > >    | timeout      |             | (5sec)       | very unreliable test  |
> > >    | (stop test)  |             |              | paths only            |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > Even the default means that one loses 10 feedback packets in a row.
> > > That is a lot, and shows that one has a serious interruption on the
> > > return path. Losing even 3 feedback packets in a row already indicates
> > > a significant outage.
> > >
> > > Secondly, this is formulated only in terms of FT intervals. For
> > > startup, the RTT is a relevant factor. So I think there are several
> > > factors for the timeout that may need to be teased apart. Initially,
> > > one offers a very low load and may not have a good measurement of the
> > > base RTT.
> > [acm]
> > All the timeouts are based on inter-packet arrival times at a single
> > interface. This removes the dependency on RTT.
> 
> Yes, I think you can define this as the time since you last received a
> feedback message. However, my concern is that the appropriate value actually
> depends on the RTT, as it affects the control loop. The amount of damage
> done by a sequence of lost feedback messages depends on the total time, since
> the RTT affects the reaction time. So clearly, receiving no feedback message
> for 10 FT intervals is bad, as it represents a lot of missing packets.
[acm]
 
In the present wording, rate adjustments only take place when new feedback 
arrives. Sometimes the algorithm determines that no rate change is warranted.

Len's thought is that we could introduce a safety measure: rate reduction
each time a feedback status message is declared lost. We would need to wait
*some amount of time* to avoid over-reaction (2*FT ?), and then execute Rx-1
at 2*FT, again at 3*FT, again at 4*FT, etc. until L*FT when the procedure 
to stop a test begins.

One point is that safety measures like the above, and other timeouts, go
somewhat beyond the method. For example, the Load Rate Adjustment Algorithm
begins when a feedback status message arrives and ends when the new rate
has been determined, then waits for the next message.
The safety measure described above begins when feedback status messages
are absent for a period of time, and it hasn't been tested (yet).
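To make the proposed (and, as noted, untested) safety measure concrete, here is a minimal Python sketch. It assumes integer milliseconds of silence since the last feedback message, a first step-down at 2*FT, one further Rx-1 step at each later FT boundary, and the start of the stop-test procedure at L*FT. The names `SafetyDecision`, `backoff_decision`, and `apply_steps` are hypothetical, not from the draft.

```python
# Hypothetical sketch of the proposed back-off safety measure (untested in
# the running code). Silence is measured in integer milliseconds since the
# last feedback status message arrived at the sender.

from dataclasses import dataclass


@dataclass
class SafetyDecision:
    steps_down: int   # cumulative Rx-1 reductions that should have been applied
    stop_test: bool   # True once L*FT has elapsed: begin stopping the test


def backoff_decision(silence_ms: int, ft_ms: int = 50, l_factor: int = 10,
                     first_step: int = 2) -> SafetyDecision:
    """Rx-1 at first_step*FT, again at each further FT, until L*FT."""
    intervals = silence_ms // ft_ms          # whole FT intervals without feedback
    if intervals >= l_factor:
        # Feedback message timeout reached: keep the last step count and stop.
        return SafetyDecision(steps_down=l_factor - first_step, stop_test=True)
    steps = max(0, intervals - first_step + 1)
    return SafetyDecision(steps_down=steps, stop_test=False)


def apply_steps(rate_index: int, steps_down: int) -> int:
    """Lower the sending-rate table index, never below index 0."""
    return max(0, rate_index - steps_down)


if __name__ == "__main__":
    for t in (60, 100, 250, 450, 500):
        print(t, backoff_decision(t))
```

With the defaults (FT = 50 ms, L = 10), 100 ms of silence yields one step down, 450 ms yields eight, and 500 ms (the default feedback message timeout) starts the stop-test procedure. Integer milliseconds avoid floating-point surprises at the FT boundaries.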

> 
> >
> > > Thus, the time to first feedback can be fairly large, and 500 ms is
> > > likely okay, though considerably longer than expected for a measurement
> > > from an access network to a local Internet exchange. However, when one
> > > scales up the rate, I think these values are way too long, as the total
> > > amount of traffic sent without feedback becomes quite significant.
> > > Receiving no feedback for more than 10 reporting intervals is already
> > > way too long. And I can't support stating that 30 seconds would be an
> > > acceptable value, even for a measurement tool.
> > [acm]
> > We are willing to revise values, especially 30 sec, but would like to avoid
> > prematurely shutting down measurements due to (what we consider to be)
> > short interruptions.
> 
> I understand the view that the interruption is short. However,
> continuing to transmit for several seconds without any feedback is also a
> significant issue. So I think anything beyond 1 second is not really
> acceptable.
[acm] 
This seems to be a very conservative opinion. How many applications,
like streaming video, VoIP, etc., will ride out a 1-second outage and
keep trying to restore communication on their own (so that the user
doesn't have to do it)?

And is there value in measuring the duration of an outage when it
occurs during testing? If you always quit quickly because the
Expected Safe Range demands it, you won't know whether an
application would self-restore or not. I know that we haven't
talked about outage measurement in the draft, but we clearly intend
to measure Capacity and additional metrics.


> 
> 
> >
> > >
> > > The terms "Feedback message timeout" and "Load packet timeout" are
> > > not defined. I assume that the feedback message timeout is the time
> > > without receiving any feedback messages after starting a measurement.
> > [acm]
> > That's close:
> > Operation: The load packet timeout SHALL be reset to the configured value
> > each time a load packet is received. If the timeout expires, the receiver
> > SHALL be closed and no further feedback sent.
> 
> Ok, that needs to be included.
[acm] 
OK it's in the working text.
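The reset-on-arrival behaviour of the load packet timeout (and, symmetrically, of the feedback message timeout at the sender) amounts to a watchdog timer. Below is a minimal Python sketch, assuming integer millisecond timestamps supplied by the caller; `Watchdog` and `first_expiry` are illustrative names, not part of the draft or the running code.

```python
# Minimal watchdog sketch for the load packet timeout (receiver side) and the
# feedback message timeout (sender side). Times are integer milliseconds
# supplied by the caller, e.g. derived from a monotonic clock.

from typing import Iterable, Optional


class Watchdog:
    """Reset on every arrival; expires if nothing arrives within the timeout."""

    def __init__(self, timeout_ms: int, now_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.deadline_ms = now_ms + timeout_ms

    def reset(self, now_ms: int) -> None:
        # Called for each load packet (receiver) or feedback message (sender).
        self.deadline_ms = now_ms + self.timeout_ms

    def expired(self, now_ms: int) -> bool:
        return now_ms >= self.deadline_ms


def first_expiry(arrivals_ms: Iterable[int], timeout_ms: int,
                 start_ms: int = 0) -> Optional[int]:
    """Return the time the watchdog would fire (receiver or sender closed),
    or None if no gap before the last arrival reached the timeout."""
    wd = Watchdog(timeout_ms, start_ms)
    for t in sorted(arrivals_ms):
        if wd.expired(t):
            return wd.deadline_ms
        wd.reset(t)
    return None
```

For example, with a 500 ms timeout and arrivals at 100, 200, and 900 ms, `first_expiry` reports that the watchdog would have fired at 700 ms.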


> 
> >
> >
> > > Is the load packet timeout the time the receiver waits, without
> > > receiving any packets, before using the signalling channel to end the
> > > measurement, or the time for the sender to receive feedback saying that
> > > no packets have been received? The roles here are not clear.
> > [acm]
> > Operation: The feedback message timeout SHALL be reset to the configured
> > value each time a feedback message is received. If the timeout expires, the
> > sender SHALL be closed and no further load packets sent.
> 
> OK. So I think there are two aspects here: something that backs off the
> transmission, and something that terminates the whole measurement. Do
> you really want to have these two clumped together?
[acm] 
We were satisfied with that strategy *when this timeout = 5 seconds*.

Now that you would likely prefer to see the timeout in the < 0.5 seconds
range, Len proposes a safety measure to back off the rate when there is no
feedback (a.k.a. ACK of working path), on a time scale of 0.1 seconds
or so.

There is a clear separation between this safety measure, normal operation,
and a timeout to terminate the test.

What do you think?


> 
> >
> > >
> > > Sending packets for several seconds without seeing any result appears
> > > problematic and allowing values beyond several seconds looks broken.
> > [acm]
> > Then let us make some revisions together. You've seen our proposals.
> > Our concern is stopping a test unnecessarily. There can be a happy
> > conclusion.
> 
> Yes, and as I said above, do you really want the measurement timeout to
> be tied to a rate-reduction timeout?
[acm] 
The timeout setting of 5 seconds during a 10-second test removed any
ambiguity in our minds: in half of randomly occurring 5-second outages,
the sender would be terminated by the test duration anyway.

So, let's see what we can do with some separation and a new safety measure
using absence of feedback at the sender.
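The "half of randomly occurring outages" remark follows from simple arithmetic, sketched below in Python. The assumptions are ours, not the draft's: the outage start is uniformly distributed over the test duration D, and the outage lasts at least as long as the timeout T, so the timeout would fire at start + T. The function name is hypothetical.

```python
# Fraction of outages for which the test duration, not the timeout, ends the
# test. Assumptions: outage start ~ Uniform[0, D); the outage lasts >= T, so
# the feedback message timeout would fire at (start + T).

from fractions import Fraction


def fraction_ended_by_duration(duration_s, timeout_s) -> Fraction:
    """The test ends first when start + T >= D, i.e. start in [D - T, D)."""
    d, t = Fraction(duration_s), Fraction(timeout_s)
    if t >= d:
        return Fraction(1)
    return t / d


if __name__ == "__main__":
    print(fraction_ended_by_duration(10, 5))      # 1/2
    print(fraction_ended_by_duration(10, "0.5"))  # 1/20
```

With a timeout closer to the 0.5 s discussed above, only about 5% of such outages would be outlasted by a 10-second test, so the timeout rather than the test duration would usually stop the sender.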


> 
> >
> >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | table index  | 0.5Mbps     | 0.5Mbps      | when testing <=10Gbps |
> > >    | 0            |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | table index  | 1Mbps       | 1Mbps        | when testing <=10Gbps |
> > >    | 1            |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > Why is this value not relevant when testing beyond 10 Gbps? Does the
> > > ramp-up time become too long with these values?
> > [acm]
> > "not relevant" is different from the title of the column, which is:
> >
> > Expected Safe Range:  when testing <=10Gbps
> >
> > The parameters above and several others simply determine where a test
> > starts.
> >
> > This parameter:
> > +--------------+-------------+--------------+-----------------------+
> > | table index  | 1Mbps       | 1Mbps -      | same as tested        |
> > | (step) size  |             | 1Gbps        |                       |
> > +--------------+-------------+--------------+-----------------------+
> >
> 
> Okay, I will consider if I have any proposal for how to make this clearer.
[acm] 
BTW this row seems clearer if we write the tested range as:

+--------------+-------------+--------------+-----------------------+
| table index  | 1Mbps       | 1Mbps<=rate  | same as tested        |
| (step) size  |             | <=1Gbps      |                       |
+--------------+-------------+--------------+-----------------------+

> 
> 
> > >
> > >    | ss, UDP      | none        | <=1222       | Recommend max at      |
> > >    | payload      |             |              | largest value that    |
> > >    | size, bytes  |             |              | avoids fragmentation  |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > So isn't there a mismatch between the metric and the load algorithm
> > > values here, with the rate in Section 8.1 being defined in terms of
> > > "ss", i.e., UDP payload bytes, rather than the IP packet sizes that are
> > > used in the metric?
> > [acm]
> >
> > Not really, UDP is mandatory in the metric definition.
> 
> Hmm, I think then there is a mismatch here. Section 6.3 states:
> 
> n0 is the total number of IP-layer header and payload bits that
>       can be transmitted in standard-formed packets from the Src host
>       and correctly received by the Dst host during one contiguous sub-
>       interval, dt in length, during the interval [T, T+I],
> 
> So the metric appears to be defined based on IP packets, not the UDP
> payload size. Thus, one needs to convert between the value of ss and the
> actual packet size sent. Otherwise, I think the rate values in the table
> will be misinterpreted: a rate of 560 Mbit/s based on ss = 1210 bytes would
> in fact be an IPv6 capacity of 582.2 Mbit/s.
[acm] 

We can certainly calculate the IP-layer rates that each payload size 
produces using UDP and IP headers, and those are the rates we are 
referring to when talking about 1Mbps step sizes, starting rates,
transition rates, etc.

Sure, we need a table for v4 and v6. The running code can use v6 
address family, too.
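The per-family conversion can be sketched as follows. This Python snippet assumes fixed header sizes (a 20-byte IPv4 header without options, a 40-byte IPv6 header without extension headers, and an 8-byte UDP header) and a rate expressed on the UDP payload size ss; the function names are hypothetical.

```python
# Convert between a rate counted on UDP payload bytes (ss) and the IP-layer
# rate counted by the metric (IP header + payload bits).
# Assumptions: no IPv4 options, no IPv6 extension headers.

UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = {"ipv4": 20, "ipv6": 40}


def ip_layer_rate(payload_rate_bps: float, ss: int, family: str = "ipv6") -> float:
    """IP-layer bit rate for a given UDP-payload bit rate and payload size ss."""
    packet_bytes = ss + UDP_HEADER_BYTES + IP_HEADER_BYTES[family]
    return payload_rate_bps * packet_bytes / ss


def payload_rate(ip_rate_bps: float, ss: int, family: str = "ipv6") -> float:
    """Inverse conversion: UDP-payload bit rate for a given IP-layer rate."""
    packet_bytes = ss + UDP_HEADER_BYTES + IP_HEADER_BYTES[family]
    return ip_rate_bps * ss / packet_bytes


if __name__ == "__main__":
    # The example from this thread: 560 Mbit/s of payload with ss = 1210 bytes.
    print(round(ip_layer_rate(560e6, 1210, "ipv6") / 1e6, 1))  # 582.2
    print(round(ip_layer_rate(560e6, 1210, "ipv4") / 1e6, 1))  # 573.0
```

The relative overhead grows as ss shrinks, which is one more reason a separate table per address family helps.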


> 
> >
> > >
> > > I understand that one wants to ensure that one measures using a size
> > > that actually works in the path. However, I think one should be warned
> > > that one might run into packet-rate limitations rather than byte
> > > limits if one uses too small a size.
> > [acm]
> > Ok
> > "Use of too-small payload size might result in unexpected sender
> > limitations."
> >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | cc, burst    | none        | 1 - 100      | same as tested        |
> > >    | count        |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > So the cc value depends on the target rate and on the values of ss and
> > > tt. So should it be included in this table at all? Especially as 100
> > > is not sufficient for multi-gigabit speeds with a tt of 1 ms.
> > [acm]
> > We can remove it if the values cause confusion.
> >
> 
> I think that should be done, unless it has a real purpose here.
[acm] 

OK, will check with the co-authors on this first.

> 
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | low delay    | 30ms        | 5ms, 30ms    | same as tested        |
> > >    | range        |             |              |                       |
> > >    | threshold    |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > So I think this value is highly dependent on several aspects and
> > > maybe should get more discussion. First, for a measurement campaign,
> > > it is relevant what one considers the acceptable target additional
> > > latency when finding capacity. Secondly, there is the jitter of the
> > > network technology: for Wi-Fi, mobile, and DOCSIS, too low a value may
> > > be shorter than the scheduling latencies that might occur. It is also
> > > a question of how precisely the implementation is capable of measuring
> > > per-packet latency variations.
> > [acm]
> > As we discussed much earlier in this long thread, we arrived at both
> > delay threshold values after testing with the Wi-Fi, mobile, and DOCSIS
> > access services we could use in production, and many others.
> 
> Understood. Did 5 ms really work well with DOCSIS and 4G mobile? Or is it
> 30 ms that works well?
[acm] 

5ms worked well with PON, and OK with DOCSIS; 30ms worked well with 4G
and **DSL, and still worked accurately with others having less delay
variation.

> 
> >
> >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | high delay   | 90ms        | 10ms, 90ms   | same as tested        |
> > >    | range        |             |              |                       |
> > >    | threshold    |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > Here too, I wish there were a bit more discussion. This value clearly
> > > must be above the expected jitter for the network technology. It also
> > > needs to be sufficiently large to represent a fair amount of queue, to
> > > avoid measurement errors. I assume that if one chose a value larger
> > > than the available buffer depth, one would drive the network into
> > > packet loss. And as long as there is some room between the low delay
> > > range threshold and the actual delay causing loss (or this higher
> > > threshold), one has a chance to regulate to that rate.
> > >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | sequence     | 0           | 0, 100       | same as tested        |
> > >    | error        |             |              |                       |
> > >    | threshold    |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > What is this value really?
> > [acm]
> > When loss or reordering occurs, these impairments initially appear as
> > missing or unexpected sequence numbers in the stream, i.e., sequence
> > errors.
> >
> 
> So this is the amount of deviation beyond the expected next sequence
> number that should be tolerated? Did you actually use 100 as the threshold,
> i.e., you need a burst loss of 100 packets, or reordering that moves a
> packet 100 positions out of sequence, for it to be considered an error?
> What was the purpose of using such a high value?
[acm] 

One of our open-source collaborators brought in measurements with dispersed
losses and reordered packets (not bursts of anything). The threshold of 
100 worked well in these unexpected circumstances.  All we are claiming is
that we HAVE tested at 100 using the running code.


> 
> >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | consecutive  | 2           | 2            | Use values >1 to      |
> > >    | errored      |             |              | avoid misinterpreting |
> > >    | status       |             |              | transient loss        |
> > >    | report       |             |              |                       |
> > >    | threshold    |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > Here too, I am uncertain what the criterion is.
> > [acm]
> > From the draft, where consecutive status reports are sent at feedback
> > intervals:
> >
> >    Lastly, the method for inferring congestion is that there were
> >    sequence number anomalies AND/OR the delay range was above the
> >    upper threshold for two consecutive feedback intervals.
> 
> Okay I get it.
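The quoted rule can be read as a small state machine. Below is a minimal Python sketch, assuming each status report carries a sequence-error count and a delay range in ms, and using the table defaults (sequence error threshold 0, high delay range threshold 90 ms, two consecutive errored status reports). `StatusReport` and `CongestionDetector` are illustrative names, not from the draft.

```python
# Hypothetical sketch: infer congestion when sequence errors exceed their
# threshold AND/OR the delay range exceeds the upper threshold for a number
# of consecutive feedback intervals.

from dataclasses import dataclass


@dataclass
class StatusReport:
    seq_errors: int        # missing or unexpected sequence numbers this interval
    delay_range_ms: float  # measured delay range for this interval


class CongestionDetector:
    def __init__(self, seq_error_threshold: int = 0,
                 high_delay_ms: float = 90.0, consecutive: int = 2) -> None:
        self.seq_error_threshold = seq_error_threshold
        self.high_delay_ms = high_delay_ms
        self.consecutive = consecutive
        self.errored_run = 0  # consecutive errored status reports seen so far

    def update(self, report: StatusReport) -> bool:
        """Feed one status report; return True when congestion is inferred."""
        errored = (report.seq_errors > self.seq_error_threshold
                   or report.delay_range_ms > self.high_delay_ms)
        self.errored_run = self.errored_run + 1 if errored else 0
        return self.errored_run >= self.consecutive


if __name__ == "__main__":
    det = CongestionDetector()
    reports = [StatusReport(0, 20), StatusReport(3, 20),
               StatusReport(0, 95), StatusReport(0, 40)]
    print([det.update(r) for r in reports])  # [False, False, True, False]
```

A single errored report, as in the second interval, does not trigger a rate decrease; this is the transient-loss protection that the table's "Use values >1" note refers to.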
> 
> 
> >
> > >
> > >    +--------------+-------------+--------------+-----------------------+
> > >    | Fast mode    | 30          | 3 * Fast     | same as tested        |
> > >    | decrease, in |             | mode         |                       |
> > >    | table index  |             | increase     |                       |
> > >    | steps        |             |              |                       |
> > >    +--------------+-------------+--------------+-----------------------+
> > >
> > > So is the recommended value 30 or 3*Fast mode increase? Should they be
> > > proportional or not?
> > [acm]
> > The default can be 3 * Fast mode increase if you want; that's what we
> > tested.
> 
> Yes, I think that makes more sense, and it avoids the values becoming
> unrelated.
[acm] 

OK, done in the working text.

> 
> >
> > >
> > > The last entry appears to be a summary fact about the
> > > parameterization; is it relevant?
> > [acm]
> > It might be removed, we thought it was useful info.
> 
> It would be good if values that are not input parameters, but rather a
> consequence of others, were separated into their own category.
[acm] 

OK, I turned that row into a sentence in the working text.

> 
> > >
> > >
> > > What is the goal here with regard to pushing other congestion-controlled
> > > traffic out of the way? It appears likely to push traffic using
> > > delay-based congestion control out of the way. I am more uncertain how
> > > it interacts with loss-based ones, as, depending on the situation, it
> > > appears that it could avoid entering the loss regime.
> > [acm]
> > The goal is to measure the true maximum rate during the test duration.
> >
> 
> So pushing traffic out of the way during the test period. 
[acm] 

Aren't you assuming the measurement system allows other traffic during a test?
The instructions can tell a naïve user to stop other traffic, or they may
not see accurate results ... All the ad hoc tools say that in their FAQs,
and then launch 9 or 10 TCP streams to a low-RTT host, which pushes all the
non-measurement traffic out of the way during the test period.

Like I said at the IPPM session, we pound hard for a few seconds, then go away.
There were some testing scenarios where our load adjustment did not push
all other traffic out of the way (likely 100s of connections were active).


> I think that should be made more explicit, and with that stated
> explicitly, it is easier to make clear why this should only be deployed
> for measurement within cooperating administrative domains.
> 
> 
> > >
> > > My conclusion is that some aspects of this need more clarification of
> > > what they are
> > [acm]
> > These ASCII tables don't provide much space for explanation without
> > becoming awkward due to row height. We'll add some definitions elsewhere.
> 
> Yes, please do. I would also recommend that you try doing these as XMLv3
> tables so they look much better in the HTML version.
[acm] 

Because my current tool is v2, I'm a v2 guy. But we added the definitions 
agreed so far.

> 
> >
> > > and that further assumptions about how the load algorithm will be
> > > deployed should be spelled out, so that its function is more controlled.
> > [acm]
> > We could use some text suggestions to continue the discussion productively,
> > having already tried several times.
> >
> 
> I am starting to see how we can get this to a point where I personally will
> find it acceptable.
> 
> I think rewriting the applicability and making the intention clear will be
> the main part. I also think there should be a paragraph in the security
> considerations making it clear that deployments should prevent the
> measurements from being run by clients outside the intended administrative
> domains, so that this traffic cannot be used to interfere with other
> administrative domains' traffic.
[acm] 

The reference to Section 2 of [RFC7497] will help you.

> 
> Cheers
> 
> Magnus Westerlund
> 
>