Re: [ippm] Benjamin Kaduk's No Objection on draft-ietf-ippm-capacity-metric-method-06: (with COMMENT)
"MORTON, ALFRED C (AL)" <acm@research.att.com> Sat, 27 February 2021 20:19 UTC
From: "MORTON, ALFRED C (AL)" <acm@research.att.com>
To: Benjamin Kaduk <kaduk@mit.edu>
CC: The IESG <iesg@ietf.org>, "draft-ietf-ippm-capacity-metric-method@ietf.org" <draft-ietf-ippm-capacity-metric-method@ietf.org>, "ippm-chairs@ietf.org" <ippm-chairs@ietf.org>, "ippm@ietf.org" <ippm@ietf.org>, Ian Swett <ianswett@google.com>, "tpauly@apple.com" <tpauly@apple.com>
Date: Sat, 27 Feb 2021 20:19:43 +0000
Message-ID: <4D7F4AD313D3FC43A053B309F97543CF01476A103F@njmtexg5.research.att.com>
References: <161419645471.18083.16706266293896961774@ietfa.amsl.com> <4D7F4AD313D3FC43A053B309F97543CF01476A0549@njmtexg5.research.att.com> <20210225220325.GX21@kduck.mit.edu>
In-Reply-To: <20210225220325.GX21@kduck.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/EPj0KRsh84xOiwajgPA_J3dO0xU>
Subject: Re: [ippm] Benjamin Kaduk's No Objection on draft-ietf-ippm-capacity-metric-method-06: (with COMMENT)

Hi Ben,
Thanks for your reply.
I deleted some early agreements below, and added very few more replies.
Please see [acm] with no indents...

> -----Original Message-----
> From: Benjamin Kaduk [mailto:kaduk@mit.edu]
> Sent: Thursday, February 25, 2021 5:03 PM
> To: MORTON, ALFRED C (AL) <acm@research.att.com>
> Cc: The IESG <iesg@ietf.org>;
>     draft-ietf-ippm-capacity-metric-method@ietf.org;
>     ippm-chairs@ietf.org; ippm@ietf.org;
>     Ian Swett <ianswett@google.com>; tpauly@apple.com
> Subject: Re: Benjamin Kaduk's No Objection on
>     draft-ietf-ippm-capacity-metric-method-06: (with COMMENT)
>
> Hi Al,
>
> Also inline...
>
> On Thu, Feb 25, 2021 at 03:08:29AM +0000, MORTON, ALFRED C (AL) wrote:
> > Hi Ben,
> >
> > Thanks for your detailed review; it will be a better draft when we're
> > done, as always!
> >
> > All the changes identified below are implemented in my working version.
> >
> > Please see replies below, [acm]
> > Al
> >
> > > -----Original Message-----
> > > From: Benjamin Kaduk via Datatracker [mailto:noreply@ietf.org]
> > > Sent: Wednesday, February 24, 2021 2:54 PM
> > > To: The IESG <iesg@ietf.org>
> > > Cc: draft-ietf-ippm-capacity-metric-method@ietf.org;
> > >     ippm-chairs@ietf.org; ippm@ietf.org;
> > >     Ian Swett <ianswett@google.com>; tpauly@apple.com;
> > >     tpauly@apple.com
> > > Subject: Benjamin Kaduk's No Objection on
> > >     draft-ietf-ippm-capacity-metric-method-06: (with COMMENT)
> > >
> > > Benjamin Kaduk has entered the following ballot position for
> > > draft-ietf-ippm-capacity-metric-method-06: No Objection
> > >
> > ...
> > > ----------------------------------------------------------------------
> > > COMMENT:
> > > ----------------------------------------------------------------------
...
<we agreed on changes in earlier sections>

> > > Section 8.1
> > >
> > >    At the beginning of a test, the sender begins sending at rate R1 and
> > >    the receiver starts a feedback timer at interval F (while awaiting
> > >
> > > It's a little hard to search for, but I didn't find any previous mention
> > > of 'F' or it being defined as a parameter or term. Should it be a
> > > listed parameter somewhere?
> > [acm]
> > We define a lot of variables in this section, with limited scope of use.
> > F is one of them, but I found that we had already defined F in Section 4!
> > So... F becomes FT, and like R1 and ss and cc, gets no special treatment
> > beyond a definition in the text (IF you're ok with that).
>
> That should be fine.
[acm]
Since this came up again with Magnus, I made FT a parameter in section 4.

> > >
> > >    If the feedback indicates that sequence number anomalies were
> > >    detected OR the delay range was above the upper threshold, the
> > >    offered load rate is decreased. Also, if congestion is now confirmed
> > >    by the current feedback message being processed, then the offered
> > >    load rate is decreased by more than one rate (e.g., Rx-30). [...]
> > >
> > > Does "congestion is now confirmed" mean that "congestion confirmed" is
> > > like a one-way latch and this transition only occurs at most once over
> > > the course of a test? Or could the Rx-30 happen multiple times?
> > > (The pseudocode indicates the former.)
> > [acm]
> > Yes, we are trying to describe the pseudocode, and "congestion confirmed"
> > latches when the slowAdjCount equals the slowAdjThresh (and after that,
> > slowAdjCount continues upward, the slowAdjCount < slowAdjThresh condition
> > fails and slowAdjCount is never reset to zero).
> >
> > So, I think we want to say:
> > OLD
> > Also, if congestion is now confirmed by the current feedback message...
> > NEW
> > Also, if congestion is now confirmed for the first time by the
> > current feedback message...
>
> +1
>
> > >
> > >    If the feedback indicates that there were no sequence number
> > >    anomalies AND the delay range was above the lower threshold, but
> > >    below the upper threshold, the offered load rate is not changed.
> > >
> > > The way this is written suggests that there will always be a lower and
> > > an upper threshold for delay, but the rest of the document so far didn't
> > > give me that impression. E.g., we talk about PM only as "at least one
> > > fundamental metric and target performance threshold MUST be supplied",
> > > and to me having both upper and lower thresholds would be two
> > > thresholds, not one.
> > [acm]
> > That's true, we tried not to force our current/best algorithm into the
> > metric definition. We require some measurement to use for feedback
> > in rate adjustment, otherwise you just have iPerf or other fixed rate
> > tools that can only blast packets.
> >
> > You can build an "ok" feedback system with just one metric and one threshold,
> > but there are drawbacks and it may take longer duration tests to measure
> > the true maximum capacity due to a technical limitation.
> > So, we gave a really good algorithm, in 8.1, in pseudocode, and even
> > in running code
>
> That all makes sense to me. I'm not sure if there's a good way to shoehorn
> some of that insight into the text of the document itself, but it also
> doesn't seem like something that's critical to do.
[acm]
Thanks.

> > >
> > >
> > > Section 8.2
> > >
> > >    Here, as with any Active Capacity test, the test duration must be
> > >    kept short. 10 second tests for each direction of transmission are
> > >    common today. The default measurement interval specified here is I =
> > >    10 seconds). In combination with a fast search method and
> > >    user-network coordination, the concerns raised in RFC 6815[RFC6815]
> > >    are alleviated. [...]
> > >
> > > I skimmed RFC 6815 and had a bit of a hard time making the connection
> > > for why combining a 10-second interval, fast search method, and
> > > user-network coordination alleviate the concerns of RFC 6815. There
> > > doesn't seem to be much in 6815 itself about how testing in production
> > > can be done safely,
> > [acm]
> > That's certainly true, but we did say:
> >
> >    The world will not spin off axis while waiting for appropriate and
> >    standardized methods to emerge from the consensus process.
> >
> > > so my current working assumption is that the
> > > conclusion presented here reflects the results of "new work" being
> > > recorded for the first time (in the RFC series) in this document.
> > [acm]
> >
> > When you put it that way, yes. Although it is a different metric from
> > RFC2544 Throughput, the load adjustment search algorithm alone helps
> > to make this method safer to use than any fixed-rate UDP packet blaster,
> > or even a binary search-controlled measurement because of near real-time
> > feedback.
> >
> > The other reasons why this work is different are that
> > RFC 2544 Throughput measurements intend to overload the isolated test
> > environment for extended periods of time:
> >
> > 24. Trial duration (from https://tools.ietf.org/html/rfc2544#section-24 )
> >
> >    The aim of these tests is to determine the rate continuously
> >    supportable by the DUT. The actual duration of the test trials must
> >    be a compromise between this aim and the duration of the benchmarking
> >    test suite. The duration of the test portion of each trial SHOULD be
> >    at least 60 seconds. ...
> >
> > Many automated RFC2544 test devices start a test at the highest load, and
> > search their way down to the zero-loss Throughput, subjecting the
> > device under test to potentially extreme overload multiple times before
> > reaching the test outcome
> >
> > > If that assumption is correct, I'd suggest spending some more words to
> > > support the conclusion, e.g., making analogies to other "normal" traffic
> > > patterns and how the benchmarking setup is not qualitatively different
> > > from them.
> > [acm]
> >
> > OK, I put some more background together and made the case stronger:
> > the memo we wrote here is exactly what the RFC6815 authors were asking for.
> >
> > The Max IP Capacity metric and method for assessing is very different
> > from classic RFC2544 Throughput metric and methods: it uses
> > near-real-time load adjustments that are sensitive to loss and delay,
> > similar to other congestion control algorithms used on the Internet
> > every day, along with limited duration. On the other hand, RFC2544
> > Throughput measurements can produce sustained overload conditions for
> > extended periods of time. Individual trials in a test governed by a
> > binary search can last 60 seconds for each step, and the final
> > confirmation trial may be even longer. This is very different from
> > "normal" traffic levels, but overload conditions are not a concern in
> > the isolated test environment. The concerns raised in RFC6815 were that
> > RFC2544 methods would be let loose on production networks, and instead
> > the authors challenged the standards community to develop metrics and
> > methods like those described in this memo.
>
> Thanks; that is what I was asking for.
[acm]
Great!

> I think this is related to Magnus's Discuss point, though, and I cannot
> speak for whether it will make him happy as well...
[acm]
Of course. It's part of the topic: testing production networks safely.
We've been running versions of this method on the Internet for years,
as the references and IPPM's literature clearly show.

> > >
> > >
> > > Section 8.3
> > >
> > >    As testing continues, implementers should expect some evolution in
> > >    the methods. The ITU-T has published a Supplement (60) to the
> > >    Y-series of Recommendations, "Interpreting ITU-T Y.1540 maximum
> > >    IP-layer capacity measurements", [Y.Sup60], which is the result of
> > >    continued testing with the metric and method described here.
> > >
> > > I pulled up the [Y.Sup60] reference, and it does not seem to reference
> > > this draft by name. On what basis do we conclude that it "is the result
> > > of continued testing with the metric and method described here"?
> > > Skimming/searching, I do see many similar formulae and methods
> > > presented, but how do we conclude they are precisely the same?
> > [acm]
> > I'll soften that a bit. The Max IP-Layer Capacity metric is
> > the same, but it is likely that a few details in the method have diverged
> > over time -- much of the ITU-T testing and spec development came first.
> >
> > NEW
> > ... [Y.Sup60], which is the result of continued testing with the metric,
> > and those results have improved the method described here.
>
> Thanks. My primary concern here was that it seemed like we were making
> statements about what some other SDO did, and typically it's good to have
> sign-off from the other SDO before doing that. The NEW option doesn't seem
> to have that issue, so it should be good.
[acm]
Nice, thanks.

> > >
> > >
> > >
> > > Section 10
> > >
> > > Should we say something about making sure that I is reasonably bounded?
> > > IIRC we say so elsewhere in the text but not exactly here.
> > [acm]
> > I added a direct reference to I at the end of item 6.:
> >
> > ...
> > Testing with the Service Provider's measurement hosts SHOULD be
> > limited in frequency and/or overall volume of test traffic (for
> > example, the range of I duration values SHOULD be limited).
> >
> > >
> > >    2. A REQUIRED user client-initiated setup handshake between
> > >       cooperating hosts and allows firewalls to control inbound
> > >       unsolicited UDP which either go to a control port [expected and
> > >       w/authentication] or to ephemeral ports that are only created as
> > >       needed. [...]
> > >
> > > nit: the grammar is odd in the first part of this sentence; the part
> > > before the "and" doesn't seem like it can join up with anything after
> > > the "and". Is the intent something like "It is REQUIRED to have a user
> > > client-initiated setup handshake between cooperating hosts that allows
> > > firewalls to [...]"?
> > [acm]
> > Thanks, good re-wording, it's in.
> >
> > >
> > >    3. Integrity protection for feedback messages conveying measurements
> > >       is RECOMMENDED.
> > >
> > > (In some sense you want authentication as well as integrity protection.)
> > [acm]
> > Yes. The running code has optional authentication now.
> >
> > NEW
> > 3. Client-server authentication and integrity protection for feedback
> >    messages conveying measurements is RECOMMENDED.
> >
> > >
> > >    5. Senders MUST be rate-limited. This can be accomplished using the
> > >       pre-built table defining all the offered load rates that will be
> > >       supported (Section 8.1). The recommended load-control search
> > >       algorithm results in "ramp up" from the lowest rate in the table.
> > >
> > > nit: since (effectively) each implementation will have their own
> > > pre-built table, I think it should be "using a pre-built table".
> > [acm]
> > OK, "a" it is.
> >
> > >
> > >
> > > Appendix 13
> > >
> > > If we start at Rx (row) 1, is it going to cause problems when we drop
> > > down to Rx = 0 in the loss/congestion cases?
> > [acm]
> > It would, but the current table includes a Row zero, which is where we
> > cold-start. I guess it would be less confusing to say:
> >
> >     Rx = 0 # The current sending rate (equivalent to a row of the table)
>
> Yes, I think so.
>
> > >
> > >
> > > The mechanism in the pseudocode to stop taking large increments in
> > > sending rate above the "hSpeedThresh" does not seem to be described in
> > > the prose in §8.1. (That said, it seems like a good idea, given the
> > > likely table composition.)
[acm]
We fixed this in a separate exchange, after you discovered my mis-read. Thanks!

> > [acm]
> > It's getting late here now, but I think it's in the first two If statements:
> >
> >     if ( seqErr == 0 && delay < lowThresh ) {   # no loss or delay problems, and
> >         if ( Rx < hSpeedThresh && slowAdjCount < slowAdjThresh ) {   # Rate < hSpeedThresh && etc.
> >             Rx += highSpeedDelta;   # can still use large increments
> >             slowAdjCount = 0;
> >         } else {   # otherwise (Rate >= hSpeedThresh)
> >             if ( Rx < maxLoadRates - 1 )   # after checking headroom,
> >                 Rx++;   # can only increase by one
>
> I followed up on this out-of-band -- I agree that the pseudocode is good,
> and was wondering that the prose in Section 8.1 was divergent from the
> pseudocode. Your proposal for new prose text looks good:
>
> % However, if a rate threshold between high and very high sending
> % rates (such as 1Gbps) is exceeded, the offered load rate is only
> % increased by one (Rx+1) above the rate threshold in any congestion state.
>
> > >
> > >
> > > (Also, indenting one tab for the outer conditionals and two more for the
> > > inner ones looks a bit unusual.)
> > [acm]
> > I like unusual :-)
> >
> > >
> > > Section 14
> > >
> > > It's not entirely clear to me why RFC 2330 is classified as normative
> > > but RFC 7312 is informative, just based on the locations where they are
> > > referenced.
> > [acm]
> > It's more than that.
> > RFC 7312 describes some unusual access conditions that might be
> > encountered and is only cited at the end of the Intro, once.
> > Certainly the measured evidence of bimodal (turbo-mode) access
> > behavior is in the category of RFC 7312 "messy stuff", but we still
> > manage that pretty well and it's not specifically mentioned in 7312.
> >
> > OTOH, RFC 2330 is the IPPM Framework, with an exception to be
> > Informative status, yet Normative in our memos, and much is owed to
> > 2330, starting with the singleton, sample, statistic metric
> > development, unmentioned stuff about clocks and accuracy, etc.
> >
>
> Understood.
>
> Thanks for all the updates and explanations!
>
> -Ben
[acm]
You're welcome. Thanks for a very productive exchange!
Al
(for the co-authors)
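
For readers following the Section 8.1 discussion above, here is a minimal
Python sketch of the load adjustment step as this thread describes it. It is
an illustration under stated assumptions, not the draft's pseudocode or the
running code. The increase branch mirrors the quoted excerpt. The decrease
branch is reconstructed from the prose only (decrease on loss or high delay,
one larger decrease when congestion is confirmed for the first time, with
Rx-30 as the example size). The names LoadState, adjust_rate and
congestion_backoff, and every default value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadState:
    rx: int = 0              # current row of the sending-rate table (cold start at row 0)
    slow_adj_count: int = 0  # count of reports showing loss or high delay


def adjust_rate(state, seq_err, delay, *,
                low_thresh=0.030,        # assumed lower delay threshold (seconds)
                upper_thresh=0.090,      # assumed upper delay threshold (seconds)
                h_speed_thresh=1000,     # assumed row where large steps stop
                high_speed_delta=10,     # assumed size of a large step (rows)
                slow_adj_thresh=2,       # assumed reports needed to confirm congestion
                congestion_backoff=30,   # "e.g., Rx-30" in the quoted prose
                max_load_rates=2000):    # assumed number of rows in the table
    """Apply one feedback report to state and return the new table row."""
    if seq_err == 0 and delay < low_thresh:
        # No loss or delay problems: increase the offered load.
        if state.rx < h_speed_thresh and state.slow_adj_count < slow_adj_thresh:
            # Large steps are still allowed. The clamp to the last row is a
            # defensive addition that is not in the quoted excerpt.
            state.rx = min(state.rx + high_speed_delta, max_load_rates - 1)
            state.slow_adj_count = 0
        elif state.rx < max_load_rates - 1:
            state.rx += 1  # above the threshold, or after the latch: one row at a time
    elif seq_err > 0 or delay > upper_thresh:
        # Loss or high delay: decrease (branch reconstructed from the prose).
        state.slow_adj_count += 1
        if state.slow_adj_count == slow_adj_thresh:
            # Congestion confirmed for the first time: one larger decrease.
            state.rx = max(state.rx - congestion_backoff, 0)
        elif state.rx > 0:
            state.rx -= 1
    # Otherwise the delay is between the two thresholds: hold the rate.
    return state.rx
```

Because slow_adj_count keeps counting upward and is reset only in the
large-step branch, the equality test can succeed only once the counter has
reached slow_adj_thresh, which is the one-way latch behavior discussed above.
After that point the sketch only ever moves by one row per report.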