Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-21: (with DISCUSS and COMMENT)
Benjamin Kaduk <kaduk@mit.edu> Fri, 28 January 2022 19:37 UTC
Date: Fri, 28 Jan 2022 11:37:42 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: "Y. Richard Yang" <yry@cs.yale.edu>
Cc: The IESG <iesg@ietf.org>, alto-chairs <alto-chairs@ietf.org>, draft-ietf-alto-performance-metrics@ietf.org, IETF ALTO <alto@ietf.org>
Message-ID: <20220128193742.GC11486@mit.edu>
References: <164004830683.29272.10508004540707438889@ietfa.amsl.com> <CANUuoLpe7=W29DJh+BOxUmQomXg-XiU2jzHL4Fi8xkXwj_JU_Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CANUuoLpe7=W29DJh+BOxUmQomXg-XiU2jzHL4Fi8xkXwj_JU_Q@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/alto/1zz-S1bPQ96ludefQPZn6ndDfC8>
Subject: Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-21: (with DISCUSS and COMMENT)
Hi Richard,

On Tue, Dec 21, 2021 at 08:13:02PM -0500, Y. Richard Yang wrote:
> Hi Ben,
>
> Thank you so much for the wonderful, fast, thorough reviews! I understand
> that Qin has already sent a reply and please see below.

Yes, Qin's reply was quite helpful -- thanks, Qin!

> On Mon, Dec 20, 2021 at 7:58 PM Benjamin Kaduk via Datatracker <
> noreply@ietf.org> wrote:
>
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-alto-performance-metrics-21: Discuss
> >
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> >
> >
> > Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
> > for more information about how to handle DISCUSS and COMMENT positions.
> >
> >
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
> >
> >
> >
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> >
> > Thank you for addressing my previous discuss points with the -21 (and my
> > apologies for the spurious one!); I'm glad to see that they were indeed
> > easy to address.
> >
> > However, I have looked over the changes from -20 to -21 and seem to have
> > found a couple more issues that should be addressed:
> >
> > (1) I can't replicate the Content-Length values in the examples (I only
> > looked at Examples 1 and 2). Can you please share the methodology used
> > to generate the values? My testing involved copy/paste from the
> > htmlized version of the draft to a file, manually editing that file to
> > remove the leading three spaces that come from the formatting of the
> > draft, and using Unix wc(1) on the resulting file.
> > It looks like the
> > numbers reported in the -21 are computed as the overall number of
> > characters in the file *minus* the number of lines in the file, but I
> > think it should be the number of characters *plus* the number of lines,
> > to accommodate the HTTP CRLF line endings. (My local temporary files
> > contain standard Unix LF (0x0a) line endings, verified by hexdump(1).)
> >
> >
> This is very helpful and we are impressed! Below is the method that we are
> finally using:
> copy .json file to a text file, e.g., example1-req.json
> {
> "cost-type" : {"cost-mode" : "numerical",
> "cost-metric" : "delay-ow"},
> "endpoints" : {
> "srcs" : [ "ipv4:192.0.2.2" ],
> "dsts" : [
> "ipv4:192.0.2.89",
> "ipv4:198.51.100.34"
> ]
> }
> }
>
> Issue a curl request to example.com:
> curl -v --http1.1 -X POST -H 'Content-Type: application/json' --crlf
> --data-binary @$i example.com 2>&1 -o /dev/null | grep -Fi '>
> Content-Length'
>
> Note that it sets http/1.1, and asks curl to fix the crlf issue (I am using
> a mac and hence have 0x0a). Note $i is the input file. For example, for the
> updated example1-req.json, we get 225. Is it consistent with your method?

For what it's worth, my reading of the curl manual is that the "--crlf" is
not needed here (and indeed in my local testing on FreeBSD the same
content-length is used regardless of whether or not --crlf is used).

I believe that this curl-based procedure will produce the correct results.
(My original statement about adding the number of lines seems to be
incorrect; sorry to have caused confusion with that.)

Thanks for investigating and finding a good solution.

>
> > (2) We seem to be inconsistent about what the "cur" statistical operator
> > for the "bw-utilized" metric indicates -- in §4.4.3 it is "the current
> > instantaneous sample", but in §4.4.4 it is somehow repurposed as "The
> > current ("cur") utilized bandwidth of a path is the maximum of the
> > available bandwidth of all links on the path."
> >
>
> Good comment. I see where the potential confusion can be. How about the
> following change to make clear the definition of utilized bandwidth for a
> path?
> "The base semantics of the metric is the Unidirectional Utilized Bandwidth
> metric defined in [RFC8571,RFC8570,RFC7471], but instead of specifying the
> utilized bandwidth for a link, it is the utilized bandwidth of the path
> from the source to the destination, where the utilized bandwidth of the
> path from the source to the destination is defined as the maximum utilized
> bandwidth among all links from the source to the destination."
>
> Overall, I agree that we should not add bw-utilized to be fully consistent,
> and we will just revert bw-utilized.

I don't want to dwell on this topic too much since it sounds like we're
going to just remove bw-utilized, but to briefly answer your question: the
proposed text looks like a good clear definition of the semantics of the
metric. I think that the §4.4.4 text would need to be adjusted to talk
about "utilized bandwidth" rather than "available bandwidth" of all links
on the path, in order to match it.

> >
> >
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> >
> > I cannot currently provide a concise explanation of the nature of my
> > unease with the "bw-utilized" metric specification that is new in this
> > revision (so as to elevate it to a Discuss-level concern), but I
> > strongly urge the authors and WG to consider my comments on Section 4.4.3.
> >
> > The new text in Section 1 explaining the origins of the metrics (e.g.,
> > from TE performance metrics) and why some other TE metrics are not
> > defined is nicely done. I trust the responsible AD and WG chairs to
> > ensure that it, and the other places where we have added new exposition,
> > has gotten the appropriate level of review from the WG membership.
> >
> >
> It is wonderful that you caught it immediately. The reason chain was
> the perfect consistency, but it is not a good idea. So we will just drop
> this new metric.
>
> >
> > Section 3.1.2, 3.2.2
> >
> > I see that the delay-ow and delay-rt semantics have been changed from
> > milliseconds to microseconds going from -20 to -21. Either
> > representation seems fine, but it may be risky to make such a change so
> > late in the publication process, especially if there are already
> > implementations in place. I also don't see any AD ballot comments that
> > seem to motivate the change, so I'm a bit curious how it arose -- is it
> > for consistency with the corresponding TE link metrics?
> >
> >
> Exactly. It was motivated by the authors' search for perfect consistency
> and we noted that
> we used milliseconds in our drafts and OSPF/ISIS/BGP-LS use microseconds.
> So we made the
> quick change.

Thanks for confirming. Hopefully we are in touch with all implementations
and can get them adjusted quickly.

> > Section 3.3.3
> >
> >    Intended Semantics: To specify temporal and spatial aggregated delay
> >    variation (also called delay jitter) with respect to the minimum
> >    delay observed on the stream over the one-way delay from the
> >    specified source and destination, where the one-way delay is defined
> >    in Section 3.1. A non-normative reference definition of end-to-end
> >    one-way delay variation is [RFC3393]. [...]
> >
> > I note that RFC 3393 explicitly says that as part of the metric, several
> > parameters must be specified, most notably the selection function F that
> > unambiguously defines the two packets selected for the metric. While
> > it's allowed for F to select as the "first" packet the one with the
> > smallest one-way delay, which maps up to the "with respect to the
> > minimum delay observed on the stream" here, it seems to me that it's
> > fairly important to call out that we are not allowing the full
> > flexibility of the RFC 3393 metric.
> > Assuming, of course, that we
> > specifically have that as the intent, versus allowing the full
> > generality of RFC 3393. If there have been some research results since
> > RFC 3393 was published that indicate that it's preferred to use the
> > minimum delay for this purpose, that might be worth listing as a
> > reference in addition to RFC 3393.
> >
> >
> Excellent comment! Here is the new text:
> "A non-normative reference definition of end-to-end one-way delay
> variation is <xref target="RFC3393" />, which allows general delay
> variations
> by specifying a selection function F (Section 2.2 of [RFC3393]). See
> <xref target="RFC5481" /> for additional discussions on RFC3393 related
> packet delay variations. This document focuses on the specific case of
> delay variation with respect to the
> minimum delay observed in the packet stream, as commonly exposed by
> routing link metrics. If an ALTO server provides a delay variation metric
> that
> is not based on the minimum delay, the server can provide the precise
> definition in the "cost-context" field, for example, by specifying a
> general
> IPPM PDV metric in the "parameters" field. The server SHOULD be
> cognizant that the "cost-context" field is optional, and hence the client
> may not interpret the semantics properly."

That looks great; thank you!

> I looked at the IPPM registry
> (
> https://www.iana.org/assignments/performance-metrics/performance-metrics.xhtml
> )
> The only PDV metric defined so far is entry 3, and it is with respect to
> minimum delay.
>
> >
> > Section 3.4.4
> >
> > The estimation of end-to-end loss rate as the sum of per-link loss rates
> > is (1) only valid in the low-loss limit, and (2) assumes that each
> > link's loss events are uncorrelated with every other link's loss events.
> > The current text does mention (2) in the form of "should be cognizant of
> > correlated loss rates", but I don't think it touches on (1) at all.
> > (The general formula for aggregating loss assuming each link is
> > independent is to compute end-to-end loss as one minus the product of
> > the success rate for each link.)
> >
> >
> Excellent comments. How about this updated text:
> "For estimation by aggregation of routing protocol link metrics, the
> default
> aggregation function to compute the loss rate of a path is to compute it as
> one minus the success rate of the path, where the success rate of the path
> is the product of the success rates of the links on the path, and the
> success
> rate of a link is one minus the loss rate of the link. In low loss-rate
> settings,
> the loss rate of a path can be approximated as the sum of link loss rates.
> This aggregation function assumes independent link losses, and the ALTO
> server should be cognizant of correlated link loss rates."

That should work, thanks.

>
> > Section 4.4.3
> >
> > It seems like there may be some subtlety in the interpretation of the
> > "bw-utilized" metric, which leads me to wonder if more caution is
> > advised prior to adding new metrics at this stage in the document
> > lifecycle. In particular, it seems like it would be natural to attempt
> > to compare the "bw-utilized" value against the "bw-maxres" value and
> > "bw-residual" value, but it seems to me that the inferences that can be
> > made by such comparisons will depend on the topology in question.
> >
> >
> > Routers and link capacities between them:
> >
> >        1Gbps            10Gbps           1Gbps
> > +-----------------+=================+--------------+
> > A                 B                 C              D
> >
> > If there is a flow using 6Gbps from B to C, that would show up when
> > querying "bw-utilized" between A and B, but that 6Gbps is obviously more
> > than both the maximum reservable and residual bandwidth end-to-end from
> > A to D; likewise, the 4Gbps of residual bandwidth on the B-to-C link is
> > also more than the achievable bandwidth end-to-end from A to D.
> > So it
> > seems like the utilized bandwidth is potentially from totally unrelated
> > flows on paths that only have a minimal set of links in common with the
> > path being queried. How do we expect someone to use the reported
> > "bw-utilized" values?
> >
> > To put it differently, I don't think that the specification of "the
> > maximum utilized bandwidth among all links from the source to the
> > destination" will actually provide the desired "utilized bandwidth of
> > the path from the source to the destination", since the procedure as
> > stated can report a bandwidth that corresponds to a different path.
> >
> >
> Excellent comment! We will just use the previous version w/o bw-utilized
> and will engage you in a separate thread, so that we will not block the
> progress of the current document. Make sense?

Yes, that sounds good. This will also let us exercise the IANA registry
procedures quickly and get experience with them :)

[nits trimmed]

Thanks again,

Ben
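
For readers who want to check the two aggregation points from the
Section 3.4.4 and Section 4.4.3 discussion numerically, here is a minimal
Python sketch. It is only an illustration under stated assumptions: the
function and variable names are hypothetical (not from the draft), and the
zero utilization on the A-B and C-D links is assumed for the example, since
the message only specifies the 6 Gbps flow on B-C.

```python
from math import prod


def path_loss_rate(link_loss_rates):
    """Path loss assuming independent link losses: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in link_loss_rates)


def path_loss_rate_low_loss(link_loss_rates):
    """Low-loss approximation: the sum of the per-link loss rates."""
    return sum(link_loss_rates)


# Topology from the Section 4.4.3 comment: A --1G-- B ==10G== C --1G-- D.
# The 6 Gbps flow on B-C comes from the message; zero utilization on the
# other two links is an assumption made only for this illustration.
capacity_gbps = {"A-B": 1.0, "B-C": 10.0, "C-D": 1.0}
utilized_gbps = {"A-B": 0.0, "B-C": 6.0, "C-D": 0.0}
path = ["A-B", "B-C", "C-D"]

# "Maximum utilized bandwidth among all links" as in the proposed text.
max_link_utilized = max(utilized_gbps[link] for link in path)
# Smallest link capacity on the path (an upper bound on any A-to-D flow).
bottleneck_capacity = min(capacity_gbps[link] for link in path)
# Smallest per-link residual (capacity minus utilization) on the path.
path_residual = min(capacity_gbps[link] - utilized_gbps[link] for link in path)

if __name__ == "__main__":
    low = [0.001, 0.002, 0.001]   # low-loss links: the sum is a close estimate
    high = [0.3, 0.4]             # lossy links: the sum overestimates
    print(path_loss_rate(low), path_loss_rate_low_loss(low))
    print(path_loss_rate(high), path_loss_rate_low_loss(high))
    print(max_link_utilized, bottleneck_capacity, path_residual)
```

With these inputs the maximum over links reports 6 Gbps for the A-to-D path
even though the bottleneck capacity is 1 Gbps, which is the mismatch the
Section 4.4.3 comment describes. For the lossy pair, the product form gives
0.58 while the sum gives 0.7, showing why the sum is only a low-loss
approximation.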