Re: [ippm] Lars Eggert's Discuss on draft-ietf-ippm-capacity-metric-method-10: (with DISCUSS and COMMENT)

Lars Eggert <lars@eggert.org> Tue, 18 May 2021 07:05 UTC

From: Lars Eggert <lars@eggert.org>
Message-Id: <6B3E6E27-AE72-4ABB-AE28-0A74B3EE93D0@eggert.org>
Date: Tue, 18 May 2021 10:04:58 +0300
In-Reply-To: <1764ec14e5394838b2a2802c45561aab@att.com>
Cc: The IESG <iesg@ietf.org>, Ian Swett <ianswett@google.com>, "ippm-chairs@ietf.org" <ippm-chairs@ietf.org>, "tpauly@apple.com" <tpauly@apple.com>, "draft-ietf-ippm-capacity-metric-method@ietf.org" <draft-ietf-ippm-capacity-metric-method@ietf.org>, "ippm@ietf.org" <ippm@ietf.org>
To: "MORTON JR., AL" <acmorton@att.com>
References: <162072161364.15626.11472410331094167599@ietfa.amsl.com> <1764ec14e5394838b2a2802c45561aab@att.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/gKlItdzniNy46cEjQbmhiGHxjgU>
Subject: Re: [ippm] Lars Eggert's Discuss on draft-ietf-ippm-capacity-metric-method-10: (with DISCUSS and COMMENT)
Precedence: list

Hi Al,

On 2021-5-17, at 21:00, MORTON JR., AL <acmorton@att.com> wrote:
>> DOWNREF from this Standards Track doc to Informational RFC7497.
>> I didn't see this called out in the Last Call, and it's also not a
>> document we have in the DOWNREF registry already.
> [acm]
> Ok, I think this means another IETF Last Call?  It's not the first time
> we've been tripped-up by keeping IPPM's Framework+updates and other
> requirements docs off the Standards-Track.
> 
> I also think that https://datatracker.ietf.org/doc/html/rfc7497 will figure
> prominently in future work on the Capacity topic, such as a test protocol, so we
> should add it to the DOWNREF registry.
> 
> @@@@ Is the DOWNREF registry an action item for chairs or AD/Martin?

Yes, another LC will unfortunately be needed. The DOWNREF registry update is an action item for Martin.

Note that everything below is a comment, not part of the DISCUSS, so none of it blocks the document:

>> Section 1, paragraph 3, comment:
>>>   Here, the main use case is to assess the maximum capacity of the
>>>   access network, with specific performance criteria used in the
>>>   measurement.
>> 
>> If the main use case is to measure access network paths, that applicability
>> should be made very explicit, i.e., in the title and abstract.
> 
> Two things:
> 
> 1. The term "access" doesn't have a universally agreed definition regarding the portion of the network it covers, and that's partly because there are many architectures in use.
> 
> 2. If we add access to the title or abstract, we are unnecessarily limiting the scope that we define later. We refer to section of RFC 7497 where the figure includes the subscriber's device, multiple private networks, and various devices and gateways ending at a transit gateway.
> 
> So, I don't want to emphasize the term "access" as you propose.

That's fine, too. When reading the document, I came away with the impression that it did want to focus on measuring access networks, because it actually says so in several places (e.g., the quote above). If the intent is for this to be a general metric for IP paths, maybe review the places where the document talks about "access" to make sure they do not give the wrong impression.

> Perhaps we can clarify this further in the Intro, because the Scope says:
> 
>   The primary application of the metric and method of measurement
>   described here is the same as in Section 2 of [RFC7497]...
> 
> How about:
> Here, the main use case is to assess the maximum capacity of one or more networks
> where the subscriber receives specific performance assurances, sometimes
> referred to as the Internet access.

It's not only the intro, though. There is text talking about "access" throughout the document.

>> Section 2, paragraph 2, comment:
>>>   The scope of this memo is to define a metric and corresponding method
>>>   to unambiguously perform Active measurements of Maximum IP-Layer
>>>   Capacity, along with related metrics and methods.
>> 
>> I'm not sure what "unambiguously perform" is meant to express?
> 
> [RG] is it unambiguously measure...? If yes roughly:
> "The scope of this memo is to define Active Measurement metrics and corresponding methods
> to unambiguously determine Maximum IP-Layer Capacity and useful secondary evaluations."
> 
> [acm] With s/evaluations/metrics/ this WFM.

WFM

>> Section 2, paragraph 3, comment:
>>>   A local goal is to aid efficient test procedures where possible, and
>>>   to recommend reporting with additional interpretation of the results.
>> 
>> What is this goal "local" to - the IETF?
> [acm] Yes IETF, but that word choice seems unclear now.
> 
> [RG] further s/aid/add??
> 
>> Also, I'm unclear what "reporting
>> with additional interpretation of the results" is supposed to express.
> 
> [acm] Let's try:
> 
> Secondary goals are to add considerations for test procedures, and
> to provide interpretation of the Maximum IP-Layer Capacity results (to
> identify cases where more testing is warranted, possibly with alternate
> configurations).

WFM

>> Section 2, paragraph 10, comment:
>>>   - If a network operator is certain of the access capacity to be
>>>   validated, then testing MAY start with a fixed rate test at the
>>>   access capacity and avoid activating the load adjustment algorithm.
>>>   However, the stimulus for a diagnostic test (such as a subscriber
>>>   request) strongly implies that there is no certainty and the load
>>>   adjustment algorithm will be needed.
>> 
>> Since the first sentence of this paragraph uses a RFC2119 MAY, I'd suggest
>> rephrasing the second sentence to end with "there is no certainty and use
>> of the load adjustment algorithm is RECOMMENDED."
> 
> [RG] sounds good to me.
> [acm] WFM

WFM

>> Section 3, paragraph 4, comment:
>>>   1.  Internet access is no longer the bottleneck for many users.
>> 
>> What is the bottleneck then, and if it's not the access bandwidth, why is
>> a new metric for it helpful?
> 
> [RG]+[acm] 1.  Internet access is no longer the bottleneck for many users, but subscribers expect network providers to honor contracted access performance.
> 
> [acm] IOW, when you subscribe to a 1 Gbps service, then your ISP, you, and possibly other parties want to assure that performance level is delivered. If you test and confirm the subscribed performance level, then you can seek the location of the bottleneck elsewhere.

That context would be helpful to have in the document in some form, IMO.

>> Section 4, paragraph 4, comment:
>>>   o  Src, the address of a host (such as the globally routable IP
>>>      address).
>>> 
>>>   o  Dst, the address of a host (such as the globally routable IP
>>>      address).
>> 
>> For many hosts, their IP address is not globally routable, and hasn't been
>> for decades. Or did you mean CPE?
> 
> [acm] I think *host* is general-enough to include CPE that is the source or destination of test packets. Also, "globally routable" is part of an example.

I still wonder if removing "globally routable" would reduce confusion here, but OK.

>> Section 8.1, paragraph 1, comment:
>>> 8.1.  Load Rate Adjustment Algorithm
>> 
>> I wonder why this section is trying to define a crude new AIMD scheme, when we
>> have lots of good experience with TCP's AIMD approach, which converges on
>> the available bandwidth relatively well?
> 
> [acm] "crude"?? Our algorithm has the advantage of receiver measurements of rate,
> loss, and delay. So, we aren't making inferences based on ACKs, we communicate
> measurement feedback instead.

Maybe I should have said "complex" :-)

> This is not a new AIMD. I'm sure you haven't had time to review
> hundreds of test results where we compare our load adjustment algorithm to
> TCP's AIMDs when seeking to measure the maximum IP-layer capacity.
> 
> Please see section 4 of https://datatracker.ietf.org/doc/html/rfc8337
> particularly the list of section 4.1 where Matt Mathis wrote:
> 
>   o  TCP is a control system with circular dependencies -- everything
>      affects performance, including components that are explicitly not
>      part of the test (for example, the host processing power is not
>      in-scope of path performance tests).
> 
> and this is similar to the statements I have heard for >20 years in IETF,
> where the conclusion is that TCP was not designed as a measurement tool.
> 
> So, it was a goal to avoid the limitations of TCP (windows, ACK feedback,
> retransmissions, and the CCA that were available, including BBR and BBRv2).
> The algorithm we defined is intended to be fast, adjustable in straightforward
> ways, and (most important to Magnus) not to be deployed as a general-purpose
> CCA in TCP. Our scope is infrequent, diagnostic measurements.

Point taken.

>> Section 8.1, paragraph 7, comment:
>>>   If the feedback indicates that no sequence number anomalies were
>>>   detected AND the delay range was below the lower threshold, the
>>>   offered load rate is increased.  If congestion has not been confirmed
>>>   up to this point, the offered load rate is increased by more than one
>>>   rate (e.g., Rx+10).  This allows the offered load to quickly reach a
>>>   near-maximum rate.  Conversely, if congestion has been previously
>>>   confirmed, the offered load rate is only increased by one (Rx+1).
>> 
>> The first sentence talks about sequence number anomalies and delay ranges.
>> The rest of the paragraph then talks about congestion and whether it
>> was confirmed or not. Does this mean that (some amount of) sequence number
>> anomalies and/or delay indicates congestion? Is that defined somewhere?
> 
> [acm] Yes, in a paragraph that begins:
>     Lastly, the method for inferring congestion is that there were...
> 
> I'll add a forward reference:
>     ... If congestion has not been confirmed up to this point (see below
>     for the method to declare congestion), the offered load rate is increased ...

WFM

>> Section 8.1, paragraph 7, comment:
>>>   If the feedback indicates that sequence number anomalies were
>>>   detected OR the delay range was above the upper threshold, the
>>>   offered load rate is decreased.  The RECOMMENDED values are 0 for
>> 
>> This doesn't agree with the previous paragraph, which says "Conversely, if
>> congestion has been previously confirmed, the offered load rate is only
>> increased by one (Rx+1)." (Or I don't understand the difference between
>> congestion and sequence number anomalies/delay ranges; see previous
>> comment.)
> 
> [acm]  Your "(Or ...)" is correct, we deferred the description of congestion
> declaration to a later paragraph, where we require 2 feedback messages at least:
> 
>    Lastly, the method for inferring congestion is that there were sequence
>    number anomalies AND/OR the delay range was above the upper threshold
>    for two consecutive feedback intervals. ...

WFM
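
For readers following the two Section 8.1 comments above, the quoted rules can be sketched in Python. This is only an illustration under stated assumptions, not the draft's algorithm: the names Feedback, LoadState and adjust are hypothetical, the size of the decrease step and the "hold the rate when the delay range sits between the thresholds" case are assumptions (the quoted text gives neither), and the threshold values in the usage lines are placeholders.

```python
from dataclasses import dataclass

FAST_STEP = 10      # "Rx+10" in the quoted text, used before congestion is confirmed
SLOW_STEP = 1       # "Rx+1" in the quoted text, used once congestion is confirmed
DECREASE_STEP = 1   # assumption: the quoted text does not give the decrease size


@dataclass
class Feedback:
    """One feedback interval as reported by the receiver (hypothetical type)."""
    seq_anomalies: int      # count of sequence number anomalies
    delay_range_ms: float   # measured delay range for the interval


@dataclass
class LoadState:
    """Sender-side state for the sketch (hypothetical type)."""
    rate_index: int = 0                 # index into a table of sending rates
    impaired_intervals: int = 0         # consecutive impaired feedback intervals
    congestion_confirmed: bool = False  # sticky once two in a row are impaired


def adjust(state: LoadState, fb: Feedback,
           lower_ms: float, upper_ms: float, max_index: int) -> LoadState:
    """Apply one feedback message to the offered load rate index."""
    impaired = fb.seq_anomalies > 0 or fb.delay_range_ms > upper_ms
    if impaired:
        # Anomalies OR delay range above the upper threshold: decrease.
        state.impaired_intervals += 1
        if state.impaired_intervals >= 2:
            # Two consecutive impaired intervals: congestion is inferred.
            state.congestion_confirmed = True
        state.rate_index = max(0, state.rate_index - DECREASE_STEP)
        return state

    state.impaired_intervals = 0
    if fb.delay_range_ms < lower_ms:
        # No anomalies AND delay range below the lower threshold: increase.
        step = SLOW_STEP if state.congestion_confirmed else FAST_STEP
        state.rate_index = min(max_index, state.rate_index + step)
    # Otherwise the delay range is between the thresholds: hold (assumption).
    return state


# Usage with placeholder thresholds (30 ms lower, 90 ms upper, 100 rates).
state = LoadState()
for fb in (Feedback(0, 10.0), Feedback(0, 10.0),    # fast ramp-up
           Feedback(1, 10.0), Feedback(0, 100.0),   # two impaired intervals
           Feedback(0, 10.0)):                      # slow increase afterwards
    adjust(state, fb, lower_ms=30.0, upper_ms=90.0, max_index=100)
print(state.rate_index, state.congestion_confirmed)
```

The point of the sketch is the distinction raised in the comments: a single impaired feedback message lowers the rate, but only two consecutive ones flip the "congestion confirmed" flag that switches later increases from the fast step to the slow step.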

>> Section 8.4, paragraph 2, comment:
>>>   This section is for the benefit of the Document Shepherd's form, and
>>>   will be deleted prior to final review.
>> 
>> Should this section have been deleted before the IESG review then? Would
>> you like the IESG to ignore it?
> 
> [acm] I think that during the February review, that's exactly what the IESG did:
> ignore the section. We would have happily supported any AD who wanted to review
> or try the code by allowing access to our server.
> 
>> You probably also need to instruct the RFC
>> Editor to remove it before publication?
> 
> [acm] OK
> RFC Editor: This section is for the benefit of the Document Shepherd's form,
> and will be deleted prior to publication.
> 
> >>>> I don't think this material from 8.4 made it into the Doc Shepherd's form,
> which is one reason it's still hanging around...

WFM

>> This document uses RFC2119 keywords, but does not contain the recommended
>> RFC8174 boilerplate. (It contains some text with a similar beginning.)
> [acm] I think this is covered, section 1.1.

WFM

>> 
>> -------------------------------------------------------------------------------
>> All comments below are about very minor potential issues that you may choose to
>> address in some way - or ignore - as you see fit. Some were flagged by
>> automated tools, so there will likely be some false positives. There is no need
>> to let me know what you did with these suggestions.
> 
> Ok, thanks.  I would like to know which tool(s) you used (and if free,
> even on a trial basis). Seemed to do a decent job.

https://github.com/larseggert/ietf-reviewtool

Thanks,
Lars

Attachment: signature.asc

