Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)

"Y. Richard Yang" <yang.r.yang@gmail.com> Fri, 17 December 2021 20:52 UTC

References: <33a6976ed85d4b0fbfa72e6b0cc39269@huawei.com> <20211208234326.GI11486@mit.edu>
In-Reply-To: <20211208234326.GI11486@mit.edu>
From: "Y. Richard Yang" <yang.r.yang@gmail.com>
Date: Fri, 17 Dec 2021 15:52:04 -0500
Message-ID: <CANUuoLrcJtiX-r1_dwoV662YkCMDscOd86T+DjdzwJEQAkuRPg@mail.gmail.com>
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: Qin Wu <bill.wu@huawei.com>, "draft-ietf-alto-performance-metrics@ietf.org" <draft-ietf-alto-performance-metrics@ietf.org>, "alto@ietf.org" <alto@ietf.org>, The IESG <iesg@ietf.org>, "alto-chairs@ietf.org" <alto-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/alto/Wz-i8y5QyTdKdD-mtsvcSMy4sFY>
Subject: Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)

Hi Ben,

Thank you so much for the wonderful review. We have taken a pass through the
document. One quick question: should we upload a newer version (v21) so
that you can check the detailed edits using a diff, or would you prefer that
we post the revised text inline in a reply?

Thank you so much!
Richard


On Wed, Dec 8, 2021 at 6:43 PM Benjamin Kaduk <kaduk@mit.edu> wrote:

> Hi Qin,
>
> It looks like the only topic that's potentially unresolved is the BCP 18
> question.  I think internationalization is a topic where we mostly look to
> the ART ADs for guidance, and I'm reluctant to claim any kind of authority
> on the "right thing to do".  Mostly I wanted to raise the topic for
> visibility in case anyone else had any thoughts; if no one else replies, I
> think the authors should do what they feel best (which could include making
> no change to the draft).
>
> Thanks,
>
> Ben
>
> On Mon, Dec 06, 2021 at 01:25:20PM +0000, Qin Wu wrote:
> > Hi, Ben:
> > -----Original Message-----
> > From: alto [mailto:alto-bounces@ietf.org] On Behalf Of Benjamin Kaduk
> > Sent: December 4, 2021 6:30
> > To: Qin Wu <bill.wu@huawei.com>
> > Cc: draft-ietf-alto-performance-metrics@ietf.org; alto@ietf.org; The IESG <iesg@ietf.org>; Y. Richard Yang <yang.r.yang@gmail.com>; alto-chairs@ietf.org
> > Subject: Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)
> >
> > Hi Qin,
> >
> > On Thu, Dec 02, 2021 at 09:04:18AM +0000, Qin Wu wrote:
> > > Thanks, Ben, for the detailed and valuable review; see replies and clarifications below.
> > >
> > > -----Original Message-----
> > > >From: Benjamin Kaduk via Datatracker [mailto:noreply@ietf.org]
> > > >Sent: December 2, 2021 13:05
> > > >To: The IESG <iesg@ietf.org>
> > > >Cc: draft-ietf-alto-performance-metrics@ietf.org;
> > > >alto-chairs@ietf.org; alto@ietf.org; ietf@j-f-s.de; ietf@j-f-s.de
> > > >Subject: Benjamin Kaduk's Discuss on
> > > >draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)
> > >
> > > >Benjamin Kaduk has entered the following ballot position for
> > > >draft-ietf-alto-performance-metrics-20: Discuss
> > >
> > > >When responding, please keep the subject line intact and reply to all
> > > >email addresses included in the To and CC lines. (Feel free to cut
> > > >this introductory paragraph, however.)
> > >
> > >
> > > >Please refer to
> > > >https://www.ietf.org/blog/handling-iesg-ballot-positions/
> > > >for more information about how to handle DISCUSS and COMMENT positions.
> > >
> > >
> > > >The document, along with other ballot positions, can be found here:
> > > >https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
> > >
> > >
> > >
> > > >----------------------------------------------------------------------
> > > >DISCUSS:
> > > >----------------------------------------------------------------------
> > >
> > > >These should all be trivial to resolve -- just some minor internal inconsistencies that need to be fixed before publication.
> > >
> > > >The discussion of the percentile statistical operator in §2.2 is internally inconsistent -- if the percentile number must be an integer, then p99.9 is not valid.
> > > [Qin Wu] Yes, the percentile is a number following the letter 'p', but
> > > in some cases, when higher precision is needed, this number is followed by an optional decimal part. The decimal part should start with the '.' separator. Maybe the separator caused the confusion. See the definition in Section 2.2 for details:
> > > "
> > >    percentile, with letter 'p' followed by a number:
> > >       gives the percentile specified by the number following the letter
> > >       'p'.  The number MUST be a non-negative JSON integer in the range
> > >       [0, 100] (i.e., greater than or equal to 0 and less than or equal
> > >       to 100), followed by an optional decimal part, if a higher
> > >       precision is needed.  The decimal part should start with the '.'
> > >       separator (U+002E), and followed by a sequence of one or more
> > >       ASCII numbers between '0' and '9'.
> > > "
> > > Let us know if you think the separator should be changed or whether you can live with the current form.
> >
> > Oops, that's my mistake and you are correct.  Sorry about that; I agree that no change is needed here.
> >
> > [Qin Wu] Great, thanks.
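
The percentile syntax quoted above from Section 2.2 can be checked mechanically. The following is a minimal sketch, not text from the draft: the function name is hypothetical, and rejecting values above 100 (such as "p100.5") is an assumption, since the quoted text only bounds the integer part.

```python
import re

# 'p', then a JSON-style integer (no leading zeros), then an optional
# decimal part: '.' followed by one or more ASCII digits.
_PERCENTILE_RE = re.compile(r"p(0|[1-9][0-9]*)(\.[0-9]+)?")


def is_valid_percentile_operator(op: str) -> bool:
    """Return True if `op` looks like a percentile operator such as 'p99.9'."""
    match = _PERCENTILE_RE.fullmatch(op)
    if match is None:
        return False
    integer_part = int(match.group(1))
    if integer_part > 100:
        return False
    fraction = match.group(2)
    # Assumption (not stated in the quoted text): the overall value must not
    # exceed 100, so 'p100.0' is accepted but 'p100.5' is rejected.
    if integer_part == 100 and fraction is not None and fraction.strip(".0"):
        return False
    return True
```

Under this reading, "p99.9" is accepted because the integer part (99) is within [0, 100] and the decimal part is well formed, which matches the resolution reached above.
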
> > > >Also, the listing of "cost-source" values introduced by this document (in §5.1) does not include "nominal", but we do also introduce "nominal".
> > > [Qin Wu] I agree that this is an inconsistency; it should be fixed in the next version. Thanks.
> > > >Similarly, in §3.1.3 we refer to the "-<percentile>" component of a cost metric string, that has been generalized to an arbitrary statistical operator.
> > > [Qin Wu] No, it is not an arbitrary statistical operator. We did add a
> > > statement saying "
> > >    Since the identifier
> > >    does not include the -<percentile> component, the values will
> > >    represent median values.
> > > "
> > > The median value is defined as the mid-point of the observations; see
> > > the median definition in Section 2.2 "
> > >    median:
> > >       the mid-point (i.e., p50) of the observations.
> > > "
> >
> > Hmm, I am not sure whether my point came through properly or not.  Let me try again.
> >
> > In Section 3.1.3, we see the text:
> >
> >    Comment: Since the "cost-type" does not include the "cost-source"
> >    field, the values are based on "estimation".  Since the identifier
> >    does not include the -<percentile> component, the values will
> >    represent median values.
> >
> > This is the only place in the document where the string "<percentile>"
> > appears, and in particular we do not define a "percentile component"
> > anywhere that I can see.  We do, however, define a "statistical operator"
> > string (component) of a cost metric string, in Section 2.2.  In particular, we do have options for the statistical operator string that are *not* representable as percentile values, such as stddev and cur.  So, I think it is inaccurate to write "-<percentile>" component here.  I propose to instead say "Since the identifier does not include a statistical operator component, the values will represent median values."
> >
> > [Qin Wu] Thanks for the clarification; I agree with your proposed change.
> > > >----------------------------------------------------------------------
> > > >COMMENT:
> > > >----------------------------------------------------------------------
> > >
> > > >All things considered, this is a pretty well-written document that was easy to read.  That helped a lot as I reviewed it, especially so on a week with a pretty full agenda for the IESG telechat.
> > >
> > > >Section 2.2
> > >
> > > >Should we say anything about how to handle a situation where a base metric identifier is so long that the statistical operator string cannot be appended while remaining under the 32-character limit?
> > > [Qin Wu] I think a base metric identifier should not be chosen arbitrarily; using the full name of the base metric is not recommended, and a short name or abbreviation should probably be used if the cost metric string would be too long.
> > > But I am not sure we should set a rule for this. Maybe the rule "The total length of the cost metric string MUST NOT exceed 32" defined in RFC 7285 is sufficient?
> >
> > As far as formal requirements go, that may be all we need.  Assuming that no one needs a percentile value with more than two digits of precision after the decimal point, the longest statistical operator component we currently define is seven characters, e.g., "-median".  So if someone happens to define a base metric identifier that's more than 25 characters, we set ourselves up for a situation where we can use the base metric but can't use -median, -stddev, or -stdvar.  If it's less than 28 characters we could still use -cur, -min, -max, etc., which would be a rather strange situation to be in!
> >
> > I suspect that the right practical approach, if this situation ever arose, would be to define a new base metric identifier that's an alias for the existing one -- just a shorter name but with the same semantics.  So we might end up with some text like:
> >
> > % RFC 7285 limits the overall cost metric identifier to 32 characters.  The
> > % cost metric variants with statistical operator suffixes defined by this
> > % document are also subject to the same overall 32-character limit, so
> > % certain combinations of (long) base metric identifier and statistical
> > % operator will not be representable.  If such a situation arises, it could
> > % be addressed by defining a new base metric identifier that is an "alias"
> > % of the desired base metric, with identical semantics and just a shorter
> > % name.
> >
> > [Qin Wu] The proposed changes look great; thanks for the input.
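
The length arithmetic discussed above can be illustrated with a short sketch. It is illustrative only: the function name and the sample operator list are hypothetical, and "p99.99" stands in for a percentile with two decimal digits of precision.

```python
MAX_COST_METRIC_LENGTH = 32  # overall limit from RFC 7285, as cited in the thread

# Operator names mentioned in this thread; "p99.99" is only a sample percentile.
SAMPLE_OPERATORS = ["cur", "min", "max", "median", "stddev", "stdvar", "p99.99"]


def representable_operators(base_metric, operators=SAMPLE_OPERATORS):
    """List the operators whose '<base>-<operator>' string fits the limit."""
    return [
        op
        for op in operators
        if len(f"{base_metric}-{op}") <= MAX_COST_METRIC_LENGTH
    ]
```

With a hypothetical 26-character base metric identifier, only the three-letter operators (-cur, -min, -max) remain representable, which is the "rather strange situation" described above; a 25-character identifier still admits all of the sample operators.
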
> > > >   min:
> > > >      the minimal value of the observations.
> > > >   max:
> > > >      the maximal value of the observations.
> > > >   [...]
> > >
> > > >Should we say anything about what sampling period of observations is in scope for these operators?
> > > [Qin Wu] I think the sampling period of the observations relates to the method of measurement or calculation. Based on earlier discussion and agreement in the group, we believe this depends on the measurement methodology or metric definition, which in some cases is not necessary or feasible to specify; the RFC defining the metric can be consulted for more details. See the clarification in Section 2.
> >
> > Okay, that's a good way to handle it.
> > [Qin Wu] Thanks.
> > > >Section 3.x.4
> > >
> > > >If we're going to be recommending that implementations link to
> > > >external human-readable resources (e.g., for the SLA details of estimation methodology), does the guidance from BCP 18 in indicating the language of text come into play?
> >
> > (This was a separate point than the following paragraph, to be clear.  I don't have a good answer to propose.)
> >
> > [Qin Wu] I missed this comment, sorry about that.
> > I think the specification of SLA details is out of scope for this document, but BCP 18 Sections 4.1 and 4.5 provide some guidelines on how to specify those details. Let me know if you would prefer that we add a reference to BCP 18 instead of leaving those details out of scope.
> >
> > > >It's also a bit surprising that we specify the new fields in the "parameters" of a metric just in passing in the prose, without a more prominent indication that we're defining a new field.
> > > [Qin Wu] See the CostContext definition in Section 2.1; "parameters" is included in the CostContext object.
> >
> > Ah.  I think I forgot that the "parameters" were new in this document; sorry about that.
> > [Qin Wu] No problem.
> > > >Section 3.1.4
> > >
> > > >   "nominal": Typically network one-way delay does not have a nominal
> > > >   value.
> > >
> > > >Does that mean that they MUST NOT be generated, or that they should
> > > >be ignored if received, or something else?  (Similarly for the other
> > > >sections where we say the same thing.)
> > > [Qin Wu] Yes, that is my understanding. We can add a statement to make this behavior clear.
> > >
> > > >   This description can be either free text for possible presentation to
> > > >   the user, or a formal specification; see [IANA-IPPM] for the
> > > >   specification on fields which should be included.  [...]
> > >
> > > >Is the IANA registry really the best reference for what fields to include?  Typically we would only refer to the registry when we care about the current state of registered values, but the need here seems to effectively be the column headings of the registry, which could be obtained from the RFC defining the registry.
> > > [Qin Wu] The IANA registry provides the Metric Name and Metric URI; following the URI gives more details on the measurement methodology. That is why the [IANA-registry] reference was selected; maybe we can make this clearer in the text.
> > >
> > > >Section 3.3.3
> > >
> > > >   Intended Semantics: To specify spatial and temporal aggregated delay
> > > >   variation (also called delay jitter) with respect to the minimum
> > > >   delay observed on the stream over the one-way delay from the
> > > >   specified source and destination.  The spatial aggregation level is
> > > >   specified in the query context (e.g., PID to PID, or endpoint to
> > > >   endpoint).
> > >
> > > >I do appreciate the note about how this is not the normal statistics
> > > >variation that follows this paragraph, but I also don't think this is
> > > >a particularly clear or precise specification for how to produce the
> > > >number that is to be reported.  It also doesn't seem to fully align with
> > > >the prior art in the IETF, e.g., RFC 3393.  It seems like it would be
> > > >highly preferable to pick an existing RFC and refer to its
> > > >specification for computing a delay variation value.  (To be clear,
> > > >such a reference would then be a normative reference.)
> > > [Qin Wu] Agreed. We are not introducing a new metric; we just expose the existing metric defined in RFC 3393. I also agree to make RFC 3393 a normative reference; we will see how to fix this.
> > > >Section 3.4.3
> > >
> > > >   Intended Semantics: To specify the number of hops in the path from
> > > >   the specified source to the specified destination.  The hop count is
> > > >   a basic measurement of distance in a network and can be exposed as
> > > >   the number of router hops computed from the routing protocols
> > > >   originating this information.  [...]
> > >
> > > >It seems like this could get a little messy if there are multiple routing protocols in use (e.g., both normal IP routing and an overlay network, as for service function chaining or other overlay schemes).
> > > >I don't have any suggestions for disambiguating things, though, and if the usage is consistent within a given ALTO Server it may not have much impact on the clients.
> > > [Qin Wu] Hop count is implicitly mentioned in RFC 7285; this document specifies the metric explicitly.
> > > I am thinking that the protocol used can be indicated via the link (a field named "link") providing a URI to a description of the "estimation" method.
> > > >Section 3.4.4
> > >
> > > >   "sla": Typically hop count does not have an SLA value.
> > >
> > > >As for "nominal", earlier, is there any guidance to give on not generating it or what to do if it is received?
> > > > (Also appears later, I suppose.)
> > > [Qin Wu] We will see how to provide guidance on this, thanks.
> > > >Section 4.1.4
> > >
> > > >   "estimation": The exact estimation method is out of the scope of this
> > > >   document.  See [Prophet] for a method to estimate TCP throughput.  It
> > > >   is RECOMMENDED that the "parameters" field of an "estimation" TCP
> > > >   throughput metric provides two fields: (1) a congestion-control
> > > >   algorithm name (a field named "congestion-control-alg"); and (2) a
> > > >   link (a field named "link") to a description of the "estimation"
> > > >   method.  Note that as TCP congestion control algorithms evolve (e.g.,
> > > >   TCP Cubic Congestion Control [I-D.ietf-tcpm-rfc8312bis]), it helps to
> > > >   specify as many details as possible on the congestion control
> > > >   algorithm used.  This description can be either free text for
> > > >   possible presentation to the user, or a formal specification.
> > > > [...]
> > >
> > > >Do these specifics go into the "congestion-control-alg" name, or in the linked content?
> > > [Qin Wu] My understanding is the latter, but both fields are carried in the single "parameters" field, which can be seen as a JSON object (hence the plural "parameters").
> >
> > I was hoping it would be the latter :)
> > Maybe add a clause at the end of the last quoted sentence like ", as part of the linked contents"?
> >
> > [Qin Wu] Okay, will add clarifying text, thanks.
> >
> > > >Section 5.3
> > >
> > > >   To address the backward-compatibility issue, if a "cost-metric" is
> > > >   "routingcost" and the metric contains a "cost-context" field, then it
> > > >   MUST be "estimation"; if it is not, the client SHOULD reject the
> > > >   information as invalid.
> > >
> > > >This seems like a sub-optimal route to backwards compatibility, as it would (apparently) permanently lock the "routingcost" metric to only the "estimation" source with no way to negotiate more flexibility.  Unless we define a new "routingcost2" metric that differs only in the lack of this restriction, of course.
> > > [Qin Wu] Probably we should have a default value for cost-context. I think the default value should be "estimation", since legacy clients only support estimated metrics.
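
The client-side check implied by the quoted Section 5.3 text might look like the sketch below. It rests on assumptions that the thread does not spell out: the cost type is modeled as a dictionary, "cost-context" is taken to be an object carrying a "cost-source" member, and a missing "cost-source" is treated as "estimation", in line with the default suggested above. The function name is hypothetical.

```python
def accept_cost_type(cost_type: dict) -> bool:
    """Return False when a 'routingcost' cost type should be rejected."""
    if cost_type.get("cost-metric") != "routingcost":
        return True  # the quoted rule only concerns "routingcost"
    context = cost_type.get("cost-context")
    if context is None:
        return True  # legacy form without a cost context
    # Assumption: an absent "cost-source" defaults to "estimation".
    return context.get("cost-source", "estimation") == "estimation"
```

The quoted text says the client SHOULD reject, so a real client could still choose to handle a failed check differently.
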
> > > >Section 5.4.1
> > >
> > > >   the ALTO server may provide the client with two pieces of additional
> > > >   information: (1) when the metrics are last computed, and (2) when the
> > > >   metrics will be updated (i.e., the validity period of the exposed
> > > >   metric values).  The ALTO server can expose these two pieces of
> > > >   information by using the HTTP response headers Last-Modified and
> > > >   Expires.
> > >
> > > >While this seems like it would work okay in the usual case, it seems a bit fragile, in that it may fail in boundary cases, such as when a server is just starting up.  I would lean towards recommending use of explicit data items to convey this sort of information (and also the overall measurement interval over which statistics are computed, which may not always go back to "the start of time").
> > > [Qin Wu] Okay.
> > > >Section 5.4.2
> > >
> > > >   often be link level.  For example, routing protocols often measure
> > > >   and report only per link loss, not end-to-end loss; similarly,
> > > >   routing protocols report link level available bandwidth, not
> > > >   end-to-end available bandwidth.  The ALTO server then needs to
> > > >   aggregate these data to provide an abstract and unified view that
> > > >   can be more useful to applications.  The server should consider
> > > >   that different metrics may use different aggregation computation.
> > > >   For example, the end-to-end latency of a path is the sum of the
> > > >   latency of the links on the path; the end-to-end available
> > > >   bandwidth of a path is the minimum of the available bandwidth of
> > > >   the links on the path.
> > >
> > > >Some caution seems in order relating to aggregation of loss measurements, as loss is not always uncorrelated across links in the path.
> > > [Qin Wu] Agree, but here we just provide examples.
> >
> > That is true ... I am approaching this from the sense that there is a pretty nasty "gotcha" that could trip up an implementor, that is very adjacent to what we do talk about, so adding a caution would be only a minor change.
> > E.g. (after the quoted text), "In contrast, aggregating loss values is complicated by the potential for correlated loss events on different links in the path."
> >
> > [Qin Wu] Agree to add a caution; thanks for the proposed text, we will consider it.
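
The aggregation rules quoted from Section 5.4.2, and the caution about loss, can be sketched as follows. The function names are hypothetical. The loss formula is the standard result for independent link losses, which is the very assumption the caution above says may not hold, so it should be read as a naive estimate and not as a rule from the draft.

```python
import math


def path_latency(link_latencies):
    """End-to-end latency: the sum of the per-link latencies."""
    return sum(link_latencies)


def path_available_bandwidth(link_bandwidths):
    """End-to-end available bandwidth: the minimum over the links."""
    return min(link_bandwidths)


def path_loss_if_independent(link_loss_rates):
    """Naive end-to-end loss rate, valid only if link losses are independent.

    Correlated loss events across links make this estimate inaccurate.
    """
    return 1.0 - math.prod(1.0 - p for p in link_loss_rates)
```

Unlike latency and bandwidth, the loss estimate has no simple closed form when losses on different links are correlated, which is why a dedicated caution is useful.
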
>


-- 
Richard