Re: [tsvwg] start of WGLC on L4S drafts

Bob Briscoe <ietf@bobbriscoe.net> Sun, 07 November 2021 16:40 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/fTRTX5ntxhOqO61ggVaIPH8Vwfw>

Jake,

Thank you for your new reviews. Pls see [BB] inline...

On 21/08/2021 21:11, Holland, Jake wrote:
> Overall:
> The documents are in much better shape than the last time I reviewed
> them, thanks for all the improvements.
>
> I'm reviewing l4s-id and l4s-arch but didn't have time to get to
> dualq-aqm yet.  But I wanted to make sure these got posted, and
> might not have more time before the deadline.
>
> ----
>
> draft-ietf-tsvwg-ecn-l4s-id-19:
> There's a few major issues.  These really should be fixed before
> sending it to the IESG, IMO, but with these fixed I'd be happy to
> see it shipped.
>
> Major:
>
> 1. This should be proposed standard, not experimental.
>
> [snip]

[BB] See separate thread.

>
> 2. This text from Section 4.3 should be strengthened to recommend
> detection of paths that might be classic-marking L4S traffic (shared
> queue or not):
>
>     o  In uncontrolled environments, monitoring MUST be implemented to
>        support detection of problems with an ECN-capable AQM at the path
>        bottleneck that appears not to support L4S and might be in a
>        shared queue.
>
> Even in fq, the standing queue that will be built by a scalable cc with
> classic marking is contrary to "1-2ms latency at 99th percentile" goal
> of L4S, even leaving aside the harm to colliding flows (though of course
> the harm to colliding flows, including from hash collisions in fq, is
> also another reason to make this change).

[BB] If we change this requirement so that monitoring MUST report all 
Classic ECN AQMs, operators could just end up with a sea of data that 
doesn't say whether any single-queue AQMs are out there, and those are 
much more likely to be a potential problem than FQ AQMs. The key word in 
this requirement is 'problems'.

Also note that the word 'might' in 'might be in a shared queue' allows 
for the case of a flow-queue that is exhibiting problems at the time it 
is monitored (e.g. during real time monitoring when within a VPN). BTW, 
we created a matrix of cases, and checked all the possible cases when we 
wordsmithed this requirement.

Also, detecting a flow-queue that is ECN marking but appears to exceed 
expected L4S delay is already in the flow's self-interest. So I wouldn't 
worry about trying to legislate over-precisely about this case. For 
instance, both of the OS developers that I have had detailed 
conversations with about this are working on the basis that it's worth 
taking something like the max of the responses to increasing delay and 
to ECN marks. That will automatically lead to a Classic-like response 
in an FQ-CoDel queue, as well as in other cases like suddenly reduced 
capacity.
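
To make the 'max of the responses' idea concrete, here is a minimal 
sketch in Python. It is not taken from either OS implementation: the 
function names, the 2 ms delay threshold, the 50% cap and the shape of 
the delay response are all assumptions chosen for illustration. Only the 
DCTCP-style rule of cutting by half the marked fraction is standard, and 
even that is simplified here to use the fraction seen in one round trip 
rather than a moving average.

```python
def ecn_cut(marked_fraction: float) -> float:
    """Scalable (DCTCP-like) reduction: half the fraction of CE-marked packets."""
    return marked_fraction / 2.0


def delay_cut(queue_delay_ms: float,
              delay_threshold_ms: float = 2.0,   # assumed threshold, illustrative only
              max_cut: float = 0.5) -> float:    # cap at a Reno-like halving
    """Hypothetical delay response: no cut at or below the threshold, then a
    cut that grows with the excess queuing delay and saturates at max_cut."""
    if queue_delay_ms <= delay_threshold_ms:
        return 0.0
    excess = (queue_delay_ms - delay_threshold_ms) / queue_delay_ms
    return min(max_cut, excess)


def next_cwnd(cwnd: float, marked_fraction: float, queue_delay_ms: float) -> float:
    """Apply the larger of the two reductions, once per round trip."""
    cut = max(ecn_cut(marked_fraction), delay_cut(queue_delay_ms))
    return max(2.0, cwnd * (1.0 - cut))  # never go below 2 segments
```

With these assumed numbers, a flow seeing 20% marking at 1 ms of queue 
delay is cut by 10% (the ECN term governs), while a flow seeing 1% 
marking behind a 6 ms standing queue is cut by the 50% cap (the delay 
term governs), which is the Classic-like outcome described above.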

> 3. This text from Section 4.3 should be strengthened to require that
> sending L4S traffic in uncontrolled environments does not happen when
> classic marking of L4S traffic is detected for a shared queue, and at
> least recommend that it not happen even for fq:
>
>     o  In uncontrolled environments, monitoring MUST be implemented to
>        support detection of problems with an ECN-capable AQM at the path
>        bottleneck that appears not to support L4S and might be in a
>        shared queue.  Such monitoring SHOULD be applied to live traffic
>        that is using Scalable congestion control.  Alternatively,
>        monitoring need not be applied to live traffic, if monitoring has
>        been arranged to cover the paths that live traffic takes through
>        uncontrolled environments.
>
> The current text requires monitoring, but only gives a single SHOULD for
> live traffic, plus non-normatively permits one alternative.  This allows
> operators to monitor but not cut off (so this requirement as currently
> written would be satisfied by write-only logging for instance, with the
> SHOULD easily dismissable with an "implementation complexity" hand-wave
> while still following the spec).
>
> Suggested alternative, feel free to edit:
>
>     o  In uncontrolled environments, L4S traffic MUST NOT be sent without
>        monitoring to detect marking of L4S traffic by non-L4S bottlenecks.
>        This monitoring can for example be performed on live traffic, or
>        can rely on monitoring that covers the paths live traffic takes
>        through uncontrolled environments.  Where non-L4S bottlenecks are
>        observed marking L4S traffic, L4S sending MUST be disabled if the
>        bottleneck is a shared queue, and SHOULD be disabled if it is FQ.

[BB] Aside: Your middle sentence has (intentionally?) lost the 'SHOULD' 
preference for monitoring live, rather than out of band.

I think the next para in the draft (not given in your quote) with its 
mandatory final sentence is stronger than your alternative.

       The detection function SHOULD be capable of making the congestion
       control adapt its ECN-marking response to coexist safely with
       Classic congestion controls such as standard Reno [RFC5681], as
       required by [RFC5033].  Alternatively, if adaptation is not
       implemented and problems with such an AQM are detected, the
       scalable congestion control MUST be replaced by a Classic
       congestion control.

The differences are:
* Replacement is a 'MUST' not a 'SHOULD', but conditional on 'problems'
* It doesn't distinguish between shared queue and FQ.
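
Read as a decision rule, the quoted paragraph might be sketched as 
follows. This is only an illustration: the names `Action` and 
`fallback_action` are hypothetical, and how 'problems' are actually 
detected is deliberately left outside the sketch. Note that the queue 
type (shared or FQ) is not an input.

```python
from enum import Enum


class Action(Enum):
    KEEP_SCALABLE = "keep scalable congestion control"
    ADAPT_RESPONSE = "adapt ECN-marking response to coexist with Classic"
    SWITCH_TO_CLASSIC = "replace with a Classic congestion control"


def fallback_action(problems_detected: bool, can_adapt: bool) -> Action:
    """Choose the sender's behaviour from the monitoring outcome alone."""
    if not problems_detected:
        # No problems with a non-L4S ECN AQM were observed on this path.
        return Action.KEEP_SCALABLE
    if can_adapt:
        # The SHOULD in the draft text: adapt the response if implemented.
        return Action.ADAPT_RESPONSE
    # The MUST in the draft text: without adaptation, fall back to Classic.
    return Action.SWITCH_TO_CLASSIC
```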

We didn't think it was reasonable to give the sender a requirement that 
depends on what type of queue it encounters, because the sender doesn't 
know the type for sure. That's why we made this conditional on 
'problems' instead. The last sentence of this passage from the 
referenced Appendix A.1.5 explains why:

    CDN servers placed within an access ISP's
    network can be considered as a single controlled environment, but any
    onward networks served by the access network, including all the
    attached customer networks, would be unlikely to fall under the same
    degree of coordinated control.  Monitoring is expressed as a 'MUST'
    for these uncontrolled segments of paths (e.g.  beyond the access ISP
    in a home network), because there is a possibility that there might
    be a shared queue Classic ECN AQM in that segment.  Nonetheless, the
    intent is to only require occasional monitoring of these uncontrolled
    regions, and not to burden CDN operators if monitoring never uncovers
    any potential problems, given it is anyway in the CDN's own interests
    not to degrade the service of its own customers.


In summary I think your suggestion of conditioning on the type of queue 
is likely to be infeasible and therefore prone to being ignored.
This is not easy but, on balance, I don't think I've seen anything here 
that is tighter than the existing wording.


>
> 4. Although l4s-arch claims that l4s-id satisfies the RFC 4774
> requirements, it's hard to tell whether it does so.  Specifically:
>
> 4.a. From section 7 of RFC 4774:
>     Specifications of alternate ECN semantics must clearly state how they
>     address the issues raised in this document, particularly the issues
>     discussed in Section 2.
>
> I don't see how issues 2 and 3 from section 2 are covered in l4s-id.
>  From section 4 of RFC 4774:
>     (2) How does the possible presence of old routers affect the
>         performance of the alternate-ECN connections?
>
>     (3) How does the possible presence of old routers affect the
>         coexistence of the alternate-ECN traffic with competing traffic
>         on the path?
>
>     When alternate semantics are defined for the ECN field, it is
>     necessary to ensure that there are no problems caused by old routers
>     along the path that don't understand the alternate ECN semantics.
>
> 4.b. Section 4 goes on to describe 3 options for how alternate ECN
> semantics should be treated.  I don't see a claim in the L4S docs
> specifying which of the 3 options the L4S spec for alternate ECN semantics
> matches, but it implicitly appears to be trying to say it's option 3
> (unsafe) and asserting that the detection and adaptive response satisfies
> the requirement for isolation on option 3, I think?
>
> Maybe there are sections in l4s-id that intend to cover these points,
> and there just needs to be text listing what they are, but I don't think
> the link is obvious enough to satisfy the "clearly state" requirement
> from RFC 4774.  So it would be very helpful to add a list of references
> to sections that are intended to address these RFC 4774 requirements.

[BB] After guidance from the chairs, the co-authors have been working 
offlist on a fairly lengthy new subsection in ecn-l4s-id about where the 
RFC4774 requirements are and are not satisfied, and how that is to be 
dealt with. Rather than pasting it all here (it's long) you should see 
it later today once the secretariat allow the draft through (or 
otherwise when the servers re-open in the morning). And I'm sure the 
chairs will give everyone time to read it over, and review it.


>
>
> Nits:
> - the list in section 7.1 has a weird formatting problem for the sub-
>    list:
>
>     o  Did use of L4S over the Internet result in improvements to the
>        following metrics:
>
>     o
>
>        *  queue delay (mean and 99th percentile) under various loads;

[BB] Thx - (had already noticed this one myself as well).

Continued...

>
> ----
>
> draft-ietf-tsvwg-l4s-arch-10:
> Summary: Almost ready
>
> +1 to Gorry's comments here, especially regarding the use of "all traffic":
> https://mailarchive.ietf.org/arch/msg/tsvwg/vMMsQpXs65lk1E7NpV5RlmpyqdI/

[BB] Accepted - see response to Gorry/Alex.

>
> Minor:
> 1. l4s-arch section 1:
>     It has been demonstrated that, once access network bit rates reach
>     levels now common in the developed world, increasing capacity offers
>     diminishing returns if latency (delay) is not addressed.
> - This needs a reference.

[BB] I've added:

   [Dukkipati06]
               Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is
               the Right Metric for Congestion Control", ACM CCR
               36(1):59--62, January 2006,
               <https://dl.acm.org/doi/10.1145/1111322.1111336>.
   [Rajiullah15]
               Rajiullah, M., "Towards a Low Latency Internet:
               Understanding and Solutions", Masters Thesis; Karlstad
               Uni, Dept of Maths & CS 2015:41, 2015,
               <https://www.diva-portal.org/smash/get/diva2:846109/FULLTEXT01.pdf>.


> Editorial:
> 1. l4s-arch 4.2 section a:
> - This is a confusing wall of text.  I think it would be better to
>    give a much briefer summary here with a reference.  Exposition
>    like "the obvious part" and "the less obvious part" are a minus
>    here--I don't think the obviousness claims made here generalize
>    well.

[BB] When it only referenced the DualQ draft and only gave a short 
summary, we were asked to summarize more fully how the DualQ works here 
(sigh!).

So we've kept the explanation here, but de-confused and de-walled it a bit:
* Removed the 'obvious/non-obvious' wording.
* Within the largest block: Introduced sub-bullets for the two parts of 
the semi-permeable membrane, starting "Latency isolation:" and 
"Bandwidth pooling:" respectively. Then continued the part about the 
scheduler with 2 further bullets, again starting with "for latency 
isolation" and "for bandwidth pooling" respectively.
* Cut out some redundancy.
* I considered shifting out the last para of the DualQ bullet, which is 
about the DualQ document, but I decided there was no better place for it.

It's not brilliant, but I think it's better.

>
> 2. All the uses of underlining for emphasis are a minus.  The
>    places where it seems necessary or useful are a good hint that
>    the text on its own is not adequately capturing the intended
>    meaning.
>    Leaning on this kind of toned emphasis makes assumptions about
>    connotations that don't hold very well even for native English
>    speakers and break down entirely for non-native speakers, so they
>    are generally out of place in a technical document that will need
>    to be correctly interpreted by an international audience with many
>    non-native speakers, IMO.

[BB] I agree, use of emphasis is excessive. I've taken out /some/ but 
not /all/ ;)

The ones left are:
* "on /average/" (because the comparison with the other P99 is 3 lines 
of dense numbers away)
* "without the /need/ for per-flow operations" (to help the reader not 
to just run over the word without thinking)
* "without /requiring/ inspection" (---ditto---)

Hopefully, this is now down to a non-irritating level for those who 
don't like it, and now conforms to the guidance from the only source I 
could quickly find on international technical writing style:

    "Do not use italics for

      * mere emphasis. (Italics are acceptable if emphasis might
        otherwise be lost; in general, however, use syntax to provide
        emphasis.)"

American Psychological Association (APA, 2009, pp. 104-106)
via 
https://writing.stackexchange.com/questions/10750/when-should-i-use-italics-in-scientific-writing

(BTW, underscores in ASCII can mean either italics or underlining. In 
the HTML rendering, these words come out in italics - which was the 
intention)

>
> ----
>
> I'm going to be too pressed for time to do a more detailed review,
> but I wanted to get the above comments in.
>
> As a final aside, I'd like to see this happen.  The only goal I'm
> pursuing at this point wrt this work is avoiding preventable harm
> from pushing this out in a way that's likely to cause confusion
> when and if problems are encountered in production.  (In particular,
> I have given up on efforts to improve the signaling design, since
> the authors have rejected all such suggestions and this work needs
> to get moved out of the wg one way or another.)

[BB] On signalling design, I'd like to point out that it's not just the 
authors - the chairs have judged that the WG wants to go this way.

Whatever, thank you - for your reviews, and for your continuing 
perseverance and diligence.

Cheers

Bob

>
> Best,
> Jake
>
>
> On 07-29, 9:18 AM, "Wesley Eddy" <wes@mti-systems.com> wrote:
>
> This message is starting a combined working group last call on 3 of the
> L4S drafts:
>
> - Architecture: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4s-arch/
>
> - DualQ:
> https://datatracker.ietf.org/doc/draft-ietf-tsvwg-aqm-dualq-coupled/
>
> - ECN ID: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-ecn-l4s-id/
>
> The WGLC will last through 4 weeks from today, and then we'll see what
> to do next.  Please submit any comments you have on these to the TSVWG
> list in that timeframe.
>
> The chairs are considering a possible virtual interim following the
> close in order to work through feedback received.
>
> The work on the L4S operational guidance draft is continuing in
> parallel, but that draft is not being last called yet.
>
>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/