Re: [tsvwg] Review of draft-ietf-tsvwg-nqb-21

Re: [tsvwg] Review of draft-ietf-tsvwg-nqb-21 - Nits

Greg White <g.white@CableLabs.com> Sun, 17 December 2023 01:07 UTC

From: Greg White <g.white@CableLabs.com>
To: Bob Briscoe <ietf@bobbriscoe.net>, Thomas Fossati <Thomas.Fossati@linaro.org>, Ruediger GEIB <Ruediger.Geib@telekom.de>
CC: tsvwg IETF list <tsvwg@ietf.org>
Thread-Topic: Review of draft-ietf-tsvwg-nqb-21 - Nits
Date: Sun, 17 Dec 2023 01:07:30 +0000
Message-ID: <E8AB6C25-76EB-4F89-B8CC-DE0FB8B6B688@CableLabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/EWVFxMBzc9RT_-EGEsrxLjnkl_k>
Subject: Re: [tsvwg] Review of draft-ietf-tsvwg-nqb-21 - Nits

Hi Bob,

I’ve made a series of edits to tackle the “Nits” first.  Also please see a few responses below, marked [GW].  I adopted all suggested changes unless marked otherwise here.

-Greg

From: Bob Briscoe <ietf@bobbriscoe.net>
Date: Tuesday, December 12, 2023 at 11:18 AM
To: Greg White <g.white@CableLabs.com>, Thomas Fossati <Thomas.Fossati@linaro.org>, Ruediger GEIB <Ruediger.Geib@telekom.de>
Cc: tsvwg IETF list <tsvwg@ietf.org>
Subject: Review of draft-ietf-tsvwg-nqb-21

Greg, Thomas, Rüdiger,

Here's my review of draft-ietf-tsvwg-nqb-21, which I understand is due to go to WGLC shortly.
I haven't read the draft through in one sitting for some time. So, I'm afraid my review is long, but I've tried to suggest text for all the nits, which are by far the largest contributor to the length.

________________________________

[snip]
________________________________
Nits

Throughout:
§1 Intro "...managed by an end-to-end congestion control algorithm. Many of the commonly-deployed congestion control algorithms, such as Reno, Cubic or BBR, are designed to seek the available capacity of the end-to-end path..."
§3.2 "...it has not been used for these purposes end-to-end across the Internet."
§3.2 "...meeting the performance requirements of an application in an end-to-end context "
§3.2 "These mechanisms can be difficult or impossible to implement in an end-to-end context."
§3.2 "...the NQB PHB ... could conceivably be deployed end-to-end across the Internet."
§3.2 "...the performance requirements of applications cannot be assured end-to-end,"
§4.4: "End-to-end usage and DSCP re-marking"
§4.4 "...this PHB is expected to be used end-to-end across the Internet,"
§4.4 "...To ensure reliable end-to-end NQB PHB treatment,"
Appx A. "...it will severely limit the ability to provide NQB treatment end-to-end."

In the IETF transport area, "end-to-end" normally means 'between the end hosts without network involvement'. Perhaps 'whole-path' or other alternatives could be used, except in the very first case above?

[GW] In the context of DSCP PHBs, the term has a defined meaning. Per RFC2475:

   Use of the term "end-to-end" in a PHB definition should be interpreted to mean "host-to-host" for consistency.
[GW] That said, rather than explaining that here (or assuming the reader has read and remembered that detail from RFC2475), it seems that I can eliminate the term or replace it with another relatively easily.

s/this draft/this document/ (2 occurrences)
Abstract
"properties and characteristics" just one or the other would be sufficient, wouldn't it?
§1. Intro
"microflows (see [RFC2475] for the definition of a microflow)"
Summarize cross-reference: Surely it would be worth briefly restating this definition here, e.g. "microflows (application-to-application flows [RFC2475])."
I went to RFC2475 to check whether it was somehow different to the well-understood definition of microflow.

"...managed by a classic congestion control algorithm (as defined in [RFC9330]),"
Summarize cross-reference: As this is not a term that Diffserv engineers are likely to have come across, surely the referenced definition ought to be summarized here, e.g. "(one that coexists with standard Reno congestion control [RFC5681])"

"...to effectively use the link..."
I tripped up on this, initially assuming the alternative meaning of effectively (as in "it's effectively worthless"). How about:
    "...to use the link effectively..."
    "...to use the link efficiently..."

"Active Queue Management (AQM) mechanisms (such as PIE [RFC8033], DOCSIS-PIE [RFC8034], or CoDel [RFC8289])..."
It would be better to say specifically that you are talking about single queue AQMs here:
"Active Queue Management (AQM) mechanisms intended for single queues (such as PIE [RFC8033], DOCSIS-PIE [RFC8034], or PI2 [RFC9332])..."
and I think it would be preferable to leave mention of CoDel until FQ-CoDel later in the para - it would be controversial to imply that CoDel manages a single queue well (potentially with a large number of flows).

[GW] I added “intended for single queues” and reference to PI2 as suggested, but left CoDel in, since the sentence mentions “can improve QoE, but there are practical limits”, which I think is true for CoDel.  If CoDel performance is controversial, leaving it out of the list relieves it not only of the praise but also the criticism.

"If the AQM attempted to control the queue much more tightly, applications using those algorithms would not perform well."
Unnecessarily vague. How about "...would not fully utilize the link"?

"but these [FQ] are not appropriate for all bottleneck links, due to complexity or other reasons."
Rather than writing as if the IETF is pronouncing on this, how about
"but not all operators think they are appropriate for all bottleneck links, due to complexity or other reasons."
§3. "Context"
You wouldn't normally expect an introductory section headed 'Context' to contain normative requirements. However, a couple of 'SHOULD's appear in the last para of §3.3. It might be better to shift that whole last para into the relevant requirements section (e.g. as a new subsection after §§4.2 & 4.3 which are also about mixtures of codepoints). Then state at the beginning of §3 that the whole section is informative only. However, perhaps I'm being too purist.

[GW] Point taken.  Also the first paragraph in Sec. 3.3 discusses requirements on PHB implementations (though doesn’t introduce any new ones directly).  That said, I’m going to leave this one as is for now. I think the discussion of the relationship to L4S fits well in the context section, and I don’t think the 3.3 requirements fit well in section 4 (or section 5 for the first paragraph items).  The first sentence of 3.3 states that NQB is defined independently of L4S (which I think is the right way to handle it), and moving those requirements into 4 (or 5) would start to remove that independence. If others agree with you that this text needs to move, I’ll reconsider.

§3.1. Non-Queue-Building Behavior
CURRENT:
"highly unlikely to exceed the available capacity of the network path between source and sink."
PROPOSED:
Add "...even at an inter-packet timescale." or similar wording.
REASON: people usually think of data rate as an averaged measure.
§3.2. Relationship to the Diffserv Architecture
CURRENT:
"and given no reserved bandwidth other than the bandwidth that it shares with Default traffic."
PROPOSED:
"and given no reserved bandwidth other than any minimum bandwidth that it shares with Default traffic."
REASON:
The current wording implies that all operators always give Default some reserved bandwidth.

CURRENT:
"Instead, the goal of the NQB PHB is to provide statistically better loss, latency, and jitter performance for traffic that is itself only an insignificant contributor to those degradations."
PROPOSED:
"Instead, the sole goal of the NQB PHB is to isolate NQB traffic from other traffic that degrades loss, latency and jitter, given that the NQB traffic is itself only an insignificant contributor to those degradations."
REASON:
The current wording implies that the PHB provides the better performance, which contradicts the statement in the introduction that it is NQB senders that provide the better performance, not the PHB. It would be worth repeating that here.
(see also similar comment later about the first para of §5.1 "Primary Requirements")

CURRENT:
"...relatively low data rates"
PROPOSED:
"...relatively low and smooth data rates"

CURRENT:
"The main distinctions between NQB and EF are discussed in Appendix B."
Summarize cross-reference: It would be useful to give a summary sentence here {Note 2 was my first attempt, but it's too long}.
[GW] I’m struggling to come up with a useful single sentence summary of Appendix B. Given that in this case the cross reference is in the same document, it doesn’t seem to me to be overly burdensome to ask the reader to jump there to find that material.
§4.1. Non-Queue-Building Sender Requirements
CURRENT:
"Microflows that align with the description of behavior in the preceding paragraphs in this section SHOULD be identified to the network using a Diffserv Code Point (DSCP) of 45 (decimal) so that their packets can be queued separately from QB microflows."
PROPOSED:
"Microflows that mark their packets using a Diffserv Code Point (DSCP) of 45 (decimal) SHOULD align with the description of behavior in the preceding paragraphs in this section, so that their packets can be queued separately from QB microflows with minimal harm to other NQB traffic."
REASONING:
The current wording is the wrong way round. It shouldn't recommend that all traffic that behaves like NQB has to be marked as NQB. Otherwise it would be saying that most EF, CS5, etc. traffic SHOULD be marked as NQB instead.

[GW] The intent of this sentence was to say two things: 1) decimal 45 SHOULD be used as the NQB DSCP, and 2) the NQB DSCP should only be used on microflows that align with “the description”.  So, I think breaking those two concepts into separate sentences would achieve the goal and address your issue.  I’ve got:  “Microflows that are marked with the NQB DSCP SHOULD align with the description of behavior in the preceding paragraphs in this section. Applications are RECOMMENDED to use the Diffserv Code Point (DSCP) 45 (decimal) to mark microflows as NQB.”
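For illustration only, a minimal sketch of how an application might apply the proposed marking, assuming Python's standard `socket` module on a platform that exposes `IP_TOS` (e.g. Linux). The helper names `dscp_to_tos` and `mark_socket_nqb` are hypothetical. The DSCP occupies the upper six bits of the former IPv4 TOS byte, so DSCP 45 corresponds to a byte value of 45 << 2 = 180 (0xB4), with the two ECN bits left at zero. Marking alone does not make a microflow NQB; the sender still has to behave as described.

```python
import socket

NQB_DSCP = 45  # decimal NQB DSCP, as in the proposed text above


def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Place a 6-bit DSCP in the upper bits of the TOS/Traffic Class byte."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn < 4:
        raise ValueError("ECN field must fit in 2 bits")
    return (dscp << 2) | ecn


def mark_socket_nqb(sock: socket.socket) -> None:
    """Hypothetical helper: mark IPv4 packets sent on sock with the NQB DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(NQB_DSCP))
```

Usage would be along the lines of `mark_socket_nqb(socket.socket(socket.AF_INET, socket.SOCK_DGRAM))`. For IPv6 sockets the analogous option is `IPV6_TCLASS`, where the platform exposes it.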
§4.2. Aggregation...
I'd prefer to see the last para shifted to after the first. These are the two paras with normative requirements in them, then the others are sort-of mitigations and exceptions. Also, the last para highlights the difference between treating NQB traffic as if it's Default, and re-marking it to be Default, which is the big important point here.

However, I can also see that the 3 paras in the middle at the moment relate more to the first para. So if the authors think the logical flow would be better as it is, I won't fight for this.
Retitle §4.2 & §4.3 (Aggregation of the NQB DSCP into another PHB; and Aggregation of other DSCPs into the NQB PHB)?
The (unspoken) distinction between these two sections seems to be more that:

  *   §4.2 is about typically uncongested core networks, where separation from Default (or another similar PHB) might not be necessary,
  *   §4.3 is really about where a PHB isolated from Default has been provided, and could be used for an aggregate of classes that would all benefit from such isolation.
The current titles focus on what *name* a PHB started with before it was aggregated, which is a bit academic, because aggregates don't necessarily bear the name of any of the classes they consist of (e.g. the Elastic aggregate).
[GW] Actually, the distinction is really what the *PHB* is, as opposed to what the *aggregate* is. (Note, PHBs aren’t aggregated, service classes are). So, (e.g.) in a node that supports the Default and EF PHBs (say a default queue and a strict high priority queue) these sections would recommend that NQB be aggregated with Default.  On the other hand, in a node that supports the NQB PHB (a non-prioritized queue that shares capacity with the default PHB) it could be OK to classify EF traffic into the NQB queue.  So, I think the current titles are appropriate (though I’ll make the small edit s/in/into/).
Perhaps:
§4.2. Aggregation of the NQB DSCP without isolation from Default traffic
§4.3. Aggregation of the NQB DSCP preserving isolation from Default traffic
Strictly, §4.2 also discusses aggregation with real-time, instead of Default, but I've assumed that such detail doesn't need to be explained in the section heading.
§4.3. Aggregation of other DSCPs in the NQB PHB
If you prefer not to change the section headings as above, pls consider...
s/in/into/
because my parser tripped up on 'in' (and 'into' is consistent with the §4.2 heading).
§4.4.1.
s/occuring/occurring/
§4.5. The NQB DSCP and Tunnels
"reordering-sensitive tunnel protocol"
Summarize cross-reference: An example of one would be useful, or an explanation of the implications. §4.1 of RFC2983 gives examples, but we shouldn't have to read references to get at least a grasp of what this draft is talking about.
§4.5. The NQB DSCP and Tunnels
"In the case of the pipe model, any DSCP manipulation (re-marking) of the outer header by intermediate nodes would be discarded at tunnel egress, potentially improving the possibility of achieving NQB treatment in subsequent nodes."
The contrary could equally apply. If the DSCP re-marking of the outer was part of an interconnection contract, it could well have been designed to preserve the NQB treatment in downstream domains.
[GW] Ok, changed it to “…. In some cases, this could improve the possibility of achieving NQB treatment in subsequent nodes, but in other cases it could degrade that possibility (e.g. if the re-marking was designed specifically to preserve NQB treatment in downstream domains).”
§5. NQB PHB Requirements
This section is built on the goal of incentive alignment. In the IETF, there ought to be consensus on incentive alignment as a goal, but I have detected that some IETF participants dismiss incentive alignment if a network service does not *also* protect against malicious attack (or accidents). So perhaps the following would be useful introductory text...
"Incentive alignment ensures a system is robust to the behaviour of the large majority of individuals and organizations who can be expected to act in their own interests (including application developers and service providers who act in the interests of their users). Malicious behaviour is not necessarily based on rational self-interest, so incentive alignment is not a sufficient defence, but the large majority of users do not act out of malice. Protection against malicious attacks (and accidents) is addressed in Section 5.2 on Traffic Protection, and summarized in Section 11 on Security Considerations."
This could replace the parenthesis later in the first para:
    "(this is discussed further in this section and Section 11)"
which read oddly to me, because it doesn't explain why the material is split between this section and Section 11 in the first place.
§5.1. Primary Requirements
"...the NQB PHB makes no guarantees ..., but instead aims to provide an upper-bound to queuing delay for as many such marked microflows as it can."
This contradicts the premise that the network does not provide the low delay - the low queueing delay is provided by the collective action of NQB senders, and the PHB only isolates them from QB traffic. Just the mention of an upper-bound gives the wrong impression that a max queuing delay number can be calculated. And 'for as many such marked microflows as it can' is also strange, given that delay variation tends to reduce for links designed for more flows (but admittedly not when more flows are used in a link than it is designed for).
Indeed, I'm not sure that this is the right place for any of the first para. It does not define a requirement. The first part about making no guarantees could be moved to the introduction. And perhaps the second part after the comma can just be dropped?
§5.1. Primary Requirements
CURRENT:
"An exception to this recommendation is discussed in Section 4.4.1."
PROPOSED:
"An exception to this recommendation for traffic sent towards a non-DS-capable domain is discussed in Section 4.4.1."
REASON: Summarize cross-reference: A short description helps the reader more than just a bare section number.
§5.1. Primary Requirements
CURRENT:
"e.g., a deficit round-robin scheduler with equal weights"
" (e.g. with equal DRR scheduling)"
I couldn't tell whether DRR was the main focus of these examples, or the fact that the weights are equal.

For equal preference, the weights of a DRR scheduler have to be proportionate to the aggregate rates of each class of traffic. That's hard to determine with capacity-seeking traffic, but NQB is not capacity-seeking:

  *   Each NQB application is meant to limit its instantaneous rate to within a small proportion of typical total capacity - the draft suggests 1%. So giving NQB 50% forwarding preference effectively gives it much more preference than it needs. Certainly, we need to allow for the possibility of multiple NQB flows, therefore multiples of 1%. And certainly, it would be reasonable to give NQB somewhere close to the maximum proportion of capacity it might need, or a little more, so that its queuing delay remains low relative to Default traffic.

     *   For example, if there's 6Mb/s of unresponsive NQB traffic scheduled by 50:50 DRR into a 100Mb/s link, and the balance is consumed by 10 capacity-seeking QB flows, that is probably fine. However, a 45Mb/s unresponsive but smooth NQB flow could also take advantage of a 50:50 scheduler. Then the 10 flows would share the balance, getting 5.5Mb/s each.
     *   This inequality isn't a problem per se, but it is problematic to hold up 50% scheduler weight as somehow a golden fraction.
     *   It would be more justifiable to say that:

        *   the scheduler weight for NQB traffic ought to be at least proportionate to the fraction of capacity that NQB is likely to use, and ideally a little higher than the highest likely fraction (to ensure low worst-case queuing delay).
        *   But then the text should admit that the fraction of capacity for NQB flows is likely to be hard to ascertain in a low statistical multiplexing (lo-stat-mux) environment, so it is instead suggested that a fraction like 20% - 50% would be a reasonable maximum scheduler weight for NQB;
        *   Then it needs to say "The exact value is unimportant as long as it's high enough," because NQB is app-limited, so if its weight is too high, the unused capacity can be borrowed by capacity-seeking traffic.
        *   But it shouldn't be excessive, otherwise it gives more leeway for greedy abuse by NQB traffic.

  *   Taking Low Latency DOCSIS as another example, it uses the DualQ [RFC9332] and doesn't comply with "The node SHOULD provide a scheduler ... that treats the two classes equally", because it gives 90% to the L queue.

     *   Admittedly:

        *   the 90% has to serve L4S as well as NQB traffic.
        *   the coupling between the DualQ AQMs ensures that L4S traffic nearly always uses less than its 90% scheduler weight, so RFC9332 recommends any high fraction, saying "its exact value is unimportant", because it merely ensures low delay for the L queue, not bandwidth shares

     *   However, when there is no L4S traffic, the full 90% is available to NQB.
     *   My point is that 90% is a good figure in this case, and 15% might be a good figure in another case (e.g. with only NQB and no L4S).
     *   But the take-home message is that 50% is not a golden number.
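To make the arithmetic in the DRR example above concrete, here is a rough sketch of idealized long-run rates under a work-conserving two-class weighted scheduler (DRR or similar). It assumes the NQB aggregate is unresponsive, smooth and application-limited, that the QB flows are capacity-seeking and share their capacity equally, and it ignores queuing delay entirely. `share_capacity` is a hypothetical helper, not something taken from the draft.

```python
def share_capacity(capacity, nqb_demand, nqb_weight, n_qb_flows):
    """Idealized long-run rates for a two-class work-conserving weighted scheduler.

    capacity   -- link rate (e.g. Mb/s)
    nqb_demand -- offered load of the unresponsive, application-limited NQB aggregate
    nqb_weight -- scheduler weight of the NQB class, as a fraction of capacity (0..1)
    n_qb_flows -- number of capacity-seeking QB flows, assumed to share equally
    Returns (nqb_rate, per_qb_flow_rate).
    """
    if not 0.0 <= nqb_weight <= 1.0:
        raise ValueError("nqb_weight must be between 0 and 1")
    if n_qb_flows < 1:
        raise ValueError("need at least one QB flow")
    # NQB gets what it asks for, up to its weighted share; any excess NQB
    # traffic would queue (or be subject to traffic protection).
    nqb_rate = min(nqb_demand, nqb_weight * capacity)
    # Work conservation: capacity left unused by NQB is borrowed by QB traffic.
    per_qb = (capacity - nqb_rate) / n_qb_flows
    return nqb_rate, per_qb


for demand, weight in [(6, 0.5), (45, 0.5), (45, 0.2)]:
    nqb, qb = share_capacity(100, demand, weight, 10)
    print(f"NQB demand {demand} Mb/s, weight {weight:.0%}: "
          f"NQB {nqb:g} Mb/s, each QB flow {qb:g} Mb/s")
```

With a 100 Mb/s link and 10 QB flows this reproduces the figures above: 6 Mb/s of NQB leaves 9.4 Mb/s per QB flow, and a 45 Mb/s NQB flow under a 50:50 weight leaves 5.5 Mb/s each. A 20% weight would instead cap that 45 Mb/s flow at 20 Mb/s (leaving 8 Mb/s per QB flow), which illustrates why the weight only needs to be high enough, not 50%.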

Wouldn't it be less distracting to use an example scheduler that doesn't share capacity explicitly, but instead acts on time? For instance:

  *   two Wireless Multimedia (WMM) Access Categories (ACs) with the same EDCA parameters.
[GW] Also, Appendix B mentioned “… effectively receives a rate guarantee of 50% ...”, I’ve made that now “… could effectively receive a rate guarantee of (e.g.) 50% …”
[GW] My handling of this one likely warrants a review to see if I’ve captured your thoughts adequately, please see new line 246 in: https://github.com/gwhiteCL/NQBdraft/commit/33dbf848036a295ec6c7886bd30a2d032be560f6
§5.2. Traffic Protection
CURRENT:
"It is possible that due to an implementation error or misconfiguration, a QB microflow"
PROPOSED:
"It is possible that, due to an implementation error or misconfiguration, a QB microflow"
(added comma)

CURRENT:
"This specification does not mandate a particular algorithm for traffic protection. This is intentional, since the specifics of traffic protection could need to be different..."
PROPOSED:
"This specification does not mandate a particular algorithm for traffic protection. This is intentional, since this will probably be an area where implementers innovate, and the specifics of traffic protection could need to be different..."
§5.3. Impact on Higher Layer Protocols
I understand that the existence of this section is a requirement from the PHB specification guidelines in RFC2475 (§3; guideline G.14). However, there are a number of problems with where this section sits in the document.

  *   It is within §5 'NQB PHB Requirements', but it contains no requirements.
  *   It is actually about the Impact of Traffic Protection on Higher Layer Protocols, but doesn't say so.
  *   It overlaps with the two paras in the previous section on Traffic Protection, starting 'In the case of', and it repeats much of the first of those two.
Suggested remedies:

  *   To generalize it from just the impact of traffic protection, it ought to open by saying:

     *   "The NQB PHB itself has no impact on higher layer protocols, because it only isolates NQB traffic from non-NQB. However, traffic protection of the PHB can have unintended side-effects on higher layer protocols."

  *   Perhaps it could be shifted to an Appendix (as suggested in RFC2475)
  *   I suggest that the two paras starting 'in the case of' in §5.2 "Traffic Protection" are given their own subsection of §5.2, perhaps titled "Potential Traffic Protection Penalties" and split into 3 paras for:

     *   reclassify
     *   re-mark
     *   discard

  *   then perhaps they could refer to the appendix about impact on higher layer protocols
[GW] The last 2 bullets relate to the 6th (last) item in your Technical Comments, so I’ll defer those until that item is addressed.
§5.3. Impact on Higher Layer Protocols
CURRENT:
"The traffic protection function described here"
Not clear which function this is referring to. Two were described (and they are separated from here by a couple of long paras).
§6. Configuration and Management
CURRENT:
"The default for such classifiers is recommended to be the assigned NQB DSCP (to identify NQB traffic) and the Default (0) DSCP (to identify QB traffic)."
SUGGESTED:
"The default classifier to distinguish NQB traffic from traffic classified as Default (DSCP 0) is recommended to be the assigned NQB DSCP (45 decimal)."
REASON:
This text, as it stood, recommended that the Default DSCP now only identifies QB traffic, whereas it ought still to be quite acceptable to use DSCP 0 for traffic that doesn't build queues.
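As a rough illustration of the suggested default, here is a classifier sketch that reads the DSCP from the IPv4 TOS / IPv6 Traffic Class byte and steers only DSCP 45 to the NQB queue. DSCP 0 (and, in this simplified sketch, every other codepoint, including EF) stays in the Default queue, without any assumption that DSCP 0 traffic is queue-building. The queue names and helper functions are hypothetical; a real node would also have its own mapping for any other PHBs it supports.

```python
NQB_DSCP = 45     # assigned NQB DSCP (decimal), per the suggested text
DEFAULT_DSCP = 0  # Default PHB


def dscp_of(tos_byte: int) -> int:
    """Extract the 6-bit DSCP from a TOS/Traffic Class byte (ECN bits dropped)."""
    return (tos_byte >> 2) & 0x3F


def classify(tos_byte: int) -> str:
    """Hypothetical two-queue classifier: NQB DSCP -> 'nqb', everything else -> 'default'.

    DSCP 0 traffic is not assumed to be queue-building; it simply shares the
    Default queue with any QB traffic.
    """
    return "nqb" if dscp_of(tos_byte) == NQB_DSCP else "default"


print(classify(0xB4), classify(0xB5), classify(0x00), classify(0xB8))
```

Masking off the ECN bits first means an NQB packet is classified the same way whether or not it carries an ECN codepoint (0xB5 is DSCP 45 with ECT(1)).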
§6.1. Guidance for Lower Rate Links
CURRENT:
"it is RECOMMENDED that the NQB PHB be disabled and for traffic marked with the NQB DSCP to thus be carried using the Default PHB."
PROPOSED:
Add: "However, the NQB DSCP SHOULD NOT {MUST NOT?} be re-marked to the Default DSCP (0)."
REASON:
To repeat and reinforce the similar requirement earlier, but for this context.
§7.1. DOCSIS Access Networks
Add a reference to the white paper on Low Latency DOCSIS at the end?
§7.2 Mobile Networks
Perhaps add a remark at the end of the 2nd para about how this relates (or not) to the primary requirements in §5.1 (non-rate-limiting and equal preference)?
§7.3.1.  Interoperability with Existing Wi-Fi Networks
CURRENT:
"...the Wi-Fi link is commonly a bottleneck link..."
PROPOSED:
"...the Wi-Fi link can become a bottleneck link..."
REASON:
As it stands, this semi-contradicts the first sentence of the DOCSIS section, which says DOCSIS operators commonly configure the access to be the bottleneck. Saying 'can become' hints that it depends how good the Wi-Fi signal path is.

CURRENT:
"Wi-Fi equipment ... will support either the NQB PHB requirement for separate queuing of NQB traffic, or..."
PROPOSED:
"Wi-Fi equipment ... will support either the NQB PHB requirement for separating queuing of NQB traffic from Default, or..."
REASON:
If, for instance, the NQB DSCP (45) puts NQB traffic into the Video access category, it won't be separate from Video traffic, only from Default.

CURRENT:
"Wi-Fi gear typically has hardware support (albeit generally not exposed for user control) for adjusting the EDCA parameters in order to meet the equal priority recommendation. This is discussed further below."
PROPOSED:
"The arrangement of queues in Wi-Fi gear is typically fixed, whereas most Wi-Fi gear supports adjustment of the EDCA parameters (albeit generally not exposed for user control) as recommended further below in order to meet the equal priority recommendation."
REASON:
When I read the text as it stood, it wasn't clear that it was motivating the choice of separate queuing. My sentence is still rather complex - perhaps it can be improved on.

CURRENT:
"A residential ISP that re-marks the Diffserv field to zero, bleaches all DSCPs and hence would not be impacted by"
PROPOSED:
"A residential ISP that re-marks the Diffserv field to zero would not be impacted by"
REASON:
Tautology.

CURRENT:
"* For application traffic that originates outside of the Wi-Fi network, and thus is transmitted by the Access Point, opportunities exist in the network components upstream of the Wi-Fi Access Point to police the usage of the NQB DSCP and potentially re-mark traffic that is considered non-compliant, as is recommended in Section 4.4.1.  A residential ISP that re-marks the Diffserv field to zero, bleaches all DSCPs and hence would not be impacted by the introduction of traffic marked as NQB. Furthermore, any change to this practice ought to be done alongside the implementation of those recommendations in the current document."

This bullet is too complex for me to understand. Maybe I need a break from reading this, but I don't understand how the last two sentences relate to the previous sentence in this bullet, or how they 'motivate the choice of separated queuing', which is what all the bullets are meant to be doing. I also don't know which of the two earlier practices 'this practice' is referring to, re-marking or bleaching?

[GW] I reworked it, please take a look and see if you find this more understandable. Line 339 of:
https://github.com/gwhiteCL/NQBdraft/commit/ebb3af93d5f28f4dc9c35740c5af0a0a9fc7b358

CURRENT:
"...ought to be done alongside the implementation of those recommendations in the current document."
PROPOSED:
"...would be efficient to implement at the same time as the recommendations in the current document."
(if this wording survives my previous comment)
§8. Acknowledgements
The RFC style guide recommends that acknowledgements appear at the end, before Contributors & Authors:
https://www.rfc-editor.org/rfc/rfc7322.html#section-4
§11. Security Considerations
The first para, which is about incentive compatibility, might be better covered fully in one place (at the head of §5), using the material from here that explains exactly what causes the degradations. Then the Security Considerations need only explain that NQB is based on incentive alignment (§5), which makes it robust to self-interested actors, but that traffic protection against malicious actors is also recommended (§5.2).

CURRENT:
"While the NQB DSCP value could be abused to gain priority on such links,"
Before making the point that the NQB DSCP would be the least likely to be abused, the blindingly obvious should be pointed out: existing WMM Wi-Fi APs already allow DSCP 45, and all the other DSCPs in the same half of the space, to gain priority today, whether or not 45 is assigned to NQB. That point is more relevant for upstream traffic, whereas the 'least worst' point is more relevant for downstream.

s/than any of the other 31 DSCP values that are provided priority/
 /than any of the other 31 DSCP values that are given priority/
REASON: my parser tripped over this.

CURRENT:
"The details of any security considerations that relate to deployment and operation of NQB in these network technologies are not discussed here."
PROPOSED:
"Any security considerations that relate to deployment and operation of NQB solely in specific network technologies are not discussed here."
REASON:
I think that's what you meant, and it justifies itself better.

CURRENT:
"While re-marking DSCPs is permitted for various reasons..., if done maliciously, this might negatively affect the QoS of the tampered microflow."
PROPOSED:
Add: "Nonetheless, an on-path attacker can also alter other mutable fields in the IP header (e.g. the TTL), which can wreak much more havoc than just altering QoS treatment."
Appendix A. DSCP Re-marking Policies
s/the result would be that traffic marked with the NQB DSCP would/
 /it would/

CURRENT:
"This could be another motivation to (as discussed in Section 4.3) classify CS5-marked traffic into NQB queue."
PROPOSED:
"This could be another motivation to classify CS5-marked traffic into the NQB queue (as discussed in Section 4.3)."
(note the addition of the missing 'the', as well as the shift of the parenthetical).
Appendix B. Comparison to Expedited Forwarding
s/Comparison to/Comparison with/
Subtle, but to my ear this sounded wrong. I didn't find the language guides on the web very useful, so I'll leave you to pick whichever you feel is right.

CURRENT:
"While EF relies on rate policing and dropping of excess traffic, this is only one option for NQB. NQB alternatively recommends that the implementation re-mark and forward excess traffic using the Default PHB, rather than dropping it."
PROPOSED:
"While EF relies on rate policing and dropping of excess traffic at the domain border, this is only one option for NQB. NQB alternatively recommends traffic protection located at each potential bottleneck, where actual queuing can be detected and excess traffic can be reclassified into the Default PHB rather than dropped. Local traffic protection is more feasible for NQB, given the focus is on access networks, where one node is typically designed to be the known bottleneck where traffic control functions all reside. In contrast, EF is presumed to follow the Diffserv architecture [RFC2475] for core networks, where traffic conditioning is delegated to border nodes, in order to simplify high capacity interior nodes."
REASON:
The comparison seems to have omitted discussion of traffic conditioning topology (see earlier point about placement).

Also see my attempt to summarize how NQB compares with EF in {Note 2}.
Appendix C. Alternate Diffserv Code Points
CURRENT:
"In networks where another ... DSCP is designated for NQB traffic, or ... it could be preferred to use another DSCP."
Tautology.

I think a paragraph break is appropriate after this, before 'In end systems...'. Or is there meant to be some logical flow between them?
Reason: One part is about 'In networks'. The other is about 'In end systems'.

BTW, to make the section heading an easy read for all English speakers, s/alternate/alternative/, because in British English the adjective 'alternate' solely means 'interchanging', whereas 'alternative' means what you intended in all English variants. Nonetheless, most Brits would work out what you meant.

________________________________
Notes
{Note 1}:
The Problem with Conditioning Traffic Remotely for Low Stat-Mux Bottlenecks

The following explanation is fairly long, because I have had to spell out assumptions behind different ways of thinking. So apologies if some of it seems patronising...

When there is low statistical multiplexing (stat-mux) at a bottleneck it becomes very inefficient (verging on impossible) to locate traffic conditioning (aka. traffic protection) at multiple ingress points remote from the PHB (the potential bottleneck).

For instance, let's start with a simple toy case in the downstream only, where all NQB flows (e.g. online game sync streams) have the same regular packet bunching, so we can know that (say) 12 of these flows in one buffer would cause too much queuing. Then if NQB traffic is being conditioned remotely at 3 ingress points, how many flows at each ingress would be too much? Not 12. Maybe 5? Or probably 4 to be certain not to cause excessive queuing at the bottleneck.

But it would not be so unusual if the set of clients in a home happen to call for most of their NQB traffic via just one of the ingress points at peak time on one day, and most via another on the next day (it's not unusual for the interests of people living together to shift around together, because some families still even talk to each other sometimes ;). But remote traffic conditioning has to prevent them exceeding 4 flows via any single ingress, even if they aren't pulling any traffic from the other ingress points.

If 5 flows are passed through an aggregate traffic conditioning limit of 4 flows, usually it will ruin them all. So, to ensure that more than 12 flows don't ruin this family's service, their service is often ruined whenever they call for more than 4 flows. Ironic, but that is a consequence of remote traffic conditioning at 3 ingresses for a low stat-mux bottleneck.

With traffic protection based on /actual/ queuing (located at the PHB itself), traffic would only be limited when 12 flows coincided, and then only when their bursts coincided.
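The toy example above can be sketched in a few lines of Python. This is a hypothetical model for illustration, not taken from the draft: it reads the bottleneck as carrying up to 12 flows before queuing becomes excessive, gives each of 3 remote conditioners a static cap of 4, assumes (as the note does) that exceeding a cap ruins every flow through that ingress, and assumes the local alternative is per-flow protection that sheds only the flows beyond the bottleneck's capacity.

```python
# Hypothetical toy model of the {Note 1} example: 3 ingress points feed one
# low stat-mux bottleneck that can carry up to 12 NQB flows before queuing
# becomes excessive.
BOTTLENECK_FLOWS = 12                              # assumed NQB capacity at the PHB, in flows
N_INGRESS = 3
PER_INGRESS_CAP = BOTTLENECK_FLOWS // N_INGRESS    # cautious static share: 4


def remote_conditioning_affected(demand):
    """Flows spoiled when each ingress enforces its own static cap.

    Assumption (from the note): if an ingress carries more flows than its
    cap, conditioning the aggregate usually ruins every flow through it.
    """
    return sum(d for d in demand if d > PER_INGRESS_CAP)


def local_protection_affected(demand):
    """Flows limited when protection sits at the bottleneck itself.

    Assumption: per-flow protection that sheds only the flows in excess of
    the bottleneck's capacity, i.e. only when they actually coincide there.
    """
    return max(0, sum(demand) - BOTTLENECK_FLOWS)


# NQB flows requested via each of the 3 ingresses (hypothetical demand patterns).
for demand in ([5, 0, 0], [6, 1, 0], [4, 4, 4], [7, 7, 0]):
    print(demand, remote_conditioning_affected(demand), local_protection_affected(demand))
```

With demands of [5, 0, 0] or [6, 1, 0], the remote caps spoil 5 or 6 flows even though the bottleneck never sees more than 12, whereas local protection limits nothing. Only [7, 7, 0], which really does exceed the bottleneck, triggers local protection, and then for just 2 flows.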

Real traffic is not as regular as these toy example flows, so remote traffic conditioning is even less efficient. Short NQB flows would be mixed with medium and long ones, each with different regularity and different smoothness. So each traffic conditioner has to err on the side of caution in case bunches of packets from the other conditioners coincide at the bottleneck. So, 3 conditioners have to limit their /burst/ allowances to 1/3 of the available capacity for NQB at the PHB. And in low stat-mux, even for NQB, average rate will be many times less than the allowance needed for bursts.
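The burst-allowance point can be illustrated with a standard token bucket. The rate, burst size and arrival pattern below are hypothetical; the sketch only assumes that each of the 3 remote conditioners gets 1/3 of the bottleneck's rate and burst allowance, while a single conditioner at the PHB gets all of it. In line with NQB's recommendation, non-conforming packets would be reclassified to Default rather than dropped; the sketch just counts which packets conform.

```python
class TokenBucket:
    """Token bucket meter: refills at `rate` tokens/s, holds at most `depth` tokens."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth     # start full: the whole burst allowance is available
        self.last = 0.0         # time of the previous arrival, in seconds

    def conforms(self, t):
        """Return True if a packet arriving at time t conforms, consuming one token."""
        self.tokens = min(self.depth, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False            # excess: would be reclassified to Default, not dropped


RATE = 100.0    # hypothetical NQB capacity at the bottleneck, packets/s
BURST = 30      # hypothetical burst the bottleneck tolerates, packets
N_INGRESS = 3

# A 20-packet burst, 1 ms apart, all arriving via a single ingress.
arrivals = [i * 0.001 for i in range(20)]

remote = TokenBucket(RATE / N_INGRESS, BURST / N_INGRESS)  # one of 3 remote conditioners
local = TokenBucket(RATE, BURST)                           # single conditioner at the PHB

remote_passed = sum(remote.conforms(t) for t in arrivals)
local_passed = sum(local.conforms(t) for t in arrivals)
print(remote_passed, local_passed)
```

Here a 20-packet burst through one ingress is within what the bottleneck itself could absorb (30 packets), yet the remote conditioner treats only 10 of the packets as conforming, whereas the local one passes all 20.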

In contrast, the RFC2475 Diffserv architecture (with traffic conditioning around a domain border protecting the PHBs on interior routers) is applicable to hi-stat-mux networks, e.g. enterprise networks attached to large core networks. Then, although the amount and balance of traffic from different ingress points (the traffic matrix) varies, the variation is of the same order as the average, not many multiples of the average. For instance, let's multiply up the above toy example by 1,000. If 12,000 flows at a link come from 3 ingress points, with on average 4,000 each, it is highly unlikely that at peak time on one day 11,000 will all come from one ingress, and the next day 11,000 will all come from another. Instead, the range of variation at each of the 3 ingresses might be 3,000 to 5,000 flows. Also as stat-mux increases, bursts and troughs tend to cancel out more than they reinforce. So, with hi-stat-mux, remote traffic conditioning can be sufficiently efficient to be worthwhile.
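A rough way to put numbers on the stat-mux point is sketched below, under an assumption made purely for illustration (it is not in the note): each flow independently picks one of the 3 ingress points with equal probability. The family example above is worse than this, because there the choices are correlated. Under this model the flow count at one ingress is binomial, so its relative spread shrinks as 1/sqrt(n), and the chance of one ingress exceeding its mean by more than 25% (i.e. going above 5,000 when the mean is 4,000) collapses as the flow count grows.

```python
import math


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total


def relative_spread(n, n_ingress):
    """Standard deviation of one ingress's flow count, divided by its mean."""
    p = 1 / n_ingress
    return math.sqrt((1 - p) / (n * p))


def p_overshoot(n, n_ingress, margin=0.25):
    """Probability that one given ingress carries more than (1 + margin) x its mean."""
    mean = n / n_ingress
    k = math.floor((1 + margin) * mean) + 1
    return binom_tail(n, 1 / n_ingress, k)


for n in (12, 12_000):
    print(n, round(relative_spread(n, 3), 4), p_overshoot(n, 3))
```

With 12 flows, one ingress exceeds its share by more than 25% (6 or more flows) roughly 18% of the time under this model; with 12,000 flows the probability is astronomically small, which is the sense in which remote conditioning becomes efficient at high stat-mux.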

Also, the aim of the RFC2475 architecture is to avoid traffic conditioning on high capacity interior nodes, in part due to the complexity at high speed, and in part because shedding traffic from such a high capacity node would impact a large proportion of the customer base.

The opposite is true with a low stat-mux bottleneck. Adding complexity to detect and handle actual queuing is feasible at lower scale, and shedding traffic by definition only affects a few flows (as few as one - with per-flow mechanisms).

{Note 2} The following is my attempt at a comparison of NQB with EF (written as a note-to-self originally, but feel free to lift parts for this draft if you want):

The main distinction between NQB and EF is that an NQB bottleneck is not guaranteed to stay below a certain queue delay, so (in the 'actual queuing' alternative) NQB relies on traffic protection at each potential bottleneck to shed traffic that is causing actual queuing. In contrast, EF can guarantee a maximum queue at interior nodes by using traffic conditioning at border nodes to shed any traffic in excess of the contracted aggregate EF rate - even though accepting the excess traffic might not have caused any actual queuing (see Appendix B for details). The 'actual queuing' approach of NQB is more appropriate where statistical multiplexing is low, e.g. in access networks. With low stat-mux, there is high variation in the total load of the class, so it can be highly inefficient to limit traffic at each border in case correlated bursts cause queuing, compared to only dealing with queuing that actually occurs.

Bob

--

________________________________________________________________

Bob Briscoe                               http://bobbriscoe.net/
