Re: [tsvwg] WGLC comments draft-ietf-tsvwg-nqb-15

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Mon, 13 March 2023 20:08 UTC

To: Greg White <g.white@cablelabs.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
References: <e9873c9c-21f6-1121-28a2-e8cf350fc9bf@erg.abdn.ac.uk> <a3911fbc-3a2b-2f0f-9cf4-22c30d33b360@erg.abdn.ac.uk> <1A75471B-FF41-4324-ADEB-9C5D599BAD8F@cablelabs.com>
From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Organization: UNIVERSITY OF ABERDEEN
In-Reply-To: <1A75471B-FF41-4324-ADEB-9C5D599BAD8F@cablelabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/W7NNUE-sEb3hOJC5k0V41Nvzj1c>
Subject: Re: [tsvwg] WGLC comments draft-ietf-tsvwg-nqb-15

On 13/03/2023 01:32, Greg White wrote:
> Thanks for the in-depth review and detailed comments.  It is appreciated.
>
> Please see my responses below for the first 22 points, marked [GW]. I'll tackle the rest tomorrow.
>
> -Greg
Thanks for your response and please see my comments below, marked [GF2]
>
> On 2/21/23, 3:50 PM, "tsvwg on behalf of Gorry Fairhurst" <tsvwg-bounces@ietf.org on behalf of gorry@erg.abdn.ac.uk> wrote:
>
>
>
>
> I would like to add my own comments to this WGLC for the NQB ID. These
> are added as an individual during the WGLC.
>
>
> To me this draft seems like it addresses a useful specification,
> although I think it still requires work.
>
>
> I hope the comments/queries below can be discussed or addressed
> from the WGLC,
>
>
> Best wishes,
>
>
> Gorry Fairhurst
>
>
> =====
>
>
> 1. The definition of “low-rate data-rate, application-limited traffic
> flows” is still problematic for me. The text later describes this (in
> more than one place) as /relatively low data rate applications/ … which
> I am less sure about what that means? how relative is this? Can we be
> more concrete?
>
> [GW] Section 4.1 describes an upper bound using a rate equation, where the rate R is described as 'about 1 percent of "typical" network path capacity' and further provides that today that works out to 1 Mbps.  Would it help if we provided that definition earlier on in the document?  Or, is your concern that even this definition is not concrete enough?

[GF2] Hmmmm.... This needs to be handled well in the spec. At this
moment, it is linked to points 4 and 13 below.


> I was particularly surprised by this: “Current examples include many
> online games” ... which clearly covers a wide range of rates.
>
> [GW] As has been discussed in a separate thread, the intent of that example was to refer to the extremely common multiplayer online PC/console type games, which may in fact cover a wide range of rates, but that range is roughly 1 kbps to 300 kbps.  Cloud-rendered game services are definitely different, and can have downstream data rates measured in the 10s of Mbps.   I have clarified this in https://github.com/gwhiteCL/NQBdraft/commit/8bb01560897f59383032f7d5fb82907414269522  let me know if you still find that problematic.
[GF2] Be very careful... others might cite your examples to support their
own hopes/fears....
> It seems that it argues that the traffic carried by the PHB consumes a
> limited proportion of the available capacity, and can see how if this
> is the case and the flow/flows contributing the traffic are paced, then
> this might be acceptable for some people deploying diffserv. The
> traffic I would have in mind would be limited to a few packets over an
> RTT, or an average rate less than, say, 1 packet/second - such things as
> DNS, upper protocol control messages, infrequent exchanges, etc. Even
> when aggregated up as many many flows, such traffic seems relatively
> benign compared to the rate of modern Internet paths in general (aka >1%
> of typical path capacity). Yet it mentions, video conferencing, or
> Broadcast Video, which although rate-limited and paced are much more
> than the rates I would have expected, and can aggregate to appreciable
> traffic without traffic conditioning.
>
> [GW] Yes, it was a mistake to mention the option of aggregating CS3 (Broadcast Video) with NQB-marked traffic in the NQB PHB.  Ruediger pointed this out as well.  I've eliminated it in https://github.com/gwhiteCL/NQBdraft/commit/106ec7ac09ea334327fc90225ba82b01ff4cc3f0.
> Video conferencing was not explicitly mentioned in draft-15.  Could you point out the text that your comment refers to?
[GF2]  I'd need to read again, but removing recommendations for CS3 helps.
>
> I have a feeling of discomfort that stems from this PHB not requiring a
> conditioning method, and yet potentially relating to a sizeable
> proportion of the traffic on some portions of the Internet. While
> operators can use IETF specs and configure what they need for their
> service, I would like to think the IETF as a whole is careful about what
> it explicitly endorses.
>
> [GW] This is a shallow-buffered Best Effort service, not a guaranteed service. The draft does recommend a "conditioning method", although it describes it in terms of identifying flows that are inconsistent with the sender requirements, as opposed to handling cases where the aggregate of compliant flows exceeds available capacity.  In Best Effort services, overloads can happen.  Whether the impact of overload (typically packet loss and/or latency) is only felt by a subset of the users/flows or is felt by all of them could be different in different implementations. I think this is ok.

[GF2] I like this statement "This is a shallow-buffered Best Effort
service, not a guaranteed service." and I appreciate the clarity of the
rest of your answer. Can this come across better in the ID?

> ====
>
>
> 2. I understand the goal of “This PHB is implemented without
> prioritization” and that is not in dispute, but the side effects of the
> currently discussed DSCP value is that it could be assigned by accident
> to a lower layer that treats traffic with this DSCP as having this
> priority. I expect that somehow needs extra diligence with the way we
> define the traffic profile.
>
> [GW] That topic is the subject of section 4.3.1, which provides recommendations on how to prevent or mitigate problems that could otherwise result.  Do you have any thoughts on what changes there would help eliminate this concern?
[GF2] We should check this again in the next reading....
> ===
>
>
> 3. I understand the goal “can be implemented without rate policing”, and
> I would hope that is the case for individual flows. None-the-less the
> conditioning of traffic matching this PHB is likely to be important for
> many deployments, and the later sections do talk about how to condition
> the traffic, so I was unsure what was intended.
>
> [GW] Ok, I was trying to keep the abstract short, but still describe the important differences from other PHBs (primarily EF).  The intent was to say that, in a node that supports the PHB, the aggregate of NQB traffic is not limited to a subset of the link capacity that it shares with Default (i.e. if there happens to be no Default traffic, NQB could consume the entire capacity that the two PHBs share).
[GF2] I agree.
> I added the "can be" in that phrase since we've added discussion in 4.3.1 about certain cases (interconnection with unmanaged networks that don't support the PHB) where it could be needed.  Strictly speaking, I think the sentence was correct without the "can be", since it is referring to implementations of the PHB (where rate policing is not defined) as opposed to interoperability with networks that don't support the PHB.  Maybe the issue is that the abstract doesn't explicitly state that this PHB is paired with the Default PHB and that the two of them could be rate policed (or rate shaped) together.  Do you think that would solve it?
[GF2] That would indeed help.
> ====
>
>
> 4. I agree with the desire that they are “such that they are highly
> unlikely to exceed the available capacity of the network path between
> source and sink.”, but reading this left me feeling really uncomfortable
> with respect to operators and networks offering less than 100 Mbps - I
> am not sure I like the idea of the IETF projecting a minimal service
> rate, so I'd encourage much more care in how this is all framed.
>
> [GW] The intent wasn't to project a minimal service rate in general for the Internet.  But, the ability to deliver on the promise of the NQB PHB does depend on the relative rates between the applications sending NQB-marked traffic (in aggregate on a link) and the link rate where the PHB is supported.  We could enable the benefits of the NQB PHB to extend to more networks by setting this application rate guidance lower, but that would then limit the applications that could take advantage of it.  This is a judgement call, but 1% of "typical" capacity (1 Mbps today) to me seems like a reasonable place to draw the line and provides some headroom in cases where the capacity is less than typical.  I think setting it to 500 kbps would be ok, but I don't think we would want to go a lot lower than that.
[GF2] It is a judgment call, and a lower rate would ease the pain with
respect to objections (including mine).  I'd expect we'll find more
comfort with more experience, but I'm going to continue to query whether
a 1 Mbps rate is low enough.
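
To make the concern in point 4 concrete, the arithmetic can be sketched as
below. The 1 Mbps and 500 kbps bounds and the 100 Mbps "typical" rate come
from the discussion above; the lower link rates (25 and 10 Mbps) are
hypothetical values chosen only for illustration.

```python
# Share of link capacity taken by one application sending at the proposed
# upper bound. Only 100 Mbps ("typical") appears in the thread; the lower
# link rates are hypothetical examples.

def share_of_link(app_rate_bps: float, link_rate_bps: float) -> float:
    """Return the fraction of the link consumed by the application."""
    if link_rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return app_rate_bps / link_rate_bps


APP_BOUNDS_BPS = [1_000_000, 500_000]  # bounds discussed in point 4
LINK_RATES_BPS = [100_000_000, 25_000_000, 10_000_000]

for bound in APP_BOUNDS_BPS:
    for link in LINK_RATES_BPS:
        pct = 100 * share_of_link(bound, link)
        print(f"{bound / 1e6:.1f} Mbps on {link / 1e6:.0f} Mbps link: {pct:.1f}%")
```

On the hypothetical 10 Mbps link, a single flow at the 1 Mbps bound takes
10% of capacity rather than 1%, which is the case this point worries about.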
>
> This also occurs later: “In today's network, where access network data
> rates are typically on the order of 100 Mbps, this implies 1 Mbps as an
> upper limit.” - typically yes, but in many parts of the world, for many,
> this is still not the case.
>
> [GW] And this is covered in Section 5.3.  Just put NQB traffic and Default traffic into the same queue (as is the case for those networks today and would be the case if the NQB PHB never existed).  If L4S isn't supported in the network, then arguably this isn't needed since I think that the outcome of splitting the traffic into two equal priority queues (one shallow buffered and one deep buffered) even on low rate links isn't necessarily worse than using a single queue.  In any case, I'll try to find a place in (or before) this section to plant the seed that this is covered later.
[GF2] OK
>
> ====
>
>
> 5. I think the statement below does not represent current IETF consensus:
>
>
> “ While this architecture is powerful and flexible enough to be
> configured to meet the performance requirements of a variety of
> applications and traffic categories, or to achieve differentiated
> service offerings, it has proven problematic to enable its use for
> these purposes end-to-end across the Internet.”
> This is currently not appropriate to a standards-track document, unless
> we explicitly seek to gain that consensus, and I don’t particularly see
> why that paragraph is needed, or suggest something of the flavour:
> “ While this architecture is powerful and flexible enough to be
> configured to meet the performance requirements of a variety of
> applications and traffic categories, or to achieve differentiated
> service offerings, at the time of writing there is no specification
> to enable its use for
> these purposes end-to-end across the Internet.”
>
> [GW] Ok.  But, is it correct to say that the issue is the lack of a specification? I wouldn't agree that this is the problem.  Maybe we could simply say "it has not been used for these purposes end-to-end across the Internet."
[GF2] .... or say "no RFCs have used it end-to-end across the Internet"
or something like that.
>
> ====
> 6. I am not sure this really matches how I see the diffserv
> architecture. Please check the wording:
> “They also
> significantly simplify access control and admission control
> functions, reducing them to simple verification of behavior.”
> - If it is indeed intended to talk about admission control, then please
> separate this from the para on diffserv and use language that maps to
> that network function.
>
> [GW] I'm not sure I fully understand what you are getting at. The "They" in that sentence was referring to "These attributes" of the NQB PHB in the previous sentence. Is that how you interpreted it?
[GF2] I didn't read it that way, sorry. Perhaps replace /They/?
>
> ===
> 7. I query what is intended by the word policing, or would like more
> description for what is exactly intended, if this is different to
> traffic conditioning?
> “less stringent policing than they would with either codepoint alone”
>
> [GW] Hmm, that statement was provided by Bob, and I had interpreted it to mean the traffic protection function (in the context of PHB implementations) and re-marking/traffic policing (in the context of section 4.3.1).
> https://mailarchive.ietf.org/arch/msg/tsvwg/ohoTqN2olPG_kyML-9WsbIclkwM/
> I see that Ruediger asked a similar question:
> https://mailarchive.ietf.org/arch/msg/tsvwg/Q97SEK2sLCCSdlX4H9eQwfleZQk/
> I think the correct statement should be: "Packets marked with both codepoints SHOULD be treated the same as packets marked with either codepoint alone by the traffic protection function and by any re-marking/traffic policing function designed to protect unmanaged networks (as described in Section 4.3.1)."  Does this leave any gaps?
[GF2] That might be better.... please also make the sentence with an
RFC2119 keyword readable on its own though ... i.e. actually
explain /either codepoint alone/ ... people often cite the RFC2119
keyword sentences with no context!
>
> ===
> 8. “In addition, these applications send their traffic in a smooth (i.e.
> paced) manner,”
> - I know smooth pacing can be true in some cases, but is it necessarily
> true? If I take a Zone transfer or download a large DNS record, is this
> paced? What are the implications here?
>
> [GW] Well, I think we need to draw the line somewhere.  Based on the current text, a burst greater than 1500 bytes above "R" would not be compliant with the sender requirements.  Do you think 1500 bytes isn't the right number?
[GF2] I was more asking if these apps really are paced?
> ===
>
>
> 9. “Note that, while such flows ordinarily don't implement a
> traditional congestion control mechanism, they”
>
>
> - The word ordinarily made me worried about how this could be read.
> Could you perhaps consider turning this around and avoid stating the
> norm. (and avoid the "flow" implementing something) e.g. When such a
> flow is generated by a sender that does not implement a traditional
> congestion control mechanism...".
>
> [GW] Sure. That is an improvement.   Actually, I'm still a bit uneasy with how that is presented. As I mentioned in
> https://mailarchive.ietf.org/arch/msg/tsvwg/i4sz-cMb5MOLBCzDv47RFB0cn7g/
> I don't see why we wouldn't want to encourage ALL NQB-marked applications to also implement an L4S-compliant congestion controller.  Later in this section we recommend that NQB non-compliant applications consider implementing L4S, but why not low-data rate ones too?
[GF2] I think some classes of very low rate apps will find it hard to 
respond to ECN - because they send too infrequently and might not be 
able to understand marks properly. To me, these are a good match to NQB. 
Anything that has many packets in flight or sends many packets/sec could 
be encouraged to use L4S. This should be thought through.
> ===
>
>
> 10. Consider: “To be clear, the description of NQB-marked flows in this
> document
>
>
> should not be interpreted as suggesting that such flows are in any
> way exempt from this responsibility.”
> /should not/is not/ ?
>
> [GW] Yes.
>
[GF2]:-)
> ===
>
>
> 11. At points there are long paras that span multiple topics, please
> break these where it is possible. e.g. Could we break the para here to
> separate the local-use and recommended approaches:
>
>
> /In networks where another (e.g., a local-use codepoint)/
>
> [GW] Ok.
[GF2]:-)
>
> ====
> 12. This expectation for applications might be OK, but it does not help
> an operator that has to support a lower rate link
> /If the application's traffic exceeds the rate equation provided in
> the first paragraph of this section, the application SHOULD NOT mark
> its traffic with the NQB DSCP.
>
>
> /
>
>
> There is text on this later, that I think really ought to be called out
> earlier in the document.
>
> [GW] Ok, I'll find a spot to mention this earlier.
[GF2]:-)
> ====
>
> 13.
> /At the time
> of writing, it is believed that 1 Mbps is a reasonable upper bound on
> instantaneous traffic rate for an NQB-marked application, but this
> value is of course subject to the context in which the application is
> expected to be deployed./
> - This is an IETF PS, and needs to have IETF consensus: I’ve argued this
> is high in the past for a PS designed for the Internet as a whole, and I
> will argue again the same. Although I can see that in some regions this
> value would be legitimate, I fear in other places it would not. I think
> this is something where I would like to see a strong consensus.
>
> [GW] Per my comment above I think if we reduce it too much it won't provide a benefit for very many applications.  Do you have a number in mind?
[GF2] Aha - I wonder, can we find a rate low enough to be happy?  I'm
guessing an aggregate of 1 Mbps might be nearer, which might really be a
small share of many paths? I think we will need to return to point 13
after we finish the others.
> ===
>
>
> 14.
> I have two queries from this phrase:
> /happens to exceed
> the available path capacity (even on an instantaneous basis) runs the
> risk of being subjected to a traffic protection algorithm/
> - 1: Why exceed capacity? rather than its share of the capacity?
> - 2: Where is the term traffic protection defined and explained in
> section 4.1 ? (can we try to have the term defined before it is used?)
>
> [GW] How about: "An application that marks its traffic as NQB runs the risk of being subjected to a traffic protection algorithm (see Section 5.2) if it contributes to the formation of a queue in a node that supports the PHB. This could result in the excess traffic being discarded or queued separately as default traffic (and thus potentially delivered out of order)."
>
[GF2] This sounds like a good direction to go.
> ====
> 15. “The sender requirements outlined in this section are all related to
> observable attributes of the packet stream, which makes it possible
> for network elements (including nodes implementing the PHB) to
> monitor for inappropriate usage of the DSCP, and re-mark traffic that
> does not comply.
> “
>
>
> - How does this work? I can see how a specific operator might know
> specific details, and that might even include the RTT between some
> source/destination pairs. But I don’t know how an operator would
> reasonably condition or police Internet traffic in terms of RTT. Maybe
> you understand better? It seems to me that an application developer
> (unless for a closed platform), would likely not know RTTs or what
> other traffic will be aggregated, so advice here to limit the type of
> traffic admitted seems hard to grasp.
>
> [GW] I actually don't think there are any cases where the RTT comes into play. The current sender requirements today are:  ((at most a few pkts per RTT) || (no more than 1 Mbps)) && (less than (1 Mbps)*T+1500B over any interval T).  Maybe we should simplify the requirements, and remove the mention of RTT?

[GF2] I'm OK with removing it from the network discussion. From the 
network perspective though, a network device typically doesn't know RTT 
- so an RTT-based rule can't be much use to it.

[GF2] To me, the RTT argument is really one that helps define a class of 
apps that don't send data often enough to be able to respond normally to 
CC or ECN, and their very infrequent rate of transmission could mean 
that they are also mostly benign.
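
The burst part of the requirement quoted above, (1 Mbps)*T + 1500 B over
any interval T, does not depend on RTT and can be checked from packet
arrival times and sizes alone. Below is a minimal sketch, assuming a
token-bucket meter with rate R = 1 Mbps and depth 1500 bytes as one way to
test that bound. The function name and the trace format are illustrative
and not taken from the draft, and the bound is treated as inclusive.

```python
from typing import Iterable, Tuple

RATE_BPS = 1_000_000  # R from the thread: 1 Mbps
BURST_BYTES = 1500    # allowance above R*T from the thread


def conforms_to_burst_bound(
    packets: Iterable[Tuple[float, int]],
    rate_bps: float = RATE_BPS,
    burst_bytes: float = BURST_BYTES,
) -> bool:
    """Check a trace of (arrival_time_s, size_bytes) against R*T + burst.

    Uses a token bucket: tokens refill at rate_bps / 8 bytes per second,
    capped at burst_bytes; a packet conforms if enough tokens remain.
    Packets must be in non-decreasing time order.
    """
    rate_bytes_per_s = rate_bps / 8
    tokens = burst_bytes
    last_time = None
    for arrival, size in packets:
        if last_time is not None:
            if arrival < last_time:
                raise ValueError("packets must be sorted by arrival time")
            refill = (arrival - last_time) * rate_bytes_per_s
            tokens = min(burst_bytes, tokens + refill)
        last_time = arrival
        if size > tokens:
            return False  # this packet exceeds the bound
        tokens -= size
    return True


# 1000-byte packets every 10 ms (800 kbps): stays within the bound.
paced = [(i * 0.010, 1000) for i in range(100)]
# Two 1500-byte packets back to back: the second exceeds the allowance.
burst = [(0.0, 1500), (0.0, 1500)]

print(conforms_to_burst_bound(paced))
print(conforms_to_burst_bound(burst))
```

A meter of this kind needs only per-flow state (a token count and a
timestamp), which is consistent with the observation that a network device
cannot apply an RTT-based rule.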

> ===
>
>
> 16.
> /but is recommended nonetheless in order to /
> … Ought this to be a RFC2119 keyword? it might be or it might not? If
> this PS makes a recommendation it needs to be RFC2119 compliant.
> … It would be really nice to remove the /in order/ term to avoid an
> ambiguity.
>
> [GW] The RFC2119 recommendation is in the previous paragraph, this sentence is simply explaining why we made that recommendation.  Could you take a look again and see if you think that is unclear?

[GF2] Aha, so maybe we could just state the facts then, something like:

In backbone and core network switches (particularly if 
shallow-buffered), as well as in nodes that do not typically experience 
congestion, treating NQB-marked traffic the same as Default might be 
sufficient to preserve loss/latency/jitter performance for NQB traffic. 
In other nodes, treating NQB-marked traffic as Default could result in 
degradation of loss/latency/jitter performance but nonetheless preserves 
the incentives described in Section 5 
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-nqb#phb_requirements>.

> [GW] I will delete the /in order/.
[GF2] :-)
>
> ===
>
>
> 17. controlled environments
>
>
> I would also like to see a para break between the IETF-specified domains
> and any local-use description of controlled environments:
>
>
> /An alternative, in controlled environments/
>
>
> - perhaps advice on controlled environments would be better placed in a
> separate subsection?
>
> [GW] How big a concern is this for you?  It does seem like it would make things difficult to explain.  I'd probably prefer deleting any mention of controlled environments rather than trying to create a new subsection.
[GF2] Removal might also be good. Or if needed, putting in an annexe for 
now until we all agree.
> ===
>
>
> 18. Please explain:
>
>
> /would be to aggregate NQB-marked
> traffic with real-time, latency sensitive traffic./
>
>
> - How would aggregation be performed … would this traffic with this set
> of DSCP values be forwarded as the same treatment aggregate?
>
> [GW] That is what I had in mind, but this sentence has pretty low overall value in the draft.  If it is problematic, I'd probably delete it rather than spending a lot of time wordsmithing it.
[GF2] Please suggest something, removal might be fine.
> ===
>
>
> 19. The following text is, to me, still problematic for a PS.
> /Similarly,
> networks and nodes that aggregate service classes as discussed in
> [RFC5127] and [RFC8100] might not be able to provide a PDB/PHB that
> meets the requirements of this document./
> - We can define in this PS, what is to be done to comply with the PS, we
> should not provide text that seems to hint that complying with the spec
> allows non-compliance.
> - This text needs to be in a separate section to clearly differentiate
> it from the other treatment
>
> [GW] I see your point.  How separate does it need to be?  An appendix?
[GF2] Moving some discussion of considerations there might work, it 
would be a good start.
> ===
>
>
> 20. All operators seems like a strong statement:
>
>
> / is recognized by all operators/ - not quite, all operators configuring
> a diffserv domain? something else?
>
> [GW] How about we obfuscate that sentence a little:  "is recognized and mapped across network boundaries accordingly." ?

[GF2] Seems better

===

>
> 21. Is this an RFC2119 requirement:
> / To
> ensure reliable end-to-end NQB PHB treatment, the appropriate NQB
> DSCP should be restored when forwarding to another network./
> -Do you have an RFC that you could quote to support this?
>
> [GW] It seems like this is something that RFC2475 would say.  It's not jumping out at me on a quick scan though.  I'll keep looking.
>    
[GF2] Citing an RFC here would be good.
> ===
>
>
> 22. I objected to the following term in my last review, and I'm going to
> push-back again on the use of "management" to characterise these
> networks, I don't think the text is anything to do with management of
> network elements:
>
>
> /Unmanaged Networks/
> I also do not find this term in RFC2475, presumably because that's not
> the appropriate term?
>
>
> This term needs to be explained or another term used.
>
>
> Later it also seems to link residential as unmanaged? This seems a
> mix-up of terms again.
>
> [GW] Sorry if I missed this on an earlier review.  I agree this could be improved.  I'll take a stab at it.
>
[GF2] Thanks, "unmanaged" probably might be "not configured with support
for something"...

===

We have resolutions for most. Although we seem to understand points 1, 4
and 13, I'm not sure I yet see how we should address these.

Let's deal with the other issues in a separate thread!

Gorry