Re: [tsvwg] Questions and comments on draft-ietf-tsvwg-ecn-l4s-id-06

Bob Briscoe <ietf@bobbriscoe.net> Mon, 04 November 2019 22:38 UTC

To: G Fairhurst <gorry@erg.abdn.ac.uk>, tsvwg WG <tsvwg@ietf.org>
References: <5C97C8A2.7020804@erg.abdn.ac.uk>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <fae34250-44c9-4d1b-07b1-1e701204fe8e@bobbriscoe.net>
Date: Mon, 04 Nov 2019 22:38:18 +0000
In-Reply-To: <5C97C8A2.7020804@erg.abdn.ac.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/kEvQpcAUe1CvQJ07FxVtpHgo3OY>

Gorry,

You should see the rev'd draft soon, with a combination of yours and 
numerous other comments addressed. Thank you v much for your continued 
close attention to this.

responses inline...

no response = agree.


On 24/03/2019 18:12, G Fairhurst wrote:
> I have read and made a review of draft-ietf-tsvwg-ecn-l4s-id-06,
> I hope this helps understand what is needed to complete this draft.
>
> Gorry
> (as an individual)
>
> ---
>
> Questions:
>
> Section 4.3
> (1) “As with all transport behaviours, a detailed specification
> will need to be defined for each type of transport or application”
> - Does this require an RFC to do this? What type?
" (probably an experimental RFC)" added
>
> Section 4.3
> (2) At the end of the para I expected to see a note about what if the 
> traffic was not from a scalable congestion control. If rate-limited EF 
> traffic is submitted to this queue it won’t be a scalable CC, but it 
> could employ a shaper, or Circuit Breaker function that prevents it 
> contributing to queueing. How is this going to be described? - I think 
> the point I had is that such traffic may be responsive to ECN, but 
> doesn’t need to scale when it is itself limited in what is permitted 
> to be sent. Section 5.4.1.1.1. seems a hint to this. 
The aim of QProt is to actually be able to define the envelope that a 
/responsive L4S/ app must stay within. That's the first time this has 
been possible. But it still can't say anything about unresponsive traffic.

I've tried, but no-one has ever been able to nail down the mystery of 
how the Internet works with some unresponsive traffic, even though that 
traffic's behaviour is not well-defined.

> A.1.6 speaks of scaling to smaller cwnds, though, so how does this fit?
That's a responsive thing, so not relevant here.

But it's relevant to the responsive parts earlier, in being able to 
ensure that the L4S service remains v. low latency even with v low base 
RTTs, or even when it's overloaded to the extent that the fair share 
window is sub-packet. We have an implementation of that now, BTW. Asad 
Ahmed's Master's thesis describes the implementation, which ended up being 
not much extra code, and gives lots of evaluation. I'll post a link 
separately (also to ICCRG).


>
> Section 5.1
> (3) “but there is no implication that such a mechanism is necessary.”
> - I agree with the statement personally, but I am not sure we have 
> consensus on this point. If we do that would be fine, if we do not 
> then this should additionally be a topic to be considered in 
> experimentation perhaps?
That's what issue #16 is about. I'll leave the text as is for now.

>
> Section 5.1
> (4) In this context, it is not clear what is meant by: “field as CE 
> for an increasing proportion of packets,” - increasing with respect to 
> what condition?
Already fixed this in -07. It now says:

     An ECT(1) packet is classified as ECN-capable
    and, if congestion increases, an L4S AQM algorithm will increasingly
    mark the ECN field as CE, otherwise forwarding packets unchanged as
    ECT(1).
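
To make the "increasingly mark" wording concrete, here is a minimal sketch. It is not text from the draft and not any particular AQM: it assumes a simple linear ramp on queuing delay, and the function names and both thresholds are hypothetical values picked only for illustration.

```python
import random

# ECN field codepoints (RFC 3168; ECT(1) is the L4S identifier in the draft).
ECT1 = 0b01
CE = 0b11


def marking_probability(queue_delay_ms, min_th_ms=0.5, max_th_ms=2.0):
    """Hypothetical linear ramp: 0 below min_th_ms, 1 above max_th_ms."""
    p = (queue_delay_ms - min_th_ms) / (max_th_ms - min_th_ms)
    return min(1.0, max(0.0, p))


def l4s_mark(ecn_field, queue_delay_ms, rng=random.random):
    """Return the ECN field to forward for one packet.

    ECT(1) packets are re-marked to CE with a probability that grows with
    congestion (queuing delay here); otherwise they are forwarded unchanged.
    Packets with any other codepoint are left alone in this sketch.
    """
    if ecn_field != ECT1:
        return ecn_field
    return CE if rng() < marking_probability(queue_delay_ms) else ECT1
```

With these assumed thresholds, a packet seeing 0.1 ms of queuing is never marked, one seeing 5 ms is always marked, and in between the proportion of CE rises linearly.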


> (5) “if
> the most recent ECT packet in the same flow was ECT(0), the node MAY
> classify CE packets for classic ECN [RFC3168] treatment. “
> - What happens if this is intentionally manipulated to try to 
> disadvantage a flow? It seems like an off-path attack can introduce 
> rogue attack packets that could influence this method. Please consider.
The text here is just ensuring that this draft does not preclude 
innovative per-flow mechanisms.
One cannot do a security assessment without an actual mechanism.

For instance, this doesn't necessarily mean per-flow queues. It could 
just be a way to ensure that a dualQ classifies CE packets into the same 
bulk queue as other non-CE packets in the same flow. So, it's not 
possible to do a security analysis on an undefined mechanism.

FQ is indeed vulnerable to off-path attacks adding traffic into a queue, 
but that's way off topic for this section...?


>
> Section 5.4.1.1
> (6) The text in 5.4.1.1. does not start by saying that this implies an 
> additional queue, and at least for me, this is still a little hard to 
> unravel. I expect it is heading in the correct direction, but it is 
> not yet clear what the architecture looks like.
Ah! I hadn't noticed that I had silently jumped into DualQ mode here. 
Thank you.

I've introduced the jump to DualQ before this subsection, and also added 
a couple of FQ examples in a new section afterwards.

>
> Section 5:
> (7) I’ve been trying to see the place where the advice is to 
> operations staff, rather than procurers/designers. It seems to me like 
> that break is somewhere near 5.4 - am I correct? is there any chance 
> we can place a section header that has a useful heading for someone 
> looking for this?
At the end of the Scope section, I've added the following text:

    This document is about identifiers that are used for interoperation
    between hosts and networks. So the audience is broad, covering
    developers of host transports and network AQMs, as well as covering
    how operators might wish to combine various identifiers, which would
    require flexibility from equipment developers.

>
> Section 6:
> (8) Section 6 describes experiments but doesn’t give any hint at when 
> the IETF would have sufficient experience to know whether this 
> experiment is confirmed. I am looking for what sort of things need 
> experience?
I've added an initial sentence to explain:

    This draft defines an identifier that enables L4S experiments to
    inter-operate between hosts and networks. Therefore, it leaves most
    of the definition of experiments in the use of this identifier to
    the specific mechanism drafts mentioned below.

>
> (9) I wonder if appendices B, C need to be published. They suggest 
> variations that the WG has not decided to take-up and it would be 
> unwise to further promote these. If we keep these, I suggest we 
> separately review these to check their tone correctly conveys the 
> final status of the WG consensus.
OK. We wrote them assuming they would be published as a record of the 
reasoning behind decisions taken, so their tone should be suitable. I 
think they should be published (if not, it would be wrong just to throw 
them away, and where else would be more appropriate - rhetorical question).


>
> (10) I would like reassurance that we have consensus that the 
> following two reactions are intentional and now form part of 
> experimentation. I’d like to suggest reaction to loss is not optional, 
> and must be treated like it was a congestion loss. I therefore query 
> this text in Appendix A:
> “Current DCTCP implementations react differently to this situation.
> At least one implementation reacts only to the drop signal (e.g. by
> halving the CWND) and at least another DCTCP implementation reacts to
> both signals (e.g. by halving the CWND due to the drop and also
> further reducing the CWND based on the proportion of marked packet).
> We believe that further experimentation is needed to understand what
> is the best behaviour for the public Internet, which may or not be
> one of these existing approaches.”
I have added:

    A third approach for the public Internet has been proposed that
    adjusts the loss response to result in a halving when combined with
    the ECN response.

You will notice that the two original approaches and the new third 
approach all respond to the loss by at least halving cwnd. So I don't 
think this gets anywhere close to risking any dangerous precedents.
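
For anyone who wants to compare the three reactions numerically, here is a rough sketch. It is my reading of the one-line descriptions above, not code from any of the implementations: it assumes the DCTCP-style ECN reduction cwnd * (1 - alpha/2), where alpha is the smoothed fraction of marked packets, and it assumes the loss and the marks arrive in the same round trip. All function names are hypothetical.

```python
def loss_only_response(cwnd, alpha):
    """Approach 1 (assumed): react only to the drop, by halving cwnd."""
    return cwnd / 2


def loss_plus_ecn_response(cwnd, alpha):
    """Approach 2 (assumed): halve for the drop, then also apply the
    ECN reduction based on the proportion of marked packets."""
    return (cwnd / 2) * (1 - alpha / 2)


def combined_halving_response(cwnd, alpha):
    """Approach 3 (assumed): apply the ECN reduction, then scale the loss
    response so that the two together amount to a halving."""
    after_ecn = cwnd * (1 - alpha / 2)
    # alpha is in [0, 1], so (1 - alpha/2) >= 0.5 and this factor is <= 1.
    loss_factor = 0.5 / (1 - alpha / 2)
    return after_ecn * loss_factor


if __name__ == "__main__":
    cwnd, alpha = 100.0, 0.5
    for f in (loss_only_response, loss_plus_ecn_response, combined_halving_response):
        print(f.__name__, f(cwnd, alpha))
```

Under these assumptions every variant leaves cwnd at or below half its starting value after a loss, which is the point being made about precedents.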

Personally, I am currently experimenting (with Joakim) on a way to react 
much more quickly than an EWMA would in response to a growing level of 
congestion (whether loss or ECN), while responding less jumpily to 
individual events. But that's research - not really for this list.

>
> (11) In the same point it states: “Packet loss might (rarely) occur” - 
> I’d argue that packet loss can ALWAYS occur in the case of overload, 
> and that this is an important case that needs to be considered to 
> avoid congestion collapse. The text and my assertion appear to 
> potentially conflict.
Thanks. I've changed it to what I meant (which also side-steps the 
question of transmission losses from other links combining with ECN from 
this one, which isn't relevant here):

    Even though a bottleneck is L4S capable, it might still become
    overloaded and have to drop packets.


> ——
>
> The rest are detailed comments on carefully reading the text (i.e., 
> nits):

For speed, I've just done these mostly without comment
>
> “and low delay is maintained during high load.”
> - I understand, but would it be clearer to say what has a high level 
> of load?
>
> “The performance improvement is so
> great that it is motivating initial deployment of the separate parts
> of this system.”
> - this seems like a boast. Could you turn it around into a fact. Such 
> as : Initial deployment of the separate parts of the system has been 
> motivated by the performance benefits…
> - is there a reference?
>
> Section 1.1 has the title “problem”, could this say “The Latency 
> Problem”? - or something similar?
We have never said it's only a latency problem, viz. "Latency is not our 
only concern: " ... "It turns out that a TCP algorithm like DCTCP that 
solves the latency problem also solves TCP's scalability problem. "

I think that's in your head and David's, not ours. This might explain why 
there's the argument over the identifier meaning both low latency and 
relaxing resequencing (!)

I've re-titled: "Latency, Loss and Scaling Problems"


>
> Is “ In the developed world,” acceptable as a phrase?
>
> “Then Diffserv is of little use.” - could be quoted and 
> misinterpreted, maybe better to say it can do little to reduce the 
> latency?
>
> “In general, AQMs” - I suggest In general, “AQM methods”, is clearer 
> than AQMs. This appears several places.
>
> “So, AQM was not widely deployed.” Is it better to say “So, this form 
> of AQM was not widely deployed.”
>
> “Flow-queuing” - needs a reference?
>
> “Latency is not our only concern:”
> - When published, I don’t think we should be stating an IETF position, 
> please rephrase. Perhaps the editors mean L4S addresses more than 
> reduced latency?
>
> “The finer sawteeth have low amplitude” - perhaps not completely clear 
> when read out of context, please add a few words around this such as 
> “sawtooth in the congestion window” … or whatever makes sense.
>
> “A supporting paper [DCttH15]” - please remove “supporting” because 
> it does not support THIS specification.
>
> “Low-Latency, Low-Loss and Scalable (L4S) service: ‘
> - Missing a final bracket at the end of the para before the full stop.
>
> “But it is also” - remove “but”?
>
> “(DSCP [RFC2474])” - I think should be “(DSCP0 [RFC2474].”
I took this to mean (DSCP) not (DSCP0, of course.


>
> “This document is intended for experimental status, so it does not
> update any standards track RFCs.”
> - Please replace by “When published, this document will provide an 
> experimental specification. It does not
> update any standards track RFCs.”
>
> I can’t parse the following: “Ideally, the identifier for packets 
> using the Low Latency, Low Loss,
> Scalable throughput (L4S) service ought to meet the following
> requirements:”
> - I don’t think you can use “ideally” or “ought” in the sentence 
> scoping the RFC-2119 keywords. Please rephrase.
> - I note that you could choose to use “RECOMMEND” rather than “SHOULD” 
> since this is a requirements specification. This use is not consistent 
> across RFCs but can help to separate requirements from protocol 
> actions in section 4.
Yes, I've realized this subtle distinction recently.
Too subtle to do now. Would need a clear head.

>
> “to allow this experiment (amongst others).” - True, but could be 
> misinterpreted that other experiments are welcome, rather than all 
> need to be specified via RFC process. Is this more neutral: “to allow 
> experiments such as the one defined in this specification”.
>
> This seems loose: “As a condition for a host to send packets with the 
> L4S identifier
> (ECT(1)), it SHOULD implement a congestion control behaviour that
> ensures the flow rate is inversely proportional to the proportion of
> bytes in packets marked with the CE codepoint. “
I know. It says at the end "The inverse proportionality requirement 
above is worded as a 'SHOULD' rather than a 'MUST' to allow reasonable 
flexibility when defining these specifications."

Koen and I discussed this. If you imagine having defined TCP as having 
to comply with the square-root law from the start, Cubic would not have 
been possible.

If you can think of something as liberal, but not as open to abuse, pls 
do...
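
To illustrate what the inverse-proportionality wording is getting at, here is a small sketch comparing it with the square-root law. The constants are assumptions for illustration only: k = 2 for the scalable control is a hypothetical value, and the Reno formula is the usual sqrt(3/(2p)) approximation. Nothing here is normative.

```python
import math


def scalable_cwnd(p, k=2.0):
    """Window inversely proportional to the marking proportion p.
    k is an assumed constant, not a value from the draft."""
    return k / p


def reno_cwnd(p):
    """Classic square-root law, using the common approximation
    cwnd ~= sqrt(3 / (2 * p))."""
    return math.sqrt(1.5 / p)


def signals_per_rtt(cwnd, p):
    """Average number of congestion signals seen per round trip."""
    return cwnd * p


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        s = signals_per_rtt(scalable_cwnd(p), p)
        r = signals_per_rtt(reno_cwnd(p), p)
        print(f"p={p}: scalable {s:.3f} signals/RTT, Reno {r:.3f} signals/RTT")
```

With the 1/p law the number of signals per round trip stays constant however large the window grows, whereas with the square-root law the signals get sparser as the rate scales up.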

>
> “are examples of a scalable congestion controls.”
> - remove /a/.
>
> “A scalable congestion control MUST react…”.
> - I agree. Although I think you may first wish to point to the AQM BCP 
> and say that “even though the congestion-controller is optimised to 
> respond to congestion-experienced marks, it also needs to respond to 
> packet loss [RFC7567].”
I actually think that will do the opposite of what you intend - distract 
from the loss requirement. I have merely started the sentence with:

    As well as responding to ECN markings, ...

I don't think it's appropriate to refer to the AQM BCP here. This is 
about CC.


> - I also have the same comment for A.1.3. to motivate why loss 
> reaction is important.
Here I've preceded with: "As well as responding to ECN markings in a 
scalable way, ..."
>
> Should this be with commas?:
> “non-L4S but ECN-capable bottleneck”
> be “non-L4S, but ECN-capable, bottleneck”
>
> “while it temporarily falls back to coexist with
> Reno .”
> - remove additional space. Is the word “while” better “during the time 
> it”?
Nah.

Note, having now written up a detailed design, and therefore thought 
about this more deeply, I've also deleted "possibly temporarily" and 
"However an implementer who believes this would be beneficial if 
fall-back persists, can choose to do so,"

>
> In Section 5:
> “ Of course, a packet that carried both the ECT(1) codepoint and a
> relevant non-ECN identifier would also be classified into the L
> queue.”
> - why “of course” - I can see why this can happen. However, does it 
> HAVE to happen? why can’t a system put all ECT(1) traffic in a L4S 
> queue irrespective of the other classification, then return the 
> remaining L-compatible traffic to the L-queue? Is that not also a 
> valid approach? Or is this here stating diffserv rules? please clarify.
I meant (and have now said)

    Of course, a packet that carried both the ECT(1) codepoint and a
    non-ECN identifier associated with the L queue would be classified
    into the L queue.

In which case, 'of course' is I think justified.
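
As a sketch of the classification rule I mean (an illustration under stated assumptions, not the draft's pseudocode): it assumes a two-queue arrangement where ECT(1) and CE go to the L queue by default, and where an operator can configure a set of non-ECN identifiers associated with the L queue. The function and parameter names, and the example identifier, are hypothetical.

```python
# ECN field codepoints from RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def classify(ecn_field, other_id=None, l_queue_ids=frozenset()):
    """Return "L" or "C" for one packet.

    other_id stands for any non-ECN identifier the packet carries
    (e.g. a DSCP); l_queue_ids is the operator-configured set of such
    identifiers associated with the L queue. Both are assumptions.
    """
    if ecn_field in (ECT1, CE):
        return "L"
    if other_id is not None and other_id in l_queue_ids:
        return "L"
    return "C"


if __name__ == "__main__":
    ids = frozenset({"example-low-latency-id"})
    # Carries both ECT(1) and an L-associated identifier: still the L queue.
    print(classify(ECT1, "example-low-latency-id", ids))
    # Not-ECT but with an L-associated identifier: L queue by operator policy.
    print(classify(NOT_ECT, "example-low-latency-id", ids))
    # Plain ECT(0): Classic queue.
    print(classify(ECT0))
```

A packet that matches on both counts lands in the L queue whichever test is applied first, which is why the order of the two checks doesn't matter here.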

>  “be used by some network operators who believe they
> identify non-L4S traffic that would be safe”
> - Our ops area colleagues may be upset by “believe they” and would 
> likely prefer “decide to”
Nah. ops people are no more sure of safety than anyone else, unless 
they're irresponsible.

>
> “(and CE indicates that it could be).”
> - perhaps:
> - “(a CE-mark indicates traffic that could originally have been marked as 
> either ECT(0) or ECT(1)).”
>
> “ at a data rate that exceeds “ -
> - in various places: remove /data/?
>
> “for policy reason”
> - suggest “for a policy reason“
>
> “MUST NOT re-mark the end-to-end L4S identifier”
> - suggest adding “(ECT(1))” here to avoid any confusion.
I was trying to include CE as well. I've used the following:

    MUST NOT alter the end-to-end ECN identifier from L4S to Classic


>
> Section 8. I think there should be some discussion on what happens if 
> an attacker introduces ECT(1) rogue packets can it influence the 
> method, other than an attack which seeks to induce congestion?
I prefer to keep all that in l4s-arch, so I've referred there.
>
> In Appendix B:
>
> “In such cases, the L4S
> service would have to drop rather than mark frames even though
> they might contain an ECN-capable packet. “
> - Aren’t all L4S packets ECN-capable, this seems like you could have 
> an L4S packet that was not ECT(1) or CE… which is not true.
I've s/contain/encapsulate/ which hopefully clears the confusion.
>
> In B.1 - check all bullets end with “;”.
I've actually made them all end in '.' cos most were lists of full 
sentences, often multiple sentences and even multiple paragraphs.

>
> In B.2: “* CE would signify that the packet had been marked by an AQM
> implementing the L4S service.”
> - really why only L4S, rather than an “ECN service”
No. The whole list is in the context of an L4S DSCP, so this is an L4S 
ECN mark.

>
> … I didn’t re-review the remainder in this pass.
Well, I'm dead impressed by what you did re-review. Very very thorough. 
You've found some things that really imply you kept your attention and 
concentration throughout. And I appreciate that all the criticism is 
constructive. Thank you.

>
> Also:
>
> “ With a RACK-like
> approach, allowing longer before a loss is deemed to have occurred
> maintains higher throughput in the presence of reordering {ToDo:
> Quantify this statement}.”
> - missing text.

Whole set of paras has been removed (see earlier email).

Thank you muchly.


Bob




-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
