Re: [tsvwg] L4S Review

Kuhn Nicolas <Nicolas.Kuhn@cnes.fr> Tue, 22 January 2019 10:58 UTC

From: Kuhn Nicolas <Nicolas.Kuhn@cnes.fr>
To: 'Bob Briscoe' <research@bobbriscoe.net>, tsvwg IETF list <tsvwg@ietf.org>
Date: Tue, 22 Jan 2019 10:57:27 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/gxInM-b-LpljFZdtLT5UZreqbOI>

Bob, all,

Sorry for my late reply.
XMAS break and sick leave did not help.

Thanks,

Nico

From: Bob Briscoe <research@bobbriscoe.net>
Sent: Saturday, 22 December 2018 01:04
To: Kuhn Nicolas <Nicolas.Kuhn@cnes.fr>; tsvwg IETF list <tsvwg@ietf.org>
Subject: Re: L4S Review

Nicolas,

Thank you very much for these reviews. As always, I've added your name to the ACKs section and given responses inline, tagged [BB]...

And many apologies for my complete silence until now. I tuned out of online life for a while a little before you posted your review emails, and I've only just worked my way back to mails from that time.
On 27/11/2018 11:18, Kuhn Nicolas wrote:
All,

This is a quick review on two L4S related drafts (draft-ietf-tsvwg-ecn-l4s-id-05 and draft-ietf-tsvwg-aqm-dualq-coupled-08).

draft-ietf-tsvwg-ecn-l4s-id-05 proposes a modification to ECN semantics to introduce a new L4S network service
draft-ietf-tsvwg-aqm-dualq-coupled-08 describes DualQueue

In general, the drafts say either too much or too little about the general picture, and it is somewhat confusing to see how all the contributions interact.
I would suggest having a replicated section in all the drafts explaining their interaction (or a pointer to a web page, another document, etc.).
[BB] Yes, Mikael Abrahamson made a similar comment about nowhere saying how the system works as a whole. I suggested that we will add to the L4S architecture draft to explain how it works. I would rather refer to that from the other drafts, which will otherwise remain as spec's rather than repeating the tutorial material. But I'd need to check the co-authors agree.

Would that work for you?
[NK] Totally - that would clearly help.

Some detailed comments:

*******
Identifying Modified Explicit Congestion Notification (ECN) Semantics
for Ultra-Low Queuing Delay (L4S)
draft-ietf-tsvwg-ecn-l4s-id-05

Review of https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-05
*******

What happens if the codepoints are used by the sender but the sender does not comply with the prerequisite detailed in section 4.3?
[BB] That depends on which prerequisite. Let's go through them:
#1 Queuing Delay
The main concern is to respond to ECN promptly to keep queue delay extremely low. Handling non-compliance on latency is discussed in the Security Considerations sections of the L4S architecture draft, particularly:
8.1. Traffic (Non-)Policing <https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-8.1>
8.2. 'Latency Friendliness' <https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-8.2>

We (at CableLabs) have developed and implemented a low complexity queue protection algorithm that can optionally sit in front of the L4S queue. In keeping with the philosophy of L4S, we have crafted the algorithm to solely protect the queue from flows causing delay, independently of how much bandwidth they use.

Essentially, it maintains a per-flow queuing score which rapidly ages out for regular flows - so that flow state storage can be recycled around the packets of live flows. Any flows building up a queuing score faster than it ages out will be candidates for sanctions if queue delay exceeds a configurable threshold (e.g. 2ms). The default sanction is to redirect packets to the Classic queue, thus protecting the latency of any other traffic in the 'L' queue.
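The per-flow score described above can be illustrated with a minimal Python sketch. It is not the CableLabs algorithm: the class name, the aging rate, the score limit and the per-packet score increment are all hypothetical values chosen for illustration. Only the 2 ms delay threshold and the default sanction (redirect to the Classic queue) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class QueueProtection:
    """Toy per-flow queuing score (illustrative only, all defaults assumed)."""
    aging_rate: float = 1000.0      # score forgiven per second (assumption)
    score_limit: float = 5.0        # score above which a flow is a candidate (assumption)
    delay_threshold: float = 0.002  # 2 ms, the example threshold from the text
    scores: dict = field(default_factory=dict)  # flow_id -> (score, last_seen)

    def classify(self, flow_id, now, queue_delay, pkt_score=1.0):
        """Return the queue ("l4s" or "classic") for one arriving packet."""
        score, last = self.scores.get(flow_id, (0.0, now))
        # Age the old score, then add this packet's (hypothetical) contribution.
        score = max(0.0, score - self.aging_rate * (now - last)) + pkt_score
        self.scores[flow_id] = (score, now)
        # Sanction only while the queue is actually delayed.
        if queue_delay > self.delay_threshold and score > self.score_limit:
            return "classic"
        return "l4s"

    def recycle(self, now):
        """Free the state of flows whose score has fully aged out."""
        stale = [f for f, (s, t) in self.scores.items()
                 if s - self.aging_rate * (now - t) <= 0.0]
        for f in stale:
            del self.scores[f]
```

With these assumed defaults, a flow sending one packet every 10 ms never accumulates a score, while a flow sending ten packets back to back is redirected from its sixth packet onwards, and only if queue delay is above the threshold.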

As section 8.2 says, it might not be necessary to enforce 'Latency Friendliness' (much as TCP Friendliness could be enforced but it usually isn't). Applications will harm themselves as much as anyone else if they misbehave. And for many deployment scenarios, the DualQ AQM will be operating on a queue dedicated solely to all the flows for one customer, but isolated from other customers. Nonetheless, we developed queue protection, because we knew that certain operators might still want to offer queue protection, due to concerns about accidental or malicious applications.
#2 Rate response on loss, etc.
There have been a range of responses to loss on the Internet since the early days. L4S doesn't change any of that, so we expect the Internet to continue to muddle along in this respect. Algorithms for networks to police flow rate response to loss were developed in the last decade but they are rarely needed or deployed. If they become necessary on the Internet generally that would apply to L4S fall-back too.

The same applies to fall-back on classic ECN, eliminating RTT bias and rate response at very low RTT.
#3 Loss detection in units of time
If a flow doesn't comply with this one, its level of spurious retransmissions will rise. This is likely to harm itself more than anyone else, so I doubt compliance would need to be enforced.
#4 New text?
We intend to flesh out the sections in the architecture about queue delay protection in the next couple of months.

Also, I could write-up what I've just said under #2 & #3 as more subsections of the security considerations.

I could also add a cross-ref from this section in ecn-l4s-id. But too many x-refs can reduce readability.

Thoughts?
[NK] Adding what you mention under #2 and #3 would be interesting - to show that cases where the sender does not comply with the prerequisite are covered.

Also, another network component on the path could exploit these semantics for other purposes.

Is there a way to mention what should not be done with these new semantics?
Should one not use them for purposes other than the one proposed in the whole L4S framework?
[BB] Can you give an example of what you're thinking of?

I prefer to keep requirements to interoperability. Often, if something turns out to be used differently to how the designers intended, it can be a good thing. But other times, it might depend on your perspective. Whatever, if something is useful in a different way to that intended by the designers, I don't think anyone is going to refrain from using it in that way just cos an RFC says so.

For instance, if the 3GPP had said, "Users MUST NOT use the control channel for sending short inane messages to their mates," would that have stopped SMS? Or, what if the IETF said "Network operators MUST NOT use TCP seq no's and ack no's to measure the user's round trip time in order to monitor service quality."
[NK] I get your point. The rationale of the draft is more about how to make L4S happen and interoperate than about detailing how this signaling could be exploited for other purposes (which is the case for any other marking).

o it SHOULD not lead to some packets of a transport-layer flow being
served by a different queue from others.
I do not understand this point. You may want to make it clearer.
[BB] OK, how about:

o it SHOULD be consistent on all the packets of a transport layer flow, so that some packets of a flow are not served by a different queue to others
[NK] That is much clearer, thanks.

The Diffserv architecture provides Expedited Forwarding [RFC3246<https://tools.ietf.org/html/rfc3246>], so
that low latency traffic can jump the queue of other traffic.
However, on access links dedicated to individual sites (homes, small
enterprises or mobile devices), often all traffic at any one time
will be latency-sensitive. Then Diffserv is of little use. Instead,
we need to remove the causes of any unnecessary delay.

IMHO some more info should be provided here. AFAIK, the Diffserv architecture does not provide hints on how to schedule packets afterwards.
Hierarchical queuing would be necessary when control, management and data traffic share the same (let's say) wireless link.
To make it clearer, I would suggest pointing to the draft-briscoe-tsvwg-l4s-diffserv-02 draft at this stage, or removing that statement.
[BB] Perhaps the problem is that we haven't made the link with the start of the previous para clear. That says increasingly all of a user's applications at any one time need low delay. Then this 2nd para says, when everything at the same time wants low latency there's a limit to what you can do by overtaking some traffic with others, so you have to tackle the root cause of delay, which is what L4S addresses.

Is this perhaps why you didn't grok this para?
We can cross-ref to the l4s-diffserv draft, but I'd rather make the point clear, if it's not.
[NK] This is clearer to me now. I would suggest making it more "straight to the point" and more generic: "On access links dedicated to individual sites (homes, small enterprises or mobile devices), often all traffic at any one time will be latency-sensitive. Traditional rate-based or priority-based schedulers would not guarantee low latency for all."

The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST
be roughly proportional to the square of the likelihood that it would
have marked it if it had been an L4S packet (p_L). That is

If you want this part to be self-contained, I would suggest explaining the rationale behind this. Also, I am not sure this is necessary in this draft.
[BB] OK. As suggested above, I would rather keep ecn-l4s-id and aqm-dualq-coupled as just spec's, and point to the L4S architecture draft for tutorial. Currently there's nothing about squares or square roots in the arch draft, but I have promised to add it.

So I have added a {ToDo} note at this point in ecn-l4s-id to cross-ref to the architecture for an explanation of why the square is important.
[NK] Ok !
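
One way to see why the square matters: in the usual simplified steady-state models, the rate of a Classic (Reno-like) flow varies with 1/sqrt(p_C), while the rate of a scalable (DCTCP-like) flow varies with 1/p_L. If p_C is proportional to the square of p_L, the ratio between the two rates no longer depends on the congestion level. The Python sketch below checks this numerically. The constants 1.22 and 2 are textbook approximations, and the coupling constant k and all function names are assumptions made for illustration, not values taken from this thread.

```python
import math


def classic_rate(p_c, rtt):
    """Reno-like steady-state rate in packets/s (simplified model)."""
    return 1.22 / (rtt * math.sqrt(p_c))


def scalable_rate(p_l, rtt):
    """DCTCP-like steady-state rate in packets/s (simplified model)."""
    return 2.0 / (rtt * p_l)


def coupled_drop_prob(p_l, k=2.0):
    """Classic drop probability proportional to the square of p_L.

    k is a hypothetical constant of proportionality.
    """
    return (p_l / k) ** 2


def rate_ratio(p_l, rtt=0.02, k=2.0):
    """Scalable-to-Classic rate ratio at a given L4S marking probability."""
    return scalable_rate(p_l, rtt) / classic_rate(coupled_drop_prob(p_l, k), rtt)


if __name__ == "__main__":
    for p_l in (0.01, 0.05, 0.2):
        print(f"p_L={p_l:<5} p_C={coupled_drop_prob(p_l):.6f} "
              f"ratio={rate_ratio(p_l):.3f}")
```

The printed ratio is the same for every p_L, which is the property the square relationship is meant to provide. Without the square (p_C simply proportional to p_L), the ratio would vary with load.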

Indeed, it raises questions about the link with the Diffserv architecture and where the AQM would be deployed in it.
[BB] I don't see anything in this para that raises anything about the link with the Diffserv arch. Can you explain what's on your mind?
[NK] I guess that, while reading, and since the document was referring to the Diffserv arch, I was wondering about the link between AQM and Diffserv. As proposed earlier, referring to Diffserv as a "traditional classifier and scheduler" would remove the ambiguity.

flow. Such a switch-over is likely to be very rare, but It could be
Typo
[BB] OK

*******
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S)
draft-ietf-tsvwg-aqm-dualq-coupled-08
Review of https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-08
*******

This document specifies a `DualQ Coupled AQM' extension that solves
the problem of coexistence between scalable and classic flows,
without having to inspect flow identifiers. The AQM is not like
flow-queuing approaches [RFC8290<https://tools.ietf.org/html/rfc8290>] that classify packets by flow
identifier into numerous separate queues in order to isolate sparse
flows from the higher latency in the queues assigned to heavier
flows. In contrast, the AQM exploits the behaviour of scalable
congestion controls like DCTCP so that every packet in every flow
sharing the queue for DCTCP-like traffic can be served with very low
latency.

I would suggest first defining what DualQ is before saying what it is not and comparing it with flow-queue schemes.
[BB] Your suggested approach is one I have adopted more recently, so your point is well made. To adopt your approach we will have to completely re-write this introduction - so I'll make sure the co-authors agree first.

IMO, we don't need to have the comparison with alternatives in each L4S draft. The L4S Architecture draft has the following structure, which is the only place where comparison with alternatives needs to be:

5. Rationale <https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-5>
5.1. Why These Primary Components? <https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-5.1>
5.2. Why Not Alternative Approaches? <https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-5.2>

I do think every draft needs a statement of the problem it addresses that is not just described by cross-reference. And some of a problem statement is always about what other solutions can't do. But you're right that it doesn't have to dominate the start of the problem statement.
[NK] This discussion is at the heart of the issue with L4S: it can get confusing where to start reading to understand how the different pieces are tied to each other. I agree that having a self-contained problem statement for each draft is important. I guess the point on the L4S architecture draft covers that issue.

The following parameters MAY be operator-configurable, e.g. to tune
for non-Internet settings:

To be generic, you may want to make tunable both the frequency at which the AQM parameters are updated and the delay target.
Expressing them through the notions of maximum RTT and typical RTT is confusing.

[BB] I will try to explain these here, rather than remove them. Then if you see what I'm getting at, we can try to work out wording for the draft:

For Classic TCP and Classic AQMs:

* The target delay that a Classic AQM on a low-stat-mux link uses is set based on the typical RTT of most traffic passing through it. That's cos the amplitude of the sawteeth of a single Classic TCP flow is roughly one RTT. So, when there's only one TCP in the AQM, it won't underutilize the link in typical cases, if the target delay is set to a typical RTT. A single TCP flow with a longer RTT than the target will under-utilize the link, but nowadays it is considered acceptable to only get full bandwidth utilization for typical flows - it's considered unnecessary to set target to the maximum RTT (e.g. 200ms) which would certainly achieve 100% bandwidth utilization for flows of all RTTs, but it would cause worst-case queuing delay all the time.
* An AQM needs to know the max RTT if it is going to automatically derive its stability parameter. For instance:
  * the stability analysis of PI algorithms determines the max update interval that will not lead to oscillation, given the max RTT.
  * the stability of RED requires the characteristic smoothing time of the EWMA of the queue to be of the order of the max RTT flow that will use it.
Usually, these parameters are set by a human who has derived them from their assumptions about typical RTT and max RTT respectively. But a more abstract API could (and ought to) take in these two RTT assumptions and derive the specific parameters from them (even if it were just via a look-up table).
[NK] I totally agree with your point. That is what we tried to reflect in RFC 7928 regarding the "tunability" of the AQM parameters. An "easy to deploy" AQM would indeed need some guidelines to be adapted to the specific context (e.g. our 500ms+ RTT in SATCOM), even though RFC 7567 says that specific tuning should be avoided.
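
As a rough illustration of the "more abstract API" idea discussed above, the Python sketch below takes only the two RTT assumptions and derives the concrete AQM parameters from them. Every name in it is hypothetical. The mappings follow the explanation above only qualitatively: target delay set from the typical RTT, PI update interval bounded by the max RTT, and RED smoothing time of the order of the max RTT. The `update_fraction` default is a placeholder, not the result of any stability analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AqmParams:
    target_delay: float        # seconds, from the typical RTT
    pi_update_interval: float  # seconds, bounded by the max RTT
    red_smoothing_time: float  # seconds, of the order of the max RTT


def derive_aqm_params(typical_rtt, max_rtt, update_fraction=0.25):
    """Derive AQM parameters from two RTT assumptions (illustrative sketch).

    update_fraction is a hypothetical placeholder for whatever a real
    stability analysis (or a look-up table) would give.
    """
    if not 0 < typical_rtt <= max_rtt:
        raise ValueError("need 0 < typical_rtt <= max_rtt")
    if not 0 < update_fraction <= 1:
        raise ValueError("need 0 < update_fraction <= 1")
    return AqmParams(
        target_delay=typical_rtt,
        pi_update_interval=update_fraction * max_rtt,
        red_smoothing_time=max_rtt,
    )


if __name__ == "__main__":
    # Terrestrial-style assumptions versus a long-RTT (e.g. satellite) context.
    print(derive_aqm_params(typical_rtt=0.02, max_rtt=0.2))
    print(derive_aqm_params(typical_rtt=0.5, max_rtt=0.6))
```

The point of such an interface is that an operator in an unusual environment only restates the RTT assumptions, rather than hand-tuning each algorithm-specific knob.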

Is this clearer?

Bob

________________________________________________________________

Bob Briscoe http://bobbriscoe.net/
