Re: [tsvwg] L4S Review

Bob Briscoe <research@bobbriscoe.net> Sat, 22 December 2018 00:04 UTC

To: Kuhn Nicolas <Nicolas.Kuhn@cnes.fr>, tsvwg IETF list <tsvwg@ietf.org>
References: <F3B0A07CFD358240926B78A680E166FF1C145042@TW-MBX-P02.cnesnet.ad.cnes.fr>
From: Bob Briscoe <research@bobbriscoe.net>
Message-ID: <9508306c-6832-4e6e-143c-07e1c2c75887@bobbriscoe.net>
Date: Sat, 22 Dec 2018 00:04:15 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/YBTyEPHS03GzN8wROShQzcpegf0>
Subject: Re: [tsvwg] L4S Review

Nicolas,

Thank you very much for these reviews. As always, I've added your name 
to the ACKs section and given responses inline, tagged [BB]...

And many apologies for my complete silence until now. I tuned out of 
online life for a while a little before you posted your review emails, 
and I've only just worked my way back to mails from that time.

On 27/11/2018 11:18, Kuhn Nicolas wrote:
>
> All,
>
> This is a quick review on two L4S related drafts 
> (draft-ietf-tsvwg-ecn-l4s-id-05 and 
> draft-ietf-tsvwg-aqm-dualq-coupled-08).
>
> draft-ietf-tsvwg-ecn-l4s-id-05 proposes modification to ECN semantics 
> to introduce a new L4S network service
>
> draft-ietf-tsvwg-aqm-dualq-coupled-08 describes DualQueue
>
> In general, the draft says either too much or too little on the 
> general picture and it is somehow confusing to see the interactions 
> between all the contributions.
>
> I would propose you to have a replicated section on all drafts 
> explaining their interaction (or a pointer to a web page, another 
> document, etc.).
>
[BB] Yes, Mikael Abrahamsson made a similar comment: that nowhere do we say 
how the system works as a whole. I suggested that we add an explanation to 
the L4S architecture draft. I would rather refer to that from the other 
drafts, which would otherwise remain as specs, than repeat the tutorial 
material in each one. But I'd need to check the co-authors agree.

Would that work for you?

> Some detailed comments:
>
> *******
>
> Identifying Modified Explicit Congestion Notification (ECN) Semantics
>
>                    for Ultra-Low Queuing Delay (L4S)
>
> draft-ietf-tsvwg-ecn-l4s-id-05
>
> Review of https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-05
>
> *******
>
> What happens if the codepoints are used by the sender but the sender 
> does not comply with the prerequisite detailed in section 4.3?
>
[BB] That depends on which prerequisite. Let's go through them:


      #1 Queuing Delay

The main concern is to respond to ECN promptly to keep queue delay 
extremely low. Handling non-compliance on latency is discussed in the 
Security Considerations sections of the L4S architecture draft, 
particularly:
8.1. Traffic (Non-)Policing 
<https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-8.1>
8.2. 'Latency Friendliness' 
<https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-8.2>

We (at CableLabs) have developed and implemented a low complexity queue 
protection algorithm that can optionally sit in front of the L4S queue. 
In keeping with the philosophy of L4S, we have crafted the algorithm to 
solely protect the queue from flows causing delay, independently of how 
much bandwidth they use.

Essentially, it maintains a per-flow queuing score which rapidly ages 
out for regular flows - so that flow state storage can be recycled 
around the packets of live flows. Any flows building up a queuing score 
faster than it ages out will be candidates for sanctions if queue delay 
exceeds a configurable threshold (e.g. 2ms). The default sanction is to 
redirect packets to the Classic queue, thus protecting the latency of 
any other traffic in the 'L' queue.
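
To make the idea concrete, here is a minimal sketch in Python. It is not the CableLabs implementation: the class name, the way the score accumulates (packet size scaled by how close queue delay is to the threshold), the aging rate and the score limit are all hypothetical values chosen for illustration. Only the 2ms threshold and the redirect-to-Classic sanction come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class QueueProtector:
    """Illustrative per-flow queuing score; all constants are assumptions."""
    delay_threshold_s: float = 0.002   # e.g. 2 ms, as in the text above
    aging_rate: float = 100_000.0      # hypothetical: score forgiven per second
    score_limit: float = 3_000.0       # hypothetical: sanction candidates exceed this
    scores: dict = field(default_factory=dict)  # flow_id -> (score, last_seen_s)

    def classify(self, flow_id, now_s, queue_delay_s, pkt_bytes):
        """Return 'l4s' or 'classic' for one arriving packet."""
        score, last_seen = self.scores.get(flow_id, (0.0, now_s))
        # Age the score out for the time elapsed since the flow's last packet.
        score = max(0.0, score - self.aging_rate * (now_s - last_seen))
        # Hypothetical accumulation: bytes weighted by the current queue delay.
        score += pkt_bytes * min(1.0, queue_delay_s / self.delay_threshold_s)
        if score == 0.0:
            # Fully aged out: the flow state can be recycled.
            self.scores.pop(flow_id, None)
        else:
            self.scores[flow_id] = (score, now_s)
        # Sanction only when the queue is actually suffering delay.
        if queue_delay_s > self.delay_threshold_s and score > self.score_limit:
            return "classic"
        return "l4s"
```

With these made-up constants, a flow sending one 1500-byte packet every 100ms never accumulates a score, whereas a flow sending one every millisecond into a 3ms queue is redirected on its third packet.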

As section 8.2 says, it might not be necessary to enforce 'Latency 
Friendliness' (much as TCP Friendliness could be enforced but it usually 
isn't). Applications will harm themselves as much as anyone else if they 
misbehave. And for many deployment scenarios, the DualQ AQM will be 
operating on a queue dedicated solely to all the flows for one customer, 
but isolated from other customers. Nonetheless, we developed queue 
protection, because we knew that certain operators might still want to 
offer queue protection, due to concerns about accidental or malicious 
applications.


      #2 Rate response on loss, etc.

There have been a range of responses to loss on the Internet since the 
early days. L4S doesn't change any of that, so we expect the Internet to 
continue to muddle along in this respect. Algorithms for networks to 
police flow rate response to loss were developed in the last decade but 
they are rarely needed or deployed. If they become necessary on the 
Internet generally that would apply to L4S fall-back too.

The same applies to fall-back on Classic ECN, to eliminating RTT bias, and 
to rate response at very low RTT.


      #3 Loss detection in units of time

If a flow doesn't comply with this one, its level of spurious 
retransmissions will rise. This is likely to harm itself more than 
anyone else, so I doubt compliance would need to be enforced.


      #4 New text?

We intend to flesh out the sections in the architecture about queue 
delay protection in the next couple of months.

Also, I could write up what I've just said under #2 & #3 as further 
subsections of the security considerations.

I could also add a cross-reference from this section in ecn-l4s-id. But too 
many cross-references can reduce readability.

Thoughts?


> Also, another network component on the path could exploit this 
> semantic for other purpose.
>
> Is there a way to mention what should not be done with this new 
> semantics ?
>
> One should not use it for other purposes that the one proposed in the 
> whole L4S framework ?
>
[BB] Can you give an example of what you're thinking of?

I prefer to keep requirements to interoperability. Often, if something 
turns out to be used differently to how the designers intended, it can 
be a good thing. But other times, it might depend on your perspective. 
Either way, if something is useful in a different way to that intended by 
the designers, I don't think anyone is going to refrain from using it in 
that way just because an RFC says so.

For instance, if the 3GPP had said, "Users MUST NOT use the control 
channel for sending short inane messages to their mates," would that 
have stopped SMS? Or, what if the IETF had said, "Network operators MUST NOT 
use TCP seq no's and ack no's to measure the user's round trip time in 
order to monitor service quality"?


>    o  it SHOULD not lead to some packets of a transport-layer flow being
>       served by a different queue from others.
>
> I do not understand this point. You may want to make it clearer.
>
[BB] OK, how about:

   o it SHOULD be consistent on all the packets of a transport layer 
flow, so that some packets of a flow are not served by a different queue 
to others



>    The Diffserv architecture provides Expedited Forwarding [RFC3246  <https://tools.ietf.org/html/rfc3246>], so
>    that low latency traffic can jump the queue of other traffic.
>    However, on access links dedicated to individual sites (homes, small
>    enterprises or mobile devices), often all traffic at any one time
>    will be latency-sensitive.  Then Diffserv is of little use.  Instead,
>    we need to remove the causes of any unnecessary delay.
>
> IMHO some more info should be provided here. AFAIK, the Diffserv 
> architecture does not provide hints on how to schedule packets 
> afterwards.
>
> Hierarchical queuing would be necessary when control, management and 
> data traffic share the same (let’s say) wireless link.
>
> To make it clearer, I would propose you to point the 
> draft-briscoe-tsvwg-l4s-diffserv-02 draft at this stage or remove that 
> statement.
>
[BB] Perhaps the problem is that we haven't made the link with the start 
of the previous para clear. That says increasingly all of a user's 
applications at any one time need low delay. Then this 2nd para says, 
when everything at the same time wants low latency there's a limit to 
what you can do by overtaking some traffic with others, so you have to 
tackle the root cause of delay, which is what L4S addresses.

Is this perhaps why you didn't grok this para?
We can cross-ref to the l4s-diffserv draft, but I'd rather make the 
point clear, if it's not.

> The likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST
>    be roughly proportional to the square of the likelihood that it would
>    have marked it if it had been an L4S packet (p_L). That is
>
> If you want this part to be self-content, I would propose you to 
> explain the rationale behind this. Also, I am not sure this is 
> necessary in this draft.
>
[BB] OK. As suggested above, I would rather keep ecn-l4s-id and 
aqm-dualq-coupled as just specs, and point to the L4S architecture 
draft for tutorial material. Currently there's nothing about squares or 
square roots in the arch draft, but I have promised to add it.

So I have added a {ToDo} note at this point in ecn-l4s-id to cross-ref 
to the architecture for an explanation of why the square is important.
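
In the meantime, a rough numeric sketch of where the square comes from may help. It assumes the commonly quoted steady-state approximations: a Reno-like Classic flow holds a window of about 1.22/sqrt(p_C) packets, and a DCTCP-like scalable flow about 2/p_L. The function names and the choice k = 2/1.22 are illustrative only; a real deployment may pick a different coupling constant.

```python
import math

RENO_CONST = 1.22      # assumed: Reno-like window ~ 1.22 / sqrt(p)
SCALABLE_CONST = 2.0   # assumed: DCTCP-like window ~ 2 / p


def classic_window(p_c):
    """Approximate steady-state window (packets) of a Classic flow."""
    return RENO_CONST / math.sqrt(p_c)


def scalable_window(p_l):
    """Approximate steady-state window (packets) of a scalable flow."""
    return SCALABLE_CONST / p_l


def coupled_classic_prob(p_l, k):
    """Squared coupling: p_C = (p_L / k)^2."""
    return (p_l / k) ** 2


# Illustrative k that equalises the two windows under the assumptions above.
K_EQUAL_WINDOWS = SCALABLE_CONST / RENO_CONST

if __name__ == "__main__":
    for p_l in (0.01, 0.05, 0.2):
        p_c = coupled_classic_prob(p_l, K_EQUAL_WINDOWS)
        print(p_l, p_c, classic_window(p_c), scalable_window(p_l))
```

Because one response depends on 1/sqrt(p) and the other on 1/p, a linear coupling could only balance the two types of flow at a single operating point. Squaring p_L cancels the square root, so the windows stay matched at every load.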

> Indeed, it raises questions on the link with Diffserv architecture and 
> where AQM would be deployed on it.
>
[BB] I don't see anything in this para that raises anything about the 
link with the Diffserv arch. Can you explain what's on your mind?

> flow.  Such a switch-over is likely to be very rare, but It could be
>
> Typo
>
[BB] OK
>
> *******
>
> DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
>
> (L4S)
>
> draft-ietf-tsvwg-aqm-dualq-coupled-08
>
> Review of 
> https://tools.ietf.org/html/draft-ietf-tsvwg-aqm-dualq-coupled-08
>
> *******
>
>    This document specifies a `DualQ Coupled AQM' extension that solves
>    the problem of coexistence between scalable and classic flows,
>    without having to inspect flow identifiers.  The AQM is not like
>    flow-queuing approaches [RFC8290  <https://tools.ietf.org/html/rfc8290>] that classify packets by flow
>    identifier into numerous separate queues in order to isolate sparse
>    flows from the higher latency in the queues assigned to heavier
>    flows.  In contrast, the AQM exploits the behaviour of scalable
>    congestion controls like DCTCP so that every packet in every flow
>    sharing the queue for DCTCP-like traffic can be served with very low
>    latency.
>
> I would propose you to first define what DualQ is before saying what 
> is is not and comparing it with flow queue schemes.
>
[BB] Your suggested approach is one I have adopted more recently, so 
your point is well made. To adopt your approach we will have to 
completely re-write this introduction - so I'll make sure the co-authors 
agree first.

IMO, we don't need a comparison with alternatives in each L4S draft. The 
L4S Architecture draft is the only place that needs one, and it already 
has the following structure:

    5.  Rationale <https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-03#section-5>
      5.1.  Why These Primary Components?
      5.2.  Why Not Alternative Approaches?


I do think every draft needs a statement of the problem it addresses 
that is not just described by cross-reference. And some of a problem 
statement is always about what other solutions can't do. But you're 
right that it doesn't have to dominate the start of the problem statement.

> The following
>    parameters MAY be operator-configurable, e.g. to tune for non-
> Internet settings:
>
> To be generic, you may want to tune both the frequency at which the 
> AQM parameters are adapted and the delay target.
>
> It is confusing with the notion of maximum RTT and typical RTT.
>

[BB] I will try to explain these here, rather than remove them. Then if 
you see what I'm getting at, we can try to work out wording for the draft:

For Classic TCP and Classic AQMs:

  * The target delay of a Classic AQM on a link with low statistical
    multiplexing is set based on the typical RTT of most traffic
    passing through it. That's because the amplitude of the sawteeth of
    a single Classic TCP flow is roughly one RTT. So, when there's only
    one TCP flow in the AQM, it won't under-utilize the link in typical
    cases if the target delay is set to a typical RTT. A single TCP
    flow with a longer RTT than the target will under-utilize the link,
    but nowadays it is considered acceptable to get full bandwidth
    utilization only for typical flows. It is considered unnecessary to
    set the target to the maximum RTT (e.g. 200ms): that would
    certainly achieve 100% bandwidth utilization for flows of all RTTs,
    but it would cause worst-case queuing delay all the time.
  * An AQM needs to know the max RTT if it is going to automatically
    derive its stability parameter. For instance:
      o the stability analysis of PI algorithms determines the max
        update interval that will not lead to oscillation, given the max
        RTT.
      o the stability of RED requires the characteristic smoothing time
        of the EWMA of the queue to be of the order of the max RTT flow
        that will use it.
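
The utilization argument in the first bullet can be checked with a very idealized model, sketched below in Python. The assumptions are mine: a single Reno-like flow that halves its window on each congestion signal, grows it by one step per round trip, and an AQM that signals exactly when queue delay reaches the target. Real AQMs signal probabilistically around the target, so treat the numbers as indicative only.

```python
def reno_utilization(target_delay_s, base_rtt_s, steps=1000):
    """Link utilization of one idealized Reno sawtooth.

    Windows are in units of the bandwidth-delay product (BDP), so a window
    of 1.0 exactly fills the link with no standing queue.
    """
    if base_rtt_s <= 0 or target_delay_s < 0:
        raise ValueError("base_rtt_s must be > 0 and target_delay_s >= 0")
    w_max = 1.0 + target_delay_s / base_rtt_s  # window when the AQM signals
    w_min = w_max / 2.0                        # window just after halving
    sent = 0.0      # data delivered, in BDPs
    elapsed = 0.0   # time taken, in base RTTs
    for i in range(steps + 1):
        w = w_min + (w_max - w_min) * i / steps
        sent += w
        # Below 1 BDP a round lasts one base RTT; above it, the queue
        # stretches the round to w base RTTs while the link stays busy.
        elapsed += max(1.0, w)
    return sent / elapsed
```

In this model a target equal to the flow's RTT gives full utilization, a target of zero gives the familiar 75%, and a 15ms target with a 200ms RTT falls somewhere in between.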

Usually, these parameters are set by a human who has derived them from 
their assumptions about typical RTT and max RTT respectively. But a more 
abstract API could (and ought to) take in these two RTT assumptions and 
derive the specific parameters from them (even if it were just via a 
look-up table).
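
As a minimal sketch of what such an API might look like, assuming Python and entirely hypothetical names: the function below takes the two RTT assumptions and derives the concrete knobs. The mapping of target delay to typical RTT and of the smoothing time to max RTT follows the bullets above; the divisor used for the update interval is a placeholder, not a value from any stability analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AqmParams:
    target_delay_s: float        # Classic queue delay target
    update_interval_s: float     # e.g. sampling interval of a PI controller
    ewma_time_constant_s: float  # e.g. queue-averaging time for RED


def derive_aqm_params(typical_rtt_s, max_rtt_s, update_divisor=3.0):
    """Derive AQM settings from two RTT assumptions.

    update_divisor is a hypothetical safety factor; a real implementation
    would take it from the stability analysis of its own algorithm.
    """
    if not 0 < typical_rtt_s <= max_rtt_s:
        raise ValueError("need 0 < typical_rtt_s <= max_rtt_s")
    return AqmParams(
        target_delay_s=typical_rtt_s,
        update_interval_s=max_rtt_s / update_divisor,
        ewma_time_constant_s=max_rtt_s,
    )
```

An operator tuning for a non-Internet setting would then only change the two RTT inputs, e.g. `derive_aqm_params(0.015, 0.1)`, rather than each derived parameter by hand.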

Is this clearer?


Bob



-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/