Re: [tsvwg] [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104

Bob Briscoe <ietf@bobbriscoe.net> Mon, 25 March 2019 02:47 UTC

To: "Bless, Roland (TM)" <roland.bless@kit.edu>, Jonathan Morton <chromatix99@gmail.com>
Cc: tsvwg IETF list <tsvwg@ietf.org>, bloat <bloat@lists.bufferbloat.net>
References: <d91a6a71-5898-9571-2a02-0d9d83839615@bobbriscoe.net> <1E80578D-A589-4CA0-9015-B03B63042355@gmx.de> <CAA93jw7jvjbZkEgO8xc03uCayo+o-uENxxAkzQOaz_EZSLhocw@mail.gmail.com> <27FA673A-2C4C-4652-943F-33FAA1CF1E83@gmx.de> <1552669283.555112988@apps.rackspace.com> <alpine.DEB.2.20.1903151915320.3161@uplift.swm.pp.se> <7029DA80-8B83-4775-8261-A4ADD2CF34C7@akamai.com> <CAHxHggfPCqf9biCDmHMqA38=4y6gY6pFtRVMjMrrzYfLyRBf-g@mail.gmail.com> <1552846034.909628287@apps.rackspace.com> <5458c216-07b9-5b06-a381-326de49b53e0@bobbriscoe.net> <AC14ACBB-A7CC-40E0-882C-2519D05ADC05@akamai.com> <7e49b551-22e5-5d54-2a1c-69f53983d7e5@bobbriscoe.net> <04E62EA7-82EF-4F1B-A86D-5A23CA3B190A@gmail.com> <f331e710-ed2c-8628-4c82-f162d9cc8763@bobbriscoe.net> <C4BED95B-A169-473E-B857-C26BC2AFBE54@gmail.com> <10ac0f0f-1635-a0b3-6150-2ff3d63be788@bobbriscoe.net> <3d6b1619-ce5d-5649-0436-72bb10115e45@kit.edu> <b0320cc2-28ec-63e8-c2b8-594616b63cfe@bobbriscoe.net> <86d2b757-b9f7-391f-ea3e-5d7bab4711e8@kit.edu>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <87fad517-475e-4c4c-b629-326068fbf423@bobbriscoe.net>
Date: Mon, 25 Mar 2019 03:47:41 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1
MIME-Version: 1.0
In-Reply-To: <86d2b757-b9f7-391f-ea3e-5d7bab4711e8@kit.edu>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/Ejy0AYWqz-bUw0Z60Ujmg4nBwt8>
Subject: Re: [tsvwg] [Bloat] [Ecn-sane] [iccrg] Fwd: [tcpPrague] Implementation and experimentation of TCP Prague/L4S hackaton at IETF104
List-Id: Transport Area Working Group <tsvwg.ietf.org>

Roland,

On 22/03/2019 13:53, Bless, Roland (TM) wrote:
> Hi Bob,
>
> see inline.
>
> Am 21.03.19 um 14:24 schrieb Bob Briscoe:
>> On 21/03/2019 08:49, Bless, Roland (TM) wrote:
>>> Hi,
>>>
>>> Am 21.03.19 um 09:02 schrieb Bob Briscoe:
>>>> Just to rapidly reply,
>>>>
>>>>
>>>> On 21/03/2019 07:46, Jonathan Morton wrote:
>>>>> The ECN field was never intended to be used as a classifier, except to
>>>>> distinguish Not-ECT flows from ECT flows (which a middlebox does need
>>>>> to know, to choose between mark and drop behaviours).  It was intended
>>>>> to be used to convey congestion information from the network to the
>>>>> receiver.  SCE adheres to that ideal.
>>>> Each PHB has a forwarding behaviour, a DSCP re-marking behaviour and an
>>>> ECN marking behaviour. The ECN field is the classifier for the ECN
>>>> marking behaviour.
>>> That's exactly the reason why using ECT(1) as a classifier for L4S
>>> behavior is not the right choice. L4S should use a DSCP for
>>> classification, because it is actually defining a PHB.
>> 1/ First Terminology
>> The definition of 'PHB' includes the drop or ECN-marking behaviour. For
>> instance, you see this in WRED or in PCN (Pre-Congestion Notification).
>> If you want to solely talk about scheduling, pls say the scheduling PHB.
> I thought that I was well versed in Diffserv terminology, but I'm not
> aware that a Diffserv PHB requires the definition of an ECN marking
> behavior.
Ah well, I do not think that any living human has visited all the dark 
corners of the Diffserv world.

I didn't mean (or say) that a Diffserv PHB /requires/ an ECN marking 
behaviour, just that if you are talking about a Diffserv PHB that 
includes an ECN marking behaviour, it helps to say when you are solely 
talking about the scheduling part of the PHB.

In many cases when there is no ECN marking behaviour, it makes no 
difference if you omit the word scheduling, cos that is all there is to 
the behaviour.

> In fact ECN is orthogonal to Diffserv as both RFCs 2474 and
> 2475 do not even mention ECN. RFC 2475:
> "A per-hop behavior (PHB) is a description of the externally
> observable forwarding behavior of a DS node applied to a particular
> DS behavior aggregate." and "Useful behavioral distinctions
> are mainly observed when multiple behavior aggregates compete for
> buffer and bandwidth resources on a node."
Even the original experimental ECN spec RFC2481 was published just after 
2474 & 2475. So you wouldn't expect the original Diffserv specs to 
mention something that didn't exist then.

>
> Usually, there are different mechanisms how to implement a PHB,
> e.g., for EF one could use a tail drop queue and Simple Priority
> Queueing, Weighted Fair Queueing, or Deficit Round Robin and so
> on. Consequently, queueing and scheduling behavior are used to
> _implement_ a PHB, i.e., IMHO it makes sense to distinguish between
> the PHB as externally observable behavior and a specific _PHB
> implementation_ as also pointed out in RFC2475:
>     PHBs are implemented in nodes by means of some buffer management and
>     packet scheduling mechanisms.  PHBs are defined in terms of behavior
>     characteristics relevant to service provisioning policies, and not in
>     terms of particular implementation mechanisms.
>
>
> So some of the Diffserv PHBs do _not_ require using an AQM,
> which is often the basis for ECN marking, e.g., for EF
> tail drop should be sufficient. For other PHBs it may be
> useful to say something about ECN usage (as I did for LE).
>
> RFC 2475:
>
>     PHBs may be specified in terms of their resource (e.g., buffer,
>     bandwidth) priority relative to other PHBs, or in terms of their
>     relative observable traffic characteristics (e.g., delay, loss).
Since RFC2474 & 2475, AQM behaviour and/or ECN marking behaviour has 
become part of some Diffserv PHBs. E.g. WRED in AF. See any of the 
tables in RFC4594 that have an AQM column.

ECN marking behaviour (rather than just AQM behaviour in general)
became a necessary part of a PHB during the definition of PCN.
Jo Babiarz and Kwok Ho Chan were both authors of RFC4594 and of many of
the PCN specs, and proposed the term 'marking behaviour' as part of the
PHB. You will find that ECN marking behaviours are central to, for
instance, RFC5670.

As I've pointed out already, the transitions used by SCE were already in
the PCN baseline encoding spec [RFC6660], except that they were only
defined if accompanied by a specific DSCP (which was subsequently
standardized as EF-ADMIT).


>
> I think that L4S therefore specifies such a PHB as it is defined
> in relation to the default PHB (as in the L4S arch draft
> "Classic service").
No. The L4S use of ECN is orthogonal to Diffserv, and can be associated 
with more than one scheduling PHB in a queuing hierarchy. See 
draft-briscoe-l4s-diffserv (which I would welcome you to review - Brian 
Carpenter is also currently reviewing it).

However, you are certainly right in thinking that L4S associated with 
the default PHB is by far and away the most important use-case for L4S. 
All the other possible schemes in l4s-diffserv are only possibilities - 
probably for corporate networks where proliferation of Diffserv models 
is currently most prevalent.
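
To illustrate the orthogonality, here is a minimal sketch in which the DSCP selects the scheduling part of the behaviour and the ECN field independently selects the marking part. The function names and the DSCP-to-class mapping are hypothetical, and treating CE packets as L4S is an assumption taken from the L4S identifier draft rather than from anything said in this thread.

```python
# ECN codepoints (two-bit field in the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

EF_DSCP = 46  # Expedited Forwarding


def scheduling_class(dscp):
    """Hypothetical two-class hierarchy: EF gets priority, the rest is default."""
    return "priority" if dscp == EF_DSCP else "default"


def marking_behaviour(ecn):
    """ECN field as classifier for the marking behaviour.

    Assumption: ECT(1) and CE select the L4S AQM; Not-ECT and ECT(0)
    select the Classic AQM (which drops rather than marks Not-ECT).
    """
    return "L4S" if ecn in (ECT1, CE) else "Classic"


def classify(dscp, ecn):
    """The two choices are made independently of each other."""
    return scheduling_class(dscp), marking_behaviour(ecn)
```

So classify(0, ECT1) and classify(46, ECT1) differ only in the scheduling part, while classify(46, ECT0) and classify(46, ECT1) differ only in the marking part.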

>
>> 2/ The architectural intent of the ECN field
>>
>> For many years (long before we thought of L4S) I have been making sure
>> that ECN propagation through the layers supports the duality of ECN
>> behaviours as both a classifier (on the way down from L7/L4 to L3/2) and
>> as a return value (on the way back up).
>>
>> The architecture of ECN is determined by the valid codepoint
>> transitions. They are:
> I wouldn't say that it's determined solely by the transitions.
Correct. I didn't say that either.
>
>> 1. 00->11
>> 2. 10->11
>> 3. 01->11
>> 4. 10->01
>>
>> The first three were in RFC3168, but it did not preclude the fourth.
>> The fourth was first standardized in RFC6660 (which I co-authored). This
>> had to be isolated from the e2e use of ECN by inclusion of a DSCP as well.
>>
>> The relatively late addition of the fourth approach means that an
>> attempt to mark using the SCE approach (10->01) is more likely to find
>> that it gets reversed when the outer header is decapsulated, if the
>> decapsulator hasn't been updated to the latest RFC that catered for this
>> fourth transition (RFC6040, also co-authored by me).
>>
>> L4S follows the original RFC3168 approach
>> SCE uses the fourth
>>
>> So, SCE proposes to use /a/ correct approach, but it might not work.
> In case of nodes that implement RFC6040? I think that it would
> be useful to measure how many boxes out there actually do this
[BB] To measure this, you would need to have a box between the tunnel 
endpoints to mark the outer, before you could check what the behaviour 
of the decap was.
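
As a rough sketch of why the fourth transition can get reversed at a tunnel egress, the following models the two decapsulation behaviours for an ECN-capable inner header. It is based on one reading of the decapsulation tables in RFC6040, so check the RFC before relying on it. The function names are made up for illustration, and the Not-ECT inner cases are deliberately left out.

```python
# ECN codepoints (two-bit field in the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def _require_ect_inner(inner):
    if inner == NOT_ECT:
        raise ValueError("Not-ECT inner header is out of scope for this sketch")


def decap_rfc3168(inner, outer):
    """Egress not updated to RFC6040: only an outer CE reaches the inner."""
    _require_ect_inner(inner)
    return CE if outer == CE else inner


def decap_rfc6040(inner, outer):
    """Egress updated to RFC6040: an outer ECT(1) also survives 10->01."""
    _require_ect_inner(inner)
    if CE in (inner, outer):
        return CE
    if inner == ECT0 and outer == ECT1:
        return ECT1  # the fourth transition is propagated
    return inner
```

With a box between the tunnel endpoints changing the outer from ECT(0) to ECT(1), decap_rfc3168(ECT0, ECT1) forwards ECT(0), so the mark is lost, whereas decap_rfc6040(ECT0, ECT1) forwards ECT(1). A CE mark on the outer survives in both cases.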

> (or how widespread is ECN usage actually, e.g., how many boxes
> actually set CE on congestion? MAMI results anyone?).
[BB] Brian Trammell re-ran ETHZ's ECN tests in Jan'19. Informally he told
me yesterday that he found about 13 CE marks (i.e. still hardly any).
But this might mean the links aren't loaded when he's looking. It's hard
to apply enough load while doing a large-scale measurement, as it takes
far too long per path.

Stuart Cheshire is helping me to try to get something meaningful out of
the data Apple is continually gathering. The data Padma presented at
MAPRG at IETF-98 was the percentage of Apple devices tested that saw at
least one CE mark in 12 hours.

The hard problem is, once we find something CE-marking, we're interested 
in knowing whether it's FQ (which protects flows from each other) or 
single queue (which doesn't). Greg White, Jake Holland & I devised a 
test for this between us. Tweak a congestion control so you have an 
aggressive one. Run it in parallel with another regular CC between the 
same two hosts. Then look for correlation of RTT movements. Like common 
bottleneck detection in RMCAT.
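
The last step of that test could be sketched roughly as below: take RTT samples from the two flows over the same intervals and compute their correlation. This is only an illustration, not the actual test tooling. The function names and the 0.8 threshold are hypothetical, and a real measurement would need far more samples and some filtering.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)


def looks_like_shared_queue(rtt_aggressive, rtt_regular, threshold=0.8):
    """Guess 'single queue' if the RTTs of the two flows move together.

    In a shared queue the aggressive flow's queue build-up also delays
    the regular flow; with per-flow queues it should not.
    """
    return pearson(rtt_aggressive, rtt_regular) >= threshold
```

If the regular flow's RTT tracks the aggressive flow's RTT closely, the bottleneck is probably a single queue. If it stays roughly flat while the aggressive flow's RTT swings, the flows are probably isolated from each other.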

>
>> Whereas L4S uses the original correct approach.
> Which might also not work...in case RFC3168 boxes set CE, so
> the L4S receivers/senders cannot know that the CE wasn't set
> by an L4S node.
Yes. That's a different point, but it's true.

>
>> 3a/ DualQ L4S AQMs
>> With the DualQ, the difference between the two queues is both in their
>> ECN marking behaviour and in their forwarding/scheduling behaviour.
>> However, whenever there's traffic in the classic queue the coupling
>> between the AQMs overrides the network scheduler. The coupling is solely
>> ECN behaviour not scheduling behaviour. So the primary difference
>> between the queues is in their ECN-marking behaviour.
>>
>> What do I mean by "the coupling overrides the network scheduler"? The
>> network scheduler certainly does give priority to L4S packets whenever
>> they arrive, but the coupling makes the L4S sources control how often
>> packets arrive. It's tough to reason about, because we haven't had a
>> mechanism like this before.
> Yes, the DualQ mechanism is actually nice, but what I particularly don't
> like is to fix the coupling into nodes in this way. If the congestion
> control behavior is different from your expectation it will not work
> properly (as already experienced with BBR). This would ossify congestion
> control evolution and I see this as a very big disadvantage of this
> approach.
The argument for using the square is to be compatible with the
worst-case classic traffic, which is Reno. The worst case will remain
ossified for many years yet. That doesn't mean that better CCs cannot
develop within Classic. And to a certain extent, they still have to
coexist with the worst-case Classic... we'll see what the latest update
on BBRv2 is in a few days' time.

But importantly, a square relationship between flow rates is not
enforced by the network; it's encouraged for end-systems that have a
default behaviour. But, if the network policy becomes wrong in future,
end-systems can correct it.
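
For readers who haven't seen the square, here is a minimal sketch of the coupling as described in the DualQ Coupled AQM draft: the Classic probability is the square of a base probability, and the coupled L4S probability is k times it. The coupling factor k=2 and the two steady-state rate formulae are textbook approximations assumed here for illustration. They are not taken from this thread, and the function names are made up.

```python
from math import sqrt


def coupled_probabilities(p_base, k=2.0):
    """Return (p_classic, p_l4s) derived from one base probability.

    Assumption: p_classic = p_base^2 and p_l4s = k * p_base (capped at 1),
    which is equivalent to p_classic = (p_l4s / k)^2 below the cap.
    """
    if not 0.0 <= p_base <= 1.0:
        raise ValueError("p_base must be within [0, 1]")
    return p_base ** 2, min(1.0, k * p_base)


def steady_state_rates(p_base, rtt_s, k=2.0):
    """Approximate packet rates (pkt/s) of a Reno flow and a scalable flow.

    Assumed models: Reno ~ 1.22 / (RTT * sqrt(p)), scalable ~ 2 / (RTT * p).
    """
    if p_base <= 0.0 or rtt_s <= 0.0:
        raise ValueError("p_base and rtt_s must be positive")
    p_classic, p_l4s = coupled_probabilities(p_base, k)
    reno = 1.22 / (rtt_s * sqrt(p_classic))
    scalable = 2.0 / (rtt_s * p_l4s)
    return reno, scalable
```

Under these assumptions, with p_base = 0.1 and a 20 ms RTT the two flows come out at roughly 610 and 500 packets per second. The network only applies the marking; it is the end-systems' default responses that turn it into roughly equal rates.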


>
>> 3b/ FQ L4S AQMs
>> If the AQM is implemented with per flow queues, the picture is clearer.
>> The only difference between the queues is in the ECN marking behaviour
>> of the different AQMs.
> This would at least avoid the baked-in coupling law problem...
Er, no. 1:1 is just as much a baked-in policy, and it really is baked-in 
when the network is enforcing it.

That's why we developed the DualQ - we wanted to avoid the network 
enforcing rigid fairness at every instant. It's much less ossified when 
end-systems keep to it voluntarily.



Bob



>
> Regards
>   Roland

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/