Re: [tsvwg] Update to Position Statement on ECT(1)

"alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk> Fri, 08 May 2020 22:59 UTC

Date: Fri, 08 May 2020 22:58:53 +0000
From: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
Reply-To: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
To: "tsvwg@ietf.org" <tsvwg@ietf.org>, "Holland, Jake" <jholland=40akamai.com@dmarc.ietf.org>
Message-ID: <1523713490.286122.1588978733887@mail.yahoo.com>
In-Reply-To: <BE44EAE9-5CFB-4F5D-85B8-05AFA516C151@akamai.com>
References: <BE44EAE9-5CFB-4F5D-85B8-05AFA516C151@akamai.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/jVA0rBzz-4TkZQA2D90GvIFn040>
Subject: Re: [tsvwg] Update to Position Statement on ECT(1)

Hi Jake, tsvwg,

If we're thinking of alternate encodings, it seems to me that there are three which are theoretically possible:

A) Jake's encoding: ECT(0) as 1/p, CE as RFC3168
Advantages and disadvantages as mentioned already.

B) As A), but with ECT(0) and CE switched.
This retains the classification properties of L4S (both 1/p marked and unmarked packets classify as low latency) but doesn't resolve the ambiguity of CE versus RFC3168, and therefore doesn't obviously provide a benefit.

C) ECT(1) as 1/p, CE as RFC3168

Without a classifier, this has the problems with tunnels etc that people have mentioned.

However it occurs to me that ECT(1) (and CE) could still be used as a classifier in the following way:
Have the Prague sender premark 50% of packets with CE, in an order known to the receiver (e.g., alternating).
The L4S AQM and the RFC3168 AQM would then only be able to mark different packets, so the receiver can determine with good confidence whether any CE-marked packet was 'originally' ECT(1), and if so, it must have been marked by an RFC3168 AQM. (At this point the sender should revert to RFC3168 and stop sending premarked CE packets.)
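
To make the premarking idea concrete, here is a rough sketch of how a receiver might classify arriving packets. It is only an illustration under assumptions that are mine and not part of any spec: the pattern is strictly alternating by sequence number (even packets sent as ECT(1), odd packets sent premarked CE), an L4S AQM signals by flipping a premarked CE back to ECT(1), and the names `sent_codepoint`, `classify` and `should_fall_back` are hypothetical.

```python
from enum import Enum


class Ecn(Enum):
    # ECN field codepoints as defined in RFC 3168
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def sent_codepoint(seq):
    """Assumed alternating pattern: even seq sent as ECT(1), odd seq premarked CE."""
    return Ecn.ECT1 if seq % 2 == 0 else Ecn.CE


def classify(seq, received):
    """Guess which kind of AQM (if any) changed this packet's ECN field."""
    sent = sent_codepoint(seq)
    if received is sent:
        return "unmarked"
    if sent is Ecn.ECT1 and received is Ecn.CE:
        # Only an RFC3168-style AQM moves ECT(1) to CE in this scheme.
        return "rfc3168_mark"
    if sent is Ecn.CE and received is Ecn.ECT1:
        # Assumption: the L4S AQM signals by moving premarked CE to ECT(1).
        return "l4s_mark"
    return "unexpected"


def should_fall_back(observations):
    """True once any packet shows an RFC3168 mark; observations are (seq, received) pairs."""
    return any(classify(seq, rx) == "rfc3168_mark" for seq, rx in observations)
```

This ignores loss, reordering and middleboxes that change packet boundaries, all of which would make it harder for the receiver to know what a given packet was sent as.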

This has the advantage over A) that the classification signal is not lost after the first bottleneck. However it has the same problem with tunnels, and I can see some additional issues:

Both the RFC3168 and L4S congestion signals would be degraded. For RFC3168 this degradation would last only until the RFC3168 AQM was detected, which under conditions of congestion should happen with a well-defined probability per RTT. It is not obvious how much a 50% loss of bandwidth for the 1/p signal would weaken the latency 

Middleboxes which rewrite TCP (e.g., by changing packet boundaries) would cause problems. Work would be required to check that they can be detected reliably.

Although the classification would be preserved across L4S AQMs, an L4S AQM followed by an RFC3168 AQM (or vice versa) could cause some congestion marks to be "undone". However this could only occur until the RFC3168 AQM was detected. It would not be caused by multiple AQMs of the same type.

CE has up till now been a fixed point (no transitions away from CE) so there may be other network elements which are surprised by transitions away from it.

Anyway, hope that is of interest.

(I will not be taking a position on the consensus call, as I do not currently work in networking and don't have time to do the necessary due diligence.)

Alex





On Friday, May 8, 2020, 8:51:26 PM GMT+1, Holland, Jake <jholland=40akamai.com@dmarc.ietf.org> wrote: 

Hi tsvwg,

As promised in my original position statement[1], if I substantially changed
my views on the ECT(1) question I would post an update.

It has come to my attention that a technical fix is possible for my safety
concerns with "ECT(1) as an input" by changing the signaling scheme.

Although I still stand by all my claims in the original position statement,
the existence of a safe signaling scheme that uses ECT(1) as a network input
has changed my conclusion on the input/output question.

New position:
- I slightly prefer ECT(1) as an input (with qualifiers, given below)

(Note that this position should not be taken as an endorsement of L4S's
safety as currently proposed.)

The new signaling scheme that drove my change of position is this:

- ECT(1) is set from sender after negotiating endpoint support
- On-path devices change ECT(1)->ECT(0) to signal low levels of congestion
- On-path devices continue to use CE, as in RFC3168, to signal high levels
  of congestion, resulting in a required multiplicative decrease response.
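
As a rough illustration of the three bullets above, the sender-side reading of the echoed ECN field could be sketched as follows. This is a toy under assumptions that are not part of the proposal: `md_factor`, `fine_step` and the function names are hypothetical, and a real scalable response would depend on the fraction of marked packets, which this sketch does not model.

```python
from enum import Enum


class Ecn(Enum):
    # ECN field codepoints as defined in RFC 3168
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def interpret(received):
    """Map the codepoint seen at the receiver to a congestion signal."""
    if received is Ecn.ECT1:
        return "no_congestion"    # packet arrived as sent
    if received is Ecn.ECT0:
        return "low_congestion"   # on-path ECT(1)->ECT(0), the fine-grained signal
    if received is Ecn.CE:
        return "high_congestion"  # RFC3168-style mark
    return "not_ecn_capable"


def next_cwnd(cwnd, received, md_factor=0.5, fine_step=1.0):
    """Toy window update; md_factor and fine_step are illustrative values only."""
    signal = interpret(received)
    if signal == "high_congestion":
        return max(1.0, cwnd * md_factor)  # multiplicative decrease on CE
    if signal == "low_congestion":
        return max(1.0, cwnd - fine_step)  # small reduction on ECT(0)
    return cwnd
```

The point of the sketch is only that CE keeps its RFC3168 meaning, so a sender behaving this way would still back off multiplicatively behind a classic AQM without needing to detect it.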

Under this scheme, for any path with a single AQM that's dualq, ECT(1)
remains a very good classifier for that AQM.  Since this covers most
relevant paths that aren't within a controlled environment like datacenters,
it has a low downside.

Under this scheme, ECT(0) becomes the 1/p signal for dualq+TCP Prague, and
CE becomes the 1/sqrt(p) signal from the classic queue if the LL queue
overflows, and results in multiplicative decrease from the sender.

This would make L4S compatible with RFC 3168 without relying on a fragile
classic queue detection algorithm, so it would address my safety concerns.

As with all available signaling schemes, I acknowledge that this approach
is not perfect, and comes with tradeoffs.  A few of the known tradeoffs
would include (with thanks to Bob, Koen, and Kyle for explaining some of
these to me offline):
- existing tunneling decapsulation specs would often lose non-CE signals
- the existing accecn spec would often lose non-CE signals
- For paths with multiple AQMs, the classifier partially loses integrity in
  later AQMs when earlier AQMs are loaded.  (Note also the worse downside
  that increasing deployment of new AQMs potentially reduces the fidelity
  further.)

In spite of the downside from these tradeoffs (and the work that would be
necessary to fix the specs and their deployment to capture the most value
from L4S), a signaling scheme with the backward compatibility that this
approach provides is what would make the key difference between a safely
deployable L4S and not, IMO.

As I said, I still stand by my previous claims.  In particular, I still
believe that DSCP is a reasonable and appropriate classifier for L4S
traffic at this stage of its maturity.

However, I also acknowledge that there's value in getting a quicker wide
deployment, as long as it can be done safely.  Since I believe ECT(1) with
the above signaling scheme can do so, I now think it's as reasonable a
choice as DSCP, but carries substantial benefits.

Since this approach would give almost the same benefits as "ECT(1) as
output", and also provides a classifier that can serve dualq's needs well
in most of the deployment scenarios, "ECT(1) as input" is my current
preference, because of my new belief that it can be made compatible
with RFC 3168 queues and still mostly get the classification job done.

I remain opposed to moving L4S forward in a way that's not compatible with
RFC 3168 queues, as it's currently proposed.

I also remain skeptical that it's possible to get the classic queue
detection working robustly, I think that's probably a dead end. And I
have become more skeptical of the viability of the queue protection
mechanisms mentioned, because those seem to require access to the layer 4
packet contents, which has been flagged as too hard to be practical.

So I remain skeptical of the safety stories told so far for the current
L4S proposal, because it has no MD fallback signal except loss or detection.

Best,
Jake

PS: I also have some mostly-supportive comments on Kyle's remarks about
the input/output question that might be relevant.  Thread is here and should
soon get a new message:
https://mailarchive.ietf.org/arch/msg/tsvwg/VhgCiE9dF6F2Z-eN9wkpeVG2LX0/

[1] Jake's original position on ECT(1):
https://mailarchive.ietf.org/arch/msg/tsvwg/Zrk7Up6g9BwfnJLjKD44K0riAg4/