Re: [tsvwg] L4S and the detection of RFC3168 AQMs

"alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk> Sun, 13 December 2020 21:32 UTC

Date: Sun, 13 Dec 2020 21:32:25 +0000
From: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
Reply-To: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
To: Bob Briscoe <ietf@bobbriscoe.net>, Martin Duke <martin.h.duke@gmail.com>
Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-ID: <1515211004.136164.1607895145754@mail.yahoo.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/uRkgWLmUuPqJXKm-Fmur_lV62cY>
Subject: Re: [tsvwg] L4S and the detection of RFC3168 AQMs

Martin, Bob,
I'm glad you replied, Martin: firstly for your kind words, but also because Bob's reply below was in my spam folder, so I hadn't seen it. More inline, marked [AB].


On Friday, December 11, 2020, 8:11:23 PM GMT, Martin Duke <martin.h.duke@gmail.com> wrote: 


Alex,

Thanks for thinking outside the box like this -- it may be this sort of creativity that gets us out of the box we're in.


This falls under the "much easier to do in other transports" category, where I could just send a PING or HEARTBEAT marked ECT(0) to test the queue in mid-connection, without affecting the latency of anything that matters. But in the TCP case, I'm not sure how to resolve Bob's second objection (running ECT(0) for a long time would be unacceptable).

On Wed, Dec 9, 2020 at 3:23 PM Bob Briscoe <ietf@bobbriscoe.net> wrote:
>  
>  Alex,
> 
> An interesting idea.
> 
> You've given some pro points.
> Here are a few con points:
> 
> 1) Requires L4S FQ AQMs to actively remove Classic ECN support.
> Many FQ AQMs already support Classic ECN. By this proposal, if they wanted to add L4S ECN support (which is pretty simple to do), they would have to remove Classic ECN support, dropping instead of marking ECT(0) packets. It is possible (likely?) that a number of FQ AQM implementers would vote with their feet and not remove Classic ECN support, even if the IETF published an RFC adopting your proposal. Then the certainty that your proposal is designed to create would be lost. Then we would be worse off in two directions: still no way to be certain about ECT(0) markings, while at the same time having reduced what little Classic ECN deployment there is.

[AB] That would be a bad outcome. Perhaps the constraint could be modified to allow FQ AQMs that support L4S to mark ECT(0) following RFC3168, so long as they ONLY do so in buckets which have not seen ECT(1) for some period? I think this would also help with your point 3) below. Maybe it would also help with the unfairness issues involving a) tunnels and b) L4S and normal flows hashing to the same bucket, since the queue in that bucket would build up to the point of dropping, and then L4S would fall back to Reno. It would need some additional storage per bucket to track those which have recently seen ECT(1), but that could be a 2-bit countdown counter.
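
A minimal Python sketch of the per-bucket bookkeeping suggested above, assuming a periodic "aging tick" that decrements each bucket's 2-bit counter. The class and method names, the string labels for codepoints and the tick mechanism are all hypothetical illustrations, not taken from any AQM implementation; the sketch only models what happens once the AQM has already decided to signal congestion on a packet.

```python
from dataclasses import dataclass

ECT0, ECT1, NOT_ECT = "ECT(0)", "ECT(1)", "Not-ECT"


@dataclass
class Bucket:
    # Hypothetical 2-bit countdown: 3 means ECT(1) was seen very recently,
    # 0 means no ECT(1) has been seen for at least MAX_COUNT aging ticks.
    ect1_countdown: int = 0


class FqL4sMarker:
    """An FQ AQM that supports L4S may apply RFC 3168 CE marking to ECT(0)
    only in buckets that have not seen ECT(1) recently; otherwise it drops
    ECT(0) rather than marking it."""

    MAX_COUNT = 3  # largest value a 2-bit counter can hold

    def __init__(self, n_buckets: int):
        self.buckets = [Bucket() for _ in range(n_buckets)]

    def on_enqueue(self, bucket_id: int, ecn: str) -> None:
        # Any ECT(1) arrival re-arms this bucket's countdown.
        if ecn == ECT1:
            self.buckets[bucket_id].ect1_countdown = self.MAX_COUNT

    def on_aging_tick(self) -> None:
        # Called once per (hypothetical) aging period; saturates at zero.
        for bucket in self.buckets:
            if bucket.ect1_countdown > 0:
                bucket.ect1_countdown -= 1

    def congestion_action(self, bucket_id: int, ecn: str) -> str:
        """What to do with a packet the AQM has already chosen to signal on."""
        if ecn == ECT1:
            return "mark"  # normal L4S CE marking
        if ecn == ECT0 and self.buckets[bucket_id].ect1_countdown == 0:
            return "mark"  # RFC 3168 marking: no recent ECT(1) in this bucket
        return "drop"  # Not-ECT, or ECT(0) sharing a bucket with recent ECT(1)
```

For example, after `on_enqueue(0, ECT1)`, `congestion_action(0, ECT0)` returns "drop" until three aging ticks have passed, while ECT(0) in an untouched bucket is still CE-marked.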


> 2) For in-band Classic ECN AQM detection, the proposal requires L4S not to be used for a long time in order to detect whether L4S can be used.
> L4S hosts are the ones that need to check for Classic ECN AQMs. But they don't send ECT(0). So to take advantage of your proposal, they would have to send ECT(1) packets and then, if they see one CE marking, switch to sending ECT(0) packets, and actively send enough ECT(0) to be sure that /lack/ of CE marking was not just lack of congestion. Given CE marking is currently very rare, how long without CE marking do they have to send ECT(0) packets until they can assume it's an L4S AQM? They don't want to alternate ECT(0) and ECT(1), because if it is an L4S bottleneck, that will lead to reordering.
> 
> So, lack of CE marking would be ambiguous. Would it mean the bottleneck is L4S? Would it mean the capacity has increased and the flow needs to increase its rate more to find the limit? If there was a loss or losses, that would also be ambiguous. Are they congestion losses?
> 
> All the time that the L4S source is sending ECT(0) packets, the flow is losing the benefit of the low delay it could have been getting if the original CE mark was emitted from an L4S queue. This is a similar problem to an algorithm that gives false positives (like the v2 Classic ECN AQM detection that Asad & I published). It sometimes detected an L4S AQM as Classic. So it was argued that this would probably prevent people from using it. The same would surely be true of an algorithm that requires L4S not to be used in order to detect whether L4S can be used.
> 

[AB] I was indeed assuming that the transport could send an ECT(0) probe whenever it wanted, and that probes would not contain data and so would not cause a reordering problem. I admit that I hadn't realised this might be difficult in TCP, although further on in the thread Mike Heard and David Black seem to have identified a possible approach.

I am not sure whether, supposing a probe can be sent at any time, this completely addresses your objection. "Not using L4S" could mean
a) not using ECT(1) and therefore not getting the benefit of the L queue, or
b) initially assuming that CE marks have the RFC3168 meaning, rather than the L4S meaning. 

If a way is found to send ECT(0) probes without causing reordering, presumably this addresses a)?  b) is more tricky.

Let me take a step back:
One approach to getting this WG unstuck would be to find some fallback algorithm which could easily be shown to be safe, even if it has other deficiencies from the point of view of those who want to deploy L4S. That would remove the most serious objection to starting the experiment; the WG might then agree that, since it would now be safe, the experiment could start, so long as there is a reasonable prospect of developing an improved fallback algorithm. Once the experiment starts, it would be easier to test and develop better detection algorithms alongside one already known to be safe.

You guys are in a better position to know what kind of deficiencies are tolerable to those who want to use L4S. 

The reason probing seems attractive is that, once you assume L4S doesn't mark ECT(0), if probe packets are arranged such that each RFC3168 CE mark has the same, independent probability of hitting an ECT(0) probe packet, then the mathematics is very simple. At the risk of teaching everyone to suck eggs, if the proportion of probe packets is P, then the expected number of CE marks until detection is 1/P. For safety, the number of CE marks until detection seems to me the right metric, since if there are no CE marks there is no problem, and the more frequent CE marks are, the more we care about detecting them. And to calculate statistics about that, we need no more information than P, and no assumptions other than that the frequency of CE marks is nonzero.
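
The 1/P figure follows from a geometric distribution, assuming (as stated) that each RFC3168 CE mark independently lands on an ECT(0) probe with probability P. Writing N for the number of CE marks up to and including the first one that hits a probe:

```latex
\Pr[N = n] = (1-P)^{n-1} P, \qquad n = 1, 2, \dots

\mathbb{E}[N] = \sum_{n \ge 1} n (1-P)^{n-1} P
             = \frac{P}{\bigl(1-(1-P)\bigr)^{2}}
             = \frac{1}{P}

\Pr[N \le n] = 1 - (1-P)^{n}
```

The last line is the chance of having detected the RFC3168 AQM within the first n CE marks; like the mean, it depends on nothing but P.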

Making P large causes overhead in terms of the number of probe packets, which is one deficiency. On the other hand, even the most safety-minded might be placated by a sufficiently large P, and it also reduces the length of time you might need to wait to decide that there is no RFC3168 AQM.
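
To put numbers on this trade-off, a small Python sketch under the same independence assumption. `detection_probability` and `marks_for_confidence` are hypothetical helper names, and the 95% target in the usage loop is an arbitrary illustrative choice.

```python
import math


def detection_probability(p: float, n_marks: int) -> float:
    """Chance that at least one of n_marks RFC 3168 CE marks lands on an
    ECT(0) probe, if each mark independently hits a probe with probability p."""
    return 1.0 - (1.0 - p) ** n_marks


def marks_for_confidence(p: float, confidence: float) -> int:
    """Smallest number of CE marks n with detection_probability(p, n) >= confidence."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    n = max(1, math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p)))
    # Nudge n to correct any floating-point rounding in the logarithms.
    while n > 1 and detection_probability(p, n - 1) >= confidence:
        n -= 1
    while detection_probability(p, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1, 0.25):
        print(f"P={p:<5} mean marks to detect={1 / p:6.1f} "
              f"marks for 95%={marks_for_confidence(p, 0.95)}")
```

With P = 0.1, for instance, 29 CE marks are needed for 95% confidence of detection; raising P shrinks that count roughly in proportion to 1/P, at the cost of a larger share of probe packets.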

I had actually assumed that the L4S team's intention was to start by assuming that there were no RFC3168 AQMs. This seems to be the behavior of the pseudocode in the paper. If that is the case (and the WG accepts it), then there are no false positives at startup. The only reason, then, to worry about how long to wait without seeing ECT(0)->CE is if there is an RFC3168 AQM which you previously detected and which might no longer be the bottleneck. Under those conditions it would seem reasonable to be conservative about switching back.
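
A minimal Python sketch of that start-in-L4S behaviour as a small state machine. It assumes the proposed rule that L4S AQMs never CE-mark ECT(0), so a single CE on an ECT(0) probe triggers fallback. The recovery threshold `clean_probes_to_recover` is a hypothetical tuning knob (choosing it well is precisely the open question of how long to wait before switching back), and the class and method names are illustrative only.

```python
from enum import Enum


class Mode(Enum):
    L4S = "L4S"
    RFC3168_FALLBACK = "RFC3168 fallback"


class ClassicAqmDetector:
    """Starts in L4S mode; falls back on any CE-marked ECT(0) probe and
    returns to L4S only after a run of CE-free probes."""

    def __init__(self, clean_probes_to_recover: int):
        # Hypothetical knob: consecutive CE-free probes required before
        # switching back to L4S (kept large to be conservative).
        self.clean_probes_to_recover = clean_probes_to_recover
        self.mode = Mode.L4S
        self.clean_probes = 0

    def on_probe_feedback(self, probe_was_ce_marked: bool) -> Mode:
        """Update state from feedback about one ECT(0) probe."""
        if probe_was_ce_marked:
            # Under the proposed rule, only an RFC 3168 AQM CE-marks ECT(0).
            self.mode = Mode.RFC3168_FALLBACK
            self.clean_probes = 0
        elif self.mode is Mode.RFC3168_FALLBACK:
            self.clean_probes += 1
            if self.clean_probes >= self.clean_probes_to_recover:
                self.mode = Mode.L4S
                self.clean_probes = 0
        return self.mode
```

Because there are no false positives at startup, only the return path depends on the threshold; a CE-free probe may simply reflect an uncongested bottleneck, which is why that path is deliberately slow.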

On the other hand, if L4S transports should start in 'RFC3168 safe mode' and only switch to L4S once they decide that there is no RFC3168 AQM on the bottleneck, or if it is important to switch back quickly when the bottleneck changes, then you do need to make some assumptions about the frequency of marking (of this stream). Unfortunately I don't have enough knowledge of TCP (or TCP Prague) to devise such an assumption. Given some worst-case assumption about the rate at which an actually-marking AQM would mark, it would be straightforward to calculate the probability P(we should have seen ECT(0)->CE by now) by standard Bayesian reasoning, and the expected number of packets before this probability would fall below a given threshold. But I do not know whether this would be a number that you would like.
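
One way to make that Bayesian calculation concrete, as a Python sketch. It assumes, hypothetically, a prior probability `prior` that an RFC3168 AQM is the bottleneck, a worst-case per-probe CE-marking probability `q` for such an AQM, independence between probes, and (per the proposed rule) that nothing else ever CE-marks ECT(0). The function names are illustrative.

```python
def posterior_classic_aqm(prior: float, q: float, n_clean: int) -> float:
    """P(RFC 3168 AQM is the bottleneck | n_clean ECT(0) probes, none CE-marked)."""
    # Likelihood of seeing no CE in n_clean probes if such an AQM is present;
    # if it is absent, the likelihood of no CE is 1 under the proposed rule.
    miss = (1.0 - q) ** n_clean
    return prior * miss / (prior * miss + (1.0 - prior))


def probes_until_below(prior: float, q: float, threshold: float) -> int:
    """Smallest number of consecutive CE-free probes after which the
    posterior is below threshold."""
    if not (0.0 < prior < 1.0 and 0.0 < q < 1.0 and 0.0 < threshold < 1.0):
        raise ValueError("prior, q and threshold must lie strictly between 0 and 1")
    n = 0
    while posterior_classic_aqm(prior, q, n) >= threshold:
        n += 1
    return n
```

With prior = 0.5 and q = 0.5, for example, five CE-free probes are needed before the posterior drops below 0.05. If probes make up a fraction P of the packets sent, that corresponds to roughly n/P packets in total.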

Unfortunately I don't have much spare effort to develop this, but I hope it has provided food for thought.

Alex

> 3) Could be used for Out-of-band Classic AQM detection, but...:
> The proposal would help server operators who wanted to run pre-L4S-deployment tests to check for the presence of Classic ECN AQMs. They could then run saturating ECT(0) flows for long enough to be sure that lack of CE meant absence of Classic ECN bottlenecks. But only if all L4S implementers complied (see item #1). The fact that this proposal is only really useful for pre-deployment testing would surely make FQ implementers even less keen on removing Classic ECN support.
> 
> 
> Perhaps this conversation will spark a variant of your idea. However, in its present form, unless I'm missing something, it doesn't seem to stand up to scrutiny.
> 
> Cheers
> 
> 
> Bob
> 
> 
> On 08/12/2020 19:37, Greg White wrote:
>
>> Hi Alex,
>>
>> This does seem worth considering. FWIW, in Low Latency DOCSIS the implementation will be as you describe: ECT(0) packets will not be marked CE. This was done for practical reasons, since the classic AQM re-uses the DOCSIS-PIE AQM, which is implemented in hardware in many devices and does not support ECN. For consistency, any ECT(0) traffic that were to make its way somehow (e.g. DSCP marked as NQB) into the LL queue will also not be CE marked.
>>
>> -Greg
>>
>> From: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
>> Reply-To: "alex.burr@ealdwulf.org.uk" <alex.burr@ealdwulf.org.uk>
>> Date: Monday, December 7, 2020 at 3:44 PM
>> To: "koen.de_schepper@nokia.com" <koen.de_schepper@nokia.com>, "ietf@bobbriscoe.net" <ietf@bobbriscoe.net>, Greg White <g.white@CableLabs.com>
>> Cc: "tsvwg@ietf.org" <tsvwg@ietf.org>
>> Subject: L4S and the detection of RFC3168 AQMs
>> 
>> Hi all,
>>
>> At present the DUALQ draft leaves it to implementers to decide whether traffic in the Classic queue (i.e., ECT(0) traffic that uses currently standard congestion controllers) should be dropped or marked (except, apparently, when an ECT(0) packet for some reason turns up in the L queue?). Perhaps it would be wise to specify that, during the experiment, deployments of L4S AQMs (whether DUALQ or the other alternatives mentioned in draft-ietf-tsvwg-l4s-arch) SHOULD NOT (or maybe even MUST NOT) mark ECT(0) traffic, and only drop it. This would preserve the fact that, as currently, those RFC3168 AQMs which are unaware of L4S can be identified by the fact that they mark ECT(0) packets with CE. If L4S AQMs are required not to mark ECT(0) as CE, then an endpoint (or other monitoring method) that sees a CE mark on an otherwise ECT(0) flow knows with more or less 100% confidence that an RFC3168-only AQM is on the path. Presumably this is safe, since L4S nodes are required to be able to support Not-ECT traffic, and ECT(0) traffic is required to be able to cope with drop.
>>
>> At present the only definitive way for endpoints to identify an RFC3168 AQM in use is to observe CE marks on an ECT(0)-marked flow. Bob's paper [1] gives various other methods, but this seems to be a research project at present. If any of these approaches were found to be reliable, then the above requirement could be relaxed fairly easily; the reverse is not so easy.
>>
>> There are, as is probably obvious, a number of reasons for wanting to identify RFC3168 AQMs:
>> - Current efforts to gather statistics on the number of RFC3168 AQMs which might encounter problems when L4S traffic passes through them.
>> - The ops draft (sec 3.1) envisages that CDN operators should test for the presence of RFC3168 AQMs, but doesn't yet specify how this should be achieved.
>> - Within an L4S transport protocol, in order to fall back to RFC3168 behavior.
>>
>> All of these would presumably be assisted by being able to classify observed ECT(0)->CE transitions unambiguously as the result of an L4S-unaware node. Unless I'm missing something, it seems very much worth considering restricting L4S marking of ECT(0) for the time being. I am not sure which drafts would need updating; DUALQ at least, but presumably the ops draft and maybe L4S-arch.
>>
>> regards,
>> Alex
>>
>> [1] https://arxiv.org/pdf/1911.00710.pdf
>> 
>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
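
To illustrate the rule proposed in the original message quoted above (L4S AQMs drop rather than CE-mark ECT(0)), a toy Python model of the congestion-signal decision and of the inference an endpoint or monitor could then draw. The function names are hypothetical; only the ECN codepoint values come from RFC 3168. The inference is only as sound as the assumption that every L4S AQM on the path follows the rule.

```python
from enum import Enum
from typing import Optional


class Ecn(Enum):
    """Two-bit ECN field codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def l4s_aqm_signal(sent: Ecn) -> Optional[Ecn]:
    """Proposed rule: an L4S AQM CE-marks only ECT(1) (CE stays CE);
    ECT(0) and Not-ECT packets are dropped, represented here as None."""
    return Ecn.CE if sent in (Ecn.ECT1, Ecn.CE) else None


def rfc3168_aqm_signal(sent: Ecn) -> Optional[Ecn]:
    """An L4S-unaware RFC 3168 AQM CE-marks any ECN-capable packet and drops Not-ECT."""
    return None if sent is Ecn.NOT_ECT else Ecn.CE


def implies_rfc3168_aqm(sent: Ecn, received: Optional[Ecn]) -> bool:
    """If all L4S AQMs follow the proposed rule, CE on a packet sent as ECT(0)
    can only have been set by an RFC 3168 AQM; a drop (None) is ambiguous."""
    return sent is Ecn.ECT0 and received is Ecn.CE
```

An ECT(0) packet that meets an L4S AQM is either forwarded unmarked or dropped, so `implies_rfc3168_aqm` never fires for it, whereas the same packet CE-marked by an RFC3168 AQM does trigger it.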