Re: [tsvwg] Update to Position Statement on ECT(1)

"Holland, Jake" <jholland@akamai.com> Tue, 19 May 2020 02:56 UTC

From: "Holland, Jake" <jholland@akamai.com>
To: Martin Duke <martin.h.duke@gmail.com>
CC: "C. M. Heard" <heard@pobox.com>, TSVWG <tsvwg@ietf.org>
Thread-Topic: [tsvwg] Update to Position Statement on ECT(1)
Thread-Index: AQHWJXDCMj0HjdV7wk+M4pcr+ovB2aihqRUAgAAqnYCADOcbAP//2jKA
Date: Tue, 19 May 2020 02:56:27 +0000
Message-ID: <1D8D2AF8-F805-4BAC-8126-355A8337D830@akamai.com>
References: <BE44EAE9-5CFB-4F5D-85B8-05AFA516C151@akamai.com> <CACL_3VEbUHB-Omwp1-g5Tq3G3J-kKj9N3jPZLcfruicw3X=AsA@mail.gmail.com> <2CBBD8CD-2088-4E41-B113-EED665853D3C@akamai.com> <CAM4esxSFCBcxXjz5JJJg1z6+wwfN3mTrtJ8bKiBsj2TeOmmFSw@mail.gmail.com>
In-Reply-To: <CAM4esxSFCBcxXjz5JJJg1z6+wwfN3mTrtJ8bKiBsj2TeOmmFSw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/uplXgsXm4d7oieQGkkGU6fG0CaE>
Subject: Re: [tsvwg] Update to Position Statement on ECT(1)

Hi Martin,

Thanks, I think it’s an interesting direction also, and I’m glad to get some discussion on it.  Here’s hoping it turns out useful, or at least informative.

Some responses inline with <JH></JH>


From: Martin Duke <martin.h.duke@gmail.com>
Date: Monday, May 18, 2020 at 3:12 PM
To: "Holland, Jake" <jholland@akamai.com>
Cc: "C. M. Heard" <heard@pobox.com>, TSVWG <tsvwg@ietf.org>
Subject: Re: [tsvwg] Update to Position Statement on ECT(1)

Jake,

I'm intrigued by this discussion of the ECT(1)->ECT(0) proposal, as something that could definitively solve the safety concerns. I'll make two unrelated points:
1) If the current L4S proposal is in need of an MD signal, there's always dropping the packet.  Although packet loss is bad, maybe some drops at the end of slow start is the tradeoff we have to make to get low latency. Implementations really concerned about loss can be less aggressive during slow start.

<JH>
To me, the problem with loss as the only (viable) MD signal is that existing devices don’t drop until an unresponsive flow has gone well past any reasonable fairness bound. So for TCP Prague to remain compatible with classic queues, it would still need a classic queue detection mechanism that actually works.

For example, Cisco’s AFD approach is a modern classic queue in a high-speed device that doesn’t need tuning the way WRED does, seems to be helpful, and allows ECN marking:
https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-738488.html
(h/t Dave Täht on the make-wifi-fast list, without whose comment I probably wouldn’t have noticed this)

There’s likewise some very interesting ongoing work on CoDel and PIE in P4 that might be relevant:
- https://ieeexplore.ieee.org/document/4813399
- https://p4.org/assets/P4WS_2019/p4workshop19-final10.pdf

The impact there might be lower, since they can probably upgrade to treat ECT(1) differently more easily, and maybe even move to dualq, on a faster update cycle than replacing those Nexus 9Ks.  (But also because they haven’t quite gotten to marking yet, though they seem to intend to: https://github.com/PIFO-TM/ns3-bmv2/blob/master/traffic-control/examples/p4-src/pie/pie.p4#L186 )

RFC 7567 seems to encourage turning stuff like this on and enabling marking, but we’d expect it to interact poorly with L4S’s failure to treat the marks as the MD signal intended by the classic algorithms.  By the time multiple L4S flows competing with multiple classic marking flows causes loss across devices like these, based on past results we’d expect to sometimes see something between gross unfairness and near starvation for the classic flows.

Some people have argued that classic ECN had a long time to roll out without much motion, and so we should stop worrying about it.  Maybe that’s a good consensus question to check—I don’t agree with the claim, but I might be in the rough.

If consensus suggests that we should deprecate that 7567 advice and call classic ECN a failed protocol and make it historical (or deprecated without FQ), then I guess we should get that process started as soon as possible, with appropriate outreach to make for a smoother transition for those who have adopted it or are in progress working on it.

If we did that and didn’t get substantial pushback during the outreach, it would clear the way for incompatible proposals.  It’s certainly possible to land on a consensus that RFC 8311 didn’t go far enough in being permissive toward not worrying about shared classic marking queues, as I’ve heard expressed a few times.

But for now, without such a deprecation, it seems to me like a blocker for L4S as a viable experiment at internet scope if it can’t interoperate with existing devices reasonably, when the current BCP says to turn those devices on.  Even if they’re not quite turned on yet.
</JH>


2) Clearly, there is no AccECN signaling problem for ECT(1)->ECT(0) for QUIC, and for TCP paths where the option gets through. If this is an issue of the three ACE bits, I think one codepoint in ACE would be sufficient to indicate that a CE mark was received, which IMO would trump whatever other feedback is in that header. Unless there's some sort of performance cliff in not being able to encode 7 ECT(0) marks, this seems like a non-problem.

<JH>
I’m not sure I follow this suggestion.

If you’re imagining a “latch-on” behavior like ECE that waits for CWR from sender before moving along, I think the sender no longer has a CWR if they’ve negotiated ACE, so they can’t respond in a way that turns off the latch.  But if you’re just using 1 packet with 1 codepoint to indicate “CE observed”, you’ll miss the report of CE observed at sender if the ack bearing that signal is dropped or thinned or whatever, so I think neither of those would work.

But I guess you could maybe use 1 codepoint for a latched-on ECE and another codepoint for CWR, and still have a 6-slot counter for the ECT(0) signal (which is superseded by the ECE codepoint while active).

I do also think it makes good sense to use the SCE approach of reflecting each received ECT(0) in NS (or ESCE, you could still rename it), while keeping the ECE/CWR behavior the same and just not using AccECN.  This idea got a critical review from Bob here, so maybe you should check it before you believe me:
https://mailarchive.ietf.org/arch/msg/tsvwg/V5pFpko_JYaFKu68oXqe-nvgmNg/

I didn’t debate it at the time, but I found it uncompelling for a few reasons:
I think that although ack-thinning does reduce the accuracy, unless the trimmed acks are systematically biased, a smoothed NS signal would still converge to the right proportion of markings, so you’d only see minor fluctuation on the hi-fidelity response. The instantaneous error you’ll get from thinned acks isn’t very relevant as long as over time the signal converges to the proportion actually marked by the network.  And in the cases where it does diverge by enough in the unsafe direction, the MD signal would still serve as a solid fail-safe.

To me the problems seem very likely addressable, especially because the degenerate cases wouldn’t be stable. (One example from the critique: 50% marking on every other packet, combined with ack-thinning of every other ack, turns a 50% signal into a 100% or 0% one.) Such a pattern would induce a response from the sender that changes the mix of traffic and breaks the degenerate marking+thinning interaction. Since this is the hi-fidelity path, I’d usually expect to see only a relatively quick fluctuation (and again, importantly: backstopped by a reliable fail-safe).
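
A minimal sketch of both cases, under a toy model that is an assumption and not anything from the thread: each ack reflects only the mark state of its own packet, and thinning simply deletes acks. The function names and parameters are hypothetical.

```python
import random

def observed_mark_fraction(marks, kept):
    """Fraction of marked packets among those whose acks survived thinning."""
    surviving = [m for m, k in zip(marks, kept) if k]
    return sum(surviving) / len(surviving) if surviving else 0.0

def random_case(p_mark=0.3, p_keep=0.5, n=20000, seed=1):
    """Unbiased thinning: marking and ack loss are independent of each other."""
    rng = random.Random(seed)
    marks = [rng.random() < p_mark for _ in range(n)]
    kept = [rng.random() < p_keep for _ in range(n)]
    return observed_mark_fraction(marks, kept)

def degenerate_case(keep_marked, n=1000):
    """Biased thinning: every other packet is marked, every other ack is dropped."""
    marks = [i % 2 == 0 for i in range(n)]
    # Keep only the acks of marked packets, or only those of unmarked packets.
    kept = [(i % 2 == 0) == keep_marked for i in range(n)]
    return observed_mark_fraction(marks, kept)
```

With independent thinning the surviving acks still report roughly the true marking proportion, while the alternating pattern reports all or nothing. The sketch does not model the sender's response, which is the part argued above to break the degenerate pattern.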

But I’m also not opposed to one ACE codepoint for a latched ECE and another for CWR, with 6 counting slots for the hi-fi signal.  Not that I’ve seen it tested, but offhand it seems like probably a pretty reasonable approach.  So my conclusion is that I’m (tentatively) pretty well convinced that the endpoint feedback signaling is solvable with 2 congestion signals, even under constraints about reliable transport of TCP options.
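
A rough sketch of that codepoint split, assuming a 3-bit ACE field with values 0..5 used as a modulo-6 ECT(0) counter, 6 as the latched ECE, and 7 as CWR. The numeric assignments, class, and function names are all hypothetical; nothing here is taken from the AccECN draft.

```python
ACE_ECE = 6          # hypothetical codepoint: CE seen, latched until CWR arrives
ACE_CWR = 7          # hypothetical codepoint: sender has reduced its window
COUNTER_SLOTS = 6    # codepoints 0..5 count ECT(0) arrivals modulo 6

class Receiver:
    def __init__(self):
        self.ect0_seen = 0
        self.ce_latched = False

    def on_packet(self, ecn, ace_in=None):
        """Process one arriving packet: its IP ECN field and its ACE value."""
        if ace_in == ACE_CWR:
            self.ce_latched = False      # sender responded, release the latch
        if ecn == "CE":
            self.ce_latched = True       # a CE on this same packet re-latches
        elif ecn == "ECT(0)":
            self.ect0_seen += 1          # hi-fi signal keeps counting underneath

    def ace_out(self):
        """ACE value to place on the next ack."""
        if self.ce_latched:
            return ACE_ECE               # supersedes the counter while active
        return self.ect0_seen % COUNTER_SLOTS

def ect0_delta(prev_ace, new_ace):
    """Sender side: ECT(0) packets newly reported between two counter values."""
    return (new_ace - prev_ace) % COUNTER_SLOTS
```

Because the counter keeps advancing while the latch is set, the sender can recover the ECT(0) count once the latch clears, as long as fewer than six went by in between.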

With that said, I’ll raise my own unrelated point:

3) I think the issue Bob raised about tunnels is the bigger crux.

To me, if we can have only 1 explicit signal and the choice is between the MD signal and the hi-fi signal, I would keep the MD signal because *normal AQM solves 90% of the problem*, and the incremental gains you can get from hi-fi congestion signaling can be realized over the decade it’ll take to get tunnel decapsulators to pass both signals, *provided you don’t break normal AQMs and give people a reason to bleach ECT(1)*.

And anyone who cares enough about latency and also controls the tunnel decapsulator can deploy an upgrade earlier, so it’ll only be the long tail of providers who don’t get around to upgrading the tunnels that will suffer with only the 90% solution for the dualq bottlenecks their tunnels cross.

Many of the tunnel owners are the same entities who’d be rolling out the dualq, so hopefully they’d care enough to also upgrade the tunnels if they’re willing to turn on the dualq.  But it’s true that some tunnels aren’t owned by the same people as the queue owners.  I think that’s a fair point, and that’s the right kind of tradeoff to be debating (or maybe even gathering information on, to find out exactly how much internet traffic lives under each of these conditions).

So to me, maybe a better consensus question is: what’s more important, getting the hi-fi signal from new queues, or keeping the MD signal from existing queues?

Because it seems like we can’t have both with the current tunnels out there, and it seems like there’s plenty of tunnels.

And because the response to that question determines which decade-long project is our right next move to eventually get to low latency for all (either deprecating non-FQ classic ECN or upgrading tunnels to export 2 signals, respectively), and either way we should get that ball rolling, IMO.

What I’m really against is the short-cut of rolling out something incompatible with an ongoing rollout of an existing widely implemented technology, over short-term hopes of a rapid deployment for the new thing, with the likely result (as we’ve seen from discussion) of bleaching ECT(1) in at least some networks, with who knows how long a tail it will leave behind.

I’m eager to see the discussion turn from input/output on ECT(1) to “how important are classic shared marking queues”, if that’s actually under question.
</JH>

Martin (as an individual)

On Sun, May 10, 2020 at 5:09 PM Holland, Jake <jholland=40akamai.com@dmarc.ietf.org> wrote:
Hi Mike,

From: "C. M. Heard" <heard@pobox.com>
> Yes, combinations marked (**) below would have to be changed from RFC 6040:
....
> Similar changes would be needed for draft-ietf-tsvwg-rfc6040update-shim and draft-ietf-tsvwg-ecn-encap-guidelines.
>
> Clearly, the need to get such changes deployed would be a barrier to adoption.

Yes.  I think in a recent thread I heard it confirmed that current tunnel
handling of these is kind of spotty today, specifically:

  > as an endpoint we'll be dealing with weird inconsistencies that basically
  > never fully understood anything beyond "don't lose CE marks" and maybe
  > "loss is acceptable in confusing cases", if we're lucky.

  [BB] See reply to Dave Taht just now, which pretty much confirms what
  you've just said.

that's from here:
https://mailarchive.ietf.org/arch/msg/tsvwg/QudqLu1RTQZCVnS4HNf8jnZ_kIY/

(I think the Dave Taht reply he was referencing was this one:
https://mailarchive.ietf.org/arch/msg/tsvwg/2ElPK72IiFg2gHJZ_rLMUKTDfCI/ )

I'm beginning to think the reason I've come down differently than the
L4S team on the judgement call for this approach being better, in light
of the state of tunneling encapsulation deployments, might boil down to
a disagreement over the answer to a question like:
  Which is better:
  - losing the safety response of MD from a loaded classic queue, or
  - losing some of the reliability on the low-latency response when there
    is a dualq on-path?

I'm beginning to think we might be stuck with one of those options for
tunneled paths until tunnel decap implementations can be widely upgraded
in deployment, however this lands.

> - the existing accecn spec would often lose non-CE signals
>
> Actually, I would go farther and say that something rather different from the existing AccECN draft would be needed. AccECN provides accurate feedback of the number of CE marks observed. Under the proposed scheme L4S would need to get accurate feedback of the number of ECT(0) (pre-congestion / some congestion) marks. AccECN would need to be re-worked to provide both that and, in addition, either the existing ECE/CWR handshake or something else that performs the equivalent function. The most obvious solution would be to repurpose NS and one or more currently reserved flag bits (or use other ideas from RFC 7560 Sec 5.2) and leave ECE/CWR unchanged. I note in passing the SCE proposal would have to do something along the same lines (though AFAICT that has not yet been fully fleshed out).

Agreed, I think a different feedback than AccECN would be smarter if the
ECT(1)->ECT(0) approach goes forward, and I like the NS reflection approach
that SCE's implementation started with.  (Although it might lose fidelity
from some ack aggregation responses, I'd expect usually that the marking
rate maintains proportionality on the low-congestion signal, and where that
fails, the standard ECE response is at least reliable, so it covers the
safety considerations the same way as classic marking.)

However, I also think for anyone who disagrees, other viable approaches
for the feedback might be possible.  But in that direction, I do think
it probably would need to differ from AccECN.  This would of course need to
be nailed down in the end, though I didn't get into it in the first email.
But the potential complexity here is one of the reasons I rate the
suggestion as perhaps a major architectural change for L4S.

I tend to think that the per-ECT(0) reflection in NS is the best way,
but I don't think it would change the rest of the argument if that position
turned out wrong.

> - For paths with multiple AQMs, the classifier partially loses integrity in
>   later AQMs when earlier AQMs are loaded.  (Note also the worse downside
>   that increasing deployment of new AQMs potentially reduces the fidelity
>   further.)
>
> If I understand what is being said, this is because ECT(0) would become ambiguous, as it can appear either on an L4S packet with a pre-congestion marking, or a non-L4S packet.  Doesn't the same issue exist with the current L4S proposal for CE-marked packets?

Yes, but the L4S specs go over this and walk through the reasoning for
why they landed on classifying CE into the LL queue, and the net result
in that case is that the ECT(1)->CE marking strategy that L4S currently
follows will keep the 1/p packet signals in the LL queue for the later
hops.

(The potential problems were mostly limited to mis-classified marked
classic traffic, which will tend to be fewer in number and also less
severe given that the flow is slated to back off anyway, plus a review of
the main implementations suggested they wouldn't be doing double-backoffs
if the CE packet was out of order, IIRC.)

It might be better to phrase this not as "loses integrity", but rather as
"might systematically increase the latency experienced" for the 1/p signal,
since when there are multiple dualqs in line, those packets (but not others
from L4S flows) will land in the classic queue on the later dualqs.  This
is arguably a worse downside than the classification failure from putting
CE-marked classic traffic in the LL queue.
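
A small sketch of that difference, assuming a packet leaves an L4S sender as ECT(1) and crosses two dualqs in a row. The classifier for the flipped scheme is an assumption (only ECT(1) selects the LL queue), and all function names are hypothetical.

```python
def classify_current_l4s(ecn):
    """Current L4S: ECT(1) and CE are both classified into the LL queue."""
    return "LL" if ecn in ("ECT(1)", "CE") else "classic"

def classify_flipped(ecn):
    """ECT(1)->ECT(0) proposal, assuming only ECT(1) selects the LL queue."""
    return "LL" if ecn == "ECT(1)" else "classic"

def queue_at_second_dualq(marked_upstream, scheme):
    """Queue chosen at a later dualq for a packet that was sent as ECT(1)."""
    if scheme == "current":
        # The first dualq signals congestion by rewriting ECT(1) to CE.
        ecn = "CE" if marked_upstream else "ECT(1)"
        return classify_current_l4s(ecn)
    # The first dualq signals congestion by rewriting ECT(1) to ECT(0).
    ecn = "ECT(0)" if marked_upstream else "ECT(1)"
    return classify_flipped(ecn)
```

Under the current scheme the marked packet stays in the LL queue downstream; under the flipped scheme it is indistinguishable from RFC 3168 traffic and lands in the classic queue, which is where the extra latency for the 1/p signal would come from.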

> Actually, it seems to me that this approach would yield exactly the same congestion signaling capability as using ECT(1) as a pre-congestion / some congestion mark. All that has been done is to reverse the role of ECT(1) and ECT(0) compared to what the SCE draft and RFC 6040 envisioned. In other words:
>
>      +-----+-----+
>      | ECN FIELD |
>      +-----+-----+
>         0     0        Not-ECT
>         0     1        ECT(1) - L4S/SCE Capable AND No Congestion
>         1     0        ECT(0) - Some Congestion OR RFC 3168 ECN Capable
>         1     1        CE

Yes, that's my understanding.  I think the whole proposal can reasonably be
summarized as "SCE with the bits flipped".
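
As a sketch of the table above, assuming a queue that implements the flipped scheme and has only two congestion levels. The `mark_flipped` function and its `severe` flag are hypothetical names for illustration; the bit values are the standard ECN field encodings.

```python
from enum import Enum

class Ecn(Enum):
    NOT_ECT = 0b00
    ECT1 = 0b01    # L4S/SCE capable AND no congestion
    ECT0 = 0b10    # some congestion OR RFC 3168 ECN capable
    CE = 0b11

def mark_flipped(cp, severe):
    """Hypothetical marking decision at a queue using the flipped scheme."""
    if cp is Ecn.NOT_ECT:
        return cp          # not ECN-capable; a real queue would drop instead
    if severe:
        return Ecn.CE      # MD signal, same meaning as in RFC 3168
    if cp is Ecn.ECT1:
        return Ecn.ECT0    # hi-fi "some congestion" signal
    return cp              # ECT(0) and CE pass through unchanged
```

An ECT(0) packet passes through the light-congestion case unchanged, which is also why a downstream node cannot tell a marked L4S packet from an unmarked RFC 3168 one.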

> Jake, you said that the three issues discussed above -- tunnels, AccECN, and multiple AQMs in the path are "a few of the known tradeoffs." What are the others?

These are all the ones I know of yet, but I think Bob and Koen might have
some others they already know.  I'm not sure I got the whole story yet.

Thanks for your comments.

Best regards,
Jake