Re: [tsvwg] [Ecn-sane] per-flow scheduling

"Holland, Jake" <jholland@akamai.com> Thu, 25 July 2019 19:26 UTC

From: "Holland, Jake" <jholland@akamai.com>
To: Kyle Rose <krose@krose.org>, Bob Briscoe <ietf@bobbriscoe.net>
CC: "ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>, tsvwg IETF list <tsvwg@ietf.org>, "David P. Reed" <dpreed@deepplum.com>
Date: Thu, 25 Jul 2019 19:25:56 +0000
Message-ID: <4833615D-4D68-4C95-A35D-BCF9702CEF69@akamai.com>
References: <350f8dd5-65d4-d2f3-4d65-784c0379f58c@bobbriscoe.net> <40605F1F-A6F5-4402-9944-238F92926EA6@gmx.de> <1563401917.00951412@apps.rackspace.com> <D1595770-9481-46F6-AC50-3A720E28E03D@gmail.com> <d8911b7e-406d-adfd-37a5-1c2c20b353f2@bobbriscoe.net> <CAJU8_nWTuQ4ERGP9PhXhpiju_4750xc3BX10z4yp4an0QBE-xw@mail.gmail.com>
In-Reply-To: <CAJU8_nWTuQ4ERGP9PhXhpiju_4750xc3BX10z4yp4an0QBE-xw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/wMTikbzXgykXYHYfWqD7J00pFb0>
Subject: Re: [tsvwg] [Ecn-sane] per-flow scheduling

Hi Kyle,

I almost agree, except that the concern is not about classic flows.

I agree (with caveats) with what Bob and Greg have said before: ordinary classic flows don’t have an incentive to mis-mark if they’ll be responding normally to CE, because a classic flow will back off too aggressively and starve itself if it’s getting CE marks from the LL queue.
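To make that incentive concrete, here's a toy model (my own illustration, not from any draft) of why a Reno-style multiplicative decrease starves under the high steady CE-marking rate an LL queue produces, while a scalable response does not:

```python
# Toy model: steady-state cwnd under a fixed per-round CE-marking
# probability p, comparing a classic (Reno-style) response against a
# scalable (L4S-style) response.  Illustrative only; real stacks are
# far more complex.

def classic_round(cwnd, p):
    # Reno-style: halve cwnd whenever a round is expected to see a mark,
    # otherwise grow by one packet per round.
    expect_mark = p * cwnd >= 1.0
    return max(1.0, cwnd / 2.0) if expect_mark else cwnd + 1.0

def scalable_round(cwnd, p):
    # Scalable response: reduce in proportion to the fraction marked,
    # which tolerates the high steady marking rate of an LL queue.
    return max(1.0, cwnd * (1.0 - p / 2.0) + 1.0)

def settle(step, p, cwnd=10.0, rounds=200):
    # Iterate a response function to (near) steady state.
    for _ in range(rounds):
        cwnd = step(cwnd, p)
    return cwnd
```

At a marking probability of 0.2, the scalable flow settles around cwnd = 10 while the classic flow oscillates well below that: the classic flow only hurts itself by sitting in the LL queue.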

That said, I had a message where I tried to express something similar to the concerns I think you just raised, with regard to a different category of flow:
https://mailarchive.ietf.org/arch/msg/tsvwg/bUu7pLmQo6BhR1mE2suJPPluW3Q

So I agree with the concerns you’ve raised here, and I want to +1 that aspect of it, while also noting that I don’t think these concerns apply to ordinary classic flows, but rather to flows that use application-level quality metrics to change bit-rates instead of responding at the transport level.

For those flows (which seems to include some of today’s video conferencing traffic), I expect they really would see an advantage by mis-marking themselves, and will require policing that imposes a policy decision.  Given that, I agree that I don’t see a simple alternative to FQ for flows originating outside the policer’s trust domain when the network is fully utilized.
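A hypothetical sketch of the kind of rate adaptation I mean (names and thresholds are my own invention, purely to illustrate the incentive):

```python
def adapt(idx, ladder, loss, rtt_ms, ce_marks):
    """Toy application-level bitrate adaptation over an encoding ladder.
    Steps down one rung only when the app's own quality budget is blown;
    otherwise probes upward.  ce_marks is deliberately ignored: the
    transport-level congestion signal never enters the decision, so
    CE marking alone can't slow this flow down."""
    if loss > 0.02 or rtt_ms > 100:
        return max(0, idx - 1)          # app-visible quality degraded
    return min(len(ladder) - 1, idx + 1)  # otherwise try a higher rate
```

Since such a sender ignores CE entirely, mis-marking itself into the LL queue costs it nothing and gains it lower latency, which is why I think it needs an actual policer.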

I hope that makes at least a little sense.

Best regards,
Jake

From: Kyle Rose <krose@krose.org>
Date: 2019-07-23 at 11:13
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: "ecn-sane@lists.bufferbloat.net" <ecn-sane@lists.bufferbloat.net>, tsvwg IETF list <tsvwg@ietf.org>, "David P. Reed" <dpreed@deepplum.com>
Subject: Re: [tsvwg] [Ecn-sane] per-flow scheduling

On Mon, Jul 22, 2019 at 9:44 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:
Folks,

As promised, I've pulled together and uploaded the main architectural arguments about per-flow scheduling that cause concern:

Per-Flow Scheduling and the End-to-End Argument <http://bobbriscoe.net/projects/latency/per-flow_tr.pdf>

It runs to 6 pages of reading. But I tried to make the time readers will have to spend worth it.

Before reading the other responses (poisoning my own thinking), I wanted to offer my own reaction. In the discussion of figure 1, you seem to imply that there's some obvious choice of bin packing for the flows involved, but that can't be right. What if the dark green flow has deadlines? Why should that be the one that gets only leftover bandwidth? I'll return to this point in a bit.

The tl;dr summary of the paper seems to be that the L4S approach leaves the allocation of limited bandwidth up to the endpoints, while FQ arbitrarily enforces equality in the presence of limited bandwidth; but in reality the bottleneck device needs to make *some* choice when there's a shortage and flows don't respond. That requires some choice of policy.

In FQ, the chosen policy is to make sure every flow has the ability to get low latency for itself, but in the absence of some other kind of trusted signaling allocates an equal proportion of the available bandwidth to each flow. ISTM this is the best you can do in an adversarial environment, because anything else can be gamed to get a more than equal share (and depending on how "flow" is defined, even this can be gamed by opening up more flows; but this is not a problem unique to FQ).
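For reference, "an equal proportion of the available bandwidth to each flow" is mechanically simple; a minimal deficit-round-robin sketch (my own illustration; the flow definition and quantum are assumptions):

```python
from collections import deque

def drr_pass(queues, deficits, quantum=1500):
    """One deficit-round-robin pass over the flows: each backlogged flow
    may send up to quantum bytes plus its unspent deficit, which bounds
    long-run unfairness regardless of packet sizes.  queues maps flow id
    to a deque of packet sizes (bytes); deficits maps flow id to bytes."""
    sent = []
    for flow, q in queues.items():
        if not q:
            deficits[flow] = 0  # idle flows don't bank credit
            continue
        deficits[flow] += quantum
        while q and q[0] <= deficits[flow]:
            size = q.popleft()
            deficits[flow] -= size
            sent.append((flow, size))
    return sent
```

With one flow sending 1500-byte packets and another sending 500-byte packets, each pass serves 1500 bytes from each flow: equal byte shares despite unequal packet counts.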

In L4S, the policy is to assume one queue is well-behaved and one not, and to use the ECT(1) codepoint as a classifier to get into one or the other. But policy choice doesn't end there: in an uncooperative or adversarial environment, you can easily get into a situation in which the bottleneck has to apply policy to several unresponsive flows in the supposedly well-behaved queue. Note that this doesn't even have to involve bad actors misclassifying on purpose: it could be two uncooperative 200 Mbps VR flows competing for 300 Mbps of bandwidth. In this case, L4S falls back to classic, which with DualQ means every flow, not just the uncooperative ones, suffers. As a user, I don't want my small, responsive flows to suffer when uncooperative actors decide to exceed the bottleneck bandwidth (BBW).
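For concreteness, the classification step in question is roughly this (a paraphrase of the DualQ idea, not the drafts' actual pseudocode):

```python
# ECN field codepoints from the IP header (RFC 3168).
ECT0, ECT1, CE = 0b10, 0b01, 0b11

def dualq_classify(ecn):
    """ECT(1) selects the low-latency queue (the L4S drafts also steer
    CE-marked packets there); everything else goes classic.  Note there
    is no check here that the flow actually responds like a scalable
    flow -- the one bit is taken entirely on trust."""
    return "LL" if ecn in (ECT1, CE) else "classic"
```

That single trusted bit is exactly what an unresponsive or mis-marking flow exploits.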

Getting back to figure 1, how do you choose the right allocation? With the proposed use of ECT(1) as classifier, you have exactly one bit available to decide which queue, and therefore which policy, applies to a flow. Should all the classic flows get assigned whatever is left after the L4S flows are allocated bandwidth? That hardly seems fair to classic flows. But let's say this policy is implemented. It then escapes me how this is any different from the trust problems facing end-to-end DSCP/QoS: why wouldn't everyone just classify their classic flows as L4S, forcing everything to be treated as classic and getting access to a (greater) share of the overall BBW? Then we're left both with a spent ECT(1) codepoint and a need for FQ or some other queuing policy to arbitrate between flows, without any bits with which to implement the high-fidelity congestion signal required to achieve low latency without getting squeezed out.

The bottom line is that I see no way to escape the necessity of something FQ-like at bottlenecks outside of the sender's trust domain. If FQ can't be done in backbone-grade hardware, then the only real answer is pipes in the core big enough to force the bottleneck to live somewhere closer to the edge, where FQ does scale.

Note that, in a perfect world, FQ wouldn't trigger at all because there would always be enough bandwidth for everything users wanted to do, but in the real world it seems like the best you can possibly do in the absence of trusted information about how to prioritize traffic. IMO, best to think of FQ as a last-ditch measure indicating to the operator that they're gonna need a bigger pipe than as a steady-state bandwidth allocator.

Kyle