[tsvwg] draft-white-tsvwg-nqb-02 comments

Sebastian Moeller <moeller0@gmx.de> Sat, 24 August 2019 22:36 UTC

To: tsvwg IETF list <tsvwg@ietf.org>, ECN-Sane <ecn-sane@lists.bufferbloat.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/YrevUcK83GIe8wZAZLifSu1Dp1o>

"Flow queueing (FQ) approaches (such as fq_codel [RFC8290]), on the other hand, achieve latency improvements by associating packets into "flow" queues and then prioritizing "sparse flows", i.e. packets that arrive to an empty flow queue. Flow queueing does not attempt to differentiate between flows on the basis of value (importance or latency-sensitivity), it simply gives preference to sparse flows, and tries to guarantee that the non-sparse flows all get an equal share of the remaining channel capacity and are interleaved with one another. As a result, FQ mechanisms could be considered more appropriate for unmanaged environments and general Internet traffic."

	[SM] An intermediate hop has no real handle on the "value" of a packet and hence cannot "differentiate between flows on the basis of value"; this is true for FQ and non-FQ approaches alike. What this section calls "preference to sparse flows" is essentially what NQB with queue protection does for NQB-marked packets: give them the benefit of the doubt until they exceed a variable sojourn/queueing threshold, after which they are treated less preferentially, either by no longer being treated as "sparse" or by being redirected into the QB queue. The difference is that FQ-based solutions immediately move all packets of such a non-sparse flow out of the way of qualifying sparse flows, while queue protection as described in the DOCSIS document only redirects newly arriving packets to the QB queue; all falsely classified packets already enqueued remain in, and "pollute", the NQB queue.
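	For reference, the "preference to sparse flows" mechanism being compared can be sketched as a toy model of the RFC 8290 scheduler (this is my simplification, not the actual fq_codel code: it ignores DRR quanta and CoDel dropping): a packet arriving to an empty flow queue puts that queue on a new-flows list that is served ahead of the old-flows list, and a queue that stays backlogged after service migrates to the old-flows list.

```python
from collections import deque

class FqSketch:
    """Toy sketch of fq_codel's sparse-flow preference (RFC 8290).
    Simplified: one packet per scheduler visit, no quantum/deficit,
    no CoDel AQM."""
    def __init__(self):
        self.queues = {}           # flow id -> deque of packets
        self.new_flows = deque()   # sparse flows, served first
        self.old_flows = deque()   # backlogged (non-sparse) flows

    def enqueue(self, flow, pkt):
        q = self.queues.setdefault(flow, deque())
        if not q:                  # arrived to an empty queue => sparse
            self.new_flows.append(flow)
        q.append(pkt)

    def dequeue(self):
        lst = self.new_flows if self.new_flows else self.old_flows
        if not lst:
            return None
        flow = lst.popleft()
        pkt = self.queues[flow].popleft()
        if self.queues[flow]:      # still backlogged: no longer sparse
            self.old_flows.append(flow)
        return pkt
```

	With one bulk flow backlogged, a newly arriving one-packet flow is served ahead of the bulk backlog, which is exactly the preferential treatment the draft describes.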

"Downsides to this approach include loss of low latency performance due to the possibility of hash collisions (where a sparse flow shares a queue with a bulk data flow), complexity in managing a large number of queues in certain implementations, and some undesirable effects of the Deficit Round Robin (DRR) scheduling."

	[SM] As described in DOCSIS-MULPIv3.1, the queue protection function only has a limited number of buckets; to my understanding, all flows beyond that number are accounted in the default bucket. This looks very close to the consequence of a hash collision, as it results in the same fate-sharing of nominally independent flows observed in FQ when hash collisions occur. While it is fair criticism that this failure mode exists, mentioning it only in the context of FQ seems sub-optimal, especially since DOCSIS queue protection is defined with only 32 non-default buckets, versus a default of 1024 flow queues for fq_codel.
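	To put rough numbers on that comparison (a back-of-the-envelope birthday-problem estimate, assuming independent uniform hashing, which is an idealization of both mechanisms):

```python
from math import prod

def p_no_collision(n_flows, n_buckets):
    """Probability that n_flows independently, uniformly hashed flows
    all land in distinct buckets (classic birthday-problem product)."""
    return prod((n_buckets - i) / n_buckets for i in range(n_flows))

# Illustrative comparison: 32 non-default buckets (DOCSIS queue
# protection) vs. 1024 flow queues (fq_codel default).
for n in (4, 8, 16):
    print(n, 1 - p_no_collision(n, 32), 1 - p_no_collision(n, 1024))
```

	Already with 8 concurrent flows, a collision (fate-sharing) is more likely than not with 32 buckets, while it stays a low-percentage event with 1024 queues.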

"The DRR scheduler enforces that each non-sparse flow gets an equal fraction of link bandwidth,"

	[SM] This is a feature, not a bug. It only triggers under load and yields behavior that endpoints can actually predict reasonably well. Any other kind of bandwidth sharing between flows is bound to have better best-case behavior, but also much worse worst-case behavior (up to near-complete starvation of some flows). In short, equal bandwidth under load seems far superior for forward progress than "anything goes", as it delivers something good enough without requiring an oracle and without regressing into starvation territory.
	Tangent: I have read Bob's justification for wanting inequality here, but will just mention that an intermediate hop simply cannot know or reasonably balance the importance of traversing flows (unless in a very controlled environment where all endpoints can be trusted to a) do the right thing and b) rank their bandwidth use by overall importance).

"In effect, the network element is making a decision as to what constitutes a flow, and then forcing all such flows to take equal bandwidth at every instant."

	[SM] This only holds under saturating conditions, and as argued above it seems a reasonable compromise that will be good enough. The intermediate hop has no reliable way of objectively ranking the relative importance of the concurrently active flows; without such a ranking, treating all flows equally is more cautious and conservative than allowing basically anything.
	The network element in front of the saturated link needs to make a decision (otherwise no AQM would be active), and it needs to "force" its view on the flows (which, by the way, is exactly the rationale for recommending queue protection). Also, "equal bandwidth at every instant" is simply wrong: as long as the link is not saturated this does not trigger at all, and no flow is "forced" to take more bandwidth than it requires. Let me try to describe how FQ behavior looks from the outside (this is a simplification and hence wrong, but hopefully less wrong than the simplification in the draft): under saturating conditions with N flows, all flows with rates below egress_rate/N keep sending at full blast, just as without saturation; the remaining bandwidth is then shared equally among the flows that send at higher rates. This hence does not result in equal rates for all flows at every instant.
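	The sharing behavior described above is max-min fairness, and can be computed explicitly (a water-filling sketch under the stated simplification; flow names and rates are made-up illustration values):

```python
def max_min_share(demands, capacity):
    """Water-filling computation of max-min fair rates: flows demanding
    less than the current equal share get their full demand; the
    leftover capacity is split equally among the remaining flows."""
    alloc = {}
    pending = dict(demands)
    remaining = capacity
    while pending:
        share = remaining / len(pending)
        satisfied = {f: d for f, d in pending.items() if d <= share}
        if not satisfied:
            # every remaining flow wants more than the equal share
            for f in pending:
                alloc[f] = share
            return alloc
        for f, d in satisfied.items():
            alloc[f] = d
            remaining -= d
            del pending[f]
    return alloc
```

	For example, with a 12 Mbit/s link, a 2 Mbit/s sparse flow and two greedy flows, the sparse flow keeps its full 2 Mbit/s and the greedy flows split the remaining 10 Mbit/s equally: the rates are equal only among the flows that actually saturate their share.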

"The Dual-queue approach defined in this document achieves the main benefit of fq_codel: latency improvement without value judgements, without the downsides."

	[SM] Well, that seems a rather subjective judgement, and also wrong, given that queue protection conceptually suffers from downsides similar to FQ "hash collisions" and lacks the clear and justifiable middle-of-the-road approach of giving equal bandwidth to all flows that can make use of it. That approach may not be the best possible bandwidth allotment, but it has the advantage of being guaranteed to work without requiring an oracle. The point is: unequal sharing is a "value judgement" just as much as equal sharing, so claiming the Dual-queue approach to be free of value judgements is simply wrong.

"The distinction between NQB flows and QB flows is similar to the distinction made between "sparse flow queues" and "non-sparse flow queues" in fq_codel. In fq_codel, a flow queue is considered sparse if it is drained completely by each packet transmission, and remains empty for at least one cycle of the round robin over the active flows (this is approximately equivalent to saying that it utilizes less than its fair share of capacity). While this definition is convenient to implement in fq_codel, it isn't the only useful definition of sparse flows."

	[SM] Have the fq_codel authors been asked whether the choice of this sparseness measure was made for convenience (only)?

Best Regards