Re: [iccrg] IETF104 ICCRG TCP Prague presentation

Sebastian Moeller <moeller0@gmx.de> Thu, 04 April 2019 10:06 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <a7bf3c8d-bb9e-e7e8-3343-ccdbac795591@bobbriscoe.net>
Date: Thu, 04 Apr 2019 12:05:49 +0200
Cc: iccrg@irtf.org, tsvwg IETF list <tsvwg@ietf.org>
Message-Id: <BB2A0424-E208-40B1-86DD-E74F786A1BE1@gmx.de>
References: <623CA1C6-3425-46E6-AFD9-6FD7D0DBE422@gmx.de> <a7bf3c8d-bb9e-e7e8-3343-ccdbac795591@bobbriscoe.net>
To: Bob Briscoe <in@bobbriscoe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/-Ye5b7OnjSEwDPuNpNw_oOk9ZYA>
Subject: Re: [iccrg] IETF104 ICCRG TCP Prague presentation

Hi Bob,

thanks for your detailed response.


> On Apr 3, 2019, at 19:40, Bob Briscoe <in@bobbriscoe.net> wrote:
> 
> Sebastien, (adding tsvwg, given this is all part of the decision over L4S/SCE),
> 
> On 02/04/2019 12:38, Sebastian Moeller wrote:
>> I finally got around to watching the TCP Prague presentation. Nice; the part around slide 11, "Fallback to Reno-friendly on Classic ECN", especially caught my interest.
>> 
>> The claim made there is that L4S's inability to properly deduce the type of CE signal (normal ECN response expected versus DCTCP-like ECN response expected) would only matter in the case of ECN-marking single-queue AQMs and not for flow-queueing AQMs, and I would like to argue that this should be measured rather than solved by declaration.
>> 
> [BB] I don't know whether you just looked at my slides without the voice track. But the voice track makes it clear that is exactly what I've been setting up.

	I also listened to the video, but it seems I was confused (I am easily confused). So I wholeheartedly agree with this approach.

> 
> To save those who weren't in the room from having to sit through the audio, I said that there are two sets of contradictory measurements: 
> 	• those from 'academic' active measurement studies that have found hardly any CE
> 	• those from Apple's passive measurements on Apple devices that have found significant localized amounts of CE.
> My hypothesis is that the active measurements aren't seeing much CE, cos they don't generate enough traffic themselves to congest the link, and they would have to coincide with other traffic to see it passively.

	I also note that SQM, for example, defaults to ECN off on the path from the router towards the internet. The rationale is that, with the slow uplinks often seen at internet access links, the serialization delay of a single packet can already be painful. This rationale might be up for re-evaluation in the light of the newer graded responses to ECN marks at the core of the L4S/SCE proposals, but I digress. The consequence of this decision is that the likelihood of seeing ECN marks is reduced, making them harder to measure. If we had a significant number of sqm devices in the field, we might help by switching ECN on for the uplink as well, but we have no reliable way to estimate that number and no "telemetrics" from deployed instances.
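
As a rough illustration of the serialization-delay argument above, here is a minimal sketch. The 1500-byte packet size and the uplink rates are assumptions picked for illustration, not measurements, and the function name is hypothetical.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time needed to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


if __name__ == "__main__":
    # Assumed example uplink rates in kbit/s (illustrative only).
    for rate_kbps in (512, 1000, 10000):
        delay = serialization_delay_ms(1500, rate_kbps * 1000)
        print(f"{rate_kbps:>6} kbit/s: {delay:.2f} ms per 1500-byte packet")
```

On a slow uplink each extra full-size packet that is marked and forwarded rather than dropped costs a delay of this order for everything queued behind it.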

> 
> I said, in short, I believe the active measurements are more likely wrong, and passive more likely right.
> 
> Nonetheless the CE that Apple measured could be consistent with classic ECN marking from FQ-CoDel, e.g. in OpenWRT devices, which would protect competing flows from L4S flows.
> Apple saw CE "mainly in the uplink", which is likely to be FQ-CoDel. And the smaller amount seen in the downlink could have been peer-to-peer flows that had been marked on entry to the uplink by an FQ-CoDel home router.
> 
> With the help of others, I've devised a test for FQ vs FIFO, but we need to find where the CE-marking routers are before we apply the test (a scatter-gun approach would be infeasible given the need to congest the link). In discussions with Apple folks, we think it's best to run tests at the CDN end, to identify the network prefixes where CE is appearing (the client end used in the original tests usually doesn't know its public IP address).

	Sounds reasonable, apart from the potential unhappiness of ISPs about the congestion, but then this is similar to what a speedtest (or Netalyzr) does.

> 
> If we find any CE in the downlink from the CDN, that will disprove my peer-peer FQ hypothesis. But it won't prove that it's from a FIFO. The test for FQ vs FIFO is not straightforward, and I haven't yet worked out how we run it in the relevant direction over the paths where we find CE (not least 'cos that would involve collecting and identifying client IP addresses, which raises privacy concerns). I suspect we'll need to co-opt some volunteers in the affected localities, somehow.
> 
> An alternative to inline testing will be to find out whether the networks where CE is being seen have supplied their customers with routers with FQ-CoDel enabled. That won't prove conclusively that all the CE is coming from FQ-CoDel, but we can only go so far in trying to prove a negative.

	I wonder how you are reading out the CE marking at all. Is the idea to place measurement devices between the CDN end-hosts and the CPE and look at the IP headers, or is the plan to use a specific application?

> 
>> 
>> 
>> Consider the following case, which gets more and more common: traffic shaping for both ingress and egress somewhere on the end-customer side of an internet access link. For egress I agree with the rationale that flow-queueing/fair-queueing should deal with flows that appear unresponsive sooner or later (but I note that this will still increase the memory required for the queueing and should not be considered ideal behavior).
>> But for ingress shaping downstream of the "true" bottleneck (I am simplifying over some details here) this rationale falls apart: the unexpected packets that the L4S sender kept sending because it misinterpreted the CE mark will still hog valuable bandwidth of the internet access link, and hence fair queuing on the home end of the true bottleneck will not isolate that homenet from the L4S misunderstanding.
>> 
> [BB] For the avoidance of doubt, I think you're using 'egress' to mean [home -> Internet] (I use it for the other direction, but I can work with your terminology).

	Sorry, this really is not my field, so my nomenclature is probably idiosyncratic. But yes, I see this from the vantage point of the home network: upload traffic egresses from my network, while download traffic ingresses. From the other side of the access link, these directions can be defined the other way around.

> 
> In the [Internet -> home] direction, you are right that FQ on the home end would not isolate the access link from a misunderstanding of any CE marking. But I'm not suggesting it could or would.

	But that is the use case I am concerned about. For all the promise that (better) AQM at the head-end brings, it will IMHO not eradicate the use of "ingress" shaping on the customer side, as that is required to allow meaningful prioritization and to establish the bandwidth-sharing model for the home network. There, per-flow fairness is typically more desirable than no fairness, but per-internal-IP-address fairness might be even more desirable. Jonathan's cake actually offers that capability, but since this absolutely requires access to the internal IP addresses, it requires the AQM to live inside the NAT typical for IPv4 access links.
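
To make the difference between the two sharing models concrete, here is a toy sketch. It is not how cake implements host isolation; the host names, the flow counts, and both function names are hypothetical, and every flow is assumed to be backlogged.

```python
from collections import Counter


def per_flow_shares(flows):
    """Share of the link per host when every flow gets an equal slice.

    `flows` holds one entry per active flow, naming the internal host
    that owns it.
    """
    counts = Counter(flows)
    total = len(flows)
    return {host: n / total for host, n in counts.items()}


def per_host_shares(flows):
    """Share of the link per host when every internal host gets an equal slice."""
    hosts = set(flows)
    return {host: 1 / len(hosts) for host in hosts}


if __name__ == "__main__":
    # Assumed scenario: host A runs 9 bulk flows, host B runs a single flow.
    flows = ["A"] * 9 + ["B"]
    print("per-flow:", per_flow_shares(flows))
    print("per-host:", per_host_shares(flows))
```

Under per-flow fairness the host that opens more flows takes most of the link; per-host fairness needs the internal addresses to tell the hosts apart, which is why the shaper has to sit inside the NAT.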

> The question is about the machine at the Internet side of the access link (BRAS, CMTS, etc) that is queuing traffic into the access link towards the home. We can be nearly certain it is not doing FQ (these machines serve thousands of customers, so I don't believe any use FQ). The question is whether we can find one of these non-FQ machines that has an AQM enabled that does CE marking.
> 
> That's what we're trying to find. It's of course hard to prove a negative - that none exist. All we can hope to do is measure a sample of the paths we find.

	Okay, but any idea what to do about my hobbyhorse, short of completely disabling L4S for all nodes in the home network (which seems heavy-handed, given that L4S's goal of "low latency networking" is IMHO desirable for all end-users)? This kind of set-up is not as unique as it might seem, and users employing it are typically latency-sensitive and hence certainly part of the audience that might otherwise like the L4S approach.

> 
> 
> A nit: DCTCP is not /unresponsive/. It responds to congestion, but as if it is equivalent to a number of 'classic' (Reno-friendly) TCP flows.

	I agree, hence the term "appear unresponsive"; the problem here isn't DCTCP, but a misunderstanding between the AQM and the TCP involved.
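
The "appear unresponsive" point can be sketched numerically. This is a simplified model under stated assumptions, not TCP Prague's actual code: the Reno-style response halves the window once per round trip in which CE is seen (as for a loss, per RFC 3168), while the DCTCP-style response follows the RFC 8257 update with gain g = 1/16. The starting alpha of 0, the window of 100 packets, and the single mark per round trip are assumptions chosen for illustration.

```python
def reno_response(cwnd: float, ce_seen: bool) -> float:
    """Classic ECN response: halve the window if any CE mark was seen this RTT."""
    return cwnd / 2 if ce_seen else cwnd


def dctcp_alpha(alpha: float, marked_fraction: float, g: float = 1 / 16) -> float:
    """EWMA of the fraction of CE-marked packets (RFC 8257 style update)."""
    return (1 - g) * alpha + g * marked_fraction


def dctcp_response(cwnd: float, alpha: float) -> float:
    """Scalable response: reduce the window in proportion to alpha."""
    return cwnd * (1 - alpha / 2)


if __name__ == "__main__":
    cwnd = 100.0                 # assumed window, in packets
    marked_fraction = 1 / 100    # a classic AQM marks one packet in this RTT
    alpha = dctcp_alpha(0.0, marked_fraction)  # assumed starting alpha of 0
    print("Reno-style cwnd after the mark: ", reno_response(cwnd, True))
    print("DCTCP-style cwnd after the mark:", dctcp_response(cwnd, alpha))
```

A classic AQM that expects a halving for its one mark gets almost no reduction from the DCTCP-style sender in this model, so it keeps marking while the queue persists.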

> 
>> 
>> I would be delighted to learn where my assumptions are wrong...
>> 
> Regards

	Thanks for your kind and informative response.

Best Regards
	Sebastian

> 
> 
> 
> Bob
> 
>> 
>> Regards
>> 	Sebastian Moeller
>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/