Re: [tsvwg] Another tunnel/VPN scenario (was RE: Reasons for WGLC/RFC asap)
Sebastian Moeller <moeller0@gmx.de> Thu, 10 December 2020 23:05 UTC
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <649D8E86-AFFE-4EF5-BA63-D7EE148F574C@cablelabs.com>
Date: Fri, 11 Dec 2020 00:05:08 +0100
Cc: Mirja Kuehlewind <mirja.kuehlewind=40ericsson.com@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Message-Id: <B1806726-FD85-44F2-B20F-800C8178E72B@gmx.de>
References: <MN2PR19MB4045A76BC832A078250E436483E00@MN2PR19MB4045.namprd19.prod.outlook.com> <HE1PR0701MB2876A45ED62F1174A2462FF3C2FF0@HE1PR0701MB2876.eurprd07.prod.outlook.com> <56178FE4-E6EA-4736-B77F-8E71915A171B@gmx.de> <0763351c-3ba0-2205-59eb-89a1aa74d303@bobbriscoe.net> <CC0517BE-2DFC-4425-AA0A-0E5AC4873942@gmx.de> <35560310-023f-93c5-0a3d-bd3d92447bcc@bobbriscoe.net> <b86e3a0d-3f09-b6f5-0e3b-0779b8684d4a@mti-systems.com> <7335DBFA-D255-43BE-8175-36AB231D101F@ifi.uio.no> <DA84354E-91EC-4211-98AD-83ED3594234A@gmail.com> <1AB2EA08-4494-4668-AD82-03AEBD266689@ifi.uio.no> <CC06401C-2345-4F68-96FA-B4A87C25064E@gmail.com> <24C55646-C786-4B55-BFEE-D30BBB4ED7C4@ifi.uio.no> <216A1CE6-C7ED-4ACB-9E8A-AB0CC0408712@ericsson.com> <E95EDB52-C753-46E8-9188-30E3952FB031@gmx.de> <CB6DBFBD-B65C-4EDE-92B9-F7E0FED7715A@cablelabs.com> <F746C72B-EEF1-493E-93BC-7E4731A00C20@gmx.de> <649D8E86-AFFE-4EF5-BA63-D7EE148F574C@cablelabs.com>
To: Greg White <g.white@CableLabs.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/8lXMZhuZiF2-IhxitnFW45EPdfo>
Hi Greg,

> On Dec 10, 2020, at 21:41, Greg White <g.white@CableLabs.com> wrote:
>
> I'm not sure why I take the bait on these...

[SM] You might not realize it, but instead of the "moral high ground" for which you might be aiming, statements like this reveal that your objectivity in this matter is not really that much different from how you probably perceive mine to be. I might not be the most civil discussion partner, but I can and will be convinced by facts and real data; by pure statements without supporting data, not so much.

> Sebastian,
>
> You asked for a realistic scenario where a dramatic reduction in latency variation matters,

[SM] Well, going from 5 ms to 1 ms is not dramatic, sorry. But other than that, I am happy to agree that latency and latency variation do matter. That is why I participated early on in the bufferbloat effort (in a very minor position, mainly testing and reporting) and why I have been operating a latency-reducing AQM on my home networks for the last 10 years.

> and I gave you one (well, two). Just because you don't see the value in improving latency and latency variation doesn't mean there is no value.

[SM] That is not what I argued. I argued that the additional latency variation reduction that L4S brings to the table is simply not worth the cost L4S incurs. That is different from claiming it has no value. If we lived in a pure over-buffered FIFO world with expected queueing delays in the multiple 100ms range, I would welcome L4S and what it offers, but that simply is not the reality anymore; we are talking about average queueing delays of ~5ms that L4S needs to compete with.

> There is a strong interest by many in this group (and the broader networking industry) in solutions to Internet latency / latency variation, and high-fidelity congestion signaling is a key and necessary component.

[SM] I keep repeating that the interest in HFCC seems to be the main reason why the idea of granting RFC status to the L4S drafts is still entertained, in spite of a) the almost complete lack of relevant testing data from team L4S (I include CableLabs here after revisiting the Low Latency DOCSIS specs) and b) the fact that the reference protocol is still in early development and admittedly not nearly finished (evident in Koen's recent proposal to change TCP Prague to switch to a CUBIC-like response for RTT >= 80ms, with that numeric cut-off RTT value apparently purely driven by the fact that Pete's tests use 80 ms as one of the test RTTs) and c) the fact that the reference AQM, unlike other AQMs, requires active participation of the unfinished transport protocols to achieve the rough equitable sharing the L4S drafts seem to promise, hinting that the coupling design might be terminally unsafe/flawed.

> Does it immediately solve all sources of latency? No. Is TCP Prague (or BBRv2, or SCReAM), as currently implemented, perfect? No.

[SM] Look Greg, the issue is not that the L4S transport(s) are not polished enough, but that e.g. TCP Prague clearly is not baked enough to be considered ready for deployment. Just look at its performance in the most likely bottleneck system a flow might encounter on the open internet, the dreaded FIFO:
https://camo.githubusercontent.com/0ca81a2fabe48e8fce0f98f8b8347c79d27340684fe0791a3ee6685cf4cdb02e/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f73312d6368617274732f727474666169725f63635f71646973635f31306d735f3136306d732e737667
Look how TCP Prague is clearly not useful in a world where it encounters a FIFO, as it severely throttles itself if it competes with CUBIC (independent of RTT). And neither QUIC, nor BBRv2, nor SCReAM actually attempts to implement any/all of L4S' protocol requirements, and hence none of them can even be realistically tested during the planned experiment.
Unless you ignore your own requirements and treat them to the LL queue without making sure they are compliant, but at that point you invalidate your argument that you somehow need RFC status before experimentation can start (because if you are willing to ignore these RFCs during the actual experiment, it seems unclear why you need RFC status at all).

> Is there an issue with RFC3168 interoperability? Some say yes, some say no.

[SM] Well, for anybody still in the "no" camp, please have a look at
https://camo.githubusercontent.com/42b2427b18a22c9c2f181942fa9fbd08e5b678606f8b3f9c75262e4dfd6d4f1c/687474703a2f2f7363652e646e736d67722e6e65742f726573756c74732f6c34732d323032302d31312d3131543132303030302d66696e616c2f6c34732d73362d726663333136382f6c34732d73362d726663333136382d6e732d7072616775652d76732d63756269632d6e6f65636e2d66715f636f64656c5f31715f2d35304d6269742d32306d735f7463705f64656c69766572795f776974685f7274742e737667
It is a clear fact that there are severe "RFC3168 interoperability issues", unless you want to use a different term for TCP Prague unfairly driving competing non-L4S flows into starvation. We could be of different opinion on how to evaluate that fact (at least as an intellectual exercise), but the interoperability issue is not a matter of opinion.

> But we've heard from a number of very experienced transport protocol and congestion control researchers who see L4S congestion signaling as a feasible path, and we've heard from a large number of network operators who also see it as a feasible path and are willing to make the investment in deploying it.

[SM] And all of the above have so far refrained from answering the explicit request I posed several times: that they show the data that convinced them that L4S is safe to deploy and will actually deliver on its promises over the existing internet.
Sure, in lieu of data we might resort to asking experts for predictions and opinions, but here we are talking about an issue that can and should be answered by experiments, data and facts.

> Also:
>
> 50ms is not the industry target for MTP latency for VR, it is generally 20ms or less.
> https://developer.oculus.com/learn/bp-rendering/

[SM] This is pretty thin on details or references, but since Carmack worked at Oculus, no wonder we see his 20ms number pop up again...

> https://www-file.huawei.com/-/media/CORPORATE/PDF/ilab/cloud_vr_oriented_bearer_network_white_paper_en_v2.pdf

[SM] Mmmh, that is pretty slick and interesting (also pretty short on real experimental data). Interestingly, they actually describe how the network should evolve to deliver immersive VR from the cloud in a way that might actually work reliably, as it aims at carving out reserved-rate channels to ultimately deliver sensor data to the cloud and cloud-rendered frames to the display with reliable real-time guarantees. IMHO maybe a bit over-engineered, but certainly something that can work robustly and reliably. But that is quite a different beast than L4S... So, while meh on the 20ms claim (no supporting data, just listed as a number), +1 for coming up with a reliable way to assure RT delivery of data that seems not to be based on wishful thinking.

> https://www.cs.unc.edu/~zhengf/ISMAR_2014_Latency.pdf

[SM] Believe it or not, I have actually cited a reference where the magic 20ms came from... But none of this is actual peer-reviewed psychophysics research, making the whole "20ms or less" a bit of a cargo cult thing... Now, there is actual research out there that confirms some of that (especially in the context of nausea induced by commercial/military flight simulators), but I will not waste time on doing somebody else's literature research.

> In isochronous applications, late packets generally *do* mean lost packets.

[SM] Well, sure, but VR rendering is not necessarily an isochronous application, unless you make it such, and for in-the-cloud rendering I am not sure that structuring your application with hard-realtime requirements is that great a design.

> Current VR gear runs at 90 fps, and cloud games run at up to 120 fps. A 100ms latency spike

[SM] See my link's data, where the peak delay in the high-priority tier one would use for real-time-ish data was 12.5ms (measured over all flows in that tier; not all flows will actually have encountered that level of delay), not 100ms, with average delay at 3.7ms. Why are we talking about a 100ms spike again?

> can mean 9-12 frame times with no data received, and then 9-12 frames arriving back-to-back. Adaptive dejitter buffers can reduce packet drop rate, but only at the expense of added (and variable) delay. Either way it is a degradation.

[SM] Sure, but again, if you need hard realtime guarantees, do not outsource your rendering to compute units connected over a best-effort "bus" (network). Really, that is a typical "doctor, it hurts when I do X" type of situation, with the obvious solution, "don't do X then" ;)

> To Wes's point, I'm going to try to leave it there on this thread rather than encouraging even more back and forth. Don't interpret the lack of further response as agreement.

[SM] ??? That ties in nicely with your patronizing introductory statements, but what it really shows is that you are out of real arguments. I note that you simply ignored the parts in my response where I supported my arguments with actual data (and kept using numbers)... We could have an actual discussion instead....

Best Regards
	Sebastian

> -Greg
>
> On 12/9/20, 4:08 PM, "Sebastian Moeller" <moeller0@gmx.de> wrote:
>
> Greg,
>
>> On Dec 9, 2020, at 22:56, Greg White <g.white@CableLabs.com> wrote:
>>
>> Sebastian,
>>
>> As usual, there was a lot of hyperbole and noise in your response, most of which I'll continue to ignore.
>> But, I will respond to this:
>
> [SM] I believe you might not have read that post; what in there is so hyperbolic that it offended your good taste? My predictions of how L4S is going to behave are all extrapolations from data (mostly Pete's, some from team L4S). If you believe these to be wrong, please show how and why.
>
>> [SM] Mirja, L4S really offers only very little advancement over the state-of-the-art AQMs; 5ms average queueing delay is current reality, and L4S' 1 ms (99.9 quantile) queueing delay really will not change much here. Yes, 1ms is smaller than 5ms, but please show a realistic scenario where that difference matters.
>>
>> This underscores a perennial misunderstanding of the benefits of L4S. It isn't about average latency. Current "state of the art" AQMs result in P99.99 latencies of 40-120 ms (depending on the traffic load),
>
> [SM] This is not what I see on my link, sorry. Here are the cake statistics from running 3 concurrent speedtests on three different devices (one of the speedtests exercised 8 bidirectional concurrent flows, each marked with one of the CS0 to CS7 diffserv codepoints to exercise all of cake's priority tins).
>
> qdisc cake 80df: dev pppoe-wan root refcnt 2 bandwidth 31Mbit diffserv3 dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms noatm overhead 50 mpu 88
>  Sent 1692154603 bytes 4053117 pkt (dropped 2367, overlimits 3194719 requeues 0)
>  backlog 38896b 28p requeues 0
>  memory used: 663232b of 4Mb
>  capacity estimate: 31Mbit
>  min/max network layer size: 28 / 1492
>  min/max overhead-adjusted size: 88 / 1542
>  average network hdr offset: 0
>
>                    Bulk  Best Effort        Voice
>  thresh        1937Kbit       31Mbit     7750Kbit
>  target           9.4ms        5.0ms        5.0ms
>  interval       104.4ms      100.0ms      100.0ms
>  pk_delay        48.8ms       25.0ms       12.5ms
>  av_delay         8.1ms        4.7ms        3.7ms
>  sp_delay         1.4ms        349us        311us
>  backlog          3036b       28348b        7512b
>  pkts             48019      3879642       127851
>  bytes         19893008   1594898953     80834953
>  way_inds             0        93129          167
>  way_miss             5        80699          678
>  way_cols             0            4            0
>  drops                0         2367            0
>  marks              262         1322          405
>  ack_drop             0            0            0
>  sp_flows             1            7            2
>  bk_flows             1            8            2
>  un_flows             0            0            0
>  max_len           1492         1492         1492
>  quantum            300          946          300
>
> qdisc ingress ffff: dev pppoe-wan parent ffff:fff1 ----------------
>  Sent 8865345923 bytes 7688386 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc cake 80e0: dev ifb4pppoe-wan root refcnt 2 bandwidth 95Mbit diffserv3 dual-dsthost nat nowash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 50 mpu 88
>  Sent 8862364695 bytes 7686372 pkt (dropped 1971, overlimits 11290382 requeues 0)
>  backlog 64156b 43p requeues 0
>  memory used: 567Kb of 4750000b
>  capacity estimate: 95Mbit
>  min/max network layer size: 28 / 1492
>  min/max overhead-adjusted size: 88 / 1542
>  average network hdr offset: 0
>
>                    Bulk  Best Effort        Voice
>  thresh        5937Kbit       95Mbit    23750Kbit
>  target           5.0ms        5.0ms        5.0ms
>  interval       100.0ms      100.0ms      100.0ms
>  pk_delay         2.0ms        9.2ms        102us
>  av_delay         1.1ms        4.5ms         15us
>  sp_delay           4us        522us         10us
>  backlog             0b       64156b           0b
>  pkts             71084      7607860         9442
>  bytes        102481496   8762307523       556904
>  way_inds             0       223028            0
>  way_miss           177        77358          264
>  way_cols             0            3            0
>  drops               18         1953            0
>  marks                0         1014            0
>  ack_drop             0            0            0
>  sp_flows             1           17            1
>  bk_flows             0            8            0
>  un_flows             0            0            0
>  max_len           1492         1492          638
>  quantum            300         1514          724
>
> Not only does the average delay (av_delay) in the non-Bulk tins (Bulk being cake's name for the scavenger/background class) stay below 5ms; even the peak delay (P100 in your nomenclature) stays well below your predicted 40 to 120ms. These statistics come from an uptime of 8 hours (my ISP reconnects my PPPoE session once per day, at which point the stats get cleared). Now, that is an anecdotal data point, but at least it is a real data point.
>
>> compared against ~6 ms at P99.99 for L4S.
>
> [SM] I accept your claim, but want to see real data with e.g. real speedtests as artificial load generators over the real existing internet, not simulated runs in a simulated network.
>
>> For a cloud-rendered game or VR application, P99.99 packet latency occurs on average once every ~4 seconds (of course, packet latency may not be iid, so in reality high latency events are unlikely to be quite that frequent).
>
> [SM] Well, cloud-rendered games are already a commercial reality, so the state of the art cannot be so terrible that people are not willing to play.
>
>> Since RTTs of greater than 20ms have been shown to cause nausea in VR applications, this is a realistic scenario where the difference *really* matters.
>>
>> Put another way, if, of that 20 ms motion-to-photon budget for VR, a generous 10 ms is allowed for queuing delay (leaving only 10 ms for motion capture, uplink propagation, rendering, compression, downlink propagation, decompression, display), current "state of the art" AQMs would result in 10-50% packet loss (due to late packet arrivals) as opposed to <0.001% with L4S.
>
> [SM] Greg, I believe you are trying to paraphrase John Carmack (see for example https://danluu.com/latency-mitigation/, would have been nice if you included a citation). There he also says "A total system latency of 50 milliseconds will feel responsive, but still subtly lagging."
> And that 50ms goal is well within reach with the state-of-the-art AQMs. But even 20 ms is within reach if the motion tracking is done predictively (think Kalman filters), in which case network delays can be speculated over.
>
> But the fact alone that we have to go that deep into the weeds of an obscure example like VR-with-network-rendering to find a single example where L4S's claimed (not realistically proven yet) low queueing delay might be relevant is IMHO telling. Also, do you have data showing that your claim "current "state of the art" AQMs would result in 10-50% packet loss (due to late packet arrivals)* as opposed to <0.001% with L4S" is actually true for the existing L4S AQM and transport protocols? To me that reads like a best-case prediction and not like hard data.
> One could call framing predictions as if they were data "noise" and "hyperbolic", if one were so inclined... But humor me, and show that these are numbers from real tests over the real internet...
>
> IMHO the fact that the best you can offer is this rather contrived VR-in-the-cloud example** basically demonstrates my point that L4S offers only very little over the state of the art, but does so at a considerable cost.
>
> Best Regards
> 	Sebastian
>
> *) That is not how packet loss works. A late packet is not a lost packet, and in this example displaying a frame that is 20+X ms in the past is better than not changing the frame at all.
>
> **) Which has a very simple solution: don't render VR via the cloud. The 10ms of processing time won can be put to good use, e.g. for taking later motion data into account for tighter integration between somatosensation and visual inputs... I understand why you resort to this example, but really, this is simply a problem where cutting out the network completely is the best solution, at least for motion sensor integration....
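
The comparison made above between cake's per-tin delays and the predicted 40 to 120 ms can be checked mechanically. The following is only a minimal sketch in Python: the helper names (to_ms, parse_delays) are hypothetical, the embedded text is a three-row excerpt of the egress table quoted above, and the 40 ms threshold is simply the lower end of the range claimed for current AQMs.

```python
import re

# Excerpt of the egress per-tin table quoted above (delay rows only).
CAKE_TINS = """\
                  Bulk  Best Effort        Voice
  pk_delay      48.8ms       25.0ms       12.5ms
  av_delay       8.1ms        4.7ms        3.7ms
  sp_delay       1.4ms        349us        311us
"""

# Multipliers that convert a tc-style unit suffix to milliseconds.
_UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(token):
    """Convert a duration such as '349us' or '12.5ms' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms|s)", token)
    if match is None:
        raise ValueError(f"not a duration: {token!r}")
    return float(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def parse_delays(text, tins=("Bulk", "Best Effort", "Voice")):
    """Return {row_name: {tin_name: delay_in_ms}} for the *_delay rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has one label plus one value per tin.
        if len(fields) == len(tins) + 1 and fields[0].endswith("_delay"):
            rows[fields[0]] = {t: to_ms(v) for t, v in zip(tins, fields[1:])}
    return rows


delays = parse_delays(CAKE_TINS)
for tin in ("Best Effort", "Voice"):
    peak = delays["pk_delay"][tin]
    avg = delays["av_delay"][tin]
    print(f"{tin}: peak {peak} ms, average {avg} ms, peak < 40 ms: {peak < 40.0}")
```

Note that pk_delay is used here as reported by cake; the sketch says nothing about how cake computes it, and a single 8-hour sample remains an anecdotal data point.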
>
>> -Greg
>>
>> On 12/4/20, 6:38 AM, "tsvwg on behalf of Sebastian Moeller" <tsvwg-bounces@ietf.org on behalf of moeller0@gmx.de> wrote:
>>
>> Hi Mirja,
>>
>> more below in-line, prefixed with [SM].
>>
>>> On Dec 4, 2020, at 13:32, Mirja Kuehlewind <mirja.kuehlewind=40ericsson.com@dmarc.ietf.org> wrote:
>>>
>>> Hi all,
>>>
>>> to add one more option.
>>>
>>> To be clear, any issues discussed only occur if RFC3168-AQMs (without FQ) are deployed in the network (no matter if any traffic is actually RFC3168-ECN enabled or not).
>>>
>>> My understanding of the current deployment status is that only ECN-AQMs with FQ support are deployed today and I don't think we should put a lot of effort in a complicated RFC3168-AQM detection mechanism which might negatively impact the L4S experiment if we have no evidence that these queues are actually deployed.
>>
>> [SM] So you seem to reject the tunnel argument. Could you please elaborate why tunnels seem ignorable in this context, but are a big argument for re-defining CE? These two positions seem logically hard to bring into alignment.
>>
>>> Further I would like to note that RFC3168 AQMs are already negatively impacting non ECN traffic and advantaging ECN traffic.
>>
>> [SM] You must mean that rfc3168-enabled flows do not suffer the otherwise required retransmission after a drop and get slightly faster congestion feedback? That is a rather small benefit over non-ECN flows, but sure, there is a rationale for ECN usage.
>>
>>> However, I think it's actually a feature of L4S to provide better performance than non-ECN and therefore providing a deployment incentive for L4S,
>>
>> [SM] There is a difference between performing better by doing something better and "making the existing traffic artificially slower", but that is what L4S does (it works on both ends), stacking the deck against non-L4S traffic.
>>
>>> as long as non-L4S is not starved entirely.
>>
>> [SM] Please define what exactly you consider "starved entirely" to mean, otherwise this is not helpful.
>>
>>> We really, really must stop talking about fairness as equal flow sharing.
>>
>> [SM] Yes, the paper about the "harm" framework that Wes posted earlier seems to be a much better basis here than a simplistic "all flows need to be strictly equal" strawman criterion.
>>
>>> That is not the reality today (e.g. video traffic can take larger shares on low bandwidth links and that keeps it working)
>>
>> [SM] You wish. Please talk to end users who want to use concurrent video streaming and jitter-sensitive on-line gaming over their internet access link at the same time, and ask about the heroic measures they are willing to take to make this work. It is NOT working as desired out of the box, in spite of DASH video being adaptive and games requiring very little traffic in comparison. The solution would be to switch video over to a CBR type of streaming instead of the current bursty video delivery (but that is unlikely to change).
>>
>> IMHO L4S will not change much here because a) it still aims to offer rough per-flow fairness (at least for flows with similar network paths) and b) the real solution is controlled/targeted un-fairness where a low latency channel is carved out that works in spite of other flows not cooperating (L4S requires implicit coordination/cooperation of all flows to achieve its means, which is, to stay civil, optimistic).
>>
>>> and it is not desired at all because not all flows are equal!!!
>>
>> [SM] Now, if only we had a reliable and robust way to rank flows by importance that is actually available at the bottleneck link, we would be set. No amount of exclamation marks is going to solve that problem: the importance of flows is a badly defined problem. If our flows cross on, say, our shared ISP's transit uplink, which is more important? And who should be the arbiter of importance: you, me, the ISP, the upstream ISP?
>> That gets complicated quickly, no?
>>
>>> The endpoints know what is required to make their application work and as long as there is a circuit breaker that avoids complete starving or collapse, the evolution of the Internet depends on this control in the endpoint and future applications that potentially have very different requirements. Enforcing equal sharing in the network hinders this evolution.
>>
>> [SM] Well, the arguments for equal-ish sharing are:
>> a) simple to understand, predict and measure/debug (also conceptually easy to implement).
>> b) avoids starvation as best as possible, as evenly as possible
>> c) is rarely pessimal (and almost never optimal), often "good enough".
>>
>> Compare this with your proposed "anything goes" approach (which does not reflect the current internet, which seems to be mostly rough equitable sharing in nature):
>> a) extremely hard to make predictions unless the end point controls all flows over the bottleneck
>> b) has no inherent measures against starvation
>> c) has the potential to be optimal, but that requires a method to rate the relative importance/value of each packet, which rarely exists at the points of congestion.
>>
>> How should the routers at a peering point between two ASes know which of the flows in my network I value most? Simply put, they can't, and hence will not come up with the theoretically optimal sharing behavior. I really see no evolutionary argument for "anything goes" here.
>>
>>> I also would like to note that L4S is not only about lower latency. Latency is the huge problem we have in the Internet today because the current network was optimized for high bandwidth applications, however, many of the critical things we do on the Internet today are actually more sensitive to latency. This problem is still not fully solved, even though smaller queues and AQM deployments are a good step in the right direction.
>>
>> [SM] Mirja, L4S really offers only very little advancement over the state-of-the-art AQMs; 5ms average queueing delay is current reality, and L4S' 1 ms (99.9 quantile) queueing delay really will not change much here. Yes, 1ms is smaller than 5ms, but please show a realistic scenario where that difference matters.
>>
>>> L4S goes even further and the point is not only about reducing latency but to enable the deployment of a completely new congestion control regime which takes into account all the lessons learnt from e.g. data center deployment where we do not have to be bound by today's limitation of "old" congestion controls and co-existence.
>>
>> [SM] I do smell second-system syndrome here. Instead of aiming for a revolution, how about evolving the existing CCs instead? The current attempts at making DCTCP fit for the wider internet in the guise of TCP Prague are quite disappointing in what they actually deliver. To be blunt, TCP Prague demonstrates quite well that the initial assumption, that DCTCP would work well over the internet if only it was safe to do so, was simply wrong. The long initial ramp-up time and the massively increased RTT-bias, as well as the failure to compete well with CUBIC flows in FIFO bottlenecks, are clear indicators that a new L4S reference transport protocol needs to be developed.
>>
>>> L4S is exactly a way to transition to this new regime without starving "old" traffic but there also need to be incentives to actually move to the new world. That's what I would like to see and why I'm excited about L4S.
>>
>> [SM] That is a procedural argument that seems to take L4S's promises at face value, while ignoring all the data that demonstrate L4S still has a long way to go to actually deliver on its promises.
>> I also do not believe it to be an acceptable way to create incentives by essentially making existing transport protocols perform worse (over L4S-controlled bottlenecks). But that is what L4S does.
>>
>> Best Regards
>> 	Sebastian
>>
>>> Mirja
>>>
>>> On 04.12.20, 12:49, "tsvwg on behalf of Michael Welzl" <tsvwg-bounces@ietf.org on behalf of michawe@ifi.uio.no> wrote:
>>>
>>>> On Dec 4, 2020, at 12:45 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>>
>>>>> On 4 Dec, 2020, at 1:33 pm, Michael Welzl <michawe@ifi.uio.no> wrote:
>>>>>
>>>>> Right; bad! But the inherent problem is the same: TCP Prague’s inability to detect the 3168-marking AQM algorithm. I thought that a mechanism was added, and then there were discussions of having it or not having it? Sorry, I didn’t follow this closely enough.
>>>>
>>>> Right, there was a heuristic added to TCP Prague to (attempt to) detect if the bottleneck was RFC-3168 or L4S. In the datasets from around March this year, we showed that it didn't work reliably, with both false-positive and false-negative results in a variety of reasonably common scenarios. This led to both the L4S "benefits" being disabled, and a continuation of the harm to conventional flows, depending on which way the failure went.
>>>>
>>>> The code is still there but has been disabled by default, so we're effectively back to not having it. That is reflected in our latest test data.
>>>>
>>>> I believe the current proposals from L4S are:
>>>>
>>>> 1: Use the heuristic data in manual network-operations interventions, not automatically.
>>>>
>>>> 2: Have TCP Prague treat longer-RTT paths as RFC-3168 but shorter ones as L4S. I assume, charitably, that this would be accompanied by a change in ECT codepoint at origin.
>>>>
>>>> Those proposals do not seem very convincing to me, but I am just one voice in this WG.
>>>
>>> Yeah, so I have added my voice for this particular issue.
>>>
>>> Cheers,
>>> Michael
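
Two numeric claims traded in this thread are plain arithmetic and can be reproduced: "9-12 frame times" for a 100 ms spike at 90 to 120 fps, and a P99.99 event "once every ~4 seconds". The sketch below is in Python with hypothetical function names. The packet rate is not stated anywhere in the thread; 2500 packets/s is an assumption chosen only because it reproduces the ~4 s figure, and the second function assumes independent packets, which the quoted text itself flags as unlikely to hold.

```python
def frames_in_spike(spike_ms, fps):
    """Number of frame intervals that elapse during a latency spike."""
    return spike_ms * fps / 1000.0


def mean_gap_seconds(one_in_n, packets_per_second):
    """Mean time between tail events when 1 packet in `one_in_n` is affected.

    Assumes packets are independent; correlated (bursty) delay would make
    the events cluster instead of being evenly spread.
    """
    return one_in_n / packets_per_second


# A 100 ms spike at the frame rates mentioned in the thread.
for fps in (90, 120):
    print(f"{fps} fps: {frames_in_spike(100, fps)} frames per 100 ms spike")

# P99.99 means 1 packet in 10000 exceeds the quoted latency.
ASSUMED_PPS = 2500  # hypothetical packet rate, not taken from the thread
print(f"mean gap: {mean_gap_seconds(10_000, ASSUMED_PPS)} s")

# The same arithmetic for the 12.5 ms peak delay reported for the Voice tin.
print(f"90 fps: {frames_in_spike(12.5, 90)} frames per 12.5 ms")
```

With these inputs, a 12.5 ms peak delay spans a little over one frame interval at 90 fps, whereas a 100 ms spike spans nine; which of the two delay figures is representative is exactly what the thread disputes.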