Re: [PANRG] On the new text on ECN: draft-irtf-panrg-what-not-to-do-16
Bob Briscoe <ietf@bobbriscoe.net> Fri, 12 March 2021 00:41 UTC
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Cc: panrg@irtf.org
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <16ffb0c2-2330-6770-c0ce-53abc082037c@bobbriscoe.net>
Date: Fri, 12 Mar 2021 00:41:27 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/panrg/fSopz-997Zvex6XNCN0ekrYoTvE>
Spencer,

The apocrypha in the draft needs to tell the whole story. The current
message is that everything was done right, except we got blind-sided by
a bug. No. ECN involved a /three/-part deployment: client, server /and/
bottleneck AQM.

The crashing home routers pushed back the client part, and the server
part as well, until servers finally realized they could enable ECN
without making anyone more vulnerable to the home router crashes.

But my email was about why the network part never got deployed (until
the more recent FQ-CoDel and CAKE deployments). Even once the bug was
out of the way, the deployment pain for networks wasn't worth the small
performance gain. In the language of the draft, ECN didn't "Outperform
End-to-end Protocol Mechanisms", which had had plenty of time to mask
losses in other ways before ECN arrived (all evidenced in my original
posting).

The final deployment of the network part in FQ-CoDel and CAKE is more
difficult to explain. I think that was down to the nature of the open
source communities that built them. A network operator would have
assessed whether ECN's performance gain was worth the cost of deployment
(as in the story in my original posting). But for the open source
community the ability to function efficiently was enough. It wasn't
driven by the more traditional cost-benefit analysis approach of
commercial operators.

In contrast, if you look at most machines at the broadband access
bottleneck from the major vendors (Nokia, Ericsson, Siemens, etc.), they
didn't implement AQM, let alone ECN, 'cos the operators they were
selling to tendered for QoS features, which they understood as something
the network alone provides, not something it helps end-systems to
provide for themselves (as in AQM & ECN).

Bob

On 11/03/2021 23:37, Spencer Dawkins at IETF wrote:
> Thanks for continuing this discussion on the mailing list. If I might
> add this ...
>
> On Thu, Mar 11, 2021 at 11:13 AM Bob Briscoe <ietf@bobbriscoe.net>
> wrote:
>
>     Gorry, and panrg,
>
>     I'd agree with Gorry, about one chance per generation.
>
> My understanding (and we're talking about this to understand it
> better) is that there are two things in play with the early ECN
> deployment experiment.
>
> Yes, routers crashing when they saw non-zero ECN bits was in a device
> which (at the time) wasn't likely to be updated or replaced for a
> number of years (see "equipment generation"), but the second point
> (and I think it was Kireeti who said this well) was that once you've
> updated or replaced all of the problematic equipment, the people who
> turned ECN off didn't immediately turn ECN back on (I remember the
> phrase "a bad taste in their mouths" mentioned multiple times).
>
> So, as Kireeti and others observed, generations between updates are
> getting shorter, but it's not clear that the ability of people to
> forgive and forget past experiences is getting shorter in the same
> way. I'm remembering the quote from Oscar Wilde, "Second marriage is
> the triumph of hope over experience" - I think we're talking about the
> same thing as "second marriage".
>
> Best,
>
> Spencer
>
>     I'd also say that there are two main prongs to the ECN story. Not
>     just the "one chance" point.
>     The other is "Outperforming End-to-end Protocol Mechanisms"
>
>     When I was asked to write the business case for deploying ECN in
>     BT, my colleagues in the tech strategy team pushed me on this
>     point of outperforming e2e mechanisms. Jamal Salim's performance
>     evaluation [RFC2884] showed ECN gave no performance benefit,
>     except for short flows. I had to admit that e2e FEC for short
>     flows would be more likely to win out. And at the time, I'd just
>     found Damon Wischik's paper about how little the extra percentage
>     of traffic volume would be, if every sender just duplicated the
>     first few packets of each flow [Wischik08].
>
>     So the business case flat-lined. End systems had long since worked
>     out tonnes of loss-hiding tricks. There was far too great a risk
>     that, by the time the 3-part deployment had got anywhere (sender,
>     receiver and network), the problem would have been solved e2e, and
>     the high-risk high-cost investment would all have been wasted.
>
>     This was actually the start of my journey to realize that the
>     "ECN = drop" rule was the problem, 'cos it disallowed the real
>     benefit of ECN - to cut queuing delay, by providing a
>     finer-grained signal than loss. I did some calculations to work
>     out that the noise in the delay signal was too great to get queue
>     delay down as low as would be achievable with ECN (which could use
>     virtual queues once ISPs realized that bandwidth had become
>     plentiful enough). That was actually when I started working on
>     chirping (bringing in Mirja as a research fellow) and came to the
>     conclusion that we could only get a better delay signal out of the
>     noise by creating more noise with the chirps. That's when I
>     realized the chirps should only be used at start-up, and ECN was
>     the only way to keep queueing extremely low under load. Today you
>     can see the limits of using e2e delay to reduce queuing in BBR,
>     although of course BBR is unlikely to be the last word in e2e
>     delay reduction.
>
>     I know it's unlikely that every ISP went through such a rigorous
>     exercise, but still none deployed it - probably intuitively
>     reaching the same conclusion. The main reason being the deployment
>     pain wasn't worth the small performance gain. Which is a TL;DR
>     summary of all the above.
>
>     The issue with Linksys home routers crashing wasn't a biggie in
>     that assessment of ECN - there were few enough of that model
>     around by then that it could be worked round with pre-deployment
>     validation testing from the OS. This supports Gorry's "one chance
>     per generation" point.
>
>     Reference
>     [Wischik08] Wischik, D. "Short Messages" Philosophical
>     Transactions of the Royal Society A, 2008, 366, 1941-1953
>
>     Bob
>
>     On 11/03/2021 16:17, Gorry Fairhurst wrote:
>     > Resend: On 11/03/2021 15:06, Gorry Fairhurst wrote:
>     >> Hi,
>     >>
>     >> Two observations on ECN in what may have been learned?
>     >>
>     >> * I think the "one chance" is per one /Generation/ of equipment
>     >> for hardware, possibly less time for software updates - but
>     >> hard to eliminate deployed critical bugs completely.
>     >>
>     >> * "Measure widely before, but important also gently test during
>     >> initial deployment", to avoid the pain when there are issues
>     >> and therefore to provide a chance to refine the design if there
>     >> is a problem, and reduce the pain of trying.
>     >>
>     >> I like the additional thought from Eric (in the RG meeting) on
>     >> mitigating user pain by using fallback techniques, so they do
>     >> not share your pain when something happens to go wrong.
>     >>
>     >> ----
>     >>
>     >> Is this a typo?: /Cannot be recovered at TCP layer/
>     >> - it seems like a partial sentence, does need a /This.../
>     >>
>     >> Gorry
>     >>
>     >> _______________________________________________
>     >> Panrg mailing list
>     >> Panrg@irtf.org
>     >> https://www.irtf.org/mailman/listinfo/panrg
>
>     --
>     ________________________________________________________________
>     Bob Briscoe http://bobbriscoe.net/

--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/