Re: [PANRG] On the new text on ECN: draft-irtf-panrg-what-not-to-do-16

Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com> Thu, 11 March 2021 23:37 UTC

References: <d072512f-ed66-bb8b-7338-58ed720210a2@erg.abdn.ac.uk> <45a9076d-d573-b976-8465-3e731081169a@erg.abdn.ac.uk> <455c5465-7a2a-211e-a712-1c6b412c73b4@bobbriscoe.net>
In-Reply-To: <455c5465-7a2a-211e-a712-1c6b412c73b4@bobbriscoe.net>
From: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Date: Thu, 11 Mar 2021 17:37:11 -0600
Message-ID: <CAKKJt-fomvCaK+8LW=UmC2VcAXBD2KKhA_he6iys8OvcLoOZBA@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: panrg@irtf.org
Content-Type: multipart/alternative; boundary="000000000000c0c10905bd4b448e"
Archived-At: <https://mailarchive.ietf.org/arch/msg/panrg/VvUSzOKrHRiJ4t9kXC0QFV0LcsE>
Subject: Re: [PANRG] On the new text on ECN: draft-irtf-panrg-what-not-to-do-16

Thanks for continuing this discussion on the mailing list. If I might add
this ...

On Thu, Mar 11, 2021 at 11:13 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Gorry, and panrg,
>
> I'd agree with Gorry, about one chance per generation.
>

My understanding (and we're talking about this to understand it better) is
that there are two things in play with the early ECN deployment experiment.

Yes, routers crashing when they saw non-zero ECN bits happened in a device
which (at the time) wasn't likely to be updated or replaced for a number of
years (see "equipment generation"), but the second point (and I think it was
Kireeti who said this well) was that once all of the problematic equipment
had been updated or replaced, the people who turned ECN off didn't
immediately turn ECN back on (I remember the phrase "a bad taste in their
mouths" mentioned multiple times).

So, as Kireeti and others observed, generations between updates are getting
shorter, but it's not clear that the time people take to forgive and
forget past experiences is getting shorter in the same way. I'm remembering
the quote usually credited to Samuel Johnson, that a second marriage is
"the triumph of hope over experience" - I think we're talking about the
same thing as "second marriage".

Best,

Spencer


> I'd also say that there are two main prongs to the ECN story. Not just
> the "one chance" point.
> The other is "Outperforming End-to-end Protocol Mechanisms"
>
> When I was asked to write the business case for deploying ECN in BT, my
> colleagues in the tech strategy team pushed me on this point of
> outperforming e2e mechanisms. Jamal Hadi Salim's performance evaluation
> [RFC2884] showed ECN gave no performance benefit, except for short
> flows. I had to admit that e2e FEC for short flows would be more likely
> to win out. And at the time, I'd just found Damon Wischik's paper about
> how little the extra percentage of traffic volume would be, if every
> sender just duplicated the first few packets of each flow [Wischik08].
>
> So the business case flat-lined. End systems had long since worked out
> tonnes of loss-hiding tricks. There was far too great a risk that, by
> the time the 3-part deployment had got anywhere (sender, receiver and
> network), the problem would have been solved e2e, and the high-risk
> high-cost investment would all have been wasted.
>
> This was actually the start of my journey to realize that the "ECN =
> drop" rule was the problem, 'cos it disallowed the real benefit of ECN -
> to cut queuing delay, by providing a finer-grained signal than loss. I
> did some calculations to work out that the noise in the delay signal was
> too great to get queue delay down as low as would be achievable with ECN
> (which could use virtual queues once ISPs realized that bandwidth had
> become plentiful enough). That was actually when I started working on
> chirping (bringing in Mirja as a research fellow) and came to the
> conclusion that we could only get a better delay signal out of the noise
> by creating more noise with the chirps. That's when I realized the
> chirps should only be used at start-up, and ECN was the only way to keep
> queuing extremely low under load. Today you can see the limits of using
> e2e delay to reduce queuing in BBR, although of course BBR is unlikely
> to be the last word in e2e delay reduction.
>
> I know it's unlikely that every ISP went through such a rigorous
> exercise, but still none deployed it - probably intuitively reaching the
> same conclusion. The main reason was that the deployment pain wasn't
> worth the small performance gain, which is a TL;DR summary of all the
> above.
>
> The issue with Linksys home routers crashing wasn't a biggie in that
> assessment of ECN - there were few enough of that model around by then
> that it could be worked round with pre-deployment validation testing
> from the OS. This supports Gorry's "one chance per generation" point.
>
> Reference
> [Wischik08] Wischik, D. "Short Messages" Philosophical Transactions of
> the Royal Society A, 2008, 366, 1941-1953
>
>
> Bob
>
> On 11/03/2021 16:17, Gorry Fairhurst wrote:
> > Resend: On 11/03/2021 15:06, Gorry Fairhurst wrote:
> >> Hi,
> >>
> >> Two observations on ECN and what may have been learned:
> >>
> >> * I think the "one chance" is per one /Generation/ of equipment for
> >> hardware, possibly less time for software updates - but hard to
> >> eliminate deployed critical bugs completely.
> >>
> >> * "Measure widely before, but importantly also test gently during
> >> initial deployment", to limit the pain when there are issues and
> >> to provide a chance to refine the design if there is a problem.
> >>
> >> I like the additional thought from Eric (in the RG meeting) on
> >> mitigating user pain by using fallback techniques, so they do not
> >> share your pain when something happens to go wrong.
> >>
> >> ————
> >>
> >> Is this a typo?: /Cannot be recovered at TCP layer/
> >> - it seems like a partial sentence; does it need a /This.../?
> >>
> >> Gorry
> >>
> >> _______________________________________________
> >> Panrg mailing list
> >> Panrg@irtf.org
> >> https://www.irtf.org/mailman/listinfo/panrg
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
>
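
For anyone who wants to play with the arithmetic behind the duplication
point Bob cites from [Wischik08] above, here is a minimal sketch in Python.
It assumes the overhead is simply the number of duplicated packets divided
by the number of original packets. The function name, the parameter k and
the flow mix are made up for illustration; none of them come from the paper
or from this thread.

```python
def duplication_overhead(flow_sizes_pkts, k):
    """Fraction of extra packets if every flow duplicates its first k packets.

    flow_sizes_pkts: iterable of flow lengths in packets (hypothetical data).
    k: number of leading packets each sender duplicates (assumed policy).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    sizes = list(flow_sizes_pkts)
    total = sum(sizes)
    if total == 0:
        return 0.0
    # A flow shorter than k packets can only duplicate what it actually sends.
    extra = sum(min(k, n) for n in sizes)
    return extra / total


# Hypothetical mix: 90 short flows of 5 packets, 10 long flows of 1000 packets.
flows = [5] * 90 + [1000] * 10
overhead = duplication_overhead(flows, k=3)
print(f"extra traffic: {overhead:.2%}")
```

With a mix like this, where most of the volume sits in a few long flows,
duplicating the first few packets of every flow adds only a few percent to
the total, even though nearly all flows are short. The result depends
entirely on the flow-size distribution you feed in, so treat the numbers as
a toy, not a measurement.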