Re: [tsvwg] links to Canary methods for roll-out of new transport features
Martin Duke <martin.h.duke@gmail.com> Fri, 27 August 2021 20:53 UTC
From: Martin Duke <martin.h.duke@gmail.com>
Date: Fri, 27 Aug 2021 13:52:59 -0700
Message-ID: <CAM4esxS_bv-NW53mv1D=-dSVzUPyqUBEuUvBkszqdiVM9q6o9w@mail.gmail.com>
To: "Holland, Jake" <jholland@akamai.com>
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Jonathan Morton <chromatix99@gmail.com>, tsvwg <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/jan2zsWH4gyVIKdd95qlr3vKD7Q>
Hi Jake,

I don't think I was clear enough. The assumption in my proposal is that in a large deployment these things will average out, so there is no intent to create additional connections solely to detect self-harm. Large deployments will likely have multiple connections to each endpoint and even larger pools of connections through important bottlenecks. I therefore assert that if a provider:

(1) establishes a Not-ECT and ECT(0) baseline;
(2) randomly turns on ECT(1) for 1/3 (?) of flows; and
(3) sees both Not-ECT and ECT(0) meet some standard of non-degradation/non-starvation in the aggregate,

then that is strong evidence for the hypothesis that the problematic 3168 queueing is not operating on the internet.

As for including both Not-ECT and ECT(0) in the baseline, I seem to recall someone making separate claims about the impact of L4S on each; if that's inaccurate, I have no objection to collapsing the control into one or the other.

I hope that's clearer.

Meanwhile, I am not crystal-clear as to whether the proponents intend to deploy L4S endpoints with Prague, something else that meets the Prague requirements, or nothing at all.

Warmly,
Martin

On Sat, Aug 21, 2021 at 3:39 PM Holland, Jake <jholland@akamai.com> wrote:
> Hi Martin,
>
> From: Martin Duke <martin.h.duke@gmail.com>
> Date: Fri, 2021-08-13 at 10:14 AM
>
> > The "control" part of the canary experiment is how you identify collateral damage. The L4S hypotheses, informally, are that
> >
> > (1) L4S flows will experience lower latency, etc.
> > (2) Not-ECT flows, and 3168 flows that traverse most queueing configurations, will not suffer significant degradation.
> > (3) 3168 flows will only suffer starvation in queue configurations that are not deployed at scale in the internet.
> >
> > So the proper design of a canary deployment, IMO, is in two steps:
> > (a) Have servers choose between Not-ECT and ECT(0) to establish a baseline.
> > (b) Have servers choose between Not-ECT, ECT(0), and ECT(1) and compare the results.
>
> I don't think I understand this suggestion.
>
> I’m not sure if you’re saying that when TCP Prague is turned on, the server would endeavor to actively send simultaneous competing non-L4S traffic on secondary flows to the same receiver so it can detect whether there’s impairment in the secondary flows, as in the “Out of Band Active Detection” case in the Feb 2021 detection paper (https://arxiv.org/pdf/1911.00710.pdf). If so, this is somewhat more complex than most canary systems: from the sender you don’t generally have a way to open a second connection correlated with the first (e.g., when a client is making web requests to a load-balanced service), so it would need a lot of extra cross-flow coordination features in the server compared to a usual canary test.
>
> But I also don’t see how selecting between ECT(0) and Not-ECT when establishing a baseline is helpful here. Not-ECT flows will get starved the same as ECT(0) flows if they’re traversing a shared marking classic queue in competition with TCP Prague traffic, whereas the suggestion makes it seem like the selection between ECT(0) and Not-ECT is meant to do something helpful toward noticing problems. (Possible source of confusion: by "3168 flows" in (3), did you mean ECT(0) flows, thinking only those would be impacted? Or did you mean classic traffic in general across a 3168 marking queue?)
>
> This leads me to believe that either I don’t understand the intended suggestion, or it doesn’t account for some of the complicating factors discussed earlier in the thread.
>
> While the “self-harm” case should in principle be noticeable when you do manage to get simultaneous flows for your Prague and classic traffic across the same marking bottleneck, the “being harmed by others” case would actually be worse than invisible to the canary: it would corrupt the baseline data and tend to make even the self-harm detection cases look like noise. If you had some genuine problem paths with shared marking queues, you’d get results that look like “yes, ECT(0) and Not-ECT get impaired for this receiver when competing with our Prague flows, but they also get impaired randomly even when we’re not sending competing Prague traffic” (for instance, when someone else is sending Prague traffic or unresponsive traffic).
>
> For these reasons (and also the reasons Jonathan outlined in response to Gorry's question about the difference between this and other CC-related work), I don't see how the canary strategy can be used effectively here, so to me the detection strategies are a better fit for the problem (though they do have some challenges of their own, as previously discussed).
>
> That said, canary testing would of course still be useful for detecting implementation flaws where L4S/Prague flows systematically suffer unexpected impairment, and I expect it would be used as a matter of course. I'm just saying it would not help with the safety concern we've been discussing, detecting harm to other people's traffic, since I don't think the deployers would be able to see the collateral damage they're causing.
>
> Best regards,
> Jake
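For readers following the thread, the three-step canary comparison Martin proposes (baseline with Not-ECT and ECT(0) only, then randomly enabling ECT(1) for a fraction of flows, then checking the two control groups in aggregate) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not anything the participants implemented: the per-flow metric (median throughput), the 10% tolerance standing in for "some standard of non-degradation/non-starvation", and every name (FlowResult, assign_codepoint, median_by_codepoint, non_degraded) are hypothetical.

```python
import random
import statistics
from dataclasses import dataclass

# Per-flow codepoint labels (illustrative strings; nothing is marked on the wire).
NOT_ECT, ECT0, ECT1 = "Not-ECT", "ECT(0)", "ECT(1)"


@dataclass
class FlowResult:
    """Outcome of one completed flow; the choice of metric is an assumption."""
    codepoint: str
    throughput_mbps: float


def assign_codepoint(rng: random.Random, ect1_fraction: float = 1 / 3) -> str:
    """Pick a codepoint for a new flow.

    ect1_fraction=0.0 gives the baseline phase (Not-ECT vs ECT(0) only);
    the default 1/3 mirrors the tentative "1/3 (?)" figure in the proposal.
    Flows not given ECT(1) are split evenly between the two control groups.
    """
    if rng.random() < ect1_fraction:
        return ECT1
    return NOT_ECT if rng.random() < 0.5 else ECT0


def median_by_codepoint(flows: list[FlowResult]) -> dict[str, float]:
    """Aggregate the per-flow metric into one median per codepoint group."""
    groups: dict[str, list[float]] = {}
    for flow in flows:
        groups.setdefault(flow.codepoint, []).append(flow.throughput_mbps)
    return {cp: statistics.median(vals) for cp, vals in groups.items()}


def non_degraded(baseline: list[FlowResult],
                 experiment: list[FlowResult],
                 tolerance: float = 0.10) -> dict[str, bool]:
    """Check each control group (Not-ECT, ECT(0)) against its baseline.

    A group passes if its median during the ECT(1) phase is at least
    (1 - tolerance) times its baseline median. The 10% tolerance is a
    hypothetical stand-in for "some standard of non-degradation".
    """
    base = median_by_codepoint(baseline)
    exp = median_by_codepoint(experiment)
    return {cp: exp[cp] >= (1 - tolerance) * base[cp] for cp in (NOT_ECT, ECT0)}


if __name__ == "__main__":
    baseline = [FlowResult(NOT_ECT, 10.0), FlowResult(NOT_ECT, 12.0),
                FlowResult(ECT0, 10.0), FlowResult(ECT0, 8.0)]
    experiment = [FlowResult(NOT_ECT, 11.0), FlowResult(ECT0, 5.0),
                  FlowResult(ECT1, 20.0)]
    print(non_degraded(baseline, experiment))
    # prints {'Not-ECT': True, 'ECT(0)': False}
```

Note that, as Jake argues, a check like this only observes the deploying provider's own Not-ECT and ECT(0) flows. Impairment caused by other senders' Prague or unresponsive traffic would also appear in the baseline, so the sketch does not by itself detect collateral damage to third-party traffic.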