[tsvwg] links to Canary methods for roll-out of new transport features

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Mon, 26 July 2021 22:59 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/g211gcfQBCB4GE87F3UCdMZWWsI>

So, there's been a change in the way people roll out new features, which
maybe we could say more about in the L4S OPS draft. What I write below
is not specific to L4S, and I'd really welcome others familiar with using
and evaluating such methods chiming in to say more, but anyway here is
a starter:

Canarying is a partial and time-limited deployment of a change in a
service/protocol and its evaluation as a part of the deployment. The
method is used throughout the roll-out and helps to decide whether or
not to continue with the roll-out. The part of the service that receives
the change is “the canary,” and the remainder of the service is “the
control.” The canary deployment is performed on a subset of the
network/users that is smaller than the control. Canarying is evaluated
as an A/B testing process, to check the impact of the (initial)
deployment.
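
To make the canary/control split concrete, here is a minimal sketch of
one common way to pick the canary: hash a stable identifier into a
bucket and compare it against the current canary percentage. This is
only an illustration, not something taken from any of the deployments
mentioned here; the function name, the feature label and the 10,000
bucket granularity are all assumptions of mine.

```python
import hashlib


def in_canary(unit_id: str, feature: str, canary_percent: float) -> bool:
    """Return True if this user/network falls in the canary group.

    Hypothetical helper: the same unit always gets the same answer for a
    given feature, and raising canary_percent only ever adds units.
    """
    digest = hashlib.sha256(f"{feature}:{unit_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # canary_percent is 0..100, so 1.0 selects buckets 0..99.
    return bucket < canary_percent * 100


# Example: split 10,000 made-up user IDs into a 1% canary and the control.
users = [f"user-{i}" for i in range(10_000)]
canary = [u for u in users if in_canary(u, "new-transport-feature", 1.0)]
control = [u for u in users if not in_canary(u, "new-transport-feature", 1.0)]
```

Hashing on the feature label as well as the identifier means different
features get different canary populations, so the same users are not
always the first to see every change.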

See this from Google:

https://sre.google/workbook/canarying-releases/

https://developer.android.com/distribute/best-practices/launch/test-tracks

When working with QUIC, people have released an update to only a
small subset of the user base, monitored stability or another metric of
interest, and then decided whether to roll out the update to more users,
to wait for more data to come in, or to halt the rollout altogether. A
halt might follow if one of the metrics you're monitoring is off, or if
you check the user reviews and see issues or complaints on a specific
topic. You don't need to enable a feature for any user/network that you
might expect to be hurt.
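
The expand/wait/halt choice described above could be sketched roughly
as follows. The thresholds, the stage percentages and all of the names
are hypothetical values chosen for illustration; a real roll-out would
pick its own metric and limits.

```python
from enum import Enum


class Decision(Enum):
    EXPAND = "expand"  # roll out to more users
    WAIT = "wait"      # keep the current share and collect more data
    HALT = "halt"      # stop the roll-out altogether


# Hypothetical roll-out stages, as percentages of the user base.
STAGES = (1, 5, 25, 100)


def decide(canary_failures, canary_total, control_failures, control_total,
           min_samples=1000, max_regression=0.10):
    """Compare a failure metric between the canary and the control."""
    if canary_total < min_samples or control_total < min_samples:
        return Decision.WAIT
    canary_rate = canary_failures / canary_total
    control_rate = control_failures / control_total
    # Halt if the canary is more than max_regression (relative) worse.
    if canary_rate > control_rate * (1 + max_regression):
        return Decision.HALT
    return Decision.EXPAND


def next_percent(current, decision):
    """Return the canary percentage to use after a decision."""
    if decision is Decision.HALT:
        return 0
    if decision is Decision.WAIT:
        return current
    larger = [s for s in STAGES if s > current]
    return larger[0] if larger else current
```

For example, decide(30, 2000, 200, 20000) compares a 1.5% canary
failure rate against a 1.0% control rate and returns Decision.HALT,
so next_percent(1, Decision.HALT) drops the canary share to 0.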

ECN isn't just "automatically" used; the app (or at least the
app-supplier) can decide, and this will always be the case for QUIC
anyway. The results of these tests provide the sort of data that has
informed QUIC (e.g. Chrome Canary), and I expect they are the basis of
what is reported by Google and others in MAPRG. The point is that this
allows statistical testing without massive impact, and an incremental
roll-out.
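
As one illustration of the kind of statistical testing this allows, a
two-proportion z-test can indicate whether a difference in some failure
rate between the canary and the control is larger than chance would
explain. This is an assumption about a reasonable test, not a
description of what any of the groups above actually use, and the
function name is made up.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for H0: both groups share the same failure rate.

    x1/n1 are failures/samples in the canary, x2/n2 in the control.
    Uses the pooled two-proportion z-test (normal approximation).
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no failures (or only failures) in both groups
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# 30 failures in 2,000 canary samples vs 200 in 20,000 control samples.
p = two_proportion_p_value(30, 2000, 200, 20000)
```

A small p-value suggests the canary really does differ from the
control. The normal approximation needs reasonably large counts in both
groups, which is one reason to wait for more data before deciding.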

This says something about Akamai's use:

https://www.akamai.com/uk/en/products/performance/cloudlets/phased-release.jsp

Cloudflare and others have used similar approaches:

https://medium.com/boozt-tech/canary-release-with-cloudflare-workers-84a9b45bac0f


Gorry