Re: [tsvwg] links to Canary methods for roll-out of new transport features

"Holland, Jake" <jholland@akamai.com> Fri, 30 July 2021 02:02 UTC

From: "Holland, Jake" <jholland@akamai.com>
To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Date: Fri, 30 Jul 2021 02:00:36 +0000
Message-ID: <AF731D2C-B796-4B20-973D-6DB496DB1228@akamai.com>
References: <09ae8d52-da97-8226-19b2-80e8fe03cfcc@erg.abdn.ac.uk> <0f72bd2d-a758-befc-02c7-6bb14d4269a2@erg.abdn.ac.uk>
In-Reply-To: <0f72bd2d-a758-befc-02c7-6bb14d4269a2@erg.abdn.ac.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/UdKAJpv3ho8vEl0upWarG4TnQsQ>

Hi Gorry,

A canary approach works very well for features whose problems show up
on the connections that are using the new feature.

However, it works less well for features that cause problems with
competing traffic, as is the case for TCP Prague flows traversing
a classic queue.

The issue is that if the new feature (such as a TCP Prague
endpoint) is causing a problem for competing flows, possibly flows
from a different operator, the TCP Prague sender sees no indication
of the problem, except perhaps through service calls, if someone
troubleshooting the impacted services manages to trace the issue
back to its source.  From the experimental endpoint's point of view,
performance looks great.  (Higher throughput than usual!  Not much
loss!)

So it's somewhat more challenging to build a canary that can detect
problems in the competing traffic it impacts.  This is in contrast to
the usual use of canaries for most services, which alert when the
experimental service or upgrade introduces a regression for that
same service.
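A minimal sketch of the point above, in Python. It assumes, hypothetically,
that some vantage point (a test bed, or a cooperating operator) can measure
the throughput of competing flows that share a bottleneck with canary and
control flows; getting that measurement is exactly the hard part described
above. The metric names, the 10% tolerance and the sample numbers are
illustrative only. A check that looks only at the canary's own flows passes,
while one that also compares competing flows flags the harm.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class ArmMetrics:
    # Hypothetical per-flow throughput samples (Mbit/s) for one arm.
    own_throughput: list[float]        # flows in this arm (feature on or off)
    competing_throughput: list[float]  # other flows sharing the same bottlenecks


def regressed(canary: list[float], control: list[float], tolerance: float) -> bool:
    """True if the canary mean is more than `tolerance` (a fraction) below the control mean."""
    return mean(canary) < (1.0 - tolerance) * mean(control)


def canary_alerts(canary: ArmMetrics, control: ArmMetrics,
                  tolerance: float = 0.1, include_competing: bool = True) -> list[str]:
    alerts = []
    # The usual canary check: did the feature hurt the flows that use it?
    if regressed(canary.own_throughput, control.own_throughput, tolerance):
        alerts.append("own-flow regression")
    # The extra check: did the feature hurt flows that merely compete with it?
    if include_competing and regressed(canary.competing_throughput,
                                       control.competing_throughput, tolerance):
        alerts.append("competing-flow regression")
    return alerts


# Illustrative numbers only: canary flows do better, competing flows do worse.
canary = ArmMetrics(own_throughput=[12.0, 11.0, 13.0], competing_throughput=[6.0, 5.0, 7.0])
control = ArmMetrics(own_throughput=[10.0, 10.0, 10.0], competing_throughput=[10.0, 9.0, 11.0])

print(canary_alerts(canary, control, include_competing=False))  # []
print(canary_alerts(canary, control))  # ['competing-flow regression']
```

The own-flow check alone would wave this canary through, which is the
pathology described above.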

This problem is closely related to the difficulty of detecting a
classic queue, and in most cases would suffer from the same pathologies.

This is covered to some extent in the first resource you linked,
under "Requirements of a Canary Process", and again under "Metrics
Should Indicate Problems".

I hope that's helpful.

Best regards,
Jake


On 07-26, 3:59 PM, "Gorry Fairhurst" <gorry@erg.abdn.ac.uk> wrote:

So, there's been a change in the way people roll out new features
that maybe we could say more about in the L4S OPS draft. What I write
below is not specific to L4S, and I'd really welcome others familiar
with using and evaluating such methods to chime in and say more, but
anyway here is a starter:

Canarying is a partial and time-limited deployment of a change in a
service/protocol, together with its evaluation as part of the
deployment. The method is used throughout the roll-out and helps to
decide whether or not to continue with the roll-out. The part of the
service that receives the change is “the canary,” and the remainder of
the service is “the control.” The canary is deployed on a small subset
of the network/users, smaller than the control. The canary is
evaluated as an A/B test against the control, to check the impact of
the (initial) deployment.
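One common way to pick the canary subset is deterministic hashing, so that
each user or network stays in the same arm across sessions and widening the
roll-out fraction only adds members. This Python sketch assumes string IDs
and a hypothetical per-feature salt; it also skips any ID on an exclusion
list, reflecting the point below that a feature need not be enabled for
networks expected to be hurt. It is an illustration, not how any particular
deployment assigns users.

```python
from __future__ import annotations

import hashlib


def canary_bucket(member_id: str, salt: str) -> float:
    """Map a user or network ID to a stable pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode("utf-8")).digest()
    # Keep 53 bits so the result is exact as a float and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def in_canary(member_id: str, fraction: float,
              salt: str = "hypothetical-feature-salt",
              excluded: frozenset[str] = frozenset()) -> bool:
    """True if this ID should get the new feature at the given roll-out fraction."""
    if member_id in excluded:
        # Never enable the feature where it is expected to cause harm.
        return False
    # The bucket value is fixed per ID, so raising `fraction` only adds members.
    return canary_bucket(member_id, salt) < fraction


ids = [f"user-{i}" for i in range(10_000)]
stage_1 = {u for u in ids if in_canary(u, 0.01)}
stage_2 = {u for u in ids if in_canary(u, 0.05)}
print(len(stage_1), len(stage_2), stage_1 <= stage_2)
```

Changing the salt per feature keeps different experiments from always
landing on the same users.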

See these from Google:

https://sre.google/workbook/canarying-releases/

https://developer.android.com/distribute/best-practices/launch/test-tracks 

When working with QUIC, people have released an update to only a
small subset of the user base, monitored stability or another metric
of interest, and then decided whether to roll out the update to more
users, wait for more data to come in, or halt the roll-out
altogether. The roll-out can be halted if one of the metrics being
monitored is off, or if user reviews show issues or complaints on a
specific topic. You also don't need to enable a feature for any user
or network that you expect might be hurt.
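The expand / wait / halt decision described above can be sketched in Python
as a one-sided two-proportion z-test on a failure rate. The choice of test,
the "failure" metric (for example, failed connections), the 1000-sample
minimum and alpha = 0.01 are all illustrative assumptions, not what any QUIC
deployment is known to have used.

```python
import math
from statistics import NormalDist


def rollout_decision(canary_failures: int, canary_total: int,
                     control_failures: int, control_total: int,
                     min_samples: int = 1000, alpha: float = 0.01) -> str:
    """Return "wait", "halt" or "expand" for one roll-out stage.

    One-sided two-proportion z-test: is the canary's failure rate
    significantly higher than the control's?
    """
    if canary_total < min_samples or control_total < min_samples:
        return "wait"  # not enough data yet in one of the arms
    p_canary = canary_failures / canary_total
    p_control = control_failures / control_total
    pooled = (canary_failures + control_failures) / (canary_total + control_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / canary_total + 1 / control_total))
    if se == 0:
        # A pooled rate of 0 or 1 means both arms have identical failure rates.
        return "expand"
    z = (p_canary - p_control) / se
    if z > NormalDist().inv_cdf(1 - alpha):
        return "halt"  # canary failure rate is significantly higher
    return "expand"


print(rollout_decision(5, 100, 4, 100))       # wait
print(rollout_decision(50, 2000, 48, 2000))   # expand
print(rollout_decision(150, 2000, 50, 2000))  # halt
```

A real process would track several metrics (and, per the reply above,
ideally the effect on competing traffic), not a single failure rate.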

ECN isn't just "automatically" used: the app (or at least the app
supplier) can decide, and this will always be the case for QUIC
anyway. The results of these tests provide the sort of data that has
informed QUIC (e.g., Chrome Canary), and I expect they are the basis
of what Google and others have reported in MAPRG. The point is that
this allows statistical testing without massive impact, as well as
incremental roll-out.
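To make "statistical testing without massive impact" concrete: a canary only
needs enough flows to detect the size of change you care about, which can be
a small fraction of total traffic. This Python sketch uses the textbook
normal-approximation sample-size formula for comparing two proportions; the
rates, the 5% significance level and the 80% power are illustrative
assumptions, not values from any QUIC or MAPRG report.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_control: float, p_canary: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate flows needed in EACH arm to detect p_control vs p_canary.

    Normal-approximation formula for a two-sided test comparing two
    proportions (for example, a loss or failure rate).
    """
    if p_control == p_canary:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_canary) / 2
    term = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p_control * (1 - p_control)
                                 + p_canary * (1 - p_canary)))
    return math.ceil(term ** 2 / (p_control - p_canary) ** 2)


# Hypothetical rates: detecting 2% -> 3% needs a few thousand flows per arm,
# while a smaller change (2% -> 2.5%) needs several times more.
print(samples_per_arm(0.02, 0.03), samples_per_arm(0.02, 0.025))
```

Smaller effects need larger canaries, which is one reason roll-outs widen in
stages rather than all at once.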

This says something about Akamai's use:

https://www.akamai.com/uk/en/products/performance/cloudlets/phased-release.jsp 

Cloudflare, etc., have used similar approaches:

https://medium.com/boozt-tech/canary-release-with-cloudflare-workers-84a9b45bac0f 


Gorry