Re: [Last-Call] OT: change BCP 83 [Re: Last Call: BCP 83 PR-Action Against Dan Harkins]

Colin Perkins <csp@csperkins.org> Wed, 05 October 2022 09:07 UTC

From: Colin Perkins <csp@csperkins.org>
To: Adam Roach <adam@nostrum.com>
Cc: Stephen Farrell <stephen.farrell@cs.tcd.ie>, Eliot Lear <lear@lear.ch>, Mladen Karan <m.karan@qmul.ac.uk>, last-call@ietf.org, IETF Chair <chair@ietf.org>, Ravi Shekhar <r.shekhar@qmul.ac.uk>
Date: Wed, 05 Oct 2022 10:07:20 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/4ZqB9DHsC5roTLBXgrJremfUqvQ>

Hi Adam,

On 2 Oct 2022, at 20:42, Adam Roach wrote:
> On 10/2/22 04:26, Stephen Farrell wrote:
>> I wonder if there's any less subjective metric that could be
>> applied to mailing list archives?
>
>
> If you're offering to put in the footwork, the general outline of what 
> I know how to do would take one of two paths. Both would start with 
> getting as complete a copy of the email archives as possible. This 
> used to be easily found online, and may yet be; but if it isn't, I'm 
> sure the tools team could give you assistance.

The IETF still maintains IMAP access to the mail archive for lists hosted 
on ietf.org, so access to the emails is straightforward.
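
As a rough sketch of what pulling messages over IMAP might look like, using only the Python standard library: the host name, the anonymous login convention, and the folder name below are assumptions rather than anything stated in this thread, so check the IETF's own documentation before relying on them.

```python
import email
import imaplib
from email import policy
from email.utils import parsedate_to_datetime

# Assumptions, not taken from this thread: host name, anonymous login
# convention, and folder name. Verify against the IETF documentation.
HOST = "imap.ietf.org"
MAILBOX = '"Shared Folders/ietf"'


def year_and_body(raw: bytes) -> tuple[int, str]:
    """Return (year sent, plain-text body) for one raw RFC 5322 message."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    year = parsedate_to_datetime(msg["Date"]).year
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return year, body


def fetch_messages(limit: int = 100):
    """Yield (year, body) for up to `limit` messages from MAILBOX."""
    with imaplib.IMAP4_SSL(HOST) as conn:
        conn.login("anonymous", "you@example.org")  # placeholder address
        conn.select(MAILBOX, readonly=True)
        _, data = conn.search(None, "ALL")
        for num in data[0].split()[:limit]:
            _, parts = conn.fetch(num, "(RFC822)")
            yield year_and_body(parts[0][1])
```

Real archives contain messages with missing or malformed Date headers, which `year_and_body` does not handle; a fuller version would skip or log those.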

> Then you either:
>
>  * Take a suitably large random sample of messages over the past 37
>    years (work out the size of the corpus and determine what you want
>    your confidence interval to be), and assign a team to score which
>    ones they believe meet some relevant criteria (e.g., violate today's
>    code of conduct). You'll want at least two people -- and preferably
>    more -- of differing backgrounds to look at each message to
>    countervail certain kinds of biases. Or

This would be time-consuming and expensive, but would likely give an 
interesting result.

>  * Use one of the several available forum management tools to
>    automatically score each message. Details vary, but most such tools
>    will generate both "toxicity" and "sentiment" scores that you can
>    plot over time. The ones I'm familiar with are run as a service, so
>    you'd need to perform some light API integration (which might be as
>    easy as piping formail into a curl command); although it's entirely
>    possible that offline tools are also available.
>
> Again, I know how to do this, but can't invest the resources. Let me 
> know if you're earnest, and I'll happily consult with you on getting 
> it to work.

I’m part of [a project](https://sodestream.github.io) that’s doing 
mailing list analysis of IETF data, and the recent [IAB AID 
workshop](https://www.iab.org/activities/workshops/aid/) also explored 
this topic.

We haven’t spent too much time looking at sentiment analysis, but my 
colleagues took a quick look at messages on the ietf@ietf.org list.

The plots below show the average extent, expressed in the range 0…1, 
to which the text of emails sent to that list in each year rates as 
positive, negative, or neutral sentiment, according to the [VADER 
Sentiment Analysis](https://github.com/cjhutto/vaderSentiment) library:


![](cid:213515C1-D217-49DF-917F-1917027BFFE5@csperkins.org 
"Unknown.png")

Redrawing the plot with a different range, to focus on the positive and 
negative sentiment categories, it’s clear that messages labelled as 
positive sentiment outweigh those labelled as negative, but there’s a 
significant fraction of negativity. Proportions don’t look to be 
changing significantly over time.

![](cid:DEBEA8F9-D09E-400E-B002-2092F65EF089@csperkins.org 
"Unknown.png")

Sentiment analysis, of course, is a crude measure that doesn’t 
necessarily correlate with toxicity. It’d be interesting to analyse 
further, and look at the other mailing lists too.
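
For anyone who wants to try something similar, per-year averages of the kind plotted above could be computed roughly as follows. This is a sketch under assumptions, not necessarily how the plots were actually produced: the function names are made up for illustration, and the input is assumed to be pairs of (year, message text).

```python
from collections import defaultdict
from typing import Callable, Iterable

Scores = dict[str, float]
KEYS = ("pos", "neg", "neu")


def yearly_means(messages: Iterable[tuple[int, str]],
                 scorer: Callable[[str], Scores]) -> dict[int, Scores]:
    """Average the pos/neg/neu scores of all messages sent in each year."""
    sums = defaultdict(lambda: dict.fromkeys(KEYS, 0.0))
    counts = defaultdict(int)
    for year, text in messages:
        scores = scorer(text)
        for key in KEYS:
            sums[year][key] += scores[key]
        counts[year] += 1
    return {year: {key: total / counts[year] for key, total in s.items()}
            for year, s in sorted(sums.items())}


def vader_scorer() -> Callable[[str], Scores]:
    """Return VADER's scoring function (needs the vaderSentiment package)."""
    # Imported lazily so yearly_means can be used without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores
```

With the package installed, `yearly_means(messages, vader_scorer())` gives one set of averages per year. VADER's `polarity_scores` also returns a `compound` value, which this sketch ignores.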

If anyone’s interested in exploring the data further, [our 
project](https://sodestream.github.io) will be at the hackathon at the 
London IETF in a few weeks - come talk to us.

Colin
(with thanks to Mladen Karan and Ravi Shekhar, cc’d, for wrangling the 
data)