Re: [rfc-i] looking for a volunteer to write a simple script

Joe Touch <touch@strayalpha.com> Fri, 12 July 2019 18:33 UTC

I can fix that in a few minutes...

> On Jul 12, 2019, at 11:29 AM, Sandy Ginoza <sginoza@amsl.com> wrote:
> 
> Thanks Joe!
> 
> I just tested this updated script and it seems to work well.  I am happy to see it catches things like “MAY” (within quotes), but does not tag items Carsten noted (e.g., “MARSHALL”).   
> 
> I note that it does double-tag keywords that have already been tagged, but we can check for that before running the script (i.e., check whether the file already contains tagged keywords).
> 
> Thanks!
> Sandy
> 
>> On Jul 12, 2019, at 11:02 AM, Joe Touch <touch@strayalpha.com> wrote:
>> 
>> Quick update:
>> 
>> perl -0777 -pe "s/(\b(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)\b)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml
>> 
>> 
>> 
>> Joe
>> 
>>  
>>> On 2019-07-12 10:43, Joe Touch wrote:
>>> 
>>> This will do the trick:
>>> 
>>> 
>>> 
>>> perl -0777 -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml
>>> 
>>> (replace INFILE.xml and OUTFILE.xml with your filenames)
>>> 
>>> If you want it to edit in-place (risky, but simpler if you work on a copy anyway):
>>> 
>>> perl -0777 -i -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml
>>> 
>>>  
>>> 
>>> 
>>> On 2019-07-12 10:26, Heather Flanagan wrote:
>>> 
>>> 
>>> 
>>> On Jul 12, 2019, at 10:23 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>> 
>>> On 12.07.2019 18:55, Heather Flanagan wrote:
>>> Hello everyone!
>>> 
>>> The RFC Editor has the need for a comparatively simple script that would
>>> automatically add <bcp14></bcp14> tags to requirement language in v3 RFCs.
>>> 
>>> Specifically, this would take a v3 XML input file, and create a v3 XML
>>> output file with <bcp14></bcp14> added around each instance of an RFC 2119
>>> keyword in the file. (MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT,
>>> SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL)
>>> 
>>> Anyone up for helping us out with that?
>>> 
>>> Thanks! Heather
>>> ...
>>> 
>>> The tricky part is to find the right instances. For instance, what if it
>>> appears in a quote, or in artwork? Or if "SHALL NOT" is across a line
>>> break...
>>> 
>>> So the output will require sanity checking.
>>>  
>>> Well, yes, of course. We're aiming for a rough pass to catch maybe 80% of the situations. Everything will still need to be reviewed.
>>> 
>>> 
>>> I assume that the tool is supposed to preserve whitespace, line breaks,
>>> etc.? This essentially rules out running the input through an XML parser...
>>> 
>>> Seriously, we're not aiming for that level of robustness right now. It doesn't have to be perfect, it just has to help.
>>>  
>>> -Heather
>>>  
>>> 
>>> 
>>> _______________________________________________
>>> rfc-interest mailing list
>>> rfc-interest@rfc-editor.org
>>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest