Re: [rfc-i] looking for a volunteer to write a simple script

Joe Touch <touch@strayalpha.com> Fri, 12 July 2019 17:43 UTC

Date: Fri, 12 Jul 2019 10:43:32 -0700
From: Joe Touch <touch@strayalpha.com>
To: Heather Flanagan <rse@rfc-editor.org>
Cc: Julian Reschke <julian.reschke@gmx.de>, RFC Interest <rfc-interest@rfc-editor.org>

This will do the trick: 

perl -0777 -pe \
"s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" \
INFILE.xml > OUTFILE.xml

(replace INFILE.xml and OUTFILE.xml with your filenames) 

If you want it to edit in place (risky, but simpler if you work on a
copy anyway), add -i and drop the output redirection, since -i writes
the result back to INFILE.xml: 

perl -0777 -i -pe \
"s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" \
INFILE.xml

On 2019-07-12 10:26, Heather Flanagan wrote:

> On Jul 12, 2019, at 10:23 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>> On 12.07.2019 18:55, Heather Flanagan wrote:
>>> Hello everyone!
>>>
>>> The RFC Editor has the need for a comparatively simple script that would
>>> automatically add <bcp14></bcp14> tags to requirement language in v3 RFCs.
>>>
>>> Specifically, this would take a v3 XML input file, and create a v3 XML
>>> output file with <bcp14></bcp14> added around each instance of a 2119
>>> keyword in the file. (MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT,
>>> SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL)
>>>
>>> Anyone up for helping us out with that?
>>>
>>> Thanks! Heather
>>> ...
>> The tricky part is to find the right instances. For instance, what if it
>> appears in a quote, or in artwork? Or if "SHALL NOT" is across a line
>> break...
>>
>> So the output will require sanity checking.
>
> Well, yes, of course. We're aiming for a rough pass to catch maybe 80%
> of the situations. Everything will still need to be reviewed.
>
>> I assume that the tool is supposed to preserve whitespace, line breaks
>> etc? This essentially rules out running the input through an XML parser...
>
> Seriously, we're not aiming for that robust right now. It doesn't have
> to be perfect, it just has to help.
>
> -Heather

_______________________________________________
rfc-interest mailing list
rfc-interest@rfc-editor.org
https://www.rfc-editor.org/mailman/listinfo/rfc-interest