Re: [rfc-i] looking for a volunteer to write a simple script

Joe Touch <touch@strayalpha.com> Fri, 12 July 2019 18:03 UTC

To: Heather Flanagan <rse@rfc-editor.org>
Cc: Julian Reschke <julian.reschke@gmx.de>, RFC Interest <rfc-interest@rfc-editor.org>

Quick update: 

perl -0777 -pe "s/(\b(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)\b)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml
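
The change from the earlier version is the pair of \b word-boundary anchors, so that words such as MAYBE are not tagged. For anyone who would rather not use Perl, the same substitution can be sketched in Python. This is an illustrative equivalent, not part of the original one-liner; the function name tag_bcp14 is hypothetical.

```python
import re

# Same alternation as the Perl pattern, anchored with \b on both sides.
# \s+ lets a two-word keyword such as "MUST NOT" span a line break.
BCP14_RE = re.compile(
    r"\b((?:MUST|SHOULD|SHALL)(?:\s+NOT)?|(?:NOT\s+)?RECOMMENDED|MAY|OPTIONAL|REQUIRED)\b"
)


def tag_bcp14(xml_text: str) -> str:
    """Wrap each BCP 14 keyword in <bcp14>...</bcp14>; all other text is left as-is."""
    return BCP14_RE.sub(r"<bcp14>\1</bcp14>", xml_text)


if __name__ == "__main__":
    sample = "<t>Senders MUST NOT do X and\nSHOULD do Y; Z is NOT RECOMMENDED.</t>"
    print(tag_bcp14(sample))
```

Like the Perl version, this works on raw text, so whitespace and line breaks are preserved. It also tags keywords that are already inside <bcp14>, so it should be run only once per file.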

Joe

On 2019-07-12 10:43, Joe Touch wrote:

> This will do the trick: 
> 
> perl -0777 -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml 
> 
> (replace INFILE.xml and OUTFILE.xml with your filenames) 
> 
> If you want it to edit in place (risky, but simpler if you work on a copy anyway): 
> 
> perl -0777 -i -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml
> 
> On 2019-07-12 10:26, Heather Flanagan wrote:
>
>> On Jul 12, 2019, at 10:23 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>
>>> On 12.07.2019 18:55, Heather Flanagan wrote:
>>>> Hello everyone!
>>>>
>>>> The RFC Editor has the need for a comparatively simple script that would
>>>> automatically add <bcp14></bcp14> tags to requirement language in v3 RFCs.
>>>>
>>>> Specifically, this would take a v3 XML input file, and create a v3 XML
>>>> output file with <bcp14></bcp14> added around each instance of a 2119
>>>> keyword in the file. (MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT,
>>>> SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL)
>>>>
>>>> Anyone up for helping us out with that?
>>>>
>>>> Thanks! Heather
>>> ...
>>> The tricky part is to find the right instances. For instance, what if it
>>> appears in a quote, or in artwork? Or if "SHALL NOT" is across a line
>>> break...
>>>
>>> So the output will require sanity checking.
>>
>> Well, yes, of course. We're aiming for a rough pass to catch maybe 80%
>> of the situations. Everything will still need to be reviewed.
>>
>>> I assume that the tool is supposed to preserve whitespace, line breaks
>>> etc? This essentially rules out running the input through an XML parser...
>>
>> Seriously, we're not aiming for that robust right now. It doesn't have
>> to be perfect, it just has to help.
>>
>> -Heather
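
On the artwork concern raised in the thread: a regex pass can skip some regions without an XML parser, which keeps whitespace and line breaks intact. The following is a rough Python sketch under stated assumptions, not a vetted tool. It assumes that <artwork> and <sourcecode> are the elements to leave alone, that existing <bcp14> elements should not be wrapped again, and that keywords inside tags (attribute values) should be ignored. The names SCAN_RE and tag_bcp14_outside_artwork are hypothetical. Keywords in quoted text are not detected, and comments containing ">" are not handled, so the output still needs review.

```python
import re

KEYWORDS = r"(?:MUST|SHOULD|SHALL)(?:\s+NOT)?|(?:NOT\s+)?RECOMMENDED|MAY|OPTIONAL|REQUIRED"

# Group "skip": a whole <artwork>, <sourcecode>, or <bcp14> element (not self-closing),
# or, failing that, any single tag. Group "kw": a keyword found outside those spans.
SCAN_RE = re.compile(
    r"(?P<skip><(?P<name>artwork|sourcecode|bcp14)\b[^>]*(?<!/)>.*?</(?P=name)>|<[^>]*>)"
    r"|\b(?P<kw>" + KEYWORDS + r")\b",
    re.DOTALL,
)


def tag_bcp14_outside_artwork(xml_text: str) -> str:
    """Tag BCP 14 keywords, copying protected spans and tags through unchanged."""

    def repl(match: "re.Match[str]") -> str:
        if match.group("kw") is None:
            return match.group(0)  # protected span or tag: leave as-is
        return "<bcp14>" + match.group("kw") + "</bcp14>"

    return SCAN_RE.sub(repl, xml_text)


if __name__ == "__main__":
    sample = "<t>You MUST do it.</t><artwork>MUST stay</artwork>"
    print(tag_bcp14_outside_artwork(sample))
```

Because already-tagged keywords are skipped, running this a second time on its own output should change nothing.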

_______________________________________________
rfc-interest mailing list
rfc-interest@rfc-editor.org
https://www.rfc-editor.org/mailman/listinfo/rfc-interest 