Re: [rfc-i] looking for a volunteer to write a simple script

Carsten Bormann <cabo@tzi.org> Fri, 12 July 2019 17:53 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Fri, 12 Jul 2019 19:52:47 +0200
To: Joe Touch <touch@strayalpha.com>
Subject: Re: [rfc-i] looking for a volunteer to write a simple script
Cc: Julian Reschke <julian.reschke@gmx.de>, RFC Interest <rfc-interest@rfc-editor.org>, Heather Flanagan <rse@rfc-editor.org>

I would spend a couple of \b (or \< and \>, I don't remember Perl well enough), so MARSHALL or MAYNARD isn't hit. (Theresa is hopeless, but we knew that.)
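
To make the word-boundary idea concrete, here is a minimal sketch. It is written in Python rather than Perl purely for illustration (Perl's regex syntax also supports \b); the pattern is the one from the quoted one-liner with \b added on both sides, and the names KEYWORD and tag_keywords are made up for this example.

```python
import re

# Keyword pattern from the quoted one-liner, wrapped in \b ... \b so that
# only whole words match. KEYWORD and tag_keywords are hypothetical names.
KEYWORD = re.compile(
    r"\b("
    r"(?:MUST|SHOULD|SHALL)(?:\s+NOT)?"
    r"|(?:NOT\s+)?RECOMMENDED"
    r"|MAY|OPTIONAL|REQUIRED"
    r")\b"
)


def tag_keywords(text: str) -> str:
    """Wrap each whole-word keyword in <bcp14> tags, leaving whitespace as-is."""
    return KEYWORD.sub(r"<bcp14>\1</bcp14>", text)


sample = "MARSHALL and MAYNARD MAY attend; they SHALL\nNOT be late."
print(tag_keywords(sample))
```

With the boundaries in place, the SHALL inside MARSHALL and the MAY inside MAYNARD are left alone, while a standalone MAY is still tagged. Because \s+ also matches a newline, a "SHALL NOT" split across a line break is wrapped as one unit with the line break preserved.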

Sent from mobile

> On 12. Jul 2019, at 19:43, Joe Touch <touch@strayalpha.com> wrote:
> 
> This will do the trick:
> 
> 
> 
> perl -0777 -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml
> 
> (replace INFILE.xml and OUTFILE.xml with your filenames)
> 
> If you want it to edit in place (risky, but simpler if you work on a copy anyway), use -i and drop the output redirect, since -i rewrites INFILE.xml itself:
> 
> perl -0777 -i -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml
> 
>  
> 
> 
>> On 2019-07-12 10:26, Heather Flanagan wrote:
>> 
>> 
>> 
>>> On Jul 12, 2019, at 10:23 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>> 
>>> On 12.07.2019 18:55, Heather Flanagan wrote:
>>>> Hello everyone!
>>>>
>>>> The RFC Editor has the need for a comparatively simple script that would
>>>> automatically add <bcp14></bcp14> tags to requirement language in v3 RFCs.
>>>>
>>>> Specifically, this would take a v3 XML input file, and create a v3 XML
>>>> output file with <bcp14></bcp14> added around each instance of a 2119
>>>> keyword in the file. (MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT,
>>>> SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL)
>>>>
>>>> Anyone up for helping us out with that?
>>>>
>>>> Thanks! Heather
>>> ...
>>> 
>>> The tricky part is to find the right instances. For instance, what if it
>>> appears in a quote, or in artwork? Or if "SHALL NOT" is across a line
>>> break...
>>> 
>>> So the output will require sanity checking.
>>  
>> Well, yes, of course. We're aiming for a rough pass to catch maybe 80% of the situations. Everything will still need to be reviewed.
>> 
>>> 
>>> I assume that the tool is supposed to preserve whitespace, line breaks
>>> etc? This essentially rules out running the input through an XML parser...
>> 
>> Seriously, we're not aiming for that level of robustness right now. It doesn't have to be perfect; it just has to help.
>>  
>> -Heather
>>  
>> 
>> 
>> _______________________________________________
>> rfc-interest mailing list
>> rfc-interest@rfc-editor.org
>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
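
On the concern raised in the quoted thread about keywords inside artwork, a rough pass can copy certain regions through untouched without running the file through an XML parser. The sketch below is in Python for illustration only. It assumes that <artwork> and <sourcecode> are the elements to skip (the thread names only artwork and quotes) and also skips text already inside <bcp14>, so a second run changes nothing. PROTECTED and tag_outside_protected are hypothetical names, and a regex like this does not handle nesting, comments, or CDATA.

```python
import re

# Whole-word BCP 14 keywords, as in the thread's pattern plus \b anchors.
KEYWORD = re.compile(
    r"\b((?:MUST|SHOULD|SHALL)(?:\s+NOT)?|(?:NOT\s+)?RECOMMENDED"
    r"|MAY|OPTIONAL|REQUIRED)\b"
)

# Regions copied through verbatim. Assumption: these three elements are the
# ones to leave alone. (?<!/) keeps self-closing tags from opening a region.
PROTECTED = re.compile(
    r"<(artwork|sourcecode|bcp14)\b[^>]*(?<!/)>.*?</\1>", re.DOTALL
)


def tag_outside_protected(xml: str) -> str:
    """Tag keywords everywhere except inside protected elements."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(xml):
        # Tag the text before the protected region, then copy the region as-is.
        out.append(KEYWORD.sub(r"<bcp14>\1</bcp14>", xml[pos:m.start()]))
        out.append(m.group(0))
        pos = m.end()
    out.append(KEYWORD.sub(r"<bcp14>\1</bcp14>", xml[pos:]))
    return "".join(out)


doc = "<t>You MUST stop.</t><artwork>MUST stay</artwork><t>It MAY go.</t>"
print(tag_outside_protected(doc))
```

All whitespace and line breaks outside the substituted keywords pass through unchanged, and the output still needs the manual review mentioned in the thread.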