Re: [rfc-i] looking for a volunteer to write a simple script

Sandy Ginoza <sginoza@amsl.com> Fri, 12 July 2019 18:30 UTC

From: Sandy Ginoza <sginoza@amsl.com>
Date: Fri, 12 Jul 2019 11:29:59 -0700
In-Reply-To: <c23139a7261e58cbfc93ac18a3815bad@strayalpha.com>
To: Joe Touch <touch@strayalpha.com>
Subject: Re: [rfc-i] looking for a volunteer to write a simple script
Cc: Julian Reschke <julian.reschke@gmx.de>, RFC Interest <rfc-interest@rfc-editor.org>, Heather Flanagan <rse@rfc-editor.org>

Thanks Joe!

I just tested this updated script and it seems to work well. I am happy to see that it catches keywords such as “MAY” inside quotation marks, but does not tag the false positives Carsten noted (e.g., “MARSHALL”).

I note that it double-tags keywords that have already been tagged, but we should be able to guard against that by checking whether the keywords have already been tagged before running the script.

Thanks!
Sandy

> On Jul 12, 2019, at 11:02 AM, Joe Touch <touch@strayalpha.com> wrote:
> 
> Quick update:
> 
> perl -0777 -pe "s/(\b(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)\b)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml
> 
> 
> 
> Joe
> 
>  
> On 2019-07-12 10:43, Joe Touch wrote:
> 
>> This will do the trick:
>> 
>> 
>> 
>> perl -0777 -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml > OUTFILE.xml
>> 
>> (replace INFILE.xml and OUTFILE.xml with your filenames)
>> 
>> If you want it to edit in-place (risky, but simpler if you work on a copy anyway):
>> 
>> perl -0777 -i -pe "s/(((MUST|SHOULD|SHALL)(\s+NOT)?)|((NOT\s+)?RECOMMENDED)|MAY|OPTIONAL|REQUIRED)/<bcp14>\$1<\/bcp14>/g" INFILE.xml
>> 
>>  
>> 
>> 
>> On 2019-07-12 10:26, Heather Flanagan wrote:
>> 
>>> On Jul 12, 2019, at 10:23 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>> 
>>>> On 12.07.2019 18:55, Heather Flanagan wrote:
>>>>> Hello everyone!
>>>>> 
>>>>> The RFC Editor has the need for a comparatively simple script that would
>>>>> automatically add <bcp14></bcp14> tags to requirement language in v3 RFCs.
>>>>> 
>>>>> Specifically, this would take a v3 XML input file, and create a v3 XML
>>>>> output file with <bcp14></bcp14> added around each instance of a 2119
>>>>> keyword in the file. (MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT,
>>>>> SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL)
>>>>> 
>>>>> Anyone up for helping us out with that?
>>>>> 
>>>>> Thanks! Heather
>>>>> ...
>>>> 
>>>> The tricky part is to find the right instances. For instance, what if it
>>>> appears in a quote, or in artwork? Or if "SHALL NOT" is across a line
>>>> break...
>>>> 
>>>> So the output will require sanity checking.
>>> 
>>> Well, yes, of course. We're aiming for a rough pass to catch maybe 80% of the situations. Everything will still need to be reviewed.
>>> 
>>>> I assume that the tool is supposed to preserve whitespace, line breaks
>>>> etc? This essentially rules out running the input through an XML parser...
>>> 
>>> Seriously, we're not aiming for that level of robustness right now. It doesn't have to be perfect, it just has to help.
>>> 
>>> -Heather
>> 
>> 
>> 
>> _______________________________________________
>> rfc-interest mailing list
>> rfc-interest@rfc-editor.org
>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
