[TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Megan Ferguson <mferguson@amsl.com> Thu, 13 July 2017 02:28 UTC

From: Megan Ferguson <mferguson@amsl.com>
Date: Wed, 12 Jul 2017 19:28:36 -0700
Cc: tools-development@ietf.org
Message-Id: <A8EC1B4D-A999-4848-B7E6-ABFE199921D7@amsl.com>
References: <8158A447-3AE2-413F-8BF0-6EDA08B5B121@amsl.com>
To: henrik@levkowetz.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-development/N_7IP55N3__na-lhCJ8K_EVCxFg>
Subject: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Hi Henrik,

Files: non-xml2rfc-generated files generally (status check)
Version: 1.0.3

This mail consists of a few queries about this type of file in general, as well as a summary 
of the manual updates we have been making to text files so that id2xml can parse them and represent 
as much of the text as accurately as possible. We appreciate whatever feedback you may have on 
the following.

1) Would it be possible to use the citation tags [RFC…] and [I-D….] in the references section 
as a trigger to automatically pull those references from the citation library? (That way, if the entry itself 
is poor, it doesn’t really matter…) I believe you previously said this information was pulled from
the seriesInfo, which makes sense; take a look at, for example:

https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.xml, 

we see a full reference entry for [ARPND] (aka draft-ietf-trill-arp-optimization) in the references 
section, but we also see:

<!ENTITY I-D.ietf-trill-arp-optimization SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-ietf-trill-arp-optimization.xml">

at the top of the XML file.

The same question applies to other references that live in the citation library (e.g., references 
from other SDOs, such as W3C in bibxml6).

[Generally, updating only the citation tag would be much less time-intensive than updating the full entry. 
Having the correct input would also be preferable to re-adding the reference to the XML afterward, so that 
references aren’t errantly removed.]
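To make the question concrete, here is a rough Python sketch (Python only because id2xml and xml2rfc are Python tools) of mapping a citation tag to a bibxml entity declaration. The helper name `bibxml_entity` and the RFC file-naming pattern are assumptions for illustration; only the I-D pattern is taken from the entity quoted above, and this is not a description of how id2xml currently works.

```python
import re

# Hypothetical helper (not part of id2xml): turn a citation tag from the
# references section into a bibxml entity declaration.
BIBXML_BASE = "https://xml2rfc.ietf.org/public/rfc"


def bibxml_entity(tag):
    """Return an <!ENTITY ...> line for [RFCnnnn] or [I-D.name] tags, else None."""
    tag = tag.strip("[]")
    rfc = re.fullmatch(r"RFC(\d+)", tag)
    if rfc:
        # Assumed file naming: bibxml/reference.RFC.NNNN.xml, zero-padded to 4 digits.
        num = int(rfc.group(1))
        return (f'<!ENTITY RFC{num:04d} SYSTEM '
                f'"{BIBXML_BASE}/bibxml/reference.RFC.{num:04d}.xml">')
    draft = re.fullmatch(r"I-D\.([A-Za-z0-9.-]+)", tag)
    if draft:
        name = draft.group(1)
        # Mirrors the bibxml3 entity in the v3test file cited above.
        return (f'<!ENTITY I-D.{name} SYSTEM '
                f'"{BIBXML_BASE}/bibxml3/reference.I-D.draft-{name}.xml">')
    # Tags such as [ARPND], or other libraries (e.g., W3C in bibxml6), are not handled.
    return None


print(bibxml_entity("[I-D.ietf-trill-arp-optimization]"))
```

Keying on the tag alone is the point of the question: a poor entry body would not matter as long as the tag is correct.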

2) The SoTM (Status of This Memo) text trigger seems to be quite sensitive. The text appearing in the following files 
is close to the text generated by xml2rfc, but even slight variations cause it to go 
unrecognized. Even copying in the text from https://www.ietf.org/ietf-ftp/1id-guidelines.html#anchor7 
gives an error because it is single-spaced (with the disclaimer that that text does contain some typos…).

draft-ietf-mpls-app-aware-tldp-09
draft-ietf-pals-status-reduction-05
draft-ietf-ippm-6man-pdm-option-13
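One possible way to make the trigger less sensitive, sketched under stated assumptions: collapse whitespace before comparing, so line wrapping and single versus double spacing stop mattering, then accept near matches with `difflib`. The function names and the 0.95 threshold are hypothetical, not id2xml settings.

```python
import difflib
import re


def normalize(text):
    """Collapse every run of whitespace (line breaks, double spaces) to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_like_boilerplate(candidate, reference, threshold=0.95):
    """Hypothetical tolerant trigger: compare whitespace-normalized text and
    accept near matches (e.g., a stray typo) at or above `threshold`."""
    a, b = normalize(candidate), normalize(reference)
    if a == b:
        return True
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold
```

The threshold is a trade-off: too low and unrelated paragraphs could be mistaken for boilerplate, too high and a single typo would still break recognition.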

3) Here is a list of the (current) manual changes we are making so that the current version of id2xml 
will parse the files. Please let me know if I have mischaracterized any functionality or if any of these 
items can be resolved using the tool in some way I am unaware of.

Header updates:

-Remove any blank lines between the top-left 'Key word: text' entries
-Use the first initial instead of the full first name (otherwise you get "Warning: This author is listed 
in the Authors’ Addresses section, but was not found on the first page", and the authors section will 
be absent from the generated XML)
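For illustration, the first-name change above could be scripted roughly as follows. `initial_form` is a hypothetical helper that assumes the surname is the last word of the name; multi-word surnames and other name orders would still need a manual check.

```python
def initial_form(full_name):
    """Hypothetical helper: 'Megan Ferguson' -> 'M. Ferguson'.

    Assumes the last whitespace-separated token is the surname and
    abbreviates every token before it to its first letter plus a period.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # nothing to abbreviate
    initials = " ".join(p[0] + "." for p in parts[:-1])
    return f"{initials} {parts[-1]}"
```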

Boilerplate updates:

-Replace the Copyright and SoTM text to ensure it exactly matches the output from xml2rfc
-Ensure “Copyright License” is used exactly as the title

List format updates:

-Add blank lines between list items to fix numbering
-Change (1) to 1
-Fix indentation of dash (-) items so they all align
-Fix indentation generally: if items are not aligned, they are not parsed correctly
-Update + or -iv, or any other list marker that is not xml2rfc-compliant
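A small checker could at least flag the markers we currently fix by hand. The patterns below are assumptions drawn from the list above, not id2xml's actual rules about which markers are compliant, and the function name is hypothetical.

```python
import re

# Hypothetical patterns for markers that needed manual fixes, per the list above.
# Illustrative assumptions only; these are not id2xml's parsing rules.
SUSPECT_MARKERS = [
    re.compile(r"^\s*\(\d+\)\s"),                      # "(1) item"
    re.compile(r"^\s*\+\s"),                           # "+ item"
    re.compile(r"^\s*-?[ivx]+[.)]\s", re.IGNORECASE),  # "iv. item", "-iv) item"
]


def flag_list_markers(lines):
    """Return 1-based line numbers whose list marker may need a manual fix."""
    return [n for n, line in enumerate(lines, start=1)
            if any(p.match(line) for p in SUSPECT_MARKERS)]
```

Flagging rather than rewriting keeps a person in the loop, which seems safer given that the correct replacement depends on the surrounding list.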

Section header updates:

-Change "Appendix A:" to "Appendix A." or just "Appendix A"
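The heading fix is mechanical enough to script. A sketch, assuming the "Appendix A." form is one id2xml accepts; the helper name is hypothetical, and only headings at the start of a line are touched.

```python
import re

# Hypothetical pre-processing step: "Appendix A:" or "Appendix B.1:" at the
# start of a line becomes "Appendix A." or "Appendix B.1.".
APPENDIX_COLON = re.compile(r"^(\s*Appendix [A-Z](?:\.\d+)*):", re.MULTILINE)


def fix_appendix_headings(text):
    return APPENDIX_COLON.sub(r"\1.", text)
```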

Reference entry updates: 

-Change A. Nonymous to Nonymous, A. 
-Review comma use, as missing commas will cause a reference to be missed
-Add double quotes around titles
-Date updates:
	-Change February 10 2016 to 10 February 2016 (what about commas, etc.?)
	-Change Summer 1996 to a specific month
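Two of the reference fixes (author-name order and date order) could be pre-processed with something like the sketch below. Both helpers are hypothetical, assume the input forms shown above, and do not handle cases such as "Summer 1996", which still need a person to choose a month.

```python
import re

MONTHS = ("January February March April May June July August "
          "September October November December").split()

# Assumed input forms: "February 10 2016" or "February 10, 2016".
US_DATE = re.compile(r"\b(%s) (\d{1,2}),? (\d{4})\b" % "|".join(MONTHS))


def fix_reference_date(text):
    """Hypothetical helper: 'February 10 2016' -> '10 February 2016'."""
    return US_DATE.sub(r"\2 \1 \3", text)


def surname_first(name):
    """Hypothetical helper: 'A. Nonymous' -> 'Nonymous, A.'

    Assumes initials come first and the surname is the last token.
    """
    *initials, surname = name.split()
    return f"{surname}, {' '.join(initials)}" if initials else name
```

The optional comma in the date pattern is one answer to the open question about commas; whether id2xml itself should accept both forms is a separate matter.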
	
Authors’ Addresses updates:

-Add a blank line before email addresses (temporary?)
-Remove any number from this section heading
-Add URI: before any URI entry (same for email?)
-Spacing:
	-Make sure there is whitespace where needed (e.g., a title with no blank line before the figure)
	-Review the text output generated from the XML, because the spacing is sometimes errant
-Check for mismatches between the authors in the header and those in the Authors’ Addresses section
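The header/Addresses mismatch check could be approximated by comparing surnames. This is a hypothetical sketch; matching on surname alone is a simplification that will miss, for example, two authors who share a surname.

```python
def author_mismatches(header_authors, address_authors):
    """Hypothetical check: compare surnames from the first-page header
    (e.g., 'M. Ferguson') with full names in Authors' Addresses.

    Returns (in header only, in Addresses only), each sorted.
    """
    def surname(name):
        return name.split()[-1].lower()

    header = {surname(n) for n in header_authors}
    addresses = {surname(n) for n in address_authors}
    return sorted(header - addresses), sorted(addresses - header)
```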

Misc. updates:

-Review for “defined in section x” and similar phrasing

4) Here is a pointer to a diff between an original file and a version including the manual edits we made to get 
the document to parse (this is a file we have previously discussed; it is just an example):

https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3preedits-rfcdiff.html 

Here is a pointer to a diff between the original and the text created from the id2xml output of our 
manually updated original (another example):

https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3-rfcdiff.html 


5) The following is a list of checks we make once the file parses using xml2rfc. The amount of time this takes 
depends heavily on the number of lists, figures, and references in each document, as well as on how clean the indentation
was in the original.

Lists:
-Review numbering of lists
-Verify that list elements are implemented correctly
-Ensure no figures have been errantly turned into lists

Figures:
-Review figures for alignment and missing text

Spacing:
-Review indentation of text around a colon followed by two spaces
-Review whitespace generally

References:
-Include references that could not be generated through the earlier text fixes

Authors:
-Verify that all information from the original is included in the file (e.g., no missing email addresses)
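The missing-email check in particular lends itself to automation. A minimal sketch, assuming a simple email regex (not a full RFC 5322 address parser) and a hypothetical function name:

```python
import re

# Simplified address pattern; good enough for Authors' Addresses, not a full parser.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def missing_emails(original_text, output_text):
    """Return email addresses found in the original but absent from the output."""
    found = set(EMAIL.findall(output_text))
    return sorted(set(EMAIL.findall(original_text)) - found)
```

Running it on the original text and on the text rendered from the id2xml output would list any addresses dropped along the way.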

Thank you.

Megan