Re: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Henrik Levkowetz <henrik@levkowetz.com> Tue, 13 June 2017 16:35 UTC

To: Megan Ferguson <mferguson@amsl.com>
References: <D6FC76F7-213D-4F80-A131-93176E19729A@amsl.com>
Cc: tools-development@ietf.org
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <5940141D.1060903@levkowetz.com>
Date: Tue, 13 Jun 2017 18:34:37 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-development/NiaiCNM6HmAFkEPFqrVV2Vdtt_U>

Hi Megan,

On 2017-06-07 20:41, Megan Ferguson wrote:
> Hi Henrik,
> 
> Input file: draft-ietf-trill-directory-assist-mechanisms-12
> Version: id2xml 1.0.0
> Issues: File not originally generated with XML
> Files available: 
> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.original 
> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.txt
> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3-rfcdiff.html
> 
> (The .txt version above includes our manual updates and the rfcdiff
> highlights these from the original.)
> 
> Note - This file is very representative of the types of issues we
> receive when getting a file not originally created with xml2rfc and
> the case where we feel the id2xml tool would be most useful for the
> RPC. That is, we believe that this is a good example of a common use
> case of id2xml.

Ok, good to know.  I'm adding this to the test suite.

>
> Testing this file raised a number of questions/errors.
> 
> 1) Header
> 
> a) The status “Proposed Standard” does not seem to be recognized.
> This is a common value we see in I-Ds not created with xml2rfc. For
> our purposes, Proposed Standard, Draft Standard, Internet Standard,
> Standard Track, and Standards Track should all map to “Standards
> Track” handling.

Right.  Fixed in my sources.
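
As a rough illustration of the mapping you describe (a sketch only, not
the code in id2xml; the names STANDARDS_TRACK_ALIASES and
normalize_status are made up for this example), the five values can be
folded into one:

```python
# Hypothetical sketch; these names are not taken from id2xml.
STANDARDS_TRACK_ALIASES = {
    "proposed standard",
    "draft standard",
    "internet standard",
    "standard track",
    "standards track",
}


def normalize_status(text):
    """Return 'Standards Track' for any known alias, else the cleaned text."""
    # Collapse runs of whitespace so 'Proposed   Standard' still matches.
    cleaned = " ".join(text.split())
    if cleaned.lower() in STANDARDS_TRACK_ALIASES:
        return "Standards Track"
    return cleaned
```

Any other status value passes through unchanged apart from whitespace
cleanup.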

>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1403, in get_category
>    self.warn(line, "Expected a recognized status name, found '%s'" % (line.txt, ))
> 
> b) WG name: This document did not include a working group name in the
> header. We receive documents with no working group indicated (as with
> this one), with the full WG name, or with the abbreviation. Is this
> required for parsing?

No.  If no working group name is found, the generated xml will produce
'Network Working Group' top left.
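
In other words, the behaviour is a plain fallback.  A minimal sketch,
assuming the parser hands over either the name it found or nothing
(workgroup_label and DEFAULT_WORKGROUP are names invented here, not
id2xml's):

```python
# Hypothetical sketch of the fallback; not the id2xml implementation.
DEFAULT_WORKGROUP = "Network Working Group"


def workgroup_label(found_name):
    """Return the working group name found in the header, or the default."""
    name = (found_name or "").strip()
    return name or DEFAULT_WORKGROUP
```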

> c) Does the use of the authors’ full names in the header (or if they
> included more than one initial) have any effect on output?

> We are getting the following error *whether or not* the initials or
> full first names appear:
> 
> Failure converting: 'NoneType' object does not support item assignment
> Traceback (most recent call last):
>   File "/usr/bin/id2xml", line 9, in <module>
>     load_entry_point('id2xml==1.0.0', 'console_scripts', 'id2xml')()
>   File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 222, in run
>     xml = parser.parse_to_xml()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 953, in parse_to_xml
>     doc = self.document()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 978, in document
>     self.root.append(self.back())
>   File "<string>", line 2, in back
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2669, in back
>     self.read_authors_addresses()
>   File "<string>", line 2, in read_authors_addresses
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1693, in read_authors_addresses
>     self.maybe_author_address(item)
>   File "<string>", line 2, in maybe_author_address
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1793, in maybe_author_address
>     item['address'] = {}
> TypeError: 'NoneType' object does not support item assignment

Oops.  This is an internal error.  Fixed in my sources.
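
For reference, the TypeError in the last frame means that `item` was
None when `item['address'] = {}` ran.  The sketch below shows only the
general failure mode and one possible guard; it is not the change made
in id2xml, and attach_address is a name invented for the example:

```python
def attach_address(item):
    """Give an author entry an empty address dict, if there is an entry."""
    if item is None:
        # No author entry was matched: report that to the caller instead
        # of failing with "'NoneType' object does not support item assignment".
        return False
    item["address"] = {}
    return True
```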

> *However*, we get the following similar error on another file that previously parsed correctly when we 
> solely updated to use the authors’ full names in the header (instead of just initials):
> 
> Failure converting: 'NoneType' object does not support item assignment
> Traceback (most recent call last):
>   File "/usr/bin/id2xml", line 9, in <module>
>     load_entry_point('id2xml==1.0.0', 'console_scripts', 'id2xml')()
>   File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 222, in run
>     xml = parser.parse_to_xml()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 953, in parse_to_xml
>     doc = self.document()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 978, in document
>     self.root.append(self.back())
>   File "<string>", line 2, in back
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2669, in back
>     self.read_authors_addresses()
>   File "<string>", line 2, in read_authors_addresses
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1693, in read_authors_addresses
>     self.maybe_author_address(item)
>   File "<string>", line 2, in maybe_author_address
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1793, in maybe_author_address
>     item['address'] = {}
> TypeError: 'NoneType' object does not support item assignment
> 
> We tried (to no avail):
> -updating full first names to initials only
> -matching 1:1 the affiliation of each author with what appears in the Authors’ Addresses section 
> (original uses Huawei for two authors and Huawei Technologies was used in the Addresses section).
> -updating the header to include Internet-Draft
> -removing blank space in the left side of the header

The root cause of this was an incorrect starting value for line counting
internally.  If there had been blank lines ahead of the first text lines,
you would not have seen this (but then I would not have found the related
bug).  Fixed in my sources.

> Follow ups on header-related issues:
> 
> i) What regulations/limitations exist on how items must appear in the
> header? We are also curious about documents from other streams such
> as IRTF or Independent (which we have yet to test).

If there is a working group name, it must appear on the first line.
All left-hand header items which have the form 'Key word: Text' can
appear in any order, top left, but it is expected that there are no
interleaved blank lines top left.
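
The rules above can be sketched roughly as follows.  This is an
illustration under simplifying assumptions (the right-hand column is
separated by at least two spaces, and the label line reads exactly
'Internet-Draft'); parse_left_header and KEY_VALUE are hypothetical
names, not id2xml internals:

```python
import re

# 'Key word: Text' at the left margin, optionally followed by the
# right-hand column after two or more spaces.
KEY_VALUE = re.compile(
    r"^(?P<key>[A-Za-z][A-Za-z -]*?):\s+(?P<value>\S.*?)(?:\s{2,}.*)?$"
)


def parse_left_header(lines):
    """Collect top-left header items, stopping at the first blank line."""
    workgroup = None
    label = None
    items = {}
    for index, line in enumerate(lines):
        left = line.rstrip()
        if not left:
            break  # no interleaved blank lines are expected top left
        match = KEY_VALUE.match(left)
        if match:
            # Key/value items may come in any order.
            items[match.group("key").lower()] = match.group("value")
            continue
        left_column = re.split(r"\s{2,}", left)[0]
        if left_column.lower() == "internet-draft":
            label = left_column
        elif index == 0 and left_column:
            # A working group name, if present, is on the first line.
            workgroup = left_column
    return workgroup, label, items
```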

> ii) We are unsure about the meaning of some of the header-related
> errors. For example:
> 
> Warning: Expected a label indication top left, found none

This is the Internet-Draft or Request for Comments indication.

> Perhaps getting a bit more information on this would help us
> troubleshoot.

Ack, understood.  I'll see what I can do.

> iii) Perhaps "Intended status" and the like should be case
> insensitive?

In the next version I've made it somewhat less case sensitive, but not
completely case insensitive.  If more feedback shows that complete case
insensitivity would be best, I'll be happy to make it so.
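
For comparison, fully case-insensitive matching of that key could look
like the sketch below.  It assumes the line passed in holds only the
left-hand column, and INTENDED_STATUS and intended_status are names
invented for the example rather than anything in id2xml:

```python
import re

# Hypothetical pattern: matches 'Intended status:' in any letter case.
INTENDED_STATUS = re.compile(
    r"^intended\s+status:\s*(?P<value>.*\S)", re.IGNORECASE
)


def intended_status(line):
    """Return the text after 'Intended status:', or None if no match."""
    match = INTENDED_STATUS.match(line.strip())
    return match.group("value") if match else None
```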

> 2) The Status of This Memo did not match the current version. This
> caused an error:
> 
> Error: Unexpected text: expected 'Internet-Drafts are working documents of the Internet
>   Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The
>   list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.', found 'Distribution of this
>   document is unlimited. Comments should be sent'

Yes.  id2xml uses the Status of This Memo text to derive the category
attribute of the <rfc/> element.  I can change that to just ignore the
Status of This Memo section, if you would prefer to manually insert the
proper category in the <rfc/> element.

> 3) Older documents frequently have Copyright notices at the end of the document (such as this one).  This caused an error:
> 
> Error: Unexpected text: expected 'Copyright Notice', found 'D. Eastlake, et al
>   [Page 1]'

Mmm.  Ok.  I'll do some tests on making the copyright notice optional.

> 4) The running header (perhaps due to the period after D?) was a problem and needed to be manually stripped.
> 
> D. Eastlake, et al                                              [Page 2]
> INTERNET-DRAFT                       TRILL: Directory Service Mechanisms

Maybe.  Or the ", et al".  I'll have a look at that.
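
Until that is sorted out, stripping the page breaks before conversion
can be scripted.  The following is a simplified sketch, not what id2xml
does internally: it assumes every footer line ends in '[Page N]' and
that the first non-blank line after a footer is the running header.
PAGE_FOOTER and strip_page_breaks are names invented for the example:

```python
import re

# Footer such as 'D. Eastlake, et al      [Page 2]': any line that
# ends in '[Page N]'.
PAGE_FOOTER = re.compile(r"^.*\[Page \d+\]\s*$")


def strip_page_breaks(lines):
    """Drop each '[Page N]' footer and the running header after it."""
    kept = []
    skip_header = False
    for line in lines:
        if PAGE_FOOTER.match(line):
            skip_header = True
            continue
        if skip_header:
            if not line.strip():
                # Blank line or bare form feed between footer and header.
                continue
            # First non-blank line after a footer: the running header.
            skip_header = False
            continue
        kept.append(line)
    return kept
```

Blank lines before a footer and after a header are left in place, so
the result may need some whitespace cleanup.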

> Error: Unexpected section number; expected '1' or a subsection, found 'D. Eastlake, et al
>   [Page 1]'
> 
> 5) The references section is unnumbered.  This (maybe) creates an error.
> 
> Error: Expected an Authors' Addresses section, found 'Normative References'

Ok, will check.


> Thank you in advance for your help!

Sure :-)

Best regards,

	Henrik