Re: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Henrik Levkowetz <henrik@levkowetz.com> Tue, 13 June 2017 16:35 UTC

To: Megan Ferguson <mferguson@amsl.com>
References: <D6FC76F7-213D-4F80-A131-93176E19729A@amsl.com>
Cc: tools-development@ietf.org
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <5940141D.1060903@levkowetz.com>
Date: Tue, 13 Jun 2017 18:34:37 +0200
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
MIME-Version: 1.0
In-Reply-To: <D6FC76F7-213D-4F80-A131-93176E19729A@amsl.com>
Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="p6ufxjb2SmpIQlutc1oS4Ou6jJM7736Ft"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-development/NiaiCNM6HmAFkEPFqrVV2Vdtt_U>
Subject: Re: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml
Precedence: list

Hi Megan,

On 2017-06-07 20:41, Megan Ferguson wrote:
> Hi Henrik,
> 
> Input file: draft-ietf-trill-directory-assist-mechanisms-12
> Version: id2xml 1.0.0
> Issues: File not originally generated with XML
> Files available: 
> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.original 
> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.txt
> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3-rfcdiff.html
> 
> (The .txt version above includes our manual updates and the rfcdiff
> highlights these from the original.)
> 
> Note: This file is very representative of the types of issues we
> receive in files not originally created with xml2rfc, which is
> the case where we feel the id2xml tool would be most useful for the
> RPC. That is, we believe that this is a good example of a common use
> case of id2xml.

Ok, good to know.  I'm adding this to the test suite.
> 
> Testing this file raised a number of questions/errors.
> 
> 1) Header
> 
> a) The status “Proposed Standard” does not seem to be recognized.
> This is a common value we see in I-Ds not created with xml2rfc. For
> our purposes, Proposed Standard, Draft Standard, Internet Standard,
> Standard Track, and Standards Track should all map to “Standards
> Track” handling.

Right.  Fixed in my sources.

>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1403, in get_category
>    self.warn(line, "Expected a recognized status name, found '%s'" % (line.txt, ))
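A minimal sketch of the mapping requested in 1a), for illustration only. The function name `normalize_status` is hypothetical (this is not id2xml's code), and the target values are assumed to be the `category` values of the xml2rfc `<rfc>` element (`std`, `bcp`, `info`, `exp`, `historic`). Matching ignores case and extra whitespace.

```python
import re
from typing import Optional

# Hypothetical mapping from status wording seen in I-D headers to the
# xml2rfc <rfc category="..."> value. Keys are lowercased and single-spaced.
STATUS_TO_CATEGORY = {
    "proposed standard": "std",
    "draft standard": "std",
    "internet standard": "std",
    "standard track": "std",
    "standards track": "std",
    "best current practice": "bcp",
    "informational": "info",
    "experimental": "exp",
    "historic": "historic",
}


def normalize_status(text: str) -> Optional[str]:
    """Return the category for a status string, or None if unrecognized."""
    # Collapse runs of whitespace and ignore case, so that e.g.
    # "Proposed  STANDARD" is treated the same as "proposed standard".
    key = re.sub(r"\s+", " ", text.strip()).lower()
    return STATUS_TO_CATEGORY.get(key)
```

Returning `None` rather than guessing leaves room for a warning like the one above when a status is genuinely unknown.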
> 
> b) WG name: This document did not include a working group name in the
> header. We receive documents with no working group indicated (as with
> this one), with the full WG name, or with the abbreviation. Is this
> required for parsing?

No.  If no working group name is found, the generated xml will produce
'Network Working Group' top left.

> c) Does the use of the authors’ full names in the header (or of more
> than one initial) have any effect on output?

> We are getting the following error *whether or not* the initials or
> full first names appear:
> 
> Failure converting: 'NoneType' object does not support item assignment
> Traceback (most recent call last):
>   File "/usr/bin/id2xml", line 9, in <module>
>     load_entry_point('id2xml==1.0.0', 'console_scripts', 'id2xml')()
>   File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 222, in run
>     xml = parser.parse_to_xml()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 953, in parse_to_xml
>     doc = self.document()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 978, in document
>     self.root.append(self.back())
>   File "<string>", line 2, in back
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2669, in back
>     self.read_authors_addresses()
>   File "<string>", line 2, in read_authors_addresses
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1693, in read_authors_addresses
>     self.maybe_author_address(item)
>   File "<string>", line 2, in maybe_author_address
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1793, in maybe_author_address
>     item['address'] = {}
> TypeError: 'NoneType' object does not support item assignment

Oops.  This is an internal error.  Fixed in my sources.

> *However*, we get the following similar error on another file, which previously parsed
> correctly, when the only change we made was to use the authors’ full names in the header (instead of just initials):
> 
> Failure converting: 'NoneType' object does not support item assignment
> Traceback (most recent call last):
>   File "/usr/bin/id2xml", line 9, in <module>
>     load_entry_point('id2xml==1.0.0', 'console_scripts', 'id2xml')()
>   File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 222, in run
>     xml = parser.parse_to_xml()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 953, in parse_to_xml
>     doc = self.document()
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 978, in document
>     self.root.append(self.back())
>   File "<string>", line 2, in back
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2669, in back
>     self.read_authors_addresses()
>   File "<string>", line 2, in read_authors_addresses
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1693, in read_authors_addresses
>     self.maybe_author_address(item)
>   File "<string>", line 2, in maybe_author_address
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
>     ret = fn(self, *params,**kwargs)
>   File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1793, in maybe_author_address
>     item['address'] = {}
> TypeError: 'NoneType' object does not support item assignment
> 
> We tried (to no avail):
> - updating full first names to initials only
> - matching 1:1 the affiliation of each author with what appears in the Authors’ Addresses section
>   (original uses Huawei for two authors and Huawei Technologies was used in the Addresses section).
> - updating the header to include Internet-Draft
> - removing blank space in the left side of the header

The root cause of this was an incorrect starting value for line counting
internally.  If there had been blank lines ahead of the first text lines,
you would not have seen this (but then I would not have found the related
bug).  Fixed in my sources.
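The affected id2xml code is not shown in this thread, so the following is only a hypothetical illustration of the general failure mode: when a counter starts from the wrong value, a lookup keyed by line number can silently return `None`, and a later `item['address'] = {}` then raises exactly this `TypeError`. The helpers `number_lines` and `find_entry` are invented for the sketch.

```python
from typing import Dict, List, Optional


def number_lines(lines: List[str], start: int) -> Dict[int, str]:
    """Map line numbers to line text, counting from `start`."""
    return {start + i: text for i, text in enumerate(lines)}


def find_entry(numbered: Dict[int, str], line_no: int) -> Optional[dict]:
    """Return a mutable record for `line_no`, or None if no such line exists."""
    text = numbered.get(line_no)
    return None if text is None else {"text": text}


lines = ["first line of text", "second line of text"]

# Counting from 1 as intended: line 2 exists and the record can be filled in.
good = find_entry(number_lines(lines, start=1), 2)
good["address"] = {}

# Counting from 0 by mistake: "line 2" is past the end, the lookup returns
# None, and the same assignment fails with the TypeError from the report.
bad = find_entry(number_lines(lines, start=0), 2)
try:
    bad["address"] = {}
    error = None
except TypeError as exc:
    error = str(exc)
```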

> Follow-ups on header-related issues:
> 
> i) What regulations/limitations exist on how items must appear in the
> header? We are also curious about documents from other streams such
> as IRTF or Independent (which we have yet to test).

If there is a working group name, it must appear on the first line.
All left-hand header items which have the form 'Key word: Text' can
appear in any order, top left, but it is expected that there are no
interleaved blank lines top left.
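The header rules above can be sketched as a small parser. This is an assumption-laden illustration, not id2xml's implementation: it takes the top-left column to be the text before the first run of three or more spaces, stops at the first blank top-left cell, treats a first line that is neither a label nor 'Key word: Text' as the working group name, and falls back to 'Network Working Group' when none is found. All names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

DEFAULT_WG = "Network Working Group"
LABELS = {"internet-draft", "request for comments"}
# "Key word: Text", e.g. "Intended status: Proposed Standard".
KEY_VALUE = re.compile(r"^([A-Za-z][A-Za-z ]*?):\s*(.+)$")


def left_column(line: str) -> str:
    """Text left of the first run of 3+ spaces; '' if the left side is blank."""
    return re.split(r"\s{3,}", line.rstrip(), maxsplit=1)[0].strip()


def parse_left_header(lines: List[str]) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Return (workgroup, label, items) from the header's top-left column.

    Assumes `lines` starts at the first header line.
    """
    cells: List[str] = []
    for line in lines:
        cell = left_column(line)
        if not cell:
            break  # top-left items are expected to be contiguous
        cells.append(cell)

    workgroup: str = DEFAULT_WG
    label: Optional[str] = None
    items: Dict[str, str] = {}
    for i, cell in enumerate(cells):
        match = KEY_VALUE.match(cell)
        if match:
            items[match.group(1)] = match.group(2).strip()
        elif cell.lower() in LABELS:
            label = cell
        elif i == 0:
            workgroup = cell  # a WG name is only accepted on the first line
    return workgroup, label, items
```

With no working group line, the default name is returned, matching the behavior described earlier for documents like this one.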

> ii) We are unsure about the meaning of some of the header-related
> errors. For example:
> 
> Warning: Expected a label indication top left, found none

This is the Internet-Draft or Request for Comments indication.

> Perhaps getting a bit more information on this would help us
> troubleshoot.

Ack, understood.  I'll see what I can do.
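One possible shape for a more informative version of that warning, sketched with a hypothetical `label_warning` helper and an assumed list of accepted labels, is to name the accepted labels and every top-left item that was examined, with line numbers:

```python
from typing import List, Optional

# Hypothetical list of accepted top-left labels for this sketch.
EXPECTED_LABELS = ("Internet-Draft", "Request for Comments")
_LOWER_LABELS = {label.lower() for label in EXPECTED_LABELS}


def label_warning(left_cells: List[str], first_line_no: int = 1) -> Optional[str]:
    """Return None if a label is present, else a warning showing what was seen."""
    if any(cell.lower() in _LOWER_LABELS for cell in left_cells):
        return None
    seen = "; ".join(
        f"line {first_line_no + i}: {cell!r}" for i, cell in enumerate(left_cells)
    )
    expected = " or ".join(repr(label) for label in EXPECTED_LABELS)
    return (
        f"Expected a label indication top left ({expected}), found none. "
        f"Top-left items examined: {seen or 'none'}"
    )
```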

> iii) Perhaps "Intended status" and the like should be case
> insensitive?

In the next version I've made it somewhat less case sensitive, but not
completely.  If more feedback shows that complete case insensitivity
would be best, I'll be happy to make it so.
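If full case insensitivity turns out to be preferred, one simple approach is to compare whitespace-normalized, casefolded keys against a list of canonical names. The key list and the `canonical_key` name below are assumptions for this sketch:

```python
import re
from typing import Optional

# Hypothetical set of canonical top-left keys for this sketch.
KNOWN_KEYS = ("Intended status", "Category", "Expires", "Updates", "Obsoletes", "ISSN")


def canonical_key(raw: str) -> Optional[str]:
    """Match a top-left key regardless of case and internal spacing."""
    wanted = re.sub(r"\s+", " ", raw.strip()).casefold()
    for key in KNOWN_KEYS:
        if key.casefold() == wanted:
            return key
    return None
```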

> 2) The Status of This Memo did not match the current version. This
> caused an error:
> 
> Error: Unexpected text: expected 'Internet-Drafts are working documents of the Internet
>   Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The
>   list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.', found 'Distribution of this
>   document is unlimited. Comments should be sent'

Yes.  id2xml uses the Status of This Memo text to derive the category=
attribute of the <rfc/> element.  I can change that to simply ignore the
Status of This Memo section, if you would prefer to insert the proper
category in the <rfc/> element manually.

> 3) Older documents frequently have Copyright notices at the end of the document (such as this one).  This caused an error:
> 
> Error: Unexpected text: expected 'Copyright Notice', found 'D. Eastlake, et al
>   [Page 1]'

Mmm.  Ok.  I'll do some tests on making the copyright notice optional.

> 4) The running header (perhaps due to the period after D?) was a problem and needed to be manually stripped.
> 
> D. Eastlake, et al                                              [Page 2]
> INTERNET-DRAFT                       TRILL: Directory Service Mechanisms

Maybe.  Or the ", et al".  I'll have a look at that.

> Error: Unexpected section number; expected '1' or a subsection, found 'D. Eastlake, et al
>   [Page 1]'
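Whatever the exact trigger turns out to be, a common way to make page-break removal independent of the author string is to key on the trailing '[Page N]' marker and drop the running header that follows it. A minimal sketch, assuming that body text never ends in '[Page N]' and using a hypothetical `strip_page_breaks` function:

```python
import re
from typing import List

# A page footer ends in "[Page N]". What precedes it (author names, "et al",
# periods after initials) is never examined.
FOOTER = re.compile(r"\[Page \d+\]\s*$")


def strip_page_breaks(lines: List[str]) -> List[str]:
    """Drop page footers, form feeds, and the running header after each footer."""
    out: List[str] = []
    skipping = False  # True between a footer and the running header that follows
    for line in lines:
        text = line.replace("\f", "")
        if FOOTER.search(text):
            skipping = True
            continue
        if skipping:
            if text.strip():
                skipping = False  # this non-blank line is the running header
            continue  # drop blank lines in the break, and the header itself
        out.append(text)
    return out
```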
> 
> 5) The references section is unnumbered.  This (maybe) creates an error:
> 
> Error: Expected an Authors' Addresses section, found 'Normative References'

Ok, will check.
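For reference, a heading check that accepts both numbered headings and a small set of unnumbered back-matter titles might look like the sketch below. The title set, the rule that headings start in column 0, and the name `classify_heading` are assumptions, not id2xml behavior:

```python
import re
from typing import Optional, Tuple

NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")
# Back-matter titles that older drafts often leave unnumbered
# (hypothetical list for this sketch).
UNNUMBERED = {
    "references",
    "normative references",
    "informative references",
    "authors' addresses",
    "author's address",
}


def classify_heading(line: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (number or None, title) for a section heading, else None."""
    if line[:1].isspace():
        return None  # headings start in column 0; body text is indented
    match = NUMBERED.match(line.rstrip())
    if match:
        return match.group(1), match.group(2)
    title = line.strip()
    if title.lower() in UNNUMBERED:
        return None, title
    return None
```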


> Thank you in advance for your help!

Sure :-)

Best regards,

	Henrik

Attachment: signature.asc
