Re: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Megan Ferguson <mferguson@amsl.com> Wed, 07 June 2017 18:43 UTC

From: Megan Ferguson <mferguson@amsl.com>
Date: Wed, 07 Jun 2017 11:41:54 -0700
Message-Id: <D6FC76F7-213D-4F80-A131-93176E19729A@amsl.com>
Cc: tools-development@ietf.org
To: henrik@levkowetz.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-development/ZXRwVo9eiCC6IlquQ99Yq-_zEVc>
Subject: Re: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Hi Henrik,

Input file: draft-ietf-trill-directory-assist-mechanisms-12
Version: id2xml 1.0.0
Issues: File not originally generated with XML
Files available: 
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.original 
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3.txt
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-directory-assist-mechanisms-12v3-rfcdiff.html

(The .txt version above includes our manual updates and the rfcdiff highlights these from the original.)

Note - This file is representative of the issues we see when we receive a file that was not originally 
created with xml2rfc, which is the case where we feel the id2xml tool would be most useful for the RPC.  That is, 
we believe this is a good example of a common use case for id2xml.

Testing this file raised a number of questions/errors.

1) Header

a) The status “Proposed Standard” does not seem to be recognized.  This is a common value we see in I-Ds not created with xml2rfc.
For our purposes, Proposed Standard, Draft Standard, Internet Standard, Standard Track, and Standards Track should all map to 
“Standards Track” handling.

 File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1403, in get_category
   self.warn(line, "Expected a recognized status name, found '%s'" % (line.txt, ))
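
The mapping requested in 1a could be sketched roughly as follows. This is only an illustration under assumptions: the names STANDARDS_TRACK_ALIASES, normalize_status, and parse_intended_status are hypothetical and are not taken from id2xml's source. The label match is case-insensitive.

```python
import re

# Hypothetical alias table; not taken from id2xml's source.
STANDARDS_TRACK_ALIASES = {
    "proposed standard",
    "draft standard",
    "internet standard",
    "standard track",
    "standards track",
}


def normalize_status(raw):
    """Return 'Standards Track' for any known alias, else the cleaned input."""
    cleaned = " ".join(raw.split())  # collapse runs of whitespace
    if cleaned.lower() in STANDARDS_TRACK_ALIASES:
        return "Standards Track"
    return cleaned


def parse_intended_status(line):
    """Parse an 'Intended status: X' header line, ignoring case.

    Returns the normalized status, or None if the line is not a status line.
    """
    m = re.match(r"\s*intended\s+status\s*:\s*(.+?)\s*$", line, re.IGNORECASE)
    return normalize_status(m.group(1)) if m else None
```

For example, parse_intended_status("Intended Status: Proposed Standard") would return "Standards Track" under this sketch.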

b) WG name: This document did not include a working group name in the header.  We receive documents with no working 
group indicated (as with this one), with the full WG name, or with the abbreviation.  Is this required for parsing?

c) Does the use of the authors’ full names in the header (or of more than one initial) have any effect on the output?

We get the following error *whether or not* full first names (rather than initials) appear:

Failure converting: 'NoneType' object does not support item assignment
Traceback (most recent call last):
  File "/usr/bin/id2xml", line 9, in <module>
    load_entry_point('id2xml==1.0.0', 'console_scripts', 'id2xml')()
  File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 222, in run
    xml = parser.parse_to_xml()
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 953, in parse_to_xml
    doc = self.document()
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 978, in document
    self.root.append(self.back())
  File "<string>", line 2, in back
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
    ret = fn(self, *params,**kwargs)
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2669, in back
    self.read_authors_addresses()
  File "<string>", line 2, in read_authors_addresses
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
    ret = fn(self, *params,**kwargs)
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1693, in read_authors_addresses
    self.maybe_author_address(item)
  File "<string>", line 2, in maybe_author_address
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
    ret = fn(self, *params,**kwargs)
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1793, in maybe_author_address
    item['address'] = {}
TypeError: 'NoneType' object does not support item assignment


*However*, we get the following similar error on another file that previously parsed correctly, after we 
changed only the header to use the authors’ full names (instead of just initials):

Failure converting: 'NoneType' object does not support item assignment
Traceback (most recent call last):
  File "/usr/bin/id2xml", line 9, in <module>
    load_entry_point('id2xml==1.0.0', 'console_scripts', 'id2xml')()
  File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 222, in run
    xml = parser.parse_to_xml()
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 953, in parse_to_xml
    doc = self.document()
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 978, in document
    self.root.append(self.back())
  File "<string>", line 2, in back
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
    ret = fn(self, *params,**kwargs)
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2669, in back
    self.read_authors_addresses()
  File "<string>", line 2, in read_authors_addresses
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
    ret = fn(self, *params,**kwargs)
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1693, in read_authors_addresses
    self.maybe_author_address(item)
  File "<string>", line 2, in maybe_author_address
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 560, in wrap
    ret = fn(self, *params,**kwargs)
  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1793, in maybe_author_address
    item['address'] = {}
TypeError: 'NoneType' object does not support item assignment
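
For reference, the TypeError in both tracebacks means that `item` was None when the statement at parser.py line 1793 ran. The sketch below reproduces that statement in isolation and shows one way a more descriptive failure could be raised instead. The function names and the error wording are hypothetical, and the guess that None means "no header author matched this address entry" is our assumption, not something confirmed from id2xml's source.

```python
def attach_address(item):
    # Same statement as parser.py line 1793 in the traceback above.
    item['address'] = {}
    return item


def attach_address_guarded(item, name):
    # Hypothetical guard (not id2xml's code): fail with a message that
    # names the author entry instead of a bare TypeError.
    if item is None:
        raise ValueError(
            "No author in the header matches %r in Authors' Addresses" % (name,)
        )
    item['address'] = {}
    return item
```

Calling attach_address(None) raises the TypeError shown above; the guarded version would instead report which name could not be matched.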

We tried (to no avail):
- updating full first names to initials only
- making each author's affiliation in the header match exactly what appears in the Authors’ Addresses section 
  (the original uses Huawei for two authors in the header, while Huawei Technologies is used in the Addresses section)
- updating the header to include Internet-Draft
- removing blank space on the left side of the header

Follow ups on header-related issues:

i) What regulations/limitations exist on how items must appear in the header?  We are also curious about documents 
from other streams such as IRTF or Independent (which we have yet to test).

ii) We are unsure about the meaning of some of the header-related errors.  For example:

Warning: Expected a label indication top left, found none

Perhaps getting a bit more information on this would help us troubleshoot.

iii) Perhaps "Intended status" and the like should be case insensitive?

2) The Status of This Memo did not match the current boilerplate text.  This caused an error:

Error: Unexpected text: expected 'Internet-Drafts are working documents of the Internet
  Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The
  list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.', found 'Distribution of this
  document is unlimited. Comments should be sent'

3) Older documents frequently have Copyright notices at the end of the document (such as this one).  This caused an error:

Error: Unexpected text: expected 'Copyright Notice', found 'D. Eastlake, et al
  [Page 1]'

4) The running header (perhaps due to the period after D?) was a problem and needed to be manually stripped.

D. Eastlake, et al                                              [Page 2]
INTERNET-DRAFT                       TRILL: Directory Service Mechanisms

Error: Unexpected section number; expected '1' or a subsection, found 'D. Eastlake, et al
  [Page 1]'
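
The manual stripping described in 4) could be approximated with a small pre-processing step like the one below. This is a rough sketch under assumptions, not how id2xml handles page breaks: it treats any line ending in "[Page N]" as a footer and assumes the first non-blank line after a footer is the running header. The function name strip_page_furniture is hypothetical.

```python
import re

# A footer is any line that ends with "[Page N]", e.g.
# "D. Eastlake, et al                     [Page 2]".
FOOTER_RE = re.compile(r"\[Page \d+\]\s*$")


def strip_page_furniture(text):
    """Drop page footers and the running header that follows each one."""
    kept = []
    after_footer = False
    for line in text.split("\n"):
        bare = line.replace("\f", "")  # ignore form feeds at page breaks
        if FOOTER_RE.search(bare):
            after_footer = True
            continue
        if after_footer:
            if not bare.strip():
                continue  # blank lines between footer and running header
            # Assumption: the first non-blank line after a footer is the
            # running header, so it is dropped as well.
            after_footer = False
            continue
        kept.append(line)
    return "\n".join(kept)
```

Because the header is only dropped when it directly follows a footer, the "INTERNET-DRAFT" label at the top of the first page is left in place.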

5) The references section is unnumbered.  This may be what causes the following error:

Error: Expected an Authors' Addresses section, found 'Normative References'
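
To illustrate what we mean by accepting unnumbered back-matter headings, here is a minimal sketch of a heading matcher in which the section number is optional. The pattern and the name classify_heading are hypothetical and only cover the headings mentioned in this message; they are not taken from id2xml.

```python
import re

# The leading section number ("9.", "9.1.", ...) is optional.
HEADING_RE = re.compile(
    r"^(?:(?P<number>\d+(?:\.\d+)*)\.?\s+)?"
    r"(?P<title>(?:Normative |Informative )?References"
    r"|Author(?:s'|'s|s)? Address(?:es)?)\s*$",
    re.IGNORECASE,
)


def classify_heading(line):
    """Return (number_or_None, title) for a recognized heading, else None."""
    m = HEADING_RE.match(line.strip())
    if m is None:
        return None
    return (m.group("number"), m.group("title"))
```

With this sketch, both "Normative References" and "9.1.  Normative References" are recognized as reference headings, with the number reported as None in the first case.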

Thank you in advance for your help!

Megan