Re: [TOOLS-DEVELOPMENT] Preview release of Text Submission Converter, id2xml

Megan Ferguson <mferguson@amsl.com> Thu, 10 August 2017 04:14 UTC

Hi Henrik,

Thanks for the prompt reply.

Apologies, there was one additional document that I missed in my previous mail:

Input file: draft-ietf-trill-rbridge-multilevel-07
Version: id2xml 1.1.0
Issues: Unnumbered references sections, short title (non-xml2rfc-generated original)
Files available: 
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-rbridge-multilevel-07v3.original
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-rbridge-multilevel-07v3.txt
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-rbridge-multilevel-07v3-rfcdiff.html
https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-rbridge-multilevel-07v3.xml


7) This document used two unnumbered sections: “Normative References” and “Informative References”.
id2xml added numbering and a parent “References” section for them, but it also titled both
subsections “References”, so we ended up with:

10.  References
10.1.  References
10.2.  References



8) There was no short title provided in the original.  It looks like id2xml put the author names
into that position instead (see the page header, second line below):

Perlman, et al.          Expires January 4, 2018                [Page 2]
Internet-Draft               Perlman, et al                    July 2017


Thanks again!

Megan


On Aug 9, 2017, at 11:53 AM, Henrik Levkowetz <henrik@levkowetz.com> wrote:

> Hi Megan,
> 
> On 2017-08-09 18:02, Megan Ferguson wrote:
>> Hi Henrik,
>> 
>> Mostly small input and notes for our records, so combining several
>> test docs in the message below (and using continuous numbering).
>> 
>> The majority are things that probably deserve no fix unless something
>> is easy to update. However, the Copyright title fix revisited (#6
>> below) would be good to have, IMHO.
>> 
>> General feedback that the references to RFCs and other known citation
>> tags are *much improved* (thank you!).
> 
> Oh, good :-)
> 
>> Input file: draft-ietf-ipsecme-rfc4307bis-18
>> Version: id2xml 1.1.0
>> Issues: reference parsing (note - this was an xml2rfc file originally)
>> Files available: 
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-ipsecme-rfc4307bis-18v3.original
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-ipsecme-rfc4307bis-18v3.txt
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-ipsecme-rfc4307bis-18v3-rfcdiff.html
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-ipsecme-rfc4307bis-18v3.xml
>> 
>> 
>> 1) Not sure what the deal with this reference is.  Initially, I got:
>> 
>> draft-ietf-ipsecme-rfc4307bis-18v3.txt(915): Warning: Failed parsing a reference.  Are all elements separated
>>   by commas (not periods, not just spaces)?:
>>   [TRANSCRIPTION]
>>              Bhargavan, K. and G. Leurent, "Transcript Collision
>>              Attacks: Breaking Authentication in TLS, IKE, and SSH",
>>              NDSS , feb 2016.
>> 
>> 
>> So I updated the weird space/comma and the date, and then I got:
>> 
>> Converting 'draft-ietf-ipsecme-rfc4307bis-18v3.txt'
>> 
>> draft-ietf-ipsecme-rfc4307bis-18v3.txt(918): Exception: need more than 1 value
>>   to unpack
>> Failure converting draft-ietf-ipsecme-rfc4307bis-18v3.txt: need more than 1 value to unpack
>> Traceback (most recent call last):
>>  File "/usr/bin/id2xml", line 9, in <module>
>>    load_entry_point('id2xml==1.1.0', 'console_scripts', 'id2xml')()
>>  File "/usr/lib/python2.7/site-packages/id2xml/run.py", line 226, in run
>>    xml = parser.parse_to_xml()
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 975, in parse_to_xml
>>    doc = self.document()
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 1004, in document
>>    self.root.append(self.back())
>>  File "<decorator-gen-37>", line 2, in back
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 578, in dtrace
>>    ret = fn(self, *params,**kwargs)
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2738, in back
>>    references = self.references([ str(self.section_number) ])
>>  File "<decorator-gen-38>", line 2, in references
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 578, in dtrace
>>    ret = fn(self, *params,**kwargs)
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2786, in references
>>    references = self.references(sublist, level+1)
>>  File "<decorator-gen-38>", line 2, in references
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 578, in dtrace
>>    ret = fn(self, *params,**kwargs)
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2797, in references
>>    ref, entity = self.reference()
>>  File "<decorator-gen-39>", line 2, in reference
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 578, in dtrace
>>    ret = fn(self, *params,**kwargs)
>>  File "/usr/lib/python2.7/site-packages/id2xml/parser.py", line 2955, in reference
>>    name, value = docname.split(None, 1)
>> ValueError: need more than 1 value to unpack
> 
> Oops.  Bug.  Will be fixed in 1.2.0
> 
>> The only way I could get it to parse was to remove “NDSS”.  
>> Just curious about this case as we see other items in that position frequently that don’t cause issues.
> 
> The reference patterns that id2xml recognises have either series info or
> document name in that position.  Series info implies at least 2 components,
> the series name and the series number/value.  With "NDSS" in this position,
> id2xml tried to split it in 2, to get at series name and number, and failed
> spectacularly.
> 
> After the fix I've put in now, id2xml will just ignore "NDSS", as it really
> doesn't know what to do with it.
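
For the record, a minimal sketch of the failure mode described above. The traceback shows the result of `docname.split(None, 1)` being unpacked into two names, which raises ValueError when the text has only one token. The helper name `split_series_info` is hypothetical, and this is not the actual fix that went into id2xml 1.2.0, only an illustration of guarding the unpack:

```python
def split_series_info(text):
    """Split series info such as 'RFC 7296' into (name, value).

    Returns None when the text has fewer than two components (for example
    a bare 'NDSS'), so the caller can ignore it instead of crashing.
    Hypothetical helper, not id2xml's own code.
    """
    parts = text.split(None, 1)  # split on whitespace, at most once
    if len(parts) < 2:
        return None
    return parts[0], parts[1]


print(split_series_info("RFC 7296"))  # ('RFC', '7296')
print(split_series_info("NDSS"))      # None
```
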
> 
>> ----
>> 
>> Input file: draft-ietf-mpls-tp-aps-updates-04
>> Version: id2xml 1.1.0
>> Issues: Lowercase surnames, References, texttables
>> Files available: 
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-mpls-tp-aps-updates-04v3.original
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-mpls-tp-aps-updates-04v3.txt
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-mpls-tp-aps-updates-04v3-rfcdiff.html
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-mpls-tp-aps-updates-04v3.xml
>> 
>> 2) It doesn’t appear that surnames beginning with a lowercase letter are recognized.
>> Note - IMHO, this is okay to leave as is unless there is an easy fix: the warning points
>> out the issue, and this case is not common.
> 
> It's not really a simple fix, because it basically requires an understanding
> of what's called surname particles (things like 'van', 'von', 'de', etc.).
> 
> In the datatracker, I have a routine which does something like this, and it
> currently recognizes the following particles:
> 
>  af|al|Al|de|der|di|Di|du|el|El|Hadi|in 't|Le|st\.?|St\.?|ten|ter|van|van der|Van|von|von der|Von|zu
> 
> and this is not a complete set.
> 
> I probably can put the same set into id2xml, but it still won't be a
> complete fix, so it might be better to handle this manually anyway.  
> (In the datatracker case, that option doesn't really exist when serving a
> web-page ...)
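
A rough sketch of how a particle list like the one above could be used to keep 'van' attached to the surname. The function name `split_name` and the regex are my own assumptions, not the datatracker routine or id2xml code, and the particle set is the incomplete one quoted above:

```python
import re

# Particle list copied from the message above; known to be incomplete.
PARTICLES = (r"af|al|Al|de|der|di|Di|du|el|El|Hadi|in 't|Le|st\.?|St\.?|"
             r"ten|ter|van|van der|Van|von|von der|Von|zu")

# Try longer alternatives first so that "van der" wins over "van".
_alternatives = sorted(PARTICLES.split("|"), key=len, reverse=True)
_NAME_RE = re.compile(
    r"^(?P<given>.*?)\s+(?P<surname>(?:(?:%s)\s+)*\S+)$" % "|".join(_alternatives)
)


def split_name(full_name):
    """Return (given, surname), keeping leading particles with the surname."""
    name = full_name.strip()
    m = _NAME_RE.match(name)
    if not m:
        # Single-token names: treat the whole thing as the surname.
        return "", name
    return m.group("given"), m.group("surname")


print(split_name("Huub van Helvoort"))  # ('Huub', 'van Helvoort')
```

Names with particles outside the list would still need manual handling, as noted above.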
> 
>> draft-ietf-mpls-tp-aps-updates-04v3.txt(355): Warning: This author is listed in the Authors' Addresses section, but was
>>   not found  on the first page: Huub van Helvoort
> 
> Right; a result of not recognizing the 'van'.
> 
>> 3) FYI - Here is another case where a texttable was created poorly.
>> 
>> Original:
>>   The last paragraph in Section 11 of [RFC7271] is modified as follows:
>> 
>>   ---------
>>   Old text:
>>   ---------
>>   In the state transition tables below, the letter 'i' stands for
>>   "ignore" and is an indication to remain in the current state and
>>   continue transmitting the current PSC message.
>>   ---------
>>   New text:
>>   ---------
>>   In the state transition tables below, the letter 'i' is the
>>   "ignore" flag, and if it is set it means that the top-priority
>>   global request is ignored.
>> 
>> 
>> id2xml text output:
>> 
>>   The last paragraph in Section 11 of [RFC7271] is modified as follows:
>> 
>>                                    Ol
>>                                    --
>>                                    In
>>                                    "i
>>                                    co
>>                                    Ne
>>                                    In
>>                                    gl
> 
> Huh. Ugh.
> 
>> While this use of dashes is not usual (i.e., around “Old” and “New”), just want to point out in case.
> 
> Right.
> 
> How have you resolved this in the xml?  What would be the desired output,
> if it was not made a texttable ?
> 
> 
>> ------
>> 
>> Input file: draft-ietf-bmwg-dcbench-terminology-19
>> Version: id2xml 1.1.0
>> Issues: Numbered sections after references, sections missing general text indentation, Acks trouble with ToC
>> Files available: 
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-bmwg-dcbench-terminology-19v3.original
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-bmwg-dcbench-terminology-19v3.txt
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-bmwg-dcbench-terminology-19v3-rfcdiff.html
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-bmwg-dcbench-terminology-19v3.xml
>> 
>> 4) The Abstract appeared without any indentation, which made things
>> weird in the xml (turned everything into <note>s).
> 
> Yes.  The assumption that paragraphs are indented relative to section
> titles is rather fundamental.  I've tried to get around that in earlier
> tools, and it creates a lot of other problems.  I think this will have to
> be handled by adding the appropriate paragraph indentation.
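
A small sketch of adding that indentation before running the converter. It assumes the section is available as one string whose first line is the title, and that a 3-space paragraph indent is wanted; both are my assumptions, and `indent_section_body` is a hypothetical helper, not part of id2xml:

```python
import textwrap


def indent_section_body(section, width=3):
    """Leave the first line (the section title) flush left; indent the rest.

    Blank lines are left untouched, which is textwrap.indent's default.
    """
    title, sep, body = section.partition("\n")
    return title + sep + textwrap.indent(body, " " * width)


abstract = (
    "Abstract\n"
    "\n"
    "The purpose of this informational document is to establish definitions\n"
    "and describe measurement techniques for data center benchmarking, as\n"
)
print(indent_section_body(abstract))
```
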
> 
>> Original:
>> 
>> Abstract
>> 
>> The purpose of this informational document is to establish definitions
>> and describe measurement techniques for data center benchmarking, as
>> well as it is to introduce new terminologies applicable to performance
>> evaluations of data center network equipment. This document establishes
>> the important concepts for benchmarking network switches and routers in
>> the data center and, is a pre-requisite to the test methodology
>> publication [draft-ietf-bmwg-dcbench-methodology]. Many of these terms
>> and methods may be applicable to network equipment beyond this
>> publication's scope as the technologies originally applied in the data
>> center are deployed elsewhere.
>> 
>> 
>> xml:
>> <abstract/><note title="The purpose of this informational document is to establish definitions"/><note title="and describe measurement techniques for data center benchmarking, as"/><note title="well as it is to introduce new terminologies applicable to performance"/><note title="evaluations of data center network equipment. This document establishes"/><note title="the important concepts for benchmarking network switches and routers in"/><note title="the data center and, is a pre-requisite to the test methodology"/><note title="publication [draft-ietf-bmwg-dcbench-methodology]. Many of these terms"/><note title="and methods may be applicable to network equipment beyond this"/><note title="publication's scope as the technologies originally applied in the data"/><note title="center are deployed elsewhere."/></front>
>> 
>> 
>> 5) Related to the above:  The whole text of the Acknowledgments section was pulled into the ToC/section title 
>> because it was not indented (same as the Abstract).
> 
> Right.  That would happen.  Sorry ,:-/
> 
>> 
>> Original:
>> 
>>   10.  References  . . . . . . . . . . . . . . . . . . . . . . . . . 16
>>     10.1.  Normative References  . . . . . . . . . . . . . . . . . . 16
>>     10.2.  Informative References  . . . . . . . . . . . . . . . . . 17
>>     10.3.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . 17
>> 
>> 
>> ...
>> 
>> 10.3.  Acknowledgments
>> 
>>         The authors would like to thank Alfred Morton, Scott Bradner,
>>         Ian Cox, Tim Stevenson for their reviews and feedback.
>> 
>> 
>> 
>> id2xml output (removed the numbering):
>>   10. References  . . . . . . . . . . . . . . . . . . . . . . . . .  16
>>     10.1.  Normative References . . . . . . . . . . . . . . . . . .  16
>>     10.2.  Informative References . . . . . . . . . . . . . . . . .  17
>>   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . .  17
>>   The authors would like to thank Alfred Morton, Scott Bradner, Ian
>>   Cox, Tim Stevenson for their reviews and feedback.  . . . . . . .  17
>>   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  17
>> 
>> 
>> …
>> Acknowledgments
>> 
>> authors would like to thank Alfred Morton, Scott Bradner, Ian Cox, Tim
>> Stevenson for their reviews and feedback.
>> 
>> —————
>> 
>> Input file: draft-ietf-trill-mtu-negotiation-08
>> Version: id2xml 1.1.0
>> Issues: Updates values in header, Copyright title
>> Files available: 
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-mtu-negotiation-08v3.original
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-mtu-negotiation-08v3.txt
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-mtu-negotiation-08v3-rfcdiff.html
>> https://www.rfc-editor.org/rfc/v3test/draft-ietf-trill-mtu-negotiation-08v3.xml
>> 
>> 6) It appears that the title “Copyright and License Notice” is not
>> recognized. Once I updated, I got successful parsing. Just want to
>> revisit this one as it seems we get this title a lot.
> 
> Ok.  I've added the variations you mention below to the recognized titles
> of this section, in 1.2.0.  Please note that if the section text doesn't
> match the recognised boilerplate text, there will still be an error.
> 
> I'm pushing out 1.2.0 now.
> 
> 
> Best regards,
> 
> 	Henrik
> 
> 
>> 
>> Warning: Expected a back section, found '1. Introduction'
>> 
>> The list of common section titles from my previous mail on the
>> topic:
>> 
>>> Copyright
>>> Copyright Notice
>>> Copyright Notice and License
>>> Copyright and License Notice
>>> Copyright, Disclaimer, and Additional IPR Provisions
>>> Copyright and IPR Provisions
>>> Copyright Statement
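
As an illustration only, the title list above could be checked along these lines. The names `COPYRIGHT_TITLES` and `is_copyright_title` are hypothetical, and the case- and whitespace-insensitive comparison is my assumption, not necessarily how id2xml matches titles:

```python
import re

# Section titles taken from the list above.
COPYRIGHT_TITLES = [
    "Copyright",
    "Copyright Notice",
    "Copyright Notice and License",
    "Copyright and License Notice",
    "Copyright, Disclaimer, and Additional IPR Provisions",
    "Copyright and IPR Provisions",
    "Copyright Statement",
]


def _normalize(title):
    """Collapse runs of whitespace and lowercase, for a tolerant comparison."""
    return re.sub(r"\s+", " ", title).strip().lower()


_KNOWN = {_normalize(t) for t in COPYRIGHT_TITLES}


def is_copyright_title(title):
    """True if the title matches one of the known copyright section titles."""
    return _normalize(title) in _KNOWN


print(is_copyright_title("Copyright and License Notice"))  # True
print(is_copyright_title("1. Introduction"))               # False
```
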
>> 
>> 
>> 
>> 
>> Thanks!
>> 
>> Megan
>> 
>> 
>> 