Re: [xml2rfc-dev] xml2rfc Input Document Issues

Josh Bothun <jbothun@concentricsky.com> Wed, 03 August 2011 17:37 UTC

On Jul 30, 2011, at 6:38 AM, Tony Hansen wrote:

> ...
> 
> On 7/30/2011 2:13 AM, Mike Biglan wrote:
>> Hi Henrik and Tony,
>> 
>> Here is a little more detail on what we worked on this past week in response to the input documents. We went ahead and tested a set of 450 documents in an attempt to categorize the errors, then fix any that needed fixing. Below are the categories that had errors and the counts in parentheses; this includes issues we have fixed, errors in the document or outside our codebase, and open questions.
>> 
>> FIXED
>> 
>> 1) Include instructions are being handled properly now.
>> 
>> 2) No DTD file was declared in the document (12)
>> We had already planned to handle this, but it wasn't possible until a recent change. I will implement a function in the application to default to rfc2629.dtd if no DTD is declared.
> 
> I'll note also that the --dtd parameter didn't seem to work either.

This has been fixed in the latest version (2.0.2).

> 
>> 9) Incorrect DTD filename given (2)
>> This could be a typo, or a placeholder meant to be filled in later: some files used 'rfcXXXX.dtd' for the DTD. If we need to, we can treat this in the same way as if no DTD were given, but it might be more appropriate to display an error.
> 
> Can you be more specific about which documents displayed the above errors so we can see exactly what you're referring to?

Sure -- the following documents have a DOCTYPE referencing "rfcXXXX.dtd":
   draft-dnoveck-nfsv4-storage-control-01.xml
   draft-dnoveck-storage-control-01.xml
   draft-jdfalk-maawg-cfblbcp-01.xml
   draft-kanno-secsh-camellia-02.xml
   draft-kanno-tls-camellia-03.xml
   draft-worley-service-example-07.xml
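
A rough sketch of how such documents could be flagged before parsing is below. It is illustrative only and not the tool's actual code: the function names, the "run of X's" placeholder rule, and the regex-based DOCTYPE matching are all assumptions.

```python
import re

# Captures the SYSTEM identifier of a DOCTYPE declaration, e.g.
#   <!DOCTYPE rfc SYSTEM "rfc2629.dtd">
# Assumption: a regex is good enough here; it ignores PUBLIC identifiers
# and DOCTYPE-like text inside comments.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+(["\'])(.*?)\1')

# Hypothetical rule: a DTD name such as "rfcXXXX.dtd" is a placeholder.
PLACEHOLDER_RE = re.compile(r'rfcX+\.dtd$')


def doctype_system_id(xml_text):
    """Return the DTD filename named in the DOCTYPE, or None if absent."""
    match = DOCTYPE_RE.search(xml_text)
    return match.group(2) if match else None


def classify_dtd(xml_text):
    """Classify a document as 'missing', 'placeholder', or 'declared'."""
    system_id = doctype_system_id(xml_text)
    if system_id is None:
        return "missing"
    if PLACEHOLDER_RE.search(system_id):
        return "placeholder"
    return "declared"
```

With a check like this, the "missing" case could fall back to rfc2629.dtd while the "placeholder" case could either do the same or be reported as an error.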

> 
> ERROR: Unable to parse the XML document: draft-livingood-woundy-p4p-experiences-10.xml
> internal error, line 6, column 70
> 
> 
> I hadn't spotted these before. "Internal error" is just as bad as an exception.

I'm currently looking into these to see how we can better express the errors.  The documents that trigger this all appear to have syntax errors in their DOCTYPE declarations.
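
One possible shape for a clearer message is sketched below. It uses the standard library's xml.etree.ElementTree as a stand-in for whatever parser the tool uses, and the function name and the prolog heuristic are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET


def describe_parse_error(xml_text, filename):
    """Return None if xml_text parses, else a one-line error description."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        message = "%s: %s" % (filename, err)

        # Heuristic (an assumption, not a guarantee): if the text before
        # the error position contains a DOCTYPE but no element start tag,
        # the failure most likely sits in the DOCTYPE declaration.
        lines = xml_text.splitlines(True)
        head = "".join(lines[:line - 1])
        head += "".join(lines[line - 1:line])[:column]
        if "<!DOCTYPE" in head and not re.search(r"<[A-Za-z]", head):
            message += " (check the DOCTYPE declaration)"
        return message
    return None
```

The point is only that the position reported by the parser can be combined with a cheap look at the prolog to say something more useful than "internal error".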

> 
> Tons of errors like
> 
> ERROR: Unable to validate the XML document: draft-maino-lisp-sec-00.xml
> Line 407: IDREF attribute target references an unknown ID "RFC5226"
> 
> 
> that need to be understood.
> 
>   Tony

This large class of errors seems to be an issue with the citation loading.  I'd be curious whether the problem still exists if you run the latest HEAD again, which includes some of my new changes.

I am able to replicate the error by invalidating my XML_LIBRARY path, or by removing the citation document entirely.  In both cases, though, the script also prints a warning saying that the include could not be resolved.  I've made it warning level instead of error level because it doesn't actually halt the parser (processing instructions are not limited by the DTD), but it may be more appropriate for this to be an error.

If the error is still coming up, I believe that means the citation document was not found in $XML_LIBRARY or in the same directory as the input XML file.  Running the script with --verbose may help, because it prints the path to the reference it's trying to load.
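
The lookup described above can be sketched roughly as follows. This is not the tool's code: the function names, the search order (library first, then the input file's directory), and the treatment of XML_LIBRARY as a single directory are assumptions made for illustration.

```python
import os


def candidate_paths(include_name, input_path, environ=None):
    """List the places to look for an include, in the assumed search order."""
    environ = os.environ if environ is None else environ
    candidates = []

    # Assumption: XML_LIBRARY names one directory holding citation files.
    library = environ.get("XML_LIBRARY")
    if library:
        candidates.append(os.path.join(library, include_name))

    # Fall back to the directory containing the input XML file.
    source_dir = os.path.dirname(os.path.abspath(input_path))
    candidates.append(os.path.join(source_dir, include_name))
    return candidates


def resolve_include(include_name, input_path, environ=None, verbose=False):
    """Return the first existing candidate path, or None if unresolved."""
    for path in candidate_paths(include_name, input_path, environ):
        if verbose:
            print("Trying to load reference from: %s" % path)
        if os.path.isfile(path):
            return path
    # The caller decides whether an unresolved include is a warning or error.
    return None
```

Returning None rather than raising leaves the warning-versus-error decision to the caller, which matches the open question above.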

-josh