Re: [xml2rfc-dev] xml2rfc Input Document Issues

Julian Reschke <julian.reschke@gmx.de> Sat, 30 July 2011 14:36 UTC

On 2011-07-30 15:38, Tony Hansen wrote:
> ...
>> ERROR OUTSIDE OUR CONTROL
>>
>> 4) Invalid characters for an "ID" value (30)
>> It errors if an attribute of type "ID" has a value that starts with a
>> number, or contains spaces, at symbols, or a few other characters.
>> Upon looking into this, this is an error with standard XML and
>> something outside our codebase.
>
> So the question then shifts to how best to handle such errors -- what
> kinds of error messages should be presented.

It is a non-fatal error (it's validation-level, not well-formedness).
That being said, I support keeping it as an error.
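
To make the validity rule concrete, here is a minimal Python sketch (not xml2rfc code). The function name is_valid_id is hypothetical, and the pattern is an ASCII-only approximation of the XML "Name" production, which in reality also admits many non-ASCII characters:

```python
import re

# ASCII-only approximation of the XML "Name" production (assumption:
# non-ASCII name characters are ignored to keep the sketch short).
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")

def is_valid_id(value):
    """Return True if value looks like a legal value for an ID attribute."""
    return _NAME_RE.match(value) is not None

# A leading digit, a space, or an at sign each make the value invalid.
for candidate in ["RFC2119", "2119", "sec intro", "user@host"]:
    print(candidate, is_valid_id(candidate))
```

A document with such a value still parses; only a validating pass against the DTD would flag it.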

>> 5) Documents that violate the DTD (55)
> There are some meta questions that will need to be answered about the
> DTD. I know that these are out of scope for the *development* of
> xml2rfc2, but they are definitely in scope before the tool can be rolled
> out for mass use. In particular, the RFC Editor staff has various
> concerns about how best to handle RFC documents that were "legit" before
> and suddenly no longer are.

Thanks for putting this into quotes.

We should keep in mind that there are validity problems that are easily 
fixed (missing date...), but others which are not (artwork in lists).

We need to categorize into groups:

- things that are just stupid in the DTD (requiring date but allowing 
all attributes to be empty)

- things that have some justification (list required to be in t)

- things that are really needed in practice but forbidden (artwork in 
lists).

>> 6) Documents that didn't properly escape & and < characters in XML (28)
>
> Again the question then shifts to how best to handle such errors --
> what kinds of error messages should be presented.

This might be tricky, because the XML parser might not tell us which
character caused the problem.
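
As a rough illustration, the sketch below uses Python's standard-library parser (an assumption; xml2rfc2 may use a different one) to show what a tool typically gets back: a generic message plus a line and column. The helper name first_parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET

def first_parse_error(xml_text):
    """Return (message, line, column) for the first well-formedness
    error in xml_text, or None if the text parses cleanly."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return (str(err), line, column)
    return None

# Both inputs contain an unescaped special character.
print(first_parse_error("<t>AT&T</t>"))
print(first_parse_error("<t>a < b</t>"))
# The properly escaped form parses without error.
print(first_parse_error("<t>AT&amp;T</t>"))
```

A friendlier message would probably have to come from the tool itself, by looking at the source text around the reported position.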

> ...
>> 8) An include instruction requested a path with directories, instead of
>> just the filename (12)
>> Instead of asking for 'reference.RFC.2119.xml' it asked for
>> 'bibxml2/reference.RFC.2119.xml'. Most of the documents don't do this.
>> There are ways we could handle it -- if the full path fails, it could
>> try just the basename in the toplevel directory. Conversely, if the
>> instruction asks for just the basename like usual, we can either only
>> look in the top level directory (current behavior) or do a recursive
>> search. Thoughts on the best way to handle this?
>
> I'm wondering what the current xml2rfc does in these cases.

It appears the only thing that makes sense is to resolve the name
against a base directory, including path components. Everything else
would be ambiguous.
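
A minimal Python sketch of that rule, assuming a single base directory; the function name resolve_include and the "/cache" directory are made up for illustration:

```python
import posixpath

def resolve_include(base_dir, name):
    """Resolve an include name, path components and all, against base_dir.

    No basename fallback and no recursive search: one name maps to
    exactly one candidate file.
    """
    return posixpath.normpath(posixpath.join(base_dir, name))

print(resolve_include("/cache", "reference.RFC.2119.xml"))
# -> /cache/reference.RFC.2119.xml
print(resolve_include("/cache", "bibxml2/reference.RFC.2119.xml"))
# -> /cache/bibxml2/reference.RFC.2119.xml
```

Because each name yields exactly one path, a missing file is simply an error rather than a trigger for guessing.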

>> 9) Incorrect DTD filename given (2)
>> Could be a typo or something intended to be completed later: some files used
>> 'rfcXXXX.dtd' for the DTD. If we need to, we can treat this in the
>> same way as if no DTD were given, but it might be more appropriate to
>> display an error.
>
> Can you be more specific about which documents displayed the above
> errors so we can see exactly what you're referring to?

A bad DTD filename is harmless if you don't need the DTD for validation. 
If the document *depends* on something in the DTD (like entity defs), 
this needs to be a hard error.
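
One rough way to tell the two cases apart is to look for named entity references that XML does not predefine. The Python sketch below is a heuristic only: the function name is hypothetical, it scans raw text (so it also sees comments and CDATA), and it does not account for entities declared in the document's internal subset.

```python
import re

# The five entities every XML parser knows without a DTD.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named references only; numeric ones such as &#160; do not match
# because '#' cannot start a name. ASCII-only approximation.
_ENTITY_RE = re.compile(r"&([A-Za-z_:][A-Za-z0-9_:.\-]*);")

def entities_needing_declaration(xml_text):
    """Return the named entities used in xml_text that XML does not predefine."""
    return sorted(set(_ENTITY_RE.findall(xml_text)) - PREDEFINED)

sample = "<t>A&nbsp;B &amp; C &#160; &rfc2119;</t>"
print(entities_needing_declaration(sample))
# -> ['nbsp', 'rfc2119']
```

An empty result suggests the bad DTD filename can be tolerated; a non-empty one means the document cannot be processed without the declarations.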

> ...
> ERROR: Unable to parse the XML document: draft-aayadi-6lowpan-tcphc-01.xml
> Comment not terminated , line 1192, column 8
>
>
> xml2rfc2 stopped looking for a comment "-->" when it ran into "--" in
> the text. My understanding is that this is legit XML and should be handled.

"--" is not allowed inside comments. Hard XML error.
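
For illustration, a short Python check using the standard-library parser (expat-based, so the message differs from the one quoted above; the sample document is made up):

```python
import xml.etree.ElementTree as ET

# A comment containing "--" before its terminating "-->".
doc = "<rfc><!-- see section 2 -- or not --><t>text</t></rfc>"

try:
    ET.fromstring(doc)
    outcome = "parsed"
except ET.ParseError as err:
    outcome = "rejected"
    print("ParseError at (line, column):", err.position)

print(outcome)
```

A conforming parser has to reject this input, so the fix belongs in the source document, not in the tool.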

> ERROR: Unable to parse the XML document:
> draft-livingood-dns-malwareprotect-02.xml
> EntityRef: expecting ';', line 171, column 84
>
>
> &quot was missing the trailing ';'. This is a case where the error
> message could be improved to indicate *why* it was expecting a ';'.
>
> ERROR: Unable to parse the XML document:
> draft-livingood-woundy-p4p-experiences-10.xml
> internal error, line 6, column 70
>
>
> I hadn't spotted these before. "Internal error" is just as bad as an
> exception.
>
> Tons of errors like
>
> ERROR: Unable to validate the XML document: draft-maino-lisp-sec-00.xml
> Line 407: IDREF attribute target references an unknown ID "RFC5226"
>
>
> that need to be understood.

These indicate failure to include external <reference>s...
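
A minimal Python sketch of how such dangling targets could be listed without a validating parser. It assumes, as in the xml2rfc DTD, that "anchor" attributes are the IDs and that <xref target="..."> holds the IDREF; the function name and sample document are made up.

```python
import xml.etree.ElementTree as ET

def dangling_xref_targets(xml_text):
    """Return xref targets that match no anchor attribute in the document."""
    root = ET.fromstring(xml_text)
    anchors = {el.get("anchor") for el in root.iter() if el.get("anchor")}
    targets = {el.get("target") for el in root.iter("xref") if el.get("target")}
    return sorted(targets - anchors)

# The <references/> section is empty, as it would be if the external
# <reference> for RFC5226 had not been pulled in.
doc = """<rfc>
  <middle>
    <section anchor="intro">
      <t>See <xref target="RFC5226"/> and <xref target="intro"/>.</t>
    </section>
  </middle>
  <back><references/></back>
</rfc>"""

print(dangling_xref_targets(doc))
# -> ['RFC5226']
```

If the listed targets all look like reference anchors, the likely cause is that the include step failed, not that the author mistyped a target.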