Re: [xml2rfc-dev] When is @ascii required?

Henrik Levkowetz <henrik@levkowetz.com> Mon, 28 October 2019 14:03 UTC

To: Carsten Bormann <cabo@tzi.org>
References: <37D9DCA7-A262-46A6-88C7-369127959164@tzi.org> <834E00E6-A39A-4E8C-8AF4-7D2F9B736C74@tzi.org> <9079ee9c-3f9c-74bc-9e84-fff223056ab9@levkowetz.com> <C1B5F114-C4D9-4713-A902-794551702092@tzi.org>
Cc: xml2rfc-dev@ietf.org
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <e3202451-b5cc-ef9c-0408-424683f9fb62@levkowetz.com>
Date: Mon, 28 Oct 2019 15:03:22 +0100
In-Reply-To: <C1B5F114-C4D9-4713-A902-794551702092@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc-dev/0_ntx-AVs5fuueG5oATz4fPoRYo>
Subject: Re: [xml2rfc-dev] When is @ascii required?

Hi Carsten,

On 2019-10-27 20:20, Carsten Bormann wrote:
> On Oct 27, 2019, at 19:50, Henrik Levkowetz <henrik@levkowetz.com> wrote:
>> 
>> It may very well be that the test can be improved, but it was triggered
>> by a hard-to-diagnose failure where an XML file had us-ascii encoding
>> declared, but contained non-ascii characters. 
> 

> Do you remember whether it contained non-ASCII characters themselves
> (which should already fail in the parser) or character-references to
> beyond-ASCII characters?  Or maybe entity references?

Oh, it was non-ASCII characters in the input file, and all would have been
well if it had failed cleanly with a reasonable error message in the parser,
but it didn't.

I'll look at relocating this code so that it runs on a first parse done
without expanding entity references. That should avoid blocking the use of
entity references while still catching non-ASCII characters in files declared
to be us-ascii.
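
To illustrate the distinction, here is a minimal sketch of such a check, not
the actual xml2rfc code. It swaps the "first parse" for a plain scan of the raw
bytes, which by construction never expands entity or character references. The
function names (declared_encoding, find_non_ascii, check_ascii_declaration) are
hypothetical, and the sketch assumes the input is available as bytes with the
XML declaration at the very start.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data. Assumption: no byte order mark precedes the declaration.
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data):
    """Return the lower-cased declared encoding, or None if none is declared."""
    m = _XML_DECL_ENCODING.match(data)
    return m.group(1).decode("ascii").lower() if m else None


def find_non_ascii(data):
    """Return (line, byte_column, byte) for every raw byte above 0x7F.

    Lines and columns are 1-based; the column counts bytes, not characters.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((lineno, col, byte))
    return hits


def check_ascii_declaration(data):
    """Report raw non-ASCII bytes, but only if the file claims to be us-ascii.

    References such as &#233; are plain ASCII in the raw input, so they are
    never reported here.
    """
    if declared_encoding(data) not in ("us-ascii", "ascii"):
        return []
    return find_non_ascii(data)
```

Each reported position could then feed a clear error message before the real
parse starts, which is what was missing in the original failure.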


Best regards,

	Henrik