Re: [xml2rfc-dev] Unicode in references

Henrik Levkowetz <henrik@levkowetz.com> Thu, 12 December 2019 06:36 UTC

To: Martin Thomson <mt@lowentropy.net>, xml2rfc-dev@ietf.org
References: <f226c310-aad9-4f70-92b6-f6cc356b3da7@www.fastmail.com>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <8ed7841d-f91a-28ab-9a5f-3435de44c8d9@levkowetz.com>
Date: Thu, 12 Dec 2019 07:36:15 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc-dev/rudTDhKAt5Xb1sF6-pMDxzSJqBA>

Hi Martin,

This was a change in 2.37.0: attribute downcoding had been too aggressive, and
also affected attributes that legitimately could have non-ASCII content.

Clearly the change then went too far, and didn't downcode things that should
have been downcoded.

I'll use more discrimination in the upcoming bugfix release.

	Henrik

On 2019-12-12 04:01, Martin Thomson wrote:
> A recent change (I know not where, because the reference has been working for a while) resulted in the following warning:
> 
> draft-ietf-quic-tls.xml(1736): Error: Found non-ascii content in <seriesInfo> attribute value name="Advances in Cryptology – CRYPTO 2019"
> 
> This looks fine, until you realize that the apparent hyphen is instead an en-dash (U+2013).  I don't know if this was a change in the source reference, or whether this is a new change in xml2rfc, but I suspect that it is the latter.  xml2rfc produces this warning, but 2.36.0 (which I'm seeing in CI) shows no such warning.  The text produced by xml2rfc 2.36 includes ASCII 45 (hyphen) for the same character.
> 
> Given that it is just a matter of time before this change makes its way to CI, I'd like to understand where this restriction came from.  And what I might do about it.
> 
> For reference, the DOI is 10.1007/978-3-030-26948-7_9 and this is the only non-ASCII character I was able to find in the reference other than an "ö" in an author's first name (so that isn't ultimately rendered).
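
For illustration only, here is a minimal Python sketch of the two things discussed in this thread: locating non-ASCII characters in an attribute value, and downcoding selectively so that only attributes not expected to carry Unicode are converted. It is not xml2rfc's actual code. The replacement table, the UNICODE_ALLOWED set, and the function names are all hypothetical and chosen just for this example.

```python
import unicodedata

# Hypothetical replacement table; the real table used by xml2rfc is not
# shown in this thread.
ASCII_REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

# Hypothetical set of (element, attribute) pairs that may keep non-ASCII
# content; assumed here purely for illustration.
UNICODE_ALLOWED = {("author", "fullname")}


def find_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def downcode(text):
    """Replace characters listed in ASCII_REPLACEMENTS; leave others as-is."""
    return "".join(ASCII_REPLACEMENTS.get(ch, ch) for ch in text)


def clean_attribute(element, attribute, value):
    """Downcode a value unless the attribute is allowed to carry Unicode."""
    if (element, attribute) in UNICODE_ALLOWED:
        return value
    return downcode(value)


if __name__ == "__main__":
    name = "Advances in Cryptology \u2013 CRYPTO 2019"
    print(find_non_ascii(name))
    print(clean_attribute("seriesInfo", "name", name))
```

Run against the seriesInfo name from the error message above, find_non_ascii reports the en-dash and clean_attribute returns the value with an ASCII hyphen in its place, which matches what Martin describes seeing from 2.36.0.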