[xml2rfc-dev] Unicode in references

"Martin Thomson" <mt@lowentropy.net> Thu, 12 December 2019 03:02 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc-dev/WeijwcIH5dGqlCw04GOIoHQJxKQ>

A recent change (I know not where, because the reference has been working for a while) resulted in the following warning:

draft-ietf-quic-tls.xml(1736): Error: Found non-ascii content in <seriesInfo> attribute value name="Advances in Cryptology – CRYPTO 2019"

This looks fine, until you realize that the apparent hyphen is instead an en-dash (U+2013).  I don't know if this was a change in the source reference, or whether this is a new change in xml2rfc, but I suspect that it is the latter.  xml2rfc produces this warning for me, but I see that 2.36.0 (in CI) shows no such warning.  The text produced by xml2rfc 2.36 includes ASCII 45 (hyphen) for the same character.

Given that it is just a matter of time before this change makes its way to CI, I'd like to understand where this restriction came from, and what I might do about it.

For reference, the DOI is 10.1007/978-3-030-26948-7_9 and this is the only non-ASCII character I was able to find in the reference other than an "ö" in an author's first name (which isn't ultimately rendered).
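
Because an en-dash is visually hard to tell apart from a hyphen, a small script can help locate such characters in attribute values. The following is a minimal sketch using only the Python standard library; the function name and the sample snippet are hypothetical, and this is not how xml2rfc itself performs its check:

```python
import unicodedata
import xml.etree.ElementTree as ET


def find_non_ascii_attributes(xml_text):
    """Report every non-ASCII character found in any attribute value.

    Returns a list of (tag, attribute, code point, Unicode name) tuples.
    This is an illustrative helper, not part of xml2rfc.
    """
    findings = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for attr, value in element.attrib.items():
            for ch in value:
                if ord(ch) > 127:
                    findings.append(
                        (
                            element.tag,
                            attr,
                            "U+%04X" % ord(ch),
                            unicodedata.name(ch, "UNKNOWN"),
                        )
                    )
    return findings


# Hypothetical fragment modelled on the attribute quoted in the error message;
# \u2013 is the en-dash that looks like a hyphen.
sample = (
    "<reference>"
    '<seriesInfo name="Advances in Cryptology \u2013 CRYPTO 2019"/>'
    "</reference>"
)

for finding in find_non_ascii_attributes(sample):
    print(finding)
```

Running this against a whole draft (read from a file instead of the inline sample) would list each offending element and attribute along with the Unicode name of the character, which makes look-alikes such as the en-dash easy to spot.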