[rfc-i] Unicode names in RFCs and xml2rfc

"Martin Thomson" <mt@lowentropy.net> Wed, 04 December 2019 02:31 UTC

Message-Id: <76d730cb-9fe1-4572-acbe-8db5bc0bd598@www.fastmail.com>
Date: Wed, 04 Dec 2019 13:31:15 +1100
From: Martin Thomson <mt@lowentropy.net>
To: rfc-interest@rfc-editor.org
Subject: [rfc-i] Unicode names in RFCs and xml2rfc

I've been reading the xml2rfc code to work out how it is intended to work, and I'm finding it extraordinarily difficult to achieve a relatively modest goal: putting a person's name into a document.

My requirements are simple: acknowledge contributions using each person's preferred name. More concretely, I see no value in expanding ø or ü, but I would like to provide ASCII analogues of the Japanese names in the list. This goal seems consistent with the text in RFC 7997:

   Person names may appear in several places within an RFC (e.g., the
   header, Acknowledgements, and References).  When a script outside the
   Unicode Latin blocks [UNICODE-CHART] is used for an individual name,
   an author-provided, ASCII-only identifier will appear immediately
   after the non-Latin characters, surrounded by parentheses.  This will
   improve general readability of the text.

I'm talking about acknowledgments, so the list appears in a <t> element. The intent is to render the list of names in an ordinary paragraph, with commas separating the names.
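To make the desired output concrete, here is a minimal Python sketch of the RFC 7997 rule quoted above. It is not xml2rfc code: the function names, the sample names, and the test for "Latin" (a letter whose Unicode character name starts with "LATIN") are all assumptions made for illustration.

```python
import unicodedata


def is_latin_or_neutral(ch):
    """Return True for ASCII, for Latin-script letters, and for non-letters.

    Assumption: a letter counts as Latin when its Unicode character name
    starts with "LATIN". This approximates the "Unicode Latin blocks" wording.
    """
    if ord(ch) < 128:
        return True
    if ch.isalpha():
        return unicodedata.name(ch, "").startswith("LATIN")
    # Combining marks, punctuation, and spaces do not trigger a fallback.
    return True


def render_name(name, ascii_fallback=None):
    """Render one name, appending the ASCII identifier only when needed."""
    if all(is_latin_or_neutral(ch) for ch in name):
        return name  # e.g. names containing ø or ü are left alone
    if not ascii_fallback:
        raise ValueError("non-Latin name needs an author-provided ASCII form")
    return f"{name} ({ascii_fallback})"


def render_acknowledgments(people):
    """Join (name, ascii_fallback) pairs into one comma-separated run."""
    return ", ".join(render_name(name, ascii) for name, ascii in people)


# Hypothetical names, used only to show the intended paragraph text.
people = [("Bjørn Müller", None), ("山田太郎", "Taro Yamada")]
print(render_acknowledgments(people))
# prints: Bjørn Müller, 山田太郎 (Taro Yamada)
```

The fallback comes from the author rather than from automatic transliteration, which matches the "author-provided, ASCII-only identifier" wording in the quoted text.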

None of the elements that permit Unicode text fit in this context. I realize that I could use <artwork> for this, but that's clearly an abuse of that element, all the more so because it renders very differently depending on context (I could probably do something with SVG, now that I think of it...).

<u> is singularly unsuitable for this purpose. It insists on including, at a minimum, the U+NNNN notation for every character. If I could use format="char" or format="char-ascii", it might be acceptable, assuming that I have properly understood the code. The <u> element is also not documented in RFC 7991.

I appreciate the value in having a clear signal from the author that a block of text is intended to include Unicode. Unicode tends to lead to all sorts of inconvenient inconsistencies, like multiple different dash and hyphen styles, quoting variations, and other such oddities. I can (grudgingly) accept that some sort of indication is appropriate, so that what should be relatively uncommon usage can receive additional scrutiny.

It shouldn't be this difficult to acknowledge someone using their name.