[rfc-i] Layout-affecting Unicode

Martin Thomson <mt@lowentropy.net> Wed, 28 April 2021 23:54 UTC

Date: Thu, 29 Apr 2021 09:53:53 +1000
From: Martin Thomson <mt@lowentropy.net>
To: rfc-interest@rfc-editor.org
Subject: [rfc-i] Layout-affecting Unicode

It seems that xml2rfc preserves only a very narrow set of Unicode characters outside the ASCII range [1].  The way it does this is not great [2], but these characters all appear to exist in order to force formatting constraints.

Is the use of these layout-affecting characters something we have discussed and have consensus on?  The HTML world recognizes these characters [3], but I've seen more deliberate efforts there to move layout choices into document structure so that they can be controlled with style directives.

One problem I have observed is that these characters make text unsearchable, because their rendering does not reveal which specific characters are present.  For example, the BCP 14 boilerplate inserted by kramdown-rfc2629 puts a non-breaking space between "BCP" and "14".  Searching for "BCP 14" consequently fails to return anything.  The other characters confound searching in similar ways, which is why I personally prefer structural elements for controlling layout.  This searching problem is likely part of why the RPC has insisted on making these characters visible through entity references (which gets us back to [2] again).
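
The search failure can be reproduced with a minimal Python sketch. The code points below are an assumed mapping for the characters named in [1], and fold_layout_chars is a hypothetical helper for illustration only; it is not part of xml2rfc or kramdown-rfc2629.

```python
# Assumed code points for the characters named in footnote [1],
# each mapped to a plain equivalent that ordinary search would match.
LAYOUT_CHARS = {
    "\u00a0": " ",   # NO-BREAK SPACE -> ordinary space
    "\u200b": "",    # ZERO WIDTH SPACE -> removed
    "\u2011": "-",   # NON-BREAKING HYPHEN -> hyphen-minus
    "\u2028": "\n",  # LINE SEPARATOR -> newline
    "\u2060": "",    # WORD JOINER -> removed
}

# Build the translation table once; str.translate applies it per character.
_TABLE = str.maketrans(LAYOUT_CHARS)


def fold_layout_chars(text: str) -> str:
    """Hypothetical helper: replace layout-affecting characters with
    their plain counterparts so that substring search behaves as a
    reader would expect."""
    return text.translate(_TABLE)


# Text as it might appear in rendered output, with a no-break space
# between "BCP" and "14".
rendered = "as described in BCP\u00a014"

# A naive search using an ordinary space does not match.
found_raw = "BCP 14" in rendered

# After folding, the same search matches.
found_folded = "BCP 14" in fold_layout_chars(rendered)

print(found_raw, found_folded)
```

Folding at search time is only one possible workaround, and it has to be applied by every tool that indexes or searches the text, which is part of the argument for expressing layout through structure instead.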

The next obvious question is a tired one, but I feel that it needs to be asked: why privilege these characters over other Unicode in the structure and mechanisms of the system, rather than simply limiting their use through editorial policy?


[1] The complete set is: non-breaking space, zero-width space, non-breaking hyphen, line separator, and word joiner.
[2] https://trac.tools.ietf.org/tools/xml2rfc/trac/ticket/548 is the result of my most recent trials with this problem.
[3] At least I believe so; I haven't tested all of them.
_______________________________________________
rfc-interest mailing list
rfc-interest@rfc-editor.org
https://www.rfc-editor.org/mailman/listinfo/rfc-interest