[rfc-i] Feedback on draft-iab-rfc-nonascii-02, allowable characters
paul.hoffman at vpnc.org (Paul Hoffman) Wed, 31 August 2016 19:32 UTC
From: paul.hoffman at vpnc.org (Paul Hoffman)
Date: Wed, 31 Aug 2016 12:32:08 -0700
Subject: [rfc-i] Feedback on draft-iab-rfc-nonascii-02, allowable characters
In-Reply-To: <c21e9705-b4a8-1d52-3d6d-a2e5749d49ed@seantek.com>
References: <c21e9705-b4a8-1d52-3d6d-a2e5749d49ed@seantek.com>
Message-ID: <B0E3D4B2-726B-4530-A33F-73698302CF88@vpnc.org>

On 31 Aug 2016, at 10:05, Sean Leonard wrote:

> /(Part 2: questions about what characters beyond ASCII are allowed)/
>
> **********
> Hello draft-iab-rfc-nonascii-02 people, here is feedback on
> draft-iab-rfc-nonascii-02.
>
> Then there is the issue of curly quotes, both in U+ syntax and in
> general. Are curly quotes allowed? Should they be allowed in general
> in non-ascii RFCs, or replaced for straight quotes? The xml2rfc tool
> currently down-converts smart quotes to straight quotes in plain text,
> but does not upconvert straight quotes to smart quotes in HTML. This
> has implications for how “verbatim” (aka literal text strings) are
> notated in the RFC formats.

This is a very good question, and one that we did not consider, but
should.

> What about marks that are currently allowed by xml2rfc, such as U+2014
> — EM DASH, that is converted to -- in plain text? I happen to use
> that character aggressively as the prose calls for it, so it would be
> good to know how it will show up in the plain text format, if at all.

No, it shouldn't be converted. It's a perfectly good character. But we
didn't deal with that either. I would say both of these issues fall
under "normal typographical punctuation".

>
> What about other punctuation marks such as ? ? ? ? ? ? ?
> etc.? The whole raft of Unicode space characters such as EM QUAD, EM
> SPACE, etc.? What about characters that have strong mathematical value
> such as × MULTIPLICATION SIGN, ÷ DIVISION SIGN, and ∑ N-ARY
> SUMMATION, and the whole block of mathematical operators? Such
> mathematical characters might be especially useful for cryptographic
> specifications.

Those would fall under "non-ASCII symbols", another topic we did not
address.

> And what about block elements and geometric shapes (U+2500-U+25FF) in
> <artwork>?

Good god, no.

> Overall the implications of this draft are that uses that are not
> explicitly mentioned (author names, protocol elements, addresses) are
> discouraged or prohibited; therefore, characters like EM DASH and
> BULLET that can be represented (however imperfectly) in ASCII ought to
> continue to be used as such. Yet the text plainly states: “To
> support this move away from ASCII, RFCs will switch to supporting
> UTF-8 as the default character encoding and allow support for a broad
> range of Unicode character support.” That supports the proposition
> that all code points that are renderable in a modern, monospace,
> freely-available font (i.e., Courier New) are fair game, as well as
> code points that modern operating systems are likely to render /or/
> that would appear in author names (emoji and CJK characters, Indic
> scripts, Arabic scripts). Note: Courier New 5.13 (Windows 7) includes
> coverage for 2852 characters and 3254 glyphs; the version with Windows
> 10 supports even more, I think.

Font inclusion was only one of the aspects considered; searchability was
another.

--Paul Hoffman
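
For readers who want to see what the plain-text down-conversion discussed above might look like, here is a minimal Python sketch. It is not taken from xml2rfc: the mapping table and both function names (ASCII_FALLBACKS, downconvert, remaining_non_ascii) are hypothetical and chosen only for illustration. It covers just the cases named in the thread (curly quotes to straight quotes, EM DASH to "--") and reports any other non-ASCII characters by code point and name so that they can be reviewed rather than silently altered.

```python
import unicodedata

# Hypothetical fallback table for illustration only; xml2rfc's real
# mapping may differ. Keys are written as escapes to stay unambiguous.
ASCII_FALLBACKS = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2014": "--",  # EM DASH
}


def downconvert(text: str) -> str:
    """Replace each mapped character with its ASCII fallback."""
    return "".join(ASCII_FALLBACKS.get(ch, ch) for ch in text)


def remaining_non_ascii(text: str) -> list[tuple[str, str]]:
    """List (U+XXXX, name) once per non-ASCII character lacking a fallback."""
    seen: list[tuple[str, str]] = []
    for ch in text:
        if ord(ch) > 127 and ch not in ASCII_FALLBACKS:
            entry = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            if entry not in seen:
                seen.append(entry)
    return seen


if __name__ == "__main__":
    sample = "\u201cverbatim\u201d \u2014 a \u00d7 b"
    print(downconvert(sample))
    print(remaining_non_ascii(sample))
```

A report like the one from remaining_non_ascii is one way to surface the "non-ASCII symbols" case separately from "normal typographical punctuation", which is the split suggested in the replies above.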