[rfc-i] Feedback on draft-iab-rfc-nonascii-02, allowable characters

paul.hoffman at vpnc.org (Paul Hoffman) Wed, 31 August 2016 19:32 UTC

From: paul.hoffman at vpnc.org (Paul Hoffman)
Date: Wed, 31 Aug 2016 12:32:08 -0700
Subject: [rfc-i] Feedback on draft-iab-rfc-nonascii-02, allowable characters
In-Reply-To: <c21e9705-b4a8-1d52-3d6d-a2e5749d49ed@seantek.com>
References: <c21e9705-b4a8-1d52-3d6d-a2e5749d49ed@seantek.com>
Message-ID: <B0E3D4B2-726B-4530-A33F-73698302CF88@vpnc.org>

On 31 Aug 2016, at 10:05, Sean Leonard wrote:

> /(Part 2: questions about what characters beyond ASCII are allowed)/
>
> **********
> Hello draft-iab-rfc-nonascii-02 people, here is feedback on 
> draft-iab-rfc-nonascii-02.
>
> Then there is the issue of curly quotes, both in U+ syntax and in 
> general. Are curly quotes allowed? Should they be allowed in general 
> in non-ascii RFCs, or replaced with straight quotes? The xml2rfc tool 
> currently down-converts smart quotes to straight quotes in plain text, 
> but does not upconvert straight quotes to smart quotes in HTML. This 
> has implications for how “verbatim” (aka literal text strings) are 
> notated in the RFC formats.

This is a very good question, and one that we did not consider, but 
should.

> What about marks that are currently allowed by xml2rfc, such as U+2014 
> — EM DASH, that is converted to -- in plain text? I happen to use 
> that character aggressively as the prose calls for it, so it would be 
> good to know how it will show up in the plain text format, if at all.

No, it shouldn't be converted. It's a perfectly good character. But we 
didn't deal with that either. I would say both of these issues fall 
under "normal typographical punctuation".
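
For illustration only, the kind of down-conversion Sean describes can be 
sketched in a few lines of Python. The mapping table and the function 
name are assumptions made up for this example; they are not taken from 
xml2rfc's source, which may handle more characters or handle them 
differently.

```python
# Hypothetical mapping of typographical punctuation to ASCII stand-ins.
# This is an assumption for illustration, not xml2rfc's actual table.
DOWNCONVERT = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2014": "--",  # EM DASH
}

# str.maketrans accepts single-character keys and string replacements.
_TABLE = str.maketrans(DOWNCONVERT)


def downconvert(text: str) -> str:
    """Replace each mapped character with its ASCII stand-in."""
    return text.translate(_TABLE)
```

The conversion is lossy: once the text is flattened, a straight quote 
that was always straight cannot be told apart from one that used to be 
curly, which is why it matters for literal text strings.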

>
> What about other punctuation marks such as ? ? ? ? ? ? ? 
> etc.? The whole raft of Unicode space characters such as EM QUAD, EM 
> SPACE, etc.? What about characters that have strong mathematical value 
> such as × MULTIPLICATION SIGN, ÷ DIVISION SIGN, and ∑ N-ARY 
> SUMMATION, and the whole block of mathematical operators? Such 
> mathematical characters might be especially useful for cryptographic 
> specifications.

Those would fall under "non-ASCII symbols", another topic we did not 
address.
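
One rough way to see the split between "normal typographical 
punctuation" and "non-ASCII symbols" is the Unicode general category of 
each character. The sketch below uses Python's standard `unicodedata` 
module; the `classify` helper and its four buckets are assumptions for 
illustration, not anything the draft defines.

```python
import unicodedata


def classify(ch: str) -> str:
    """Bucket one character by the first letter of its general category."""
    category = unicodedata.category(ch)
    if category.startswith("P"):   # Pd, Pi, Pf, Po, ...
        return "punctuation"
    if category.startswith("S"):   # Sm (math), So (other symbol), ...
        return "symbol"
    if category == "Zs":           # space separators such as EM SPACE
        return "space"
    return "other"


# Characters mentioned in this thread, written as escapes for clarity.
for ch in ["\u2014", "\u201c", "\u2022", "\u00d7", "\u2211", "\u2003", "\u2500"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {classify(ch)}")
```

Under this bucketing, EM DASH, the curly quotes, and BULLET come out as 
punctuation, while the mathematical operators and the box-drawing 
characters come out as symbols.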

> And what about block elements and geometric shapes (U+2500-U+25FF) in 
> <artwork>?

Good god, no.

> Overall the implications of this draft are that uses that are not 
> explicitly mentioned (author names, protocol elements, addresses) are 
> discouraged or prohibited; therefore, characters like EM DASH and 
> BULLET that can be represented (however imperfectly) in ASCII ought to 
> continue to be used as such. Yet the text plainly states: “To 
> support this move away from ASCII, RFCs will switch to supporting 
> UTF-8 as the default character encoding and allow support for a broad 
> range of Unicode character support.” That supports the proposition 
> that all code points that are renderable in a modern, monospace, 
> freely-available font (i.e., Courier New) are fair game, as well as 
> code points that modern operating systems are likely to render /or/ 
> that would appear in author names (emoji and CJK characters, Indic 
> scripts, Arabic scripts). Note: Courier New 5.13 (Windows 7) includes 
> coverage for 2852 characters and 3254 glyphs; the version with Windows 
> 10 supports even more, I think.

Font inclusion was only one of the aspects considered; searchability was 
another.

--Paul Hoffman