[rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02, U+ syntax

dev+ietf at seantek.com (Sean Leonard) Wed, 31 August 2016 17:02 UTC

From: dev+ietf at seantek.com (Sean Leonard)
Date: Wed, 31 Aug 2016 10:02:22 -0700
Subject: [rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02, U+ syntax
Message-ID: <ecd3d504-764e-2b6d-72bd-3343ad22660d@seantek.com>

/(Sent this to the authors, and the suggestion was that this is the 
right mailing list for public discussion.)/

**********
Hello draft-iab-rfc-nonascii-02 people, here is feedback 
on draft-iab-rfc-nonascii-02.

Section 3.4 of draft-iab-rfc-nonascii-02 provides no fewer than six
preferred alternatives for how to represent a single Unicode character
or code point. They all pretty much say “the ___ character (___)” in
various permutations. None of these is inherently wrong.

However, The Unicode Standard itself (9.0.0 and prior versions) provides 
a specific convention in Appendix A:
“U+[x][x]xxxx NAME OF CHARACTER”

Notably, the convention does not use the “the ___ character” formulation.
Grammatically, the expression itself names the character, so the article
is omitted. A conforming example would be:

  1.  Temperature changes in the Temperature Control Protocol are
      indicated by U+2206 INCREMENT.

I would like to propose that this be listed as at least one of the
preferred alternatives.
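
A minimal sketch of how that notation could be generated mechanically, assuming Python and its standard unicodedata module (the function name format_code_point is hypothetical, not something from the draft or from The Unicode Standard):

```python
import unicodedata


def format_code_point(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character.

    The code point is written in uppercase hex with at least four
    digits; the name comes from Python's bundled Unicode database.
    unicodedata.name() raises ValueError for characters without a name.
    """
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(format_code_point("\u2206"))
print(format_code_point("\U0001F631"))
```

The names available depend on the Unicode version shipped with the Python interpreter, so very recent characters may not resolve.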

In The Unicode Standard, two other conventions are noted:

U+1F631 “😱” FACE SCREAMING IN FEAR

U+1F631 “😱”

These conventions show the name in all caps and in small caps (which,
for PDF presentation purposes, is actually stored as lowercase). They
also show curly quotes. I asked the Unicode mailing list over the weekend
and the general sense is that uppercase is normative in plain text (as
shown in the UCD) but case distinctions, along with spaces and (nearly
all) hyphens, are not relevant for unambiguous identification.
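
To illustrate what "not relevant for unambiguous identification" could mean in practice, here is a rough sketch of a loose name comparison in Python. It is modeled on my understanding of the loose matching rule for character names in UAX #44 (ignore case, whitespace, underscores, and medial hyphens, with the hyphen in U+1180 HANGUL JUNGSEONG O-E as the exception). The function name loose_name_key is hypothetical, and this is a simplification rather than a conformance-grade implementation.

```python
import re


def loose_name_key(name: str) -> str:
    """Reduce a character name to a key for loose comparison."""
    upper = name.upper().replace("_", " ").strip()
    # Exception: the hyphen in HANGUL JUNGSEONG O-E is significant,
    # because HANGUL JUNGSEONG OE is a different character.
    is_o_e = re.fullmatch(r"HANGUL\s*JUNGSEONG\s*O\s*-\s*E", upper) is not None
    # Drop medial hyphens only: a hyphen directly between two
    # letters or digits. A hyphen next to a space is kept.
    collapsed = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", upper)
    key = re.sub(r"\s+", "", collapsed)
    return "HANGULJUNGSEONGO-E" if is_o_e else key


print(loose_name_key("face screaming in fear") ==
      loose_name_key("FACE SCREAMING IN FEAR"))
```

Non-medial hyphens are kept on purpose, so that names such as TIBETAN LETTER -A and TIBETAN LETTER A stay distinct.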

draft-iab-rfc-nonascii-02 is only concerned with characters, not
semantics or presentation formats (unlike the xml2rfc format). Assuming
that plain text is the norm for purposes of draft-iab-rfc-nonascii-02, I
suppose that it is sufficient for the plain text to have an ALL-CAPS
name. I was going to suggest a novel xml2rfc element for Unicode code
points, such as <ucode name="yes">😱</ucode>, that would be transformed
into the output above in plain text mode. However, the xml2rfc
transformer can detect such text by looking for the presence of “U+1F631
FACE SCREAMING IN FEAR”, and apply CSS to it in the HTML output instead,
viz.:
span.uniname {            /* CHAR STYLES */
  text-transform: lowercase;
  font-variant: small-caps;
  font-size: 110%;
}

As discussed here: 
<http://www.unicode.org/mail-arch/unicode-ml/y2016-m08/0055.html>
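
As a sketch of the detection step only: the following Python function scans plain text for "U+XXXX" followed by a space and the matching character name, and wraps the name in a span carrying the uniname class from the CSS above. This is not how xml2rfc works today; the function name mark_up_unicode_names is hypothetical, and the sketch handles only the simple "U+XXXX NAME" form, not the variant with the character itself in between.

```python
import html
import re
import unicodedata

# "U+" followed by four to six uppercase hex digits.
CODE_POINT = re.compile(r"U\+([0-9A-F]{4,6})\b")


def mark_up_unicode_names(text: str) -> str:
    """HTML-escape text, wrapping verified character names in a span."""
    out = []
    pos = 0
    for match in CODE_POINT.finditer(text):
        cp = int(match.group(1), 16)
        if cp > 0x10FFFF:
            continue
        # Empty string when the code point has no name in the database.
        name = unicodedata.name(chr(cp), "")
        end = match.end()
        name_end = end + 1 + len(name)
        after = text[name_end:name_end + 1]
        # Only mark up when the text really carries this code point's
        # name, and the name is not a prefix of a longer word.
        if (name and text.startswith(" " + name, end)
                and not (after.isalnum() or after == "-")):
            out.append(html.escape(text[pos:end]))
            out.append(' <span class="uniname">%s</span>' % html.escape(name))
            pos = name_end
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(mark_up_unicode_names("indicated by U+2206 INCREMENT."))
```

Checking the name against the Unicode database, rather than accepting any run of capitals after the code point, avoids swallowing unrelated uppercase text that happens to follow.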

Personally I do not see the need for quotation marks around the
character. U+____ SP CHAR SP NAME (where CHAR is the character itself)
ought to be good enough: the single CHAR is going to be non-ASCII anyway.
However, there are implications for combining marks, with or without
quotes; this needs to be thought through. Consider:
U+0308 “◌̈” COMBINING DIAERESIS vs.
U+0308 ◌̈ COMBINING DIAERESIS vs.
U+0308 “̈” COMBINING DIAERESIS vs.
U+0308 ̈ COMBINING DIAERESIS.
See 
<http://stackoverflow.com/questions/2224772/whats-the-unicode-glyph-used-to-indicate-combining-characters>
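
One possible way to handle combining marks, sketched in Python under the assumption that U+25CC DOTTED CIRCLE is used as the visible base for any character in a Mark general category (Mn, Mc, Me). The function name describe_character and the quoted flag are hypothetical; whether quotes or the dotted circle should be used at all is exactly the open question.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def describe_character(ch: str, quoted: bool = False) -> str:
    """Return 'U+XXXX <char> NAME', giving combining marks a base."""
    shown = ch
    # General categories Mn, Mc and Me all start with "M".
    if unicodedata.category(ch).startswith("M"):
        shown = DOTTED_CIRCLE + ch
    if quoted:
        shown = "\u201C" + shown + "\u201D"
    return f"U+{ord(ch):04X} {shown} {unicodedata.name(ch)}"


print(describe_character("\u0308"))
print(describe_character("\u0308", quoted=True))
```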

The question is what happens when the character is a specific protocol
element, which frequently (but not always) is quoted, such as "+", and
treated as verbatim text (<spanx style="verb">, or the new <tt> in
xml2rfc v3).

Section 3.6 (and elsewhere) discusses “U+ notation” without a reference.
Appendix A of [UnicodeCurrent] is appropriate.

Sean