[rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02, U+ syntax
dev+ietf at seantek.com (Sean Leonard) Thu, 01 September 2016 04:58 UTC
From: dev+ietf at seantek.com (Sean Leonard)
Date: Wed, 31 Aug 2016 21:58:04 -0700
Subject: [rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02,
U+ syntax
In-Reply-To: <C5791071-864F-47A8-916B-95D8BE985178@vpnc.org>
References: <ecd3d504-764e-2b6d-72bd-3343ad22660d@seantek.com>
<C5791071-864F-47A8-916B-95D8BE985178@vpnc.org>
Message-ID: <61a780a3-5892-1374-8d16-f0af51c1aba3@seantek.com>
On 8/31/2016 12:25 PM, Paul Hoffman wrote:
> On 31 Aug 2016, at 10:02, Sean Leonard wrote:
>
>> /(Sent this to the authors, and the suggestion was that this is the
>> right mailing list for public discussion.)/
>>
>> **********
>> Hello draft-iab-rfc-nonascii-02 people, here is feedback on
>> draft-iab-rfc-nonascii-02.
>>
>> Section 3.4 of draft-iab-rfc-nonascii-02 provides no less than six
>> preferred alternatives for how to represent a single Unicode
>> character or code point. They all pretty much say "the ___ character
>> (___)" in various permutations. None of these are inherently wrong.
>>
>> However, The Unicode Standard itself (9.0.0 and prior versions)
>> provides a specific convention in Appendix A:
>> "U+[x][x]xxxx NAME OF CHARACTER"
>>
>> Notably, the convention does not use "the ___ character" formulation.
>> Grammatically, the convention is a character, so an article is
>> omitted. A conforming example would be:
>>
>> 1. Temperature changes in the Temperature Control Protocol are
>> indicated by U+2206 INCREMENT.
>>
>> I would like to propose that this be used as at least a priority
>> alternative.
>
> Disagree. That formulation is harder to read in running text, and
> running text is exactly the formulation we are aiming for. The fact
> that TUC likes a particular format should not impinge on our choice
> for readability.
I respectfully disagree.
As an editorial matter, draft-iab-rfc-nonascii does not express "our
[the IETF's] choice for readability". It offers no fewer than six
"preferred" options, and one "acceptable" option. Then it says that it's
all context-dependent.
My concern is that six or seven different options are not
helpful in the text, especially when an option in very common industry
use (Appendix A of [UnicodeCurrent]) is not mentioned at all. I
actually just did a search of the RFC series, with the regex:
/U\+([0-9A-Fa-f]){4,6}/
and found that variations of U+hhhh[h][h] NAME have been very common.
(Variations include putting the NAME first, putting either the U+ or the
NAME in parens or quotes, etc., but in general, closer to Appendix A of
[UnicodeCurrent] than to draft-iab-rfc-nonascii.) The second most common
variation is straight up U+hhhh[h][h] notation with no further
embellishments.
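
A minimal sketch of this kind of search, assuming Python and its standard `re` module. The function name `find_code_points` and the sample sentence are illustrative only; the pattern keeps the same character class and length bounds as the regex above, with the quantifier moved inside the group so that the whole hex string is captured.

```python
import re

# Same character class and 4-6 digit bounds as the regex quoted above.
# The quantifier sits inside the group so group(1) holds every hex digit.
U_PLUS = re.compile(r"U\+([0-9A-Fa-f]{4,6})")

def find_code_points(text):
    """Return code points written in U+ notation, in order of appearance."""
    return [int(m.group(1), 16) for m in U_PLUS.finditer(text)]

# Hypothetical sample text, modeled on the RFC 7564 passage quoted below.
sample = "the characters U+13DA U+13A2 U+13B5 from the Cherokee block"
print([f"U+{cp:04X}" for cp in find_code_points(sample)])
# prints ['U+13DA', 'U+13A2', 'U+13B5']
```

Classifying each hit by what surrounds it (a NAME after it, parentheses, quotes) would need further patterns; this sketch only locates the U+ tokens.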
My overall editorial proposal is that Section 3.4 be simplified to:
3.4. Body of the Document
When the mention of Unicode characters is required for correct
protocol operation and understanding, the characters' Unicode
character names or code points MUST be included in the text.
For a single Unicode character, at least two of the following
three pieces of data MUST be included:
the character itself, the character name or name alias,
and the character code point.
o Characters beyond the ASCII range will require identifying
the Unicode code point.
o Use of the actual character (e.g., ∆) is encouraged so
that a reader can more easily see what the character is, if their
device can render the text.
o The use of the Unicode character names or name aliases like
"INCREMENT" in addition to the use of Unicode code points is also
encouraged.
When used, Unicode character names should be in all capital
letters.
Examples:
OLD [RFC7564]:
However, the problem is made more serious by introducing the full
range of Unicode code points into protocol strings. For example,
the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from
the Cherokee block look similar to the ASCII characters "STPETER" as
they might appear when presented using a "creative" font family.
NEW/ALLOWED:
However, the problem is made more serious by introducing the full
range of Unicode code points into protocol strings. For example,
the characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2
(ᏚᎢᎵᎬᎢᎬᏒ) from the Cherokee block look similar to the ASCII
characters "STPETER" as they might appear when presented using a
"creative" font family.
ALSO ACCEPTABLE:
However, the problem is made more serious by introducing the full
range of Unicode code points into protocol strings. For example,
the characters "ᏚᎢᎵᎬᎢᎬᏒ" (U+13DA U+13A2 U+13B5 U+13AC U+13A2
U+13AC U+13D2) from the Cherokee block look similar to the ASCII
characters "STPETER" as they might appear when presented using a
"creative" font family.
How the Unicode character, code point, and name or name alias are
written in the body may depend on context and the specific
character(s) in question. All are acceptable within an RFC. BCP 137,
"ASCII Escaping of Unicode Characters", describes the pros and cons of
different options for identifying Unicode characters in an ASCII
document [BCP137]; see also Appendix A of [UnicodeCurrent].
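
For illustration, the "U+hhhh NAME" form discussed above can be generated mechanically. The following is a minimal sketch, assuming Python and the character database shipped in its standard `unicodedata` module (which may lag the current Unicode version). The helper name `appendix_a_style` is hypothetical and does not come from any of the cited documents.

```python
import unicodedata

def appendix_a_style(ch):
    """Format one character as 'U+hhhh NAME', or just 'U+hhhh' if unnamed."""
    code_point = ord(ch)
    # unicodedata.name() returns the default (None here) for code points
    # that have no name in Python's bundled Unicode database.
    name = unicodedata.name(ch, None)
    label = f"U+{code_point:04X}"  # at least four hex digits, upper case
    return f"{label} {name}" if name else label

print(appendix_a_style(chr(0x2206)))
# prints U+2206 INCREMENT
```

Because the label and name are derived from the character itself, a sentence built this way carries two of the three pieces of data, and printing the character alongside supplies the third.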
With respect to Section 3.6:
3.6. Code Components
The RFC Editor encourages the use of the U+ notation (Appendix A of
[UnicodeCurrent]) except within a code component, where you must
follow the rules of the programming language in which you are writing
the code.
Code components are generally expected to use fixed-width fonts.
Where such fonts are not available for a particular script, the best
script-appropriate font will be used for that part of the code
component.
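
As one example of "the rules of the programming language", here is how a code component written in Python would spell a character. Python is only an assumed choice for illustration; other languages have their own escape syntax, and U+ notation itself is not valid inside a Python string literal.

```python
# Python string escapes, used in place of U+ notation inside code.
by_code_point = "\u2206"     # exactly four hex digits
by_name = "\N{INCREMENT}"    # Unicode character name
assert by_code_point == by_name == chr(0x2206)

# Code points above U+FFFF need the eight-digit \U form.
supplementary = "\U0001F600"
assert ord(supplementary) == 0x1F600
```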
Regards,
Sean