[rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02, U+ syntax
dev+ietf at seantek.com (Sean Leonard) Thu, 01 September 2016 23:29 UTC
From: dev+ietf at seantek.com (Sean Leonard)
Date: Thu, 1 Sep 2016 16:29:36 -0700
Subject: [rfc-i] Feedback on Section 3.4 in draft-iab-rfc-nonascii-02,
U+ syntax
In-Reply-To: <f658d8b8-cfea-53e2-546f-366add175766@it.aoyama.ac.jp>
References: <ecd3d504-764e-2b6d-72bd-3343ad22660d@seantek.com>
<C5791071-864F-47A8-916B-95D8BE985178@vpnc.org>
<f658d8b8-cfea-53e2-546f-366add175766@it.aoyama.ac.jp>
Message-ID: <9d02fc10-38c9-2a93-6d96-7ff224650e04@seantek.com>
On 9/1/2016 3:18 AM, Martin J. Dürst wrote:
> P.S.: While I'm at it, in the sentence:
> BCP 137, "ASCII Escaping of Unicode
> Character" describes the pros and cons of different options for
> identifying Unicode characters in an ASCII document BCP137 [BCP137].
> there's just a bit too many "BCP 137" for my (and I hope everybody
> else's) taste. (Unless this is an error produced by the html tools
> version.)
I agree with Martin's assessment.
Suggested rewrite:
How the Unicode character, code point, and name or name
alias are written in the body may
depend on context and the specific character(s) in question.
[BCP137] and Appendix A of
[UnicodeCurrent] provide alternatives and suggestions.
All reasonable variations are acceptable within an RFC.
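
As an illustration of the Appendix A form mentioned above, here is a minimal Python sketch that produces "U+XXXX NAME" strings from the standard library's `unicodedata` module. The helper name `u_plus` and its `show_char` flag are hypothetical, chosen only for this example; the names returned depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def u_plus(ch: str, show_char: bool = False) -> str:
    """Format one code point as 'U+XXXX NAME' (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # At least four uppercase hex digits; longer code points simply use more.
    code = f"U+{ord(ch):04X}"
    # unicodedata.name() accepts a default for code points without a name.
    name = unicodedata.name(ch, "<unnamed>")
    if show_char:
        # Variant that also shows the character itself in curly quotes.
        return f"{code} \u201c{ch}\u201d {name}"
    return f"{code} {name}"


print(u_plus("\u2206"))
print(u_plus("\U0001F631", show_char=True))
```

Whether the character itself appears between the code point and the name is exactly the kind of variation the suggested text leaves to the author.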
Regards,
Sean
>
>
> On 2016/09/01 04:25, Paul Hoffman wrote:
>> On 31 Aug 2016, at 10:02, Sean Leonard wrote:
>>
>>> /(Sent this to the authors, and the suggestion was that this is the
>>> right mailing list for public discussion.)/
>>>
>>> **********
>>> Hello draft-iab-rfc-nonascii-02 people, here is feedback on
>>> draft-iab-rfc-nonascii-02.
>>>
>>> Section 3.4 of draft-iab-rfc-nonascii-02 provides no less than six
>>> preferred alternatives for how to represent a single Unicode character
>>> or code point. They all pretty much say “the ___ character (___)” in
>>> various permutations. None of these are inherently wrong.
>>>
>>> However, The Unicode Standard itself (9.0.0 and prior versions)
>>> provides a specific convention in Appendix A:
>>> “U+[x][x]xxxx NAME OF CHARACTER”
>>>
>>> Notably, the convention does not use “the ___ character” formulation.
>>> Grammatically, the convention is a character, so an article is
>>> omitted. A conforming example would be:
>>>
>>> 1. Temperature changes in the Temperature Control Protocol are
>>> indicated by U+2206 INCREMENT.
>>>
>>> I would like to propose that this be used as at least a priority
>>> alternative.
>>
>> Disagree. That formulation is harder to read in running text, and
>> running text is exactly the formulation we are aiming for. The fact that
>> TUC likes a particular format should not impinge on our choice for
>> readability.
>>
>>>
>>> In The Unicode Standard, two other conventions are noted:
>>>
>>> U+1F631 “😱” FACE SCREAMING IN FEAR
>>>
>>> U+1F631 “😱”
>>>
>>> These conventions show all-caps, and small-caps (which for PDF
>>> presentation purposes, are actually stored as lowercase). They also
>>> show curly quotes. I asked the Unicode mailing list over the weekend
>>> and the general sense is that the uppercase is normative in plain text
>>> (as shown in the UCD) but case distinctions, along with space and
>>> (nearly all) hyphens, are not relevant for unambiguous identification.
>>
>> Neither of these are easier to read in running text than the ones in the
>> draft.
>>
>>>
>>> draft-iab-rfc-nonascii-02 is only concerned with characters, not
>>> semantics or presentation formats (unlike xml2rfc format). Assuming
>>> that plain text is the norm for purposes of draft-iab-rfc-nonascii-02,
>>> I suppose that it is sufficient for the plain text to have an ALL-CAPS
>>> name. I was going to suggest a novel xml2rfc element for Unicode code
>>> points, such as <ucode name="yes">😱</ucode> that would be transformed
>>> into the output above in plain text mode. However, the xml2rfc
>>> transformer can detect such text by looking for the presence of
>>> “U+1F631 FACE SCREAMING IN FEAR”, and apply CSS to it in the html
>>> output instead, viz.:
>>> span.uniname { /* CHAR STYLES */
>>> text-transform: lowercase;
>>> font-variant: small-caps;
>>> font-size: 110%;
>>> }
>>>
>>> As discussed here:
>>> <http://www.unicode.org/mail-arch/unicode-ml/y2016-m08/0055.html>
>>>
>>> Personally I do not see the need for quotations around the character.
>>> U+____ SP character SP NAME ought to be good enough: the single character is going
>>> to be non-ASCII anyway. However there are implications for combining
>>> marks, with or without quotes; this needs to be thought through.
>>> Consider:
>>> U+0308 “◌̈” COMBINING DIAERESIS vs.
>>> U+0308 ◌̈ COMBINING DIAERESIS vs.
>>> U+0308 “̈” COMBINING DIAERESIS vs.
>>> U+0308 ̈ COMBINING DIAERESIS.
>>> See
>>> <http://stackoverflow.com/questions/2224772/whats-the-unicode-glyph-used-to-indicate-combining-characters>
>>>
>>>
>>>
>>> The question is what happens when the character is a specific protocol
>>> element, which frequently (but not always) is quoted, such as "+" and
>>> treated as verbatim text <spanx style="verb"> or the new <tt> in
>>> xml2rfc v3.
>>
>> This is another good reason for the current rules.
>>
>>>
>>> Section 3.6 (and elsewhere) discusses “U+ notation” without a
>>> reference. Appendix A of [UnicodeCurrent] is appropriate.
>>
>> That seems fine.
>> _______________________________________________
>> rfc-interest mailing list
>> rfc-interest at rfc-editor.org
>> https://www.rfc-editor.org/mailman/listinfo/rfc-interest
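
The detection step proposed in the quoted message (finding text such as "U+1F631 FACE SCREAMING IN FEAR" and styling it with the `span.uniname` class) could be sketched roughly as follows. This is not how xml2rfc works; it is a standalone Python illustration under stated assumptions: the function name `mark_unicode_names` is hypothetical, the input is assumed to be HTML-safe plain text, and a name is wrapped only when it matches the name that Python's `unicodedata` reports for that code point.

```python
import re
import unicodedata

# "U+" plus 4 to 6 uppercase hex digits, one space, then a run of the
# characters that can occur in an all-caps Unicode name.
CANDIDATE = re.compile(r"U\+([0-9A-F]{4,6}) ([A-Z0-9 -]+)")


def mark_unicode_names(text):
    """Wrap verified 'U+XXXX NAME' names in <span class="uniname">."""

    def wrap(match):
        cp = int(match.group(1), 16)
        run = match.group(2)
        if cp > 0x10FFFF:
            return match.group(0)  # not a valid code point; leave as-is
        name = unicodedata.name(chr(cp), "")
        # Only accept the match if the text really starts with the
        # character's name; otherwise leave the text untouched.
        if not name or not run.startswith(name):
            return match.group(0)
        rest = run[len(name):]
        if rest and not rest.startswith(" "):
            return match.group(0)  # name runs on into a longer word
        return f'U+{match.group(1)} <span class="uniname">{name}</span>{rest}'

    return CANDIDATE.sub(wrap, text)


print(mark_unicode_names("indicated by U+2206 INCREMENT."))
```

Checking the name against the code point keeps ordinary all-caps words that happen to follow a code point from being styled, at the cost of depending on the Unicode version shipped with the interpreter.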