Re: [xml2rfc-dev] xml2rfc would not be able to render RFC 7997

Heather Flanagan <rse@rfc-editor.org> Tue, 15 October 2019 18:35 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc-dev/Yqmc8o-3sqbb3yY40balwcznHck>


> On Oct 14, 2019, at 11:58 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> 
> So,
> 
> RFC 7997 is "The Use of Non-ASCII Characters in RFCs". In
> <https://www.greenbytes.de/tech/webdav/rfc7997.html#rfc.section.3.2> it
> says:
> 
>> Example Acknowledgements section:
>> 
>> OLD:
>> 
>> The following people contributed significant text to early versions of this draft: Patrik Faltstrom, William Chan, and Fred Baker.
>> 
>> PROPOSED/NEW:
>> 
>> The following people contributed significant text to early versions of this draft: Patrik Fältström (Faltstrom), 陈智昌 (William Chan), and Fred Baker.
> 
> However,
> <https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-09#appendix-A.1>
> states:
> 
>> A.1.  <u>
>> 
>>   In xml2rfc vocabulary version 3, the elements <author>,
>>   <organisation>, <street>, <city>, <region>, <code>, <country>,
>>   <postalLine>, <email>, <seriesInfo>, and <title> may contain non-
>>   ascii characters for the purpose of rendering author names,
>>   addresses, and reference titles correctly.  They also have an
>>   additional "ascii" attribute for the purpose of proper rendering in
>>   ascii-only media.
>> 
>>   In order to insert Unicode characters in any other context, xml2rfc
>>   vocabulary v3 requires that the Unicode string be enclosed within an
>>   <u> element.  The element will be expanded inline based on the value
>>   of a "format" attribute.  This provides a generalised means of
>>   generating the 6 methods of Unicode renderings listed in [RFC7997],
>>   Section 3.4, and also several others found in for instance the RFC
>>   Format Tools example rendering of RFC 7700, at https://rfc-
>>   format.github.io/draft-iab-rfc-css-bis/sample2-v2.html.
>> 
>>   The "format" attribute accepts either a simplified format
>>   specification, or a full format string with placeholders for the
>>   various possible Unicode expansions.
>> 
>> A.1.1.  Expansion of simplified <u> format specifications
>> 
>>   The simplified format consists of dash-separated keywords, where each
>>   keyword represents a possible expansion of the Unicode character or
>>   string; use for example "<u "lit-num-name">foo</u>" to expand the
>>   text to its literal value, code point values, and code point names.
>> 
>>   A combination of up to 3 of the following keywords may be used,
>>   separated by dashes: "num", "lit", "name", "ascii", "char".  The
>>   keywords are expanded as follows and combined, with the second and
>>   third enclosed in parentheses (if present):
>> 
>>      "num"    The numeric value(s) of the element text, in U+1234
>>               notation
>> 
>>      "name"   The Unicode name(s) of the element text
>> 
>>      "lit"    The literal element text, enclosed in quotes
>> 
>>      "char"   The literal element text, without quotes
>> 
>>      "ascii"  The value of the 'ascii' attribute on the <u> element
>> 
>>   In order to ensure that no specification mistakes can result for
>>   rendering methods that cannot render all Unicode code points, "num"
>>   MUST always be part of the specified format.
>> 
>>   The default value of the "format" attribute is "lit-name-num".
> 
> So, unless I'm missing something, the only way to get non-ASCII
> characters into regular prose is using <u>, and using <u> implies
> automatic expansion of characters to numerical representations of the
> codepoints.
> 
> Possible solutions:
> 
> 1) In RFC 7997bis, remove the suggestion to allow non-ASCII names in
> Acknowledgements etc.
> 
> 2) Relax the requirements for <u> so that it doesn't *need* to be used
> in prose.
> 
> 3) Relax the requirement about output formats for <u>.
> 
> My preference would be 2) or 3).

I agree that 1) is not ideal; we won’t go that route.

I prefer 3) over 2) because the point of <u> is to make clear exactly which characters are being used in text that might be semantically important for the spec. If we simply allow non-ASCII characters in “any prose”, I feel like that might open us up to the confusion we’re trying to avoid. Does that make sense?

I haven’t added <u> to the 7991bis doc yet. I’m currently looking at reverting <seriesInfo> as per https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/7, so I’m not far away from <u>.

-Heather

> 
> Best regards, Julian
> 
> PS: tracked for now at
> <https://trac.tools.ietf.org/tools/xml2rfc/trac/ticket/416>
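
To illustrate the simplified <u> format expansion quoted above from Appendix A.1.1, here is a rough Python sketch. It is not taken from xml2rfc itself; the function name expand_u, the ascii_value parameter, and the separators (spaces between code points, commas between names, and a comma between the second and third expansion inside one pair of parentheses) are assumptions, since the quoted text does not spell them out.

```python
import unicodedata

# Keywords listed in the quoted Appendix A.1.1 text.
KEYWORDS = {"num", "lit", "name", "ascii", "char"}


def expand_u(text, fmt="lit-name-num", ascii_value=None):
    """Expand <u> element text per a simplified format (hypothetical helper)."""
    parts = fmt.split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError("format must combine one to three keywords")
    unknown = [p for p in parts if p not in KEYWORDS]
    if unknown:
        raise ValueError(f"unknown keyword(s): {unknown}")
    # The quoted draft says "num" MUST always be part of the format.
    if "num" not in parts:
        raise ValueError('"num" must be part of the format')
    if "ascii" in parts and ascii_value is None:
        raise ValueError('"ascii" keyword requires an ascii value')

    expansions = {
        # Code points in U+1234 notation (space-separated: an assumption).
        "num": " ".join(f"U+{ord(c):04X}" for c in text),
        # Unicode names (comma-separated: an assumption).
        "name": ", ".join(unicodedata.name(c, "<unnamed>") for c in text),
        "lit": f'"{text}"',    # literal text, in quotes
        "char": text,          # literal text, without quotes
        "ascii": ascii_value,  # value of the 'ascii' attribute
    }

    out = expansions[parts[0]]
    if len(parts) > 1:
        # Second and third expansions go in parentheses (grouping assumed).
        out += " (" + ", ".join(expansions[p] for p in parts[1:]) + ")"
    return out


print(expand_u("\u0394"))              # default format "lit-name-num"
print(expand_u("\u00e4", "char-num"))  # bare character plus code point
```

Under these assumptions every accepted format carries "num", so a name such as Fältström placed in running prose via <u> would always be followed by its code point values, which is the conflict with the RFC 7997 Acknowledgements example raised above.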
> 