Re: [xml2rfc-dev] xml2rfc would not be able to render RFC 7997

Henrik Levkowetz <henrik@levkowetz.com> Tue, 15 October 2019 19:46 UTC


Hi Andy,

On 2019-10-15 21:30, Andrew G. Malis wrote:
> Henrik,
> 
> It's not just contributors, but general acknowledgements as well, such as
> the example in Julian's email that kicked off this thread. Why not just
> allow non-ASCII everywhere? That solves Tom's problem as well.

From one point of view, that would certainly be the easiest.  However, it
also makes it essentially impossible to build tooling that checks that
the limitations of RFC 7997 are followed.

With the current <u>, it is hard to violate 7997.  Without it, it is very
easy to do so, and very hard to separate valid and invalid use of non-ASCII.

If RFC 7997 changes and lifts the restrictions on non-ASCII, the need for
<u> becomes much smaller.

Till then, I'd very much prefer to make it possible to do exactly what 7997
permits, rather than generally lift the restriction on use of non-ASCII.
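
As a rough illustration of the kind of check this makes possible, here is a minimal Python sketch.  It is not xml2rfc code.  The function name find_violations and the ALLOWED set are assumptions made for the example; the set simply copies the element list from the implementation notes quoted below and adds <u>.

```python
import xml.etree.ElementTree as ET

# Assumption: elements in which non-ASCII is acceptable, copied from the
# quoted implementation notes (A.1), plus <u> itself.
ALLOWED = {
    "author", "organisation", "street", "city", "region", "code",
    "country", "postalLine", "email", "seriesInfo", "title", "u",
}


def has_non_ascii(s):
    """True if s is a string containing any code point above U+007F."""
    return s is not None and any(ord(c) > 127 for c in s)


def find_violations(elem, inside_allowed=False, path=""):
    """Return paths of elements that carry non-ASCII outside ALLOWED."""
    here = f"{path}/{elem.tag}"
    allowed = inside_allowed or elem.tag in ALLOWED
    flagged = []
    if not allowed:
        # Text owned by this element: its leading text, its attribute
        # values, and the tail text that follows each of its children.
        own = [elem.text, *elem.attrib.values(), *(c.tail for c in elem)]
        if any(has_non_ascii(s) for s in own):
            flagged.append(here)
    for child in elem:
        flagged.extend(find_violations(child, allowed, here))
    return flagged


doc = ET.fromstring(
    '<section><t>Thanks to <u format="char-num">F\u00e4ltstr\u00f6m</u>'
    ' and \u9648\u667a\u660c.</t></section>'
)
print(find_violations(doc))
```

The point of the sketch is that the check is purely structural: a name wrapped in <u> passes, a bare one is flagged, and no judgement about "valid" versus "invalid" prose is needed.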

Best regards,

	Henrik

> 
> Cheers,
> Andy
> 
> 
> On Tue, Oct 15, 2019 at 3:07 PM Henrik Levkowetz <henrik@levkowetz.com>
> wrote:
> 
>>
>> On 2019-10-15 20:35, Heather Flanagan wrote:
>> >
>> >
>> >> On Oct 14, 2019, at 11:58 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
>> >>
>> >> So,
>> >>
>> >> RFC 7997 is "The Use of Non-ASCII Characters in RFCs". In
>> >> <https://www.greenbytes.de/tech/webdav/rfc7997.html#rfc.section.3.2> it
>> >> says:
>> >>
>> >>> Example Acknowledgements section:
>> >>>
>> >>> OLD:
>> >>>
>> >>> The following people contributed significant text to early versions of
>> >>> this draft: Patrik Faltstrom, William Chan, and Fred Baker.
>> >>>
>> >>> PROPOSED/NEW:
>> >>>
>> >>> The following people contributed significant text to early versions of
>> >>> this draft: Patrik Fältström (Faltstrom), 陈智昌 (William Chan), and Fred
>> >>> Baker.
>> >>
>> >> However,
>> >> <https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-09#appendix-A.1>
>> >> states:
>> >>
>> >>> A.1.  <u>
>> >>>
>> >>>   In xml2rfc vocabulary version 3, the elements <author>,
>> >>>   <organisation>, <street>, <city>, <region>, <code>, <country>,
>> >>>   <postalLine>, <email>, <seriesInfo>, and <title> may contain non-
>> >>>   ascii characters for the purpose of rendering author names,
>> >>>   addresses, and reference titles correctly.  They also have an
>> >>>   additional "ascii" attribute for the purpose of proper rendering in
>> >>>   ascii-only media.
>> >>>
>> >>>   In order to insert Unicode characters in any other context, xml2rfc
>> >>>   vocabulary v3 requires that the Unicode string be enclosed within an
>> >>>   <u> element.  The element will be expanded inline based on the value
>> >>>   of a "format" attribute.  This provides a generalised means of
>> >>>   generating the 6 methods of Unicode renderings listed in [RFC7997],
>> >>>   Section 3.4, and also several others found in for instance the RFC
>> >>>   Format Tools example rendering of RFC 7700, at
>> >>>   https://rfc-format.github.io/draft-iab-rfc-css-bis/sample2-v2.html.
>> >>>
>> >>>   The "format" attribute accepts either a simplified format
>> >>>   specification, or a full format string with placeholders for the
>> >>>   various possible Unicode expansions.
>> >>>
>> >>> A.1.1.  Expansion of simplified <u> format specifications
>> >>>
>> >>>   The simplified format consists of dash-separated keywords, where each
>> >>>   keyword represents a possible expansion of the Unicode character or
>> >>>   string; use for example "<u "lit-num-name">foo</u>" to expand the
>> >>>   text to its literal value, code point values, and code point names.
>> >>>
>> >>>   A combination of up to 3 of the following keywords may be used,
>> >>>   separated by dashes: "num", "lit", "name", "ascii", "char".  The
>> >>>   keywords are expanded as follows and combined, with the second and
>> >>>   third enclosed in parentheses (if present):
>> >>>
>> >>>      "num"    The numeric value(s) of the element text, in U+1234
>> >>>               notation
>> >>>
>> >>>      "name"   The Unicode name(s) of the element text
>> >>>
>> >>>      "lit"    The literal element text, enclosed in quotes
>> >>>
>> >>>      "char"   The literal element text, without quotes
>> >>>
>> >>>      "ascii"  The value of the 'ascii' attribute on the <u> element
>> >>>
>> >>>   In order to ensure that no specification mistakes can result for
>> >>>   rendering methods that cannot render all Unicode code points, "num"
>> >>>   MUST always be part of the specified format.
>> >>>
>> >>>   The default value of the "format" attribute is "lit-name-num".
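
The expansion rules quoted above can be sketched in a few lines of Python.  This is only an illustration of the quoted text, not the xml2rfc implementation.  The function name expand_u is hypothetical, and two details are assumptions because the quoted text does not settle them: the second and third expansions share one pair of parentheses, and multiple code points are separated by spaces ("num") or commas ("name").

```python
import unicodedata


def expand_u(text, fmt="lit-name-num", ascii_value=None):
    """Expand <u> element text according to a simplified format string."""
    keywords = fmt.split("-")
    if not 1 <= len(keywords) <= 3:
        raise ValueError("between 1 and 3 keywords are allowed")
    if "num" not in keywords:
        # Quoted rule: "num" MUST always be part of the specified format.
        raise ValueError('"num" must be part of the format')

    parts = []
    for kw in keywords:
        if kw == "num":
            parts.append(" ".join(f"U+{ord(c):04X}" for c in text))
        elif kw == "name":
            parts.append(", ".join(unicodedata.name(c, "UNNAMED") for c in text))
        elif kw == "lit":
            parts.append(f'"{text}"')
        elif kw == "char":
            parts.append(text)
        elif kw == "ascii":
            if ascii_value is None:
                raise ValueError('"ascii" needs the ascii attribute value')
            parts.append(ascii_value)
        else:
            raise ValueError(f"unknown keyword: {kw}")

    # Assumption: second and third expansions go in one parenthesised group.
    first, rest = parts[0], parts[1:]
    return first if not rest else f"{first} ({', '.join(rest)})"


print(expand_u("\u00e9", "char-num"))
```

Under these rules a name such as Fältström in an Acknowledgements section always comes out followed by its code points, which is the behaviour the thread is concerned with.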
>> >>
>> >> So, unless I'm missing something, the only way to get non-ASCII
>> >> characters into regular prose is using <u>, and using <u> implies
>> >> automatic expansion of characters to numerical representations of the
>> >> codepoints.
>> >>
>> >> Possible solutions:
>> >>
>> >> 1) In RFC 7997bis, remove the suggestion to allow non-ASCII names in
>> >> Acknowledgements etc.
>> >>
>> >> 2) Relax the requirements for <u> so that it doesn't *need* to be used
>> >> in prose.
>> >>
>> >> 3) Relax the requirement about output formats for <u>.
>> >>
>> >> My preference would be 2) or 3).
>> >
>> > I agree that 1) is not ideal - won’t go that route.
>> >
>> > I like 3) over 2) because the point of <u> is to help be clear in text
>> > that might be semantically important for the spec about what characters are
>> > being used. If we just say “any prose”, I feel like that might open us up
>> > to the confusion we’re trying to avoid. Does that make sense?
>>
>> The problem here is that if you relax the requirements on <u> too much,
>> it loses its function.  Its current function is exactly to permit
>> insertion of non-ASCII in prose, but only if there is an expansion that
>> guarantees that the resulting specification always is explicit.  If it's
>> possible to use <u> to insert arbitrary non-ascii without expansion,
>> you're effectively back at no limitations on non-ascii at all.
>>
>> I'm very strongly against removing the restriction on <u>.  In that case
>> it's better to permit any unicode in prose in general, and just drop <u>.
>>
>> For the specific purpose of permitting non-ascii names in acknowledgements,
>> I'd like to suggest that we consider approaches that build on the current
>> <author> entry instead.  For author, we already have well-defined handling
>> of ASCII and non-ASCII parts that we can build on. Some possible
>> variations:
>>
>>  * Add a role="contributor" to <author>, and automatically generate a
>>    contributors section.
>>
>>  * Add a role="contributor" to <author>, and make it possible to use <xref>
>>    to pull in contributor names at selected points in prose
>>
>>  * Add a role="contributor" to <author>, and add a new <aref> element that
>>    lets you reference (insert names from) such entries in prose.
>>
>>  * Permit insertion of <author> entries in prose directly.
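
To make the first variation above concrete, here is a hedged Python sketch of generating contributor names from <author> entries.  Everything about role="contributor" is hypothetical, since it is only a proposal in this thread; the function name contributor_names and the "name (ASCII name)" rendering are likewise assumptions.  The fullname and asciiFullname attributes are the existing v3 <author> attributes.

```python
import xml.etree.ElementTree as ET


def contributor_names(front):
    """Collect display names for <author role="contributor"> entries."""
    names = []
    for author in front.iter("author"):
        # Hypothetical role value, as proposed in the list above.
        if author.get("role") != "contributor":
            continue
        full = author.get("fullname", "")
        ascii_full = author.get("asciiFullname")
        if ascii_full and ascii_full != full:
            # Assumed rendering: non-ASCII name, then ASCII form in parentheses.
            names.append(f"{full} ({ascii_full})")
        else:
            names.append(full)
    return names


front = ET.fromstring(
    '<front>'
    '<author fullname="Patrik F\u00e4ltstr\u00f6m"'
    ' asciiFullname="Patrik Faltstrom" role="contributor"/>'
    '<author fullname="Fred Baker" role="contributor"/>'
    '</front>'
)
print(", ".join(contributor_names(front)))
```

Because the ASCII and non-ASCII forms both live on the <author> entry, a renderer could produce either form without any <u> expansion in the prose.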
>>
>>
>> Regards,
>>
>>         Henrik
>>
>>
>>
>> > I haven’t added <u> to the 7991bis doc. I’m currently looking at
>> > reverting <seriesInfo> as per
>> > https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/7, so I’m
>> > not far away from <u>.
>> >
>> > -Heather
>> >
>> >>
>> >> Best regards, Julian
>> >>
>> >> PS: tracked for now at
>> >> <https://trac.tools.ietf.org/tools/xml2rfc/trac/ticket/416>
>> >>
>> >> _______________________________________________
>> >> xml2rfc-dev mailing list
>> >> xml2rfc-dev@ietf.org
>> >> https://www.ietf.org/mailman/listinfo/xml2rfc-dev
>> >
>> >
>>
>>
>