Re: [I18ndir] [art] Just uploaded draft-bray-unichars-03

Asmus Freytag <asmusf@ix.netcom.com> Mon, 11 September 2023 08:03 UTC

To: "Manger, James" <James.H.Manger@team.telstra.com>, Tim Bray <tbray@textuality.com>
Cc: "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/Q732x_NX6d44juDyrpBgaYL02ks>

It seems to me that the concerns about repertoire exist independent of 
whether ill-formed data can pass.

The proposed subset excludes noncharacters, none of which lead to
ill-formed sequences in any encoding form. It also excludes controls;
again, those may be excluded in some other repertoires as well, but they
don't represent an issue for the UTFs.
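
To make that distinction concrete, here is a minimal Python sketch. It is not taken from the draft, and the helper names are made up for illustration. It shows that noncharacters and controls encode as UTF-8 without complaint, while a surrogate code point does not.

```python
def is_surrogate(cp: int) -> bool:
    # U+D800..U+DFFF are reserved for UTF-16 surrogate pairs.
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def encodes_as_utf8(cp: int) -> bool:
    # True if Python's strict UTF-8 encoder accepts this code point.
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample: a noncharacter, a C0 control, and a surrogate.
for cp in (0xFFFE, 0x0001, 0xD800):
    print(f"U+{cp:04X} encodes as UTF-8: {encodes_as_utf8(cp)}")
```

So a repertoire check is a separate question from a well-formedness check: only the surrogate case is caught by the encoding form itself.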

However, if you adopt a repertoire, it's necessary to say what that
means. And that includes saying what will happen with data that is
outside that repertoire (whether or not it is ill-formed on the level
of the encoding form).

And part of that would include whether the specification is written as a 
text pipeline where the sender can stick escape sequences into the 
stream and the eventual receiver is expected to handle those as if they
were permitted in the data stream (while all intervening partners can 
ignore them).

Or you would assert that the repertoire provides a firm restriction on 
the contents of any data and escape sequences may not exceed the 
repertoire. And conformant implementations will need to respond with 
proper warnings, exceptions etc.
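
As a rough illustration of the two approaches, here is a Python sketch using JSON as the carrier, since JSON escape sequences can name code points outside a repertoire. Everything here is an assumption for the sake of the example: the function names are hypothetical, and `in_repertoire` only approximates the proposed subset (no surrogates, no noncharacters, no controls other than tab, LF and CR); it is not the draft's exact definition.

```python
import json


class OutOfRepertoireError(ValueError):
    """Raised by the strict loader when a string leaves the repertoire."""


def in_repertoire(cp: int) -> bool:
    # Assumed approximation of the subset, not the draft's definition.
    if 0xD800 <= cp <= 0xDFFF:                            # surrogates
        return False
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:  # noncharacters
        return False
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):         # C0 controls
        return False
    if 0x7F <= cp <= 0x9F:                                 # DEL and C1 controls
        return False
    return True


def check(value) -> None:
    # Walk a decoded JSON value and reject any out-of-repertoire character.
    if isinstance(value, str):
        for i, ch in enumerate(value):
            if not in_repertoire(ord(ch)):
                raise OutOfRepertoireError(f"U+{ord(ch):04X} at index {i}")
    elif isinstance(value, list):
        for item in value:
            check(item)
    elif isinstance(value, dict):
        for key, item in value.items():
            check(key)
            check(item)


def load_pipeline(doc: str):
    # "Text pipeline" approach: whatever the escapes decode to is passed on.
    return json.loads(doc)


def load_strict(doc: str):
    # "Firm restriction" approach: escapes may not exceed the repertoire.
    value = json.loads(doc)
    check(value)
    return value
```

With the input `'"\\ud800"'`, `load_pipeline` hands a lone surrogate to the caller, while `load_strict` raises `OutOfRepertoireError`. Which of the two a specification intends is exactly what it would need to state.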

All of these concerns should be addressed by a specification that uses 
the various subsets (and the draft should spell out guidance).

Icing on the cake would be if there were some terminology that
differentiated these two approaches.


Seen in this light, the surrogates are only a special instance of a more
general issue (except that they are also ill-formed on the level of the
encoding form). Therefore, it is probably not productive to depend on
which libraries raise exceptions properly. It would suffice to note that
unless the specification (using a subset) itself addresses the need for
strict response to ill-formed or out-of-repertoire input,
implementations depending on it cannot be sure that the input is clean.
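
A small Python sketch of why library behavior alone is not a guarantee. The constant and function names are hypothetical; the codec options are standard Python. The same decoder either rejects or passes a surrogate depending on how it is called, and it never objects to a well-formed noncharacter.

```python
# U+D800 written with the UTF-8 bit pattern: ill-formed UTF-8.
LONE_SURROGATE = b"\xed\xa0\x80"
# U+FFFE, a noncharacter: well-formed UTF-8, but outside the proposed subset.
NONCHARACTER = b"\xef\xbf\xbe"


def decode_strict(data: bytes) -> str:
    # Python's default error handling: ill-formed input raises.
    return data.decode("utf-8")


def decode_permissive(data: bytes) -> str:
    # Same codec, but surrogates are explicitly allowed through.
    return data.decode("utf-8", errors="surrogatepass")


try:
    decode_strict(LONE_SURROGATE)
except UnicodeDecodeError as exc:
    print("strict decoder rejected the surrogate:", exc.reason)

# The permissive call hands the lone surrogate to the application.
print(ascii(decode_permissive(LONE_SURROGATE)))

# Even the strict decoder accepts the noncharacter without any warning.
print(ascii(decode_strict(NONCHARACTER)))
```

An implementation that needs clean input therefore has to check the repertoire itself, or rely on a specification that requires someone upstream to do so.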

A./