Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt
Steffen Nurpmeso <steffen@sdaoden.eu> Tue, 03 October 2023 16:04 UTC
Date: Tue, 03 Oct 2023 18:04:13 +0200
Author: Steffen Nurpmeso <steffen@sdaoden.eu>
From: Steffen Nurpmeso <steffen@sdaoden.eu>
To: "Manger, James" <James.H.Manger=40team.telstra.com@dmarc.ietf.org>
Cc: Tim Bray <tbray@textuality.com>, "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>, Steffen Nurpmeso <steffen@sdaoden.eu>
Message-ID: <20231003160413.zxbWD%steffen@sdaoden.eu>
In-Reply-To: <SY4PR01MB59803C733B6B6A1C9D4E04F4E5C5A@SY4PR01MB5980.ausprd01.prod.outlook.com>
References: <169566019635.41806.9804796677919971070@ietfa.amsl.com> <CAHBU6is-wU2NLXNWL56nSJ4=nKvDzGv_Aw4qJN6N2O8CuM4-yw@mail.gmail.com> <SYBPR01MB59814B3448F5754AAEDA1740E5C7A@SYBPR01MB5981.ausprd01.prod.outlook.com> <CAHBU6iueqtd5T1T-ciYUMWvmo8XqBQqO5LkWbdRaoXQzPYSQOQ@mail.gmail.com> <SY4PR01MB5980D009F1623E3694B871B7E5C5A@SY4PR01MB5980.ausprd01.prod.outlook.com> <CAChr6SzMXqmEJvwQ0Vb0+CfchBn2kMueQJ-2Th1=4Oct8b9t6A@mail.gmail.com> <E1464943-EB11-4FA4-B933-4F138C6C34A0@tzi.org> <CAHBU6itgC07j0P5DcACDyHSjEOG6=j5kWE=eYF8E0NA3mm_b5A@mail.gmail.com> <SY4PR01MB59803C733B6B6A1C9D4E04F4E5C5A@SY4PR01MB5980.ausprd01.prod.outlook.com>
Mail-Followup-To: "Manger, James" <James.H.Manger=40team.telstra.com@dmarc.ietf.org>, Tim Bray <tbray@textuality.com>, "i18ndir@ietf.org" <i18ndir@ietf.org>, ART Area <art@ietf.org>, Steffen Nurpmeso <steffen@sdaoden.eu>
User-Agent: s-nail v14.9.24-524-gd5f7c65f62
OpenPGP: id=EE19E1C1F2F7054F8D3954D8308964B51883A0DD; url=https://ftp.sdaoden.eu/steffen.asc; preference=signencrypt
BlahBlahBlah: Any stupid boy can crush a beetle. But all the professors in the world can make no bugs.
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/hWUcezo-NjAmwdreem0jAwwhp98>
Subject: Re: [I18ndir] [art] New Version Notification for draft-bray-unichars-06.txt

Manger, James wrote in
 <SY4PR01MB59803C733B6B6A1C9D4E04F4E5C5A@SY4PR01MB5980.ausprd01.prod.outlook.com>:
 |[1]draft-bray-unichars[/1] §3 “Dealing with problematic code points”
 |suggests “replacing problematic code points with "�" (U+FFFD,
 |REPLACEMENT CHARACTER)” (or signalling an error, but I’ll only talk
 |about the replacement option in this email).
 |
 |  [1] https://datatracker.ietf.org/doc/html/draft-bray-unichars
 |
 |* An ill-formed sequence of code units needs to be replaced. It is far
 |less obvious to me that “problematic” scalars should be replaced. Even
 |for noncharacters Unicode provides a good [2]FAQ[/2] and
 |[3]corrigendum #9 “Clarification about noncharacters”[/3] that
 |suggests passing them along (treating them like unassigned scalars) is
 |often the best policy (because the internal/interchange boundary is
 |blurry).
 |So §4.3 defining unicode-assignable that excludes noncharacters is
 |fine -- when to be lenient on receiving a supposed unicode-assignable
 |value is less obvious.
 |But §3 looks dodgy.
 |
 |* U+FFFD is an obvious choice to replace code units or scalars you
 |don’t want. But Unicode does allow choices. [4]Unicode ch3[/4] C10 only
 |says “with a marker such as U+FFFD”. [5]Unicode TR36[/5] says “where
 |U+FFFD is not available, a common alternative is "?"”. Java, for
 |instance, uses “?” in some common circumstances. Unichars does not
 |admit such an option.

The situation on the iconv(3) (POSIX, Portable Operating System
Interface) front is a disaster really. Let me quote a code snippet of
the honourable Bruno Haible of GNU iconv, and add some more:

  /* Irix iconv() inserts a NUL byte if it cannot convert.
     NetBSD iconv() inserts a question mark if it cannot convert.

("Citrus" that is, also used on some other BSDs.)
The small (but very neat, and lots of "impressive" code snippets) musl
C library, as used for example by AlpineLinux, uses an asterisk *.

     Only GNU libiconv and GNU libc are known to prefer to fail rather
     than doing a lossy conversion.  */

Irix made an incredibly bad choice. Anyhow, it is all totally opaque
for programmers, since you know nothing about at least one of the
character sets: neither the "resolved name", nor whether it is
multi-byte or "multi-word" or "multi-multi-{word,multi}", nor whether
it has "state", nor whether ASCII NUL can be a vivid part (of "unused"
"word"s), nothing.

Off-topic, but i am still hoping we get a dedicated _addressable_
behaviour switch for iconv(3) that then does _not_ overload the EILSEQ
error (as there is no "invalid input", just missing output
convertibility).

Of course, this is all about boundaries in between different character
sets with non-Unicode involved. Just to add what the most widely used
conversion library, GNU iconv, has been doing for many years:

  UTF-8: Reject surrogates and out-of-range code points.
  * lib/utf8.h (utf8_mbtowc, utf8_wctomb): Reject code points in the
  range 0xD800..0xDFFF and >= 0x110000.

As in, "IETF is one thing, but programmers have to deal with that on
the code side".

No no. UTF-8 as in RFC 3629, and with _exactly_ the "well-formed" check
as defined by

  if(LIKELY(x <= 0x7Fu))
     c = x;
  /* 0xF8, but Unicode guarantees maximum of 0x10FFFFu -> F4 8F BF BF.
   * Unicode 9.0, 3.9, UTF-8, Table 3-7. Well-Formed UTF-8 Byte Sequences */
  else if(LIKELY(x > 0xC0u && x <= 0xF4u)){
     ...
  }else
     goto jerr;

I am a bit outdated.

If some _sender_ pushes through garbage because it has to (and i have
had requests to disable iconv so that garbage is sent over as mail for
the MUA i maintain; mind you, there are administrators, and they want
the mail report to be sent out no matter what garbage there is!, .. and
let me tell you it was requested by a Dr. aka Bachelor++, even a
Master-of-the-Universe, these are the real life guys, heaven!), and if,
furthermore, that over forty years old protocol with its almost thirty
years old MIME extensions actually CAN deliver this s..t!!, then, if the
IETF is about to define that network protocols should use UTF-8, it
should also define that the one and only Unicode character which was
specifically designed for that purpose -- to indicate that something
non-representable had to be represented -- should be used.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
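
For readers who want the "well-formed" check and the U+FFFD substitution discussed above side by side, here is a minimal sketch in C. It is not the code of GNU libiconv nor of the MUA mentioned in the message; the function name utf8_next and its interface are hypothetical and chosen only for illustration. The byte ranges are those of Table 3-7 of the Unicode Standard (no overlong forms, no surrogates 0xD800..0xDFFF, nothing above 0x10FFFF). On ill-formed input the sketch returns U+FFFD and reports how many bytes matched before the failure, which is one possible policy among several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu /* U+FFFD REPLACEMENT CHARACTER */

/* Hypothetical helper: decode one code point from s[0..len), len >= 1.
 * Stores the number of bytes consumed in *used (always >= 1).
 * Returns U+FFFD if the bytes are not well-formed UTF-8. */
uint32_t utf8_next(const unsigned char *s, size_t len, size_t *used)
{
    unsigned char b0, lo = 0x80, hi = 0xBF; /* range of the 2nd byte */
    uint32_t cp;
    size_t need, i;

    assert(len > 0);
    b0 = s[0];
    *used = 1;

    if (b0 <= 0x7F)
        return b0;                          /* ASCII */
    if (b0 >= 0xC2 && b0 <= 0xDF) {         /* 2-byte sequence */
        need = 1;
        cp = b0 & 0x1Fu;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {  /* 3-byte sequence */
        need = 2;
        cp = b0 & 0x0Fu;
        if (b0 == 0xE0)
            lo = 0xA0;                      /* excludes overlong forms */
        else if (b0 == 0xED)
            hi = 0x9F;                      /* excludes 0xD800..0xDFFF */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {  /* 4-byte sequence */
        need = 3;
        cp = b0 & 0x07u;
        if (b0 == 0xF0)
            lo = 0x90;                      /* excludes overlong forms */
        else if (b0 == 0xF4)
            hi = 0x8F;                      /* excludes > 0x10FFFF */
    } else
        return UTF8_REPLACEMENT;            /* 80..C1 and F5..FF */

    for (i = 1; i <= need; ++i) {
        if (i >= len || s[i] < lo || s[i] > hi) {
            *used = i;                      /* bytes that matched so far */
            return UTF8_REPLACEMENT;
        }
        cp = (cp << 6) | (s[i] & 0x3Fu);
        lo = 0x80;                          /* later bytes: plain 80..BF */
        hi = 0xBF;
    }
    *used = need + 1;
    return cp;
}
```

A caller would loop over a buffer, advancing by *used each time, and either pass U+FFFD through or treat it as an error, depending on which of the two options from §3 of the draft it follows.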