Re: [I18ndir] [art] Just uploaded draft-bray-unichars-03

Tim Bray <tbray@textuality.com> Sun, 10 September 2023 21:23 UTC

References: <CAHBU6ivc4W3KyYtbK2H7PQUa8C4+g=73nSTgBK+xLXnzH7V6GA@mail.gmail.com> <9885BB5C-89C5-40AA-96CB-CE5A9811D39D@tzi.org>
In-Reply-To: <9885BB5C-89C5-40AA-96CB-CE5A9811D39D@tzi.org>
From: Tim Bray <tbray@textuality.com>
Date: Sun, 10 Sep 2023 14:23:26 -0700
Message-ID: <CAHBU6isneG874m8vJixwB0y3E_1p_E15vDH4gB4uuegkpA1tCg@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: "Manger, James" <James.H.Manger@team.telstra.com>, Asmus Freytag <asmusf@ix.netcom.com>, i18ndir@ietf.org, ART Area <art@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000014bb55060507d275"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/PwZohFWoRvyCqlghEofEoRs15oU>
Subject: Re: [I18ndir] [art] Just uploaded draft-bray-unichars-03

My understanding of what you’re saying: Since the IETF mandates the use of
UTF-8, and since the definition of UTF-8 forbids the encoding of surrogates
(while being technically capable of encoding them), there are no problems
here, nobody ever need worry about surrogates, even when a packaging format
explicitly allows all Unicode code points.

With respect, I disagree, if only because of JSON’s unfortunate escaping
mechanism, and because a service I helped build at AWS recently suffered
a production takedown caused by a failure to deal with an incoming bogus
surrogate.  I think the statement in the current draft, that developers
must be prepared to handle various flavors of Unicode garbage, is factually
correct, and that explicitly restricting character repertoires will be
beneficial for specification authors and specification readers.
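
A minimal Python sketch of the JSON point, assuming CPython's standard
json module. The helper names has_surrogate and utf8_encodable are
hypothetical, and this is an illustration only, not the code from the AWS
service mentioned above. It shows a \uXXXX escape delivering a lone
surrogate to the application even though the JSON text itself is valid
UTF-8, after which a strict UTF-8 encode of the parsed string fails.

```python
import json

# The JSON text below is plain ASCII, so it is valid UTF-8 on the wire.
# The \ud800 escape nevertheless names a surrogate code point, and the
# standard parser hands it to the application as-is.
text = json.loads('"\\ud800"')


def has_surrogate(s: str) -> bool:
    """Return True if s contains a code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def utf8_encodable(s: str) -> bool:
    """Return True if s can be encoded as strict UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python's UTF-8 codec refuses to encode surrogates.
        return False


if __name__ == "__main__":
    print(has_surrogate(text))
    print(utf8_encodable(text))
```

A receiver that checks has_surrogate (or an equivalent test) right after
parsing can reject such input at the boundary instead of failing later
when the string is re-encoded or stored.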

This is not trying to be a general how-to-use-Unicode BCP. If you want to
build one of those, spin up a WG. This draft identifies and discusses
several character repertoires, for use in references from other documents.


On Sep 10, 2023 at 12:30:32 PM, Carsten Bormann <cabo@tzi.org> wrote:

>
>
> Sent from mobile, sorry for terse
>
> On 10. Sep 2023, at 19:52, Tim Bray <tbray@textuality.com> wrote:
>
> However, you’ve established that the reading of 8949 is at least ambiguous
> on this point.
>
>
> That completely misleading impression is a symptom of the fact that this
> document is unsalvageable.
>
> STD 63 (RFC 3629) is utterly clear on what utf-8 is.
> Therefore rfc8949 did not have to do anything except reference STD 63 to
> be completely clear about what cbor text strings are as well.
>
> If you wonder about my strong reaction (and if you don’t), please do read
> rfc9413.