Re: [I18ndir] [art] Just uploaded draft-bray-unichars-03

Rob Sayre <sayrer@gmail.com> Sat, 09 September 2023 23:56 UTC

From: Rob Sayre <sayrer@gmail.com>
Date: Sat, 09 Sep 2023 16:56:26 -0700
Message-ID: <CAChr6SySae8nK7ynBhwFVwf0aGzjpPQKJV2EZKjgN4d6-5r3gQ@mail.gmail.com>
To: Tim Bray <tbray@textuality.com>
Cc: i18ndir@ietf.org, ART Area <art@ietf.org>, Steffen Nurpmeso <steffen@sdaoden.eu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/53-i0P6TadDl6CkI5KY5KAWOhOg>

These are all up to you and Paul, as far as I'm concerned.

I couldn't think of anything better than "repertoire", so here we are.

On the subject of "dealing", I really do think it should be rephrased so
that senders are advised that their messages might not work. (This is true:
they might not work.)

thanks,
Rob

On Sat, Sep 9, 2023 at 4:50 PM Tim Bray <tbray@textuality.com> wrote:

> On Sep 9, 2023 at 1:56:47 PM, Rob Sayre <sayrer@gmail.com> wrote:
>
>> 5. Refining Character Repertoires
>>
>> The IETF typically uses well-known data formats such as JSON, I-JSON,
>> CBOR, YAML, and XML. These formats have default character repertoires. For
>> example, JSON allows member names and string values to include any Unicode
>> code points, including all the problematic types; the following is a legal
>> JSON document:
>>
>
> I’m OK with Rob’s redraft, if only to lose the fuzzy word “typically”.
> Anyone object?
>
>> {"example": "\u0000\u0089\uDEAD\uD9BF\uDFFF"}
>>
>> The value of the "example" field contains the C0 Control NUL, the C1
>> Control "CHARACTER TABULATION WITH JUSTIFICATION", an unpaired surrogate,
>> and the noncharacter U+7FFFF. It is unlikely to be useful as the value of a
>> text field. It cannot be serialized into legal UTF-8, but many libraries
>> will silently parse this and generate an ill-formed UTF-8 string.
>> Implementors must be prepared to deal with these sorts of problematic code
>> points.
>>
>> [ The first part, "unlikely to be useful as the value of a text field",
>> is good. But, the next part mixes "legal" and "ill-formed", and I don't
>> think that is a good idea. There is still a lowercase requirement after
>> that, and I think I disagree. Implementors do not have to be "prepared to
>> deal with these sorts of problematic code points". Maybe: "Some messages
>> will contain these problematic code points". That is true, but you don't
>> have to deal with them. ]
>>
>
> Yeah, I’m pretty sure you do, but by “deal with” I include rejecting such
> input docs, e.g. by raising a well-known exception with a helpful error
> message.  I guess that should be spelled out explicitly?
>
>> It is unlikely that anyone specifying a new data format would choose to
>> allow this character repertoire.
>>
>> [ Instead: The JSON character repertoire is too permissive, so it's best
>> for new specifications to require that the contents of member names and
>> string values contain only Useful Assignables (see Section 4.2). ]
>>
>
> I prefer the current language. Anyone prefer Rob’s or to propose another
> option?
>
>> Then, I got to the end, and noticed that "character repertoire" might not
>> be the best choice. "Character encoding" or "character set"? "Vocabulary"?
>> No shade for the authors here, writing about language itself is really
>> difficult.
>>
>
> I personally want to stay with “repertoire” if only by extension from its
> well-understood meaning when applied to encoding systems.
>
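
To make the disagreement over "deal with" concrete, here is a minimal Python
sketch of both behaviours discussed in the thread: a parser that silently
accepts the example document, and a wrapper that rejects it with an error.
The document is written with standard JSON \uXXXX escapes, so U+7FFFF appears
as the surrogate pair \uD9BF\uDFFF. The helper names
(problematic_code_points, load_strict) are hypothetical, and the categories
checked are a simplification assumed for illustration, not the draft's
normative definitions.

```python
import json

# Example document from the quoted draft text, using standard JSON escapes.
DOC = r'{"example": "\u0000\u0089\uDEAD\uD9BF\uDFFF"}'

# Assumption for this sketch: tab, LF and CR are the only controls tolerated.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def problematic_code_points(text):
    """Return (index, code point, category) for each problematic character."""
    found = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if (cp <= 0x1F or 0x7F <= cp <= 0x9F) and cp not in ALLOWED_CONTROLS:
            found.append((i, cp, "control"))
        elif 0xD800 <= cp <= 0xDFFF:
            found.append((i, cp, "surrogate"))
        elif 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            found.append((i, cp, "noncharacter"))
    return found


def load_strict(doc):
    """Parse a flat JSON object and reject problematic string values."""
    data = json.loads(doc)
    for key, value in data.items():  # hypothetical: top-level strings only
        if isinstance(value, str):
            bad = problematic_code_points(value)
            if bad:
                details = ", ".join(f"U+{cp:04X} ({cat})" for _, cp, cat in bad)
                raise ValueError(f"member {key!r} contains {details}")
    return data


if __name__ == "__main__":
    value = json.loads(DOC)["example"]  # the stock parser accepts the document
    print(problematic_code_points(value))
    try:
        value.encode("utf-8")  # the unpaired surrogate cannot be encoded
    except UnicodeEncodeError as exc:
        print("not serializable as UTF-8:", exc.reason)
    try:
        load_strict(DOC)
    except ValueError as exc:
        print("rejected:", exc)
```

Whether a receiver rejects, repairs, or simply passes such input along is the
open question in the thread; the sketch only shows that rejection with a
descriptive error is cheap to implement once the repertoire is written down.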