Re: [xml2rfc] [irsg] character sets, was UPDATE regarding <u>

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Fri, 10 March 2023 09:56 UTC

Message-ID: <749022e1-922f-3934-37ed-ee5e19f4e302@it.aoyama.ac.jp>
Date: Fri, 10 Mar 2023 18:56:19 +0900
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0
Content-Language: en-US
To: Carsten Bormann <cabo@tzi.org>, "John R. Levine" <johnl@taugh.com>
Cc: xml2rfc@ietf.org
References: <20230304041905.DA71BA438468@ary.qy> <5E205D41-48AD-4CEB-A867-E893D7B33CAC@tzi.org> <4faeb176-7d39-7394-d0a4-2cad71409f3d@taugh.com> <9EA7DD10-809E-47B3-8C7E-D385EAC14754@tzi.org>
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
In-Reply-To: <9EA7DD10-809E-47B3-8C7E-D385EAC14754@tzi.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/Wvw2Zc6MeIu8BsPgSLhZldqSF9M>

On 2023-03-05 01:38, Carsten Bormann wrote:
> On 2023-03-04, at 16:46, John R Levine <johnl@taugh.com> wrote:
>>
>> In any event, this reminds us that we need some discipline in what we allow beyond letters and punctuation.  Unicode does not make this any easier by providing so many different glyphs that look nearly or exactly the same.
> 
> Correct, except that the “allow” is a bit misplaced.  “Recommend”, “nudge authors towards”,  “consider good style” etc. would have worked better for me.

Yes indeed.

> Anyway, that’s why there is now authoring support in kramdown-rfc for character repertoire diagnostics, initially with the tool “echars” (which doesn’t require actually using markdown).
> 
> For those actually using markdown, eventually, I expect the yaml header to the markdown input to be able to carry a declaration of what non 10,32-126,160,8203,8209,8288 characters are actually desired in the input, so warnings can be emitted if the document isn’t staying inside those bounds.

Please make the warnings say this in a positive way. We don't want 
people to fall back on Latin or ASCII just because they got scared by a 
warning.
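
As a rough illustration of a repertoire check with a positively worded
message, here is a minimal Python sketch. It is not how echars or
kramdown-rfc work; the function names and the `declared` parameter are
hypothetical, and only the list of default code points is taken from the
quoted text above.

```python
# Default repertoire from the quoted message: code points 10, 32-126, 160,
# 8203, 8209 and 8288. Anything else is expected to be declared explicitly.
DEFAULT_REPERTOIRE = {10, 160, 8203, 8209, 8288} | set(range(32, 127))


def undeclared_characters(text, declared=""):
    """Return the sorted characters in `text` that are neither in the
    default repertoire nor in the (hypothetical) `declared` string."""
    allowed = DEFAULT_REPERTOIRE | {ord(ch) for ch in declared}
    return sorted({ch for ch in text if ord(ch) not in allowed})


def positive_warning(text, declared=""):
    """Build a message that invites a declaration rather than discouraging
    the characters. Returns None when nothing needs declaring."""
    extra = undeclared_characters(text, declared)
    if not extra:
        return None
    listing = ", ".join(f"U+{ord(ch):04X} ({ch})" for ch in extra)
    return ("This document uses characters that are not declared yet: "
            f"{listing}. If they are intended, add them to the declaration.")


if __name__ == "__main__":
    print(positive_warning("Martin J. Dürst"))
```

The wording of the message is only an example; the point is that it
tells the author how to keep the characters, not how to avoid them.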

Declaring the characters involved should work for most scripts, but for 
Arabic and most Indic scripts at least, that's not enough, because you 
need the right contextual forms in the font. Also, for 
Chinese/Japanese(/Korean), you should know which it is (including the 
distinction between traditional and simplified Chinese) in order to 
select the right font. (Text set in the wrong font will be legible, but 
will not look good enough for an RFC.)
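
To make the two cases above concrete, here is a small Python sketch. It
is a heuristic under stated assumptions: the Python standard library has
no Unicode script property, so the sketch guesses a script label from
the first word of the character name, and the set of "shaping" scripts
is an illustrative subset, not a complete list. All function names are
hypothetical.

```python
import unicodedata

# Illustrative subset of scripts whose rendering depends on contextual forms.
SHAPING_SCRIPTS = {"ARABIC", "DEVANAGARI", "BENGALI"}


def rough_script(ch):
    """Guess a script label from the first word of the Unicode character
    name, e.g. 'ARABIC', 'CJK', 'LATIN'. This is only an approximation."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def scripts_used(text):
    """Return the rough script labels of the non-ASCII letters in `text`."""
    return {rough_script(ch) for ch in text if ord(ch) > 127 and ch.isalpha()}


def rendering_hints(text):
    """List what a character declaration alone does not tell the renderer."""
    labels = scripts_used(text)
    hints = []
    if labels & SHAPING_SCRIPTS:
        hints.append("needs a font with contextual forms")
    if "CJK" in labels:
        # The code points alone do not say whether the text is Chinese
        # (traditional or simplified), Japanese, or Korean.
        hints.append("needs a language tag to choose the font")
    return hints
```

For Han characters the label is the same whatever the language, which is
exactly why the language has to be given separately.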

Regards,   Martin.



> Both of these would be helped by access to information about the current repertoire limitations of xml2rfc, which is why I initiated this subthread.