[Cbor] Re: dCBOR: Normalization of Strings

Carsten Bormann <cabo@tzi.org> Mon, 29 July 2024 23:14 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.600.62\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <D9570449-9EB8-49D9-95A3-62343790882E@cursive.net>
Date: Tue, 30 Jul 2024 01:14:02 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <691C2539-84F9-4D38-BCF9-DF4C7EDC0168@tzi.org>
References: <E8325093-F005-4A56-8AFB-9C1637E19EA4@wolfmcnally.com> <F04DCC66-CECA-4FFD-9AF1-C58F983A8EF1@cursive.net> <D6A3A142-0999-4D0B-9CBC-A698BC384DD4@wolfmcnally.com> <8A2595D5-B9EC-4C5E-B83A-D10DE2D4B4DD@wolfmcnally.com> <D9570449-9EB8-49D9-95A3-62343790882E@cursive.net>
To: Joe Hildebrand <hildjj@cursive.net>
X-Mailer: Apple Mail (2.3774.600.62)
CC: Wolf McNally <wolf@wolfmcnally.com>, CBOR <cbor@ietf.org>, Christopher Allen <christophera@lifewithalacrity.com>, Shannon Appelcline <shannon.appelcline@gmail.com>
Subject: [Cbor] Re: dCBOR: Normalization of Strings
List-Id: "Concise Binary Object Representation (CBOR)" <cbor.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/LuZgrVJtiFScfGk_ilrHUyJEfKI>

On 30. Jul 2024, at 00:19, Joe Hildebrand <hildjj@cursive.net> wrote:
> 
>> Latin-1

Latin-1 became bed-ridden in 1990 (iron curtain gone, so we now really needed all these Czech, Polish, Hungarian, … characters) and its corpse was completely shredded in 1999 (introduction of the Euro, the symbol for which is not in Latin-1).
I wouldn’t build any strategy on Latin-1 or the stopgap Windows-1252 (Windows 95, with € finally added in Windows XP).

Latin-1 of course is an encoding scheme (which could be called UCS-1) as well as a character repertoire (which is identical to the Unicode range U+0000 .. U+00FF). Instead of the latter, a larger set of Latin characters might still be useful for memoizing much of NFC normalization and validation.
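
To make the two roles concrete, here is a minimal Python sketch, offered as an illustration rather than as anything dCBOR specifies. It shows Latin-1 as an encoding scheme (each byte value is the code point) and then a fast path for NFC validation over a larger Latin-ish range. The helper names `latin1_decode` and `is_nfc` and the constant `FAST_PATH_LIMIT` are hypothetical. The cutoff of U+0300 is an assumption chosen for the sketch: it is where the combining diacritical marks begin, and code points below it are left unchanged by NFC.

```python
import unicodedata

# Assumed cutoff for the fast path (hypothetical constant): U+0300 is the
# first combining diacritical mark; code points below it are NFC-stable.
FAST_PATH_LIMIT = 0x0300


def latin1_decode(data: bytes) -> str:
    """Latin-1 as an encoding scheme: every byte value is its code point."""
    return "".join(chr(b) for b in data)


def is_nfc(s: str) -> bool:
    """Validate NFC, skipping the general check for 'simple' Latin strings."""
    if all(ord(ch) < FAST_PATH_LIMIT for ch in s):
        return True  # fast path: nothing here can compose or decompose
    # General case: defer to the standard library (Python 3.8+).
    return unicodedata.is_normalized("NFC", s)
```

For example, `latin1_decode(b"Gr\xfc\xdfe")` yields `"Grüße"`, which `is_nfc` accepts on the fast path, while `"e\u0301"` (e followed by a combining acute accent) falls through to the general check and is rejected.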

Grüße, Carsten