[Cbor] Re: dCBOR: Normalization of Strings
Wolf McNally <wolf@wolfmcnally.com> Mon, 29 July 2024 23:50 UTC
From: Wolf McNally <wolf@wolfmcnally.com>
In-Reply-To: <691C2539-84F9-4D38-BCF9-DF4C7EDC0168@tzi.org>
Date: Mon, 29 Jul 2024 16:49:55 -0700
Message-Id: <5DC70280-3D09-49D0-B2EF-0F47F5DB37F6@wolfmcnally.com>
References: <E8325093-F005-4A56-8AFB-9C1637E19EA4@wolfmcnally.com> <F04DCC66-CECA-4FFD-9AF1-C58F983A8EF1@cursive.net> <D6A3A142-0999-4D0B-9CBC-A698BC384DD4@wolfmcnally.com> <8A2595D5-B9EC-4C5E-B83A-D10DE2D4B4DD@wolfmcnally.com> <D9570449-9EB8-49D9-95A3-62343790882E@cursive.net> <691C2539-84F9-4D38-BCF9-DF4C7EDC0168@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
CC: Joe Hildebrand <hildjj@cursive.net>, CBOR <cbor@ietf.org>, Christopher Allen <christophera@lifewithalacrity.com>, Shannon Appelcline <shannon.appelcline@gmail.com>
Subject: [Cbor] Re: dCBOR: Normalization of Strings
List-Id: "Concise Binary Object Representation (CBOR)" <cbor.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/rx7N4827v7wNopSt8byeLMPdjGk>

Carsten,

In a constrained environment where you must sacrifice full dCBOR compliance, you could of course use some other heuristic to support a limited set of NFC-compliant strings and yet support a wider range of characters than Latin-1. You can advance this strategy until the number of false negatives falls below some arbitrary threshold.

Or you might want to reconsider your use cases for dCBOR in such extremely constrained environments.

~ Wolf

> On Jul 29, 2024, at 4:14 PM, Carsten Bormann <cabo@tzi.org> wrote:
>
> On 30. Jul 2024, at 00:19, Joe Hildebrand <hildjj@cursive.net> wrote:
>>
>>> Latin-1
>
> Latin-1 became bed-ridden in 1990 (iron curtain gone, so we now really needed all these Czech, Polish, Hungarian, … characters) and its corpse was completely shredded in 1999 (introduction of the Euro, the symbol for which is not in Latin-1).
> I wouldn’t build any strategy on Latin-1 or the stopgap Windows-1252 (Windows 95, with € finally added in Windows XP).
>
> Latin-1 of course is an encoding scheme (which could be called UCS-1) as well as a character repertoire (which is identical to the Unicode range of U+0000 .. U+00FF). Instead of the latter, a larger set of Latin characters might still be useful for memoizing much of NFC normalization and validation.
>
> Grüße, Carsten
>
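
As a rough illustration of the kind of heuristic discussed above, here is a minimal sketch of a table-free check that accepts somewhat more than Latin-1. The function name `is_surely_nfc` and the constant `NFC_SAFE_LIMIT` are made up for this example and are not part of any dCBOR implementation. It relies on the assumption that every code point below U+0300 has NFC_Quick_Check=Yes and canonical combining class 0, so a string made only of such code points is already in NFC.

```python
# Hypothetical helper for illustration; not taken from any dCBOR library.
# Assumption: all code points below U+0300 have NFC_Quick_Check=Yes and
# canonical combining class 0, so strings limited to them are already NFC.
NFC_SAFE_LIMIT = 0x0300  # U+0300 is the first combining diacritical mark


def is_surely_nfc(text: str) -> bool:
    """Return True only when `text` is certainly in NFC, using no tables.

    False means "unknown", not "not NFC". A constrained decoder that
    rejects on False produces false negatives for valid NFC strings
    containing code points at or above NFC_SAFE_LIMIT (for example the
    euro sign, U+20AC).
    """
    return all(ord(ch) < NFC_SAFE_LIMIT for ch in text)
```

With this cutoff, Czech, Polish, and Hungarian text in precomposed form passes, while any string containing € or a combining mark is reported as unknown. Raising the limit would require per-range knowledge of the normalization properties, which is where the threshold on false negatives comes in.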