Re: [I18nrp] next steps
Nico Williams <nico@cryptonector.com> Tue, 24 July 2018 07:12 UTC
Date: Tue, 24 Jul 2018 02:12:09 -0500
From: Nico Williams <nico@cryptonector.com>
To: Asmus Freytag <asmusf@ix.netcom.com>
Cc: i18nrp@ietf.org
Message-ID: <20180724071208.GB5700@localhost>
References: <E10F785F-39A8-4A03-B5F0-0672B806B440@vpnc.org>
<de326e16-8f93-7afd-0090-06ee7e672471@stpeter.im>
<5b569782.1c69fb81.4603.51f9@mx.google.com>
<9c81fb76-e4cd-a9d7-eded-960256f224ec@ix.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9c81fb76-e4cd-a9d7-eded-960256f224ec@ix.netcom.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/J9gP2SrfIJgaWqxp9xhODVB4LPk>
Subject: Re: [I18nrp] next steps
On Mon, Jul 23, 2018 at 08:38:15PM -0700, Asmus Freytag wrote:
> If you define equivalence relations (e.g. variants) based on code points,
> then you'll need to apply these / compare strings in NFD otherwise you will
> miss out on the effect of re-ordering of combining sequences.

Or NFC. It doesn't matter which form you choose as the internal form for
comparison, provided that the comparison is form-insensitive.

Now, since normalization to NFC is defined in terms of an initial
normalization to NFD, it's obviously best to use NFD internally anyways.

Note, BTW, that form-insensitive comparison is not necessarily enough to
implement form-insensitivity. If you have hash tables, say, and you're
hashing strings for hash table lookups, then you have to normalize during
hashing as well in order to have form-insensitive behavior.

The nice thing about this form-insensitive comparison/hashing is that it
can be done with fixed and small memory allocation, as you can step strings
character by character, and there is a maximum length in codepoints to a
character normalized to NFD, thus the amount of memory needed is fixed, and
that memory can be allocated on the stack (where you have one).

Indeed, you can even optimize form-insensitive string comparison so that
for the mostly-ASCII and mostly-not-equivalent or mostly-equal input
cases... most characters do not require canonical decomposition.

Nico
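The character-by-character comparison and hashing described in the message above could be sketched as follows. This is an illustration under assumptions, not code from the thread: the function names (`nfd_stream`, `form_insensitive_equal`, `form_insensitive_hash`) are hypothetical, FNV-1a is an arbitrary stand-in hash, and Python's `unicodedata` module is used for decomposition. The sketch normalizes one combining sequence at a time instead of the whole string; it does not enforce any bound on the length of a combining sequence, and it leaves out the ASCII fast path.

```python
import unicodedata
from itertools import zip_longest


def _begins_with_starter(ch):
    # True if the NFD form of ch begins with a code point of combining class 0.
    # Canonical reordering never moves a mark across such a code point.
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[0]) == 0


def nfd_stream(s):
    """Yield the NFD code points of s, normalizing one segment at a time."""
    start, n = 0, len(s)
    while start < n:
        end = start + 1
        # Extend the segment over everything that could reorder with it.
        while end < n and not _begins_with_starter(s[end]):
            end += 1
        yield from unicodedata.normalize("NFD", s[start:end])
        start = end


def form_insensitive_equal(a, b):
    """Compare two strings without materializing either normalized string."""
    missing = object()  # marks the end of the shorter stream
    pairs = zip_longest(nfd_stream(a), nfd_stream(b), fillvalue=missing)
    return all(x == y for x, y in pairs)


def form_insensitive_hash(s):
    """64-bit FNV-1a over the UTF-8 bytes of the NFD stream (assumed hash)."""
    h = 0xCBF29CE484222325
    for ch in nfd_stream(s):
        for byte in ch.encode("utf-8"):
            h ^= byte
            h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h
```

With this sketch, the precomposed "\u00e9" and the decomposed "e\u0301" compare equal and hash to the same value, as do two strings that differ only in the order of combining marks with different combining classes, such as "a\u0323\u0301" and "a\u0301\u0323".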