Re: [I18nrp] next steps

Nico Williams <nico@cryptonector.com> Tue, 24 July 2018 07:02 UTC

Date: Tue, 24 Jul 2018 02:02:13 -0500
From: Nico Williams <nico@cryptonector.com>
To: Larry Masinter <LMM@acm.org>
Cc: Peter Saint-Andre <stpeter@stpeter.im>, Paul Hoffman <paul.hoffman@vpnc.org>, "i18nrp@ietf.org" <i18nrp@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/r43a6dnu_96k2_UvtB1DPgaxcxY>

On Mon, Jul 23, 2018 at 08:05:41PM -0700, Larry Masinter wrote:
> Is it time to talk about specific drafts?
> 
> This seems more general:
> 
> There is a problem in RFC 6365 “Terminology used…” that I think is at
> the core: It seems to imply that the way to compute equivalence of
> strings REQUIRES normalization to allow determining a~b iff
> normalize(a)==normalize(b).  But in general, the equivalence relations
> one might want, for usability reasons, don’t have an acceptable normal
> form.
> 
> Continuing to put in normalization as an essential step is doomed to
> failure. Some protocols may need to be redefined to allow
> non-normalized strings. 

Well, somewhere the equivalence check has to be done, else you have
problems.  But you're absolutely right that always normalizing isn't
always the right answer.

For protocols like remote filesystem protocols, the best thing to do is
to implement form-insensitive / form-preserving semantics in the
filesystem (but not the server component).
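
To make "form-insensitive / form-preserving" concrete, here is a minimal
sketch of a directory that compares names under normalization but stores
and returns them exactly as created.  The class name FormPreservingDir
and its methods are hypothetical, and using NFC as the comparison key is
an assumption made for illustration; it is not how any particular
filesystem does it.

```python
import unicodedata


class FormPreservingDir:
    """Toy directory: lookups ignore normalization form, storage keeps it."""

    def __init__(self):
        # Maps comparison key -> (name exactly as created, value).
        self._entries = {}

    @staticmethod
    def _key(name):
        # Assumption: NFC is used only as an internal comparison key.
        return unicodedata.normalize("NFC", name)

    def create(self, name, value):
        key = self._key(name)
        if key in self._entries:
            # An equivalent name in any form already exists.
            raise FileExistsError(name)
        self._entries[key] = (name, value)

    def lookup(self, name):
        # Form-insensitive: any equivalent form finds the entry.
        return self._entries[self._key(name)][1]

    def listdir(self):
        # Form-preserving: names come back as the creator supplied them.
        return [stored for stored, _ in self._entries.values()]


d = FormPreservingDir()
d.create("re\u0301sume\u0301", 1)   # created with decomposed accents
found = d.lookup("r\u00e9sum\u00e9")  # looked up with precomposed accents
```

Neither the client nor the server component has to normalize anything
here; only the layer that owns the name table does the comparison.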

For security protocols where relying parties need never re-encode to
verify signatures, form-insensitivity can work.  In many cases it's
easy and convenient enough to make issuers/authorities normalize anyway
-- so might as well do that.

For something like DNS, normalization really is necessary, and has to be
done on the client-side.  (Though with on-the-fly signing one could also
just put ToUnicode() and ToASCII() functionality on the server, but I
don't think this is needed, nor desirable, nor desired.)

I speculate that in most cases in Internet protocols where Unicode
strings are carried, the best behavior is form-insensitive /
form-preserving, with a few exceptions where either an authority
normalizes, or the clients do.

In particular, it was a mistake to require normalization for NFSv4.

Quite often NFSv4 clients (Unix and Unix-like clients) have no idea
whether the application is even providing Unicode strings (because the
locale information is not visible to the NFS client).

NFSv4 servers are almost never very tightly integrated with the
filesystem -- typically there is a pluggable filesystem layer.  And with
other access methods to contend with (local POSIX API access, other
protocols), the servers are just not in a position to authoritatively
perform normalization -- this alone argues for putting all normalization
burdens on the filesystem, not the server component, not the clients.
And the realities of actual use and implementation prevent us from
effectively dictating a normal form: HFS+ normalizes to ~ NFD; input
methods generally produce pre-composed sequences.
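
The mismatch is easy to see in a few lines.  This is only an
illustration: the helper same_name is hypothetical, and plain NFD stands
in for what HFS+ actually does, which is only approximately NFD.

```python
import unicodedata

# Precomposed, as an input method would typically produce it.
typed = "caf\u00e9"
# Decomposed, standing in for the form an NFD-ish filesystem keeps.
stored = unicodedata.normalize("NFD", typed)


def same_name(a, b):
    # Form-insensitive comparison; NFC as the common form is an assumption.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


exact_match = typed == stored            # naive comparison of the two forms
insensitive_match = same_name(typed, stored)
lengths = (len(typed), len(stored))      # code point counts of each form
```

A byte-for-byte or code-point comparison treats these as two different
names, while the form-insensitive comparison treats them as one.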

Only f-i/f-p can work for NFSv4.

I have been pushing form-insensitive / form-preserving behavior for well
over a decade now.  It's been lonely.  Are you joining me? :)

Nico
--