Re: [I18nrp] next steps

Larry Masinter <LMM@acm.org> Thu, 26 July 2018 01:54 UTC

Sender: Larry Masinter <masinter@gmail.com>
From: Larry Masinter <LMM@acm.org>
X-Google-Original-From: "Larry Masinter" <lmm@acm.org>
To: "'Nico Williams'" <nico@cryptonector.com>
Cc: "'Peter Saint-Andre'" <stpeter@stpeter.im>, "'Paul Hoffman'" <paul.hoffman@vpnc.org>, <i18nrp@ietf.org>
References: <E10F785F-39A8-4A03-B5F0-0672B806B440@vpnc.org> <de326e16-8f93-7afd-0090-06ee7e672471@stpeter.im> <5b569782.1c69fb81.4603.51f9@mx.google.com> <20180724070212.GA5700@localhost>
In-Reply-To: <20180724070212.GA5700@localhost>
Date: Wed, 25 Jul 2018 18:54:27 -0700
Message-ID: <002801d42483$96a9e7a0$c3fdb6e0$@acm.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQKNPw6ZpzopClf/AqEAlmT/kPf+4wLhlkPyAfKSA74A0tGeZKMBOqSg
Content-Language: en-us
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/OijYbMo_b8AfHFhSVYAp8ZNkTcg>
Subject: Re: [I18nrp] next steps

Here's a wild idea; flame away.

> > There is a problem in RFC 6365 “Terminology used…” that I think is at
> > the core: It seems to imply that the way to compute equivalence of
> > strings REQUIRES normalization to allow determining a~b iff
> > normalize(a)==normalize(b).  But in general, the equivalence relations
> > one might want, for usability reasons, don’t have an acceptable normal
> > form.
> >
> > Continuing to put in normalization as an essential step is doomed to
> > failure. Some protocols may need to be redefined to allow
> > non-normalized strings.
> 
> Well, somewhere the equivalence check has to be done, else you have
> problems.  But you're absolutely right that always normalizing isn't always the
> right answer.

It isn't clear to me how Unicode's definition of "normalization" relates
to how the term is used in mathematics. If you start with an equivalence
relation
https://en.wikipedia.org/wiki/Equivalence_relation
you can define a "canonicalization" if the set has a total ordering,
by picking the "least" or "greatest" member of each equivalence class.
But the form you get by doing so might not be desirable, or suitable to
be called "normal", even if it's useful for determining equivalence.
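
To make that concrete, here is a minimal sketch in Python. It is not
anything taken from Unicode or RFC 6365: the classes in CLASSES and the
function names are made up for illustration, and the total ordering is
simply Python's default code-point ordering on strings.

```python
# Hypothetical equivalence classes, listed explicitly so that the relation
# is NOT defined by way of some normalize() function.
CLASSES = [
    {"Zürich", "Zurich", "Zuerich"},
    {"example", "Example", "EXAMPLE"},
]


def equivalent(a: str, b: str) -> bool:
    """True if a and b are identical or belong to the same listed class."""
    if a == b:
        return True
    return any(a in cls and b in cls for cls in CLASSES)


def canonical(s: str) -> str:
    """Pick the least member of s's class under code-point ordering.

    A string that appears in no class is alone in its own class.
    """
    for cls in CLASSES:
        if s in cls:
            return min(cls)
    return s
```

With these classes the "least" members are "Zuerich" and "EXAMPLE".
They work as keys for testing equivalence, since a~b exactly when
canonical(a) == canonical(b), but neither is a form many people would
want to call "normal".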

You don't need to normalize locally to use hash-table caching
if you allow the hash table to hold a cached redirect
and put the onus on the origin to supply a canonical string.

You might think this would bloat the table with all of the
possible redirects, but as long as you look up on data entry
and the origin is consistent, there should only be one redirect.
In fact, you could limit it to one, or just periodically evict
the least-recently-used entries.
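
A rough sketch of what such a table could look like, in Python. Every
name here (RedirectCache, fake_origin, ORIGIN_TABLE, max_redirects) is
hypothetical, and the "origin" is a stand-in function that returns a
canonical string plus a value; a real protocol would need its own way
of carrying that answer.

```python
from collections import OrderedDict


class RedirectCache:
    """Cache keyed by the origin's canonical strings, plus an LRU table of
    redirects from the forms that were actually entered."""

    def __init__(self, origin_lookup, max_redirects=1024):
        # origin_lookup: callable taking a string and returning
        # (canonical_string, value). Assumed interface, for illustration.
        self._origin_lookup = origin_lookup
        self._values = {}                # canonical string -> value
        self._redirects = OrderedDict()  # entered form -> canonical string
        self._max_redirects = max_redirects

    def get(self, name):
        if name in self._values:         # name is already canonical
            return self._values[name]
        if name in self._redirects:      # cached redirect, no local normalizing
            self._redirects.move_to_end(name)
            return self._values[self._redirects[name]]
        canon, value = self._origin_lookup(name)  # the origin decides
        self._values[canon] = value
        if canon != name:
            self._redirects[name] = canon
            while len(self._redirects) > self._max_redirects:
                self._redirects.popitem(last=False)  # evict least recently used
        return value


# Stand-in origin with its own, consistent idea of what is equivalent.
ORIGIN_TABLE = {
    "example.com": "example.com",
    "EXAMPLE.com": "example.com",
    "Example.COM": "example.com",
}
ORIGIN_CALLS = []


def fake_origin(name):
    ORIGIN_CALLS.append(name)
    canon = ORIGIN_TABLE[name]
    return canon, "record for " + canon
```

The client never folds case or decodes anything itself; it only
remembers what the origin said. Setting max_redirects to 1 gives the
"limit it to one" variant.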

I'd even include case variants, punycode, and %xx URL percent-encodings
as possible input forms. The occasions when they are typed in are
rare.

It would allow different TLDs to define their own additional
equivalences and case folding.

Some variation of this could be made to work for DNS, HTTP,
and email addresses, though with some enormous deployment problems.
Someone would need to prove that it works.

Sorry if this is terse.

Larry
--
https://LarryMasinter.net