Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]

Asmus Freytag <asmusf@ix.netcom.com> Thu, 19 March 2015 04:21 UTC


On 3/18/2015 7:30 PM, Andrew Sullivan wrote:
> On Thu, Mar 19, 2015 at 02:11:56AM +0000, Shawn Steele wrote:
>>> At every level.  DNS names are exact-match.  Each identifier is unique.
>> For the machine, sure.  But if you throw in font weirdness and stuff, they become non-unique (to humans, not to machines) even in ASCII.  To make them be more unique one has to impose more rules, like "use a decent font", "lowercase everything".  Etc.
>>
> Yes.  That's what we're trying to understand.
>   
>> No, even all NFC or NFKC would be 100% unique to the machine
> This is either tautologically true, or false.  Certainly we learned
> with IDNA2003 that NFKC doesn't work, because while it's good for
> increasing match probability the identifiers aren't stable.  So when
> they're handed around through different environments, stuff happens
> that is bad.

As in, buggy implementations?
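To make the quoted NFKC point concrete, here is a minimal sketch using only Python's standard `unicodedata` module. The sample strings are my own illustrative choices, not cases from this thread. It shows the trade-off in the quote. NFKC folds compatibility characters that NFC leaves alone, so more inputs match. The cost is that the folded form is no longer the string that was handed in.

```python
import unicodedata

def compare_forms(s):
    """Return the (NFC, NFKC) forms of s."""
    return unicodedata.normalize("NFC", s), unicodedata.normalize("NFKC", s)

# Illustrative inputs chosen for this sketch: a ligature, fullwidth letters,
# and a superscript digit, each next to its plain counterpart.
samples = ["\ufb01le", "file", "\uff21\uff22\uff23", "ABC", "x\u00b2", "x2"]

if __name__ == "__main__":
    for s in samples:
        nfc, nfkc = compare_forms(s)
        # NFC keeps the compatibility characters distinct; NFKC folds them,
        # so different inputs can end up as the same normalized identifier.
        print(f"{s!r:12} NFC={nfc!r:12} NFKC={nfkc!r}")
```

Under NFKC, distinct inputs such as "x²" and "x2" collapse to one form. That raises the chance of a match, but the original spelling cannot be recovered from the result.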
>
>>   What I'm questioning is how unique is good enough?
> Surely that's part of what we're trying to understand?

I understood the IETF concern this way:

"While we all know that human perception can be tricked, and poor 
rendering doesn't help, we are concerned about the case where careful, 
conscientious users cannot tell apart two identifiers that the protocol 
deems unique."

While this looks superficially like the normalization issue, it is not, 
but it may be (perhaps partially) something that can be addressed at the 
protocol level with reasonable cost. So why not deal with it there?
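One well-known instance of the non-normalizable case is U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE. It has no canonical decomposition, so NFC never relates it to the sequence U+0628 BEH + U+0654 HAMZA ABOVE. By contrast, ALEF + HAMZA ABOVE does compose to U+0623. The Python sketch below shows this with the standard `unicodedata` module. It then adds a protocol-level fold driven by a small extra table. That table (`HYPOTHETICAL_EXTRA_EQUIVALENTS`) and `protocol_fold` are purely illustrative assumptions. They are not a real Unicode property or any registry's actual rule.

```python
import unicodedata

precomposed = "\u08a1"          # ARABIC LETTER BEH WITH HAMZA ABOVE (no decomposition)
sequence = "\u0628\u0654"       # BEH + ARABIC HAMZA ABOVE
alef_sequence = "\u0627\u0654"  # ALEF + ARABIC HAMZA ABOVE, composes to U+0623

def nfc(s):
    return unicodedata.normalize("NFC", s)

# Hypothetical protocol-level table: sequences a protocol might treat as
# equivalent to a precomposed letter even though normalization does not.
# Illustrative only; not a real Unicode property.
HYPOTHETICAL_EXTRA_EQUIVALENTS = {
    "\u0628\u0654": "\u08a1",
}

def protocol_fold(label):
    """Apply NFC, then the hypothetical extra equivalences (sketch only)."""
    s = nfc(label)
    for seq, target in HYPOTHETICAL_EXTRA_EQUIVALENTS.items():
        s = s.replace(seq, target)
    return s

if __name__ == "__main__":
    print(nfc(alef_sequence) == "\u0623")     # normalization handles this pair
    print(nfc(sequence) == nfc(precomposed))  # two "unique" identifiers
    print(protocol_fold(sequence) == protocol_fold(precomposed))
```

Folding is only one possible design. A protocol could instead reject labels that contain either form, which avoids choosing a preferred spelling.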

Yes, this doesn't address the rest of the human perception issues, but 
because the subset we identified doesn't really hinge on any human 
inadequacies, addressing the current issue should not interfere with 
other solutions that address these inadequacies.


>
>> We do seem to have some desire to be linguistic.  Otherwise sharp-s and Greek wouldn't have needed to be touched.
> We have people who want identifiers that work as useful things in
> their writing systems.  And sharp-s and sigma seemed to be cases where
> people were quite surprised, which means that the identifiers are less
> useful (because if the identifier system furnishes you with surprises,
> that's inconvenient).  This new set of cases is in fact another set of
> lurking surprises, which is why some of us are concerned about these
> cases.
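The sharp-s and sigma surprises can be reproduced with Python's standard library. Its built-in "idna" codec implements IDNA2003 (RFC 3490 with nameprep), not IDNA2008. In the sketch below, the sample labels and the `idna2003_ascii` helper are my own illustrative choices. Nameprep maps ß to "ss" and final sigma to medial sigma, so the label a user typed is silently replaced by a different one.

```python
# Python's built-in "idna" codec implements IDNA2003 (nameprep + punycode).
def idna2003_ascii(label):
    return label.encode("idna")

greek_final = "\u03bb\u03bf\u03b3\u03bf\u03c2"   # ends in FINAL SIGMA
greek_medial = "\u03bb\u03bf\u03b3\u03bf\u03c3"  # ends in SMALL SIGMA

if __name__ == "__main__":
    print(idna2003_ascii("stra\u00dfe"))          # sharp s folded to "ss"
    print(idna2003_ascii(greek_final) == idna2003_ascii(greek_medial))
    print("\u00df".casefold(), "\u03c2".casefold())  # full case folding agrees
```

Under IDNA2008 both U+00DF and U+03C2 are PVALID and stay distinct. The standard library has no IDNA2008 codec, so that side is not shown here.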
As an aside:

I'm amazed at the near-total lack of IDNs registered for the Latin
script in the root. It seems that people like the "fall-back" nature of 
non-accented ASCII labels for anything that should be accessed 
universally (top level).

So, for that script at least, you could say that users don't like being 
surprised by a more linguistically accurate, but less universally 
accessible way of constructing identifiers.

Interesting...

A./