Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]
Andrew Sullivan <ajs@anvilwalrusden.com> Thu, 19 March 2015 01:40 UTC
Date: Wed, 18 Mar 2015 21:40:19 -0400
From: Andrew Sullivan <ajs@anvilwalrusden.com>
To: lucid@ietf.org
Archived-At: <http://mailarchive.ietf.org/arch/msg/lucid/izzVIqE56AXPZ-UOtiVZ7alSgJg>
On Wed, Mar 11, 2015 at 09:58:26PM +0000, Shawn Steele wrote:

> It says "there's a spectrum", but then "We use the term "homoglyph" strictly:..." That seems like an attempt at a hard line, though it does say "normally"

As I said in an exchange off-list, the point here is to situate "homoglyph" in a spectrum of different cases, all of which are more or less confusable. rn and m are plainly not homoglyphs, because if you set them in even moderately large type with serifs you can immediately see the difference. Latin A and Cyrillic A are just homoglyphs: no reasonable font even tries to differentiate them, partly because as a matter of history they evolved from the same form and therefore it's not surprising that they look the same now. They're different abstract characters because they are in different writing systems, but they look identical.

> > ʻokina
>
> Hmm, I'd thought I'd seen it different, but maybe not, sorry if that was a bad example.

It wasn't a bad example. It was a good example, but of something different than what you intended. :) The point is precisely that these fine details are going to matter in this discussion, so we need to attend to them.

> Although I think it's fair for an RFC to indicate that some fonts can exacerbate the problem, I'm not sure that it's fair to state that parts of the problem could be "solved" by a font. For example, sometimes the font choice may be under the control of the attacker.

I think the draft says "mitigate", not "solve".

> ?? 2.2.2 explicitly emphasizes the Arabic Hamza Above case, though it does go on to mention that there are other characters.

There's a _whole appendix_ that attempts to illustrate the range of the issue, and the text goes out of its way to point the reader at that and to avoid talking about specific characters except to explain the history, because we're trying to attend to the general problem.
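[Editorial illustration, not part of the original message.] The distinction drawn above, between pairs that normalization unifies and pairs that merely look alike, can be sketched with Python's standard `unicodedata` module. This is only a sketch under stated assumptions: the helper name `same_after` is hypothetical, and the choice of U+08A1 (ARABIC LETTER BEH WITH HAMZA ABOVE) versus the sequence U+0628 U+0654 as the "Hamza Above case" is an assumption about which code points the draft has in mind.

```python
import unicodedata


def same_after(form, a, b):
    """Return True if a and b are identical after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A pair that normalization does unify: precomposed vs. combining sequence.
e_precomposed = "\u00E9"    # LATIN SMALL LETTER E WITH ACUTE
e_combining = "e\u0301"     # e + COMBINING ACUTE ACCENT

# Homoglyphs in different scripts: no normalization form relates them.
latin_a = "\u0041"          # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"       # CYRILLIC CAPITAL LETTER A

# Assumed "Hamza Above" case: a single code point with no decomposition
# vs. a base letter followed by a combining mark.
beh_hamza_single = "\u08A1"          # ARABIC LETTER BEH WITH HAMZA ABOVE
beh_hamza_sequence = "\u0628\u0654"  # BEH + ARABIC HAMZA ABOVE

print(same_after("NFC", e_precomposed, e_combining))
print(same_after("NFKC", latin_a, cyrillic_a))
print(same_after("NFC", beh_hamza_single, beh_hamza_sequence))
```

Only the first comparison comes out equal. The other two pairs stay distinct under normalization, so an exact-match comparison treats them as different identifiers however similar they look when rendered.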
> For example, this focuses on the examples illuminated by a very esoteric Arabic code point. I’m digressing, but I never see any discussion of 拔 vs 拨 or 暖 vs 暧 (depending on font, YMMV)

In fact there is a (different) Han character in the appendix.

> > IDNA is supposed to be providing unique identifiers.
>
> At what level?

At every level. DNS names are exact-match. Each identifier is unique. In the DNS, color.tld and colour.tld are not the same identifier. Conceptually they are confusing for Canadians (and spelled wrong for USians and English respectively), but they're different. We want that to be as reliable as possible, and we seem to have stumbled on a class of cases for which we cannot make derivative-property rules. I think that might be a problem, which is why I wanted to get a discussion going in order to understand.

> Can they provide a unique FQDN that maps to a single server? Sure, but they could do that if we’d just left it at “all of Unicode”.

No, we would not have. Are you quite sure you understand how the matching rules work? It was _never_ "all of Unicode", because that doesn't even catch the normalization cases.

> At the human level, it most certainly does not.

That's sort of the point, here, though. But I think your description of how this would work (elided) presents a false dichotomy: either everyone can get this with 100% accuracy or it's not a unique identifier. By similar reasoning, if you cannot eliminate all crime then you should have no laws.

> B) We want a human safe unique identifier.

It seems to me that, more mildly, we could just say that we want one as safe as possible, or whose "safety" we understand. Remember, this is a BoF and the point of the I-D was to get people to understand and articulate problems.

> b. I’m not sure this is achievable. It’ll be hard for sure.

Yes.

> c. IMO such an identifier could not also be perfectly linguistic.

Existing identifiers of all sorts are already not linguistic.
I observe that "anvilwalrusden" is not, to my knowledge, a word in any language. Certainly "dyn" (my employer) is not -- the name is a constant source of fun around the office, since virtually everybody pronounces it "wrong" (i.e. correctly according to the locally-prevailing norms of pronunciation).

> ii. Eg: map Latin, Greek & Cyrillic as appropriate.

This might actually not be a bad idea.

> d. This isn’t IDNA

This BoF is not only about IDNA.

> Clearly some people think that the class of the problem here is a real problem, and presumably that it is worth the effort to attempt to solve the perceived problem.

I would be satisfied if, in Dallas, we came away with a clear delineation of the problem, or even a clear idea of how we might delineate the problem.

Best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com