Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]

Shawn Steele <Shawn.Steele@microsoft.com> Thu, 19 March 2015 18:29 UTC

From: Shawn Steele <Shawn.Steele@microsoft.com>
To: "Asmus Freytag (t)" <asmus-inc@ix.netcom.com>, John C Klensin <john-ietf@jck.com>
Date: Thu, 19 Mar 2015 18:29:06 +0000
Message-ID: <BLUPR03MB1378985F9780A98646E7B31B82010@BLUPR03MB1378.namprd03.prod.outlook.com>
References: <20150311013300.GC12479@dyn.com> <CA+9kkMDZW9yPtDxtLTfY1=VS6itvHtXHF1qdZKtXdwwORwqnew@mail.gmail.com> <55008F97.8040701@ix.netcom.com> <CA+9kkMAcgSA1Ch0B9W1Np0LMn2udegZ=AzU1b26dAi+SDcbGgg@mail.gmail.com> <CY1PR0301MB07310C68F6CFDD46AE22086F82190@CY1PR0301MB0731.namprd03.prod.outlook.com> <20150311200941.GV15037@mx1.yitter.info> <CY1PR0301MB0731F4EBE5EB5C3340F7059282190@CY1PR0301MB0731.namprd03.prod.outlook.com> <20150319014018.GI5743@mx1.yitter.info> <BLUPR03MB1378184CE32E928A3086665582010@BLUPR03MB1378.namprd03.prod.outlook.com> <20150319023029.GA6046@mx1.yitter.info> <BLUPR03MB137886903F15000BB01E3F5882010@BLUPR03MB1378.namprd03.prod.outlook.com> <A62526FD387D08270363E96E@JcK-HP8200.jck.com> <550B0A32.8080704@ix.netcom.com>
In-Reply-To: <550B0A32.8080704@ix.netcom.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/lucid/CwLkU-QrPNvhYCNMtZWi8iiKewQ>
Cc: "lucid@ietf.org" <lucid@ietf.org>, Andrew Sullivan <ajs@anvilwalrusden.com>
Subject: Re: [Lucid] FW: [mark@macchiato.com: Re: Non-normalizable diacritics - new property]
List-Id: "Locale-free UniCode Identifiers \(LUCID\)" <lucid.ietf.org>

> Any of these forms, if they can be guaranteed to be stable, will satisfy that condition, but the point about the discussion is that identifiers that are "reasonably mnemonic" as one could characterize IDNs, do occupy a space between machine and human interaction with writing systems.

This discussion seems to be about lots of things, but "reasonably mnemonic" would limit the problem hugely.  That implies that I can see it and remember what it was to type it later.  However, that scenario doesn't hit any of the security concerns being raised, as I'm going to type it using the appropriate keyboard for my language.  Sure, if the domain had a Cyrillic l I may have trouble; however, that's not mnemonic, that's someone being sneaky.  (See fraud below.)

> Actually, Arabic even has potential "variants" among the digits, so while you can make the identifier be based on the numeric value of what the human wrote in whatever writing system, you get into issues of recognition of identifiers etc. once they are communicated in their human-readable form.

That's digressing from my point.  I was trying to say that, if we consider the problem without any human factors, we already have unique IDs.  There's no way that machines confuse any of these names; it's only the humans that do that.  (Yes, they're an important part.)
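
To make the "machines don't confuse these" point concrete, here is a minimal sketch in Python.  The three labels are made-up examples, and the choice of U+04CF as the "Cyrillic l" look-alike is an assumption for illustration only; the helper names are hypothetical as well.  It only shows that a code-point comparison keeps the strings apart, and says nothing about how any registry or resolver actually compares names.

```python
import unicodedata

# Three labels that can look alike to a human in some fonts.
# All three are illustrative; none comes from a real registry.
latin = "lstbank"            # first character is U+006C (Latin small l)
digit = "1stbank"            # first character is U+0031 (digit one)
cyrillic = "\u04cfstbank"    # first character is U+04CF (assumed Cyrillic look-alike)

def codepoints(label):
    """Return the label as a list of U+XXXX strings, one per character."""
    return [f"U+{ord(ch):04X}" for ch in label]

def distinct_for_machine(a, b):
    """Compare two labels after NFC normalization, code point by code point."""
    return unicodedata.normalize("NFC", a) != unicodedata.normalize("NFC", b)

for label in (latin, digit, cyrillic):
    print(codepoints(label))

print(distinct_for_machine(latin, digit))     # the machine sees two different names
print(distinct_for_machine(latin, cyrillic))  # likewise, despite the visual similarity
```

Normalization is included only to show that it does not merge these particular strings; the confusion lives entirely on the human side of the screen.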

> It's not just the conventions of a group of users, but also human perception issues that plague human-readable identifiers, and the "sociological" issue of the need to counteract fraud.

That's an additional requirement to the "reasonably mnemonic" above.  However, if this entire exercise is intended to counteract fraud, then we get to my view that "trying to address these codepoints is way more costly than its benefit".

The people in this room aren't going to click on "1stbank.trustme.com", but everyone else on the planet (or at least 90% of them) will click that link.  So worrying about someone spelling it "lstbank.com" provides close to zero value.  At least in the right font 50% of the other humans might start suspecting that one.

Heck, REAL BUSINESSES send me to crappy URLs like that all the time.  "We partnered with ABCsurvey.com to get your feedback, trust us".  "We're replying about your recent experience from trustme@emailhosting.com".  "We see you like our museum, consider donating at fundraising.com/ourmuseum".  Even my BANK does that.  Heck, they even send me the "1stbank.trustme.com" links.  It irritates me to no end.

(This total obliviousness to security/spoofing/phishing isn't even limited to DNS.  I've received calls because of billing glitches and the bank says "you can pay now, give us your CC or check routing info".  That's normal.  Then they ask my secret code or whatever to verify it's me, and I point out that I can't even verify that they're who they say they are and they're stumped.  "Well, you can call us back at 1-800-123-4567."  Seriously!?!?!  Even my bank has no clue what security is?)

Fix that, and then we can talk about whether it's worth fixing a couple esoteric code points.  There are FAR easier exploits for any serious attacker to attack.  It's certainly not worth risking breaking real strings and real mnemonics that real users might want to use.  (IMO it's not even worth prohibiting I♥NY.com)

At this point I'm well beyond the "identifying the problem statement", sorry.

However, that does bring me back to the "what are we trying to do this for?" question.  IMO there is no value in trying to secure DNS by prohibiting these.  However, I'm open to the idea that there's a machine/computer science need for unique identifiers that apparently we want to be mnemonic.  I have no clue what those scenarios are.  I get wanting an IRI to identify a unique resource, but we already have that (presuming the machine does the interpretation, there's no ambiguity).  I am clearly totally missing whatever scenario it is where a confusable type ID is needed for an identifier in a system where it could actually be confused.

-Shawn