Re: [pkix] [apps-discuss] character repertoire for fragment identifiers, was: Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 12 January 2015 09:30 UTC

Message-ID: <54B3940A.6020308@it.aoyama.ac.jp>
Date: Mon, 12 Jan 2015 18:29:46 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Sean Leonard <dev+ietf@seantek.com>, Sam Ruby <rubys@intertwingly.net>, Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>
References: <20140926010029.26660.82167.idtracker@ietfa.amsl.com> <CACweHNBEYRFAuw9-vfeyd_wf703cvM3ykZoRMqAokRFYG_O7hQ@mail.gmail.com> <DM2PR0201MB09602B351692D424A49C6B0DC3650@DM2PR0201MB0960.namprd02.prod.outlook.com> <CACweHNBN_Bv=jeXQ_VwXi2HzHKNEwZJ1NiF-BJJo_9-mhO60gQ@mail.gmail.com> <54A557E1.6050502@intertwingly.net> <CACweHNCQZg1U1u8U=-f6h0+BPnp6Wr_T=r_wGiPAbhTbuMCGWQ@mail.gmail.com> <54A94109.5010901@intertwingly.net> <00cf01d02cc7$d5dba4c0$4001a8c0@gateway.2wire.net> <54B16C2B.9050604@seantek.com> <54B17BBE.4000900@intertwingly.net> <54B18B61.8010308@seantek.com> <54B19435.8070401@intertwingly.net> <54B1B211.3050807@seantek.com> <54B1B682.3070609@intertwingly.net> <54B28E0F.8070306@gmx.de> <54B2936B.7030805@intertwingly.net> <05AD7DE2-1C54-45CD-B33A-13766D771E57@mnot.net> <54B2A2CD.5080502@gmx.de> <1A5BBD25-FEBD-49B1-9EFB-4EF8877BF0E7@mnot.net> <54B2A4F9.2070909@gmx.de> <54B2A894.4020201@intertwingly.net> <54B2F4C3.5020008@seantek.com>
In-Reply-To: <54B2F4C3.5020008@seantek.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/pkix/iLA3eoYhmvdH4VuE9XXGqznLIFw>
X-Mailman-Approved-At: Mon, 12 Jan 2015 07:56:06 -0800
Cc: "pkix@ietf.org" <pkix@ietf.org>, apps-discuss@ietf.org
Subject: Re: [pkix] [apps-discuss] character repertoire for fragment identifiers, was: Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

On 2015/01/12 07:10, Sean Leonard wrote:

> The security angle brings up another problem: the interoperable
> transcription of URIs across systems. The ASCII range is a limited
> repertoire, so it is easy to write it out unambiguously on paper,
> display it on a TV screen, say it over the radio or a public service
> announcement, or memorize it on your smartphone, in order to type it
> into your web browser, the command-line, or any other system of choice.

Many other writing systems are equally limited, and are easier to use 
for those who use them natively. Some writing systems in East Asia aren't 
that limited, but have other mechanisms that allow roughly the same 
things to happen.

> If you allow the enormous (and ever-expanding) range of Unicode
> characters in "URIs", all of those use cases become fundamentally
> ambiguous, inviting homograph attacks. Which smiley face out of nearly a
> hundred smiley emoji do you mean when you say "http://foo.com/😋" ??

Homograph attacks can happen, but only at points where different actors 
can add to the same namespace. There are exceptions, but the path part 
usually isn't one of them.
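Where actors do share a namespace (host names are the usual example), the party that controls registration is the natural place to screen for homographs. A minimal Python sketch, assuming a deliberately crude heuristic: the helpers `letter_scripts` and `looks_mixed_script` are hypothetical, and the first word of each letter's Unicode name stands in for its script. Real registries would use the confusables and mixed-script data of UTS #39 instead.

```python
import unicodedata

def letter_scripts(label):
    """Crude script guess (hypothetical helper): the first word of each
    letter's Unicode name, e.g. 'LATIN', 'CYRILLIC', 'CJK'."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def looks_mixed_script(label):
    """Flag labels whose letters come from more than one guessed script."""
    return len(letter_scripts(label)) > 1

latin = "paypal"
spoof = "p\u0430ypal"  # U+0430 CYRILLIC SMALL LETTER A in place of 'a'

print(latin == spoof)                                 # False
print(unicodedata.normalize("NFKC", spoof) == latin)  # False: NFKC keeps Cyrillic a distinct
print(looks_mixed_script(latin), looks_mixed_script(spoof))  # False True
```

Legitimate text also mixes scripts (Japanese routinely combines kanji and kana, which this heuristic would flag), so a check like this belongs to a namespace owner's policy rather than to IRI syntax.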

Also, people using smileys in their IRIs and hoping that these get 
easily copied are just shooting themselves in the foot. Just because it 
turns out that using some Unicode characters in IRIs isn't necessarily a 
good idea doesn't mean that we should exclude them all.

> How
> about an URI containing "ῗ" (U+1FD7 GREEK SMALL LETTER IOTA WITH
> DIALYTIKA AND PERISPOMENI)--what composition or decomposition mode? What
> if the combining accent mark code points are in a different order?
>
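The normalization question quoted above can be made concrete with Python's standard `unicodedata` module. A minimal sketch, not part of any IRI processing rule: the precomposed and decomposed spellings of U+1FD7 are canonically equivalent, while swapping the two combining marks gives a different string, because both marks share combining class 230 and normalization never reorders marks of equal class. The variable names and the `same_under` helper are illustrative only.

```python
import unicodedata

precomposed = "\u1fd7"             # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
decomposed = "\u03b9\u0308\u0342"  # iota + combining diaeresis + combining perispomeni
swapped = "\u03b9\u0342\u0308"     # the same two marks in the opposite order

def same_under(form, a, b):
    """True if a and b are identical after applying normalization form `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Precomposed and decomposed spellings are canonically equivalent.
print(same_under("NFC", precomposed, decomposed))               # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Both marks have the same combining class, so they are never reordered,
# and the swapped sequence is a genuinely different string.
print(unicodedata.combining("\u0308"), unicodedata.combining("\u0342"))  # 230 230
print(same_under("NFC", precomposed, swapped))                  # False
```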
> ***
> I have empathy for what Sam/the W3C wants, since the HTML protocol slots
> basically beg to be filled with Unicode strings like <a
> href="http://zh.wikipedia.org/wiki/巴泰勒米·波岡達"> (instead of <a
> href="http://zh.wikipedia.org/wiki/%E5%B7%B4%E6%B3%B0%E5%8B%92%E7%B1%B3%C2%B7%E6%B3%A2%E5%B2%A1%E9%81%94">).

Very, very much so. The former is readable (although there are better 
examples than foreign names such as Barthélemy Boganda) to a significant 
part of the world's population; the latter is gibberish to everybody.
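Software can offer the readable form without changing what goes over the wire, by decoding percent-encoded UTF-8 only for display. A minimal Python sketch, loosely following the URI-to-IRI conversion of RFC 3987 section 3.2; the function name `uri_to_display` is hypothetical. It leaves escaped ASCII (such as `%2F`) and any run that is not valid UTF-8 untouched, and it skips the bidi and other checks a real implementation would need before display.

```python
import re

# A run of consecutive %XX escapes, e.g. "%E5%B7%B4%E6%B3%B0"
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def uri_to_display(uri):
    """Decode percent-encoded UTF-8 for display; leave everything else as-is."""
    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        # Only decode runs made purely of non-ASCII octets, so that
        # escapes such as %2F or %25 keep their meaning.
        if all(b >= 0x80 for b in raw):
            try:
                return raw.decode("utf-8")
            except UnicodeDecodeError:
                pass  # not valid UTF-8: keep the escapes
        return match.group(0)
    return _PCT_RUN.sub(decode_run, uri)

uri = ("http://zh.wikipedia.org/wiki/"
       "%E5%B7%B4%E6%B3%B0%E5%8B%92%E7%B1%B3%C2%B7%E6%B3%A2%E5%B2%A1%E9%81%94")
print(uri_to_display(uri))  # http://zh.wikipedia.org/wiki/巴泰勒米·波岡達
```

A run that mixes escaped ASCII with escaped non-ASCII octets is left entirely encoded, which errs on the side of showing the raw form.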


> But maybe the more interoperable approach is to define a format and
> mechanism (e.g., IRIs, or something like IRIs v2) to map /from/ the
> Unicode-capable protocol slots, /to/ the well-standardized RFC 3986 URI
> format.

With IRIs, that's essentially what we have. (Of course, I don't mean to 
imply that the IRI spec cannot be improved upon.)
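The core of the IRI-to-URI mapping in RFC 3987 (section 3.1) is small: encode each non-ASCII character as UTF-8 and percent-encode every resulting octet, leaving ASCII characters alone. A minimal Python sketch of just that step; the name `iri_to_uri` is hypothetical, and the sketch ignores the optional IDNA conversion of host names and does not check that the input is a well-formed IRI.

```python
def iri_to_uri(iri):
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII passes through unchanged, including '%'
        else:
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)

print(iri_to_uri("http://foo.com/\U0001F60B"))  # http://foo.com/%F0%9F%98%8B
print(iri_to_uri("http://zh.wikipedia.org/wiki/巴泰勒米·波岡達"))
```

The second call reproduces the percent-encoded href quoted earlier in this message. Because ASCII is never touched, applying the function to a string that is already a URI returns it unchanged.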

Regards,   Martin.