Re: [pkix] [apps-discuss] character repertoire for fragment identifiers, was: Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 12 January 2015 09:30 UTC

Message-ID: <54B3940A.6020308@it.aoyama.ac.jp>
Date: Mon, 12 Jan 2015 18:29:46 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Sean Leonard <dev+ietf@seantek.com>, Sam Ruby <rubys@intertwingly.net>, Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>
References: <20140926010029.26660.82167.idtracker@ietfa.amsl.com> <CACweHNBEYRFAuw9-vfeyd_wf703cvM3ykZoRMqAokRFYG_O7hQ@mail.gmail.com> <DM2PR0201MB09602B351692D424A49C6B0DC3650@DM2PR0201MB0960.namprd02.prod.outlook.com> <CACweHNBN_Bv=jeXQ_VwXi2HzHKNEwZJ1NiF-BJJo_9-mhO60gQ@mail.gmail.com> <54A557E1.6050502@intertwingly.net> <CACweHNCQZg1U1u8U=-f6h0+BPnp6Wr_T=r_wGiPAbhTbuMCGWQ@mail.gmail.com> <54A94109.5010901@intertwingly.net> <00cf01d02cc7$d5dba4c0$4001a8c0@gateway.2wire.net> <54B16C2B.9050604@seantek.com> <54B17BBE.4000900@intertwingly.net> <54B18B61.8010308@seantek.com> <54B19435.8070401@intertwingly.net> <54B1B211.3050807@seantek.com> <54B1B682.3070609@intertwingly.net> <54B28E0F.8070306@gmx.de> <54B2936B.7030805@intertwingly.net> <05AD7DE2-1C54-45CD-B33A-13766D771E57@mnot.net> <54B2A2CD.5080502@gmx.de> <1A5BBD25-FEBD-49B1-9EFB-4EF8877BF0E7@mnot.net> <54B2A4F9.2070909@gmx.de> <54B2A894.4020201@intertwingly.net> <54B2F4C3.5020008@seantek.com>
In-Reply-To: <54B2F4C3.5020008@seantek.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/pkix/iLA3eoYhmvdH4VuE9XXGqznLIFw>
Cc: "pkix@ietf.org" <pkix@ietf.org>, apps-discuss@ietf.org
Subject: Re: [pkix] [apps-discuss] character repertoire for fragment identifiers, was: Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt
Precedence: list

On 2015/01/12 07:10, Sean Leonard wrote:

> The security angle brings up another problem: the interoperable
> transcription of URIs across systems. The ASCII range is a limited
> repertoire, so it is easy to write it out unambiguously on paper,
> display it on a TV screen, say it over the radio or a public service
> announcement, or memorize it on your smartphone, in order to type it
> into your web browser, the command-line, or any other system of choice.

Many other writing systems have an equally limited repertoire, and are
easier to use for those who use them natively. Some writing systems in
East Asia aren't that limited, but have other mechanisms that allow more
or less the same things to happen.

> If you allow the enormous (and ever-expanding) range of Unicode
> characters in "URIs", all of those use cases become fundamentally
> ambiguous, inviting homograph attacks. Which smiley face out of nearly a
> hundred smiley emoji do you mean when you say "http://foo.com/😋" ??

Homograph attacks can happen, but only at points where different actors
can add to the same namespace. The path part usually isn't such a point,
although there are exceptions.

Also, people using smileys in their IRIs and hoping that these get 
easily copied are just shooting themselves in the foot. Just because it 
turns out that using some Unicode characters in IRIs isn't necessarily a 
good idea doesn't mean that we should exclude them all.

> How
> about an URI containing "ῗ" (U+1FD7 GREEK SMALL LETTER IOTA WITH
> DIALYTIKA AND PERISPOMENI)--what composition or decomposition mode? What
> if the combining accent mark code points are in a different order?
>
> ***
> I have empathy for what Sam/the W3C wants, since the HTML protocol slots
> basically beg to be filled with Unicode strings like <a
> href="http://zh.wikipedia.org/wiki/巴泰勒米·波岡達"> (instead of <a
> href="http://zh.wikipedia.org/wiki/%E5%B7%B4%E6%B3%B0%E5%8B%92%E7%B1%B3%C2%B7%E6%B3%A2%E5%B2%A1%E9%81%94">).

Very very much so. The former is readable (although there are better
examples than foreign names such as Barthélemy Boganda) to a significant
part of the world's population; the latter is gibberish for everybody.
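
On the composition and ordering question in the quoted text, here is a
minimal sketch using Python's standard unicodedata module. The helper
names (code_points, same_after_nfc) are only for illustration. It shows
that the precomposed and decomposed spellings of U+1FD7 compare equal
once both are normalized, while swapping the two combining marks gives a
string that is not canonically equivalent, because both marks have the
same combining class and so are never reordered.

```python
import unicodedata

def code_points(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# U+1FD7 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
precomposed = "\u1fd7"

# Canonical decomposition: iota, combining diaeresis, combining perispomeni
decomposed = unicodedata.normalize("NFD", precomposed)

# Same base letter, but the two combining marks in the other order
swapped = "\u03b9\u0342\u0308"

print(code_points(decomposed))               # U+03B9 U+0308 U+0342
print(same_after_nfc(precomposed, decomposed))  # True
print(same_after_nfc(precomposed, swapped))     # False
```

So "what composition or decomposition mode" has a mechanical answer
(normalize before comparing), but a different mark order is a different
string.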

> But maybe the more interoperable approach is to define a format and
> mechanism (e.g., IRIs, or something like IRIs v2) to map /from/ the
> Unicode-capable protocol slots, /to/ the well-standardized RFC 3986 URI
> format.

With IRIs, that's essentially what we have. (of course I don't want to 
imply that the IRI spec cannot be improved upon)
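
As a rough illustration of that mapping, here is a simplified sketch in
Python, not a full implementation of the IRI spec. The function name
iri_to_uri is hypothetical, and the sketch assumes the host is already
ASCII (no IDNA handling). It percent-encodes the UTF-8 bytes of anything
outside the URI repertoire in the path, query, and fragment, and leaves
existing %XX escapes alone.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters plus "%" so that
# already-escaped sequences are not escaped a second time.
_SAFE = "/:@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters as UTF-8.

    Simplifying assumption: the authority (host) is already ASCII.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))

iri = "http://zh.wikipedia.org/wiki/巴泰勒米·波岡達"
uri = iri_to_uri(iri)
print(uri)
# Going back the other way recovers the readable form for display.
print(unquote(uri) == iri)
```

For the Wikipedia example quoted earlier, this produces exactly the
percent-encoded form shown there.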

Regards,   Martin.
