Re: [pkix] [apps-discuss] character repertoire for fragment identifiers, was: Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt
Sean Leonard <dev+ietf@seantek.com> Sun, 11 January 2015 22:11 UTC
Return-Path: <dev+ietf@seantek.com>
X-Original-To: pkix@ietfa.amsl.com
Delivered-To: pkix@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 9A8161A1A3B; Sun, 11 Jan 2015 14:11:25 -0800 (PST)
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 032nKCohgF8t; Sun, 11 Jan 2015 14:11:22 -0800 (PST)
Received: from mxout-07.mxes.net (mxout-07.mxes.net [216.86.168.182]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id F3CD81A0396; Sun, 11 Jan 2015 14:11:21 -0800 (PST)
Received: from [192.168.123.7] (unknown [23.241.1.22]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtp.mxes.net (Postfix) with ESMTPSA id 1333222E200; Sun, 11 Jan 2015 17:11:19 -0500 (EST)
Message-ID: <54B2F4C3.5020008@seantek.com>
Date: Sun, 11 Jan 2015 14:10:11 -0800
From: Sean Leonard <dev+ietf@seantek.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Sam Ruby <rubys@intertwingly.net>, Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>
References: <20140926010029.26660.82167.idtracker@ietfa.amsl.com> <EAACE200D9B0224D94BF52CF2DD166A425A68A90@ex10mb6.qut.edu.au> <CACweHNBEYRFAuw9-vfeyd_wf703cvM3ykZoRMqAokRFYG_O7hQ@mail.gmail.com> <DM2PR0201MB09602B351692D424A49C6B0DC3650@DM2PR0201MB0960.namprd02.prod.outlook.com> <CACweHNBN_Bv=jeXQ_VwXi2HzHKNEwZJ1NiF-BJJo_9-mhO60gQ@mail.gmail.com> <54A557E1.6050502@intertwingly.net> <CACweHNCQZg1U1u8U=-f6h0+BPnp6Wr_T=r_wGiPAbhTbuMCGWQ@mail.gmail.com> <54A94109.5010901@intertwingly.net> <00cf01d02cc7$d5dba4c0$4001a8c0@gateway.2wire.net> <54B16C2B.9050604@seantek.com> <54B17BBE.4000900@intertwingly.net> <54B18B61.8010308@seantek.com> <54B19435.8070401@intertwingly.net> <54B1B211.3050807@seantek.com> <54B1B682.3070609@intertwingly.net> <54B28E0F.8070306@gmx.de> <54B2936B.7030805@intertwingly.net> <05AD7DE2-1C54-45CD-B33A-13766D771E57@mnot.net> <54B2A2CD.5080502@gmx.de> <1A5BBD25-FEBD-49B1-9EFB-4EF8877BF0E7@mnot.net> <54B2A4F9.2070909@gmx.de> <54B2A894.4020201@intertwingly.net>
In-Reply-To: <54B2A894.4020201@intertwingly.net>
Content-Type: multipart/alternative; boundary="------------010706050003040906040904"
Archived-At: <http://mailarchive.ietf.org/arch/msg/pkix/BeH7og4MCcR4bILcuCdnEx3EPpo>
Cc: "pkix@ietf.org" <pkix@ietf.org>, apps-discuss@ietf.org
Subject: Re: [pkix] [apps-discuss] character repertoire for fragment identifiers, was: Fwd: FW: New Version Notification for draft-kerwin-file-scheme-13.txt
X-BeenThere: pkix@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: PKIX Working Group <pkix.ietf.org>
X-List-Received-Date: Sun, 11 Jan 2015 22:11:25 -0000
[Adding pkix@]

On the two intertwined points:

On 1/11/2015 8:28 AM, Julian Reschke wrote:
> On 2015-01-11 17:19, Sam Ruby wrote:
>> ...
>>> Now suffering from information overflow.
>>>
>>> We were discussing RFC 3986. Which *ASCII* characters that are currently
>>> forbidden in fragment identifiers do you want to allow?
>>
>> We seem to be in a loop.
>>
>> To you, RFC 3986 (including potential errata and/or bis work) implies
>> ASCII.
>>
>> I point out that that restriction does not seem to make sense for
>> fragments.
>
> Actually, you did not. Instead you pointed to lots of material
> somewhere else that I didn't want to parse just to find out what
> you're proposing.
>
>> You return back to asking about ASCII.
>
> Because that's what RFC 3986 is concerned with.
>
>> To help break this loop, permit me to turn this question around.
>> Restricting the scope of the conversation to fragments, why do you
>> believe it makes sense to limit such to ASCII only? What problems do
>> non-ASCII characters in fragments cause?
>
> I fail to see how fragments are special here compared to, say, the path.
>
> I fully agree that work on what we used to call IRIs is something that
> needs to be done. However right now I'm trying to figure out what's
> wrong with RFC *3986*.
>
> I hear you saying that URIs should allow non-ASCII characters in
> certain places. This may break code where these characters are put on
> the wire (such as HTTP) or stored in places that do not allow
> non-ASCII characters (say, a database column).
>
> The fact that it's hard to extend the URI character repertoire beyond
> ASCII is why the IETF attempted to do it in a separate spec/construct.
> I'm not convinced that the situation has changed sufficiently to
> invalidate that approach.

On 1/11/2015 8:45 AM, Sam Ruby wrote:
> On 01/11/2015 11:29 AM, Julian Reschke wrote:
>> On 2015-01-11 17:25, Mark Nottingham wrote:
>>> I and others have brought that up.
>>> What’s interesting is that they say
>>> it’s reasonably interoperable with deployed implementations.
>>>
>>> Cheers,
>>> ...
>>
>> Let me guess: "deployed implementations" == "what current browsers do".
>
> Your sarcasm is not appreciated.
>
> I encourage you to actually look at test results:
>
> https://url.spec.whatwg.org/interop/test-results/7b83ef3682

***

I fully agree with Julian on the matter of US-ASCII for URIs. URIs (RFC 3986) are made only of US-ASCII characters. If someone wishes to extend URIs (as opposed to IRIs or whatever) to include non-US-ASCII characters, that's a problem for web browsers and all other Internet software alike.

This goes exactly to my point about protocol slots. Certificates, CRLs, and other security objects are just as fundamentally a part of the Web (and web browser) infrastructure as HTML. In X.509/PKIX security objects, the GeneralName uniformResourceIdentifier construct is US-ASCII only (IA5String). If you extend "URIs" beyond US-ASCII, RFC 5280 has to be updated...and so do all the security libraries that depend upon it. Just because HTML(5) can be served as UTF-8 or use &-encoding or whatever doesn't make the problem go away. Do the URL Interop test results explicitly test certificates? I suggest putting some non-US-ASCII characters in a GeneralName protocol slot (say, for revocation) and seeing what happens.

HTML 4.01 is at least consistent in saying (for its time) that hrefs and other things are URIs. For interoperable behavior, use US-ASCII characters only and stick with % encoding.

The security angle brings up another problem: the interoperable transcription of URIs across systems. The ASCII range is a limited repertoire. That makes it easy to write a URI out unambiguously on paper, display it on a TV screen, say it over the radio or in a public service announcement, or memorize it. You can then type it into your web browser, your smartphone, the command line, or any other system of choice.
If you allow the enormous (and ever-expanding) range of Unicode characters in "URIs", all of those use cases become fundamentally ambiguous, inviting homograph attacks. Which smiley face out of nearly a hundred smiley emoji do you mean when you say "http://foo.com/😋"? How about a URI containing "ῗ" (U+1FD7 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI): which composition or decomposition mode? What if the combining accent mark code points are in a different order?

***

I have empathy for what Sam/the W3C wants. The HTML protocol slots basically beg to be filled with Unicode strings like <a href="http://zh.wikipedia.org/wiki/巴泰勒米·波岡達"> (instead of <a href="http://zh.wikipedia.org/wiki/%E5%B7%B4%E6%B3%B0%E5%8B%92%E7%B1%B3%C2%B7%E6%B3%A2%E5%B2%A1%E9%81%94">). But maybe the more interoperable approach is to define a format and mechanism (e.g., IRIs, or something like IRIs v2) to map /from/ the Unicode-capable protocol slots /to/ the well-standardized RFC 3986 URI format.

My 2¢.

Sean
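What follows is a minimal Python sketch of the kind of Unicode-to-URI mapping described above. It is loosely modeled on the IRI-to-URI conversion in RFC 3987: optionally normalize to NFC, then encode each non-ASCII character as UTF-8 and percent-encode the bytes. It also includes a character-level check against the IA5String range (0x00-0x7F) used by the GeneralName uniformResourceIdentifier.

The helper names `iri_to_uri` and `fits_ia5string` are hypothetical, and so is the `http://example.com/` host. Host names (IDNA) and ASCII characters that are illegal in URIs are deliberately not handled.

```python
import unicodedata


def iri_to_uri(iri: str, nfc: bool = False) -> str:
    """Hypothetical helper: map a Unicode string to a US-ASCII URI string.

    Simplified sketch: each non-ASCII character is encoded as UTF-8 and every
    resulting byte becomes %XX (uppercase hex). ASCII characters, including any
    existing %XX escapes, pass through unchanged. IDNA host processing and
    rejection of ASCII characters that are illegal in URIs are NOT done here.
    """
    if nfc:
        # Optional NFC step (assumption: wanted when the string was transcribed
        # from paper, speech, or another encoding-independent form).
        iri = unicodedata.normalize("NFC", iri)
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def fits_ia5string(s: str) -> bool:
    """Hypothetical helper: True if every character is in the IA5 range 0x00-0x7F.

    Character-level check only; it does not validate URI syntax.
    """
    return all(ord(ch) < 0x80 for ch in s)


if __name__ == "__main__":
    href = "http://zh.wikipedia.org/wiki/巴泰勒米·波岡達"
    uri = iri_to_uri(href)
    print(uri)
    print(fits_ia5string(href), fits_ia5string(uri))  # False True

    # "ῗ" composed (U+1FD7) vs. decomposed (U+03B9 U+0308 U+0342).
    composed = "http://example.com/\u1fd7"
    decomposed = "http://example.com/\u03b9\u0308\u0342"
    print(iri_to_uri(composed))    # http://example.com/%E1%BF%97
    print(iri_to_uri(decomposed))  # http://example.com/%CE%B9%CC%88%CD%82
    print(iri_to_uri(decomposed, nfc=True) == iri_to_uri(composed))  # True

    # Same two marks in the other order: both have combining class 230, so the
    # order is significant and NFC does not turn this back into U+1FD7.
    reordered = "http://example.com/\u03b9\u0342\u0308"
    print(iri_to_uri(reordered, nfc=True) == iri_to_uri(composed))  # False
```

The first line printed should match the percent-encoded href quoted above. The composed and decomposed spellings of "ῗ" look identical but produce different URIs unless both sides normalize first. Reordered combining marks are not rescued by normalization at all, which is the transcription ambiguity raised above.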