Re: [Uri-review] ws: and wss: schemes

Ian Hickson <ian@hixie.ch> Tue, 13 October 2009 11:51 UTC

Date: Tue, 13 Oct 2009 12:02:47 +0000
From: Ian Hickson <ian@hixie.ch>
To: URI <uri@w3.org>
Cc: "public-iri@w3.org" <public-iri@w3.org>, uri-review@ietf.org, hybi@ietf.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Subject: Re: [Uri-review] ws: and wss: schemes

On Thu, 17 Sep 2009, Julian Reschke wrote:
> 
> It now says:
> 
> >    Encoding considerations.
> >       Characters in the host component that are excluded by the syntax
> >       defined above must be converted from Unicode to ASCII by applying
> >       the IDNA ToASCII algorithm to the Unicode host name, with both the
> >       AllowUnassigned and UseSTD3ASCIIRules flags set, and using the
> >       result of this algorithm as the host in the URI.
> > 
> >       Characters in other components that are excluded by the syntax
> >       defined above must be converted from Unicode to ASCII by first
> >       encoding the characters as UTF-8 and then replacing the
> >       corresponding bytes using their percent-encoded form as defined in
> >       the URI and IRI specification.  [RFC3986] [RFC3987]
> 
> I think that's good, except that the mention of IRI in the last sentence 
> seems to be superfluous. RFC3986 already defines everything that is 
> needed here. Or is there something specific from the IRI spec you think 
> is relevant? (In which case it should state that more clearly).

I think referencing the IRI specification in the paragraph that talks about 
how to process IRIs is helpful, even if not strictly necessary. (As Martin 
later pointed out, though, the general procedure for converting Unicode to 
UTF-8 to percent escapes appears to be defined in RFC 3987, not RFC 3986.)
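
For readers following the thread, the second paragraph of the quoted 
"Encoding considerations" text (encode as UTF-8, then percent-encode the 
resulting bytes) can be sketched roughly as follows. This is only an 
illustration, not text from the draft: the function name percent_encode and 
the keep parameter are hypothetical, and the sketch assumes that every 
character outside the RFC 3986 "unreserved" set (plus whatever is passed in 
keep) counts as "excluded by the syntax", which also means an existing "%" 
gets escaped.

```python
# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def percent_encode(text: str, keep: bytes = b"/") -> str:
    """Encode text as UTF-8, then percent-encode each byte that is not allowed.

    Hypothetical helper: 'keep' lists extra bytes to pass through unchanged
    (here the path separator). Anything else outside UNRESERVED is escaped,
    including "%" itself, which is a simplification.
    """
    out = []
    for byte in text.encode("utf-8"):  # step 1: Unicode -> UTF-8 bytes
        if byte in UNRESERVED or byte in keep:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")  # step 2: byte -> %XX
    return "".join(out)


print(percent_encode("/café"))
```

A real implementation would need a per-component notion of which characters 
are allowed; the single keep parameter here only stands in for that.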


On Fri, 18 Sep 2009, "Martin J. Dürst" wrote:
>
> I think this has various problems.
> 
> First, it is fixed to IDNA 2003 (I think I may have said this already). 
> IDNA 2008 is around the corner, and it doesn't use terms such as 
> "ToASCII" or "AllowUnassigned".

What are the magic terms that we should use instead? (This will also 
affect HTML5; any advice on how to fix the terminology there would be 
very welcome.)


> Second, if this is about resolution (rather than about generic 
> conversion), and because this is a new scheme, it should not exclude the 
> case that some part of a domain name (reg-name) is percent-encoded, 
> because both RFC 3986 and 3987 allow this.

Not sure what you mean here.
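
For context, RFC 3986 does permit percent-encoded octets inside reg-name. 
One possible reading of the point above (an assumption, not something 
confirmed in this thread) is that a resolver would first undo those escapes 
as UTF-8 and only then convert the name to ASCII. A minimal sketch of that 
reading, with the hypothetical name reg_name_to_ascii and using Python's 
built-in "idna" codec (which follows IDNA 2003):

```python
from urllib.parse import unquote


def reg_name_to_ascii(reg_name: str) -> str:
    """Decode %XX escapes in a reg-name as UTF-8, then apply IDNA 2003 ToASCII.

    Hypothetical helper illustrating one reading of the comment; it is not
    taken from the draft under discussion.
    """
    # Undo percent-encoding; fail loudly if the bytes are not valid UTF-8.
    unicode_name = unquote(reg_name, encoding="utf-8", errors="strict")
    # Python's "idna" codec converts each dot-separated label to ASCII.
    return unicode_name.encode("idna").decode("ascii")


print(reg_name_to_ascii("b%C3%BCcher.example"))
```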


> Third, wording this as "characters" seems to say that this is a 
> character-by-character operation, or that it might be applied to 
> subsequent non-ASCII characters in groups, but ToASCII, when used, has 
> to be applied to whole labels, not characters.

The paragraph applies it to the whole hostname.
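
As an illustration of handing the whole host name to the conversion rather 
than individual characters, here is a rough sketch using Python's built-in 
"idna" codec. The function name host_to_ascii is hypothetical. Note the 
assumptions: the codec implements IDNA 2003 ToASCII label by label, and to 
the best of my knowledge it does not enable UseSTD3ASCIIRules, so it does 
not reproduce the exact flag settings in the quoted paragraph.

```python
def host_to_ascii(host: str) -> str:
    """Convert a complete Unicode host name to its ASCII (Punycode) form.

    Hypothetical helper. The caller passes the full host name; the codec
    splits it on dots and converts each label as a unit.
    """
    return host.encode("idna").decode("ascii")


print(host_to_ascii("bücher.example"))
```

Labels that are already ASCII pass through unchanged, so the same call 
works for mixed host names.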


> Fourth, as http://tools.ietf.org/html/draft-iab-idn-encoding-00 shows in 
> more detail, assuming that DNS is always used for resolution of 
> reg-names, and the technology will never be used e.g. on intranets with 
> other resolution services seems to be unnecessarily restrictive.

Not sure what you mean here.


> Ideally, all the above points should be addressed by some work on the 
> IRI front (public-iri@w3.org cc'ed), but that work isn't done yet.

That would indeed be ideal.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'