Re: [URN] Re: I18N does not belong in URNs
Martin J Duerst <mduerst@ifi.unizh.ch> Fri, 15 November 1996 18:42 UTC
Subject: Re: [URN] Re: I18N does not belong in URNs
To: yergeau@alis.com
Date: Fri, 15 Nov 1996 19:40:38 +0100
Cc: dgd@cs.bu.edu, urn-ietf@bunyip.com
In-Reply-To: <2.2.32.19961115155024.007169c0@genstar.alis.ca> from "Francois Yergeau" at Nov 15, 96 10:50:24 am
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Content-Length: 4755
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Message-Id: <"josef.ifi..890:15.10.96.18.40.40"@ifi.unizh.ch>
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Martin J Duerst <mduerst@ifi.unizh.ch>
Errors-To: owner-urn-ietf@bunyip.com
Francois Yergeau wrote:

>I fail to see why the %-encoded URN should be the reference.  This is a
>fallback to the bad old 7-bit days, and results in a needless waste of
>bandwidth and storage resources.  Reading the recent report of the IAB
>Character Set Workshop (draft-weider-iab-char-wrkshop-00.txt)

It's nice to have this available finally.

>, I find in
>section 8.2 (Recommendations for new Internet protocols):
>
>   "New protocols do not suffer from the need to be compatible
>   with old 7-bit pipes. New protocol specifications SHOULD
>   use ISO 10646 as the base charset unless there is an
>   overriding need to use a different base charset."

That's indeed what we are doing. Pipe width and base charset
are not directly related.

>Elsewhere (3.4.3), UTF-8 is recommended as the encoding and use of escape
>mechanisms is warned against ("...must be weighed very carefully").

This warns against techniques such as SGML &#nnn;. %HH is not on the
character level, it is on the octet level. And it is already well
established for URLs.

>> We can define the standard as %-encoded UTF-8, and if people implement
>>this other ways, they are implementing convenience features in the
>>interface: the software will always have the %-encoded URN available.

Much software will probably do so anyway, despite what the standard
says, and without creating a conflict, because storing and comparing
is more efficient on the 8-bit form.

>As if 8-bit octets on-the-wire were something evil!  It is much wiser, IMHO,
>to have the real UTF-8 as the reference value, and have the %-encoding as
>the convenience feature (it must be there anyway for reserved and unsafe
>characters, so there is no risk that an application will not support it).
>If a user needs ASCII-only, let *his* software do the %-encoding for him,
>but let's not force a 9-byte encoding on CJK characters when 3 are enough.
>
>There should be a good reason to burden the whole world forever with
>%-encoding of all 8-bit octets, and I see none at all, except for a visceral
>and unwarranted fear of 8-bit octets.

I think we have to be careful, because there are at least two
ways in which URNs can be transferred/stored:

- In "dedicated" protocols and databases. An example is the
  header of an HTTP request.
- In text. An example is HTML.

For the former, raw 8-bit (i.e. UTF-8) can be used. According to
the standards, officially HTTP headers are limited to ASCII, but
in practice, they will pass 8 bits without problems. (If not, please
don't make a long discussion out of this. It only serves as an
example of a (part of a) protocol that up to now transmitted "raw"
data without consideration of character set issues.)

For the latter, as Francois probably knows even better than I do
from his work on URL internationalization, putting an URN with
10646 characters into an HTML document written in iso-8859-1 in raw
8-bit form will produce bad results. Without extremely clever
tool support, it will neither be possible to input such an URN,
nor will an URN show up with the characters it represents.
Transcoding, as well as other operations such as cut-and-paste,
will also not do what everybody would hope for.

Just saying "use 8 bits, use 8 bits" could however give the
impression to some implementors that the UTF-8 8-bit octets
should appear as such in an HTML document in iso-8859-1.

Whatever we make the "standard" or "base" form, or whether we
have such a form or not, we should therefore clearly say that URNs

- Can be transmitted/stored in 8-bit form in protocols/databases
  that accommodate URNs as such, and not as part of text and/or
  associated with character encoding information.
- Have to be interpreted and treated as characters when transmitted
  as part of an encoded text with (explicitly or implicitly)
  associated character encoding information. Those characters that
  cannot be represented in the chosen encoding, as well as %HH
  sequences that do not form valid UTF-8 sequences (and of course
  reserved characters) have to stay in %HH form.

I know that the last point may again frighten some of you. It
seems to introduce a new representation. But if you think about
URLs in EBCDIC, you will see that it is nothing new.

Personally, I think that the second paragraph above could be
amended with a sentence saying that to avoid possible
misinterpretations due to lack of appropriate information about
character encoding, and to make the URN transcribable to the widest
audience, full %HH encoding can/should be chosen. We may have to
discuss how strong this wording should be. But we definitely
have to include something that avoids misunderstandings so that raw
8-bit UTF-8 will never turn up as such in e.g. iso-8859-1 documents.

Regards,	Martin.
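
The size argument (9 versus 3 octets), the iso-8859-1 misreading, and
the "stay in %HH form" rule discussed in this message can be made
concrete with a small Python sketch. It is only an illustration under
stated assumptions, not wording or an algorithm proposed on the list:
the "x-demo" namespace, the function names, and the choice to leave
every escaped ASCII octet escaped (as a stand-in for "reserved
characters") are all hypothetical.

```python
import re
from urllib.parse import quote, unquote_to_bytes

def percent_encode_utf8(text: str) -> str:
    """Full %HH form: encode to UTF-8, escape every octet that is not
    an unreserved ASCII character."""
    return quote(text.encode("utf-8"), safe="")

_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")  # a run of %HH escapes

def _utf8_len(first: int) -> int:
    """Length of a UTF-8 sequence judged by its lead octet, 0 if invalid."""
    if first < 0x80:
        return 1
    if 0xC2 <= first <= 0xDF:
        return 2
    if 0xE0 <= first <= 0xEF:
        return 3
    if 0xF0 <= first <= 0xF4:
        return 4
    return 0

def _hh(octets: bytes) -> str:
    return "".join(f"%{b:02X}" for b in octets)

def _decode_run(match: "re.Match[str]", charset: str) -> str:
    octets = unquote_to_bytes(match.group(0))
    out, i = [], 0
    while i < len(octets):
        n = _utf8_len(octets[i]) or 1
        chunk = octets[i:i + n]
        try:
            ch = chunk.decode("utf-8")   # fails on invalid UTF-8
            if ord(ch) < 0x80:
                # Assumption: an escaped ASCII octet was escaped on
                # purpose (reserved or unsafe), so it stays escaped.
                raise UnicodeError("keep ASCII escaped")
            ch.encode(charset)           # fails if not representable
        except UnicodeDecodeError:
            out.append(_hh(octets[i:i + 1]))  # keep one bad octet as %HH
            i += 1
            continue
        except UnicodeError:
            out.append(_hh(chunk))       # valid, but must stay in %HH form
        else:
            out.append(ch)
        i += n
    return "".join(out)

def to_text_form(escaped_urn: str, charset: str) -> str:
    """Form of a fully %HH-encoded URN suitable for text in `charset`."""
    return _RUN.sub(lambda m: _decode_run(m, charset), escaped_urn)

cjk = "\u65e5"                            # one CJK character
raw = cjk.encode("utf-8")                 # 3 octets
escaped = percent_encode_utf8(cjk)        # "%E6%97%A5", 9 ASCII characters
misread = raw.decode("iso-8859-1")        # what a Latin-1 reader would see

urn = "urn:x-demo:" + percent_encode_utf8("\u00e9\u65e5")
print(len(raw), len(escaped))
print(to_text_form(urn, "iso-8859-1"))    # e-acute shown, CJK stays %HH
print(to_text_form(urn, "utf-8"))         # both shown as characters
```

In this sketch the raw octets are what a "dedicated" protocol or
database would carry, while to_text_form() approximates the second
bullet above: characters are shown only where the document's charset
can represent them, and everything else remains in %HH form.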