Re: [precis] string classes and normalization forms
Marc Blanchet <marc.blanchet@viagenie.ca> Sat, 05 March 2011 15:16 UTC
On 11-03-05 01:59, Patrik Fältström wrote:
> Sorry if this has been discussed already...
>
> Much of the information in this document is the same as RFC 5892.
>
> Would it not be a better solution to make this document a "diff", so
> that it builds upon RFC 5892?
>

To me, it is just too early. The framework draft was a first step to
start the discussion. Once we have agreement on what to do, we can
decide on the best way to write it down, whether as a diff or otherwise.

Marc.

> Patrik
>
> On 4 Mar 2011, at 23:19, Peter Saint-Andre wrote:
>
>> <hat type='individual'/>
>>
>> I started to write a document outlining results of my own research and
>> discussion within the XMPP WG, but then I realized it would be more
>> productive to provide feedback on draft-blanchet-precis-framework-00.
>> Please take these comments in the spirit of exploration and as a spur to
>> discussion in the PRECIS WG. (Thanks to various XMPP WG folks, esp. Joe
>> Hildebrand, for productive conversations about these issues.)
>>
>> Issue #1: String Classes
>>
>> draft-blanchet-precis-framework-00 describes these string classes:
>>
>> o domain U-label
>> o domain A-label
>> o domain name
>> o email address
>> o restricted identifier
>> o less-restrictive identifier
>>
>> We can leave the first four to other specs, no?
>>
>> In the document I started to write, I was going to define two classes:
>>
>> a. "names" (or "usernamey things" if you like)
>> b. "codes" (or "passwordy things" if you like)
>>
>> (There is also the possibility that we might want something like a
>> free-form string, but it's not clear to me if we really need a
>> technology for preparing and comparing those -- we can simply treat them
>> as UTF-8 encoded Unicode codepoints, or somesuch.)
>>
>> Let me try to describe the classes I had in mind:
>>
>> a. NAMES.
>> I see a "name" as a word or set of words that is used to
>> identify or address a network entity such as a user, an account, a venue
>> (e.g., a chatroom), an information source (e.g., a feed), or a
>> collection of data (e.g., a file). For the convenience of humans, a name
>> typically consists of a memorable sequence of letters, numbers, and a
>> few conventional symbol and punctuation characters. The "name" class
>> would disallow spaces, the at-sign (because usernamey things are often
>> used as the left-hand side of email addresses and Jabber IDs and such),
>> almost all symbol characters (except those from the ASCII range), etc.
>> Also disallowed would be any character that is compatibility
>> decomposable into another character (e.g., U+017F "ſ" is compatibility
>> decomposable into U+0073 "s") or into a sequence of characters (e.g.,
>> U+2163 "Ⅳ" is compatibility decomposable into U+0049 "I" and U+0056
>> "V"). All members of the "name" class would contain only lowercase
>> letters, not uppercase letters or titlecase letters (this is different
>> from IDNA, where uppercase letters are allowed and preserved but case is
>> ignored for comparison purposes).
>>
>> The foregoing description is similar to the "Less-Restrictive
>> Identifier" class from draft-blanchet-precis-framework-00. I don't know
>> if I see a need for the "Restricted Identifier" class from the I-D --
>> i.e., a string class that disallows all punctuation and all display
>> characters (BTW what exactly is a display character?).
>>
>> b. CODES. I see a "code" as a sequence of letters, numbers, and symbols
>> that is used as a secret for access to some resource on a network (e.g.,
>> an account or a venue). To improve security, codes would be
>> case-sensitive. The "@" character and other punctuation and basic symbol
>> characters would be allowed, but symbols outside the US-ASCII range
>> would be disallowed.
>> We would also still disallow any character that is
>> compatibility decomposable into another character or into a sequence of
>> characters.
>>
>> Issue #2: Normalization.
>>
>> Following IDNA2003, existing stringprep profiles all use Unicode
>> Normalization Form KC (NFKC), which performs canonical and
>> compatibility decomposition, followed by canonical recomposition.
>> This choice made sense in IDNA2003 because the DNS packet
>> format has length-limited labels, and NFKC in effect compresses a
>> sequence of characters into the smallest number of bytes possible by
>> performing recomposition. However, experience with some of the
>> application protocols that are currently using NFKC (e.g., XMPP) has
>> shown that recomposition is an expensive operation to perform in
>> application servers. In addition, the application protocols that use
>> stringprep all use TCP with security-layer or application-layer
>> compression (e.g., via TLS or things like XEP-0138 in XMPP), so
>> limiting the length of strings is much less important.
>>
>> What matters most in application protocols is ensuring that network
>> entities (such as clients and servers) all communicate a consistent
>> string representation over the wire. For this purpose, Normalization
>> Form D (NFD), which simply performs canonical decomposition, provides
>> the most efficient approach. As noted above, we can disallow any
>> characters that would require compatibility decomposition, thus removing
>> the need for compatibility decomposition and recomposition. This is what
>> happened in IDNA2008, enabling the IDNA folks to move from NFKC to NFC.
>> If we take the same approach in PRECIS but also get rid of recomposition
>> entirely, we can move from NFKC (the most complex and therefore most
>> computationally intensive normalization form) to NFD (the least complex
>> and therefore least computationally intensive normalization form).
>> This will be a big win for application servers.
>>
>> OK, I think that's enough controversy for today. :)
>>
>> Peter
>>
>> --
>> Peter Saint-Andre
>> https://stpeter.im/
>>
>> _______________________________________________
>> precis mailing list
>> precis@ietf.org
>> https://www.ietf.org/mailman/listinfo/precis

--
=========
IPv6 book: Migrating to IPv6, Wiley. http://www.ipv6book.ca
Stun/Turn server for VoIP NAT-FW traversal: http://numb.viagenie.ca
DTN Implementation: http://postellation.viagenie.ca
NAT64-DNS64 Opensource: http://ecdysis.viagenie.ca
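
The approach proposed under Issue #2 in the quoted message (disallow any character that is compatibility decomposable, then apply NFD only) could look roughly like the sketch below. It assumes Python's standard unicodedata module. The helper names is_compat_decomposable and prepare are hypothetical and come from neither the thread nor draft-blanchet-precis-framework-00, and the sketch leaves out the other rules discussed for the "name" class (spaces, the at-sign, case).

```python
import unicodedata


def is_compat_decomposable(ch: str) -> bool:
    """Return True if compatibility decomposition changes ch beyond
    what canonical decomposition alone would do.

    Hypothetical helper: compares NFKD with NFD for a single character.
    """
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)


def prepare(s: str) -> str:
    """Reject compatibility-decomposable characters, then apply NFD.

    Hypothetical helper illustrating the idea in the message above;
    it does not implement any published profile.
    """
    for ch in s:
        if is_compat_decomposable(ch):
            raise ValueError(f"disallowed character U+{ord(ch):04X}")
    # Canonical decomposition only; no recomposition step is needed.
    return unicodedata.normalize("NFD", s)


if __name__ == "__main__":
    # U+017F (long s) and U+2163 (Roman numeral four) are the two
    # examples given in the message.
    for ch in ("\u017f", "\u2163", "\u00e9", "s"):
        print(f"U+{ord(ch):04X}", is_compat_decomposable(ch))

    # Precomposed and decomposed spellings compare equal after NFD.
    print(prepare("caf\u00e9") == prepare("cafe\u0301"))
```

With this arrangement, two entities that exchange "café" in precomposed and decomposed form end up with the same code point sequence, while a string containing "ſ" or "Ⅳ" is rejected outright instead of being silently mapped.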