Re: [precis] string classes and normalization forms

On 11-03-05 01:59, Patrik Fältström wrote:
> Sorry if this has been discussed already...
>
> Lots of the information in this document is the same as RFC 5892.
>
> Wouldn't it be a better solution to make this document a "diff" that builds upon RFC 5892?
>

To me, it is just too early. The framework was a first step to start the
discussion. Once we have agreement on what to do, we can see what the
best way to write it down is, whether as a diff or otherwise.

Marc.

>     Patrik
>
> On 4 mar 2011, at 23.19, Peter Saint-Andre wrote:
>
>> <hat type='individual'/>
>>
>> I started to write a document outlining results of my own research and
>> discussion within the XMPP WG, but then I realized it would be more
>> productive to provide feedback on draft-blanchet-precis-framework-00.
>> Please take these comments in the spirit of exploration and as a spur to
>> discussion in the PRECIS WG. (Thanks to various XMPP WG folks, esp. Joe
>> Hildebrand, for productive conversations about these issues.)
>>
>> Issue #1: String Classes
>>
>> draft-blanchet-precis-framework-00 describes these string classes:
>>
>>    o  domain U-label
>>    o  domain A-label
>>    o  domain name
>>    o  email address
>>    o  restricted identifier
>>    o  less-restrictive identifier
>>
>> We can leave the first four to other specs, no?
>>
>> In the document I started to write, I was going to define two classes:
>>
>> a. "names" (or "usernamey things" if you like)
>> b. "codes" (or "passwordy things" if you like)
>>
>> (There is also the possibility that we might want something like a
>> free-form string, but it's not clear to me if we really need a
>> technology for preparing and comparing those -- we can simply treat them
>> as UTF-8 encoded Unicode codepoints, or somesuch.)
>>
>> Let me try to describe the classes I had in mind:
>>
>> a. NAMES. I see a "name" as a word or set of words that is used to
>> identify or address a network entity such as a user, an account, a venue
>> (e.g., a chatroom), an information source (e.g., a feed), or a
>> collection of data (e.g., a file). For the convenience of humans, a name
>> typically consists of a memorable sequence of letters, numbers, and a
>> few conventional symbol and punctuation characters. The "name" class
>> would disallow spaces, the at-sign (because usernamey things are often
>> used as the left-hand side of email addresses and Jabber IDs and such),
>> almost all symbol characters (except those from the ASCII range), etc.
>> Also disallowed would be any character that is compatibility
>> decomposable into another character (e.g., U+017F "ſ" is compatibility
>> decomposable into U+0073 "s") or into a sequence of characters (e.g.,
>> U+2163 "Ⅳ" is compatibility decomposable into U+0049 "I" and U+0056
>> "V"). All members of the "name" class would contain only lowercase
>> letters, not uppercase letters or titlecase letters (this is different
>> from IDNA, where uppercase letters are allowed and preserved but case is
>> ignored for comparison purposes).
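As a rough illustration of the "name" rules described above, here is a minimal Python sketch. The function names `has_compat_decomposition` and `is_name` are hypothetical, and the exact set of rejected characters is only an approximation of the prose (for instance, it assumes empty strings are rejected and it treats every Unicode separator as a "space"); it is not taken from any draft.

```python
import unicodedata


def has_compat_decomposition(ch: str) -> bool:
    """True if compatibility decomposition changes ch beyond canonical decomposition."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)


def is_name(s: str) -> bool:
    """Hypothetical check for the proposed "name" class (illustrative rules only)."""
    if not s:
        return False  # assumption: empty names are rejected
    for ch in s:
        cat = unicodedata.category(ch)
        if ch == "@" or cat.startswith("Z"):
            return False  # at-sign and spaces/separators are disallowed
        if cat in ("Lu", "Lt"):
            return False  # uppercase and titlecase letters are disallowed
        if cat.startswith("S") and ord(ch) > 0x7F:
            return False  # symbols outside the ASCII range are disallowed
        if has_compat_decomposition(ch):
            return False  # e.g. U+017F or U+2163
    return True
```

With these rules, `is_name("stpeter")` is accepted, while `is_name("StPeter")`, `is_name("st peter")`, `is_name("user@example")`, and `is_name("\u2163")` are all rejected.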
>>
>> The foregoing description is similar to the "Less-Restrictive
>> Identifier" class from draft-blanchet-precis-framework-00. I don't know
>> if I see a need for the "Restricted Identifier" class from the I-D --
>> i.e., a string class that disallows all punctuation and all display
>> characters (BTW what exactly is a display character?).
>>
>> b. CODES. I see a "code" as a sequence of letters, numbers, and symbols
>> that is used as a secret for access to some resource on a network (e.g.,
>> an account or a venue). To improve security, codes would be
>> case-sensitive. The "@" character and other punctuation and basic symbol
>> characters would be allowed, but symbols outside the US-ASCII range
>> would be disallowed. We would also still disallow any character that is
>> compatibility decomposable into another character or into a sequence of
>> characters.
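The "code" rules can be sketched the same way. This is only an illustration under assumptions: `is_code` is a hypothetical name, empty strings are assumed to be rejected, and anything the description does not mention (spaces, for example) is simply left allowed here.

```python
import unicodedata


def is_code(s: str) -> bool:
    """Hypothetical check for the proposed "code" class (illustrative rules only)."""
    if not s:
        return False  # assumption: empty codes are rejected
    for ch in s:
        if unicodedata.category(ch).startswith("S") and ord(ch) > 0x7F:
            return False  # symbols outside the US-ASCII range are disallowed
        if unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch):
            return False  # character has a compatibility decomposition
    return True


# Case is preserved and significant: both strings are valid codes,
# but they are different secrets.
print(is_code("Secret"), is_code("secret"), "Secret" == "secret")
```

Unlike the "name" sketch, uppercase letters and the at-sign pass, so `is_code("P@ssw0rd!")` is accepted.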
>>
>> Issue #2: Normalization.
>>
>> Following IDNA2003, existing stringprep profiles all use Unicode
>> Normalization Form KC (NFKC), which performs compatibility
>> decomposition followed by canonical composition. This choice made sense
>> in IDNA2003 because the DNS packet format has length-limited labels (at
>> most 63 octets), and NFKC in effect shortens a sequence of characters by
>> performing recomposition. However, experience with some of the
>> application protocols that are currently using NFKC (e.g., XMPP) has
>> shown that recomposition is an expensive operation to perform in
>> application servers. In addition, the application protocols that use
>> stringprep all use TCP with security-layer or application-layer
>> compression (e.g., via TLS or things like XEP-0138 in XMPP), so
>> minimizing the length of strings is much less important.
>>
>> What matters most in application protocols is ensuring that network
>> entities (such as clients and servers) all communicate a consistent
>> string representation over the wire. For this purpose, Normalization
>> Form D (NFD), which simply performs canonical decomposition, provides
>> the most efficient approach. As noted above, we can disallow any
>> characters that would require compatibility decomposition, thus removing
>> the need for compatibility decomposition and recomposition. This is what
>> happened in IDNA2008, enabling the IDNA folks to move from NFKC to NFC.
>> If we take the same approach in PRECIS but also get rid of recomposition
>> entirely, we can move from NFKC (the most complex and therefore most
>> computationally intensive normalization form) to NFD (the least complex
>> and therefore least computationally intensive normalization form). This
>> will be a big win for application servers.
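To make the NFKC versus NFD trade-off concrete, here is a small Python sketch using the standard `unicodedata` module. The helper names `wire_form` and `same_on_wire` are hypothetical; the sketch assumes that characters with compatibility decompositions have already been rejected, as proposed above, and it says nothing about relative performance.

```python
import unicodedata


def wire_form(s: str) -> str:
    """Assumed on-the-wire representation: canonical decomposition only (NFD)."""
    return unicodedata.normalize("NFD", s)


def same_on_wire(a: str, b: str) -> bool:
    """Two strings are treated as the same if their NFD forms are identical."""
    return wire_form(a) == wire_form(b)


composed = "caf\u00e9"     # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Both spellings normalize to the same decomposed sequence.
print(same_on_wire(composed, decomposed))

# NFKC recomposes (4 code points here); NFD stays decomposed (5 code points).
print(len(unicodedata.normalize("NFKC", composed)), len(wire_form(composed)))
```

Note that NFD leaves U+017F "ſ" unchanged, whereas NFKC maps it to "s". That is why the move to NFD depends on disallowing compatibility-decomposable characters up front.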
>>
>> OK, I think that's enough controversy for today. :)
>>
>> Peter
>>
>> --
>> Peter Saint-Andre
>> https://stpeter.im/
>>
>>
>>
>> _______________________________________________
>> precis mailing list
>> precis@ietf.org
>> https://www.ietf.org/mailman/listinfo/precis
