Re: [precis] Fwd: I-D Action: draft-blanchet-precis-framework-01.txt

Florian Zeitz <florob@babelmonkeys.de> Sat, 21 May 2011 01:02 UTC

On 17.05.2011 18:29, Peter Saint-Andre wrote:
> This document attempts to incorporate the rough proposal I outlined in
> Prague. It still contains a number of open issues. Feedback is very much
> welcome!
> 
> /psa
> 
Hi,

This is some feedback based on an initial read-through. It contains both
comments and questions that came to mind during reading.
The questions are largely unfiltered (i.e. answers to them became
somewhat apparent after having read the whole document) in the hope that
this will help identify areas where the text could be improved.

First of all, I maintain that "compatibility equivalent" (used in
multiple places) is the wrong term. Unicode 6.0, Section 3.7, defines two
character sequences as compatibility equivalent if their full
compatibility decompositions are identical. A compatibility
decomposition, in turn, is defined as the result of applying both
compatibility mappings and canonical mappings.
That is different from the definition NFKC(cp) == cp.
I think the proper wording for category Q is "All compatibility
decomposable characters".
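
To illustrate the distinction, here is a minimal Python sketch using the
standard unicodedata module. The function names are made up for this
mail, and has_compat_mapping only inspects a code point's own
decomposition mapping (it does not recurse), so treat it as an
approximation and not as the draft's definition.

```python
import unicodedata

def compat_equivalent(a: str, b: str) -> bool:
    """Two strings are compatibility equivalent if their full
    compatibility decompositions (NFKD) are identical."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

def changes_under_nfkc(cp: str) -> bool:
    """True if NFKC(cp) != cp for the given single code point."""
    return unicodedata.normalize("NFKC", cp) != cp

def has_compat_mapping(cp: str) -> bool:
    """True if the code point's own decomposition mapping carries a
    compatibility tag such as <compat> or <font>."""
    return unicodedata.decomposition(cp).startswith("<")
```

U+FB01 (LATIN SMALL LIGATURE FI) changes under NFKC and has a
compatibility mapping. U+212B (ANGSTROM SIGN) also changes under NFKC,
but its mapping is purely canonical. And compat_equivalent("\u00c5",
"A\u030a") is true although no compatibility mapping is involved,
because canonical mappings take part in the decomposition as well.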

As someone else mentioned in Prague, I too am not sure "Wordything"
is a good expression for passwords. While passwords are words in the
sense of being elements of L((((A|Q)\(L|N|O|P))|K)*) (i.e. the language
described by the regular expression that (hopefully) describes
Wordything), people tend to think of words as "those things listed in a
dictionary", which is precisely what we don't want people to use as
passwords.

I understand Stringything stems from XMPP's resources.
However, the draft describes it as being used for nicknames. It is not
clear to me what the argument for differentiating nicknames and normal
names is, especially considering that both potentially need to be
entered by an end user.

Having a Valid and a Disallowed rule was quite confusing, especially
considering that the Valid rules contain exceptions from the Disallowed
rule and vice versa.
Some of the questions that sprang to mind were:
What about code points that are matched by neither the Valid nor the
Disallowed rule? Can that happen?
Is the union of both sets (Valid and Disallowed) the set of all Unicode
code points? Is it guaranteed to stay that way in future Unicode versions?
Also, these rules don't describe behaviour for CONTEXT? code points.
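
The partition questions can be checked mechanically. The following is a
rough Python sketch, assuming each rule is available as a predicate over
code points. The name partition_gaps and the toy rules at the bottom are
invented for this mail and are not taken from the draft.

```python
from typing import Callable, Iterable

MAX_CODE_POINT = 0x10FFFF

def partition_gaps(
    is_valid: Callable[[int], bool],
    is_disallowed: Callable[[int], bool],
    code_points: Iterable[int] = range(MAX_CODE_POINT + 1),
) -> tuple[list[int], list[int]]:
    """Return (unmatched, overlapping) code points for the two rules."""
    unmatched, overlapping = [], []
    for cp in code_points:
        valid, disallowed = is_valid(cp), is_disallowed(cp)
        if not valid and not disallowed:
            unmatched.append(cp)      # matched by neither rule
        elif valid and disallowed:
            overlapping.append(cp)    # matched by both rules
    return unmatched, overlapping

# Toy rules over ASCII only, purely for illustration.
unmatched, overlapping = partition_gaps(
    is_valid=lambda cp: 0x61 <= cp <= 0x7A,            # a-z
    is_disallowed=lambda cp: cp < 0x61 or cp == 0x7A,  # below 'a', plus 'z'
    code_points=range(0x80),
)
```

Both lists being empty for every Unicode version of interest would
answer the first two questions with "no" and "yes" respectively.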

The draft says in a lot of places that directionality, case-mapping and
normalization rules are application specific (at least sections 3,
3.{1,2,3}.{4,5,6} and 4.1). That seems rather redundant.

Maybe that's just me, but if we specify that applications need to
specify how to do normalization I think it might be worthwhile to give
some guidance on when to normalize strings.

The OPEN ISSUE in section 3.2.5 suggests mapping uppercase and titlecase
code points to their lowercase equivalents maximizes entropy. Maybe I'm
just confused, but doesn't that effectively reduce the set of possible
characters and therefore entropy?
Personally I think saying it's NOT RECOMMENDED for application protocols
to perform such mappings is the right thing to do.
Furthermore, I think allowing symbols and punctuation is the right thing
to do. In the age of laptops, not being able to type your password any
more should be less of a problem.
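
To put a number on the entropy point, here is a back-of-the-envelope
Python sketch. It assumes characters are drawn uniformly and
independently, which real passwords are not, so the figures are upper
bounds only. The helper name and the choice of 8 ASCII letters are mine.

```python
import math

def max_entropy_bits(alphabet_size: int, length: int) -> float:
    """Upper bound on the entropy, in bits, of a string whose characters
    are drawn uniformly and independently from the given alphabet."""
    return length * math.log2(alphabet_size)

mixed_case = max_entropy_bits(52, 8)   # a-z and A-Z
lower_only = max_entropy_bits(26, 8)   # after mapping everything to lowercase
lost_bits = mixed_case - lower_only    # one bit per character
```

Halving the alphabet costs exactly one bit per character, so under these
assumptions the mapping lowers the bound instead of raising it.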

The Unassigned rule states that not yet assigned code points are to be
considered Unassigned. To me that seems not only trivial; more
importantly, it doesn't appear to be a rule on how to handle unassigned
code points at all...

The fact that code points can only have one derived property appears
strange to me. It makes the algorithm non-deterministic in that it
sometimes sets the derived property to one of three values. I think
allowing multiple derived properties at the same time might be clearer.

Is it expected that people will calculate derived properties themselves,
or look them up in a to-be-created normative table separate from the
RFC? It seems to me that the former would need a complete Unicode table
and an implementation of a relatively complex algorithm, while the
latter is a simple lookup in a smaller table.
However, the former still seems more elegant...
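
The two options need not exclude each other: a table could be generated
from the algorithm. Below is a rough Python sketch, assuming some
function derive(cp) that returns the derived property as a string. The
function names and the toy derive are hypothetical and only stand in for
whatever the draft ends up specifying.

```python
from bisect import bisect_right
from typing import Callable

def build_range_table(
    derive: Callable[[int], str], limit: int
) -> tuple[list[int], list[str]]:
    """Collapse per-code-point results for 0..limit-1 into runs.
    starts[i] is the first code point of run i, props[i] its property."""
    starts: list[int] = []
    props: list[str] = []
    for cp in range(limit):
        prop = derive(cp)
        if not props or props[-1] != prop:
            starts.append(cp)
            props.append(prop)
    return starts, props

def lookup(starts: list[int], props: list[str], cp: int) -> str:
    """Binary search for the run containing cp (0 <= cp < limit)."""
    return props[bisect_right(starts, cp) - 1]

# Toy stand-in for the real algorithm, restricted to ASCII.
def toy_derive(cp: int) -> str:
    return "VALID" if chr(cp).isalnum() else "DISALLOWED"

starts, props = build_range_table(toy_derive, 0x80)
```

An implementation that ships only the run table needs neither the full
Unicode database nor the algorithm at run time, at the price of
regenerating the table for each Unicode version.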

And last and least, a remark: IIRC what was agreed on was an inclusive
approach that only disallows certain characters. The pseudo-code, however,
defaults to DISALLOWED if no class was matched.
That seems right to me, but it implies we are still using an approach
that only allows certain characters, albeit one that is more flexible
and version agile.
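
To make the effect of that default concrete, here is a small Python
sketch of an ordered, first-match rule list with a DISALLOWED fallback.
The three rules are hypothetical placeholders built on general
categories; they are not the rules from the draft.

```python
import unicodedata
from typing import Callable

Rule = tuple[Callable[[str], bool], str]

# Hypothetical rules, checked in order; the first match wins.
RULES: list[Rule] = [
    (lambda ch: unicodedata.category(ch) == "Cn", "UNASSIGNED"),
    (lambda ch: unicodedata.category(ch).startswith("C"), "DISALLOWED"),
    (lambda ch: unicodedata.category(ch)[0] in "LN", "VALID"),
]

def derived_property(ch: str) -> str:
    """Return the property of the first matching rule for one code point."""
    for matches, prop in RULES:
        if matches(ch):
            return prop
    return "DISALLOWED"  # nothing matched: fall back to DISALLOWED
```

With these placeholder rules "!" matches nothing and ends up DISALLOWED,
so the allowed set is exactly what the rules name. The version agility
comes from the rules referring to properties: a letter assigned in a
later Unicode version moves from UNASSIGNED to VALID without any change
to the rules.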

--
Florian Zeitz

> -------- Original Message --------
> Subject: I-D Action: draft-blanchet-precis-framework-01.txt
> Date: Tue, 17 May 2011 09:22:44 -0700
> From: internet-drafts@ietf.org
> Reply-To: internet-drafts@ietf.org
> To: i-d-announce@ietf.org
> 
> A New Internet-Draft is available from the on-line Internet-Drafts
> directories.
> 
> 	Title           : PRECIS Framework: Handling Internationalized Strings in Protocols
> 	Author(s)       : Marc Blanchet
>                           Peter Saint-Andre
> 	Filename        : draft-blanchet-precis-framework-01.txt
> 	Pages           : 24
> 	Date            : 2011-05-17
> 
>    Application protocols that make use of Unicode code points in
>    protocol strings need to prepare such strings in order to perform
>    comparison operations (e.g., for purposes of authentication or
>    authorization).  In general, this problem has been labeled the
>    "preparation and comparison of internationalized strings" or
>    "PRECIS".  This document defines a framework that enables application
>    protocols to prepare various classes of strings in a way that depends
>    on the properties of Unicode code points.  Because this framework
>    does not depend on large tables of Unicode code points as in
>    stringprep (RFC 3454), it is more agile with regard to changes in the
>    underlying Unicode database and thus provides improved flexibility to
>    application protocols.  A specification that reuses this framework
>    either can directly use the base string classes defined in this
>    document or can subclass the base string classes as needed.  This
>    framework uses an approach similar to that of the revised
>    internationalized domain names in applications (IDNA) technology (RFC
>    5890, RFC 5891, RFC 5892, RFC 5893, RFC 5894) and thus adheres to the
>    high-level design goals described in RFC 4690, albeit for non-IDNA
>    technologies.  This document obsoletes RFC 3454.
> 
> 
> A URL for this Internet-Draft is:
> http://www.ietf.org/internet-drafts/draft-blanchet-precis-framework-01.txt
> 
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
> 
> This Internet-Draft can be retrieved at:
> ftp://ftp.ietf.org/internet-drafts/draft-blanchet-precis-framework-01.txt