Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

Peter Saint-Andre <stpeter@stpeter.im> Wed, 09 October 2013 03:20 UTC

Hi Florian, thanks for the review! Comments inline.

On 09/10/2013 08:06 PM, Florian Zeitz wrote:
> Am 28.08.2013 08:46, schrieb Yoshiro YONEYA:
>> Dear all,
>>
>> This message starts a two-week Working Group Last Call (WGLC) on
>> draft-ietf-precis-framework-09.txt (PRECIS Framework: Preparation and
>> Comparison of Internationalized Strings in Application Protocols).
>>
>> Please review the document and send comments to the list (precis@ietf.org),
>> the co-chairs (precis-chairs@tools.ietf.org), or the authors
>> (draft-ietf-precis-framework@tools.ietf.org) by the end of WGLC.
>>
>> The WGLC will end on Wednesday, Sep 11th.
>>
> I've reviewed this draft, and generally think it takes a sensible approach.
> However, I do have one major gripe about it, as well as some smaller
> comments.
>
> The major thing that bothers me about this draft is that string classes
> IMHO conflate two separate concepts. On the one hand they specify valid
> and disallowed codepoints. On the other hand they specify (or rather,
> let the application protocol specify) mappings and normalization.
> The problem I have with this is that it makes it unclear which strings
> are valid in a certain class.

You are correct. Validity really applies at the level of a profile, not 
a class.
>
> E.g. consider an application protocol that specifies FreeformClass
> combined with NFKC. This means characters that have a compatibility
> equivalent are valid in the sense that they are FREE_PVAL, but cannot
> occur in the normalized form. It is unclear to me whether a string
> containing characters with a compatibility equivalent would be contained
> in the FreeformClass, or more precisely, in this specialization thereof.
>
> Similar considerations are true for e.g. mixing case mapping with
> IdentifierClass. Uppercase characters are PVALID/ID_PVAL, but shouldn't
> be present after mapping.
>
> I would prefer it if we specified classes solely in terms of valid and
> disallowed codepoints and directionality requirements.
When you suggest that we specify a class in terms of codepoints, are you 
suggesting that we go back to something like the stringprep model, in which 
a class or profile defines a lookup table?
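A minimal Python sketch of the ambiguity Florian describes, using only the standard `unicodedata` module (the helper `altered_chars` is hypothetical, not from the draft): a string can consist entirely of code points the class accepts and still be changed by the normalization or case-mapping step the application protocol adds. Checking one character at a time is an approximation, since NFKC can also act across character boundaries.

```python
import unicodedata
from typing import Callable

def nfkc(s: str) -> str:
    """Apply Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", s)

def altered_chars(s: str, transform: Callable[[str], str]) -> list:
    """Return the characters of s that transform changes when applied to each alone."""
    return [ch for ch in s if transform(ch) != ch]

# U+FB01 LATIN SMALL LIGATURE FI has a compatibility equivalent ("fi"):
# per the message it is FREE_PVAL, yet NFKC replaces it.
print(nfkc("\ufb01le"))                                               # file
print([f"U+{ord(c):04X}" for c in altered_chars("\ufb01le", nfkc)])  # ['U+FB01']

# An uppercase letter is PVALID/ID_PVAL, but cannot survive case mapping.
print(altered_chars("Juliet", str.casefold))                          # ['J']
```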
> We would then have separate text saying that an application protocol
> MUST also specify which mappings and normalization to apply, what entity
> needs to apply them (e.g. only the server), and when they need to be
> applied (e.g. when comparing strings, before storing them, before
> display to a user). Both StringPrep-bis and 6122bis already have text to
> this effect. It seems sensible to me to generally require application
> protocols to specify the "who" and "when" beyond the "what". E.g. it is
> often sensible to display identifiers with their case as entered, but
> compare them after case folding. The current text might suggest that
> mappings have to be applied to user input immediately.
I agree that all good application protocols that use PRECIS need to 
specify the enforcement rules, as we already do for SASL and XMPP. I am 
less sure that the PRECIS framework needs to legislate that.
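One way to keep the two concepts apart, sketched in Python under assumptions that are not taken from the draft: the class is only a predicate over the resulting string, a profile bundles that predicate with a mapping and a normalization form, and the application decides when to call `enforce()`. Here the string is stored as entered for display and the profile is applied only at comparison time. `Profile`, `toy_identifier_class`, and the map-then-normalize-then-check order are all hypothetical; the toy class (letters and decimal digits only) is a crude stand-in, not IdentifierClass.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable

def toy_identifier_class(s: str) -> bool:
    """Crude stand-in for a string class: non-empty, letters and decimal digits only."""
    return bool(s) and all(
        unicodedata.category(ch).startswith("L") or unicodedata.category(ch) == "Nd"
        for ch in s
    )

@dataclass(frozen=True)
class Profile:
    name: str
    mapping: Callable[[str], str]        # e.g. case folding
    normalization: str                   # "NFC", "NFKC", ...
    string_class: Callable[[str], bool]  # code point validity only

    def enforce(self, s: str) -> str:
        """Map, normalize, then check the result against the class."""
        out = unicodedata.normalize(self.normalization, self.mapping(s))
        if not self.string_class(out):
            raise ValueError(f"{s!r} is not valid in profile {self.name}")
        return out

    def equal(self, a: str, b: str) -> bool:
        """Compare two strings after enforcing the profile on both."""
        return self.enforce(a) == self.enforce(b)

USERNAME = Profile("toy-username", str.casefold, "NFC", toy_identifier_class)

stored = "Juliet"                        # kept as entered, for display
print(stored)                            # Juliet
print(USERNAME.equal(stored, "JULIET"))  # True
```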
>
> The following are smaller comments ordered by section:
>
> Section 3.1:
> This section talks about "safety" of strings, without ever defining what
> that means in this context. The term "very safe" used to describe the
> IdentifierClass also strangely reminds me of statements about "absolute
> security". Maybe there is a way to generally word this better?

I'll think about better wording and suggest something on the list.
>
> The sentence "Directionality:  defines application behavior in the
> presence of code points that have directionality" seems a bit off to me.
> It is very different from the explanation given later in Section 4.1.
> From my understanding this is about the allowed combinations of
> characters with directionality, and not about "application behavior" in
> their presence. It could be about both, but I have not seen a draft talk
> about anything but allowed combinations (i.e. the Bidi Rule) yet.

See other messages in this thread.
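For reference, the Bidi Rule that Florian mentions (RFC 5893, Section 2) is purely a constraint on which combinations of bidirectional character types a string may contain. A minimal Python sketch of its six conditions follows; it takes Bidi_Class values from the standard `unicodedata` module (so results track the interpreter's Unicode version) and does not address RFC 5893's separate question of when the rule applies at all.

```python
import unicodedata

# Bidi_Class values permitted by rule 2 (RTL strings) and rule 5 (LTR strings).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}

def bidi_rule_ok(label: str) -> bool:
    """Check the six conditions of the Bidi Rule (RFC 5893, Section 2)."""
    if not label:
        return False
    props = [unicodedata.bidirectional(ch) for ch in label]
    first = props[0]
    if first not in ("L", "R", "AL"):            # rule 1
        return False
    # Rules 3 and 6 look at the last character that is not NSM;
    # the loop stops at index 0 at worst, since props[0] is not NSM.
    end = len(props)
    while props[end - 1] == "NSM":
        end -= 1
    last = props[end - 1]
    if first == "L":
        return (all(p in LTR_ALLOWED for p in props)     # rule 5
                and last in ("L", "EN"))                 # rule 6
    return (all(p in RTL_ALLOWED for p in props)         # rule 2
            and last in ("R", "AL", "EN", "AN")          # rule 3
            and not ("EN" in props and "AN" in props))   # rule 4

print(bidi_rule_ok("abc1"))          # True
print(bidi_rule_ok("\u05d0a"))       # False (Hebrew alef followed by an L character)
print(bidi_rule_ok("\u0627\u0661"))  # True (Arabic alef, Arabic-Indic digit one)
```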
>
> Sections 3.3.3 and 3.4.3:
> While "unassigned codepoints are unassigned" is a nice tautology, I'm
> not sure what this means in terms of their treatment. In general I feel
> like more explanation is needed about unassigned codepoints and their
> (possible) handling.
Good point. I'll propose text.
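As a sketch of what "unassigned" can mean operationally: IDNA2008 (RFC 5892, category J) defines Unassigned as General_Category Cn excluding noncharacter code points, and a profile could then, for example, reject such code points. The Python below follows that definition; `is_unassigned` and `unassigned_in` are hypothetical helper names, and the answer depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), which is exactly why unassigned code points need explicit treatment.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_unassigned(cp: int) -> bool:
    """RFC 5892 category J: General_Category Cn and not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)

def unassigned_in(s: str) -> list:
    """List the unassigned code points in s as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s if is_unassigned(ord(ch))]

print(unassigned_in("a\u0378b"))  # ['U+0378']
print(is_unassigned(0xFFFF))      # False (noncharacter, handled separately)
```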
>
> Section 3.3.6:
> I think it would be sensible to suggest using the Unicode Default Case
> Folding algorithm, if case mapping is to be applied.
That seems reasonable.
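Python's `str.casefold()` implements Unicode full default case folding, so it offers a quick way to see why the suggestion matters: folding differs from simple lowercasing for characters such as U+00DF LATIN SMALL LETTER SHARP S. This is only an illustration; the helper `fold_equal` is hypothetical, and whether a profile should normalize before or after folding is left open here.

```python
def fold_equal(a: str, b: str) -> bool:
    """Caseless match using Unicode full default case folding."""
    return a.casefold() == b.casefold()

print("Straße".casefold())                     # strasse
print(fold_equal("Straße", "STRASSE"))         # True
print("Straße".lower() == "STRASSE".lower())   # False (lower() keeps the sharp s)
```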
>
> Section 5:
> I feel like this lacks a normative statement about contextual rules.
> E.g. "A character with the derived property value CONTEXTJ or CONTEXTO
>     (CONTEXTUAL RULE REQUIRED) MUST NOT be used unless an appropriate
>     rule has been established and the context of the character is
>
As mentioned, we just point to IDNA2008 here, but I think you and Martin 
are right that we need to provide more details. For example, RFC 5891 
says:

###

The Unicode string MUST NOT contain any characters whose validity is 
context-dependent, unless the validity is positively confirmed by a 
contextual rule. To check this, each code point identified as CONTEXTJ 
or CONTEXTO in the Tables document [RFC5892] MUST have a non-null rule. If such 
a code point is missing a rule, the label is invalid. If the rule exists 
but the result of applying the rule is negative or inconclusive, the 
proposed label is invalid.

###

IMHO your text is more to the point.
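A minimal Python sketch of the check RFC 5891 describes, assuming a hypothetical rule registry: every CONTEXTJ or CONTEXTO code point must have a rule, and a missing rule or a negative result makes the string invalid. Only two rules from RFC 5892 Appendix A are implemented (ZERO WIDTH JOINER after a virama, and MIDDLE DOT between two "l" characters). The code point sets are a small illustrative subset of the real tables, and ZERO WIDTH NON-JOINER is deliberately left without a rule in this sketch to show the missing-rule path.

```python
import unicodedata
from typing import Callable, Dict

# Illustrative subset only; the full lists come from the RFC 5892 tables.
CONTEXTJ = {0x200C, 0x200D}  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
CONTEXTO = {0x00B7}          # MIDDLE DOT

VIRAMA = 9  # Canonical_Combining_Class value for viramas

def zwj_rule(s: str, i: int) -> bool:
    """RFC 5892 A.2: ZWJ is allowed only right after a virama."""
    return i > 0 and unicodedata.combining(s[i - 1]) == VIRAMA

def middle_dot_rule(s: str, i: int) -> bool:
    """RFC 5892 A.3: MIDDLE DOT is allowed only between two 'l' characters."""
    return 0 < i < len(s) - 1 and s[i - 1] == "l" and s[i + 1] == "l"

# Hypothetical rule registry; U+200C intentionally has no entry here.
RULES: Dict[int, Callable[[str, int], bool]] = {
    0x200D: zwj_rule,
    0x00B7: middle_dot_rule,
}

def contextual_rules_ok(s: str) -> bool:
    """Reject s if any CONTEXTJ/CONTEXTO code point lacks a rule or fails it."""
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp in CONTEXTJ or cp in CONTEXTO:
            rule = RULES.get(cp)
            if rule is None or not rule(s, i):
                return False
    return True

print(contextual_rules_ok("l\u00b7l"))            # True
print(contextual_rules_ok("a\u00b7b"))            # False
print(contextual_rules_ok("\u0915\u094d\u200d"))  # True (KA, VIRAMA, ZWJ)
print(contextual_rules_ok("a\u200c"))             # False (no rule in this sketch)
```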

Peter