Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

Florian Zeitz <florob@babelmonkeys.de> Sat, 12 October 2013 11:26 UTC

Message-ID: <525931C0.6090600@babelmonkeys.de>
Date: Sat, 12 Oct 2013 13:25:52 +0200
From: Florian Zeitz <florob@babelmonkeys.de>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.0
MIME-Version: 1.0
To: Peter Saint-Andre <stpeter@stpeter.im>, precis@ietf.org
References: <20130828154603.a94201dea74f29229b4767b2@jprs.co.jp> <522FD033.3070001@babelmonkeys.de> <5254CB86.40706@stpeter.im> <5258A86C.7080708@babelmonkeys.de> <5258B4F8.4030601@stpeter.im>
In-Reply-To: <5258B4F8.4030601@stpeter.im>

On 12.10.2013 04:33, Peter Saint-Andre wrote:
> Heh, I was just now addressing your feedback in my working copy of the
> spec. :-)
> 
> On 10/11/2013 07:39 PM, Florian Zeitz wrote:
>> On 09.10.2013 05:20, Peter Saint-Andre wrote:
>>> Hi Florian, thanks for the review! Comments inline.
>>>
>>> On 09/10/2013 08:06 PM, Florian Zeitz wrote:
>>>> The major thing that bothers me about this draft is that string classes
>>>> IMHO conflate two separate concepts. On the one hand they specify valid
>>>> and disallowed codepoints. On the other hand they specify (or rather,
>>>> let the application protocol specify) mappings and normalization.
>>>> The problem I have with this is that it makes it unclear which strings
>>>> are valid in a certain class.
>>>
>>> You are correct. Validity really applies at the level of a profile, not
>>> a class.
>>>>
>>>> E.g. consider an application protocol that specifies FreeformClass
>>>> mixed with NFKC. This means characters that have a compatibility
>>>> equivalent are valid in the sense that they are FREE_PVAL, but are
>>>> invalid in the normalization form. It is unclear to me, whether a string
>>>> containing characters with a compatibility equivalent would be contained
>>>> in the FreeformClass, or more precisely, this specialization thereof.
>>>>
>>>> Similar considerations are true for e.g. mixing case mapping with
>>>> IdentifierClass. Uppercase characters are PVALID/ID_PVAL, but shouldn't
>>>> be present after mapping.
>>>>
>>>> I would prefer it if we specified classes solely in terms of valid and
>>>> disallowed codepoints and directionality requirements.
>>> When you suggest that we specify a class in terms of codepoints, are you
>>> suggesting that we go back to something like the stringprep model, in which
>>> a class or profile defines a lookup table?
>> Well, yes and no. We certainly want the rule/category based algorithm in
>> order to have Unicode version agility, and I'm not suggesting we get rid
>> of it. I'm also not suggesting we drop the rules about having some
>> codepoints only valid in a certain context.
>> I do however think it may be more sensible to say a string is within a
>> PRECIS class iff all its characters are PVALID, CONTEXTO, or CONTEXTJ
>> for this class, and a contextual rule is fulfilled, if required.
> 
> The way I see it, it doesn't make much sense to talk about a string
> matching a class. In practice within an application protocol, a string
> will be checked against the full set of rules as defined by a profile. A
> string class provides a kind of "substrate", if you will, but it doesn't
> define things in enough detail to perform string matching.
> 
What I'm trying to avoid here is a certain ambiguity I think we have
now. To give an example: Text we have in 6122bis now says «MUST consist
only of Unicode code points that conform to the "FreeformClass" base
string class». For argument's sake, let's pretend it also specified NFKC.
Does "U+1D7D0 MATHEMATICAL BOLD DIGIT TWO" conform to "FreeformClass" in
this case?

It's either "Yes, that is clearly FREE_PVAL", or "No, that must be
normalized to U+0032 DIGIT TWO", depending on your reading.
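
To make the two readings concrete, here is a minimal Python sketch. It
is only an illustration using the standard unicodedata module, not
anything taken from the draft; the variable names are mine.

```python
import unicodedata

bold_two = "\U0001D7D0"  # U+1D7D0 MATHEMATICAL BOLD DIGIT TWO

# Reading 1: look at the code point exactly as it was received.
print(unicodedata.name(bold_two))
print(unicodedata.category(bold_two))  # general category of the raw code point

# Reading 2: look at what is left after NFKC normalization.
normalized = unicodedata.normalize("NFKC", bold_two)
print([f"U+{ord(c):04X}" for c in normalized])

# The string is not stable under NFKC, so "allowed before normalization"
# and "allowed after normalization" are two different questions.
is_nfkc_stable = (normalized == bold_two)
print(is_nfkc_stable)
```

Whichever reading is intended, the check gives a different answer
depending on whether it runs before or after the normalization step.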

>> This may even already be the intent, but as I said a profile can easily
>> be defined such that a string matches these criteria but can never be
>> produced after the specified normalization and all mappings were applied.
>> At any rate I think we need clearer text about the intention here,
>> answering the question: "When is a string allowed by a profile?". I
>> personally cannot really tell from the draft right now.
> 
> In part, I don't think it is the responsibility of this specification to
> answer that question, other than to make it clear that you need to check
> a string against the full set of rules defined by a profile. I do think
> it would be helpful to provide some examples, although I think they
> probably belong in the various specs that define the profiles (so far
> that would be nickname, saslprepbis, and 6122bis).
> 
I think I agree. And I think that is why I suggested leaving
normalizations and mappings out of the classes. We want to tell people
that they have to normalize, and we want them to think about mappings.
But the exact order of those operations, who needs to perform them, what
is valid in protocol slots, etc. is their business.

And what that means in particular (to me) is that a profile would tell
you after which of the steps a string needs to be PVALID under a certain
class. E.g. for SASLPrepbis you would (as I understand it) split up the
simple username into its parts, perform normalization and all mappings
except case mapping, and then check whether the userparts are
PVALID/ID_PVAL. Case mappings would then be performed whenever the SASL
mechanism tells you to.
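
A rough Python sketch of that ordering follows. Everything in it is an
assumption made for illustration: the function names, the separator
passed in by the caller, the choice of NFKC, and above all
placeholder_is_pvalid, which merely accepts letters and decimal digits
and is NOT the IdentifierClass derivation from the framework draft.

```python
import unicodedata
from typing import List, Tuple


def placeholder_is_pvalid(ch: str) -> bool:
    # Stand-in only: accepts letters and decimal digits by general category.
    # The real per-class derivation is defined by the framework, not here.
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}


def prepare_userparts(simple_username: str, separator: str) -> List[str]:
    """Split, normalize, then validate. Case mapping is deliberately absent."""
    prepared = []
    for part in simple_username.split(separator):
        # Assumed normalization form; a profile would name its own.
        part = unicodedata.normalize("NFKC", part)
        # Any other non-case mappings of the profile would be applied here.
        if not part or not all(placeholder_is_pvalid(ch) for ch in part):
            raise ValueError("userpart not valid under the placeholder check")
        prepared.append(part)
    return prepared


def case_mapped(parts: List[str]) -> Tuple[str, ...]:
    # Deferred step: applied only when the mechanism asks for it.
    return tuple(p.casefold() for p in parts)


parts = prepare_userparts("Juliet@Example", "@")
print(parts)               # case preserved at validation time
print(case_mapped(parts))  # case mapping applied later, on request
```

The point of the split into two functions is only that the validity
check sits at one well-defined step, and case mapping at another.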

If everyone is required to specify such rules (usually less complex ones,
I'd hope), I don't see the benefit of formally including normalization
and mappings in the definition of a class (in particular if that
definition pretty much is "that's up to you").
This would also allow making this text generic, and not repeating it for
IdentifierClass and FreeformClass.

>>>> We would then have separate text saying that an application protocol
>>>> MUST also specify which mappings and normalization to apply, what entity
>>>> needs to apply them (e.g. only the server), and when they need to be
>>>> applied (e.g. when comparing strings, before storing them, before
>>>> display to a user). Both StringPrep-bis and 6122bis already have text to
>>>> this effect. It seems sensible to me to generally require application
>>>> protocols to specify the "who", and "when" beyond the "what". E.g. it is
>>>> often sensible to display identifiers with their case as entered, but
>>>> compare them after case folding. The current text might suggest that
>>>> mappings have to be applied to user input immediately.
>>> I agree that all good application protocols that use PRECIS need to
>>> specify the enforcement rules, as we already do for SASL and XMPP. I am
>>> less sure that the PRECIS framework needs to legislate that.
>> I think not legislating this only gives people a great way to shoot
>> themselves in the foot. I could be convinced otherwise though.
> 
> Yes, we are trying to prevent such "foot guns". I don't think we can get
> very specific (e.g., some technologies that use PRECIS might not have a
> client-server architecture). I'll see about proposing some text here...
> 
I don't think we have to be too specific here. Basically we want to get
people thinking. They must answer the questions "Which entity has to
perform which steps?" and "At what point in the protocol flow is this
required?" I don't think that restricts the text to client-server
architectures as such.
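
As an illustration of the "who" and "when", here is a small Python
sketch of the quoted case: show identifiers as entered, compare them
after case folding. The class and method names are hypothetical, and
NFKC plus str.casefold() is just one possible choice of comparison
key, not what any profile mandates.

```python
import unicodedata


class IdentifierRegistry:
    """Stores the form the user typed; compares on a derived key."""

    def __init__(self):
        self._entered_by_key = {}

    @staticmethod
    def _comparison_key(identifier: str) -> str:
        # "When": only at comparison time. "Who": the entity holding the
        # registry. The entered string itself is never rewritten.
        return unicodedata.normalize("NFKC", identifier).casefold()

    def register(self, identifier: str) -> bool:
        key = self._comparison_key(identifier)
        if key in self._entered_by_key:
            return False  # collides with an existing identifier
        self._entered_by_key[key] = identifier
        return True

    def display_form(self, identifier: str):
        # Returns the originally entered spelling, or None if unknown.
        return self._entered_by_key.get(self._comparison_key(identifier))


registry = IdentifierRegistry()
first = registry.register("Juliet")
second = registry.register("JULIET")
shown = registry.display_form("juliet")
print(first, second, shown)
```

None of this assumes a client-server split; it only assumes that some
entity is named as the one doing the comparison.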

Florian