Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

Peter Saint-Andre <stpeter@stpeter.im> Mon, 14 October 2013 15:36 UTC

Message-ID: <525C0F72.1050002@stpeter.im>
Date: Mon, 14 Oct 2013 09:36:18 -0600
From: Peter Saint-Andre <stpeter@stpeter.im>
To: Florian Zeitz <florob@babelmonkeys.de>
References: <20130828154603.a94201dea74f29229b4767b2@jprs.co.jp> <522FD033.3070001@babelmonkeys.de> <5254CB86.40706@stpeter.im> <5258A86C.7080708@babelmonkeys.de> <5258B4F8.4030601@stpeter.im> <525931C0.6090600@babelmonkeys.de>
In-Reply-To: <525931C0.6090600@babelmonkeys.de>
Cc: precis@ietf.org
Subject: Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

On 10/12/13 5:25 AM, Florian Zeitz wrote:
> On 12.10.2013 04:33, Peter Saint-Andre wrote:
>>
>> On 10/11/2013 07:39 PM, Florian Zeitz wrote:
>>> On 09.10.2013 05:20, Peter Saint-Andre wrote:
>>>> Hi Florian, thanks for the review! Comments inline.
>>>>
>>>> On 09/10/2013 08:06 PM, Florian Zeitz wrote:
>>>>> The major thing that bothers me about this draft is that string classes
>>>>> IMHO conflate two separate concepts. On the one hand they specify valid
>>>>> and disallowed codepoints. On the other hand they specify (or rather,
>>>>> let the application protocol specify) mappings and normalization.
>>>>> The problem I have with this is that it makes it unclear which strings
>>>>> are valid in a certain class.
>>>>
>>>> You are correct. Validity really applies at the level of a profile, not
>>>> a class.
>>>>>
>>>>> E.g. consider an application protocol that specifies FreeformClass
>>>>> mixed with NFKC. This means characters that have a compatibility
>>>>> equivalent are valid in the sense that they are FREE_PVAL, but are
>>>>> invalid in the normalization form. It is unclear to me, whether a string
>>>>> containing characters with a compatibility equivalent would be contained
>>>>> in the FreeformClass, or more precisely, this specialization thereof.
>>>>>
>>>>> Similar considerations are true for e.g. mixing case mapping with
>>>>> IdentifierClass. Uppercase characters are PVALID/ID_PVAL, but shouldn't
>>>>> be present after mapping.
>>>>>
>>>>> I would prefer it if we specified classes solely in terms of valid and
>>>>> disallowed codepoints and directionality requirements.
>>>> When you suggest that we specify a class in terms of codepoints, are you
>>>> suggesting that we go back to something like the stringprep model, in which
>>>> a class or profile defines a lookup table?
>>> Well, yes and no. We certainly want the rule/category based algorithm in
>>> order to have Unicode version agility, and I'm not suggesting we get rid
>>> of it. I'm also not suggesting we drop the rules about having some
>>> codepoints only valid in a certain context.
>>> I do however think it may be more sensible to say a string is within a
>>> PRECIS class iff all its characters are PVALID, CONTEXTO, or CONTEXTJ
>>> for this class, and a contextual rule is fulfilled, if required.
>>
>> The way I see it, it doesn't make much sense to talk about a string
>> matching a class. In practice within an application protocol, a string
>> will be checked against the full set of rules as defined by a profile. A
>> string class provides a kind of "substrate", if you will, but it doesn't
>> define things in enough detail to perform string matching.
>>
> What I'm trying to avoid here is a certain ambiguity I think we have
> now. To give an example: Text we have in 6122bis now says «MUST consist
> only of Unicode code points that conform to the "FreeformClass" base
> string class». 
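Florian's proposed membership test earlier in this thread (a string is in a class iff every code point is PVALID, CONTEXTO, or CONTEXTJ for that class, and any required contextual rule is satisfied) can be sketched roughly as follows. This is only an illustration under loud assumptions: `derive_property` is a hypothetical toy lookup, not the PRECIS derivation, and the only contextual rule modeled is the IDNA-style one that allows ZERO WIDTH JOINER directly after a virama.

```python
import unicodedata
from enum import Enum


class Prop(Enum):
    PVALID = "PVALID"
    CONTEXTJ = "CONTEXTJ"
    CONTEXTO = "CONTEXTO"
    DISALLOWED = "DISALLOWED"


ZWJ = "\u200d"  # ZERO WIDTH JOINER
VIRAMA_CCC = 9  # canonical combining class of a virama


def derive_property(ch: str) -> Prop:
    """Hypothetical toy property lookup (NOT the real PRECIS rules)."""
    if ch == ZWJ:
        return Prop.CONTEXTJ
    if unicodedata.category(ch)[0] in "LMN":  # letters, marks, numbers
        return Prop.PVALID
    return Prop.DISALLOWED  # this toy never yields CONTEXTO


def context_ok(s: str, i: int) -> bool:
    """Toy contextual rule: ZWJ is allowed only right after a virama."""
    return i > 0 and unicodedata.combining(s[i - 1]) == VIRAMA_CCC


def in_class(s: str) -> bool:
    """String is 'in the class' iff every code point passes its rule."""
    for i, ch in enumerate(s):
        prop = derive_property(ch)
        if prop is Prop.DISALLOWED:
            return False
        if prop in (Prop.CONTEXTJ, Prop.CONTEXTO) and not context_ok(s, i):
            return False
    return True
```

Note that nothing here mentions normalization or mapping, which is exactly the separation being argued for: `in_class` only answers a question about code points.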

Ah, I see your point. I think we'll need to adjust the text in all of
the documents that use the framework.

So for instance, currently we say:

   A resourcepart MUST consist only of Unicode code points that conform
   to the "FreeformClass" base string class defined in
   [I-D.ietf-precis-framework].  (Note that there is no XMPP-specific
   subclass for resourceparts.)

   The normalization and mapping rules for the resourcepart of a JID are
   as follows, where the operations specified MUST be completed in the
   order shown:

   1.  Fullwidth and halfwidth characters MAY be mapped to their
       decomposition equivalents.

   [etc.]

I think we'll need to change that to say something like this:

   A resourcepart MUST consist only of Unicode code points that conform
   to the "JIDresourceFreeformClass" profile, which is defined as
   follows:

   1.  The base string class is the "FreeformClass" class specified in
       [I-D.ietf-precis-framework].

   2.  Fullwidth and halfwidth characters MAY be mapped to their
       decomposition equivalents.

That is, the base string class can immediately limit the characters that
you even consider. For the "JIDlocalIdentifierClass" profile (or
whatever we call it), if a character is disallowed by the
IdentifierClass then you don't need to consider it further, but if it's
allowed then you need to complete further processing (such as the
relevant mapping operations).
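The two-stage processing described above (the base string class screens out code points first, then the profile's mappings run) might look something like the following sketch. Everything here is an assumption for illustration: `in_freeform_class` is a hypothetical, heavily simplified stand-in that only rejects control (Cc) and unassigned (Cn) code points instead of deriving FREE_PVAL from the full framework rules, and `prepare_resourcepart` models the proposed "JIDresourceFreeformClass" profile with just the width-mapping step, detected via the `<wide>`/`<narrow>` tags in the Unicode Character Database.

```python
import unicodedata


def in_freeform_class(s: str) -> bool:
    """Hypothetical, simplified stand-in for a FreeformClass check.

    Only rejects control (Cc) and unassigned (Cn) code points; the real
    framework derives a property for each code point from many rules.
    """
    return all(unicodedata.category(ch) not in ("Cc", "Cn") for ch in s)


def map_width(s: str) -> str:
    """Map fullwidth and halfwidth characters to their decompositions."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # The decomposition field looks like "<wide> 0041".
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def prepare_resourcepart(s: str) -> str:
    """Hypothetical profile: base class check first, then mapping."""
    if not in_freeform_class(s):
        # Disallowed by the base class, so no further processing is needed.
        raise ValueError("code point disallowed by base string class")
    return map_width(s)
```

For example, `prepare_resourcepart("\uff21\uff22c")` yields `"ABc"`, while a string containing U+0007 is rejected before any mapping is attempted.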

> For argument's sake, let's pretend it also specified NFKC.
> Does "U+1D7D0 MATHEMATICAL BOLD DIGIT TWO" conform to "FreeformClass" in
> this case?
> 
> It's either "Yes, that is clearly FREE_PVAL", or "No, that must be
> normalized to U+0032 DIGIT TWO", depending on your reading.

Here again I think conformance applies only at the level of the profile.
The base string classes just limit the universe of characters you need
to consider.
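The two readings of the U+1D7D0 question differ only in whether the check happens before or after NFKC. A quick check with Python's standard `unicodedata` module shows why: the character is a decimal digit (category Nd) that a class check would not obviously reject, but NFKC replaces it with U+0032, so an NFKC-based profile can never produce it. The helper `nfkc_stable` is an illustrative name, not part of any PRECIS specification.

```python
import unicodedata

BOLD_TWO = "\U0001D7D0"  # MATHEMATICAL BOLD DIGIT TWO


def nfkc_stable(s: str) -> bool:
    """True if NFKC normalization leaves the string unchanged."""
    return unicodedata.normalize("NFKC", s) == s


normalized = unicodedata.normalize("NFKC", BOLD_TWO)
print(unicodedata.category(BOLD_TWO))  # Nd
print(unicodedata.name(normalized))    # DIGIT TWO
print(nfkc_stable(BOLD_TWO))           # False
print(nfkc_stable(normalized))         # True
```

In other words, the answer depends on where the profile places the check relative to normalization, which is why conformance is best defined at the profile level.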

>>> This may even already be the intent, but as I said a profile can easily
>>> be defined such that a string matches these criteria, but can never be
>>> produced after the specified normalization and all mappings were applied.
>>> At any rate I think we need clearer text about the intention here,
>>> answering the question: "When is a string allowed by a profile?". I
>>> personally cannot really tell from the draft right now.
>>
>> In part, I don't think it is the responsibility of this specification to
>> answer that question, other than to make it clear that you need to check
>> a string against the full set of rules defined by a profile. I do think
>> it would be helpful to provide some examples, although I think they
>> probably belong in the various specs that define the profiles (so far
>> that would be nickname, saslprepbis, and 6122bis).
>>
> I think I agree. And I think that is why I suggested leaving
> normalizations and mappings out of the classes. We want to tell people
> that they have to normalize, and we want them to think about mappings.
> But the exact order of those operations, who needs to perform them, what
> is valid in protocol slots, etc. is their business.
> 
> And what that means in particular (to me) is that a profile would tell
> you after which of the steps a string needs to be PVALID under a certain
> class. 

s/class/profile/ (IMHO)

> E.g. for SASLPrepbis you would (as I understand it) split up the
> simple username into its parts, perform normalization and all mappings
> except case mapping, and then check whether the userparts are
> PVALID/ID_PVAL. Case mappings would then be performed whenever the SASL
> mechanism tells you to.
> 
> If everyone is required to specify such rules (usually less complex ones
> I'd hope) I don't see the benefit of formally including normalization
> and mappings in the definition of a class (in particular if that
> definition pretty much is "that's up to you").
> This would also allow making this text generic, and not repeating it for
> IdentifierClass and FreeformClass.

I'm sorry, I've lost track of exactly what "this text" refers to here.
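Florian's reading of the SASLprepbis flow quoted above (split the simple username into its parts, normalize, check each userpart, and defer case mapping until the SASL mechanism asks for it) could be sketched as below. Several assumptions are baked in: userparts are taken to be separated by U+0020, only NFKC stands in for "normalization and all mappings except case mapping", `in_identifier_class` is a hypothetical placeholder rather than the real IdentifierClass rules, and `str.lower()` is a simple stand-in for whatever case mapping the profile specifies.

```python
import unicodedata


def in_identifier_class(part: str) -> bool:
    """Hypothetical placeholder for an IdentifierClass check."""
    return part != "" and all(
        0x21 <= ord(ch) <= 0x7E or unicodedata.category(ch)[0] in "LN"
        for ch in part
    )


def prepare_username(username: str) -> list[str]:
    """Normalize and check each userpart; case is deliberately preserved."""
    parts = username.split(" ")  # assumption: userparts split on U+0020
    parts = [unicodedata.normalize("NFKC", p) for p in parts]
    if not all(in_identifier_class(p) for p in parts):
        raise ValueError("userpart not valid under the profile")
    return parts


def case_map(parts: list[str]) -> list[str]:
    """Applied later, and only when the SASL mechanism calls for it."""
    return [p.lower() for p in parts]
```

Keeping `case_map` separate from `prepare_username` reflects the point that the profile, not the class, decides at which step a string must be valid.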

>>>>> We would then have separate text saying that an application protocol
>>>>> MUST also specify which mappings and normalization to apply, what entity
>>>>> needs to apply them (e.g. only the server), and when they need to be
>>>>> applied (e.g. when comparing strings, before storing them, before
>>>>> display to a user). Both StringPrep-bis and 6122bis already have text to
>>>>> this effect. It seems sensible to me to generally require application
>>>>> protocols to specify the "who", and "when" beyond the "what". E.g. it is
>>>>> often sensible to display identifiers with their case as entered, but
>>>>> compare them after case folding. The current text might suggest that
>>>>> mappings have to be applied to user input immediately.
>>>> I agree that all good application protocols that use PRECIS need to
>>>> specify the enforcement rules, as we already do for SASL and XMPP. I am
>>>> less sure that the PRECIS framework needs to legislate that.
>>> I think not legislating this only gives people a great way to shoot
>>> themselves in the foot. I could be convinced otherwise though.
>>
>> Yes, we are trying to prevent such "foot guns". I don't think we can get
>> very specific (e.g., some technologies that use PRECIS might not have a
>> client-server architecture). I'll see about proposing some text here...
>>
> I don't think we have to be too specific here. Basically we want to get
> people thinking. They must answer the questions "Which entity has to
> perform which steps?" and "At what point in the protocol flow is this
> required?" I don't think that restricts the text to client-server
> architectures as such.

True. Let me see if I can find a good place for that text, because I
agree with you that we really do want to get people thinking about those
topics.
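As a concrete instance of the "who" and "when" questions, the example from earlier in the thread (display an identifier with its case as entered, but compare after case folding) might look like this minimal sketch. The `Identifier` class and its method names are hypothetical; NFKC plus `str.casefold()` is used as a stand-in for whatever normalization and case mapping a given profile specifies, with a second NFKC pass because case folding can leave a string un-normalized.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Hypothetical record: keep what the user typed, compare a prepared form."""

    as_entered: str

    def display(self) -> str:
        # When displaying: show the string exactly as entered.
        return self.as_entered

    def comparison_key(self) -> str:
        # When comparing: normalize, case-fold, then re-normalize.
        folded = unicodedata.normalize("NFKC", self.as_entered).casefold()
        return unicodedata.normalize("NFKC", folded)

    def matches(self, other: "Identifier") -> bool:
        return self.comparison_key() == other.comparison_key()
```

Here the mapping is applied only at comparison time, so user input is never altered immediately, which is the misreading the current text might invite.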

Peter

-- 
Peter Saint-Andre
https://stpeter.im/