Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

Peter Saint-Andre <stpeter@stpeter.im> Tue, 15 October 2013 12:45 UTC

Message-ID: <525D38DC.7010504@stpeter.im>
Date: Tue, 15 Oct 2013 06:45:16 -0600
From: Peter Saint-Andre <stpeter@stpeter.im>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: Florian Zeitz <florob@babelmonkeys.de>
References: <20130828154603.a94201dea74f29229b4767b2@jprs.co.jp> <522FD033.3070001@babelmonkeys.de> <5254CB86.40706@stpeter.im> <5258A86C.7080708@babelmonkeys.de> <5258B4F8.4030601@stpeter.im> <525931C0.6090600@babelmonkeys.de> <525C0F72.1050002@stpeter.im> <525D1608.3050309@babelmonkeys.de>
In-Reply-To: <525D1608.3050309@babelmonkeys.de>
X-Enigmail-Version: 1.5.2
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Cc: precis@ietf.org
Subject: Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

On 10/15/13 4:16 AM, Florian Zeitz wrote:
> Am 14.10.2013 17:36, schrieb Peter Saint-Andre:
>> On 10/12/13 5:25 AM, Florian Zeitz wrote:
>>> On 12.10.2013 04:33, Peter Saint-Andre wrote:
>>>> On 10/11/2013 07:39 PM, Florian Zeitz wrote:
>>>>> On 09.10.2013 05:20, Peter Saint-Andre wrote:
>>> What I'm trying to avoid here is a certain ambiguity I think we have
>>> now. To give an example: Text we have in 6122bis now says «MUST consist
>>> only of Unicode code points that conform to the "FreeformClass" base
>>> string class». 
>>
>> Ah, I see your point. I think we'll need to adjust the text in all of
>> the documents that use the framework.
>>
>> So for instance, currently we say:
>>
>>    A resourcepart MUST consist only of Unicode code points that conform
>>    to the "FreeformClass" base string class defined in
>>    [I-D.ietf-precis-framework].  (Note that there is no XMPP-specific
>>    subclass for resourceparts.)
>>
>>    The normalization and mapping rules for the resourcepart of a JID are
>>    as follows, where the operations specified MUST be completed in the
>>    order shown:
>>
>>    1.  Fullwidth and halfwidth characters MAY be mapped to their
>>        decomposition equivalents.
>>
>>    [etc.]
>>
>> I think we'll need to change that to say something like this:
>>
>>    A resourcepart MUST consist only of Unicode code points that conform
>>    to the "JIDresourceFreeformClass" profile, which is defined as
>>    follows:
>>
>>    1. The base string class is the "FreeformClass" class specified in
>>    [I-D.ietf-precis-framework]
>>
>>    2.  Fullwidth and halfwidth characters MAY be mapped to their
>>        decomposition equivalents.
>>
>> That is, the base string class can immediately limit the characters that
>> you even consider. For the "JIDlocalIdentifierClass" profile (or
>> whatever we call it), if a character is disallowed by the
>> IdentifierClass then you don't need to consider it further, but if it's
>> allowed then you need to complete further processing (such as the
>> relevant mapping operations).
>>
> I'm not sure that is the right approach, see below.
> It also occurs to me that the framework draft currently calls this
> "Usage" instead of "Profile", or is that yet another concept?

Based on other last call feedback (see exchanges with Martin), I've
provisionally replaced the concepts of subclasses and usages with the
concept of a profile. This has (IMHO) simplified matters.
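
As a rough illustration of the "base string class plus ordered rules"
shape proposed in the quoted text above, here is a minimal sketch. The
names StringClass, Profile, toy_freeform and width_map are hypothetical
and do not come from any draft, and the toy class (anything outside the
"C*" general categories) is only a stand-in for the real FreeformClass.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Tuple


def width_map(s: str) -> str:
    """Map fullwidth/halfwidth characters to their decomposition equivalents."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(h, 16)) for h in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


@dataclass(frozen=True)
class StringClass:
    """A class only limits the set of code points (hypothetical model)."""
    name: str
    allows: Callable[[str], bool]


@dataclass(frozen=True)
class Profile:
    """A profile names a base class and an ordered list of rules."""
    name: str
    base: StringClass
    rules: Tuple[Callable[[str], str], ...]

    def prepare(self, s: str) -> str:
        # Apply the rules in the order the profile lists them.
        for rule in self.rules:
            s = rule(s)
        return s

    def accepts(self, s: str) -> bool:
        # Conformance is judged at the level of the profile:
        # run the rules, then test every code point against the base class.
        return all(self.base.allows(ch) for ch in self.prepare(s))


# Toy stand-in for FreeformClass: reject control, format, unassigned, etc.
toy_freeform = StringClass(
    "ToyFreeform", lambda ch: not unicodedata.category(ch).startswith("C")
)
resource_profile = Profile("ToyResourcepart", toy_freeform, (width_map,))
```

With these assumptions, resource_profile.prepare("\uFF21lice") yields
"Alice", and a string containing U+0000 is rejected by the base class.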

>>> For argument's sake, let's pretend it also specified NFKC.
>>> Does "U+1D7D0 MATHEMATICAL BOLD DIGIT TWO" conform to "FreeformClass" in
>>> this case?
>>>
>>> It's either "Yes, that is clearly FREE_PVAL", or "No, that must be
>>> normalized to U+0032 DIGIT TWO", depending on your reading.
>>
>> Here again I think conformance applies only at the level of the profile.
>> The base string classes just limit the universe of characters you need
>> to consider.
>>
>>>>> This may even already be the intent, but as I said a profile can easily
>>>>> be defined such that a string matches this criterion but can never be
>>>>> produced after the specified normalization and all mappings are applied.
>>>>> At any rate I think we need clearer text about the intention here,
>>>>> answering the question: "When is a string allowed by a profile?". I
>>>>> personally cannot really tell from the draft right now.
>>>>
>>>> In part, I don't think it is the responsibility of this specification to
>>>> answer that question, other than to make it clear that you need to check
>>>> a string against the full set of rules defined by a profile. I do think
>>>> it would be helpful to provide some examples, although I think they
>>>> probably belong in the various specs that define the profiles (so far
>>>> that would be nickname, saslprepbis, and 6122bis).
>>>>
>>> I think I agree. And I think that is why I suggested leaving
>>> normalizations and mappings out of the classes. We want to tell people
>>> that they have to normalize, and we want them to think about mappings.
>>> But the exact order of those operations, who needs to perform them, what
>>> is valid in protocol slots, etc. is their business.
>>>
>>> And what that means in particular (to me) is that a profile would tell
>>> you after which of the steps a string needs to be PVALID under a certain
>>> class. 
>>
>> s/class/profile/ (IMHO)
>>
> I think that might be where we are talking at cross purposes.
> My understanding is that we have an algorithm that will tell us whether
> a codepoint is PVALID, according to the set of codepoints a class
> allows. Profiles have no influence on this decision. I.e. nothing that
> determines whether a codepoint is PVALID may be (re)defined by a
> profile. This always requires a subclass.

It all depends on what you mean by the "P" in PVALID. In practice, my
feeling is that you want to know if a given codepoint is allowed in,
say, the localpart of an XMPP address. The base class isn't always going
to answer that question for you.
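
To make the two readings of the U+1D7D0 example concrete, here is a
small sketch. It assumes that "has a compatibility equivalent" can be
approximated as "NFKC changes the code point"; the helper name
has_compat is hypothetical and this is not the framework's full
per-code-point algorithm.

```python
import unicodedata

cp = "\U0001D7D0"  # MATHEMATICAL BOLD DIGIT TWO


def has_compat(ch: str) -> bool:
    """Approximation: the code point is changed by NFKC normalization."""
    return unicodedata.normalize("NFKC", ch) != ch


name = unicodedata.name(cp)
normalized = unicodedata.normalize("NFKC", cp)

# Reading 1: judge the code point as received (it has a compatibility
# equivalent, so the answer depends on how the class treats such points).
# Reading 2: judge the code point only after NFKC, i.e. judge U+0032.
print(name, has_compat(cp), f"U+{ord(normalized):04X}")
```

Which of the two inputs gets checked is exactly what a profile would
have to state.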

>>> E.g. for SASLPrepbis you would (as I understand it) split up the
>>> simple username into its parts, perform normalization and all mappings
>>> except case mapping, and then check whether the userparts are
>>> PVALID/ID_PVAL. Case mappings would then be performed whenever the SASL
>>> mechanism tells you to.
>>>
>>> If everyone is required to specify such rules (usually less complex ones
>>> I'd hope) I don't see the benefit of formally including normalization
>>> and mappings in the definition of a class (in particular if that
>>> definition pretty much is "that's up to you").
>>> This would also allow making this text generic, and not repeating it for
>>> IdentifierClass and FreeformClass.
>>
>> I'm sorry, I've lost track of exactly what "this text" refers to here.
>>
> I think I had this much clearer in my head than I managed to express it,
> sorry. Let me try again:
> What we have right now are classes. These do two things:
> a) Limit the character set
> b) Specify a set of properties usages need to define

s/usages/profiles/ yes (if we accept the simplification that Martin and
I worked out).

> Specifically a) encompasses Valid, Disallowed, and Unassigned, while
> b) encompasses Width Mapping, Additional Mappings, Case Mapping,
> Normalization, and Directionality
> 
> Note that the text describing b) is almost completely identical for
> IdentifierClass and FreeformClass. This is what "this text" was referring to.

Thanks for the clarification.

> My suggestion is to restrict classes to include only the a) properties.
> We would still say that a usage needs to specify everything in b) though.

I'm confused again, because that's what I *thought* we were doing all
along. However, if that wasn't clear to you then we need to improve the
text.

> The main benefit I see in this is that classes become self-contained.
> Saying that a string has to conform to a class becomes a much more
> self-explanatory statement. The usage would however need to further
> define at which point in time this conformance is necessary.
> E.g. if normalization is done server side, clients may already need to
> produce strings conforming to the class. But when clients perform
> normalization, checking whether a string is in a class after
> normalization might reduce user surprise.

Indeed.
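
The timing question can be sketched as follows. The toy class here
(ASCII letters and digits only) is purely an assumption for
illustration and is much narrower than IdentifierClass; the function
names are hypothetical as well.

```python
import unicodedata


def toy_class_allows(ch: str) -> bool:
    """Toy stand-in for a string class: ASCII letters and digits only."""
    return ch.isascii() and ch.isalnum()


def conforms(s: str) -> bool:
    """True if every code point of s is allowed by the toy class."""
    return all(toy_class_allows(ch) for ch in s)


def conforms_as_sent(s: str) -> bool:
    # Server-side normalization: the client must already send
    # a string that is in the class.
    return conforms(s)


def conforms_after_nfkc(s: str) -> bool:
    # Client-side normalization: check class membership only
    # after the string has been normalized.
    return conforms(unicodedata.normalize("NFKC", s))


raw = "user\U0001D7D0"  # ends in MATHEMATICAL BOLD DIGIT TWO
print(conforms_as_sent(raw), conforms_after_nfkc(raw))
```

The same input is rejected when checked as sent and accepted when
checked after normalization, which is why a profile needs to say at
which point conformance is required.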

Peter

-- 
Peter Saint-Andre
https://stpeter.im/