Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

Peter Saint-Andre <stpeter@stpeter.im> Tue, 15 October 2013 17:56 UTC

Message-ID: <525D81C7.6010904@stpeter.im>
Date: Tue, 15 Oct 2013 11:56:23 -0600
From: Peter Saint-Andre <stpeter@stpeter.im>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: Florian Zeitz <florob@babelmonkeys.de>
References: <20130828154603.a94201dea74f29229b4767b2@jprs.co.jp> <522FD033.3070001@babelmonkeys.de> <5254CB86.40706@stpeter.im> <5258A86C.7080708@babelmonkeys.de> <5258B4F8.4030601@stpeter.im> <525931C0.6090600@babelmonkeys.de> <525C0F72.1050002@stpeter.im> <525D1608.3050309@babelmonkeys.de> <525D38DC.7010504@stpeter.im> <525D43F8.6030300@babelmonkeys.de>
In-Reply-To: <525D43F8.6030300@babelmonkeys.de>
X-Enigmail-Version: 1.5.2
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Cc: precis@ietf.org
Subject: Re: [precis] WGLC: draft-ietf-precis-framework-09.txt

On 10/15/13 7:32 AM, Florian Zeitz wrote:
> On 15.10.2013 14:45, Peter Saint-Andre wrote:
>> On 10/15/13 4:16 AM, Florian Zeitz wrote:
>>> On 14.10.2013 17:36, Peter Saint-Andre wrote:
>>>> On 10/12/13 5:25 AM, Florian Zeitz wrote:
>>>>> On 12.10.2013 04:33, Peter Saint-Andre wrote:
>>>>>> On 10/11/2013 07:39 PM, Florian Zeitz wrote:
>>>>>>> On 09.10.2013 05:20, Peter Saint-Andre wrote:
>>>>> What I'm trying to avoid here is a certain ambiguity I think we have
>>>>> now. To give an example: Text we have in 6122bis now says «MUST consist
>>>>> only of Unicode code points that conform to the "FreeformClass" base
>>>>> string class». 
>>>>
>>>> Ah, I see your point. I think we'll need to adjust the text in all of
>>>> the documents that use the framework.
>>>>
>>>> So for instance, currently we say:
>>>>
>>>>    A resourcepart MUST consist only of Unicode code points that conform
>>>>    to the "FreeformClass" base string class defined in
>>>>    [I-D.ietf-precis-framework].  (Note that there is no XMPP-specific
>>>>    subclass for resourceparts.)
>>>>
>>>>    The normalization and mapping rules for the resourcepart of a JID are
>>>>    as follows, where the operations specified MUST be completed in the
>>>>    order shown:
>>>>
>>>>    1.  Fullwidth and halfwidth characters MAY be mapped to their
>>>>        decomposition equivalents.
>>>>
>>>>    [etc.]
>>>>
>>>> I think we'll need to change that to say something like this:
>>>>
>>>>    A resourcepart MUST consist only of Unicode code points that conform
>>>>    to the "JIDresourceFreeformClass" profile, which is defined as
>>>>    follows:
>>>>
>>>>    1. The base string class is the "FreeformClass" class specified in
>>>>    [I-D.ietf-precis-framework]
>>>>
>>>>    2.  Fullwidth and halfwidth characters MAY be mapped to their
>>>>        decomposition equivalents.
>>>>
>>>> That is, the base string class can immediately limit the characters that
>>>> you even consider. For the "JIDlocalIdentifierClass" profile (or
>>>> whatever we call it), if a character is disallowed by the
>>>> IdentifierClass then you don't need to consider it further, but if it's
>>>> allowed then you need to complete further processing (such as the
>>>> relevant mapping operations).
>>>>
>>> I'm not sure that is the right approach, see below.
>>> It also occurs to me that the framework draft currently calls this
>>> "Usage" instead of "Profile", or is that yet another concept?
>>
>> Based on other last call feedback (see exchanges with Martin), I've
>> provisionally replaced the concepts of subclasses and usages with the
>> concept of a profile. This has (IMHO) simplified matters.
>>
>>>>> For argument's sake, let's pretend it also specified NFKC.
>>>>> Does "U+1D7D0 MATHEMATICAL BOLD DIGIT TWO" conform to "FreeformClass" in
>>>>> this case?
>>>>>
>>>>> It's either "Yes, that is clearly FREE_PVAL", or "No, that must be
>>>>> normalized to U+0032 DIGIT TWO", depending on your reading.
>>>>
>>>> Here again I think conformance applies only at the level of the profile.
>>>> The base string classes just limit the universe of characters you need
>>>> to consider.
>>>>
>>>>>>> This may even already be the intent, but as I said a profile can easily
>>>>>>> be defined such that a string matches these criteria, but can never be
>>>>>>> produced after the specified normalization and all mappings were applied.
>>>>>>> At any rate I think we need clearer text about the intention here,
>>>>>>> answering the question: "When is a string allowed by a profile?". I
>>>>>>> personally cannot really tell from the draft right now.
>>>>>>
>>>>>> In part, I don't think it is the responsibility of this specification to
>>>>>> answer that question, other than to make it clear that you need to check
>>>>>> a string against the full set of rules defined by a profile. I do think
>>>>>> it would be helpful to provide some examples, although I think they
>>>>>> probably belong in the various specs that define the profiles (so far
>>>>>> that would be nickname, saslprepbis, and 6122bis).
>>>>>>
>>>>> I think I agree. And I think that is why I suggested leaving
>>>>> normalizations and mappings out of the classes. We want to tell people
>>>>> that they have to normalize, and we want them to think about mappings.
>>>>> But the exact order of those operations, who needs to perform them, what
>>>>> is valid in protocol slots, etc. is their business.
>>>>>
>>>>> And what that means in particular (to me) is that a profile would tell
>>>>> you after which of the steps a string needs to be PVALID under a certain
>>>>> class. 
>>>>
>>>> s/class/profile/ (IMHO)
>>>>
>>> I think that might be where we are talking at cross purposes.
>>> My understanding is that we have an algorithm that will tell us whether
>>> a codepoint is PVALID, according to the set of codepoints a class
>>> allows. Profiles have no influence on this decision. I.e. nothing that
>>> determines whether a codepoint is PVALID may be (re)defined by a
>>> profile. This always requires a subclass.
>>
>> It all depends on what you mean by the "P" in PVALID. In practice, my
>> feeling is that you want to know if a given codepoint is allowed in,
>> say, the localpart of an XMPP address. The base class isn't always going
>> to answer that question for you.
>>
> I'm actually talking about the calculated property here, whatever it
> ends up being called. My point is that we have an algorithm to calculate
> PVALID, but we don't have one for "is this an allowed string for a
> profile". Which, to be clear, I think is fine for the framework document.

OK. I agree.
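
The U+1D7D0 example from earlier in the thread can be made concrete with
a minimal Python sketch. It only shows what NFKC does to that code
point. It does not compute PVALID or any other PRECIS derived property,
and the helper names nfkc() and changed_by_nfkc() are hypothetical
illustrations, not anything defined in the drafts.

```python
import unicodedata

# U+1D7D0 MATHEMATICAL BOLD DIGIT TWO, the code point from the thread.
BOLD_TWO = "\U0001D7D0"


def nfkc(s: str) -> str:
    """Apply Unicode Normalization Form KC to a string."""
    return unicodedata.normalize("NFKC", s)


def changed_by_nfkc(s: str) -> bool:
    """True if NFKC would rewrite the string (hypothetical helper)."""
    return nfkc(s) != s


if __name__ == "__main__":
    print(unicodedata.name(BOLD_TWO))
    # Reading 1: look at the code point as received (before NFKC).
    print("changed by NFKC:", changed_by_nfkc(BOLD_TWO))
    # Reading 2: look at what remains after NFKC.
    print("after NFKC is U+0032:", nfkc(BOLD_TWO) == "\u0032")
```

Whether a string "conforms" therefore depends on whether the check is
made before or after normalization, and the answer to that has to come
from the profile rather than from the class.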

>>>>> E.g. for SASLPrepbis you would (as I understand it) split up the
>>>>> simple username into its parts, perform normalization and all mappings
>>>>> except case mapping, and then check whether the userparts are
>>>>> PVALID/ID_PVAL. Case mappings would then be performed whenever the SASL
>>>>> mechanism tells you to.
>>>>>
>>>>> If everyone is required to specify such rules (usually less complex ones
>>>>> I'd hope) I don't see the benefit of formally including normalization
>>>>> and mappings in the definition of a class (in particular if that
>>>>> definition pretty much is "that's up to you").
>>>>> This would also allow making this text generic, and not repeating it for
>>>>> IdentifierClass and FreeformClass.
>>>>
>>>> I'm sorry, I've lost track of exactly what "this text" refers to here.
>>>>
>>> I think I had this much clearer in my head than I managed to express it,
>>> sorry. Let me try again:
>>> What we have right now are classes. These do two things:
>>> a) Limit the character set
>>> b) Specify a set of properties usages need to define
>>
>> s/usages/profiles/ yes (if we accept the simplification that Martin and
>> I worked out).
>>
>>> Specifically a) encompasses Valid, Disallowed, and Unassigned, while
>>> b) encompasses Width Mapping, Additional Mappings, Case Mapping,
>>> Normalization, and Directionality
>>>
>>> Note that the text describing b) is almost completely identical for
>>> IdentifierClass and FreeformClass. This is what "this text" was referring to.
>>
>> Thanks for the clarification.
>>
>>> My suggestion is to restrict classes to include only the a) properties.
>>> We would still say that a usage needs to specify everything in b) though.
>>
>> I'm confused again, because that's what I *thought* we were doing all
>> along. However, if that wasn't clear to you then we need to improve the
>> text.
>>
> Umm... I think I'm still not making myself clear :/. And I suspect that
> is because "include" is ambiguous here.
> I do realize that we are already at the point where the classes
> explicitly specify everything in a), while everything in b) is profile
> dependent.
> What I'm saying is that IMHO b) should not be part of a class at all.

I understand. You're saying let's move everything about mapping and
normalization and directionality out of Section 3 (about string classes)
and move it to Section 4 (about profiles).

That makes a lot of sense. Sorry I was so dense. :-)

> I'd want to have classes, consisting of Valid, Disallowed and Unassigned
> sets. For these we have an algorithm determining whether a codepoint is
> within the class (i.e. PVALID or CONTEXT[JO] + rule), or not.
> Each usage (I think the term is more appropriate for this scheme) would
> then define everything from the b) set, and which class to restrict data to.

Got it.
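
As a rough illustration of that split, here is a minimal Python sketch.
Everything in it is an assumption made for illustration: the names
StringClass, Profile, toy_class, and toy_profile are hypothetical, the
toy class (ASCII letters and digits only) is NOT IdentifierClass or
FreeformClass, and checking the class only after all steps have run is
just one possible choice a profile could make.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class StringClass:
    """(a) only: decides whether a single code point is in the class."""
    name: str
    allows: Callable[[str], bool]


@dataclass(frozen=True)
class Profile:
    """(b) plus a reference to the class that restricts the data."""
    name: str
    base: StringClass
    steps: Tuple[Callable[[str], str], ...]  # ordered mapping/normalization

    def prepare(self, s: str) -> str:
        # Apply the profile's operations in the order the profile lists them.
        for step in self.steps:
            s = step(s)
        return s

    def allows(self, s: str) -> bool:
        # Assumption: the class check is applied after all steps have run.
        return all(self.base.allows(ch) for ch in self.prepare(s))


# Toy stand-in class: ASCII letters and digits only (not a real PRECIS class).
toy_class = StringClass("ToyClass", lambda ch: ch.isascii() and ch.isalnum())

# Toy profile: NFKC, then lowercasing as a simplistic case mapping.
toy_profile = Profile(
    "ToyProfile",
    toy_class,
    (lambda s: unicodedata.normalize("NFKC", s), str.lower),
)

if __name__ == "__main__":
    print(toy_class.allows("\U0001D7D0"))     # class alone, no mapping
    print(toy_profile.prepare("A\U0001D7D0")) # profile steps applied
    print(toy_profile.allows("A\U0001D7D0"))  # class check after the steps
```

The point of the shape is that nothing in Profile can change what
StringClass.allows() returns for a code point; the profile only decides
which operations run and at what stage the class check is applied.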

BTW, I prefer the term profile because during WG meetings and list
discussions that's the term I've heard people naturally use. It seems
artificial to force people to call it a "usage" when their tendency is
to use the term "profile".

Peter

-- 
Peter Saint-Andre
https://stpeter.im/