Re: [xmpp] [precis] review of draft-ietf-xmpp-6122bis-12

"Joe Hildebrand (jhildebr)" <jhildebr@cisco.com> Thu, 31 July 2014 15:49 UTC

From: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
To: Florian Zeitz <florob@babelmonkeys.de>, "xmpp@ietf.org" <xmpp@ietf.org>, "precis@ietf.org" <precis@ietf.org>
Date: Thu, 31 Jul 2014 15:49:40 +0000
Message-ID: <CFFFBD55.575F2%jhildebr@cisco.com>
References: <CFFEBEEE.575AE%jhildebr@cisco.com> <53D97BF1.6010602@babelmonkeys.de>
In-Reply-To: <53D97BF1.6010602@babelmonkeys.de>
Archived-At: http://mailarchive.ietf.org/arch/msg/xmpp/25_fxpOzT1Dh2sNtB2LUqTY_VWY
Subject: Re: [xmpp] [precis] review of draft-ietf-xmpp-6122bis-12
Precedence: list

On 7/30/14, 5:12 PM, "Florian Zeitz" <florob@babelmonkeys.de> wrote:

>Hi Joe, I largely agree with your comments. Some remarks inline.
>
>On 30.07.2014 23:25, Joe Hildebrand (jhildebr) wrote:
>> [...]
>>  > 3.1.  Fundamentals
>>  > 
>>  >       jid           = [ localpart "@" ] domainpart [ "/" resourcepart ]
>>  >       localpart     = 1*1023(localpoint)
>>  >                       ;
>>  >                       ; a "localpoint" is a UTF-8 encoded
>>  >                       ; Unicode code point that conforms to
>>  >                       ; the "JIDlocalIdentifierClass" profile
>>  >                       ; of the PRECIS IdentifierClass
>>  >                       ;
>> 
>> This implies 1023 codepoints, not 1023 bytes to me. Same issue for ifqdn
>> and resourcepart.  6122 just had 1*; I think going back to that would be
>> fine since we have a rule below that captures the max size.
>> 
>That is somewhat debatable. 1* is as correct/wrong as 1*1023 is.
>1023 is an upper limit on the number of codepoints (reached only when all
>of them are 7-bit ASCII); it does not, however, capture the separate rule
>about the maximum number of bytes. I'm not sure which version is less
>confusing.


As I remember it, we decided that ABNF only works on octets, not
codepoints.  Maybe we just need to tweak the explanatory text, perhaps
like this:


a "localpoint" is a byte from a UTF-8 encoded...

and perhaps s/localpoint/localbyte/g
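
To make the codepoint/byte distinction concrete, here is a minimal Python
sketch. The 1023 limit is the one from the ABNF quoted above; the helper
name fits_limits and the sample strings are made up for illustration, and
no PRECIS checks are attempted.

```python
MAX_LEN = 1023  # limit quoted in the draft's ABNF

def fits_limits(part: str) -> dict:
    """Report whether a JID part is within 1023 when counted two ways."""
    n_codepoints = len(part)              # Python str length counts code points
    n_bytes = len(part.encode("utf-8"))   # length of the UTF-8 encoding in octets
    return {
        "codepoints": n_codepoints,
        "bytes": n_bytes,
        "ok_as_codepoints": 1 <= n_codepoints <= MAX_LEN,
        "ok_as_bytes": 1 <= n_bytes <= MAX_LEN,
    }

ascii_part = "a" * 1023           # 1 byte per code point
accented_part = "\u00e9" * 1023   # U+00E9 takes 2 bytes in UTF-8

print(fits_limits(ascii_part))
print(fits_limits(accented_part))
```

The second string passes a 1*1023 rule read as codepoints but is twice
the size allowed by a rule read as bytes, which is the ambiguity at issue.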


>> [snip: when are rules applied, what about dumb clients?]
>I specifically pushed for having text clarifying this. I thought it was
>sufficiently covered in Section 4. Do you disagree, and/or think we need
>a forward reference here?

A reference to section 4 would make me happier, although it's still not
clear that some of the rules are ones clients MUST follow if they want
interop.

>>  >    A localpart MUST consist only of Unicode code points that conform to
>>  >    the "JIDlocalIdentifierClass" profile of the "IdentifierClass" base
>>  >    string class defined in [I-D.ietf-precis-framework].  The
>>  >    JIDlocalIdentifierClass profile includes all code points allowed by
>>  >    the IdentifierClass base class, with the exception of the following
>>  >    characters that are explicitly disallowed in XMPP localparts:
>> 
>> (special precis focus)
>> I would have expected this to be phrased more similarly to step 2 of
>> http://tools.ietf.org/html/draft-ietf-precis-framework-17#section-5, or
>> for section 5 to just have a step about codepoints forbidden in a given
>> usage of the selected precis class.
>> 
>Personally I think this is fine. It's what I would have expected from
>the description in
><http://tools.ietf.org/html/draft-ietf-precis-framework-17#section-4.1.6>.
>If anything, I would have expected this to be phrased similarly to
>section 3.2.3 or 3.3.3 (Disallowed).
>My understanding is that this would be checked as part of step 5 in
>section 5.

Because of a lack of parallel structure, the precis framework doc is not
clear to me on this point.  I think this is something that should be
clarified, since my understanding is that we're about to open that doc
back up for edits on exactly this point: how to take one of the classes
and mold it to the needs of your protocol by prohibiting particular
codepoints.

>> [...]
>>  >    1.  Fullwidth and halfwidth characters MAY be mapped to their
>>  >        decomposition mappings.
>> 
>> (precis)
>> I need a hint as to when to do this.  "MAY" isn't nearly enough.
>> 
>I think this actually relates a lot to how we expect XMPP resources to
>be used. They are not supposed to be entered by users. Possibly not even
>user visible.
>
>If this mapping was only relevant to servers, I'd say "whatever floats
>your boat". Make it implementation defined behaviour and that is that.
>Servers probably wouldn't implement it, because it's pointless work.
>
>However, thick clients have some features that communicate full JIDs
>between clients directly. For this case it is potentially vital that
>both clients perform the same set of mappings. Hence, I actually tend to
>prefer "MUST NOT" here.

MUST NOT would make me happy.  It will increase interoperability.
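
A small Python sketch of why an optional width mapping hurts interop
between clients that exchange full JIDs. Everything here is illustrative:
width_map is a hypothetical helper built on the standard unicodedata
module, and the JID and resource are invented.

```python
import unicodedata

def width_map(s: str) -> str:
    """Replace fullwidth/halfwidth characters with their decomposition mappings."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041" for U+FF21
        if decomp.startswith(("<wide>", "<narrow>")):
            # Drop the "<wide>"/"<narrow>" tag and decode the hex code points
            out.append("".join(chr(int(h, 16)) for h in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)

resource = "\uff28\uff4f\uff4d\uff45"  # "Home" written in fullwidth letters

# Two hypothetical clients handling the same resourcepart
jid_client_a = "juliet@example.com/" + width_map(resource)  # applies the MAY
jid_client_b = "juliet@example.com/" + resource             # skips the MAY

print(jid_client_a == jid_client_b)
```

The two clients end up with different full JIDs for the same session, so
a byte-for-byte comparison between them fails. Either a MUST or a MUST NOT
removes that divergence; a MAY does not.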

>>  >    2.  Map any instances of non-ASCII space to ASCII space (U+0020).
>> 
>> (precis)
>> I was hoping either the framework doc or the mappings doc would tell me
>> more about which characters to map here.  RFC 3454 had table C.1.2, but I
>> don't see any hints about what I'm supposed to do now.  Is the rule "has a
>> compatibility mapping to U+0020"?  That doesn't hit U+200B which is in
>> C.1.2, nor does "has category Zs".  draft-ietf-precis-mappings says
>> "Therefore, the special mapping table should be based on a well-defined
>> mapping table for each protocol", which although I don't
>> particularly like, I can live with - but we need the table here.
>> 
>I think we need more specific text, and I think it is quite likely more
>appropriate for the framework or mapping document than this profile.
>I'm not sure it needs to be a table. Potentially "has category Zs" is
>good enough, but I'd want to look closer at the Unicode data first.

The mapping doc makes sense to me.

>I think your remark about U+200B being in C.1.2 is slightly off.
>The tables in appendix C are specifically prohibition tables, and not
>mapping tables. B.1 lists U+200B as commonly mapped to nothing, which I
>think is the right thing to do wrt mapping this codepoint. Do you have
>any other codepoint in mind that is not in Zs, but should be mapped to
>ASCII space?

You are of course correct about U+200B being in table B.1.  All of the
other codepoints in C.1.2 are in Zs, and apart from U+0020 itself there
are no other codepoints in the Unicode 7.0 UCD that are Zs.
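
For anyone who wants to repeat the comparison, a rough Python sketch
follows. Two assumptions to flag: the C.1.2 set is transcribed by hand
from RFC 3454, and unicodedata reflects whatever Unicode version ships
with the interpreter, which is not necessarily 7.0.

```python
import sys
import unicodedata

# RFC 3454 table C.1.2 (non-ASCII space characters), transcribed by hand:
# U+00A0, U+1680, U+2000..U+200B, U+202F, U+205F, U+3000
C_1_2 = {0x00A0, 0x1680, *range(0x2000, 0x200C), 0x202F, 0x205F, 0x3000}

# Every code point whose General_Category is Zs in this interpreter's UCD
zs = {cp for cp in range(sys.maxunicode + 1)
      if unicodedata.category(chr(cp)) == "Zs"}

in_table_not_zs = sorted(C_1_2 - zs)
zs_not_in_table = sorted(zs - C_1_2)

print("Unicode version:", unicodedata.unidata_version)
print("C.1.2 but not Zs:", [f"U+{cp:04X}" for cp in in_table_not_zs])
print("Zs but not C.1.2:", [f"U+{cp:04X}" for cp in zs_not_in_table])
```

If the two difference lists come back as just U+200B and U+0020
respectively, then "has category Zs" and "is in C.1.2 minus U+200B"
describe the same set of non-ASCII spaces.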

>>  >    3.  So-called additional mappings MAY be applied, such as mapping of
>>  >        characters that are similar to common delimiters (such as '@',
>>  >        ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
>>  >        STOP (U+3002) to FULL STOP (U+002E)) and special handling of
>>  >        certain characters or classes of characters (e.g., mapping of
>>  >        non-ASCII spaces to ASCII space); the PRECIS mappings document
>>  >        [I-D.ietf-precis-mappings] describes such mappings in more
>>  >        detail.
>>  > 
>>  >    4.  Uppercase and titlecase characters MAY be mapped to their
>>  >        lowercase equivalents, preferably using Unicode Default Case
>>  >        Folding as defined in Chapter 3 of the Unicode Standard
>>  >        [UNICODE].
>> 
>> Again, I need more about the MAY here.
>> 
>See above. We might want to have a discussion about the effects of
>different implementations performing different sets of mappings.
>I personally think we want to explicitly mandate or disallow mappings of
>codepoints that are otherwise PVALID.

As above, MUST NOT for the mappings feels right to me, but I'm open to
other arguments.  I'm most worried about XEP-0045 (Multi-User Chat) nicks
here.
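
To illustrate the nick worry, here is a short Python sketch with two
invented nicknames run through three hypothetical implementation choices.
Python's str.casefold() stands in for Unicode case folding and
str.lower() for a simpler lowercase mapping; neither is claimed to be
exactly what the draft would require.

```python
nick_a = "Stra\u00dfe"   # "Strasse" spelled with U+00DF LATIN SMALL LETTER SHARP S
nick_b = "STRASSE"

# Three hypothetical implementations of the optional case mapping
unmapped = (nick_a, nick_b)
lowercased = (nick_a.lower(), nick_b.lower())
casefolded = (nick_a.casefold(), nick_b.casefold())

for label, (a, b) in [("no mapping", unmapped),
                      ("lower()", lowercased),
                      ("casefold()", casefolded)]:
    print(f"{label:11} same nick? {a == b}")
```

Only the case-folding implementation treats the two nicks as the same
occupant, so a room service and a client that pick different options
would disagree about whether a nick is already taken.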

>> 
>> I think it might be nice to point out that this may have made
>> previously-valid JIDs no longer valid (or vice-versa), and that we suggest
>> careful testing before migrating user data.
>> 
>Do we have data as to what extent this is the case? I.e. which codepoints
>that were previously allowed are now disallowed?
>I think it might be worthwhile to be very specific here, particularly
>concerning why we think these changes are appropriate. There may be a
>very vocal minority that is not okay with their JIDs being deprecated.

Yeah, that sounds like a bit of work, but it may be worthwhile.

-- 
Joe Hildebrand
