Re: [xmpp] [precis] review of draft-ietf-xmpp-6122bis-12

Florian Zeitz <florob@babelmonkeys.de> Wed, 30 July 2014 23:12 UTC

Message-ID: <53D97BF1.6010602@babelmonkeys.de>
Date: Thu, 31 Jul 2014 01:12:49 +0200
From: Florian Zeitz <florob@babelmonkeys.de>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, "xmpp@ietf.org" <xmpp@ietf.org>, "precis@ietf.org" <precis@ietf.org>
References: <CFFEBEEE.575AE%jhildebr@cisco.com>
In-Reply-To: <CFFEBEEE.575AE%jhildebr@cisco.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/xmpp/5IS_4ekTYpd9t7pj2cC6OE4TuxE
Subject: Re: [xmpp] [precis] review of draft-ietf-xmpp-6122bis-12
Precedence: list

Hi Joe, I largely agree with your comments. Some remarks inline.

On 30.07.2014 23:25, Joe Hildebrand (jhildebr) wrote:
> [...]
>  > 3.1.  Fundamentals
>  > 
>  >       jid           = [ localpart "@" ] domainpart [ "/" resourcepart ]
>  >       localpart     = 1*1023(localpoint)
>  >                       ;
>  >                       ; a "localpoint" is a UTF-8 encoded
>  >                       ; Unicode code point that conforms to
>  >                       ; the "JIDlocalIdentifierClass" profile
>  >                       ; of the PRECIS IdentifierClass
>  >                       ;
> 
> This implies 1023 codepoints, not 1023 bytes to me. Same issue for ifqdn
> and resourcepart.  6122 just had 1*; I think going back to that would be
> fine since we have a rule below that captures the max size.
> 
That is somewhat debatable. 1* is as correct/wrong as 1*1023 is.
1023 is an upper limit on the number of codepoints (reached only when all
of them are 7-bit ASCII). It does not, however, capture the separate rule
about the maximum number of bytes. I'm not sure which version is less
confusing.
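
To make the codepoint/byte distinction concrete, here is a rough Python
sketch. The helper name within_limits is made up for illustration, and I
am assuming the byte limit is also 1023, as in RFC 6122:

```python
MAX_LEN = 1023  # limit from the ABNF; assumed to also be the byte limit

def within_limits(part):
    """Return (fits_codepoint_limit, fits_byte_limit) for one JID part."""
    codepoints = len(part)               # len() of a str counts code points
    octets = len(part.encode("utf-8"))   # size of the UTF-8 encoding
    return codepoints <= MAX_LEN, octets <= MAX_LEN

ascii_part = "a" * 1023         # 1023 code points, 1023 bytes
umlaut_part = "\u00e4" * 1023   # 1023 code points, 2 bytes each in UTF-8

print(within_limits(ascii_part))
print(within_limits(umlaut_part))
```

The second string satisfies 1*1023(localpoint) read as a codepoint count,
yet it is twice as long as the byte rule allows.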

> [snip: when are rules applied, what about dumb clients?]
I specifically pushed for having text clarifying this. I thought it was
sufficiently covered in Section 4. Do you disagree, and/or think we need
a forward reference here?

>  >    A localpart MUST consist only of Unicode code points that conform to
>  >    the "JIDlocalIdentifierClass" profile of the "IdentifierClass" base
>  >    string class defined in [I-D.ietf-precis-framework].  The
>  >    JIDlocalIdentifierClass profile includes all code points allowed by
>  >    the IdentifierClass base class, with the exception of the following
>  >    characters that are explicitly disallowed in XMPP localparts:
> 
> (special precis focus)
> I would have expected this to be phrased more similarly to step 2 of
> http://tools.ietf.org/html/draft-ietf-precis-framework-17#section-5, or
> for section 5 to just have a step about codepoints forbidden in a given
> usage of the selected precis class.
> 
Personally I think this is fine. It's what I would have expected from
the description in
<http://tools.ietf.org/html/draft-ietf-precis-framework-17#section-4.1.6>.
If anything, I would have expected this to be phrased similarly to
section 3.2.3 or 3.3.3 (Disallowed).
My understanding is that this would be checked as part of step 5 in
section 5.

> [...]
>  >    1.  Fullwidth and halfwidth characters MAY be mapped to their
>  >        decomposition mappings.
> 
> (precis)
> I need a hint as to when do this.  "MAY" isn't nearly enough.
> 
I think this actually relates a lot to how we expect XMPP resources to
be used. They are not supposed to be entered by users. Possibly not even
user visible.

If this mapping were only relevant to servers, I'd say "whatever floats
your boat". Make it implementation defined behaviour and that is that.
Servers probably wouldn't implement it, because it's pointless work.

However, thick clients have some features that communicate full JIDs
between clients directly. For this case it is potentially vital that
both clients perform the same set of mappings. Hence, I actually tend to
prefer "MUST NOT" here.
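
A small sketch of the failure mode I have in mind, using Python's
unicodedata module. The function map_width and the two "clients" are
hypothetical; this only illustrates what happens when one side applies
the optional mapping and the other does not:

```python
import unicodedata

def map_width(s):
    """Replace fullwidth/halfwidth characters by their decomposition mappings."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # e.g. "<wide> 0048": a tag followed by hex code points
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)

resource = "\uff28\uff4f\uff4d\uff45"   # "Home" written in fullwidth letters
client_a = map_width(resource)          # client that applies the MAY
client_b = resource                     # client that does not

print(client_a == client_b)             # the two full JIDs no longer match
```

With a MAY, both clients are conformant and still disagree about whether
they are talking about the same resource.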

>  >    2.  Map any instances of non-ASCII space to ASCII space (U+0020).
> 
> (precis)
> I was hoping either the framework doc or the mappings doc would tell me
> more about which characters to map here.  RFC 3454 had table C.1.2, but I
> don't see any hints about what I'm supposed to do now.  Is the rule "has a
> compatibility mapping to U+0020"?  That doesn't hit U+200B which is in
> C.1.2, nor does "has category Zs".  draft-ietf-precis-mappings says
> "Therefore, the special mapping table should be based on a well-
>    defined mapping table for each protocol", which although I don't
> particularly like, I can live with - but we need the table here.
> 
I think we need more specific text, and I think it is quite likely more
appropriate for the framework or mapping document than this profile.
I'm not sure it needs to be a table. Potentially "has category Zs" is
good enough, but I'd want to look closer at the Unicode data first.

I think your remark about U+200B being in C.1.2 is slightly off.
The tables in appendix C are specifically prohibition tables, and not
mapping tables. B.1 lists U+200B as commonly mapped to nothing, which I
think is the right thing to do wrt mapping this codepoint. Do you have
any other codepoint in mind that is not in Zs, but should be mapped to
ASCII space?
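
For anyone who wants to look at the Unicode data, the following Python
sketch lists what a "has category Zs" rule would cover. map_spaces is
only a candidate rule written for illustration, not text from any of the
drafts, and the result depends on the Unicode version that ships with
the Python build in use:

```python
import sys
import unicodedata

# All code points with general category Zs (Space_Separator).
zs = [cp for cp in range(sys.maxunicode + 1)
      if unicodedata.category(chr(cp)) == "Zs"]

def map_spaces(s):
    """Candidate rule: map every Zs code point to ASCII space (U+0020)."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in s)

for cp in zs:
    print("U+%04X %s" % (cp, unicodedata.name(chr(cp))))

# U+200B (ZERO WIDTH SPACE) for comparison
print(unicodedata.category("\u200b"))
```

In current Unicode versions U+200B has category Cf rather than Zs, so
this rule leaves it untouched, which fits with treating it as "mapped to
nothing" instead.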

>  >    3.  So-called additional mappings MAY be applied, such as mapping of
>  >        characters that are similar to common delimiters (such as '@',
>  >        ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
>  >        STOP (U+3002) to FULL STOP (U+002E)) and special handling of
>  >        certain characters or classes of characters (e.g., mapping of
>  >        non-ASCII spaces to ASCII space); the PRECIS mappings document
>  >        [I-D.ietf-precis-mappings] describes such mappings in more
>  >        detail.
>  > 
>  >    4.  Uppercase and titlecase characters MAY be mapped to their
>  >        lowercase equivalents, preferably using Unicode Default Case
>  >        Folding as defined in Chapter 3 of the Unicode Standard
>  >        [UNICODE].
> 
> Again, I need more about the MAY here.
> 
See above. We might want to have a discussion about the effects of
different implementations performing different sets of mappings.
I personally think we want to explicitly mandate or disallow mappings of
codepoints that are otherwise PVALID.
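
The same divergence shows up for case mapping. A sketch with three
hypothetical implementations, all of which a MAY would permit (Python's
str.casefold performs Unicode full case folding):

```python
resource = "Stra\u00dfe"        # "Strasse" spelled with U+00DF (sharp s)

unmapped = resource             # implementation that applies no case mapping
lowered = resource.lower()      # implementation that simply lowercases
folded = resource.casefold()    # implementation that uses Unicode case folding

distinct = {unmapped, lowered, folded}
print(len(distinct))            # number of different results for one input
```

Three conformant implementations, three different strings for the same
resourcepart.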

> [...]
>  > Appendix A.  Differences from RFC 6122
>  > 
>  >    Based on consensus derived from working group discussion,
>  >    implementation and deployment experience, and formal interoperability
>  >    testing, the following substantive modifications were made from RFC
>  >    6122.
> 
> I think it might be nice to point out that this may have made
> previously-valid JIDs no longer valid (or vice-versa), and that we suggest
> careful testing before migrating user data.
> 
Do we have data on the extent to which this is the case? I.e., which
codepoints that were previously allowed are now disallowed?
I think it might be worthwhile to be very specific here, particularly
concerning why we think these changes are appropriate. There may be a
very vocal minority that is not okay with their JIDs being deprecated.

Regards,
Florian
