Re: [xmpp] 6122bis: Unicode versions

Waqas Hussain <waqas20@gmail.com> Tue, 19 July 2011 05:49 UTC

In-Reply-To: <4E24F184.3040505@stpeter.im>
References: <4E20989B.1030709@stpeter.im> <CALm9TZ_OE-CQ-=354bGBi_cDmtJKeoG7gwkTBnzhQXEkq2uuxw@mail.gmail.com> <4E24B0D8.90808@stpeter.im> <CALm9TZ_bno7ZVeoAHpvgPATBR7f7M1e0H3iGT=OzeZpQ6h1U-A@mail.gmail.com> <4E24F184.3040505@stpeter.im>
From: Waqas Hussain <waqas20@gmail.com>
Date: Tue, 19 Jul 2011 10:49:28 +0500
Message-ID: <CALm9TZ_ZTqX3PzbTvprYmx6zF2KS6mO2H14SvgwghkQ9SAxFAw@mail.gmail.com>
To: Peter Saint-Andre <stpeter@stpeter.im>
Cc: XMPP <xmpp@ietf.org>
Subject: Re: [xmpp] 6122bis: Unicode versions

On Tue, Jul 19, 2011 at 7:52 AM, Peter Saint-Andre <stpeter@stpeter.im> wrote:
> On 7/18/11 5:12 PM, Waqas Hussain wrote:
>>
>> On Tue, Jul 19, 2011 at 3:16 AM, Peter Saint-Andre <stpeter@stpeter.im> wrote:
>>>
>>> On 7/16/11 1:54 PM, Waqas Hussain wrote:
>>>>
>>>> On Sat, Jul 16, 2011 at 12:44 AM, Peter Saint-Andre <stpeter@stpeter.im> wrote:
>>>>>
>>>>> The good thing about the post-stringprep world is that we have agility
>>>>> with regard to Unicode versions (a.k.a. "Unicode agility"). No more
>>>>> being stuck at Unicode 3.2!
>>>>>
>>>>> The bad thing is that we have Unicode agility. What if my client (or
>>>>> your server) has Unicode 5.0 but my server has Unicode 6.0? The parties
>>>>> might differ in their interpretation of certain code points, causing
>>>>> problems with authentication, stanza routing, etc.
>>>>>
>>>>> We might be able to mitigate these problems if we had a way to discover
>>>>> which version of Unicode the other side supports.
>>>>
>>>> That's the discussion I'm interested in. How can we mitigate Unicode
>>>> version incompatibility? And also, stringprep and post-stringprep
>>>> incompatibility? Has there been anything written on this that I can
>>>> read?
>>>
>>> So far, I am not too concerned about incompatibilities between Unicode
>>> versions. See for example:
>>>
>>> https://datatracker.ietf.org/doc/draft-faltstrom-5892bis/
>>>
>>> As you can see from that document, only three rather obscure code points
>>> changed in backward-incompatible ways between Unicode 5.0 and Unicode
>>> 6.0.
>>
>> That's reassuring.
>>
>>> Naturally, the changes between Unicode 3.2 (hardcoded into stringprep)
>>> and Unicode 6.0 were more substantial. Most of those changes were new
>>> code points, not code points that changed in backward-incompatible ways.
>>> During the transition from IDNA2003 to IDNA2008, in practice the most
>>> troublesome code points were:
>>>
>>>   00DF (LATIN SMALL LETTER SHARP S)
>>>   03C2 (GREEK SMALL LETTER FINAL SIGMA)
>>>
>>> See http://tools.ietf.org/html/rfc5894#section-7.2 for details (there are
>>> other troublesome characters, but those were the worst because they were
>>> more widely deployed). Domain registrars know about those code points and
>>> probably have special processes for dealing with them.
>>
>> I was mainly concerned about things like Jehan's suggestion of
>> 're-encoding'. I don't think that's somewhere we want to go.
>
> Agreed.
>
>>>> I don't really see there being a good solution. The best we might
>>>> reasonably be able to do is handle <jid-malformed/> errors and just
>>>> accept that either two incompatible entities won't be able to
>>>> communicate at all,
>>>
>>> Unfortunate, but possible in a small number of cases.
>>>
>>>> or one entity might see the other entity as
>>>> multiple JIDs.
>>>
>>> How so?
>>
>> See below.
>>
>>>> A recommendation that servers prep JIDs on all outgoing
>>>> stanzas might fix the latter.
>>>
>>> s/recommendation/requirement/ :)
>>>
>>> But yes.
>>
>> Note that what I mean here is that some servers, while verifying JIDs on
>> outgoing stanzas, don't actually replace the unprepped to/from values
>> with the prepped ones in what gets sent over the wire. So the JID
>> ABC@example.com is sent as ABC@example.com over the wire, not as
>> abc@example.com. This will interact badly with IDNA2003 and IDNA2008
>> having different outputs for a given input.
>
> I see your point, and I agree that prepping all outbound JIDs would help to
> avoid this problem. Paradoxically, prepping inbound JIDs would hurt, not
> help (see below).

Prepping outbound JIDs helps in another case too: I could create the JID
fussball@example.com on an IDNA2003 server, and send stanzas as both
fußball@example.com and fussball@example.com. If my server passes the
'from' attribute as-is, I can make myself seem like two entities to an
IDNA2008 server/client. Not too worrying a problem, I suppose.
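To make the outbound-prepping idea concrete, here is a minimal Python sketch of a server rewriting the 'from' value with its prepped form instead of passing it through as-is. It assumes a simplified Nodeprep built from Python's stringprep tables (B.1/B.2 mapping, NFKC pinned to Unicode 3.2, the C.* prohibition tables plus Nodeprep's extra ASCII characters); the bidi check and unassigned-code-point handling are omitted. The domain is prepared with encodings.idna.nameprep and the resource is left untouched, which is also a simplification. The names nodeprep_sketch and prep_outbound_jid are hypothetical, not any server's API.

```python
import stringprep
import unicodedata
from encodings.idna import nameprep

# Nodeprep prohibits these ASCII characters in addition to the stringprep tables.
NODEPREP_EXTRA_PROHIBITED = set("\"&'/:<>@")

# Stringprep prohibition tables used by Nodeprep (C.1.1 through C.9).
PROHIBITED_TABLES = (
    stringprep.in_table_c11, stringprep.in_table_c12,
    stringprep.in_table_c21, stringprep.in_table_c22,
    stringprep.in_table_c3, stringprep.in_table_c4,
    stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nodeprep_sketch(localpart: str) -> str:
    """Simplified Nodeprep: map, normalize, prohibit. No bidi or A.1 check."""
    # Map: drop table B.1 characters (e.g. ZWJ, ZWNJ), case-fold with table B.2.
    mapped = "".join(
        "" if stringprep.in_table_b1(ch) else stringprep.map_table_b2(ch)
        for ch in localpart
    )
    # Normalize with NFKC as defined in Unicode 3.2, which stringprep is pinned to.
    prepped = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    for ch in prepped:
        if ch in NODEPREP_EXTRA_PROHIBITED or any(t(ch) for t in PROHIBITED_TABLES):
            raise ValueError(f"prohibited character {ch!r} in localpart")
    return prepped


def prep_outbound_jid(jid: str) -> str:
    """Return the prepped JID to put on the wire instead of the raw value."""
    bare, slash, resource = jid.partition("/")
    localpart, at, domain = bare.rpartition("@")
    # Assumption: domain labels get IDNA2003 Nameprep; the resource is not prepped.
    domain = ".".join(nameprep(label) for label in domain.split("."))
    prepped = (nodeprep_sketch(localpart) + "@" if at else "") + domain
    return prepped + slash + resource


print(prep_outbound_jid("Fußball@Example.COM/phone"))  # fussball@example.com/phone
```

With the rewrite in place, fußball@example.com and fussball@example.com leave an IDNA2003-era server as the same string, so an IDNA2008-aware peer is never shown two entities for one account.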

>> Effectively, if given unprepped JID string X, which IDNA2008 preps to
>> string Y, but IDNA2003 accepts without prepping, and given the above
>> server behavior, a client could receive stanzas from both X and Y, and
>> treat them as different JIDs when they are in fact the same
>> (that's just one example, others, e.g., the reverse could also be
>> possible). I haven't verified that this is actually possible, but IIRC
>> the two specs don't have the same transformations in many cases. How
>> compatible are the IDNA2003 vs IDNA2008 transformations?
>
> As explained in RFC 5894, in fact there are few characters that are
> interpreted differently in IDNA2008 compared to IDNA2003: Eszett, Greek
> Final Sigma, Zero Width Joiner, and Zero Width Non-Joiner.
>
> So, for instance, in IDNA2003 you could register fussball.de but not
> fußball.de because ß was mapped to "ss" (see Appendix B of RFC 3454, i.e.
> "Table B.2" as invoked by Nameprep in RFC 3491). In IDNA2008, ß is a
> distinct, allowable character, so you can now register fußball.de. Clearly
> the registrar for .de needs to know this when accepting registrations,
> because it might want to reserve fußball.de if fussball.de is already
> registered, automatically assign fußball.de to the registrant for
> fussball.de, or apply some other policy.
>
> Now, the same is true for Nodeprep as used to stringprep the localpart of
> JIDs -- see Appendix A of RFC 3920. So in the current XMPP network (RFC
> 6122), you could register an account like fussball@example.com but if you
> tried to register fußball@example.com it would be stringprepped to
> fussball@example.com. If we migrate to 6122bis, fußball would be allowed as
> a username. Therefore a 6122bis-compliant server might allow both accounts
> to be registered and might route stanzas from both fussball@example.com and
> fußball@example.com over an s2s link to your server. But if your
> 6122-compliant server stringpreps JIDs on incoming stanzas then it would
> consider both of those JIDs to be the same, since it doesn't consider
> fußball@example.com to be valid (if your server doesn't stringprep JIDs on
> incoming stanzas then it would return a <jid-malformed/> error instead).
> Clearly this opens up the possibility of some attacks -- if I know you are
> subscribed to fussball@example.com for the latest scores, I could register
> fußball@example.com and send you bogus information.

This makes me a bit nervous.

I assume this affects more than just XMPP? I'm interested in hearing
what non-XMPP folks might have to say on the matter.
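The collision scenario described above can be reproduced in a few lines of Python. Python's built-in idna codec implements IDNA2003 (RFC 3490 with Nameprep), so it shows the fussball.de case directly. For the localpart, the sketch uses encodings.idna.nameprep as a stand-in for Nodeprep: both apply the same B.1/B.2 mapping and NFKC normalization, although Nodeprep prohibits extra characters such as '@'. The function legacy_inbound_groups is hypothetical and only models a 6122-compliant server that stringpreps incoming JIDs.

```python
from encodings.idna import nameprep

# IDNA2003: Nameprep maps ß to "ss", so both spellings yield the same
# ASCII domain name and cannot be registered as distinct names.
print("fußball.de".encode("idna"))   # b'fussball.de'
print("fussball.de".encode("idna"))  # b'fussball.de'


def legacy_inbound_groups(jids):
    """Group JIDs by the localpart a stringprep-era server would compute.

    Nameprep stands in for Nodeprep: same mapping and NFKC normalization,
    but Nodeprep prohibits additional characters.
    """
    groups = {}
    for jid in jids:
        local, _, domain = jid.partition("@")
        groups.setdefault(nameprep(local) + "@" + domain, []).append(jid)
    return groups


# Two accounts that a 6122bis-compliant server could keep distinct...
senders = ["fussball@example.com", "fußball@example.com"]
# ...land on the same key on a 6122-compliant server that preps inbound JIDs.
groups = legacy_inbound_groups(senders)
print(list(groups))  # ['fussball@example.com']
```

Both senders end up under one key, which is exactly what lets a newly registered fußball@example.com pass itself off as fussball@example.com to a legacy subscriber.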

> As mentioned, this applies to four characters that are allowed in IDNA2008
> and PRECIS (with the caveat that PRECIS isn't done yet!) but that are mapped
> to other characters (ß mapped to ss, ς mapped to σ) or to nothing (for Zero
> Width Joiner and Zero Width Non-Joiner) in IDNA2003 and Nodeprep. In these
> four cases, the post-stringprep technologies are more inclusive and we could
> have problems of the kind I've outlined above (not "double JIDs" but certain
> new JIDs registered with 6122bis-compliant servers that would be treated as
> equivalent to old JIDs by 6122-compliant software). Any migration plan we
> devise will need to provide guidelines for handling these cases.

+1.
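Because the divergence is limited to the four code points Peter lists, one possible building block for such migration guidelines is to flag localparts that contain them, together with the form IDNA2003/Nodeprep software would reduce them to. This Python sketch hard-codes only that mapping (ß to "ss", ς to σ, ZWJ and ZWNJ to nothing); LEGACY_MAPPING, migration_risks and collisions are hypothetical names. What a server then does with a flagged name (reserve it, reject it, assign it to the existing account holder) is a policy decision the code leaves open.

```python
# The four characters that IDNA2008/PRECIS keep but IDNA2003 and Nodeprep map away.
LEGACY_MAPPING = {
    "\u00df": "ss",      # LATIN SMALL LETTER SHARP S
    "\u03c2": "\u03c3",  # GREEK SMALL LETTER FINAL SIGMA -> GREEK SMALL LETTER SIGMA
    "\u200d": "",        # ZERO WIDTH JOINER
    "\u200c": "",        # ZERO WIDTH NON-JOINER
}


def migration_risks(localpart: str):
    """Return (risky characters found, legacy form), or None if unaffected.

    The legacy form rewrites only these four characters; it is not full Nodeprep
    (no case folding, normalization, or prohibition checks).
    """
    found = sorted({ch for ch in localpart if ch in LEGACY_MAPPING})
    if not found:
        return None
    legacy = "".join(LEGACY_MAPPING.get(ch, ch) for ch in localpart)
    return found, legacy


def collisions(candidates, existing):
    """List (candidate, existing account) pairs that legacy software would merge."""
    hits = []
    for name in candidates:
        risk = migration_risks(name)
        if risk is not None and risk[1] in existing:
            hits.append((name, risk[1]))
    return hits
```

For example, collisions(["fußball", "alice"], {"fussball"}) returns a single pair, ("fußball", "fussball"), because only the first candidate reduces to an account that already exists.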

> All of this is a lot easier if existing servers reject JIDs that they
> consider malformed, instead of prepping them. However, note that RFCs 3920
> and 6120 don't say that a server MUST reject malformed JIDs, so some
> existing servers might be liberal in what they accept, which in this case
> leads to bad consequences.

I think that's what existing servers probably do. We can check with
them, at least the ones that do stringprep :)
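The difference between rejecting and prepping can be sketched as two inbound policies in Python. The sketch assumes one possible reading of "malformed": a peer is expected to send JIDs already in prepped form, so any localpart that the legacy mapping would change is treated as malformed. encodings.idna.nameprep again stands in for Nodeprep (same mapping and NFKC normalization), and handle_inbound_from is a hypothetical function, not taken from any existing server.

```python
from encodings.idna import nameprep


def handle_inbound_from(jid: str, strict: bool):
    """Return (from-JID to deliver, stanza error condition) for an s2s JID.

    Assumption: inbound JIDs should already be prepped, so a localpart that
    the legacy mapping changes is treated as malformed. Nameprep stands in
    for Nodeprep here.
    """
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.rpartition("@")
    try:
        prepped = nameprep(local)
    except UnicodeError:
        # Prohibited characters: malformed under either policy.
        return None, "jid-malformed"
    if prepped == local:
        return jid, None
    if strict:
        # Reject instead of guessing which local account was meant.
        return None, "jid-malformed"
    # Liberal: silently rewrite, which can fold a distinct remote account
    # (fußball) onto an existing one (fussball).
    return prepped + at + domain + slash + resource, None


print(handle_inbound_from("fußball@example.com/home", strict=True))  # (None, 'jid-malformed')
```

Under the liberal policy the same input comes back as ('fussball@example.com/home', None), which is the silent merge that opens the door to the spoofing attack described earlier; the strict policy surfaces the mismatch as an error instead.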

> Peter

--
Waqas Hussain
