Re: [xmpp] 6122bis: Unicode versions
Waqas Hussain <waqas20@gmail.com> Tue, 19 July 2011 05:49 UTC
In-Reply-To: <4E24F184.3040505@stpeter.im>
References: <4E20989B.1030709@stpeter.im> <CALm9TZ_OE-CQ-=354bGBi_cDmtJKeoG7gwkTBnzhQXEkq2uuxw@mail.gmail.com> <4E24B0D8.90808@stpeter.im> <CALm9TZ_bno7ZVeoAHpvgPATBR7f7M1e0H3iGT=OzeZpQ6h1U-A@mail.gmail.com> <4E24F184.3040505@stpeter.im>
From: Waqas Hussain <waqas20@gmail.com>
Date: Tue, 19 Jul 2011 10:49:28 +0500
Message-ID: <CALm9TZ_ZTqX3PzbTvprYmx6zF2KS6mO2H14SvgwghkQ9SAxFAw@mail.gmail.com>
To: Peter Saint-Andre <stpeter@stpeter.im>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: XMPP <xmpp@ietf.org>
Subject: Re: [xmpp] 6122bis: Unicode versions

On Tue, Jul 19, 2011 at 7:52 AM, Peter Saint-Andre <stpeter@stpeter.im> wrote:
> On 7/18/11 5:12 PM, Waqas Hussain wrote:
>>
>> On Tue, Jul 19, 2011 at 3:16 AM, Peter Saint-Andre <stpeter@stpeter.im>
>> wrote:
>>>
>>> On 7/16/11 1:54 PM, Waqas Hussain wrote:
>>>>
>>>> On Sat, Jul 16, 2011 at 12:44 AM, Peter Saint-Andre <stpeter@stpeter.im>
>>>> wrote:
>>>>>
>>>>> The good thing about the post-stringprep world is that we have agility
>>>>> with regard to Unicode versions (a.k.a. "Unicode agility"). No more
>>>>> being stuck at Unicode 3.2!
>>>>>
>>>>> The bad thing is that we have Unicode agility. What if my client (or
>>>>> your server) has Unicode 5.0 but my server has Unicode 6.0? The parties
>>>>> might differ in their interpretation of certain code points, causing
>>>>> problems with authentication, stanza routing, etc.
>>>>>
>>>>> We might be able to mitigate these problems if we had a way to discover
>>>>> which version of Unicode the other side supports.
>>>>
>>>> That's the discussion I'm interested in. How can we mitigate Unicode
>>>> version incompatibility? And also, stringprep and post-stringprep
>>>> incompatibility? Has there been anything written on this that I can
>>>> read?
>>>
>>> So far, I am not too concerned about incompatibilities between Unicode
>>> versions. See for example:
>>>
>>> https://datatracker.ietf.org/doc/draft-faltstrom-5892bis/
>>>
>>> As you can see from that document, only three rather obscure code points
>>> changed in backward-incompatible ways between Unicode 5.0 and Unicode
>>> 6.0.
>>
>> That's reassuring.
>>
>>> Naturally, the changes between Unicode 3.2 (hardcoded into stringprep)
>>> and Unicode 6.0 were more substantial. Most of those changes were new
>>> code points, not code points that changed in backward-incompatible ways.
>>> During the transition from IDNA2003 to IDNA2008, in practice the most
>>> troublesome code points were:
>>>
>>> 00DF (LATIN SMALL LETTER SHARP S)
>>> 03C2 (GREEK SMALL LETTER FINAL SIGMA)
>>>
>>> See http://tools.ietf.org/html/rfc5894#section-7.2 for details (there are
>>> other troublesome characters, but those were the worst because they were
>>> more widely deployed). Domain registrars know about those code points and
>>> probably have special processes for dealing with them.
>>
>> I was mainly concerned about things like Jehan's suggestion of
>> 're-encoding'. I don't think that's somewhere we want to go.
>
> Agreed.
>
>>>> I don't really see there being a good solution. The best we might
>>>> reasonably be able to do is handle <jid-malformed/> errors and just
>>>> accept that either two incompatible entities won't be able to
>>>> communicate at all,
>>>
>>> Unfortunate, but possible in a small number of cases.
>>>
>>>> or one entity might see the other entity as
>>>> multiple JIDs.
>>>
>>> How so?
>>
>> See below.
>>
>>>> A recommendation that servers prep JIDs on all outgoing
>>>> stanzas might fix the latter.
>>>
>>> s/recommendation/requirement/ :)
>>>
>>> But yes.
>>
>> Note what I mean here is that some servers, while verifying JIDs on
>> outgoing stanzas, don't actually replace the to/from unprepped values
>> with the prepped ones in what gets sent over the wire. So the JID
>> ABC@example.com is sent as ABC@example.com over the wire, not as
>> abc@example.com. This will interact badly with IDNA2003 and IDNA2008
>> having different outputs for a given input.
>
> I see your point, and I agree that prepping all outbound JIDs would help to
> avoid this problem. Paradoxically, prepping inbound JIDs would hurt, not
> help (see below).

It helps too: I could create the JID fussball@example.com on an IDNA2003
server, and send stanzas as both fußball@example.com and
fussball@example.com. If my server passes the 'from' attribute as-is, I can
make myself seem like two entities to an IDNA2008 server/client. Not too
worrying a problem, I suppose.

>> Effectively, if given unprepped JID string X, which IDNA2008 preps to
>> string Y, but IDNA2003 accepts without prepping, and given the above
>> server behavior, a client could receive stanzas from both X and Y, and
>> treat them as different JIDs when they are in fact the same
>> (that's just one example; others, e.g., the reverse, could also be
>> possible). I haven't verified that this is actually possible, but IIRC
>> the two specs don't have the same transformations in many cases. How
>> compatible are the IDNA2003 vs IDNA2008 transformations?
>
> As explained in RFC 5894, in fact there are few characters that are
> interpreted differently in IDNA2008 compared to IDNA2003: Eszett, Greek
> Final Sigma, Zero Width Joiner, and Zero Width Non-Joiner.
>
> So, for instance, in IDNA2003 you could register fussball.de but not
> fußball.de because ß was mapped to "ss" (see Appendix B of RFC 3454, i.e.
> "Table B.2" as invoked by Nameprep in RFC 3491). In IDNA2008, ß is a
> distinct, allowable character, so you can now register fußball.de. Clearly
> the registrar for .de needs to know this when accepting registrations,
> because it might want to reserve fußball.de if fussball.de is already
> registered, automatically assign fußball.de to the registrant for
> fussball.de, or apply some other policy.
>
> Now, the same is true for Nodeprep as used to stringprep the localpart of
> JIDs -- see Appendix A of RFC 3920. So in the current XMPP network (RFC
> 6122), you could register an account like fussball@example.com but if you
> tried to register fußball@example.com it would be stringprepped to
> fussball@example.com. If we migrate to 6122bis, fußball would be allowed as
> a username. Therefore a 6122bis-compliant server might allow both accounts
> to be registered and might route stanzas from both fussball@example.com and
> fußball@example.com over an s2s link to your server. But if your
> 6122-compliant server stringpreps JIDs on incoming stanzas then it would
> consider both of those JIDs to be the same, since it doesn't consider
> fußball@example.com to be valid (if your server doesn't stringprep JIDs on
> incoming stanzas then it would return a <jid-malformed/> error instead).
> Clearly this opens up the possibility of some attacks -- if I know you are
> subscribed to fussball@example.com for the latest scores, I could register
> fußball@example.com and send you bogus information.

This makes me a bit nervous. I assume this affects more than just XMPP? I'm
interested in hearing what non-XMPP folks might have to say on the matter.

> As mentioned, this applies to four characters that are allowed in IDNA2008
> and PRECIS (with the caveat that PRECIS isn't done yet!) but that are mapped
> to other characters (ß mapped to ss, ς mapped to σ) or to nothing (for Zero
> Width Joiner and Zero Width Non-Joiner) in IDNA2003 and Nodeprep. In these
> four cases, the post-stringprep technologies are more inclusive and we could
> have problems of the kind I've outlined above (not "double JIDs" but certain
> new JIDs registered with 6122bis-compliant servers that would be treated as
> equivalent to old JIDs by 6122-compliant software). Any migration plan we
> devise will need to provide guidelines for handling these cases.

+1.

> All of this is a lot easier if existing servers reject JIDs that they
> consider malformed, instead of prepping them. However, note that RFCs 3920
> and 6120 don't say that a server MUST reject malformed JIDs, so some
> existing servers might be liberal in what they accept, which in this case
> leads to bad consequences.

I think that's what existing servers probably do. We can check/ask them.
The ones which do stringprep anyway :)

> Peter

--
Waqas Hussain
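
For anyone who wants to poke at the fussball/fußball case discussed above,
here is a rough Python sketch. It is an illustration under stated
assumptions, not how any real server does it: nodeprep_map() applies only
the RFC 3454 B.1/B.2 mappings plus NFKC (the prohibited-character and bidi
checks of Nodeprep are left out), post_stringprep_map() is a made-up
stand-in for a 6122bis-style rule that simply lowercases, and
inbound_localpart() is a hypothetical helper showing the "prep vs. reject"
choice for incoming stanzas.

```python
import stringprep
import unicodedata


def nodeprep_map(localpart: str) -> str:
    """Simplified mapping half of a stringprep-era profile (assumption:
    only B.1, B.2 and NFKC; no prohibited-character or bidi checks)."""
    out = []
    for ch in localpart:
        if stringprep.in_table_b1(ch):
            continue  # mapped to nothing, e.g. ZWJ (U+200D) and ZWNJ (U+200C)
        out.append(stringprep.map_table_b2(ch))  # case fold; ß -> ss, ς -> σ
    return unicodedata.normalize("NFKC", "".join(out))


def post_stringprep_map(localpart: str) -> str:
    """Hypothetical stand-in for a rule that keeps ß, ς, ZWJ and ZWNJ.
    Plain lowercasing is an assumption, not the PRECIS algorithm."""
    return unicodedata.normalize("NFC", localpart.lower())


def inbound_localpart(raw: str, lenient: bool) -> str:
    """Hypothetical inbound policy on an old (stringprep) server.

    lenient=True  -> silently prep whatever arrives (liberal server)
    lenient=False -> reject anything that is not already in prepped form
    """
    prepped = nodeprep_map(raw)
    if prepped == raw:
        return raw
    if lenient:
        return prepped
    raise ValueError("jid-malformed")


if __name__ == "__main__":
    # Old rules: both spellings collapse to one localpart.
    print(nodeprep_map("fussball") == nodeprep_map("fußball"))
    # New-style rules (as assumed here): they stay distinct accounts.
    print(post_stringprep_map("fussball") == post_stringprep_map("fußball"))
    # A lenient old server merges the new account into the old one ...
    print(inbound_localpart("fußball", lenient=True))
    # ... while a strict one refuses it.
    try:
        inbound_localpart("fußball", lenient=False)
    except ValueError as err:
        print(err)
    # Python's built-in "idna" codec implements IDNA2003, so the domain
    # side shows the same ß -> ss mapping.
    print("fußball.de".encode("idna"))
```

The lenient branch is the case that allows the spoofing scenario above; the
strict branch trades that for the two sides not being able to talk at all.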