Re: [xmpp] Unicode Version Interop Concerns in JIDs

Alexey Melnikov <alexey.melnikov@isode.com> Mon, 23 September 2019 15:52 UTC

To: Ralph Meijer <ralphm@ik.nu>, xmpp@ietf.org
Cc: Peter Saint-Andre <stpeter@mozilla.com>, Alexey Melnikov <aamelnikov@fastmail.fm>, Barry Leiba <barryleiba@computer.org>
References: <dbbb91ba-9116-50f7-fefa-09ef2bd5991d@ik.nu>
From: Alexey Melnikov <alexey.melnikov@isode.com>
Message-ID: <481768d0-7ca4-b515-32b2-223b1d3e816c@isode.com>
Date: Mon, 23 Sep 2019 16:52:12 +0100
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.0
In-Reply-To: <dbbb91ba-9116-50f7-fefa-09ef2bd5991d@ik.nu>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-transfer-encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/xmpp/SXymXR6fubMwe-FjNnOVDUenr-U>
Subject: Re: [xmpp] Unicode Version Interop Concerns in JIDs

Hi Ralph,

On 10/09/2019 15:38, Ralph Meijer wrote:
> Hi,
> 
> Recently, there's been a discussion in the XSF Discussion room [1] about
> interop issues in the face of different Unicode versions used for
> processing XMPP Addresses, or JIDs. That particular discussion was
> mostly focused on nicknames in Multi-User Chat (MUC) rooms, which are
> encoded in the resourcepart of a JID, but it is also a concern for
> other address handling. As I suggested giving this topic a wider
> audience, I write on behalf of those involved in the initial discussion.
> 
> When RFC 6122 was obsoleted by RFC 7622 [2], both titled “XMPP:
> Address Format”, resourceprep (which was fixed to Unicode 3.2) was
> replaced by PRECIS processing, as discussed in section 3.4. This in
> turn means that the resourcepart is an OpaqueString profile of the
> PRECIS FreeformClass, as defined in RFC 7613 [3], section 4.2 and
> RFC 7564 [4], section 4.3 respectively. The idea is that, as newer
> Unicode versions appear, applications can make use of the new code
> points therein.
> 
> RFC 7622 has extensive text on JID handling, but there is uncertainty
> over when servers, services like MUC, and clients should be liberal or
> strict when checking JIDs. Different implementations perform their
> processing based on differing versions of Unicode; implementations have
> install bases still depending on older versions of the software, and
> thus on the Unicode version they check against; and finally, there are
> implementations and deployments still performing the obsoleted
> stringprep.
> 
> A particular example is the following. Say a MUC service (including its
> server-to-server (s2s) handling) checks against Unicode version 12. One
> user, with a client and their server checking against Unicode >=9,
> chooses to use the nickname 'I♥🥓' (I love bacon). The MUC service
> assumes everything is fine, and the occupant JID becomes
> room@muc.server.example/I♥🥓. Both the BLACK HEART SUIT (U+2665) and
> BACON (U+1F953) are in the Symbols, Other (So) category, and thus valid
> for FreeformClass.
> 
> Now another user comes along, using a server that supports Unicode 6.3.
> Since BACON wasn't defined before Unicode 9, its code point is
> unassigned. When receiving presence from the other user, what should the
> receiving server do?
> 
>  a) It is liberal in what it accepts from other servers and passes
> incoming remote stanzas on to the client.
> 
>  b) It is strict and sends back a <jid-malformed/> error, which likely
> boots the recipient from the room.
> 
>  c) In case a), if the recipient wants to send a private message to the
> occupant JID, their own server might reject this with a similar
> <jid-malformed/> error.
> 
> The above is just an example. MIX [5] refers to RFC 7700 [6], obsoleted
> by RFC 8266 [7], for preparing nicknames, which in turn also depends on
> FreeformClass and thus raises similar concerns, though not at the
> routing level.
> 
> Basically the question comes down to: how do we robustly handle
> different Unicode Versions in clients, services, and servers?

I agree this is a problem and, as you demonstrate above, there might not
be any simple solution for it. I think this issue deserves a document
(an Internet-Draft or an XSF document).

My personal inclination is that clients/servers should be liberal with
unassigned codepoints in JIDs that they don't control (i.e. servers can
be strict with local JIDs, but should be liberal with remote JIDs),
which I think is either case a) or an extension of it.
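
To make the distinction concrete, here is a rough Python sketch of such a
policy. It is only an illustration under stated assumptions, not a PRECIS
implementation: it checks nothing except whether a code point is
unassigned (general category Cn) in a given Unicode database. The
function names and the is_local flag are hypothetical. Python's standard
unicodedata module ships both its current database and a frozen Unicode
3.2 copy (unicodedata.ucd_3_2_0), which here stands in for an "old"
peer; the example assumes the current database is Unicode 9 or later.

```python
import unicodedata


def unassigned_code_points(text, ucd=unicodedata):
    """List code points of `text` that are unassigned (category Cn) in `ucd`."""
    return [f"U+{ord(ch):04X}" for ch in text if ucd.category(ch) == "Cn"]


def accept_resourcepart(resourcepart, is_local, ucd=unicodedata):
    """Hypothetical policy: strict for local JIDs, liberal for remote ones.

    Only the unassigned-code-point check is modelled; all other PRECIS
    rules are deliberately left out of this sketch.
    """
    if not unassigned_code_points(resourcepart, ucd):
        return True          # every code point is known to this database
    return not is_local      # unknown code points: tolerate only if remote


nick = "I\u2665\U0001F953"          # the nickname from Ralph's example
old = unicodedata.ucd_3_2_0         # stand-in for an older implementation

print(unassigned_code_points(nick))                      # []
print(unassigned_code_points(nick, old))                 # ['U+1F953']
print(accept_resourcepart(nick, is_local=False, ucd=old))  # True
print(accept_resourcepart(nick, is_local=True, ucd=old))   # False
```

With the older database, BACON is reported as unassigned, so the sketch
would refuse it for a local nickname but let a remote occupant JID
through, which is roughly case a) above.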

Best Regards,
Alexey

> [1] <xmpp:xsf@muc.xmpp.org>
> [2] RFC 7622: XMPP: Address Format
>     <https://tools.ietf.org/html/rfc7622>
> [3] RFC 7613: PRECIS Representing Usernames and Passwords
>     <https://tools.ietf.org/html/rfc7613>
> [4] RFC 7564: PRECIS in Application Protocols
>     <https://tools.ietf.org/html/rfc7564>
> [5] XEP-0369: Mediated Information eXchange (MIX)
>     <https://xmpp.org/extensions/xep-0369.html>
> [6] RFC 7700: PRECIS Representing Nicknames
>     <https://tools.ietf.org/html/rfc7700>
> [7] RFC 8266: PRECIS Representing Nicknames
>     <https://tools.ietf.org/html/rfc8266>
>