[xmpp] Unicode Version Interop Concerns in JIDs
Ralph Meijer <ralphm@ik.nu> Tue, 10 September 2019 14:39 UTC
To: xmpp@ietf.org
From: Ralph Meijer <ralphm@ik.nu>
Cc: Peter Saint-Andre <stpeter@mozilla.com>, Alexey Melnikov <aamelnikov@fastmail.fm>, Barry Leiba <barryleiba@computer.org>
Message-ID: <dbbb91ba-9116-50f7-fefa-09ef2bd5991d@ik.nu>
Date: Tue, 10 Sep 2019 16:38:50 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/xmpp/a-WhzOTyOq168GujQHgzQ1-DURI>
Hi,

Recently, there's been a discussion in the XSF Discussion room [1] about interop issues that arise when different Unicode versions are used for processing XMPP addresses, or JIDs. That particular discussion was mostly focused on nicknames in Multi-User Chat (MUC) rooms, which are encoded in the resourcepart of a JID, but the issue is a concern for other address handling as well. As I suggested giving this topic a wider audience, I write on behalf of those involved in the initial discussion.

Ever since RFC 6122 was obsoleted by RFC 7622 [2], both titled “XMPP: Address Format”, resourceprep (which was fixed to Unicode 3.2) has been replaced by PRECIS processing, as discussed in section 3.4. That section in turn specifies that the resourcepart is handled with the OpaqueString profile of the PRECIS FreeformClass, as defined in RFC 7613 [3], section 4.2 and RFC 7564 [4], section 4.3 respectively. The idea is that, as newer Unicode versions appear, applications can make use of the new code points therein.

RFC 7622 has extensive text on JID handling, but there is uncertainty over when servers, services like MUC, and clients should be liberal or strict when checking JIDs. Different implementations perform their processing based on differing versions of Unicode; implementations have install bases that still depend on older versions of the software, and thus on the Unicode version they check against; and finally, there are implementations and deployments that still perform the obsoleted stringprep.

A particular example is the following. Say a MUC service (including its server-to-server (s2s) handling) checks against Unicode version 12. One user, with a client and server that check against Unicode >=9, chooses to use the nickname 'I♥🥓' (I love bacon). The MUC service assumes everything is fine, and the occupant JID becomes room@muc.server.example/I♥🥓. Both the BLACK HEART SUIT (U+2665) and BACON (U+1F953) are in the Symbols, Other (So) category, and thus valid for FreeformClass.
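
To make the version dependence concrete, here is a minimal Python sketch. It is not a PRECIS implementation: it only asks which code points of the example nickname are unassigned (general category Cn) in a given Unicode Character Database. The helper name `unassigned` is made up for illustration. Python's standard `unicodedata` module ships both the database of the running interpreter and a frozen Unicode 3.2 copy (`ucd_3_2_0`, the version stringprep was pinned to), which stands in here for "an implementation with an older Unicode version". It assumes a Python build whose database is Unicode 9 or later.

```python
import unicodedata


def unassigned(text, ucd=unicodedata):
    """Return the code points of `text` that are unassigned (Cn) in `ucd`.

    `ucd` is any object with a `category()` method; by default the
    database of the running interpreter (hypothetical helper).
    """
    return [f"U+{ord(ch):04X}" for ch in text if ucd.category(ch) == "Cn"]


# The example nickname: 'I', BLACK HEART SUIT, BACON
nickname = "I\u2665\U0001F953"

# View of an up-to-date implementation (assumes Unicode >= 9 here).
print(unicodedata.unidata_version, unassigned(nickname))

# View of an implementation stuck on Unicode 3.2, where BACON does not exist.
old = unicodedata.ucd_3_2_0
print(old.unidata_version, unassigned(nickname, old))
```

With the current database nothing is flagged, while the 3.2 database reports U+1F953 as unassigned. Two entities looking at the same resourcepart can therefore disagree on whether it is acceptable.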

Now another user comes along, using a server that supports Unicode 6.3. Since BACON wasn't defined before Unicode 9, its code point is unassigned from that server's point of view. When receiving presence from the other user, what should the receiving server do?

a) It is liberal in what it accepts from other servers, and passes incoming remote stanzas on to the client.
b) It is strict, and sends back a <jid-malformed/>, which likely boots the recipient from the room.
c) In case a), if the client then wants to send a private message to the occupant JID, its own server might reject this with a similar <jid-malformed/> error.

The above is just an example. MIX [5] refers to RFC 7700 [6], obsoleted by RFC 8266 [7], for preparing nicknames, which in turn also depends on FreeformClass and thus exhibits similar concerns, though not at the routing level.

Basically, the question comes down to: how do we robustly handle different Unicode versions in clients, services, and servers?

[1] <xmpp:xsf@muc.xmpp.org>
[2] RFC 7622: XMPP: Address Format <https://tools.ietf.org/html/rfc7622>
[3] RFC 7613: PRECIS Representing Usernames and Passwords <https://tools.ietf.org/html/rfc7613>
[4] RFC 7564: PRECIS in Application Protocols <https://tools.ietf.org/html/rfc7564>
[5] XEP-0369: Mediated Information eXchange (MIX) <https://xmpp.org/extensions/xep-0369.html>
[6] RFC 7700: PRECIS Representing Nicknames <https://tools.ietf.org/html/rfc7700>
[7] RFC 8266: PRECIS Representing Nicknames <https://tools.ietf.org/html/rfc8266>

--
ralphm