[xmpp] Unicode Version Interop Concerns in JIDs

Ralph Meijer <ralphm@ik.nu> Tue, 10 September 2019 14:39 UTC

To: xmpp@ietf.org
From: Ralph Meijer <ralphm@ik.nu>
Cc: Peter Saint-Andre <stpeter@mozilla.com>, Alexey Melnikov <aamelnikov@fastmail.fm>, Barry Leiba <barryleiba@computer.org>
Message-ID: <dbbb91ba-9116-50f7-fefa-09ef2bd5991d@ik.nu>
Date: Tue, 10 Sep 2019 16:38:50 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/xmpp/a-WhzOTyOq168GujQHgzQ1-DURI>

Hi,

Recently, there's been a discussion in the XSF Discussion room [1] about
interop issues arising from different Unicode versions being used for
processing XMPP addresses, or JIDs. That particular discussion was
mostly focused on nicknames in Multi-User Chat (MUC) rooms, which are
encoded in the resourcepart of a JID, but the concern applies to address
handling in general. As I suggested giving this topic a wider audience,
I am writing on behalf of those involved in the initial discussion.

Ever since RFC 6122 was obsoleted by RFC 7622 [2] (both titled “XMPP:
Address Format”), resourceprep (which was fixed to Unicode 3.2) has been
replaced by PRECIS processing, as discussed in section 3.4 of RFC 7622.
The resourcepart is now an instance of the OpaqueString profile of the
PRECIS FreeformClass, defined in RFC 7613 [3], section 4.2, and RFC 7564
[4], section 4.3, respectively. The idea is that, as new Unicode
versions are released, applications can make use of the newly assigned
code points.
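
To make the version dependence concrete: Python's standard library
happens to carry both the Unicode 3.2 data used by its `stringprep`
module (`stringprep.in_table_a1`, i.e. table A.1 of RFC 3454, and
`unicodedata.ucd_3_2_0`) and the newer Unicode Character Database of the
running interpreter. The following is a minimal sketch, not an
implementation of resourceprep or of the OpaqueString profile; the
helper names are made up for illustration. It only shows that the
nickname 'I♥🥓' contains a code point that is unassigned in Unicode 3.2,
but assigned (category So) in a current database.

```python
import stringprep
import unicodedata
from typing import List

# 'I♥🥓': LATIN CAPITAL LETTER I, BLACK HEART SUIT (U+2665), BACON (U+1F953)
NICK = "I\u2665\U0001F953"


def categories(s: str, db=unicodedata) -> List[str]:
    """General Category of each code point, according to the given database."""
    return [db.category(ch) for ch in s]


def unassigned_in_unicode_3_2(s: str) -> List[str]:
    """Code points listed in stringprep table A.1 (unassigned in Unicode 3.2)."""
    return [f"U+{ord(ch):04X}" for ch in s if stringprep.in_table_a1(ch)]


print(unicodedata.unidata_version)              # Unicode version of this interpreter
print(categories(NICK))                         # ['Lu', 'So', 'So'] on Python 3.6+
print(categories(NICK, unicodedata.ucd_3_2_0))  # ['Lu', 'So', 'Cn']: BACON unassigned
print(unassigned_in_unicode_3_2(NICK))          # ['U+1F953']
```

The same string thus gets a different verdict depending solely on which
Unicode tables the checking code was built with.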

RFC 7622 has extensive text on JID handling, but there is uncertainty
about when servers, services like MUC, and clients should be liberal or
strict when checking JIDs. Different implementations base their
processing on different versions of Unicode; deployed installations
often still run older versions of the software, and thus check against
older Unicode versions; and finally, some implementations and
deployments still perform the obsoleted stringprep processing.

A particular example is the following. Say a MUC service (including its
server-to-server (s2s) handling) checks against Unicode version 12. One
user, whose client and server check against Unicode 9 or later, chooses
the nickname 'I♥🥓' (I love bacon). The MUC service considers this
valid, and the occupant JID becomes room@muc.server.example/I♥🥓. Both
BLACK HEART SUIT (U+2665) and BACON (U+1F953) have the Unicode General
Category Other Symbol (So), and are thus valid in the FreeformClass.

Now another user comes along, using a server that supports Unicode 6.3.
Since BACON was not added until Unicode 9, its code point is unassigned
in Unicode 6.3, and PRECIS does not allow unassigned code points. When
receiving presence from the first user, what should the receiving
server do?

  a) It is liberal in what it accepts from other servers, and passes
     incoming remote stanzas on to the client.

  b) It is strict, and sends back a <jid-malformed/> error, which likely
     boots the recipient from the room.

  c) In case a), if the user then wants to send a private message to the
     occupant JID, their own server might reject it with a similar
     <jid-malformed/> error.
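
The scenario and options a) to c) can be modeled in a few lines. This is
a hypothetical sketch: `AGE` is a hand-copied three-entry excerpt of the
Unicode Age property (a real implementation would consult its full
Unicode Character Database), only the resourcepart is checked, and
`ReceivingServer`, `Policy` and the returned strings are invented for
illustration. Following case c), it assumes the receiving server always
validates outbound stanzas from its own client, even when it is liberal
about inbound ones.

```python
from enum import Enum
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Hypothetical three-entry excerpt of the Unicode Age property (DerivedAge.txt):
# the version in which each code point was first assigned.
AGE: Dict[int, Version] = {
    0x0049: (1, 1),   # LATIN CAPITAL LETTER I
    0x2665: (1, 1),   # BLACK HEART SUIT
    0x1F953: (9, 0),  # BACON
}


def unassigned_for(s: str, version: Version) -> List[str]:
    """Code points in s not yet assigned in `version` (not allowed by PRECIS).

    Code points missing from the excerpt are treated as unassigned.
    """
    never = (float("inf"),)
    return [f"U+{ord(ch):04X}" for ch in s if AGE.get(ord(ch), never) > version]


class Policy(Enum):
    LIBERAL = "a"  # pass incoming remote stanzas on to the client
    STRICT = "b"   # send back a <jid-malformed/> error


class ReceivingServer:
    """Hypothetical model of the second user's server."""

    def __init__(self, unicode_version: Version, policy: Policy):
        self.unicode_version = unicode_version
        self.policy = policy

    def _valid(self, resourcepart: str) -> bool:
        return not unassigned_for(resourcepart, self.unicode_version)

    def inbound_presence(self, resourcepart: str) -> str:
        if self._valid(resourcepart) or self.policy is Policy.LIBERAL:
            return "deliver"
        return "bounce <jid-malformed/>"

    def outbound_private_message(self, resourcepart: str) -> str:
        # Assumption (case c): the local client's stanzas are always checked.
        return "route" if self._valid(resourcepart) else "reject <jid-malformed/>"


nick = "I\u2665\U0001F953"  # I♥🥓
liberal = ReceivingServer((6, 3), Policy.LIBERAL)
strict = ReceivingServer((6, 3), Policy.STRICT)

print(unassigned_for(nick, (12, 0)))           # []: the MUC service accepts it
print(liberal.inbound_presence(nick))          # deliver (option a)
print(strict.inbound_presence(nick))           # bounce <jid-malformed/> (option b)
print(liberal.outbound_private_message(nick))  # reject <jid-malformed/> (case c)
```

With the liberal policy, presence from the occupant is delivered, but the
user's own server then refuses private messages to that same occupant;
with the strict policy, the presence is bounced. A server supporting
Unicode 9 or later accepts the nickname under either policy.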

The above is just an example. MIX [5] refers to RFC 7700 [6], since
obsoleted by RFC 8266 [7], for preparing nicknames. That profile also
builds on the FreeformClass, and thus exhibits similar concerns, though
not at the routing level.

Basically, the question comes down to: how do we robustly handle
different Unicode versions in clients, services, and servers?

[1] XSF Discussion room
     <xmpp:xsf@muc.xmpp.org>
[2] RFC 7622: XMPP: Address Format
     <https://tools.ietf.org/html/rfc7622>
[3] RFC 7613: PRECIS Representing Usernames and Passwords
     <https://tools.ietf.org/html/rfc7613>
[4] RFC 7564: PRECIS in Application Protocols
     <https://tools.ietf.org/html/rfc7564>
[5] XEP-0369: Mediated Information eXchange (MIX)
     <https://xmpp.org/extensions/xep-0369.html>
[6] RFC 7700: PRECIS Representing Nicknames
     <https://tools.ietf.org/html/rfc7700>
[7] RFC 8266: PRECIS Representing Nicknames
     <https://tools.ietf.org/html/rfc8266>

-- 
ralphm