Re: [xmpp] Unicode Version Interop Concerns in JIDs

Waqas Hussain <waqas20@gmail.com> Mon, 23 September 2019 20:11 UTC

To: Ralph Meijer <ralphm@ik.nu>
Cc: XMPP Working Group <xmpp@ietf.org>, Alexey Melnikov <aamelnikov@fastmail.fm>, Barry Leiba <barryleiba@computer.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xmpp/HAGTHbisrRnySzdqaOH-tEQFxgM>

On Tue, Sep 10, 2019 at 10:39 AM Ralph Meijer <ralphm@ik.nu> wrote:

> Hi,
>
> Recently, there's been a discussion in the XSF Discussion room [1] about
> interop issues that arise when different Unicode versions are used for
> processing XMPP addresses, or JIDs. That particular discussion was
> mostly focused on nicknames in Multi-User Chat (MUC) rooms, which are
> encoded in the resourcepart of a JID, but the concern applies to
> address handling in general. As I suggested giving this topic a wider
> audience, I am writing on behalf of those involved in the initial
> discussion.
>
> Ever since RFC 6122 was obsoleted by RFC 7622 [2] (both titled “XMPP:
> Address Format”), the Resourceprep profile of stringprep, which was
> fixed to Unicode 3.2, has been replaced by PRECIS processing, as
> discussed in section 3.4 of RFC 7622. The resourcepart is now an
> instance of the OpaqueString profile of the PRECIS FreeformClass, as
> defined in RFC 7613 [3], section 4.2, and RFC 7564 [4], section 4.3,
> respectively. The idea is that, as newer Unicode versions appear,
> applications can make use of the new code points they define.
>
> RFC 7622 has extensive text on JID handling, but there is uncertainty
> about when servers, services like MUC, and clients should be liberal or
> strict when checking JIDs. Different implementations base their
> processing on different versions of Unicode; deployments still run
> older versions of the software, and thus check against older Unicode
> versions; and finally, some implementations and deployments still
> perform the obsoleted stringprep.
>
> A particular example is the following. Say a MUC service (including its
> server-to-server (s2s) handling) checks against Unicode version 12. One
> user, with a client and their server checking against Unicode >=9,
> chooses to use the nickname 'I♥🥓' (I love bacon). The MUC service
> assumes everything is fine, and the occupant JID becomes
> room@muc.server.example/I♥🥓. Both the BLACK HEART SUIT (U+2665) and
> BACON (U+1F953) are in the Symbols, Other (So) category, and thus valid
> for FreeformClass.
>
> Now another user comes along, using a server that supports Unicode 6.3.
> Since BACON was not added until Unicode 9, its code point is unassigned
> in Unicode 6.3, and PRECIS treats unassigned code points as disallowed.
> When this server receives presence from the other user, what should it
> do?
>
>   a) It is liberal in what it accepts from other servers and passes
> incoming remote stanzas on to the client.
>
>   b) It is strict and sends back a <jid-malformed/> error, which likely
> boots the recipient from the room.
>
>   c) In case a), if the client wants to send a private message to the
> occupant JID, the user's own server might reject it with a similar
> <jid-malformed/> error.
>
> The above is just an example. MIX [5] refers to RFC 7700 [6], since
> obsoleted by RFC 8266 [7], for preparing nicknames. That profile also
> builds on the FreeformClass and thus raises similar concerns, though
> not at the routing level.
>
> Basically the question comes down to: how do we robustly handle
> different Unicode versions in clients, services, and servers?
>
> [1] <xmpp:xsf@muc.xmpp.org>
> [2] RFC 7622: XMPP: Address Format
>      <https://tools.ietf.org/html/rfc7622>
> [3] RFC 7613: PRECIS Representing Usernames and Passwords
>      <https://tools.ietf.org/html/rfc7613>
> [4] RFC 7564: PRECIS in Application Protocols
>      <https://tools.ietf.org/html/rfc7564>
> [5] XEP-0369: Mediated Information eXchange (MIX)
>      <https://xmpp.org/extensions/xep-0369.html>
> [6] RFC 7700: PRECIS Representing Nicknames
>      <https://tools.ietf.org/html/rfc7700>
> [7] RFC 8266: PRECIS Representing Nicknames
>      <https://tools.ietf.org/html/rfc8266>
>
> --
> ralphm
>
> _______________________________________________
> xmpp mailing list
> xmpp@ietf.org
> https://www.ietf.org/mailman/listinfo/xmpp
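
To make the version dependence concrete: Python's standard unicodedata
module carries the interpreter's own Unicode database plus a frozen
Unicode 3.2 copy (unicodedata.ucd_3_2_0, the version stringprep is pinned
to). The minimal sketch below checks only one PRECIS rule, that a string
must not contain unassigned code points; it is not a full OpaqueString or
FreeformClass implementation. Unicode 3.2 stands in for the 6.3 server in
the example because the standard library has no 6.3 data, and the helper
names are made up for illustration.

```python
import unicodedata


def is_noncharacter(cp):
    """Noncharacters (U+FDD0..U+FDEF, U+xxFFFE, U+xxFFFF) have category Cn
    but are not 'unassigned' in the PRECIS sense."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def unassigned_code_points(s, db):
    """Code points of s that the Unicode database `db` reports as unassigned.

    `db` is either the unicodedata module itself (runtime Unicode version)
    or unicodedata.ucd_3_2_0 (Unicode 3.2); both expose category().
    """
    return [ord(ch) for ch in s
            if db.category(ch) == "Cn" and not is_noncharacter(ord(ch))]


nick = "I\u2665\U0001F953"  # the nickname 'I♥🥓' from the example
print("runtime Unicode version:", unicodedata.unidata_version)
print("runtime check:", [hex(cp) for cp in unassigned_code_points(nick, unicodedata)])
print("Unicode 3.2 check:",
      [hex(cp) for cp in unassigned_code_points(nick, unicodedata.ucd_3_2_0)])
```

On Python 3.6 or later (Unicode 9.0 or newer), the runtime check finds
nothing while the 3.2 check flags U+1F953, which is the same disagreement
as between the MUC service and the older server in the example.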



There's an old thread on this from 2011 on the IETF list. I don't believe
the core compatibility issue ever got resolved. See this message and the
connected thread:

https://mailarchive.ietf.org/arch/msg/xmpp/A2PT_EpDpR1swKbHC_te0NWwRgs
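
The unresolved part is essentially which of options a) to c) a server
picks. A toy sketch of those options follows, assuming a server that knows
a single Unicode version and consults the Age property (the version in
which a code point was first assigned). The AGE table, the Policy enum and
the function names are hypothetical, and the table covers only the three
code points of the example nickname.

```python
from enum import Enum

# Age property (major, minor) for the example code points only; a real
# server would derive this from the Unicode Character Database.
AGE = {
    0x0049: (1, 1),   # LATIN CAPITAL LETTER I
    0x2665: (1, 1),   # BLACK HEART SUIT
    0x1F953: (9, 0),  # BACON
}


class Policy(Enum):
    LIBERAL = "liberal"  # option a): pass remote stanzas on unchecked
    STRICT = "strict"    # option b): bounce with <jid-malformed/>


def is_assigned(cp, version):
    """True if cp is assigned in Unicode `version`; anything missing from
    the toy AGE table is treated as unassigned."""
    age = AGE.get(cp)
    return age is not None and age <= version


def inbound_verdict(resourcepart, version, policy):
    """What a receiving server does with presence from a remote occupant JID."""
    has_unknown = any(not is_assigned(ord(ch), version) for ch in resourcepart)
    if has_unknown and policy is Policy.STRICT:
        return "jid-malformed"
    return "deliver"


def outbound_verdict(resourcepart, version):
    """Option c): assumes the server always checks JIDs its own clients address."""
    return inbound_verdict(resourcepart, version, Policy.STRICT)


nick = "I\u2665\U0001F953"
for policy in Policy:
    print("6.3", policy.value, inbound_verdict(nick, (6, 3), policy))
print("6.3 private message:", outbound_verdict(nick, (6, 3)))
print("12.0 strict:", inbound_verdict(nick, (12, 0), Policy.STRICT))
```

Run as-is, the 6.3 server delivers under the liberal policy and bounces
under the strict one, and even after delivering it rejects the private
message of option c); a 12.0 server accepts the nickname either way. The
liberal policy only defers the failure.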

We also lack any way for a remote entity to advertise which Unicode
version it supports, which is unfortunate. Advertising it as a stream
feature and in the caps hash may be useful.
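
A rough sketch of the stream feature idea. Nothing like this is specified
anywhere: the element name, the urn:example:xmpp:unicode-version namespace
and the fallback to Unicode 3.2 for peers that advertise nothing are all
assumptions made for illustration. Each side advertises its Unicode
version, and a sender would validate outgoing JIDs against the older of
the two. For caps, the same value could presumably be exposed as a
disco#info feature so that the XEP-0115 hash covers it; that variant is
not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: no such stream feature exists today.
NS = "urn:example:xmpp:unicode-version"

# Assumption: a peer that advertises nothing may still be doing stringprep,
# which is pinned to Unicode 3.2.
LEGACY_VERSION = (3, 2, 0)


def make_feature(version):
    """Serialize our Unicode version as a hypothetical <stream:features/> child."""
    text = ".".join(str(part) for part in version)
    return f"<unicode-version xmlns='{NS}'>{text}</unicode-version>"


def parse_feature(features_xml):
    """Return the version a peer advertised in its features, or None."""
    root = ET.fromstring(features_xml)
    el = root.find(f"{{{NS}}}unicode-version")
    if el is None or not (el.text or "").strip():
        return None
    return tuple(int(part) for part in el.text.strip().split("."))


def effective_version(local, remote):
    """Newest Unicode version both ends can be assumed to understand."""
    return min(local, LEGACY_VERSION if remote is None else remote)


features = ("<features xmlns='http://etherx.jabber.org/streams'>"
            + make_feature((6, 3, 0)) + "</features>")
peer = parse_feature(features)
print("peer advertises:", peer)
print("validate outgoing JIDs against:", effective_version((12, 0, 0), peer))
```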

--
Waqas