Re: [xmpp] Unicode Version Interop Concerns in JIDs

Florian Schmaus <flo@geekplace.eu> Fri, 13 September 2019 07:35 UTC

To: xmpp@ietf.org
References: <dbbb91ba-9116-50f7-fefa-09ef2bd5991d@ik.nu>
From: Florian Schmaus <flo@geekplace.eu>
Message-ID: <b1eccf9b-e43b-9c38-c589-81c4a042487f@geekplace.eu>
Date: Fri, 13 Sep 2019 09:35:46 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <dbbb91ba-9116-50f7-fefa-09ef2bd5991d@ik.nu>
Content-Type: multipart/signed; micalg="pgp-sha512"; protocol="application/pgp-signature"; boundary="FUPHomxaR6T9J7DQf5qS3sge3xV4GeCYc"
Archived-At: <https://mailarchive.ietf.org/arch/msg/xmpp/Qh261Z73mEr6TS7dUvcG44CXT74>
Subject: Re: [xmpp] Unicode Version Interop Concerns in JIDs

On 10.09.19 16:38, Ralph Meijer wrote:
> Now another user comes along, using a server that supports Unicode 6.3.
> Since BACON wasn't defined before Unicode 9, its code point is
> unassigned. When receiving presence from the other user, what should the
> receiving server do?


There is a fourth option/solution:

       agility regarding the supported Unicode standard.

PRECIS libraries typically use the Unicode character predicate and
property retrieval functions (e.g., isISOControl(),
getDirectionality(), …) of their runtime environment, for example the
java.lang.Character API.
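
To make this concrete, here is a minimal sketch of the kind of lookups
meant above. CodePointInfo and describe() are names made up for this
mail; only the java.lang.Character calls are real API.

```java
import java.util.Locale;

// Sketch: the kind of java.lang.Character lookups a PRECIS library relies on.
// CodePointInfo and describe() are hypothetical names used for illustration.
final class CodePointInfo {

    /** Returns a short summary of what the runtime reports for cp. */
    static String describe(int cp) {
        return String.format(Locale.ROOT,
                "U+%04X control=%b directionality=%d type=%d",
                cp,
                Character.isISOControl(cp),
                Character.getDirectionality(cp),
                Character.getType(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe('a'));
        System.out.println(describe(0x0000));
        System.out.println(describe(0x05D0)); // HEBREW LETTER ALEF
    }
}
```

Every value in that summary comes from the tables shipped with the
runtime, not from the library calling it.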

The problem is that most runtimes only update their supported Unicode
version with a new (major) runtime release (e.g., Java 9). The key
observation here is: to support a new version of the Unicode standard,
no API changes or changes to the PRECIS library are necessary. What
usually changes is that a code point transitions from unassigned to
assigned, and with it its properties, and hence the return values of
those Unicode predicate and property retrieval functions.
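
A small sketch of that observation, using the BACON code point
(U+1F953) from Ralph's example. AssignedCheck and isAssigned() are
hypothetical names; the result of the second println depends entirely
on the Unicode version of the runtime executing it, so no expected
output is given.

```java
// Sketch: the same call can give different answers on different runtimes.
// AssignedCheck is a hypothetical helper, not part of any library.
final class AssignedCheck {

    /** True if the runtime's Unicode tables assign a character to cp. */
    static boolean isAssigned(int cp) {
        return Character.getType(cp) != Character.UNASSIGNED;
    }

    public static void main(String[] args) {
        int bacon = 0x1F953; // BACON, not defined before Unicode 9
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("U+1F953 assigned: " + isAssigned(bacon));
    }
}
```

The code is identical on both servers in Ralph's scenario; only the
data behind Character.getType() differs.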

So it would be great if runtimes provided a way to load a new
Unicode Character Database [1] without updating the runtime environment,
something like Character.loadUnicodeDb("unicode-12.1.0.dat").
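
Since no such call exists in java.lang.Character, here is a rough
sketch of what a library-level equivalent might look like.
LoadableUnicodeDb and its methods are hypothetical. The input is
assumed to be in the semicolon-separated UnicodeData.txt layout (code
point; name; General_Category; ...), and only the General_Category
field is kept.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a loadable Unicode database; not an existing API.
final class LoadableUnicodeDb {
    // code point -> General_Category abbreviation (e.g. "So", "Ll")
    private final Map<Integer, String> generalCategory = new HashMap<>();

    /** Parses UnicodeData.txt-style lines: "code;name;category;...". */
    static LoadableUnicodeDb load(List<String> lines) {
        LoadableUnicodeDb db = new LoadableUnicodeDb();
        for (String line : lines) {
            String[] fields = line.split(";", -1);
            if (fields.length < 3) continue; // not a data line
            // Simplification: "<..., First>" / "<..., Last>" range entries are
            // stored as two single code points instead of being expanded.
            db.generalCategory.put(Integer.parseInt(fields[0], 16), fields[2]);
        }
        return db;
    }

    /** Returns the loaded category, or "Cn" (unassigned) if cp is unknown. */
    String getGeneralCategory(int cp) {
        return generalCategory.getOrDefault(cp, "Cn");
    }

    boolean isAssigned(int cp) {
        return !"Cn".equals(getGeneralCategory(cp));
    }

    public static void main(String[] args) {
        LoadableUnicodeDb db = load(List.of(
                "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
                "1F953;BACON;So;0;ON;;;;;N;;;;;"));
        System.out.println(db.getGeneralCategory(0x1F953)); // So
        System.out.println(db.isAssigned(0x1F954));         // false
    }
}
```

In practice the lines could come from something like
Files.readAllLines() on a downloaded UnicodeData.txt. A real
implementation would also need the other property files of the UCD.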

Now I know that we may not have much control over what functionality
runtime environments provide. However, nothing prevents you from using
(or implementing) a Unicode library with such a feature. I am
considering implementing something like that in [2]. The whole process
could eventually be automated, which, I believe, would solve the issue
Ralph describes.
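
One possible shape for this, purely as a sketch: the PRECIS code talks
to a small interface, and the data source behind it can be swapped
without touching the caller. All type names below are hypothetical and
not taken from [2], and a real PRECIS check involves far more than the
"is every code point assigned" test shown here.

```java
import java.util.Set;

// Hypothetical names throughout; a sketch of the design, not an existing API.
interface UnicodeProperties {
    boolean isAssigned(int cp);
}

/** Backed by the runtime's built-in tables (java.lang.Character). */
final class RuntimeUnicodeProperties implements UnicodeProperties {
    public boolean isAssigned(int cp) {
        return Character.isDefined(cp);
    }
}

/** Adds code points from newer data on top of another source. */
final class OverlayUnicodeProperties implements UnicodeProperties {
    private final Set<Integer> newlyAssigned;
    private final UnicodeProperties fallback;

    OverlayUnicodeProperties(Set<Integer> newlyAssigned, UnicodeProperties fallback) {
        this.newlyAssigned = newlyAssigned;
        this.fallback = fallback;
    }

    public boolean isAssigned(int cp) {
        return newlyAssigned.contains(cp) || fallback.isAssigned(cp);
    }
}

/** Stand-in for the library code, which never changes between Unicode versions. */
final class AssignedOnlyChecker {
    private final UnicodeProperties unicode;

    AssignedOnlyChecker(UnicodeProperties unicode) {
        this.unicode = unicode;
    }

    boolean allAssigned(String s) {
        return s.codePoints().allMatch(unicode::isAssigned);
    }
}
```

An automated update would then only have to produce new data for the
UnicodeProperties implementation, for example from a freshly downloaded
UCD release.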

- Florian

1: http://www.unicode.org/reports/tr44/tr44-24.html
2: https://bitbucket.org/sco0ter/precis