Re: Possible BofF question -- I18n

Christian Huitema <huitema@huitema.net> Tue, 05 June 2018 06:34 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/CykPLCcfiwDCRHYr8xaKumvrHP0>


On 6/4/2018 10:50 PM, Barry Leiba wrote:

... a long list of questions, such as:
> - we say that "nicolas" is not equivalent to "nicolás"
> - but we say that "nicolás" *is* equivalent to "nicola´s", and we
> handle this using normalization
> - does that mean that it's OK to have "nicolas" and "nicolás" as two
> different usernames assigned to two different users?
> - if yes, how do we deal with the human interface issues involved?
> What happens if the human identified as "nicolás" uses an input
> mechanism that doesn't have a way to enter "á"?  How can he log in?
> - if no, how do we make sure (in an automated way) that we don't make
> that assignment?
> - does the answer change if "nicolás" is a domain name instead of a username?
> - does the answer change if "nicolás" is a *password*?
> - and what about "nicolàs"?  and "nicolâs"?  and "nicoläs"?
> - what about "nicolаs" (that's a Cyrillic character in the penultimate
> position)?
> - what about "nicolαs" (that's a Greek character in the penultimate position)?
> - what about other Unicode characters that look like "a", either
> exactly (as with Cyrillic) or closely (as with Greek)?
> - what about handling of "ä" vs "ae"?  Do we want to avoid assigning
> "käse" and "kaese" as distinct usernames?  Does the answer to this
> differ depending upon whether the language is German (where using "ae"
> to represent "ä" is common) or Swedish (where it is not)?
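
The distinctions in the list above can be checked mechanically. What follows
is a minimal sketch using only Python's standard unicodedata module. The
helper names same_after_nfc and strip_marks are made up for illustration, and
the second spelling in the list is assumed to mean "a" followed by U+0301
COMBINING ACUTE ACCENT.

```python
import unicodedata

plain = "nicolas"
precomposed = "nicol\u00e1s"   # "á" as one code point, U+00E1
decomposed = "nicola\u0301s"   # "a" + U+0301 COMBINING ACUTE ACCENT (assumed)
cyrillic = "nicol\u0430s"      # U+0430 CYRILLIC SMALL LETTER A
greek = "nicol\u03b1s"         # U+03B1 GREEK SMALL LETTER ALPHA


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_marks(s: str) -> str:
    """Decompose (NFD) and drop combining marks. One possible policy, not a standard."""
    nfd = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


# Normalization equates the two spellings of "á" and nothing else.
print(same_after_nfc(precomposed, decomposed))  # True
print(same_after_nfc(plain, precomposed))       # False
print(same_after_nfc(plain, cyrillic))          # False

# Stripping marks merges the accented forms but not the look-alike scripts.
print(strip_marks(precomposed) == plain)        # True
print(strip_marks(cyrillic) == plain)           # False
print(strip_marks("k\u00e4se"))                 # kase, not kaese
```

Normalization answers only the second bullet. Every other question in the
list needs a policy decision that the Unicode algorithms do not make.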

When I look at these questions, I can't help thinking that we are trying
to deal with human interface issues at the wrong layer. Or rather, that
there are some layers at which the human interface issues are paramount,
and some layers at which it is much better to deal with binary strings.

For example, if I were writing a mail UI, I would be very concerned with
the representation of names and other strings. But then I would have
tools. I can consult with interaction designers, I can run the proposed
UI designs through user panels, I can design specific UI for specific
subsets of users, I can get feedback from beta users, I can analyze the
telemetry, I can push software updates to fix my inevitable mistakes.

On the other hand, if I am writing an SMTP MTA, a DNS recursive resolver,
or a SIP server, I don't have any of those tools at my disposal. My
server is supposed to implement the specified protocol exactly. I will
only get indirect feedback from users who may not even be aware of
the server's presence. I will get telemetry about my server's
performance, but I won't be able to measure the level of befuddlement of
the users whose packets were processed.

Forty years ago, we started down a slippery slope with a basic
normalization process -- considering lower and upper case letters as
equivalent. That was probably justified by the hardware of the time,
when some devices could only produce upper case letters, something like
the Telex alphabet. But we slipped on the slope with enthusiasm,
embedding case-insensitive comparisons in all kinds of protocols, and
then attempting to extend the concept piecemeal to a variety of languages.
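
A short sketch shows how far the original ASCII rule is from a general
answer. The function names ascii_case_equal and unicode_case_equal are
hypothetical and stand for no particular protocol's comparison rule. The
sketch relies on two behaviors of Python's standard library: bytes.lower()
touches only ASCII letters, and str.casefold() applies Unicode's default,
language-independent case folding.

```python
def ascii_case_equal(a: bytes, b: bytes) -> bool:
    """Octet comparison that treats only ASCII A-Z and a-z as equivalent."""
    return a.lower() == b.lower()


def unicode_case_equal(a: str, b: str) -> bool:
    """Comparison under Unicode default case folding (no language tailoring)."""
    return a.casefold() == b.casefold()


# The classic rule works for ASCII names.
print(ascii_case_equal(b"EXAMPLE.COM", b"example.com"))   # True

# It does nothing for non-ASCII letters carried as UTF-8.
print(ascii_case_equal("NICOL\u00c1S".encode("utf-8"),
                       "nicol\u00e1s".encode("utf-8")))   # False

# Unicode folding changes string length: "ß" folds to "ss".
print(unicode_case_equal("Stra\u00dfe", "STRASSE"))       # True

# Lowercasing U+0130 (capital I with dot above) yields two code points,
# and the result a Turkish user would expect depends on locale.
print(len("\u0130".lower()))                              # 2
```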

In hindsight, that was a bad idea. It leads to an expectation that
intermediaries not only can "normalize" character strings, but are
expected to do it. Barry gives some great examples of that silliness
with variations of European alphabets, but if I understand correctly, the
same games can be played with Arabic/Persian letters or with variations
of the Chinese characters, and probably with quite a number of different
scripts.

Text comparison looks fundamentally like a human interaction engineering
issue, and a very hard one at that. I can't believe for a minute that
engineers writing code for message passing servers will deal with that
sort of problem without making a mess of it. Besides, it is not obvious
at all that there is one single right answer to these questions. So my
BofF question would not be "how to educate the engineers on the fine
points of normalizing Unicode strings", but rather, can we layer the
designs so that "the network" handles binary, and only specialized
systems handle the mapping from binary to "meaning"?
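
To make the layering concrete, here is a minimal sketch under stated
assumptions. OpaqueDirectory and ui_to_wire are hypothetical names, not part
of any existing protocol or library. The "network" side stores and compares
octet strings byte for byte. Only the edge function knows about text, and its
choice of NFC followed by UTF-8 is just one possible policy.

```python
import unicodedata


class OpaqueDirectory:
    """Server-side store: keys are octet strings, compared byte for byte."""

    def __init__(self):
        self._entries = {}  # maps bytes keys to opaque values

    def register(self, key: bytes, value: str) -> bool:
        """Add an entry; refuse only an exact byte-for-byte duplicate."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True

    def lookup(self, key: bytes):
        """Return the stored value, or None if no identical key exists."""
        return self._entries.get(key)


def ui_to_wire(text: str) -> bytes:
    """Edge policy (an assumption): normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


directory = OpaqueDirectory()
directory.register(ui_to_wire("nicol\u00e1s"), "user-1")

# The edge maps both spellings of the accented name to the same octets.
print(directory.lookup(ui_to_wire("nicola\u0301s")))       # user-1

# The server itself does no normalization: raw decomposed bytes do not match.
print(directory.lookup("nicola\u0301s".encode("utf-8")))   # None
```

Whether "nicolas" may coexist with the accented name is then decided by
whoever assigns the names, with the UI tools described earlier, and not by
the server.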

-- Christian Huitema