Re: Possible BofF question -- I18n
Nico Williams <nico@cryptonector.com> Tue, 05 June 2018 06:53 UTC
From: Nico Williams <nico@cryptonector.com>
Date: Tue, 05 Jun 2018 02:53:30 -0400
Message-ID: <CAK3OfOh8SsGgKTSKFXvgpY3Ju1=Mz+csuu9P_AMcC8KXBxfkkQ@mail.gmail.com>
Subject: Re: Possible BofF question -- I18n
To: Barry Leiba <barryleiba@computer.org>
Cc: Brian E Carpenter <brian.e.carpenter@gmail.com>, IETF general list <ietf@ietf.org>, John R Levine <johnl@taugh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/A1I9CyoVogeO6MvszupTNS3IgkE>
On Tue, Jun 5, 2018 at 1:50 AM Barry Leiba <barryleiba@computer.org> wrote:

> On Mon, Jun 4, 2018 at 11:10 PM, Nico Williams <nico@cryptonector.com>
> wrote:
> >> We're in a space where the evaluation of A==B depends on more than
> >> the bit strings A and B. Your post about form-insensitive filename
> >> comparisons is a case in point, although I don't pretend to understand
> >> it. OK, we can argue whether that's a dark art or simply complicated
> >
> > form_insensitive_strcmp(a, b) == memcmp(normalize(a), normalize(b))
> >
> > Except that actually one can greatly optimize this to avoid most of the
> > compute and memory cost of normalization.
> >
> > To see why consider comparing my first name as I usually write it
> > (Nicolas) vs. how it should be written (Nicolás). The two strings
> > should compare as not equivalent. But the two ways to write the second
> > form (with the ´ precomposed vs. decomposed) should compare as
> > equivalent (because they are).
>
> But there's one of the things that makes this a complicated topic:

I was describing a specific primitive, but I do like your taking this
further:

> - we say that "nicolas" is not equivalent to "nicolás"
> - but we say that "nicolás" *is* equivalent to "nicola´s", and we
>   handle this using normalization

Right, that is simple enough, for some value of simple. You need
normalization code (which isn't trivial); once you have it, the rest is
simple.

> - does that mean that it's OK to have "nicolas" and "nicolás" as two
>   different usernames assigned to two different users?

In filesystems there's also the question of whether to be
case-sensitive, and it can be a per-filesystem opt-in. As to usernames,
principal names, and so on, well, it's a rather subjective choice.
"Nicolas" is perfectly correct in French, and is distinct from
"Nicolás", though it can be confusing, especially if you have software
that cannot display accents...
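The form-insensitive comparison primitive quoted above can be sketched
roughly as follows. This is only an illustration, assuming Python's
standard unicodedata module; the name form_insensitive_equal is
hypothetical, the choice of NFC is an assumption, and the early-exit
check is merely the most obvious shortcut, not the optimization alluded
to in the quoted text.

```python
import unicodedata


def form_insensitive_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Return True if a and b are equal after Unicode normalization.

    Hypothetical helper: compares normalized copies of both strings,
    in the spirit of memcmp(normalize(a), normalize(b)).
    """
    # Cheap shortcut (an assumption, not the optimization from the thread):
    # identical codepoint sequences are trivially equivalent.
    if a == b:
        return True
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "nicol\u00e1s"   # "á" as a single codepoint (U+00E1)
decomposed = "nicola\u0301s"   # "a" followed by COMBINING ACUTE ACCENT

print(form_insensitive_equal(precomposed, decomposed))  # True
print(form_insensitive_equal("nicolas", precomposed))   # False
```

The two spellings of "nicolás" compare equal, while the unaccented
"nicolas" stays distinct, which matches the behaviour described above.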
Now, the obvious question is: is this something a protocol should
address by making "á" equivalent to "a" globally, or should this be
policy local to an appropriate administrative domain? Well, that's a bit
of a judgement call, but the best option is to give people the freedom
to make that choice where possible. Thus, not globally treating "a" and
its accented combinations as equivalent to each other is the better
approach.

In terms of DNS, consider a proposal to ban mixing of scripts in any one
label... But in South Korea it is common to mix Hangul with "-ing"
endings, so why should .kr not be allowed to use at least that sort of
script mixing? There are almost certainly other similar cases, and more
will arise as culture evolves! Who is in a better position than the
registries to make such a decision? Certainly NOT the IETF, not any one
participant and not the IETF collectively.

There is a big difference between form equivalence (same exact
character, two or more ways to represent it at the codepoint level) and
confusables. We can trivially (see above) deal with the former, but the
latter is going to need local policy. I really don't see a better answer
re: confusables, and I know that's not a popular opinion, but I don't
think it's wrong.

> - if yes, how do we deal with the human interface issues involved?
>   What happens if the human identified as "nicolás" uses an input
>   mechanism that doesn't have a way to enter "á"? How can he log in?

Answered above.

> - if no, how do we make sure (in an automated way) that we don't make
>   that assignment?

This one is easy: IF you really want this (I don't think we should want
this globally), decompose (normalize to NFD), then drop the combining
codepoints. This answer won't work for cross-script confusables,
naturally, which is partly why I wouldn't recommend this approach.

> - does the answer change if "nicolás" is a domain name instead of a
>   username?

Same answer!
The local authority (here: the registry) should decide this, write a
policy, and enforce it (by having the registrars implement it). (See
comments below about user-agents as vessels for local policy as well.)
I don't think we can make such a policy globally that doesn't risk
angering some local communities.

> - does the answer change if "nicolás" is a *password*?

It can. Losing some entropy in a password might be safe, but this is
simpler as a global policy rather than as local policy. It's even
simpler to tell users to only use characters they can reliably input on
all devices (this isn't as trivial as it should be, but by and large
this approach works).

> - and what about "nicolàs"? and "nicolâs"? and "nicoläs"?
> - what about "nicolаs" (that's a Cyrillic character in the penultimate
>   position)?
> - what about "nicolαs" (that's a Greek character in the penultimate
>   position)?
> - what about other Unicode characters that look like "a", either
>   exactly (as with Cyrillic) or closely (as with Greek)?
> - what about handling of "ä" vs "ae"? Do we want to avoid assigning
>   "käse" and "kaese" as distinct usernames?

Same answers as above.

> Does the answer to this
> differ depending upon whether the language is German (where using "ae"
> to represent "ä" is common) or Swedish (where it is not)?

Only if the context can let an end-user choose one (or more)
language(s). DNS, for example, cannot. A filesystem cannot either. Text
documents / word processors can (and might, especially in a search
function).

> Now extend this to the many other characters that can look similar
> (say, "n" vs "ñ" in Spanish). Extend it to other language-related
> issues ("i" vs "ı" vs "İ" vs "I" in Turkish; all the character
> variants in Arabic).

Same answers as above.

The protocols should be permissive. Local policies should be less so -
perhaps no more permissive than is absolutely necessary.

Note that a user-agent is also a place where local policy can be applied.
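The "normalize to NFD, then drop combining codepoints" folding mentioned
earlier can be sketched as below, purely to show what it does and does
not catch. This assumes Python's standard unicodedata module; the
function name drop_combining_marks is hypothetical, and treating
"combining codepoint" as general category Mn (nonspacing mark) is an
assumption.

```python
import unicodedata


def drop_combining_marks(s: str) -> str:
    """Decompose to NFD, then remove nonspacing combining marks.

    Hypothetical helper; "combining codepoint" is taken here to mean
    general category Mn, which is an assumption.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Accented Latin variants all fold to the same string.
for name in ("nicol\u00e1s", "nicol\u00e0s", "nicol\u00e2s", "nicol\u00e4s"):
    print(drop_combining_marks(name) == "nicolas")  # True each time

# Cross-script look-alikes are untouched by this folding.
cyrillic = "nicol\u0430s"  # CYRILLIC SMALL LETTER A in penultimate position
greek = "nicol\u03b1s"     # GREEK SMALL LETTER ALPHA in penultimate position
print(drop_combining_marks(cyrillic) == "nicolas")  # False
print(drop_combining_marks(greek) == "nicolas")     # False

# Language conventions such as "ä" -> "ae" are not captured either.
print(drop_combining_marks("k\u00e4se"))  # kase
```

The Cyrillic and Greek cases come through unchanged, which is the
limitation noted above: this folding handles accents, not confusables.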
In fact, there exist browser extensions to deal with confusables.

> These are only some of the reasons it's difficult. And the number of
> people who stand up and say, "oh, just <do this> and the problem is
> solved," demonstrates that too too too many people *think* they
> understand... and don't.

It's difficult because our world culture has globalized while at the
same time we are not willing to unify confusable characters. When I say
"we" here I mean mankind in all its local polities. We've tried Han
unification, and that failed as a matter of politics. We (the IETF) can
hate this all we like, but we cannot change it and should not even try.

We've talked about human rights and I18N. Some might say that getting
their characters drawn the way they want, without needing a user
context, is a human right. These are global political issues way beyond
the IETF's reach.

Nico
--