Re: Possible BofF question -- I18n

ned+ietf@mauve.mrochek.com Tue, 05 June 2018 14:24 UTC

From: ned+ietf@mauve.mrochek.com
Cc: "ietf@ietf.org Discussion" <ietf@ietf.org>
Message-id: <01QTC9HOXY0S000051@mauve.mrochek.com>
Date: Tue, 05 Jun 2018 05:33:41 -0700
Subject: Re: Possible BofF question -- I18n
In-reply-to: "Your message dated Tue, 05 Jun 2018 02:14:28 -0400" <8A1334A8-D44B-4675-8C6C-5A50643015ED@dukhovni.org>
References: <383c2404-7beb-63e9-b2b2-e75fd1b174f1@mozilla.com> <20180601041949.GH14446@localhost> <A13FFF23-49BD-459D-8B5B-D3448154EEBC@frobbit.se> <20180601151053.GI14446@localhost> <2584adb9-1622-8b49-7236-ecc7dd374974@mozilla.com> <alpine.OSX.2.21.1806011219340.7621@ary.qy> <CAK3OfOgv33SJiPJ6ypo8k5hcpnjcJdRso6EXb9b12YNcdDgMUg@mail.gmail.com> <6c5d5618-74a5-dcc8-d818-89243a41f307@gmail.com> <20180603061350.GM14446@localhost> <d125f213-c096-1e93-0a6e-ffdfc55a7ac6@gmail.com> <20180605031021.GO14446@localhost> <CAC4RtVAHd37mHFv7TypVdKATtHtBNX0pEszbn+ke5RMh-oExMA@mail.gmail.com> <8A1334A8-D44B-4675-8C6C-5A50643015ED@dukhovni.org>
To: Viktor Dukhovni <ietf-dane@dukhovni.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/HD8QzfMOPpezDZUFIOdieFi90u8>

Viktor is spot-on here. A few additional comments...

> > On Jun 5, 2018, at 1:50 AM, Barry Leiba <barryleiba@computer.org> wrote:
> >
> >> To see why consider comparing my first name as I usually write it
> >> (Nicolas) vs.  how it should be written (Nicolás).  The two strings
> >> should compare as not equivalent.  But the two ways to write the second
> >> form (with the ´ precomposed vs. decomposed) should compare as
> >> equivalent (because they are).

> The above is a simple statement about *equivalence*

> > - does that mean that it's OK to have "nicolas" and "nicolás" as two
> > different usernames assigned to two different users?

> They are not equivalent, whether assigning both is a good idea
> is an entirely separate question.

And one which is highly context dependent. The most the IETF could possibly
do in this space is document some of the tradeoffs. And I question whether
it's in the IETF's job description to undertake such work.

> > - what about handling of "ä" vs "ae"?

> They are not equivalent. This is a confusability question,
> not an equivalence question.

> >  Do we want to avoid assigning
> > "käse" and "kaese" as distinct usernames?

> Ditto.
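
To make the equivalence point concrete, here is a minimal sketch using
Python's standard unicodedata module. The helper name
canonically_equivalent is made up for illustration, and NFC is only one
reasonable choice of normalization form; nothing here says which form a
given protocol should use.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True when a and b are the same string after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


plain = "Nicolas"
precomposed = "Nicol\u00e1s"   # a-acute as the single code point U+00E1
decomposed = "Nicola\u0301s"   # "a" followed by U+0301 COMBINING ACUTE ACCENT

# The two accented spellings differ code point by code point...
print(precomposed == decomposed)
# ...but they are canonically equivalent once normalized.
print(canonically_equivalent(precomposed, decomposed))
# The unaccented spelling is a different string under any normalization form.
print(canonically_equivalent(plain, precomposed))
# Likewise "käse" and "kaese": normalization never maps one to the other.
print(canonically_equivalent("k\u00e4se", "kaese"))
```

Normalization settles only the equivalence question. Whether "nicolas" and
"nicolás", or "käse" and "kaese", should both be assignable is a policy
decision that no normalization form answers.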

> > These are only some of the reasons it's difficult.  And the number of
> > people who stand up and say, "oh, just <do this> and the problem is
> > solved," demonstrates that too too too many people *think* they
> > understand... and don't.

> Or do, but the participants are talking past each other...

More likely people are hearing what they want to - or have conditioned
themselves to - hear. A lot of times what people are actually saying is, "We
have no choice but to solve this problem, none of the solutions are ideal
because <reasons>, but I think the best we can do is to <do this>."

Somehow the first part of that is lost, leaving just the <do this> part, and
then the accusations of failing to understand - and worse - start flying.

This has been going on since I began participating in the IETF 29 years ago;
and frankly it got tiresome after the first month.

But there are three key differences between then and now.

The first is that back then there was critical work to do: Internet email was
limited to ASCII text, charsets were a balkanized disaster, etc. Solving these
problems was difficult both technically and politically, but it was clear that
the benefits far outweighed the costs.

Fast forward to today, and most of those problems are solved: MIME/EAI/IDN,
Unicode/UTF-8, etc. The benefits of working on the remaining issues are far
smaller, and as for the costs ... a previous email in this thread actually had
the unbridled temerity to haul out the old "Americans think ASCII is
sufficient" line. The memories this calls forth are not good.

The second difference is that we now have decades of experience with how these
efforts actually go to look back on. And to be blunt, the track record of the
"experts" isn't exactly stellar.  For example, 25 years ago the expert
consensus was that any solution to the charset problem had to retain all of the
balkanization - and the specific proposal was to use a profile of ISO 2022 for
inline charset selection. As for Unicode, it was regarded as a hot mess
unworthy of any real consideration.

Back in 1992, Nathaniel Borenstein and I wrote in RFC 1341 that:

            It is our hope  that  ISO  10646  or  some  other
            effort  will  eventually define a single world character set
            which can then be specified for use in Internet mail, but in
            the  advance of that definition we cannot specify the use of
            ISO  10646,  Unicode,  or  any  other  character  set  whose
            definition is, as of this writing, incomplete.

This hardly seems like a contentious statement now, but it definitely was at
the time.

Of course anyone can be wrong. But there have been some real whoppers along the
way, and this along with the absolute certainty that accompanied them makes
trust somewhat difficult.

> Natural language issues are messy, and necessitate trade-offs,
> which trade-offs to make can be the subject of much debate.

Debating trade-offs is fine, even laudable. But a significant number of these
discussions aren't debates, they start out as or devolve into little more than
a series of personal attacks.

And this brings me to the third difference. Back in 1990 my tolerance for
condescension in general and ad-homs in particular was pretty high. (Although
not infinite, as some participants no doubt recall.) These days, not so much,
in part because I've gotten old and cranky, but also because of changes in our
broader societal discourse. (Enough said on this last point.)

As a result of these differences I'm, shall we say, considerably more selective
with regard to participation. And I strongly doubt I'm alone in this.

> For example, in EAI, I find the decision to introduce non-identity
> content-transfer-encodings of composite MIME parts to be far more
> problematic than the problem it is intended to solve.  I wasn't
> around for the discussion, and probably would not have been able
> to change the outcome, but one way or the other someone would have
> had to walk away unhappy...

For me, EAI's end-to-end requirement is far and away its most problematic
aspect, and one which in practice looks like it's going to be ignored,
leading to egregiously noncompliant implementations and all that implies.

Even so, I take your point. And I wonder to what extent support for
message/global is actually going to happen, although if just-send-UTF-8 comes
to rule the day you can just treat it as message/rfc822 and be done with it.

> (an alternative would introduce
> punycode encoding of localparts and break the sacrosanct rules about
> local parts being only understood at the destination, pick which
> axioms to violate, ... at least one).

First, having now done a full EAI implementation and seen how it interoperates,
it's quite clear to me that this is the approach we should have taken. (And
I say this as someone with their name on one of the core EAI documents.)
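
For anyone who hasn't seen what an ASCII-compatible encoding of a local
part might look like, here is a rough sketch using the punycode codec in
Python's standard library. To be clear about the assumptions: the "xl--"
prefix and both function names are hypothetical, invented purely for
illustration. No standard defines such a scheme for local parts, and a
real design would also have to deal with ASCII local parts that happen to
start with the marker.

```python
PREFIX = "xl--"  # hypothetical marker; not defined by any standard


def encode_local_part(local: str) -> str:
    """Return an ASCII-only form of a local part (illustrative scheme only)."""
    if local.isascii():
        return local  # already ASCII, pass through unchanged
    return PREFIX + local.encode("punycode").decode("ascii")


def decode_local_part(local: str) -> str:
    """Invert encode_local_part for strings carrying the hypothetical prefix."""
    if local.startswith(PREFIX):
        return local[len(PREFIX):].encode("ascii").decode("punycode")
    return local


encoded = encode_local_part("nicol\u00e1s")
print(encoded)                     # ASCII-only string starting with "xl--"
print(decode_local_part(encoded))  # recovers the original local part
```

The point of the sketch is the tradeoff Viktor describes: every agent
along the path would have to agree on the encoding, which is exactly the
"only understood at the destination" rule being given up.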

Second, I hope you realize that the local part interpretation rule is
BS now and always has been. The problem is that agents outside a given
ADMD at least have to be able to compare addresses for equality. (Mailing
list managers are the obvious example of such an agent, but there are
countless others.)

And since most of these agents have elected to ignore ASCII case, anyone
who tried to take advantage of the supposed ability to have case-sensitive
local parts ends up in a world of hurt eventually.

This is going to be even more fun with EAI. Normalization of some sort is
pretty clearly needed, rule or no rule. And given the lack of specific advice
in the current standards, it's not beyond possibility that case insensitivity
will again rule the day.

And yes, this is one of those "somethings" we should be discussing.
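
As a rough illustration of the sort of comparison a third-party agent
such as a list manager might end up doing, here is a sketch in Python.
The function names are made up, and the policy shown (NFC plus Unicode
case folding on the local part, simple lowercasing of the domain) is an
assumption for the sake of the example, not something the current
standards specify. It also ignores IDNA processing of the domain entirely.

```python
import unicodedata
from typing import Tuple


def comparison_key(address: str) -> Tuple[str, str]:
    """Build a key for comparing two addresses (assumed policy, see above)."""
    local, _, domain = address.rpartition("@")
    # Normalize, case-fold, then normalize again, since folding can
    # produce a string that is no longer in NFC.
    local = unicodedata.normalize("NFC", local).casefold()
    local = unicodedata.normalize("NFC", local)
    return local, domain.lower()


def same_mailbox(a: str, b: str) -> bool:
    """True when two addresses compare equal under the assumed policy."""
    return comparison_key(a) == comparison_key(b)


# ASCII case differences are ignored, as most agents already do.
print(same_mailbox("Ned@Example.COM", "ned@example.com"))
# Precomposed vs. decomposed spellings, in different case, also match.
print(same_mailbox("Nicol\u00e1s@example.org", "NICOLA\u0301S@example.org"))
# Accented and unaccented local parts remain distinct.
print(same_mailbox("nicolas@example.org", "nicol\u00e1s@example.org"))
```

Whether that is the right policy is precisely what's open to debate; the
sketch only shows that some such rule ends up being applied outside the
destination, whatever the specifications say about local parts.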

				Ned