Re: Possible BofF question -- I18n (was: Re: Possible OBF question -- I18n)

Nico Williams <nico@cryptonector.com> Sun, 03 June 2018 06:08 UTC

Date: Sun, 03 Jun 2018 01:08:33 -0500
From: Nico Williams <nico@cryptonector.com>
To: John C Klensin <john-ietf@jck.com>
Cc: IETF general list <ietf@ietf.org>, art-ads@ietf.org
Subject: Re: Possible BofF question -- I18n (was: Re: Possible OBF question -- I18n)
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/Sotv4qqO-ALNDo8A3ek_7UEVx8s>

On Sat, Jun 02, 2018 at 10:15:51AM -0400, John C Klensin wrote:
> --On Friday, June 1, 2018 19:09 -0500 Nico Williams
> <nico@cryptonector.com> wrote:
> 
> >...
> > Please be specific.  What failures have we had?
> >...
> 
> >> * Mentions of UTR#46 in general I18N contexts may be close to
> > 
> > Finally.  900 words and this is the one specific thing, but
> > not that specific either:
> >...
> >  Is this an elliptical way of alluding to confusables issues?
> 
> No.  "Confusables" are, IMO. at most a symptom.  See below for a
> bit more.

Ok.

> >...
>  
> > Can you boil things down to a small set of bullet items?
> > 
> >  - IDNs and confusables?
> > 
> >  - I18N and human rights?
> > 
> >    (what's that got to do with Internet protocols?  sure we
> > have to make    sure they internationalize, but we already
> > knew that long before    anyone made the connection to human
> > rights)
> 
> And that is consistent with, and complementary to, what I said
> in my note to Neils.
> 
> >  - something else?  what?
> > 
> > Please also list any high-profile failures of the IETF in this
> > space.
> > 
> > Keep it brief out of consideration for your readers' time.
> >...
> 
> I'm sorry.  You are asking for details and asking that I/we be
> brief.  If we are brief, you denounce us for being non-specific.
> It is hard to satisfy contradictory requests.

No, in the previous reply you provided no specifics but were very
long-winded.  Let's see how it goes this time.

>    ----
> 
> More generally, I'm sorry but I had assumed, given your
> extensive comments and claim of having developed expertise, that
> you had been following these activities.  Without exception,
> every item on the list below has been identified on at least the
> most relevant one of the IAB i18 list (i18n-discuss@iab.org),
> the former IDNABIS, EAI, or PRECIS list, or this (IETF discuss)
> lists. Many, if not most, have had attention drawn to them on
> multiple lists in the hope of alerting people who are not
> following all of them.  There have also been extensive
> discussions on several of these issues, including lack of
> progress, in the ICANN SSAC, Universal Acceptance, ccTLD/CCNSO,
> and other contexts, and at IGF and other more political forums,
> but those are not IETF venues.   
> 
> If you, as an expert, have been unaware of those issues and

Be careful.  I never claimed to be an expert.  I said I'd learned
plenty, and I said this in the context of your and Levine's claim that
we have no experts: I explained that many of us can specialize as
needed.  You essentially deny this.  You seem to claim that you (or
someone else?) are a priest of I18N and that the IETF has no one else
who could help.

Needless to say I find that attitude offensive.  Not just because I
think I can specialize as needed, but because damn it, there had better
be lots of participants at the IETF who also can.

I had a boss once who said "we are generalists who can specialize".  I
already felt that way then, but that was a great synthesis, and I really
like that formulation.  Not every person is a generalist who can
specialize as needed, but I'm certain that many of the participants at
the IETF (and their colleagues at their places of employment) are.

> discussions and there are a significant number of other experts
> like you, then the hypothesis that we lack sufficient expertise
> may be false and should be replaced by a hypothesis that we have
> not been successful in getting a sufficient number of the
> available experts to engage.  That would probably not change the
> need to do something, but should be a much easier problem to
> solve.

I've explained that I've not been following the IETF lately.  It's
absolutely fair to ask for a list of issues you think are urgent, and
it's not OK to play gotcha and say "aha! you don't know?  you know
nothing!1!!".

Do you even read your own posts?  They drip with condescension.

Anyways, let me give you a few I18N credentials, as it were, to
demonstrate my point that I learned something useful (not that I am an
expert) in this area of which I once knew nothing.

ZFS implements form-insensitive filename comparisons.  It does this
because I insisted on it at Sun way back when.  It was my response to
the mess that HFS+ made by normalizing to NFD (well, something very
close to NFD) while OS X's input methods generally produce precomposed
codepoints.  It took some doing to get there.  I did none of the
implementation work, mind you -- just some of the code reviews and
pointed out one important optimization in the core loop of the
u8_textprep() function in Solaris/Illumos -- but I did win the argument,
and ZFS does this because of that.

(I should add that ZFS also hashes filenames, and the hash function
normalizes as it goes.  The net effect is that hash table lookups are
also form-insensitive.)
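
To make "form-insensitive" concrete, here is a minimal sketch in Python.
It is not the ZFS code: ZFS uses u8_textprep() and normalizes as it
goes, whereas this toy normalizes whole strings with the standard
unicodedata module.  The names fold(), same_name(), and Directory are
made up for illustration, and the choice of NFC as the comparison form
is an assumption; any one form would do.

```python
import unicodedata


def fold(name: str) -> str:
    """Return a comparison key that is the same for every normalization form."""
    return unicodedata.normalize("NFC", name)


def same_name(a: str, b: str) -> bool:
    """Form-insensitive equality: compare keys, leave the inputs untouched."""
    return fold(a) == fold(b)


class Directory:
    """Toy directory: form-insensitive lookup, form-preserving storage."""

    def __init__(self) -> None:
        # The dict hashes the folded key, so hash lookups are form-insensitive too.
        self._entries = {}  # folded key -> (name as created, value)

    def create(self, name: str, value: object) -> None:
        key = fold(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, value)

    def lookup(self, name: str) -> object:
        return self._entries[fold(name)][1]

    def names(self) -> list:
        return [stored for stored, _ in self._entries.values()]


precomposed = "caf\u00e9"   # U+00E9, what input methods tend to produce
decomposed = "cafe\u0301"   # 'e' + U+0301, the decomposed spelling

d = Directory()
d.create(decomposed, "contents")
```

Here d.lookup(precomposed) finds the entry created under the decomposed
name, and d.names() still returns the name exactly as it was created.
Nothing gets rewritten on disk; only comparison and hashing ignore form.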

I've pushed form-insensitive comparison in the IETF, though here no one
wants to hear about it.  IIRC the NFSv4 WG ignored my pleas on this.
I'm quite certain that you've read some such posts from me, and you
could easily find them if you wish.

Of course, form-insensitivity isn't always the right or even a workable
answer.  In some cases normalizing early is better, or the only option.
And often that's a good enough answer, but for filesystems it really is
better to do form-insensitive string comparison.

It's very annoying that we try to specify normalization in NFSv4,
WebDAV, ... when the functionality needs to be deep down in the
filesystem -- NFSv4, WebDAV, and many other protocols are just
protocols, and their implementations are almost never so closely bound
to a filesystem implementation that specifying what to do about
normalization in the protocol spec could possibly be the right thing to
do.  I think it is a serious design mistake to require normalization at
the remote filesystem protocol level rather than to do so at the
filesystem layer.

A second credential would be some old blog posts I've written about this
topic:

http://cryptonector.com/2006/12/filesystem-i18n/
http://cryptonector.com/2010/04/on-unicode-normalization-or-why-normalization-insensitivity-should-be-rule/

I won't bother digging up my many posts on IETF lists on these topics.
You can look for them yourself.

> In any event, and with the understanding that this is almost
> certainly not a complete list, we'd had:

Yes, with that unnecessary, insulting, and pointless aside of yours out
of the way, let's get to the list!  I'm itching.

> * Identification of an incorrect assumption, made in both the
> IDNA2003 and IDNA2008 designs, about the scope and effectiveness
> of normalization in Unicode, creating security risks as well as
> heightened risk for user-perceived errors in comparisons.  That
> issue was initially identified differently, with the original
> identification turning out to be a subset or symptom of the
> larger one.   The problem generated two IAB statements, one of
> which essentially acknowledges that no progress had been made, a
> BOF, and a few iterations of an I-D which we have not been able
> to progress.

OK.  Sounds important.  The BoF should link to these IAB statements.

> * The freeze Patrik identified, which is, in some respects, just
> another symptom of our inability to engage on the above issue.

Slow review/publication process?

> * A draft clarifying the responsibilities of zone administrators
> ("registries") under IDNA2008 which we have been unable to
> process.

Yes, *this* I think is urgent.  But: who should do this?

ICANN can require that registries set out some policies.  Registries can
require that registrars follow them.  The IETF cannot do any of that --
we can only give them advice.

ISTR participating in threads here where you (I think) said that the
registries don't want to hear anything about confusables from us.  Is
that still the case?  Has there been any change?  Correct me if my
memory is wrong.

> * A draft proposing dealing with a number of issues by creating
> a "troublesome characters" list, which we have not been able to
> process, even to the extent of a serious discussion of whether
> such a list under IETF auspices is desirable and can be
> maintained.

Yeah, I don't think the IETF is the right place for this.  Why would it
be?  No, we can give the Unicode Consortium some feedback, but this is
squarely their job.  Even if they don't want to do it, we can't quite do
it for them.

> * Difficulty bringing the EAI/SMTPUTF8 and PRECIS work to
> conclusion with work about which everyone has high confidence
> due to the WGs running out of energy and active, informed, and
> contributing, participation.
> 
> * Difficulty getting adequate input and review for work being
> done in LAMPS on non-ASCII characters in X.509 certificates.

ISTM that the way you get this done is by having shepherds and/or ADs go
look for experts they can pester into doing these reviews.

I've been asked before to review things like SASLprep (which I did).

This seems like a failure of the shepherds and ADs, not a systemic
failure.  But if it is systemic, then the obvious thing to do is to make
participants pay a bit of a cost of doing business.  Want your documents
reviewed?  Then review or find someone to review these others.  The IETF
already does this informally, and it should work here.

> * The "many IDNA standards" problem Patrik and I mentioned
> earlier in the context of UTR#46.   Given the variety of
> implementations and interpretations, it seems clear that the
> IAB/IETF should be considering how to get the message out, but
> it has not been possible to even start that discussion.  For
> this issue and the next one, it is possible to claim that the
> IETF's responsibility stops when a specification is put out
> there and dealing with variations, alternate interpretations,
> and plain non-conformance is someone else's problem.   I see
> that as an abdication of responsibility, especially when IETF
> may not have done a good enough job of explaining the reasons
> for its decisions.  YMMD.

We don't have a protocol police.  I'm not at all sure what you have in
mind.

Perhaps the ISOC should fund a paratrooper team to help out with fixing
[open source] implementation issues.  Maybe they can develop the cachet
necessary to get non-open source implementors to pay attention to them.
I think that's a great idea.  But short of that, I don't see what we can
do here but cajole people.

> * Similarly the EAI WG, after running experiments after its
> first-round work, came to the conclusion that attempts to
> convert local parts in transit (similar to IDNA), especially
> using Punycode encoding, was risky, unlikely to work well in
> many important cases, and just plain a bad idea.  Yet we've seen
> heavily-promoted implementations that do just that, claiming
> that it works well because their clients in users interact well
> with their servers and mail stores.   We could be doing a better
> job of explaining why that is a bad idea.

Maybe they aren't wrong.  Maybe we are.

> * As another element of the UTR#46-related problems, IDNA2008's
> rule structure prohibits the use of symbols in domain name
> labels.   We've seen some top-level registries violate that rule
> and sell names containing some of those code points,
> particularly emoji, and UTR#46 specifically authorized emoji.

"Meh"

See comments above about protocol police and our lack of authority over
the registries, or, really, anything other than the RFC-Editor and IANA.

(Do I like the idea of emoji in domainnames?  Not really, but I'm not sure
I care.  What difference does it make to me?  None.  Sure, I can
recognize domainnames by their form, and that gets harder when people
use emojis in their domainnames -- but that's also true if they just use
Chinese characters.  So I don't care.  It's very likely that this is the
calculus that's leading to this rule breaking.)
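
For anyone who wants to see where emoji and Chinese characters part ways
under the IDNA2008 rules, here is a rough Python sketch.  It checks only
the general-category set that RFC 5892 calls LetterDigits.  It ignores
the exceptions, contextual rules, and stability checks in the real
derivation (so it would flag a hyphen, for instance), and the function
name outside_letter_digits() is made up for illustration.

```python
import unicodedata

# General categories that RFC 5892 groups as "LetterDigits".
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def outside_letter_digits(label: str) -> list:
    """List the codepoints in a label whose category is not a letter, mark, or digit."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) not in LETTER_DIGIT_CATEGORIES
    ]


han_label = "\u4e2d\u6587"      # two Han characters, category Lo
emoji_label = "a\U0001F600"     # 'a' plus an emoji, category So
```

outside_letter_digits(han_label) comes back empty, while
outside_letter_digits(emoji_label) reports U+1F600.  So the rule does
draw a line between the two cases even if, to a reader who can't read
either, they look equally opaque.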

> Whether they do so or not, ICANN can put some pressure on
> second-level registrations but,  by design and their own
> decisions, cannot say very much about zones further down the
> tree.  Should IETF or IAB be following up some of the
> explanations that have been made in ICANN or assorted mailing
> list with a document explaining more broadly why the IDNA2008
> rules are the way they are (including the rules for lookup-time
> checking, which UTR#46 has been interpreted as discouraging) and
> why people should be paying attention to them?  I don't know,
> but believe that the discussion has not occurred and that, if
> work were done on developing such a description, there would be
> no way to discuss and progress it is a problem.

If the market is ignoring our rules, then simply complaining about that
isn't going to solve anything.  I think we may need to re-open the
specific issues in IDNA.

This doesn't strike me as urgent, honestly, but I suppose we can differ
on this.

> Again, not a complete list.   I suggest that each of those is a

Sure.  A small-but-incomplete list is a lot better than just some
handwaving.  And it's much better than saying that not knowing that list
makes one an ignoramus.

> significant problem and that the combination is more so, at

I thought the word urgent was used.  Now we're down to "significant"?

> least unless one doesn't believe that addressing i18n usability,
> security, and interoperability issues is important.  I also

Yea, I think so too.  But note that you've not given examples of
breakage.  You've only handwaved a lot.

> suggest that many of those issues should be considered important
> enough to justify the discussion proposed for this BOF even if
> none of the others were out there, which I think is Patrik's
> point.   YMMD.

I don't object to the BoF.  I wanted a list of specific items to
discuss.

> Patrik has addressed your category list question.  I think he is
> right although the more detailed list above should fill in some
> of the blanks on the protocols side.   I've explained my
> position on the human rights side.   And, btw, I think the
> confusion issue is actually two separate sets of problems
> depending on whether the confusion is deliberate or not.
> 
> I've read your comments about confusion.  In the interest of not
> making this note even longer and because I don't want to stray
> into trying to solve specific problems, I'm not going to respond
> except to suggest that there is ample evidence that parts of
> your analysis are just incorrect.

No, please do respond to that.  Feel free to start a new thread.  That's
a very specific topic that is ripe for a deep-dive.

Nico
--