Re: [idn] ietf london idn wg meeting minutes

"James Seng/Personal" <James@Seng.cc> Wed, 22 August 2001 04:04 UTC

Message-ID: <049001c12abb$c80c8840$dd00a8c0@jamessonyvaio>
From: James Seng/Personal <James@Seng.cc>
To: idn@ops.ietf.org, Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>
References: <5.1.0.14.1.20010821122941.03451718@127.0.0.1>
Subject: Re: [idn] ietf london idn wg meeting minutes
Date: Wed, 22 Aug 2001 11:37:34 +0800

One note:

The minutes are a combination of notes from David Lawrence and Donald E.
Eastlake. Thanks for being the WG scribes.

-James Seng

----- Original Message -----
From: "Marc Blanchet" <Marc.Blanchet@viagenie.qc.ca>
To: <idn@ops.ietf.org>
Sent: Wednesday, August 22, 2001 2:17 AM
Subject: [idn] ietf london idn wg meeting minutes


> included is the first version of the idn wg meeting minutes.
> please send editorial comments to me, substantive comments to the
> mailing list.
>
> Marc.
>
> ===========================================
>
> IETF IDN Working group session
> 9 August 2001
> London, England
>
> Agenda Bashing:
>
> Agreement on the floor to cut
>   Reordering, nameprep update, Uname proposal, hangeulchar, tsconv
> from planned agenda.
>
> ================================================================
>
> WG UPDATE, Marc Blanchet
>
> Coordination with other groups/efforts:
> - IETF apps area
>     - "requirements" for encoding: ACE or UTF8
>     - directory efforts: directory@apps.ietf.org
> - Unicode/ISO
>     - Any modifications to Unicode/ISO tables should be done by those
>     parties, not IETF
> - IETF dnsext WG
>     - Any modification to DNS protocol should be discussed in dnsext
> - ICANN/IANA
>     - Policies
>
> - Pool W: the pool of documents the WG has identified as its core interest
> - Currently:
>     - requirements
>       co-chairs believe there is WG rough consensus and intend to
>       forward it to the IESG for Informational.
>     - idna
>     - nameprep
>     - dude
>     - aceid
>     - jpchar
>     - ace-eval-jp
>     - mace
>     - uname
>     - tsconv
>     - udns
>     - amc-ace-z
>     - hangeulchar
>     - lsb-ace
> - Today's focus is on standards track proposals
>
> ================================================================
>
> ACE EVALUATION WITH IDNs ALREADY REGISTERED, Yoshiro Yoneya
>
> - Done by CNNIC, KRNIC, TWNIC and JPNIC with data they have for
>    registered domain names, focusing on ACEs in Pool W.
> - The most important evaluation criterion is maximizing the number
>    of characters that fit in a label; raw speed is less important
>    because nameprep is the slow stage.
> - Long IDNs (more than 15 Han characters) are already registered.
> - Evaluated ACEs: DUDE, AMC-ACE-Z, MACE and RACE
> - Focus on DUDE and AMC-ACE-Z, with MACE and RACE as reference
>
> Graphs of encoding efficiency for the domain names from each of KRNIC
> and TWNIC, where AMC-ACE-Z shows the best compression.
>
> Charts showing that the four NICs consider AMC-ACE-Z to be either good
> or very good, while the others were rated "bad" or "very bad" by at
> least one NIC.
>
> MACE co-authors (including the presenter, Yoneya-san) support
> AMC-ACE-Z.
>
> Recommendation from the study is: AMC-ACE-Z
>
> ================
> WG Questions for sense of the group:
>
> Question: If there is a need for an ACE, choose one:
> - DUDE                                  few hands
> - AMC-ACE-Z                             most hands
> - MACE                                  (removed at authors' request)
> - don't care but want an ACE chosen     fair bunch of hands
>
> Erik Nordmark: The question is: if you use an ACE, this is the one.
> It does not say you need to use an ACE anywhere.
>
> ?: What is re-ordering?
>
> James Seng: pre-processing to make more frequently used chars more
> compressed.
>
> Paul Hoffman: This is not a binding vote. It should be confirmed on the mailing list.
>
> Consensus: AMC-ACE-Z (with many "don't care, as long as one is chosen")
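To illustrate what "choosing an ACE" means for a single label: AMC-ACE-Z is the draft that was later published as Punycode (RFC 3492), and Python's standard library ships a `punycode` codec implementing the RFC version. The sketch below assumes that codec is close enough to the 2001 draft for illustration; it does not apply nameprep, the `xn--` prefix is the one later assigned for IDNA (RFC 3490) rather than anything decided at this meeting, and `to_ace`/`from_ace` are hypothetical helper names.

```python
# Sketch only: RFC 3492 Punycode (the published form of AMC-ACE-Z) via
# Python's built-in codec. Nameprep is NOT applied here.
ACE_PREFIX = "xn--"  # prefix later assigned for IDNA; an assumption for this sketch
MAX_LABEL_OCTETS = 63  # RFC 1035 limit on one DNS label


def to_ace(label: str) -> str:
    """Hypothetical helper: encode one label as prefix + Punycode."""
    if label.isascii():
        return label  # pure-ASCII labels are left alone
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Hypothetical helper: reverse to_ace for prefixed labels."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


for name in ["bücher", "中文域名", "한국어"]:
    ace = to_ace(name)
    print(name, ace, len(ace), "fits" if len(ace) <= MAX_LABEL_OCTETS else "too long")
```

The ACE output is plain ASCII, which is why its length (not the number of Unicode characters) is what the NICs' compression comparison measured against the label limit.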
>
> Should we do reordering?
> - Yes                                   some hands
> - No                                    some hands
> No clear result of poll.
>
> Erik Nordmark: A lot fewer people participated in this poll than in the
> previous one.  Why?
>
> Bill Manning: We read the draft but didn't understand it, and need to
> read it again.
>
> Paul Hoffman: I don't understand the re-ordering draft. It does not
> generalize to other scripts.
>
> Larry Masinter: Re-ordering adds complexity.
>
> Kilnam Chon: Re-ordering is critical for CJK but adds complexity.
>
> Paul Hoffman: This draft adds complexity, so perhaps people are waiting
> to decide how to judge whether the added complexity is worth it.
>
> Eric Chen: This is just intended to help CJK. Most of the interest
> is in CJK. Why not?
>
> James Seng: What I'm hearing is that the authors should do a
> cost/benefit analysis, but it is clear the draft is not ready to move
> forward.
>
> Erik Nordmark: Can someone write a pro/con analysis draft, or have one
> person write the pro side and another the con side, to help drive the
> discussion on the mailing list?
>
> Paul Hoffman: Let's make Adam [Costello] do it. [laughter]
>
> Kilnam Chon: This straw poll process isn't really valid because there
> isn't enough representation from the people for whom this is really
> important. There's always a trade-off.
>
> James Seng: Could someone who voted against the lsb draft just explain
> why you are against it?
>
> Paul: I'd rather someone else did, but I will ... the reordering draft
> is somewhat of a hack to optimize for certain scripts, but it is at
> the cost of other scripts, isn't really generalized, and there has
> been no analysis of how beneficial it is for DUDE and AMC-ACE-Z.
>
> Dongman Lee: The author was not trying to propose this as a
> generalized mechanism.  Since CJK is driving internationalization,
> it is not surprising that proposals would be specific to it.
>
> Ted Hardie: As Paul pointed out, this has different effects on
> different scripts, but now that we are focused on one ACE we can ask
> more specifically for the authors to focus on just how it affects
> AMC-ACE-Z.
>
> Consensus: discuss reordering on the mailing list and request the
>   authors of the ACE and reordering drafts to come up with a proposal
>   backed by analysis.
>
> ================================================================
>
> MATCHING (NAMEPREP)
>
> - Need for a standardized pre-processing step regardless of what IDN
>     protocol we choose?
>     Yes      lot of hands
>     No       one hand
>
> (Discussion clarified the question from the original.)
>
> Other comments:
>
> Patrik Faltstrom: It doesn't preclude other pre-processing before it,
> as some people have worried it would.  But even so, the IETF really
> needs to have one standard way of processing Unicode.
>
> James Seng: When you say one standard way, do you mean one with
> flexibility for locale, or essentially fixed?
>
> Patrik Faltstrom: Essentially fixed.
>
> Dave Crocker: I thank Patrik for his comments that helped clarify
> things for me.  I used to be resistant to it, but am coming to accept
> it.  It is quite a bit like the case-insensitive/sensitive thing we're
> so used to in ASCII.  There are two processes here: case-mapping and
> determining the legal character set.  Keep them cleanly separate.
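Dave's split between case-mapping and determining the legal character set is the shape the nameprep profile eventually took (RFC 3491, built on stringprep, RFC 3454): map, normalize, then prohibit. The sketch below assumes Python's standard `stringprep` module tables and `unicodedata.ucd_3_2_0` as stand-ins for those steps; `nameprep_sketch` is a hypothetical name, and it deliberately omits the bidi check and unassigned-code-point handling, so it is not a conforming implementation.

```python
import stringprep
import unicodedata

# Prohibited-output checks from stringprep, as listed for nameprep.
_PROHIBITED = (
    stringprep.in_table_c12,  # non-ASCII space characters
    stringprep.in_table_c22,  # non-ASCII control characters
    stringprep.in_table_c3,   # private use
    stringprep.in_table_c4,   # non-character code points
    stringprep.in_table_c5,   # surrogate codes
    stringprep.in_table_c6,   # inappropriate for plain text
    stringprep.in_table_c7,   # inappropriate for canonical representation
    stringprep.in_table_c8,   # change display properties or deprecated
    stringprep.in_table_c9,   # tagging characters
)


def nameprep_sketch(label: str) -> str:
    """Hypothetical, simplified nameprep: mapping, NFKC, prohibition."""
    # Step 1 (case-mapping side): drop characters "mapped to nothing"
    # (table B.1) and case-fold the rest (table B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Step 2: NFKC normalization against Unicode 3.2, the version stringprep uses.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # Step 3 (legal-character-set side): reject prohibited output.
    for ch in normalized:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    # Omitted: bidi rules and unassigned code point checks.
    return normalized


print(nameprep_sketch("Bücher"))    # case-folded
print(nameprep_sketch("ＡＢＣ"))     # fullwidth letters folded and normalized
```

Keeping the two stages in separate functions or tables, as here, is one way to honour the "keep them cleanly separate" point: the mapping step can change without touching the prohibition step.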
>
> ? Russell:
>
> Wenhui Zhang: We should have a standard that defines where local issues
> can be handled, which can include their standardized pre-processing.
>
> ?: The goal of the working group is noble, but it is trying to kill all
> the birds with one stone, so we need a really large stone.  Many
> legacy systems are optimized for their local languages and will face
> a lot of pain switching to what is being planned.  Those who are going
> to suffer most don't have much of a voice here.
>
> ?: Look into what happened in the LDAP group, how they ended up with a
> bunch of language-specific things.  It is difficult, but it can be
> done, and since it has already been solved, build on it.
>
> Erik Nordmark: Can we get back on the topic of this question?  We seem
> to be wandering into the general requirements area.
>
> POLL: Many to 1 in favour of standard pre-processing step.
>
> Post poll:
>
> John Klensin: I can agree that a standard pre-processing step is
> needed, but I can't agree if that necessarily means having a single
> binary result even in ambiguous situations.  I am very concerned about
> that.  This working group might end up producing something that is
> totally irrelevant.
>
> Eric Brunner-Williams: The ambiguity need not exist in "uniprep" (the
> first of the stages observed by Dave Crocker), the problem arises in
> the other part.
>
> Paul: I think we should now work toward an architecture that includes
> pre-nameprep, nameprep and post-nameprep.  The middle one can be
> generally standardized while the other stages need not be.
>
> Erik Nordmark: Addressing John's concern of irrelevance, I can see how
> this work would eventually be superseded by something better, but that
> doesn't mean we have to stop doing this very useful work now.
>
> Dave Crocker: Dealing with "language" is out of scope for this group;
> this working group should just be about expanding the set of strings
> that are usable as domain names.  In that context canonicalization makes
> a lot of sense, but not when we start talking about natural language.
>
> Ted Hardie: I have to take exception to Paul describing a system that
> is not standardized end-to-end; it can't include processing that is not
> standardized.  I also agree with Dave that we can't work with natural
> language; we don't have the expertise.
>
> ?: Rigorously avoid natural languages.
>
> Eric Chen: We need to consider natural language!
>
> Dave Crocker: The scope is very narrow and does not include languages.
>
> Harald: "Yes."
>
> Paul Hoffman: Please defer all questions of language; there will be a
> draft soon that addresses where they should be handled.
>
> Next step will be for the authors to clarify the relation between
> the various proposals for processing into a cohesive architecture,
> namely nameprep, tsconv, jpchar, hangeulchar.
>
> ================================================================
>
> PROTOCOL PROPOSALS, Dave Crocker
>
> Dave's Disclaimers:
> - System oriented person
> - Not a Unicode expert, or even naif
> - Entirely biased -- wanted to be objective, but failed
>
> IDN Task:
> - Enhance range of domain names that are useful
> - Not human "name"
> - Not "language"
> - Has no sets
> - Requires: fairness, efficiency, reliability, transition, ...
>
> The Usual Suspects:
>       Encoding            Approach
> 1. ACE only             IDNA
> 2. UTF-8 only           IDNA-mod, uDNS
> 3. ACE then UTF-8       IDNA-mod, uDNS
> 4. ACE & UTF-8 both     uDNS, uNAME
> 5. Anything goes        uNAME
>
> Encoding efficiency:
> - ACE is an encoding scheme
> - UTF-8 is an encoding scheme
> - Both map many bits to a variable length string
> - All variable length strings are unfair to some people
> - Fair vs unfair unfairness:
>     - longer mappings mean shorter names
>     - shorter names restricted to information dense character sets
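The "longer mappings mean shorter names" point can be made concrete with the DNS 63-octet label limit (RFC 1035). This minimal sketch covers raw UTF-8 only; ACE output lengths depend on the whole label and are not computed here, and the sample characters are arbitrary choices for illustration.

```python
MAX_LABEL_OCTETS = 63  # RFC 1035 limit on one DNS label


def utf8_octets(text: str) -> int:
    """Number of octets UTF-8 uses for the given text."""
    return len(text.encode("utf-8"))


def max_chars_utf8(ch: str) -> int:
    """How many copies of a single character fit in one label as raw UTF-8."""
    return MAX_LABEL_OCTETS // utf8_octets(ch)


for sample in ["a", "é", "中", "한"]:
    print(f"U+{ord(sample):04X}: {utf8_octets(sample)} octets/char, "
          f"{max_chars_utf8(sample)} chars/label")
```

ASCII letters take one octet each (63 per label), Latin-1 letters such as "é" take two (31 per label), and Han or Hangul syllables take three (21 per label): the same fixed budget buys fewer characters for scripts whose code points encode longer.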
>
> Encoding comparison:
> 1. ACE is three minuses bad.
> 2. UTF-8  is two minuses bad.
>
> Charts showing that there are a lot of modules in both systems, and we
> have to worry about all the modules in both systems.
>
> ACE has an extraordinarily minimal amount of change necessary to make
> an IDN useful, just two applications.  This is about as good a
> transition scheme as you can possibly get.
>
> UTF-8 is an extreme in the opposite direction, it requires that
> everything work end-to-end.
>
> 1. ACE only four pluses good
> 2. UTF-8 only five minuses bad
>
> Transition Interactions:
> -------------------------------------------------------------------------
>                     Client->   Server->
>                     Server     Client     ACE           UTF-8
> -------------------------------------------------------------------------
> 1. old client,      old dn     new dn     transparent   UTF-8 and ACE
>    new server                                           maybe break client
> -------------------------------------------------------------------------
> 2. new client,      new dn     old dn     transparent   break server?
>    old server
> -------------------------------------------------------------------------
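The "transparent" entries for ACE follow from the fact that an ACE label is ordinary host-name syntax (letters, digits, hyphen; RFC 1123), so old software passes it through untouched, while a raw UTF-8 label is not. The DNS protocol itself can carry 8-bit labels (RFC 2181), so any breakage comes from software that enforces host-name syntax. In this sketch `is_ldh` and `ace_label` are hypothetical names, and the ACE form assumes Python's RFC 3492 `punycode` codec plus the later `xn--` prefix.

```python
import re

# RFC 1123 host-name label ("LDH"): ASCII letters, digits and hyphen,
# 1-63 characters, not starting or ending with a hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh(label: str) -> bool:
    """True if the label is acceptable to software enforcing host-name syntax."""
    return _LDH_LABEL.fullmatch(label) is not None


def ace_label(label: str) -> str:
    """Hypothetical helper: Punycode plus the "xn--" prefix.

    Only meaningful for labels containing non-ASCII characters.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


for name in ["bücher", "中文"]:
    print(name, "raw:", is_ldh(name), "ace:", is_ldh(ace_label(name)))
```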
>
> Specification comparison:
>
> -------------------------------------------------------------------------
>                 Efficiency         Transition               Risk/Operational
>                                                             Expense
> -------------------------------------------------------------------------
> IDNA (ACE)      bad (data)         automatic                none
> -------------------------------------------------------------------------
> uDNS (UTF-8)    poor (data)        how to detect when to    high
>                                    use ACE? (poorly
>                                    defined and not
>                                    realistic)
> -------------------------------------------------------------------------
> uName (both)    bad (round trip)   unstated (and based on   very, very high
>                                    CNRP, with no
>                                    meaningful deployment)
> -------------------------------------------------------------------------
>
> Olafur: Hard for me to say this to you Dave, given our history, but
> good job.
>
> Harald: I think you underestimate the cost of ACE a bit, in that leakage
> will confuse users.  UTF-8 leakage will also confuse users, though,
> likely even more!  But the ranking is still good.
>
> Paul: uName doesn't actually have CNRP in it; it was put in the draft
> and then explicitly shot down in the draft.  It uses a new RR, but the
> end result is pretty much the same as far as your conclusions go.
>
> Erik Nordmark: Can we vote on it without a UTF-8 draft in the pool?
> Would need a draft very fast.
>
> Poll:
> - idna?
>     Yes   Most
>     No    Some
> - udns?
>     Yes   Few
>     No    Most
> - uname?
>     Yes   Few
>     No    Most
>
> Harald and Marc interpreted the result as follows: IDNA was the only
> strongly supported proposal in the room, and the other two had
> strong opposition. The floor agreed with this interpretation.
>
> Nameprep discussion resumed (some time remaining)
>
> Paul Hoffman: Good (in a marketing sense) user interfaces will do a
> lot of mucking with input.  We really should define how and where they
> can do that. If you change machines, different local translation
> tables can yield different names.
>
> James Seng: It can be very hard to determine which local conversion
> option to turn on. I'm not sure this WG has the capability to deal
> with code point matching. We need to reference code points defined
> outside the IETF, at the Unicode Consortium.
>
> Paul Hoffman: Unicode has put mapping tables out of scope.
>
> Harald Alvestrand: This working group is about internationalized access
> to domain names, not localized access.  This group is trying to specify
> what a client must do no matter where it is in the world.  I would
> accept a statement of the relationship between the pre-processing
> drafts.  It has to be made mandatory, though, or it should not be part
> of the output of this group.
>
> Wenhui Zheng: The IDNA draft should be explicit about where the local
> interface/mapping should be done.
>
> Eric Chen: We have built a house and opened some gates but not others.
> Some languages can come in and others can not. IDNA should open its
> gate to allow other languages to do their thing.
>
> ================================================================
> NEXT STEPS
>
> - AMC-ACE-Z as chosen ACE.
> - Reordering to be discussed on mailing list.
> - relation between nameprep/tsconv/hangeulchar/jpchar/stringprep
>     to be consolidated into one architecture.
> - Go forward with IDNA.
>
>
>