Re: Volunteer needed to serve as IANA charset reviewer

John C Klensin <john-ietf@jck.com> Thu, 07 September 2006 13:59 UTC

Date: Thu, 07 Sep 2006 09:58:49 -0400
From: John C Klensin <john-ietf@jck.com>
To: Ned Freed <ned.freed@mrochek.com>
Subject: Re: Volunteer needed to serve as IANA charset reviewer
Message-ID: <9401EE90FE7BD970E71C9285@p3.JCK.COM>
In-Reply-To: <01M6VLE70BJ60008CX@mauve.mrochek.com>
References: <p06240600c124bdb12d16@[10.0.1.2]> <BDA09F0B9086491428F8F2FC@p3.JCK.COM> <01M6VLE70BJ60008CX@mauve.mrochek.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc: Ted Hardie <hardie@qualcomm.com>, discuss@apps.ietf.org, ietf-charsets@iana.org
Precedence: list
Errors-To: discuss-bounces@apps.ietf.org

Ned,

Several observations...

The first is that my note was intended to ask "is it time to review
RFC 2978 and the definition of the charset reviewer job?"  Just
a question.  I had no expectation of discontinuing the current
registry, nor any realistic one of banning future registrations.
I think your comments, Mark's, and those of others are
consistent with my goal in asking the question.  What should be
done is another matter -- see below.

Second, while I agree with your concern about GB 18030 and its
ilk, what I learned in trying to put a network-Unicode
definition together (see draft-klensin-net-utf8-01.txt) is that,
for practical use, just specifying "UTF-8" may not be good
enough either.  For example, for at least most purposes other
than pure rendering, one probably wants to specify the
normalization form (ideally a "stable" one(++)) for text going
on the wire, so "Unicode, in Stable NFC, encoded in UTF-8" is
probably the level of specification we are looking for, not
"UTF-8".   I deliberately said "Unicode" in my note, not because
I thought it was adequate, but because I was certain that it
would expose this issue if we got this far.
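To make the normalization point concrete: the same visible text can go
out as different octet sequences under plain "UTF-8", depending on
whether it is precomposed or decomposed. The short Python sketch below
uses the standard unicodedata module. It applies ordinary NFC, because
Python offers no "Stable NFC", and wire_form is a hypothetical helper
name used only for illustration.

```python
import unicodedata

def wire_form(text: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as UTF-8.

    Approximates "Unicode, in NFC, encoded in UTF-8"; this is standard NFC,
    not the proposed stable variant.
    """
    return unicodedata.normalize("NFC", text).encode("utf-8")

precomposed = "caf\u00e9"   # 'e-acute' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Both strings render the same, but plain UTF-8 yields different octets.
print(precomposed.encode("utf-8"))   # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'cafe\xcc\x81'

# Once both are normalized to NFC, the octet sequences agree.
print(wire_form(precomposed) == wire_form(decomposed))  # True
```

A receiver comparing the raw octets from the first two lines would treat
them as different strings. That is why a bare "UTF-8" label
under-specifies what goes on the wire.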

If we really need to be pushing toward a specific encoding and
either the required specification of the normalization applied
or, preferably, a specific normalization, then RFC 2978 isn't
our only issue -- we need to review, and possibly reopen, RFCs
2277 and 3629, and might need to look at some other
specifications as well.  Realizing this was what caused me to
put the network-Unicode draft temporarily on hold.

I am delighted that you would be willing to take this on -- I
think you have just exactly the right combination of skill and
experience with both character sets and Internet applications
protocols.

Your ability to do the currently-defined job, or a slightly
different one, is largely independent of whether the
specifications for new additions to the registry are what we
should have today.  Clearly, the registry serves the purpose of
reducing the odds of the same name being used, inadvertently, to
describe different things, and that is a benefit in itself.  Mark
suggests that the definitions are neither consistent enough nor of
high enough quality to be used for anything else.  I think we
need to figure out what we need (does the current quality of
registrations meet your criteria for "accurately and
consistently"?) and then respecify things so that we get it on
future registrations (and maybe ask IANA to send out requests
for clarification about relevant existing ones).  Certainly your
notion of overhauling the current registry is consistent with
this... it even goes beyond what I had hoped there was energy
for.

You wrote...

> The plain fact of the matter is that we have done a miserable
> job of producing an accurate and useful charset registry, and
> considerable work needs to be done both to register various
> missing charsets as well as to clean up the existing registry,
> which contains many errors. I've seen no interest whatsoever in
> registering new charsets for new protocols, so to my mind
> pushing back on, say, the recent registration of iso-8859-11,
> is an overreaction to a non-problem. [**]

Speaking personally, we are in complete agreement.  

> Well, I have to say that to the extent we've pushed back on
> registrations, what we've ended up with is ad-hoc mess of
> unregistered usage. I am therefore quite skeptical of any
> belief that pushing back on registrations is a useful tactic.

Also agree, regardless of what my note appeared to say (in the
interest of opening up exactly this discussion).
 
    john

++ For those who have not been following that particular piece
of work, the Unicode Consortium now has a proposal for a "Stable
Normalization Process" under public review (see
http://www.unicode.org/review/pr-95.html).  It differs from the
existing normalization forms by applying additional prohibitions
on unassigned code points and problematic sequences and
originated from discussions about the conditions under which
IDNA and Stringprep could be migrated from Unicode 3.2 to
contemporary versions.  I would encourage those in IETF who are
interested in these issues to review that proposal carefully and
comment on it as appropriate.
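As a rough illustration of the *kind* of extra prohibition described
above, here is a Python sketch. It rejects text containing code points
that are unassigned in the local Unicode database and then applies NFC.
It is hypothetical (has_unassigned and stable_nfc_sketch are invented
names), it is not the algorithm in the public review proposal, and it
does nothing about problematic sequences.

```python
import unicodedata

def has_unassigned(text: str) -> bool:
    # General category "Cn" marks code points unassigned in this Python
    # build's Unicode database (it also covers noncharacters such as U+FFFF).
    return any(unicodedata.category(ch) == "Cn" for ch in text)

def stable_nfc_sketch(text: str) -> str:
    """Hypothetical approximation: refuse unassigned code points, then NFC.

    NOT the Unicode public review algorithm; sequence checks are omitted.
    """
    if has_unassigned(text):
        raise ValueError("text contains unassigned code points")
    return unicodedata.normalize("NFC", text)

print(stable_nfc_sketch("cafe\u0301") == "caf\u00e9")  # True

try:
    stable_nfc_sketch("x\uffff")  # U+FFFF is a noncharacter (category Cn)
except ValueError as err:
    print("rejected:", err)
```

The result depends on unicodedata.unidata_version. A code point that is
unassigned under one Unicode version may later be assigned and acquire a
decomposition, which is the sort of version drift that motivates a
"stable" form.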
