Re: [Ltru] UTF-8 In Registry Considered Harmful, or Doom! Doom! I Cry Doom!

"Mark Davis" <mark.davis@icu-project.org> Tue, 20 March 2007 21:10 UTC

Message-ID: <30b660a20703201410hd6de8b4l4d15674018fad562@mail.gmail.com>
Date: Tue, 20 Mar 2007 14:10:39 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: John Cowan <cowan@ccil.org>
Subject: Re: [Ltru] UTF-8 In Registry Considered Harmful, or Doom! Doom! I Cry Doom!
In-Reply-To: <20070320194535.GH2981@mercury.ccil.org>
MIME-Version: 1.0
References: <20070320194535.GH2981@mercury.ccil.org>
Cc: ltru@ietf.org

I have some qualms along the same lines. I don't think we should expect IANA
to do anything but paste some text into a position in the file. (Frankly, I
think even that is too much -- it would be better if we just handed them a
replacement file which we've reviewed, but people didn't like that idea.) So
expecting them to process NCRs and produce UTF-8 is too much.
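
To make concrete what "processing NCRs" would involve, here is a minimal
Python sketch. It assumes the hexadecimal form &#xNNNN; that the RFC 4646
registry uses; the helper name decode_ncrs and the sample record are
illustrative only and are not taken from any IANA tooling.

```python
import re

# Matches hexadecimal numeric character references such as &#xE5;
_NCR = re.compile(r"&#x([0-9A-Fa-f]{1,6});")

def decode_ncrs(text: str) -> str:
    """Replace each hex NCR with the character it names (hypothetical helper)."""
    def _sub(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point: leave the reference untouched.
            return match.group(0)
        return chr(code_point)
    return _NCR.sub(_sub, text)

# Illustrative record in the ASCII-with-NCRs style.
line = "Description: Norwegian Bokm&#xE5;l"
decoded = decode_ncrs(line)
# The bytes as they would appear in a UTF-8 encoded registry file.
utf8_bytes = decoded.encode("utf-8")
```

The step is small, but it would have to be run correctly on every addition
and update, which is exactly the continuing burden in question.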

The only process for producing UTF-8 that seems tenable to me is for the
text to be sent as an email file attachment that is itself encoded in UTF-8.
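
As a rough illustration of why an attachment is easier to trust than a
message body, the attachment's bytes can be checked for well-formed UTF-8
before anything is pasted into the file. The sketch below is an assumption
about how such a check might look; is_valid_utf8 is a hypothetical name, and
nothing here describes an actual IANA procedure.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

# The same text in two encodings: only the first is acceptable.
good = "Norwegian Bokm\u00e5l".encode("utf-8")
bad = "Norwegian Bokm\u00e5l".encode("latin-1")
```

A check like this catches a wrongly encoded attachment, but it cannot tell
whether well-formed UTF-8 contains the intended characters, so review of the
text is still needed.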

Mark

On 3/20/07, John Cowan <cowan@ccil.org> wrote:
>
> I wish to oppose in the strongest terms the proposal to change the
> registration process in RFC 4646bis to publish the registry in UTF-8.
> This is *not* a matter of a one-time conversion from an ASCII I-D to a
> UTF-8 registry.  It also affects every addition and update thereafter.
>
> The essential link in adding and modifying subtags is an email sent from
> the Language Subtag Reviewer to IANA requesting the change.  Now either
> (1) that email will contain NCRs or (2) it will not.  In case (1), a
> *continuing* capacity to convert the NCRs to UTF-8 on the part of IANA is
> assumed, something they seemingly do not have today.  Are we confident
> that this will reliably be performed as part of registry updating?
> We are creating a whole new risk for ourselves.  I foresee a time when
> the registry contains a mixture of UTF-8 and NCR representations.
>
> In case (2), the email contains the characters directly.  Then either
> (2a) it is in UTF-8, or (2b) it is in some other encoding sufficient
> to represent the particular characters needed.  In either subcase, we
> are taking risks: email is notoriously the most i18n-broken part of the
> Internet, and probably every one of us has received at least one broken
> email of this kind.  In subcase (2b) we also add the risk that IANA (or
> their mail-processing tools) will fail to interpret the encoding, or will
> interpret it wrongly due to problems at the sender's end.  We are, once
> again, creating a whole new risk for ourselves.  I foresee a time when
> the registry contains a mixture of UTF-8 and non-UTF-8 representations.
>
> There is also the matter of the implicit contract which RFC 4646
> created about the registry: that it would be in ASCII with NCRs.
> Are we *convinced* that we have *no choice* but to break this contract?
> The current system may not produce pretty results, but it *works*.
> Let's not break it gratuitously.
>
> Yours emphatically,
>
> --
> Mos Eisley spaceport.  You will never           John Cowan
> see a more wretched hive of scum and            cowan@ccil.org
> villainy -- unless you watch the                http://www.ccil.org/~cowan
> Jerry Springer Show.   --georgettesworld.com
>
> _______________________________________________
> Ltru mailing list
> Ltru@ietf.org
> https://www1.ietf.org/mailman/listinfo/ltru
>


