Re: [Ltru] UTF-8 In Registry Considered Harmful, or Doom! Doom! I Cry Doom!
"Mark Davis" <mark.davis@icu-project.org> Tue, 20 March 2007 21:10 UTC
Message-ID: <30b660a20703201410hd6de8b4l4d15674018fad562@mail.gmail.com>
Date: Tue, 20 Mar 2007 14:10:39 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: John Cowan <cowan@ccil.org>
Subject: Re: [Ltru] UTF-8 In Registry Considered Harmful, or Doom! Doom! I Cry Doom!
In-Reply-To: <20070320194535.GH2981@mercury.ccil.org>
MIME-Version: 1.0
References: <20070320194535.GH2981@mercury.ccil.org>
Cc: ltru@ietf.org

I have some qualms along the same line. I don't think we should expect IANA to do anything but paste some text into a position in the file. (Frankly, I think even that is too much -- it would be better if we just handed them a replacement file which we've reviewed, but people didn't like that idea.) So processing NCRs and extracting UTF-8 is too much.

The only process for producing UTF-8 that seems tenable to me is where the text is sent as an email file attachment, where the attachment is in UTF-8.

Mark

On 3/20/07, John Cowan <cowan@ccil.org> wrote:
>
> I wish to oppose in the strongest terms the proposal to change the
> registration process in RFC 4646bis to publish the registry in UTF-8.
> This is *not* a matter of a one-time conversion from an ASCII I-D to a
> UTF-8 registry. It also affects every addition and update thereafter.
>
> The essential link in adding and modifying subtags is an email sent from
> the Language Subtag Reviewer to IANA requesting the change. Now either
> (1) that email will contain NCRs or (2) it will not. In case (1), a
> *continuing* capacity to convert the NCRs to UTF-8 on the part of IANA is
> assumed, something they seemingly do not have today. Are we confident
> that this will reliably be performed as part of registry updating?
> We are creating a whole new risk for ourselves. I foresee a time when
> the registry contains a mixture of UTF-8 and NCR representations.
>
> In case (2), the email contains the characters directly. Then either
> (2a) it is in UTF-8, or (2b) it is in some other encoding sufficient
> to represent the particular characters needed. In either subcase, we
> are taking risks: email is notoriously the most i18n-broken part of the
> Internet, and probably every one of us has received at least one broken
> email of this kind. In subcase (2b) we also add the risk that IANA (or
> their mail-processing tools) will fail to interpret the encoding, or will
> interpret it wrongly due to problems at the sender's end. We are, once
> again, creating a whole new risk for ourselves. I foresee a time when
> the registry contains a mixture of UTF-8 and non-UTF-8 representations.
>
> There is also the matter of the implicit contract which RFC 4646
> created about the registry: that it would be in ASCII with NCRs.
> Are we *convinced* that we have *no choice* but to break this contract?
> The current system may not produce pretty results, but it *works*.
> Let's not break it gratuitously.
>
> Yours emphatically,
>
> --
> Mos Eisley spaceport.  You will never           John Cowan
> see a more wretched hive of scum and            cowan@ccil.org
> villainy -- unless you watch the                http://www.ccil.org/~cowan
> Jerry Springer Show.   --georgettesworld.com
>
> _______________________________________________
> Ltru mailing list
> Ltru@ietf.org
> https://www1.ietf.org/mailman/listinfo/ltru
>

--
Mark
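
[Editorial illustration, not part of the original message.] The thread turns on two mechanical steps: turning NCRs into UTF-8, and noticing when a file mixes the two representations. A minimal sketch of both follows, assuming the NCRs are hexadecimal references of the form `&#xHHHH;`. The function names and the sample record line are hypothetical and are not taken from any IANA tooling.

```python
import re

# Assumption: NCRs appear as hex references such as "&#xE5;".
# Decimal references ("&#229;") are deliberately left alone in this sketch.
_NCR = re.compile(r"&#x([0-9A-Fa-f]{1,6});")


def _replace(match: re.Match) -> str:
    code_point = int(match.group(1), 16)
    # Leave out-of-range values and surrogates untouched rather than guess.
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return match.group(0)
    return chr(code_point)


def ncr_to_text(line: str) -> str:
    """Replace each hex NCR in an ASCII line with the character it names."""
    return _NCR.sub(_replace, line)


def is_pure_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 and contain no leftover hex NCRs."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return _NCR.search(text) is None


# Hypothetical record line in the ASCII-with-NCRs style.
sample = "Description: Norwegian Bokm&#xE5;l"
converted = ncr_to_text(sample)
utf8_bytes = converted.encode("utf-8")
```

A check like `is_pure_utf8` only catches the "mixture" outcomes described in the quoted message (leftover NCRs, or bytes in some other encoding that are not valid UTF-8). It cannot tell whether text that decodes cleanly was the text the sender intended.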