Re: UTF-8 text

James Cloos <cloos@jhcloos.com> Thu, 18 April 2013 18:13 UTC

From: James Cloos <cloos@jhcloos.com>
To: <ietf-http-wg@w3.org>
Cc: James M Snell <jasnell@gmail.com>, =?iso-8859-1?Q?Fr=E9d=E9ric?= Kayser <f.kayser@free.fr>
In-Reply-To: <CABP7Rbdd8tPQXCgbzJ-PH76Nz3SswO0LJxPvwUt2HMCFJ6Pz3Q@mail.gmail.com> (James M. Snell's message of "Wed, 17 Apr 2013 11:46:06 -0700")
References: <CABP7RbdtNtBin_mq0qVNDChCKq8jvjFKUU6oMZy7XZMnb_JwTw@mail.gmail.com> <1369687790.71571377.1366222842246.JavaMail.root@zimbra71-e12.priv.proxad.net> <CABP7Rbdd8tPQXCgbzJ-PH76Nz3SswO0LJxPvwUt2HMCFJ6Pz3Q@mail.gmail.com>
User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.3.50 (gnu/linux)
Face: iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAgMAAABinRfyAAAACVBMVEX///8ZGXBQKKnCrDQ3 AAAAJElEQVQImWNgQAAXzwQg4SKASgAlXIEEiwsSIYBEcLaAtMEAADJnB+kKcKioAAAAAElFTkSu QmCC
Copyright: Copyright 2013 James Cloos
OpenPGP: ED7DAEA6; url=http://jhcloos.com/public_key/0xED7DAEA6.asc
OpenPGP-Fingerprint: E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Date: Thu, 18 Apr 2013 14:11:10 -0400
Message-ID: <m3vc7jhfoo.fsf@carbon.jhcloos.org>
Subject: Re: UTF-8 text
Archived-At: <http://www.w3.org/mid/m3vc7jhfoo.fsf@carbon.jhcloos.org>

>>>>> "JMS" == James M Snell <jasnell@gmail.com> writes:

JMS> This is a more difficult question. In theory, yes, we ought to be
JMS> able to support these, but there's the question of backwards
JMS> compatibility.  We could define that the new :path field (and
JMS> referer, location, link, etc.) contain UTF-8 encoded IRIs, so for
JMS> backwards compatibility with HTTP/1, an implementation would need
JMS> to do the appropriate standard conversion to a URI. Going the other
JMS> direction, an impl could choose to leave it as a URI or convert it
JMS> to its IRI form. I think this makes a lot of sense and has a very
JMS> clear http/2 <--> http/1 translation. So I'm +1 on it.
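
For concreteness, the translation described above might look roughly like
the sketch below. It is a simplified reading of the IRI-to-URI mapping,
not a complete implementation: the function names are made up for
illustration, only the path component is handled, and the reverse
direction skips the extra checks a careful implementation would want
(for example refusing to decode bidi formatting characters).

```python
import re
from urllib.parse import quote

# Runs of percent-escapes whose bytes are all >= 0x80, i.e. candidates
# for being UTF-8 encoded non-ASCII characters.
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_path_to_uri_path(iri_path: str) -> str:
    """HTTP/2 -> HTTP/1 direction: percent-encode the UTF-8 bytes of
    every non-ASCII character, leaving existing escapes and URI
    delimiters untouched."""
    return quote(iri_path, safe="/%:@!$&'()*+,;=~-._", encoding="utf-8")


def uri_path_to_iri_path(uri_path: str) -> str:
    """HTTP/1 -> HTTP/2 direction (optional): turn escapes back into
    characters, but only where they form valid UTF-8. ASCII escapes
    such as %2F are left alone so the path keeps its meaning."""
    def decode(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not UTF-8; keep the escapes as-is

    return _HIGH_ESCAPES.sub(decode, uri_path)


print(iri_path_to_uri_path("/caf\u00e9"))   # /caf%C3%A9
print(uri_path_to_iri_path("/caf%C3%A9"))   # /café
```

Leaving a URI unconverted is always safe in this sketch, which matches
the "leave it as a URI or convert it" choice above.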

I also like this, but have to ask:  Do any non-10646 IRIs encode
differently depending on language?  I.e., would forcing everything
to 10646/utf8 lose information due to character unification?

Think of the differences between the zht, zhs, jp and ko glyphs
of characters unified by 10646.
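
A small illustration of the concern, as a sketch only.  The character
and the list of legacy charsets are picked arbitrarily for the example,
and the helper name is hypothetical.  The same ideograph has different
bytes in each regional charset, but all of them decode to one 10646
code point and hence one percent-encoded UTF-8 form, so the charset
(an implicit language hint) is no longer visible:

```python
from urllib.parse import quote

# Legacy charsets commonly associated with zht, zhs, jp and ko text.
LEGACY_CHARSETS = ["big5", "gb2312", "shift_jis", "euc_kr"]


def utf8_forms(ch: str) -> dict:
    """Map each legacy charset to the percent-encoded UTF-8 form that
    results from decoding that charset's bytes for `ch`."""
    forms = {}
    for charset in LEGACY_CHARSETS:
        legacy_bytes = ch.encode(charset)       # differs per charset
        decoded = legacy_bytes.decode(charset)  # one unified code point
        forms[charset] = quote(decoded)         # UTF-8, percent-encoded
    return forms


ideograph = "\u76f4"  # a unified ideograph usually drawn differently per region
for charset, form in utf8_forms(ideograph).items():
    print(charset, ideograph.encode(charset).hex(), "->", form)
```

Every row ends in the same %E7%9B%B4; nothing in it says which glyph
style was intended.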

Perhaps it doesn't matter, even if so?  Or perhaps the utf8 IRI should
be accompanied by an optional language hint?

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6