Re: UTF-8 text

James Cloos <> Thu, 18 April 2013 18:13 UTC

From: James Cloos <>
To: <>
Cc: James M Snell <>, =?iso-8859-1?Q?Fr=E9d=E9ric?= Kayser <>
In-Reply-To: <> (James M. Snell's message of "Wed, 17 Apr 2013 11:46:06 -0700")
References: <> <> <>
User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.3.50 (gnu/linux)
Copyright: Copyright 2013 James Cloos
OpenPGP: ED7DAEA6; url=
OpenPGP-Fingerprint: E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Date: Thu, 18 Apr 2013 14:11:10 -0400
Message-ID: <>
Lines: 25
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: UTF-8 text

>>>>> "JMS" == James M Snell <> writes:

JMS> This is a more difficult question. In theory, yes, we ought to be
JMS> able to support these, but there's the question of backwards
JMS> compatibility.  We could define that the new :path field (and
JMS> referer, location, link, etc.) contain UTF-8 encoded IRIs, so for
JMS> backwards compatibility with HTTP/1, an implementation would need
JMS> to do the appropriate standard conversion to a URI. Going the other
JMS> direction, an impl could choose to leave it as a URI or convert it
JMS> to its IRI form. I think this makes a lot of sense and has a very
JMS> clear http/2 <--> http/1 translation. So I'm +1 on it.
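
For reference, the translation described in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are made up for this note, only the path component is handled, and the reverse direction decodes only escapes that form valid non-ASCII UTF-8, which is a simplification of the full IRI/URI mapping rules.

```python
import re
from urllib.parse import quote

# Characters left untouched when mapping an IRI path to a URI path:
# path separators, existing percent escapes, and the usual sub-delims.
# (Letters, digits and "_.-~" are never escaped by quote().)
_SAFE = "/%:@!$&'()*+,;="

# A run of percent escapes whose bytes are all >= 0x80, i.e. candidates
# for being UTF-8 encoded non-ASCII characters.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_path_to_uri_path(iri_path: str) -> str:
    """HTTP/2 -> HTTP/1 direction: percent-encode non-ASCII as UTF-8 bytes."""
    return quote(iri_path, safe=_SAFE)


def uri_path_to_iri_path(uri_path: str) -> str:
    """HTTP/1 -> HTTP/2 direction (optional): restore non-ASCII characters.

    Escapes for ASCII bytes (such as %2F) are left alone so the path keeps
    its meaning, and runs that are not valid UTF-8 stay escaped.
    """
    def decode(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)

    return _HIGH_RUN.sub(decode, uri_path)


if __name__ == "__main__":
    print(iri_path_to_uri_path("/caf\u00e9/menu"))
    print(uri_path_to_iri_path("/caf%C3%A9%2Fmenu"))
```

Note that an escape such as %E9 (a Latin-1 byte) is not valid UTF-8 and is passed through unchanged in this sketch, which is one place where the "leave it as a URI" option applies.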

I also like this, but have to ask:  Do any non-10646 IRIs encode
differently depending on language?  I.e., would forcing everything
to 10646/UTF-8 lose information due to character unification?

Think of the differences between the zht, zhs, jp and ko glyphs
of characters unified by 10646.
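
A small sketch of the concern, assuming U+76F4 as the sample character (a commonly cited unified ideograph whose preferred glyph differs between Chinese and Japanese typography) and assuming Python's bundled big5, gb2312, shift_jis and euc_kr codecs as stand-ins for the legacy charsets. The dictionary keys are just labels for this example. It shows only that the charset, which implied a language, is no longer recoverable after conversion; it does not settle whether that matters for IRIs.

```python
# Legacy charsets keyed by the language each one implies (labels are
# illustrative only).
LEGACY_CODECS = {
    "zht": "big5",
    "zhs": "gb2312",
    "jp": "shift_jis",
    "ko": "euc_kr",
}

SAMPLE = "\u76f4"  # one unified code point, drawn differently per locale


def to_utf8(legacy_bytes: bytes, codec: str) -> bytes:
    """Convert bytes in a legacy charset to UTF-8 via the unified code point."""
    return legacy_bytes.decode(codec).encode("utf-8")


# The same character as it would appear on the wire in each legacy charset.
legacy_forms = {lang: SAMPLE.encode(codec) for lang, codec in LEGACY_CODECS.items()}

# After conversion every variant collapses to the same UTF-8 byte sequence.
utf8_forms = {
    lang: to_utf8(legacy_forms[lang], LEGACY_CODECS[lang]) for lang in LEGACY_CODECS
}

if __name__ == "__main__":
    for lang in LEGACY_CODECS:
        print(lang, legacy_forms[lang].hex(), "->", utf8_forms[lang].hex())
```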

Perhaps it doesn't matter, even if so?  Or perhaps the UTF-8 IRI should
be accompanied by an optional language hint?

James Cloos <>         OpenPGP: 1024D/ED7DAEA6