Re: [apps-discuss] Fw: draft-pbryan-zyp-json-pointer: name syntax for non-ASCII

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 28 November 2011 11:23 UTC


On 2011/11/25 2:15, t.petch wrote:
>   ---- Original Message -----
>   From: "Graham Klyne"<GK@ninebynine.org>

>> I spotted this discussion and was reminded that one of the older URI specs had
>> some explicit discussion of characters and octets and encoding.  I recall a line
>> that looked something like this:
>>
>>     URI characters ->  URI octets ->  URI octets %-encoded to US-ASCII
>>
>> but I can no longer find it (quickly).  But http://www.ietf.org/rfc/rfc3986.txt
>> sections 1.2 and section 2 (esp. intro) address some of the issues.  The point
>> being that character encoding to octets is a separate concern from %-encoding to
>> URI (or IRI) on-the-wire characters.
>
>   My bible for this is RFC2130 which gives
>
>   character->coded character set->character encoding scheme->transfer encoding
>   syntax
>
>   which Unicode seemed to get spot on,

Of course, because Unicode is a character encoding.

> but HTML and URIs ... um

They are not character encodings, but work on a higher level. HTML 
pretty well conforms to http://www.w3.org/TR/charmod/#sec-RefProcModel, 
the Reference Processing Model of the Character Model for the World Wide 
Web. See also RFC 2070.

For URIs, the situation is indeed murky, because escaping (%-encoding) 
is on the octet level, rather than on the character level, and depending 
on the component, the character->octet mapping is undefined. But for new 
stuff, such as the syntax discussed in this thread, the character->octet 
mapping just has to be fixed to UTF-8, which brings that part of a URI 
mostly in line with the above model.
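
To illustrate the two separate steps, here is a minimal Python sketch, 
not taken from the draft: first the character->octet mapping fixed to 
UTF-8, then %-encoding applied at the octet level. The function names 
and the choice to leave only the RFC 3986 "unreserved" characters 
unescaped are assumptions made for this example.

```python
import string
from urllib.parse import unquote_to_bytes

# RFC 3986 "unreserved" characters; leaving exactly these unescaped is an
# assumption for this sketch, not something the draft prescribes.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode_utf8(name: str) -> str:
    """Hypothetical helper: characters -> UTF-8 octets -> %-encoded ASCII."""
    octets = name.encode("utf-8")  # step 1: character -> octet mapping (UTF-8)
    # step 2: escaping works on each octet, not on each character
    return "".join(
        chr(b) if chr(b) in UNRESERVED else f"%{b:02X}" for b in octets
    )


def percent_decode_utf8(escaped: str) -> str:
    """Hypothetical helper: reverse both steps in the opposite order."""
    octets = unquote_to_bytes(escaped)  # undo %-encoding, giving raw octets
    return octets.decode("utf-8")       # undo the character -> octet mapping
```

With these definitions, percent_encode_utf8("Dürst") gives "D%C3%BCrst": 
the single character "ü" becomes two UTF-8 octets, each escaped 
separately, which is why the escaping alone says nothing about 
characters unless the mapping is fixed.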

Regards,    Martin.