Re: [apps-discuss] draft-pbryan-zyp-json-pointer: name syntax for non-ASCII

Graham Klyne <GK@ninebynine.org> Tue, 22 November 2011 11:50 UTC


I spotted this discussion and was reminded that one of the older URI specs had 
some explicit discussion of characters and octets and encoding.  I recall a line 
that looked something like this:

   URI characters -> URI octets -> URI octets %-encoded to US-ASCII

but I can no longer find it quickly.  However, sections 1.2 and 2 (esp. the
intro to section 2) of http://www.ietf.org/rfc/rfc3986.txt address some of the
issues.  The point is that encoding characters as octets is a separate concern
from %-encoding those octets into the characters that appear in a URI (or IRI)
on the wire.
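
To make the two steps concrete, here is a rough Python sketch (my own
illustration, not taken from any of the specs).  The helper name
percent_encode and the sample string are assumptions made up for the example;
quote() and unquote() are the standard urllib.parse functions.  It treats the
choice of character encoding and the %-encoding as separate steps, so the
same characters yield different URI text depending on the first step alone:

```python
from urllib.parse import quote, unquote


def percent_encode(text: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: characters -> octets -> %-encoded US-ASCII."""
    # Step 1: characters to octets. The charset is a free choice at this
    # point; UTF-8 is only the default assumed for this sketch.
    octets = text.encode(charset)
    # Step 2: octets to US-ASCII characters. "/" is left unescaped.
    return quote(octets, safe="/")


sample = "/küche"
print(percent_encode(sample))                # /k%C3%BCche
print(percent_encode(sample, "iso-8859-1"))  # /k%FCche
# Decoding has to assume the same charset that was used in step 1.
print(unquote("/k%C3%BCche", encoding="utf-8"))  # /küche
```

Step 2 is identical in both calls; only the octets fed into it differ.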

#g
--

On 22/11/2011 09:09, "Martin J. Dürst" wrote:
> On 2011/11/22 8:43, Mark Nottingham wrote:
>> The usual approach to this sort of thing is to define the "canonical" way to
>> do it; i.e., json pointers *are* unicode strings; then you can define
>> encodings into various environments (like URIs).
>>
>> In this case, it'd probably be good enough to say that json pointers are
>> unicode strings,
>
> Up to here, this makes a ton of sense.
>
>> but that when they need to be in ASCII environments (like URIs) they get
>> UTF-8'ed and then percent-escaped.
>
> This would mean that e.g. in a Java program that for some reason is kept in
> US-ASCII, I'd have to use %-encoding. This doesn't make sense to me at all.
>
> So I'd say that json pointers are escaped according to the conventions of the
> substrate that carries them when needed (e.g. pure ASCII, or other kinds of
> encodings that can't handle the whole Unicode range).
>
> Then for json pointers as fragment identifiers, I'd mention that where necessary
> (e.g. for URIs), the convention for converting from IRIs to URIs (see RFC 3987)
> applies.
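
A small Python sketch of what "the conventions of the substrate" might look
like in practice.  The "path" member name and the sample pointer are invented
for the illustration, and quote() is only an approximation of the RFC 3987
IRI-to-URI mapping, since it %-encodes a few ASCII characters as well:

```python
import json
from urllib.parse import quote

pointer = "/küche/0"  # the pointer itself is just a Unicode string

# Substrate 1: a JSON document that can carry Unicode directly.
as_json_unicode = json.dumps({"path": pointer}, ensure_ascii=False)

# Substrate 2: a JSON document kept in US-ASCII, using JSON's own
# \uXXXX escapes rather than %-encoding.
as_json_ascii = json.dumps({"path": pointer}, ensure_ascii=True)

# Substrate 3: a URI fragment, where the pointer is UTF-8 encoded and the
# non-ASCII octets are %-encoded.
as_fragment = "#" + quote(pointer, safe="/")

print(as_json_unicode)  # {"path": "/küche/0"}
print(as_json_ascii)    # {"path": "/k\u00fcche/0"}
print(as_fragment)      # #/k%C3%BCche/0

# Both JSON forms decode to the same pointer string.
assert json.loads(as_json_unicode) == json.loads(as_json_ascii)
```
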
>
> By the way, I don't see a need to escape "/" at all in a fragment identifier.
> "/" is plain and simply allowed in fragment identifiers. Please see
> http://tools.ietf.org/html/rfc3986#section-3.5. Of course, it's not forbidden to
> escape "/", so software that is interpreting a fragment identifier has to make
> sure it does the right thing.
>
> Regards, Martin.
>
>
>> Cheers,
>>
>>
>> On 22/11/2011, at 8:51 AM, Paul C. Bryan wrote:
>>
>>> Okay, so I'll write up separate sections for JSON string values and URI
>>> fragment identifiers. Any objections?
>>>
>>> Paul
>>>
>>> On Tue, 2011-11-22 at 07:55 +1100, Mark Nottingham wrote:
>>>> +1 to Julian here -- there's no reason why non-ASCII chars need to be
>>>> percent-encoded when they occur inside a JSON document, only when they're in
>>>> a URI (or similar context).
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> On 22/11/2011, at 7:12 AM, Julian Reschke wrote:
>>>>
>>>>> On 2011-11-21 21:09, Paul C. Bryan wrote:
>>>>>> The intent is to allow a JSON Pointer to be expressed as a JSON string
>>>>>> value as well as a URI fragment identifier. The latter is the most
>>>>>> significant driver for URI percent-encoding.
>>>>>> ...
>>>>>
>>>>> Well, you could use it as a fragment identifier (or other URI component)
>>>>> by UTF-8-percent-escaping.
>>>>>
>>>>> The question is whether that use case requires them to be all ASCII
>>>>> everywhere else, such as in a JSON patch document.
>>>>>
>>>>> Best regards, Julian
>>>>> _______________________________________________
>>>>> apps-discuss mailing list
>>>>> apps-discuss@ietf.org
>>>>> https://www.ietf.org/mailman/listinfo/apps-discuss
>>>>
>>>>
>>>> --
>>>> Mark Nottingham
>>>> http://www.mnot.net/
>>>>
>>
>> --
>> Mark Nottingham http://www.mnot.net/