Re: [apps-discuss] Feedback on draft-ietf-appsawg-json-pointer-00

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Fri, 02 March 2012 03:58 UTC

Message-ID: <4F50453B.5020708@it.aoyama.ac.jp>
Date: Fri, 02 Mar 2012 12:57:47 +0900
From: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4
MIME-Version: 1.0
To: "Paul C. Bryan" <pbryan@anode.ca>
References: <4F4FD8A5.6010603@cloudmark.com> <1330638350.2531.11.camel@neutron>
In-Reply-To: <1330638350.2531.11.camel@neutron>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Feedback on draft-ietf-appsawg-json-pointer-00

On 2012/03/02 6:45, Paul C. Bryan wrote:
> On Thu, 2012-03-01 at 12:14 -0800, Mike Acar wrote:

>> That is, if the reference token equals the name of some value within the
>> object, move to that value. However, the tokens and values are Unicode
>> strings; I'm not an expert in Unicode, but my impression is that testing
>> Unicode strings for equality is not as simple as comparing sequences of
>> bytes. For example, there are linguistic considerations: I believe
>> German ö and oe are considered identical.
>
>
> While we may consider ö and oe to be linguistically equivalent, I do not
> believe they are considered lexicographically equivalent in a Unicode
> string comparison. Someone please correct me if I'm wrong. Would it help
> to define the comparison as being lexicographical?

No. "Lexicographical" is usually used with respect to order, not
equivalence (see e.g. http://en.wikipedia.org/wiki/Lexicographical_order).

>> There's also the question of JSON documents with different encodings;
>> UTF-8 is the default, but UTF-16 and -32 with both endiannesses are also
>> supported. Presumably this question will disappear in practice, since
>> implementations will operate on deserialized data structures, not on
>> JSON texts.
>
> Since they're logically the same underlying Unicode representations, I'm
> not sure there's any issue to consider here.

The best way to spec this is to say that equivalence is checked
codepoint-by-codepoint. That solves both issues: codepoints are simply
the Unicode character numbers, so they are independent of whether the
text is encoded as UTF-8, UTF-16, or UTF-32, and ö and oe are different
codepoint sequences.
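
To illustrate, here is a rough sketch in Python of what a
codepoint-by-codepoint check could look like. The function names
(codepoints, tokens_equal) and the sample member name are made up for
this example and are not taken from the draft; it only assumes that the
JSON text has already been decoded into a string.

```python
from typing import List


def codepoints(s: str) -> List[int]:
    """Return the Unicode character numbers (codepoints) of a decoded string."""
    return [ord(ch) for ch in s]


def tokens_equal(reference_token: str, member_name: str) -> bool:
    """Hypothetical helper: compare two strings codepoint by codepoint."""
    return codepoints(reference_token) == codepoints(member_name)


# Sample member name containing U+00F6 (ö); chosen only for illustration.
name = "sch\u00f6n"

# The same name serialized in several encodings: the bytes differ,
# but decoding each one yields the same sequence of codepoints.
encodings = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
decoded = [name.encode(enc).decode(enc) for enc in encodings]

same_across_encodings = all(tokens_equal(name, d) for d in decoded)

# "ö" and "oe" are different codepoint sequences, so they do not match.
umlaut_equals_oe = tokens_equal("sch\u00f6n", "schoen")
```

With this definition, same_across_encodings is true and umlaut_equals_oe
is false, so the encoding question and the ö/oe question both get a
definite answer.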

Regards,    Martin.