Re: [Json] About JSON equality

Nico Williams <nico@cryptonector.com> Tue, 19 February 2013 23:37 UTC

On Tue, Feb 19, 2013 at 4:55 PM, Francis Galiegue <fgaliegue@gmail.com> wrote:
> This definition is only lacking for strings. On some other draft which
> I cannot remember off the top of my head, two JSON String values are
> to be considered equal if, position wise, their Unicode code points
> are the same (and THAT INCLUDES \u0000).

String equality is definitely a case where normalization does come in.

In practice most Unicode strings are in NFC (Normalization Form C,
canonical composition) by accident: it's what input methods typically
produce.  But it doesn't have to be that way.  For some scripts it
almost certainly isn't always that way (I'm thinking of Hangul in
particular).  And then there's the dreaded HFS+, which, for reasons I
cannot fathom, normalizes to NFD on create(!).

So, suppose you were creating a JSON representation of file names in a
filesystem.  If that filesystem were HFS+ then you're guaranteed to
have NFD.  If it's any other filesystem (except, perhaps, other
filesystems on OS X, like at least one port of ZFS to it) then chances
are very high that the file names will be in NFC.  This is not really
a contrived example -- I really, really wish it were, but I think it
more than likely that we'll have such uses of JSON.
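
To make the file name case concrete, here is a minimal Python sketch.
The name "café.txt" and the two JSON documents are made up for
illustration; they stand in for the same name as reported by an
NFD-normalizing filesystem and by an NFC-preserving one.  Only the
standard json and unicodedata modules are used.

```python
import json
import unicodedata

# Hypothetical file name, in the two forms discussed above.
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")  # precomposed U+00E9
name_nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")  # "e" followed by U+0301

# Two JSON documents that describe "the same" file.
doc_a = json.dumps({"name": name_nfc})
doc_b = json.dumps({"name": name_nfd})

a = json.loads(doc_a)["name"]
b = json.loads(doc_b)["name"]

# Position-wise code point comparison treats the names as different...
codepoint_equal = (a == b)

# ...while comparing after normalizing both sides treats them as equal.
normalized_equal = (
    unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
)

print(len(a), len(b), codepoint_equal, normalized_equal)
```

The two strings differ in length (8 versus 9 code points), so a
code-point-wise definition of equality can never match them.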

So, if you want to be able to compare JSON strings for equality, you
do need to specify whether you care about normalization.
Normalization-insensitive string comparison is *not* that expensive
for mostly-ASCII strings: the trivial optimization is a fast path that
compares bytes through a logical two-byte window that advances one
byte at a time, plus a slow path that is triggered only when one of
the bytes in the window is non-ASCII.  But it's still a PITA.
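
A rough Python sketch of that fast path / slow path split follows.
It is one possible reading of the idea, not a reference
implementation: the function name norm_insensitive_equal is made up,
the inputs are assumed to be well-formed Unicode strings compared as
UTF-8, NFC is chosen arbitrarily as the form to compare in, and the
slow path simply normalizes the whole remainder of both strings
rather than just the affected run.

```python
import unicodedata


def norm_insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings for equality, ignoring NFC/NFD differences."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    n = min(len(x), len(y))
    i = 0
    while i < n:
        # Two-byte window starting at i in each string.  The second byte
        # matters because an ASCII base character may be followed by a
        # (non-ASCII) combining mark that composes with it.
        wx, wy = x[i:i + 2], y[i:i + 2]
        if max(wx) < 0x80 and max(wy) < 0x80:
            # Fast path: byte i is ASCII and is not followed by a
            # non-ASCII byte, so normalization cannot change it.
            if x[i] != y[i]:
                return False
            i += 1
        else:
            # Slow path: normalize what is left and compare.  Position i
            # is a character boundary because every byte before the
            # window's second byte was ASCII.
            rest_x = unicodedata.normalize("NFC", x[i:].decode("utf-8"))
            rest_y = unicodedata.normalize("NFC", y[i:].decode("utf-8"))
            return rest_x == rest_y
    # Ran off the end of the shorter string on the fast path only.
    return len(x) == len(y)
```

For example, norm_insensitive_equal("caf\u00e9.txt", "cafe\u0301.txt")
is true, while the pure-ASCII pair "plain.txt" / "plain.text" is
rejected without ever calling the normalizer.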

Nico
--