Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Nico Williams <nico@cryptonector.com> Tue, 26 November 2013 15:27 UTC

Date: Tue, 26 Nov 2013 09:27:05 -0600
From: Nico Williams <nico@cryptonector.com>
To: Henri Sivonen <hsivonen@hsivonen.fi>
Message-ID: <20131126152700.GN3655@localhost>
References: <8413609C8A86497F856897AF2AA24960@codalogic> <CEAA3067.2D132%jhildebr@cisco.com> <CANXqsRJEtBoprQFrftz80ZigmBR_NHoEXK1sR4GyBtz5B2KC8Q@mail.gmail.com> <20131120223305.GB5476@mercury.ccil.org> <CANXqsRJmNmSRXssBnw3tGUt0veViENLoS=dp+gEr2RqvNAf4JQ@mail.gmail.com> <20131121165615.GA12138@mercury.ccil.org> <CANXqsRKrcR54TzSFng0ysyTV60-uZZ7QQ-G4xJOB0gO29C7-Ag@mail.gmail.com> <54E53D571E5E4589B2E9FA17DC816002@codalogic> <CANXqsRJi8dv0Giw7CZWP=10qEJEXGyRTb0HFnE9MpeAxc2_0rA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CANXqsRJi8dv0Giw7CZWP=10qEJEXGyRTb0HFnE9MpeAxc2_0rA@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: John Cowan <cowan@mercury.ccil.org>, Pete Cordell <petejson@codalogic.com>, Paul Hoffman <paul.hoffman@vpnc.org>, JSON WG <json@ietf.org>, "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, www-tag <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

On Tue, Nov 26, 2013 at 04:10:14PM +0200, Henri Sivonen wrote:
> On Fri, Nov 22, 2013 at 1:33 PM, Pete Cordell <petejson@codalogic.com> wrote:
> > I'm hoping that for the IETFs purposes we'll be looking at
> > JSONs wider utility into broader areas, which may even include logging to
> > files and interprocess communication where there might be sensible reasons
> > to choose something other than UTF-8.
> 
> What sensible reasons could there possibly be?

I can't think of any either.  UTF-32 is superficially appealing (O(1)
indexing!), but that's O(1) indexing by code point, not by character, so
it's still lame, and you pay for it with longer strings.  It's possible
that on some architectures or for some use cases it performs fabulously
better than the alternatives; I doubt it, but that would be a reason not
to *forbid* the use of UTF-32.  What we clearly don't have consensus for
is requiring support for UTF-32.
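
To make the code point vs. character distinction concrete, here is a
minimal Python sketch.  The helper name code_point_at is hypothetical,
and the buffer is assumed to be BOM-less UTF-32LE.  It shows that a
fixed-width slice finds a code point in O(1), that the code point found
can be a combining mark rather than a whole user-perceived character,
and that the UTF-32 form is larger than the UTF-8 form.

```python
def code_point_at(buf: bytes, i: int) -> str:
    """Return the i-th code point of a BOM-less UTF-32LE buffer.

    Every code point occupies exactly 4 octets, so this is a constant-time
    slice. It indexes code points, not user-perceived characters.
    """
    return buf[4 * i : 4 * i + 4].decode("utf-32-le")


# One user-perceived character: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
text = "e\u0301"
utf32 = text.encode("utf-32-le")
utf8 = text.encode("utf-8")

print(len(utf32) // 4)                     # 2 code points for 1 character
print(len(utf32), len(utf8))               # 8 octets vs. 3 octets
print(hex(ord(code_point_at(utf32, 1))))   # 0x301, the combining mark alone
```

Indexing by user-perceived character would still need a scan for
grapheme cluster boundaries, whatever the encoding.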

> There exists no situation where using UTF-32 for interchange makes
> sense. I think proponents of craziness of the level of using UTF-32
> for interchange should show evidence of existing crazy deployments
> instead of asking future implementers to support UTF-32 just because
> it wasn't possible to prove non-existence.

No, I think this is too much.  If someone wants to use UTF-32 because
they have numbers showing that for IPC and local processing it's faster,
that might be compelling; let them.

Anyway, I think we're focusing too hard on details that aren't terribly
important.  The "non-BOM-based sniffing rules" work and can be derived
by any capable implementor whether or not the RFC states them.
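
For reference, the rules in question can be sketched in a few lines of
Python.  This follows the null-octet table in RFC 4627 section 3 and
assumes what that section assumes: no BOM, and a text whose first two
characters are ASCII (true when the top-level value is an object or an
array).  The function name sniff_json_encoding is hypothetical, and
treating inputs shorter than four octets as UTF-8 is my own
simplification.

```python
def sniff_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first four octets.

    Assumes the first two characters are ASCII, so the positions of the
    zero octets reveal the code unit width and the byte order.
    """
    if len(data) < 4:
        # Assumption: anything this short is treated as UTF-8.
        return "utf-8"

    zeros = tuple(octet == 0 for octet in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern (normally xx xx xx xx) is taken to be UTF-8.
    return patterns.get(zeros, "utf-8")


sample = '{"a": 1}'
for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(name, sniff_json_encoding(sample.encode(name)))
```

Each line of output should show the detected name matching the encoding
that was used, since Python's explicit-endian codecs do not emit a BOM.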

> I continue to strongly disapprove of non-BOM-based sniffing rules
> unless there's compelling evidence that such rules are needed in order
> to interoperate with bogus existing serializers.

I think it's fair to object to requiring sniffing, and I support not
requiring it.  I don't see anything wrong with leaving the rules in for
implementors who want to support sniffing.

Nico
--