Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Nico Williams <nico@cryptonector.com> Wed, 27 November 2013 00:11 UTC

Date: Tue, 26 Nov 2013 18:11:31 -0600
From: Nico Williams <nico@cryptonector.com>
To: Carsten Bormann <cabo@tzi.org>
Message-ID: <20131127001127.GI21240@localhost>
References: <54E53D571E5E4589B2E9FA17DC816002@codalogic> <CANXqsRJi8dv0Giw7CZWP=10qEJEXGyRTb0HFnE9MpeAxc2_0rA@mail.gmail.com> <20131126160127.GE20755@mercury.ccil.org> <CAO1wJ5Qz2-vue_OGp79JZ+DW0ELofwdh8vy=9UZk53SMYs1bOg@mail.gmail.com> <20131126172410.GC21240@localhost> <mev999tgjoao5cj84fuk4t8pvi4t9pj6hs@hive.bjoern.hoehrmann.de> <20131126220036.GG21240@localhost> <95E46767-DBBF-489F-83BD-80BEC697C999@tzi.org> <20131126230730.GH21240@localhost> <8F52BEE2-F477-4C5A-AD7E-3FCB5765706C@tzi.org>
In-Reply-To: <8F52BEE2-F477-4C5A-AD7E-3FCB5765706C@tzi.org>
Cc: es-discuss <es-discuss@mozilla.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, www-tag <www-tag@w3.org>, JSON WG <json@ietf.org>
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

On Wed, Nov 27, 2013 at 12:20:25AM +0100, Carsten Bormann wrote:
> On 27 Nov 2013, at 00:07, Nico Williams <nico@cryptonector.com> wrote:
> > Do you want to say anything about other encodings?  What would that be?
> 
> JSON is encoded in UTF-8.
> 
> There is no need to discuss JSON in other encodings, because it
> wouldn’t be JSON.

Thanks.

My opinion as to MIME contexts:

    I'm not opposed to saying that the application/json media type
    requires UTF-8.  Others have objected, and I believe the WG
    consensus to be that the application/json media type allows all of
    UTF-8/16/32.

    I believe we should settle for an interop note saying that UTF-8 has
    the best interoperability, plus a recommendation that UTF-8 be used.
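A minimal Python sketch of that position, assuming a sender that always emits UTF-8 (as the recommended interop note would advise) and a receiver that, per the WG consensus, still accepts UTF-8, UTF-16, or UTF-32 under application/json. The names `encode_for_wire` and `decode_from_wire` are hypothetical. The receiver relies on the standard `json.loads`, which since Python 3.6 accepts `bytes` in any of those three encodings.

```python
import json

MEDIA_TYPE = "application/json"

def encode_for_wire(value):
    """Hypothetical sender: always emit UTF-8, the best-interoperating choice."""
    body = json.dumps(value, ensure_ascii=False).encode("utf-8")
    return MEDIA_TYPE, body

def decode_from_wire(media_type, body):
    """Hypothetical receiver: accept UTF-8, UTF-16 or UTF-32 bodies.

    json.loads() on bytes (Python 3.6+) works out which of the three is used.
    """
    if media_type != MEDIA_TYPE:
        raise ValueError("unexpected media type: " + media_type)
    return json.loads(body)

media_type, body = encode_for_wire({"name": "Zoë"})
print(media_type, body)
# A UTF-16LE body produced by some other sender is still accepted.
print(decode_from_wire(MEDIA_TYPE, '{"name": "Zoë"}'.encode("utf-16-le")))
```

The asymmetry is deliberate: conservative in what is sent, liberal in what is accepted.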

My opinion as to non-MIME contexts:

    I'm not opposed to recommending that JSON texts for interchange in
    non-MIME contexts be encoded in UTF-8, and I'm not opposed to
    requiring that use of any other encoding be expressed as metadata.

    I do object to requiring that UTF-8 be used under all circumstances,
    even in non-MIME contexts.
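One way to read "UTF-8 by default, any other encoding expressed as metadata" for a non-MIME channel, sketched in Python. The envelope here is assumed to be a plain dict with an optional `charset` key, and `write_json` / `read_json` are hypothetical names; no particular non-MIME protocol is implied.

```python
import json

def write_json(value, charset=None):
    """Hypothetical writer: UTF-8 needs no metadata; any other encoding does."""
    text = json.dumps(value, ensure_ascii=False)
    if charset is None or charset.lower() in ("utf-8", "utf8"):
        return text.encode("utf-8"), {}
    # Non-default encoding: record it explicitly alongside the payload.
    return text.encode(charset), {"charset": charset}

def read_json(payload, metadata):
    """Hypothetical reader: no metadata means UTF-8; it never guesses."""
    charset = metadata.get("charset", "utf-8")
    return json.loads(payload.decode(charset))

payload, meta = write_json({"k": "ü"}, "utf-16-le")
print(meta, read_json(payload, meta))
```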

> (And no, I see no need to handle UTF-16LE, UTF-16BE, UTF-32LE or
> UTF-32BE in any special way, even if RFC 4627 was written at a time
> when it still seemed useful to pay them lip service.  But I recognize
> that there appears to be WG consensus to keep these corpses on life
> support, maybe because UTF-16 is the internal encoding of the
> programming language that gave JSON its name.)

Right, that appears to be the consensus, and more than that, it seems
extremely unlikely to change.
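For reference, the "special" handling at issue is RFC 4627, Section 3: because the first two characters of an RFC 4627 JSON text are ASCII (the text starts with `{` or `[`), the pattern of zero octets in the first four bytes tells UTF-8 from UTF-16BE/LE and UTF-32BE/LE. A minimal Python sketch of that rule follows; `detect_json_encoding` is a hypothetical name, and the sketch ignores byte order marks and, for inputs shorter than four octets, simply assumes UTF-8.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding from the null pattern of the first four octets.

    RFC 4627, Section 3 table (xx = non-zero octet):
        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8
    """
    if len(data) < 4:
        # Too short for the four-octet rule; assume UTF-8.
        return "utf-8"
    nulls = tuple(b == 0 for b in data[:4])
    if nulls == (True, True, True, False):
        return "utf-32-be"
    if nulls == (True, False, True, False):
        return "utf-16-be"
    if nulls == (False, True, True, True):
        return "utf-32-le"
    if nulls == (False, True, False, True):
        return "utf-16-le"
    # Every other pattern (including "no nulls at all") is treated as UTF-8.
    return "utf-8"

raw = '[1, 2]'.encode("utf-16-be")
print(detect_json_encoding(raw), raw.decode(detect_json_encoding(raw)))
```

Note that the rule depends on RFC 4627's requirement that a text be an object or array; a bare scalar such as `1` would defeat it.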

Assuming *that*, what are you willing to settle for?

Nico

PS: Back to my hypo...

    If my hypothetical JSON-using shell were to escape all non-ASCII
    characters in JSON string values, then encode the JSON text in
    UTF-8, then convert the result to the current locale's codeset
    (doing the reverse to parse), and the resulting texts never
    leaked to other locales, why should anyone care?
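A Python sketch of that hypothetical shell's write and read paths, under the assumption that the locale codeset can be taken from `locale.getpreferredencoding(False)` unless one is passed in. The functions `to_locale_text` and `from_locale_text` are invented for illustration. Since `ensure_ascii=True` turns every non-ASCII character into a `\uXXXX` escape, the text is pure ASCII before it ever reaches the locale codeset.

```python
import json
import locale

def to_locale_text(value, codeset=None):
    """Hypothetical write path: escape, encode as UTF-8, convert to locale."""
    # 1. Escape all non-ASCII characters as \uXXXX; the JSON text is now ASCII.
    text = json.dumps(value, ensure_ascii=True)
    # 2. Encode the JSON text in UTF-8 (byte-identical to ASCII at this point).
    utf8 = text.encode("utf-8")
    # 3. Convert the UTF-8 bytes to the current locale's codeset.
    codeset = codeset or locale.getpreferredencoding(False)
    return utf8.decode("utf-8").encode(codeset)

def from_locale_text(data, codeset=None):
    """Hypothetical read path: the same steps in reverse."""
    codeset = codeset or locale.getpreferredencoding(False)
    utf8 = data.decode(codeset).encode("utf-8")
    return json.loads(utf8.decode("utf-8"))

data = to_locale_text({"name": "Zoë"}, "iso8859_1")
print(data, from_locale_text(data, "iso8859_1"))
```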

    Most (but not all) non-Unicode locales use ASCII-compatible codesets,
    so the result would be "proper" JSON text in most cases anyway...
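The "most (but not all)" caveat can be checked directly: an all-ASCII JSON text has the same bytes in UTF-8 as in any ASCII-compatible codeset, and different bytes in one that is not. The codesets below are just examples chosen for illustration (Python codec names), with EBCDIC `cp037` as the non-ASCII-compatible case; `same_bytes_as_utf8` is a hypothetical helper.

```python
import json

def same_bytes_as_utf8(text, codeset):
    """True if `text` encodes to exactly the same octets as in UTF-8."""
    return text.encode(codeset) == text.encode("utf-8")

# ensure_ascii=True (the default) escapes the non-ASCII "ë" as \u00eb.
sample = json.dumps({"name": "Zoë", "n": 1})

for codeset in ("iso8859_1", "koi8_r", "cp1252", "cp037"):
    print(codeset, same_bytes_as_utf8(sample, codeset))
# iso8859_1 True
# koi8_r True
# cp1252 True
# cp037 False
```

In the ASCII-compatible cases the locale-encoded bytes are valid UTF-8 JSON as they stand; the EBCDIC bytes are not, which is where such texts would break if they leaked out.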

    As to why one might want to do that: because JSON texts are...
    *text*, i.e., editable in your favorite $EDITOR, readable with your
    favorite $PAGER, and so on.  It might be a problem if such texts
    leaked outside that locale, but we already have that problem in
    spades, and no JSON parser would be called upon to try to
    auto-detect any encodings other than UTF-8/16/32.