Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Nico Williams <nico@cryptonector.com> Tue, 26 November 2013 22:00 UTC

Date: Tue, 26 Nov 2013 16:00:41 -0600
From: Nico Williams <nico@cryptonector.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Message-ID: <20131126220036.GG21240@localhost>
References: <20131120223305.GB5476@mercury.ccil.org> <CANXqsRJmNmSRXssBnw3tGUt0veViENLoS=dp+gEr2RqvNAf4JQ@mail.gmail.com> <20131121165615.GA12138@mercury.ccil.org> <CANXqsRKrcR54TzSFng0ysyTV60-uZZ7QQ-G4xJOB0gO29C7-Ag@mail.gmail.com> <54E53D571E5E4589B2E9FA17DC816002@codalogic> <CANXqsRJi8dv0Giw7CZWP=10qEJEXGyRTb0HFnE9MpeAxc2_0rA@mail.gmail.com> <20131126160127.GE20755@mercury.ccil.org> <CAO1wJ5Qz2-vue_OGp79JZ+DW0ELofwdh8vy=9UZk53SMYs1bOg@mail.gmail.com> <20131126172410.GC21240@localhost> <mev999tgjoao5cj84fuk4t8pvi4t9pj6hs@hive.bjoern.hoehrmann.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <mev999tgjoao5cj84fuk4t8pvi4t9pj6hs@hive.bjoern.hoehrmann.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: es-discuss <es-discuss@mozilla.org>, www-tag <www-tag@w3.org>, JSON WG <json@ietf.org>
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

On Tue, Nov 26, 2013 at 09:15:38PM +0100, Bjoern Hoehrmann wrote:
> * Nico Williams wrote:
> >We must not require encoding detection functionality in parsers.  We
> >must not forbid it either.  We might need to say that encodings other
> >than UTF-8/16/32 may not be reliably detected, therefore they are highly
> >discouraged, even forbidden except where protocols specifically call for
> >them.
> 
> When I pass a fully conforming UTF-8 encoded application/json entity to
> a fully conforming JSON parser I do not want the parser to do something
> funny like interpreting the document as if it were Windows-1252 encoded.
> I am amazed how many people here think a parser that does that should
> not be considered broken.

You missed the point.  I'm outlining what we can and should do.  We
should strongly encourage UTF-8 (require it, even, for parsers).  We
should not forbid other encodings -- at least not UTF-16 nor UTF-32 --
though we might agree to say nothing about them.
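
To illustrate why UTF-8/16/32 are the only encodings that can be told
apart without metadata, here is a minimal Python sketch of the null-byte
heuristic described in RFC 4627, section 3.  It assumes the text starts
with two ASCII characters and carries no BOM; the function name
sniff_utf is hypothetical, and nothing here is meant as required parser
behaviour.

```python
def sniff_utf(data: bytes) -> str:
    """Guess UTF-8/16/32 from the null-byte pattern of the first four octets.

    Assumption: the text begins with two ASCII characters and has no BOM
    (the premise of RFC 4627, section 3).
    """
    if len(data) < 4:
        return "utf-8"  # assumption: too short to sniff, fall back to UTF-8
    nulls = tuple(b == 0 for b in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Anything else, including every single-byte legacy encoding,
    # is indistinguishable from UTF-8 at this level.
    return patterns.get(nulls, "utf-8")
```

Note that a Windows-1252 or ISO-8859-1 document falls through to the
"utf-8" default, which is exactly why non-UTF encodings cannot be
detected reliably this way.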

As to non-UTF encodings, well, think of something like the Korn Shell,
with its... very strange "compound variables", and consider something
more like Windows PowerShell.

It might be awesome to have a Unix shell that uses JSON as a [far]
superior alternative to the Korn Shell's compound variable disaster.
But you see, if you have any non-Unicode locales, how would such a shell
encode its JSON values?  Obviously: not in any UTF (except, maybe,
UTF-7).  It'd not be hard for such a shell to handle non-Unicode locales
just fine.  Not that such a shell's JSON parser should auto-detect
encodings (no way), but you know well enough that there are text
documents lying around in all sorts of encodings without the encoding
metadata being recorded anywhere.
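
As a rough sketch of what "handle non-Unicode locales without
auto-detection" could look like, the Python below decodes with an
encoding supplied by the caller and never guesses.  The function name
parse_json_bytes is hypothetical, and the choice of ISO-8859-1 in the
usage lines is only an example of a non-Unicode locale.

```python
import json


def parse_json_bytes(raw: bytes, encoding: str = "utf-8"):
    """Decode with the caller-supplied encoding (no sniffing), then parse.

    A mismatch between the bytes and the declared encoding raises
    UnicodeDecodeError instead of silently producing wrong text.
    """
    text = raw.decode(encoding)
    return json.loads(text)


# Hypothetical usage: a shell running in an ISO-8859-1 locale passes
# its locale encoding explicitly.
raw = '{"name": "Jos\u00e9"}'.encode("iso-8859-1")
value = parse_json_bytes(raw, "iso-8859-1")
```

A shell could take the encoding from its locale settings (in Python,
for instance, via locale.getpreferredencoding(False)); the parser itself
stays free of any detection logic.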

If you wanted to forbid non-Unicode, non-UTF encodings, then you'd be
preventing such a shell, and for what reason?  If you only mean that
auto-detection of encoding should not even be mentioned, I'm fine with
that, and I've already said so earlier.

(Of course I'd love to see non-Unicode locales disappear, but I don't
think that's in the cards.  And yes, I had a PowerShell-like Unix shell
in mind when I wrote the text you quoted.)

Nico
--