Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Henri Sivonen <hsivonen@hsivonen.fi> Tue, 26 November 2013 14:10 UTC

From: Henri Sivonen <hsivonen@hsivonen.fi>
To: Pete Cordell <petejson@codalogic.com>
Cc: John Cowan <cowan@mercury.ccil.org>, Paul Hoffman <paul.hoffman@vpnc.org>, JSON WG <json@ietf.org>, "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, www-tag <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>

On Fri, Nov 22, 2013 at 1:33 PM, Pete Cordell <petejson@codalogic.com> wrote:
> Personally I think we have to be careful not to fall into the trap of
> assuming that the only use-case for JSON will be in "to browser"
> communications.

I don't expect it to be the only use.

> I'm hoping that for the IETF's purposes we'll be looking at
> JSON's wider utility in broader areas, which may even include logging to
> files and interprocess communication where there might be sensible reasons
> to choose something other than UTF-8.

What sensible reasons could there possibly be?

The one reason for using UTF-16 is contrived. (Your JSON consists
almost entirely of East Asian string literals with next to no JSON
syntax itself, you are bandwidth-constrained and, magically,
simultaneously so CPU-constrained that you can't use gzip.) For UTF-32
not even contrived reasons exist.
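
A rough sketch of the size trade-off, assuming Python 3. The two sample
documents and the helper name encoded_sizes are made up for
illustration; only the byte counts per encoding matter here.

```python
import json

def encoded_sizes(text):
    """Return the byte length of text in each encoding, without a BOM."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

# Hypothetical sample: almost entirely East Asian string content,
# with next to no JSON syntax around it.
cjk_heavy = json.dumps({"t": "日本語" * 100}, ensure_ascii=False)

# Hypothetical sample: ordinary ASCII keys and values.
ascii_heavy = json.dumps({"id": 1, "name": "example", "tags": ["a", "b"]})

cjk = encoded_sizes(cjk_heavy)    # UTF-16 is the smallest here
asc = encoded_sizes(ascii_heavy)  # UTF-8 is the smallest here

for label, sizes in (("CJK-heavy", cjk), ("ASCII-heavy", asc)):
    print(label, sizes)
```

UTF-16 only comes out ahead for the first sample, and this comparison
leaves compression out entirely. UTF-32 is the largest in both cases,
since it spends four bytes on every code point.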

(If you use shared-memory IPC between processes that use a non-UTF-8
representation for Unicode strings, you shouldn't treat the exchange as
happening via char* plus an encoding layer and the JSON MIME type.
Treat it as happening via char16_t* or char32_t*, with no encoding
layer involved. For example, if you JSONify data to communicate from
Web Workers to the main thread, conceptually, the JSONification
happens to Unicode strings, not to bytes, so the JSON RFC doesn't get
involved.)
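
The string-versus-bytes distinction can be sketched by analogy in
Python 3, where str stands in for an in-memory Unicode string. The
payload is a made-up example, and this is not the Web Worker API itself.

```python
import json

payload = {"msg": "päivää", "n": 1}

# String-level exchange: serialize to a Unicode string and parse it back.
# No bytes, charset, or MIME type are involved at this layer.
text = json.dumps(payload, ensure_ascii=False)
roundtrip = json.loads(text)

# Byte-level exchange: only at this point must an encoding be chosen.
wire = text.encode("utf-8")
from_wire = json.loads(wire.decode("utf-8"))
```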

On Fri, Nov 22, 2013 at 6:39 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Henri Sivonen scripsit:
>
>> Even if no one or approximately no one (outside test cases) actually
>> emits JSON in UTF-32?
>
> How on earth would you know that?

There exists no situation where using UTF-32 for interchange makes
sense. I think proponents of something as crazy as using UTF-32 for
interchange should show evidence of existing crazy deployments instead
of asking future implementers to support UTF-32 just because it wasn't
possible to prove that none exist.

On Fri, Nov 22, 2013 at 9:28 PM, Pete Cordell <petejson@codalogic.com> wrote:
>       00 00 -- --  UTF-32BE
>       00 xx -- --  UTF-16BE
>       xx 00 00 00  UTF-32LE
>       xx 00 00 xx  UTF-16LE
>       xx 00 xx --  UTF-16LE
>       xx xx -- --  UTF-8

I continue to strongly disapprove of non-BOM-based sniffing rules
unless there's compelling evidence that such rules are needed in order
to interoperate with bogus existing serializers.
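
To make the two approaches concrete, here is a minimal Python 3 sketch.
detect_by_bom looks only at a byte order mark, and sniff_by_nulls
follows the quoted zero-byte table. The function names, the fallback to
UTF-8 when no BOM is present, and the handling of inputs shorter than
four bytes are all assumptions made for illustration, not anything
either draft specifies.

```python
import codecs
import json

# Check the UTF-32 BOMs first: the UTF-32LE BOM (FF FE 00 00) begins
# with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

def detect_by_bom(data, default="utf-8"):
    """Return (encoding, bom_length); fall back to default without a BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0

def decode_json_bytes(data):
    """Decode JSON bytes using BOM-only detection (assumed UTF-8 default)."""
    encoding, skip = detect_by_bom(data)
    return json.loads(data[skip:].decode(encoding))

def sniff_by_nulls(data):
    """Guess the encoding from zero bytes in the first four octets,
    following the quoted table (xx = non-zero, -- = any byte)."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short for the table to apply
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0:
        return "utf-32-be"   # 00 00 -- --
    if b0 == 0:
        return "utf-16-be"   # 00 xx -- --
    if b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"   # xx 00 00 00
    if b1 == 0:
        return "utf-16-le"   # xx 00 00 xx and xx 00 xx --
    return "utf-8"           # xx xx -- --
```

With BOM-only detection, BOM-less input is never guessed to be UTF-16
or UTF-32; it is simply treated as UTF-8 and fails to parse if it is
anything else.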

-- 
Henri Sivonen
hsivonen@hsivonen.fi
http://hsivonen.fi/