Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Henri Sivonen <hsivonen@hsivonen.fi> Fri, 22 November 2013 09:22 UTC

In-Reply-To: <20131121165615.GA12138@mercury.ccil.org>
References: <8413609C8A86497F856897AF2AA24960@codalogic> <CEAA3067.2D132%jhildebr@cisco.com> <CANXqsRJEtBoprQFrftz80ZigmBR_NHoEXK1sR4GyBtz5B2KC8Q@mail.gmail.com> <20131120223305.GB5476@mercury.ccil.org> <CANXqsRJmNmSRXssBnw3tGUt0veViENLoS=dp+gEr2RqvNAf4JQ@mail.gmail.com> <20131121165615.GA12138@mercury.ccil.org>
Date: Fri, 22 Nov 2013 11:22:35 +0200
Message-ID: <CANXqsRKrcR54TzSFng0ysyTV60-uZZ7QQ-G4xJOB0gO29C7-Ag@mail.gmail.com>
From: Henri Sivonen <hsivonen@hsivonen.fi>
To: John Cowan <cowan@mercury.ccil.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: Pete Cordell <petejson@codalogic.com>, Paul Hoffman <paul.hoffman@vpnc.org>, JSON WG <json@ietf.org>, "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>, "www-tag@w3.org" <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>
Subject: Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

On Thu, Nov 21, 2013 at 3:39 PM, Anne van Kesteren <annevk@annevk.nl> wrote:
> XHR's responseType = "json" only supports UTF-8 (optionally with a
> leading BOM), across the board.

Good point. I wrote the code that enforces that constraint, but I forgot.

Well, there's an interoperability reason against UTF-16, too, then.
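
To make the constraint concrete, here is a rough sketch of UTF-8-only
JSON ingestion of the kind described above. It is not the Gecko code or
the XHR spec text; the function name parse_json_utf8_only is made up
for illustration, and returning None on failure is only an assumption
standing in for a null response.

```python
import json
from typing import Any, Optional


def parse_json_utf8_only(body: bytes) -> Optional[Any]:
    """Treat the bytes as UTF-8 (optionally with a leading BOM) and parse.

    Hypothetical helper: returns None when the text is not valid JSON.
    A JSON document that is literally `null` also yields None here.
    """
    # "utf-8-sig" drops one leading EF BB BF if present. Malformed byte
    # sequences are replaced with U+FFFD instead of raising an error.
    text = body.decode("utf-8-sig", errors="replace")
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
```

Under this sketch, UTF-16 or UTF-32 input is never sniffed: it is
decoded as UTF-8 like everything else, and the resulting text fails to
parse.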

On Thu, Nov 21, 2013 at 6:56 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Henri Sivonen scripsit:
>
>> Why not? Surely existing still deployed producers should be what
>> matters when deciding what needs to be ingested--not previous specs.
>> That is, compatibility should be considered in terms of what's out
>> there--not in terms of what unreasonable things were written down in a
>> previous RFC.
>
> In principle, maybe.  But testing a dozen browsers isn't enough here.
> We simply don't know how much non-browser traffic involves JSON (though
> we know there are many such interactions), or what representations they
> are using.  We have therefore decided in the 4627bis effort not to say
> that anything that was previously valid is now invalid.  At least, that
> is what I understand us to be doing.

Even if no one or approximately no one (outside test cases) actually
emits JSON in UTF-32?

>> UTF-32 harms JSON interchange, because Gecko removed all UTF-32
>> support throughout the engine (other engines probably did, too, but
>> I'm too busy to check) and, therefore, XHR responseType = "json"
>> doesn't support UTF-32.
>
> That has about as much weight as "XYZ implementation only supports
> ASCII, so the use of non-ASCII characters harms JSON interchange" or "ABC
> implementation only supports 32-bit integers, so the use of decimal points
> harms JSON interchange."  An implementation's self-imposed limitations
> don't affect the standard.

Well, what (broadly deployed) running code does affects
interoperability regardless of who imposed the behavior. (In this
case, the behavior is imposed by a spec: the XHR spec requires that
JSON always be treated as UTF-8.)

On Thu, Nov 21, 2013 at 7:11 PM, Joe Hildebrand (jhildebr)
<jhildebr@cisco.com> wrote:
> Specifically, the charter (http://datatracker.ietf.org/wg/json/charter/)
> says:
>
> "Any changes that break compatibility with existing implementations of
> either RFC 4627 or the ECMAScript specification will need to have very
> strong justification and broad support."

"existing implementations" ≠ "existing specs"

I think you should have to show an existing implementation with
substantial deployment that in its substantially deployed
configuration emits JSON in UTF-32 to have a justification for keeping
UTF-32 in the spec.

(I have to wonder what kind of theorizing was the cause of putting
UTF-32 in the spec in the first place. I also have to wonder if the
IETF JSON spec would have supported UTF-64 for completeness if someone
had written an April 1st RFC for UTF-64.)
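
For reference, the detection that RFC 4627 section 3 describes looks at
the pattern of zero octets in the first four octets, relying on the
first two characters of a JSON text being ASCII. A minimal sketch
follows. The function name sniff_rfc4627_encoding is hypothetical, and
falling back to UTF-8 for inputs shorter than four octets is my
assumption, not something the RFC spells out.

```python
def sniff_rfc4627_encoding(head: bytes) -> str:
    """Guess the encoding from which of the first four octets are zero.

    Pattern table per RFC 4627 section 3 (xx = non-zero octet):
      00 00 00 xx  UTF-32BE      00 xx 00 xx  UTF-16BE
      xx 00 00 00  UTF-32LE      xx 00 xx 00  UTF-16LE
      anything else is taken to be UTF-8
    """
    if len(head) < 4:
        # Assumption: too short to apply the pattern, so default to UTF-8.
        return "utf-8"
    zeros = tuple(octet == 0 for octet in head[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",
        (True, False, True, False): "utf-16-be",
        (False, True, True, True): "utf-32-le",
        (False, True, False, True): "utf-16-le",
    }
    return patterns.get(zeros, "utf-8")
```

A consumer that always decodes as UTF-8 skips this step entirely, so
the UTF-16 and UTF-32 rows only matter if some producer actually emits
those encodings.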

-- 
Henri Sivonen
hsivonen@hsivonen.fi
http://hsivonen.fi/