Re: [Json] BOMs

Nico Williams <nico@cryptonector.com> Thu, 21 November 2013 19:58 UTC

Date: Thu, 21 Nov 2013 13:58:13 -0600
From: Nico Williams <nico@cryptonector.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: es-discuss <es-discuss@mozilla.org>, John Cowan <cowan@mercury.ccil.org>, IETF Discussion <ietf@ietf.org>, www-tag <www-tag@w3.org>, JSON WG <json@ietf.org>
Subject: Re: [Json] BOMs

On Thu, Nov 21, 2013 at 1:37 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * John Cowan wrote:
>>Bjoern Hoehrmann scripsit:
>>
>>> Is there any chance, by the way, to change `JSON.stringify` so it does
>>> not output strings that cannot be encoded using UTF-8? Specifically,
>>>
>>>   JSON.stringify(JSON.parse("\"\uD800\""))
>>>
>>> would need to escape the surrogate instead of emitting it literally.
>>
>>No, there isn't.  We've been down this road repeatedly.  People can and
>>do use JSON strings to encode arbitrary sequences of unsigned 16-bit integers.
>
> The output of JSON.stringify("\uD800") contains no backslash character.
> If you call `utf8_encode(JSON.stringify("\uD800"))` you get an exception
> because UTF-8 cannot encode the lone surrogate, and `utf8_encode` does
> not know it could encode it as `\uD800` without loss of information. If
> `JSON.stringify` produced an escape sequence instead, there would be no
> problem passing the output to `utf8_encode`.
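To make the quoted point concrete, here is a minimal JavaScript sketch. `utf8Encode` and `escapeLoneSurrogates` are hypothetical helpers, not part of any standard API. The first is a strict UTF-8 encoder that throws on an unpaired surrogate, as the `utf8_encode` above is described as doing (the built-in `TextEncoder` instead substitutes U+FFFD). The second rewrites any unpaired surrogates left in `JSON.stringify` output as `\uXXXX` escapes. The sample code-unit sequence is made up for illustration.

```javascript
// Hypothetical strict UTF-8 encoder standing in for `utf8_encode` above.
// It throws on unpaired surrogates rather than silently substituting U+FFFD.
function utf8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) {
        throw new RangeError(`unpaired high surrogate at index ${i}`);
      }
      cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
      i++; // the low surrogate has been consumed as well
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      throw new RangeError(`unpaired low surrogate at index ${i}`);
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
                 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return Uint8Array.from(bytes);
}

// A high surrogate not followed by a low one, or a low surrogate not preceded
// by a high one. No `u` flag, so the regex matches individual UTF-16 code units.
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical post-processing step for JSON text. Outside string literals
// JSON text is pure ASCII, so an unpaired surrogate can only sit inside a
// string, where the \uXXXX escape denotes exactly the same code unit.
function escapeLoneSurrogates(jsonText) {
  return jsonText.replace(LONE_SURROGATE, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Arbitrary 16-bit units: a lone high surrogate, a lone low surrogate,
// and one well-formed pair (U+1F600).
const units = [0x0041, 0xd800, 0x0042, 0xdc00, 0xd83d, 0xde00];
const value = String.fromCharCode(...units);
const text = escapeLoneSurrogates(JSON.stringify(value));

let failed = false;
try {
  utf8Encode(value); // the raw value still holds unpaired surrogates
} catch (e) {
  failed = e instanceof RangeError;
}
console.log(failed); // true
utf8Encode(text); // succeeds: text holds only ASCII and one well-formed pair
console.log(JSON.parse(text) === value); // true: every 16-bit unit survives
```

Because only unpaired surrogates are escaped, the escaped text still carries an arbitrary sequence of 16-bit units through `JSON.parse` unchanged. The helper is a no-op on engines whose `JSON.stringify` already escapes lone surrogates.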

That's just one implementation.  We've had hundreds of e-mails on this
list about this, well over a thousand if you count several related
issues like it.  I think the only area where we have [rough] consensus
to revisit a previous consensus is the top-level value restriction,
which has led to the whole UTF and byte-order detection sub-thread
(a discussion we have also had before).  We're on much stronger ground
revisiting that one matter than the whole unpaired-surrogates matter,
and we're much, much less likely to change our consensus on unpaired
surrogates, because the top-level value proposal relaxes JSON to match
ECMAScript's definition, while yours would do the opposite.

Nico
--