Re: [Json] Unpaired surrogates in JSON strings

Carsten Bormann <cabo@tzi.org> Thu, 06 June 2013 06:41 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <51AFE107.7020301@crockford.com>
Date: Thu, 06 Jun 2013 08:41:07 +0200
Message-Id: <81EF29EC-BFEB-4AE0-AFCF-6359BC31D354@tzi.org>
To: Douglas Crockford <douglas@crockford.com>
Cc: json@ietf.org
Subject: Re: [Json] Unpaired surrogates in JSON strings

On Jun 6, 2013, at 03:08, Douglas Crockford <douglas@crockford.com> wrote:

> JSON is just the pipe. It doesn't need to be enforcing Unicode over JavaScript. The sender and receiver can argue about what it means to be a character. JSON has always been agnostic about this.

JSON-the-format is about sequences of characters, and it can (and probably should) be agnostic about the character encoding scheme, as long as that scheme encodes Unicode characters, because that's what the ABNF is about.
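To make the format-level point concrete, here is a small sketch in Python (chosen for illustration; nothing in the thread prescribes a language). The same character sequence encoded as UTF-8 and as UTF-16LE yields different bytes, but once each byte sequence is decoded with its own encoding, the JSON parser sees identical characters and produces identical values.

```python
import json

# A JSON text as a sequence of Unicode characters (what the ABNF describes).
text = '{"name": "Gr\u00fc\u00dfe"}'

# Two different encoding schemes give two different byte sequences...
utf8_bytes = text.encode("utf-8")
utf16le_bytes = text.encode("utf-16-le")

# ...but after decoding back to characters, the parser input is identical.
from_utf8 = json.loads(utf8_bytes.decode("utf-8"))
from_utf16le = json.loads(utf16le_bytes.decode("utf-16-le"))

print(utf8_bytes != utf16le_bytes)   # the byte sequences differ
print(from_utf8 == from_utf16le)     # the parsed values are the same
```

The catch is that each decode call above had to be told which encoding to use; the bytes alone do not say.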

JSON-the-media-type (application/json) can't. Media types describe the semantics of a sequence of bytes; there are no characters in sight until you define how to get from bytes to characters. So either we stick with auto-detecting one of the seven UTFs (with some implied or explicit semantics about BOMs, which RFC 4627 specifies only incompletely), or we describe reality (UTF-8).
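For illustration, the auto-detection option can be sketched in Python as the four-octet null-byte heuristic from RFC 4627, Section 3. This is a minimal sketch, not a reference implementation, and the function name `detect_rfc4627_encoding` is hypothetical. It assumes, as RFC 4627 does, that the first two characters of a JSON text are ASCII. The RFC's table covers five BOM-less schemes and does not account for a byte order mark, so a UTF-16LE text that starts with a BOM falls through to UTF-8 here.

```python
import json


def detect_rfc4627_encoding(data: bytes) -> str:
    """Guess the UTF of a JSON text from which of its first four octets are zero.

    Hypothetical helper following the pattern table in RFC 4627, Section 3.
    No BOM handling: a leading BOM is not recognized.
    """
    head = data[:4]
    if len(head) < 4:
        return "utf-8"  # assumption: too short to apply the table
    z = [octet == 0 for octet in head]
    if z[0] and z[1] and z[2] and not z[3]:
        return "utf-32-be"  # 00 00 00 xx
    if z[0] and not z[1] and z[2] and not z[3]:
        return "utf-16-be"  # 00 xx 00 xx
    if not z[0] and z[1] and z[2] and z[3]:
        return "utf-32-le"  # xx 00 00 00
    if not z[0] and z[1] and not z[2] and z[3]:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"  # xx xx xx xx


# Each BOM-less encoding of the same text is detected and round-trips.
for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    data = "[1]".encode(enc)
    print(enc, detect_rfc4627_encoding(data), json.loads(data.decode(detect_rfc4627_encoding(data))))

# A UTF-16LE text with a leading BOM (FF FE 5B 00) matches none of the
# null-byte patterns and is misdetected as UTF-8.
with_bom = b"\xff\xfe" + "[1]".encode("utf-16-le")
print(detect_rfc4627_encoding(with_bom))
```

The BOM case is exactly the kind of gap the message refers to: the heuristic has to be extended with BOM rules before it works on such input, whereas mandating UTF-8 removes the detection step altogether.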

Regards, Carsten