Re: [Json] [Technical Errata Reported] RFC8259 (7673)

Tim Bray <tbray@textuality.com> Wed, 11 October 2023 15:14 UTC

References: <20231011065619.82BC5E6D69@rfcpa.amsl.com> <CE22DEB9-FA3A-439B-A4CD-79138DBB18A5@cursive.net>
In-Reply-To: <CE22DEB9-FA3A-439B-A4CD-79138DBB18A5@cursive.net>
From: Tim Bray <tbray@textuality.com>
Date: Wed, 11 Oct 2023 08:14:24 -0700
Message-ID: <CAHBU6is2zYR8V-BK8_OVe8iWhfxRym=+=4s1X26e1kfJzOWGdg@mail.gmail.com>
To: Joe Hildebrand <hildjj@cursive.net>
Cc: "Murray S. Kucherawy" <superuser@gmail.com>, Francesca Palombini <francesca.palombini@ericsson.com>, linuxwolf+ietf@outer-planes.net, zachmcollier@gmail.com, json@ietf.org, RFC Editor <rfc-editor@rfc-editor.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/duK-6y1yEPRrLTgj5t5uROiHqhM>
Subject: Re: [Json] [Technical Errata Reported] RFC8259 (7673)

Well, this is a strange one.  All the specs for JSON have said the same
thing, and the thing they’ve said is kind of stupid. The requested change
is to add U+7F DEL, along with the C1 controls U+80 through U+9F, to the
list of characters that MUST be escaped.

However, I created a document whose one field contains only a single
U+7F (available at https://www.tbray.org/tmp/del.json), and it seems
to be legal JSON.  So, in fact, while DEL should have been included in the
must-escapes, the world’s software has learned to live with it not being
escaped.

Note that this file does not display correctly: DEL is a non-printing
character, so the string value below looks empty.

 ~ % curl https://www.tbray.org/tmp/del.json | jsonlint
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    23  100    23    0     0    224      0 --:--:-- --:--:-- --:--:--   237
{
  "example": ""
}
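
For anyone who wants to repeat the check without jsonlint, here is a minimal
sketch using the JavaScript built-in JSON object. It only shows how that one
implementation behaves. The helper names (isUnescapedAllowed, parses) and the
test documents are made up for illustration, and rawDelDoc is assumed to have
roughly the same shape as del.json rather than being a byte-for-byte copy.

```javascript
// Sketch only: probes how the built-in JSON object treats a raw DEL.
// All names here are illustrative; none come from RFC 8259.

// True when RFC 8259's ABNF admits the code point unescaped in a string:
//   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
function isUnescapedAllowed(cp) {
  return (cp >= 0x20 && cp <= 0x21) ||
         (cp >= 0x23 && cp <= 0x5b) ||
         (cp >= 0x5d && cp <= 0x10ffff);
}

// True when JSON.parse accepts the text without throwing.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

const DEL = "\u007f";
// A document with one field holding a single raw DEL (assumed shape).
const rawDelDoc = '{"example": "' + DEL + '"}';
// The same document with a raw U+001F, which the spec says MUST be escaped.
const rawUsDoc = '{"example": "\u001f"}';

console.log("ABNF allows raw DEL:   ", isUnescapedAllowed(0x7f));
console.log("parser takes raw DEL:  ", parses(rawDelDoc));
console.log("parser takes raw U+1F: ", parses(rawUsDoc));
// Length of the serialized string: 3 means quote, raw DEL, quote.
console.log("stringify(DEL) length: ", JSON.stringify(DEL).length);
```

If the parser accepts the first document and rejects the second, that matches
the ABNF: DEL sits inside the %x5D-10FFFF range, while U+001F is below %x20.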

FWIW, those who care about such issues may be interested in
https://datatracker.ietf.org/doc/draft-bray-unichars/ currently under
discussion in the art@ and i18ndir@ mailing lists.


On Oct 11, 2023 at 7:17:00 AM, Joe Hildebrand <hildjj@cursive.net> wrote:

> Suggested resolution: reject
>
> RFC 8259's ABNF is quite clear that these codepoints are allowed:
> "unescaped = %x20-21 / %x23-5B / %x5D-10FFFF"
>
> ECMA-404 agrees: "the control characters U+0000 to U+001F".
>
> json.org's wording is awkward, but still clear: "any of the Unicode code
> points except the 32 control codes and "double quote"
>
> Here is some JS to prove it got implemented this way:
>
> ```
> JSON.parse('"\x7f"')
> ```
>
> The approach in the errata might have been the correct one to have been
> specified, but it wasn't.  Even if we had wanted to make this change, it
> was far too late by the time RFC 4627 was written.
>
> —
> Joe Hildebrand
>
> On Oct 11, 2023, at 12:56 AM, RFC Errata System <rfc-editor@rfc-editor.org>
> wrote:
>
> The following errata report has been submitted for RFC8259,
> "The JavaScript Object Notation (JSON) Data Interchange Format".
>
> --------------------------------------
> You may review the report below and at:
> https://www.rfc-editor.org/errata/eid7673
>
> --------------------------------------
> Type: Technical
> Reported by: Zachary Collier (Zamicol) <zachmcollier@gmail.com>
>
> Section: 7
>
> Original Text
> -------------
> The representation of strings is similar to conventions used in the C
> family of programming languages.  A string begins and ends with quotation
> marks. All Unicode characters may be placed within the quotation marks,
> except for the characters that MUST be escaped: quotation mark, reverse
> solidus, and the control characters (U+0000 through U+001F).
>
> Corrected Text
> --------------
> The representation of strings is similar to conventions used in the C
> family of programming languages.  A string begins and ends with quotation
> marks. All Unicode characters may be placed within the quotation marks,
> except for the characters that MUST be escaped: quotation mark, reverse
> solidus, and the control characters (U+0000 through U+001F, U+007F, and
> U+0080 through U+009F).
>
> Notes
> -----
> There are 33 7-bit control characters, but the JSON RFC only listed 32 by
> omitting the inclusion of the last control character in the 7-bit ASCII
> range, 'del.'  However, JSON is not limited to 7-bit ASCII; it is Unicode.
> Unicode encompasses 65 control characters from U+0080 to U+009F, totaling
> an additional 32 characters.  The section that currently reads "U+0000
> through U+001F" should include these additional control characters reading
> as "U+0000 through U+001F, U+007F, and U+0080 through U+009F"
>
> Instructions:
> -------------
> This erratum is currently posted as "Reported". If necessary, please
> use "Reply All" to discuss whether it should be verified or
> rejected. When a decision is reached, the verifying party
> can log in to change the status and edit the report, if necessary.
>
> --------------------------------------
> RFC8259 (draft-ietf-jsonbis-rfc7159bis-04)
> --------------------------------------
> Title               : The JavaScript Object Notation (JSON) Data Interchange Format
> Publication Date    : December 2017
> Author(s)           : T. Bray, Ed.
> Category            : INTERNET STANDARD
> Source              : Javascript Object Notation Update
> Area                : Applications and Real-Time
> Stream              : IETF
> Verifying Party     : IESG