Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Patrik Fältström <paf@frobbit.se> Sun, 07 December 2014 09:40 UTC

Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
From: Patrik Fältström <paf@frobbit.se>
In-Reply-To: <CE03DB3D7B45C245BCA0D24327794936289DC7@MX104CL02.corp.emc.com>
Date: Sun, 07 Dec 2014 10:40:43 +0100
Message-Id: <89601952-AA04-44EE-A6DA-E76D0AB07C21@frobbit.se>
References: <CE03DB3D7B45C245BCA0D24327794936289DC7@MX104CL02.corp.emc.com>
To: "Black, David" <david.black@emc.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf/X0Oj_nWrl6fnc3bqFv9kG_8MveQ
Cc: "General Area Review Team (gen-art@ietf.org)" <gen-art@ietf.org>, "json@ietf.org" <json@ietf.org>, "ops-dir@ietf.org" <ops-dir@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>

All,

Many definitions in this document have been specified in the form of 1:1 mappings of Unicode code points to single bytes. This includes the record separator and, for example, the definition of the string "false" as five octets.

This is OK if the encoding used for Unicode is UTF-8, and indeed the draft says that the parser should try to parse the string as if it were a UTF-8 sequence of characters.

But it also references RFC7159, which doesn't require UTF-8 but instead, for some weird reason, also allows other encodings of Unicode text. And on top of that it says that a Byte Order Mark is not allowed.

Taken together, this implies first that this draft might not lead to stable implementations, second that one cannot store strings that include the Byte Order Mark in JSON, and third that there are other unspecified situations.
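
A small Python sketch, not taken from the draft, shows why octet-level definitions only hold under UTF-8. The helper name `contains_rs_octet` and the sample string are assumptions made for illustration. It checks whether the RS octet (0x1E) shows up in the encoded form of a JSON string that contains U+1E00, a character that has nothing to do with the record separator.

```python
RS = b"\x1e"  # the record separator octet the draft uses as a delimiter


def contains_rs_octet(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains the octet 0x1E."""
    return RS in text.encode(encoding)


# "false" is five octets only when the encoding is UTF-8.
false_utf8 = "false".encode("utf-8")        # 5 octets
false_utf16 = "false".encode("utf-16-be")   # 10 octets

# A JSON string holding U+1E00 (LATIN CAPITAL LETTER A WITH RING BELOW).
sample = '"\u1e00"'
```

Here `contains_rs_octet(sample, "utf-8")` is False, while `contains_rs_octet(sample, "utf-16-be")` is True, so a sequence parser that splits on the octet 0x1E would cut a UTF-16 text in the middle of a character.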

Or, in short: the weaknesses that I see in RFC7159 are made even worse, and potentially dangerous, by draft-ietf-json-text-sequence.

Yes, I and others should probably not have let RFC7159 through, because that might be where the bugs are.

Suggestion: this draft, draft-ietf-json-text-sequence, should say explicitly that the only "profile" of RFC7159 that is allowed is UTF-8. That should be a MUST.

Reminder for the IETF: having "or" statements is not recommended at all when talking about these kinds of things, and RFC7159 includes at least one "or" too many. The recommendation from the IETF is to use UTF-8 encoding for Unicode (when serializing text).

   Patrik

> On 5 dec 2014, at 15:51, Black, David <david.black@emc.com> wrote:
> 
> This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows ...
> 
> I am the assigned Gen-ART reviewer for this draft. For background on
> Gen-ART, please see the FAQ at:
> 
> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
> 
> Please resolve these comments along with any other Last Call comments
> you may receive.
> 
> I have reviewed this document as part of the Operational directorate's ongoing
> effort to review all IETF documents being processed by the IESG.  These comments
> were written primarily for the benefit of the operational area directors.
> Document editors and WG chairs should treat these comments just like any other
> last call comments.
> 
> Document: draft-ietf-json-text-sequence-09
> Reviewer: David Black
> Review Date: Dec 5, 2014
> IETF LC End Date: Dec 5, 2014
> IESG Telechat date: Dec 18, 2014
> 
> Summary: This draft is on the right track, but has open issues
> 		described in the review.
> 
> This draft specifies a format that packs multiple JSON texts into a
> single string.  The ASCII RS (0x1E) character is used to separate texts,
> and a linefeed is appended to each text to ensure that a complete text
> always ends with a whitespace character.
> 
> All of the open issues are minor - the most important ones center on
> treatment of incomplete JSON texts - that appears to be an afterthought
> in this draft and needs more attention.  I also found a couple of
> minor issues in the Security and IANA Considerations sections, both of
> which are almost nits.
> 
> Major issues: None.
> 
> Minor issues:
> 
> [A] Section 2.1:
> 
>   If parsing of such an octet string as a JSON text fails, the parser
>   SHOULD nonetheless continue parsing the remainder of the sequence.
> 
> That's not quite right - there are two levels of parsing, JSON
> sequence parsing and JSON text parsing of each text in the sequence,
> both of which might be implemented in a single-pass parser.  For such an
> implementation, the above sentence could be (mis-)read to imply that the
> JSON text parse should resume from the point at which it failed, which
> would be silly (although I've seen heroic PL/1 parsers do exactly that).
> Instead, the parse needs to skip ahead to the next RS, ignoring the rest
> of the JSON text that failed to parse.  I suggest:
> 
>   If parsing of such an octet string as a JSON text fails, and the
>   octet string is followed by an RS octet, the parser
>   SHOULD nonetheless skip ahead to that RS octet and continue parsing
>   the remainder of the sequence from there.
> 
> That also covers the case where there is nothing more to parse after the
> JSON text that caused the parse failure.
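
The skip-ahead behaviour suggested in [A] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the draft's algorithm: the function name `parse_sequence` is hypothetical, the whole input is assumed to be in memory, and anything before the first RS is simply ignored. Splitting on RS first means a failed text never affects where the next one starts.

```python
import json

RS = b"\x1e"  # record separator octet


def parse_sequence(data: bytes):
    """Return a list of (ok, value) pairs, one per RS-delimited element.

    ok is True and value is the parsed JSON value when the element parses;
    otherwise ok is False and value is the raw octets that failed.
    """
    results = []
    # Drop whatever precedes the first RS (an assumption of this sketch).
    for element in data.split(RS)[1:]:
        if not element:
            continue  # consecutive RS octets produce empty elements
        try:
            results.append((True, json.loads(element.decode("utf-8"))))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON; the loop then
            # moves on to the element after the next RS.
            results.append((False, element))
    return results
```

For the input `b'\x1e{"a": 1}\n\x1e{"b": \x1e[1, 2]\n'` the second element fails to parse, and the third is still returned as `[1, 2]`.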
> 
> [B] Section 2.3:
> 
> Is incremental parsing of a JSON text within a sequence allowed, or
> is the parser required to not produce any results until the parse of
> the entire text is successful?  I'd expect that incremental parsing
> is ok (so results may be produced from a text that ultimately fails
> to parse), and I think that's worth stating.
> 
> [C] Section 2.4:
> 
>   Parsers MUST check that any JSON texts that are a top-level number
>   include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
>   number, otherwise the JSON-text may have been truncated.
> 
> That reference to the "ws" rule doesn't get the job done because that
> rule allows a match to no characters - it's of the form ws = *( ... )
> where ... is the list of whitespace characters.  What's needed here is
> a rule of the form vws = 1*( ...) to force there to be at least one
> whitespace character, but see the next issue for a better way to deal
> with this topic by pulling the appended LF into the sequence parse
> instead of the text parse.
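
The difference between the two rule forms in [C] can be shown with regular expressions. This is only a sketch: the patterns are my own transcription of the RFC 7159 number and whitespace rules, and the names `lenient`, `strict`, and `accepted` are hypothetical.

```python
import re

# Whitespace characters listed in the RFC 7159 "ws" rule.
WS = rb"[ \t\n\r]"
# JSON number: optional minus, integer part, optional fraction and exponent.
NUMBER = rb"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"

lenient = re.compile(NUMBER + WS + rb"*")  # number followed by *( ws chars )
strict = re.compile(NUMBER + WS + rb"+")   # number followed by 1*( ws chars )


def accepted(pattern, element: bytes) -> bool:
    """Return True if the whole element matches the pattern."""
    return pattern.fullmatch(element) is not None
```

With these definitions `accepted(lenient, b"123")` is True even though `123` could be a truncated `1234`, whereas `accepted(strict, b"123")` is False and `accepted(strict, b"123\n")` is True.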
> 
> [D] I wonder whether the possibility of incomplete texts ought to be
> encoded into the parsing rules to directly catch JSON texts that must
> be incomplete because the last character is not LF, e.g.:
> 
>     JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
>     RS = %x1E; "record separator" (RS), see RFC20
>     possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
>                               ; JSON text (see RFC7159)
>     truncated-JSON = *(not-RS) not-LFRS ; truncated, don't attempt
>                               ; to parse as JSON text
>     empty-JSON = LF ; only the LF appended by the encoder, nothing to parse
> 
>     not-RS = %x00-1D / %x1F-FF; any octet other than RS
>     not-LFRS = %x00-09 / %x0B-1D / %x1F-FF; any octet other than RS or LF
> 
> Note that this won't detect all incomplete JSON texts, because LF is allowed
> within a JSON text (and this should be stated).
> 
> [E] Section 3 - Security Considerations
> 
> Incomplete and malformed JSON texts can be used to attack JSON parsers -
> that should be pointed out, as I don't see that in RFC 7159's security
> considerations and incomplete texts are a relevant consideration for
> this draft.
> 
> [F] Section 4 - IANA Considerations
> 
>   Security considerations: See <this document, once published>,
>   Section 3.
> 
>   Interoperability considerations: Described herein.
> 
>   Published specification: <this document, once published>.
> 
>   Applications that use this media type: <by publication time
>   <https://stedolan.github.io/jq> is likely to support this format>.
> 
> Replace all three instances of the angle bracketed text.  The first two
> instances should be RFC references (e.g., RFC XXXX) w/a note to the RFC
> Editor to insert the number of the RFC when published.  The third instance
> should be resolved now, or could have an RFC Editor note added indicating
> that the author will resolve that during Authors 48 hours.
> 
> Nits/editorial comments:
> 
> idnits didn't like the reference to RFC 20 for ASCII:
> 
>  ** Downref: Normative reference to an Unknown state RFC: RFC   20
> 
> RFC 5234 (ABNF) uses this, which looks like a better reference:
> 
>   [US-ASCII]  American National Standards Institute, "Coded Character
>               Set -- 7-bit American Standard Code for Information
>               Interchange", ANSI X3.4, 1986.
> 
> --- Selected RFC 5706 Appendix A Q&A for OPS-Dir review ---
> 
> Most of these questions are n/a because this draft describes a format
> that will be used in other protocols to which RFC 5706's concerns would apply.
> 
> A.1.4   Have the Requirements on other protocols and functional
>       components been discussed?
> 
> The specification of the interaction of the JSON sequence parser with the
> JSON text parser is not as clear as it should be for incomplete or malformed
> JSON texts.  See Minor Issues [A]-[E] above.
> 
> A.1.8   Are there fault or threshold conditions that should be reported?
> 
> Yes, incomplete JSON texts - this is covered in sections 2.3 and 2.4.
> 
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Distinguished Engineer
> EMC Corporation, 176 South St., Hopkinton, MA  01748
> +1 (508) 293-7953             FAX: +1 (508) 293-7786
> david.black@emc.com        Mobile: +1 (978) 394-7754
> ----------------------------------------------------
>