Re: [secdir] secdir review of draft-ietf-json-text-sequence-11

Nico Williams <nico@cryptonector.com> Tue, 16 December 2014 00:01 UTC

Return-Path: <nico@cryptonector.com>
X-Original-To: secdir@ietfa.amsl.com
Delivered-To: secdir@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 708661A6FFA; Mon, 15 Dec 2014 16:01:16 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.666
X-Spam-Level:
X-Spam-Status: No, score=-1.666 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, IP_NOT_FRIENDLY=0.334, RCVD_IN_DNSWL_NONE=-0.0001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aNxur-QIL01V; Mon, 15 Dec 2014 16:01:15 -0800 (PST)
Received: from homiemail-a66.g.dreamhost.com (sub4.mail.dreamhost.com [69.163.253.135]) by ietfa.amsl.com (Postfix) with ESMTP id 202691A6FDA; Mon, 15 Dec 2014 16:01:15 -0800 (PST)
Received: from homiemail-a66.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a66.g.dreamhost.com (Postfix) with ESMTP id A706C350078; Mon, 15 Dec 2014 16:01:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=cryptonector.com; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to:content-transfer-encoding; s= cryptonector.com; bh=J0s/jYB9l1MLyeeLtqQ2bZIQcYM=; b=RbbipLUPzgz gSePFjRlyedKSgz2i5qhH49Iw6Y2EAa+3HHUAZopJmp9pRtazKOoNBEhbOdrsGVt MglpcHWaHwzAO5PNdNAPl75YxlF+ksh6hMsMSpcpoN2a1pJijmFXPBJ3OKVnr+wu YZiAH7owsw4oqHMALt3ce9/uJFgFovBQ=
Received: from localhost (108-207-244-174.lightspeed.austtx.sbcglobal.net [108.207.244.174]) (Authenticated sender: nico@cryptonector.com) by homiemail-a66.g.dreamhost.com (Postfix) with ESMTPA id 38A1A35005B; Mon, 15 Dec 2014 16:01:14 -0800 (PST)
Date: Mon, 15 Dec 2014 18:01:13 -0600
From: Nico Williams <nico@cryptonector.com>
To: Carl Wallace <carl@redhoundsoftware.com>
Message-ID: <20141216000109.GP3241@localhost>
References: <D0B1EECD.29290%carl@redhoundsoftware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
In-Reply-To: <D0B1EECD.29290%carl@redhoundsoftware.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Content-Transfer-Encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/secdir/nZWXALzIt4TFN1kIOP6IkqbzxIU
Cc: draft-ietf-json-text-sequence@tools.ietf.org, iesg@ietf.org, secdir@ietf.org
Subject: Re: [secdir] secdir review of draft-ietf-json-text-sequence-11
X-BeenThere: secdir@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Security Area Directorate <secdir.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/secdir>, <mailto:secdir-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/secdir/>
List-Post: <mailto:secdir@ietf.org>
List-Help: <mailto:secdir-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/secdir>, <mailto:secdir-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Dec 2014 00:01:16 -0000

On Sat, Dec 13, 2014 at 01:25:33PM -0500, Carl Wallace wrote:
> I have reviewed this document as part of the security directorate’s
> ongoing effort to review all IETF documents being processed by the IESG.

Thanks.

> This document describes a means of encoding and parsing arbitrarily large
> sequences of JSON texts. The document claims no new security
> considerations beyond those in RFC 7159.  I have some concern that the

Though it turns out that RFC7159's security considerations section was
probably too brief.  Anyways.

> addition of a LF by the JSON text sequence encoder that is not removed by
> the JSON text sequence parser has the potential to break detached JSON web
> signatures that cover JSON text sequence elements.  A few general comments
> and questions are below:

That's a good point.  Signing individual JSON text sequence elements is
outside the scope of this document.  Either a complete sequence is
signed, or individual texts are signed with a wrapper.  I wouldn't
recommend transferring a sequence of texts and a parallel sequence of
signatures, say.

> - In the abstract of version 11 the hex value for ASCII line feeds was
> changed to 0x1E from 0x0A.

Yeah, a widely noted typo :/

> - In section 2.1, shouldn’t possible-JSON be defined as *(not-RS) to allow
> for parsing of <RS><RS><RS>?

The preceding 1*RS in JSON-sequence takes care of that, although that
assumes greedy ABNF matching (RFC5234 does not speak to greediness).

> - Section 2.3 should clarify that malformed JSON text sequences are also
> not fatal (I.e., arriving at an RS without having seen a LF).

That's covered in detail in section 2.4.  I wouldn't want to replicate
any part of 2.4 in 2.3.

> - The second paragraph of Section 2.3 suggests that an incremental JSON
> parser may be used across content from multiple sequence elements.  The
> ABNF for encoders does not seem to support this.

Are you thinking of a a sequence like <RS> <JSON-text-w/o-trailing-LF>
<JSON-text> ... being parsed as two JSON texts?

Or are you thinking of a <RS> <JSON-text-containing-RS> <LF>?

The latter must yield a parse error from the JSON text parser.

As to the former... I think I'd thought about it and left it as-is,
though I can't recall why, with the security considerations text about
smuggling being sufficient (except it's not quite).

How about this new text:

   After reading and parsing a JSON text, the sequence parser MUST skip
   to the RS byte following the JSON text (or terminate if at end of
   input).

And in section 3:

   JSON text sequences like <RS> <JSON-text> <JSON-text> <LF> are
   sequences of one text, not two.  JSON text sequence parser
   implementations may emit the first JSON text, but should detect and
   log such sequences.

> - The last sentence of the second paragraph in Section 2.4 is a bit
> confusing. If a JSON-text has no trailing ws and the LF from the JSON text
> sequence is used to match the ws rule, should this be reported as a
> malformed JSON text sequence, a malformed JSON-text or neither?

Neither.  The LF doubles as the ws that makes the number/boolean/null
parse correctly, and as the LF needed for the JSON text sequence.

This is also true for string/array/object, but since those are
self-delimited the presence of the LF is less critical.

Section 2.4 is really

a) justification for the LF,

b) a warning to implementors not to include a bug where permissive
   parsing (not insisting on the trailing LF) results in very incorrect
   results when the top-level value type is a number (or boolean, or
   null).

> - In section 2.4, the “parsers” referenced in the second and third
> paragraphs should be qualified as JSON text sequence parsers or JSON
> parsers.  

When used without a qualifier the intended meaning is "JSON text
sequence parser", but that's a mouthful.  I may add text in section 1.1
about this.

> - In section 2.4, assuming the parser is a JSON text sequence parser, why
> is the MUST drop in the third paragraph necessary given potential use of
> incremental parsers mentioned in section 2.3 and later in 2.4? How do you
> distinguish between a “non-self-delimited top-level value” and a piece of
> JSON-text being parsed incrementally?

See above.  In my implementation I check that the last byte consumed by
the [incremental] JSON text parser was a whitespace character.

> - The examples for possibly truncated JSON texts seem to assume that the
> LF appended by a JSON text sequence encoder is absent. How would a
> truncated JSON text within a properly delimited JSON text sequence be
> detected? [...]

Well, if the LF is present then it's not truncated.  E.g., the number in
<RS>1<RS> may have been truncated, but the number in <RS>123<LF> cannot
have been.

>     [...] For example, if a component feeding JSON text elements to a JSON
> text sequence encoder failed where the JSON text sequence encoder
> succeeds, the result would be a truncated JSON text within a valid JSON
> text sequence. I realize the ABNF should preclude encoding invalid JSON
> texts but there is no text that instructs encoders to fail when invalid
> JSON texts are detected.

Yes, a JSON text sequence encoder should also encode the texts
themselves.  When it doesn't the result may be garbage, but the JSON
text sequence parser can cope with such garbage.

> - The failure to address the LF in the parser section seems like it could
> cause problems for re-encoding.  What would prevent accumulation of LF
> characters as a JSON text sequence was encoded, parsed, re-encoded,
> re-parsed and re-encoded?

There's no requirement that parsing and re-encoding be the identity
function.  There's no such requirement for JSON texts, therefore there
can't be for JSON text sequences.

This comes up a lot for JSON implementations.  Users often want the
result of parsing and re-encoding to leave numbers as-is, object [key]
names in the same order as in the input, indentation/whitespace the
same, and so on.  The answer is generally "sorry, that's ETOOHARD".

(There are implementations that preserve object name/value insertion/
parse order.  But keeping track of encoded number form along with parsed
numbers can be a headache.  Besides, implementations often lose number
precision/range.)

Nico
--