Re: [secdir] secdir review of draft-ietf-json-text-sequence-11

Carl Wallace <carl@redhoundsoftware.com> Tue, 16 December 2014 12:51 UTC

Return-Path: <carl@redhoundsoftware.com>
X-Original-To: secdir@ietfa.amsl.com
Delivered-To: secdir@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 101171A701F for <secdir@ietfa.amsl.com>; Tue, 16 Dec 2014 04:51:44 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.6
X-Spam-Level:
X-Spam-Status: No, score=-2.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, MIME_QP_LONG_LINE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001] autolearn=unavailable
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ro-BXkLyzxK3 for <secdir@ietfa.amsl.com>; Tue, 16 Dec 2014 04:51:41 -0800 (PST)
Received: from mail-qa0-f51.google.com (mail-qa0-f51.google.com [209.85.216.51]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 1AE1D1A1AE0 for <secdir@ietf.org>; Tue, 16 Dec 2014 04:51:40 -0800 (PST)
Received: by mail-qa0-f51.google.com with SMTP id k15so9644281qaq.10 for <secdir@ietf.org>; Tue, 16 Dec 2014 04:51:40 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:user-agent:date:subject:from:to:cc:message-id :thread-topic:references:in-reply-to:mime-version:content-type :content-transfer-encoding; bh=xb5E/juJFelmc/UcIGVuVT3Q7B/7PoW27v5LkYyaWTg=; b=V8yVqHQjCrspBOYZ3HC434SDZ8w4IG5Ji2e4xUzkP8U6lKaHLPU3sz3+1pXGXpQGQb 8+P5dz3vduLoY9eGbIz+V7/GSrV+h32Inh+2z1rWrVhCmd6L7p7+VZAFSlS+a5mn7fw/ i5um4VSDrckzmG48O/RfT5xt0flsDbnutu0GHQ2SA2jh8fyaUKb7WMh+nQI40W85I1Kz kdfdw8IFIQ65BeGnhj1gvCVFLn4ooNSKIhxJKcHuT/XHHSVZX4q/jaJBy/HwuzM6KoLy RT3VtzE527uSxzix50E7c1+CaW4NvShUSQ+UD/ZEBrIpFMzQw6uhYvW+YqrzGBbnpEUi wpiw==
X-Gm-Message-State: ALoCoQlIr+2Mo6wNhduzWBpsfMjGzvg1ujAHTGtBEn0WEYqWey9ie5w9CagvghRqpehOGhBbcqiS
X-Received: by 10.224.169.2 with SMTP id w2mr64388796qay.33.1418734300014; Tue, 16 Dec 2014 04:51:40 -0800 (PST)
Received: from [192.168.2.39] (pool-173-79-132-199.washdc.fios.verizon.net. [173.79.132.199]) by mx.google.com with ESMTPSA id 89sm635064qgz.39.2014.12.16.04.51.38 (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 16 Dec 2014 04:51:39 -0800 (PST)
User-Agent: Microsoft-MacOutlook/14.4.3.140616
Date: Tue, 16 Dec 2014 07:51:32 -0500
From: Carl Wallace <carl@redhoundsoftware.com>
To: Nico Williams <nico@cryptonector.com>
Message-ID: <D0B587AB.2948E%carl@redhoundsoftware.com>
Thread-Topic: [secdir] secdir review of draft-ietf-json-text-sequence-11
References: <D0B1EECD.29290%carl@redhoundsoftware.com> <20141216000109.GP3241@localhost>
In-Reply-To: <20141216000109.GP3241@localhost>
Mime-version: 1.0
Content-type: text/plain; charset="UTF-8"
Content-transfer-encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/secdir/Wz-RUiSxvzK41vBy1O-WEQOFZ3A
Cc: draft-ietf-json-text-sequence@tools.ietf.org, iesg@ietf.org, secdir@ietf.org
Subject: Re: [secdir] secdir review of draft-ietf-json-text-sequence-11
X-BeenThere: secdir@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Security Area Directorate <secdir.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/secdir>, <mailto:secdir-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/secdir/>
List-Post: <mailto:secdir@ietf.org>
List-Help: <mailto:secdir-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/secdir>, <mailto:secdir-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Dec 2014 12:51:44 -0000

Inline...

On 12/15/14, 7:01 PM, "Nico Williams" <nico@cryptonector.com>; wrote:

><snip>
>> This document describes a means of encoding and parsing arbitrarily
>>large
>> sequences of JSON texts. The document claims no new security
>> considerations beyond those in RFC 7159.  I have some concern that the
>
>Though it turns out that RFC7159's security considerations section was
>probably too brief.  Anyways.

Seems a good place to augment since you have noticed shortcomings.

>
>> addition of a LF by the JSON text sequence encoder that is not removed
>>by
>> the JSON text sequence parser has the potential to break detached JSON
>>web
>> signatures that cover JSON text sequence elements.  A few general
>>comments
>> and questions are below:
>
>That's a good point.  Signing individual JSON text sequence elements is
>outside the scope of this document.  Either a complete sequence is
>signed, or individual texts are signed with a wrapper.  I wouldn't
>recommend transferring a sequence of texts and a parallel sequence of
>signatures, say.

It’s probably worth stating up front the process of encoding a JSON text
sequence alters the JSON text sequence elements and that signing
individual elements is outside the scope.

><snip>
>> - In section 2.1, shouldn’t possible-JSON be defined as *(not-RS) to
>>allow
>> for parsing of <RS><RS><RS>?
>
>The preceding 1*RS in JSON-sequence takes care of that, although that
>assumes greedy ABNF matching (RFC5234 does not speak to greediness).

OK.

>
>> - Section 2.3 should clarify that malformed JSON text sequences are also
>> not fatal (I.e., arriving at an RS without having seen a LF).
>
>That's covered in detail in section 2.4.  I wouldn't want to replicate
>any part of 2.4 in 2.3.

I don’t recall seeing any mention of malformed JSON text sequences. I
think you either get to a next <RS> or end of the stream.

>
>> - The second paragraph of Section 2.3 suggests that an incremental JSON
>> parser may be used across content from multiple sequence elements.  The
>> ABNF for encoders does not seem to support this.
>
>Are you thinking of a a sequence like <RS> <JSON-text-w/o-trailing-LF>
><JSON-text> ... being parsed as two JSON texts?

I was thinking more like <RS> <some fraction of JSON possibly a single
character>.

>
>Or are you thinking of a <RS> <JSON-text-containing-RS> <LF>?
>
>The latter must yield a parse error from the JSON text parser.
>
>As to the former... I think I'd thought about it and left it as-is,
>though I can't recall why, with the security considerations text about
>smuggling being sufficient (except it's not quite).
>
>How about this new text:
>
>   After reading and parsing a JSON text, the sequence parser MUST skip
>   to the RS byte following the JSON text (or terminate if at end of
>   input).

I think this text would be a nice addition. I noticed the lack of calling
out "read until end if no <RS>" but it seemed implied.

>
>And in section 3:
>
>   JSON text sequences like <RS> <JSON-text> <JSON-text> <LF> are
>   sequences of one text, not two.  JSON text sequence parser
>   implementations may emit the first JSON text, but should detect and
>   log such sequences.
>
>> - The last sentence of the second paragraph in Section 2.4 is a bit
>> confusing. If a JSON-text has no trailing ws and the LF from the JSON
>>text
>> sequence is used to match the ws rule, should this be reported as a
>> malformed JSON text sequence, a malformed JSON-text or neither?
>
>Neither.  The LF doubles as the ws that makes the number/boolean/null
>parse correctly, and as the LF needed for the JSON text sequence.

Is this doubling simply trying to save a byte or something else?

In any case, the JSON text sequence parser cannot know whether the LF is
serving as the ws for the entire number that was to have been encoded or a
truncated value. It seems better for the JSON text sequence encoder to
encode what it gets as a sequence, the JSON text sequence parser to remove
what the encoder added with the JSON parser left to figure out what to do
with the “possible JSON” and/or error indicating a sequence parsing error.
 

Reporting possible truncation would be just as easy when finding a
<RS>123<LF><RS> instead of <RS>123<ws><LF><RS>. If the only purpose of the
<LF> is to attempt to serve in place of missing <ws>, simply parsing from
<RS> to <RS> seems fine. Given a missing <ws> is not the only form of
possible truncation, it doesn’t buy much to add the <LF>.

>
>This is also true for string/array/object, but since those are
>self-delimited the presence of the LF is less critical.
>
>Section 2.4 is really
>
>a) justification for the LF,
>
>b) a warning to implementors not to include a bug where permissive
>   parsing (not insisting on the trailing LF) results in very incorrect
>   results when the top-level value type is a number (or boolean, or
>   null).

As noted above, the warning about possibly truncated JSON text is
necessary, not sure about a).

><snip>
>> - In section 2.4, assuming the parser is a JSON text sequence parser,
>>why
>> is the MUST drop in the third paragraph necessary given potential use of
>> incremental parsers mentioned in section 2.3 and later in 2.4? How do
>>you
>> distinguish between a “non-self-delimited top-level value” and a piece
>>of
>> JSON-text being parsed incrementally?
>
>See above.  In my implementation I check that the last byte consumed by
>the [incremental] JSON text parser was a whitespace character.
>
>> - The examples for possibly truncated JSON texts seem to assume that the
>> LF appended by a JSON text sequence encoder is absent. How would a
>> truncated JSON text within a properly delimited JSON text sequence be
>> detected? [...]
>
>Well, if the LF is present then it's not truncated.  E.g., the number in
><RS>1<RS> may have been truncated, but the number in <RS>123<LF> cannot
>have been.

I think this is where we differ. How can the JSON text sequence decoder
know whether or not 123 was the complete value that should have been
encoded in the sequence? The component that generated 123 may have been
intended to pass 1234<ws> before it failed. In this case, the JSON text
sequence encoder is not doing a favor by supplying the <LF> as the <ws>.
I would prefer for the JSON text sequence encoder to not alter what it is
encoding by lending the <LF> to the element.

>
>>     [...] For example, if a component feeding JSON text elements to a
>>JSON
>> text sequence encoder failed where the JSON text sequence encoder
>> succeeds, the result would be a truncated JSON text within a valid JSON
>> text sequence. I realize the ABNF should preclude encoding invalid JSON
>> texts but there is no text that instructs encoders to fail when invalid
>> JSON texts are detected.
>
>Yes, a JSON text sequence encoder should also encode the texts
>themselves.  When it doesn't the result may be garbage, but the JSON
>text sequence parser can cope with such garbage.

This gets at my question about the ABNF not supporting incremental
encoding. An incremental snip may not parse (for any number of reasons) so
the sequence encoder has a bit of state to maintain. In any case, calling
out this “should also encode" for encoders would be a helpful addition.
This probably causes some interop problems worth mentioning where one
encoder just encodes what it is handed without re-encoding, possibly in
support of incremental encoding, while another (subsequently used) encoder
does try to parse each element.

>
>> - The failure to address the LF in the parser section seems like it
>>could
>> cause problems for re-encoding.  What would prevent accumulation of LF
>> characters as a JSON text sequence was encoded, parsed, re-encoded,
>> re-parsed and re-encoded?
>
>There's no requirement that parsing and re-encoding be the identity
>function.  There's no such requirement for JSON texts, therefore there
>can't be for JSON text sequences.
>
>This comes up a lot for JSON implementations.  Users often want the
>result of parsing and re-encoding to leave numbers as-is, object [key]
>names in the same order as in the input, indentation/whitespace the
>same, and so on.  The answer is generally "sorry, that's ETOOHARD".
>
>(There are implementations that preserve object name/value insertion/
>parse order.  But keeping track of encoded number form along with parsed
>numbers can be a headache.  Besides, implementations often lose number
>precision/range.)

OK, though as noted above I still don’t see the need for adding the <LF>
in the encoder without removing it in the corresponding parser.