Re: [secdir] secdir review of draft-ietf-json-text-sequence-11

Nico Williams <nico@cryptonector.com> Tue, 16 December 2014 17:48 UTC

Date: Tue, 16 Dec 2014 11:48:34 -0600
From: Nico Williams <nico@cryptonector.com>
To: Carl Wallace <carl@redhoundsoftware.com>
Message-ID: <20141216174829.GZ3241@localhost>
References: <D0B1EECD.29290%carl@redhoundsoftware.com> <20141216000109.GP3241@localhost> <D0B587AB.2948E%carl@redhoundsoftware.com> <20141216163238.GT3241@localhost> <D0B5C964.2954A%carl@redhoundsoftware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <D0B5C964.2954A%carl@redhoundsoftware.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Content-Transfer-Encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/secdir/8hzjqa5x66_3uhgQbuicK8GRTK4
Cc: draft-ietf-json-text-sequence@tools.ietf.org, iesg@ietf.org, secdir@ietf.org
Subject: Re: [secdir] secdir review of draft-ietf-json-text-sequence-11

On Tue, Dec 16, 2014 at 12:20:08PM -0500, Carl Wallace wrote:
> On 12/16/14, 11:32 AM, "Nico Williams" <nico@cryptonector.com> wrote:
> >OK, that will be section 3 (security considerations text), something
> >like:
> >
> >   Parsing and re-encoding a JSON text sequence need not produce the
> >   same sequence of octets.  Do not rely on being able to reproduce the
> >   same inputs to a cryptographic integrity protection function.
> 
> If supporting signing is not important to anyone, OK. This seems like a
> significant sacrifice, especially when positioned against the benefit of
> adding but not removing <LF>s.

Supporting validation of signed sequences by first re-encoding the
sequence is absolutely not a goal.

This is almost a dictum for any encoding of any message.

We found out long ago that it doesn't work when the encoding does not
have a canonical form (and even when it does but it's just not used by
the signer).

> >Section 2.3 actually says "malformed" in its first sentence.  It
> >mentions truncation only as an example of why a JSON text might be
> >malformed, in the second sentence.
> 
> I am making a distinction between failure to parse a JSON text and failure
> to parse a JSON text sequence. I think the text only addresses the former.

The whole section is about JSON text parse errors not being fatal for
sequence parsing.  I don't understand the objection.  Perhaps if you
propose text I will?

> >><snip>
> >> [extensive discussion of the LF elided]
> 
> How can a decoder know that <RS>123<LF><RS> was what the originator
> intended and not something that was terminated by the text sequence
> encoder? The originator may have intended <RS>1234<ws><LF><RS>. There
> seems to be some assumption that the supplier of JSON text may fail to
> self-delimit but would not fail to supply the full value. It’s a contrived
> example, but how should an incremental JSON parser handle texts returned
> from a parser operating on the sequence: <RS>123<LF><RS>4<ws><LF><RS>?
> Would it be two values 123 and 4 or one value 1234? Why would it not be
> preferable to report an error here <RS>123<LF><RS> instead of trying to
> auto-terminate it when encoding the sequence?

The assumption is that the "process" writing the sequence will properly
encode the sequence elements, and will write the <RS><element><LF>
sequence correctly.  There is no assumption about atomic completion of
the write.  There is an assumption that incomplete writes will be
truncated from some arbitrary point in that byte sequence to the end of
it (that is, bytes will not be dropped from the middle or beginning).

The concerns about truncation spring from limitations of POSIX write
semantics, particularly O_APPEND writes.  Applications may have to use
writev() or else marshal the <RS><element><LF> into a buffer prior to
calling write().  These details are out of scope.
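
As an illustration of the buffer approach just described, here is a
minimal sketch in Python.  The function name append_element and the
file mode are hypothetical, not taken from the draft.  The sketch
builds the whole <RS><element><LF> record in memory and hands it to a
single write() on an O_APPEND descriptor, so that a failure can only
truncate the tail of the record:

```python
import json
import os

RS = b"\x1e"  # ASCII Record Separator
LF = b"\n"


def append_element(path, value):
    """Append one <RS><element><LF> record using a single write() call.

    Hypothetical helper for illustration only.
    """
    # Marshal the complete record into one buffer first.
    record = RS + json.dumps(value).encode("utf-8") + LF
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One write() of the whole record.  A short write would leave a
        # record truncated at its end, never one missing its beginning.
        written = os.write(fd, record)
    finally:
        os.close(fd)
    return written == len(record)
```

A caller that gets False back knows the record was cut short; a reader
of the file would then see an element without its trailing LF.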

Applications will have to synchronize if they use write() to write the
RS, then an incremental JSON text encoder that may call write() multiple
times, then again a write() to write the LF.  This too is out of scope.
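
The synchronization in question could look something like the sketch
below.  It is an assumption-laden illustration: the class name
SequenceWriter is made up, the lock only covers threads within one
process, and json.JSONEncoder.iterencode stands in for "an incremental
JSON text encoder".  Writers in separate processes would need some
other mechanism.

```python
import io
import json
import threading

RS = b"\x1e"  # ASCII Record Separator
LF = b"\n"


class SequenceWriter:
    """Hypothetical writer that serializes multi-write() elements."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write_element(self, value):
        encoder = json.JSONEncoder()
        # Hold the lock across RS, every encoder chunk, and LF, so that
        # no other thread's output lands inside this element.
        with self._lock:
            self._stream.write(RS)
            for chunk in encoder.iterencode(value):
                self._stream.write(chunk.encode("utf-8"))
            self._stream.write(LF)
```

Without the lock, two threads could interleave their chunks and
produce elements that fail to parse.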

A paragraph of text about these assumptions may be warranted, but I
really don't want to have any references to POSIX and so on.  I think
the need for a modicum of atomicity (that which POSIX write semantics
provide) should be evident to implementors.

> >> OK, though as noted above I still don’t see the need for adding the <LF>
> >> in the encoder without removing it in the corresponding parser.
> >
> >There is no need to remove it in the sequence parser, though the
> >sequence parser may do it.  The sequence parser need only reject
> >top-level number/true/false/null values whose text did not end in a
> >whitespace.  A sequence parser could do this by insisting that the
> >sequence element end in an LF, and it can remove the trailing LF as well
> >as leaving it in, as the trailing LF's presence (or absence) does not
> >affect the validity of the JSON text to be parsed.
> 
> Is this universally true where JSON text is incrementally packaged into a
> text sequence?

See above.
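
To make the rule quoted above concrete (reject top-level
number/true/false/null values whose text did not end in whitespace,
and do not treat a bad element as fatal for the sequence), here is a
rough Python sketch.  The function name parse_sequence and the
(ok, value) result shape are hypothetical, and Python's json.loads
stands in for a non-incremental JSON text parser:

```python
import json

RS = "\x1e"          # ASCII Record Separator
JSON_WS = " \t\n\r"  # the four JSON whitespace characters


def parse_sequence(text):
    """Yield (ok, value_or_raw_text) for each sequence element."""
    for raw in text.split(RS):
        if raw == "":
            continue  # nothing between two RS, or before the first RS
        try:
            value = json.loads(raw)
        except ValueError:
            # Malformed (for example truncated) element: report it and
            # keep going with the rest of the sequence.
            yield (False, raw)
            continue
        # Strings, arrays and objects carry their own closing delimiter.
        self_delimiting = isinstance(value, (str, list, dict))
        if not self_delimiting and raw[-1] not in JSON_WS:
            # A number/true/false/null with no trailing whitespace may
            # have been cut short, so it is rejected.
            yield (False, raw)
            continue
        yield (True, value)
```

With this sketch, <RS>123<LF><RS>4<ws><LF><RS> comes out as the two
values 123 and 4, while a final <RS>12 with nothing after it is
reported as a bad element rather than as the number 12.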

> >
> >Adding the LF in the sequence encoder does not hurt, since all JSON
> >texts can end with arbitrary amounts of ws.  It merely helps delimit
> >sequence elements, both to ensure that top-level numbers/true/false/null
> >are delimited, and to help keep lines shorter for users using $PAGER and
> >$EDITOR to view JSON text sequences.
> >
> >Delimiting otherwise non-self-delimiting texts is an important
> >function.
> 
> I guess we just disagree on whether the text sequence encoder is
> necessarily in a position to terminate data that may be incrementally
> supplied or incompletely supplied by a caller and whether or not this
> important function should be allocated to the caller instead of to the
> JSON text sequence encoder.  One alternative would be to add a <ws> only

Of course a properly functioning encoder on a properly functioning system
is in a position to terminate each element.  How can this be in doubt?

A sequence encoder might write() RS, then invoke an incremental JSON
text encoder to encode and write() the JSON text, then finally when the
JSON text encoder completes its task, the sequence encoder write()s the
LF.

The sequence encoder can also marshal the whole thing into a buffer and
write() that.

Part of the point of JSON text sequences is to permit online processing
of large data sets without requiring a streaming JSON text parser (which
is not the same as an incremental parser).  That point also applies to
encoding: if the sequence elements are small enough to fit into memory
and be parsed non-incrementally, then they also necessarily meet similar
constraints on the encoder side.

The only way to screw this up is to break the process of writing a
sequence down into a non-atomic sequence of operations that race in a
multi-process/threaded writer.  The details of that problem and how to
avoid it are clearly out of scope for this document.

> when a non-self-delimited text is passed to a non-incremental encoder,
> then encode that altered value into the sequence (and terminate all
> sequence elements with <RS> only).

The RS/LF bracketing was the result of lengthy discussions on the JSON
WG list.  The RS/LF bracketing design should not be reconsidered at this
time unless you have a security concern that cannot be addressed
otherwise.  I reject the concern about validation of signatures via the
use of re-encoded sequences (see above and earlier) -- it is commonly
accepted and strongly recommended practice that signatures should be
validated over what is signed, then and only then (after the signature
is validated successfully) should the payload be parsed.  If you have
any other security concerns relating to the LF, let's hear them.
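
The validate-then-parse order can be sketched as follows.  This is an
illustration under stated assumptions: HMAC-SHA256 from the Python
standard library stands in for whatever integrity protection function
is actually used (it is a MAC, not a signature), and the function name
verify_then_parse is hypothetical.  The point is only that the check
runs over the octets as received, never over a re-encoding:

```python
import hashlib
import hmac
import json

RS = b"\x1e"  # ASCII Record Separator


def verify_then_parse(received, tag, key):
    """Check integrity over the received octets, then parse them."""
    expected = hmac.new(key, received, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("integrity check failed")
    # Only after the check succeeds is the payload parsed.
    elements = []
    for raw in received.split(RS):
        if raw.strip():  # skip empty elements
            elements.append(json.loads(raw.decode("utf-8")))
    return elements
```

Parsing the result and encoding it again may change whitespace or
member formatting, so a tag computed over the original octets would
not be expected to match the re-encoded form.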

Nico
--