Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04

Martin Thomson <martin.thomson@gmail.com> Tue, 30 July 2013 07:07 UTC

In-Reply-To: <CABkgnnXtCBHnOpY_=t7yWD-+7rSFHKdUi0VGUSVJqXq+xV-G2g@mail.gmail.com>
References: <CABkgnnXtCBHnOpY_=t7yWD-+7rSFHKdUi0VGUSVJqXq+xV-G2g@mail.gmail.com>
Date: Tue, 30 Jul 2013 00:07:40 -0700
Message-ID: <CABkgnnV7BG7Be8kCBPsyCqM0AudwSPR2ok1g=qpB0pJqC0Obtg@mail.gmail.com>
From: Martin Thomson <martin.thomson@gmail.com>
To: IETF Apps Discuss <apps-discuss@ietf.org>, draft-bormann-cbor.all@tools.ietf.org
Content-Type: text/plain; charset="UTF-8"
Cc: "gen-art@ietf.org" <gen-art@ietf.org>
Subject: Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04

Adding draft-relevant recipients.

On 30 July 2013 00:05, Martin Thomson <martin.thomson@gmail.com> wrote:
> I am the assigned Gen-ART reviewer for this draft. For background on
> Gen-ART, please see the FAQ at
>
> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>
> Please resolve these comments along with any other Last Call comments
> you may receive.
>
> Document: draft-bormann-cbor-04
> Reviewer: Martin Thomson
> Review Date: 2013-07-29
> IETF LC End Date: ?
> IESG Telechat date: 2013-08-15
>
> Summary:
> This document is not ready for publication as a Proposed Standard.
>
> I'm glad that I held this review until Paul's appsarea presentation.
> This made it very clear to me that the types of concerns I have are
> considered basically irrelevant by the authors because they aren't
> interested in changing the design goals.  I don't find the specific
> design goals to be interesting and am of the opinion that the goals
> are significant as a matter of general application.  I hope that is
> clear from my review.
>
> Independent of any conclusions regarding design goals, there are
> issues that need addressing.
>
> (This is an atypical Gen-ART review.  I make no apologies for that.  I
> didn't intend to write a review like this when I started, but feel
> that it's important to commit these thoughts to the record.  It's also
> somewhat long, sorry.  I tried to edit it down.)
>
> I have reviewed the mailing list feedback, and it's not clear to me
> that there is consensus to publish this.  It might be that the dissent
> that I have observed is not significant in Barry's learned judgment,
> or that this is merely dissent on design goals and therefore
> irrelevant.  The fact that this work isn't a product of a working
> group still concerns me.  I'm actually interested in why this is
> AD-sponsored rather than a working group product.
>
> Major issues:
> My major concerns with this document might be viewed as disagreements
> with particular design choices.  And, I consider it likely that the
> authors will conclude that the document is still worth publishing as
> is, or perhaps with some minor changes.  In the end, I have no issue
> with that, but expect that the end result will be that the resulting
> RFC is ignored.
>
> What would make this tragic is if publication of this were
> used to prevent other work in this area from subsequently being
> published.  (For those drawing less-than-charitable inferences from
> this, I have no desire to throw my hat into this particular ring,
> except perhaps in jest [1].)
>
> This design is far too complex and large.  Regardless of how
> well-considered it might be, or how well this meets the stated design
> goals, I can't see anything but failure in this document's future.
> JSON succeeds largely because it doesn't attempt to address so many
> needs at once, but I could even make a case for why JSON contains too
> many features.
>
> In comparison with JSON, this document does one major thing wrong: it
> has more options than JSON in several dimensions.  There are more
> types, and there are several more dimensions for extensibility than
> JSON: type extensions, value extensions (values of 28-30 in the lower
> bits of the type byte), plus the ability to apply arbitrary tags to any
> value.  I believe all of these to be major problems that will cause
> them to be ignored, poorly implemented, and therefore useless.
>
> In part, this complexity produces implementations that are far more
> complex than they might need to be, unless additional standardization
> is undertaken.  That idea is something I'm uncomfortable with.
>
> Design issue: extensibility:
> This document avoids discussion of issues regarding schema-less
> document formats that I believe to be fundamental.  These issues are
> critical when considering the creation of a new interchange format.
> By choosing this specific design it makes a number of trade-offs that
> in my opinion are ill-chosen.  This may be in part because the
> document is unclear about how applications intend to use the documents
> it describes.
>
> You may conclude after reading this review that this is simply because
> the document does not explain the rationale for selecting the approach
> it takes.  I hope that isn't the conclusion you reach, but appreciate
> the reasons why you might do so.
>
> I believe the fundamental problem to be one that arises from a
> misunderstanding about what it means to have no schema.  Aside from
> formats that require detailed contextual knowledge to interpret, there
> are several steps toward the impossible, platonic ideal of a perfectly
> self-describing format.  It's impossible because ultimately the entity
> that consumes the data is required at some level to understand the
> semantics that are being conveyed.  In practice, no generic format can
> effectively self-describe to the level of semantics.
>
> This draft describes a format that is more capable at self-description
> than JSON.  I believe that to be not just unnecessary, but
> counterproductive.  At best, it might provide implementations with a
> way to avoid an occasional extra line of code for type conversion.
>
> Extensibility as it relates to types:
> The use of extensive typing in CBOR implies an assumption of a major
> role for generic processing.  XML schema and XQuery demonstrate that
> this desire is not new, but they also demonstrate the folly of
> pursuing those goals.
>
> JSON relies on a single mechanism for extensibility. JSON maps that
> contain unknown or unsupported keys are (usually) ignored.  This
> allows new values to be added to documents without destroying the
> ability of an old processor to extract the values that it supports.
> The limited type information JSON carries leaks out, but it's unclear
> what value this has to a generic processor.  All of the generic uses
> I've seen merely carry that type information; no specific use is made
> of the knowledge it provides.
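
A minimal sketch of the "ignore unknown keys" behaviour described above. The key names and the `KNOWN_KEYS` set are hypothetical and chosen only for illustration; the sketch assumes the top-level JSON value is an object.

```python
import json

# Hypothetical set of keys that an "old" processor understands.
KNOWN_KEYS = {"name", "size"}


def extract_known(document: str) -> dict:
    """Parse a JSON object and keep only the keys this processor supports.

    Unknown keys are silently dropped, so a newer producer can add
    members without breaking this consumer.
    """
    data = json.loads(document)
    return {key: value for key, value in data.items() if key in KNOWN_KEYS}


# A newer producer adds "colour"; the old processor still gets what it needs.
newer_document = '{"name": "widget", "size": 3, "colour": "red"}'
print(extract_known(newer_document))
```

Note that the processor never inspects the type of the value it skips; it only needs to know where the value ends.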
>
> ASN.1 extensibility, as encoded in PER, leads to no type information
> leaking.  Unsupported extensions are skipped based on a length field.
>
> (As an aside, PER is omitted from the analysis in the appendix, which
> I note from the mailing lists is due to its dependency on schema.
> Interestingly, I believe it to be possible - though not trivial - to
> create an ASN.1 description with all the properties described in CBOR
> that would have roughly equivalent, if not fully equivalent,
> properties to CBOR when serialized.)
>
> By defining an extensibility scheme for types, CBOR effectively
> acknowledges that a generic processor doesn't need type information
> (just delineation information), but it then creates an extensive type
> system.  That seems wasteful.
>
> Design issue: types:
> The addition of the ability to carry uninterpreted binary data is a
> valuable and important feature.  If that was all this document did,
> then that might have been enough.  But instead it adds numerous
> different types.
>
> I can understand why multiple integer encoding sizes are desirable,
> and maybe even floating point representations, but this describes
> bignums in both base 2 and 10, embedded CBOR documents in three forms,
> URIs, base64 encoded strings, regexes, MIME bodies, date and times in
> two different forms, and potentially more.
>
> I also challenge the assertion that the code required for parsing a
> data type is larger if it lives outside of a common shared library.
> That may well be provably true, but last time I checked a few extra
> procedure calls (or equivalent) weren't the issue for code size.  The
> sheer number of options, on the other hand, might be.
>
> Half-precision floating point numbers are a good example of excessive
> exuberance.  They are not available in many languages for good reason:
> they aren't good for much.  They actually tend to cause errors in
> software in the same way that threading libraries do: it's not that
> it's hard to use them, it's that it's harder than people think.  And
> requiring that implementations parse these creates unnecessary
> complexity.  I believe that, for the very small subset of cases
> where half precision is actually useful, the cost of transmitting the
> extra 2 bytes of a single-precision number is not going to be a
> burden.  However, the cost of carrying the code required to decode
> them is not as trivial as this makes out.  The fact that this requires
> an appendix would seem to indicate that this is special enough that
> inclusion should have been very carefully considered.  To be honest,
> if it were my choice, I would have excluded single-precision floating
> point numbers as well; they too create more trouble than they are
> worth.
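
To give a sense of what decoding a half-precision value involves on a platform without native support, here is a rough sketch of a decoder for the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits, big-endian). It is not the code from the draft's appendix; the function name `decode_half` is an assumption made for this example.

```python
import struct


def decode_half(raw: bytes) -> float:
    """Decode two big-endian bytes holding an IEEE 754 binary16 value."""
    half = (raw[0] << 8) | raw[1]
    sign = -1.0 if half & 0x8000 else 1.0
    exponent = (half >> 10) & 0x1F
    fraction = half & 0x3FF
    if exponent == 0:
        # Zero and subnormal numbers: no implicit leading 1 bit.
        magnitude = fraction * 2.0 ** -24
    elif exponent == 31:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        magnitude = float("inf") if fraction == 0 else float("nan")
    else:
        # Normal numbers: implicit leading 1 bit (1024), exponent bias 15.
        magnitude = (fraction + 1024) * 2.0 ** (exponent - 25)
    return sign * magnitude


# Cross-check against Python's built-in half-precision format code "e".
for sample in (b"\x3c\x00", b"\xc0\x00", b"\x7b\xff", b"\x00\x01"):
    assert decode_half(sample) == struct.unpack(">e", sample)[0]
```

The three branches (subnormal, infinity/NaN, normal) are what every decoder has to carry, whether or not the application ever sends a half-precision value.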
>
> Design issue: optionality
> CBOR embraces the idea that support for types is optional.  Given the
> extensive nature of the type system, it's almost certain that
> implementations will choose to avoid implementation of some subset of
> the types.  The document makes no statements about what types are
> mandatory for implementations, so I'm not sure how it is possible to
> provide interoperable implementations.
>
> If published in its current form, I predict that only a small subset
> of types will be implemented and become interoperable.
>
> Design issue: tagging
> The tagging feature has a wonderful property: the ability to create
> emergent complexity.  Given that a tag itself can be arbitrarily
> complex, I'm almost certain that this is a feature you do not want.
>
> Minor issues:
> Design issue: negative numbers
> Obviously, the authors will be well-prepared for arguments that
> describe as silly the separation of integer types into distinct
> positive and negative types.  But it's true, this is a strange choice,
> and a very strange design.
>
> The fact that this format is capable of describing 64-bit negative
> numbers creates a problem for implementations that I'm surprised
> hasn't been raised already.  In most languages I use, there is no
> native type that is capable of carrying the most negative value that
> can be expressed in this format.  -2^64 has twice the magnitude of the
> most negative value (-2^63) that a 64-bit two's-complement integer can
> store.
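
The arithmetic behind this can be sketched as follows, assuming the rule that a negative integer is carried as an unsigned argument n and represents the value -1 - n, with n up to 2^64 - 1. The names `decode_negative` and `fits_in_int64` are made up for this example.

```python
INT64_MIN = -(2 ** 63)      # most negative 64-bit two's-complement value
INT64_MAX = 2 ** 63 - 1     # most positive 64-bit two's-complement value
MAX_ARGUMENT = 2 ** 64 - 1  # largest unsigned 64-bit argument


def decode_negative(argument: int) -> int:
    """Map the unsigned argument n of a negative integer to -1 - n."""
    return -1 - argument


def fits_in_int64(value: int) -> bool:
    """Report whether a value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


most_negative = decode_negative(MAX_ARGUMENT)
assert most_negative == -(2 ** 64)
assert most_negative == 2 * INT64_MIN      # twice the magnitude of -2^63
assert not fits_in_int64(most_negative)    # needs a wider type or a bignum
assert fits_in_int64(decode_negative(2 ** 63 - 1))  # exactly INT64_MIN
```

Python integers are arbitrary precision, so the overflow does not occur here; in a language with a fixed 64-bit signed type, any argument of 2^63 or more would need special handling.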
>
> It almost looks as though CBOR is defining a 65-bit, 33-bit or 17-bit
> two's-complement integer format, with the most significant bit isolated
> from the others, except that the negative expression doesn't even have
> the good sense to be properly sortable.  Given that and the fact that
> bignums are also defined, I find this choice to be baffling.
>
> Document issue: Canonicalization
> Please remove Section 3.6.  c14n is hard, and this format actually
> makes it impossible to standardize a c14n scheme, which says a lot
> about it.  In comparison, JSON is almost trivial to canonicalize.
>
> If the intent of this section is to describe some of the possible
> gotchas, such as those described in the last paragraph, then that
> would be good.  Changing the focus to "Canonicalization
> Considerations" might help.
>
> I believe that there are several issues that this section would still
> need to consider.  For instance, the types that contain additional
> JSON encoding hints carry additional semantics that might not be
> significant to the application protocol.
>
> Extension based on minor values 28-30 (the "additional information" space):
> ...is impossible as defined.  Section 5.1 seems to imply otherwise.
> I'm not sure how that would ever happen without breaking existing
> parsers.  Section 5.2 actually makes this worse by making a
> wishy-washy commitment to size for 28 and 29, but no commitment at all
> for 30.
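
A sketch of why an existing parser cannot cope with these values. It assumes the initial-byte layout later published in RFC 7049 (major type in the top 3 bits, "additional information" in the low 5 bits, values 24 to 27 meaning 1, 2, 4 or 8 following bytes, 31 meaning indefinite length); draft-04 may differ in detail, and the function names are hypothetical.

```python
def split_initial_byte(initial: int):
    """Return (major type, additional information) for an initial byte."""
    return initial >> 5, initial & 0x1F


def argument_size(additional_info: int) -> int:
    """Return how many bytes follow the initial byte to hold the argument."""
    if additional_info < 24:
        return 0  # the argument is the additional information itself
    if additional_info <= 27:
        return 1 << (additional_info - 24)  # 1, 2, 4 or 8 bytes
    if additional_info == 31:
        return 0  # indefinite-length marker, no argument bytes
    # 28-30: a parser written today has no way to know how much to skip.
    raise ValueError(f"unassigned additional information {additional_info}")


major, info = split_initial_byte(0x3B)  # negative integer, 8-byte argument
print(major, info, argument_size(info))
```

Without a committed size for 28-30, a deployed decoder can only reject the item, because it cannot find where the next one starts.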
>
> Nits:
> Section 3.7 uses the terms "well-formed" and "valid" in a sense that I
> believe to be consistent with their use in XML and XML Schema.  I
> found the definition of "valid" to be a little difficult to parse;
> specifically, it's not clear whether invalid is the logical inverse of
> valid.
>
> Appendix B/Table 4 has a TBD on it.  Can this be checked?
>
> Table 4 keeps getting forward references, but it's hidden in an
> appendix.  I found that frustrating as a reader because the forward
> references imply that there is something important there.  And that
> implication was completely right; this needs promotion.  I know why
> it's hidden, but that reason just supports my earlier theses.
>
> Section 5.1 says "An IANA registry is appropriate here.".  Why not
> reference Section 7.1?
>
> [1] https://github.com/martinthomson/aweson