Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04

Carsten Bormann <cabo@tzi.org> Mon, 12 August 2013 20:37 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <A723FC6ECC552A4D8C8249D9E07425A71418233A@xmb-rcd-x10.cisco.com>
Date: Mon, 12 Aug 2013 22:37:28 +0200
Message-Id: <2B377A8B-C954-4DEE-B19C-506CE8B295A2@tzi.org>
References: <A723FC6ECC552A4D8C8249D9E07425A71418233A@xmb-rcd-x10.cisco.com>
To: Joe Hildebrand <jhildebr@cisco.com>
Cc: "draft-bormann-cbor-04.all@tools.ietf.org" <draft-bormann-cbor-04.all@tools.ietf.org>, "gen-art@ietf.org" <gen-art@ietf.org>, IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04

On Aug 5, 2013, at 19:43, Joe Hildebrand (jhildebr) <jhildebr@cisco.com> wrote:

> Sorry, my response is also correspondingly long.  There are some original
> comments at the end 

[...]

We have worked a bit on the detailed remarks in your review.
(The previous message addressed the grander aspects.)
Item-by-item replies below:

        Other things that ought to be discussed:

        I would like to see another design goal: "May be implemented in modern web
        browsers".  That should be possible with the new binary types.

Indeed, implementability on a wide set of platforms is an important
goal, somewhat implicit in our objective 2.  I believe the design goal
was achieved.

        I still don't see the need for non-string map keys.  JSON mapping would be
        easier without them, as would uniqueness checking.  If they are to be
        retained, they should have some motivation in the spec, describing how and
        why they might be used.

I'm not sure they particularly need to be motivated in the spec. However, we will
add something like "For example, using numbers for keys is useful in
cases where the values are numeric and numeric ordering of keys is important."

Further, data formats have been using complex map keys for a long time;
cf. YAML, for example.

As an example, this CBOR data item (in diagnostic notation)

{ {"country": "DE", "license": "HB-MH 765"}: {"make": "Ford", "km": 192735.2},
  {"country": "FI", "license": "601 ICE"}: {"make": "BMW", "km": 68923.1} }

might be used to describe a map of cars, indexed by their license
plate, where the license plate uses a country identifier and the plate
text as its two components.

In JSON, I would be forced to munge together the unrelated country and
license strings using some custom delimiter syntax (say,
"DE/HB-MH 765"); in CBOR (or YAML), I can keep them nice and semantic.

In YAML:

---
? country: DE
  license: HB-MH 765
: make: Ford
  km: 192735.2
? country: FI
  license: 601 ICE
: make: BMW
  km: 68923.1

As always, protocols based on CBOR do not have to use this
functionality; in section 3.4 we encourage using strings, or integers,
as map keys (and »keys should be of a single CBOR type«).

        I wish I could think of another good simple value that we might register
        one day.  The only one I've come up with is "no-op", which I might use in
        a streaming application as a trivial keep-alive or a marker between
        records to ensure parser state sync.  I wouldn't classify that as "good"
        however.

This is indeed the less likely avenue of extension of the format.
However, as an example, we would have used simple values for the
Infinities and NaN if the IEEE 754 formats didn't already provide a
good representation.

        Nested tags ought to be forbidden until we come up with a strong use case
        for them.

The tag 55799 ("self-describing CBOR", which allows a CBOR data item
to start with 0xd9d9f7 for file type disambiguation) in -05 is a pretty
strong use case; you should be able to use it independently of whether
the data item is top-level tagged or not.
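
A minimal sketch in C of how a reader might recognize that prefix.
The helper name cbor_skip_self_describe is hypothetical (it is not
from the draft or from any library); it only strips the three bytes
and leaves whatever follows, including a further tag, to the regular
decoder:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 3 if buf starts with the encoding of
 * tag 55799 (0xd9 0xd9 0xf7), otherwise 0. The return value is the
 * offset at which the enclosed data item begins. */
size_t cbor_skip_self_describe(const uint8_t *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  if (len >= 3 && buf[0] == 0xd9 && buf[1] == 0xd9 && buf[2] == 0xf7)
    return 3;
  return 0;
}
```

For example, for the bytes d9 d9 f7 c2 41 01 the helper returns 3, and
decoding continues at c2 41 01, a data item that carries its own tag 2.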

        I could use some implementation guidance on how to generate the most
        compact floating-point type for a given number, assuming we keep all of
        the floating-point types.

I hope to have my C code for this on GitHub soon...
I did it like this:

void cbor_encoder_write_double(double v)
{
  float fv = v;
  if (fv == v) {  // i.e., the 32-bit float value doesn't lose data
    // ... do the same thing for half; encode as half or single
  } else {
    // ... encode as double
  }
}

If your platform doesn't give you the necessary floating point types,
but you do have bitwise access (possibly coded yourself) to the
floating point representation, which you need anyway to send binary
doubles, this is a bit more work.  I haven't done that yet (it will
probably look a bit like the Python code in Appendix D).
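
As an illustration only (this is not the code referred to above), the
double/single/half decision could be sketched as follows.  It assumes
that float and double are IEEE 754 binary32 and binary64; the function
names are hypothetical, and mapping every NaN to the single half
pattern 0x7e00 (dropping any payload) is a choice made for this
sketch:

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 and stores the half-precision bit
 * pattern if f is exactly representable as a half, else returns 0. */
static int float_to_half_exact(float f, uint16_t *half)
{
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);          /* assumes IEEE 754 binary32 */
  uint32_t sign = (bits >> 16) & 0x8000;
  uint32_t exp = (bits >> 23) & 0xff;
  uint32_t mant = bits & 0x7fffff;

  if (exp == 0 && mant == 0) {             /* +/- zero */
    *half = (uint16_t)sign;
  } else if (exp == 0xff && mant == 0) {   /* +/- infinity */
    *half = (uint16_t)(sign | 0x7c00);
  } else if (exp >= 113 && exp <= 142 && (mant & 0x1fff) == 0) {
    /* normal half: rebias exponent (127 -> 15), keep top 10 mantissa bits */
    *half = (uint16_t)(sign | ((exp - 112) << 10) | (mant >> 13));
  } else if (exp >= 103 && exp <= 112) {
    /* candidate subnormal half: must be a multiple of 2^-24 */
    uint32_t sig = mant | 0x800000;        /* make the implicit 1 explicit */
    uint32_t shift = 126 - exp;            /* 14..23 */
    if (sig & ((1u << shift) - 1))
      return 0;                            /* would lose low-order bits */
    *half = (uint16_t)(sign | (sig >> shift));
  } else {
    return 0;              /* outside half range, float subnormal, or NaN */
  }
  return 1;
}

/* Writes the shortest lossless CBOR floating-point encoding of v
 * (initial byte included) to out; returns its length: 3, 5, or 9. */
size_t cbor_encode_shortest_float(double v, uint8_t out[9])
{
  uint16_t h;
  uint32_t b32;
  uint64_t b64;
  float fv;
  int i;

  assert(out != NULL);
  if (isnan(v)) {                          /* sketch's choice, see lead-in */
    out[0] = 0xf9; out[1] = 0x7e; out[2] = 0x00;
    return 3;
  }
  /* Only narrow when the value is in float range (or infinite). */
  if (isinf(v) || (v <= FLT_MAX && v >= -FLT_MAX)) {
    fv = (float)v;
    if ((double)fv == v) {                 /* single precision loses nothing */
      if (float_to_half_exact(fv, &h)) {
        out[0] = 0xf9;
        out[1] = (uint8_t)(h >> 8);
        out[2] = (uint8_t)(h & 0xff);
        return 3;
      }
      memcpy(&b32, &fv, sizeof b32);
      out[0] = 0xfa;
      for (i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(b32 >> (24 - 8 * i));
      return 5;
    }
  }
  memcpy(&b64, &v, sizeof b64);
  out[0] = 0xfb;
  for (i = 0; i < 8; i++)
    out[1 + i] = (uint8_t)(b64 >> (56 - 8 * i));
  return 9;
}
```

With this sketch, 1.0 comes out as f9 3c 00, 100000.0 as
fa 47 c3 50 00, and 1.1 needs the full nine bytes.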

        I don't like 2.4.4.2 "Expected Later Encoding for CBOR to JSON
        Converters".  Having a sender care about the encoding of a second format
        at the same time seems unnecessarily complex.  I'd like to see this
        section and the corresponding tags moved to another spec, or just removed.

I think we have a pretty good use case in constrained implementations
that marshal data for later use by a CBOR to JSON converter.
(See other message.)

        Similar for tags 33 and 34.  Just send the raw bytes as a byte string;
        there's no need to actually base64 encode.

(See other message.)

        Section 3.2.3, we should call out the heresy of including UTF-16
        surrogates encoded as UTF-8 for those that can't read the UTF-8 spec.

I'm with you on the subject matter, but I'm not sure if adorning
normative references with "hey, we really mean this" should be
necessary...

        Overall in section 3.2, "should probably issue an error but might take
        some other action" seems like it will cause some interop surprises in
        practice.

-05 will have some cleanups here.

        Section 3.3, "all number representations are equivalent" is unclear, even
        with the clarifying phrase afterward.

Slightly clarified.

        If section 3.6 stays, the numerics need more work.  +/- Infinity should be
        treated like NaN.

(Why?)

        "if the result is the same value" would benefit from
        some more clarity.  The tags section is also somewhat unclear.

Clarified slightly.

        Section 4.2, what about numbers with uninteresting fractional parts, like
        1.0?  What about numbers in exponential format without fractional parts,
        like 1e10?  I would recommend against even suggest encoding in place.
        It's likely to cause a security issue for the reason mentioned.

Clarified in -05 that it is actually warning against the in-place
strategy.

        Section 6, including the diagnostic notation is a little strange.  I would
        at least like "it is not meant to be parsed" to be strengthened to "MUST
        NOT be parsed".

(See other message.)

        Section 7.1, simple values 0..15 can never be used with this construction.
         If that's intentional, then give a good reason.

(Fixed the wording based on IANA input.)

        Section 7.2, do we have language we can use about how to reclaim
        first-come-first-serve tags that aren't being used anymore?  e.g. the web
        page is down and the requestor may be dead.

It is generally impossible to reclaim IANA values once assigned --
even if the requestor is no longer active, the data may still be in
use on the wire or in some archive.  Our registry should be no
different than anyone else's.

        In section 7.3, yes, we should make "application/mmmmm+cbor" valid.

Added section 7.4 based on text suggested by Tony Hansen.

        In section 8, perhaps mention a stack attack like:

        0x818181818181818181...

        I implemented depth counting with a maximum as an approach to avoid this.

        There are likely to be lots of other security concerns.

Added a paragraph on resource exhaustion attacks.
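
The depth-counting approach mentioned in the review could look
roughly like the following C sketch.  The names and the limit of 32
are assumptions made for illustration; the sketch handles
definite-length items only and simply rejects indefinite-length and
reserved encodings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CBOR_MAX_DEPTH 32   /* arbitrary limit chosen for this sketch */

/* Skips the data item at buf[*pos]; depth counts enclosing arrays,
 * maps, and tags. Returns 0 on success, -1 on truncated input, an
 * unsupported encoding, or nesting deeper than CBOR_MAX_DEPTH. */
static int skip_item(const uint8_t *buf, size_t len, size_t *pos, int depth)
{
  uint8_t ib;
  unsigned major, ai;
  uint64_t arg, i;

  if (depth > CBOR_MAX_DEPTH || *pos >= len)
    return -1;
  ib = buf[(*pos)++];
  major = ib >> 5;
  ai = ib & 0x1f;
  arg = ai;
  if (ai >= 28)
    return -1;                 /* reserved or indefinite: not handled here */
  if (ai >= 24) {              /* argument in the next 1, 2, 4, or 8 bytes */
    size_t n = (size_t)1 << (ai - 24);
    if (len - *pos < n)
      return -1;
    for (arg = 0; n > 0; n--)
      arg = (arg << 8) | buf[(*pos)++];
  }
  switch (major) {
  case 2: case 3:              /* byte or text string: skip the payload */
    if (arg > len - *pos)
      return -1;
    *pos += (size_t)arg;
    return 0;
  case 4: case 5:              /* array of arg items, map of arg pairs */
    for (i = 0; i < arg; i++) {
      if (skip_item(buf, len, pos, depth + 1) != 0)
        return -1;
      if (major == 5 && skip_item(buf, len, pos, depth + 1) != 0)
        return -1;
    }
    return 0;
  case 6:                      /* tag: exactly one nested item */
    return skip_item(buf, len, pos, depth + 1);
  default:                     /* integers, simple values, floats */
    return 0;
  }
}

/* Hypothetical entry point: skips one top-level item, reporting its size. */
int cbor_skip(const uint8_t *buf, size_t len, size_t *consumed)
{
  size_t pos = 0;
  assert(buf != NULL && consumed != NULL);
  if (skip_item(buf, len, &pos, 0) != 0)
    return -1;
  *consumed = pos;
  return 0;
}
```

Because the recursion stops at a fixed depth, a long run of 0x81
bytes is rejected after a bounded amount of stack use instead of
recursing once per input byte.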

        I have checked all of the examples in Appendix A.  I would have expected

        2(h'010000000000000000') | 0xc249010000000000000000

        Not 18446744073709551616, since in the diagnostic, I don't necessarily
        support bignums.

        Same with 0xc349010000000000000000.

I prefer to keep the explanatory value of actually giving the decoded
bignum value.  Text added:
»In the diagnostic notation provided for bignums, their intended
numeric value is shown as a decimal number (such as
18446744073709551616) instead of showing a tagged byte string (such as
2(h'010000000000000000')).«
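
To illustrate how the decimal form relates to the tagged byte string,
here is a small C sketch that converts the big-endian content of a
tag 2 bignum to decimal by repeated division by 10.  The function
name and the 64-byte limit are assumptions of this sketch; negative
bignums (tag 3, value -1 - n) are not covered:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIGNUM_MAX_BYTES 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: writes the decimal representation of the
 * unsigned big-endian integer in bytes[0..len) to out as a C string.
 * Returns 0 on success, -1 if the input or the output does not fit. */
int bignum_to_decimal(const uint8_t *bytes, size_t len,
                      char *out, size_t outlen)
{
  uint8_t work[BIGNUM_MAX_BYTES];
  char digits[3 * BIGNUM_MAX_BYTES];  /* 256 < 10^3: 3 digits per byte suffice */
  size_t ndigits = 0, start = 0, i;

  assert(out != NULL && (bytes != NULL || len == 0));
  if (len > BIGNUM_MAX_BYTES)
    return -1;
  if (len > 0)
    memcpy(work, bytes, len);
  while (start < len && work[start] == 0)
    start++;                          /* ignore leading zero bytes */
  while (start < len) {
    unsigned rem = 0;
    for (i = start; i < len; i++) {   /* long division of work by 10 */
      unsigned cur = rem * 256 + work[i];
      work[i] = (uint8_t)(cur / 10);
      rem = cur % 10;
    }
    digits[ndigits++] = (char)('0' + rem);  /* least significant digit first */
    while (start < len && work[start] == 0)
      start++;
  }
  if (ndigits == 0)
    digits[ndigits++] = '0';
  if (ndigits + 1 > outlen)
    return -1;
  for (i = 0; i < ndigits; i++)       /* reverse into the output buffer */
    out[i] = digits[ndigits - 1 - i];
  out[ndigits] = '\0';
  return 0;
}
```

Given the nine bytes 01 00 00 00 00 00 00 00 00, this yields the
string "18446744073709551616".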

        For numbers, section 9.8.1 of ECMA-262 (JSON.stringify) is relevant.  So:

        5.960464477539063e-8 | 0xf90001   (not e-08)
        0.00006103515625 | 0xf90400       (not e-05)

Changed this way in -05.

        In Appendix B, there are holes in the jump table.  If you're going to have
        a table, call out the invalid values, such as 0x1c-0x1f.

This can be shortened by instead calling out the remaining valid
values.  Done in -05.
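
As a rough illustration of the holes in question, the check below
classifies an initial byte by its major type and additional
information.  It follows the encoding as later published in RFC 7049
(values 28 to 30 unassigned, 31 meaningful only for strings, arrays,
maps, and the "break" stop code), which may differ in detail from the
-04 table; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ib may begin a data item or is
 * the "break" stop code (0xff), and 0 for the unassigned holes. */
int cbor_initial_byte_ok(uint8_t ib)
{
  unsigned major = ib >> 5;       /* high 3 bits */
  unsigned ai = ib & 0x1f;        /* low 5 bits: additional information */

  if (ai < 28)
    return 1;                     /* immediate value or 1/2/4/8-byte argument */
  if (ai < 31)
    return 0;                     /* 28..30: unassigned, e.g. 0x1c..0x1e */
  assert(ai == 31);               /* only the indefinite/break marker remains */
  return major == 2 || major == 3 || major == 4 || major == 5 || major == 7;
}
```

Under these assumptions 0x1c through 0x1f are all rejected, while
0x9f (indefinite-length array) and 0xff ("break") are accepted.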

        In Appendix C, the use of the variable "breakable" is unclear, and it's
        not obvious how you'll get to the "no enclosing indefinite" case.

breakable starts out as false.  It is passed as true wherever a
"break" stop code is allowed to occur in place of a data item.
The "no enclosing indefinite" case is reached if breakable is not set
and a 0xFF is encountered anyway.
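
The role of the flag can be seen in the following C sketch.  It is
not the Appendix C pseudocode: the names are hypothetical,
indefinite-length strings are rejected for brevity, and there is no
nesting limit, which a real decoder would want:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WF_OK = 0, WF_BREAK = 1, WF_ERROR = -1 };

/* Checks the item at buf[*pos]. breakable is nonzero only where a
 * "break" stop code may stand in place of a data item. */
static int well_formed(const uint8_t *buf, size_t len, size_t *pos,
                       int breakable)
{
  assert(buf != NULL && pos != NULL);
  if (*pos >= len)
    return WF_ERROR;
  uint8_t ib = buf[(*pos)++];
  unsigned major = ib >> 5, ai = ib & 0x1f;
  uint64_t arg = ai;
  int r;

  if (ib == 0xff)              /* "break": error if no enclosing indefinite */
    return breakable ? WF_BREAK : WF_ERROR;
  if (ai >= 28 && ai <= 30)
    return WF_ERROR;           /* unassigned */
  if (ai == 31) {
    if (major != 4 && major != 5)
      return WF_ERROR;         /* sketch: indefinite arrays and maps only */
    for (;;) {
      r = well_formed(buf, len, pos, 1);   /* element or key: break allowed */
      if (r == WF_BREAK)
        return WF_OK;
      if (r != WF_OK)
        return WF_ERROR;
      if (major == 5 && well_formed(buf, len, pos, 0) != WF_OK)
        return WF_ERROR;                   /* map value: break not allowed */
    }
  }
  if (ai >= 24) {              /* argument in the next 1, 2, 4, or 8 bytes */
    size_t n = (size_t)1 << (ai - 24);
    if (len - *pos < n)
      return WF_ERROR;
    for (arg = 0; n > 0; n--)
      arg = (arg << 8) | buf[(*pos)++];
  }
  switch (major) {
  case 2: case 3:              /* definite-length string */
    if (arg > len - *pos)
      return WF_ERROR;
    *pos += (size_t)arg;
    return WF_OK;
  case 4: case 5:              /* definite length: break never allowed */
    for (uint64_t i = 0; i < arg; i++) {
      if (well_formed(buf, len, pos, 0) != WF_OK)
        return WF_ERROR;
      if (major == 5 && well_formed(buf, len, pos, 0) != WF_OK)
        return WF_ERROR;
    }
    return WF_OK;
  case 6:                      /* tag: one item, break not allowed */
    return well_formed(buf, len, pos, 0) == WF_OK ? WF_OK : WF_ERROR;
  default:                     /* integers, simple values, floats */
    return WF_OK;
  }
}

/* Hypothetical entry point: the top-level item is never breakable. */
int cbor_is_well_formed(const uint8_t *buf, size_t len)
{
  size_t pos = 0;
  assert(buf != NULL);
  return well_formed(buf, len, &pos, 0) == WF_OK && pos == len;
}
```

Here 9f 01 82 02 03 ff is accepted, whereas a lone ff, or an ff
inside the definite-length array 82 01 ff, is an error.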

        In Appendix D, shouldn't the input be unsigned or an array of bytes?

It could also be unsigned short (which would be widened to int
anyway), but in this case it shouldn't matter.
(Array of bytes would just be slightly more tedious.)

        Appendix E.1, aren't there some cases of DER where you need the schema to
        parse?

As opposed to in BER?  I'm not aware of that.
(DER is a restricted subset of BER.)
DER and BER do not exactly enable schemaless decoding; they have in
common that implicit tagging may hide the data type information,
so you may need the schema information (ASN.1 definition) to find out
the actual data type in use.
(However, you can parse the surface structure of a BER/DER object
without schema information.)
We included this not because it is an alternative to CBOR, but because
its use in the IETF is widespread.

        Add PER.

PER does need schema information even for parsing and would require
its own little dissertation, so we left this off.

        Appendix E, we should mention Smile, since the first two octets are so
        cute.

:)

(Smile was on my initial list of formats to cover in the appendix, but
it is rather complicated and would need large amounts of text, beyond
what would be appropriate for this little appendix.)

        Overall, I think this doc shows a lot of promise, and I'm looking forward
        to having something on standards track that has these properties.

Thank you for this detailed review!

Grüße, Carsten
