Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04
Carsten Bormann <cabo@tzi.org> Mon, 12 August 2013 20:37 UTC
Return-Path: <cabo@tzi.org>
X-Original-To: apps-discuss@ietfa.amsl.com
Delivered-To: apps-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1E2C921F9FD6; Mon, 12 Aug 2013 13:37:46 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -105.098
X-Spam-Level:
X-Spam-Status: No, score=-105.098 tagged_above=-999 required=5 tests=[AWL=-1.015, BAYES_00=-2.599, FF_IHOPE_YOU_SINK=2.166, HELO_EQ_DE=0.35, RCVD_IN_DNSWL_MED=-4, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oW6sw8fnEPk4; Mon, 12 Aug 2013 13:37:35 -0700 (PDT)
Received: from informatik.uni-bremen.de (mailhost.informatik.uni-bremen.de [IPv6:2001:638:708:30c9::12]) by ietfa.amsl.com (Postfix) with ESMTP id 9502421F9E45; Mon, 12 Aug 2013 13:37:35 -0700 (PDT)
X-Virus-Scanned: amavisd-new at informatik.uni-bremen.de
Received: from smtp-fb3.informatik.uni-bremen.de (smtp-fb3.informatik.uni-bremen.de [134.102.224.120]) by informatik.uni-bremen.de (8.14.4/8.14.4) with ESMTP id r7CKbUYQ012599; Mon, 12 Aug 2013 22:37:30 +0200 (CEST)
Received: from [192.168.217.105] (p54894F9F.dip0.t-ipconnect.de [84.137.79.159]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by smtp-fb3.informatik.uni-bremen.de (Postfix) with ESMTPSA id 38411F01; Mon, 12 Aug 2013 22:37:29 +0200 (CEST)
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Content-Type: text/plain; charset="iso-8859-1"
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <A723FC6ECC552A4D8C8249D9E07425A71418233A@xmb-rcd-x10.cisco.com>
Date: Mon, 12 Aug 2013 22:37:28 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <2B377A8B-C954-4DEE-B19C-506CE8B295A2@tzi.org>
References: <A723FC6ECC552A4D8C8249D9E07425A71418233A@xmb-rcd-x10.cisco.com>
To: Joe Hildebrand <jhildebr@cisco.com>
X-Mailer: Apple Mail (2.1508)
Cc: "draft-bormann-cbor-04.all@tools.ietf.org" <draft-bormann-cbor-04.all@tools.ietf.org>, "gen-art@ietf.org" <gen-art@ietf.org>, IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04
X-BeenThere: apps-discuss@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: General discussion of application-layer protocols <apps-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/apps-discuss>
List-Post: <mailto:apps-discuss@ietf.org>
List-Help: <mailto:apps-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Aug 2013 20:37:47 -0000
On Aug 5, 2013, at 19:43, Joe Hildebrand (jhildebr) <jhildebr@cisco.com> wrote: > Sorry, my response is also correspondingly long. There are some original > comments at the end [...] We have worked a bit on the detail remarks of your review. (The previous message addressed the grander aspects). Item-by-item replies below: Other things that ought to be discussed: I would like to see another design goal: "May be implemented in modern web browsers". That should be possible with the new binary types. Indeed, implementability on a wide set of platforms is an important goal, somewhat implicit in out objective 2. I believe the design goal was achieved. I still don't see the need for non-string map keys. JSON mapping would be easier without them, as would uniqueness checking. If they are to be retained, they should have some motivation in the spec, describing how and why they might be used. I'm not sure they particularly need to be motivated in the spec. However, we will add something like "For example, using numbers for keys is useful in cases where the values are numeric and numeric ordering of keys is important." Further, data formats have been using complex map keys for a long time, e.g. cf. YAML. As an example, CBOR (diagnostic notation) { {"country": "DE", "license": "HB-MH 765"}: {"make": "Ford", "km": 192735.2}, {"country": "FI", "license": "601 ICE"}: {"make": "BMW", "km": 68923.1} } might be used to describe a map of cars, indexed by their license plate, where the license plate uses a country identifier and the plate text as its two components. In JSON, I would be forced to munge together the unrelated country and license strings using some custom delimiter syntax (say, "DE/HB-MH 765"); in CBOR (or YAML), I can keep them nice and semantic. In YAML: --- ? country: DE license: HB-MH 765 : make: Ford km: 192735.2 ? country: FI license: 601 ICE : make: BMW km: 68923.1 As always, protocols based on CBOR do not have to use this functionality; in section 3.4 we encourage using strings, or integers, as map keys (and »keys should be of a single CBOR type«). I wish I could think of another good simple value that we might register one day. The only one I've come up with is "no-op", which I might use in a streaming application as a trivial keep-alive or a marker between records to ensure parser state sync. I wouldn't classify that as "good" however. This is indeed the less likely avenue of extension of the format. However, as an example, we would have used simple values for the Infinities and NaN if the IEEE 754 formats didn't already provide a good representation. Nested tags ought to be forbidden until we come up with a strong use case for them. The tag 55799 ("self-describing CBOR", allowing to start a CBOR data item with 0xd9d9f7 for file type disambiguation) in -05 is a pretty strong use case; you should be able to use it independent of whether the data item is top-level tagged or not. I could use some implementation guidance on how to generate the most compact floating-point type for a given number, assuming we keep all of the floating-point types. I hope to have my C code for this on github, soon... I did it like this: void cbor_encoder_write_double(double v) { float fv = v; if (fv == v) { // i.e., the 32-bit float value doesn't lose data // ... do the same thing for half; encode as half or single } else { // ... encode as double } } If your platform doesn't give you the necessary floating point types, but you have (possibly coded yourself) the bitwise access to the floating point representation you need to send binary doubles, this is a bit more work, which I haven't done yet (it will probably look a bit like the Python code in Appendix D). I don't like 2.4.4.2 "Expected Later Encoding for CBOR to JSON Converters". Having a sender care about the encoding of a second format at the same time seems unnecessarily complex. I'd like to see this section and the corresponding tags moved to another spec, or just removed. I think we have a pretty good use case in constrained implementations that marshal data for later use a CBOR to JSON converter. (See other message.) Similar for tags 33 and 34. Just send the raw bytes as a byte string; there's no need to actually base64 encode. (See other message.) Section 3.2.3, we should call out the heresy of including UTF-16 surrogates encoded as UTF-8 for those that can't read the UTF-8 spec. I'm with you on the subject matter, but I'm not sure if adorning normative references with "hey, we really mean this" should be necessary... Overall in section 3.2, "should probably issue an error but might take some other action" seems like it will cause some interop surprises in practice. -05 will have some cleanups here. Section 3.3, "all number representations are equivalent" is unclear, even with the clarifying phrase afterward. Slightly clarified. If section 3.6 stays, the numerics need more work. +/- Infinity should be treated like NaN. (Why?) "if the result is the same value" would benefit from some more clarity. The tags section is also somewhat unclear. Clarified slightly. Section 4.2, what about numbers with uninteresting fractional parts, like 1.0? What about numbers in exponential format without fractional parts, like 1e10? I would recommend against even suggest encoding in place. It's likely to cause a security issue for the reason mentioned. Clarified in -05 that it is actually warning against the in-place strategy. Section 6, including the diagnostic notation is a little strange. I would at least like "it is not meant to be parsed" to be strengthened to "MUST NOT be parsed". (See other message.) Section 7.1, simple values 0..15 can never be used with this construction. If that's intentional, then give a good reason. (Fixed the wording based on IANA input.) Section 7.2, do we have language we can use about how to reclaim first-come-first-serve tags that aren't being used anymore? e.g. the web page is down and the requestor may be dead. It is generally impossible to reclaim IANA values once assigned -- even if the requestor is no longer active, the data may still be in use on the wire or in some archive. Our registry should be no different than anyone else's. In section 7.3, yes, we should make "application/mmmmm+cbor" valid. Added section 7.4 based on text suggested by Tony Hansen. In section 8, perhaps mention a stack attack like: 0x818181818181818181... I implemented depth counting with a maximum as an approach to avoid this. There are likely to be lots of other security concerns. Added a paragraph on resource exhaustion attacks. I have checked all of the examples in Appendix A. I would have expected 2(h'010000000000000000') | 0xc249010000000000000000 Not 18446744073709551616, since in the diagnostic, I don't necessarily support bignums. Same with 0xc349010000000000000000. I prefer to keep the explanatory value of actually giving the decoded bignum value. Text added: »In the diagnostic notation provided for bignums, their intended numeric value is shown as a decimal number (such as 18446744073709551616) instead of showing a tagged byte string (such as 2(h'010000000000000000')).« For numbers, section 9.8.1 of ECMA-262 (JSON.stringify) is relevant. So: 5.960464477539063e-8 | 0xf90001 (not e-08) 0.00006103515625 | 0xf90400 (not e-05) Changed this way in -05. In Appendix B, there are holes in the jump table. If you're going to have a table, call out the invalid values, such as 0x1c-0x1f. This may be shortened to rather call out the remaining valid values. Done in -05. In Appendix C, the use of the variable "breakable" is unclear, and it's not obvious how you'll get to the "no enclosing indefinite" case. breakable starts out as false. It is given as true where instead of a data item a "break" stop code is allowed to occur. The "no enclosing indefinite" case is reached if breakable is not set and an 0xFF is encountered anyway. In Appendix D, shouldn't the input be unsigned or an array of bytes? It could also be unsigned short (which would be widened to int anyway), but in this case it shouldn't matter. (Array of bytes would just be slightly more tedious.) Appendix E.1, aren't there some cases of DER where you need the schema to parse? As opposed to in BER? I'm not aware of that. (DER is a restricted subset of BER.) DER and BER are not exactly enabling schemaless decoding; they have in common that that implicit tagging may hide the data type information so you may need the schema information (ASN.1 definition) to find out the actual data type in use. (However, you can parse the surface structure of a BER/DER object without schema information.) We included this not because it is an alternative to CBOR, but because its use in the IETF is widespread. Add PER. PER does need schema information even for parsing and would require its own little dissertation, so we left this off. Appendix E, we should mention Smile, since the first two octets are so cute. :) (Smile was on my initial list of formats to cover in the appendix, but it is rather complicated and would need large amounts of text, beyond what would be appropriate for this little appendix.) Overall, I think this doc shows a lot of promise, and I'm looking forward to having something on standards track that has these properties. Thank you for this detailed review! Grüße, Carsten
- [apps-discuss] Gen-ART review of draft-bormann-cb… Martin Thomson
- Re: [apps-discuss] Gen-ART review of draft-borman… Martin Thomson
- Re: [apps-discuss] Gen-ART review of draft-borman… Joe Hildebrand (jhildebr)
- Re: [apps-discuss] Gen-ART review of draft-borman… Zach Shelby
- Re: [apps-discuss] Gen-ART review of draft-borman… Tim Bray
- Re: [apps-discuss] Gen-ART review of draft-borman… Carsten Bormann
- Re: [apps-discuss] Gen-ART review of draft-borman… Martin Thomson
- Re: [apps-discuss] Gen-ART review of draft-borman… Phillip Hallam-Baker
- [apps-discuss] CBOR and BULK (was Re: Gen-ART rev… Pierre Thierry
- Re: [apps-discuss] Gen-ART review of draft-borman… Carsten Bormann
- Re: [apps-discuss] Gen-ART review of draft-borman… Carsten Bormann
- Re: [apps-discuss] Gen-ART review of draft-borman… Joe Hildebrand (jhildebr)
- Re: [apps-discuss] Gen-ART review of draft-borman… Martin Thomson
- Re: [apps-discuss] Gen-ART review of draft-borman… Tony Finch
- Re: [apps-discuss] Gen-ART review of draft-borman… Carsten Bormann
- Re: [apps-discuss] Gen-ART review of draft-borman… Carsten Bormann
- Re: [apps-discuss] Gen-ART review of draft-borman… Paul Hoffman
- Re: [apps-discuss] Gen-ART review of draft-borman… Carsten Bormann
- Re: [apps-discuss] Gen-ART review of draft-borman… Phillip Hallam-Baker