Re: [apps-discuss] Gen-ART review of draft-bormann-cbor-04
Carsten Bormann <cabo@tzi.org> Mon, 12 August 2013 19:28 UTC
To: Joe Hildebrand <jhildebr@cisco.com>
Cc: "draft-bormann-cbor-04.all@tools.ietf.org" <draft-bormann-cbor-04.all@tools.ietf.org>, "gen-art@ietf.org" <gen-art@ietf.org>, IETF Apps Discuss <apps-discuss@ietf.org>

On Aug 5, 2013, at 19:43, Joe Hildebrand (jhildebr) <jhildebr@cisco.com> wrote:

> Sorry, my response is also correspondingly long.

The present message wraps up the more general comments sent by Joe Hildebrand concerning CBOR. Lots of juicy comments. Thanks to all! For these, we are not using the line-by-line style because many of the topics were touched on by more than one person. A separate message will follow with line-by-line replies to the more specific comments.

# Data model

## Extensibility

It is easy to fall into the trap of optimizing for the current environment. Experience says there will be some future requirements that are genuinely unknown today, which may require extending a format in unforeseen ways. CBOR plans for this by leaving ~10 % of the encoding space unused (additional information 28..30). There is fundamentally no way to guarantee backward compatibility for this kind of unpredicted extension process. Leaving code space unallocated may allow for forward compatibility, though.

CBOR also will be used in multiple environments. Instead of forcing each environment to invent its own format, CBOR allows adapting the format by providing extensible tagging and extensible primitive ("simple") values. Both are based on registries. Tags are more important in practice (and likely to be more numerous). The tagging scheme even allows first-come-first-served registration.

Obviously, there is a trade-off between adaptability and interoperability. CBOR tries to maximize interoperability by allowing a generic parser to easily handle tags and primitives whose registration it is not aware of. This will also enable evolving the decoders and the applications using them on separate timescales, if that is desired.

We have the same cognitive dissonance as everybody else about opening the floodgates to extensibility. The alternative, making it painful or impossible to adapt the spec to a specific environment, is not necessarily more desirable.
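
To make the reserved code space mentioned above concrete, here is a minimal sketch. It assumes the initial-byte layout of the CBOR draft (a 3-bit major type followed by 5 bits of additional information); the helper names `split_initial_byte` and `is_reserved` are hypothetical and are not taken from the draft.

```python
# Sketch only: helper names are hypothetical; the 3-bit/5-bit split of the
# initial byte is assumed from the CBOR draft.

RESERVED_INFO = range(28, 31)  # additional information 28..30


def split_initial_byte(b):
    """Return (major_type, additional_info) for one initial byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("initial byte must be in 0..255")
    return b >> 5, b & 0x1F


def is_reserved(b):
    """True if the initial byte uses one of the unallocated values."""
    _, info = split_initial_byte(b)
    return info in RESERVED_INFO


# Count how many of the 256 possible initial bytes are left unallocated.
reserved = [b for b in range(256) if is_reserved(b)]
share = len(reserved) / 256
print(len(reserved), share)
```

Three of the 32 additional-information values are unallocated in each of the eight major types, which gives 24 of 256 initial bytes, in line with the "~10 %" figure.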

Many existing data formats make extensibility impossible; if a protocol or application wants one of those, that's fine; in fact, some of the other data formats that have been brought up since we started the discussion of CBOR have also chosen the "no extensibility" path.

## Tags

CBOR has a number of predefined tags. This is one of the pain points for extensibility: most tags will not be of interest to some application or protocol developers, but having them specified early in the process will lead to better interoperability for those who want them. During the early development of the spec, Paul and I disagreed about which tags should be in the first RFC, but ultimately it doesn't really matter because the choice of tags is up to the protocol or application that uses CBOR. For all of these tags, a specific CBOR application will specify which of them are actually in use.

There is little disagreement that we need a tag for a date/time format; there is probably no way to agree on a single one, so the document currently includes both RFC 3339 and epoch-based formats.

Bignums (integer) may be less obvious to some; they are very obvious in other environments. Having (extended precision) base2 floating point as well as a base10 floating point in there also seems obvious for those environments that need it. These are the typical programming language types. YAGNI doesn't apply; leaving them out of the spec just means that there likely will be several incompatible versions of them.

Four tags on strings indicate further likely kinds of data. Nesting CBOR encoding into a byte string seems natural to support. We also included tags for URIs, regular expressions, and MIME messages (OK, maybe less obvious).

Then we have two tags that indicate that a text string is encoded in Base64/Base64url (33/34). The argument could be made that these could be decoded and sent as byte strings; however, there may be a need to keep the exact format (whitespace and padding), e.g.
for signature checking.

Tags 21-23 do the inverse: they tag byte strings with the base encoding that any JSON converter on the path should be using. Converter hints may be a new concept in a data format; in a world where JSON is widely used in processing chains, they could be eminently useful. (Paul and I still disagree on this one.)

Tags can be nested. Actually, since every tag specifies its possible contents, none of the tags specified in -04 can be nested. To cite an obvious example where nested tagging would be used, tag 1 accepts a floating point value and might be changed to also accept a base10 floating point value (tag 1 currently does not allow that, though). Disallowing the nesting of tags just forces a protocol designer to turn tag1(tag2(x)) into something more complex like tag1([tag2(x)]). We don't think forcing the introduction of this complexity reduces complexity.

For an implementation that does know both tags, the processing will be natural. Any other implementation can ignore the unknown tags or (preferably) hand them to the application in much the same way as the other containing data structures are handed to the application (i.e., tagx(y) is not much different in handling from [x, y] or {x: y}).

A later BCP might want to warn specifiers of future tags that designing them in such a way that you get recursion (as opposed to mere nesting) is not such a good idea, but you have to come up with very contrived examples before that becomes a concern.

# Encoding issues

## Numbers

Interchanging numbers is a much more complicated issue than one would think. Recent discussions of number precision issues in the JSON WG show that it is easy to underestimate the problem and get this wrong.

In the area of integer numbers, CBOR is highly opinionated in favor of unsigned (i.e., non-negative) numbers; this is what is actually needed in the majority of applications. Many applications also will clearly need negative numbers, so they have their own encoding.
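
The split between unsigned and negative integers can be sketched as follows. This is a minimal illustration, assuming the integer encoding of the CBOR draft (major type 0 carries the value itself, major type 1 carries -1 minus the value, and the argument is sent in 0, 1, 2, 4, or 8 bytes); `encode_int` is a hypothetical name, not part of any specification or library.

```python
import struct


def encode_int(value):
    """Encode an integer as major type 0 (unsigned) or 1 (negative)."""
    if value >= 0:
        major, arg = 0, value
    else:
        major, arg = 1, -1 - value  # negative integers carry -1 - value

    # Small arguments fit directly into the additional information.
    if arg < 24:
        return bytes([(major << 5) | arg])

    # Otherwise pick the shortest of the 1-, 2-, 4-, or 8-byte forms.
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)

    raise OverflowError("out of range for a plain integer; needs a bignum")


print(encode_int(1000).hex())    # 1903e8
print(encode_int(-500).hex())    # 3901f3
print(encode_int(-2**64).hex())  # 3bffffffffffffffff
```

The last call shows that the most negative value with an 8-byte argument is -2**64, which does not fit into a signed 64-bit variable.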

Obviously, if you encode N bits for a negative number, you need N+1 bits in a signed representation of that since you need to add the sign bit. We chose to keep N at 8, 16, 32, or 64, so N+1 has a slightly odd size, which implementers indeed need to be aware of if that is larger than their number range (not a problem, e.g., for JavaScript). CBOR protocols should be explicit about the number range and precision that is expected to be supported.

In the area of floating point numbers, CBOR ignores IEEE 754 decimal numbers (in favor of its own, simpler format) and only supports three sizes of the IEEE 754 binary floating point numbers. That is, again, opinionated (but can be fixed in a pinch by introducing appropriate tags). The inclusion of 16-bit floating point is motivated by its usefulness for low-precision sensor data. The reason this may not be obvious to many is likely that programming languages evolve on time scales where this 2008 addition to IEEE 754 is still very, very recent.

## Maps

As with most modern programming languages, CBOR has a map type; this is intended to be a general map and not to be limited to text string keys as in JSON objects. Dependent specifications have already been started that make good use of non-text keys. In particular, integer keys are a good match for environments that already maintain some form of registry based on integers.

CBOR maps are intended to be maps, not multimaps. Duplicate keys are an encoding error. Unfortunately, what should be considered duplicate is often application dependent (are 0 and 0.0 the same key? "Å" and "Å"?), so a generic parser will only be able to do part of the work.

CBOR maps do not ascribe meaning to the order of items in the map. This is hard to prescribe in the format; it is a requirement on the applications using it.

# Implementation issues

Complexity/Code size: We will leave an assessment of that to those who have implemented the spec (and we have heard from a few).
The pseudo-code in Appendix C should give some impression of what a minimal implementation could look like.

Performance: Memory allocation indeed is a significant contributor to the CPU use of a format decoder. CBOR tries to help a bit by providing right at the start of a data item the size of the object that needs to be allocated (as opposed to the length of its representation as in ASN.1 BER). Of course, that is not possible in indefinite length encoding ("streaming"); here we trade off some additional CPU in the decoder (needed for reallocation/copying of unknown-length structures) against less complexity in the encoder, for those cases where the encoder would need to perform significant buffering to obtain these sizes before encoding a data item.

# Document issues

We are open to suggestions for replacement words for "well-formed" and "valid" in Section 3.7. A definition of "valid" is not actually needed; that term is only used in its English sense in the discussion of generic parsers (and the connotation with XML's sense of valid is unfortunate). "Well-formed" is defined algorithmically in the pseudocode in Appendix C.

For a binary format, there is a strong requirement for a diagnostic notation that is understood by everyone who is trying to get an application going. So defining it in the document is the best way to get this. This definition is not needed for interop, so we can do away with ABNF etc. I started out by saying something equivalent to "MUST NOT be parsed", but an RFC 2119 MUST struck me as silly when diagnostic notation is not meant for interop in the first place. (Also, there might be tools for encoding test vectors etc. that do parse diagnostic notation; in -05, we have added a pointer to YAML as another format that may be useful for text-based specification of CBOR data items in configuration files.)

Grüße, Carsten