[Cbor] my (WGLC re-)views on error processing in RFC7049bis and future-proofing
Michael Richardson <mcr+ietf@sandelman.ca> Thu, 14 May 2020 16:31 UTC
From: Michael Richardson <mcr+ietf@sandelman.ca>
To: cbor@ietf.org
Date: Thu, 14 May 2020 12:31:39 -0400
Message-ID: <2963.1589473899@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/QOk_hrJoF8NcuiorkeXex9mUH4w>

Carsten and Paul, my apologies for the length of this email. I've been
working on this an hour or so a day for a week :-(
I will probably reply and emphasize my important points.

This document says it is going to Internet Standard, but *only* in the
Contributing section.

Please also forgive me that I haven't re-read RFC7049 in a long while, so I
might be complaining about things that simply match 7049.

Jeffrey Yasskin <jyasskin@chromium.org> wrote:
    >> I am a user of parsers, I have occasionally had to write my own
    >> conversions, but mostly I would say that I am not up to speed on some
    >> of the details.

    > That's all reasonable. :) Given the discussion, and Lawrence's quick
    > analysis of the existing tags, how do you currently feel about the
    > state of RFC7049bis's requirements around error handling?

Maybe time for a top-to-bottom read.
Probably an appropriate thing to do during WGLC :-)

The Intro does say:
   "It does not create a new version of the format."

I think that I'd like to amend this as follows:

   This document is a revised edition of [RFC7049], with editorial
   improvements, added detail, and fixed errata. This revision formally
   obsoletes RFC 7049, while keeping full compatibility of the
   interchange format from RFC 7049. It does not create a new version
   of the format.

->

   This document is a revised edition of [RFC7049], with editorial
   improvements, added detail, and fixed errata. In clarifying some
   interpretations of [RFC7049] it may in some cases create situations
   where an existing parser may no longer comply with this specification.
   While this revision formally obsoletes RFC 7049, it does not obsolete
   any valid encoders, and thus keeps full compatibility with the
   interchange format from RFC 7049. It does not create a new version
   of the format.

I note this text (section 2.1):

   While there is a strong expectation that generic encoders and
   decoders can represent "false", "true", and "null" ("undefined" is
   intentionally omitted) in the form appropriate for their programming
   environment, implementation of the data model extensions created by
   tags is truly optional and a matter of implementation quality.

This seems to have something to do with tags.
Also, section 2.2 ends with:

   "0.0" as an integer (major type 0, Section 3.1). However, if a
   specific data model declares that floating-point and integer
   representations of integral values are equivalent, using both map
   keys "0" and "0.0" in a single map would be considered duplicates,
   even while encoded as different major types, and so invalid; and an
   encoder could encode integral-valued floats as integers or vice
   versa, perhaps to save encoded bytes.

To me, this says that all generic encoders that intend to return data in the
native form of their programming environment need to be configured as to the
protocol. This supports my suggestion that a well-designed library
would/could have to be configured for the specific data model when it comes
to how unknown tags are treated.

    > * I'm not opposed to advice for parsers to have an option to treat a
    >   value tagged with an unknown tag as equivalent to the value itself.
    > * I dislike the idea of any generic parser doing that by default, I think
    >   based on reasoning like in
    >   https://tools.ietf.org/html/draft-iab-protocol-maintenance-04.
    > * If a parser passes unknown tags up to the application, a higher-level
    >   protocol can ignore them itself, skip their data item, return an error,
    >   or do something else appropriate to the context. So if the RFC should
    >   recommend a default in generic parsers, I'd vote for that one.
    > * I don't intend to draft this wording myself. :)

I think that I'm arguing for generic parsers to be in RFC7049 mode until they
are configured otherwise. That might mean ignoring unknown tags (if that's
what they did before) and passing the data up using whatever native
interpretation there is. That is, if there is a seconds-since-epoch tag (XXX)
which a generic parser does not understand, followed by an integer, then the
parser would return an integer if it does not have an interface that passes
tags.

    > 28, 29, 30: These values are reserved for future additions to the
    > CBOR format. In the present version of CBOR, the encoded item is
    > not well-formed.

I think that there is a bug here.
What should a parser written today do when it encounters these values?
(forward reference to section 7.2?)
Getting this right is how we deal with future-proofing.

It seems that seeing such a thing means a current decoder has to abort/fail.
What we write here has a profound implication, I think, on how easily we
could act on the advice of section 7.2. Section 10, first paragraph, implies
we should say something.

In general, I think that this introductory encoding section is too detailed,
particularly for 31. I think that detail belongs later on. I got no value
(I retained nothing) from having that level of detail there.

I wonder if section 3.1, under major type 0, should clarify that "0" is
encoded as 0b000_00000. (That is, there is no negative 0.)

   "A string containing an invalid UTF-8 sequence is well-formed but
   invalid."

I think that this might need clarification.

I guess that RFC8742 includes sequences of 7049bis CBOR sequences.
I wonder if "Updates 8742" is appropriate.

    > If the break stop code appears after a key in a map, in place of that
    > key's value, the map is not well-formed.

Does this mean that the entire map is not well-formed, or just the key/value
pair where this occurs? I take the first meaning, but I want to be sure.

3.2.3:
   (Note that zero-length chunks, while not particularly useful, are
   permitted.)

They might be useful in non-TCP/IP situations where it is useful to send a
"keep-alive" on some channel.

I think that it might be cleaner to swap the order of sections 3.2
(indefinite-length things) and 3.3 (floating-point and stuff). This just puts
major type 7 more in context first.

    > As with all other major types, the 5-bit value 24 signifies a single-
    > byte extension: it is followed by an additional byte to represent the
    > simple value. (To minimize confusion, only the values 32 to 255 are
    > used.) This maintains the structure of the initial bytes: as for the

Or, to put it another way, 5-bit values 24->31 in table 3 are also "Simple
Values".

Could future Simple Values (such as 0..19) have complex structure the way
that values 24->27 do? Or to put it another way, can a decoder depend upon
unassigned simple values having the one-or-two byte structure presented, and
so be able to skip unknown values? Or does a decoder that encounters
undefined values here have to fail?

    > formed. (This implies that an encoder cannot encode false, true,
    > null, or undefined in two-byte sequences, only the one-byte variants
    > of these are well-formed.)

Here I suggest the text say:

    + formed. (This implies that an encoder cannot encode false, true,
    + null, floats, undefined-23, reserved-[28..31], or break in two-byte
    + sequences, only the one-byte variants of these are well-formed.)

While it's too late to change, was there a reason "True" didn't get
0b111_00001? Clearly False and Null would then compete to be 0b111_00000,
and maybe that's reason enough to not play such games.

Section 3.4 says:
    } Their primary purpose in this specification is to define common data
    } types such as dates. A secondary purpose is to provide conversion
    } hints when it is foreseen that the CBOR data item needs to be
    } translated into a different format, requiring hints about the content
    } of items.

I don't think that the "primary purpose" is still just dates.
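
On the earlier question of whether a decoder can skip unassigned simple
values: as I read the draft, the additional information in the initial byte
fixes the encoded length for major type 7, whether or not the simple value is
assigned. The following is only a sketch of that reading, not text from the
draft; the function name major7_item_length is made up for illustration, and
treating a bare break as an error is my assumption about what a skipping
decoder would want.

```python
def major7_item_length(data: bytes) -> int:
    """Return the encoded length of the major type 7 item at the start of data.

    Raises ValueError for encodings that are not well-formed (hypothetical
    helper, illustrating one reading of the draft).
    """
    if not data:
        raise ValueError("empty input")
    initial = data[0]
    if initial >> 5 != 7:
        raise ValueError("not major type 7")
    info = initial & 0x1F  # low 5 bits: additional information

    if info <= 23:
        # One-byte simple value 0..23, assigned or not.
        length = 1
    elif info == 24:
        # Two-byte simple value; only 32..255 may be encoded this way.
        if len(data) < 2:
            raise ValueError("truncated simple value")
        if data[1] < 32:
            raise ValueError("two-byte simple value below 32 is not well-formed")
        length = 2
    elif info in (25, 26, 27):
        # Half, single, and double precision floats: 2, 4, or 8 payload bytes.
        length = 1 + (2 << (info - 25))
    elif info in (28, 29, 30):
        # Reserved today; a current decoder cannot know the length.
        raise ValueError("reserved additional information 28..30")
    else:
        # info == 31: break stop code, which is not a data item by itself.
        raise ValueError("break stop code outside an indefinite-length item")

    if len(data) < length:
        raise ValueError("truncated item")
    return length
```

Under this reading, an unassigned simple value such as 0xe0 can be skipped
(length 1), while 0xfc (additional information 28) forces a failure, which is
the future-proofing problem raised above.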
The note about "hints" suggests that tags are always advisory, and I think
that this thread has established that for some protocols, they really are
not.

    } Understanding the semantics of tags is optional for a
    } decoder; it can simply present both the tag number and the tag
    } content to the application, without interpreting the additional
    } semantics of the tag.

I wonder if this text should be stronger. Maybe:

    + Understanding the semantics of every tag is optional for a decoder;
    + a decoder MAY simply present some or all tags to the
    + application, without interpreting the additional
    + semantics of the tag.

I would then go on:

    + Decoders which translate CBOR values into language specific objects,
    + (e.g., dates, bignum, example3, ...) MAY consume the tags along with
    + the values, returning only the language defined object.

I note that, AFAIK, we do not use tag#24 (Encoded CBOR data item) for the
signed object in COSE. Should we?
What's the difference between #24 and #55799? I guess I will read onwards to
find out... Got it.

BTW: Tags 25 and 29 are called out after Table 5, but are not listed *in*
table 5. That whole paragraph could use some more periods, and maybe a blank
line.

I'm still at a loss as to why <untagged><null> is better than <epoch><null>.

Why can't we use decimal fractions or bigfloats for time?
I suppose float64 has enough precision for a millennium or so, even if one
wants microsecond precision.

I think that the words "bytewise lexicographic order" used in 4.2.1 may not
survive translations in a meaningful way. The eight-item example might be
clearer if presented in a table so that the bytes can be lined up.

If keys of a map have tags, I assume that the tags are to be included in the
lexicographic order?

Maybe 4.2.3 could cover both ways with a forward reference?

I think that 4.2.2 gets into whether or not a tag is required, and I think
that it might need to be considered more in the context of when tags can be
skipped.
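
To make the 4.2.1 ordering concrete, here is a small sketch that sorts the
encoded keys as byte strings. The hex encodings for the eight keys are what I
believe the draft's example uses; the ninth key, 1(10) encoded as 0xc10a, is
my own addition to show where a tagged key lands if the tag bytes are included
in the comparison, which is an assumption and not something the draft states
in this form.

```python
# Encoded map keys as hex, paired with their diagnostic notation.
# The first eight are taken from the 4.2.1 example; "1(10)" is a
# hypothetical tagged key added for illustration.
encoded_keys = {
    "false": "f4",
    "[-1]": "8120",
    '"aa"': "626161",
    "100": "1864",
    "1(10)": "c10a",
    '"z"': "617a",
    "-1": "20",
    "[100]": "811864",
    "10": "0a",
}

# Python compares bytes objects byte by byte, left to right, with a shorter
# string sorting before any longer string it is a prefix of. That is the
# bytewise lexicographic order the section describes.
ordered = sorted(encoded_keys, key=lambda label: bytes.fromhex(encoded_keys[label]))

for label in ordered:
    print(f"{encoded_keys[label]:<8} {label}")
```

With the tag bytes included, the tagged key sorts by its 0xc1 initial byte,
after the arrays (0x81) and before false (0xf4), rather than next to the
untagged 10.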

I think that the Introduction should have a section 1.3 that addresses the
concept of "Protocols" on top of CBOR, referencing section 5.
I think that 4.2.2 should forward reference to 5, or maybe sections 4 and 5

I suggest "protocol" be capitalized consistently as Protocol when it is used
in this way.

I don't find that section 5.2 fits into section 5. I think we already covered
this concept.

   "0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR
   item.

   ......

   Generic encoders and decoders are expected to forward simple values
   and tags even if their specific codepoints are not registered at the
   time the encoder/decoder is written (Section 5.4).

   Generic decoders provide ways to present well-formed CBOR values,
   both valid and invalid, to an application. The diagnostic notation
   (Section 8) may be used to present well-formed CBOR values to humans.

I don't personally know enough UTF-8 to know why the above is invalid UTF-8.
Maybe say that it's not because c0ae is an unassigned code point, but
because...
(I remember the valid MIME and valid UTF-8 debate we had last time, and I am
not trying to re-open it.)

I think that the second paragraph above should be swapped with the first one.

   1. Replace the problematic item with an error marker and continue
      with the next item, or

-> this might be a place where that desired "invalid tag" from last week's
discussion fits in!!!

Having read through section 5, I believe even more than two weeks ago that
the "65535" tag should go into RFC7049bis, not a new document.

5.3.1 "Duplicate keys in a map" seems to suggest that "Stream Encoder" will
be specified/discussed in section 5.6, when in fact that section is about
keys, including duplicate keys. Maybe 5.3.1/Basic Validity could go later,
or just not be said at all, since the entire section is about this topic?
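
On the 0x62c0ae example above: as far as I can tell, the reason is that c0 ae
is an overlong (non-shortest-form) two-byte encoding of U+002E ("."), and
UTF-8 only permits the shortest form, so the byte c0 can never start a valid
sequence. The sketch below checks this with Python's strict UTF-8 decoder; the
helper name is_valid_utf8 is made up for illustration.

```python
item = bytes.fromhex("62c0ae")

major_type = item[0] >> 5    # 3: text string
length = item[0] & 0x1F      # 2: two payload bytes follow
payload = item[1:1 + length] # b"\xc0\xae"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# What the two bytes would mean if overlong forms were allowed:
# 110xxxxx 10yyyyyy carries the 11 bits xxxxxyyyyyy.
would_be_code_point = ((payload[0] & 0x1F) << 6) | (payload[1] & 0x3F)

print(major_type, length)            # header says: text string of 2 bytes
print(is_valid_utf8(payload))        # the payload is rejected
print(hex(would_be_code_point))      # the code point it "spells"
print(is_valid_utf8(b"."))           # the one-byte form of the same code point
```

So the item is well-formed (the header and length are consistent) but not
valid, because the text string payload fails UTF-8 validation.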

I think that section 7.1 has a lot of aspirational language ("an attempt
should..."), which might have been appropriate for the ID that led to 7049,
but SHOULD now be definitive. "An attempt was made to make..."

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-