Re: [Cbor] my (WGLC re-)views on error processing in RFC7049bis and future-proofing
Michael Richardson <mcr+ietf@sandelman.ca> Thu, 21 May 2020 23:10 UTC
From: Michael Richardson <mcr+ietf@sandelman.ca>
To: Carsten Bormann <cabo@tzi.org>
cc: cbor@ietf.org
In-reply-to: <377E8232-0638-419F-8D79-710F42C2B4E3@tzi.org>
References: <17300.1588779159@localhost> <38BB6FFF-737F-4C11-AD7A-DA3F28A9F570@tzi.org> <CANh-dXkdjMyO=WFUxrF06OfP+RE9v11unKJXL8P3UtEe+prV1w@mail.gmail.com> <13690.1588894939@localhost> <CANh-dXmjD=RCwh7ExjSvFx+5ciew+eqHoVS88OommQ2xVnX5=Q@mail.gmail.com> <2963.1589473899@localhost> <377E8232-0638-419F-8D79-710F42C2B4E3@tzi.org>
Comments: In-reply-to Carsten Bormann <cabo@tzi.org> message dated "Wed, 20 May 2020 16:49:49 +0200."
X-Mailer: MH-E 8.6; nmh 1.7+dev; GNU Emacs 25.2.1
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg="pgp-sha512"; protocol="application/pgp-signature"
Date: Thu, 21 May 2020 19:10:44 -0400
Message-ID: <4347.1590102644@dooku>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/ygLIwBnsuy6wvkZxE9TENYtrECE>
Carsten Bormann <cabo@tzi.org> wrote:
>> This document is a revised edition of [RFC7049], with editorial
>> improvements, added detail, and fixed errata. In clarifying some
>> interpretations of [RFC7049] it may in some cases create situations
>> where an existing parser may no longer comply to this specification.
>> While this revision formally obsoletes RFC 7049, it does not obsolete
>> any valid encoders, and thus keeps full compatibility with the
>> interchange format from RFC 7049. It does not create a new version of
>> the format.

> I don’t think we want to go to that level of detail right there in the
> intro. I’m also not sure 7049 was formally defining “compliance” of
> decoders.

> I think we could move some of this thinking into the “changes from
> 7049” appendix, the fate of which we haven’t really decided as a WG
> yet.

okay, I can go for that.

>> This seems to have something to do with tags.

> Yes. Implementation of specific tags was always optional with CBOR;
> there is no single “required tag”.

yes, okay, but some Protocols might require them.
So maybe we need some additional BCP14-like terms. Hmm.

>> which might mean ignoring unknown tags (if that's what they did
>> before), passing the data up using whatever native interpretation
>> there is, until they are configured otherwise.

> Generic decoders that want to evolve to be usable with applications
> that need tag support will need to develop a transition strategy.
> Isn’t that obvious to any library developer?

Yes, but sometimes application developers need a hammer. This happens far
less often in open source situations, but in other situations, the hammer is
sometimes necessary to make a supplier spend the effort.

>> In general, I think that the details in this introductionary encoding
>> section are too detailed, particularly for 31. I think that detail
>> belongs later on. I got no value (I retained nothing) from having that
>> level of detail there.

> I think there was another proposal to move around elements of Section
> 3. Sometimes it is necessary to include some detail in an overview for
> completeness; we can’t really pretend ai=31 does not exist for a few
> sections and then do a surprise reveal, can we?

We can just say less in section 3: ai=31 "Stop-Code, see section FOO".
That's okay. It's some of the other detail that seems to have accumulated
there which distracted me.

>> I wonder if section 3.1, under major type 0 should give clarify that
>> "0" is encoded as 0b000_00000. (That is no negative 0)

> Is this really more about major type 1, and the value -1?

maybe.

>> I guess that RFC8742 include sequences of 7049bis CBOR sequences. I
>> wonder if Updates 8742 is appropriate.

> You still lost me here. What are 7049 CBOR sequences?

> Oh. We are talking about “Data Streams”. This probably should mention
> RFC 8742 (which only happens in 3.1)!

Yes.

>>> If the break stop code appears after a key in a map, in place of that
>>> key's value, the map is not well-formed.
>>
>> This does mean that the entire map is not well-formed, or just the
>> key/value pair where this occurs? I take the first meaning, but I
>> want to be sure.

> It says that the map is not well-formed, but of course the whole data
> item is dubious as it is not clear whether the map has ended or not.
> So, again, give up. Should be clear in Appendix C.

I am concerned that different kinds of parsers will raise the error at
different points, so that some content gets examined while other content
does not. This is akin to the duplicate key challenge.

>> Could future Simple Values (such as 0..19) can, have complex structure
>> the way that values 24->27 do?

> No, the general syntax of heads does apply to the unallocated code
> points as well.

I'd like the document to say this.

>> Or to put it another way, can a decoder depend upon unassigned simple
>> values having the one-or-two byte structure presented and be able to
>> skip unknown values?

> Yes.

>> I guess I will read onwards to find out... Got it.
>>
>> BTW: Tag 25 and 29 are called out after Table 5, but are not listed
>> *in* table 5. That whole paragraph could use some more periods, and
>> maybe a blank line. I'm still loss as to why <untagged><null> is
>> better than <epoch><null>.
>>
>> Why can't we use decimal fractions, or bigfloats for time?

> That may have been a mistake (which is one reason we have tag 1001
> now). The WG has generally taken a dim view on extending the domain
> (allowable syntax for tag content) for a tag, so we can’t “fix” that --
> note that for the date tag, we have taken the decision not to reuse one
> tag for two different tag content syntaxes either.

okay.

>> I think that words "bytewise lexicographic order" used in 4.2.1 may
>> not survive translations in a meaningful way.

> Deepl turns this into "byteweise lexikographische Ordnung”, na ja. The
> next alternative "byteweise lexikographische Reihenfolge” is very
> close. No idea about "ordre lexicographique par octet”. Or "пошаговый
> лексикографический порядок”, for that matter. But "字节词序” looks
> really good :-) (and is completely wrong).

I'm less concerned about mechanical translations of the draft than about
translations by a human who didn't understand the original mathematical
meaning, whether that person is a translator or a non-native
English/Germanic/Latin speaker writing code.
I'm just asking if you could use a less technical term. HIT ME ON THE HEAD.

>> I think that the the Introduction should have a section 1.3 that
>> addresses the concept of "Protocols" on top of CBOR, referencing
>> section 5. I think that 4.2.2 should forward reference to 5, or maybe
>> sections 4 and 5 I suggest "protocol" be capitalized consistently as
>> Protocol when it is used in this way.
>>
>> I don't find that section 5.2 fits into section 5. I think we already
>> covered this concept.

> This section reaffirms to concept developed above.
> It also discusses the application interface -- have we really covered
> that here?

I don't understand your comments here.

>> "0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR
>> item. ...... Generic encoders and decoders are expected to forward
>> simple values and tags even if their specific codepoints are not
>> registered at the time the encoder/decoder is written (Section 5.4).
>>
>> Generic decoders provide ways to present well-formed CBOR values, both
>> valid and invalid, to an application. The diagnostic notation
>> (Section 8) may be used to present well-formed CBOR values to humans.
>>
>> I don't personally know enough UTF-8 to know why the above is invalid
>> UTF-8.

> Because it uses a long form where there is a shorter form - UTF-8 uses
> deterministic (shortest) encoding.

Can you please say that in the document?

} For instance, "0x62c0ae" uses a long form where there is a shorter form, and
} UTF-8 mandates deterministic (shortest) encoding; therefore, this is not a
} valid CBOR item.

>> Maybe saying that it's not because c0ae is an unsigned code point by
>> because...

> This is a mild reminder that you have to read up on UTF-8 to understand
> CBOR.

Yes, but many will just plop the text into a UTF-8 parser and hope.
We can't all be experts at everything :-)

>> Having read through section 5, I believe even more than two weeks ago,
>> that the "65535" tag should go into RFC7049bis, not a new document.

> We didn’t want new stuff in 7049bis, so this is now in
> draft-bormann-cbor-notable-tags with a mention here; RFC7049bis
> combines with its registries to give the full semantics of CBOR, so
> this is OK.

I feel that this is a reasonable way to address a shortcoming in the
existing PS, but I won't fall on this sword.

> Grüße, Carsten

Thank you for all the work.

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        | network architect  [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-
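
For readers who, as discussed above, would "just plop the text into a UTF-8 parser and hope": a minimal sketch of what that looks like for the "0x62c0ae" example, assuming Python's built-in strict UTF-8 codec as the validator. The helper names check_text_item and is_valid_text_item are hypothetical, and the sketch only handles text strings whose length fits in the initial byte (additional information below 24). It is an illustration of the point made in the thread, not text from the draft.

```python
def check_text_item(encoded: bytes) -> str:
    """Decode a short CBOR text string (hypothetical helper, sketch only).

    Handles major type 3 with additional information < 24, which is
    enough for the three-byte example discussed in the thread.
    """
    head = encoded[0]
    major_type = head >> 5          # top three bits of the initial byte
    additional_info = head & 0x1F   # low five bits: here, the byte length
    if major_type != 3 or additional_info >= 24:
        raise ValueError("sketch only handles short text strings")
    payload = encoded[1:1 + additional_info]
    if len(payload) != additional_info:
        raise ValueError("truncated text string")
    # Python's UTF-8 codec is strict: it rejects overlong forms such as
    # C0 AE, a two-byte spelling of U+002E ('.') whose shortest form is 2E.
    return payload.decode("utf-8")


def is_valid_text_item(encoded: bytes) -> bool:
    """Return False when the payload is not valid UTF-8."""
    try:
        check_text_item(encoded)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(is_valid_text_item(bytes.fromhex("62c0ae")))  # overlong form
    print(is_valid_text_item(bytes.fromhex("612e")))    # shortest form of '.'
```

Here 0x62 splits into major type 3 (text string) and length 2, so the payload is the two bytes C0 AE. A strict decoder refuses them because the same code point has a one-byte encoding, which is the "long form where there is a shorter form" explanation given in the reply.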