Re: [Cbor] Chunks with tags inside indefinite-length string (major type 2 and 3)

Laurence Lundblade <lgl@island-resort.com> Fri, 06 December 2019 18:06 UTC

From: Laurence Lundblade <lgl@island-resort.com>
Message-Id: <0F29E07D-1AB4-4229-B2C2-0C94FED3DA7E@island-resort.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_547A11E4-8C4A-45BB-95EB-06673273505D"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Date: Fri, 06 Dec 2019 10:06:22 -0800
In-Reply-To: <CANh-dXmX+D=DRLBG2HtE5cPSNv_W7Sqr094c4KzWtUtma3V0jg@mail.gmail.com>
Cc: Faye Amacker <faye.github@gmail.com>, cbor@ietf.org
To: Jeffrey Yasskin <jyasskin@chromium.org>
References: <CA+qCGhv_d=uJxnPrnbRO_iN9nVhwPf0Qa8EvtqqXZS6pMaCyBw@mail.gmail.com> <CANh-dXnHnm2wxYPJfYJXmhHhOBJobnmFmwT5E=55PGcVv0ttPg@mail.gmail.com> <CA+qCGhtu5eZtmL+Xf+ZmgJ=NcfA0kV1zx+7_N7HY0eiM8DNHEw@mail.gmail.com> <CANh-dXmX+D=DRLBG2HtE5cPSNv_W7Sqr094c4KzWtUtma3V0jg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/oVdpKUhe2053FlS74RapxkQIkjw>
Subject: Re: [Cbor] Chunks with tags inside indefinite-length string (major type 2 and 3)

I agree that chunks in indefinite-length strings should not be tagged. I think a decoder can aggregate the chunks of an indefinite-length string into what looks like a definite-length string to the next layer up, and only then process any tag on the string as a whole.
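
A rough sketch of that aggregation step in C, to make the idea concrete. The function name join_chunks and its signature are made up for illustration and are not taken from any existing library. It concatenates the chunks into a caller-supplied buffer and fails if anything other than a definite-length string of the same major type appears before the "break". UTF-8 validation of text chunks is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: concatenate the chunks of an indefinite-length
 * byte or text string that starts at in[0].
 * Returns the number of bytes written to out, or -1 if the input is
 * not well-formed or does not fit in out. */
long join_chunks(const uint8_t *in, size_t in_len,
                 uint8_t *out, size_t out_cap)
{
    if (in_len < 1 || (in[0] != 0x5f && in[0] != 0x7f))
        return -1;                    /* not an indefinite-length string */

    uint8_t major = in[0] >> 5;       /* 2 = byte string, 3 = text string */
    size_t pos = 1, total = 0;

    while (pos < in_len) {
        uint8_t head = in[pos++];
        if (head == 0xff)
            return (long)total;       /* "break" stop code ends the string */

        uint8_t info = head & 0x1f;
        if ((head >> 5) != major || info > 27)
            return -1;                /* tag, other type, or nested indefinite */

        /* Definite length: immediate value, or 1/2/4/8 following bytes. */
        uint64_t len = info;
        if (info >= 24) {
            size_t n = (size_t)1 << (info - 24);
            if (in_len - pos < n)
                return -1;
            len = 0;
            while (n--)
                len = (len << 8) | in[pos++];
        }

        if (len > in_len - pos || len > out_cap - total)
            return -1;                /* truncated chunk or buffer too small */
        memcpy(out + total, in + pos, (size_t)len);
        pos += (size_t)len;
        total += (size_t)len;
    }
    return -1;                        /* input ended before the "break" */
}
```

The layer above would then see one contiguous string and apply an enclosing tag, if any, to that result only.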

It would be good to have some of these in the not-well-formed CBOR appendix. Here are three:
    // Indefinite length byte string with tagged 2nd chunk
    {(uint8_t[]){0x5f, 0x41, 0x00, 0xc3, 0x42, 0x00, 0x00, 0xff}, 8}
    // Indefinite length byte string with tagged chunk
    {(uint8_t[]){0x5f, 0xc2, 0x41, 0x00, 0xff}, 5}
    // Indefinite length text string with tagged chunk
    {(uint8_t[]){0x7f, 0xc1, 0x61, 0x00, 0xff}, 5}

I think your three rules about tagging are about right (applied to any tagged item, not just to positive (big) numbers). CDDL allows you to specify any of the three. It even gives an example of the third one. See my_uri = #6.32(tstr) / tstr in section 3.6, though I hear we want to discourage that third rule.
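
To spell out how I read the three rules, here is a small sketch in C. The enum and function names are invented for this example. The three policies correspond roughly to a protocol specifying tstr, #6.32(tstr), or #6.32(tstr) / tstr in CDDL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for what a protocol can say about a tag. */
enum tag_policy {
    TAG_FORBIDDEN,   /* protocol specifies the untagged item only   */
    TAG_REQUIRED,    /* protocol specifies the tagged item only     */
    TAG_OPTIONAL     /* protocol explicitly allows both forms       */
};

/* has_tag:  whether the decoded item carried a tag
 * tag:      the tag number seen (ignored when has_tag is false)
 * expected: the tag number the protocol names */
bool tag_acceptable(enum tag_policy policy, bool has_tag,
                    uint64_t tag, uint64_t expected)
{
    switch (policy) {
    case TAG_FORBIDDEN:
        return !has_tag;                      /* rule 1: reject tagged   */
    case TAG_REQUIRED:
        return has_tag && tag == expected;    /* rule 2: reject untagged */
    case TAG_OPTIONAL:
        return !has_tag || tag == expected;   /* rule 3: accept either   */
    }
    return false;
}
```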

One reason for restricting epoch date to a few types is to make it more interoperable. Smaller decoders will not support bignums, decimal fractions and such. Also, these larger types are not particularly needed for dates (unless you wish to date dinosaur bones to the microsecond).
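
For a small decoder the restriction can be a check on the first byte of the tagged content. The sketch below assumes the allowed set is unsigned integers, negative integers and the three float widths, and that well-formedness has already been checked elsewhere. The function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* first: initial byte of the data item enclosed by the epoch date tag.
 * Assumption: only major types 0 and 1 (integers) and major type 7
 * with additional information 25, 26 or 27 (floats) are allowed. */
bool epoch_content_ok(uint8_t first)
{
    uint8_t major = first >> 5;
    uint8_t info  = first & 0x1f;

    if (major == 0 || major == 1)
        return true;                   /* unsigned or negative integer */
    if (major == 7)
        return info >= 25 && info <= 27;   /* half, single, double float */
    return false;                      /* bignum tags, strings, arrays, ... */
}
```

A tagged bignum (initial byte 0xc2 or 0xc3) or a decimal fraction (0xc4) as the content is rejected without the decoder needing to understand those tags at all.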

LL


> On Dec 5, 2019, at 1:56 PM, Jeffrey Yasskin <jyasskin@chromium.org> wrote:
> 
> I believe that's correct, with the caveat that protocols should be able to specify that they accept a "number", which would accept primitive integers, primitive floats, and tagged bigints, bigfloats, bigdecimals, and any new tags defined to represent numbers. https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#epochdatetimesect has some wording to restrict to just primitive integers.
> 
> Jeffrey
> 
> On Thu, Dec 5, 2019 at 1:47 PM Faye Amacker <faye.github@gmail.com> wrote:
> Jeffrey, I think you are correct.  It seems I misread a line in the pseudocode which made me doubt my interpretation of the text.
> 
> Can you confirm that decoders (both generic and non-generic) should:
> * reject a tagged positive number if a protocol specifies a positive number.
> * reject a positive number if a protocol specifies a tagged positive number.
> * accept both positive number and tagged positive number only if the protocol specifies the specific tag is optional.
> 
> Thanks again for your help.  I'm relieved tags on chunks are malformed.
> 
> On Thu, Dec 5, 2019 at 1:18 PM Jeffrey Yasskin <jyasskin@chromium.org> wrote:
> I *think* https://cbor-wg.github.io/CBORbis/draft-ietf-cbor-7049bis.html#indefinite-length-byte-strings-and-text-strings is pretty clear that the contained things have to have the same major type as the indefinite-length string, and tags have a different major type. e.g. "a series of zero or more byte or text strings" doesn't include the possibility of tags, and "If any item between the indefinite-length string indicator (0b010_11111 or 0b011_11111) and the “break” stop code is not a definite-length string item of the same major type, the string is not well-formed." says any other content is not well-formed.
> 
> The pseudocode includes
> 
> if (it != mt)           // need finite-length chunk
>   fail();               //    of same type
> 
> which I think also rejects tags inside indefinite strings. What am I missing?
> 
> Jeffrey
> 
> On Thu, Dec 5, 2019 at 7:27 AM Faye Amacker <faye.github@gmail.com> wrote:
> While implementing support for tags, I ran into a scenario that might be errata or could use some clarification in 7049bis.
> 
> Section 3.2.3 states:
> 
> > Indefinite-length strings are represented by a byte containing the major type and additional information value of 31, followed by a series of zero or more byte or text strings (“chunks”) that have definite lengths, followed by the “break” stop code (Section 3.2.1). The data item represented by the indefinite-length string is the concatenation of the chunks (i.e., the empty byte or text string, respectively, if no chunk is present).
> 
> And the pseudocode in Appendix C allows tags for chunks inside indefinite-length strings.
> 
> Chunks are simply fragments to be concatenated, so tags applied to chunks don't seem intuitive.  Chunks are not intended for independent access like array elements.  Some tags (like #2 and #3) transform a byte string into a bignum, which makes sense for array elements, not chunks.
> 
> If tags must be applied to each chunk, there should be some text mentioning that, because library authors might think a tagged chunk should be rejected or that the tag should be applied to the concatenated string rather than to the chunk.
> 
> However, if the tag for a chunk applies to the entire concatenated string, then what happens when there are multiple chunks with different tags? Which tag wins?
> 
> A simpler way forward is to treat tags on chunks as malformed.  If this is the way to go, then the pseudocode in Appendix C needs to be updated and an example could be added to Appendix G.
> 
> GitHub issue for RFC 7049bis: https://github.com/cbor-wg/CBORbis/issues/148
> GitHub issue for my CBOR library: https://github.com/fxamacker/cbor/issues/44
> _______________________________________________
> CBOR mailing list
> CBOR@ietf.org
> https://www.ietf.org/mailman/listinfo/cbor