Re: [Cbor] Interactions of packed CBOR and tags

Jim Schaad <ietf@augustcellars.com> Fri, 04 September 2020 16:17 UTC

From: Jim Schaad <ietf@augustcellars.com>
To: 'Brendan Moran' <Brendan.Moran@arm.com>
CC: cbor@ietf.org, 'Carsten Bormann' <cabo@tzi.org>
References: <00c101d67cb5$2588b790$709a26b0$@augustcellars.com> <E30F54B6-1A63-48AC-89AE-61983654B5A9@tzi.org> <00cc01d67cc9$766c7b60$63457220$@augustcellars.com> <4AE9B2FA-EEB3-4B45-96E4-9DC85118567D@arm.com> <016f01d6820b$bc7d7cc0$35787640$@augustcellars.com> <25F7B7D3-7ED7-4062-8000-21D1AF1A69C3@arm.com> <018001d6821d$fb710980$f2531c80$@augustcellars.com> <94909856-483E-4FFD-BB6E-59C79623FF6C@arm.com>
In-Reply-To: <94909856-483E-4FFD-BB6E-59C79623FF6C@arm.com>
Date: Fri, 04 Sep 2020 09:16:56 -0700
Message-ID: <01e401d682d6$d12196e0$7364c4a0$@augustcellars.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/4Sm-7e-NiSoA9TqLY3QgjCTj2yw>
Subject: Re: [Cbor] Interactions of packed CBOR and tags


-----Original Message-----
From: Brendan Moran <Brendan.Moran@arm.com> 
Sent: Friday, September 4, 2020 4:05 AM
To: Jim Schaad <ietf@augustcellars.com>
Cc: cbor@ietf.org; Carsten Bormann <cabo@tzi.org>
Subject: Re: [Cbor] Interactions of packed CBOR and tags



> On 3 Sep 2020, at 19:13, Jim Schaad <ietf@augustcellars.com> wrote:
>>
>> [JLS]  I don't find this to be very convoluted at all and it is what 
>> my compression algorithm generates automatically.  I look at the 
>> piece I split off from the prefix and see if I have "enough" 
>> duplicates to compress them down.  I am still trying to decide how to 
>> do some extraction from "the middle" of things.  Consider looking at 
>> ["www.merch.ietf.org", "www.datatracker.ietf.org"]
>>
>> It would be nice to think about compressing that as
>>
>> 6([ "www.", "merch", "datatracker"], [".ietf.org"], 
>> 224(225(simple(2))), 224(226(simple(2)))]
>>
>> Where we have extracted both prefix and postfix strings to maximize the amount of commonality.
>>
>> [/JLS]
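
A minimal sketch, in Python, of the prefix/postfix extraction described above. The helper name split_affixes is hypothetical, and this is not the packing algorithm from the draft or from Jim's code; it only shows which pieces the two host names have in common.

```python
import os.path

def split_affixes(strings):
    """Return (prefix, suffix, middles) for a list of strings (hypothetical helper)."""
    # Longest prefix shared by every string.
    prefix = os.path.commonprefix(strings)
    # Strip the prefix first so the suffix can never overlap it.
    rest = [s[len(prefix):] for s in strings]
    # Longest shared suffix: common prefix of the reversed remainders.
    suffix = os.path.commonprefix([r[::-1] for r in rest])[::-1]
    # Whatever is left in the middle is unique to each string.
    middles = [r[:len(r) - len(suffix)] for r in rest]
    return prefix, suffix, middles

names = ["www.merch.ietf.org", "www.datatracker.ietf.org"]
print(split_affixes(names))
# ('www.', '.ietf.org', ['merch', 'datatracker'])
```

A real packer would also have to decide whether the shared pieces occur often enough to pay for the table entries and reference tags.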
>
>
> [JLS] After correcting both of our notations, it seems that in this case both notations are the same size.  From your response, I am not sure whether you believe the notation I gave is legal under the current draft.
[BJM]

It doesn’t look legal to me. The draft says:
> Packed-CBOR = #6.6([rump, [*prefix], *shared])

[JLS] - True, I did a fast "how do I move the rump to the end" change in my code, just to see quickly what it looked like.


Normalising it with the -01 draft, I think we get the following (45 bytes):
>> 6([[6(224(simple(2))), 6(225(simple(2)))], ["www.", "merch", 
>> "datatracker"], ".ietf.org"])

[JLS] Yes - this is what I should have had.  Not sure where the second byte came from; one of them was the array I lost on the substitute table.

That’s 2 bytes better than what I suggested.

[/BJM]
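
The 45-byte figure above can be checked with a short size calculation. This is a sketch in Python under stated assumptions: Tag, Simple, head_len and encoded_len are hypothetical names, only the item types used in this thread are handled, and the sizes follow the usual CBOR head encoding (arguments below 24 fit in the initial byte, arguments below 256 take one extra byte, and so on). The comparison with the unpacked array is added here for illustration and is not a number from the thread.

```python
from collections import namedtuple

# Hypothetical stand-ins for CBOR tags and simple values.
Tag = namedtuple("Tag", "number content")
Simple = namedtuple("Simple", "value")

def head_len(n):
    """Bytes used by a CBOR initial byte plus its argument n."""
    if n < 24:
        return 1
    if n < 0x100:
        return 2
    if n < 0x10000:
        return 3
    if n < 0x100000000:
        return 5
    return 9

def encoded_len(item):
    """Encoded size in bytes of a tag, simple value, text string, or array."""
    if isinstance(item, Tag):
        return head_len(item.number) + encoded_len(item.content)
    if isinstance(item, Simple):
        return head_len(item.value)
    if isinstance(item, str):
        data = item.encode("utf-8")
        return head_len(len(data)) + len(data)
    if isinstance(item, list):
        return head_len(len(item)) + sum(encoded_len(x) for x in item)
    raise TypeError("unsupported item: %r" % (item,))

packed = Tag(6, [
    [Tag(6, Tag(224, Simple(2))), Tag(6, Tag(225, Simple(2)))],
    ["www.", "merch", "datatracker"],
    ".ietf.org",
])
plain = ["www.merch.ietf.org", "www.datatracker.ietf.org"]

print(encoded_len(packed))  # 45
print(encoded_len(plain))   # 46
```

Tags 224 and 225 each cost two bytes because their tag numbers do not fit in the initial byte, which is a large part of why the saving on two host names is so small.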

>
>> Back to the observations I made above.
>>
>> 1. Compressing the domain names doesn’t work very well. Maybe this is unique to domain names, but I think we need more data on that.
>> 2. There’s a lot of regularity left here, but it’s all in the form of sequences of array elements. For example, the sequence [1, simple(1), 2] shows up regularly, as does [simple(4), 5, 0 , 6].
>>
>> [JLS] This would be taken care of by the fact that we have discussed doing the same prefix processing on arrays as well.  This was brought up specifically for the CRI case where we saw that this was going to be an issue.
>
> I’m glad to hear that. Did I miss something on the mailing list? I assumed that this was still TBD since I didn’t see it there. Does that mean that the use of Tag 6 as the first prefix is being dropped so that array references are explicit? Or are array prefixes prohibited in the first slot? Or something else?
>
> [JLS] https://mailarchive.ietf.org/arch/msg/cbor/QYqEfdkQ2iE98M1on5QZA35kgrM/ is the message where Carsten made a proposal which has not gotten into the draft yet.

[BJM]
I’d read that, but I didn’t realise it was any more than an off-hand comment. Do I understand correctly that the following is the intent?

[
  [1,2,3,4],
  [1,2,4,8]
]

Is packed to:

6([
  [
    224([3,4]),
    224([4,8])
  ],
  [
    null,
    [1,2]
  ]
])

(Yes, this is a bad example, since it inflates the original.)

[/BJM]
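
A minimal sketch, in Python, of how the example above would expand again. It assumes what the example implies rather than what any draft text says: tag 224 selects slot 1 of the prefix table (slot 0 holds null here), and applying an array prefix means concatenating the prefix array with the tagged array. Tag, expand and unpack are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-in for a CBOR tag.
Tag = namedtuple("Tag", "number content")

def expand(item, prefixes):
    """Recursively replace prefix-reference tags by prefix + suffix."""
    if isinstance(item, Tag):
        if not 224 <= item.number <= 255:
            raise ValueError("not a prefix reference tag: %d" % item.number)
        # Assumption taken from the example: 224 -> slot 1, 225 -> slot 2, ...
        slot = item.number - 224 + 1
        return prefixes[slot] + expand(item.content, prefixes)
    if isinstance(item, list):
        return [expand(x, prefixes) for x in item]
    return item

def unpack(packed):
    """Expand a 6([rump, prefixes]) item as used in the example."""
    rump, prefixes = packed.content[0], packed.content[1]
    return expand(rump, prefixes)

packed = Tag(6, [
    [Tag(224, [3, 4]), Tag(224, [4, 8])],
    [None, [1, 2]],
])
print(unpack(packed))
# [[1, 2, 3, 4], [1, 2, 4, 8]]
```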

[JLS] Yes, that was my intent.  Note that this leads to some interesting decisions during packing for CRIs, as the number of times a string is referenced is going to change once the first part of the array gets packed. So that

[
  [1, "ietf.org", "a", "b"],
  [1, "ietf.org", "c", "d"]
]

Means that "ietf.org" only occurs once and therefore does not need to get packed itself anymore.
[/JLS]
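
The effect on reference counts can be sketched in Python as follows. The helpers common_prefix and count_strings are hypothetical, and the order of operations (array prefixes first, then strings) is an assumption made for illustration, not something the draft specifies.

```python
from collections import Counter

def common_prefix(arrays):
    """Longest run of leading elements shared by all arrays."""
    prefix = []
    for column in zip(*arrays):
        if any(x != column[0] for x in column):
            break
        prefix.append(column[0])
    return prefix

def count_strings(item, counts):
    """Count every text string that occurs anywhere inside item."""
    if isinstance(item, str):
        counts[item] += 1
    elif isinstance(item, list):
        for x in item:
            count_strings(x, counts)
    return counts

cris = [
    [1, "ietf.org", "a", "b"],
    [1, "ietf.org", "c", "d"],
]

before = count_strings(cris, Counter())

# Factor out the shared array prefix, leaving one table entry and two suffixes.
prefix = common_prefix(cris)
suffixes = [a[len(prefix):] for a in cris]
after = count_strings([prefix, suffixes], Counter())

print(prefix)                # [1, 'ietf.org']
print(before["ietf.org"])    # 2
print(after["ietf.org"])     # 1
```

Once the array prefix has been extracted, "ietf.org" is left with a single occurrence, so a packer that counts duplicates after this step would no longer select it for the string table.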

