Re: [Cbor] Packed CBOR and dictionaries

Jim Schaad <ietf@augustcellars.com> Fri, 28 August 2020 19:00 UTC

Return-Path: <ietf@augustcellars.com>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5BF5F3A0B75; Fri, 28 Aug 2020 12:00:31 -0700 (PDT)
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Vnp-yrWcpEDY; Fri, 28 Aug 2020 12:00:26 -0700 (PDT)
Received: from mail2.augustcellars.com (augustcellars.com [50.45.239.150]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D0F9E3A09DF; Fri, 28 Aug 2020 12:00:24 -0700 (PDT)
Received: from Jude (73.180.8.170) by mail2.augustcellars.com (192.168.0.56) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Fri, 28 Aug 2020 12:00:16 -0700
From: Jim Schaad <ietf@augustcellars.com>
To: 'Michael Richardson' <mcr+ietf@sandelman.ca>, draft-bormann-cbor-packed@ietf.org, cbor@ietf.org
References: <008c01d67c47$aaf73be0$00e5b3a0$@augustcellars.com> <28732.1598638838@localhost>
In-Reply-To: <28732.1598638838@localhost>
Date: Fri, 28 Aug 2020 12:00:15 -0700
Message-ID: <013401d67d6d$786e4c50$694ae4f0$@augustcellars.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Outlook 16.0
Content-Language: en-us
Thread-Index: AQEWrwsIZUXc1W2qApLXh9kZsjj9lAK3TH/EqrfA0BA=
X-Originating-IP: [73.180.8.170]
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/6rp_sBBROzZeuEJCOWcjQz2bmi4>
Subject: Re: [Cbor] Packed CBOR and dictionaries


-----Original Message-----
From: Michael Richardson <mcr+ietf@sandelman.ca> 
Sent: Friday, August 28, 2020 11:21 AM
To: Jim Schaad <ietf@augustcellars.com>; draft-bormann-cbor-packed@ietf.org; cbor@ietf.org
Subject: Re: [Cbor] Packed CBOR and dictionaries


I feel that the WG should adopt the document already.

[JLS] The WG was promised an updated draft that dealt with a couple of
issues before the adoption call was made.

Jim Schaad <ietf@augustcellars.com> wrote:
    > * Should the dictionary expansion be separate or part of this draft? I am
    > not sure how I want to address this.  If you have a dictionary w/ 50,000
    > entries in it, that is going to change how things should be done.  It may
    > also be that one might want to use a dictionary entry for something that
    > might otherwise be encoded as a prefix and the prefix might not be needed
    > anymore.

Once the dictionary is larger than 512 entries, I guess 131072 is the next size.
That uses four-byte references, so the dictionary ought to provide at
least four-byte substitutions, right?
Otherwise, we'd be expanding rather than compressing.
It seems that a (C, Python, etc.) array is probably always appropriate as the
internal data structure for lookups into the dictionary. It shouldn't
require a sparse array, ever, should it?
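The size bands above can be checked with a little arithmetic. The following is a minimal sketch, assuming the shared-item reference scheme of the packed CBOR draft as I understand it (the 2020 draft text may differ): indexes 0..15 are referenced by simple values 0..15 (one byte), and higher indexes by tag 6 around an integer, where unsigned n means index 16 + 2n and negative integer -1-n means index 16 + 2n + 1. The helper names `cbor_head_size`, `shared_ref_size` and `worth_sharing` are hypothetical; only the head sizes come from RFC 8949.

```python
def cbor_head_size(argument: int) -> int:
    """Bytes in a CBOR head (major type plus argument), per RFC 8949."""
    if argument < 24:
        return 1  # argument fits in the initial byte
    if argument <= 0xFF:
        return 2
    if argument <= 0xFFFF:
        return 3
    if argument <= 0xFFFFFFFF:
        return 5
    return 9


def shared_ref_size(index: int) -> int:
    """Encoded size of a reference to shared-item table entry `index`.

    ASSUMED scheme (hypothetical reading of the packed CBOR draft):
      * indexes 0..15 use simple values 0..15 (one byte each);
      * higher indexes use tag 6 around an integer: unsigned n means
        index 16 + 2n, negative integer -1-n means index 16 + 2n + 1.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    if index < 16:
        return 1
    n = (index - 16) // 2  # CBOR argument for either sign
    # Tag 6 is a one-byte head; the wrapped integer adds its own head.
    return 1 + cbor_head_size(n)


def worth_sharing(item_size: int, index: int) -> bool:
    """True if one occurrence of an item_size-byte item gets smaller when
    replaced by a reference (ignores the one-time cost of the table entry)."""
    return item_size > shared_ref_size(index)


if __name__ == "__main__":
    # Print the first index at which each reference size begins.
    previous = None
    for i in range(131100):
        size = shared_ref_size(i)
        if size != previous:
            print(f"index {i}: {size}-byte reference")
            previous = size
```

Under these assumptions the script reports size changes at indexes 0, 16, 64, 528 and 131088, i.e. 512 tag-6 references of at most three bytes and 131072 of at most four, which lines up with the figures above; past that point only items longer than four bytes are worth substituting.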

[JLS] That raises a different question.  At one point, when I was emailing
Klaus about the fact that dictionaries in CoRAL are only defined to use
positive integers, there was some discussion that the negative values would
be for a more application-specific version of a dictionary.  That would
make the sizes 256 and 65,XXX.
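Reserving the negative values halves each band. A minimal sketch of that count, assuming only the integer encoding itself is being sized (any enclosing tag adds its own head byte); `argument_limit` and `int_capacity` are hypothetical helpers, and the head sizes follow RFC 8949.

```python
def argument_limit(max_bytes: int) -> int:
    """How many head arguments (0..limit-1) fit in a head of max_bytes bytes.

    Per RFC 8949: 0..23 fit in the initial byte; 1-, 2-, 4- and 8-byte
    argument extensions give 2-, 3-, 5- and 9-byte heads.
    """
    if max_bytes >= 9:
        return 2**64
    if max_bytes >= 5:
        return 2**32
    if max_bytes >= 3:
        return 2**16
    if max_bytes >= 2:
        return 2**8
    if max_bytes >= 1:
        return 24
    return 0


def int_capacity(max_bytes: int, allow_negative: bool) -> int:
    """Count distinct CBOR integers whose encoding fits in max_bytes bytes.

    Negative integers (major type 1) have the same sizes as unsigned ones
    (major type 0), so allowing them doubles the count.
    """
    n = argument_limit(max_bytes)
    return 2 * n if allow_negative else n


for size in (2, 3):
    print(size, int_capacity(size, True), int_capacity(size, False))
```

With both signs usable, two- and three-byte integers give 512 and 131072 values; restricted to non-negative integers they give 256 and 2^16.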

It might be interesting to have "sparse" dictionaries.  Consider that you
might want all of the URLs without a fragment to occur in a smaller section
than the set of all URLs with fragments.  This means that you can get
smaller encodings for CoRAL vocabulary that does not include all of the
possible values, using just the base URL.
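A sparse dictionary of this kind does not have to give up array lookups. One hypothetical way to build it: keep a few dense sections, each starting at a chosen base index, and find the right section by binary search over the bases. `SparseDictionary`, `add_section` and `lookup` are illustrative names, and the example.org URLs are placeholders; nothing here comes from the packed CBOR or CoRAL drafts.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class SparseDictionary:
    """Hypothetical dictionary made of dense sections at chosen base indexes."""
    bases: list[int] = field(default_factory=list)        # sorted start indexes
    sections: list[list[str]] = field(default_factory=list)

    def add_section(self, base: int, entries: list[str]) -> None:
        """Insert a dense run of entries starting at `base`; reject overlaps."""
        pos = bisect.bisect_left(self.bases, base)
        if pos > 0 and self.bases[pos - 1] + len(self.sections[pos - 1]) > base:
            raise ValueError("section overlaps the previous one")
        if pos < len(self.bases) and base + len(entries) > self.bases[pos]:
            raise ValueError("section overlaps the next one")
        self.bases.insert(pos, base)
        self.sections.insert(pos, list(entries))

    def lookup(self, index: int) -> str:
        """Return the entry at `index`, or raise KeyError if it falls in a gap."""
        pos = bisect.bisect_right(self.bases, index) - 1
        if pos >= 0:
            offset = index - self.bases[pos]
            if offset < len(self.sections[pos]):
                return self.sections[pos][offset]
        raise KeyError(index)


d = SparseDictionary()
# Fragment-less URLs get the cheap low indexes...
d.add_section(0, ["http://example.org/ns/a", "http://example.org/ns/b"])
# ...and URLs with fragments live in a separate, higher section.
d.add_section(1000, ["http://example.org/ns/a#x", "http://example.org/ns/a#y"])
print(d.lookup(1), d.lookup(1001))
```

Each section is still a plain array, so the per-section lookup stays O(1); only the choice of section costs a binary search over the handful of bases.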

Jim



    > Being able to do packed is going to be of importance for doing CoRAL, but
    > just as important is being able to do dictionaries.  Dictionaries do have
    > the downside that if they are not referenced internal to the structure then
    > from a security point of view they can be problematic as the
    > signed/encrypted CBOR byte stream is no longer self-contained.  This is not
    > a problem for packed CBOR as long as the packing does not cross the security
    > boundary.

Agreed.


--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works  -= IPv6 IoT consulting =-