Re: [Cbor] Ordering of items in the prefix table

Michael Richardson <mcr+ietf@sandelman.ca> Fri, 28 August 2020 22:06 UTC

Return-Path: <mcr+ietf@sandelman.ca>
X-Original-To: cbor@ietfa.amsl.com
Delivered-To: cbor@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 195813A0CE6; Fri, 28 Aug 2020 15:06:25 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level:
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3Ppx3TY1Q6aj; Fri, 28 Aug 2020 15:06:23 -0700 (PDT)
Received: from tuna.sandelman.ca (tuna.sandelman.ca [209.87.249.19]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 324393A0CEE; Fri, 28 Aug 2020 15:06:22 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by tuna.sandelman.ca (Postfix) with ESMTP id CD963389F7; Fri, 28 Aug 2020 17:45:21 -0400 (EDT)
Received: from tuna.sandelman.ca ([127.0.0.1]) by localhost (localhost [127.0.0.1]) (amavisd-new, port 10024) with LMTP id y0CyAbOHuafj; Fri, 28 Aug 2020 17:45:18 -0400 (EDT)
Received: from sandelman.ca (obiwan.sandelman.ca [IPv6:2607:f0b0:f:2::247]) by tuna.sandelman.ca (Postfix) with ESMTP id 09576389C0; Fri, 28 Aug 2020 17:45:18 -0400 (EDT)
Received: from localhost (localhost [IPv6:::1]) by sandelman.ca (Postfix) with ESMTP id 1E7A191C; Fri, 28 Aug 2020 18:06:18 -0400 (EDT)
From: Michael Richardson <mcr+ietf@sandelman.ca>
To: Carsten Bormann <cabo@tzi.org>, Jim Schaad <ietf@augustcellars.com>, draft-bormann-cbor-packed@ietf.org, cbor@ietf.org
In-Reply-To: <B7320B50-BB7D-4E0C-BA11-E35B69F90867@tzi.org>
References: <008d01d67c48$d25c4440$7714ccc0$@augustcellars.com> <29638.1598639086@localhost> <B7320B50-BB7D-4E0C-BA11-E35B69F90867@tzi.org>
X-Mailer: MH-E 8.6+git; nmh 1.7+dev; GNU Emacs 26.1
X-Face: $\n1pF)h^`}$H>Hk{L"x@)JS7<%Az}5RyS@k9X%29-lHB$Ti.V>2bi.~ehC0; <'$9xN5Ub# z!G,p`nR&p7Fz@^UXIn156S8.~^@MJ*mMsD7=QFeq%AL4m<nPbLgmtKK-5dC@#:k
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg="pgp-sha512"; protocol="application/pgp-signature"
Date: Fri, 28 Aug 2020 18:06:18 -0400
Message-ID: <17395.1598652378@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/H3PLixG4sLTmA_enUnumC8c6svE>
Subject: Re: [Cbor] Ordering of items in the prefix table
X-BeenThere: cbor@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Concise Binary Object Representation \(CBOR\)" <cbor.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cbor>, <mailto:cbor-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cbor/>
List-Post: <mailto:cbor@ietf.org>
List-Help: <mailto:cbor-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cbor>, <mailto:cbor-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 28 Aug 2020 22:06:25 -0000

Carsten Bormann <cabo@tzi.org> wrote:
    > On 2020-08-28, at 20:24, Michael Richardson <mcr+ietf@sandelman.ca> wrote:
    >>
    >>
    >> Jim Schaad <ietf@augustcellars.com> wrote:
    >>> I was just going through an expansion algorithm and I started to wonder if
    >>> we should have a requirement about order of items in the prefix table.
    >>> Specifically if an entry in the prefix table uses a prefix itself  (i.e.
    >>> entry 3 in the table is 6.2872("foo")), does that entry need to occur after
    >>> the entry it references?
    >>
    >> I also asked that question.
    >> If we don't enforce that rule, then the decoder has to take care of dealing
    >> with loops.

    > Right.

    >> This means either having a maximum loop depth, or, using a mark to refuse to
    >> expand any specific dictionary item more than once.
    >> I would prefer that all references are forward references.

    > (Backward references?)

I guess it depends upon your point of view. All references need to point in
the same direction :-)

    > This is easy with just one kind of table entry, but gets harder when
    > prefixes can reference shared items which can reference other prefixes
    > and so on.

I think that things in the static dictionary(ies) can not reference things in
the shared (dynamic) dictionary.  I am grumpy about the terminology now, btw.
"shared" is a bit ambiguous now.

I also think that my understanding of the prefix space is too shallow.

But, it seems that we ought to be able to build a reasonable DAG.

    > Of course, the tables could be merged, but that loses exactly one bit
    > per reference that can be derived from the prefix vs. shared
    > reference.

When you say merged, do you mean that they receive some kind of collective
numbering?

    > One other thing to keep in mind is that not all places in the table are
    > equal — the compressor will likely sort the entries by frequency of
    > use.  Requiring infrequent entries to occupy all the good positions
    > does not lead to very compact encoding.

unidirectional references are just one way of dealing with the loop.
Another is just to have a big enough bitmap to be able to mark things.

A third loop detection involves two pointers, one which moves at half the
rate of the other: I know this works for single-linked lists, but it might
not work for a tree. I have think about this.

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-