Re: [Cbor] Reminder and call for agenda: CBOR WG Virtual Meeting on 2022-06-01

Carsten Bormann <cabo@tzi.org> Tue, 28 June 2022 20:44 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <YrmJV7OwrbOI/zKe@hephaistos.amsuess.com>
Date: Tue, 28 Jun 2022 22:44:50 +0200
Cc: cbor@ietf.org
Message-Id: <861D3104-8C33-4819-9488-1885F94C973F@tzi.org>
References: <CALaySJLPtUjdfVss17noK=18RyczpcCGNu=im8CBpiQz=WiLWA@mail.gmail.com> <CALaySJKUNh-AkJa87sCDpzf9OHV8H367VQyzyozXCCXxphUARw@mail.gmail.com> <CALaySJ+P2sP7BU7bNSxRJBByyp04rzVZuukq_e+9wbb5WPRSFQ@mail.gmail.com> <CALaySJKxht1gd1+3mNiAH-kLUAxjdPPk3doK50C_xS74LG+YTQ@mail.gmail.com> <CALaySJJjSHT2q_wpZQ9QFhLSxGuhffWwb=9P1XDUFTsheOvPZA@mail.gmail.com> <5A9B396E-1D9F-455C-949F-9B4C89AA510C@tzi.org> <CALaySJ+Sp=hmc4-kp1UrYPf0BxMtQy4aS+LiCfkREYqmip1Q6w@mail.gmail.com> <B9E21E1E-164D-4306-88D7-A88DC76080A9@tzi.org> <YrmJV7OwrbOI/zKe@hephaistos.amsuess.com>
To: Christian Amsüss <christian@amsuess.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/JdrV3OutHPfJNIF4vfNNait-KI0>

Hi Christian,

thank you for this quick feedback!

> 
>> When reconstructing the original data item, such a reference is
>> replaced by a data item constructed from the argument data item found
>> in the table (argument, which might need to be recursively unpacked
>> first) and the rump data item (rump, again possibly recursively
>> unpacked).
> 
> This reads a bit like also the rump would be recursively unpacked first.
> Is that the intention? (I think not -- with my understanding of
> unpacking so far, the rump would be unpacked at the time it is
> encountered in the reconstructed item, allowing the argument to alter
> the dictionary).

You lost me at “alter the dictionary”.

The argument *is* in the dictionary.
The rump needs to be “recursively” (we need to fix this term) unpacked because any parameters to the function tag (or to the affixing process) need to be unpacked (conceptually!) before they are applied.
In a real implementation, the unpacking could be done by a visiting accessor, so there is no actual recursion.
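
To illustrate the visiting-accessor idea, here is a minimal Python sketch. It is not taken from the draft: `Ref` and `View` are hypothetical names, and the sketch only models plain references into a table, with no affixes and no function tags. The point is that a reference is resolved when the application touches it, so no up-front recursive pass over the whole item is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a packed reference into the table."""
    index: int


class View:
    """Visiting accessor over a packed item; nothing is unpacked up front."""

    def __init__(self, item, table):
        self._item = item
        self._table = table

    def _resolve(self, item):
        # Follow reference chains with a loop: a table entry may itself be
        # a reference. (No cycle detection in this sketch.)
        while isinstance(item, Ref):
            item = self._table[item.index]
        return item

    def __getitem__(self, key):
        container = self._resolve(self._item)
        child = self._resolve(container[key])
        # Containers are wrapped again so their children resolve on access.
        if isinstance(child, (list, dict)):
            return View(child, self._table)
        return child


table = ["https://example.com", Ref(0)]
packed = {"a": Ref(1), "b": [Ref(0), 1]}
view = View(packed, table)
print(view["a"])     # https://example.com
print(view["b"][0])  # https://example.com
```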

>> [...]; a type-0 reference is either a prefix reference or a
>> type-0 function reference, while a type-1 reference is either a
>> suffix reference or a type-1 function reference.
> 
> I find the type-0/type-1/"dominating tag" concept rather hard to
> understand. It's neat how it allows symmetry between reference and
> argument (which is especially pronounced when prefix/suffix becomes
> concatenation),

Yes!

> but I'm unconvinced that that symmetry is worth the
> cognitive / implementation load of doing it that way,

It makes sense to point out on which end the function tag is (we found good examples for both ends).  Of course, we could just take the end which has a tag, but that makes things a bit less deterministic (and becomes complicated if both ends have one).  I’d rather be explicit on where the function tag is, and the same bit that decides prefix vs. suffix can also be used to decide argument-tag vs. rump-tag.  The “type-0”/“type-1” wording is scary for about 20 minutes :-)
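
A minimal Python sketch of "one bit decides both" follows. Several things in it are assumptions made only for illustration: that type-0 pairs prefix order with a function tag on the argument, that type-1 pairs suffix order with a function tag on the rump, the `Tag` class, the `FUNCTIONS` table, and the layout of tag 109's content (a list of parts, with the other operand used as the separator).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    number: int
    content: object


# Hypothetical function table: tag number -> f(tag_content, other_operand).
# Assumed layout for 109: the content is the list of parts and the other
# operand is the string placed between them.
FUNCTIONS = {
    109: lambda parts, middle: middle.join(parts),
}


def reconstruct(ref_type, argument, rump):
    """ref_type is the single bit (0 or 1) carried by the reference."""
    if ref_type == 0:
        candidate, other = argument, rump   # tag, if any, is on the argument
        concatenated = lambda: argument + rump   # prefix order
    else:
        candidate, other = rump, argument   # tag, if any, is on the rump
        concatenated = lambda: rump + argument   # suffix order
    if isinstance(candidate, Tag):
        return FUNCTIONS[candidate.number](candidate.content, other)
    return concatenated()


# Prefix and suffix references:
print(reconstruct(0, "https://example.com/", "foo.html"))
print(reconstruct(1, ".html", "foo"))
# The same plain table entry used with a function tag on the rump:
print(reconstruct(1, "example.com", Tag(109, ["https://", "/foo.html"])))
```

Because the bit says where to look, the decoder never has to guess which end carries the function tag, and a case where both ends are tags stays unambiguous.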

> or even generally
> present. (One can always swap the arguments of a two-argument function,
> but not for all functions either argument makes sense in a dictionary).

Sorry, I have been using “argument” for the thing looked up in the table, so we only have one argument (that goes into one of the two parameters of the tag function :-).  Maybe we need a better word than argument.

> Alternatives I'd like to suggest for consideration:
> 
> * Maybe the mechanism described in the branch can be explained in an
>  easier-to-digest way. It may or may not become simpler by either of
> 
>  * giving names to the two arguments of the function (arg1, arg2?
>    A, B? tagged, other?)

Left hand side, right hand side.

>  * calling prefix/suffix just "concatenation" (arg1 | arg2) and not
>    treating it as an extra case but as the default

Yes, that might help.

> * Use a bit more separate tags for where there is real need for having
>  both directions. (No type-0/type-1 distinction, and having the
>  tag-that-indicates-function always in the dictionary. For string/array
>  postfixes that'd need a tag already at dictionary setup).

I need an example.

“having the tag-that-indicates-function always in the dictionary” has the disadvantage that you cannot use the same table entry string for midfix-from-rump and affix, as in my example.

> * Decoupling the functions a bit more from the compression, and using a
>  more general mechanism to fill them.
> 
>  There might not be much point in having `109(["example.com",
>  ["https://", "/foo.html"]])` around explicitly in any document, but
>  in a generalization the tag could mean "string-join arg2 with arg1",
>  and all of a sudden `109([",", ["1", "2", "3"]])` or even `109([",",
>  64('123')])` could be usable more generally even outside packing.

Nice! s/midfix/join/… then (106 'j' is not taken yet :-)

join(".", ["datatracker", "ietf", "org"])…
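
As a sanity check that "midfix" and "join" are the same operation, here is a small Python sketch. The function name `join` and its parameter order (separator first, then the list of parts) follow the examples in this thread; the type check is an addition of this sketch and not something the draft specifies.

```python
def join(separator, parts):
    """Concatenate parts, inserting separator between neighbours."""
    # Text and byte strings must not be mixed in this sketch.
    if not all(isinstance(p, type(separator)) for p in parts):
        raise TypeError("separator and parts must all be text or all be bytes")
    return separator.join(parts)


print(join(".", ["datatracker", "ietf", "org"]))        # datatracker.ietf.org
print(join("example.com", ["https://", "/foo.html"]))   # https://example.com/foo.html
print(join(",", ["1", "2", "3"]))                       # 1,2,3
```

The second call is the midfix case: a two-element list with the table entry placed in the middle.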

>  In packing, they'd be used as
> 
>  ```
>  113([[],
>    [109(["packed.example", ARG])],
>    [6(["https://", "/foo.html"]),
>     ...
>    ]
>  ])
>  ```

What is ARG?

>  or (in the other example's style)
> 
>  ```
>  113([["packed.example"],
>    [],
>    [109([6(0), ["https://", "/foo.html"]]),
>     ...
>    ]
>  ])
>  ```

I’m not yet quite ready to embrace general function evaluation as a part of CBOR (outside unpacking, that is).

>  (The first example here does beg the question of how the 'put the
>  argument in here' placeholder ARG is best phrased; I don't have any definite
>  answer here yet, but maybe it makes sense to shift up the index space
>  inside the argument-item space by 1 and make 6(0) the argument --
>  although that might be similarly hard to teach as type-0/type-1.)

The current design clearly identifies function tags as such by using them as lhs (from a type-0 reference) or rhs (from a type-1 reference) of an argument reference.  We need a way to be able to identify them because:

— Not every processor can know every function tag in existence, so we need to know when a function tag is actually “invoked” to find ones that we don’t know
— A function tag might be used in its meaning as a mere tag, i.e., not automatically evaluating.
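
The two points above can be sketched in Python as follows. Everything here is hypothetical and for illustration only: the `Tag` class, the `UnknownFunctionTag` exception, the `FUNCTIONS` table, the content layout assumed for tag 109, and the tag number 9999 standing in for a function tag this processor has never heard of.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    number: int
    content: object


class UnknownFunctionTag(Exception):
    """Raised when a reference invokes a function tag we do not implement."""


# Hypothetical table of the function tags this processor knows.
FUNCTIONS = {
    109: lambda parts, middle: middle.join(parts),  # assumed content layout
}


def invoke(function_tag, other):
    """Call this only for a tag found in the function position of a reference."""
    fn = FUNCTIONS.get(function_tag.number)
    if fn is None:
        # The reference told us this tag is being invoked, so we know that
        # we cannot unpack, instead of silently passing the tag through.
        raise UnknownFunctionTag(function_tag.number)
    return fn(function_tag.content, other)


# Invoked via a reference: evaluated.
print(invoke(Tag(109, ["https://", "/foo.html"]), "example.com"))

# The same tag as a mere tag in ordinary data: never handed to invoke(),
# so it stays exactly as it was.
plain_data = [Tag(109, ["https://", "/foo.html"])]
```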

Grüße, Carsten


> 
> BR
> Christian
> 
> who is wondering whether indefinite-length strings / bytestrings really
> needed to be in the original CBOR spec, or would not have been better
> served with a "concatenate" tag around an indefinite-length array.

Knowing after almost a decade how we have learned to use tags, this could indeed be done differently…

Grüße, Carsten