[Cbor] OID representation (was: Re: Robert Wilton's Discuss on draft-ietf-cbor-tags-oid-06: (with DISCUSS and COMMENT))

Carsten Bormann <cabo@tzi.org> Thu, 08 April 2021 19:44 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/QoS55EJAFylmBR6vnNbdg2C-aGg>


In the IESG processing of draft-ietf-cbor-tags-oid, Rob Wilton raises some questions that I think the WG needs to consider before we reply.

I’ll include some excerpts from Rob’s message and try to translate them into questions for the WG.

## Question that needs to be decided by WG

> I would like to please see some more clarity or guidance about when TAG TBD112
> should be used, given that there are two possible encodings of absolute OIDs
> below "1.3.6.1.4.1".
> Specifically, the questions that I have, that probably need to be clarified are:
> - is a CBOR encoder allowed to optimize a TBD110 tag into a TBD112 tag?
> - Should CBOR decoder clients always expect to be able to handle both TBD110
> and TBD112 tags? - Or, it the decision over whether to use TBD110 or TBD112
> down to the application and the application needs to agree which is use.

This is indeed a good question (about TBD111 vs. TBD112, actually).
This can be seen as related to issues about preferred representations, or about minimizing interoperability problems by nailing down exactly one allowed representation.

So what we could propose as a WG is:

(1) Not establishing any stronger connection between TBD111 and TBD112, i.e. status quo.
(2) Declaring TBD112 as the preferred encoding of an absolute OID that starts with 1.3.6.1.4.1 (0x2b06010401), but still allowing the use of TBD111 as non-preferred. This would mean a requirement on decoders implementing TBD111 to also implement TBD112.
(3) Outlawing any TBD111 starting with 1.3.6.1.4.1 (0x2b06010401) and requiring the use of TBD112 in all such cases. This would mean a requirement on both encoders and decoders. It would also mean a slight complication in tag factoring: replace all naked byte strings that start with h'2b06010401' by TBD112(rest) (saving three bytes), unless *all* such byte strings start with h'2b06010401', in which case these prefixes would all be removed and an outer tag of TBD112 used.

(2a) would be (2) with a deterministic encoding requirement to use the shortest form (i.e., (3) in deterministic encoding only).
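
To make the encoder-side consequence of (2a) and (3) concrete, here is a minimal Python sketch of the choice an encoder would make. The function name `choose_oid_tag` and the string labels are hypothetical (the tag numbers are still TBD), and treating the bare prefix 1.3.6.1.4.1 itself as a TBD111 case is an assumption of this sketch, not something the draft says.

```python
from typing import Tuple

# BER content bytes of 1.3.6.1.4.1, i.e. h'2b06010401'
PEN_PREFIX = bytes.fromhex("2b06010401")


def choose_oid_tag(ber_content: bytes) -> Tuple[str, bytes]:
    """Pick the tag label and byte string payload for an absolute OID.

    Hypothetical helper: returns placeholder labels "TBD111"/"TBD112"
    because the actual tag numbers are not assigned yet.
    """
    # Assumption: an OID that is exactly the prefix stays with TBD111,
    # since stripping the prefix would leave an empty byte string.
    if ber_content.startswith(PEN_PREFIX) and len(ber_content) > len(PEN_PREFIX):
        return ("TBD112", ber_content[len(PEN_PREFIX):])
    return ("TBD111", ber_content)


# 1.3.6.1.4.1.9 has BER content h'2b0601040109'
print(choose_oid_tag(bytes.fromhex("2b0601040109")))  # ('TBD112', b'\t')
```

Under (1) and (2), an encoder would be free to skip this step; under (2) a decoder that accepts TBD111 would still have to accept what this function produces.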

## Other responses, to be vetted by the WG

> I found this document to be interesting because I knew from the title that it
> was going to only be 4 pages long and say that OIDs are obviously encoded as a
> tagged array, hence I was surprised to see that was not the solution and it
> uses BER encoded OIDs instead.
> The document explains, and I think that I understand why this has been done,
> but I question whether the title of the document and name of the tags is right.
> Is it really a CBOR representation of OIDs, or is it actually a CBOR
> representation of BER encoded OIDs?  

The first, using the technical approach named in the second.

> I.e., it is plausible that there would
> ever be a requirement for non BER encoded OIDs.  E.g., I'm not an ASN.1 expert,
> but say if somewhat wanted to do a CBOR encoding of ASN.1, then it is not
> obvious to me that they would use a BER encoding for OIDs.  

One example of a CBOR encoding of something that has previously been encoded in ASN.1 is draft-mattsson-cose-cbor-cert-compress, which normatively references the tags-oid spec. The certificate spec actually goes ahead and registers translation tables between likely OIDs and small integers. But if the full range of OIDs is needed, the byte string derived from the BER content is indeed the representation chosen. (As the C509 certificates are schema-driven, they do not need the actual tags defined here.)

> Hence the
> suggestion is to make the title, abstract, and name of the tags clear that it
> is about the CBOR encoding or BER encoded OIDs.

The specification proposes to use BER in a number of tags for binary encoding of OIDs.
The BERness is not a feature that the OIDs already need to have before these tags apply.
So a title "CBOR encoding of BER encoded OIDs" (which is what I think Rob wanted to say) would be a restriction beyond what this spec is about.

## Answers that probably don’t need WG input

> In the introduction:
>   Since the semantics of absolute and relative object identifiers
>   differ, this specification defines two tags, collectively called the
>   "OID tags" here:
> I presume that this should be three tags?

(Fixed by Ben’s PR, https://github.com/cbor-wg/cbor-oid/pull/9 .)

> In section 4.1.  Tag Factoring Example: X.500 Distinguished Name:
> The diagram uses a mix of single letters (e.g. c for country), and a full name
> "street".  Is this how the X.500 attributes are defined?  

This uses the naming defined e.g. in RFC 4519, section 2.2 and section 2.34, which has been derived from X.520 and is quite ubiquitous, going back to Table 1 of RFC 1779.

>    The country and street RDNs are single-valued. The second and fourth RDNs
>    are multi-valued.
> Perhaps:  "The country (first) and street (third) RDNs are single-valued. The
> second and fourth RDNs are multi-valued."

Now in https://github.com/cbor-wg/cbor-oid/commit/3d0fd8b .

>    h'550407': "Los Angeles", h'550408': "CA",
> I think that the example would be more clear by splitting the city and county
> onto separate lines.

Now in https://github.com/cbor-wg/cbor-oid/commit/3d0fd8b .

> Finally, the document contains these two sentences that seem to somewhat
> conflict with each other:
> "While these sequences can easily be represented in CBOR arrays of unsigned
> integers, a more compact representation can often be achieved by adopting the
> widely used representation of object identifiers defined in BER; this
> representation may also be more amenable to processing by other software that
> makes use of object identifiers."
> compared to:
> "Staying close to the way object identifiers are encoded in ASN.1 BER makes
> back-and-forth translation easy; otherwise we would choose a more efficient
> encoding."

While the BER form is reasonably efficient, a more efficient representation of sequences of unsigned integers that roughly follow Zipf’s law in distribution has been described in <https://mailarchive.ietf.org/arch/msg/cbor/9owRyOcXdsK7Ooc1S3D9-AImDdU>, which would for instance represent 1.3.6.1.4.1 as 0x136141 and does not need a special case to make absolute OIDs efficient.

The WG instead decided to follow the existing BER representation because it is so widely implemented and is compatible with SDNVs, which are used in additional applications and often have good platform support (e.g., Ruby pack/unpack("w*")).
For OID-style applications, BER content is often still more compact than a CBOR array of unsigned integers.
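
For readers less familiar with the BER form, the following Python sketch shows how the content bytes are derived from the arcs: the first two arcs are merged into one value (40 * first + second), and every value is then written base-128, most significant group first, with the high bit set on all but the last byte. The function name `ber_oid_content` is made up for this illustration, and input validation is deliberately minimal.

```python
def ber_oid_content(arcs):
    """Encode the arcs of an absolute OID as BER content bytes (sketch)."""
    if len(arcs) < 2:
        raise ValueError("an absolute OID needs at least two arcs")
    # BER merges the first two arcs into a single value.
    values = [40 * arcs[0] + arcs[1], *arcs[2:]]
    out = bytearray()
    for value in values:
        # Collect 7-bit groups, least significant first.
        groups = [value & 0x7F]
        value >>= 7
        while value:
            groups.append((value & 0x7F) | 0x80)  # continuation bit
            value >>= 7
        out.extend(reversed(groups))
    return bytes(out)


print(ber_oid_content([1, 3, 6, 1, 4, 1]).hex())              # 2b06010401
print(ber_oid_content([1, 2, 840, 113549, 1, 1, 1]).hex())    # 2a864886f70d010101
```

The per-value part of this is the same variable-length integer format that the Ruby "w" pack directive handles, which is the platform support referred to above.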

Tagged (homogeneous) arrays of arcs don’t help as the OID arcs vary widely in their range; e.g., the common OID 1.2.840.113549.1.1.1 (*, 2+9 bytes in BER) would need to be an array of seven 32-bit integers (2tag+2head+28 bytes) instead of a basic CBOR array of unsigned integers of 14 bytes.
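
The byte counts above can be checked with a short Python sketch. It only computes sizes (no actual encoding), all helper names are hypothetical, and tag and byte string heads are left out so that the numbers line up with the 9, 14 and 28 quoted in the text.

```python
def cbor_uint_size(n):
    """Encoded size in bytes of a CBOR unsigned integer (or array head)."""
    if n < 24:
        return 1
    if n < 2**8:
        return 2
    if n < 2**16:
        return 3
    if n < 2**32:
        return 5
    return 9


def cbor_array_size(arcs):
    """Size of a basic CBOR array holding one unsigned integer per arc."""
    return cbor_uint_size(len(arcs)) + sum(cbor_uint_size(a) for a in arcs)


def ber_content_size(arcs):
    """Size of the BER content bytes: base-128 groups, first two arcs merged."""
    values = [40 * arcs[0] + arcs[1], *arcs[2:]]
    return sum(max(1, (v.bit_length() + 6) // 7) for v in values)


rsa_encryption = [1, 2, 840, 113549, 1, 1, 1]
print(ber_content_size(rsa_encryption))   # 9
print(cbor_array_size(rsa_encryption))    # 14
print(4 * len(rsa_encryption))            # 28 (seven 32-bit integers)
```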

In general, I don’t think we need to discuss an extensive set of alternative approaches (paths not taken) in this specification.

Grüße, Carsten

(*) {iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) pkcs-1(1) rsaEncryption(1)}