[Cbor] Base-8 OIDs (or, where not to optimize)

Carsten Bormann <cabo@tzi.org> Thu, 03 September 2020 21:02 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Thu, 03 Sep 2020 23:02:04 +0200
Message-Id: <14A57296-1D6E-4771-A59C-623CE689C2D2@tzi.org>
To: cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/9owRyOcXdsK7Ooc1S3D9-AImDdU>
Subject: [Cbor] Base-8 OIDs (or, where not to optimize)

As a little side story about my previous message, let me explain an optimization to draft-ietf-cbor-tags-oid that I am NOT going to propose, but that would be really swell, save some 20 % of *everything* in real use cases, and be implementable on a single screen of code (well, maybe that needs to be a 4K screen then).

ASN.1 OIDs are sequences of numbers that BER represents in base-128 arithmetic: each byte carries one 7-bit digit, and its 8th (most significant) bit tells you whether another digit follows (this is also documented in RFC 6256).
ASN.1 BER has a weird special-purpose optimization for the start of an OID, where only the numbers 0, 1, and 2 are currently valid: that number is multiplied by 40 and added to the next number.  This is reversible because after a 0 or a 1 only numbers < 40 can follow (not so for 2, by the way; any combined value of 80 or more is therefore decoded as 2 followed by that value minus 80).
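For comparison, here is a minimal Python sketch of the BER content encoding just described (tag and length octets omitted).  The function names ber_arc and ber_oid are made up for illustration; this is not code from the draft or from any ASN.1 library.

```python
def ber_arc(n: int) -> bytes:
    """Encode one non-negative number as base-128 digits, most significant
    first; every byte except the last has 0x80 set as the continuation flag."""
    if n < 0:
        raise ValueError("OID components must be non-negative")
    digits = [n & 0x7F]          # last digit: continuation bit clear
    n >>= 7
    while n:
        digits.append((n & 0x7F) | 0x80)  # earlier digits: continuation bit set
        n >>= 7
    return bytes(reversed(digits))


def ber_oid(arcs: list[int]) -> bytes:
    """Content octets of an absolute OID (hypothetical helper, no tag/length).
    The first two arcs are folded into one number: first * 40 + second."""
    if len(arcs) < 2 or arcs[0] not in (0, 1, 2):
        raise ValueError("need at least two arcs, first arc in 0..2")
    if arcs[0] < 2 and arcs[1] >= 40:
        raise ValueError("after 0 or 1, the second arc must be < 40")
    out = ber_arc(arcs[0] * 40 + arcs[1])
    for arc in arcs[2:]:
        out += ber_arc(arc)
    return out
```

With this sketch, ber_oid([1, 3, 6, 1]) yields the bytes 2b 06 01, while 2.999 folds into 1079, which needs two bytes (88 37).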

Well, if you look at how we actually are using OIDs, you will notice that there are a lot more places where very small numbers are predominantly used.

So how about getting rid of the special-purpose optimization and applying the following, more general one instead: represent OIDs as sequences of nibbles (4-bit units) instead of bytes, and use base 8 instead of base 128 for what is otherwise exactly the same scheme (**), so that each nibble carries a continuation bit plus one 3-bit digit.  This is obviously a (mild) pessimization for numbers between 64 and 127 (1 byte to three nibbles), 4096 and 16383 (two bytes to five nibbles), etc.  But so many other components of an OID get so much shorter.  The effective number of bits per byte for large numbers shrinks from 7 to 6 (3+3), but because these OID component numbers are actually distributed in a way that is rather close to Zipf’s law [1], the actual number of bytes needed per OID number goes down from a little more than 1 to much less than 1.
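A sketch of the encoding side of this nibble scheme in Python, under my own reading of it (not a specification): each nibble carries the continuation bit 0b1000 plus one octal digit, nibbles are packed high-nibble first, there is no special case for the first two components, and an odd nibble count is padded with 0b1000 as in the (**) note.  The names nibble_arc and nibble_oid are hypothetical.

```python
def nibble_arc(n: int) -> list[int]:
    """Encode one non-negative number as base-8 digits in nibbles, most
    significant first; every nibble except the last has 0b1000 set as the
    continuation flag (BER's 0x80 flag, scaled down to 4 bits)."""
    if n < 0:
        raise ValueError("OID components must be non-negative")
    digits = [n & 0x7]           # last digit: continuation bit clear
    n >>= 3
    while n:
        digits.append((n & 0x7) | 0x8)  # earlier digits: continuation bit set
        n >>= 3
    return digits[::-1]


def nibble_oid(arcs: list[int]) -> bytes:
    """Pack all components (no first-arc special case) into bytes, high
    nibble first; an odd nibble count gets one 0b1000 padding nibble."""
    nibbles = [nib for arc in arcs for nib in nibble_arc(arc)]
    if len(nibbles) % 2:
        nibbles.append(0x8)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

For 1.3.6.1.4.1.9.9.109.1.1.1.1.5 this sketch produces the 9 bytes 13 61 41 91 91 9d 51 11 15; 109 (octal 155) becomes the three nibbles 9, d, 5.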

If I count right, the popular OID 1.3.6.1.4.1.9.9.109.1.1.1.1.5 has 9 components that each become one nibble shorter and one (109) that becomes one nibble longer, so we go from 13 bytes (26 nibbles, with the special case for the first number) to 26 - 9 + 1 = 18 nibbles (without the special case), i.e., 9 bytes.
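The tally can be checked mechanically.  This Python sketch only counts digits per component (7 bits per BER byte, 3 bits per nibble) rather than encoding anything; the helper names ber_len and nibble_len are hypothetical.

```python
def ber_len(n: int) -> int:
    """Bytes BER needs for one component: one per 7-bit base-128 digit."""
    return max(1, (n.bit_length() + 6) // 7)


def nibble_len(n: int) -> int:
    """Nibbles the base-8 scheme needs for one component: one per 3-bit digit."""
    return max(1, (n.bit_length() + 2) // 3)


oid = [1, 3, 6, 1, 4, 1, 9, 9, 109, 1, 1, 1, 1, 5]
# BER folds the first two components into 1 * 40 + 3 = 43 before encoding.
ber_bytes = ber_len(oid[0] * 40 + oid[1]) + sum(ber_len(a) for a in oid[2:])
nibbles = sum(nibble_len(a) for a in oid)
new_bytes = (nibbles + 1) // 2  # an odd count would need one padding nibble
print(ber_bytes, nibbles, new_bytes)  # 13 18 9
```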

Also, we get rid of the wart for absolute OIDs and need only one type of encoding for both absolute and relative OIDs.  (Well, maybe we can ignore this simplification if it is desired to stay closer to ASN.1 BER.)

((**) We do have to add a much smaller wart when the sequence turns out to have an odd number of nibbles: the half-unused last byte then gets padding that cannot otherwise occur as a last nibble, e.g., 0b1000 (continuation bit set, digit 0), to indicate that the last nibble is not in use.)
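To make the padding rule concrete, here is a hedged Python decoder sketch for the same hypothetical nibble format: a trailing nibble with the continuation bit set can only be padding, so it is stripped (and must be exactly 0b1000) before the components are reassembled.  Rejecting 0b1000 as the first nibble of a component (a non-minimal leading zero digit, analogous to BER's rule against a leading 0x80) is my own assumption, not something stated above; the name nibble_oid_decode is made up.

```python
def nibble_oid_decode(data: bytes) -> list[int]:
    """Decode the hypothetical base-8 nibble format back into components."""
    nibbles = [nib for b in data for nib in (b >> 4, b & 0xF)]
    # A genuine final nibble never has the continuation bit set, so a final
    # nibble with 0b1000 set must be padding, and only 0b1000 itself is valid.
    if nibbles and nibbles[-1] & 0x8:
        if nibbles[-1] != 0x8:
            raise ValueError("invalid padding nibble")
        nibbles.pop()
    arcs, value, in_arc = [], 0, False
    for nib in nibbles:
        if not in_arc and nib == 0x8:
            raise ValueError("non-minimal encoding (leading zero digit)")
        value = (value << 3) | (nib & 0x7)  # append one octal digit
        in_arc = bool(nib & 0x8)            # continuation bit: more digits follow
        if not in_arc:
            arcs.append(value)
            value = 0
    if in_arc:
        raise ValueError("truncated: last component has no final nibble")
    return arcs
```

For example, the bytes 13 68 decode to 1.3.6 (the trailing 8 is padding), and 29 d5 decode to 2.109.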

It took approximately one morning shower (*) to come up with this scheme; I then spent what was probably a couple more hours writing and running some code against a not-quite-trivial SNMP trace I happened to have lying around, and got an overall size reduction of some 20+ % (yes, that includes some actual unoptimized data beyond all those OIDs).  I did not take the time to write this up then (six years ago, when we started preparing draft-bormann-cbor-tags-oid-00).

So there we are.  There is a simple, good, easily implementable, and rather elegant way to fix the representation of ASN.1 OIDs.
Looking at the clunky base-128 BER way now creates appreciable cognitive dissonance for me.

The $64,000 question: why am I not proposing that we standardize this optimization?

Grüße, Carsten

(*) A unit of innovation that is quite well-known to quite a few people I’ve been working with.

(**) see text above

[1]: https://en.wikipedia.org/wiki/Zipf%27s_law
