[core] CoRAL href (CRI): simplicity vs. compactness?

Carsten Bormann <cabo@tzi.org> Wed, 23 June 2021 11:42 UTC

(Please read this in a monospaced font.)

href-04 has the optimized href syntax that we worked out with Jim Schaad and further optimized recently, which can be summarized as:

CRI-Reference = [
  ((?scheme, ?authority)
   // discard),
  *path-segment,
  ? ((null, fragment)                  ; no query section, but fragment
     // ([+query-segment], ?fragment)) ; present query section has >0 items
]

This is so efficient because relative URI references such as /a become extremely short CRI references (here, just ["a"]).


On the other hand, translating a URI or URI reference into this format requires quite some juggling.

The weirdness that kept us thinking was that the path doesn’t have its own delimiters, so the presence or absence of a path has to be inferred from other information, here the discard value.  If the discard is absent, there is always a path; only if a discard is present can the path be absent!

This, together with the RFC 3986 rule to suppress a leading empty path segment, leads to the following translations of URI references into CRI references (*):

/           []
/a          ["a"]
/a/b        ["a", "b"]
/a/b?foo    ["a", "b", ["foo"]]
a           [1, "a"]
.           [1]
..          [2]
../a        [2, "a"]
../a?foo    [2, "a", ["foo"]]
../..       [3]
a?foo       [1, "a", ["foo"]]
?foo        [0, ["foo"]]
#bar        [0, null, "bar"]
            [0]

(Yes, that is the empty URI reference at the end.)
Most of the weirdness is inherited from RFC 3986.
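To make the juggling concrete, here is a minimal Python sketch that reproduces the table above. The function name uri_ref_to_cri is hypothetical, and the sketch assumes references without scheme or authority, query items separated by "&", and only the dot-segment shapes that occur in the table ("." and leading ".."); it does not implement full RFC 3986 dot-segment handling.

```python
from typing import List, Optional, Union

# Hypothetical sketch: no scheme/authority, "&"-separated query items,
# and only "." or leading ".." dot segments are understood.
Item = Union[int, str, None, List[str]]


def uri_ref_to_cri(ref: str) -> List[Item]:
    # Split off the fragment first, then the query (RFC 3986 order).
    fragment: Optional[str] = None
    query: Optional[str] = None
    if "#" in ref:
        ref, fragment = ref.split("#", 1)
    if "?" in ref:
        ref, query = ref.split("?", 1)
    path = ref

    out: List[Item] = []
    if path.startswith("/"):
        # Absolute path: no discard item.  The leading empty segment is
        # suppressed, so "/" yields no segments at all.
        out.extend(path[1:].split("/") if path != "/" else [])
    elif path == "":
        # Empty path: keep the whole base path (discard 0).
        out.append(0)
    else:
        # Relative path: discard the last base segment, plus one more
        # for every leading "..".
        segments = path.split("/")
        discard = 1
        while segments and segments[0] == "..":
            discard += 1
            segments.pop(0)
        if segments == ["."]:
            segments = []
        out.append(discard)
        out.extend(segments)

    if query is not None:
        out.append(query.split("&"))  # present query has >0 items
        if fragment is not None:
            out.append(fragment)
    elif fragment is not None:
        out.extend([None, fragment])  # null: no query section, but fragment
    return out
```

Note how the discard (or its absence) is the only thing telling a decoder whether a path follows, which is exactly where the if statements come from.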

Internally, my code uses

[scheme, [host, port], discard, path, query, fragment]

where most items can be null.

So I have code that translates between the internal and the external form, and that is full of if statements.
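A minimal Python sketch of such a translation, for references without scheme or authority (an assumption; how those parts look in the compact form is not covered here). Both function names are hypothetical and this is not the code mentioned above. The decoder shows the inference described earlier: a leading integer is the discard, and the text strings that follow are path segments. As in the second table below, an empty path is stored as null, so "/" maps to [].

```python
from typing import Any, List, Optional

# Hypothetical helpers; scheme and authority are assumed to be null.


def internal_to_compact(internal: List[Any]) -> List[Any]:
    """[scheme, authority, discard, path, query, fragment] -> compact CRI."""
    # Items left off the end count as null.
    scheme, authority, discard, path, query, fragment = (internal + [None] * 6)[:6]
    if scheme is not None or authority is not None:
        raise NotImplementedError("scheme/authority not covered by this sketch")
    out: List[Any] = []
    if discard is not None:
        out.append(discard)
    out.extend(path or [])  # path segments are inlined, no delimiters
    if query is not None:
        out.append(query)  # non-empty array of query items
        if fragment is not None:
            out.append(fragment)
    elif fragment is not None:
        out.extend([None, fragment])  # null: no query section, but fragment
    return out


def compact_to_internal(cri: List[Any]) -> List[Any]:
    """Compact CRI -> internal form, with trailing nulls left off."""
    items = list(cri)
    discard: Optional[int] = None
    # A leading integer is the discard (bool is excluded on purpose).
    if items and type(items[0]) is int:
        discard = items.pop(0)
    # Path segments are the text strings that follow.
    path: List[str] = []
    while items and isinstance(items[0], str):
        path.append(items.pop(0))
    query = fragment = None
    if items:
        head = items.pop(0)
        if head is not None:  # null means "no query, but fragment"
            query = head
        if items:
            fragment = items.pop(0)
    # Empty path is stored as null ("/" -> []).
    internal = [None, None, discard, path or None, query, fragment]
    while internal and internal[-1] is None:
        internal.pop()
    return internal
```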

We could, instead, decide that we want to use something like the internal form for transfer as well.
We would replace absent items by nulls (or leave them off the end).

So this would lead to something like:

/           []                   0 []
/a          ["a"]                4 [null, null, null, ["a"]]
/a/b        ["a", "b"]           4 [null, null, null, ["a", "b"]]
/a/b?foo    ["a", "b", ["foo"]]  4 [null, null, null, ["a", "b"], ["foo"]]
a           [1, "a"]             3 [null, null, 1, ["a"]]
.           [1]                  2 [null, null, 1]
..          [2]                  2 [null, null, 2]
../a        [2, "a"]             3 [null, null, 2, ["a"]]
../a?foo    [2, "a", ["foo"]]    3 [null, null, 2, ["a"], ["foo"]]
../..       [3]                  2 [null, null, 3]
a?foo       [1, "a", ["foo"]]    3 [null, null, 1, ["a"], ["foo"]]
?foo        [0, ["foo"]]         3 [null, null, 0, null, ["foo"]]
#bar        [0, null, "bar"]     3 [null, null, 0, null, null, "bar"]
            [0]                  2 [null, null, 0]

The third column is the number of bytes wasted in the CBOR encoding, which (apart from the outlier “/”) ranges from 2 to 4.
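The wasted-byte column can be checked with a few lines of Python. This is a deliberately minimal CBOR encoder (hypothetical helpers cbor and wasted, not a general CBOR library) that handles only null, booleans, unsigned integers, text strings and arrays, i.e., the types that appear in the tables, always using the shortest argument encoding.

```python
from typing import Any


def _head(major: int, n: int) -> bytes:
    # Initial byte plus the shortest argument encoding (RFC 8949, Section 3).
    if n < 24:
        return bytes([(major << 5) | n])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def cbor(value: Any) -> bytes:
    # Minimal encoder for the subset of CBOR used in the tables.
    if value is None:
        return b"\xf6"  # simple value null
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int) and value >= 0:
        return _head(0, value)  # major type 0: unsigned integer
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data  # major type 3: text string
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor(v) for v in value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def wasted(compact: list, internal: list) -> int:
    # Bytes saved by the compact form compared with the internal form.
    return len(cbor(internal)) - len(cbor(compact))
```

For example, wasted(["a"], [None, None, None, ["a"]]) gives 4: the compact form encodes to 3 bytes (0x81 0x61 0x61), the internal form to 7 (three extra 0xf6 nulls plus the outer array head).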

Simplicity vs. saving these 2 to 4 bytes for relative CRI references?
Maybe one simple tweak (**) can reduce this to 0 to 2 bytes.
But how important are relative references?
Let’s discuss in 160 minutes.

Grüße, Carsten

(*) I typed these by hand and apologize for the many errors likely to be in there…

(**) Something about leaving out the first two nulls; probably needs a separate discard value (true?) for discard all.  Exercise left to the reader…