Re: [core] HREF compression encoding

Carsten Bormann <cabo@tzi.org> Wed, 06 May 2020 12:19 UTC

Hi Jim,

This is an interesting proposal.
Klaus and I just had a chat about it, which led to the following straw man grammar:

CRI-Reference = [
  (?scheme, ?(host, ?port) // path.type),
  *path,
  ? (QUERY, +query)
  ? (FRAGMENT, fragment)
]

scheme    = ((SCHEMETEXT, text .regexp "[a-z][a-z0-9+.-]*") 
            // COAP // COAPS // HTTP // HTTPS)
host      = ((HOSTTEXT, text) // bytes .size 4 // bytes .size 16)
port      = 0..65535
path.type = 0..127
path      = text
query     = text
fragment  = text

SCHEMETEXT = -1
COAP = -2 COAPS = -3 HTTP = -4 HTTPS = -5
HOSTTEXT = true
QUERY = -6
FRAGMENT = -7
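
To make the grammar a bit more tangible, here is a minimal Python sketch that builds the array items for an absolute reference. The function name build_cri and its parameters are hypothetical, the marker values are simply the straw man values from above, and the path.type alternative for relative references is left out. The result is a plain Python list that would still have to be handed to a CBOR encoder.

```python
import ipaddress

# Marker values copied from the straw man grammar; they may still be shuffled.
SCHEMETEXT = -1
KNOWN_SCHEMES = {"coap": -2, "coaps": -3, "http": -4, "https": -5}
HOSTTEXT = True
QUERY = -6
FRAGMENT = -7


def build_cri(scheme=None, host=None, port=None, path=(), query=(), fragment=None):
    """Return the list of array items for a CRI-Reference (hypothetical helper).

    Only the (?scheme, ?(host, ?port)) alternative is covered; path.type is not.
    """
    items = []
    if scheme is not None:
        if scheme in KNOWN_SCHEMES:
            # Well-known schemes collapse to a single negative integer.
            items.append(KNOWN_SCHEMES[scheme])
        else:
            # Any other scheme is spelled out after the SCHEMETEXT marker.
            items += [SCHEMETEXT, scheme]
    if host is not None:
        try:
            # IP literals become 4-byte or 16-byte strings, no marker needed.
            items.append(ipaddress.ip_address(host).packed)
        except ValueError:
            # Host names are text and carry the HOSTTEXT marker.
            items += [HOSTTEXT, host]
        if port is not None:
            items.append(port)
    items += list(path)
    if query:
        items += [QUERY, *query]
    if fragment is not None:
        items += [FRAGMENT, fragment]
    return items


example = build_cri("coap", "example.com", path=["foo", "bar"],
                    query=["a=b", "c=d"], fragment="gohere")
```

With these assumptions, example holds the items -2, true, "example.com", "foo", "bar", -6, "a=b", "c=d", -7, "gohere".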

Obviously, the specific marker values we use could be shuffled a bit, but it seems we can cover the entire spectrum that is covered by the existing syntax.
A few things need to be decided based on perceived frequency of use. For example, are text-valued host names the likely or the unlikely case? Depending on that, the marker is placed on the host text or on the path sequence, or the absent host could be represented by `false`. So we should come up with some rough estimates here.
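
As a rough aid for such estimates, the following Python sketch counts the bytes a CBOR encoder would emit for a list of items, using the standard CBOR head sizes. The helper names (head_len, item_len, cri_len) are hypothetical, and the unmarked host variant is only an assumed comparison point, not something the grammar above allows.

```python
def head_len(n):
    """Size in bytes of a CBOR initial byte plus its argument n."""
    if n < 24:
        return 1
    if n < 0x100:
        return 2
    if n < 0x10000:
        return 3
    if n < 0x100000000:
        return 5
    return 9


def item_len(item):
    """Encoded size of one array item (bool, int, text or byte string)."""
    if isinstance(item, bool):      # true/false are one-byte simple values
        return 1
    if isinstance(item, int):       # negative n is encoded with argument -1 - n
        return head_len(item if item >= 0 else -1 - item)
    if isinstance(item, str):
        n = len(item.encode("utf-8"))
        return head_len(n) + n
    if isinstance(item, bytes):
        return head_len(len(item)) + len(item)
    raise TypeError(f"unsupported item: {item!r}")


def cri_len(items):
    """Encoded size of a definite-length array holding the given items."""
    return head_len(len(items)) + sum(item_len(i) for i in items)


# coap://example.com/foo with the HOSTTEXT marker (true) on the host name.
marked = cri_len([-2, True, "example.com", "foo"])
# Assumed alternative layout in which a text host needs no marker.
unmarked = cri_len([-2, "example.com", "foo"])
# IP-literal host, which is unmarked in the grammar above.
ip_host = cri_len([-2, bytes([192, 0, 2, 1]), "foo"])
```

Under these assumptions the marker costs exactly one byte per reference that carries it (19 versus 18 bytes for the host name case, 11 bytes for the IPv4 case), so the open question is only which kind of reference should pay that byte.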

Comments welcome...

Grüße, Carsten


> On 2020-04-22, at 20:20, Jim Schaad <ietf@augustcellars.com> wrote:
> 
> I had a slightly different proposal to what Klaus presented at the last interim in terms of doing href compression.  It is based on the fact that URIs have a relatively fixed pattern and keeps the CBOR coding directly rather than moving to some type of binary encoding.
>  
> The standard pattern for a URI is scheme://hostname/path/path/…  Removing all of the tagging while keeping to the same pattern, you would compress coap://example.com/foo/abc…xyz to
>  
> [ “coap”, “example.com”, “foo”, “abc…xyz”]
> 1    1          1          1          2     = 6 bytes of padding
>  
> This is the same amount of padding as his binary compression method.  There is a slight loss over his method when you want to do port numbers, queries or fragments, as they would need to have an integer tag inserted, so you get
>  
> [ “coap”, “example.com”, “foo”, “bar”, Query, “a=b”, “c=d”, Fragment, “gohere”]
> 1    1          1           1      1     1       1     1       1        1        = 10 bytes
>  
> In the binary encoding this would only require 9 bytes
>  
> Moving to an IP address adds no additional padding, as the difference between a text string and a byte string of length 4 or 16 can easily be detected.  Relative URIs are encoded using similar tagging, so you end up with
>  
> [ Absolute, “foo”, “bar” ]
> 1   1          1      1   =  4 bytes
>  
> [ Relative, 2, “foo”, “bar” ]
> 1   1        1   1     1        = 5 bytes
>  
> Using the binary mode these would be 4 bytes and 4 bytes respectively (I think, as there are no examples in the slides)
>  
> I believe that the advantage of this proposal is that no new encoder/decoder is needed, as this is pure CBOR.  The compressed outputs are of similar lengths to the binary version, and processing them will, I believe, result in near-identical code sizes.  The code to do absolute and relative processing as well as generating CBOR options is going to be very similar.
>  
> Jim
>  