Re: [Cbor] Interactions of packed CBOR and tags

Brendan Moran <Brendan.Moran@arm.com> Thu, 03 September 2020 16:58 UTC

From: Brendan Moran <Brendan.Moran@arm.com>
To: Jim Schaad <ietf@augustcellars.com>
CC: Carsten Bormann <cabo@tzi.org>, "cbor@ietf.org" <cbor@ietf.org>
Date: Thu, 03 Sep 2020 16:57:53 +0000
Message-ID: <25F7B7D3-7ED7-4062-8000-21D1AF1A69C3@arm.com>
References: <00c101d67cb5$2588b790$709a26b0$@augustcellars.com> <E30F54B6-1A63-48AC-89AE-61983654B5A9@tzi.org> <00cc01d67cc9$766c7b60$63457220$@augustcellars.com> <4AE9B2FA-EEB3-4B45-96E4-9DC85118567D@arm.com> <016f01d6820b$bc7d7cc0$35787640$@augustcellars.com>
In-Reply-To: <016f01d6820b$bc7d7cc0$35787640$@augustcellars.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/7Sjp61_WxjbOf55cMPH-cwi1Ohs>
Subject: Re: [Cbor] Interactions of packed CBOR and tags


> On 3 Sep 2020, at 17:03, Jim Schaad <ietf@augustcellars.com> wrote:
>
>
>
> -----Original Message-----
> From: Brendan Moran <Brendan.Moran@arm.com>
> Sent: Thursday, September 3, 2020 4:57 AM
> To: Jim Schaad <ietf@augustcellars.com>
> Cc: Carsten Bormann <cabo@tzi.org>; cbor@ietf.org
> Subject: Re: [Cbor] Interactions of packed CBOR and tags
>
> I’m not certain whether this overlaps or not, but I think it does. There’s a limitation in the current specification of packed CBOR: only text prefixes and singular CBOR elements can be packed. This may not seem like a big limitation at first; however, I think that there are some missed opportunities here.
>
> I’ll use CIRIs (https://tools.ietf.org/html/draft-hartke-t2trg-ciri-01) as an example. I realise they may not be “representative” but they illustrate two observations I’ve made about packed CBOR:
>
> 1. Some things need postfix sharing rather than prefix sharing. Domain names are the primary example of this. Because subdomains are prepended, rather than appended like the rest of the URI path, they need postfix sharing, assuming that the TLD has lowest entropy and the subdomains have the highest entropy in a given structure.
>
> 2. Packing sub-sequences of elements within containers is something that we should consider.
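
A minimal sketch of the first observation, assuming plain character-level matching (the helper name common_suffix is made up for illustration and is not from the packed CBOR draft). For the two host names used in this thread, prefix sharing finds nothing, while postfix sharing finds the common domain:

```python
import os.path

def common_suffix(names):
    """Longest string that every name ends with (character-level)."""
    reversed_names = [name[::-1] for name in names]
    # commonprefix works on the reversed strings; reverse the result back.
    return os.path.commonprefix(reversed_names)[::-1]

hosts = ["www.ietf.org", "datatracker.ietf.org"]

shared_prefix = os.path.commonprefix(hosts)   # nothing in common at the front
shared_suffix = common_suffix(hosts)          # the shared domain at the end
remainders = [h[: len(h) - len(shared_suffix)] for h in hosts]

print(repr(shared_prefix), repr(shared_suffix), remainders)
```

This ignores label boundaries; a real packer would probably want to split on "." rather than on arbitrary characters.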
>
>
> For example, suppose that I need to encode many CIRIs that represent the following URIs:
>
> https://www.ietf.org
> http://www.ietf.org
> https://datatracker.ietf.org/group/cbor/about/
> https://datatracker.ietf.org/doc/draft-bormann-cbor-packed/
>
> Encoded as CIRIs and arranged into an array, these URIs take up 161 bytes (N.B. it would only be 151 bytes for the raw URIs):
> [
>  [1,"https", 2, "www.ietf.org"],
>  [1,"http", 2, "www.ietf.org"],
>  [1,"https", 2, "datatracker.ietf.org", 5, 0, 6, "group", 6, "cbor", 6, "about"],
>  [1,"https", 2, "datatracker.ietf.org", 5, 0, 6, "doc", 6, "draft-bormann-cbor-packed"]
> ]
>
> [JLS] I really wish the updated CRI document would get published.  This issue was discussed last spring, from somewhere around IETF 107 and into May, and that discussion produced a proposal that uses a well-known URI pattern along with one built-in dictionary.  Using that encoding you would end up with
>
> [  [2, "www.ietf.org"],
>  [1, "www.ietf.org"],
>  [2, "datatracker.ietf.org", "group", "cbor", "about"],
>  [2, "datatracker.ietf.org", "doc", "draft-bormann-cbor-packed"] ]
>
> This means that we are looking at an encoding size of 125 bytes rather than 161 bytes.   So when that finally gets written up we will have a better starting point.
>
> [/JLS]
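
The byte counts quoted so far (161, 151 and 125) can be reproduced with a small size calculator. This is only a sketch: head_len and cbor_len are names invented here, and only unsigned integers, text strings and definite-length arrays are handled, which is all these examples need.

```python
def head_len(n):
    """Bytes used by a CBOR initial byte plus its argument for value n."""
    if n < 24:
        return 1
    if n < 2**8:
        return 2
    if n < 2**16:
        return 3
    if n < 2**32:
        return 5
    return 9

def cbor_len(item):
    """Encoded size of a nested structure of ints, strs and lists."""
    if isinstance(item, int) and item >= 0:
        return head_len(item)
    if isinstance(item, str):
        n = len(item.encode("utf-8"))
        return head_len(n) + n
    if isinstance(item, list):
        return head_len(len(item)) + sum(cbor_len(x) for x in item)
    raise TypeError(f"unsupported item: {item!r}")

ciri = [
    [1, "https", 2, "www.ietf.org"],
    [1, "http", 2, "www.ietf.org"],
    [1, "https", 2, "datatracker.ietf.org", 5, 0, 6, "group", 6, "cbor", 6, "about"],
    [1, "https", 2, "datatracker.ietf.org", 5, 0, 6, "doc", 6, "draft-bormann-cbor-packed"],
]
raw = [
    "https://www.ietf.org",
    "http://www.ietf.org",
    "https://datatracker.ietf.org/group/cbor/about/",
    "https://datatracker.ietf.org/doc/draft-bormann-cbor-packed/",
]
cri = [
    [2, "www.ietf.org"],
    [1, "www.ietf.org"],
    [2, "datatracker.ietf.org", "group", "cbor", "about"],
    [2, "datatracker.ietf.org", "doc", "draft-bormann-cbor-packed"],
]

print(cbor_len(ciri), cbor_len(raw), cbor_len(cri))
```

Note that the 151-byte figure corresponds to the raw URIs encoded as a CBOR array of four text strings.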

[BJM] That’s really interesting. I like where it’s going. However, it’s still going to be bigger than CBOR-encoded plain-text URIs if you have more than a few path segments longer than 23 bytes, which is a bit annoying. For example, https:// is 8 bytes. The scheme above compresses that to 1 byte, but introduces an overhead of 1 byte for the array, so we’ve saved 6 bytes. Each path segment has a 1-byte separator in plain text. In the scheme above, it has a text string header instead (1 byte for segments of up to 23 bytes). This is the same total, except that the text encoding can omit the last delimiter. This means that we’ve saved only 5 bytes. If there are more than 23 path segments, then we lose another byte. If 5 or more path segments are more than 23 bytes long, we are at parity.

It depends on what the goals of the scheme are. A CRI is not always smaller. If size were the only goal, a hybrid approach that retains path separators would yield smaller results:

75 bytes:
>  [2, "datatracker.ietf.org/a-24-character-path-1234/draft-bormann-cbor-packed"]
76 bytes:
>  [2, "datatracker.ietf.org", "a-24-character-path-1234", "draft-bormann-cbor-packed"]


[/BJM]
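
The 75 versus 76 byte comparison above comes down to the text string header growing from 1 to 2 bytes once a string exceeds 23 bytes. A rough sketch of that arithmetic follows; the function names are invented for illustration, and it assumes a scheme number below 24 and fewer than 24 array items so that both cost 1 byte each.

```python
def text_len(n):
    """Encoded size of a CBOR text string holding n bytes (n < 65536)."""
    if n < 24:
        return n + 1
    if n < 256:
        return n + 2
    return n + 3

def joined_size(host, segments):
    """[scheme, "host/seg1/seg2"]: one string, separators kept."""
    joined = "/".join([host] + segments)
    return 1 + 1 + text_len(len(joined))  # array head + scheme + string

def split_size(host, segments):
    """[scheme, "host", "seg1", "seg2"]: one string per component."""
    parts = [host] + segments
    return 1 + 1 + sum(text_len(len(p)) for p in parts)

host = "datatracker.ietf.org"
segments = ["a-24-character-path-1234", "draft-bormann-cbor-packed"]

print(joined_size(host, segments), split_size(host, segments))
```

Each segment longer than 23 bytes costs the split form a 2-byte header where the joined form pays a 1-byte "/", which is where the extra byte comes from.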

> I think we can do a bit better, but it’s convoluted… (122 bytes, 76% compression ratio):
> 6([
>  [
>    [1, simple(1), 2, simple(3)],
>    [1, simple(0), 2, simple(3)],
>    [1, simple(1), 2, simple(4), 5, 0, 6, "group", 6, "cbor", 6, "about"],
>    [1, simple(1), 2, simple(4), 5, 0, 6, "doc", 6, "draft-bormann-cbor-packed"]
>  ],
>  [
>    simple(0),
>    "www",
>    "datatracker"
>  ],
>  "http",
>  6("s"),
>  ".ietf.org",
>  224(simple(2)),
>  225(simple(2))
> ])
>
> [JLS]  I don't find this to be very convoluted at all and it is what my compression algorithm generates automatically.  I look at the piece I split off from the prefix and see if I have "enough" duplicates to compress them down.  I am still trying to decide how to do some extraction from "the middle" of things.  Consider looking at
> ["www.merch.ietf.org", "www.datatracker.ietf.org"]
>
> It would be nice to think about compressing that as
>
> 6([ ["www.", "merch", "datatracker"], [".ietf.org"], 224(225(simple(2))), 224(226(simple(2)))])
>
> Where we have extracted both prefix and postfix strings to maximize the amount of commonality.
>
> [/JLS]

[BJM]
Under the existing -01 draft, I’m not sure you can do much better than:

6([
  [
    6(simple(1)),
    6(simple(2))
  ],
  [
    "www.",
    "merch",
    "datatracker"
  ],
  ".ietf.org",
  224(simple(0)),
  225(simple(0))
])

[/BJM]

> Back to the observations I made above.
>
> 1. Compressing the domain names doesn’t work very well. Maybe this is unique to domain names, but I think we need more data on that.
> 2. There’s a lot of regularity left here, but it’s all in the form of sequences of array elements. For example, the sequence [1, simple(1), 2] shows up regularly, as does [simple(4), 5, 0 , 6].
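
The regularity in observation 2 can be counted directly. The sketch below is illustrative only: simple(n) values are stood in for by strings such as "s1", and repeated_runs is a made-up helper that counts how often each fixed-length run of elements occurs across the four arrays of the packed example.

```python
from collections import Counter

def repeated_runs(arrays, length):
    """Count runs of `length` consecutive elements seen more than once."""
    counts = Counter()
    for arr in arrays:
        for i in range(len(arr) - length + 1):
            counts[tuple(arr[i:i + length])] += 1
    return {run: n for run, n in counts.items() if n > 1}

# "s0".."s4" stand in for simple(0)..simple(4) from the packed example.
arrays = [
    [1, "s1", 2, "s3"],
    [1, "s0", 2, "s3"],
    [1, "s1", 2, "s4", 5, 0, 6, "group", 6, "cbor", 6, "about"],
    [1, "s1", 2, "s4", 5, 0, 6, "doc", 6, "draft-bormann-cbor-packed"],
]

runs3 = repeated_runs(arrays, 3)
runs4 = repeated_runs(arrays, 4)
print(runs3[(1, "s1", 2)], runs4[("s4", 5, 0, 6)])
```

A packer that could reference such runs would have several candidates to choose from here, none of which are reachable when only single elements and text prefixes can be shared.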
>
> [JLS] This would be taken care of by the fact that we have discussed doing the same prefix processing on arrays as well.  This was brought up specifically for the CRI case where we saw that this was going to be an issue.

I’m glad to hear that. Did I miss something on the mailing list? I assumed that this was still TBD since I didn’t see it there. Does that mean that the use of Tag 6 as the first prefix is being dropped so that array references are explicit? Or are array prefixes prohibited in the first slot? Or something else?


>
> Here’s where I think the discussion overlaps. Enabling packing of sequences of elements would require one of three changes:
>
> 1. Only indefinite-length containers can have packed items.
> 2. The container count refers to unpacked count, which makes the packed CBOR invalid.
> 3. The container count refers to packed count, which means the decoder must adjust the total whenever a reference is encountered. This has additional implications for maps.
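
To make option 3 concrete, here is a toy model of what the decoder would have to do. Everything in it is hypothetical: SeqRef is an invented stand-in for a reference to a shared run of elements, not a construct from the draft, and Python lists stand in for encoded arrays.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SeqRef:
    """Hypothetical reference to a shared run of array elements."""
    index: int

def unpack_array(packed_items, shared_runs):
    """Expand every SeqRef in place; other items pass through unchanged."""
    unpacked = []
    for item in packed_items:
        if isinstance(item, SeqRef):
            unpacked.extend(shared_runs[item.index])
        else:
            unpacked.append(item)
    return unpacked

shared_runs = [[5, 0, 6]]
packed = [1, "https", 2, "datatracker.ietf.org", SeqRef(0),
          "group", 6, "cbor", 6, "about"]
unpacked = unpack_array(packed, shared_runs)

print(len(packed), len(unpacked))
```

The array head of the packed form says 10, while the application sees 12 elements, so the decoder cannot report the final count until it has resolved every reference. For a map, a run with an odd number of elements would additionally shift which items are keys and which are values.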
>
> [JLS] I think that we can all agree that option 2 would be completely unacceptable.  For the purpose of the compressed output, it needs to be definite length as that is going to be the shortest.  I think in the above however, you are potentially confusing the result of compression vs the result of decompression.  The compressed form should always be definite length encoded, the result of decompression could be either definite length or indefinite length according to how it operates and not what the compressed form looks like and therefore does not need to be specified in the document.

[BJM] I think I may have been confusing things, yes. When considering how a parser should handle a string prefix, it’s clear that the size is the packed size. This will cause some interesting problems for pull parsers, particularly those that use lazy evaluation. The SUIT demo parser saves pointers into the manifest CBOR when setting variables, rather than storing the variables themselves. It occurs to me that this will also need a list of current reference tables for lazy evaluation. In many cases, that may be larger than the value being set, which will make things interesting. This will require some careful handling and probably some usage notes for implementors.
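
A toy illustration of the lazy-evaluation problem, under heavy assumptions: a Python list stands in for the manifest buffer, an offset stands in for the saved pointer, and Ref and LazyValue are invented names rather than anything from the SUIT demo parser or the draft.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Hypothetical packed reference into a shared-item table."""
    index: int

@dataclass
class LazyValue:
    buffer: list   # stand-in for the manifest bytes
    offset: int    # stand-in for the saved pointer
    table: tuple   # reference table in scope when the pointer was saved

    def resolve(self):
        """Look at the buffer only when the value is actually needed."""
        item = self.buffer[self.offset]
        if isinstance(item, Ref):
            return self.table[item.index]
        return item

table = ("https", "datatracker.ietf.org")
buffer = [1, Ref(0), 2, Ref(1)]

host = LazyValue(buffer, 3, table)
print(buffer[3], host.resolve())
```

The pointer alone yields only Ref(1); the table has to travel with it, and that bookkeeping can easily be larger than the value it describes.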

>
> If it turns out that postfix sharing of data really is unique to domain names and that the problem isn’t generic, maybe we could solve it another way, for example special handling of anything within Tag 32. Sequence packing, however, is a space where I think we need a solution.
>
> [JLS] I expect that for sequence and map compression we are going to find that we are going to end up looking at compression at the start, middle and end of these.  In the case of map compression then the question of how this is represented and made into a deterministic encoding might become very interesting.  It is much easier to just compress the keys and values and ignore the map structure.
>
> Jim
> [/JLS]

[BJM] I agree. It becomes more interesting if you want to pack odd numbers of elements, for example 2 keys and one value.

Brendan
>
>> On 28 Aug 2020, at 00:26, Jim Schaad <ietf@augustcellars.com> wrote:
>>
>>
>>
>> -----Original Message-----
>> From: Carsten Bormann <cabo@tzi.org>
>> Sent: Thursday, August 27, 2020 2:32 PM
>> To: Jim Schaad <ietf@augustcellars.com>
>> Cc: draft-bormann-cbor-packed@ietf.org; cbor@ietf.org
>> Subject: Re: [Cbor] Interactions of packed CBOR and tags
>>
>>
>>
>>> On 2020-08-27, at 23:00, Jim Schaad <ietf@augustcellars.com> wrote:
>>>
>>> While building a test library of strings for evaluating my algorithm,
>>> I ended up with a question of how tags interact with the idea of CBOR packing.
>>> Specifically, if I use a standard date/time string with tag 0, should
>>> that text string be considered as a candidate for packing?
>>>
>>> 0("1970-01-01T00:00Z") could potentially be compressed to
>>> 0(simple(3))
>>>
>>> The problem is that this is no longer a valid CBOR encoding so it
>>> would not seem to be a legal thing to do.
>>>
>>> Question:  Must packed CBOR be valid CBOR or does that requirement
>>> only apply to unpacked CBOR?
>>
>> Great question.
>>
>> The cop-out could be: either.
>> Since CBOR-valid packed CBOR is a subset of (just well-formed) packed CBOR, it could be a parameter given to the compressor whether that is allowed to use compression opportunities like the above or not.
>>
>> What are the benefits/drawbacks:
>>
>> * (just well-formed) packed CBOR may lead to trouble with a generic decoder that cannot handle (present to the application) invalid constructs like 0(simple(3)).
>> The application can decide whether it wants to live with this limitation or not.
>>
>> * the structural coherence of the packed structure (that this draft is about) will be expressed as a validity constraint.  It is a bit weird to then relax validity of what goes in there, but not entirely without precedent (e.g., tag 24, even though there is a more explicit firewall here).
>>
>> * not using those compression opportunities can be wasteful, not just for the example given above (tag 0), but also for tags like 32.
>>
>> I think I would emphasize the (just well-formed) packed CBOR, but still introduce CBOR-valid packed CBOR as a selectable additional constraint for applications that need to work with pre-packed (designed before packed was invented) generic decoders.
>>
>> [JLS] Yes I agree that only requiring that the result be well-formed makes the most sense.  It probably makes sense to discuss the implications in the document.  A more interesting case might be tag 26 which could have duplicate or prefix lines of text coming up frequently.
>>
>> I think it might make sense to reference tag 25 and say that this does the same thing only much better.
>> [/JLS]
>>
>> Do we need to encode this selection?  (E.g., via different top-level tags.)  Probably not.
>>
>> [JLS] I don't think that there needs to be different top-level tags.
>>
>> Jim
>>
>>
>> Grüße, Carsten
>>
>>
>> _______________________________________________
>> CBOR mailing list
>> CBOR@ietf.org
>> https://www.ietf.org/mailman/listinfo/cbor
>
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.