Re: [Cbor] Regular expressions

Carsten Bormann <> Sun, 28 February 2021 23:24 UTC

On 2021-02-28, at 21:38, Joe Hildebrand <> wrote:
>> I can’t speak about the g flag (it is not actually an RE modifier), but, e.g. for the literal
> In ECMAScript land 'gimsuy' are all valid.

But g, for instance, is not an RE modifier.  It changes how the operations that use the RE behave, but not the RE itself.  The same goes for y.  u really just enables syntax for decoding Unicode, so it should not be visible during interchange (of the decoded RE).
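
To make the distinction concrete, here is a small TypeScript sketch (not part of the original exchange; the helper name compareFlags and the sample input are made up for illustration). It runs the same pattern under different flags: the pattern source stays identical, while g and y only change where exec() starts looking, and u only changes how the pattern text is parsed.

```typescript
// Hypothetical helper (not from the thread): run the same pattern "a+"
// under different ECMAScript flags and record what each exec() call sees.
function compareFlags(text: string) {
  // RegExp constructor calls are used so the flags are explicit strings.
  const rePlain = new RegExp("a+");
  const reGlobal = new RegExp("a+", "g");
  const reSticky = new RegExp("a+", "y");

  // Without g, every exec() starts searching at index 0.
  const plainFirst = rePlain.exec(text)?.index;
  const plainSecond = rePlain.exec(text)?.index;

  // With g, exec() resumes from lastIndex, so repeated calls walk the input.
  const globalFirst = reGlobal.exec(text)?.index;
  const globalSecond = reGlobal.exec(text)?.index;

  // With y, the match must begin exactly at lastIndex.
  const stickyAtZero = reSticky.exec(text);
  reSticky.lastIndex = 1;
  const stickyAtOne = reSticky.exec(text)?.index;

  return {
    // The RE itself (its source) is the same in all three cases.
    sameSource:
      rePlain.source === reGlobal.source && reGlobal.source === reSticky.source,
    plainFirst,
    plainSecond,
    globalFirst,
    globalSecond,
    stickyAtZero,
    stickyAtOne,
  };
}

// u changes how the pattern text is parsed: \u{61} is a code point escape
// only when u is set; without u it is read as the letter "u" repeated 61 times.
const withU = new RegExp("\\u{61}", "u").test("a");
const withoutU = new RegExp("\\u{61}").test("a");

const flagDemo = compareFlags("xaa aaa");
console.log(flagDemo, withU, withoutU);
```

On this reading, an interchange format would only need to carry the pattern itself plus any flags that change what the pattern matches (such as i, m, or s); g and y would be left to whoever applies the RE.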

> Nod.  We'd probably need a small registry then, with the names or a code.  I would expect the semantics are "use this if it's a type you know about, otherwise, keep the tagged version and punt to the application layer".

Right.  But why not use tags for those?  We already have that registry.

(We could write a common document that we expect new RE tag registrations to reference, so there is some structure to this.)

> No argument, but I use regex's every day, and ABNF or a full PEG grammar just when I need to get out the big hammer.

REs are certainly more amenable to interchange as such.

I need to fix up my ABNF to RE compiler…
(Which is almost trivial – as long as the ABNF is not recursive – but the current version generates way too much noise that a manual RE writer would know how to avoid.)
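
For readers wondering why non-recursive ABNF maps to REs so easily, here is a rough TypeScript sketch of the idea. It is not the compiler mentioned above: the Abnf tree type, the function names, and the sample rules are all invented for illustration, and it simplifies by treating quoted strings as case-sensitive and limiting ranges to code points up to 0xFFFF. Rule references are simply inlined, and a reference to a rule that is already being expanded is rejected as recursive. It also wraps groups mechanically, which is the kind of noise a manual RE writer would avoid.

```typescript
// Simplified ABNF element tree (an assumption for this sketch; no ABNF parsing).
type Abnf =
  | { kind: "lit"; text: string } // quoted string, case-sensitive here
  | { kind: "range"; lo: number; hi: number } // %xLO-HI, up to 0xFFFF
  | { kind: "ref"; name: string } // reference to another rule
  | { kind: "seq"; items: Abnf[] } // concatenation
  | { kind: "alt"; items: Abnf[] } // alternation "/"
  | { kind: "rep"; min: number; max: number | null; item: Abnf }; // min*max

type Rules = { [name: string]: Abnf };

// Escape characters that have special meaning in an RE.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Render a code point as a \uHHHH escape.
function hex4(n: number): string {
  return "\\u" + ("0000" + n.toString(16)).slice(-4);
}

// stack holds the rule names currently being expanded, to detect recursion.
function toRegexSource(rules: Rules, node: Abnf, stack: string[] = []): string {
  switch (node.kind) {
    case "lit":
      return escapeLiteral(node.text);
    case "range":
      return "[" + hex4(node.lo) + "-" + hex4(node.hi) + "]";
    case "ref": {
      if (stack.indexOf(node.name) !== -1) {
        throw new Error("recursive rule: " + node.name);
      }
      const def = rules[node.name];
      if (!def) throw new Error("undefined rule: " + node.name);
      // Inline the referenced rule's definition.
      return toRegexSource(rules, def, stack.concat(node.name));
    }
    case "seq":
      return node.items.map((n) => toRegexSource(rules, n, stack)).join("");
    case "alt":
      return (
        "(?:" +
        node.items.map((n) => toRegexSource(rules, n, stack)).join("|") +
        ")"
      );
    case "rep": {
      const inner = "(?:" + toRegexSource(rules, node.item, stack) + ")";
      const max = node.max === null ? "" : node.max;
      return inner + "{" + node.min + "," + max + "}";
    }
  }
}

function compileRule(rules: Rules, name: string): string {
  return toRegexSource(rules, { kind: "ref", name: name });
}

// Sample rules, roughly: digit = %x30-39, number = 1*digit,
// decimal = number "." number
const sampleRules: Rules = {
  digit: { kind: "range", lo: 0x30, hi: 0x39 },
  number: { kind: "rep", min: 1, max: null, item: { kind: "ref", name: "digit" } },
  decimal: {
    kind: "seq",
    items: [
      { kind: "ref", name: "number" },
      { kind: "lit", text: "." },
      { kind: "ref", name: "number" },
    ],
  },
};

const decimalSource = compileRule(sampleRules, "decimal");
const decimalRe = new RegExp("^(?:" + decimalSource + ")$");
console.log(decimalSource, decimalRe.test("12.5"));
```

A hand-written equivalent would be something like [0-9]+\.[0-9]+, which shows how much of the generated output is avoidable grouping and escaping.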

I’m also on the lookout for a toolkit for translating between the various RE dialects.

Grüße, Carsten