Re: [Cbor] Regular expressions

Joe Hildebrand <hildjj@cursive.net> Sun, 28 February 2021 20:38 UTC

From: Joe Hildebrand <hildjj@cursive.net>
In-Reply-To: <B79CC250-9E89-41B4-8136-B9AC96422962@tzi.org>
Date: Sun, 28 Feb 2021 13:38:23 -0700
Cc: cbor@ietf.org
Message-Id: <F4BCBE46-F8E4-47E9-82A2-3FB67F607993@cursive.net>
References: <4665BD99-C64E-41B4-9FD0-547175B33D9A@cursive.net> <B79CC250-9E89-41B4-8136-B9AC96422962@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/PLM0_HXO0L6GE7ddXbgBRh-x8w4>
Subject: Re: [Cbor] Regular expressions

> I can’t speak about the g flag (it is not actually an RE modifier), but, e.g. for the literal

In ECMAScript land, 'gimsuy' are all valid flags.

> /foo/i 
> 
> its value in PCRE is 
> 
> (?i)foo
> or
> (?i:foo)
> 
> so there is no need to carry the modifiers as flags outside the RE.

Hm.  I don't think I can use that construct in ECMAScript, but I'm happy to learn yet another thing about regex incompatibility. :)
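
To make the inline-versus-external distinction concrete, here is a minimal sketch. It uses Python's `re` module purely as a stand-in, on the assumption that it accepts the same `(?i)` and `(?i:...)` spellings quoted above; it is not PCRE, and the helper name `matches` is made up for illustration.

```python
import re

# Three spellings of "case-insensitive foo".
external = re.compile(r"foo", re.IGNORECASE)  # flag carried outside the pattern
inline_global = re.compile(r"(?i)foo")        # flag embedded at the start of the pattern
inline_scoped = re.compile(r"(?i:foo)bar")    # flag applies to the group only (Python 3.6+)


def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print(matches(external, "FOO"))
    print(matches(inline_global, "Foo"))
    print(matches(inline_scoped, "FOObar"))
    print(matches(inline_scoped, "fooBAR"))
```

In a flavor that accepts the inline forms, the pattern string alone carries the modifier. A flavor that rejects them would need the flag transported next to the pattern instead.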

>> And maybe even some sort of info about what kind of regex it is (ECMAScript vs. PCRE, for example).
> 
> Definitely.  And for non-PCRE, there may be a need to carry flags outside.
> ECMAScript (JavaScript) and PCRE are quite close, but there are other families as well:
> RFC 8610 uses W3C XML Schema (XSD)-style regexes, as does YANG (at least officially, not so much in practice).  These are anchored (good) and support subtraction (exceedingly good), but are stuck with Unicode complexity in \d and \w, even in \s, which are therefore not useful in ASCII protocols.
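
The anchoring and Unicode `\d` points can be illustrated with a small sketch. Python's `re` is again only a stand-in here, not an XSD engine: `fullmatch` approximates the implicit anchoring, and `re.ASCII` approximates what an ASCII protocol would want. The function names are hypothetical.

```python
import re

# Arabic-Indic digits one, two, three: Unicode digits that are not ASCII.
ARABIC_INDIC_123 = "\u0661\u0662\u0663"


def is_digits_unicode(text):
    """Anchored match where \\d covers every Unicode decimal digit."""
    return re.fullmatch(r"\d+", text) is not None


def is_digits_ascii(text):
    """Anchored match where \\d is restricted to 0-9."""
    return re.fullmatch(r"\d+", text, flags=re.ASCII) is not None


def contains_digits(text):
    """Unanchored search: succeeds if digits appear anywhere in the text."""
    return re.search(r"\d+", text, flags=re.ASCII) is not None


if __name__ == "__main__":
    print(is_digits_unicode(ARABIC_INDIC_123))
    print(is_digits_ascii(ARABIC_INDIC_123))
    print(contains_digits("abc123"), is_digits_ascii("abc123"))
```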

Nod.  We'd probably need a small registry then, with a name or a code for each regex flavor.  I would expect the semantics to be "use this if it's a flavor you know about; otherwise, keep the tagged version and punt to the application layer".
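
A rough sketch of those semantics follows. Everything in it is an assumption made for illustration: the `TaggedRegex` shape, the flavor name `"python-re"`, and the flag letters are hypothetical and do not come from any registry or CBOR tag definition.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedRegex:
    """Hypothetical decoded form of a tagged regex: flavor, pattern, flags."""
    flavor: str
    pattern: str
    flags: str = ""


# Flag letters this decoder understands for its one known flavor (assumed set).
_PYTHON_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def _compile_python(pattern, flags):
    combined = 0
    for letter in flags:
        combined |= _PYTHON_FLAGS[letter]  # KeyError for a flag we do not know
    return re.compile(pattern, combined)


# Hypothetical stand-in for the small registry: flavor name -> compiler.
KNOWN_FLAVORS = {"python-re": _compile_python}


def decode_regex(item):
    """Compile the regex if the flavor is known; otherwise keep the tagged value."""
    compiler = KNOWN_FLAVORS.get(item.flavor)
    if compiler is None:
        return item  # unknown flavor: punt to the application layer
    try:
        return compiler(item.pattern, item.flags)
    except (KeyError, re.error):
        return item  # unknown flag or invalid pattern: punt as well
```

For example, `decode_regex(TaggedRegex("python-re", "foo", "i"))` yields a compiled pattern, while `decode_regex(TaggedRegex("xsd", r"\d+"))` hands back the tagged value unchanged.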

>> I assume a lot of folks are in the "regexes are too hard to interop" camp, in which case I'll take a nice high tag number and everyone else can ignore it.
> 
> They are hard, but not “too hard” for a large number of applications.
> Enjoy https://www.regular-expressions.info/refmodifiers.html
> just for the modifiers :-)

zow.

> Flags such as g in JavaScript or u and n in Ruby are much harder to express inside the RE.
> 
> (But I still prefer ABNF :-)

No argument, but I use regexes every day, and ABNF or a full PEG grammar only when I need to get out the big hammer.
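
On the earlier point that g is not really an RE modifier: one way to see it is that "find every match" is a property of the operation, not of the pattern. A minimal sketch, assuming Python's `re` as a stand-in, where the choice is made by which function is called (the helper names are hypothetical):

```python
import re

# One pattern, no flags: what changes below is the operation applied to it.
pattern = re.compile(r"fo+")


def first_match(text):
    """Return the first match only, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None


def all_matches(text):
    """Return every non-overlapping match."""
    return pattern.findall(text)


if __name__ == "__main__":
    print(first_match("fo foo fooo"))
    print(all_matches("fo foo fooo"))
```

That is why a flag like g is hard to fold into the pattern text: it describes how the caller iterates, so an interchange format would have to carry it beside the pattern.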

— 
Joe Hildebrand