Re: [Cbor] .regexp: s/PCRE/JavaScript REs/?

Sean Leonard <dev+ietf@seantek.com> Mon, 24 July 2017 07:19 UTC

From: Sean Leonard <dev+ietf@seantek.com>
In-Reply-To: <67DC043C-40C6-4850-9F54-5FE4302413E6@tzi.org>
Date: Mon, 24 Jul 2017 00:19:24 -0700
Cc: cbor@ietf.org
Message-Id: <B3D7D207-BFB7-4F87-A426-BBEFA9512662@seantek.com>
References: <67DC043C-40C6-4850-9F54-5FE4302413E6@tzi.org>
To: Carsten Bormann <cabo@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/2Q-MnfEA1wPjGJk8A-odRH6GdRc>
Subject: Re: [Cbor] .regexp: s/PCRE/JavaScript REs/?

At the technical (substantive) level, I think PCRE is an excellent choice. The only thing lacking is a version identifier (or a deliberate decision not to pin one, declaring it a “living standard” instead 🌊).

First of all, PCRE is already referenced in RFC 7049 (tag 35, regular expressions), so it’s already part of CBOR. I think of PCRE like grep: a generally available utility that became (or is becoming) a de facto standard. Today grep is in the POSIX standard, but it wasn’t always that way.

PCRE(2) is better than ECMA262 because ECMA262’s regular expressions (in the current edition, ECMAScript 2017) lack look-behind, named groups, and other awesome things. PCRE(2) also supports just-in-time compilation, so in principle an implementer could take a static regular expression (i.e., the kind proposed in CDDL) and compile it down to something small for an embedded device, rather than embedding the whole library.
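To make the feature gap concrete, here is a small sketch using Python’s `re` module. It is not PCRE, but it supports fixed-width look-behind and named groups with similar syntax (Python spells named groups `(?P<name>...)`, a form PCRE also accepts). The price pattern and the `parse_price` helper are hypothetical illustrations, not anything from CDDL or RFC 7049.

```python
import re
from typing import Optional, Tuple

# Look-behind: (?<=\$) asserts that a "$" immediately precedes the match
# without consuming it. Named groups: (?P<name>...) label the captures.
PRICE = re.compile(r"(?<=\$)(?P<dollars>[0-9]+)\.(?P<cents>[0-9]{2})")


def parse_price(text: str) -> Optional[Tuple[int, int]]:
    """Return (dollars, cents) for the first "$D.CC" amount in text, else None."""
    m = PRICE.search(text)
    if m is None:
        return None
    # Captures are retrieved by name rather than by position.
    return int(m.group("dollars")), int(m.group("cents"))


if __name__ == "__main__":
    print(parse_price("Total: $12.50"))   # (12, 50)
    print(parse_price("Total: 12.50"))    # None: no "$" before the digits
    # The "$" itself is not part of the match, thanks to look-behind.
    print(PRICE.search("Total: $12.50").group(0))  # 12.50
```

Without look-behind, the same effect needs an extra capture for the `$` and manual slicing; without named groups, callers depend on group numbering, which shifts whenever the pattern is edited.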

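At a process level for IETF standards documents, PCRE is problematic, since it is defined by a software project’s documentation rather than by a stable, citable specification. I feel that this can be worked around (or through) in light of the technical merits; I just don’t know how to get the IESG to say “yes”. :^)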

Section 1.2 of draft-seantek-mail-regexen-02 discusses the alternatives. It also mentions POSIX Extended Regular Expressions, which are specified in Chapter 9, “Regular Expressions”, of the POSIX Base Definitions volume. They are less full-featured than ECMA262, which, in my view, makes them less good. But fewer features mean they are easier to implement on constrained devices.
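One way to see why fewer features help constrained devices: a fixed pattern that stays within a plain regular subset (no look-around, no back-references) can be compiled ahead of time into a small table-driven automaton instead of shipping a regex engine. The sketch below hand-compiles a DFA for one hypothetical pattern, `[0-9]+(\.[0-9]+)?`, and cross-checks it against Python’s `re.fullmatch`. The pattern, state numbering, and helper names are illustrative assumptions; this is not output from PCRE or any real regex compiler.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical fixed pattern, as a schema might carry it (full-string match).
PATTERN = r"[0-9]+(\.[0-9]+)?"

# Hand-compiled DFA for PATTERN.
# States: 0 = start, 1 = integer digits, 2 = just saw ".", 3 = fraction digits.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}
ACCEPTING = {1, 3}


def char_class(c: str) -> Optional[str]:
    """Map a character to its input class; None means no transition exists."""
    if "0" <= c <= "9":  # ASCII digits only, like [0-9]
        return "digit"
    if c == ".":
        return "dot"
    return None


def dfa_match(s: str) -> bool:
    """Run the DFA over s; accept only if it ends in an accepting state."""
    state = 0
    for c in s:
        cls = char_class(c)
        if cls is None:
            return False
        nxt = TRANSITIONS.get((state, cls))
        if nxt is None:
            return False
        state = nxt
    return state in ACCEPTING


if __name__ == "__main__":
    samples = ["0", "42", "3.14", "3.", ".5", "1.2.3", "", "12a"]
    for s in samples:
        # Cross-check the tiny DFA against a general-purpose engine.
        assert dfa_match(s) == (re.fullmatch(PATTERN, s) is not None), s
    print("DFA agrees with re.fullmatch on all samples")
```

On a real embedded target the transition table would more likely be emitted as a static C array, but the structure is the same: a few states and a lookup per input character.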

Sean

PS: PCRE(2) is BSD-licensed, so someone could, conceivably, copy the entire source code into an RFC and call it a day.

> On Jul 21, 2017, at 6:05 AM, Carsten Bormann <cabo@tzi.org> wrote:
> 
> Should we replace saying that the regexes are PCRE with saying they are ECMA262?
> Or is any other flavor (that we could be referencing) in favor?
> 
> Grüße, Carsten
> 
> _______________________________________________
> CBOR mailing list
> CBOR@ietf.org
> https://www.ietf.org/mailman/listinfo/cbor