[Cbor] Tag 35

Carsten Bormann <cabo@tzi.org> Wed, 16 September 2020 14:49 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/txIKHMXRFzNo7oH-eHZigO1L47w>

Section 3.4.5.3 of the current draft says:

   *  Tag number 35 is for regular expressions that are roughly in Perl
      Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
      version of the JavaScript regular expression syntax [ECMA262].
      (Note that more specific identification may be necessary if the
      actual version of the specification underlying the regular
      expression, or more than just the text of the regular expression
      itself, need to be conveyed.)  Any contained string value is
      valid.

Ben wrote (DISCUSS):

> Let's discuss whether the framing of tag number 35 for "regular
> expressions that are roughly in [PCRE] form or a version of the
> JavaScript regular expression syntax" meets the interoperability
> expectations for Internet Standard status (bearing in mind that we are
> defining a data format and not a protocol).  I note that it is okay
> to leave the codepoint allocated with the current meaning and the
> previous document as its reference, but decline to discuss it in the
> document going for STD (we are in the process of doing that with COSE
> countersignatures at the moment).

Roman wrote (COMMENTS):

** Section 3.4.5.3.  For Tag 35, how does one know if the syntax is a PCRE or
ECMA regular expression?

** Section 3.4.5.3.  PCRE is the only informative reference of all of the tags
defined in this section (even ECMA is normative).  Please make it normative.


In summary, there are two problems here:

- Tag 35 stands for a mix of PCRE (actually, PCRE and PCRE2) and ES20xx regular expressions.
- There is no good reference for PCRE.

What could we do?

1. Delete Tag 35 from RFC 7049bis.  This creates the awkward situation that the registry will contain a tag with a reference to an obsoleted document; or we would need to eschew obsoleting RFC 7049 (which would confuse the heck out of everybody).

2. Keep Tag 35 in RFC 7049bis, and make the PCRE reference (which is really to a web page that changes occasionally) normative.  We would still not actually say which of the formats (PCRE, PCRE2, ES20xx, and in which version) is actually in the tag content, and would probably need to strengthen the language saying so.

2a.  Do the same and make clear that one only gets interoperability by keeping to the common subset of all these Perl-5-derived formats.

3. Keep Tag 35 and make the references to PCRE and ECMA262 informative (ECMA262 is referenced in another place, but that also is informative).  It is not clear how to pull this off, but it probably needs 2a as a prerequisite.


Note that ECMAScript since version 3 (1999), where regexes were introduced, contains this note:

> The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.

Since we are transporting regular expressions, not regular expression literals (i.e., with slashes around them) in a specific language, we cannot even transport the flag bits, so we don’t touch areas of major deviations such as the “s” flag.
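
To make the "no flags" point concrete, here is a minimal sketch of what a tag 35 data item looks like on the wire, assuming the content is a CBOR text string carrying only the pattern text.  The helper names cbor_head and encode_tag35 are made up for this illustration and are not taken from the draft or from any CBOR library.

```python
import struct


def cbor_head(major_type: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (shortest form)."""
    if value < 24:
        return bytes([(major_type << 5) | value])
    if value < 0x100:
        return bytes([(major_type << 5) | 24, value])
    if value < 0x10000:
        return bytes([(major_type << 5) | 25]) + struct.pack(">H", value)
    if value < 0x100000000:
        return bytes([(major_type << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major_type << 5) | 27]) + struct.pack(">Q", value)


def encode_tag35(pattern: str) -> bytes:
    """Hypothetical helper: wrap the bare pattern text in tag 35.

    Major type 6 is a tag, major type 3 is a text string.  Only the
    pattern itself is carried; there is no slot for flags such as "s".
    """
    utf8 = pattern.encode("utf-8")
    return cbor_head(6, 35) + cbor_head(3, len(utf8)) + utf8


print(encode_tag35("a").hex())  # d8236161
```

Anything beyond the pattern text (flags, dialect, version) would have to be conveyed by some other means, which is what the parenthetical note in the draft text hints at.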

We are not really in the business of defining regular expressions (see also XKCD 927), so the solution found in RFC 2915 (simply specifying another dialect) does not really apply.

When I have used tag 35 (not in any shipped protocols, I think), I stuck to a basic subset; e.g., it is not very likely that different implementations will differ in interpreting “^[A-Za-z][A-Za-z0-9]*$” (note the need to anchor the regex, and the need to eschew \A and \z for that).
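
As a rough illustration of what "sticking to a basic subset" could mean in practice, here is a sketch of a purely lexical check a sender might run before putting a pattern into tag 35.  The subset chosen here (plain alphanumerics, ".", simple character classes, the quantifiers "*", "+", "?", alternation, plain groups, and the "^"/"$" anchors) is my assumption for the example, not something the draft defines, and the name in_basic_subset is hypothetical.

```python
import re

# Assumed "basic subset", expressed as a token grammar:
#   quantifiable atoms: alphanumerics, "_", space, ".", a simple
#                       character class, or a closing ")"
#   bare tokens:        "^", "$", "|", "("
# Backslash escapes are rejected outright, which rules out \A and \z;
# "(?" is rejected because "?" may only follow a quantifiable atom.
_BASIC = re.compile(
    r"(?:"
    r"(?:[A-Za-z0-9_ ]|\.|\[\^?[A-Za-z0-9_-]+\]|\))[*+?]?"
    r"|[\^$|(]"
    r")*"
)


def in_basic_subset(pattern: str) -> bool:
    """Return True if the pattern uses only the assumed basic tokens.

    This is a lexical check only: it does not verify balanced
    parentheses or well-formed ranges inside character classes.
    """
    return _BASIC.fullmatch(pattern) is not None


print(in_basic_subset("^[A-Za-z][A-Za-z0-9]*$"))  # True
print(in_basic_subset(r"\A[a-z]+\z"))             # False
print(in_basic_subset("(?i)abc"))                 # False
```

An allowlist is used rather than a list of known-divergent constructs, so that anything not explicitly considered is rejected by default.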

Grüße, Carsten